Advanced Data Center Switching
Student Guide
Revision 14.a

408-745-2000
www.juniper.net
All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.
Advanced Data Center Switching Student Guide, Revision 14.a
Copyright 2016 Juniper Networks, Inc. All rights reserved.
Printed in USA.
Revision History:
Revision 14.a, April 2016
The information in this document is current as of the date listed above.
The information in this document has been carefully verified and is believed to be accurate for software Release 14.1X53.
Juniper Networks assumes no responsibility for any inaccuracies that may appear in this document. In no event will Juniper Networks be liable for direct, indirect, special,
exemplary, incidental, or consequential damages resulting from any defect or omission in this document, even if advised of the possibility of such damages.
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has no known
time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an agreement
executed between you and Juniper Networks, or Juniper Networks' agent. By using Juniper Networks software, you indicate that you understand and agree to be bound by its
license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper Networks software, may contain
prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should consult the software license for further details.
Contents
Chapter 1: Course Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Chapter 2: Next Generation Data Centers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Traditional Multitier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Data Center Fabric Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Chapter 3: IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
IP Fabric Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
IP Fabric Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
IP Fabric Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-25
Configure an IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30
Lab: IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-49
Chapter 4: VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Layer 2 Connectivity Over a Layer 3 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
VXLAN Using Multicast Control Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
VXLAN Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
Lab: VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
Course Overview
This two-day course is designed to introduce various QFX5k and MX/vMX features including, but not limited to, IP Fabric, Virtual eXtensible Local Area Network (VXLAN) Layer 2 and Layer 3 Gateways, VXLAN with Ethernet VPN (EVPN) signaling, and Data Center Interconnect (DCI) for a VXLAN overlay. Students will learn to configure and monitor these features of the Junos operating system running on the QFX5100 and vMX Series platforms.
Through demonstrations and hands-on labs, students will gain experience configuring, monitoring, and analyzing these features of the Junos OS. This course is based on software Release 14.1X53.
Intended Audience
This course benefits individuals responsible for configuring and monitoring switching features of the Junos OS running on the QFX5k and MX Series platforms, including individuals in professional services, sales, and support organizations, as well as end users.
Course Level
Advanced Data Center Switching (ADCX) is an advanced-level course.
Prerequisites
The following are the prerequisites for this course:
Understanding of the OSI model;
Junos OS configuration experience: the Introduction to the Junos Operating System (IJOS) course or equivalent;
Advanced routing knowledge: the Advanced Junos Enterprise Routing (AJER) course or equivalent; and
Intermediate switching knowledge: the Junos Enterprise Switching Using Enhanced Layer 2 Software (JEX-ELS) and Data Center Switching (DCX) courses or equivalent.
Objectives
Day 1
Chapter 1: Course Introduction
Chapter 2: Next Generation Data Centers
Chapter 3: IP Fabric
Lab: IP Fabric
Chapter 4: VXLAN
Lab: VXLAN
Day 2
Chapter 5: EVPN
Lab: VXLAN with EVPN Signaling
Chapter 6: Data Center Interconnect
Lab: Data Center Interconnect
CLI and GUI Text
Frequently throughout this course, we refer to text that appears in a command-line interface (CLI) or a graphical user
interface (GUI). To make the language of these documents easier to read, we distinguish GUI and CLI text from chapter
text according to the following table.
Style            Description                    Usage Example
Franklin Gothic  Normal text.                   Most of what you read in the Lab Guide and Student Guide.
Courier New      Console text:                  commit complete
                 Screen captures                Exiting configuration mode
                 Noncommand-related syntax
                 GUI text elements:             Select File > Open, and then click Configuration.conf in the Filename text box.
                 Menu names
                 Text field entry
CLI Input        Text that you must enter.      lab@San_Jose> show route
GUI Input        Text that you must enter.      Select File > Save, and type config.ini in the Filename field.
Finally, this course distinguishes between regular text and syntax variables, and it also distinguishes between syntax
variables where the value is already assigned (defined variables) and syntax variables where you must assign the value
(undefined variables). Note that these styles can be combined with the input style as well.
CLI Undefined    Text where the variable's value is     Type set policy policy-name.
GUI Undefined    at the user's discretion, or text      ping 10.0.x.y
                 where the variable's value as shown    Select File > Save, and type
                 in the lab guide might differ from the
Education Services Offerings
You can obtain information on the latest Education Services offerings, course dates, and class locations from the World
Wide Web by pointing your Web browser to: http://www.juniper.net/training/education/.
About This Publication
The Advanced Data Center Switching Student Guide was developed and tested using software Release 14.1X53.
Previous and later versions of software might behave differently so you should always consult the documentation and
release notes for the version of code you are running before reporting errors.
This document is written and maintained by the Juniper Networks Education Services development team. Please send
questions and suggestions for improvement to training@juniper.net.
Technical Publications
You can print technical manuals and release notes directly from the Internet in a variety of formats:
Go to http://www.juniper.net/techpubs/.
Locate the specific software or hardware release and title you need, and choose the format in which you
want to view or print the document.
Documentation sets and CDs are available through your local Juniper Networks sales office or account representative.
Juniper Networks Support
For technical support, contact Juniper Networks at http://www.juniper.net/customers/support/, or at 1-888-314-JTAC
(within the United States) or 408-745-2121 (outside the United States).
Chapter 1: Course Introduction
Advanced Data Center Switching
We Will Discuss:
Objectives and course content information;
Introductions
The slide asks several questions for you to answer during class introductions.
Course Contents
The slide lists the topics we discuss in this course.
Prerequisites
The slide lists the prerequisites for this course.
Additional Resources
The slide provides links to additional resources available to assist you in the installation, configuration, and operation of
Juniper Networks products.
Satisfaction Feedback
Juniper Networks uses an electronic survey system to collect and analyze your comments and feedback. Depending on the
class you are taking, please complete the survey at the end of the class, or be sure to look for an e-mail about two weeks
from class completion that directs you to complete an online survey form. (Be sure to provide us with your current e-mail
address.)
Submitting your feedback entitles you to a certificate of class completion. We thank you in advance for taking the time to
help us improve our educational offerings.
staff with deep technical and industry knowledge, providing you with instructor-led hands-on courses in the classroom and
online, as well as convenient, self-paced eLearning courses. In addition to the courses shown on the slide, Education
Services offers training in automation, E-Series, firewall/VPN, IDP, network design, QFabric, support, and wireless LAN.
Courses
Juniper Networks courses are available in the following formats:
Learning bytes: Short, topic-specific, video-based lessons covering Juniper products and technologies
Find the latest Education Services offerings covering a wide range of platforms at
http://www.juniper.net/training/technical_education/.
hands-on configuration and troubleshooting exams. Successful candidates demonstrate a thorough understanding of
Internet and security technologies and Juniper Networks platform configuration and troubleshooting skills.
Associate-level, Specialist-level, and Professional-level exams are computer-based exams composed of multiple choice
questions administered at Pearson VUE testing centers worldwide.
Expert-level exams are composed of hands-on lab exercises administered at select Juniper Networks testing centers. Please
visit the JNCP website at http://www.juniper.net/certification for detailed exam information, exam pricing, and exam
registration.
Junos Genius
The Junos Genius application takes certification exam preparation to a new level. With Junos Genius you can practice for
your exam with flashcards, simulate a live exam in a timed challenge, and even build a virtual network with device
achievements earned by challenging Juniper instructors. Download the app now and Unlock your Genius today!
Find Us Online
The slide lists some online resources to learn and share information about Juniper Networks.
Any Questions?
If you have any questions or concerns about the class you are attending, we suggest that you voice them now so that your
instructor can best address your needs during class.
Chapter 2: Next Generation Data Centers
Advanced Data Center Switching
We Will Discuss:
The benefits and challenges of the traditional multitier architecture;
The networking requirements that are requiring a change to the design of a data center; and
The various data center fabric architectures.
Multiple Tiers
Legacy data centers are often hierarchical and consist of multiple layers. The diagram on the slide illustrates the typical
layers, which include access, distribution (sometimes referred to as aggregation), and core. Each of these layers performs
unique responsibilities. We cover the functions of each layer on a subsequent slide in this section.
Hierarchical networks are designed in a modular fashion. This inherent modularity facilitates change and makes this design
option quite scalable. When working with a hierarchical network, the individual elements can be replicated as the network
grows. The cost and complexity of network changes is generally confined to a specific portion (or layer) of the network rather
than to the entire network.
Because functions are mapped to individual layers, faults relating to a specific function can be isolated to that function's corresponding layer. The ability to isolate faults to a specific layer can greatly simplify troubleshooting efforts.
Functions of Layers
The individual layers usually represent specific functions found within a network. It is often mistakenly thought that the access, distribution (or aggregation), and core layers must exist in clear and distinct physical devices, but this is not a requirement.
The slide highlights the access, aggregation, and core layers and provides a brief description of the functions commonly
implemented in those layers. If CoS is used in a network, it should be incorporated consistently in all three layers.
Since using a hierarchical implementation does not require the use of proprietary features or protocols, a
multitier topology can be constructed using equipment from multiple vendors.
A multitier implementation allows flexible placement of a variety of switching platforms. The simplicity of the
protocols used does not require specific Junos versions or platform positioning.
The legacy multitier switching architecture cannot provide today's applications and users with predictable latency and uniform bandwidth. This problem is made worse when virtualization is introduced, where the
performance of virtual machines (VMs) depends on the physical location of the servers hosting those VMs.
The management of an ever-growing data center is becoming more and more taxing, administratively speaking.
While the north-south boundaries have been fixed for years, the east-west boundaries have not stopped growing. This growth of compute, storage, and infrastructure requires a new management approach.
The power consumed by networking gear represents a significant proportion of the overall power consumed in
the data center. This challenge is particularly important today, when escalating energy costs are putting
additional pressure on budgets.
The increasing performance and density of modern CPUs have led to an increase in network traffic. The network is often not equipped to deal with the large bandwidth demands and the increased number of media access control (MAC) addresses and IP addresses on each network port.
Separate networks for Ethernet data and storage traffic must be maintained, adding to the training and
management budget. Siloed Layer 2 domains increase the overall costs of the data center environment. In
addition, outages related to the legacy behavior of the Spanning Tree Protocol (STP), which is used to support these legacy environments, often result in lost revenue and unhappy customers.
Given these challenges, along with others, data center operators are seeking solutions.
Resource Utilization
In the multitier topology displayed on the slide, you can see that almost half the links are not utilized. In this example, you would also need to be running some type of spanning tree protocol (STP) to avoid loops, which would introduce a delay in your network convergence as well as significant STP control traffic taking up valuable bandwidth.
This topology is relatively simple but allows us to visualize the lack of resource utilization. Imagine a data center with a hundred racks of servers and a hundred top-of-rack access switches. The access switches all aggregate up to the core/distribution switches, including redundant connections. In this much larger and more complicated network, you would have thousands of physical cable connections that are not being utilized. Now imagine these connections are fiber: in addition to the unused cables, you would also have two transceivers per connection that are not being used. Because of this inefficient use of physical components, a significant amount of usable bandwidth sits idle.
Understanding why these changes are being implemented is important when trying to understand the needs of the customer. A few reasons driving this change include:
Application Flows: More east-west traffic communication is happening in data centers. With today's applications, many requests can generate a lot of traffic between devices in a single data center. Basically, a single user request triggers a barrage of additional requests to other devices. This "go here, get this; then go here, get that" behavior of many applications is being done on such a large scale today that it is driving data
those customers that are forward looking and eventually want to incorporate some level of virtualization.
Everything as a service: To be cost effective, a data center offering hosting services must be easy to scale out and scale back as demands change. The data center should be very agile, making it easy to deploy new services quickly.
server ports.
redundant connections to your servers as well as offers dual control planes. In addition to the access layer, MC-LAGs are also
commonly deployed at the core layer. When MC-LAG is deployed in an Active/Active fashion, both links between the attached
device and the MC-LAG peers are active and available for forwarding traffic. Using MC-LAG eliminates the need to run STP on
member links and, depending on the design, can eliminate the need for STP altogether.
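As a rough sketch of how this looks in configuration, an active/active MC-LAG member link on one of the two peers might resemble the following (the interface, IDs, and ICCP peer address are hypothetical examples, not values from the course labs):

```
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:02:03:04:05
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set protocols iccp peer 10.0.0.2 redundancy-group-id-list 1
set protocols iccp peer 10.0.0.2 liveness-detection minimum-interval 1000
```

The second peer would use chassis-id 1 and status-control standby. Because both peers present the same LACP system ID, the attached server sees a single switch and simply runs standard LACP; consult the platform documentation for the full ICCP and interchassis link setup.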
devices into a single logical device, inside of a fabric architecture. A VCF is constructed using a spine-and-leaf architecture. In
the spine-and-leaf architecture, each spine device is interconnected to each leaf device. A VCF supports up to 32 total
QFabric
The QFabric System is composed of multiple components working together as a single switch to provide high-performance,
any-to-any connectivity and management simplicity in the data center. The QFabric System flattens the entire data center
network to a single tier where all access points are equal, eliminating the effects of network locality and making it the ideal
network foundation for cloud-ready, virtualized data centers. QFabric is a highly scalable system that improves application
performance with low latency and converged services in a non-blocking, lossless architecture that supports Layer 2, Layer 3,
and Fibre Channel over Ethernet (FCoE) capabilities. The reason you can consider the QFabric system as a single system is
that the Director software running on the Director group allows the main QFabric system administrator to access and
configure every device and port in the QFabric system from a single location. Although you configure the system as a single
entity, the fabric contains four major hardware components. The hardware components can be chassis-based, group-based,
Junos Fusion
Junos Fusion is a Juniper Networks Ethernet fabric architecture designed to provide a bridge from legacy networks to
software-defined cloud networks. With Junos Fusion, service providers and enterprises can reduce network complexity and
operational costs by collapsing underlying network elements into a single, logical point of management. The Junos Fusion
architecture consists of two major components: aggregation devices and satellite devices. With this structure it can also be
classified as a spine and leaf architecture. These components work together as a single switching system, flattening the
network to a single tier without compromising resiliency. Data center operators can build individual Junos Fusion pods
comprised of a pair of aggregation devices and a set of satellite devices. Each pod is a collection of aggregation and satellite
devices that are managed as a single device. Pods can be small (for example, a pair of aggregation devices and a handful of satellites) or large (up to 64 satellite devices), based on the needs of the data center operator.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly
using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers
that must accommodate multiple vendors. One of the most complicated tasks in building an IP Fabric is assigning all of the
details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other implementation
details.
requirement that is becoming less and less necessary is the ability of the underlying switch fabric to carry native Ethernet frames between VMs/servers in different racks. Some of the major reasons for this shift are:
1. IP-only Data: Many data centers simply need IP connectivity between racks of equipment. There is less and less
need for the stretching of Ethernet networks over the fabric. For example, one popular compute and storage
methodology is Apache's Hadoop. Hadoop allows for a large set of data (i.e., like a single terabit file) to be
stored in chunks across many servers in a data center. Hadoop also allows for the stored chunks of data to be
processed in parallel by the same servers they are stored upon. The connectivity between the possibly
virtual eXtensible local area network (VXLAN), multiprotocol label switching (MPLS), and generic routing encapsulation (GRE) are some of the common tunneling protocols used to transport Layer 2 frames over the fabric of a data center. One of the benefits of overlay networking is that when there is a change to Layer 2
connectivity between VMs/servers (the overlay network), the underlying fabric (underlay network) can remain
relatively untouched and unaware of the changes occurring in the overlay network.
requirements including load balancing over equal cost paths (assuming Virtual Chassis Fabric) as well as having no blocked
spanning tree ports in the network. However, this topology does not solve the VM agility problem or the 802.1q VLAN overlap
problem. Also, as 802.1q VLANs are added to the virtual switches, those same VLANs must be provisioned on the underlay
network. Managing the addition, removal, and movement of VMs (and their VLANs) for thousands of customers would be a
nightmare for the operators of the underlay network.
Overlay Networking
Overlay networking can help solve many of the requirements and problems discussed in the previous slides. This slide shows
the addition of an overlay network that includes the use of VXLAN. The overlay network consists of the virtual switches and
the VXLAN tunnel endpoints (VTEPs). A VTEP will encapsulate the Ethernet frames that it receives from the virtual switch into
IP and forward the resulting IP packet to the remote VTEP. The underlay network simply needs to forward IP packets between
VTEPs. The receiving VTEP will de-encapsulate the VXLAN IP packets and then forward the resulting Ethernet Frame to the
appropriate VM. Adding and removing VMs from the data center has no effect on the underlay network. The underlay
network simply needs to provide IP connectivity between the VTEPs.
When implementing the underlay network in this scenario, you have a few choices. You can use an Ethernet fabric like Virtual
Chassis (VC), Virtual Chassis Fabric (VCF), or Junos Fusion. All of these are valid solutions. Because all of the traffic crossing
the underlay network is IP, the option for an IP fabric becomes available. The choice of underlay network comes down to
scale and future growth. An IP fabric is considered to be the most scalable underlay solution.
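As a minimal sketch of what a Layer 2 VTEP looks like on a QFX5100 (the VLAN name, VNI, and multicast group below are hypothetical examples), the VTEP sources its tunnels from the loopback and maps a VLAN to a VXLAN network identifier (VNI):

```
set switch-options vtep-source-interface lo0.0
set vlans v100 vlan-id 100
set vlans v100 vxlan vni 5100
set vlans v100 vxlan multicast-group 225.0.0.100
```

With this multicast-based control plane, the underlay only has to deliver unicast IP between the VTEP loopback addresses, plus multicast for the group assigned to the VNI; VXLAN configuration is covered in detail in a later chapter.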
We Discussed:
The benefits and challenges of the traditional multitier architecture;
The networking requirements that are requiring a change to the design of a data center; and
The various data center fabric architectures.
Review Questions
1.
2.
3.
1. Some of the challenges of the traditional data center designs are the slow convergence times of xSTPs, as well as the wasted resources of unused (blocked by xSTP) interfaces.
2. Some of the applications driving change in the data center are multitenancy, the increase in east-west traffic, Hadoop, and overlay networking.
3. Layer 2 networks can be stretched over an IP network using an overlay like VXLAN or GRE.
Chapter 3: IP Fabric
Advanced Data Center Switching
We Will Discuss:
Routing in an IP Fabric;
IP Fabric Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly
using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers
that must accommodate multiple vendors. Some of the most complicated tasks in building an IP Fabric are assigning all of
the details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other
implementation details.
three-stage architecture, an ingress stage, a middle stage, and an egress stage. The theory is that there are multiple paths
for a call to be switched through the network such that calls will always be connected and not "blocked" by another call. The
term "Clos fabric" came about later, as people began to notice that the pattern of links looked like threads in a woven piece
of cloth.
You should notice that the goal of the design is to provide connectivity from one ingress crossbar switch to an egress
crossbar switch. Notice that there is no need for connectivity between crossbar switches that belong to the same stage.
do not present the topology with 3 distinct stages as shown on this slide. Most diagrams show an IP Fabric with the Ingress
and Egress stage combined as a single stage. It would be like taking the top of the diagram and folding it over onto itself with
all Spine nodes on top and all Leaf nodes on the bottom of the diagram (see the next slide).
with multiple paths to all other devices. An important fact to keep in mind is that a member switch has no idea of its location
(Spine or Leaf) in an IP Fabric. The Spine or Leaf function is simply a matter of a device's physical location in the fabric. In
general, the choice of router to be used as a Spine node should be partially based on the interface speeds and number of ports that it supports. The slide shows an example where every Spine node is a QFX5100-24Q. The QFX5100-24Q supports thirty-two 40GbE interfaces and was designed by Juniper specifically to be a Spine node.
medium to large deployment. Although we do not cover the configuration of a 5-stage fabric, you should know that the
configuration of a 5-stage fabric is quite complicated.
pertinent features for a Spine node include overlay networking support, Layer 2 and Layer 3 VXLAN Gateway support, and
number of VLANs supported.
IP Fabric Routing
The slide highlights the topic we discuss next.
destination.
homed hosts and two paths available for multihomed hosts. It just so happens that getting these routes and associated next
hops into the forwarding table of a Spine node can be tricky. The rest of the chapter discusses the challenges as well as the
Layer 3 Connectivity
Remember that your IP Fabric will be forwarding IP data only. Each node will be an IP router. In order to forward IP packets
between routers, they need to exchange IP routes. So, you have to make a choice between routing protocols. You want to
ensure that your choice of routing protocol is scalable and future proof. As you can see by the chart, BGP is the natural
choice for a routing protocol.
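For example, one common EBGP-based design assigns each node its own AS number and peers over the point-to-point fabric links. The following Leaf-node sketch uses made-up AS numbers, addresses, and policy names:

```
set routing-options router-id 192.168.0.11
set policy-options policy-statement advertise-loopback term 1 from interface lo0.0
set policy-options policy-statement advertise-loopback term 1 then accept
set protocols bgp group underlay type external
set protocols bgp group underlay local-as 65011
set protocols bgp group underlay multipath multiple-as
set protocols bgp group underlay export advertise-loopback
set protocols bgp group underlay neighbor 172.16.1.0 peer-as 65001
set protocols bgp group underlay neighbor 172.16.2.0 peer-as 65002
```

The multipath multiple-as statement allows equal-cost BGP routes learned from neighbors in different AS numbers to be used simultaneously. An IBGP-based design is also possible, as discussed next.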
IBGP: Part 1
IBGP is a valid choice as the routing protocol for your fabric. IBGP peers almost always peer to loopback addresses as
opposed to physical interface addresses. In order to establish a BGP session (over a TCP session), a router must have a route
to the loopback address of its neighbor. To learn the route to a neighbor an Interior Gateway Protocol (IGP) like OSPF must be
enabled in the network. One purpose of enabling an IGP is simply to ensure every router knows how to get to the loopback
address of all other routers. Another problem that OSPF will solve is determining all of the equal cost paths to remote
destinations. For example, router A will determine from OSPF that there are two equal-cost paths to reach router B. Now router A can load share traffic destined for router B's loopback address (IBGP-learned routes; see the next few slides) across the two links towards router B.
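A minimal sketch of this arrangement on one router might look like the following (interface names and loopback addresses are hypothetical). OSPF provides reachability to the loopbacks, and the equal-cost paths between them, while IBGP peers loopback to loopback:

```
set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols ospf area 0.0.0.0 interface et-0/0/0.0
set protocols ospf area 0.0.0.0 interface et-0/0/1.0
set protocols bgp group fabric type internal
set protocols bgp group fabric local-address 192.168.0.1
set protocols bgp group fabric neighbor 192.168.0.2
```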
IBGP: Part 2
There is a requirement in an IBGP network that if one IBGP router needs to advertise an IBGP route, then every other IBGP
router must receive a copy of that route (to prevent black holes). One way to ensure this happens is to have every IBGP router
peer with every other IBGP router (a full mesh). This works fine but it does not scale (i.e., add a new router to your IP fabric
and you will have to configure every router in your IP fabric with a new peer). There are two ways to help scale the full mesh
issue: route reflection or confederations. Most often, route reflection is chosen (it is easy to implement). It is
possible to have redundant route reflectors as well (shown on the slide). It is best practice to configure one or more of the
Spine nodes as route reflectors.
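On a Spine node acting as a route reflector, the only real addition to the IBGP group is a cluster ID; the Leaf nodes keep ordinary IBGP sessions toward the reflectors. A hypothetical sketch (addresses are made up):

```
set protocols bgp group fabric-rr type internal
set protocols bgp group fabric-rr local-address 192.168.0.1
set protocols bgp group fabric-rr cluster 192.168.0.1
set protocols bgp group fabric-rr neighbor 192.168.0.11
set protocols bgp group fabric-rr neighbor 192.168.0.12
```

Because the reflector re-advertises routes learned from one client to the others, the full mesh is no longer required; adding a Leaf node means adding a neighbor statement only on the reflectors.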
IBGP: Part 3
Note: The next few slides will highlight the problem faced by a Spine node (router D) that is NOT a route reflector.
You must build your IP Fabric such that all routers load share traffic over equal-cost paths (when they exist) towards remote
networks. Each router should be configured for BGP multipath so that it will load share when multiple BGP routes exist.
The slide shows that routers A and B advertise the 10.1/16 network to RR-A. RR-A will use both routes for forwarding
(multipath) but will choose only one of those routes (the one from router B, because router B has the lowest router ID) to send to
router C (a Leaf node) and router D (a Spine node). Router C and router D will receive the route for 10.1/16. Both copies will
have a BGP next hop of router B's loopback address. This is the default behavior of route advertisement and selection in the
IBGP with route reflection scenario.
Did you notice the load balancing problem (hint: the problem is not on router C)? Since router C has two equal-cost paths to
get to router B (learned from OSPF), router C will load share traffic to 10.1/16 over the two uplinks towards the Spine routers.
The load balancing problem lies on router D. Since router D received a single route that has a BGP next hop of router B's
loopback, it forwards all traffic destined to 10.1/16 towards router B. The path through router A (which is an equal-cost path to
10.1/16) will never be used in this case. The next slide discusses the solution to this problem.
It is worth noting that although router C has no problem load sharing towards the 10.1/16 network, if router B were
to fail, it may take some time for router C to learn about the route through router A. The next slide discusses the solution to
this problem as well.
IBGP: Part 4
The problem on RR-A is that it sees the routes received from routers A and B for 10.1/16 as a single route that has been
received twice. If an IBGP router receives different versions of the same route, it is supposed to make a choice between them
and then advertise the one chosen route to its appropriate neighbors. One solution to this problem is to make every Spine
node a route reflector. This would be fine in a small fabric but probably would not make sense when there are tens of Spine
nodes. Another option is to make each of the advertisements from routers A and B look like unique routes. How can we
make the multiple advertisements of 10.1/16 from routers A and B appear to be unique routes? There is an IETF draft
(draft-ietf-idr-add-paths) that defines the ADD-PATH capability, which does just that: it makes the advertisements look unique.
All routers in the IP Fabric should support this capability for it to work. Once enabled, routers advertise and evaluate routes
based on a tuple of the network and its path ID. In the example, routers A and B advertise the 10.1/16 route. However, this
time every router supports the ADD-PATH capability, so RR-A attaches a unique path ID to each route and is able to advertise
both routes to all clients, including router D. When the routes arrive on the clients, the clients install both routes in their routing
tables (allowing them to load share towards routers A and B). Although router C was already able to load share without the
additional route, router C will now be able to continue forwarding traffic to 10.1/16 even in the event of a failure of either router A
or router B.
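On Junos, ADD-PATH is enabled per address family. A minimal sketch (the group name and path count are illustrative) might look like:

```
protocols {
    bgp {
        group IBGP {
            family inet {
                unicast {
                    add-path {
                        receive;           # accept multiple paths per prefix
                        send {
                            path-count 2;  # advertise up to two paths per prefix
                        }
                    }
                }
            }
        }
    }
}
```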
EBGP: Part 1
EBGP is also a valid design to use in your IP Fabric. You will notice that the load balancing problem is much easier to fix in the
EBGP scenario. For example, there is no need for the routers to support any draft RFCs! Generally, each router in an IP
Fabric should be in its own unique AS. You can use AS numbers from the private or public range or, if you need thousands
of AS numbers, you can use 32-bit AS numbers.
EBGP: Part 2
In an EBGP-based fabric, there is no need for route reflectors or an IGP. The BGP peering sessions parallel the physical
wiring. For example, every Leaf node has a BGP peering session with every Spine node. There are no leaf-to-leaf or
spine-to-spine BGP sessions, just as there is no leaf-to-leaf or spine-to-spine physical connectivity. EBGP peering is done
using the physical interface IP addresses (not loopback interfaces). To enable proper load balancing, all routers need to be
configured for multipath multiple-as as well as a load balancing policy. Both of these configurations are covered
later in this chapter.
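A sketch of one Leaf node's EBGP configuration under these conventions (the AS numbers and neighbor addresses are hypothetical):

```
routing-options {
    autonomous-system 64514;               # this Leaf's own AS
}
protocols {
    bgp {
        group SPINE {
            type external;
            multipath {
                multiple-as;               # allow ECMP across routes from different peer ASs
            }
            neighbor 172.16.1.0 {          # physical interface address of Spine 1
                peer-as 64512;
            }
            neighbor 172.16.1.2 {          # physical interface address of Spine 2
                peer-as 64513;
            }
        }
    }
}
```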
EBGP: Part 3
The slide shows that the routers in AS64516 and AS64517 are advertising 10.1/16 to their two EBGP peers. Because
multipath multiple-as is configured on all routers, the receiving routers in AS64512 and AS64513 will install both
routes in their routing tables and load share traffic destined to 10.1/16.
EBGP: Part 4
The slide shows that the routers in AS64512 and AS64513 are advertising 10.1/16 to all of their EBGP peers (all Leaf
nodes). Since multipath multiple-as is configured on all routers, the receiving router in the slide, the router in
AS64514, will install both routes in its routing table and load share traffic destined to 10.1/16.
Best Practices
When enabling an IP fabric you should follow some best practices. Remember, two of the main goals of an IP fabric design
(or a Clos design) are to provide a non-blocking architecture and predictable load balancing behavior.
All Spine nodes should be the exact same type of router. They should be the same model and they should also
have the same line cards installed. This helps the fabric to have predictable load balancing behavior.
All Leaf nodes should be the exact same type of router. Leaf nodes do not have to be the same router as the
Spine nodes. Each Leaf node should be the same model and should also have the same line cards
installed. This helps the fabric to have predictable load balancing behavior.
Every Leaf node should have an uplink to every Spine node. This helps the fabric to have predictable load
balancing behavior.
All uplinks from Leaf node to Spine node should be the exact same speed. This helps the fabric to have
predictable load balancing behavior and also helps with the non-blocking nature of the fabric. For example, let
us assume that a Leaf has one 40GbE uplink and one 10GbE uplink to the Spine. When using the combination
of OSPF (for loopback interface advertisement and BGP next-hop resolution) and IBGP, the bandwidth of the
links is taken into consideration when calculating the shortest path to the BGP next hop. OSPF will most
likely always choose the 40GbE interface during its shortest path first (SPF) calculation and use that interface for
forwarding towards remote BGP next hops. This essentially blocks the 10GbE interface from ever being used. In
the EBGP scenario, the bandwidth is not taken into consideration, so traffic will be equally load shared over
the two different-speed interfaces. Imagine trying to equally load share 60 Gbps of data over the two links; how
will the 10GbE interface handle 30 Gbps of traffic? The answer is that it won't.
IP Fabric Scaling
The slide highlights the topic we discuss next.
Scaling
To increase the overall throughput of an IP Fabric, you simply need to increase the number of Spine devices (and the
appropriate uplinks from the Leaf nodes to those Spine nodes). If you add one more Spine node to the fabric, you will also
have to add one more uplink to each Leaf node. Assuming that each uplink is 40GbE, each Leaf node can now forward an
extra 40 Gbps over the fabric.
Adding and removing both server-facing ports (downlinks from the Leaf nodes) and Spine nodes will affect the
oversubscription (OS) ratio of a fabric. When designing the IP fabric, you must understand the OS requirements of your data
center. For example, does your data center need line-rate forwarding over the fabric? Line-rate forwarding would equate to
1-to-1 (1:1) OS. That means the aggregate server-facing bandwidth is equal to the aggregate uplink bandwidth. Or, maybe
your data center would work perfectly fine with a 3:1 OS of the fabric. That is, the aggregate server-facing bandwidth is three
times that of the aggregate uplink bandwidth. Most data centers will probably not need to be designed around a 1:1 OS. Instead,
you should decide on an OS ratio that makes the most sense based on the data center's normal bandwidth usage.
The next few slides discuss how to calculate the OS ratios of various IP fabric designs.
3:1 Topology
The slide shows a basic 3:1 OS IP Fabric. All Spine nodes, four in total, are QFX5100-24Q routers that each have (32) 40GbE
interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. Each of the (48) 10GbE ports on all 32 Leaf nodes will be fully utilized (i.e., attached to
downstream servers). That means that the total server-facing bandwidth is 48 x 32 x 10 Gbps, which equals 15360 Gbps.
Each of the 32 Leaf nodes has (4) 40GbE Spine-facing interfaces in use. That means that the total uplink bandwidth is 4 x 32 x
40 Gbps, which equals 5120 Gbps. The OS ratio for this fabric is 15360:5120, or 3:1.
An interesting thing to note is that if you remove any number of Leaf nodes, the OS ratio does not change. For example, what
would happen to the OS ratio if there were only 31 Leaf nodes? The server-facing bandwidth would be 48 x 31 x 10 Gbps, which
equals 14880 Gbps. The total uplink bandwidth would be 4 x 31 x 40 Gbps, which equals 4960 Gbps. The OS ratio for this fabric is
14880:4960, or still 3:1. This fact actually makes your design calculations very simple. Once you decide on an OS ratio and
determine the number of Spine nodes that will allow that ratio, you can simply add and remove Leaf nodes from the topology
without affecting the original OS ratio of the fabric.
2:1 Topology
The slide shows a basic 2:1 OS IP Fabric in which two Spine nodes were added to the topology from the last slide. All Spine
nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE interfaces. All Leaf nodes, 32 in total, are
QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE
ports on all 32 Leaf nodes will be fully utilized (i.e., attached to downstream servers). That means that the total
server-facing bandwidth is still 48 x 32 x 10 Gbps, which equals 15360 Gbps. Each of the 32 Leaf nodes has (6) 40GbE
Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals 7680 Gbps. The OS
ratio for this fabric is 15360:7680, or 2:1.
1:1 Topology
The slide shows a basic 1:1 OS IP Fabric. All Spine nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE
interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. There are many ways that a 1:1 OS ratio can be attained. In this case, although the Leaf nodes
each have (48) 10GbE server-facing interfaces, we are only going to allow 24 servers to be attached at any given moment.
That means that the total server-facing bandwidth is 24 x 32 x 10 Gbps, which equals 7680 Gbps. Each of the 32 Leaf
nodes has (6) 40GbE Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals
7680 Gbps. The OS ratio for this fabric is 7680:7680, or 1:1.
Configure an IP Fabric
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used in the subsequent slides. Notice that each router is the single
member of a unique autonomous system. Each router will peer using EBGP with its directly attached neighbors using the
physical interface addresses. Host A is singly homed to the router in AS 64514. Host B is multihomed to the routers in AS
64515 and AS 64516.
Verifying Neighbors
Once you configure BGP neighbors, you can check the status of the relationships using either the show bgp summary or
show bgp neighbor command.
Routing Policy
Once BGP neighbors are established in the IP Fabric, each router must be configured to advertise routes to its neighbors and
into the fabric. For example, as you attach a server to a top-of-rack (TOR) switch/router (which is usually a Leaf node of the
fabric), you must configure the TOR to advertise the server's IP subnet to the rest of the network. The first step in advertising a
route is to write a policy that matches the route and then accepts it. The slide shows the policy that must be configured.
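Since the slide itself is not reproduced here, a representative policy of this type might look like the following sketch (the subnet is hypothetical):

```
policy-options {
    policy-statement direct {
        term server-subnet {
            from {
                protocol direct;                  # match locally connected routes
                route-filter 10.1.2.0/24 exact;   # the server-facing subnet
            }
            then accept;
        }
    }
}
```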
Applying Policy
After configuring a policy, the policy must be applied to the router's EBGP peers. The slide shows the direct policy being
applied as an export policy to as64515's EBGP neighbors.
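A sketch of that application (the group name is assumed from the earlier fabric examples):

```
protocols {
    bgp {
        group SPINE {
            type external;
            export direct;                 # advertise routes accepted by the direct policy
        }
    }
}
```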
Default Behavior
Assuming the routers in AS 64515 and AS 64516 are advertising Host B's subnet, the slide shows the default routing
behavior on a Spine node. Notice that the Spine node has received two advertisements for the same subnet. However,
because of the default behavior of BGP, the Spine node selects a single route as the active route in the routing
table (you can tell which is the active route by the asterisk). Based on what is shown in the slide, the Spine node will
send all traffic destined for 10.1.2/24 over the ge-0/0/2 link. The Spine node will not load share over the two possible next
hops by default.
Verify Multipath
View the routing table to see the results of the multipath statement. As you can see, the active BGP route now has two next
hops that can be used for forwarding. Do you think the router is using both next hops for forwarding?
the Spine node is continuing to only forward traffic destined to 10.1.2/24 over a single link.
Results
The output shows that after applying the load balancing policy to the forwarding table, all next hops associated with active
routes in the routing table have been copied into the forwarding table.
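The standard way to achieve this on Junos is a load balancing policy exported to the forwarding table; a minimal sketch:

```
policy-options {
    policy-statement load-balance {
        then {
            load-balance per-packet;       # despite the name, per-flow on most platforms
        }
    }
}
routing-options {
    forwarding-table {
        export load-balance;               # copy all ECMP next hops into the forwarding table
    }
}
```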
AS 64514
The slide shows the BGP and policy configuration for the router in AS 64514.
AS 64515
The slide shows the BGP and policy configuration for the router in AS 64515.
AS 64512
The slide shows the BGP and policy configuration for the router in AS 64512.
We Discussed:
Routing in an IP Fabric;
Review Questions
1.
2.
3.
Lab: IP Fabric
The slide provides the objective for this lab.
1.
Some of the Juniper Networks products that can be used in the Spine position of an IP Fabric are MX, QFX10k, and QFX5100 Series
routers.
2.
Routing should be implemented in such a way that when multiple, equal-cost physical paths exist between two points, data traffic is
load-shared over those paths.
3.
To allow a BGP speaker to install more than one next hop in the routing table when the same route is received from two or more
neighbors, multipath must be enabled.
Chapter 4: VXLAN
Advanced Data Center Switching
We Will Discuss:
Reasons why you would use VXLAN in your data center;
Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the designs of those data centers. There
are many server-to-server applications that have strict requirements for layer 2 connectivity between servers. A switched
infrastructure that is built around xSTP or a layer 2 fabric (like Juniper Networks Virtual Chassis Fabric or Junos Fusion) is
perfectly suited for this type of connectivity. This type of infrastructure allows broadcast domains to be stretched across
the data center.
IP Fabric
Many of today's next generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity
support the layer 2 requirements of traditional server-to-server applications? The rest of this section of this chapter
discusses the possible solutions to the layer 2 connectivity problem.
Layer 2 VPNs
One possible solution for providing layer 2 connectivity over an IP-based data center is to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers are
the top-of-rack (TOR) routers/switches. In this scenario, each TOR router acts as a layer 2 VPN gateway. A gateway is
the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (the router on the left) takes Ethernet frames destined for a remote MAC address, encapsulates the original Ethernet
frame in some other data type (like IP, MPLS, IPsec, etc.), and transmits the newly formed packet to the remote gateway. The
receiving gateway (the router on the right) receives the VPN data, decapsulates it by removing the outer encapsulation, and
then forwards the remaining original Ethernet frame to the locally attached server. Notice in the diagram that the IP Fabric
simply had to forward IP data. The IP Fabric had no knowledge of the Ethernet connectivity that exists between Host A and B.
Data Plane
There are generally two components of a VPN: the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method by which a gateway encapsulates and decapsulates the original data. Also,
with regard to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers, much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the
gateways learn the MAC addresses of locally attached servers in the data plane (i.e., from received Ethernet frames). Remote
MAC addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the
control plane.
Control Plane
One question that must be asked is: how does a gateway learn about remote gateways? This learning
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN, or they
can be learned dynamically.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each of
the 20 switches to recognize the newly added gateway to the VPN.
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic additions and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Usually a gateway has to receive an Ethernet frame from a remote host before
it can learn the host's MAC address. Learning remote MAC addresses in the control plane allows the MAC tables of all
gateways to be more in sync. This has the positive side effect of making the forwarding behavior of the VPN more
efficient (less flooding of data over the fabric).
Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM
is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that
houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also utilize NICs; however, they are in
fact virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear as standard switches attached to the network. VLANs can simply be stretched from one virtual
switch, across the physical switched network, to terminate on one or more remote virtual switches. This works great when
the physical network is a switched Ethernet network. However, what happens when the physical network is based
on IP routing?
vendors of virtualized products have chosen to support VXLAN as the layer 2 VPN. VXLAN functionality can be found in
virtual switches like VMware's Distributed vSwitch, Open vSwitch, and Juniper Networks Contrail vRouters. If virtualizing the
network is the future, it would seem that VXLAN has become the de facto layer 2 VPN in the data center.
plane, other signaling methods for VXLAN exist including Multi-protocol Border Gateway Protocol (MP-BGP) Ethernet VPN
(EVPN) as well as Open Virtual Switch Database (OVSDB). This chapter covers the multicast method of signaling.
1. Original Ethernet Frame: The Ethernet frame being tunneled over the underlay network, minus the VLAN tagging.
2. VXLAN Header (64 bits): Consists of an 8-bit flags field, the 24-bit VNI, and two reserved fields. The I flag must be set
to 1 to indicate a valid VNI.
3. Outer UDP Header: The destination port is the well-known VXLAN port; the source port is typically derived from a hash
of the inner frame, which aids load balancing in the fabric.
4. Outer IP Header: The source address is the IP address of the sending VXLAN Tunnel End Point (VTEP). The
destination address is the IP address of the receiving VTEP.
5. Outer MAC Header: As with any packet being sent over a layer 3 network, the source and destination MAC addresses
are rewritten at each layer 3 hop.
VTEP: Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.
VTEP: Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step-by-step process taken by Virtual Switch 1 (VS1):
1. VS1 receives an Ethernet frame from a local VM destined to the MAC address of a remote VM.
2. VS1 performs a MAC table lookup and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation, setting the destination IP address to VS2's VTEP address.
VTEP: Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local VM.
Here is the step-by-step process taken by the network and VS2:
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2's VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine which MAC table to use for the lookup.
3. VS2 strips the VXLAN encapsulation, leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface on which to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that, since the VLAN tags
are stripped before the frame is sent over the IP Fabric, the VLAN tags do not have to match between remote VMs. This allows
for more flexibility in VLAN assignments from server to server and rack to rack.
a networking device like a router or switch can also handle the VTEP role. A networking device that can perform that role is called
a VXLAN Gateway. There are two types of VXLAN Gateways: layer 2 and layer 3. The slide shows how a VXLAN Layer 2
Gateway (the router on the right) handles VXLAN packets received from a remote VTEP. It simply provides layer 2 connectivity
between hosts on the same VLAN.
As you discuss the concept of a VTEP with others, you may notice that people refer to the different types of VTEPs in different
ways. For example, a VTEP that is part of a virtual switch (as shown in previous slides) is sometimes referred to as a software
VTEP. A physical router or switch acting as a VXLAN Gateway (Layer 2 or Layer 3) is sometimes referred to as a hardware
VTEP.
Router B's IRB interface. To send a packet to 1.1.1.1 (a remote IP subnet), VM1 must use Address Resolution Protocol (ARP)
to determine the MAC address of 10.1.1.254. Once VM1 knows the MAC address for 10.1.1.254, VM1 and the devices along
the way to 1.1.1.1 use the following procedure to forward an IP packet to its destination:
1. VM1 creates an IP packet destined to 1.1.1.1.
2. Since 1.1.1.1 is on a different subnet than VM1, VM1 encapsulates the IP packet in an Ethernet frame with a
destination MAC address of the default gateway's MAC address and sends the Ethernet frame to VS1.
3. VS1 receives the Ethernet frame, performs a MAC table lookup, and determines that the Ethernet frame
must be sent over the VXLAN tunnel to Router B. Router B appears to VS1 as the VTEP that is directly attached to
the host that owns the destination MAC address. The reality is that the destination MAC address is the MAC
address of Router B's IRB interface.
6. Router B strips the remaining Ethernet framing and performs a routing table lookup to determine the next hop to
the destination network.
7. Router B encapsulates the IP packet in the outgoing interface's encapsulation and forwards it to the next hop.
BUM Traffic
The slide discusses the handling of BUM traffic by VTEPs according to the VXLAN standard model. In this model, you should
note that the underlay network must support a multicast routing protocol, preferably some form of Protocol Independent
Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support Internet Group Management Protocol (IGMP) so that they can
inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so in the extreme case your customer will need 2^24 group addresses. Luckily, 239/8 is a reserved set of
organizationally scoped multicast group addresses (2^24 group addresses in total) that can be used freely within your
customer's data center.
that each VTEP that belongs to 239.1.1.1 will also build its branch of the RPT (including VTEP B).
Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI's group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B's DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP decapsulates the register messages and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
It is not shown on this slide, but once R1 receives the first native multicast packet from the RP (source address is VTEP B's
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B, which will establish (S,G) state on all routers
along the path.
VXLAN Configuration
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used for the subsequent slides.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 will appear to be in the same broadcast
domain and IP subnet. Also, VRRP will run between the two routers to provide a redundant default gateway to the
two hosts.
Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface is used on Juniper Networks routers as the VTEP source interface. Therefore, you must make sure that the loopback
addresses of the routers are reachable. Remember, the loopback interface for each router in the IP Fabric fell into the
172.16.100/24 range.
PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
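A minimal PIM-SM configuration with a statically defined RP might be sketched as follows (the RP address is hypothetical, chosen from the loopback range used earlier):

```
protocols {
    pim {
        rp {
            static {
                address 172.16.100.1;      # loopback of the router acting as RP
            }
        }
        interface all {
            mode sparse;                   # run PIM-SM on the fabric interfaces
        }
    }
}
```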
Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface from which the IP address will come. This command is
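As a sketch, on a QFX Series switch the statement lives at the [edit switch-options] hierarchy:

```
switch-options {
    vtep-source-interface lo0.0;           # VTEP source address taken from lo0.0
}
```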
However, it may cause a remote VXLAN gateway to receive unwanted BUM traffic for a VNI to which it does not belong.
vrrp-group 1 {
    virtual-address 10.1.1.254;
    priority 100;
}
            }
        }
    }
The bridge domain configuration on router as64513 would be identical to that shown on the slide.
As you know, multicast is used in the control plane for VXLAN. It helps in the forwarding of BUM traffic (here we care about
US
the multicast traffic). Normally, when a VTEP receives multicast traffic from an attached server, it will send a copy to all other
locally attached servers on the same VLAN. It will also send a VXLAN encapsulated copy over the IP fabric using the
multicast-group for the VXLAN segment. That is, every remote VTEP will receive a copy of the original multicast packet,
regardless of whether or not they have any attached receivers. If you know that there are no receivers attached to any
remote VTEPs for a particular multicast group, you can use the command on the slide to help stop the transmission of transit
automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN tagged Ethernet frame. The
slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN
means that the gateway has received multicast traffic (BUM traffic encapsulated in VXLAN) from a remote VTEP, allowing it to
learn the remote VTEP's IP address, so the local gateway has instantiated an SPT towards that remote VTEP.
PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and the source IP address it uses for VXLAN
packets. For each remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide.
These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. These
interfaces are actually used for forwarding, as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
Reasons why you would use VXLAN in your data center;
Review Questions
1.
2.
3.
Lab: VXLAN
The slide provides the objective for this lab.
1.
Major vendors of virtualization products support VXLAN to provide the Layer 2 stretch over an IP-based data center. If the vSwitches
of your virtualization product ONLY support VXLAN, then more than likely your other networking devices will need to support
VXLAN as well.
2.
A VXLAN Gateway automatically removes the VLAN tag for Ethernet frames received from a locally attached server.
3.
show ethernet-switching vxlan-tunnel-end-point remote mac-table on a QFX5100 Series switch or
show l2-learning vxlan-tunnel-end-point remote mac-table on an MX Series router can be used to view the
MACs learned from remote gateways.
Chapter 5: EVPN
Advanced Data Center Switching
We Will Discuss:
The benefits of using EVPN signaling for VXLAN;
(PIM) and multicast in the signaling plane, other signaling methods for VXLAN exist, including Multiprotocol Border Gateway
Protocol (MP-BGP) Ethernet VPN (EVPN) as well as Open vSwitch Database (OVSDB). This chapter covers the EVPN
method of signaling. Although we cover EVPN as the signaling component for VXLAN in this chapter, it should be noted that
EVPN can also be used as the signaling component for both MPLS and MPLS-over-GRE encapsulations as well. Those
encapsulation types are not covered in this course.
EVPN protocol.
MP-BGP
EVPN is based on Multiprotocol Border Gateway Protocol (MP-BGP). It uses the Address Family Identifier (AFI) of 25, which is
the Layer 2 VPN address family, and the Subsequent Address Family Identifier (SAFI) of 70, which is the EVPN address family.
BGP is a proven protocol in both service provider and enterprise networks. It has the ability to scale to millions of route
advertisements. BGP also has the added benefit of being policy oriented. Using policy, you have complete control over route
advertisements, allowing you to control which devices learn which routes.
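Enabling the EVPN AFI/SAFI on a BGP session takes a single family statement. A sketch (the group name and addresses are hypothetical):
lab@leaf1# show protocols bgp group overlay
type internal;
local-address 172.16.100.1;
family evpn {
    signaling;
}
neighbor 172.16.100.11;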
Active/Active Forwarding
When using PIM in the control plane for VXLAN, it is really not possible to have a server attach to two different top-of-rack
switches with the ability to forward data over both links (i.e., both links active). When using EVPN signaling in the control
plane, active/active forwarding is entirely possible. EVPN allows VXLAN gateways (Leaf1 at the top of the slide) to use
multiple paths and multiple remote VXLAN gateways to forward data to multihomed hosts. Also, EVPN has mechanisms (like
split horizon) to ensure that broadcast, unknown unicast, and multicast (BUM) traffic does not loop back towards a
multihomed host.
1. Leaf2 receives an Ethernet frame with a source MAC address of HostB and a destination MAC address of
HostC.
2. Based on a MAC table lookup, Leaf2 forwards the Ethernet frame to its destination over the VXLAN tunnel.
Leaf2 also populates its MAC table with HostB's MAC address and associates it with the outgoing interface.
3. Since Leaf2 just learned a new MAC address, it advertises the MAC address to the remote VXLAN gateway,
Leaf1. Leaf1 installs the newly learned MAC address in its MAC table and associates it with an outgoing interface.
Proxy ARP
Although not currently supported, the EVPN RFC mentions that an EVPN Provider Edge (PE) router, Leaf1 in the example, can
perform Proxy ARP. It is possible that if Leaf2 knows the IP-to-MAC binding for HostB (because it was snooping some form of
IP traffic from HostB), it can send a MAC advertisement for HostB that also contains HostB's IP address. Then, when HostA
sends an ARP request for HostB's IP address (a broadcast Ethernet frame), Leaf1 can simply send an ARP reply back to
HostA without ever having to send the broadcast frame over the fabric.
with the same virtual IP address of 10.1.1.254. If the Spine nodes are MX Series routers, they also share the same virtual
MAC address, 00:00:5e:00:01:01 (the same as VRRP, even though VRRP is not used). SpineA and SpineB each send a MAC
Advertisement to LeafC for the same MAC. Now, LeafC can load share traffic from HostC to the default gateway.
EVPN Terminology
The slide highlights the terms used in a network using VXLAN with EVPN signaling.
PE devices: These are the networking devices (Leaf nodes in the diagram) to which servers attach in a data
center. These devices also act as VXLAN Tunnel Endpoints (VTEPs) or VXLAN gateways (can be Layer 2 or Layer
3 gateways).
CE devices: These are the servers, switches, and storage devices that need Layer 2 connectivity with other devices in the data
center.
Site: An EVPN site is a set of CEs that communicate with one another without needing to send Ethernet frames
over the fabric.
EVPN Instance (EVI): An EVPN instance spanning the PE devices participating in that EVPN.
Bridge Domain: A MAC table for a particular VLAN associated with an EVI. There can be many bridge domains
for a given EVI.
EVPN Routes
The slide lists the EVPN routes, their usage, as well as where they are defined. The subsequent slides will discuss most of
these routes in detail.
MAC addresses in the data plane from Ethernet frames received from CEs, CE2 in the example. Once Leaf2 learns CE2's MAC
address, it will automatically advertise it to remote PEs and attach a target community, community Orange in the
example. Leaf1, another EVPN PE, upon receiving the route must decide whether it should keep the route. It makes this
decision based on the received route target community. Leaf1, in order to accept and use this advertisement, must be
configured with an import policy that accepts routes tagged with the Orange target community. Without a configured policy
that matches on the Orange route target, Leaf1 would just discard the advertisement. So, at a minimum, each EVI on each
participating PE for a given EVPN must be configured with an export policy that attaches a unique target community to MAC
advertisements and also configured with an import policy that matches and accepts advertisements based on that unique
target community.
Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number,
followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should
be unique within the provider's network.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte
administration field with the 4-byte assigned number field (Type 0).
RFC 7432 recommends using the Type 1 route distinguisher for EVPN signaling.
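For reference, the two formats might look like this in a configuration (the AS number, IP address, and assigned numbers are hypothetical):
route-distinguisher 64512:100;        /* Type 0: 2-byte AS number, 4-byte assigned number */
route-distinguisher 172.16.100.1:1;   /* Type 1: 4-byte IP address, 2-byte assigned number */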
When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target
matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose
configuration matches the route target.
Because the application of policy determines a VPN's connectivity, you must take extra care when writing and applying VPN
policy to ensure that the tenant's connectivity requirements are faithfully met.
copied into the VRF table as EVPN Type 2 routes. Each of the Type 2 routes associated with locally learned MACs will be
tagged with the community target:1:1. Finally, these tagged routes are then advertised to all remote PEs.
In the next few slides, you will learn the details of the other EVPN route types. You should know that the vrf-target
statement always sets the target community (using hidden VRF import and export policies) of Type 1 routes. By default, the
vrf-target statement also sets the target community of Type 2 and Type 3 routes. Later in this chapter, you will
learn how to set a different target community for Type 2 and Type 3 routes.
that advertises EVPN routes tagged with the target community. The statement also happens to enable the associated import
policy, which will accept routes that are tagged with that target community. So, you must configure the vrf-target
statement to enable export policy at a minimum. To override the import policy instantiated by that statement, you can apply
the vrf-import statement.
In the example, the vrf-target target:1:1 is applied to Leaf1's EVI. When Leaf1 receives the MAC Advertisement
from Leaf2, it runs the route through the configured import policy, which will accept routes tagged with target:1:1. Once
accepted, the route is copied into Leaf1's global RIB-IN table and then copied into the appropriate VRF table (the one
configured with the vrf-target target:1:1 statement). Finally, the route is converted into a MAC entry and stored in
Leaf1's MAC table for the Orange EVI. The outgoing interface associated with the MAC is the VXLAN tunnel that terminates
on Leaf2.
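On a QFX5100 Series switch, the EVI-wide target from the example might be configured under [edit switch-options] along these lines (the route distinguisher and source interface values are hypothetical):
lab@leaf1# show switch-options
vtep-source-interface lo0.0;
route-distinguisher 172.16.100.1:1;
vrf-target target:1:1;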
Ethernet Segment
The set of links that attaches a site to one or more PEs is called an Ethernet segment. In the slide, there are two Ethernet
segments. Site 1 has an Ethernet segment that consists of links A and B. Site 2 has an Ethernet segment that consists of
link C. Each Ethernet segment must be assigned a 10-octet Ethernet Segment Identifier (ESI). There are two reserved ESI
values as shown in the slide. For a single-homed site, like Site 2, the ESI should be set to
0x00:00:00:00:00:00:00:00:00:00. This is the default ESI setting for a server-facing interface on a Juniper Networks EVPN
PE. For any multihomed site, the ESI should be set to a globally unique ESI. In the example, both link A and link B have their
ESI set to 0x01:01:01:01:01:01:01:01:01:01. The commands below show how to set the ESI on the server-facing interface.
lab@leaf1# show
esi {
    01:01:01:01:01:01:01:01:01:01;
    all-active;
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members v100;
...
contains the Single-Active flag. This flag lets the remote PEs know whether or not they can load share traffic over the
multiple links attached to the site. If the Single-Active flag is set to 1, only one link associated with the Ethernet
segment can be used for forwarding. If the Single-Active flag is set to 0, all links associated with the Ethernet
segment can be used for forwarding data (we call this active/active forwarding). Juniper Networks devices only support
active/active forwarding (we always set the flag to 0).
Remote PE Behavior
When a remote PE, Leaf3 in the example, receives the Ethernet Autodiscovery routes from Leaf1 and Leaf2, it knows
that it can use either of the two VXLAN tunnels to forward data to MACs learned from Site 1. Based on the forwarding choice
made by CE1, it may be that Leaf1 was the only PE attached to Site 1 that learned CE1's MAC address. That means that
Leaf3 may have only ever received a MAC Advertisement for CE1's MAC from Leaf1. However, since Leaf1 and Leaf2 are
attached to the same Ethernet segment (as advertised in their Type 1 routes), Leaf3 knows it can get to CE1's MAC through
either Leaf1 or Leaf2. You can see in Leaf3's MAC table that both VXLAN tunnels have been installed as next hops for CE1's
MAC address.
Added Benefit
Another benefit of the Ethernet Autodiscovery route is that it helps enable faster convergence when a link fails.
Normally, when a site-facing link fails, a PE will simply withdraw each of its individual MAC Advertisements. Think about the
case where there are thousands of MACs associated with that link. The PE would have to send thousands of withdrawals. When
the Ethernet Autodiscovery route is being advertised (because the esi statement is configured on the interface), a PE (like
Leaf1 on the slide) can simply send a single withdrawal of its Ethernet Autodiscovery route, and Leaf3 can immediately
update the MAC table for all of the thousands of MACs it had learned from Leaf1. This greatly improves convergence times.
BUM Traffic
When EVPN signaling is used with VXLAN encapsulation, Juniper Networks devices only support ingress replication of BUM
traffic. That is, when BUM traffic arrives on a PE, the PE will unicast copies of the BUM packets to each of the individual PEs
be used and the addressing that should be used to send the BUM traffic. In the diagram, Leaf2 advertises that it is expecting
and using ingress replication and that Leaf1 should use 4.4.4.4 as the destination address of the VXLAN packets that are
In the top diagram, Leaf1 will make copies of the BUM packets and unicast them to each remote PE belonging to the same
EVPN. This will cause CE2 to receive multiple copies of the same packets. This is not good.
In the bottom diagram, Leaf3 receives BUM traffic from the attached CE. It makes copies and unicasts them to the remote
PEs, including Leaf2. Leaf2, under the default forwarding rules, will forward the BUM traffic back to the source site, creating a
loop.
Designated Forwarder
To fix the problems described on the previous slide, all the PEs attached to the same Ethernet segment will elect a
designated forwarder for the Ethernet segment (two or more PEs advertising the same ESI). A designated forwarder will be
elected per broadcast domain. Remember that an EVI can contain one or more broadcast domains or VLANs. The Ethernet
Segment route (Type 4) is used to help with the election of the designated forwarder.
Notice that Leaf2 and Leaf3 will advertise a Type 4 route to every PE belonging to an EVPN. However, notice that the route is not
tagged with a target community. Instead, it is tagged with an ES-Import target community. The ES-Import target community is
automatically generated by the advertising PE and is based on the ESI value. Since Leaf1 does not have an import policy
that matches on the ES-Import target, it will drop the Type 4 routes. However, since Leaf2 and Leaf3 are configured with the same
ESI, the routes are accepted by a hidden policy that matches on the ES-Import target community, which is only known by the
PEs attached to the same Ethernet segment. Now Leaf2 and Leaf3 use the Originator IP address in the Type 4 route to build
a table that associates an Originator IP address (i.e., the elected designated forwarder) with a VLAN in a round-robin fashion.
After the election, if a non-designated forwarder for a VLAN receives BUM traffic from a remote PE, it will simply drop those
packets.
lab@spine1# show
irb {
    unit 0 {
        family inet {
            address 10.1.1.10/24 {
                virtual-gateway-address 10.1.1.254;
            }
        }
    }
}
If both Spine1 and Spine2 are configured in this manner using the same virtual gateway address, both devices will not only
share the same virtual IP address but they will also share a virtual MAC address, 00:00:5e:00:01:01. The Spine nodes will each
advertise that MAC address to the other PEs. Now the remote PEs will be able to load share traffic over the multiple paths to
the same virtual MAC address.
Underlay Topology
The slide shows the IP Fabric that will serve as the underlay network. It is based on EBGP, with each router being in its own
autonomous system. Each router will advertise its loopback address, which will also serve as the VTEP address.
Overlay Topology
The slide shows the overlay topology. Each Leaf will act as a VXLAN Layer 2 Gateway. Each Spine will act as a distributed
VXLAN Layer 3 Gateway and provide routing into and out of the 10.1.1/24 subnet. Host A will be dual-homed using a LAG to
two Leaf nodes. The control plane for VXLAN will be EVPN using MP-IBGP. In the IBGP topology, the Spine nodes will act as
route reflectors.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 are in the same broadcast
domain as well as the same IP subnet. Also, a matching virtual IP address and a matching virtual MAC address will be assigned to
each Spine node's IRB interface, which will provide a redundant, distributed default gateway to the two hosts.
Common Configuration
The slide shows the common configuration for all routers. Notice that a load-balancing policy has been applied to the
forwarding table that will allow for multiple next hops to be installed in the forwarding table. Also, there is a policy called
direct that will be applied to the EBGP neighbors. The main purpose of this policy is to advertise each router's loopback
interface (VTEP source interface) to all routers in the fabric. Lastly, in order for each router to run BGP, the autonomous
system must be set under [edit routing-options]. Looking at the example topology, you should notice that each
router will belong to two autonomous systems: one AS in the underlay and one AS in the overlay. If
you plan to use the automatic route target function (described in subsequent slides), you should set the AS under [edit
routing-options] to the overlay network's AS number.
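The common configuration described above might be sketched as follows (the policy name is hypothetical; 64520 is used here as the overlay AS number):
lab@leaf1# show policy-options policy-statement load-balance
term 1 {
    then {
        load-balance per-packet;
    }
}
lab@leaf1# show routing-options
forwarding-table {
    export load-balance;
}
autonomous-system 64520;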
Underlay Configuration: Each router is peering with the others using EBGP. The export statement allows all
directly connected networks to be advertised to BGP neighbors. The local-as statement overrides the setting
under routing-options just for the neighbors in this group. The multipath multiple-as statement allows
multiple routes from multiple ASes to be used as active routes in the routing table.
Overlay Configuration: Each Spine node is acting as a route reflector running IBGP with its clients. The
cluster statement causes the local router to act as a route reflector for the neighbors in this group. The
family evpn signaling statement sets the AFI and SAFI for the IBGP sessions. The local-as
configuration is probably unnecessary since the same AS is configured under routing-options. The
multipath statement allows multiple similar received BGP routes to be active in the routing table.
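Putting the two bullets together, a Spine node's BGP configuration might look like the following sketch (all addresses and AS numbers are hypothetical):
lab@spine1# show protocols bgp
group underlay {
    type external;
    export direct;
    local-as 64512;
    multipath multiple-as;
    neighbor 10.0.1.1 {
        peer-as 64514;
    }
}
group overlay {
    type internal;
    local-address 172.16.100.1;
    family evpn {
        signaling;
    }
    cluster 172.16.100.1;
    multipath;
    neighbor 172.16.100.3;
}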
Underlay Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback
from the route reflectors, you can see the RIB-IN that will be used for both sessions is bgp.evpn.0.
configuration. Since Leaf1 and Leaf2 have interfaces that belong to the same Ethernet segment, both Leaf1 and Leaf2
should have their et-0/0/50's ESI value set to the same value. When you assign an ESI value, you need to make sure that it
route-distinguisher, and the vrf-target statement. It is under [edit protocols evpn] that you will set the
encapsulation of VXLAN, the multicast mode, and the list of VNIs that will receive the benefit of EVPN signaling.
The slide mentions the vrf-target statement and its behavior in the exporting of EVPN routes. It literally creates a hidden
export policy that advertises all locally generated Type 1, Type 2, and Type 3 routes to remote PE routers after tagging the
routes with the specified target community. Also, the vrf-target statement creates a hidden import policy that accepts
any received EVPN routes that are tagged with the specified target community. We will discuss how to modify the routers
bridge domain and EVPN configuration occurs in the context of a virtual switch, tenant1_vs. Virtual switch configuration is
required on an MX Series router when using EVPN signaling. Everything under tenant1_vs enables the MX Series to be a
VXLAN Layer 2 Gateway, similar to the QFX5100 Series configuration on the previous slide, except for the
routing-interface irb.0 statement. Notice the IRB interface has been given a real IP address of 10.1.1.10. It has
also been assigned a virtual-gateway-address of 10.1.1.254, which is the default gateway for the 10.1.1/24 subnet.
It may not be obvious, but the virtual-gateway-address statement also binds a virtual MAC address of
00:00:5e:00:01:01 to the 10.1.1.254 address on the Spine1 router. The virtual-gateway-address statement does everything
mentioned above as well as cause the Spine1 router to send a MAC Advertisement route to all remote PEs advertising the
virtual MAC address. Since Spine1 and Spine2 are configured with the same virtual-gateway-address (and virtual
MAC), the remote PEs can load share traffic towards the virtual MAC address (i.e., whenever a host needs to send data to the
default gateway). One last thing to mention is that, by default, the subnet associated with the IRB interface is installed in the
inet.0 table. The slide shows that the IRB interface has been associated with the tenant1_vr routing instance. That
means that any packet arriving on the IRB interface will be routed based on the tenant1_vr.inet.0 routing table.
VRF tables. The slide shows that Leaf2 only needs to receive MAC Advertisements for VNI 1000. However, since each Leaf
node is only configured with the vrf-target statement, Leaf2 will receive and accept MAC Advertisement routes for VNI
2000 as well. Even though Leaf2 does not have a MAC table for VLAN 200, it will still install all the MAC Advertisement routes
in its RIB-IN table as well as its VRF table. This can be a major waste of memory on Leaf2, depending on how many MACs
have been advertised for VNI 2000. The next few slides will show you how to get control over which routes are accepted by
the PE routers.
That leaves us with the Type 2 MAC Advertisement route and the Type 3 Inclusive Multicast Ethernet Tag route. As you know,
both of these routes carry the VNI value. That means that these types of routes are VNI-specific (Type 1 and Type 4 routes are
Ethernet segment-specific). It is possible to set VNI-specific import and export policy using VNI-specific target communities.
The slide shows how to configure the VNI-specific vrf-target export statement under the vni-options hierarchy.
Although the vrf-target export statements apply a hidden export policy that advertises and tags the Type 2 and Type
3 routes for the related VNI using the configured target community, the commands do not apply any import policies. So, after
applying the vrf-target export statements, you must also configure and apply a vrf-import policy that accepts the
new target communities as well as the original target community for the EVI (target:64520:1 in the example). Your
import policy will override the hidden import policy that was created by the original vrf-target statement (vrf-target
target:64520:1 in the example).
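The per-VNI export targets described above might be configured as follows (the VNI values and target communities are hypothetical); remember that a matching vrf-import policy must still be written separately:
lab@leaf1# show protocols evpn
encapsulation vxlan;
extended-vni-list [ 1000 2000 ];
vni-options {
    vni 1000 {
        vrf-target export target:64520:1000;
    }
    vni 2000 {
        vrf-target export target:64520:2000;
    }
}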
you how you can have your router automatically assign route targets to each configured VNI by configuring the auto
statement. This statement will also cause your router to automatically enable hidden VRF import and export policies to
advertise and accept received routes tagged with the automatically generated target communities. You should note that the
automatically generated VRF import policies that are created as a result of the auto statement will override the import
policy that gets instantiated with the vrf-target target:64520:1 statement on the slide (which is used for the Type 1
advertisements). So, you must configure and apply an import policy that will accept the Type 1 routes.
In order for the auto statement to work nicely between PEs (so they calculate the same target communities), every PE router
must be configured with the auto statement. Also, each PE router must be configured for the same autonomous system under
the [edit routing-options] hierarchy, since the automatically generated target communities are based on that AS
value.
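A sketch of the automatic route target configuration follows (the values and policy name are hypothetical; note the vrf-import policy that still accepts the Type 1 target):
lab@leaf1# show switch-options
vtep-source-interface lo0.0;
route-distinguisher 172.16.100.1:1;
vrf-import EVPN-IMPORT;
vrf-target {
    target:64520:1;
    auto;
}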
automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN tagged Ethernet frame. The
slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN
BGP Status
Use the show bgp summary command to determine the status and routing tables used with your router's BGP neighbors.
EVPN RIB-IN
The slide shows how to view all the routes (for all EVPN instances) that have been accepted by VRF import policies.
VRF Table
The slide shows you how to view the routes for a particular EVPN instance.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and the source IP address it uses for VXLAN
packets. For each remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide.
These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. These
interfaces are actually used for forwarding, as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
The benefits of using EVPN signaling for VXLAN;
Review Questions
1.
2.
3.
1.
EVPN allows CE devices to multihome to more than one Leaf node such that all interfaces are actively forwarding data. EVPN
signaling minimizes unknown unicast flooding since PE routers advertise locally learned MACs to all remote PEs.
2.
An Ethernet Segment route is tagged with the ES-Import Route Target community.
3.
Because configuring the auto statement overrides the hidden import policies of the vrf-target statement, you must configure
and apply a VRF import policy that accepts the target community that is assigned to the Type 1 routes.
Chapter 6: Data Center Interconnect
Advanced Data Center Switching
We Will Discuss:
The meaning of the term Data Center Interconnect;
DCI Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
center to another.
Many of the DCI communication options rely on an MPLS network to transport frames between data centers. Although in
most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several
advantages to using an MPLS network including availability, cost, fast failover, traffic engineering, and scalable VPN options.
Interconnect Network
Between two data centers that need to be interconnected is a network of some type. A typical interconnect network could be
a point-to-point line, an IP network, or an MPLS network. The slide shows that these networks can be owned by the customer
(the owners of the data center) or by a service provider. All the DCI options that we discuss in this chapter will work over both a
customer-owned and a service provider-owned interconnect network. The main difference is how much control a customer has
over the DCI. Sometimes it is just easier and more cost effective to let the service provider manage the DCI.
Point-to-Point DCI
In general, if there is great distance between data centers, a point-to-point interconnect can be pretty expensive. However, if
the data centers are just down the street from one another, it might make sense to have a point-to-point interconnect. This
type of interconnect is usually provided as dark fiber between the data centers. The customer simply attaches equipment to
the fiber and has the choice of running any type of protocol they wish over the interconnect.
IP DCI
It is possible to provide a DCI over an IP network. If the DCI is meant to provide Layer 2 stretch (extending of VLANs) between
the data centers, then the Ethernet frames will need to be encapsulated in IP as they traverse the DCI. VXLAN and GRE are
some of the typical IP encapsulations that provide the Layer 2 stretch. If the DCI is to provide Layer 3 reachability between
data centers, then an IP network is well suited to meet those needs. However, sometimes the DCI network may only support
globally routable IP addressing while the data centers use RFC 1918 addressing. When that is the case, it might make
sense to create a Layer 3 VPN between the two data centers, using GRE, IPsec, or RFC 4364 (MPLS Layer 3 VPN over GRE).
MPLS DCI
The slide shows the encapsulation boundary of an MPLS transport network. The boundaries are different depending on who
owns the MPLS network. If the customer owns the MPLS network, then MPLS can be used for encapsulation from end to end.
If the service provider owns the MPLS network, then the encapsulation between the DC and the MPLS network depends entirely
on what is allowed by the service provider. If the service provider is providing a Layer 2 VPN service, then the customer should
expect that any Ethernet frames sent from one data center will appear unchanged as they arrive at the remote data center. If
the service provider is providing a Layer 3 VPN service, then the customer should expect that any IP packets sent from one
data center will appear unchanged as they arrive at the remote data center. In some cases, the service provider will allow a
customer to establish data center-to-data center MPLS label switched paths (LSPs).
MPLS Advantages
Many of the DCI technologies that we will discuss depend on an MPLS network to transport frames between data centers. Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several advantages to using an MPLS network:
1. Fast failover between MPLS nodes: Fast reroute and Node/Link protection are two features of an MPLS network
that allow for 50ms or better recovery time in the event of a link failure or node failure along the path of an
MPLS label switched path (LSP).
2. Scalable VPNs: VPLS, EVPN, L3 MPLS VPNs are DCI technologies that use MPLS to transport frames between
data centers. These same technologies allow for the interconnection of many sites (potentially hundreds)
without the need for the manual setup of a full mesh of tunnels between those sites. In most cases, adding a new site only requires the administrator to configure the devices at the new site. The remote sites do not need to be touched.
3. Traffic engineering: MPLS allows the administrator to decide the path traffic takes over the MPLS network. Traffic no longer has to take the same path calculated by the IGP (i.e., all data taking the same path between sites). You can literally direct different traffic types to take different paths over the MPLS network.
4. Any-to-any connectivity: When using an MPLS backbone to provide the DCI, you have the flexibility to provide any type of MPLS-based Layer 2 DCI, Layer 3 DCI, or any combination of the two. An MPLS backbone is a network that can generally support most types of MPLS or IP-based connectivity at the same time.
egress router. Duplex traffic requires two LSPs; that is, one path to carry traffic in each direction. An LSP is created by the concatenation of one or more label-switched hops that direct packets between label-switching routers (LSRs) to transit the MPLS domain.
When an IP packet enters a label-switched path, the ingress router examines the packet and assigns it a label based on its destination, placing a 32-bit (4-byte) label in front of the packet's header immediately after the Layer 2 encapsulation. The label transforms the packet from one that is forwarded based on IP addressing to one that is forwarded based on the fixed-length label. The slide shows an example of a labeled IP packet. Note that MPLS can be used to label non-IP traffic, such as in the case of a Layer 2 VPN.
MPLS labels can be assigned per interface or per router. The Junos operating system currently assigns MPLS label values on
a per-router basis. Thus, a label value of 10234 can only be assigned once by a given Juniper Networks router.
At egress, the IP packet is restored when the MPLS label is removed as part of a pop operation. The now unlabeled packet is routed based on a longest-match IP address lookup. In most cases, the penultimate (or second-to-last) router pops the label stack, in a process called penultimate-hop popping. In some cases, a labeled packet is delivered to the ultimate router (the egress LSR), where the stack is popped and the packet is forwarded using conventional IP routing.
20-bit label: Identifies the packet as belonging to a particular LSP. This value changes as the packet flows on
the LSP from LSR to LSR.
Traffic Class (TC): Formerly called EXP (experimental), these three bits can be used to convey class-of-service information, specifically the forwarding class a given packet belongs to. The 3-bit width of this field allows a total of eight possible markings, each of them potentially linked to a different forwarding behavior, for example a different queuing priority and a different buffer size.
Bottom-of-stack bit: Many MPLS applications require a packet to be tagged with several labels, one stacked on top of the other. The bottom-of-stack bit of an MPLS header is set to 1 if the header is at the bottom of the label stack, with the payload immediately below it. The bit is set to 0 instead if another MPLS header (i.e., another label) lies below.
VPNs are one example of an application that requires label stacking. Here the outer label, or transport label, indicates which label-switching router traffic should be delivered to. The inner label, called the service label, describes how the payload should be treated once it reaches its destination label-switching router.
Time to live (TTL): As with the equivalent IP field, the TTL limits the number of hops an MPLS packet can travel. It is decremented at each hop, and if its value drops to zero, the packet is discarded. When using MPLS for IP traffic engineering, the default behavior is to copy the value of the IP TTL field into the MPLS TTL field. This allows diagnostic tools like traceroute to continue working even when packets are encapsulated within MPLS.
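The 32-bit header layout described above can be sketched in Python. This is an illustration of the bit packing, not production code:

```python
import struct

def pack_mpls_header(label, tc, bottom_of_stack, ttl):
    """Pack the four MPLS header fields into a 4-byte header.

    Layout (RFC 3032): 20-bit label | 3-bit TC | 1-bit S | 8-bit TTL.
    """
    assert 0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)

def unpack_mpls_header(data):
    """Reverse operation: split a 4-byte header back into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom_of_stack": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }

# A single-label stack: S bit set, TTL copied from the IP header
# (the default Junos behavior for IP traffic engineering, as noted above).
hdr = pack_mpls_header(label=10234, tc=0, bottom_of_stack=True, ttl=64)
print(unpack_mpls_header(hdr))
```

Unpacking the packed header recovers the original field values, which is a quick way to convince yourself the shifts match the field widths.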
An important point is that labels are locally significant: traffic following the path will typically be tagged with a different label at each hop.
A second important point is that labels are generally global to the router, and not tied to the incoming interface; a packet tagged with a given label will be subject to the same forwarding treatment regardless of the interface it has been received on. This apparently minor point plays a major role in MPLS traffic protection: a set of MPLS features that try to minimize packet loss during a link or node failure.
There are only a very few exceptions to this rule, mostly to do with specific (and very advanced) MPLS applications. One example is carrier-of-carriers, where an MPLS-enabled service provider offers an MPLS transport service to other service providers.
A value of 0 represents the IP version 4 (IPv4) explicit null label. This label indicates that the label must be
popped, and the forwarding of the packet must then be based on what is below it, either another label or the
payload.
A value of 1 represents the router alert label. This label value is legal anywhere in the label stack except at the
bottom. When a received packet contains this label value at the top of the label stack, it is delivered to a local
software module for processing. The label beneath it in the stack determines the actual forwarding of the
packet. However, if the packet is forwarded further, the router alert label should be pushed back onto the label
stack before forwarding. The use of this label is analogous to the use of the router alert option in IP packets.
A value of 2 represents the IP version 6 (IPv6) explicit null label. This label value is legal only when it is the sole label stack entry. It indicates that the label stack must be popped, and the forwarding of the packet then must be based on the IPv6 payload. A value of 3 represents the implicit null label. Although this value can never appear in the encapsulation, it can be specified by a label signaling protocol.
Continued on the next page.
The following list is a continuation of reserved Labels 0 through 15 (RFC 3032, MPLS Label Stack Encoding).
A value of 7 is used for the Entropy Label Indicator (ELI). After determining a load balancing methodology, the ELI
allows the ingress LSR to notify the downstream LSRs of the chosen load balancing methodology.
A value of 13 is used for Generic Associated Channel Label (GAL). This label informs an LSR that a received LSP
belongs to a Virtual Circuit Connectivity Verification (VCCV) control channel.
A value of 14 is used as the OAM Alert Label. This label indicates that a packet is an MPLS OAM packet as
described in ITU-T Recommendation Y.1711.
Values 4-6, 8-12, and 15 are reserved for future use.
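The special-purpose label values listed above can be collected into a small lookup table. A minimal sketch, covering only the assignments named in this section:

```python
# Special-purpose MPLS label values 0-15, as described above (RFC 3032 and
# later assignments). Values absent from the table are reserved for future use.
RESERVED_LABELS = {
    0: "IPv4 explicit null",
    1: "Router alert",
    2: "IPv6 explicit null",
    3: "Implicit null (signaling only)",
    7: "Entropy Label Indicator (ELI)",
    13: "Generic Associated Channel Label (GAL)",
    14: "OAM Alert Label (Y.1711)",
}

def describe_label(value):
    """Return a description for reserved labels, or note an ordinary label."""
    if value < 16:
        return RESERVED_LABELS.get(value, "reserved for future use")
    return "ordinary label"

print(describe_label(0))       # IPv4 explicit null
print(describe_label(5))       # reserved for future use
print(describe_label(300576))  # ordinary label
```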
Label 0 (Explicit null): This label is always assigned an action of decapsulate (pop); the label-switching router will just remove the MPLS header and take a forwarding action based on what is below it (either another label, or the actual LSP payload).
Label 3 (Implicit Null): This is a special label value that is never actually found in MPLS frames, but only within MPLS signaling protocols. It is used by the egress router (i.e., the last hop in a label-switched path) to request that the previous router remove the MPLS header. This behavior, referred to as penultimate-hop popping, is the Junos OS default.
Label-Switching Router
The original definition of a label-switching router is a router that makes forwarding decisions based only on the content of the MPLS header. In other words, a label-switching router always operates in label-switching mode. We will use a slightly less restrictive definition that also includes ingress and egress routers, sometimes referred to as label edge routers. Traffic at the ingress or at the egress of a label-switched path is typically not encapsulated in MPLS, so label switching is not possible, and a forwarding decision needs to be made according to other rules.
We will use the term label-switching router (LSR) to mean any router that participates in MPLS forwarding, including both the ingress and the egress nodes. For brevity, in the rest of the course we will also use the term router as a synonym for label-switching router.
Push: add an MPLS header to encapsulate a non-MPLS packet and allow it to be forwarded by label switching within the MPLS domain.
Pop: remove an MPLS header from an MPLS-encapsulated packet.
This is often done either at the end of an LSP or, as we will see shortly, by the second-to-last router (the penultimate hop).
Swap: replace the label value of an MPLS packet with another value.
This operation is typically performed by transit label-switching routers as a packet traverses a label-switched path.
After performing one of these MPLS basic operations, the packet is generally forwarded to the next-hop router.
In some cases the forwarding treatment can be more complex, involving different combinations of the three basic operations. For some types of services, for example VPNs, it is common to see a double-push forwarding action, while in some traffic protection scenarios, when building a local detour to avoid a link failure, a transit router may have to perform a swap-push operation.
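The three basic operations can be modeled on a simple list that represents the label stack (top of stack first). A sketch for illustration; the label values are made up:

```python
def push(stack, label):
    """Push: add an MPLS label on top of the stack (ingress, or double-push for VPNs)."""
    return [label] + stack

def pop(stack):
    """Pop: remove the top label (egress, or the penultimate hop under PHP)."""
    return stack[1:]

def swap(stack, new_label):
    """Swap: replace the top label with the one expected by the next hop."""
    return [new_label] + stack[1:]

# A VPN-style double push at ingress: service label first, then transport label.
stack = push(push([], 1000), 300576)   # [300576, 1000]
stack = swap(stack, 300864)            # transit LSR swaps the outer label
stack = pop(stack)                     # penultimate hop pops the transport label
print(stack)                           # only the service label remains
```

After the penultimate pop, only the service label is left for the egress router to act on, which previews the VPN forwarding walkthrough later in this chapter.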
Label-Switched Path
A label-switched path (LSP) is a unidirectional path through the network defined in terms of label switching operations (push, pop, swap). You can think of an LSP as a tunnel: any packet that enters it is delivered to its endpoint, no matter what type of payload it contains.
Establishing a label-switched path across an MPLS domain means determining the actual labels and label operations performed by the label-switching routers on the path. This can be done with manual configuration, or by some type of dynamic label distribution protocol.
Often a label-switched path will reside within a single MPLS domain, for example within a single service provider. However, the development of advanced BGP-based MPLS signaling allows the creation of label-switched paths that span multiple domains.
The ingress router is not a pure label-switching router: the initial decision of which traffic to forward down which LSP is made not according to the content of labels (which are not present yet), but according to other criteria, e.g., a route lookup in the case of MPLS IP traffic engineering, or even the incoming interface, in the case of point-to-point transport of Layer 2 frames over MPLS (Layer 2 circuits, circuit cross-connect).
Very often transit LSRs will perform a swap operation, replacing the incoming label with the one expected by the next-hop of
the label-switched path. Transit LSRs are typically not aware of the content of the MPLS traffic they are forwarding, and do
not know if the payload is IP, IPv6, layer-2 frames or anything else.
Although the label information base can be populated with static entries, it is generally populated by a dynamic label distribution protocol.
Penultimate-Hop Popping
Often the MPLS header is removed by the second-to-last (the penultimate) router in an LSP. This removal is an optimization that helps in several cases, including using MPLS for IP traffic engineering. Removing the label at the penultimate hop facilitates the work of the last-hop (egress) router, which, instead of having to both remove the MPLS header and then make an IP routing decision, only needs to do the latter.
Penultimate-hop popping (PHP) is the default behavior on Juniper routers; however, it can be disabled in the configuration.
Some applications require PHP to be disabled, but that is often done automatically: the Junos OS is smart enough to detect
the need to signal the LSP so that PHP is disabled.
In the case of MPLS IP traffic engineering, the egress router will be delivered ordinary IP packets due to penultimate-hop popping, and will make a forwarding decision based on ordinary IP routing.
As soon as you enable MPLS processing, four default entries are automatically created: they are for label 0 (explicit null), label 1 (router alert), label 2 (IPv6 explicit null), and label 13 (Generic Associated Channel Label, used for Operations and Maintenance).
Each label is associated with a forwarding action, typically composed of an MPLS label operation (push, pop, swap, or a combination of these) and a next hop. In this example, label 300576 has been installed by a dynamic protocol called LDP, while the remaining label, 1004792, has been configured statically.
Note that there are two entries for this last label. This is because, in some cases, a label-switching router may have to take different forwarding actions according to whether or not the label is at the bottom of the label stack. In this case, the forwarding actions turn out to be the same: pop the MPLS header and send the content to 172.17.23.1 via interface ge-1/1/5.0. The IP address of the next hop must of course be directly connected: it is only used to derive which MAC address to use for Layer 2 encapsulation.
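The label information base just described can be sketched as a table keyed by (label, bottom-of-stack), since the action may differ depending on the S bit. Label 1004792 matches the example above (pop, next hop 172.17.23.1 via ge-1/1/5.0); the action and next hop shown for label 300576 are assumptions for illustration:

```python
# A sketch of a label information base (LIB). Entries are keyed by
# (label, bottom_of_stack) because the forwarding action may differ
# depending on whether the label is at the bottom of the stack.
# The swap action for 300576 is an assumed value; 1004792 follows the slide.
LIB = {
    (300576, False): ("swap", 301200, "172.17.12.1 via ge-1/0/0.0"),
    (300576, True):  ("swap", 301200, "172.17.12.1 via ge-1/0/0.0"),
    (1004792, False): ("pop", None, "172.17.23.1 via ge-1/1/5.0"),
    (1004792, True):  ("pop", None, "172.17.23.1 via ge-1/1/5.0"),
}

def forward(label, bottom_of_stack):
    """Look up the forwarding action for an incoming labeled packet."""
    return LIB[(label, bottom_of_stack)]

action, new_label, next_hop = forward(1004792, True)
print(action, next_hop)
```

As in the slide, the two entries for 1004792 happen to carry the same action, but the data structure allows them to differ.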
A label distribution protocol is the mechanism by which one LSR informs a peer LSR of the meaning of the labels used to forward traffic between them. MPLS uses this information to create the forwarding tables in each LSR.
Label distribution protocols are often referred to as signaling protocols. However, label distribution is a more accurate
description of their function and is preferred in this course.
The label distribution protocols create and maintain an LSP dynamically with little or no user intervention. Once the label
distribution protocols are configured for the signaling of an LSP, the egress router of an LSP will send label (and other)
information in the upstream direction towards the ingress router based on the configured options.
RSVP
The Junos OS uses RSVP as the label distribution protocol for traffic engineered LSPs.
RSVP was designed to be the resource reservation protocol of the Internet and to "provide a general facility for creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths" (RFC 2205). Reservations are an important part of traffic engineering, so it made sense to continue to use RSVP for this purpose rather than reinventing the wheel.
RSVP was explicitly designed to support extensibility mechanisms by allowing it to carry what are called opaque
objects. Opaque objects make no real sense to RSVP itself but are carried with the understanding that some
adjunct protocol (such as MPLS) might find the information in these objects useful. This encourages RSVP
extensions that create and maintain distributed state for information other than pure resource reservation. The
designers believed that extensions could be developed easily to add support for explicit routes and label
distribution.
Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An
RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining
the contents of each message.
With the proper extensions, RSVP provides a tool that consolidates the procedures for a number of critical
signaling tasks into a single message exchange:
Extended RSVP can establish an LSP along an explicit path that would not have been chosen by the
interior gateway protocol (IGP);
Extended RSVP can distribute label-binding information to LSRs in the LSP;
Extended RSVP can reserve network resources in routers comprising the LSP (the traditional role of
RSVP); and
Extended RSVP permits an LSP to be established to carry best-effort traffic without making a specific
resource reservation.
Thus, RSVP provides MPLS-signaled LSPs with a method of support for explicit routes ("go here, then here, finally here"), path numbering through label assignment, and route recording (where the LSP actually goes from ingress to egress, which is very handy information to have).
RSVP also gives MPLS LSPs a keepalive mechanism to use for visibility ("this LSP is still here and available") and redundancy ("this LSP appears dead; is there a secondary path configured?").
LDP
LDP associates a set of destinations (prefixes) with each data link layer LSP. This set of destinations is called the FEC. These destinations all share a common LSP egress and a common unicast routing path. LDP supports topology-driven MPLS networks in best-effort, hop-by-hop implementations. The LDP signaling protocol always establishes LSPs that follow the contours of the IGP's shortest path. Traffic engineering is not possible with LDP.
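LDP's topology-driven behavior can be sketched as follows: each downstream router advertises a label binding for a FEC, and the resulting LSP simply follows the IGP shortest path. Router names and label values here are made up for illustration:

```python
# A toy three-router chain: R1 -> R2 -> R3, where R3 is the egress for the FEC.
igp_next_hop = {"R1": "R2", "R2": "R3", "R3": None}

# Labels advertised upstream by each router for this FEC. R3 advertises
# implicit null (3), requesting penultimate-hop popping from R2.
label_binding = {"R2": 299776, "R3": 3}

def ldp_lsp(ingress):
    """Walk the IGP shortest path, collecting the label used at each hop."""
    path, router = [], ingress
    while igp_next_hop[router] is not None:
        nh = igp_next_hop[router]
        path.append((router, nh, label_binding[nh]))
        router = nh
    return path

# R1 pushes 299776 toward R2; R2, seeing implicit null from R3, pops (PHP).
print(ldp_lsp("R1"))
```

Because the path is derived purely from the IGP next hops, there is no way to steer the LSP elsewhere, which is exactly why LDP cannot do traffic engineering.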
Layer 2 Options
Three classifications exist for Layer 2 DCIs:
1. No MAC learning by the Provider Edge (PE) device: This type of Layer 2 DCI does not require that the PE devices learn MAC addresses.
2. Data plane MAC learning by the PE device: This type of DCI requires that the PE device learns the MAC
addresses of both the local data center as well as the remote data centers.
3. Control plane MAC learning: This type of DCI requires that a local PE learn the local MAC addresses using the control plane and then distribute these learned MAC addresses to the remote PEs.
Layer 3 Options
A Layer 3 DCI uses routing to interconnect data centers. Each data center must maintain a unique IP address space. A
Layer 3 DCI can be established using just about any IP capable link. Another important consideration for DCIs is
incorporating some level of redundancy by using link aggregation groups (LAGs), IGPs using equal-cost multipath, and BGP or MP-BGP using the multipath or multihop features.
The PE routers store customer routes in VPN routing and forwarding (VRF) tables. In a Layer 3 VPN scenario, the PE and CE routers function as routing peers (RIP, OSPF, BGP, etc.), with the PE router terminating the routing exchange between customer sites and the IP/MPLS core. In a Layer 2 VPN scenario, the PE's CE-facing interface is configured with VLAN tagging matching the CE's PE-facing interface, and any frames received from the CE device will be forwarded over the MPLS backbone to the remote site.
Information is exchanged between PE routers using either MP-BGP or LDP. This information exchange allows the PE routers to
map data to and from the appropriate MPLS LSPs traversing the IP/MPLS core.
PE routers, as ingress and egress LSRs, use MPLS LSPs when forwarding customer VPN traffic between sites. LSP tunnels in the interconnect network separate VPN traffic in the same fashion as PVCs in a legacy ATM or Frame Relay network.
Provider Routers
Provider (P) routers are located in the IP/MPLS core. These routers do not carry VPN data center routes, nor do they participate in the VPN control and signaling planes. This is a key aspect of the RFC 4364 scalability model; only PE devices are aware of VPN routes, and no single PE router must hold all VPN state information.
P routers are involved in the VPN forwarding plane where they act as label-switching routers (LSRs) performing label
swapping (and popping) operations.
VPN Site
A VPN site is a collection of devices that can communicate with each other without the need to transit the IP/MPLS backbone (i.e., a single data center). A site can range from a single location with one switch or router to a network consisting of many devices.
route to CE2's address should list three things in terms of next hop: the outgoing interface and the inner and outer labels that should be pushed onto the IP packet. The outer label is swapped by the P routers along the way to deliver the MPLS packet to PE2. P3 performs a penultimate-hop pop, leaving only single-labeled packets, and forwards them to PE2. PE2 receives the labeled packets and uses the inner label to determine which VRF table to use (PE2 might have many VRF tables). PE2 pops the inner label, performs a lookup on the Green VRF table (because label 1000 = Green VRF), and forwards the original IP packets to CE2.
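The label handling in this walkthrough can be sketched step by step on a list that models the label stack. The inner label 1000 maps to the Green VRF as in the example; the outer transport label values are assumptions for illustration:

```python
# A sketch of the Green VRF example above (top of stack is the first element).
packet = {"payload": "IP packet for CE2", "stack": []}

# PE1: two-label push (inner service label 1000, then the outer transport label).
packet["stack"] = [100032, 1000]

# P routers swap the outer label along the LSP (assumed values).
packet["stack"][0] = 100352

# P3: penultimate-hop pop leaves only the service label.
packet["stack"] = packet["stack"][1:]

# PE2: the remaining label selects the VRF, then the label is popped
# and the original IP packet is routed toward CE2.
vrf_by_label = {1000: "Green"}
vrf = vrf_by_label[packet["stack"][0]]
packet["stack"] = packet["stack"][1:]
print(vrf, packet["stack"])
```

The key observation is that the outer label only ever matters to the P routers, while the inner label is untouched until it reaches PE2.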
ON
format to PE3 where they enter the red VPN. From PE2s perspective, PE3 is the CE for the green VPN. From PE3s
perspective, PE2 is the CE for the red VPN. You might think that you need 2 physical devices, PE2 and PE3 to stitch the two
US
VPNs together. Well, as the bottom diagram shows, you can actually stitch two VPNs together using a single MX Series
router. You can use the logical tunnel interface feature which are internal interfaces that allow you to connect two virtual
routers together. The two virtual routers enabled on the MX Series device would simply perform the same functions as PE2
and PE3 in the top diagram.
ON
The LSP
The next few slides are going to discuss the details of MPLS Layer 3 VPNs. One thing to remember with Juniper Networks
routers is that once an LSP is established (from PE1 to PE2 in the diagram) the ingress PE will install a host route (/32) to the
E
loopback interface of the egress router in the inet.3 with a next-hop of the LSP (i.e. outbound interface of LSP and push a
label). This default behavior means that not all traffic entering PE1 can get routed through the LSP. So what traffic gets
VPN-IPv4 Route
The VPN-IPv4 route has a very simple purpose, which is to advertise IP routes. PE2 installs locally learned routes in its VRF table. That includes the directly connected PE-CE interface as well as any routes PE2 learns from CE2 (RIP, OSPF, BGP, etc.). Once PE2 has locally learned routes in its VRF table, it advertises them (based on configured policy) to remote PEs and attaches a target community, target community Orange in the example. PE1, upon receiving the route, must decide whether it should keep the route. It makes this decision based on resolving the BGP next hop in inet.3 as well as looking at the received route target community. PE1, in order to accept and use this advertisement, must be configured with an import policy that accepts routes tagged with the Orange target community. Without a configured policy that matches on the Orange route target, PE1 would just discard the advertisement. So, at a minimum, each VRF on each participating PE for a given VPN must be configured with an export policy that attaches a unique target community to routes and also configured with an import policy that matches and accepts advertisements based on that unique target community.
Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number, followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should be unique to each VPN.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte administration field with the 4-byte assigned number field (Type 0).
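The two route distinguisher formats can be sketched as 8-byte encodings: Type 0 uses a 2-byte AS number plus a 4-byte assigned number, and Type 1 uses a 4-byte IPv4 address plus a 2-byte assigned number (both preceded by a 2-byte type field). The example values are assumptions for illustration:

```python
import struct

def rd_type0(asn, assigned):
    """Type 0 RD: 2-byte type field (0), 2-byte AS number, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, assigned)

def rd_type1(ipv4, assigned):
    """Type 1 RD: 2-byte type field (1), 4-byte IPv4 address, 2-byte assigned number."""
    octets = (int(o) for o in ipv4.split("."))
    return struct.pack("!H4BH", 1, *octets, assigned)

# Both formats produce the 8-byte value that prefixes the IPv4 address
# in a VPN-IPv4 NLRI.
print(rd_type0(65512, 101).hex())
print(rd_type1("192.168.1.1", 101).hex())
```

Either way the result is 8 bytes, which is what makes an otherwise overlapping RFC 1918 prefix globally unique inside the provider's network.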
When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose configuration matches the route target.
Because the application of policy determines a VPN's connectivity, you must take extra care when writing and applying VPN policy to ensure that the tenant's connectivity requirements are faithfully met.
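The route-target matching described above amounts to a set intersection between the communities carried on a received route and the import lists of the local VRFs. A minimal sketch; VRF names and community values are assumptions for illustration:

```python
# Each local VRF is configured with the target communities it imports.
vrf_import = {
    "Green": {"target:65512:1"},
    "Orange": {"target:65512:2"},
}

def import_route(route_targets):
    """Return the VRFs that will install a route carrying these communities."""
    return [vrf for vrf, targets in vrf_import.items()
            if targets & set(route_targets)]

# A route tagged with the Orange target lands only in the Orange VRF;
# a route with no matching target is simply discarded.
print(import_route(["target:65512:2"]))   # ['Orange']
print(import_route(["target:65512:9"]))   # []
```

The empty result for an unmatched target mirrors the behavior described earlier: without a matching import policy, the PE just discards the advertisement.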
You can control a VPN's connectivity simply by either advertising or not advertising particular routes to a remote site. Another important function of the VRF export policy is that it will also cause the advertised routes to be tagged with a target community. In the slide, PE2 has a locally learned route (10.1.2/24, the network between PE2 and CE2) in its VRF table. To ensure CE1 and PE1 can send data to CE2, PE2 has a VRF export policy applied to its IBGP neighbor, PE1, which advertises locally learned routes tagged with the target community, target:1:1. The next slide shows PE1's process of installing the VPN-IPv4 route in its own VRF table.
Service Provider service. From the perspective of the two QFX devices, they are separated by an IP network. The QFXs simply forward VXLAN packets between each other based on the MAC addresses learned through EVPN signaling. The MX devices have an MPLS Layer 3 VPN between each other (bidirectional MPLS LSPs, IGP, L3 VPN MP-BGP routing, etc.). The MXs advertise the local QFX's loopback address to the other MX.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. MX1 performs a lookup for the received packet on the VRF table associated with the VPN interface (the incoming interface) and encapsulates the VXLAN packet in two MPLS headers (outer for the MPLS LSP, inner for MX2 VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the VRF table so that it can route the remaining VXLAN packet to QFX2. QFX2 strips the VXLAN encapsulation and forwards the original Ethernet frame to the destination host.
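The header nesting in this West-to-East flow can be sketched as a list of headers that each device adds or removes. This is a simplification for illustration (the PHP pop and MX2's VRF pop are collapsed into one decapsulation step):

```python
def qfx1_encap(frame):
    """QFX1: VXLAN over UDP/IP toward QFX2's loopback."""
    return ["IP(dst=QFX2-lo0)", "UDP", "VXLAN"] + frame

def mx1_encap(packet):
    """MX1: outer label for the MPLS LSP, inner label for MX2's VRF."""
    return ["MPLS(transport)", "MPLS(vrf)"] + packet

def mx2_decap(packet):
    """MX2: both MPLS headers gone (PHP pops one, the VRF lookup pops the other)."""
    return packet[2:]

def qfx2_decap(packet):
    """QFX2: IP/UDP/VXLAN stripped; the original frame remains."""
    return packet[3:]

frame = ["Ethernet", "payload"]
wire = mx1_encap(qfx1_encap(frame))
print(wire)
print(qfx2_decap(mx2_decap(wire)))   # the original frame, end to end
```

Note that the VXLAN packet itself travels unchanged across the Layer 3 VPN; only the MPLS headers are added and removed by the MX devices.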
signaled using EVPN MP-BGP signaling and are stitched together on the MX devices using logical tunnel interfaces.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in two MPLS headers (outer for the MPLS LSP, inner for MX2 VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the appropriate VRF and outgoing interface. MX2 forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
signaled using EVPN MP-BGP signaling and are stitched together on the MX devices using logical tunnel interfaces.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX2's loopback address. MX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
EVPN over IP
The slide shows an example of the signaling/data plane when using EVPN/VXLAN over an IP network. EVPN MP-BGP is used
to synchronize MAC tables.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
Stretching Subnets
The slide shows the EVPN Type 2 MAC advertisements that must be exchanged between data centers when individual subnets are stretched between data centers. Notice that Host1 and Host2 are attached to the same subnet. The example shows the advertisement of just a single MAC address. However, in a real environment you might see thousands of MAC addresses advertised between data centers. That is a bunch of routes! MAC moves, adds, and changes in one data center will actually affect the MAC tables and EVPN routing exchanges in another data center.
Unique Subnets
The EVPN Type 5 IP prefix route can be used in a DCI situation in which the IP subnets between data centers are completely unique. Notice that Host1 and Host2 are attached to different subnets. This fact is very important to the discussion. In this situation, if Host1 needs to send an IP packet to Host2, it will send it to its default gateway, which is the IRB of PE1. Leaf1 will encapsulate the Ethernet frames from Host1 into VXLAN and send the VXLAN packets to PE1. PE1 will strip the VXLAN header and notice that the remaining Ethernet frames from Leaf1 have a destination MAC of its own IRB. It will strip the Ethernet header and route the remaining IP packet based on the routing table associated with the IRB interface. PE1 will use the EVPN Type 5 route that was received from PE2 for the 10.1.2/24 network, and the packet will be forwarded over the VXLAN tunnel between PE1 and PE2. You might ask yourself, "Why couldn't PE1 use a standard IP route? Why does the 10.1.2/24 network need to be advertised by an EVPN Type 5 route?" The answer is that the Type 5 route allows inter-data center traffic to be forwarded over VXLAN tunnels (i.e., the end-to-end VXLAN-based VPN is maintained between data centers). This is very similar to the stitching concept discussed earlier. PE2 then receives the VXLAN-encapsulated packet and forwards the remaining IP packet towards the destination over the IRB interface (while encapsulating the IP packet in an Ethernet header with a destination MAC of Host2). Finally, PE2 performs a MAC table lookup and forwards the Ethernet frame towards Host2.
DCI Example
The slide highlights the topic we discuss next.
LDP to automatically establish MPLS LSPs to each other's loopback address. Finally, each PE will establish a VPN-IPv4
MP-IBGP session with each other. The PEs will exchange locally learned routes (the loopback addresses of the Leaf nodes)
so that the Leaf nodes can establish the overlay network (next slide).
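As a rough sketch, the core-facing side of each PE could be configured along these lines (the interface name and address are illustrative, not the lab's actual values):

```
interfaces {
    ge-0/0/0 {                          # core-facing interface (illustrative)
        unit 0 {
            family inet {
                address 10.0.0.1/30;
            }
            family mpls;                # allow MPLS-labeled packets on the link
        }
    }
}
protocols {
    mpls {
        interface ge-0/0/0.0;
    }
    ldp {
        interface ge-0/0/0.0;           # LDP builds LSPs to the remote PE loopback
    }
}
```

With an IGP advertising the loopbacks and LDP running on the core links, the inet.3 table is populated automatically and the VPN-IPv4 MP-IBGP session can resolve its next hops.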
2 Gateways and also establish an EVPN MP-IBGP session with each other to exchange EVPN routes to advertise locally
learned MACs to the remote Leaf. Host A and Host B will be able to communicate as if they were on the same LAN segment.
remote PE. Remember, there needs to be an MPLS LSP established in each direction, so you must check the inet.3 table on
both PEs.
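Assuming PE loopbacks of 192.168.100.1 and 192.168.100.2 (illustrative values only), the bidirectional check could be performed with commands such as:

```
user@PE1> show ldp session                       # verify the LDP session to the remote PE
user@PE1> show route table inet.3 192.168.100.2  # LSP toward PE2's loopback exists
user@PE2> show route table inet.3 192.168.100.1  # and the reverse LSP exists on PE2
```

If either inet.3 lookup comes up empty, the MP-IBGP VPN routes cannot resolve and VPN traffic will not be forwarded.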
VRF Configuration
The slide shows the VRF configuration for PE1. Notice the use of the vrf-target statement. Originally, VRF import and export
policies could only be enabled by writing explicit policies under [edit policy-options] and applying them using the
vrf-import and vrf-export statements. However, more recent versions of the Junos operating system allow you to
skip those steps and simply configure a single vrf-target statement. The vrf-target statement actually enables two
hidden policies. One policy is a VRF export policy that takes all locally learned routes in the VRF (direct interface routes as
well as routes learned from the local CE) and advertises them to the remote PE tagged with the specified target community.
The other policy is a VRF import policy that will accept all VPN-IPv4 routes learned from remote PEs that are tagged with the
specified target community.
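A minimal VRF along these lines might look like the following sketch (the instance name, interface, and values are illustrative, not the slide's actual configuration):

```
routing-instances {
    VRF-1 {
        instance-type vrf;
        interface ge-0/0/1.0;                  # site-facing interface (illustrative)
        route-distinguisher 192.168.100.1:1;
        vrf-target target:65000:1;             # auto-generates the hidden import/export policies
    }
}
```

The equivalent explicit approach would define policies under [edit policy-options] that match and attach target:65000:1, and then reference them in the instance with vrf-import and vrf-export.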
MP-BGP Routing
The slide shows how to enable VPN-IPv4 signaling between PEs. Use the show bgp summary command to verify that the
MP-BGP neighbor relationship is established and that the PE is receiving routes from the remote neighbor.
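A sketch of the PE1 side of such a session might look like this (the group name and the local/neighbor loopback addresses are illustrative):

```
protocols {
    bgp {
        group IBGP-DCI {
            type internal;
            local-address 192.168.100.1;       # PE1 loopback
            family inet-vpn unicast;           # enable VPN-IPv4 NLRI on the session
            neighbor 192.168.100.2;            # PE2 loopback
        }
    }
}
```

In the show bgp summary output, the bgp.l3vpn.0 counters for the neighbor indicate whether VPN-IPv4 routes are actually being received.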
VRF Table
Remember, the main purpose of establishing an underlay network and the DCI is to ensure that the routers in each site can
reach the loopback addresses (VTEP source addresses) of the remote Leaf nodes. The slide shows that PE1 has learned the
loopback addresses of the remote site's Leaf nodes in its VRF table.
Leaf1 Configuration
The slide shows the underlay and overlay network configuration of Leaf1. Leaf2 would be configured very similarly.
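On a QFX-series leaf, the relevant underlay and overlay pieces might look roughly like the following sketch (peer addresses, VLAN, and VNI values are illustrative, not the lab's actual values):

```
protocols {
    bgp {
        group overlay {
            type internal;
            local-address 192.168.1.1;          # Leaf1 loopback (VTEP source address)
            family evpn signaling;              # EVPN NLRI for the overlay
            neighbor 192.168.1.2;               # overlay peer (illustrative)
        }
    }
    evpn {
        encapsulation vxlan;
        extended-vni-list all;                  # advertise all configured VNIs
    }
}
switch-options {
    vtep-source-interface lo0.0;                # source VXLAN tunnels from lo0
    route-distinguisher 192.168.1.1:1;
    vrf-target target:65000:1;
}
vlans {
    v100 {
        vlan-id 100;
        vxlan {
            vni 5100;                           # map VLAN 100 to VNI 5100
        }
    }
}
```

Leaf2 would mirror this configuration with its own loopback address and route distinguisher.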
We Discussed:
The meaning of the term Data Center Interconnect;
Review Questions
1.
2.
3.
1.
A DCI can be provided by a point-to-point link, an IP network, or an MPLS network.
2.
The VPN-IPv4 NLRI includes an MPLS label, the route distinguisher, and an IP prefix. A target community is also tagged to the route
but it is not officially part of the NLRI.
3.
When the transport network of a DCI is a public IP network, the option available for a DCI is option 3.
Content Explorer: Technical documentation for Junos OS-based products by product, task, and software
release, and downloadable documentation PDFs.
Feature Explorer: Junos OS and ScreenOS software feature information to find the right software release and
hardware platform for your network.
Learning Bytes: Concise tips and instructions on specific features and functions of Juniper technologies.
Installation and configuration courses: Over 60 free Web-based training courses on product installation and
configuration (just choose eLearning under Delivery Modality).
J-Net Forum: Training, certification, and career topics to discuss with your peers.
Juniper Networks Certification Program: Complete details on the certification program, including tracks, exam
details, promotions, and how to get started.
Technical courses: A complete list of instructor-led, hands-on courses and self-paced, eLearning courses.
Translation tools: Several online translation tools to help simplify migration tasks.
Acronym List
AD: aggregation device
AFI: Address Family Identifier
BGP: Border Gateway Protocol
BUM: broadcast, unknown unicast, and multicast
CapEx: capital expenses
CE: customer edge
CLI: command-line interface
CSP: Control and Status Protocol
DCI: Data Center Interconnect
EVI: EVPN Instance
FCoE: Fibre Channel over Ethernet
FCS: Frame Check Sequence
FEC: forwarding equivalence class
GRE: generic routing encapsulation
GUI: graphical user interface
IBGP: internal BGP
IGMP: Internet Group Management Protocol
IGP: interior gateway protocol
IPv6: IP version 6
JNCP: Juniper Networks Certification Program
LAG: link aggregation group
LSP: label switched path
LSR: label-switching router
MAC: media access control
MC-LAG: Multichassis Link Aggregation
P: provider
PE: provider edge
PHP: penultimate-hop popping
PIM-SM: Protocol Independent Multicast Sparse Mode
RID: router ID
RP: rendezvous point
RPT: rendezvous point tree
SD: satellite device
STP: Spanning Tree Protocol
VC: Virtual Chassis
VCF: Virtual Chassis Fabric
VM: virtual machine
VPN: virtual private network