Advanced Data Center Switching
Student Guide
Revision 14.a

408-745-2000
www.juniper.net
All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.
Advanced Data Center Switching Student Guide, Revision 14.a
Copyright 2016 Juniper Networks, Inc. All rights reserved.
Printed in USA.
Revision History:
Revision 14.a, April 2016
The information in this document is current as of the date listed above.
The information in this document has been carefully verified and is believed to be accurate for software Release 14.1X53.
Juniper Networks assumes no responsibility for any inaccuracies that may appear in this document. In no event will Juniper Networks be liable for direct, indirect, special,
exemplary, incidental, or consequential damages resulting from any defect or omission in this document, even if advised of the possibility of such damages.
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.
YEAR 2000 NOTICE
Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has no known
time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
SOFTWARE LICENSE
The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an agreement
executed between you and Juniper Networks, or Juniper Networks' agent. By using Juniper Networks software, you indicate that you understand and agree to be bound by its
license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper Networks software, may contain
prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should consult the software license for further details.
Contents
Chapter 1: Course Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Chapter 2: Next Generation Data Centers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Traditional Multitier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Data Center Fabric Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Chapter 3: IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
IP Fabric Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
IP Fabric Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
IP Fabric Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-25
Configure an IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30
Lab: IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-49
Chapter 4: VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Layer 2 Connectivity Over a Layer 3 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
VXLAN Using Multicast Control Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
VXLAN Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
Lab: VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
Course Overview
This two-day course is designed to introduce various QFX5k and MX/vMX features including, but not limited to, IP Fabric, Virtual eXtensible Local Area Network (VXLAN) Layer 2 and Layer 3 Gateways, VXLAN with Ethernet VPN (EVPN) signaling, and Data Center Interconnect (DCI) for a VXLAN overlay. Students will learn to configure and monitor these features of the Junos operating system running on the QFX5100 and vMX Series platforms.
Through demonstrations and hands-on labs, students will gain experience configuring, monitoring, and analyzing these features of the Junos OS. This course is based on software Release 14.1X53.
Intended Audience
This course benefits individuals responsible for configuring and monitoring switching features of the Junos OS running on the QFX5k and MX Series platforms, including individuals in professional services, sales, and support organizations, as well as end users.
Course Level
Advanced Data Center Switching (ADCX) is an advanced-level course.
Prerequisites
The following are the prerequisites for this course:
Understanding of the OSI model;
Junos OS configuration experience: the Introduction to the Junos Operating System (IJOS) course or equivalent;
Advanced routing knowledge: the Advanced Junos Enterprise Routing (AJER) course or equivalent; and
Intermediate switching knowledge: the Junos Enterprise Switching Using Enhanced Layer 2 Software (JEX-ELS) and Data Center Switching (DCX) courses or equivalent.
Objectives
Day 1
Chapter 1: Course Introduction
Chapter 2: Next Generation Data Centers
Chapter 3: IP Fabric
Lab: IP Fabric
Chapter 4: VXLAN
Lab: VXLAN
Day 2
Chapter 5: EVPN
Lab: VXLAN with EVPN Signaling
Chapter 6: Data Center Interconnect
Lab: Data Center Interconnect
CLI and GUI Text
Frequently throughout this course, we refer to text that appears in a command-line interface (CLI) or a graphical user
interface (GUI). To make the language of these documents easier to read, we distinguish GUI and CLI text from chapter
text according to the following table.
Style            Description                    Usage Example
Franklin Gothic  Normal text.                   Most of what you read in the Lab Guide and Student Guide.
Courier New      Console text:                  commit complete
                 Screen captures                Exiting configuration mode
                 Noncommand-related syntax
                 GUI text elements:             Select File > Open, and then click Configuration.conf in the Filename text box.
                 Menu names
                 Text field entry
CLI Input        Text that you must enter.      lab@San_Jose> show route
GUI Input        Text that you must enter.      Select File > Save, and type config.ini in the Filename field.
Finally, this course distinguishes between regular text and syntax variables, and it also distinguishes between syntax
variables where the value is already assigned (defined variables) and syntax variables where you must assign the value
(undefined variables). Note that these styles can be combined with the input style as well.
CLI Undefined    Text where the variable's value is     Type set policy policy-name.
GUI Undefined    at the user's discretion, or text      ping 10.0.x.y
                 where the variable's value as shown    Select File > Save, and type
                 in the lab guide might differ from the
Education Services Offerings
You can obtain information on the latest Education Services offerings, course dates, and class locations from the World
Wide Web by pointing your Web browser to: http://www.juniper.net/training/education/.
About This Publication
The Advanced Data Center Switching Student Guide was developed and tested using software Release 14.1X53.
Previous and later versions of software might behave differently so you should always consult the documentation and
release notes for the version of code you are running before reporting errors.
This document is written and maintained by the Juniper Networks Education Services development team. Please send
questions and suggestions for improvement to training@juniper.net.
Technical Publications
You can print technical manuals and release notes directly from the Internet in a variety of formats:
Go to http://www.juniper.net/techpubs/.
Locate the specific software or hardware release and title you need, and choose the format in which you
want to view or print the document.
Documentation sets and CDs are available through your local Juniper Networks sales office or account representative.
Juniper Networks Support
For technical support, contact Juniper Networks at http://www.juniper.net/customers/support/, or at 1-888-314-JTAC
(within the United States) or 408-745-2121 (outside the United States).
Chapter 1: Course Introduction
Advanced Data Center Switching
We Will Discuss:
Objectives and course content information;
Introductions
The slide asks several questions for you to answer during class introductions.
Course Contents
The slide lists the topics we discuss in this course.
Prerequisites
The slide lists the prerequisites for this course.
Additional Resources
The slide provides links to additional resources available to assist you in the installation, configuration, and operation of
Juniper Networks products.
Satisfaction Feedback
Juniper Networks uses an electronic survey system to collect and analyze your comments and feedback. Depending on the
class you are taking, please complete the survey at the end of the class, or be sure to look for an e-mail about two weeks
from class completion that directs you to complete an online survey form. (Be sure to provide us with your current e-mail
address.)
Submitting your feedback entitles you to a certificate of class completion. We thank you in advance for taking the time to
help us improve our educational offerings.
staff with deep technical and industry knowledge, providing you with instructor-led hands-on courses in the classroom and
online, as well as convenient, self-paced eLearning courses. In addition to the courses shown on the slide, Education
Services offers training in automation, E-Series, firewall/VPN, IDP, network design, QFabric, support, and wireless LAN.
Courses
Juniper Networks courses are available in the following formats:
Learning bytes: Short, topic-specific, video-based lessons covering Juniper products and technologies
Find the latest Education Services offerings covering a wide range of platforms at
http://www.juniper.net/training/technical_education/.
hands-on configuration and troubleshooting exams. Successful candidates demonstrate a thorough understanding of
Internet and security technologies and Juniper Networks platform configuration and troubleshooting skills.
Associate-level, Specialist-level, and Professional-level exams are computer-based exams composed of multiple choice
questions administered at Pearson VUE testing centers worldwide.
Expert-level exams are composed of hands-on lab exercises administered at select Juniper Networks testing centers. Please
visit the JNCP website at http://www.juniper.net/certification for detailed exam information, exam pricing, and exam
registration.
Junos Genius
The Junos Genius application takes certification exam preparation to a new level. With Junos Genius you can practice for
your exam with flashcards, simulate a live exam in a timed challenge, and even build a virtual network with device
achievements earned by challenging Juniper instructors. Download the app now and Unlock your Genius today!
Find Us Online
The slide lists some online resources to learn and share information about Juniper Networks.
Any Questions?
If you have any questions or concerns about the class you are attending, we suggest that you voice them now so that your
instructor can best address your needs during class.
Chapter 2: Next Generation Data Centers
Advanced Data Center Switching
We Will Discuss:
The benefits and challenges of the traditional multitier architecture;
The networking requirements that are requiring a change to the design of a data center; and
The various data center fabric architectures.
Multiple Tiers
Legacy data centers are often hierarchical and consist of multiple layers. The diagram on the slide illustrates the typical
layers, which include access, distribution (sometimes referred to as aggregation), and core. Each of these layers performs
unique responsibilities. We cover the functions of each layer on a subsequent slide in this section.
Hierarchical networks are designed in a modular fashion. This inherent modularity facilitates change and makes this design
option quite scalable. When working with a hierarchical network, the individual elements can be replicated as the network
grows. The cost and complexity of network changes is generally confined to a specific portion (or layer) of the network rather
than to the entire network.
Because functions are mapped to individual layers, faults relating to a specific function can be isolated to that function's corresponding layer. The ability to isolate faults to a specific layer can greatly simplify troubleshooting efforts.
Functions of Layers
The individual layers usually represent specific functions found within a network. It is often mistakenly thought that the access, distribution (or aggregation), and core layers must exist in clear and distinct physical devices, but this is not a requirement.
The slide highlights the access, aggregation, and core layers and provides a brief description of the functions commonly
implemented in those layers. If CoS is used in a network, it should be incorporated consistently in all three layers.
Since using a hierarchical implementation does not require the use of proprietary features or protocols, a
multitier topology can be constructed using equipment from multiple vendors.
A multitier implementation allows flexible placement of a variety of switching platforms. The simplicity of the
protocols used does not require specific Junos versions or platform positioning.
The legacy multitier switching architecture cannot provide today's applications and users with predictable latency and uniform bandwidth. This problem is made worse when virtualization is introduced, where the
performance of virtual machines (VMs) depends on the physical location of the servers hosting those VMs.
The management of an ever-growing data center is becoming more and more taxing, administratively speaking.
While the north-south boundaries have been fixed for years, the east-west boundaries have not stopped growing. This growth of compute, storage, and infrastructure requires a new management approach.
The power consumed by networking gear represents a significant proportion of the overall power consumed in
the data center. This challenge is particularly important today, when escalating energy costs are putting
additional pressure on budgets.
The increasing performance and density of modern CPUs have led to an increase in network traffic. The network is often not equipped to deal with the large bandwidth demands and the increased number of media access control (MAC) addresses and IP addresses on each network port.
Separate networks for Ethernet data and storage traffic must be maintained, adding to the training and
management budget. Siloed Layer 2 domains increase the overall costs of the data center environment. In
addition, outages related to the legacy behavior of the Spanning Tree Protocol (STP), which is used to support these legacy environments, often result in lost revenue and unhappy customers.
Given these challenges, along with others, data center operators are seeking solutions.
Resource Utilization
In the multitier topology displayed on the slide, you can see that almost half the links are not utilized. In this example, you would also need to be running some type of spanning tree protocol (STP) to avoid loops, which would introduce a delay in your network convergence as well as significant STP control traffic taking up valuable bandwidth.
This topology is relatively simple but allows us to visualize the lack of resource utilization. Imagine a data center with a hundred racks of servers and a hundred top-of-rack access switches. The access switches all aggregate up to the core/distribution switches, including redundant connections. In this much larger and more complicated network, you would have thousands of physical cable connections that are not being utilized. Now imagine these connections are fiber: in addition to the unused cables, you would also have two transceivers per connection that are not being used. Because of this inefficient use of physical components, a significant amount of usable bandwidth sits idle.
Understanding why these changes are being implemented is important when trying to understand the needs of the customer. A few reasons driving this change include:
Application Flows: More east-west traffic communication is happening in data centers. With today's applications, many requests can generate a lot of traffic between devices in a single data center. Basically, a single user request triggers a barrage of additional requests to other devices. This "go here, get this; then go here, get that" behavior of many applications is being done on such a large scale today that it is driving data
those customers that are forward looking and eventually want to incorporate some level of virtualization.
Everything as a service: To be cost effective, a data center offering hosting services must be easy to scale out and scale back as demands change. The data center should be very agile, making it easy to deploy new services quickly.
server ports.
redundant connections to your servers as well as offers dual control planes. In addition to the access layer, MC-LAGs are also
commonly deployed at the core layer. When MC-LAG is deployed in an Active/Active fashion, both links between the attached
device and the MC-LAG peers are active and available for forwarding traffic. Using MC-LAG eliminates the need to run STP on
member links and, depending on the design, can eliminate the need for STP altogether.
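As a rough sketch of how this looks in configuration, an active/active MC-LAG member link on one of the two peers might resemble the following (the interface, IDs, and ICCP peer address are hypothetical examples, not values from the course labs):

```
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:01:02:03:04:05
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae redundancy-group 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set protocols iccp peer 10.0.0.2 redundancy-group-id-list 1
set protocols iccp peer 10.0.0.2 liveness-detection minimum-interval 1000
```

The second peer would use chassis-id 1 and status-control standby. Because both peers present the same LACP system ID, the attached server sees a single switch and simply runs standard LACP; consult the platform documentation for the full ICCP and interchassis link setup.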
devices into a single logical device, inside of a fabric architecture. A VCF is constructed using a spine-and-leaf architecture. In
the spine-and-leaf architecture, each spine device is interconnected to each leaf device. A VCF supports up to 32 total
QFabric
The QFabric System is composed of multiple components working together as a single switch to provide high-performance,
any-to-any connectivity and management simplicity in the data center. The QFabric System flattens the entire data center
network to a single tier where all access points are equal, eliminating the effects of network locality and making it the ideal
network foundation for cloud-ready, virtualized data centers. QFabric is a highly scalable system that improves application
performance with low latency and converged services in a non-blocking, lossless architecture that supports Layer 2, Layer 3,
and Fibre Channel over Ethernet (FCoE) capabilities. The reason you can consider the QFabric system as a single system is
that the Director software running on the Director group allows the main QFabric system administrator to access and
configure every device and port in the QFabric system from a single location. Although you configure the system as a single
entity, the fabric contains four major hardware components. The hardware components can be chassis-based, group-based,
Junos Fusion
Junos Fusion is a Juniper Networks Ethernet fabric architecture designed to provide a bridge from legacy networks to
software-defined cloud networks. With Junos Fusion, service providers and enterprises can reduce network complexity and
operational costs by collapsing underlying network elements into a single, logical point of management. The Junos Fusion
architecture consists of two major components: aggregation devices and satellite devices. With this structure it can also be
classified as a spine and leaf architecture. These components work together as a single switching system, flattening the
network to a single tier without compromising resiliency. Data center operators can build individual Junos Fusion pods
comprised of a pair of aggregation devices and a set of satellite devices. Each pod is a collection of aggregation and satellite
devices that are managed as a single device. Pods can be small (for example, a pair of aggregation devices and a handful of satellites) or large (up to 64 satellite devices), based on the needs of the data center operator.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly
using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers
that must accommodate multiple vendors. One of the most complicated tasks in building an IP Fabric is assigning all of the
details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other implementation
details.
requirement that is becoming less and less necessary is the ability of the underlying switch fabric to carry native Ethernet frames between VMs/servers in different racks. Some of the major reasons for this shift are:
1. IP-only Data: Many data centers simply need IP connectivity between racks of equipment. There is less and less
need for the stretching of Ethernet networks over the fabric. For example, one popular compute and storage
methodology is Apache's Hadoop. Hadoop allows for a large set of data (i.e., like a single terabit file) to be
stored in chunks across many servers in a data center. Hadoop also allows for the stored chunks of data to be
processed in parallel by the same servers they are stored upon. The connectivity between the possibly
virtual eXtensible local area network (VXLAN), multiprotocol label switching (MPLS), and generic routing encapsulation (GRE) are some of the common tunneling protocols used to transport Layer 2 frames over the fabric of a data center. One of the benefits of overlay networking is that when there is a change to Layer 2
connectivity between VMs/servers (the overlay network), the underlying fabric (underlay network) can remain
relatively untouched and unaware of the changes occurring in the overlay network.
requirements including load balancing over equal cost paths (assuming Virtual Chassis Fabric) as well as having no blocked
spanning tree ports in the network. However, this topology does not solve the VM agility problem or the 802.1q VLAN overlap
problem. Also, as 802.1q VLANs are added to the virtual switches, those same VLANs must be provisioned on the underlay
network. Managing the addition, removal, and movement of VMs (and their VLANs) for thousands of customers would be a
nightmare for the operators of the underlay network.
Overlay Networking
Overlay networking can help solve many of the requirements and problems discussed in the previous slides. This slide shows
the addition of an overlay network that includes the use of VXLAN. The overlay network consists of the virtual switches and
the VXLAN tunnel endpoints (VTEPs). A VTEP will encapsulate the Ethernet frames that it receives from the virtual switch into
IP and forward the resulting IP packet to the remote VTEP. The underlay network simply needs to forward IP packets between
VTEPs. The receiving VTEP will de-encapsulate the VXLAN IP packets and then forward the resulting Ethernet Frame to the
appropriate VM. Adding and removing VMs from the data center has no effect on the underlay network. The underlay
network simply needs to provide IP connectivity between the VTEPs.
When implementing the underlay network in this scenario, you have a few choices. You can use an Ethernet fabric like Virtual
Chassis (VC), Virtual Chassis Fabric (VCF), or Junos Fusion. All of these are valid solutions. Because all of the traffic crossing
the underlay network is IP, the option for an IP fabric becomes available. The choice of underlay network comes down to
scale and future growth. An IP fabric is considered to be the most scalable underlay solution.
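As a minimal sketch of what a Layer 2 VTEP looks like on a QFX5100 (the VLAN name, VNI, and multicast group below are hypothetical examples), the VTEP sources its tunnels from the loopback and maps a VLAN to a VXLAN network identifier (VNI):

```
set switch-options vtep-source-interface lo0.0
set vlans v100 vlan-id 100
set vlans v100 vxlan vni 5100
set vlans v100 vxlan multicast-group 225.0.0.100
```

With this multicast-based control plane, the underlay only has to deliver unicast IP between the VTEP loopback addresses, plus multicast for the group assigned to the VNI; VXLAN configuration is covered in detail in a later chapter.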
We Discussed:
The benefits and challenges of the traditional multitier architecture;
The networking requirements that are requiring a change to the design of a data center; and
The various data center fabric architectures.
Review Questions
1.
2.
3.
1. Some of the challenges of the traditional data center designs are the slow convergence times of xSTPs, as well as the wasted resources of unused (blocked by xSTP) interfaces.
2. Some of the applications driving change in the data center are multitenancy, the increase in east-west traffic, Hadoop, and overlay networking.
3. Layer 2 networks can be stretched over an IP network using an overlay like VXLAN or GRE.
Chapter 3: IP Fabric
Advanced Data Center Switching
We Will Discuss:
Routing in an IP Fabric;
IP Fabric Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
IP Fabric
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly
using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers
that must accommodate multiple vendors. Some of the most complicated tasks in building an IP Fabric are assigning all of
the details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other
implementation details.
three-stage architecture, an ingress stage, a middle stage, and an egress stage. The theory is that there are multiple paths
for a call to be switched through the network such that calls will always be connected and not "blocked" by another call. The
term "Clos fabric" came about later, as people began to notice that the pattern of links looked like threads in a woven piece
of cloth.
You should notice that the goal of the design is to provide connectivity from one ingress crossbar switch to an egress
crossbar switch. Notice that there is no need for connectivity between crossbar switches that belong to the same stage.
do not present the topology with 3 distinct stages as shown on this slide. Most diagrams show an IP Fabric with the Ingress
and Egress stage combined as a single stage. It would be like taking the top of the diagram and folding it over onto itself with
all Spine nodes on top and all Leaf nodes on the bottom of the diagram (see the next slide).
with multiple paths to all other devices. An important fact to keep in mind is that a member switch has no idea of its location
(Spine or Leaf) in an IP Fabric. The Spine or Leaf function is simply a matter of a device's physical location in the fabric. In
general, the choice of router to be used as a Spine node should be partially based on the interface speeds and number of ports that it supports. The slide shows an example where every Spine node is a QFX5100-24Q. The QFX5100-24Q supports thirty-two 40GbE interfaces and was designed by Juniper specifically to be a Spine node.
medium to large deployment. Although we do not cover the configuration of a 5-stage fabric, you should know that the
configuration of a 5-stage fabric is quite complicated.
pertinent features for a Spine node include overlay networking support, Layer 2 and Layer 3 VXLAN Gateway support, and
number of VLANs supported.
IP Fabric Routing
The slide highlights the topic we discuss next.
destination.
homed hosts and two paths available for multihomed hosts. It just so happens that getting these routes and associated next
hops into the forwarding table of a Spine node can be tricky. The rest of the chapter discusses the challenges as well as the
Layer 3 Connectivity
Remember that your IP Fabric will be forwarding IP data only. Each node will be an IP router. In order to forward IP packets
between routers, they need to exchange IP routes. So, you have to make a choice between routing protocols. You want to
ensure that your choice of routing protocol is scalable and future proof. As you can see by the chart, BGP is the natural
choice for a routing protocol.
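For example, one common EBGP-based design assigns each node its own AS number and peers over the point-to-point fabric links. The following Leaf-node sketch uses made-up AS numbers, addresses, and policy names:

```
set routing-options router-id 192.168.0.11
set policy-options policy-statement advertise-loopback term 1 from interface lo0.0
set policy-options policy-statement advertise-loopback term 1 then accept
set protocols bgp group underlay type external
set protocols bgp group underlay local-as 65011
set protocols bgp group underlay multipath multiple-as
set protocols bgp group underlay export advertise-loopback
set protocols bgp group underlay neighbor 172.16.1.0 peer-as 65001
set protocols bgp group underlay neighbor 172.16.2.0 peer-as 65002
```

The multipath multiple-as statement allows equal-cost BGP routes learned from neighbors in different AS numbers to be used simultaneously. An IBGP-based design is also possible, as discussed next.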
IBGP: Part 1
IBGP is a valid choice as the routing protocol for your fabric. IBGP peers almost always peer to loopback addresses as
opposed to physical interface addresses. In order to establish a BGP session (over a TCP session), a router must have a route
to the loopback address of its neighbor. To learn the route to a neighbor an Interior Gateway Protocol (IGP) like OSPF must be
enabled in the network. One purpose of enabling an IGP is simply to ensure every router knows how to get to the loopback
address of all other routers. Another problem that OSPF will solve is determining all of the equal cost paths to remote
destinations. For example, router A will determine from OSPF that there are two equal-cost paths to reach router B. Now router A can load share traffic destined for router B's loopback address (IBGP-learned routes; see the next few slides) across the two links towards router B.
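A minimal sketch of this arrangement on one router might look like the following (interface names and loopback addresses are hypothetical). OSPF provides reachability to the loopbacks, and the equal-cost paths between them, while IBGP peers loopback to loopback:

```
set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols ospf area 0.0.0.0 interface et-0/0/0.0
set protocols ospf area 0.0.0.0 interface et-0/0/1.0
set protocols bgp group fabric type internal
set protocols bgp group fabric local-address 192.168.0.1
set protocols bgp group fabric neighbor 192.168.0.2
```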
IBGP: Part 2
There is a requirement in an IBGP network that if one IBGP router needs to advertise an IBGP route, then every other IBGP
router must receive a copy of that route (to prevent black holes). One way to ensure this happens is to have every IBGP router
peer with every other IBGP router (a full mesh). This works fine but it does not scale (i.e., add a new router to your IP fabric
and you will have to configure every router in your IP fabric with a new peer). There are two ways to help scale the full mesh
issue: route reflection or confederations. Most often, route reflection is chosen (it is easy to implement). It is
possible to have redundant route reflectors as well (shown on the slide). It is best practice to configure one or more of the
Spine nodes as route reflectors.
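On a Spine node acting as a route reflector, the only real addition to the IBGP group is a cluster ID; the Leaf nodes keep ordinary IBGP sessions toward the reflectors. A hypothetical sketch (addresses are made up):

```
set protocols bgp group fabric-rr type internal
set protocols bgp group fabric-rr local-address 192.168.0.1
set protocols bgp group fabric-rr cluster 192.168.0.1
set protocols bgp group fabric-rr neighbor 192.168.0.11
set protocols bgp group fabric-rr neighbor 192.168.0.12
```

Because the reflector re-advertises routes learned from one client to the others, the full mesh is no longer required; adding a Leaf node means adding a neighbor statement only on the reflectors.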
IBGP: Part 3
Note: The next few slides will highlight the problem faced by a Spine node (router D) that is NOT a route reflector.
You must build your IP Fabric such that all routers load share traffic over equal-cost paths (when they exist) towards remote
networks. Each router should be configured for BGP multipath so that it will load share when multiple BGP routes exist.
The slide shows that routers A and B advertise the 10.1/16 network to RR-A. RR-A will use both routes for forwarding
(multipath) but will choose only one of those routes (the one from router B, because router B has the lowest router ID) to send to
router C (a Leaf node) and router D (a Spine node). Router C and router D will receive the route for 10.1/16. Both copies will
have a BGP next hop of router B's loopback address. This is the default behavior of route advertisement and selection in the
IBGP with route reflection scenario.
Did you notice the load balancing problem (hint: the problem is not on router C)? Since router C has two equal-cost paths to
get to router B (learned from OSPF), router C will load share traffic to 10.1/16 over the two uplinks towards the Spine routers.
The load balancing problem lies on router D. Since router D received a single route that has a BGP next hop of router B's
loopback, it forwards all traffic destined to 10.1/16 towards router B. The path through router A (which is an equal-cost path to
10.1/16) will never be used in this case. The next slide discusses the solution to this problem.
It is worth noting that although router C has no problem load sharing towards the 10.1/16 network, if router B were
to fail, it may take some time for router C to learn about the route through router A. The next slide discusses the solution to
this problem as well.
IBGP: Part 4
The problem on RR-A is that it sees the routes received from routers A and B for 10.1/16 as a single route that has been
received twice. If an IBGP router receives different versions of the same route, it is supposed to make a choice between them
and then advertise the one chosen route to its appropriate neighbors. One solution to this problem is to make every Spine
node a route reflector. This would be fine in a small fabric but probably would not make sense when there are tens of Spine
nodes. Another option is to make each of the advertisements from routers A and B look like unique routes. How can we
make the multiple advertisements of 10.1/16 from routers A and B appear to be unique routes? There is an IETF draft
(draft-ietf-idr-add-paths) that defines the ADD-PATH capability, which does just that: it makes the advertisements look unique.
All routers in the IP Fabric should support this capability for it to work. Once enabled, routers advertise and evaluate routes
based on a tuple of the network and its path ID. In the example, routers A and B advertise the 10.1/16 route. However, this
time every router supports the ADD-PATH capability, so RR-A attaches a unique path ID to each route and is able to advertise
both routes to all clients, including router D. When the routes arrive on the clients, the clients install both routes in their routing
tables (allowing them to load share towards routers A and B). Although router C was already able to load share without the
additional route, router C will now be able to continue forwarding traffic to 10.1/16 even in the event of a failure of either router A
or router B.
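On Junos, ADD-PATH is enabled per address family. A minimal sketch (the group name and path count are illustrative) might look like:

```
protocols {
    bgp {
        group IBGP {
            family inet {
                unicast {
                    add-path {
                        receive;           # accept multiple paths per prefix
                        send {
                            path-count 2;  # advertise up to two paths per prefix
                        }
                    }
                }
            }
        }
    }
}
```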
EBGP: Part 1
EBGP is also a valid design to use in your IP Fabric. You will notice that the load balancing problem is much easier to fix in the
EBGP scenario. For example, there is no need for the routers to support any draft RFCs! Generally, each router in an IP
Fabric should be in its own unique AS. You can use AS numbers from the private or public range or, if you need thousands
of AS numbers, you can use 32-bit AS numbers.
EBGP: Part 2
In an EBGP-based fabric, there is no need for route reflectors or an IGP. The BGP peering sessions parallel the physical
wiring. For example, every Leaf node has a BGP peering session with every Spine node. There are no leaf-to-leaf or
spine-to-spine BGP sessions, just as there is no leaf-to-leaf or spine-to-spine physical connectivity. EBGP peering is done
using the physical interface IP addresses (not loopback interfaces). To enable proper load balancing, all routers need to be
configured for multipath multiple-as as well as a load balancing policy. Both of these configurations are covered
later in this chapter.
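A sketch of one Leaf node's EBGP configuration under these conventions (the AS numbers and neighbor addresses are hypothetical):

```
routing-options {
    autonomous-system 64514;               # this Leaf's own AS
}
protocols {
    bgp {
        group SPINE {
            type external;
            multipath {
                multiple-as;               # allow ECMP across routes from different peer ASs
            }
            neighbor 172.16.1.0 {          # physical interface address of Spine 1
                peer-as 64512;
            }
            neighbor 172.16.1.2 {          # physical interface address of Spine 2
                peer-as 64513;
            }
        }
    }
}
```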
EBGP: Part 3
The slide shows that the routers in AS64516 and AS64517 are advertising 10.1/16 to their two EBGP peers. Because
multipath multiple-as is configured on all routers, the receiving routers in AS64512 and AS64513 will install both
routes in their routing tables and load share traffic destined to 10.1/16.
EBGP: Part 4
The slide shows that the routers in AS64512 and AS64513 are advertising 10.1/16 to all of their EBGP peers (all Leaf
nodes). Since multipath multiple-as is configured on all routers, the receiving router in the slide, the router in
AS64514, will install both routes in its routing table and load share traffic destined to 10.1/16.
Best Practices
When enabling an IP fabric you should follow some best practices. Remember, two of the main goals of an IP fabric design
(or a Clos design) are to provide a non-blocking architecture and predictable load balancing behavior.
All Spine nodes should be the exact same type of router. They should be the same model and they should also
have the same line cards installed. This helps the fabric to have predictable load balancing behavior.
All Leaf nodes should be the exact same type of router. Leaf nodes do not have to be the same router as the
Spine nodes. Each Leaf node should be the same model and should also have the same line cards
installed. This helps the fabric to have predictable load balancing behavior.
Every Leaf node should have an uplink to every Spine node. This helps the fabric to have predictable load
balancing behavior.
All uplinks from Leaf node to Spine node should be the exact same speed. This helps the fabric to have
predictable load balancing behavior and also helps with the non-blocking nature of the fabric. For example, let
us assume that a Leaf has one 40GbE uplink and one 10GbE uplink to the Spine. When using the combination
of OSPF (for loopback interface advertisement and BGP next-hop resolution) and IBGP, the bandwidth of the
links is taken into consideration when calculating the shortest path to the BGP next hop. OSPF will most
likely always choose the 40GbE interface during its shortest path first (SPF) calculation and use that interface for
forwarding towards remote BGP next hops. This essentially blocks the 10GbE interface from ever being used. In
the EBGP scenario, the bandwidth is not taken into consideration, so traffic will be equally load shared over
the two different-speed interfaces. Imagine trying to equally load share 60 Gbps of data over the two links; how
will the 10GbE interface handle 30 Gbps of traffic? The answer is that it won't.
IP Fabric Scaling
The slide highlights the topic we discuss next.
Scaling
To increase the overall throughput of an IP Fabric, you simply need to increase the number of Spine devices (and the
appropriate uplinks from the Leaf nodes to those Spine nodes). If you add one more Spine node to the fabric, you will also
have to add one more uplink to each Leaf node. Assuming that each uplink is 40GbE, each Leaf node can now forward an
extra 40 Gbps over the fabric.
Adding and removing both server-facing ports (downlinks from the Leaf nodes) and Spine nodes will affect the
oversubscription (OS) ratio of a fabric. When designing the IP fabric, you must understand the OS requirements of your data
center. For example, does your data center need line-rate forwarding over the fabric? Line-rate forwarding would equate to
1-to-1 (1:1) OS. That means the aggregate server-facing bandwidth is equal to the aggregate uplink bandwidth. Or, maybe
your data center would work perfectly fine with a 3:1 OS of the fabric. That is, the aggregate server-facing bandwidth is three
times that of the aggregate uplink bandwidth. Most data centers will probably not need to be designed around a 1:1 OS. Instead,
you should decide on an OS ratio that makes the most sense based on the data center's normal bandwidth usage.
The next few slides discuss how to calculate the OS ratios of various IP fabric designs.
3:1 Topology
The slide shows a basic 3:1 OS IP Fabric. All Spine nodes, four in total, are QFX5100-24Q routers that each have (32) 40GbE
interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. Each of the (48) 10GbE ports on all 32 Leaf nodes will be fully utilized (i.e., attached to
downstream servers). That means that the total server-facing bandwidth is 48 x 32 x 10 Gbps, which equals 15360 Gbps.
Each of the 32 Leaf nodes has (4) 40GbE Spine-facing interfaces in use. That means that the total uplink bandwidth is 4 x 32 x
40 Gbps, which equals 5120 Gbps. The OS ratio for this fabric is 15360:5120, or 3:1.
An interesting thing to note is that if you remove any number of Leaf nodes, the OS ratio does not change. For example, what
would happen to the OS ratio if there were only 31 Leaf nodes? The server-facing bandwidth would be 48 x 31 x 10 Gbps, which
equals 14880 Gbps. The total uplink bandwidth would be 4 x 31 x 40 Gbps, which equals 4960 Gbps. The OS ratio for this fabric is
14880:4960, or still 3:1. This fact actually makes your design calculations very simple. Once you decide on an OS ratio and
determine the number of Spine nodes that will allow that ratio, you can simply add and remove Leaf nodes from the topology
without affecting the original OS ratio of the fabric.
2:1 Topology
The slide shows a basic 2:1 OS IP Fabric in which two Spine nodes were added to the topology from the last slide. All Spine
nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE interfaces. All Leaf nodes, 32 in total, are
QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE
ports on all 32 Leaf nodes will be fully utilized (i.e., attached to downstream servers). That means that the total
server-facing bandwidth is still 48 x 32 x 10 Gbps, which equals 15360 Gbps. Each of the 32 Leaf nodes has (6) 40GbE
Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals 7680 Gbps. The OS
ratio for this fabric is 15360:7680, or 2:1.
1:1 Topology
The slide shows a basic 1:1 OS IP Fabric. All Spine nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE
interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE
server-facing interfaces. There are many ways that a 1:1 OS ratio can be attained. In this case, although the Leaf nodes
each have (48) 10GbE server-facing interfaces, we are only going to allow 24 servers to be attached at any given moment.
That means that the total server-facing bandwidth is 24 x 32 x 10 Gbps, which equals 7680 Gbps. Each of the 32 Leaf
nodes has (6) 40GbE Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals
7680 Gbps. The OS ratio for this fabric is 7680:7680, or 1:1.
Configure an IP Fabric
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used in the subsequent slides. Notice that each router is the single
member of a unique autonomous system. Each router will peer using EBGP with its directly attached neighbors using the
physical interface addresses. Host A is singly homed to the router in AS 64514. Host B is multihomed to the routers in AS
64515 and AS 64516.
Verifying Neighbors
Once you configure BGP neighbors, you can check the status of the relationships using either the show bgp summary or
show bgp neighbor command.
Routing Policy
Once BGP neighbors are established in the IP Fabric, each router must be configured to advertise routes to its neighbors and
into the fabric. For example, as you attach a server to a top-of-rack (TOR) switch/router (which is usually a Leaf node of the
fabric), you must configure the TOR to advertise the server's IP subnet to the rest of the network. The first step in advertising a
route is to write a policy that matches the route and then accepts it. The slide shows the policy that must be configured.
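Since the slide itself is not reproduced here, a representative policy of this type might look like the following sketch (the subnet is hypothetical):

```
policy-options {
    policy-statement direct {
        term server-subnet {
            from {
                protocol direct;                  # match locally connected routes
                route-filter 10.1.2.0/24 exact;   # the server-facing subnet
            }
            then accept;
        }
    }
}
```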
Applying Policy
After configuring a policy, the policy must be applied to the router's EBGP peers. The slide shows the direct policy being
applied as an export policy to as64515's EBGP neighbors.
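A sketch of that application (the group name is assumed from the earlier fabric examples):

```
protocols {
    bgp {
        group SPINE {
            type external;
            export direct;                 # advertise routes accepted by the direct policy
        }
    }
}
```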
Default Behavior
Assuming the routers in AS 64515 and AS 64516 are advertising Host B's subnet, the slide shows the default routing
behavior on a Spine node. Notice that the Spine node has received two advertisements for the same subnet. However,
because of the default behavior of BGP, the Spine node selects a single route as the active route in the routing
table (you can tell which is the active route by the asterisk). Based on what is shown in the slide, the Spine node will
send all traffic destined for 10.1.2/24 over the ge-0/0/2 link. The Spine node will not load share over the two possible next
hops by default.
Verify Multipath
View the routing table to see the results of the multipath statement. As you can see, the active BGP route now has two next
hops that can be used for forwarding. Do you think the router is using both next hops for forwarding?
the Spine node is continuing to only forward traffic destined to 10.1.2/24 over a single link.
Results
The output shows that after applying the load balancing policy to the forwarding table, all next hops associated with active
routes in the routing table have been copied into the forwarding table.
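The standard way to achieve this on Junos is a load balancing policy exported to the forwarding table; a minimal sketch:

```
policy-options {
    policy-statement load-balance {
        then {
            load-balance per-packet;       # despite the name, per-flow on most platforms
        }
    }
}
routing-options {
    forwarding-table {
        export load-balance;               # copy all ECMP next hops into the forwarding table
    }
}
```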
AS 64514
The slide shows the BGP and policy configuration for the router in AS 64514.
AS 64515
The slide shows the BGP and policy configuration for the router in AS 64515.
AS 64512
The slide shows the BGP and policy configuration for the router in AS 64512.
We Discussed:
Routing in an IP Fabric;
Review Questions
1.
2.
3.
Lab: IP Fabric
The slide provides the objective for this lab.
1.
Some of the Juniper Networks products that can be used in the Spine position of an IP Fabric are MX, QFX10k, and QFX5100 Series
routers.
2.
Routing should be implemented in such a way that when multiple, equal-cost physical paths exist between two points, data traffic is
load-shared over those paths.
3.
To allow a BGP speaker to install more than one next hop in the routing table when the same route is received from two or more
neighbors, multipath must be enabled.
Chapter 4: VXLAN
Advanced Data Center Switching
We Will Discuss:
Reasons why you would use VXLAN in your data center;
Layer 2 Apps
The needs of the applications that run on the servers in a data center usually drive the designs of those data centers. There
are many server-to-server applications that have strict requirements for layer 2 connectivity between servers. A switched
infrastructure that is built around xSTP or a layer 2 fabric (like Juniper Networks Virtual Chassis Fabric or Junos Fusion) is
perfectly suited for this type of connectivity. This type of infrastructure allows broadcast domains to be stretched across
the data center.
IP Fabric
Many of today's next generation data centers are being built around IP Fabrics which, as their name implies, provide IP
connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity
support the layer 2 requirements of traditional server-to-server applications? The rest of this section of this chapter
discusses the possible solutions to the layer 2 connectivity problem.
Layer 2 VPNs
One possible solution for providing layer 2 connectivity over an IP-based data center is to implement some form of
layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers are
the top-of-rack (TOR) routers/switches. In this scenario, each TOR router acts as a layer 2 VPN gateway. A gateway is
the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a layer 2 VPN based on Ethernet, a
gateway (the router on the left) takes Ethernet frames destined for a remote MAC address, encapsulates the original Ethernet
frame in some other data type (like IP, MPLS, IPsec, etc.), and transmits the newly formed packet to the remote gateway. The
receiving gateway (the router on the right) receives the VPN data, decapsulates it by removing the outer encapsulation, and
then forwards the remaining original Ethernet frame to the locally attached server. Notice in the diagram that the IP Fabric
simply had to forward IP data. The IP Fabric had no knowledge of the Ethernet connectivity that exists between Host A and B.
Data Plane
There are generally two components of a VPN: the data plane (as described on this slide) and the control plane (as
described on the next slide).
The data plane of a VPN describes the method by which a gateway encapsulates and decapsulates the original data. Also,
with regard to an Ethernet layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and
remote servers, much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the
gateways learn the MAC addresses of locally attached servers in the data plane (i.e., from received Ethernet frames). Remote
MAC addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the
control plane.
Control Plane
One question that must be asked is: how does a gateway learn about remote gateways? This learning
can happen in one of two ways. Remote gateways can be statically configured on each gateway participating in a VPN, or they
can be learned dynamically.
Static configuration works fine but it does not really scale. For example, imagine that you have 20 TOR routers participating
in a statically configured layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each of
the 20 switches to recognize the newly added gateway to the VPN.
Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for
dynamic additions and deletions of gateways from the VPN. Some signaling protocols also allow a gateway to advertise its locally
learned MAC addresses to remote gateways. Usually a gateway has to receive an Ethernet frame from a remote host before
it can learn the host's MAC address. Learning remote MAC addresses in the control plane allows the MAC tables of all
gateways to be more in sync. This has the positive side effect of making the forwarding behavior of the VPN more
efficient (less flooding of data over the fabric).
Virtualization
Data centers are relying on virtualization more and more. The slide shows the concepts of virtualizing servers in a data
center. Instead of installing a bare metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM
is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that
houses the VMs that run inside it.
One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical
network interface card (NIC) to attach to the network. In the virtualized world, the VMs also utilize NICs; however, they are in
fact virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same
host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual
switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the
virtual switches appear as standard switches attached to the network. VLANs can simply be stretched from one virtual
switch, across the physical switched network, to terminate on one or more remote virtual switches. This works great when
the physical network is a switched Ethernet network. However, what happens when the physical network is based
on IP routing?
vendors of virtualized products have chosen to support VXLAN as the layer 2 VPN. VXLAN functionality can be found in
virtual switches like VMware's Distributed vSwitch, Open vSwitch, and Juniper Networks Contrail vRouters. If virtualizing the
network is the future, it would seem that VXLAN has become the de facto layer 2 VPN in the data center.
plane, other signaling methods for VXLAN exist including Multi-protocol Border Gateway Protocol (MP-BGP) Ethernet VPN
(EVPN) as well as Open Virtual Switch Database (OVSDB). This chapter covers the multicast method of signaling.
1. Original Ethernet Frame: The Ethernet frame being tunneled over the underlay network, minus the VLAN tagging.
2. VXLAN Header (64 bits): Consists of an 8-bit flags field, the 24-bit VNI, and two reserved fields. The I flag must be set
to 1 to indicate a valid VNI.
3. Outer UDP Header: The destination port is the well-known VXLAN port; the source port is typically derived from a hash
of the inner frame, which aids load balancing in the fabric.
4. Outer IP Header: The source address is the IP address of the sending VXLAN Tunnel End Point (VTEP). The
destination address is the IP address of the receiving VTEP.
5. Outer MAC Header: As with any packet being sent over a layer 3 network, the source and destination MAC addresses
are rewritten at each layer 3 hop.
VTEP: Part 1
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of
Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the
VTEP.
VTEP: Part 2
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here
is the step-by-step process taken by Virtual Switch 1 (VS1):
1. VS1 receives an Ethernet frame from a local VM destined to the MAC address of a remote VM.
2. VS1 performs a MAC table lookup and determines that the frame must be sent over the VXLAN tunnel to the
remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining
Ethernet frame using VXLAN encapsulation, setting the destination IP address to VS2's VTEP address.
VTEP: Part 3
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local VM.
Here is the step-by-step process taken by the network and VS2:
1. The routers in the IP fabric simply route the VXLAN packet to its destination, VS2's VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine which MAC table to use for the lookup.
3. VS2 strips the VXLAN encapsulation, leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface on which to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches is that, since the VLAN tags
are stripped before the frame is sent over the IP Fabric, the VLAN tags do not have to match between remote VMs. This allows
for more flexibility in VLAN assignments from server to server and rack to rack.
a networking device like a router or switch can also handle the VTEP role. A networking device that can perform that role is called
a VXLAN Gateway. There are two types of VXLAN Gateways: layer 2 and layer 3. The slide shows how a VXLAN Layer 2
Gateway (the router on the right) handles VXLAN packets received from a remote VTEP. It simply provides layer 2 connectivity
between hosts on the same VLAN.
As you discuss the concept of a VTEP with others, you may notice that people refer to the different types of VTEPs in different
ways. For example, a VTEP that is part of a virtual switch (as shown in previous slides) is sometimes referred to as a software
VTEP. A physical router or switch acting as a VXLAN Gateway (Layer 2 or Layer 3) is sometimes referred to as a hardware
VTEP.
Router B's IRB interface. To send a packet to 1.1.1.1 (a remote IP subnet), VM1 must use Address Resolution Protocol (ARP)
to determine the MAC address of 10.1.1.254. Once VM1 knows the MAC address for 10.1.1.254, VM1 and the devices along
the way to 1.1.1.1 use the following procedure to forward an IP packet to its destination:
1. VM1 creates an IP packet destined to 1.1.1.1.
2. Since 1.1.1.1 is on a different subnet than VM1, VM1 encapsulates the IP packet in an Ethernet frame with a
destination MAC address of the default gateway's MAC address and sends the Ethernet frame to VS1.
3. VS1 receives the Ethernet frame, performs a MAC table lookup, and determines that the Ethernet frame
must be sent over the VXLAN tunnel to Router B. Router B appears to VS1 as the VTEP that is directly attached to
the host that owns the destination MAC address. The reality is that the destination MAC address is the MAC
address of Router B's IRB interface.
6. Router B strips the remaining Ethernet framing and performs a routing table lookup to determine the next hop to
the destination network.
7. Router B encapsulates the IP packet in the outgoing interface's encapsulation and forwards it to the next hop.
BUM Traffic
The slide discusses the handling of BUM traffic by VTEPs according to the VXLAN standard model. In this model, you should
note that the underlay network must support a multicast routing protocol, preferably some form of Protocol Independent
Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support Internet Group Management Protocol (IGMP) so that they can
inform the underlay network that they are members of the multicast group associated with a VNI.
For every VNI used in the data center, there must also be a multicast group assigned. Remember that there are 2^24 (~16M)
possible VNIs, so in the extreme case your customer will need 2^24 group addresses. Luckily, 239/8 is a reserved set of
organizationally scoped multicast group addresses (2^24 group addresses in total) that can be used freely within your
customer's data center.
that each VTEP that belongs to 239.1.1.1 will also build its branch of the RPT (including VTEP B).
Multicast Forwarding
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame into the appropriate
VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI's group address
(239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B's DR (the PIM router closest to VTEP B) encapsulates
the multicast packet into unicast PIM register messages that are destined to the IP address of the RP. Upon receiving the
register messages, the RP decapsulates the register messages and forwards the resulting multicast packets down the
(*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP
functions.
It is not shown on this slide, but once R1 receives the first native multicast packet from the RP (source address is VTEP B's
address), R1 will build a shortest path tree (SPT) to the DR closest to VTEP B, which will establish (S,G) state on all routers
along the path.
VXLAN Configuration
The slide highlights the topic we discuss next.
Example Topology
The slide shows the example topology that will be used for the subsequent slides.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 will appear to be in the same broadcast
domain and IP subnet. Also, VRRP will run between the two routers to provide a redundant default gateway to the
two hosts.
Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface is used on Juniper Networks routers as the VTEP source interface. Therefore, you must make sure that the loopback
addresses of the routers are reachable. Remember, the loopback interface for each router in the IP Fabric fell into the
172.16.100/24 range.
PIM
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically
configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs
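A minimal PIM-SM configuration with a statically defined RP might be sketched as follows (the RP address is hypothetical, chosen from the loopback range used earlier):

```
protocols {
    pim {
        rp {
            static {
                address 172.16.100.1;      # loopback of the router acting as RP
            }
        }
        interface all {
            mode sparse;                   # run PIM-SM on the fabric interfaces
        }
    }
}
```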
Source Address
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use
the vtep-source-interface statement to specify the interface from which the IP address will come. This command is
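As a sketch, on a QFX Series switch the statement lives at the [edit switch-options] hierarchy:

```
switch-options {
    vtep-source-interface lo0.0;           # VTEP source address taken from lo0.0
}
```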
However, it may cause a remote VXLAN gateway to receive unwanted BUM traffic for a VNI to which it does not belong.
vrrp-group 1 {
    virtual-address 10.1.1.254;
    priority 100;
}
            }
        }
    }
The bridge domain configuration on router as64513 would be identical to that shown on the slide.
As you know, multicast is used in the control plane for VXLAN. It helps in the forwarding of BUM traffic (here we care about
US
the multicast traffic). Normally, when a VTEP receives multicast traffic from an attached server, it will send a copy to all other
locally attached servers on the same VLAN. It will also send a VXLAN encapsulated copy over the IP fabric using the
multicast-group for the VXLAN segment. That is, every remote VTEP will receive a copy of the original multicast packet,
regardless of whether or not they have any attached receivers. If you know that there are no receivers attached to any
remote VTEPs for a particular multicast group, you can use the command on the slide to help stop the transmission of transit
automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN tagged Ethernet frame. The
slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN
means that the gateway has received multicast traffic (BUM traffic encapsulated in VXLAN) from a remote VTEP, allowing it to
learn the remote VTEP's IP address, so the local gateway has instantiated an SPT towards that remote VTEP.
PIM Neighbors
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and the source IP address it uses for VXLAN
packets. For each remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide.
These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. These
interfaces are actually used for forwarding, as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
Reasons why you would use VXLAN in your data center;
Review Questions
1.
2.
3.
Lab: VXLAN
The slide provides the objective for this lab.
1.
Major vendors of virtualization products support VXLAN to provide the Layer 2 stretch over an IP-based data center. If the vSwitches
of your virtualization product ONLY support VXLAN, then more than likely your other networking devices will need to support
VXLAN as well.
2.
A VXLAN Gateway automatically removes the VLAN tag for Ethernet frames received from a locally attached server.
3.
show ethernet-switching vxlan-tunnel-end-point remote mac-table on a QFX5100 Series switch or
show l2-learning vxlan-tunnel-end-point remote mac-table on an MX Series router can be used to view the
MACs learned from remote gateways.
Chapter 5: EVPN
Advanced Data Center Switching
We Will Discuss:
The benefits of using EVPN signaling for VXLAN;
(PIM) and multicast in the signaling plane, other signaling methods for VXLAN exist, including Multiprotocol Border Gateway
Protocol (MP-BGP) Ethernet VPN (EVPN) as well as Open vSwitch Database (OVSDB). This chapter covers the EVPN
method of signaling. Although we cover EVPN as the signaling component for VXLAN in this chapter, it should be noted that
EVPN can also be used as the signaling component for both MPLS and MPLS-over-GRE encapsulations as well. Those
encapsulation types are not covered in this course.
EVPN protocol.
MP-BGP
EVPN is based on Multiprotocol Border Gateway Protocol (MP-BGP). It uses the Address Family Identifier (AFI) of 25, which is
the Layer 2 VPN address family, and the Subsequent Address Family Identifier (SAFI) of 70, which is the EVPN address family.
BGP is a proven protocol in both service provider and enterprise networks. It has the ability to scale to millions of route
advertisements. BGP also has the added benefit of being policy oriented. Using policy, you have complete control over route
advertisements, allowing you to control which devices learn which routes.
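Enabling the EVPN AFI/SAFI on a BGP session takes a single family statement. A sketch (the group name and addresses are hypothetical):
lab@leaf1# show protocols bgp group overlay
type internal;
local-address 172.16.100.1;
family evpn {
    signaling;
}
neighbor 172.16.100.11;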
Active/Active Forwarding
When using PIM in the control plane for VXLAN, it is really not possible to have a server attach to two different top-of-rack
switches with the ability to forward data over both links (i.e., both links active). When using EVPN signaling in the control
plane, active/active forwarding is entirely possible. EVPN allows VXLAN gateways (Leaf1 at the top of the slide) to use
multiple paths and multiple remote VXLAN gateways to forward data to multihomed hosts. Also, EVPN has mechanisms (like
split horizon) to ensure that broadcast, unknown unicast, and multicast (BUM) traffic does not loop back towards a
multihomed host.
1. Leaf2 receives an Ethernet frame with a source MAC address of HostB and a destination MAC address of
HostC.
2. Based on a MAC table lookup, Leaf2 forwards the Ethernet frame to its destination over the VXLAN tunnel.
Leaf2 also populates its MAC table with HostB's MAC address and associates it with the outgoing interface.
3. Since Leaf2 just learned a new MAC address, it advertises the MAC address to the remote VXLAN gateway,
Leaf1. Leaf1 installs the newly learned MAC address in its MAC table and associates it with an outgoing interface.
Proxy ARP
Although not currently supported, the EVPN RFC mentions that an EVPN Provider Edge (PE) router, Leaf1 in the example, can
perform Proxy ARP. It is possible that if Leaf2 knows the IP-to-MAC binding for HostB (because it was snooping some form of
IP traffic from HostB), it can send a MAC advertisement for HostB that also contains HostB's IP address. Then, when HostA
sends an ARP request for HostB's IP address (a broadcast Ethernet frame), Leaf1 can simply send an ARP reply back to
HostA without ever having to send the broadcast frame over the fabric.
with the same virtual IP address of 10.1.1.254. If the Spine nodes are MX Series routers, they also share the same virtual
MAC address, 00:00:5e:00:01:01 (the same as VRRP, even though VRRP is not used). SpineA and SpineB each send a MAC
Advertisement to LeafC for the same MAC. Now, LeafC can load share traffic from HostC to the default gateway.
EVPN Terminology
The slide highlights the terms used in a network using VXLAN with EVPN signaling.
PE devices: These are the networking devices (Leaf nodes in the diagram) to which servers attach in a data
center. These devices also act as VXLAN Tunnel Endpoints (VTEPs) or VXLAN gateways (can be Layer 2 or Layer
3 gateways).
CE devices: These are the servers, switches, and storage devices that need Layer 2 connectivity with other devices in the data
center.
Site: An EVPN site is a set of CEs that communicate with one another without needing to send Ethernet frames
over the fabric.
EVPN Instance (EVI): An EVPN instance spanning the PE devices participating in that EVPN.
Bridge Domain: A MAC table for a particular VLAN associated with an EVI. There can be many bridge domains
for a given EVI.
EVPN Routes
The slide lists the EVPN routes, their usage, as well as where they are defined. The subsequent slides will discuss most of
these routes in detail.
MAC addresses in the data plane from Ethernet frames received from CEs, CE2 in the example. Once Leaf2 learns CE2's MAC
address, it will automatically advertise it to remote PEs and attach a target community, community Orange in the
example. Leaf1, another EVPN PE, upon receiving the route must decide whether it should keep the route. It makes this
decision based on the received route target community. Leaf1, in order to accept and use this advertisement, must be
configured with an import policy that accepts routes tagged with the Orange target community. Without a configured policy
that matches on the Orange route target, Leaf1 would just discard the advertisement. So, at a minimum, each EVI on each
participating PE for a given EVPN must be configured with an export policy that attaches a unique target community to MAC
advertisements and also configured with an import policy that matches and accepts advertisements based on that unique
target community.
Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number,
followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should
be unique within the provider's network.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte
administration field with the 4-byte assigned number field (Type 0).
RFC 7432 recommends using the Type 1 route distinguisher for EVPN signaling.
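For reference, the two formats might look like this in a configuration (the AS number, IP address, and assigned numbers are hypothetical):
route-distinguisher 64512:100;        /* Type 0: 2-byte AS number, 4-byte assigned number */
route-distinguisher 172.16.100.1:1;   /* Type 1: 4-byte IP address, 2-byte assigned number */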
When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target
matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose
configuration matches the route target.
Because the application of policy determines a VPN's connectivity, you must take extra care when writing and applying VPN
policy to ensure that the tenant's connectivity requirements are faithfully met.
copied into the VRF table as EVPN Type 2 routes. Each of the Type 2 routes associated with locally learned MACs will be
tagged with the community target:1:1. Finally, these tagged routes are then advertised to all remote PEs.
In the next few slides, you will learn the details of the other EVPN route types. You should know that the vrf-target
statement always sets the target community (using hidden VRF import and export policies) of Type 1 routes. By default, the
vrf-target statement also sets the target community of Type 2 and Type 3 routes. Later in this chapter, you will
learn how to set a different target community for Type 2 and Type 3 routes.
that advertises EVPN routes tagged with the target community. The statement also happens to enable the associated import
policy, which will accept routes that are tagged with that target community. So, you must configure the vrf-target
statement to enable export policy at a minimum. To override the import policy instantiated by that statement, you can apply
the vrf-import statement.
In the example, the vrf-target target:1:1 is applied to Leaf1's EVI. When Leaf1 receives the MAC Advertisement
from Leaf2, it runs the route through the configured import policy, which will accept routes tagged with target:1:1. Once
accepted, the route is copied into Leaf1's global RIB-IN table and then copied into the appropriate VRF table (the one
configured with the vrf-target target:1:1 statement). Finally, the route is converted into a MAC entry and stored in
Leaf1's MAC table for the Orange EVI. The outgoing interface associated with the MAC is the VXLAN tunnel that terminates
on Leaf2.
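On a QFX5100 Series switch, the EVI-wide target from the example might be configured under [edit switch-options] along these lines (the route distinguisher and source interface values are hypothetical):
lab@leaf1# show switch-options
vtep-source-interface lo0.0;
route-distinguisher 172.16.100.1:1;
vrf-target target:1:1;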
Ethernet Segment
The set of links that attaches a site to one or more PEs is called an Ethernet segment. In the slide, there are two Ethernet
segments. Site 1 has an Ethernet segment that consists of links A and B. Site 2 has an Ethernet segment that consists of
link C. Each Ethernet segment must be assigned a 10-octet Ethernet Segment Identifier (ESI). There are two reserved ESI
values as shown in the slide. For a single-homed site, like Site 2, the ESI should be set to
0x00:00:00:00:00:00:00:00:00:00. This is the default ESI setting for a server-facing interface on a Juniper Networks EVPN
PE. For any multihomed site, the ESI should be set to a globally unique ESI. In the example, both link A and link B have their
ESI set to 0x01:01:01:01:01:01:01:01:01:01. The commands below show how to set the ESI on the server-facing interface.
lab@leaf1# show
esi {
    01:01:01:01:01:01:01:01:01:01;
    all-active;
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members v100;
...
contains the Single-Active flag. This flag lets the remote PEs know whether or not they can load share traffic over the
multiple links attached to the site. If the Single-Active flag is set to 1, only one link associated with the Ethernet
segment can be used for forwarding. If the Single-Active flag is set to 0, all links associated with the Ethernet
segment can be used for forwarding data (we call this active/active forwarding). Juniper Networks devices only support
active/active forwarding (we always set the flag to 0).
Remote PE Behavior
When a remote PE, Leaf3 in the example, receives the Ethernet Autodiscovery routes from Leaf1 and Leaf2, it knows
that it can use either of the two VXLAN tunnels to forward data to MACs learned from Site 1. Based on the forwarding choice
made by CE1, it may be that Leaf1 was the only PE attached to Site 1 that learned CE1's MAC address. That means that
Leaf3 may have only ever received a MAC Advertisement for CE1's MAC from Leaf1. However, since Leaf1 and Leaf2 are
attached to the same Ethernet segment (as advertised in their Type 1 routes), Leaf3 knows it can get to CE1's MAC through
either Leaf1 or Leaf2. You can see in Leaf3's MAC table that both VXLAN tunnels have been installed as next hops for CE1's
MAC address.
Added Benefit
Another benefit of the Ethernet Autodiscovery route is that it helps enable faster convergence when a link fails.
Normally, when a site-facing link fails, a PE will simply withdraw each of its individual MAC Advertisements. Think about the
case where there are thousands of MACs associated with that link. The PE would have to send thousands of withdrawals. When
the Ethernet Autodiscovery route is being advertised (because the esi statement is configured on the interface), a PE (like
Leaf1 on the slide) can simply send a single withdrawal of its Ethernet Autodiscovery route, and Leaf3 can immediately
update the MAC table for all of the thousands of MACs it had learned from Leaf1. This greatly improves convergence times.
BUM Traffic
When EVPN signaling is used with VXLAN encapsulation, Juniper Networks devices only support ingress replication of BUM
traffic. That is, when BUM traffic arrives on a PE, the PE will unicast copies of the BUM packets to each of the individual PEs
be used and the addressing that should be used to send the BUM traffic. In the diagram, Leaf2 advertises that it is expecting
and using ingress replication and that Leaf1 should use 4.4.4.4 as the destination address of the VXLAN packets that are
In the top diagram, Leaf1 will make copies of the BUM packets and unicast them to each remote PE belonging to the same
EVPN. This will cause CE2 to receive multiple copies of the same packets. This is not good.
In the bottom diagram, Leaf3 receives BUM traffic from the attached CE. It makes copies and unicasts them to the remote
PEs, including Leaf2. Leaf2, under the default forwarding rules, will forward the BUM traffic back to the source site, creating a
loop.
Designated Forwarder
To fix the problems described on the previous slide, all the PEs attached to the same Ethernet segment will elect a
designated forwarder for the Ethernet segment (two or more PEs advertising the same ESI). A designated forwarder will be
elected per broadcast domain. Remember that an EVI can contain one or more broadcast domains or VLANs. The Ethernet
Segment route (Type 4) is used to help with the election of the designated forwarder.
Notice that Leaf2 and Leaf3 will advertise a Type 4 route to every PE belonging to an EVPN. However, notice that the route is not
tagged with a target community. Instead, it is tagged with an ES-Import target community. The ES-Import target community is
automatically generated by the advertising PE and is based on the ESI value. Since Leaf1 does not have an import policy
that matches on the ES-Import target, it will drop the Type 4 routes. However, since Leaf2 and Leaf3 are configured with the same
ESI, the routes are accepted by a hidden policy that matches on the ES-Import target community, which is only known by the
PEs attached to the same Ethernet segment. Now Leaf2 and Leaf3 use the Originator IP address in the Type 4 route to build
a table that associates an Originator IP address (i.e., the elected designated forwarder) with a VLAN in a round-robin fashion.
After the election, if a non-designated forwarder for a VLAN receives BUM traffic from a remote PE, it will simply drop those
packets.
lab@spine1# show
irb {
    unit 0 {
        family inet {
            address 10.1.1.10/24 {
                virtual-gateway-address 10.1.1.254;
            }
        }
    }
}
If both Spine1 and Spine2 are configured in this manner using the same virtual gateway address, both devices will not only
share the same virtual IP address but they will also share a virtual MAC address, 00:00:5e:00:01:01. The Spine nodes will each
advertise that MAC address to the other PEs. Now the remote PEs will be able to load share traffic over the multiple paths to
the same virtual MAC address.
Underlay Topology
The slide shows the IP Fabric that will serve as the underlay network. It is based on EBGP, with each router being in its own
autonomous system. Each router will advertise its loopback address, which will also serve as the VTEP address.
Overlay Topology
The slide shows the overlay topology. Each Leaf will act as a VXLAN Layer 2 Gateway. Each Spine will act as a distributed
VXLAN Layer 3 Gateway and provide routing into and out of the 10.1.1/24 subnet. Host A will be dual-homed using a LAG to
two Leaf nodes. The control plane for VXLAN will be EVPN using MP-IBGP. In the IBGP topology, the Spine nodes will act as
route reflectors.
Logical View
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of
VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 are in the same broadcast
domain as well as the same IP subnet. Also, a matching virtual IP address and a matching virtual MAC address will be assigned to
each Spine node's IRB interface, which will provide a redundant, distributed default gateway to the two hosts.
Common Configuration
The slide shows the common configuration for all routers. Notice that a load-balancing policy has been applied to the
forwarding table that will allow for multiple next hops to be installed in the forwarding table. Also, there is a policy called
direct that will be applied to the EBGP neighbors. The main purpose of this policy is to advertise each router's loopback
interface (VTEP source interface) to all routers in the fabric. Lastly, in order for each router to run BGP, the autonomous
system must be set under [edit routing-options]. Looking at the example topology, you should notice that each
router will belong to two autonomous systems: one AS in the underlay and one AS in the overlay. If
you plan to use the automatic route target function (described in subsequent slides), you should set the AS under [edit
routing-options] to the overlay network's AS number.
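The common configuration described above might be sketched as follows (the policy name is hypothetical; 64520 is used here as the overlay AS number):
lab@leaf1# show policy-options policy-statement load-balance
term 1 {
    then {
        load-balance per-packet;
    }
}
lab@leaf1# show routing-options
forwarding-table {
    export load-balance;
}
autonomous-system 64520;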
Underlay Configuration: Each router is peering with the others using EBGP. The export statement allows all
directly connected networks to be advertised to BGP neighbors. The local-as statement overrides the setting
under routing-options just for the neighbors in this group. The multipath multiple-as statement allows
multiple routes from multiple ASes to be used as active routes in the routing table.
Overlay Configuration: Each Spine node is acting as a route reflector running IBGP with its clients. The
cluster statement causes the local router to act as a route reflector for the neighbors in this group. The
family evpn signaling statement sets the AFI and SAFI for the IBGP sessions. The local-as
configuration is probably unnecessary since the same AS is configured under routing-options. The
multipath statement allows multiple similar received BGP routes to be active in the routing table.
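Putting the two bullets together, a Spine node's BGP configuration might look like the following sketch (all addresses and AS numbers are hypothetical):
lab@spine1# show protocols bgp
group underlay {
    type external;
    export direct;
    local-as 64512;
    multipath multiple-as;
    neighbor 10.0.1.1 {
        peer-as 64514;
    }
}
group overlay {
    type internal;
    local-address 172.16.100.1;
    family evpn {
        signaling;
    }
    cluster 172.16.100.1;
    multipath;
    neighbor 172.16.100.3;
}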
Underlay Routing
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback
interface will be used on Juniper Networks routers as the VTEP interface. Therefore, you must make sure that the loopback
from the route reflectors, you can see the RIB-IN that will be used for both sessions is bgp.evpn.0.
configuration. Since Leaf1 and Leaf2 have interfaces that belong to the same Ethernet segment, both Leaf1 and Leaf2
should have their et-0/0/50's ESI value set to the same value. When you assign an ESI value, you need to make sure that it
route-distinguisher, and the vrf-target statement. It is under [edit protocols evpn] that you will set the
encapsulation of VXLAN, the multicast mode, and the list of VNIs that will receive the benefit of EVPN signaling.
The slide mentions the vrf-target statement and its behavior in the exporting of EVPN routes. It literally creates a hidden
export policy that advertises all locally generated Type 1, Type 2, and Type 3 routes to remote PE routers after tagging the
routes with the specified target community. Also, the vrf-target statement creates a hidden import policy that accepts
any received EVPN routes that are tagged with the specified target community. We will discuss how to modify the routers
bridge domain and EVPN configuration occurs in the context of a virtual switch, tenant1_vs. Virtual switch configuration is
required on an MX Series router when using EVPN signaling. Everything under tenant1_vs enables the MX Series to be a
VXLAN Layer 2 Gateway, similar to the QFX5100 Series configuration on the previous slide, except for the
routing-interface irb.0 statement. Notice the IRB interface has been given a real IP address of 10.1.1.10. It has
also been assigned a virtual-gateway-address of 10.1.1.254, which is the default gateway for the 10.1.1/24 subnet.
It may not be obvious, but the virtual-gateway-address statement also binds a virtual MAC address of
00:00:5e:00:01:01 to the 10.1.1.254 address on the Spine1 router. The virtual-gateway-address statement does everything
mentioned above as well as cause the Spine1 router to send a MAC Advertisement route to all remote PEs advertising the
virtual MAC address. Since Spine1 and Spine2 are configured with the same virtual-gateway-address (and virtual
MAC), the remote PEs can load share traffic towards the virtual MAC address (i.e., whenever a host needs to send data to the
default gateway). One last thing to mention is that, by default, the subnet associated with the IRB interface is installed in the
inet.0 table. The slide shows that the IRB interface has been associated with the tenant1_vr routing instance. That
means that any packet arriving on the IRB interface will be routed based on the tenant1_vr.inet.0 routing table.
VRF tables. The slide shows that Leaf2 only needs to receive MAC Advertisements for VNI 1000. However, since each Leaf
node is only configured with the vrf-target statement, Leaf2 will receive and accept MAC Advertisement routes for VNI
2000 as well. Even though Leaf2 does not have a MAC table for VLAN 200, it will still install all the MAC Advertisement routes
in its RIB-IN table as well as its VRF table. This can be a major waste of memory on Leaf2, depending on how many MACs
have been advertised for VNI 2000. The next few slides will show you how to get control over which routes are accepted by
the PE routers.
That leaves us with the Type 2 MAC Advertisement route and the Type 3 Inclusive Multicast Ethernet Tag route. As you know,
both of these routes carry the VNI value. That means that these types of routes are VNI-specific (Type 1 and Type 4 routes are
Ethernet segment-specific). It is possible to set VNI-specific import and export policy using VNI-specific target communities.
The slide shows how to configure the VNI-specific vrf-target export statement under the vni-options hierarchy.
Although the vrf-target export statements apply a hidden export policy that advertises and tags the Type 2 and Type
3 routes for the related VNI using the configured target community, the commands do not apply any import policies. So, after
applying the vrf-target export statements, you must also configure and apply a vrf-import policy that accepts the
new target communities as well as the original target community for the EVI (target:64520:1 in the example). Your
import policy will override the hidden import policy that was created by the original vrf-target statement (vrf-target
target:64520:1 in the example).
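The per-VNI export targets described above might be configured as follows (the VNI values and target communities are hypothetical); remember that a matching vrf-import policy must still be written separately:
lab@leaf1# show protocols evpn
encapsulation vxlan;
extended-vni-list [ 1000 2000 ];
vni-options {
    vni 1000 {
        vrf-target export target:64520:1000;
    }
    vni 2000 {
        vrf-target export target:64520:2000;
    }
}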
you how you can have your router automatically assign route targets to each configured VNI by configuring the auto
statement. This statement will also cause your router to automatically enable hidden VRF import and export policies to
advertise and accept received routes tagged with the automatically generated target communities. You should note that the
automatically generated VRF import policies that are created as a result of the auto statement will override the import
policy that gets instantiated with the vrf-target target:64520:1 statement on the slide (which is used for the Type 1
advertisements). So, you must configure and apply an import policy that will accept the Type 1 routes.
In order for the auto statement to work nicely between PEs (so they calculate the same target communities), every PE router
must be configured with the auto statement. Also, each PE router must be configured for the same autonomous system under
the [edit routing-options] hierarchy, since the automatically generated target communities are based on that AS
value.
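A sketch of the automatic route target configuration follows (the values and policy name are hypothetical; note the vrf-import policy that still accepts the Type 1 target):
lab@leaf1# show switch-options
vtep-source-interface lo0.0;
route-distinguisher 172.16.100.1:1;
vrf-import EVPN-IMPORT;
vrf-target {
    target:64520:1;
    auto;
}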
automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN tagged Ethernet frame. The
slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN
BGP Status
Use the show bgp summary command to determine the status and routing tables used with your router's BGP neighbors.
EVPN RIB-IN
The slide shows how to view all the routes (for all EVPN instances) that have been accepted by VRF import policies.
VRF Table
The slide shows you how to view the routes for a particular EVPN instance.
VTEP Interfaces
Prior to learning any remote neighbors, a VXLAN Gateway will create a single logical VTEP interface, vtep.32768 on the
slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to
verify two things: that the local device is configured as a VXLAN Gateway, and the source IP address it uses for VXLAN
packets. For each remote VTEP learned, a gateway will instantiate another logical VTEP interface, vtep.32769 on the slide.
These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. These
interfaces are actually used for forwarding, as you can tell from the input and output packet counts.
MAC Table
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and
associated interfaces that have been learned by the gateway.
We Discussed:
The benefits of using EVPN signaling for VXLAN;
Review Questions
1.
2.
3.
1.
EVPN allows CE devices to multihome to more than one Leaf node such that all interfaces are actively forwarding data. EVPN
signaling minimizes unknown unicast flooding since PE routers advertise locally learned MACs to all remote PEs.
2.
An Ethernet Segment route is tagged with the ES-Import Route Target community.
3.
Because configuring the auto statement overrides the hidden import policies of the vrf-target statement, you must configure
and apply a VRF import policy that accepts the target community that is assigned to the Type 1 routes.
Chapter 6: Data Center Interconnect
Advanced Data Center Switching
We Will Discuss:
The meaning of the term Data Center Interconnect;
DCI Overview
The slide lists the topics we will discuss. We discuss the highlighted topic first.
center to another.
Many of the DCI communication options rely on an MPLS network to transport frames between data centers. Although in
most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several
advantages to using an MPLS network including availability, cost, fast failover, traffic engineering, and scalable VPN options.
Interconnect Network
Between two data centers that need to be interconnected is a network of some type. A typical interconnect network could be
a point-to-point line, an IP network, or an MPLS network. The slide shows that these networks can be owned by the customer
(the owners of the data center) or by a service provider. All the DCI options that we discuss in this chapter will work over both a
customer-owned and a service provider-owned interconnect network. The main difference is how much control a customer has
over the DCI. Sometimes it is just easier and more cost effective to let the service provider manage the DCI.
Point-to-Point DCI
In general, if there is great distance between data centers, a point-to-point interconnect can be pretty expensive. However, if
the data centers are just down the street from one another, it might make sense to have a point-to-point interconnect. This
type of interconnect is usually provided as dark fiber between the data centers. The customer simply attaches equipment to
the fiber and has the choice of running any type of protocol they wish over the interconnect.
IP DCI
It is possible to provide a DCI over an IP network. If the DCI is meant to provide Layer 2 stretch (extending of VLANs) between
the data centers, then the Ethernet frames will need to be encapsulated in IP as they traverse the DCI. VXLAN and GRE are
some of the typical IP encapsulations that provide the Layer 2 stretch. If the DCI is to provide Layer 3 reachability between
data centers, then an IP network is well suited to meet those needs. However, sometimes the DCI network may only support
globally routable IP addressing while the data centers use RFC 1918 addressing. When that is the case, it might make
sense to create a Layer 3 VPN between the two data centers, using GRE, IPsec, or RFC 4364 (MPLS Layer 3 VPN over GRE).
MPLS DCI
The slide shows the encapsulation boundary of an MPLS transport network. The boundaries are different depending on who
owns the MPLS network. If the customer owns the MPLS network, then MPLS can be used for encapsulation from end to end.
If the service provider owns the MPLS network, then the encapsulation between the DC and the MPLS network depends entirely
on what is allowed by the service provider. If the service provider is providing a Layer 2 VPN service, then the customer should
expect that any Ethernet frames sent from one data center will appear unchanged as they arrive at the remote data center. If
the service provider is providing a Layer 3 VPN service, then the customer should expect that any IP packets sent from one
data center will appear unchanged as they arrive at the remote data center. In some cases, the service provider will allow a
customer to establish data center-to-data center MPLS label switched paths (LSPs).
MPLS Advantages
Many of the DCI technologies that we will discuss depend on an MPLS network to transport frames between data centers. Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several advantages to using an MPLS network:
1. Fast failover between MPLS nodes: Fast reroute and Node/Link protection are two features of an MPLS network
that allow for 50ms or better recovery time in the event of a link failure or node failure along the path of an
MPLS label switched path (LSP).
2. Scalable VPNs: VPLS, EVPN, L3 MPLS VPNs are DCI technologies that use MPLS to transport frames between
data centers. These same technologies allow for the interconnection of many sites (potentially hundreds)
without the need for the manual setup of a full mesh of tunnels between those sites. In most cases, adding a new site only requires the administrator to configure the devices at the new site. The remote sites do not need to be touched.
3. Traffic engineering: MPLS allows the administrator to decide the path traffic takes over the MPLS network. Traffic no longer has to take the same path calculated by the IGP (i.e., all data taking the same path between sites). You can literally direct different traffic types to take different paths over the MPLS network.
4. Any-to-any connectivity: When using an MPLS backbone to provide the DCI, you have the flexibility to provide any type of MPLS-based Layer 2 DCI, Layer 3 DCI, or any combination of the two. An MPLS backbone is a network that can generally support most types of MPLS or IP-based connectivity at the same time.
egress router. Duplex traffic requires two LSPs; that is, one path to carry traffic in each direction. An LSP is created by the concatenation of one or more label-switched hops that direct packets between label-switching routers (LSRs) to transit the MPLS domain.
When an IP packet enters a label-switched path, the ingress router examines the packet and assigns it a label based on its destination, placing a 32-bit (4-byte) label in front of the packet's header immediately after the Layer 2 encapsulation. The label transforms the packet from one that is forwarded based on IP addressing to one that is forwarded based on the fixed-length label. The slide shows an example of a labeled IP packet. Note that MPLS can be used to label non-IP traffic, such as in the case of a Layer 2 VPN.
MPLS labels can be assigned per interface or per router. The Junos operating system currently assigns MPLS label values on
a per-router basis. Thus, a label value of 10234 can only be assigned once by a given Juniper Networks router.
At egress, the IP packet is restored when the MPLS label is removed as part of a pop operation. The now unlabeled packet is routed based on a longest-match IP address lookup. In most cases, the penultimate (or second-to-last) router pops the label stack, in a process called penultimate-hop popping. In some cases, a labeled packet is delivered to the ultimate router (the egress LSR), where the stack is popped and the packet is forwarded using conventional IP routing.
20-bit label: Identifies the packet as belonging to a particular LSP. This value changes as the packet flows on
the LSP from LSR to LSR.
Traffic Class (TC): Formerly called EXP (experimental), these three bits can be used to convey class-of-service information, specifically the forwarding class a given packet belongs to. The 3-bit width of this field allows a total of eight possible markings, each of them potentially linked to a different forwarding behavior, for example a different queuing priority and a different buffer size.
Bottom-of-stack bit: Many MPLS applications require a packet to be tagged with several labels, one stacked on top of the other. The bottom-of-stack bit of an MPLS header is set to 1 if the header is at the bottom of the label stack, with the payload immediately below it. The bit is set to 0 instead if another MPLS header (i.e., another label) lies below.
VPNs are one example of an application that requires label stacking. Here the outer label, or transport label, indicates which label-switching router traffic should be delivered to. The inner label, called the service label, describes how the payload should be treated once it reaches its destination label-switching router.
Time to live (TTL): As with the equivalent IP field, the TTL limits the number of hops an MPLS packet can travel. It is decremented at each hop, and if its value drops to zero, the packet is discarded. When using MPLS for IP traffic engineering, the default behavior is to copy the value of the IP TTL field into the MPLS TTL field. This allows diagnostic tools like traceroute to continue working even when packets are encapsulated within MPLS.
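The 32-bit header layout described above can be sketched in Python. This is an illustration of the bit packing, not production code:

```python
import struct

def pack_mpls_header(label, tc, bottom_of_stack, ttl):
    """Pack the four MPLS header fields into a 4-byte header.

    Layout (RFC 3032): 20-bit label | 3-bit TC | 1-bit S | 8-bit TTL.
    """
    assert 0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)

def unpack_mpls_header(data):
    """Reverse operation: split a 4-byte header back into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bottom_of_stack": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }

# A single-label stack: S bit set, TTL copied from the IP header
# (the default Junos behavior for IP traffic engineering, as noted above).
hdr = pack_mpls_header(label=10234, tc=0, bottom_of_stack=True, ttl=64)
print(unpack_mpls_header(hdr))
```

Unpacking the packed header recovers the original field values, which is a quick way to convince yourself the shifts match the field widths.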
An important point is that labels are locally significant: traffic following the path will typically be tagged with a different label at each hop.
A second important point is that labels are generally global to the router, and not tied to the incoming interface; a packet tagged with a given label will be subject to the same forwarding treatment regardless of the interface it has been received on. This apparently minor point plays a major role in MPLS traffic protection: a set of MPLS features that try to minimize packet loss during a link or node failure.
There are only a very few exceptions to this rule, mostly to do with specific (and very advanced) MPLS applications. One example is carrier-of-carriers, where an MPLS-enabled service provider offers an MPLS transport service to other service providers.
A value of 0 represents the IP version 4 (IPv4) explicit null label. This label indicates that the label must be
popped, and the forwarding of the packet must then be based on what is below it, either another label or the
payload.
A value of 1 represents the router alert label. This label value is legal anywhere in the label stack except at the
bottom. When a received packet contains this label value at the top of the label stack, it is delivered to a local
software module for processing. The label beneath it in the stack determines the actual forwarding of the
packet. However, if the packet is forwarded further, the router alert label should be pushed back onto the label
stack before forwarding. The use of this label is analogous to the use of the router alert option in IP packets.
A value of 2 represents the IP version 6 (IPv6) explicit null label. This label value is legal only when it is the sole label stack entry. It indicates that the label stack must be popped, and the forwarding of the packet then must be based on the IPv6 payload. A value of 3 represents the implicit null label. Although this value can never appear in the encapsulation, it can be specified by a label signaling protocol.
Continued on the next page.
The following list is a continuation of reserved Labels 0 through 15 (RFC 3032, MPLS Label Stack Encoding).
A value of 7 is used for the Entropy Label Indicator (ELI). After determining a load balancing methodology, the ELI
allows the ingress LSR to notify the downstream LSRs of the chosen load balancing methodology.
A value of 13 is used for Generic Associated Channel Label (GAL). This label informs an LSR that a received LSP
belongs to a Virtual Circuit Connectivity Verification (VCCV) control channel.
A value of 14 is used as the OAM Alert Label. This label indicates that a packet is an MPLS OAM packet as
described in ITU-T Recommendation Y.1711.
Values 4-6, 8-12, and 15 are reserved for future use.
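The special-purpose label values listed above can be collected into a small lookup table. A minimal sketch, covering only the assignments named in this section:

```python
# Special-purpose MPLS label values 0-15, as described above (RFC 3032 and
# later assignments). Values absent from the table are reserved for future use.
RESERVED_LABELS = {
    0: "IPv4 explicit null",
    1: "Router alert",
    2: "IPv6 explicit null",
    3: "Implicit null (signaling only)",
    7: "Entropy Label Indicator (ELI)",
    13: "Generic Associated Channel Label (GAL)",
    14: "OAM Alert Label (Y.1711)",
}

def describe_label(value):
    """Return a description for reserved labels, or note an ordinary label."""
    if value < 16:
        return RESERVED_LABELS.get(value, "reserved for future use")
    return "ordinary label"

print(describe_label(0))       # IPv4 explicit null
print(describe_label(5))       # reserved for future use
print(describe_label(300576))  # ordinary label
```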
Label 0 (Explicit null): This label is always assigned an action of decapsulate (pop); the label-switching router will just remove the MPLS header and take a forwarding action based on what is below it (either another label, or the actual LSP payload).
Label 3 (Implicit Null): This is a special label value that is never actually found in MPLS frames, but only within MPLS signaling protocols. It is used by the egress router (i.e., the last hop in a label-switched path) to request that the previous router remove the MPLS header. This behavior, referred to as penultimate-hop popping, is the Junos OS default.
Label-Switching Router
The original definition of a label-switching router is a router that makes forwarding decisions based only on the content of the MPLS header. In other words, a label-switching router always operates in label-switching mode. We will use a slightly less restrictive definition that also includes ingress and egress routers, sometimes referred to as label edge routers. Traffic at the ingress or at the egress of a label-switched path is typically not encapsulated in MPLS, so label switching is not possible, and a forwarding decision needs to be made according to other rules.
We will use the term label-switching router (LSR) to mean any router that participates in MPLS forwarding, including both the ingress and the egress nodes. For brevity, in the rest of the course we will also use the term router as a synonym for label-switching router.
Push: add an MPLS header to encapsulate a non-MPLS packet and allow it to be forwarded by label switching within the MPLS domain.
Pop: remove an MPLS header from an MPLS-encapsulated packet.
This is often done either at the end of an LSP or, as we will see shortly, by the second-to-last router (the penultimate hop).
Swap: replace the label value of an MPLS packet with another value.
This operation is typically performed by transit label-switching routers as a packet traverses a label-switched path.
After performing one of these MPLS basic operations, the packet is generally forwarded to the next-hop router.
In some cases the forwarding treatment can be more complex, involving different combinations of the three basic operations. For some types of services, for example VPNs, it is common to see a double-push forwarding action, while in some traffic protection scenarios, when building a local detour to avoid a link failure, a transit router may have to perform a swap-push operation.
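The three basic operations can be modeled on a simple list that represents the label stack (top of stack first). A sketch for illustration; the label values are made up:

```python
def push(stack, label):
    """Push: add an MPLS label on top of the stack (ingress, or double-push for VPNs)."""
    return [label] + stack

def pop(stack):
    """Pop: remove the top label (egress, or the penultimate hop under PHP)."""
    return stack[1:]

def swap(stack, new_label):
    """Swap: replace the top label with the one expected by the next hop."""
    return [new_label] + stack[1:]

# A VPN-style double push at ingress: service label first, then transport label.
stack = push(push([], 1000), 300576)   # [300576, 1000]
stack = swap(stack, 300864)            # transit LSR swaps the outer label
stack = pop(stack)                     # penultimate hop pops the transport label
print(stack)                           # only the service label remains
```

After the penultimate pop, only the service label is left for the egress router to act on, which previews the VPN forwarding walkthrough later in this chapter.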
Label-Switched Path
A label-switched path (LSP) is a unidirectional path through the network defined in terms of label switching operations (push, pop, swap). You can think of an LSP as a tunnel: any packet that enters it is delivered to its endpoint, no matter what type of payload it contains.
Establishing a label-switched path across an MPLS domain means determining the actual labels and label operations performed by the label-switching routers on the path. This can be done with manual configuration, or by some type of dynamic label distribution protocol.
Often a label-switched path will reside within a single MPLS domain, for example within a single service provider. However, the development of advanced BGP-based MPLS signaling allows the creation of label-switched paths that span multiple domains.
The ingress router is not a pure label-switching router: the initial decision of which traffic to forward down which LSP is made not according to the content of labels (which are not present yet), but according to other criteria, e.g., a route lookup in the case of MPLS IP traffic engineering, or even the incoming interface, in the case of point-to-point transport of Layer 2 frames over MPLS (Layer 2 circuits, circuit cross-connect).
Very often transit LSRs will perform a swap operation, replacing the incoming label with the one expected by the next-hop of
the label-switched path. Transit LSRs are typically not aware of the content of the MPLS traffic they are forwarding, and do
not know if the payload is IP, IPv6, layer-2 frames or anything else.
Although the label information base can be populated with static entries, it is generally populated by a dynamic label distribution protocol.
Penultimate-Hop Popping
Often the MPLS header is removed by the second-to-last (the penultimate) router in an LSP. This removal is an optimization that helps in several cases, including using MPLS for IP traffic engineering. Removing the label at the penultimate hop facilitates the work of the last-hop (egress) router, which, instead of having to both remove the MPLS header and then make an IP routing decision, only needs to do the latter.
Penultimate-hop popping (PHP) is the default behavior on Juniper routers; however, it can be disabled in the configuration.
Some applications require PHP to be disabled, but that is often done automatically: the Junos OS is smart enough to detect
the need to signal the LSP so that PHP is disabled.
In the case of MPLS IP traffic engineering, the egress router will be delivered ordinary IP packets due to penultimate-hop popping, and will make a forwarding decision based on ordinary IP routing.
As soon as you enable MPLS processing, four default entries are automatically created: they are for label 0 (explicit null), label 1 (router alert), label 2 (IPv6 explicit null), and label 13 (Generic Associated Channel Label, used for Operations and Maintenance).
Each label is associated with a forwarding action, typically composed of an MPLS label operation (push, pop, swap, or a combination of these) and a next hop. In this example, label 300576 has been installed by a dynamic protocol called LDP, while the remaining label, 1004792, has been configured statically.
Note that there are two entries for this last label. This is because, in some cases, a label-switching router may have to take different forwarding actions according to whether or not the label is at the bottom of the label stack. In this case, the forwarding actions turn out to be the same: pop the MPLS header and send the content to 172.17.23.1 via interface ge-1/1/5.0. The IP address of the next hop must of course be directly connected: it is only used to derive which MAC address to use for Layer 2 encapsulation.
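The label information base just described can be sketched as a table keyed by (label, bottom-of-stack), since the action may differ depending on the S bit. Label 1004792 matches the example above (pop, next hop 172.17.23.1 via ge-1/1/5.0); the action and next hop shown for label 300576 are assumptions for illustration:

```python
# A sketch of a label information base (LIB). Entries are keyed by
# (label, bottom_of_stack) because the forwarding action may differ
# depending on whether the label is at the bottom of the stack.
# The swap action for 300576 is an assumed value; 1004792 follows the slide.
LIB = {
    (300576, False): ("swap", 301200, "172.17.12.1 via ge-1/0/0.0"),
    (300576, True):  ("swap", 301200, "172.17.12.1 via ge-1/0/0.0"),
    (1004792, False): ("pop", None, "172.17.23.1 via ge-1/1/5.0"),
    (1004792, True):  ("pop", None, "172.17.23.1 via ge-1/1/5.0"),
}

def forward(label, bottom_of_stack):
    """Look up the forwarding action for an incoming labeled packet."""
    return LIB[(label, bottom_of_stack)]

action, new_label, next_hop = forward(1004792, True)
print(action, next_hop)
```

As in the slide, the two entries for 1004792 happen to carry the same action, but the data structure allows them to differ.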
A label distribution protocol is the mechanism by which one LSR informs a peer LSR of the meaning of the labels used to forward traffic between them. MPLS uses this information to create the forwarding tables in each LSR.
Label distribution protocols are often referred to as signaling protocols. However, label distribution is a more accurate
description of their function and is preferred in this course.
The label distribution protocols create and maintain an LSP dynamically with little or no user intervention. Once the label
distribution protocols are configured for the signaling of an LSP, the egress router of an LSP will send label (and other)
information in the upstream direction towards the ingress router based on the configured options.
RSVP
The Junos OS uses RSVP as the label distribution protocol for traffic engineered LSPs.
RSVP was designed to be the resource reservation protocol of the Internet and to "provide a general facility for creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths" (RFC 2205). Reservations are an important part of traffic engineering, so it made sense to continue to use RSVP for this purpose rather than reinventing the wheel.
RSVP was explicitly designed to support extensibility mechanisms by allowing it to carry what are called opaque
objects. Opaque objects make no real sense to RSVP itself but are carried with the understanding that some
adjunct protocol (such as MPLS) might find the information in these objects useful. This encourages RSVP
extensions that create and maintain distributed state for information other than pure resource reservation. The
designers believed that extensions could be developed easily to add support for explicit routes and label
distribution.
Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An
RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining
the contents of each message.
With the proper extensions, RSVP provides a tool that consolidates the procedures for a number of critical
signaling tasks into a single message exchange:
Extended RSVP can establish an LSP along an explicit path that would not have been chosen by the
interior gateway protocol (IGP);
Extended RSVP can distribute label-binding information to LSRs in the LSP;
Extended RSVP can reserve network resources in routers comprising the LSP (the traditional role of
RSVP); and
Extended RSVP permits an LSP to be established to carry best-effort traffic without making a specific
resource reservation.
Thus, RSVP provides MPLS-signaled LSPs with a method of support for explicit routes ("go here, then here, finally here"), path numbering through label assignment, and route recording (where the LSP actually goes from ingress to egress, which is very handy information to have).
RSVP also gives MPLS LSPs a keepalive mechanism to use for visibility ("this LSP is still here and available") and redundancy ("this LSP appears dead; is there a secondary path configured?").
LDP
LDP associates a set of destinations (prefixes) with each data link layer LSP. This set of destinations is called the FEC. These destinations all share a common LSP egress and a common unicast routing path. LDP supports topology-driven MPLS networks in best-effort, hop-by-hop implementations. The LDP signaling protocol always establishes LSPs that follow the contours of the IGP's shortest path. Traffic engineering is not possible with LDP.
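LDP's topology-driven behavior can be sketched as follows: each downstream router advertises a label binding for a FEC, and the resulting LSP simply follows the IGP shortest path. Router names and label values here are made up for illustration:

```python
# A toy three-router chain: R1 -> R2 -> R3, where R3 is the egress for the FEC.
igp_next_hop = {"R1": "R2", "R2": "R3", "R3": None}

# Labels advertised upstream by each router for this FEC. R3 advertises
# implicit null (3), requesting penultimate-hop popping from R2.
label_binding = {"R2": 299776, "R3": 3}

def ldp_lsp(ingress):
    """Walk the IGP shortest path, collecting the label used at each hop."""
    path, router = [], ingress
    while igp_next_hop[router] is not None:
        nh = igp_next_hop[router]
        path.append((router, nh, label_binding[nh]))
        router = nh
    return path

# R1 pushes 299776 toward R2; R2, seeing implicit null from R3, pops (PHP).
print(ldp_lsp("R1"))
```

Because the path is derived purely from the IGP next hops, there is no way to steer the LSP elsewhere, which is exactly why LDP cannot do traffic engineering.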
Layer 2 Options
Three classifications exist for Layer 2 DCIs:
1. No MAC learning by the Provider Edge (PE) device: This type of Layer 2 DCI does not require that the PE devices learn MAC addresses.
2. Data plane MAC learning by the PE device: This type of DCI requires that the PE device learns the MAC
addresses of both the local data center as well as the remote data centers.
3. Control plane MAC learning: This type of DCI requires that a local PE learn the local MAC addresses using the control plane and then distribute these learned MAC addresses to the remote PEs.
Layer 3 Options
A Layer 3 DCI uses routing to interconnect data centers. Each data center must maintain a unique IP address space. A
Layer 3 DCI can be established using just about any IP capable link. Another important consideration for DCIs is
incorporating some level of redundancy by using link aggregation groups (LAGs), IGPs using equal-cost multipath, and BGP or MP-BGP using the multipath or multihop features.
The PE routers store customer routes in VPN routing and forwarding (VRF) tables. In a Layer 3 VPN scenario, the PE and CE routers function as routing peers (RIP, OSPF, BGP, etc.), with the PE router terminating the routing exchange between customer sites and the IP/MPLS core. In a Layer 2 VPN scenario, the PE's CE-facing interface is configured with VLAN tagging matching the CE's PE-facing interface, and any frames received from the CE device will be forwarded over the MPLS backbone to the remote site.
Information is exchanged between PE routers using either MP-BGP or LDP. This information exchange allows the PE routers to
map data to and from the appropriate MPLS LSPs traversing the IP/MPLS core.
PE routers, as ingress and egress LSRs, use MPLS LSPs when forwarding customer VPN traffic between sites. LSP tunnels in the interconnect network separate VPN traffic in the same fashion as PVCs in a legacy ATM or Frame Relay network.
Provider Routers
Provider (P) routers are located in the IP/MPLS core. These routers do not carry VPN data center routes, nor do they participate in the VPN control and signaling planes. This is a key aspect of the RFC 4364 scalability model; only PE devices are aware of VPN routes, and no single PE router must hold all VPN state information.
P routers are involved in the VPN forwarding plane where they act as label-switching routers (LSRs) performing label
swapping (and popping) operations.
VPN Site
A VPN site is a collection of devices that can communicate with each other without the need to transit the IP/MPLS backbone (i.e., a single data center). A site can range from a single location with one switch or router to a network consisting of many devices.
route to CE2's address should list three things in terms of next hop: the outgoing interface and the inner and outer labels that should be pushed onto the IP packet. The outer label is swapped by the P routers along the way to deliver the MPLS packet to PE2. P3 performs a penultimate-hop pop, leaving only single-labeled packets, and forwards them to PE2. PE2 receives the labeled packets and uses the inner label to determine which VRF table to use (PE2 might have many VRF tables). PE2 pops the inner label, performs a lookup on the Green VRF table (because label 1000 = Green VRF), and forwards the original IP packets to CE2.
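The label handling in this walkthrough can be sketched step by step on a list that models the label stack. The inner label 1000 maps to the Green VRF as in the example; the outer transport label values are assumptions for illustration:

```python
# A sketch of the Green VRF example above (top of stack is the first element).
packet = {"payload": "IP packet for CE2", "stack": []}

# PE1: two-label push (inner service label 1000, then the outer transport label).
packet["stack"] = [100032, 1000]

# P routers swap the outer label along the LSP (assumed values).
packet["stack"][0] = 100352

# P3: penultimate-hop pop leaves only the service label.
packet["stack"] = packet["stack"][1:]

# PE2: the remaining label selects the VRF, then the label is popped
# and the original IP packet is routed toward CE2.
vrf_by_label = {1000: "Green"}
vrf = vrf_by_label[packet["stack"][0]]
packet["stack"] = packet["stack"][1:]
print(vrf, packet["stack"])
```

The key observation is that the outer label only ever matters to the P routers, while the inner label is untouched until it reaches PE2.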
ON
format to PE3 where they enter the red VPN. From PE2s perspective, PE3 is the CE for the green VPN. From PE3s
perspective, PE2 is the CE for the red VPN. You might think that you need 2 physical devices, PE2 and PE3 to stitch the two
US
VPNs together. Well, as the bottom diagram shows, you can actually stitch two VPNs together using a single MX Series
router. You can use the logical tunnel interface feature which are internal interfaces that allow you to connect two virtual
routers together. The two virtual routers enabled on the MX Series device would simply perform the same functions as PE2
and PE3 in the top diagram.
ON
The LSP
The next few slides are going to discuss the details of MPLS Layer 3 VPNs. One thing to remember with Juniper Networks
routers is that once an LSP is established (from PE1 to PE2 in the diagram) the ingress PE will install a host route (/32) to the
E
loopback interface of the egress router in the inet.3 with a next-hop of the LSP (i.e. outbound interface of LSP and push a
label). This default behavior means that not all traffic entering PE1 can get routed through the LSP. So what traffic gets
VPN-IPv4 Route
The VPN-IPv4 route has a very simple purpose, which is to advertise IP routes. PE2 installs locally learned routes in its VRF table. That includes the directly connected PE-CE interface as well as any routes PE2 learns from CE2 (RIP, OSPF, BGP, etc.). Once PE2 has locally learned routes in its VRF table, it advertises them (based on configured policy) to remote PEs and attaches a target community, target community Orange in the example. PE1, upon receiving the route, must decide whether it should keep the route. It makes this decision based on resolving the BGP next hop in inet.3 as well as looking at the received route target community. PE1, in order to accept and use this advertisement, must be configured with an import policy that accepts routes tagged with the Orange target community. Without a configured policy that matches on the Orange route target, PE1 would just discard the advertisement. So, at a minimum, each VRF on each participating PE for a given VPN must be configured with an export policy that attaches a unique target community to routes and also configured with an import policy that matches and accepts advertisements based on that unique target community.
Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number, followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should be unique to each VPN.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte administration field with the 4-byte assigned number field (Type 0).
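The two route distinguisher formats can be sketched as 8-byte encodings: Type 0 uses a 2-byte AS number plus a 4-byte assigned number, and Type 1 uses a 4-byte IPv4 address plus a 2-byte assigned number (both preceded by a 2-byte type field). The example values are assumptions for illustration:

```python
import struct

def rd_type0(asn, assigned):
    """Type 0 RD: 2-byte type field (0), 2-byte AS number, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, assigned)

def rd_type1(ipv4, assigned):
    """Type 1 RD: 2-byte type field (1), 4-byte IPv4 address, 2-byte assigned number."""
    octets = (int(o) for o in ipv4.split("."))
    return struct.pack("!H4BH", 1, *octets, assigned)

# Both formats produce the 8-byte value that prefixes the IPv4 address
# in a VPN-IPv4 NLRI.
print(rd_type0(65512, 101).hex())
print(rd_type1("192.168.1.1", 101).hex())
```

Either way the result is 8 bytes, which is what makes an otherwise overlapping RFC 1918 prefix globally unique inside the provider's network.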
When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose configuration matches the route target.
Because the application of policy determines a VPN's connectivity, you must take extra care when writing and applying VPN policy to ensure that the tenant's connectivity requirements are faithfully met.
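The route-target matching described above amounts to a set intersection between the communities carried on a received route and the import lists of the local VRFs. A minimal sketch; VRF names and community values are assumptions for illustration:

```python
# Each local VRF is configured with the target communities it imports.
vrf_import = {
    "Green": {"target:65512:1"},
    "Orange": {"target:65512:2"},
}

def import_route(route_targets):
    """Return the VRFs that will install a route carrying these communities."""
    return [vrf for vrf, targets in vrf_import.items()
            if targets & set(route_targets)]

# A route tagged with the Orange target lands only in the Orange VRF;
# a route with no matching target is simply discarded.
print(import_route(["target:65512:2"]))   # ['Orange']
print(import_route(["target:65512:9"]))   # []
```

The empty result for an unmatched target mirrors the behavior described earlier: without a matching import policy, the PE just discards the advertisement.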
You can control a VPN's connectivity simply by either advertising or not advertising particular routes to a remote site. Another important function of the VRF export policy is that it will also cause the advertised routes to be tagged with a target community. In the slide, PE2 has a locally learned route (10.1.2/24, the network between PE2 and CE2) in its VRF table. To ensure CE1 and PE1 can send data to CE2, PE2 has a VRF export policy applied to its IBGP neighbor, PE1, which advertises locally learned routes tagged with the target community, target:1:1. The next slide shows PE1's process of installing the VPN-IPv4 route in its own VRF table.
Service Provider service. From the perspective of the two QFX devices, they are separated by an IP network. The QFXs simply forward VXLAN packets between each other based on the MAC addresses learned through EVPN signaling. The MX devices have an MPLS Layer 3 VPN between each other (bidirectional MPLS LSPs, IGP, L3 VPN MP-BGP routing, etc.). The MXs advertise the local QFX's loopback address to the other MX.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. MX1 performs a lookup for the received packet on the VRF table associated with the VPN interface (the incoming interface) and encapsulates the VXLAN packet in two MPLS headers (outer for the MPLS LSP, inner for MX2 VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the VRF table so that it can route the remaining VXLAN packet to QFX2. QFX2 strips the VXLAN encapsulation and forwards the original Ethernet frame to the destination host.
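The header nesting in this West-to-East flow can be sketched as a list of headers that each device adds or removes. This is a simplification for illustration (the PHP pop and MX2's VRF pop are collapsed into one decapsulation step):

```python
def qfx1_encap(frame):
    """QFX1: VXLAN over UDP/IP toward QFX2's loopback."""
    return ["IP(dst=QFX2-lo0)", "UDP", "VXLAN"] + frame

def mx1_encap(packet):
    """MX1: outer label for the MPLS LSP, inner label for MX2's VRF."""
    return ["MPLS(transport)", "MPLS(vrf)"] + packet

def mx2_decap(packet):
    """MX2: both MPLS headers gone (PHP pops one, the VRF lookup pops the other)."""
    return packet[2:]

def qfx2_decap(packet):
    """QFX2: IP/UDP/VXLAN stripped; the original frame remains."""
    return packet[3:]

frame = ["Ethernet", "payload"]
wire = mx1_encap(qfx1_encap(frame))
print(wire)
print(qfx2_decap(mx2_decap(wire)))   # the original frame, end to end
```

Note that the VXLAN packet itself travels unchanged across the Layer 3 VPN; only the MPLS headers are added and removed by the MX devices.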
signaled using EVPN MP-BGP signaling and are stitched together on the MX devices using logical tunnel interfaces.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in two MPLS headers (outer for the MPLS LSP, inner for MX2 VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the appropriate VRF and outgoing interface. MX2 forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
signaled using EVPN MP-BGP signaling and are stitched together on the MX devices using logical tunnel interfaces.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX2's loopback address. MX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
EVPN over IP
The slide shows an example of the signaling/data plane when using EVPN/VXLAN over an IP network. EVPN MP-BGP is used
to synchronize MAC tables.
When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
Stretching Subnets
The slide shows the EVPN Type 2 MAC advertisements that must be exchanged between data centers when individual subnets are stretched between data centers. Notice that Host1 and Host2 are attached to the same subnet. The example shows the advertisement of just a single MAC address. However, in a real environment you might see thousands of MAC addresses advertised between data centers. That is a bunch of routes! MAC moves, adds, and changes in one data center will actually affect the MAC tables and EVPN routing exchanges in another data center.
Unique Subnets
The EVPN Type 5 IP prefix route can be used in a DCI situation in which the IP subnets between data centers are completely unique. Notice that Host1 and Host2 are attached to different subnets. This fact is very important to the discussion. In this situation, if Host1 needs to send an IP packet to Host2, it will send it to its default gateway, which is the IRB of PE1. Leaf1 will encapsulate the Ethernet frames from Host1 into VXLAN and send the VXLAN packets to PE1. PE1 will strip the VXLAN header and notice that the remaining Ethernet frames from Leaf1 have a destination MAC of its own IRB. It will strip the Ethernet header and route the remaining IP packet based on the routing table associated with the IRB interface. PE1 will use the EVPN Type 5 route that was received from PE2 for the 10.1.2/24 network, and the packet will be forwarded over the VXLAN tunnel between PE1 and PE2. You might ask yourself, "Why couldn't PE1 use a standard IP route? Why does the 10.1.2/24 network need to be advertised by an EVPN Type 5 route?" The answer is that the Type 5 route allows inter-data center traffic to be forwarded over VXLAN tunnels (i.e., the end-to-end VXLAN-based VPN is maintained between data centers). This is very similar to the stitching concept discussed earlier. PE2 then receives the VXLAN-encapsulated packet and forwards the remaining IP packet towards the destination over the IRB interface (while encapsulating the IP packet in an Ethernet header with a destination MAC of Host2). Finally, PE2 performs a MAC table lookup and forwards the Ethernet frame towards Host2.
DCI Example
The slide highlights the topic we discuss next.
LDP to automatically establish MPLS LSPs to each other's loopback address. Finally, each PE will establish a VPN-IPv4
MP-IBGP session with each other. The PEs will exchange locally learned routes (the loopback addresses of the Leaf nodes)
so that the Leaf nodes can establish the overlay network (next slide).
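As a rough sketch, the core-facing side of each PE could be configured along these lines (the interface name and address are illustrative, not the lab's actual values):

```
interfaces {
    ge-0/0/0 {                          # core-facing interface (illustrative)
        unit 0 {
            family inet {
                address 10.0.0.1/30;
            }
            family mpls;                # allow MPLS-labeled packets on the link
        }
    }
}
protocols {
    mpls {
        interface ge-0/0/0.0;
    }
    ldp {
        interface ge-0/0/0.0;           # LDP builds LSPs to the remote PE loopback
    }
}
```

With an IGP advertising the loopbacks and LDP running on the core links, the inet.3 table is populated automatically and the VPN-IPv4 MP-IBGP session can resolve its next hops.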
2 Gateways and also establish an EVPN MP-IBGP session with each other to exchange EVPN routes to advertise locally
learned MACs to the remote Leaf. Host A and Host B will be able to communicate as if they were on the same LAN segment.
remote PE. Remember, there needs to be an MPLS LSP established in each direction, so you must check the inet.3 table on
both PEs.
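Assuming PE loopbacks of 192.168.100.1 and 192.168.100.2 (illustrative values only), the bidirectional check could be performed with commands such as:

```
user@PE1> show ldp session                       # verify the LDP session to the remote PE
user@PE1> show route table inet.3 192.168.100.2  # LSP toward PE2's loopback exists
user@PE2> show route table inet.3 192.168.100.1  # and the reverse LSP exists on PE2
```

If either inet.3 lookup comes up empty, the MP-IBGP VPN routes cannot resolve and VPN traffic will not be forwarded.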
VRF Configuration
The slide shows the VRF configuration for PE1. Notice the use of the vrf-target statement. Originally, VRF import and export
policies could only be enabled by writing explicit policies under [edit policy-options] and applying them using the
vrf-import and vrf-export statements. However, more recent versions of the Junos operating system allow you to
skip those steps and simply configure a single vrf-target statement. The vrf-target statement actually enables two
hidden policies. One policy is a VRF export policy that takes all locally learned routes in the VRF (direct interface routes as
well as routes learned from the local CE) and advertises them to the remote PE tagged with the specified target community.
The other policy is a VRF import policy that will accept all VPN-IPv4 routes learned from remote PEs that are tagged with the
specified target community.
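A minimal VRF along these lines might look like the following sketch (the instance name, interface, and values are illustrative, not the slide's actual configuration):

```
routing-instances {
    VRF-1 {
        instance-type vrf;
        interface ge-0/0/1.0;                  # site-facing interface (illustrative)
        route-distinguisher 192.168.100.1:1;
        vrf-target target:65000:1;             # auto-generates the hidden import/export policies
    }
}
```

The equivalent explicit approach would define policies under [edit policy-options] that match and attach target:65000:1, and then reference them in the instance with vrf-import and vrf-export.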
MP-BGP Routing
The slide shows how to enable VPN-IPv4 signaling between PEs. Use the show bgp summary command to verify that the
MP-BGP neighbor relationship is established and that the PE is receiving routes from the remote neighbor.
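A sketch of the PE1 side of such a session might look like this (the group name and the local/neighbor loopback addresses are illustrative):

```
protocols {
    bgp {
        group IBGP-DCI {
            type internal;
            local-address 192.168.100.1;       # PE1 loopback
            family inet-vpn unicast;           # enable VPN-IPv4 NLRI on the session
            neighbor 192.168.100.2;            # PE2 loopback
        }
    }
}
```

In the show bgp summary output, the bgp.l3vpn.0 counters for the neighbor indicate whether VPN-IPv4 routes are actually being received.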
VRF Table
Remember, the main purpose of establishing an underlay network and the DCI is to ensure that the routers in each site can
reach the loopback addresses (VTEP source addresses) of the remote Leaf nodes. The slide shows that PE1 has learned the
loopback addresses of the remote site's Leaf nodes in its VRF table.
Leaf1 Configuration
The slide shows the underlay and overlay network configuration of Leaf1. Leaf2 would be configured very similarly.
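On a QFX-series leaf, the relevant underlay and overlay pieces might look roughly like the following sketch (peer addresses, VLAN, and VNI values are illustrative, not the lab's actual values):

```
protocols {
    bgp {
        group overlay {
            type internal;
            local-address 192.168.1.1;          # Leaf1 loopback (VTEP source address)
            family evpn signaling;              # EVPN NLRI for the overlay
            neighbor 192.168.1.2;               # overlay peer (illustrative)
        }
    }
    evpn {
        encapsulation vxlan;
        extended-vni-list all;                  # advertise all configured VNIs
    }
}
switch-options {
    vtep-source-interface lo0.0;                # source VXLAN tunnels from lo0
    route-distinguisher 192.168.1.1:1;
    vrf-target target:65000:1;
}
vlans {
    v100 {
        vlan-id 100;
        vxlan {
            vni 5100;                           # map VLAN 100 to VNI 5100
        }
    }
}
```

Leaf2 would mirror this configuration with its own loopback address and route distinguisher.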
We Discussed:
The meaning of the term Data Center Interconnect;
Review Questions
1.
2.
3.
1.
A DCI can be provided by a point-to-point link, an IP network, or an MPLS network.
2.
The VPN-IPv4 NLRI includes an MPLS label, the route distinguisher, and an IP prefix. A target community is also tagged to the route
but it is not officially part of the NLRI.
3.
When the transport network of a DCI is a public IP network, the option available for a DCI is option 3.
Content Explorer: Technical documentation for Junos OS-based products by product, task, and software
release, and downloadable documentation PDFs.
Feature Explorer: Junos OS and ScreenOS software feature information to find the right software release and
hardware platform for your network.
Learning Bytes: Concise tips and instructions on specific features and functions of Juniper technologies.
Installation and configuration courses: Over 60 free Web-based training courses on product installation and
configuration (just choose eLearning under Delivery Modality).
J-Net Forum: Training, certification, and career topics to discuss with your peers.
Juniper Networks Certification Program: Complete details on the certification program, including tracks, exam
details, promotions, and how to get started.
Technical courses: A complete list of instructor-led, hands-on courses and self-paced, eLearning courses.
Translation tools: Several online translation tools to help simplify migration tasks.
Acronym List
AD: aggregation device
AFI: Address Family Identifier
BGP: Border Gateway Protocol
BUM: broadcast, unknown unicast, and multicast
CapEx: capital expenses
CE: customer edge
CLI: command-line interface
CSP: Control and Status Protocol
DCI: Data Center Interconnect
EVI: EVPN Instance
FCoE: Fibre Channel over Ethernet
FCS: Frame Check Sequence
FEC: forwarding equivalence class
GRE: generic routing encapsulation
GUI: graphical user interface
IBGP: internal BGP
IGMP: Internet Group Management Protocol
IGP: interior gateway protocol
IPv6: IP version 6
JNCP: Juniper Networks Certification Program
LAG: link aggregation group
LSP: label switched path
LSR: label-switching router
MAC: media access control
MC-LAG: Multichassis Link Aggregation
P: provider
PE: provider edge
PHP: penultimate-hop popping
PIM-SM: Protocol Independent Multicast Sparse Mode
RID: router ID
RP: rendezvous point
RPT: rendezvous point tree
SD: satellite device
STP: Spanning Tree Protocol
VC: Virtual Chassis
VCF: Virtual Chassis Fabric
VM: virtual machine
VPN: virtual private network