
University Politehnica of Bucharest

Faculty of Electronics, Telecommunications and Information Technology


Telecommunications Master's Program - Scientific Research and Practice S2 course
Examination session: May-June 2017
Master's student: Ion Alexandru IONESCU

SDN (SOFTWARE-DEFINED NETWORKING) AND NFV (NETWORK FUNCTION VIRTUALIZATION) APPLICATIONS IN DATA CENTER ENVIRONMENTS
Ion Alexandru IONESCU
ETTI - Master TC
Scientific Research and Practice S2

This document is meant to illustrate the research activity carried out for the Scientific Research and Practice S2 course of the ETTI Telecommunications Master's program. Together with the documents for the Scientific Research and Practice S1 and S3 courses, it will serve as the basis for the dissertation at the end of the master's studies.
The author also notes that, overall, the final dissertation is intended as a deeper exploration of the bachelor's thesis he defended at the end of his first cycle of higher education (completed at The Open University, United Kingdom). For this reason, the present document is written in English; this will be changed for the final dissertation, in accordance with the academic requirements of University Politehnica of Bucharest.


Table of Contents
Introduction
- Problem statement
- Proposed solution
Technology overview
- Generic concepts regarding SDN and NFV
- Background information for network virtualization within the data center
- Current SDN paradigms in data center environments – Cisco ACI
- Current SDN paradigms in data center environments – VMware NSX
Project roll-out
- Reference topology
- Access layer migration
- Aggregation and core layers – Stage 1 – vPC
- Services layer migration – NFV appliances
- Aggregation and core layers – Stage 2 – ACI
- Current stage of project work
- Project management
- Personal development
References
Appendices
- Spanning Tree Protocol (STP)
- Virtual Port-Channel (vPC)


Introduction

Problem statement
SDN (Software-Defined Networking) and NFV (Network Function Virtualization) technologies have been in place for some years now. Many organizations joined the ranks of innovators and early adopters and helped SDN and NFV transition from pure laboratory experiments to practical enhancements of widely deployed infrastructures. This evolution has continued over the last two to three years, to the point where large technology providers began to market end-to-end solutions for SDN and NFV integration.
Among the first adopters of SDN in the data center market were the cloud providers. SDN's features with regard to automation, orchestration, management, reporting, time-to-market and openness were, and still are, attractive to any business running very large and complex networks, which is exactly what cloud providers do.
The vast majority of businesses tend to be more conservative when it comes to adopting new technology. SDN penetration in a regular company (one whose primary business is not IT-related) happens at a slower pace and later on, closely tied to the technology becoming more mature and to how other companies are adopting it. The benefits of SDN and NFV are not reserved for cloud providers alone. Therefore, the purpose of this project is to offer an analysis of SDN and NFV implementation within an enterprise (non-cloud provider) data center environment, focusing on existing infrastructures, technology and solution choices, and integration challenges.

Proposed solution
In order to keep the project scope to a manageable size, the proposed solution is centered on specific technologies offered by industry leaders such as Cisco Systems and VMware. Many other vendors have similar offerings; however, an in-depth analysis of all of them is not feasible within the project's time constraints.
The project approach is to use a baseline scenario (detailed further in the project report by means of a reference topology), identify common SDN and NFV characteristics, and explore how these are implemented within specific vendor technologies. The following specific project goals were deemed most relevant to the proposed solution:
- Transitioning from traditional data center network topologies to topologies better suited for SDN
architectures (implementing an SDN solution). Analysis will focus on SDN and overlay networks
implemented with Cisco technologies.
- Adding NFV on top of an SDN-enabled network. Issues like NFV deployment, network policy
enforcement and service chaining are central to this topic. Analysis will focus on Citrix Netscaler
1000v network load balancer and Cisco ASAv firewall as NFV appliances.
- Connecting the SDN infrastructure to the rest of the data center and to external networks. Analysis
will focus on Cisco Application Centric Infrastructure.
Business decisions, vendor pricing, as well as any other non-technical aspects are considered out of scope
and will not be discussed within this paper.


Technology overview
NOTE: This section corresponds to the “Account of related literature” heading in Table 3.1, which refers to the EMA structure recommendation.

Generic concepts regarding SDN and NFV


This section describes key characteristics of SDN and NFV. These are not limited to the project scope of enterprise data centers, but they are the aspects against which the technologies and scenarios used within the project are assessed.

Abstractions
Abstractions describe the way in which SDN technologies are able to interact with one another. To make this possible, the proprietary implementations of various network functionalities, across a large variety of network hardware and software, need to be decoupled from an established common language that all SDN implementers should support (Casado, 2014). As long as an SDN implementer exposes a common, well-known abstraction, regardless of the code (or hardware) it uses to achieve that abstraction, other SDN products are able to interact with it. For example, one of the most mature SDN specifications at large is the ONF's (Open Networking Foundation) OpenFlow (Kobayashi, 2014). For SDN compatibility, it matters little how OpenFlow and network functionality are actually implemented within a device, as long as the communication interface follows OpenFlow messaging and uses OpenFlow abstractions (such as flow tables).
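A minimal Python sketch of the flow-table abstraction follows, assuming a simplified model rather than the OpenFlow wire protocol: each entry has a priority, a set of match fields (fields left out act as wildcards) and a list of actions, and a lookup returns the actions of the highest-priority matching entry. The field names (`eth_dst`, `ip_proto`, `tcp_dst`) loosely follow OpenFlow match-field naming; the action strings and the table-miss behavior (punting the packet to the controller, a common configuration rather than the only option) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    priority: int
    match: dict    # header field -> required value; omitted fields are wildcards
    actions: list  # e.g. ["output:2"] or ["drop"] (illustrative action strings)


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)
        # Keep the highest-priority entries first so lookup can stop at the first hit.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: dict) -> list:
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                return entry.actions
        # Table miss: modeled here as punting the packet to the controller.
        return ["controller"]


table = FlowTable()
table.add(FlowEntry(priority=10, match={"eth_dst": "00:00:00:00:00:02"}, actions=["output:2"]))
table.add(FlowEntry(priority=100, match={"ip_proto": 6, "tcp_dst": 23}, actions=["drop"]))

print(table.lookup({"eth_dst": "00:00:00:00:00:02", "ip_proto": 6, "tcp_dst": 80}))  # ['output:2']
print(table.lookup({"eth_dst": "00:00:00:00:00:02", "ip_proto": 6, "tcp_dst": 23}))  # ['drop']
print(table.lookup({"eth_dst": "00:00:00:00:00:09"}))                                # ['controller']
```

Any device that behaves like this table, whatever its internal implementation, looks the same to a controller, which is the point of the abstraction.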

Control plane – data plane separation


SDN devices need a connection to an SDN controller in order to function properly. The controller enforces network policies, dictates how a device should behave in the case of an exception, maintains an overall view of the SDN network and acts as a central point of management. To some extent, the SDN controller may be considered the brains of the network, and the SDN-controlled device the muscle (actually processing the network traffic). That does not mean that SDN switches, for example, completely lack a control plane, but rather that their control plane functionality is greatly reduced, since most of the decision making takes place on the central SDN controller.

Figure 1 – SDN control plane decoupling (Casado, 2014)
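The division of labor between the controller (the brains) and the device (the muscle) can be sketched as follows. The switch only forwards according to entries it has been given; on an exception (a table miss) it asks the controller, which uses its network-wide view to decide and then installs an entry so that later packets stay on the fast path. The two-switch topology, the MAC addresses and the class names are hypothetical, and the controller is assumed to already know where each host is attached.

```python
class Controller:
    """Central control plane that holds the network-wide view."""

    def __init__(self):
        self.host_location = {}  # host MAC -> (switch name, access port)
        self.uplink = {}         # (switch name, neighbor switch name) -> local port
        self.packet_in_count = 0

    def packet_in(self, switch, dst_mac):
        """Handle a table miss reported by `switch` and install a forwarding entry."""
        self.packet_in_count += 1
        if dst_mac not in self.host_location:
            return "flood"  # unknown destination: no entry is installed
        dst_switch, dst_port = self.host_location[dst_mac]
        if dst_switch == switch.name:
            port = dst_port
        else:
            port = self.uplink[(switch.name, dst_switch)]
        switch.flows[dst_mac] = port  # push the decision down to the data plane
        return port


class Switch:
    """Data-plane device: forwards from its table and defers exceptions to the controller."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        self.flows = {}  # destination MAC -> output port

    def receive(self, dst_mac):
        if dst_mac in self.flows:
            return self.flows[dst_mac]  # fast path, no controller involved
        return self.controller.packet_in(self, dst_mac)


ctl = Controller()
s1, s2 = Switch("s1", ctl), Switch("s2", ctl)
ctl.host_location = {"aa:01": ("s1", 1), "aa:02": ("s2", 1)}
ctl.uplink = {("s1", "s2"): 3, ("s2", "s1"): 3}

print(s1.receive("aa:02"))  # 3, decided by the controller
print(s1.receive("aa:02"))  # 3, served from the installed entry
print(ctl.packet_in_count)  # 1
```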


Overlay networks
In order to have an SDN infrastructure up and running, a communication channel needs to be established between the SDN devices and the SDN controller. With most implementations to date, this communication channel requires IP connectivity. Therefore, IP connectivity between SDN devices must be ensured before the SDN infrastructure is able to function properly. Any network constructs within SDN, such as hypernets (Chowdry, 2010), are actually virtual networks running on top of an existing physical infrastructure. While this approach may seem to add complexity to an existing network, the SDN paradigm actually postulates that the underlying networks should be designed as simply as possible, with any additional complexity required by SDN services handled by the intelligence of the SDN controller. Therefore, SDN moves complexity from the (basic) IP communications network to the SDN-participating nodes, and allows high-level network services to be offered only to the network segments that actually require them.
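As an illustration of how a virtual network rides on top of the physical one, the sketch below uses VXLAN (RFC 7348), one widely used overlay encapsulation; it is chosen here only as an example, since the text does not commit to a specific protocol. The underlay only has to deliver UDP/IP packets (VXLAN uses UDP destination port 4789) between tunnel endpoints, while the 24-bit VNI (VXLAN Network Identifier) in the 8-byte VXLAN header tells the receiving endpoint which virtual network the inner Ethernet frame belongs to. Outer Ethernet, IP and UDP headers are left out, and the function names are hypothetical.

```python
import struct

FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid identifier


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # "!B3xI": network byte order, one flag byte, three pad bytes, then a 32-bit word
    # whose upper 24 bits hold the VNI and whose low byte is reserved (zero).
    return struct.pack("!B3xI", FLAG_VNI_VALID, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Payload of the outer UDP datagram sent across the IP underlay."""
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload: bytes) -> tuple:
    """Return (vni, inner_frame) from a received VXLAN UDP payload."""
    flags, word = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word >> 8, udp_payload[8:]


print(vxlan_header(5000).hex())  # 0800000000138800
```

Because the VNI is 24 bits wide, such an overlay can address about 16 million segments, compared with the 4094 usable IDs of a 12-bit VLAN tag.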

Policy consistency
Traditional network devices use their own control plane to make decisions about traffic processing. The control plane protocols themselves are well standardized; however, they work by taking into account only the information available on the particular device itself (Casado, 2014). For instance, a layer 2 (OSI model) switch will only know MAC addresses based on its own port ingress / egress information, and will not be able to differentiate between external MAC addresses sourced from different switches. Similarly, a layer 3 (OSI model) router running OSPF will know the network link states based on its own computation of Dijkstra's SPF (Shortest Path First) algorithm, which places the router at the root of the SPF tree. Moreover, an advanced layer 7 network service device, such as a firewall or a network load balancer, will enforce its network policies regardless of any changes in the underlying network (e.g., a network access policy remains in a firewall configuration even if the subject of that policy is no longer in use, say because the server administration team retired the service from production but no such information was available to the security team). An SDN controller, which maintains an overall view of the network, is in a position to keep such policies consistent with the actual state of the network.
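The router-centric nature of this computation can be illustrated with a textbook implementation of Dijkstra's algorithm. Every router holds the same link-state database, but each runs SPF with itself as the root, so each obtains its own table of path costs. The four-router topology and its link costs are invented for the example.

```python
import heapq

# Hypothetical link-state database: router -> {neighbor: link cost}
LSDB = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R3": 2, "R4": 1},
    "R3": {"R1": 5, "R2": 2, "R4": 9},
    "R4": {"R2": 1, "R3": 9},
}


def spf(lsdb: dict, root: str) -> dict:
    """Dijkstra's shortest-path-first algorithm with `root` at the root of the tree."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


print(spf(LSDB, "R1"))  # {'R1': 0, 'R2': 7, 'R3': 5, 'R4': 8}
print(spf(LSDB, "R4"))  # {'R4': 0, 'R2': 1, 'R3': 3, 'R1': 8}
```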

Management
A central SDN controller allows for a single point of management for an entire SDN network infrastructure (Raza, 2014). As a network grows in size, so does the number of network devices (switches, routers, etc.). Network operations that involve actions over the entire network may pose a challenge to administrators and may introduce implementation errors. Using an SDN controller keeps operational effort at a manageable level and minimizes the risk of network downtime caused by, e.g., misconfigurations or faulty logic.
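The single point of management idea can be sketched as a controller that stores the administrator's intent once and reconciles every device against it. In this hypothetical model the intent is a VLAN that must exist on all 80 ToR switches; the device names, the data model and the reconciliation loop are illustrative assumptions and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    vlans: set = field(default_factory=set)  # VLAN IDs currently configured


@dataclass
class Controller:
    devices: list
    intended_vlans: dict = field(default_factory=dict)  # VLAN ID -> VLAN name

    def declare_vlan(self, vlan_id: int, name: str) -> None:
        """One administrative action, recorded centrally."""
        self.intended_vlans[vlan_id] = name

    def reconcile(self) -> list:
        """Push only what is missing on each device and report the changes made."""
        changes = []
        for dev in self.devices:
            for vlan_id in sorted(set(self.intended_vlans) - dev.vlans):
                dev.vlans.add(vlan_id)
                changes.append((dev.name, vlan_id, self.intended_vlans[vlan_id]))
        return changes


fabric = Controller(devices=[Device(f"tor-{i:02d}") for i in range(1, 81)])
fabric.declare_vlan(120, "web")
print(len(fabric.reconcile()))  # 80: one change per switch, from a single declaration
print(len(fabric.reconcile()))  # 0: the fabric already matches the intent
```

Running the reconciliation repeatedly is harmless, which is one way a central controller can limit configuration drift and manual errors.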

Complementarity of NFV with SDN


Network function virtualization refers to moving network services from dedicated hardware devices to virtual machines. Following in the footsteps of computer virtualization, network functions such as routers, security appliances, load balancers, session border controllers (SBCs) and radio access nodes may be deployed as VMs instead of physical boxes. The advantages of NFV are the same as those of computer virtualization in terms of space, electrical power, ventilation, time to implement, etc.


As described on ETSI’s (European Telecommunications Standards Institute) NFV portal, “[…] Network
Functions Virtualization is able to support SDN by providing the infrastructure upon which the SDN
software can be run. Furthermore, Network Functions Virtualization aligns closely with the SDN objectives
to use commodity servers and switches.” (ETSI, 2012).

Background information for network virtualization within the data center

Traditional data center topologies


Traditional data center topologies are based on a classic multi-tier model consisting of core, aggregation,
services and access layers (Cisco, 2007). Each layer has its own specific function within the infrastructure.

Figure 2 – Traditional data center layered approach with service layer collapsed with the aggregation layer (Cisco, 2007)

The core layer is in charge of high-speed routing and switching, and its main characteristic is high-bandwidth, redundant links. At this layer it is common to have 10 Gigabit or 40 Gigabit links in order to accommodate the aggregated traffic of all data center nodes. There is usually little advanced packet processing at this layer, the main focus being high-performance routing and switching, as well as scalability. In this context, scalability allows an increase in capacity without a complete network redesign, merely by adding similar building blocks (such as links or equipment) to the network topology.
The aggregation layer is responsible for interconnecting the many downstream links from access equipment, as well as for providing upstream connections. It is usually here that layer 2 (data-link) and layer 3 (network) separation occurs, thus containing failures within boundaries such as broadcast domains, VLANs, or subnets. As with the core layer, redundancy and scalability are a major focus for the distribution / aggregation layer.


The services layer deals with advanced network functionalities such as security, quality of service and other network services (e.g., load balancing). By doing so, it allows for policy-based network connectivity. For example, traffic flows can be sent through security network appliances (such as a firewall), or they can receive different QoS treatment depending on their priority and traffic characteristics. In some implementation scenarios, it may make sense to collapse the functionalities of core, aggregation and services into a single layer, depending on various constraints (type of equipment, role-based network management, budget, data center facilities, etc.).
The access layer is responsible for providing network access to data center nodes (servers). Access equipment physically connects to hosts and needs to take into account various server networking requirements such as NIC teaming, clustering, or broadcast containment. Two popular deployment scenarios for the access layer are ToR (top-of-rack) and EoR (end-of-row). The ToR approach places network switches in every data center rack cabinet, keeping all cables between servers and switches within the same rack cabinet. This greatly simplifies cable management and structured cabling design, since only the switch uplinks (usually fiber-optic) need to run outside the rack towards aggregation equipment. The downside of ToR is that it results in a large number of management points, since each ToR switch has its own management interface. An alternative approach is EoR, which uses a single redundant device (or a pair of devices) with high port density to connect servers from multiple rack cabinets, thus reducing the number of data center management points at the cost of more complex cabling.

Compute and storage virtualization solutions


The model described above was designed around yet another traditional paradigm: deploying applications within the data center on physical servers, each dedicated to a specific application. Again, this is a time-proven approach that has its uses; however, for a large variety of industry-standard solutions, compute virtualization emerged as a better option. Studies performed by virtualization vendors (VMware, 2013) claimed that the average dedicated server was not used to its full potential, but only at about 5-10% of its capacity. This claim, together with subsequent developments in the processor industry, paved the way for organizations to address this inefficient use of resources, and compute virtualization solutions gained wide acceptance. As of 2015, most enterprise data centers were running at least one virtualization infrastructure, and trends showed organizations contemplating a two-provider approach for the future.
From the networking perspective, each hypervisor runs some form of virtual networking software that allows the guest machines running on the physical server to gain network access. The most common approach in the data center is the use of a virtual switch, other approaches being bridging and NAT (network address translation).
The virtual switch approach adds another access layer to the core / aggregation / access data center model. This virtual access layer is responsible for providing network access to virtual machines, much like the (physical) access layer does for regular servers. There are, however, limitations, as most virtual switch software offers only a subset of the functionalities usually found in a regular physical switch. In addition, these functionalities may be implemented in a completely different manner, and special care must be taken when integrating them into an existing infrastructure. One such example is that virtual switches do not use the spanning tree protocol (STP), a mechanism that ensures a loop-free topology in layer 2 environments. In the virtual world, the loop-free topology is constructed in a different manner, and this needs to be taken into account when connecting a virtual switch to a physical infrastructure.
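One way virtual switches commonly keep the topology loop-free without STP is a simple forwarding rule: a frame received on a physical uplink is never forwarded out of another physical uplink, so the host never becomes a bridge between two paths of the physical network. The sketch below models only broadcast flooding under that rule; the port names and the use of a single active uplink are hypothetical simplifications, not a description of a particular product's teaming behavior.

```python
from typing import Set

UPLINKS = {"vmnic0", "vmnic1"}            # physical NICs towards the access layer
VM_PORTS = {"vm-web", "vm-db", "vm-app"}  # virtual NICs of local guest machines
ACTIVE_UPLINK = "vmnic0"                  # assumed choice made by the NIC teaming policy


def flood_ports(ingress: str) -> Set[str]:
    """Egress ports for a broadcast frame in a simplified virtual switch."""
    if ingress in UPLINKS:
        # From the physical network: deliver to local VMs only. Never re-sending
        # the frame through another uplink is what prevents a loop through the host.
        return set(VM_PORTS)
    # From a VM: the other local VMs plus exactly one uplink.
    return (VM_PORTS - {ingress}) | {ACTIVE_UPLINK}


def host_can_loop() -> bool:
    """True if a frame entering on one uplink could leave through another one."""
    return any(flood_ports(u) & (UPLINKS - {u}) for u in UPLINKS)


print(host_can_loop())  # False
```

Because a host following this rule never bridges between its uplinks, the physical network sees it as an end node rather than as another switch taking part in the layer 2 topology.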
Traditional multi-tier network model integration of compute and storage virtualization solutions
Within the classic multi-tier model, some degree of network virtualization is already built in, in the form of logical separation constructs within each type of node. Examples of such virtualization are VLANs (Virtual Local Area Networks) and VRFs (Virtual Routing and Forwarding instances). While effective and widely used in production environments for many years, they were designed and first implemented before compute virtualization solutions gained wide commercial adoption. Therefore, it cannot be said that traditional network virtualization mechanisms kept pace with the challenges and demands of compute and storage virtualization. From the author's practical day-to-day network operations experience, virtualization integration challenges arise in terms of network performance, network policy consistency, job responsibility separation, and time to deliver. The project work will aim to address these challenges by implementing SDN and NFV technologies specifically designed with these issues in mind.
Within this context, network performance refers to the shifting traffic patterns that compute virtualization solutions brought about. For instance, as opposed to the traditional North-South direction of data center traffic (traffic entering and leaving the data center), more traffic now flows in an East-West direction (from one server to another, within the data center). One example is VMware's vMotion set of technologies, which allow virtual machines to be moved between host servers: moving the 500GB virtual disk of a web server from one host to another requires significantly more network resources and performance than having the web server handle HTTP user requests. The traffic requirements differ intrinsically both in capacity (500GB of virtual disk as opposed to a few kilobytes of an HTTP message exchange) and in sensitivity to delay and loss (close to zero tolerance for storage-related traffic, as opposed to the built-in recovery mechanisms of TCP-based HTTP).
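A rough calculation makes the capacity difference concrete. Assuming an otherwise idle 10 Gbit/s link, ignoring protocol overhead, and taking 20 KB as an HTTP response size (both figures are assumptions for illustration only), the ideal transfer times compare as follows:

```python
def transfer_seconds(size_bytes: float, link_bps: float) -> float:
    """Ideal time to send `size_bytes` over a link of `link_bps` bits per second."""
    return size_bytes * 8 / link_bps


LINK_BPS = 10e9        # assumed 10 Gbit/s server uplink
VIRTUAL_DISK = 500e9   # 500 GB virtual disk (decimal units)
HTTP_RESPONSE = 20e3   # assumed 20 KB HTTP response

print(transfer_seconds(VIRTUAL_DISK, LINK_BPS))   # 400.0 (about 6.7 minutes)
print(transfer_seconds(HTTP_RESPONSE, LINK_BPS))  # 1.6e-05 (16 microseconds)
```

The disk transfer saturates the link for minutes, during which any loss or delay directly slows the migration, whereas each HTTP exchange is a brief burst.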
Network policy consistency refers to situations in which the traditional multi-tier model is not able to enforce the same network policies on virtual servers as on physical ones. Network controls such as security ACLs (access control lists) or QoS (Quality of Service) marking are usually applied at the access layer, i.e. the equipment that physically connects the servers to the data center network. In the case of virtual servers, however, many guest machines reside on the same physical host, which means that the controls at the traditional access layer need to be cumulative for each physical port in order to ensure proper handling of the traffic generated by all VMs (virtual machines). This approach has limitations. For instance, no network control is enforced within the virtualization host itself (traffic from one VM to another VM on the same server is hidden from the network). Also, if a VM moves to another host, the network policy does not automatically follow it, which means that manual network reconfiguration is required. This, in turn, may not always be possible, since some automation mechanisms from the compute virtualization world have no equivalent on the network side; one example is VMware DRS (Distributed Resource Scheduler), which can migrate VMs between hosts automatically.
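The difference between a policy bound to a physical port and one bound to the VM itself can be shown with a small hypothetical model. The inventory, port names and policy text are invented for illustration; `migrate` stands in for an automated move such as one triggered by DRS, where the compute side changes but the network configuration does not.

```python
from typing import Optional

# Hypothetical inventory: where each VM runs, and where each host is cabled.
vm_host = {"web01": "esx-a"}
host_port = {"esx-a": "Eth1/1", "esx-b": "Eth1/2"}

POLICY = "allow tcp/443 from any"

# Traditional model: the ACL is configured on the access port facing web01's host.
port_policy = {"Eth1/1": POLICY}

# VM-centric model: the ACL is keyed on the VM identity.
vm_policy = {"web01": POLICY}


def policy_via_port(vm: str) -> Optional[str]:
    """Policy that applies when controls live on physical access ports."""
    return port_policy.get(host_port[vm_host[vm]])


def policy_via_vm(vm: str) -> Optional[str]:
    """Policy that applies when controls follow the VM."""
    return vm_policy.get(vm)


def migrate(vm: str, new_host: str) -> None:
    """Automated VM move: only the compute inventory is updated."""
    vm_host[vm] = new_host


migrate("web01", "esx-b")
print(policy_via_port("web01"))  # None: the ACL stayed behind on Eth1/1
print(policy_via_vm("web01"))    # allow tcp/443 from any
```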
Job responsibility separation needs to take into account that virtualization solutions such as VMware vSphere and Microsoft Hyper-V are usually implemented and maintained by the server and storage teams within a data center. This places additional overhead on those teams, since they now need to deal with issues that traditionally fell within the network team's area of expertise. At the opposite end, the network teams may find themselves in the dark regarding network issues, since they have little or no control over the networking capabilities of compute virtualization solutions, which translates into risk when it comes to end-to-end network resource provisioning.
Finally, the time to deliver varies greatly between the above-mentioned teams. Leveraging the full capabilities of a virtualization solution makes compute and storage resource provisioning a task that can be completed in a matter of minutes, since deploying a VM requires no physical server installation. However, the underlying network infrastructure needs to be provisioned as well, in order to give that VM access to other networked resources. Configuration of switch ports, firewall rules, or load-balancing instances may take days or even longer. Therefore, most of the advantages derived from the quick provisioning of compute and storage are lost on the network side.

Early SDN implementations within the traditional data center


Software-defined networking existed in some form or another within the data center landscape long before major technology vendors like Cisco or VMware announced the “Software-Defined Data Center” as an end-to-end solution. As the project work will use these vendors' data center technologies and products to further exemplify and analyze SDN concepts, it seems only logical to take a brief look at predecessor products and solutions from the same vendors. This may provide useful insight into data center SDN evolution: just as new products and solutions are based on previous versions and technology, current SDN concepts and approaches grow on top of previous constructs.
One of the first issues that data center SDN tried to address was the multitude of management points in the network. As an example, a typical medium-sized enterprise data center can have as many as 80-100 management points for the network infrastructure alone. For the most part, these management points represent the ToR equipment discussed in an earlier section. It is easy to see how these numbers add up: assuming at least 2 (redundant) network switches for every 2 rack cabinets, around 10 rack cabinets per row, 4 rows per data room and 2 data rooms per data center, simple arithmetic gives an estimated (2 / 2) × 10 × 4 × 2 = 80 network switches for ToR alone. Equipment in the core, aggregation and services layers adds to this number, resulting in a complex management picture even for a small to medium data center.
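The estimate can be reproduced and varied with a few lines of code; the default parameters are the illustrative figures assumed in the paragraph above.

```python
def tor_switch_count(switches_per_group: int = 2, racks_per_group: int = 2,
                     racks_per_row: int = 10, rows_per_room: int = 4,
                     rooms: int = 2) -> int:
    """Access-layer switches (management points) for a given rack layout."""
    racks = racks_per_row * rows_per_room * rooms
    return switches_per_group * racks // racks_per_group


print(tor_switch_count())                    # 80, the figure in the text
print(tor_switch_count(racks_per_group=1))   # 160: a redundant pair in every rack
print(tor_switch_count(racks_per_group=10))  # 16: one pair per row, EoR-style
```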
A main characteristic of SDN is the decoupling of the control plane from the data plane. In SDN practice, this means having a network controller manage a multitude of network nodes. Cisco chose to implement this functionality in its data center line of products, both Catalyst and Nexus switches, in different ways. For the Catalyst, the technology employed is called VSS (Virtual Switching System). VSS provides a single active control plane for two Catalyst 6500 modular chassis, effectively cutting in half the number of management points for these systems. While this is not true SDN, since each chassis still has its own control plane, the active-standby behavior is SDN-like: there is just one active control plane managing two data planes. These switches were Cisco's flagship data center network products for more than 10 years, and the addition of VSS technology gave organizations using these products an SDN migration strategy that took existing infrastructure into account, thereby maximizing the ROI (return on investment) for these organizations. VSS technology proved popular; as of 2013, it had been ported to other Cisco platforms not necessarily designed for the data center (Catalyst 4500 series), as well as to 6500-derived platforms (Catalyst 6800 series).


For the newer line of data center products, the Nexus family, Cisco kept the control planes of equipment residing at the core and aggregation / services layers separate and used link virtualization instead (vPC, virtual port channel technology). This approach avoids a single point of failure in the control plane of these types of equipment, ensuring that a control plane issue at this network layer does not result in a complete outage of an entire data center pod (section). At the same time, vPC makes sure that downstream equipment still sees the upper-layer equipment as one single logical entity (as with VSS), thus allowing load-sharing and redundant data paths without the risk of relying on a single upper-layer management plane. vPC works like a regular link bundling technique (such as an LACP port-channel), except that it allows ports from different switches to be bundled together into a single uplink channel. Link bundling is considered a better option than redundant-only topologies that use STP, because STP keeps traffic flows off the redundant paths (by blocking them) in order to avoid network loops.
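Load sharing over a bundle, whether a single-switch port-channel or a vPC spanning two switches, is commonly done by hashing header fields, so that all packets of one flow take the same member link (avoiding reordering) while different flows spread across the links. The sketch uses CRC32 over an assumed 5-tuple purely for illustration; actual switches use platform-specific, configurable hash inputs and algorithms. With STP instead, one of the two links would be blocked and every flow would use the other.

```python
import zlib
from collections import Counter


def member_link(flow: tuple, n_links: int) -> int:
    """Map a flow (src IP, dst IP, protocol, src port, dst port) to a member link index."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_links


# Hypothetical flows from 200 clients to one server behind a two-link bundle.
flows = [(f"10.0.0.{i}", "10.0.1.10", 6, 40000 + i, 443) for i in range(1, 201)]
load = Counter(member_link(f, 2) for f in flows)
print(load)  # flows per link; the exact split depends on the hash
```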
For the Nexus ToR line of equipment, however, the approach taken with upper-layer equipment would only add complexity with little benefit. Therefore, true SDN decoupling of control and data planes is implemented at this layer. The Cisco Nexus 2000 switches are data-plane-only devices that cannot function without a “parent” (a controller) that provides a central, decoupled control plane. The parent switch is responsible for all decision-making (e.g., populating the MAC address table, switching based on MAC information), leaving the access switch to function as a mere port extender. In Cisco terminology, these “dumb” switches are called FEX (Fabric EXtender).
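The FEX and parent relationship can be modeled as follows: the extender has no MAC address table and simply hands every frame, tagged with the host port it arrived on, to its parent, which performs MAC learning and forwarding for all host ports of all its extenders. This is a simplified, hypothetical model; the actual tagging format used on the links between a Nexus 2000 and its parent switch is not reproduced.

```python
class ParentSwitch:
    """Control plane for all attached extenders: learns and forwards centrally."""

    def __init__(self):
        self.mac_table = {}  # MAC -> (fex_id, host_port)

    def forward(self, ingress, src_mac, dst_mac):
        self.mac_table[src_mac] = ingress  # MAC learning happens only here
        return self.mac_table.get(dst_mac, "flood")


class FabricExtender:
    """Data plane only: no MAC table, every frame is handed to the parent."""

    def __init__(self, fex_id, parent):
        self.fex_id = fex_id
        self.parent = parent

    def from_host(self, host_port, src_mac, dst_mac):
        # Tag the frame with its origin and let the parent decide.
        return self.parent.forward((self.fex_id, host_port), src_mac, dst_mac)


parent = ParentSwitch()
fex101, fex102 = FabricExtender(101, parent), FabricExtender(102, parent)
print(fex101.from_host(1, "aa:01", "aa:02"))  # flood: aa:02 not learned yet
print(fex102.from_host(5, "aa:02", "aa:01"))  # (101, 1)
print(fex101.from_host(1, "aa:01", "aa:02"))  # (102, 5)
```

In this model even traffic between two hosts on the same extender is switched by the parent, consistent with the port extender role described above.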
Technologies like VSS, vPC and FEX managed to address some SDN requirements, such as control and data plane decoupling and the existence of a controller. However, other SDN characteristics were still not addressed, among them higher-level abstractions and programmability (the existence of APIs). Moreover, these technologies still leave a large gap between the physical and the virtual infrastructures.

Early SDN implementations within compute and storage virtualization solutions
The virtual infrastructure of choice for many organizations was, and still is, VMware vSphere. Once the uncontested market leader, it has recently begun to lose ground as many other technology vendors entered the market. Microsoft's Hyper-V is one such competitor; in its 2014 WPC (Worldwide Partner Conference) keynote address, Microsoft reported a virtualization market share split of 30.6% for Microsoft versus 46.4% for VMware (Maurer, 2014). However, Microsoft entered the market relatively late, in 2008, while VMware was one of the first promoters of virtualization solutions. Therefore, most SDN capabilities added by network equipment vendors focused first on compatibility with VMware, and only later on others such as Microsoft.
This is the case for the Nexus 1000v, which started as a joint Cisco-VMware effort meant to address some of the challenges described earlier: network policy consistency and job responsibility separation. This Cisco product is essentially a replacement for VMware's VDS (vNetwork Distributed Switch). VDS is an extension of the regular VMware virtual switches running on server hosts. It implements the SDN concepts of control-data plane decoupling and a single point of management (SDN controller) at the virtual infrastructure level. Its main functionality is aggregating all virtual switches into one single logical switch. Each server host keeps its data plane (the VEM, Virtual Ethernet Module), while the control plane is managed through VMware vCenter, as with the compute and storage components of the virtual infrastructure.


Nexus 1000v replaces VDS by using a modified version of VMware’s network virtualization software (a
different VEM) at the host level, which is compatible with its own separate controller (the VSM - Virtual
Supervisor Module). The VSM is usually deployed in pairs in order to achieve redundancy. The main
advantage of the VSM is that it provides a separate management interface for the network portion of the
virtual infrastructure, similar to that of Cisco’s other data center switches, thereby achieving job role
separation. The network team configures the Nexus 1000v through the controller, which in turn connects to
vCenter and to the host VEMs for policy enforcement. In addition, it supports more layer 2 network features
than VDS, coming close to feature parity with physical switches and allowing network policy consistency
across all servers within the data center, virtual and physical alike. Equally important, because Nexus 1000v
is connected to vCenter (VMware’s controller), it is aware of events such as VMs moving from one host to
another, and it is therefore able to automatically move the associated network policy to the new physical
host. This, too, is critical for ensuring network policy consistency.
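The policy-follows-VM behaviour can be sketched as a toy model in which the port profile is bound to the VM rather than to a physical port, so a migration event only changes which host enforces it. This is a minimal illustration under stated assumptions: the `Controller` and `PortProfile` classes, the `on_vm_migrated` callback and all names and values are hypothetical and do not reproduce the Nexus 1000v or vCenter APIs.

```python
from dataclasses import dataclass, field


@dataclass
class PortProfile:
    """Hypothetical network policy attached to a VM (VLAN, ACL, QoS)."""
    vlan: int
    acl: tuple
    qos_class: str


@dataclass
class Controller:
    """Toy VSM-like controller tracking which host enforces which VM's policy.
    Hypothetical model, not the Nexus 1000v / vCenter API."""
    profiles: dict = field(default_factory=dict)     # vm name -> PortProfile
    placement: dict = field(default_factory=dict)    # vm name -> host name
    host_tables: dict = field(default_factory=dict)  # host name -> {vm name: PortProfile}

    def attach(self, vm, host, profile):
        self.profiles[vm] = profile
        self._program(vm, host)

    def _program(self, vm, host):
        # Withdraw the policy from the previous host (if any), then push it to the new one.
        old = self.placement.get(vm)
        if old is not None:
            self.host_tables[old].pop(vm, None)
        self.host_tables.setdefault(host, {})[vm] = self.profiles[vm]
        self.placement[vm] = host

    def on_vm_migrated(self, vm, new_host):
        # Invoked when the virtualization manager reports a live migration.
        self._program(vm, new_host)


ctl = Controller()
web = PortProfile(vlan=110, acl=("permit tcp any any eq 443",), qos_class="gold")
ctl.attach("web01", "esx-a", web)
ctl.on_vm_migrated("web01", "esx-b")
print(ctl.placement)
```

Binding the policy to the VM identity is what allows the controller, once informed of the move, to re-apply the same profile without any manual reconfiguration on the destination host.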
Lastly, Cisco vPath technology introduces NFV to the SDN-enabled virtual network infrastructures based on
Nexus 1000v and VMware. According to Cisco, “vPath provides the forwarding plane abstraction and
programmability required to implement the Layer 2 to Layer 7 network services such as segmentation
firewalls, edge firewalls, load balancers, WAN optimization, and others” (Cisco, 2013). It works by
redirecting VM-related traffic flows to service nodes (such as firewalls) for network policy enforcement. The
initial packet of a traffic flow is sent to the service node for policy evaluation, and the resulting decision is
cached so that subsequent packets are handled directly by vPath in the VEMs. Network policy may require
multiple service nodes to analyze the traffic, so vPath supports service chaining as well (for example,
traffic passes through a firewall, then goes to a load balancer). This way of handling data flows is similar to
other SDN products such as OpenFlow-enabled network controllers. vPath introduces the NFV functionality
in the form of vService Nodes (VSNs), which are virtual network appliances rather than dedicated physical
machines. The policy manager for vPath is called Virtual Network Management Center (VNMC); it is
responsible for integration with the virtual machine infrastructure (VMware vCenter), as well as for
providing the policies to the VSM through agent software running on it.
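The first-packet / fast-path split with service chaining can be illustrated by a flow cache: only the first packet of a flow reaches the policy decision point, and the verdict (allow or deny, plus the ordered service chain) is reused for the rest of the flow. This is a simplified sketch; `VemFastPath`, `example_policy`, the ports and the node names are hypothetical and not part of any Cisco interface.

```python
from typing import Callable, Dict, List, Tuple

FlowKey = Tuple[str, str, int, int, str]  # (src ip, dst ip, src port, dst port, protocol)
Verdict = Tuple[bool, List[str]]          # (allowed, ordered service chain)


class VemFastPath:
    """Toy model of vPath in a VEM: evaluate the first packet, cache the verdict."""

    def __init__(self, decide: Callable[[FlowKey], Verdict]):
        self.decide = decide
        self.cache: Dict[FlowKey, Verdict] = {}
        self.slow_path_lookups = 0

    def forward(self, key: FlowKey) -> List[str]:
        """Return the service nodes a packet traverses ([] means dropped)."""
        if key not in self.cache:
            self.slow_path_lookups += 1  # only the first packet of a flow goes to the decision point
            self.cache[key] = self.decide(key)
        allowed, chain = self.cache[key]
        return list(chain) if allowed else []


def example_policy(key: FlowKey) -> Verdict:
    # Hypothetical policy: HTTPS goes firewall -> load balancer, telnet is denied.
    _, _, _, dst_port, proto = key
    if proto == "tcp" and dst_port == 443:
        return True, ["edge-firewall", "load-balancer"]
    if proto == "tcp" and dst_port == 23:
        return False, []
    return True, ["edge-firewall"]


vem = VemFastPath(example_policy)
flow = ("10.0.0.5", "10.0.1.10", 51000, 443, "tcp")
for _ in range(3):
    path = vem.forward(flow)
print(path, vem.slow_path_lookups)  # ['edge-firewall', 'load-balancer'] 1
```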

Current SDN paradigms in data center environments – Cisco ACI

Fabric infrastructure
The first component of any Cisco ACI-based data center network is the fabric infrastructure. There are strict
rules on how network devices connect to each other, and the hierarchical reference topology needs a complete
redesign. Moreover, the current equipment is unable to run ACI services, so new network devices are
required.
To this end, the project approach is to deploy the ACI infrastructure in parallel with the existing reference
topology and to interconnect the two infrastructures via layer 2 links. Subsequently, services, servers, VMs,
and whole pods of FEXs may be migrated to ACI.
The fabric infrastructure is based on work developed by Charles Clos in the 1950s for telecommunications
networks (Abts, 2012). In the case of ACI, there are two types of network devices:

- The spine switches are the ones that provide the backbone of the data-center network. Their sole
purpose is to connect to leaf switches over high-speed 40 Gbps links;
- The leaf switches are the devices that connect everything else: servers, services, VM hosts and
external access (routers and bridges). These connections may be 1/10/40 Gbps, depending on the
end-hosts.
The rules of thumb for an ACI implementation are:
- One spine switch cannot connect to another spine switch;
- One leaf switch must connect to all spines within a domain.
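The two wiring rules above lend themselves to an automated check of a planned cabling list. A minimal sketch, assuming the fabric is described as an undirected list of (device, device) links with hypothetical device names; it checks only the two stated rules, not any other ACI requirement.

```python
from itertools import product


def validate_fabric(spines, leafs, links):
    """Check the two ACI wiring rules; return a list of violations (empty if valid)."""
    spines, leafs = set(spines), set(leafs)
    adjacency = {(a, b) for a, b in links} | {(b, a) for a, b in links}
    errors = []
    # Rule 1: a spine switch must not connect to another spine switch.
    for a, b in links:
        if a in spines and b in spines:
            errors.append(f"spine-to-spine link {a}-{b}")
    # Rule 2: every leaf switch must connect to every spine in the domain.
    for leaf, spine in product(sorted(leafs), sorted(spines)):
        if (leaf, spine) not in adjacency:
            errors.append(f"{leaf} is not connected to {spine}")
    return errors


spines = ["spine1", "spine2"]
leafs = ["leaf1", "leaf2", "leaf3"]
links = [(leaf, spine) for leaf in leafs for spine in spines]

print(validate_fabric(spines, leafs, links))  # []
# Drop the leaf3-spine2 cable and add an illegal spine-to-spine link:
print(validate_fabric(spines, leafs, links[:-1] + [("spine1", "spine2")]))
# ['spine-to-spine link spine1-spine2', 'leaf3 is not connected to spine2']
```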

Figure 3 – ACI Fabric Infrastructure (Cisco, 2014)

Looking at the diagram, it is easy to see that between any two endpoints there is a maximum hop count of 3
(two leafs and one spine). This provides predictable performance within the fabric, as each leaf switch is the
same distance away from every other leaf. This architecture eliminates the traditional hierarchy of core /
aggregation / services / access, as all network hosts and nodes are connected exclusively to leaf switches. It
is a more appropriate topology for East-West (server-to-server) data center traffic, which, as described in an
earlier section, consumes the vast majority of network resources within a modern data center.
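Because endpoints attach only to leaf switches, the worst case between two endpoints is the worst shortest path between two leafs. The hop-count claim can be checked with a breadth-first search over a hypothetical fabric of 2 spines and 6 leafs, counting switches traversed (both end switches included), as the text does.

```python
from collections import deque
from itertools import combinations


def switches_traversed(adj, src, dst):
    """BFS; number of switches on the shortest src -> dst path, counting both ends."""
    seen = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError(f"{dst} unreachable from {src}")


# Hypothetical fabric: every leaf wired to every spine, no spine-to-spine links.
spines = [f"spine{i}" for i in range(1, 3)]
leafs = [f"leaf{i}" for i in range(1, 7)]
adj = {n: set() for n in spines + leafs}
for leaf in leafs:
    for spine in spines:
        adj[leaf].add(spine)
        adj[spine].add(leaf)

worst = max(switches_traversed(adj, a, b) for a, b in combinations(leafs, 2))
print(worst)  # 3 (leaf -> spine -> leaf)
```

Adding more leafs or spines does not change the result, which is the source of the predictable latency described above.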
It is important to note that the underlying protocols running within the ACI fabric are very different from
those of traditional data center infrastructures. All traffic passing through the ACI fabric is encapsulated
using VXLAN (Virtual Extensible LAN) (Cisco, 2013) at the ingress leaf switch and decapsulated at the
egress leaf. Between the ACI nodes themselves, routing protocols (OSI’s IS-IS - Intermediate System to
Intermediate System, and MP-BGP - Multi-Protocol BGP) allow traffic to be forwarded through the fabric.
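The ingress/egress encapsulation step can be illustrated with the standard 8-byte VXLAN header defined in RFC 7348 (an "I" flag marking the VNI as valid, a 24-bit VNI, and reserved bits). This is a sketch of the plain RFC 7348 header only: ACI's internal fabric encapsulation may carry additional fields beyond it, the outer Ethernet/IP/UDP headers are omitted, and the VNI value is hypothetical.

```python
import struct

FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Ingress leaf: prepend the 8-byte VXLAN header to an inner Ethernet frame.
    (The outer UDP header, destination port 4789, is not built here.)"""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits. Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes):
    """Egress leaf: strip the VXLAN header and return (vni, inner frame)."""
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & FLAG_VNI_VALID:
        raise ValueError("I flag not set")
    return word2 >> 8, payload[8:]


frame = bytes(64)  # placeholder inner Ethernet frame
packet = vxlan_encap(frame, vni=10010)
print(packet[:8].hex())        # 0800000000271a00
print(vxlan_decap(packet)[0])  # 10010
```

The 24-bit VNI is what lets an overlay address roughly 16 million segments, compared with the 4094 usable VLAN IDs of a traditional layer 2 design.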

APIC network access policy definition


The APIC represents the SDN controller used by ACI devices (Cisco, 2014). In a typical implementation, it
consists of a minimum of three (possibly more) separate servers connected to leaf switches. All of these
servers maintain the ACI inventory database using a sharding mechanism (the database is split into shards,
and each shard is replicated on more than one server), which ensures that all data in the database remains
available if one server goes down.
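Why replicated shards survive a single server failure can be shown with a small placement model. This is a sketch under stated assumptions: the round-robin placement, the shard count and the replica count are hypothetical parameters chosen for illustration, not APIC's actual internal algorithm.

```python
from itertools import combinations


def place_shards(num_shards, servers, replicas):
    """Hypothetical round-robin placement: shard i lives on `replicas` consecutive servers."""
    if replicas > len(servers):
        raise ValueError("more replicas than servers")
    return {
        shard: [servers[(shard + r) % len(servers)] for r in range(replicas)]
        for shard in range(num_shards)
    }


def survives(placement, failed):
    """True if every shard still has at least one replica on a server that is up."""
    return all(any(s not in failed for s in owners) for owners in placement.values())


apics = ["apic1", "apic2", "apic3"]
placement = place_shards(num_shards=32, servers=apics, replicas=2)
print(all(survives(placement, {s}) for s in apics))                       # True
print(all(survives(placement, set(pair)) for pair in combinations(apics, 2)))  # False
```

With two replicas per shard, any single controller can fail without data loss, but some pair of failures leaves a shard with no live copy; raising the replica count trades storage for tolerance of more simultaneous failures.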
As opposed to regular SDN controllers, APIC uses high-level SDN abstractions in order to communicate
with the managed fabric nodes via the OpFlex protocol (Cisco, 2015). Instead of having the controller push
instructions to the managed switches, OpFlex states the desired end result and lets the managed device’s
control plane decide the best course of action to reach that end state. The APIC is not concerned with
lower-level SDN abstractions like flow tables and instructions, but with policy compliance; it therefore uses
a declarative management mechanism (it specifies the desired state that the SDN switch must achieve) as
opposed to an OpenFlow-style imperative one (it specifies step-by-step instructions to be executed on the
SDN switch).
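The difference between the two management styles can be reduced to who computes the delta. In the simplified sketch below, the configuration items are hypothetical strings, and neither function reproduces the OpFlex or OpenFlow message formats.

```python
def imperative_push(device_config: set, commands: list) -> set:
    """OpenFlow-style (simplified): the controller sends explicit steps; the device executes them."""
    config = set(device_config)
    for op, item in commands:
        if op == "add":
            config.add(item)
        elif op == "remove":
            config.discard(item)
    return config


def declarative_reconcile(device_config: set, desired: set):
    """OpFlex-style (simplified): only the end state is sent; the device's own
    control plane derives which changes are needed to reach it."""
    to_add = desired - device_config
    to_remove = device_config - desired
    return set(desired), sorted(to_add), sorted(to_remove)


current = {"vlan 10", "vlan 20", "acl permit-web"}
desired = {"vlan 10", "vlan 30", "acl permit-web"}

# Imperative: the controller must know the current state to compute the steps itself.
print(imperative_push(current, [("remove", "vlan 20"), ("add", "vlan 30")]) == desired)  # True

# Declarative: the device works out the delta on its own.
new_state, added, removed = declarative_reconcile(current, desired)
print(added, removed)  # ['vlan 30'] ['vlan 20']
```

In the declarative case the controller does not need an exact copy of every device's state, which is what lets APIC reason in terms of policy compliance rather than individual flow entries.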
Similar to OpenFlow controllers, the common SDN high-level abstractions (route, bridge) are embedded
within APIC, along with 3rd-party integrations for NFV (firewalls, load balancers, etc.). The 3rd-party
integrations allow APIC to communicate, via OpFlex or imperative methods, with network software provided
by other vendors, thus allowing any SDN / NFV compliant device to be included in the APIC-based
infrastructure.
The network policy definition within the APIC controllers is based on EPGs (Endpoint Groups) and the
contracts established between them. This means that the controller must be provided with all contracts (who
can talk to whom) in order to ensure proper communication; if no communication contract is specified, the
traffic will not flow through the ACI fabric. This concept is similar to zoning (a cornerstone principle of SAN
storage networks), and it applies not only to endpoints per se but also to entities external to the ACI fabric.
That is, an EPG may very well be defined as a set of networks external to the data center itself, with this EPG
being used to allow traffic in and out of the data center. The key advantage of using EPGs and contracts is that
once the policy has been defined and implemented, subsequent modifications (adding a new server, adding a
new firewall, removing decommissioned machines, migrating from a physical host to a VM) only need to
address EPG membership instead of the network policy.
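The whitelist behaviour of EPGs and contracts can be sketched as a membership table plus a set of allowed (consumer, provider, port) triples. The sketch assumes single-port contracts, permits traffic within the same EPG (an assumption made here for illustration), and does not model return traffic; the EPG names, endpoints and ports are hypothetical and not APIC object names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    consumer: str  # EPG that initiates traffic
    provider: str  # EPG that offers the service
    port: int      # hypothetical single TCP port filter


class Fabric:
    def __init__(self):
        self.membership = {}   # endpoint -> EPG name
        self.contracts = set()

    def allowed(self, src, dst, port):
        """Default deny: traffic passes only if a contract exists between the two EPGs."""
        src_epg, dst_epg = self.membership.get(src), self.membership.get(dst)
        if src_epg is None or dst_epg is None:
            return False
        if src_epg == dst_epg:
            return True  # assumption: intra-EPG traffic is permitted
        return Contract(src_epg, dst_epg, port) in self.contracts


fab = Fabric()
fab.membership.update({"web-vm1": "WEB", "db-phys1": "DB", "internet": "EXTERNAL"})
fab.contracts.update({Contract("WEB", "DB", 5432), Contract("EXTERNAL", "WEB", 443)})

print(fab.allowed("internet", "web-vm1", 443))    # True
print(fab.allowed("internet", "db-phys1", 5432))  # False: no contract
# Adding a server only changes EPG membership; the contracts stay untouched.
fab.membership["web-vm2"] = "WEB"
print(fab.allowed("web-vm2", "db-phys1", 5432))   # True
```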

Current SDN paradigms in data center environments – VMware NSX


VMware NSX is an SDN implementation for infrastructures that use the VMware suite of virtualization
products. It essentially consists of a controller (the NSX controller) and an overlay network built between
VMware hosts. As with Cisco ACI, the overlay network uses VXLAN; however, it does not take the
underlying hardware topology into account. While more consistent with classical SDN solutions, this
approach has several limitations, such as the lack of visibility beyond the overlay (the SDN network is not
able to react properly to changes or failures happening on the physical network) and the limited integration
with non-virtualized environments through VXLAN gateways (this SDN solution is designed mainly for VMs,
while applications and services may very well run on both VMs and physical servers).

Since the project scope encompasses an enterprise data center with mixed virtual and non-virtual servers,
the usability of this solution is restricted, and no further discussion of NSX will follow. It is, however,
important to mention it in the project report, since it represented one of the comparison criteria against which
both traditional SDNs and Cisco ACI were assessed.

Project roll-out
NOTE: This section corresponds to the “Account of project work and its outcome” heading in Table 3.1, which refers to the EMA structure recommendation.

Reference topology
As described in the previous sections, the focus of this project report is to propose a solution for the SDN and
NFV transition within a mid-sized, non-cloud data center environment. The reference topology will serve as
the basis of discussion from here on and represents the starting point for the transition to the proposed
solution.
The reference topology consists of an actual data center site of a medium-sized enterprise. For legal reasons,
details referring to the owner’s identity are omitted from this project report; suffice it to say that the owner
in question is a European technology company that operates a mid-sized data center infrastructure consisting
of about 1000 servers (if required, proof of project work on this topology may be provided by the company
in question, which is the project author's employer).
The topology described below is the result of the company's development and growth over a period of 8
years. It consists of both physical and virtual servers supported by a network infrastructure designed
according to the traditional approach to data center networking. It currently faces challenges related to
performance, SLA monitoring, service assurance and resource provisioning, similar to those that SDN
technologies aim to solve.

Figure 4 – Project Reference Topology (Visio Diagram #1)

Looking at the diagram, the following characteristics are to be noted:


- Infrastructure based on Cisco networking equipment, from which a derived requirement follows: the new
solution should keep the same technology vendor;
- Infrastructure based on 1 Gbps links, both for server connectivity and for inter-switch links;
- Each network node has its own control plane with its own management interface. On top of these,
several (proprietary) management solutions are used, mainly for monitoring purposes. Most resource
provisioning is done via CLI (command-line interface) on a node-by-node basis;
- Core and distribution layers are collapsed within the same physical switches;
- Services layer consists of physical machines, either in the form of dedicated service modules within a
modular switch chassis, or in the form of hardware network appliances;
- Access layer consists of large EoR (end of row) modular switches;
- Spanning tree protocol (STP) runs between data center switches, therefore some of the redundant
links are actively blocked in order to ensure a loop-free layer 2 topology;
- Routing is performed at the edge of the data-center (from core/distribution layer to outside networks)
and at the services layer (between core/distribution layer and services layer);
- Servers are distributed according to facility limitations (rack space, power availability, distribution of
HVAC - heating, ventilation, air conditioning) rather than logical separation (VLANs, routing domains,
etc.). Note also that the actual number of servers (1000+) cannot accurately be represented in the
diagram, so only a few of them are shown, in order to illustrate their position relative to the various data
center layers. The same applies to the number of network nodes (switches, routers, network service
appliances), which is actually close to 80;
- Virtualization (network) infrastructure is completely hidden from the data center topology. Network
policies are applied at the physical access layer. Separate virtualization teams run and manage the
VMware and Hyper-V clusters.
The goals set by the solution proposal are to:
- Migrate network infrastructure nodes to SDN technologies (fewer management points, policy
consistency, rapid resource provisioning);
- Add NFV on top of SDN and integrate NFV functionality with SDN policy provisioning (service
chaining);
- Review the interconnect options available between the data center and external entities from an SDN
perspective.
The means to achieve these goals is a phased approach, modifying one data center section at a time
without major traffic disruption. As the reference topology is far from a greenfield deployment, daily
business operations that rely on the DC infrastructure must continue unhindered by the structural
modifications being performed on the infrastructure. To this end, some intermediary steps that provide
backward compatibility with the status quo will be required.
Therefore, starting from the reference topology, the following phases have been identified as being required
for transitioning to an SDN solution:
- Reconfiguration of the access layer – during which server connectivity options are reviewed and new
access equipment is installed. In addition to this, access layer alignment between the physical servers
and the virtual servers should be achieved;
- Reconfiguration of the way connectivity is achieved between data center switches (1st stage) – during
which STP-based interconnects should disappear and new technologies like vPC (and subsequently ACI
spine-leaf topologies) should replace them. This is a two-stage approach, since the newer ACI
technologies will not be implemented from the beginning (lack of hardware compatibility, risk of
disruption to business operations, etc.);
- Moving network services from hardware appliances to virtual ones – during which firewall and load
balancer services are re-designed and migrated to a new solution that allows SDN / NFV integration,
while permitting existing client systems to function as they were;
- Implementing a parallel infrastructure with new spine-leaf connectivity between data center switches
(2nd stage) – during which an ACI topology is implemented along with an SDN controller (APIC). At
this stage, the APIC controller network access policy will be defined as well;
- Migration of access layer switches (both physical and virtual) to ACI.
All these phases are detailed in the following sections.

Access layer migration

Server connectivity
This section explores the way in which servers are connected to the data center network infrastructure. Note
that connectivity here refers to physical servers alone, not to logical connections within a virtualization
environment.
While servers may be equipped with a variety of network adapters and software configuration options that
allow for flexible connectivity, it is common for only a few such configurations to become the norm within a
single data center. The reference topology described above is no exception, so servers may be categorized by
their network access interconnect.
The simplest interconnect is the single-homed option. In this case, a single server NIC (Network Interface
Card) is connected to a single network port of an access switch. While simple to configure and
cost-effective from a network resource consumption perspective, this option has two big disadvantages.
First, there is no redundancy: should either the NIC or the access switch fail for any reason (power failure,
misconfiguration, circuit board fault, etc.), the entire server would become isolated from the network. The
only way to provide some degree of resiliency in a solution based on this type of interconnect is to rely on a
higher-level application feature, such as server clustering, which permits multiple servers to run the same
application. In this way, the end user may not be affected by the failure of a single cluster member,
since others are (presumably) still online and able to process requests. Second, the network throughput to
and from the server is limited to the capacity of that single NIC. For a one-gigabit NIC, this means that the
sum of all traffic in one direction may not exceed 1 Gbps (the actual limit is usually even lower due to
processing overhead, network management processes, the TCP/IP stack implementation of the operating
system, etc.). This limitation may become a problem in scenarios where more bandwidth is required, such as
streaming servers, file servers, or network storage systems. In conclusion, the single-homed interconnect
option is the least desirable from a network perspective. However, it is important to consider that some data
center systems may rely on this interconnect type.
Another server interconnect option is multi-homing the server to different network access switches. This
means that the server has more than one NIC (usually in increments of two), each connected to a separate
network switch. This option is better than the previous one from both a redundancy and a throughput
perspective. Several design options can be used here, depending on the degree of redundancy and network
throughput required. Since covering all aspects of multi-homed servers may prove very complex, the analysis
of the reference topology is limited to dual-homed scenarios. Even so, the dual-homed interconnect may be
analyzed from multiple perspectives, as follows.
While redundancy is assured by the use of multiple server NICs, it is important to be aware of the
dependencies that a redundant NIC configuration relies upon. For instance, server manufacturers provide
multiple NICs on the same board, such as the Intel I350 network adapter (Intel, 2015). This PCIe
(Peripheral Component Interconnect Express) card has 2 or 4 network interfaces that may be used as
separate NICs within a server configuration. However, all of these NICs still depend on a single card for both
redundancy and throughput. Should that PCIe card experience issues, it would be a single point of failure for
all NICs on it. For best redundancy, network connectivity should use redundant NICs on different PCIe cards
(where the server configuration allows it), in addition to separate access switches.
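The dependency analysis above can be turned into a simple audit of a server's NIC team: any component shared by every member of the team is a single point of failure. A minimal sketch; the `Nic` record, its fields and the slot and switch names are hypothetical inventory data, not output from a real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    name: str
    pcie_card: str      # physical adapter hosting this port
    access_switch: str  # upstream switch the port is cabled to


def single_points_of_failure(team):
    """Return shared components whose failure would take down every NIC in the team."""
    if not team:
        raise ValueError("empty NIC team")
    spofs = []
    if len({nic.pcie_card for nic in team}) == 1:
        spofs.append(f"PCIe card {team[0].pcie_card}")
    if len({nic.access_switch for nic in team}) == 1:
        spofs.append(f"access switch {team[0].access_switch}")
    return spofs


# Two ports of the same multi-port adapter, cabled to different switches.
same_card = [Nic("eth0", "slot1", "acc-sw-a"), Nic("eth1", "slot1", "acc-sw-b")]
# One port from each of two adapters, cabled to different switches.
split = [Nic("eth0", "slot1", "acc-sw-a"), Nic("eth2", "slot2", "acc-sw-b")]

print(single_points_of_failure(same_card))  # ['PCIe card slot1']
print(single_points_of_failure(split))      # []
```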
Another aspect influencing the dual-homed option is the way in which the NICs are used by the operating
system (OS) running on the server. Usually, specialized drivers and software bundle these NICs together into a
single logical entity (i.e. a bond or a team) that is presented to the OS. The TCP/IP stack therefore uses a
single virtual NIC for traffic processing, while multiple physical NICs are in fact used underneath. The
physical NICs may be used in either an active-standby or an active-active (aggregated) configuration.
The active-standby option allows one physical NIC to pass all server traffic while the other stays online
as a hot spare. Should something happen to the active link (NIC problem, cable problem, or switch problem),
the standby assumes the active role with minimal traffic disruption. Of course, how “minimal” the disruption
is depends on the mechanisms used to switch NIC roles and update the network, such as bridging, GARPs
(Gratuitous ARP - Address Resolution Protocol), aging of MAC table entries, ARP caches, or failure
detection mechanisms like beaconing.
The active-active option splits the server traffic between the available physical NICs. There are multiple
ways of achieving this; in the reference topology, however, only two of them are present. The first is to use a
link aggregation protocol (such as LACP - Link Aggregation Control Protocol) to negotiate a bundle of two or
more links between the server and the upstream switch, across which traffic is then load balanced. An
important limitation of this protocol is that it can only be used between one server and one switch. In the
context of a dual-homed server this limitation may be a showstopper, since dual-homed means two switches;
however, technologies like VSS and vPC (described in an earlier section) provide a means to circumvent such
restrictions.
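LACP itself only negotiates which links belong to the bundle; each side then decides locally which member link carries a given frame, typically by hashing header fields so that all packets of one flow stay on one link (avoiding reordering). The sketch below uses CRC32 over the 5-tuple as a stand-in hash: real switches and OS bonding drivers use their own configurable hash policies (for example the Linux bonding `xmit_hash_policy` options), and the member link names are hypothetical.

```python
import zlib


def pick_member_link(flow, members):
    """Choose a bundle member for a flow by hashing its 5-tuple.
    Same flow -> same link (no reordering); different flows spread across links."""
    key = "|".join(map(str, flow)).encode()
    return members[zlib.crc32(key) % len(members)]


members = ["eth0->acc-sw-a", "eth1->acc-sw-b"]  # hypothetical vPC member ports
flows = [("10.0.0.5", f"10.0.2.{i % 250}", 40000 + i, 445, "tcp") for i in range(1000)]

counts = {m: 0 for m in members}
for f in flows:
    counts[pick_member_link(f, members)] += 1
print(counts)
```

A consequence worth noting for the throughput discussion above: a single large flow can never exceed the capacity of one member link, since it is always hashed onto the same link.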
The second way is to have the server run a form of virtual switch software that is able to handle switch
functions such as layer 2 loop avoidance and MAC-based forwarding. For example, a server running VMware
ESXi with vSwitch technology can have multiple active “uplinks” to the physical network by statically
pinning VM MAC addresses to each vmnic (VMware’s representation of a physical NIC), while not learning
any other MAC addresses from the uplinks into its MAC address table (as opposed to the dynamic MAC
learning of a standard switch). This scenario can be further enhanced by aggregating vmnics together,
for example by using LACP (as described above).
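Static pinning can be sketched as assigning each VM virtual port to exactly one active uplink and re-pinning it only when that uplink fails. Because each VM MAC address appears on a single uplink at any time, the upstream switches never see the same MAC on two ports, so no loop is formed. The round-robin-by-port-ID rule is a simplification loosely modelled on VMware's "route based on originating virtual port" teaming policy, not its actual algorithm; the VM and vmnic names are hypothetical.

```python
def pin_vms_to_uplinks(vm_ports, uplinks):
    """Pin each VM virtual port to one active uplink (round-robin by port ID).
    Returns {uplink: [vm, ...]}."""
    pinning = {u: [] for u in uplinks}
    for port_id, vm in sorted(vm_ports.items()):
        pinning[uplinks[port_id % len(uplinks)]].append(vm)
    return pinning


def fail_uplink(pinning, failed):
    """Re-pin the VMs of a failed uplink onto the surviving uplinks (mutates `pinning`)."""
    survivors = [u for u in pinning if u != failed]
    if not survivors:
        raise RuntimeError("no surviving uplink")
    moved = pinning.pop(failed)
    for i, vm in enumerate(moved):
        pinning[survivors[i % len(survivors)]].append(vm)
    return pinning


vm_ports = {0: "web01", 1: "web02", 2: "db01", 3: "app01"}
pinning = pin_vms_to_uplinks(vm_ports, ["vmnic0", "vmnic1"])
print(pinning)                         # {'vmnic0': ['web01', 'db01'], 'vmnic1': ['web02', 'app01']}
print(fail_uplink(pinning, "vmnic0"))  # {'vmnic1': ['web02', 'app01', 'web01', 'db01']}
```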
In conclusion, the reference topology makes use of the following types of server interconnect:
- Single-homed server;
- Dual-homed server (active-standby);
- Dual-homed server (active-active).

Figure 5 – Server interconnect options (Visio Diagram #2)

This section is important because it directly influences the migration strategy of the access layer switches
from the reference topology. The various access layer design options will be weighed based on these server
connectivity requirements, thus enabling the selection of a model that fits all needs.

Physical access layer transition to FEX-based topologies


As seen in the reference topology, all servers and systems are connected to the data center network via
layer 2 EoR switches. This design choice suited the original layout and floor plan of the whole data room at
the time of its commissioning. However, as new systems came into production and additional rack cabinets
and pods were installed, the necessary extensions to the horizontal structured cabling became more
challenging, both in terms of investment and in terms of the space available for new cable runs. The new
solution proposal is based on replacing the EoR layer 2 switches with ToR fabric extenders (FEX) in order to
address those challenges. By keeping most of the wires within the rack cabinet (server to switch), the only
requirement from the structured cabling is to provide fiber optic switch uplinks out of each rack, which is
significantly less demanding, as one switch may have up to 48 ports for end host connectivity and only up to
8 ports for uplinks.
The ToR switches that form the new access layer are actually “dumb”, data-plane-only devices that cannot
function without a “parent” switch control plane. All network configuration is performed on the parent
device, with very little functionality available on the child device itself. Apart from configuration, local
switching between ports of the same FEX is also performed on the parent switch, so the parent-child structure
effectively mirrors a traditional modular chassis switch (except that parents and children may be located in
different rack cabinets, which the components of a modular switch cannot). The uplinks between child and
parent are based on 10-Gigabit fiber optic cables, over which runs a modified version of Ethernet, namely
VN-Tag technology, which adds an additional header with a special EtherType to the frames traversing the
links. There is no STP running between the parent switches and the FEXes, since they are seen as one single
switch, so the whole layer 2 topology tree is greatly simplified.
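The consequence of a FEX having no local switching can be shown with a toy forwarding model: even traffic between two ports of the same FEX travels up to the parent and back down. A sketch limited to single-homed FEXes under one parent; the port, FEX and parent names are hypothetical.

```python
def forwarding_path(src_port, dst_port, fex_of, parent_of):
    """Devices a frame traverses between two host ports in a parent-child topology.
    fex_of: host port -> FEX; parent_of: FEX -> parent switch.
    A FEX never switches locally, so every frame goes through the parent."""
    src_fex, dst_fex = fex_of[src_port], fex_of[dst_port]
    parent = parent_of[src_fex]
    if parent_of[dst_fex] != parent:
        raise ValueError("this sketch only models FEXes under the same parent")
    return [src_fex, parent, dst_fex]


fex_of = {"Eth101/1/1": "FEX101", "Eth101/1/2": "FEX101", "Eth102/1/1": "FEX102"}
parent_of = {"FEX101": "parent-A", "FEX102": "parent-A"}

print(forwarding_path("Eth101/1/1", "Eth101/1/2", fex_of, parent_of))  # ['FEX101', 'parent-A', 'FEX101']
print(forwarding_path("Eth101/1/1", "Eth102/1/1", fex_of, parent_of))  # ['FEX101', 'parent-A', 'FEX102']
```

Seen from the outside, the parent and its FEXes form a single switch with one control plane, which is why no spanning tree runs between them.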
Parent-child topologies have their limitations, and only a few interconnect options are supported. The
simplest option is the single-homed FEX, in which one or more FEX devices connect to a single parent
Nexus device. This, however, means that if the parent switch fails or loses connectivity for some reason
(fiber cut, software upgrade, malfunction, etc.), all its FEX children become isolated along with all connected
host interfaces. While this behavior is similar to that of the reference topology deployment, it may not provide
the best connectivity option.
The second option is dual-homing one FEX to two different parent switches. This topology makes two
independent control planes available to the FEX, so a single parent failure has little to no effect on the
downstream servers. Dual homing requires the parents to be aware of each other via the vPC protocol, so
that configuration provisioning, modifications, conflicts, and topology changes are handled consistently
across both control planes.
This better redundancy comes at a cost, as the maximum FEX capacity per parent switch is effectively
halved compared with the single-homed option. Assuming the same number n of FEX ToR devices, with the
single-homed option spread across a (redundant) pair of parent switches, each parent needs to support n/2
children, while with the dual-homed option each parent needs to support all n children. In addition, dual
homing FEXes requires vPC as well as synchronization of the two parent control planes. Unfortunately,
synchronizing two independent control planes may prove a cumbersome and often manual process.
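The capacity penalty translates directly into the number of parent switches to purchase. A small worked example, assuming a hypothetical per-parent limit of 24 FEXes and 40 ToR FEXes (actual limits depend on the parent platform and software release), with parents always bought in pairs.

```python
import math


def parents_needed(n_fex, max_fex_per_parent, dual_homed):
    """Parent switches (in full pairs) needed for n_fex ToR FEXes.
    Single-homed: each FEX uses one parent, so a pair carries 2 * max FEXes.
    Dual-homed: each FEX uses both parents of its pair, so a pair carries only max FEXes."""
    fex_per_pair = max_fex_per_parent if dual_homed else 2 * max_fex_per_parent
    return 2 * math.ceil(n_fex / fex_per_pair)


print(parents_needed(40, 24, dual_homed=False))  # 2
print(parents_needed(40, 24, dual_homed=True))   # 4
```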
Selecting either FEX topology requires a good understanding of the servers that need connectivity (hence
the discussion of server connectivity options in the previous section). As the Nexus family of products is data
center grade, and one of the most important data center design features is redundancy, all switches will be
deployed in pairs regardless of how the FEXes connect to their parents. Most servers will be dual-homed to
one pair of ToR switches, as described above, and all ToR switches need to connect (single- or dual-homed)
to one pair of parents, as seen in the diagram below:

Figure 6 – Dual-homed FEX topology (Visio Diagram #3)

Both options are suited to normal operations; however, possible failures and outages of network devices
need to be taken into account. As described above, in the single-homed FEX scenario the failure of one parent
is sensed by the server. Even in redundant dual-homed server configurations, the NIC configurations may be
misaligned with the rest of the infrastructure (e.g. default NIC teaming settings, different timeout
parameters, etc.). This may translate into a temporary loss of connectivity for the entire server while the
traffic switches over to the remaining NIC. Unfortunately, some application-level features, such as clusters,
may become unsynchronized during this short outage and initiate a cluster-wide switchover, which results in
a much longer outage from the application perspective. Moreover, for single-homed servers, the loss of their
only network port means complete network isolation until the network issue two layers above is fixed.
The conclusion is that, in spite of the capacity penalty and the more difficult management, the best option
for access layer migration of the reference topology remains the dual-homed FEX. This option provides the
best redundancy and incident containment as far as the downstream servers are concerned.
The practical aspect of this stage involves grafting the parent-child tandem onto the access layer of the
reference topology, followed by the one-by-one migration of each rack cabinet of servers to the new
equipment.

Figure 7 – FEX integration with the Reference Topology (Visio Diagram #4)

Virtual access layer transition from vSwitch to Nexus 1000v


The virtual access layer of the reference topology consists of VMware hosts running vSwitch software,
connected as regular servers to the access layer switches. Each independent vSwitch is managed by the
virtualization team, with little coordination with the network infrastructure team. This situation may pose
significant issues for consistent network service delivery across all servers within the infrastructure. For
example, an application cluster may be formed from 3 bare-metal installations on physical servers,
supplemented by 2 additional virtual machines. If VLAN access, security ACLs or QoS settings are not
applied consistently across both physical and virtual servers, performance and availability issues may occur.
These issues will be mitigated by implementing central management software (a.k.a. an SDN controller)
within the existing solution.
The alignment strategy chosen for the virtual access layer is to replace the existing independent networking
software (vSwitch) on each VMware host with a distributed virtual switch that is able to centrally control all
networking components within a virtual infrastructure, as well as to provide the much-needed visibility and
integration with the rest of the data center infrastructure, especially with regard to network policy
consistency.
There are three options available for vSwitch replacement: VMware’s own DVS (Distributed Virtual
Switch), which works in conjunction with the vSphere solution; NSX, the end-to-end SDN virtual network
solution proposed by VMware; and Cisco Nexus 1000v, which represents a joint Cisco-VMware effort meant
to augment the capabilities of DVS. While all three options are able to centrally manage individual host
switches, Nexus 1000v is deemed the better option because it is the only one able to provide the required
network policy consistency across both physical and virtual infrastructure; the other two products are
designed with a focus on the virtual infrastructure alone, with few features available for physical integration.

Figure 8 – Cisco Nexus 1000v – VSM and VEM relationship (Heffer, 2012)

As Nexus 1000v is the implementation choice, a brief overview of its functionality follows. Following the
parent-child schema, the Nexus 1000v solution has two components, as described in an earlier section. The VEM
(Virtual Ethernet Module) is the networking software running on each virtualization host that actually
handles VM traffic (data plane). The VSM (Virtual Supervisor Module) is the parent software (SDN
controller / control plane) that enforces policy on the VEMs. It usually runs as a virtual machine, either on
the same infrastructure it controls from the networking perspective, or on a separate network appliance. It is
worth mentioning that, unlike their physical counterparts (line cards in a modular switch), the VEMs can pass
traffic without the aid of the VSM: should the VSM (or VSM pair) fail, they continue to forward traffic, but
the policies governing that traffic can no longer be modified. That being said, VSM redundancy is still
needed so that regular network operations are not affected by a single VSM failure.
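The split between control plane and data plane described above can be illustrated with a small model. This is a minimal sketch, not Cisco code: the classes `Vem`, `Vsm` and `VsmPair` and the policy layout (port-profile name mapped to a VLAN ID) are hypothetical, and they only mimic the behavior stated in the text. VEMs keep forwarding with the last policy they received, policy changes need an active VSM, and the standby VSM takes over when the active one fails.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vem:
    """Hypothetical data-plane module (VEM) on one virtualization host."""
    host: str
    # Last policy programmed by the VSM: port-profile name -> VLAN ID.
    policy: dict = field(default_factory=dict)

    def forward(self, profile: str) -> str:
        # Forwarding only reads the locally cached policy, so it keeps
        # working when no VSM is reachable.
        vlan = self.policy.get(profile)
        if vlan is None:
            return f"{self.host}: dropped"
        return f"{self.host}: forwarded on VLAN {vlan}"


@dataclass
class Vsm:
    """Hypothetical control-plane supervisor (one unit of the HA pair)."""
    name: str
    alive: bool = True


class VsmPair:
    """Active-standby VSM pair that pushes policy to every VEM."""

    def __init__(self, primary: Vsm, secondary: Vsm, vems: list):
        self.units = [primary, secondary]
        self.vems = vems

    def active(self) -> Optional[Vsm]:
        # The first unit that is still alive acts as the active VSM.
        return next((u for u in self.units if u.alive), None)

    def set_policy(self, profile: str, vlan: int) -> bool:
        if self.active() is None:
            return False  # no control plane left: policy cannot change
        for vem in self.vems:
            vem.policy[profile] = vlan
        return True


vems = [Vem("esx-01"), Vem("esx-02")]
pair = VsmPair(Vsm("vsm-a"), Vsm("vsm-b"), vems)
pair.set_policy("WEB", 10)
pair.units[0].alive = False        # active VSM fails
print(pair.active().name)          # vsm-b (standby took over)
pair.units[1].alive = False        # both VSMs are now down
print(pair.set_policy("DB", 20))   # False
print(vems[0].forward("WEB"))      # esx-01: forwarded on VLAN 10
```

The model deliberately ignores VSM-to-VEM communication details; it only shows why VSM redundancy protects policy changes rather than traffic forwarding itself.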
The first step of the Nexus 1000v deployment is to deploy the VSMs in a network segment of the reference
topology that has connectivity to the VMware vCenter server and to the VMkernel ports of the hosts. As
mentioned before, the VSMs are VMs that could run on the very same virtualization infrastructure that they
would eventually control (networking-wise), much like some of the supported vCenter deployments. This does,
however, raise a chicken-and-egg problem when some components of the solution fail. Therefore, it
is considered best practice to deploy the control elements on a separate machine, in this case a Nexus 1110-X
CSP (Cloud Services Platform). This machine is essentially a regular x86 Intel architecture server running
KVM virtualization, on top of which sit the VSM VMs. This choice was made because, besides the Nexus 1000v
software, the 1110-X CSP can run additional VMs (VSBs, Virtual Service Blades) that are needed at later
stages of the current project; specifically, the NFV implementation of the load-balancer functionality is
installed on this same physical server.
Once the VSM pair is deployed in active-standby mode, the active unit is connected to VMware's vCenter, and
the VEM software packages are deployed and enabled on each virtualization host. The new virtual network does
not replace the existing vSwitch infrastructure but runs in parallel with it, which ensures that no traffic is
disrupted during the Nexus 1000v installation.
Once all components are installed, the networking configuration of both uplink and port profiles (the Nexus
equivalent of VMware's port groups) can begin, as a final preparation stage before the actual migration of VM
networking. For the uplink ports (the vmnics of the virtualization hosts), new server NICs were used instead of
the existing ones, which remain in use by the vSwitches.
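Because port profiles are the Nexus 1000v equivalent of vSphere port groups, preparing them is largely a translation of the existing port-group inventory. The sketch below generates port-profile configuration text from such an inventory. It is a minimal sketch under assumptions: the port-group names, VLAN numbers and the profile name `N1K-UPLINK` are hypothetical, and the generated lines follow the commonly documented Nexus 1000v `port-profile type vethernet` / `port-profile type ethernet` syntax, which should be checked against the configuration guide of the installed release. Options that a real uplink profile usually needs (for example `system vlan` or `channel-group auto`) are intentionally left out.

```python
def render_port_profile(name: str, vlan: int) -> str:
    """Render a vEthernet (VM-facing) profile published to vCenter as a port group."""
    lines = [
        f"port-profile type vethernet {name}",
        "  vmware port-group",            # publish the profile to vCenter
        "  switchport mode access",
        f"  switchport access vlan {vlan}",
        "  no shutdown",
        "  state enabled",
    ]
    return "\n".join(lines)


def render_uplink_profile(name: str, vlans: list) -> str:
    """Render an Ethernet (uplink / vmnic) profile trunking all VM VLANs."""
    allowed = ",".join(str(v) for v in sorted(vlans))
    lines = [
        f"port-profile type ethernet {name}",
        "  vmware port-group",
        "  switchport mode trunk",
        f"  switchport trunk allowed vlan {allowed}",
        "  no shutdown",
        "  state enabled",
    ]
    return "\n".join(lines)


# Hypothetical inventory: existing vSwitch port group -> VLAN ID.
port_groups = {"VM-WEB": 110, "VM-APP": 120, "VM-DB": 130}

config = "\n!\n".join(
    [render_uplink_profile("N1K-UPLINK", list(port_groups.values()))]
    + [render_port_profile(pg, vlan) for pg, vlan in port_groups.items()]
)
print(config)
```

Keeping the profile names identical to the old port-group names is one possible design choice; it makes the VM-by-VM migration a simple reassignment of each VM's network adapter.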
From this point on, virtual machine networking can be migrated on a VM-by-VM or host-by-host basis, as
dictated by the availability requirements and maintenance windows available to the virtualization team. In the
end, all VMs should benefit from the same networking features as their physical counterparts. In addition, the
network provisioning responsibility shifts from the virtualization team to the data center networking team,
which is responsible for policy enforcement in the first place.
By the end of this stage the reference topology access layer should be fully migrated to the SDN
technologies of FEX and Nexus 1000v.

Figure 9 – Nexus 1000v integration with the Reference Topology (Visio Diagram #5)

Aggregation and core layers – Stage 1 – vPC


The previous stage should have completed most of the tasks concerning server re-configuration and
re-cabling. This allows the project to continue with minimal involvement from the server and storage teams,
which means less effort is needed to synchronize implementation steps across different teams. Since the
aggregation and core layers interconnect only nodes such as switches and network service appliances, the
network team can handle this stage alone.
The main objective here is to eliminate the use of STP from the core and aggregation layers. Doing so greatly
increases the bandwidth available between nodes, since all available physical links load-balance data traffic
instead of behaving in an active-backup manner. The active-backup behavior occurs because STP, running on
layer 2 network devices, builds a loop-free topology in the form of a tree, thus removing any redundant
network links from the data traffic path. While this is a legitimate and time-proven method (developed in
1985, with standards published in 1990, 1998, and 2004), it has drawbacks for the data center core and
aggregation layers. The most significant of these drawbacks are:
- Links in DC environments are expensive, and both structured cabling and hardware platform port
density may be limited. Keeping links blocked by STP wastes this capacity, especially in the case of
10 Gbps or 40 Gbps links.
- Topology changes trigger the STP convergence mechanisms, which translates into downtime for data
traffic during the STP tree recalculation. Based on the original IEEE 802.1D specification, this
downtime may add up to 50 seconds (20 seconds for the BPDU max age, plus 15 seconds each for the
STP listening and learning states). While later extensions of the protocol, such as IEEE 802.1w
RSTP (Rapid STP), reduce the downtime to 6 seconds or less (3 times the Hello interval of 2
seconds), some data center environments cannot tolerate any downtime, regardless of its
duration.
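The convergence figures quoted above follow directly from the default protocol timers. The helper functions below (hypothetical names) only reproduce that arithmetic with the 802.1D default values of max age 20 s, forward delay 15 s and hello 2 s. They do not simulate either protocol, and in practice RSTP often converges faster than the 6-second figure because link failures on point-to-point links are detected directly rather than through missed hellos.

```python
# Default IEEE 802.1D timer values, in seconds.
HELLO = 2
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_worst_case(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """802.1D: max age expiry, then listening and learning (one forward delay each)."""
    return max_age + 2 * forward_delay


def rstp_detection(hello=HELLO, missed_hellos=3):
    """802.1w: a neighbor's information is aged out after 3 missed hellos."""
    return missed_hellos * hello


print(stp_worst_case())   # 50
print(rstp_detection())   # 6
```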
Therefore, in order to eliminate STP from the core and aggregation links, technologies such as vPC and VSS
need to be used. As will be seen in a later stage, even these may not prove to be the optimal choice. However,
when approaching a data center project that changes the underlying topology, a major concern is keeping the
disruption of live traffic to a minimum, even if this means longer implementation schedules and additional
intermediary stages. VSS and vPC were detailed earlier in this report, so little additional information about
them is given in this section.
The goal of this stage is to modify the reference topology as shown in the diagram below. Fortunately, in the
real network on which the reference topology is based, the core and aggregation devices were capable of
VSS / vPC / LACP with minimal intervention (e.g. software upgrades). Other topologies, however, may require
new equipment to replace existing devices, which increases the risk of prolonged network downtime. As can
be seen in the diagram, the physical links between core and aggregation are bundled into logical links,
effectively building a loop-free topology. It is important to note that STP is still running on the network
infrastructure, so the "elimination of STP" is not technically achieved. However, the failure of a physical link
inside a bundle is contained within the logical link and does not trigger STP recalculations that may affect
the entire tree.

Figure 10 – VSS and vPC implementation for core / aggregation layers with the Reference Topology (Visio Diagram #6)

Services layer migration – NFV appliances


This section deals with NFV implementation at the services layer. For the reference topology, this translates
into adding new firewalls and network load balancers in parallel with the existing ones. These new devices are
virtual machines, as opposed to the existing dedicated physical appliances. The rationale behind this decision
is based on the following factors:
- Hardware appliances are usually expensive and may not scale well. For example, assume that an
existing network firewall runs at 80% load, and that a new application being deployed requires an
additional 25% of that firewall's capacity. This means that a new (usually similar, for ease of
management) hardware box must be purchased. The total load on the two boxes will then be 52.5%
(80 + 25 = 105 out of 100 + 100 = 200, i.e. 52.5%). In contrast, VMs can either be modified so as to
support the additional load (more VM CPU, RAM or disk at the hypervisor level), or additional
VMs can be created on the same virtualization platform (no new equipment purchase).
- Multiple applications usually share a single hardware network appliance. If something goes wrong on
this box (electrical problem, misconfiguration, etc.), all the applications suffer downtime. By using
separate network service VMs for each application or set of applications (running on different
virtualization servers), most problems can be contained within a single VM and its application set.
- While some network service devices require dedicated hardware boards in order to perform
computationally intensive tasks like IPSec encryption or SSL offload, there are use cases in which
those resources are not needed. For example, a data center firewall may only be required to
maintain access policies based on simple permit / deny statements and standard protocol inspection.
In this case, a hardware box with built-in dedicated encryption boards makes little sense, since a
VM running on a standard x64 architecture can perform the same job. Even when encryption is
required, there are workarounds, since PCIe crypto accelerator cards for x64 servers are available
for use in virtualization environments (such as the Cavium Nitrox SSL accelerator cards used in
Citrix Netscaler).
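The scaling example in the first point above can be checked with a few lines of arithmetic. The function names are hypothetical and the model is deliberately simple: it assumes identical appliances and traffic that can be spread evenly across them, which real deployments do not always allow.

```python
import math


def appliances_needed(total_load_pct: float, capacity_pct: float = 100.0) -> int:
    """Number of identical hardware boxes needed to carry the total load."""
    return math.ceil(total_load_pct / capacity_pct)


def average_utilization(total_load_pct: float, boxes: int, capacity_pct: float = 100.0) -> float:
    """Average load per box, as a percentage of one box's capacity."""
    return 100.0 * total_load_pct / (boxes * capacity_pct)


total = 80 + 25                     # existing load + new application, in % of one box
boxes = appliances_needed(total)    # 105% does not fit in one box
print(boxes, average_utilization(total, boxes))   # 2 52.5
```

The same calculation shows the cost of the hardware approach: almost half of the purchased capacity (47.5%) sits idle after the upgrade.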
Looking at the reference topology, the current network services are provided by hardware appliances, either
as standalone devices or as service modules (line cards) within a modular network switch. The
application requirements dictate the following scenarios:
- Load balancer services are required to provide advanced functionality, including SSL termination;
- Firewall services are required to provide basic access control (no VPN, no IPSec) and protocol
inspection.
Both sets of requirements will be addressed using NFV technology.
It is important to note that this section considers two different service migration methods. In the first method,
once the NFV appliances are up and running, services are stopped on the existing hardware appliances and
ported over. In the second (potentially safer) method, current services continue to be provided from the old
boxes until they are no longer needed (end of lifecycle), while any new services are brought up directly on
the new service VMs.

Load-balancers
In the case of LBs, a combination of Cisco x64 servers, open-source virtualization software, hardware
acceleration cards, and Citrix load balancing software is chosen to demonstrate a complex NFV appliance
deployment. The table below shows a simplified structure of the implemented solution:

VSB (Virtual Service Blade) #1: Citrix Netscaler software VM | VSB #2: Other network appliance VM | VSB #n: Other network appliance VM
Cisco NX-OS – management shell / command line interface
Open source virtualization software – KVM (Kernel-based Virtual Machine)
Cisco x64 server architecture: UCS C220 M3 server (similar to other garden-variety servers) | Dedicated crypto capability: Cavium Nitrox III PCI-e card

As seen above, the virtual service appliance runs on top of management and virtualization software, which in
turn runs on top of an x64 server architecture. The server's physical network card, connected via layer 2 trunk
links to the aggregation switches, provides connectivity to the rest of the data center infrastructure.
For the reference topology integration, two virtualization servers were used in order to provide full
redundancy. Up to eight VMs may be configured on top of these servers, in line with the factors described
above. The two servers (N1K-1110-X / CSP – Cloud Services Platform), along with their redundancy
relationships (HA = High Availability) and the internal connectivity path between the VM appliance
(NS10.5) and the rest of the network (WS-C6513 and WS-C3750 switches), are illustrated in the
diagram below:

Figure 11 – Netscaler 1000v integration with Reference Topology (layer 2 view) (Visio Diagram #7)

From the logical perspective, network service diagrams look completely different, as is usually the case with
any type of topology overlay (e.g. a layer 7 service topology running on top of a layer 2/3 network
topology). The additional diagrams below illustrate two different service deployment options that were
considered for implementation within the reference topology, as well as one diagram that provides the NFV
appliance perspective of the larger layer 2/3 data center topology (TD = traffic domains). These diagrams are
only intended to show that using VMs instead of hardware appliances changes very little in the logic of the
network service itself, and that most of the implementation work is transparent to the administrators of that
service (in this case, the web-services team that manages the load-balancers).

Figure 12 – Netscaler integration with Reference Topology – Scenario #1 – layer-7 view (Visio Diagram #8)

Figure 13 – Netscaler integration with Reference Topology – Scenario #2 – layer-7 view (Visio Diagram #9)

Figure 14 – Netscaler integration with Reference Topology – layer-3 view (Visio Diagram #9)

Firewalls
The firewall implementation is quite similar to the load balancer one. All key points addressed earlier still
stand and will not be repeated. This section is included in the project report mainly to provide evidence of
the work done on firewall NFV appliances.
The included diagram shows a Cisco ASA (Adaptive Security Appliance) firewall VM deployed on an
existing VMware infrastructure environment, along with the logical network diagram and the corresponding
relationships between network and virtualization items.
This section also shows that the services layer of the reference topology can be implemented using more
than one virtualization solution (VMware, Linux KVM), depending on need. In the firewall case, there is no
need for dedicated hardware cards, so plain servers and standard virtualization software are used.
Note also that particular text and notations within the NFV diagrams (such as IP addresses and configuration
commands) are not relevant to the overall reference diagram; they are labels used by the author during lab
testing and research.

Figure 15 – Cisco ASA-v running on top of VMware ESXi (Visio Diagram #10 – contains screenshot from lab vSphere Client)

Aggregation and core layers – Stage 2 – ACI


While initially part of the project scope, this stage is kept in the document for reference only, and no
implementation details follow. The reasons for this are as follows:
- Lack of implementation resources (mainly time and available equipment);
- The implementation timeline of the data center owner (whose data center the reference topology is
based on) changed from the timeline assumed for TM470, so no integration with the production
environment was possible.
Work on Cisco ACI is therefore limited to the review in the next section, which only covers future planning
of work on the reference topology.

Current stage of project work


Work within the project scope stopped at Stage 2 of the core and aggregation layer modification, as
described above. The objectives met by the project are as follows:
- The physical access layer was consolidated using FEX technology with ToR switches. The access
topology is considered suitable for all types of existing server interconnects;
- The virtual access layer was upgraded to a distributed virtual switch solution based on Nexus 1000v.
vPath support is present, although no service chaining on NFV devices is currently implemented;
- The core and aggregation layers have been upgraded to 10 Gbps links. No redundant links are
blocked; all of them actively carry traffic through bundling techniques such as LACP, VSS, and
vPC. A PoC (Proof of Concept) laboratory for Cisco ACI is connected via layer 2 links to the core of
the existing data center. The PoC consists of just one spine switch with two leaves and one APIC
controller, and is therefore not suitable for production, given vendor recommendations and
redundancy requirements. Furthermore, the APIC controller is configured with a permit any-to-any
policy instead of conditional access. Additional equipment needs to be purchased before further
implementation can take place.
- The services layer currently consists of both physical network service devices and NFV appliances
(VMs). Existing services are gradually being migrated to NFV with minimal traffic disruption, while
new services are implemented directly on NFV.
The reference topology has now changed as illustrated in the diagram below. SDN and NFV technologies
have been implemented at every functional layer of the hierarchical data center model. The reference
topology is currently used in a production environment by the author's employer and provides services for
over 500 servers (out of the initial 1000; the rest are awaiting approval for migration). Evidence of this may
be provided on request by the company that owns the data center environment described throughout the
project report.

Figure 16 – Reference Topology by the end of project work (Visio Diagram #11)

Project management

Lifecycle model
The most appropriate lifecycle model for the project is the structured-case model. The following attributes
of the structured-case model seem to best fit the current project:
- The author’s own experience with SDN may provide some initial preconceptions on the project
approach (as already seen in the initial e-mail exchange with the allocated tutor), and these may not
always fit very well with the project scope;
- It acknowledges the fact that a pre-determined outcome may be difficult to achieve (e.g. due to
scarcity of relevant literature), however, when assessing the overall success of the project, it takes
into account the “discernable and helpful contribution to everyone’s understanding of the problem
area”;
- It allows for flexibility through an evolving conceptual framework for the researcher's aims, goals
and understanding. For example, this lifecycle model allows for problem redefinition (as already
specified in an earlier section, project progress may indicate that such a direction is required).
- It provides for an iterative research cycle, which in turn fits with the (rather busy) schedule of the
project (not everything needs to be researched at once, and with 100% understanding, before moving
on to other tasks within the project);
- The scope of the project relates to SDN, which is a contentious topic with technology developers and
providers. There are different perspectives and recommendations that may contradict each other, and
currently there is no definitive agreed-upon way to implement SDN. The proposed project has a
sizable amount of exploratory work within a “messy” subject matter. Therefore an evolving
conceptual framework will help to bring some structure to an otherwise open-ended subject.

Resources
As specified by the TM470 TMA/EMA instructions, resources such as OU Library access or basic PC
functionality are not discussed here. The following resources have been identified as distinctive to the
current project approach:
- Technology experts and vendor information access – Some documentation cannot be obtained from
the Internet or the University Library, and access to restricted documentation repositories may
require additional entitlements. Such is the case of Cisco PEC (Partner Education Connection)
and technology demo laboratories.
- Purchase of audio-visual material from technology education providers – Much of the available
documentation (especially for Linux-based technologies) has limited application and may require
prerequisite knowledge of the subject matter. Purchasing (and auditing) educational content may
ease the understanding of some topics.
- Hardware equipment – As stated earlier, the project may evolve to a stage in which testing and
implementation need to take place in the absence of existing, documented material. This will
probably involve a server chassis running virtualization software.
Skills
The skills and methods required to work with the above-mentioned resources include (in no particular
order):
- Access to restricted resources – such access is available due to existing accounts and job status;
- Ability to understand vendor terminology and taxonomy, as well as to distinguish between marketing
and technical information – job experience should help;
- Time-management skills – needed to cope with a large number of overlapping tasks; these skills
already exist, with further development under way;
- Soft skills – in order to obtain appropriate information from technology experts, a clear problem
statement and expected results must be formulated. This may prove time-consuming, since the
communication has to be adapted to each expert's area of expertise and technology background, and
to both technical and corporate audiences; again, job experience should help;
- Technology skills – general networking, Linux, Cisco and VMware expertise is required.
- Access to hardware – before production assessment, lab environments may be used, either from
technology vendor online labs or as resources rented from cloud environments.

Risks
The following risks have been identified:
- Spending project resources (i.e. time) investigating dead-ends => skim-reading some resources
instead of reading them cover-to-cover;
- Project scope over-reaching => setting an end-date in the project schedule for the final definition of the
problem statement (whatever was successfully evaluated by that date will be included in the scope,
everything else will be discarded);
- Expert feedback not relevant => lab setups and tests;
- Lack of relevant documentation on a specific topic => removing topic from project scope;
- Overlapping project activities => task prioritization, with lower priority tasks being assigned to
different time frames;
- Not being able to deliver 100% of project outcomes => 90% would have to do.

Personal development
The project work built upon existing expertise for the core part, but new skill-sets were acquired as well.
While these skill-sets come from both the technological area and the project approach area, this section
focuses on the non-technical ones. Some of the most notable personal achievements are research skills,
familiarity with project management concepts (definitely not expertise, but still important), and a better
assessment of one's capability to undertake a large and complex task (planning, researching, and estimating
the duration of such tasks will definitely be much easier in the future).

References
Casado, M. et al. (2014) – “Abstractions for Software-Defined Networks” – Communications of the ACM, Oct 2014, Vol. 57, Issue 10, p. 86-95

Raza, M. H. et al. (2014) – “A Comparison of Software Defined Network (SDN) Implementation Strategies” – Procedia Computer Science, Vol. 32, p. 1050-1055, [Online]. Available at: http://www.sciencedirect.com/science/article/pii/S1877050914007327 (last accessed 1 March 2015)

Chowdhury, M. K. et al. (2010) – “A survey of network virtualization” – Computer Networks, Vol. 54, p. 862-876, [Online]. Available at: http://rboutaba.cs.uwaterloo.ca/Papers/Journals/2010/Mosharaf10.pdf (last accessed 1 March 2015)

Kobayashi, M. et al. (2014) – “Maturing of OpenFlow and Software-defined Networking through deployments” – Computer Networks, Vol. 61, 14 March 2014, p. 151-175

ETSI (2012) – “Network Functions Virtualisation – Introductory White Paper” – SDN and OpenFlow World Congress, Darmstadt, Germany, [Online]. Available at: http://portal.etsi.org/NFV/NFV_White_Paper.pdf (last accessed 1 March 2015)

Cisco Systems Inc. (2015) – “OpFlex: An Open Policy Protocol”, [Online]. Available at: http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-731302.pdf (last accessed 10 Sept 2015)

Cisco Systems Inc. (2007) – “Cisco Data Center Infrastructure 2.5 Design Guide – Cisco Validated Design”, Ch. 1, p. 7-16, [Online]. Available at: http://www.cisco.com/application/pdf/en/us/guest/netsol/ns107/c649/ccmigration_09186a008073377d.pdf (last accessed 25 Apr 2015)

VMware Inc. (2013) – “VMware vSphere 5.5: Install, Configure, Manage” – “Module 2: Software-Defined Data Center”, p. 13-32, VMware Student Guide

Maurer, T. (2014) – “HYPER-V IS EATING VMWARE’S LUNCH”, [Online]. Available at: http://www.thomasmaurer.ch/2014/07/hyper-v-is-eating-vmwares-lunch/ (last accessed 25 Apr 2015)

Cisco Systems Inc. (2013) – “Cisco vPath and vServices Reference Guide for VMware vSphere”, p. 11-15, [Online]. Available at: http://www.cisco.com/c/en/us/td/docs/switches/datacenter/vsg/sw/4_2_1_VSG_2_1_1/vpath_vservices/reference/guide/vPath_vService_Reference.pdf (last accessed 25 Apr 2015)

Abts, D. et al. (2012) – “A Guided Tour of Data-Center Networking” – Communications of the ACM, Vol. 55, No. 6, p. 44-51

Cisco Systems Inc. (2014) – “The Cisco Application Policy Infrastructure Controller”, [Online]. Available at:
http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/unified-fabric/white-paper-c11-730021.html (last accessed 1 March 2015)

Cisco Systems Inc. (2014) – “Cisco Application Centric Infrastructure Design Guide”, [Online]. Available at: http://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/application-centric-infrastructure/white-paper-c11-731960.pdf (last accessed 10 Sept 2015)

Cisco Systems Inc. (2013) – “Data Center Overlay Technologies”, [Online]. Available at: http://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-730116.html (last accessed 1 March 2015)

Heffer, R. (2012) – “Components of the Cisco Nexus 1000V on VMware vSphere”, [Online]. Available at: http://www.rayheffer.com/how-to-install-the-cisco-nexus-1000v-distributed-virtual-switch/ (last accessed 2 June 2015)

Appendices

Spanning Tree Protocol (STP)


Spanning tree protocol is a loop avoidance mechanism used in layer 2 Ethernet topologies. It works by
constructing a tree from all participating layer 2 bridges within a topology. At the root of the tree is a single
switch (bridge) known as the root bridge. This bridge is selected through an election process based on the
BPDUs (Bridge Protocol Data Units) that all participating devices exchange periodically. The BPDUs are
compared, and the bridge with the superior BPDU (lowest bridge ID, i.e. lowest priority, with the lowest MAC
address as a tie-breaker) wins the election. From here on, the STP tree is constructed using the lowest path
cost to the root bridge. STP blocks any redundant links in the topology, allowing only BPDU traffic to
pass through them.
When a link goes down or a new link is added, the switches experiencing the event issue a TCN
(Topology Change Notification). The STP algorithm is then recalculated in order to form a new tree (and
possibly a new root bridge is elected). During this process, traffic disruption may occur within the entire
layer 2 Ethernet domain.
There are five different states that an STP-participating switch port can be in: blocking, listening, learning,
forwarding and disabled (no STP running). Of these, the listening and learning states are transient, which
means that a port stays in each of them for a finite period of time (the forward delay, 15 seconds) before
advancing to the next state. Therefore, the maximum convergence time (duration of the STP algorithm) on a
whole layer 2 domain is the sum of the listening and learning times (one forward delay each) plus the max age
timer of 20 seconds (the maximum amount of time that a stored BPDU received from another bridge is
considered valid), which adds up to 50 seconds.
In the converged STP topology, each switch port has one of the following roles:
- Root port – port facing towards the root bridge;
- Designated port – port forwarding traffic away from the root bridge, towards a downstream segment
(all ports of the root bridge are designated ports);
- Blocking port – port connected to a redundant path and blocked by STP (it still receives BPDUs).
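The election and tree-building steps described above can be sketched in a few lines. This is a simplified model under stated assumptions, not an implementation of 802.1D: bridge IDs are (priority, MAC) tuples, each link has a single cost, ties on path cost are broken only by the upstream bridge ID, and the topology in `bridges` and `links` is hypothetical. Links left out of the tree are reported as blocked; in real STP one end of such a link stays designated and only the other end blocks, which this sketch does not work out.

```python
import heapq

# Hypothetical topology: bridge name -> bridge ID as (priority, MAC as integer).
bridges = {
    "core-1": (4096, 0x0000AA000001),
    "core-2": (8192, 0x0000AA000002),
    "agg-1": (32768, 0x0000AA000003),
    "agg-2": (32768, 0x0000AA000004),
}
# Undirected links with a path cost (802.1D-2004 recommends 2000 for 10 Gbps).
links = [
    ("core-1", "core-2", 2000),
    ("core-1", "agg-1", 2000),
    ("core-1", "agg-2", 2000),
    ("core-2", "agg-1", 2000),
    ("core-2", "agg-2", 2000),
]


def spanning_tree(bridges, links):
    # Root bridge: lowest bridge ID, i.e. lowest priority, then lowest MAC.
    root = min(bridges, key=lambda b: bridges[b])
    adj = {b: [] for b in bridges}
    for a, b, cost in links:
        adj[a].append((b, cost))
        adj[b].append((a, cost))

    # Lowest path cost from every bridge to the root (Dijkstra).
    dist = {b: float("inf") for b in bridges}
    dist[root] = 0
    queue = [(0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue
        for nbr, cost in adj[node]:
            if d + cost < dist[nbr]:
                dist[nbr] = d + cost
                heapq.heappush(queue, (dist[nbr], nbr))

    # Root port of each non-root bridge: the neighbor offering the cheapest
    # path to the root, ties broken by that neighbor's bridge ID.
    root_ports = {}
    for b in bridges:
        if b == root:
            continue
        nbr, _ = min(adj[b], key=lambda nc: (dist[nc[0]] + nc[1], bridges[nc[0]]))
        root_ports[b] = nbr

    # Links used by some root port form the tree; all others are blocked.
    tree = {frozenset((b, n)) for b, n in root_ports.items()}
    blocked = [(a, b) for a, b, _ in links if frozenset((a, b)) not in tree]
    return root, root_ports, blocked


root, root_ports, blocked = spanning_tree(bridges, links)
print(root)        # core-1
print(root_ports)  # {'core-2': 'core-1', 'agg-1': 'core-1', 'agg-2': 'core-1'}
print(blocked)     # [('core-2', 'agg-1'), ('core-2', 'agg-2')]
```

In this example, half of the core-to-aggregation links end up blocked, which is exactly the wasted capacity that the bundling techniques discussed in the main text try to recover.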
The description above is valid for the IEEE 802.1D standard, but newer modifications and standards for STP
exist. A modern variant is IEEE 802.1w RSTP (Rapid STP), which has a significant advantage over plain STP
in terms of convergence time (reduced to as low as 2 seconds).
When multiple layer 2 domains are present on the same switch (i.e. multiple VLANs), a separate STP
instance needs to run for every such domain. Therefore, additional protocols that combine STP with IEEE
802.1Q VLAN tagging exist, in the form of PVST+ (Per-VLAN Spanning Tree Plus) and MST (Multiple
Spanning Tree). The latter is used in service provider environments, where large numbers of VLANs
translate into large numbers of STP instances that may consume too much CPU and memory on a
network device. MST uses a single STP instance for a group of VLANs in order to mitigate this issue.
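The CPU and memory argument for MST comes down to the number of STP instances a switch must run. The sketch below compares PVST+ (one instance per VLAN) with an MST region that maps VLAN ranges to a few instances; the VLAN range and the instance mapping are hypothetical. It relies on one property of MST: instance 0 (the IST) always exists and carries every VLAN that is not explicitly mapped to another instance.

```python
def pvst_instances(vlans):
    """PVST+: one spanning-tree instance per active VLAN."""
    return len(set(vlans))


def mst_instances(vlans, vlan_to_instance):
    """MST: one instance per VLAN group; unmapped VLANs stay in instance 0 (IST)."""
    used = {vlan_to_instance.get(v, 0) for v in vlans}
    return len(used | {0})  # the IST always runs


vlans = range(100, 1100)  # 1000 hypothetical VLANs
# Hypothetical region map: VLANs 100-599 to MSTI 1, VLANs 600-1099 to MSTI 2.
mapping = {v: 1 if v < 600 else 2 for v in vlans}

print(pvst_instances(vlans))          # 1000
print(mst_instances(vlans, mapping))  # 3
```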

Virtual Port-Channel (vPC)


vPC is a Cisco proprietary technology that allows the control planes of two devices to share information
(over a dedicated link) in order to allow link bundling across both devices.

Link bundling techniques such as IEEE 802.3ad LACP (Link Aggregation Control Protocol) allow several
ports of the same type (media, speed, duplex, etc.) to be joined into a single logical port. The advantage is
that the aggregate bandwidth of the virtual port increases by a factor equal to the number of bundled ports
(e.g. if four similar ports are bundled, the aggregate bandwidth is four times larger), although traffic is
distributed per flow, so a single flow still uses only one member link. This is especially useful when port
bandwidth limitations are an issue, when link redundancy is required, or when protocols that react to link
state changes are used in environments that do not cope well with traffic interruption (e.g. an STP TCN is
sent only if the entire virtual port changes state, not when a single physical member port does).
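Two details of the paragraph above can be made concrete: traffic is spread across bundle members per flow, and the logical port stays up as long as at least one member is up, which is why STP sees no change when a single physical link fails. The `PortChannel` class is a hypothetical model, and its CRC32 hash over a flow tuple is an assumption chosen for illustration; actual switches use vendor-specific hash inputs and algorithms.

```python
import zlib


class PortChannel:
    """Simplified logical port built from identical member links."""

    def __init__(self, members, member_gbps):
        self.members = {m: True for m in members}  # member name -> link up?
        self.member_gbps = member_gbps

    def up_members(self):
        return sorted(m for m, up in self.members.items() if up)

    def is_up(self):
        # STP only sees this logical state, not the individual members.
        return bool(self.up_members())

    def bandwidth_gbps(self):
        # Aggregate capacity is proportional to the number of active members.
        return len(self.up_members()) * self.member_gbps

    def member_for(self, flow):
        """Choose the member link for a flow tuple (src, dst, sport, dport)."""
        active = self.up_members()
        if not active:
            raise RuntimeError("port-channel is down")
        key = "|".join(str(part) for part in flow).encode()
        # Hypothetical hash: CRC32 of the flow tuple, modulo the active members.
        return active[zlib.crc32(key) % len(active)]


po = PortChannel(["Te1/1", "Te1/2", "Te2/1", "Te2/2"], member_gbps=10)
flow = ("10.0.0.1", "10.0.1.1", 49152, 443)
print(po.bandwidth_gbps())                         # 40
same = po.member_for(flow) == po.member_for(flow)  # a flow always maps to one link
po.members["Te1/1"] = False                        # one physical member fails
print(po.is_up(), po.bandwidth_gbps())             # True 30
```

Because every packet of a flow hashes to the same member, packet order within the flow is preserved; the trade-off is that one large flow can never use more than a single member's bandwidth.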
The main limitation of link bundling is that it requires all members of the virtual port to be located on the
same physical device. vPC addresses this limitation by allowing member ports to span two physical
switches.
vPC works by having the two switches (vPC peers) connected via two types of links:
- vPC peer keepalive – a layer 3 control link used to determine the state of the vPC peer switch, which
helps resolve split-brain failure scenarios;
- vPC peer link – a data and control link used to exchange both control plane information and
data plane traffic between the two peers.
vPC and its extensions allow the two vPC peers to act like a single STP bridge within the STP tree.
Therefore, the loop avoidance mechanism built into STP is not triggered by the redundant layer 2 paths
between the two vPC peers and a third device. The vPC member ports of the two vPC peers actually behave
like two LACP member ports of a single switch.
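The role of the two inter-peer links becomes clearer when their failure combinations are listed. The sketch below encodes the commonly documented vPC behavior as a decision table. If the peer link fails while the keepalive still reports the peer as alive, the operational secondary suspends its vPC member ports to avoid a dual-active (split-brain) situation. If both links fail, each peer presumes the other is down. The function name and return strings are hypothetical, and platform-specific enhancements (for example auto-recovery or peer-switch) are ignored.

```python
def vpc_action(role: str, peer_link_up: bool, keepalive_up: bool) -> str:
    """Simplified steady-state reaction of one vPC peer to its inter-peer links.

    role is this switch's operational vPC role: "primary" or "secondary".
    """
    if peer_link_up:
        # With the peer link up, a keepalive failure alone changes nothing.
        return "forward"
    if keepalive_up:
        # Peer link lost but peer still alive: the secondary backs off
        # so that only one side keeps forwarding on the vPCs.
        return "forward" if role == "primary" else "suspend vPC member ports"
    # Both links lost: the peer is presumed dead. If it is actually alive,
    # both peers keep forwarding (dual-active / split brain).
    return "forward (possible dual-active)"


for peer_link, keepalive in [(True, True), (True, False), (False, True), (False, False)]:
    print(f"peer-link={peer_link} keepalive={keepalive}: "
          f"primary -> {vpc_action('primary', peer_link, keepalive)}, "
          f"secondary -> {vpc_action('secondary', peer_link, keepalive)}")
```

The table shows why the keepalive is carried over a separate layer 3 path: it is the only way for the secondary to tell a failed peer apart from a failed peer link.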
