
The 7 Deadly Sins of Virtualization


How Health Checks Can Optimize your VMware Investment

A Virtustream White Paper

By: Matt Theurer, VCP #6 and VCDX #17 worldwide
Senior Vice President, Professional Services - North America, Virtustream

TABLE OF CONTENTS
So How Did This Happen?
Sin #1: Virtual Server Sprawl
Sin #2: Maintaining Inconsistent and Incompatible Host Platforms
Sin #3: Putting Homogeneous Workloads on the Same ESX Host
Sin #4: Overlooking Single Points of Failure
Sin #5: Misconfiguring the Network
Sin #6: Choosing the Incorrect Storage Pathing Policy
Sin #7: Not Accounting for Disk I/O Requirements
So What Do I Do Now?
Virtualization Infrastructure Health Check
Virtustream's VMware Credentials and Expertise

Copyright 2010 by Virtustream, Inc.

All rights reserved worldwide. No part of this publication may be reproduced, transmitted, transcribed, stored in
a retrieval system, or translated into any human or computer language in any form or by any means without the
express written permission of Virtustream, Inc.

This publication is provided as is without warranty of any kind, express or implied, including, but not limited to,
the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This publication
could include technical inaccuracies or typographical errors and changes are periodically added to the information
herein. Virtustream, Inc., may make improvements and/or changes at any time to the product(s) and/or the
program(s) described in this publication.

By now, the benefits of a virtualized infrastructure are well known: reduced hardware expenditures,
decreased power and cooling in the datacenter, increased ROI, and increased flexibility and
responsiveness by IT. In addition, you have made a significant investment to virtualize your
corporate IT infrastructure (storage, network, server, etc.). Virtualization has enhanced policies
and procedures, strategic direction, and interaction with various business stakeholders, and has
served as an enablement mechanism.

But what happens when the virtualized infrastructure is not living up to expectations?

Why am I not achieving the server consolidation ratios that I expected?
Why are my virtual machines exhibiting poor performance?
Why can't I implement advanced features such as load balancing (DRS)?
Why do I intermittently lose network connectivity to my virtual machines?
Why has my virtualized infrastructure "gone bad"?

Here are 7 common mistakes and how to avoid them.

Sin #1: Virtual Server Sprawl

One of the greatest benefits of a virtual infrastructure built on VMware is how easily and
quickly a new instance of an operating system is deployed, especially when templates are utilized.
Virtual machines can be deployed in minutes as opposed to days or even weeks when you factor in
procurement time. Unfortunately, this capability also creates one of the biggest challenges to a
virtualized environment. Virtualization is the solution to hardware sprawl but, ironically, is the
biggest enabler of operating system sprawl; the temptation to deploy a new virtual machine for
every request or possible use case is almost impossible to resist. The perception of end users
that VMs are "free" only compounds this problem.

Every virtual machine consumes resources: memory, disk, processor and network. If you consume all
of your resources with unnecessary or deprecated virtual machines, then you won't have the
necessary resources available when a real need presents itself. Not only do those virtual machines
consume compute and storage resources, they consume human and financial resources as well. If they
are running a licensed operating system (Windows, Red Hat Linux, etc.), then a license must be
purchased. Every virtual machine must be maintained, patched and monitored the same way a physical
server must be.


Solution:

The question should not be "can I deploy a virtual machine?" but "SHOULD I deploy a virtual
machine?" A well-documented and audited IT life cycle management and change control process based
on the ITIL framework is vital to keeping your virtual infrastructure free from unnecessary or
low-value virtual machine workloads that disproportionately consume your resources.
Sin #2: Maintaining Inconsistent and Incompatible Host Platforms

One of the great features of VMware Virtual Infrastructure is hardware agnosticism. A company
can have servers from Dell, IBM, and HP all sitting together in the same rack, configured as part
of the same VI cluster, and it all works together. That said, there are certain limitations and
best practices regarding host platforms.
Solution:

First, having a consistent server vendor will make monitoring and troubleshooting hardware
issues easier.

Second, try to maintain as much consistency in server components as possible. This is
particularly true for NICs, HBAs and memory across your heterogeneous hosts. Having a mix of hosts
with 4, 8, 16, or 32 gigabytes of RAM within an ESX cluster makes the scheduler work much harder
than necessary. In addition, it is possible to end up in a situation where the loss of a more
powerful host, say one with 32 GB of RAM, will result in HA being unable to restart all the virtual
machines, because the remaining hosts with a mix of 4 or 8 GB of RAM may have insufficient
resources.
Inconsistent presentation of Logical Unit Numbers (LUNs) to the ESX hosts in the same cluster can
result in loss of virtual machines if an ESX host fails. For example, let's say we have a
three-node ESX cluster. Hosts A, B, and C all have LUNs 0, 1, and 2 presented from the SAN. Host A
has LUN 3 presented, but Hosts B and C do not. The loss of Host A will result in HA being unable to
restart any virtual machine with resources on LUN 3, since Hosts B and C do not have access to
LUN 3.
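
The three-host example above can be checked mechanically. The following is a minimal sketch, assuming the LUN presentation of each host has already been collected into a plain dictionary; the function name and the data layout are hypothetical and are not part of any VMware tool.

```python
from typing import Dict, List, Set


def find_inconsistent_luns(presented: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Return {lun: [hosts that cannot see it]} for every LUN not presented to all hosts."""
    all_hosts = set(presented)
    all_luns: Set[int] = set()
    for luns in presented.values():
        all_luns |= luns

    missing: Dict[int, List[str]] = {}
    for lun in sorted(all_luns):
        hosts_without = sorted(h for h in all_hosts if lun not in presented[h])
        if hosts_without:
            missing[lun] = hosts_without
    return missing


# Hypothetical inventory matching the example: only Host A sees LUN 3.
cluster = {
    "HostA": {0, 1, 2, 3},
    "HostB": {0, 1, 2},
    "HostC": {0, 1, 2},
}
print(find_inconsistent_luns(cluster))
```

Any LUN that appears in the result is one whose virtual machines could not be restarted by HA on the listed hosts.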

It is also critical to have a consistent configuration of Host Bus Adapters (HBAs), whether
using iSCSI or Fibre Channel, and consistent presentation of paths through the Storage Area
Network (SAN) fabric. This vastly simplifies operational support of the Virtual Infrastructure
while improving performance and redundancy. Things to focus on are similar numbers and formats
of the HBAs, consistent pathing policies across hosts, and balancing LUN presentation across storage
processors.
Third, and probably most important, you must maintain processor compatibility in order for VMotion to
function properly. You cannot VMotion from AMD to Intel or vice versa. You cannot VMotion from a
32-bit processor to a 64-bit processor. Incompatible processors make the Distributed Resource Scheduler
(DRS) much less useful and can severely degrade consolidation ratios.
Sin #3: Putting Homogeneous Workloads on the Same ESX Host

There are four main resources in a virtual infrastructure: CPU, RAM, disk I/O and network I/O. Access to
each of these resources is scheduled by the VMkernel. Different workloads place different demands on
these resources. For instance, SQL Server tends to be memory and disk I/O intensive, a web or
streaming video server may be network I/O intensive, and a server running a modeling process may be
CPU intensive.

There are many more
business problems to
address than simple server
sprawl. There is the loss of
license control, the
difficulty of management
and deployment, 24 x 7
demands and legal
requirements as well as the
straightforward need to
optimise resources.

Placing servers with homogeneous workloads on the same ESX host (for example, putting all of your SQL
servers on a single host) puts an enormous strain on the one set of resources that those workloads
require, and typically leaves the other resources on that host underutilized. This creates resource
bottlenecks that will often result in decreased consolidation ratios and degraded performance.
Solution:

For the best aggregate performance across the Virtual Infrastructure, place heterogeneous
workloads (disk-, network-, CPU- and memory-intensive) onto the same ESX host. This way the workloads
will not be competing for the same resource.
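
One way to spot hosts that violate this guideline is to tag each virtual machine with its dominant resource and look for hosts where a single tag dominates. The sketch below is illustrative only: the tags, the host names and the 75% threshold are assumptions made for the example, not values from VMware or from this paper.

```python
from collections import Counter
from typing import Dict, List


def flag_homogeneous_hosts(placement: Dict[str, List[str]],
                           threshold: float = 0.75) -> List[str]:
    """Return hosts where one resource profile makes up at least `threshold` of the VMs.

    `placement` maps a host name to the dominant-resource tag of each VM on it
    (for example "disk", "network", "cpu" or "memory").
    """
    flagged = []
    for host, profiles in sorted(placement.items()):
        if not profiles:
            continue  # an empty host cannot be overloaded on any one resource
        most_common_count = Counter(profiles).most_common(1)[0][1]
        if most_common_count / len(profiles) >= threshold:
            flagged.append(host)
    return flagged


# Hypothetical placement: esx01 carries only disk-intensive VMs, esx02 is mixed.
placement = {
    "esx01": ["disk", "disk", "disk", "disk"],
    "esx02": ["disk", "network", "cpu", "memory"],
}
print(flag_homogeneous_hosts(placement))
```

A flagged host is a candidate for rebalancing, either manually or by letting DRS move workloads.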
Sin #4: Overlooking Single Points of Failure

When a company implements a virtual infrastructure, the reliability of that infrastructure is highly critical.
After all, losing a single ESX host can mean the loss of tens or possibly even hundreds of virtual machines.
Redundancy at the storage level, the network level and the host level is the key to minimizing outages.
Solution:

No production virtual infrastructure should be implemented on a single storage fabric. Every host should
be connected to independent, redundant SAN fabrics. Each controller on the SAN should have
connections to each SAN fabric. The Host Bus Adapters (HBAs) should be independent cards in separate
PCI slots to minimize the impact of a card loss or PCI bus anomalies, and they should be verifiably
connected to redundant, independent fabrics. Storage should be configured with appropriate RAID levels,
and hot spares should be available. Keep spare disks on the shelf in the datacenter.

When configuring a virtual switch (vSwitch), always link at least two physical NICs (pNICs) to each vSwitch.
Each pNIC linked to a given vSwitch should come from a different PCI card plugged into a different PCI bus.
Those pNICs should be plugged into separate physical switches, or at least separate cards in a
chassis-based switch. Choose the correct Network Failover Detection method for your switch vendor and
location in the network: if the ESX host connects directly to the core switch(es) of the LAN, then
"link status only" is appropriate. If the ESX host connects to edge switches, then Beacon Probing may
be appropriate.

Finally, make use of the advanced features of ESX clusters. Each ESX cluster should have enough hosts
(currently 32 maximum) so that there is enough spare capacity spread throughout the cluster to handle
the loss of at least one host (N+1 configuration). Implement High Availability (HA): in the event of the
catastrophic loss of any given host, the affected virtual machines are restarted somewhere else in the ESX
cluster. Implement the Distributed Resource Scheduler (DRS): DRS will automatically VMotion virtual
machines from host to host in response to shifting workloads, based on policies you set. This allows your
Virtual Infrastructure to keep running at an optimal level.
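
The N+1 requirement can be sanity-checked with simple arithmetic. The sketch below is a deliberately simplified model that compares aggregate RAM only; it does not reproduce how VMware HA admission control actually calculates capacity, and the host names and figures are hypothetical.

```python
from typing import Dict


def survives_single_host_loss(host_ram_gb: Dict[str, float],
                              total_vm_ram_gb: float) -> Dict[str, bool]:
    """For each host, report whether the remaining hosts could hold all VM memory
    if that one host were lost (aggregate RAM only, no per-host fragmentation)."""
    total_capacity = sum(host_ram_gb.values())
    return {
        host: (total_capacity - ram) >= total_vm_ram_gb
        for host, ram in host_ram_gb.items()
    }


# Hypothetical mixed cluster: one 32 GB host and three much smaller ones.
hosts = {"esx-big": 32, "esx-a": 8, "esx-b": 8, "esx-c": 4}
result = survives_single_host_loss(hosts, total_vm_ram_gb=30)
print(result)
```

In this example, losing the 32 GB host leaves only 20 GB for 30 GB of virtual machine memory, so the cluster is not N+1 with respect to that host even though it tolerates the loss of any smaller host.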
Sin #5: Misconfiguring the Network

Network configuration issues usually present themselves quickly, but there are cases where intermittent
network issues can cause performance degradation or even loss of connectivity to one or more virtual
machines in the environment. This normally occurs in more complex network configurations where the
individual ESX hosts have multiple vSwitches, multiple VLANs and complex load balancing algorithms.


Solution:

Here's a quick list of things to check and verify:

1. Port speed and duplex: while speed and duplex settings can be mixed on a single vSwitch, this can
cause anomalies that are difficult to diagnose.
2. VLAN trunking: are all the appropriate VLANs actually defined and trunked, both within the vSwitches
on each host and on the physical switch ports?
3. Load Balancing (LACP/EtherChannel): make sure the vSwitches and physical switches match in terms
of source MAC or IP hash routing.
4. Promiscuous Mode: allowed?
5. MAC Address Changes: allowed?
6. Forged Transmits: allowed?

All of these settings must match on the vSwitch, the pNIC and the physical switch. An incorrectly
configured component at any layer can cause problems.
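
The checklist above amounts to comparing the same setting across layers. The following is a minimal sketch, assuming the settings have been gathered by hand or by script into one dictionary per layer; the setting names and values are hypothetical and do not correspond to any particular vendor's configuration syntax.

```python
from typing import Dict


def find_mismatches(layers: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return {setting: {layer: value}} for each setting whose value differs
    between the layers that define it."""
    settings = sorted({key for cfg in layers.values() for key in cfg})
    mismatches: Dict[str, Dict[str, str]] = {}
    for setting in settings:
        values = {layer: cfg[setting] for layer, cfg in layers.items() if setting in cfg}
        if len(set(values.values())) > 1:
            mismatches[setting] = values
    return mismatches


# Hypothetical configuration: VLAN 30 is defined on the vSwitch but not trunked
# on the physical switch port.
layers = {
    "vswitch": {"vlans": "10,20,30", "load_balancing": "ip_hash"},
    "pnic": {"speed_duplex": "1000/full"},
    "physical_switch": {
        "vlans": "10,20",
        "load_balancing": "ip_hash",
        "speed_duplex": "1000/full",
    },
}
print(find_mismatches(layers))
```

An empty result means every setting that is defined on more than one layer agrees; it says nothing about settings that were never collected.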
Sin #6: Choosing the Incorrect Storage Pathing Policy

ESX has two basic pathing policies when it comes to accessing storage provided on a Storage Area Network
(SAN): Most Recently Used (MRU) and Fixed. MRU Policy should be used for an Active/Passive SAN. Fixed
Path Policy should be used for an Active/Active SAN. It is vitally important to understand what ESX
considers to be an Active/Active SAN versus what the SAN vendor marketing department considers an
Active/Active SAN.
Choosing the incorrect pathing policy can result in LUN thrashing, poor performance or even virtual
machine and host crashes. (LUN Thrashing is when control of a LUN is constantly being transferred
between SAN controllers.)

ESX considers a SAN to be Active/Passive when only ONE controller on the SAN owns, and thus
reads and writes to, any given LUN at a time. Most entry to mid-level SANs are Active/Passive. If one
controller on an Active/Passive SAN has a failure, then by design another controller in the same group will
take ownership of the LUNs that the failed controller owned and I/O will continue. There is a specific
Pathing Policy used by ESX to support SANs with this architecture.
Solution:

ESX considers a SAN to be Active/Active when two or more controllers on a SAN can actively read and write
to the SAME LUN AT THE SAME TIME. These are typically high-end SANs, such as an EMC Symmetrix.
These SANs typically have a front end/back end controller configuration with large shared caches between
all the controllers.

Here is where the SAN vendor marketing department causes confusion. Typically, each controller in an
Active/Passive SAN can own and be actively reading and writing to DIFFERENT LUNs. So the marketing
department trumpets that these SANs are Active/Active because more than one controller is handling I/O
at the same time. Unfortunately, this has caused many an administrator to choose a Fixed path policy for
a SAN that, at an individual LUN layer and from the ESX perspective, is Active/Passive. When in doubt,
follow the pathing policy for your array as defined in the VMware compatibility guides.

Some mid-level SANs can function in a quasi Active/Active manner by implementing Asymmetric Logical
Unit Access (ALUA). In this case, any I/O sent to the Passive controller is sent across an interconnect bus
to the Active controller. Be careful here: even if a SAN has implemented ALUA, it may still cause LUN
thrashing, as some SAN vendors will "trespass" or migrate the LUN from the Active controller to the Passive
controller in response to very small amounts of I/O on the Passive controller. It is also crucial to
understand how the SAN vendor implements ALUA. It is commonly assumed that ALUA is implemented at the LUN
level, but some vendors implement ALUA at the RAID Group level. Misunderstanding this can lead to
serious performance issues that are difficult to troubleshoot.
It is vitally important to reference the ESX integration and technical support guides published by the SAN
vendors to ensure that the proper pathing policy is chosen. It is equally important that the proper pathing
policy is chosen and VERIFIED on every ESX host connected to the SAN. The pathing policy is chosen PER
LUN PER HOST. Be aware that you may have LUNs presented to your ESX hosts from many different arrays,
with different pathing policies. The very worst scenario is one where different ESX hosts have different
pathing policies for a given LUN or LUNs presented by the same SAN. This will invariably cause intermittent
issues.
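
Because the policy is set per LUN per host, verification is a matter of comparing every (host, LUN) pair. The sketch below assumes the policies have already been exported into a dictionary and that the policy recommended for each LUN is known from the array vendor's documentation; the function, host and LUN names are hypothetical.

```python
from typing import Dict, List, Tuple


def audit_pathing(policies: Dict[Tuple[str, str], str],
                  expected: Dict[str, str]) -> List[str]:
    """Report LUNs whose hosts disagree on pathing policy, and hosts whose
    policy differs from the one expected for that LUN.

    `policies` maps (host, lun) -> policy name, for example "MRU" or "Fixed".
    `expected` maps lun -> the policy recommended for the array behind that LUN.
    """
    by_lun: Dict[str, Dict[str, str]] = {}
    for (host, lun), policy in policies.items():
        by_lun.setdefault(lun, {})[host] = policy

    findings: List[str] = []
    for lun in sorted(by_lun):
        host_policies = dict(sorted(by_lun[lun].items()))
        if len(set(host_policies.values())) > 1:
            findings.append(f"{lun}: hosts disagree {host_policies}")
        want = expected.get(lun)
        for host, policy in host_policies.items():
            if want is not None and policy != want:
                findings.append(f"{lun}: {host} uses {policy}, expected {want}")
    return findings


# Hypothetical export: esx02 was left on Fixed for a LUN on an Active/Passive array.
policies = {
    ("esx01", "lun3"): "MRU",
    ("esx02", "lun3"): "Fixed",
    ("esx01", "lun4"): "Fixed",
    ("esx02", "lun4"): "Fixed",
}
expected = {"lun3": "MRU", "lun4": "Fixed"}
for finding in audit_pathing(policies, expected):
    print(finding)
```

The first kind of finding corresponds to the worst-case scenario described above, where hosts disagree about the policy for the same LUN.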

Sin #7: Not Accounting for Disk I/O Requirements

One of the most common and deadliest mistakes that a virtual infrastructure administrator makes
is not accounting for disk input/output operations per second, commonly known as IOPS. When it
comes to the SAN and disk utilization, most administrators are only
concerned with how much space they need in terms of Gigabytes or
Terabytes.
Most organizations, especially small to medium
organizations, will purchase a SAN with the smallest number of the
largest disks that they can afford. For example, an organization requires
3TB of space, so they purchase a SAN with seven 7.2K RPM 1TB SATA drives. After
all, this gives them 5TB of usable space in a RAID 5 configuration with a
hot spare. That is almost twice as much space as they need!
Here is the deadly mistake: disks running at a given rotational speed
(RPM) provide the same number of IOPS, regardless of size. A 15K Fibre
Channel drive can handle approximately 180 IOPS whether it's a 73GB
drive or a 450GB drive. A 7.2K SATA drive can handle approximately 95
IOPS whether it is 250GB or 1TB.
The average virtual machine requires approximately 20 IOPS and
consumes approximately 30GB of disk space. In addition, the IOPS
consumed by the virtual machine must be adjusted for the RAID
configuration of the SAN LUN. The formula for RAID 5 adjusted IOPS is:

R5adj IOPS = (Total IOPS * Read Ratio) + ((Total IOPS * Write Ratio) * 4)

In our hypothetical SAN above, we can fit 102 virtual machines in that
3TB of space. Let's assume that each virtual machine requires 20 IOPS.
Let's also assume an even 50/50 mix of reads to writes. In this case the
R5adj IOPS requirement for each virtual machine becomes
(20 * 0.5) + ((20 * 0.5) * 4) = 50 IOPS.
Running all those virtual machines results in a requirement of 5100 IOPS
from just 6 SATA disks, which at approximately 95 IOPS each can only provide
a grand total of about 570 IOPS!
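
The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the figures quoted in the text (20 IOPS and 30GB per virtual machine, a 50/50 read/write mix, a RAID 5 write penalty of 4, and roughly 95 IOPS per 7.2K SATA drive); the function and variable names are illustrative, and the final spindle estimate is a rough lower bound rather than a sizing recommendation.

```python
import math


def raid5_adjusted_iops(total_iops: float, read_ratio: float,
                        write_penalty: int = 4) -> float:
    """Apply the RAID 5 adjustment: reads cost 1 back-end I/O, writes cost `write_penalty`."""
    write_ratio = 1.0 - read_ratio
    return (total_iops * read_ratio) + ((total_iops * write_ratio) * write_penalty)


SATA_7K2_IOPS = 95        # approximate per-disk figure quoted in the text
VM_IOPS = 20              # average virtual machine
VM_SIZE_GB = 30           # average virtual machine

vm_count = math.floor(3 * 1024 / VM_SIZE_GB)      # VMs that fit in 3TB
per_vm = raid5_adjusted_iops(VM_IOPS, read_ratio=0.5)
required = vm_count * per_vm                      # back-end IOPS demanded
available = 6 * SATA_7K2_IOPS                     # six data-bearing SATA disks
disks_needed = math.ceil(required / SATA_7K2_IOPS)

print(f"VMs: {vm_count}, per-VM adjusted IOPS: {per_vm}")
print(f"Required: {required}, available: {available}, SATA disks needed: {disks_needed}")
```

Under these assumptions the demand is roughly nine times what six SATA disks can supply, and on the order of 54 such disks would be needed to meet it on IOPS alone.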

Virtual machines running disk-intensive applications such as Oracle,
Exchange or SQL have much higher IOPS requirements, which only
exacerbate this problem.
Further compounding this problem is the virtual server sprawl discussed
above. Even in cases where the VI was properly sized and configured for
disk I/O during the initial deployment, most organizations see dramatic
growth in their virtual infrastructure over time. Typically, new spindles
are not added as new virtual machines are deployed. Again, users tend
to think that the VMs are "free" and that the resources are there for the
taking. As new VMs appear, they are placed on VMFS volumes with free
space, often without consideration for the IOPS that the VMFS LUN is
using already, or what is being added. Virtual machines use the same
number of IOPS for a given workload as a physical server. It is important
to have a good understanding of your requirements for both storage
capacity and IOPS and plan for growth accordingly.

Solution:

The net result of haphazard growth or poor design work on the disk
subsystem is very long disk access times, disk timeouts and even virtual
machine and host crashes. It is vitally important to design the disk/SAN
infrastructure to accommodate not only space requirements but IOPS
requirements. It is often far more prudent to buy a larger number of
smaller capacity disks than a small number of larger capacity disks.
So What Do I Do Now?

Any one, or any combination, of the seven mistakes listed above can be
a threat to your virtualized enterprise and can result in a higher TCO, an
unrealized ROI, and even failed application services. Fortunately, any
one of these critical issues can be solved with a focused and pragmatic
approach, addressing each issue on its own and as it relates to other
aspects of your network.
How Virtustream Can Help You

Virtustream's suite of virtualization solutions includes specific offerings
that are geared towards optimizing a virtualized environment. For
existing virtualized environments, Virtustream recommends a periodic
Health Check.
Virtual Infrastructure Health Check

Our experience has shown that a very large majority of VMware
implementations are not in compliance with Best Practices from a
storage, network and virtualization perspective. These inefficiencies can
be the result of expedited implementation timeframes, limited and
overworked IT staff, or a lack of deep technical knowledge of the
components. Regularly scheduled virtual infrastructure Health Checks
are designed to assist organizations in correcting configuration mistakes
and help them get back on track to get the most out of their investment
in virtualization technology.
A typical Health Check includes, but is not limited to, the following:

Review and analysis of your virtual architecture
    Networking
    Storage
    Security
    ESX Host Servers
    Virtual Machines
    vCenter
    Fault Tolerance
    High Availability
Review and analysis of operations with your business objectives in mind
Recommendations to achieve compliance with vendor Best Practices
Standardization and supportability


Virtustream in Action

A full-service law firm with over 500 lawyers practicing in litigation, antitrust,
government contracts, corporate, intellectual property and more than 40 other
practice areas had an extensive virtual infrastructure, but their efforts were not yielding
the expected benefits. An analysis of the project by Virtustream revealed several
critical issues:

The virtualized environment was not stable, resulting in outages and downtime
The consolidation ratios were significantly lower than expected
There were intermittent performance issues

Virtustream conducted an exhaustive health check of all 7 of the client's locations,
including a deep dive into the Virtual Infrastructure, Storage Infrastructure and
Network Infrastructure. The Health Check revealed many areas for improvement and
highlighted configuration issues causing performance and stability problems.
As a result, Virtustream identified storage hotspots insufficient to handle I/O load and
defined a path to remediation. The team also identified host configuration issues
causing performance and stability issues and recommended a path to remediation. In
addition, the team identified areas where further consolidation was possible, further
reducing the physical footprint by up to 75%.

Virtustream's VMware Credentials and Expertise

Virtustream is the leading enabler of virtual infrastructures and has the deepest
level of virtualization and transformation knowledge in the marketplace. With a
suite of transformation solutions ranging from initial adoption strategy to
virtualization transformation and optimization programs, Virtustream can
ensure that you are able to maximize and maintain the benefits from your
virtualization investments. As an ISO 9001 and 27001 accredited organization,
we are a highly transparent and compliant organization adhering to
international standards in operations, security and processes.

Following are details of our elite credentials:

VMware's 1st Authorized Consulting Partner worldwide
Founding and Active member of VMware's Technical Advisory Board
Founding and Active member of VMware's Premier Partner Program
On staff are 2, of 28 certified worldwide, VMware Certified Design Experts
(VCDX), which is VMware's most prestigious certification
More than 25 staff classified as VMware Certified Professionals (VCP), including the 6th and 40th VMware certified engineers worldwide
Other VMware core certifications include:
    Infrastructure Virtualization
    Site Recovery Manager
    Virtual Desktop Infrastructure
    Business Continuity
    Lab Manager

For more information on how Virtustream can help you with operational
efficiencies, cost-base efficiencies, management efficiencies and environmental
efficiencies, please contact us at info@virtustream.com or (866) 350-6400.

About the Author

Matthew Theurer has been designing and deploying IT solutions for 20 years,
architecting IT solutions to solve real
world challenges for companies ranging
in size from tens of users to tens of
thousands of users. He was one of the
first individuals ever certified on
VMware's virtualization platform and holds many industry
certifications, including certifications from Microsoft, Cisco and EMC.

Matt began his career working for the Federal Government
designing Smart building solutions and went on to become a
senior systems engineer and project manager for CIS Global and
VirtuaLogic, Inc. In 2001, Matt co-founded and was CIO for Brigh
Technologies, Inc. specializing in Disaster Recovery and highly
available systems architectures.

About Virtustream

Virtustream, Inc. (www.virtustream.com) is an infrastructure
services firm committed to helping clients actualize the enterprise
cloud by providing strategy, integration and managed services
leveraging virtualisation technologies, and our secure platform.

Virtustream delivers efficient infrastructure solutions, backed by
guaranteed service levels and an industry leading resource-based
pricing model, built upon the company's four pillars of managed
service excellence including Virtualisation Solutions, Infrastructure
as a Service, Application Management, and Outsourcing. Through a
self-service, automated foundation built on eight years of
virtualisation expertise, Virtustream delivers flexibility that allows
clients to capitalize on the flux of today's dynamic business
requirements.

Headquartered in Washington, D.C., the privately held company also has
offices in London, Dublin and Jersey.

For further information, visit www.virtustream.com or contact
info@virtustream.com

www.virtustream.com
North America: +1 (240) 252.1007
United Kingdom: +44 (0) 870.345.3525
info@virtustream.com
NORTH AMERICA | UNITED KINGDOM | CHANNEL ISLANDS | IRELAND
