TABLE OF CONTENTS
So How Did This Happen?
Sin #1: Virtual Server Sprawl
Sin #2: Maintaining Inconsistent and Incompatible Host Platforms
Sin #3: Putting Homogeneous Workloads on the Same ESX Host
Sin #4: Overlooking Single Points of Failure
Sin #5: Misconfiguring the Network
Sin #6: Choosing the Incorrect Storage Pathing Policy
Sin #7: Not Accounting for Disk I/O Requirements
So What Do I Do Now?
Virtualization Infrastructure Health Check
Virtustream's VMware Credentials and Expertise
All rights reserved worldwide. No part of this publication may be reproduced, transmitted, transcribed, stored in
a retrieval system, or translated into any human or computer language in any form or by any means without the
express written permission of Virtustream, Inc.
This publication is provided as is without warranty of any kind, express or implied, including, but not limited to,
the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This publication
could include technical inaccuracies or typographical errors and changes are periodically added to the information
herein. Virtustream, Inc., may make improvements and/or changes at any time to the product(s) and/or the
program(s) described in this publication.
of the HBAs, consistent pathing policies across hosts, and balancing LUN presentation across storage
processors.
Third and probably most important, you must maintain processor compatibility in order for VMotion to
function properly. You cannot VMotion from AMD to Intel or vice versa, nor from a 32-bit processor to a
64-bit processor. Mixing incompatible hosts in a cluster makes Distributed Resource Scheduler (DRS) much
less useful and can severely degrade consolidation ratios.
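As a rough illustration, the two compatibility rules above can be checked across a cluster inventory before relying on VMotion or DRS. This is a minimal sketch in Python: the `Host` record, the host names and the inventory are hypothetical, and it tests only vendor and bit width, not the full set of CPU features a real compatibility check would cover.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record for one ESX host."""
    name: str
    cpu_vendor: str  # e.g. "Intel" or "AMD"
    cpu_bits: int    # 32 or 64


def vmotion_compatible(a: Host, b: Host) -> bool:
    """Apply only the two rules from the text: same vendor, same bit width."""
    same_vendor = a.cpu_vendor.lower() == b.cpu_vendor.lower()
    return same_vendor and a.cpu_bits == b.cpu_bits


def incompatible_pairs(hosts):
    """Return every pair of host names that could not VMotion to each other."""
    return [(a.name, b.name)
            for a, b in combinations(hosts, 2)
            if not vmotion_compatible(a, b)]


# Illustrative cluster: two Intel hosts and one AMD host.
cluster = [
    Host("esx01", "Intel", 64),
    Host("esx02", "Intel", 64),
    Host("esx03", "AMD", 64),
]
bad_pairs = incompatible_pairs(cluster)
print(bad_pairs)
```

Any pair reported here marks a boundary that virtual machines cannot cross, which shrinks the pool of hosts DRS can balance across.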
Sin #3: Putting Homogeneous Workloads on the Same ESX Host
There are four main resources in a virtual infrastructure: CPU, RAM, disk I/O and network I/O. Access to each
of these resources is scheduled by the VMkernel. Differing workloads place different requirements on
these resources. For instance, SQL Server tends to be memory and disk I/O intensive, a web or
streaming video server may be network I/O intensive, and a server running a modeling process may be
CPU intensive.
There are many more business problems to address than simple server sprawl. There is the loss of
license control, the difficulty of management and deployment, 24 x 7 demands and legal
requirements as well as the straightforward need to optimise resources.
Placing servers with homogeneous workloads on the same ESX host (for example, putting all of your SQL Servers
on a single host) puts an enormous strain on the same set of resources that those specific workloads require,
and typically leaves the other resources on that host underutilized. This creates resource bottlenecks
that will often result in decreased consolidation ratios and degraded performance.
Solution:
For the best aggregate performance across the Virtual Infrastructure, place heterogeneous
workloads (disk, network, CPU and memory intensive) onto the same ESX host. This way the workloads will not be
competing for the same resource.
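One way to picture this placement rule is a simple greedy pass that puts each virtual machine on the host currently running the fewest workloads of the same type. The sketch below is an illustration only: the VM names, host names and the single "dominant resource" label per VM are assumptions, and it does not represent how DRS or any VMware tool makes placement decisions.

```python
from collections import Counter


def place(vms, host_names):
    """Greedy placement sketch.

    vms: list of (vm_name, dominant_resource) tuples, where dominant_resource
         is a label such as "disk", "network" or "cpu" (hypothetical tagging).
    Returns a dict mapping vm_name -> host_name.
    """
    load = {h: Counter() for h in host_names}
    placement = {}
    for vm, resource in vms:
        # Prefer the host with the fewest VMs stressing the same resource;
        # break ties by the host with the fewest VMs overall.
        target = min(
            host_names,
            key=lambda h: (load[h][resource], sum(load[h].values())),
        )
        load[target][resource] += 1
        placement[vm] = target
    return placement


vms = [
    ("sql01", "disk"), ("sql02", "disk"),
    ("web01", "network"), ("web02", "network"),
    ("model01", "cpu"), ("model02", "cpu"),
]
placement = place(vms, ["esx01", "esx02"])
for vm, host in placement.items():
    print(vm, "->", host)
```

With this input each host ends up with one disk-heavy, one network-heavy and one CPU-heavy virtual machine, rather than both SQL Servers landing on the same host.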
Sin #4: Overlooking Single Points of Failure
When a company implements a virtual infrastructure, the reliability of that infrastructure is highly critical.
After all, losing a single ESX host can mean the loss of tens or possibly even hundreds of virtual machines.
Redundancy at the storage level, the network level and the host level is the key to minimizing outages.
Solution:
No production virtual infrastructure should be implemented on a single storage fabric. Every host should
be connected to independent, redundant SAN fabrics. Each controller on the SAN should have
connections to each SAN fabric. The Host Bus Adapters (HBAs) should be independent cards in separate
PCI slots to minimize the impact of a card loss or PCI bus anomalies, and they should be verifiably
connected to redundant, independent fabrics. Storage should be configured with appropriate RAID levels
and hot spares should be available. Keep spare disks on the shelf in the datacenter.
When configuring a virtual switch (vSwitch), always link at least two physical NICs (pNICs) to each vSwitch.
Each pNIC linked to a given vSwitch should come from a different PCI card plugged into a different PCI bus.
Those pNICs should be plugged into separate physical switches, or at least separate cards in a chassis-based
switch. Choose the correct Network Failover Detection method for your switch vendor and location in the
network: if the ESX host connects directly to the core switch(es) of the LAN, then link status only is
appropriate. If the ESX host connects to edge switches, then Beacon Probing may be more appropriate.
Finally, make use of the advanced features of ESX clusters. Each ESX cluster should have enough hosts
(currently 32 maximum) so that there is enough spare capacity spread throughout the cluster to handle
the loss of at least one host (N+1 configuration). Implement High Availability (HA). In the event of the
catastrophic loss of any given host, the affected virtual machines are restarted somewhere else in the ESX
cluster. Implement Distributed Resource Scheduler (DRS). DRS will automatically VMotion virtual machines
from host to host in response to shifting workloads, based on policies you set. This allows your Virtual
Infrastructure to keep running at an optimal level.
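The N+1 rule can be sanity-checked with simple arithmetic: remove the largest host and see whether the remaining hosts can still hold the total demand. The following Python sketch does this for a single resource (memory, in GB). The capacity and demand figures are invented, and it ignores CPU, HA admission control settings and per-VM placement constraints.

```python
def survives_host_loss(host_capacity_gb, total_demand_gb, failures=1):
    """Return True if the cluster can absorb the loss of its largest hosts.

    host_capacity_gb: usable memory per host (hypothetical figures).
    total_demand_gb:  memory needed by all virtual machines combined.
    failures:         number of hosts assumed lost (1 for N+1).
    """
    if failures >= len(host_capacity_gb):
        return False  # nothing left to restart the virtual machines on
    # Worst case: the hosts that fail are the ones with the most capacity.
    surviving = sorted(host_capacity_gb)[:len(host_capacity_gb) - failures]
    return total_demand_gb <= sum(surviving)


hosts = [64, 64, 64, 64]  # four hosts with 64 GB each (illustrative)

ok_at_180 = survives_host_loss(hosts, 180)  # 180 GB fits in 3 x 64 = 192 GB
ok_at_200 = survives_host_loss(hosts, 200)  # 200 GB does not
print(ok_at_180, ok_at_200)
```

A cluster that fails this check has no room for HA to restart the virtual machines from a lost host, whatever the restart policy says.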
Sin #5: Misconfiguring the Network
Network configuration issues usually present themselves quickly, but there are cases where intermittent
network issues can cause performance degradation or even loss of connectivity to one or more virtual
machines in the environment. This normally occurs in more complex network configurations where the
individual ESX hosts have multiple vSwitches, multiple VLANs and complex load balancing algorithms.
Solution:
1. Port speed and duplex: while speed and duplex settings can be mixed on a single vSwitch, this can
cause anomalies that are difficult to diagnose.
2. VLAN trunking: are all the appropriate VLANs actually defined and trunked, both within the vSwitches
on each host and on the physical switch ports?
3. Load balancing (LACP/EtherChannel): make sure the vSwitches and physical switches match in terms
of source MAC or IP hash routing.
4. Promiscuous Mode: is it allowed?
All of these settings must match on the vSwitch, the pNIC and the physical switch. An incorrectly
configured component at any layer can cause problems.
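The cross-layer comparison described here lends itself to a small script that flags every setting whose value differs between layers. This is a minimal sketch under stated assumptions: the setting names and values are hypothetical, and in practice they would have to be collected from the ESX hosts and the physical switches by whatever tooling you already use.

```python
def mismatches(layers):
    """Report settings whose values differ between layers.

    layers: dict mapping layer name -> dict of setting name -> value.
    Returns {setting: {layer: value}} for every setting that is not
    identical on all layers (a missing setting counts as a mismatch).
    """
    settings = set().union(*(cfg.keys() for cfg in layers.values()))
    report = {}
    for setting in sorted(settings):
        values = {layer: cfg.get(setting) for layer, cfg in layers.items()}
        if len(set(values.values())) > 1:
            report[setting] = values
    return report


# Illustrative configuration snapshot; all names and values are made up.
layers = {
    "vswitch": {
        "speed_duplex": "1000/full",
        "vlans": (10, 20, 30),
        "load_balancing": "ip_hash",
    },
    "pnic": {
        "speed_duplex": "1000/full",
        "vlans": (10, 20, 30),
        "load_balancing": "ip_hash",
    },
    "physical_switch": {
        "speed_duplex": "1000/full",
        "vlans": (10, 20),            # VLAN 30 is not trunked
        "load_balancing": "src_mac",  # does not match the vSwitch
    },
}

report = mismatches(layers)
for setting, values in report.items():
    print(setting, values)
```

Here the script reports the VLAN list and the load balancing method, the two settings on which the physical switch disagrees with the host side.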
Sin #6: Choosing the Incorrect Storage Pathing Policy
ESX has two basic pathing policies when it comes to accessing storage provided on a Storage Area Network
(SAN): Most Recently Used (MRU) and Fixed. MRU Policy should be used for an Active/Passive SAN. Fixed
Path Policy should be used for an Active/Active SAN. It is vitally important to understand what ESX
considers to be an Active/Active SAN versus what the SAN vendor marketing department considers an
Active/Active SAN.
Choosing the incorrect pathing policy can result in LUN thrashing, poor performance or even virtual
machine and host crashes. (LUN Thrashing is when control of a LUN is constantly being transferred
between SAN controllers.)
ESX considers a SAN to be Active/Passive when only ONE controller on the SAN owns, and thus
reads and writes to, any given LUN at a time. Most entry to mid-level SANs are Active/Passive. If one
controller on an Active/Passive SAN fails, then by design another controller in the same group will
take ownership of the LUNs that the failed controller owned and I/O will continue. The MRU pathing
policy is how ESX supports SANs with this architecture.
Solution:
ESX considers a SAN to be Active/Active when two or more controllers on a SAN can actively read and write
to the SAME LUN AT THE SAME TIME. These are typically high-end SANs, such as an EMC Symmetrix.
These SANs typically have a front end/back end controller configuration with large shared caches between
all the controllers.
Here is where the SAN vendor marketing department causes confusion. Typically, each controller in an
Active/Passive SAN can own and be actively reading and writing to DIFFERENT LUNs. So the marketing
department trumpets that these SANs are Active/Active because more than one controller is handling I/O
at the same time. Unfortunately, this has caused many an administrator to choose a Fixed path policy for
a SAN that, at the individual LUN level and from the ESX perspective, is Active/Passive. When in doubt,
follow the pathing policy for your array as defined in the VMware compatibility guides.
Some mid-level SANs can function in a quasi Active/Active manner by implementing Asymmetric Logical
Unit Access (ALUA). In this case, any I/O sent to the Passive controller is sent across an interconnect bus to
the Active controller. Be careful here: even if a SAN has implemented ALUA, it may still cause LUN
thrashing, as some SAN vendors will trespass or migrate the LUN from the Active controller to the Passive
controller in response to very small amounts of I/O on the Passive controller. It is also crucial to understand
how the SAN vendor implements ALUA. It is commonly assumed that ALUA is implemented at the LUN
level, but some vendors implement ALUA at the RAID Group level. Misunderstanding this can lead to
serious performance issues that are difficult to troubleshoot.
It is vitally important to reference the ESX integration and technical support guides published by the SAN
vendors to ensure that the proper pathing policy is chosen. It is equally important that the proper pathing
policy is chosen and VERIFIED on every ESX host connected to the SAN. The pathing policy is chosen PER
LUN PER HOST. Be aware that you may have LUNs presented to your ESX hosts from many different arrays,
with different pathing policies. The very worst scenario is one where different ESX hosts have different
pathing policies for a given LUN or LUNs presented by the same SAN. This will invariably cause intermittent issues.
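Because the policy is set per LUN per host, verification amounts to grouping the observed policies by LUN and flagging any LUN on which hosts disagree. The Python sketch below assumes the (host, LUN, policy) records have already been gathered by some other means. The host names, LUN identifiers and the record format are hypothetical.

```python
from collections import defaultdict


def policy_conflicts(records):
    """Find LUNs whose pathing policy differs between hosts.

    records: iterable of (host, lun, policy) tuples, e.g. collected from
             each ESX host by your own inventory tooling (assumed input).
    Returns {lun: {host: policy}} for every LUN with more than one policy.
    """
    by_lun = defaultdict(dict)
    for host, lun, policy in records:
        by_lun[lun][host] = policy
    return {
        lun: hosts
        for lun, hosts in by_lun.items()
        if len(set(hosts.values())) > 1
    }


# Illustrative data: LUN 12 is MRU on two hosts but Fixed on a third.
records = [
    ("esx01", "lun-11", "MRU"), ("esx02", "lun-11", "MRU"), ("esx03", "lun-11", "MRU"),
    ("esx01", "lun-12", "MRU"), ("esx02", "lun-12", "MRU"), ("esx03", "lun-12", "Fixed"),
]

conflicts = policy_conflicts(records)
for lun, hosts in conflicts.items():
    print(lun, hosts)
```

A check like this only proves the hosts agree with each other. Whether the agreed policy is the right one for the array still has to come from the vendor and VMware guides.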
In our hypothetical SAN above, we can fit 102 virtual machines in that
3 TB of space. Let's assume that each virtual machine requires 20 IOPS,
with an even 50/50 mix of reads and writes. Because each write on RAID 5
costs four back-end disk operations, the RAID 5-adjusted (R5adj) IOPS
requirement for each virtual machine becomes 50 IOPS (10 reads plus
10 writes x 4). Running all those virtual machines results in a
requirement of 5100 IOPS from just 6 SATA disks, which can only provide
a grand total of 576 IOPS!
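The arithmetic in this example can be reproduced in a few lines of Python. The RAID 5 write penalty of 4 is an assumption here: it is the commonly used figure and it is consistent with the 20-to-50 IOPS adjustment in the text. The function and variable names are illustrative.

```python
RAID5_WRITE_PENALTY = 4  # assumption: one front-end write = 4 back-end I/Os


def backend_iops(frontend_iops, write_fraction, write_penalty):
    """Convert a VM's front-end IOPS into the load seen by the disks."""
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * write_penalty


vm_count = 102            # virtual machines that fit in the 3 TB of space
per_vm = backend_iops(20, 0.5, RAID5_WRITE_PENALTY)  # 20 IOPS, 50/50 mix
required = vm_count * per_vm
available = 576           # total IOPS of the 6 SATA disks, from the text

print(f"per VM: {per_vm:.0f} IOPS, required: {required:.0f} IOPS")
print(f"oversubscription: {required / available:.1f}x")
```

Changing the read/write mix or the RAID level (and therefore the write penalty) shows quickly how sensitive the requirement is to workload assumptions.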
Solution:
The net result of haphazard growth or poor design work on the disk
subsystem is very long disk access times, disk timeouts and even virtual
machine and host crashes. It is vitally important to design the disk/SAN
infrastructure to accommodate not only space requirements but also IOPS
requirements. It is often far more prudent to buy a larger number of
smaller capacity disks than a small number of larger capacity disks.
So What Do I Do Now?
Any one, or any combination, of the seven mistakes listed above can be
a threat to your virtualized enterprise and can result in a higher TCO, an
unrealized ROI, and even failed application services. Fortunately, any
one of these critical issues can be solved with a focused and pragmatic
approach, addressing each issue on its own and as it relates to other
aspects of your network.
How Virtustream Can Help You
Virtustream in Action
A full-service law firm with over 500 lawyers practicing in litigation, antitrust,
government contracts, corporate, intellectual property and more than 40 other
practice areas had an extensive virtual infrastructure but their efforts were not yielding
the expected benefits. An analysis of the project by Virtustream revealed several
critical issues:
The virtualized environment was not stable, resulting in outages and downtime
The consolidation ratios were significantly less than expected
There were intermittent performance issues
Virtustream conducted an exhaustive health check of all seven of the client's locations,
including a deep dive into the Virtual Infrastructure, Storage Infrastructure and
Network Infrastructure. The Health Check revealed many areas for improvement and
highlighted configuration issues causing performance and stability problems.
As a result, Virtustream identified storage hotspots that could not handle the I/O load and
defined a path to remediation. The team also identified host configuration issues
causing performance and stability problems and recommended a path to remediation. In
addition, the team identified areas where further consolidation was possible, further
reducing the physical footprint by up to 75%.
Virtustream is the leading enabler of virtual infrastructures and has the deepest
level of virtualization and transformation knowledge in the marketplace. With a
suite of transformation solutions ranging from initial adoption strategy to
virtualization transformation and optimization programs, Virtustream can
ensure that you are able to maximize and maintain the benefits from your
virtualization investments. As an ISO 9001 and 27001 accredited organization,
we are a highly transparent and compliant organization adhering to
international standards in operations, security and processes.
About Virtustream
Virtustream, Inc. (www.virtustream.com) is an infrastructure
services firm committed to helping clients actualize the enterprise
cloud by providing strategy, integration and managed services
leveraging virtualisation technologies, and our secure platform.
For more information on how Virtustream can help you with operational
efficiencies, cost-base efficiencies, management efficiencies and environmental
efficiencies, please contact us at info@virtustream.com or (866) 350-6400.
www.virtustream.com
North America: +1 (240) 252.1007
United Kingdom: +44 (0) 870.345.3525
info@virtustream.com
NORTH AMERICA | UNITED KINGDOM | CHANNEL ISLANDS | IRELAND