
Maximizing Virtual Machine Performance
An introduction to performance tuning

Written by

Mattias Sundling
Quest Software, Inc.

December 6, 2010 - Document version 1.2 - Whitepaper


Contents
 
Maximizing Virtual Machine Performance
   Introduction
   Requirements
   Virtual hardware and guest OS
      vCPU
      Memory
      Disk
      Network
      Delete unnecessary devices from your virtual hardware and guest OS
   Acknowledgement
   Summary

Maximizing Virtual Machine Performance
Introduction
VM performance is ultimately determined by the underlying physical hardware and the hypervisor that serves as the foundation
for your virtual infrastructure. The construction of this foundation has become simpler over the years, but there are still several
areas that should be fine-tuned in order to maximize VM performance in your environment. While some of the content applies
to any hypervisor, this document focuses on VMware ESX(i) 4.1.

This is an introduction to performance tuning and is not intended to cover everything in detail. Most topics include links to sites
that contain deep-dive information if you wish to learn more.

Requirements
§ VMware ESX(i) 4.1 - If you are running an older version, make sure to upgrade. Performance and scalability have
increased significantly since ESX(i) 3.x. ESX(i) 4.1 offers some improvements over ESX(i) 4.0 as well.

§ Virtual machine hardware version 7 – this hardware version introduces features that increase performance. If you are
not running virtual hardware version 7, make sure to upgrade VMware Tools first, then shut down the VM's guest OS.
In the VI Client, right-click the VM and select Upgrade Virtual Hardware.

Warning: once you upgrade the virtual hardware to version 7 you lose backward compatibility with ESX(i) 3.x,
so if you have a mixed environment, make sure to upgrade all ESX(i) hosts first.

Virtual hardware and guest OS


The sections below make recommendations on how to configure the various hardware components for best performance as
well as what optimizations can be done inside the guest OS.

vCPU
Start with 1 vCPU - most applications work well with that. After some time you can evaluate CPU
utilization and application performance. If application response is poor, you can add additional
vCPUs as needed. If you start with multiple vCPUs and determine that you have over-provisioned,
it can be cumbersome to revert, depending on your OS (see HAL).

vFoglight with the Exchange cartridge looks beyond the hypervisor into the application layer

Make sure you select the correct Hardware Abstraction Layer (HAL) in the guest operating system. The HAL is the operating
system's driver for the CPU; the choices are Uni-Processor (UP) for a single processor or Symmetric Multiprocessing (SMP)
for multiple processors.

• Windows 2008 uses the same HAL for both UP and SMP, which makes it easy to downgrade the number of CPUs.

• Windows 2003 and earlier have different HAL drivers for UP versus SMP. Windows automatically changes the HAL
driver when going from UP to SMP. It can be very complicated to go from SMP to UP, depending on the OS and
version.

• If you have a VM running Windows 2003 SP2 or later that you have downgraded from 2 vCPUs to 1 vCPU, you will
still have the multiprocessor HAL in the OS. This results in slower performance than a system with the correct HAL.
The HAL driver can be updated manually; however, Windows versions prior to Windows 2003 SP2 cannot be easily
corrected. I have personally seen systems with an incorrect HAL driver consume more CPU, often peaking at
unnecessarily high CPU utilization once the system is under load.

• Make sure your multi-processor VMs have an OS and application that support multi-threading and can take advantage
of it. If not, you are wasting resources.

This example shows a VM with almost the same CPU utilization across all vCPUs,
which means the OS and application are multi-threaded.
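
A minimal Python sketch of that balance check; the sample values and thresholds are made up for illustration and are not VMware guidance:

    # Rough heuristic: given per-vCPU utilization samples (percent), decide whether
    # the load is spread evenly enough to justify multiple vCPUs.
    def vcpu_balance(per_vcpu_util):
        """Return (mean, spread) for a list of per-vCPU utilization percentages."""
        mean = sum(per_vcpu_util) / len(per_vcpu_util)
        spread = max(per_vcpu_util) - min(per_vcpu_util)
        return mean, spread

    samples = [62.0, 58.5, 60.2, 61.1]   # hypothetical averages for a 4-vCPU VM
    mean, spread = vcpu_balance(samples)
    if spread < 15 and mean > 20:
        print("Load is balanced across vCPUs - workload looks multi-threaded.")
    else:
        print("One vCPU carries most of the load - consider reducing the vCPU count.")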

CPU scheduling

ESX 2 used strict co-scheduling, which required a 2-vCPU VM to have 2 pCPUs available at the same time. At that time, physical
CPUs had only one or two cores, which led to slow performance when hosting too many VMs. ESX(i) 3 introduced relaxed co-
scheduling, which allows a 2-vCPU VM to be scheduled even when there are not 2 pCPUs available at the same time.

ESX(i) 4 refines the relaxed co-scheduler even further, increasing performance and scalability.

CPU % Ready

The best indication that a VM is suffering from CPU congestion on an ESX(i) host is when CPU % Ready reaches 5-10% over
time; in this range, further analysis might be needed. Values higher than 10% definitely indicate critical contention. This
means that the VM has to wait for the ESX(i) host to schedule its CPU requests due to CPU resource contention with other
VMs. This metric is one of the most important to monitor in order to understand overall performance in a virtual environment.
It can only be seen in the hypervisor, and as a result CPU utilization inside the guest OS might appear very high.

CPU % Ready is an important metric to understand VM performance
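
In vCenter charts, CPU Ready is reported as a summation in milliseconds per sample interval; the sketch below converts it to the percentage discussed above, assuming the real-time chart's 20-second interval (the sample values are hypothetical):

    # Convert the vCenter "CPU Ready" summation counter (ms per sample interval)
    # into CPU % Ready. The real-time chart samples every 20 seconds.
    def cpu_ready_percent(ready_ms, interval_seconds=20):
        """Percentage of the sample interval the VM spent waiting to be scheduled."""
        return ready_ms / (interval_seconds * 1000.0) * 100.0

    for ready_ms in (400, 1500, 2500):    # hypothetical samples
        pct = cpu_ready_percent(ready_ms)
        status = "ok" if pct < 5 else ("investigate" if pct <= 10 else "critical")
        print(f"ready={ready_ms} ms -> {pct:.1f}% ({status})")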

Memory
When you create a VM you allocate a certain amount of memory to it. There is a feature in the virtual machine settings known
as the Memory Limit, which often hurts more than it helps. This setting caps the physical memory the hypervisor will give the
VM at a value other than what is actually assigned. The guest OS will still see the full memory allocation; however, the
hypervisor will only allow physical memory use up to the amount of the Memory Limit.

The only use case I have found for this is an application that requires, for example, 16 GB of memory to install or start, but
only uses 4 GB in operation. You can set a Memory Limit at a much lower value than the actual memory allocation: the
guest OS and application will see the full 16 GB of memory, but the ESX(i) host will limit the physical memory to 4 GB.

In reality, however, the memory limit often gets set on VMs that you had no intention of limiting. This can happen when you move VMs
across different resource pools or perform a P2V of a physical system. It may also happen as the result of a known bug in
vCenter that randomly sets a memory limit on a virtual machine, or, worst of all, in previously configured templates, which
results in all deployed VMs inheriting the setting.

As further explanation: if you allocate 2 GB of memory to a VM and there is a limit at 512 MB, the guest OS will see 2 GB of
memory but the ESX(i) host will only allow 512 MB of physical memory. If the guest OS requires more than 512 MB, the
memory balloon driver will start to inflate to let the guest OS decide which pages are actively being used. If the balloon can't
reclaim any more memory, the guest OS will start to swap. If the balloon can't deflate, or if memory usage is too high on the
ESX(i) server, the host will start to use memory compression and then VMkernel swapping as a last resort. Ballooning is a
first warning signal; guest OS and ESX(i) host swapping will definitely hurt the performance of the VM as well as the ESX(i)
host and the storage subsystem that has to serve as virtual memory. For further explanation see:
http://www.vmguru.com/index.php/articles-mainmenu-62/mgmt-and-monitoring-mainmenu-68/96-memory-behavior-when-vm-limits-are-set
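
If you manage the environment with the pyVmomi Python bindings, a small script can flag VMs with an unwanted limit. The following is a minimal sketch under assumed placeholders (vcenter.example.com, admin/secret); a limit of -1 means "Unlimited":

    # Minimal pyVmomi sketch: list VMs whose Memory Limit is anything other than
    # Unlimited (-1). Hostname and credentials are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    VC_HOST, VC_USER, VC_PASS = "vcenter.example.com", "admin", "secret"

    ctx = ssl._create_unverified_context()   # lab use only; verify certificates in production
    si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            limit = vm.config.memoryAllocation.limit   # -1 means Unlimited
            if limit is not None and limit != -1:
                print(f"{vm.name}: {vm.config.hardware.memoryMB} MB allocated, limit {limit} MB")
    finally:
        Disconnect(si)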

vFoglight: detect, diagnose and resolve VM problems. Memory limits are detected by a rule, the diagnosis tells you what is wrong, and optional workflows automate the resolution.

Memory sizing
When configuring the amount of VM memory, consider the following:

o Too much memory will increase the VM memory overhead - your VM density (number of VMs per host) will not
be as high as it could be.

o Too little memory can result in guest OS swapping - your performance will be affected negatively.

To determine the correct amount of memory, monitor active memory utilization over at least 30-90 days so you can see
patterns. Some systems might only be used during a certain period of the quarter, but very heavily during that period.

Memory definitions:

Granted: physical memory granted to the VM by the ESX(i) host.

Active: physical memory actively being used by the VM.

Ballooned: memory being used by the VMware Memory Control Driver to allow the VM OS to selectively swap memory.

Swapped: memory being swapped to disk.

For a complete list of metrics and descriptions see:
http://communities.vmware.com/docs/DOC-5600

Memory utilization (Active Memory) in this example is very low over time, which makes it safe to decrease the memory setting without affecting VM and application performance.
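
One simple way to turn those Active Memory observations into a number is sketched below; the 25% headroom and the sample values are assumptions for illustration, not a VMware or Quest recommendation:

    # Illustrative right-sizing heuristic: feed in Active Memory samples (MB)
    # exported from your monitoring tool over 30-90 days and size the VM at the
    # observed peak plus headroom.
    def suggest_memory_mb(active_mb_samples, headroom=0.25, floor_mb=1024):
        peak = max(active_mb_samples)                 # busiest observed point
        return max(int(peak * (1 + headroom)), floor_mb)

    samples = [650, 720, 810, 1400, 900, 760]         # hypothetical samples
    print(f"Peak active: {max(samples)} MB -> suggested allocation: {suggest_memory_mb(samples)} MB")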

Memory reclamation

It is a best practice to right-size memory allocation in order to avoid placing extra load on ESX(i) hosts due to memory
reclamation. There are several techniques that an ESX(i) host uses to reclaim VM memory. After all, you want to run as many
VMs as possible and will probably over-commit memory (allocate more than you have).

o Ballooning*: Reclaiming memory by increasing memory pressure inside the VM – this requires VMware Tools. Do not
disable ballooning, as that will negatively impact performance. If you experience a lot of ballooning, try to vMotion the VM
to another host, as this will allocate all memory back to the VM. Also, make sure you don't have a fixed memory limit
configured on the VM.

o Swapping*: Reclaiming memory by having the ESX(i) host swap out VM memory to disk.

o Memory compression*: Reclaiming memory by compressing pages before they are swapped out to disk. Up to 10% of
the VM memory allocation can be used as compression cache.

o Transparent Page Sharing: Reclaiming memory by removing redundant pages with the same content (in the same VM or
across VMs).

* Only active when ESX(i) host is experiencing memory contention.

For more details: http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_memory_mgmt.pdf

http://frankdenneman.nl/2010/06/memory-reclaimation-when-and-how/

6
Disk
Now, let's move on to the most complex building block of the foundation: the disk configuration.

ParaVirtualized SCSI (PVSCSI) controller

• PVSCSI provides better throughput and lower CPU utilization. Studies have shown that with PVSCSI implemented,
throughput increases by 12% and CPU utilization decreases by 18% in comparison to the LSI Logic based controller.
http://blogs.vmware.com/performance/2009/05/350000-io-operations-per-second-one-vsphere-host-with-30-efds.html

• VMware benchmarking PVSCSI versus LSI Logic: http://www.vmware.com/pdf/vsp_4_pvscsi_perf.pdf

• ESX(i) 4.1 has improved PVSCSI to handle low disk I/O workloads, where older ESX(i) versions had problems with
queuing that could result in latency. http://vpivot.com/2010/02/04/pvscsi-and-low-io-workloads

• For more information on how to configure PVSCSI:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010398

• Separate OS, Swap and Data disks into separate VMDK files.

o This will help performance and Data Protection (primarily by excluding the swap data)

o Consider creating a separate virtual disk controller for each disk. This allows higher disk I/O than a single
controller. Power off the VM and change the SCSI IDs to 1:0, 2:0 and so on, and additional controllers will be
added (see the sketch below).
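
If you script such changes, the following pyVmomi sketch shows one way to add an extra paravirtual SCSI controller to a powered-off VM. The helper name, the bus number and the temporary device key are assumptions, and the vm object is obtained as in the earlier memory-limit example:

    # Hedged pyVmomi sketch: add a second (paravirtual) SCSI controller so data
    # disks can sit on their own controller (SCSI 1:0, 2:0, ...). Run against a
    # powered-off VM and wait for the returned task before powering on.
    from pyVmomi import vim

    def add_pvscsi_controller(vm, bus_number=1):
        controller = vim.vm.device.ParaVirtualSCSIController()
        controller.busNumber = bus_number
        controller.sharedBus = vim.vm.device.VirtualSCSIController.Sharing.noSharing
        controller.key = -101                  # temporary negative key for a new device

        dev_spec = vim.vm.device.VirtualDeviceSpec()
        dev_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
        dev_spec.device = controller

        spec = vim.vm.ConfigSpec(deviceChange=[dev_spec])
        return vm.ReconfigVM_Task(spec=spec)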

LUN sizing and VM placement

• Create the datastores with the correct size (500-1000 GB). LUNs that are too big will result in too many VMs, SCSI
reservation conflicts, and potentially lower disk I/O caused by metadata locking (e.g. vMotion, VM power-on, snapshots).

o vStorage API for Array Integration (VAAI) is a new API in ESX(i) 4.1 that takes some of the heavy lifting from
the hypervisor and moves it to the storage hardware. If your hardware supports it, you will be able to run
bigger datastores without performance problems. It also helps reduce the metadata locking mentioned above. For
more details: http://www.yellow-bricks.com/2010/11/23/vstorage-apis-for-array-integration-aka-vaai/

• Use an 8 MB block size when creating the datastores, as it has no negative impact on performance and it can hold larger
VMDK files. The same block size is required on all datastores if you want to leverage VAAI.

• Monitor the LUN performance and identify any latency as quickly as possible to ensure that disk I/O is streamlined
across all LUNs.

7
vFoglight Storage: Monitoring paths, throughput and latency is important

VMFS and guest OS Alignment


If you create new VMFS volumes from the vCenter client, the volumes will be aligned correctly for you. If you created the VMFS
volumes during ESX(i) installation, your volumes will be unaligned. The only way to fix this is to Storage vMotion all VMs in the
affected datastore to a new datastore and then recreate it from the vCenter client.

Windows 2008, 7 and Vista align NTFS volumes by default. All prior Windows server operating systems misalign the disks, and
you can only align a disk when you create it. Most Linux distributions have the same misalignment tendency. A quick way to
check a partition's starting offset is sketched after the list below.

• On average, properly aligning the disks can increase performance by 12% and decrease latency by 10%.
For more information see: http://www.vmware.com/pdf/esx3_partition_align.pdf

• Quest vOptimizer Pro can detect and resolve alignment problems on existing disks for Windows and Linux. To learn
more: http://www.quest.com/voptimizer%2Dpro/
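
A minimal Python check of the offset rule; the 64 KB boundary and the sample offsets are illustrative (on Windows the offset can be read with "wmic partition get Name,StartingOffset"):

    # A partition whose starting offset is a multiple of 64 KB (many arrays prefer
    # 1 MB) is aligned; the legacy Windows default of 63 sectors (32256 bytes) is not.
    def is_aligned(starting_offset_bytes, boundary_bytes=64 * 1024):
        return starting_offset_bytes % boundary_bytes == 0

    for offset in (32256, 65536, 1048576):
        print(f"offset {offset:>8} bytes -> {'aligned' if is_aligned(offset) else 'misaligned'}")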

Storage I/O Control (SIOC)

From ESX(i) 4.1, SIOC can be enabled on a per-datastore basis. This can be helpful if you are concerned that mission-critical
VMs are not getting the required disk I/O during times of disk congestion.

You can configure disk shares per VM. If there is disk congestion, the VMs with higher disk shares (shares are used only
when there is contention) get priority for more disk I/O; this works the same way as memory shares. The sketch below shows
how to read the current settings.
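
As a starting point for an audit, this hedged pyVmomi sketch prints the per-disk share settings that SIOC honors during contention; the vm object is assumed to come from the earlier container-view example:

    # List per-virtual-disk share settings (level and share count) for one VM.
    from pyVmomi import vim

    def print_disk_shares(vm):
        for dev in vm.config.hardware.device:
            if isinstance(dev, vim.vm.device.VirtualDisk):
                alloc = dev.storageIOAllocation          # shares/limit honored by SIOC
                print(f"{vm.name} / {dev.deviceInfo.label}: "
                      f"level={alloc.shares.level}, shares={alloc.shares.shares}")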

Network
Physical network

• Make sure you have multiple redundant physical NICs at 1 Gbit/s or 10 Gbit/s speeds
connected to VM virtual network switches.

VMXNET3

The network driver in the guest OS can be updated from the default E1000 to VMXNET3, a paravirtualized network driver with
the same kind of enhancements as the paravirtualized storage controller described above; it can also leverage 10 Gbit/s network
speeds. A sketch for finding VMs that are still on E1000 follows at the end of this section.

Caution! The IP address will reset to DHCP and a new MAC address will be generated. Make sure to capture your old
settings first; on Windows, ipconfig /all > c:\ip.txt writes them to ip.txt.

Optionally, enabling jumbo frames on the network can help maximize the size of the packets that traverse the environment.

• Set MTU to 9000 in guest OS driver, vSwitch, and physical network ports (end to end). Your network infrastructure
must also support jumbo frames.

• ESX(i) 4.1 supports Fault Tolerance with VMXNET3 in the guest OS.

• For more detailed information and performance tests see: http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf and
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805
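
The promised sketch for finding VMs that still use the emulated E1000/E1000e adapter, again a hedged pyVmomi example assuming a connected ServiceInstance (si) as in the memory-limit sketch:

    # Report the NIC type of every VM so E1000 adapters can be scheduled for
    # an upgrade to VMXNET3.
    from pyVmomi import vim

    def report_nic_types(si):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            for dev in vm.config.hardware.device:
                if isinstance(dev, (vim.vm.device.VirtualE1000, vim.vm.device.VirtualE1000e)):
                    print(f"{vm.name}: {dev.deviceInfo.label} is E1000 - consider VMXNET3")
                elif isinstance(dev, vim.vm.device.VirtualVmxnet3):
                    print(f"{vm.name}: {dev.deviceInfo.label} is already VMXNET3")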

Network I/O Control (NetIOC)

NetIOC allows you to control the network bandwidth utilized by vMotion, NFS, iSCSI, Fault Tolerance, VMs and management
traffic. This is done by configuring shares or limits, and it allows you to control Quality of Service, making sure critical
components always get the network bandwidth they require.

Delete unnecessary devices from your virtual hardware and guest OS


Unnecessary devices in the virtual hardware and inside the guest OS take CPU and memory resources to
emulate. If you don't use them, make sure to delete them. Cleaning up inside the guest OS will not gain much in
performance; it's more of a housekeeping task.

• Floppy, CD, USB, Serial Port, Com Port, Sound

• Fewer devices mean less overhead on your VM

• Clean up deleted hardware in the OS as well


o For Windows:
§ At the cmd prompt, type:
set devmgr_show_nonpresent_devices=1

§ From the same prompt, start Device Manager (devmgmt.msc)

• In Device Manager, select View > Show hidden devices

• Delete all non-present devices

Acknowledgement
Thanks to my colleagues at Quest Software: Tommy Patterson, Chris Walker, Paul Martin, Thomas Bryant and Scott Herold for
reviewing and giving valuable feedback.

A special thanks to VMware trainer and blogger Eric Sloof at ntpro.nl for the additional review and for finding some errors.

Summary
Tune the foundation and you will make better use of the infrastructure. Once the building blocks (CPU, memory, disk and
network) are optimized, the underlying hardware and hypervisor can deliver their true performance.

