
vSphere 5 Troubleshooting

Van de Perre Jurgen


Sr. IT Consultant
Xylos NV/SA

Van de Perre Jurgen

Around 3 years working at Xylos
VSP 3, 4 & 5
VTSP 3, 4 & 5
VCP 3, 4 & 5
Currently deployed at Sappi Maastricht as a VMware admin:
30 x BL480 Gen8 (16 CPU - 128 GB)
2 x 3PAR F400
14 x EMC CX4-240
vSphere 5 migration is ongoing

Agenda

Introduction & Preparation
vMA and command line
vCenter and ESXi log files
Troubleshooting vCenter Server 5
CPU and memory performance
Tracking and solving network problems
Tracking and solving storage problems
Troubleshooting DRS and HA

Introduction & Preparation


Why you need to know how to use the CLI


One word answer: Troubleshooting

vSphere Client does not do everything.

What if vCenter is down or unresponsive?

The CLI is faster than the GUI for mass changes.

Introduction & Preparation


Different CLI Options and Tools

ESXi has a thin service console (Tech Support Mode)
vMA is the new, single service console (like in ESX)
vCLI on your local PC
PowerCLI / PowerGUI on your local PC




Introduction & Preparation


LAB1: Installing the tools to your admin PC


Install PuTTY

Install vMA

Install PowerCLI / PowerGUI

Install RVTools

Introduction & Preparation


Do I need to know Linux?



A little is the right answer
Linux filesystem and editing knowledge is helpful when using vMA and Remote Tech Support Mode.
Recommended commands to learn:
ls, cd, rm, find, more, grep
vi/nano

What is the vMA?




vSphere Management Assistant (vMA) allows administrators and developers to run scripts and agents to manage ESX/ESXi and vCenter Server systems. vMA is a virtual machine that includes prepackaged software, a logging component, and an authentication component that supports non-interactive login.
Free downloadable virtual appliance
Comes in .ovf format
Linux based (SLES 11 64-bit)

Logging into vMA




You cannot log in as root
Log in with the vi-admin user account
The password is set during the initial install.

Using vMA vCLI commands







vMA contains vCLI
vCLI offers vicfg-xxxxx commands to administer ESXi hosts (previously esxcfg-xxxxx)
Commands are located in /usr/bin
There are also other commands such as vm-support, vmstat, and vmware-cmd.

vMA: vi-fastpass


The best way to run commands on vCenter and ESXi hosts is to use "vi-fastpass"
To do this, add servers to fastpass (including vCenter) with vifp addserver
Once servers are added, select the server you will be managing with vifptarget -s
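
A vi-fastpass session could look roughly like the sketch below. The host names are placeholders, and the run helper is a hypothetical wrapper that only prints a command when the tool is not installed, so the sketch can be dry-run outside vMA. On a real vMA you would simply type the commands at the prompt.

```shell
# run: execute a command if it exists, otherwise only print it.
# (Hypothetical helper for this sketch, not part of vMA.)
run() {
  if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi
}

run vifp addserver esxi1.example.local     # asks for the host's password
run vifp addserver vcenter.example.local   # vCenter can be added as well
run vifp listservers                       # confirm both targets are registered
run vifptarget -s esxi1.example.local      # make esxi1 the current target
run vicfg-nics -l                          # runs against esxi1 without credentials
```

After vifptarget -s, the vCLI commands that follow should no longer need server or credential options for that target.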

Consolidating Logfiles with vMA




vilogger has been deprecated and removed in vSphere 5 (it was present in 4.x)
A free syslog server has been added to vCenter Server 5.

Locating and Viewing log files


vCenter 5 log files
vpxd-XX.log
Located in C:\ProgramData\VMware\VMware VirtualCenter\Logs (in Windows 2008)
Log file rotates when vpxd is started or when it reaches 5 MB in size
Size, location, name and rotation can be changed in the vpxd.cfg file in the VMware VirtualCenter folder

Locating and Viewing log files


ESXi log files
/var/log/vmware/hostd.log - Host management service logs, including VM tasks and events
/var/log/shell.log - ESXi Shell usage logs, including every command entered
/var/log/vpxa.log - vCenter Server vpxa agent logs
/var/log/vmkernel.log - Core VMkernel logs, storage and networking device and driver events
/var/log/fdm.log - vSphere High Availability logs

Locating and Viewing log files


You can view & search log files in a variety of ways:
 vSphere Client
 SSH to ESXi
 RDP to vCenter server
 DCUI in ESXi
 Web
 Syslog
 PowerCLI (Get-Log command)

Locating and Viewing log files


You can also view the logfiles with PowerGUI

Exporting log files


Why Export Logs?


Troubleshooting

Security

Sharing

Configuring Centralized Logging


Why do you need centralized logging?


Central point for troubleshooting and security audit
Especially needed for ESXi, as log files are held in memory and are lost upon reboot or power outage
You can consolidate log files at a central point with Syslog

Logging Consolidation Tips


Time and DNS are critical!

Configure Timezones and NTP on all ESXi hosts

Configure DNS resolution on all ESXi hosts

Changing the location of log files


Instead of using Syslog, you could just place ESXi log files on (shared) storage.
To do this, using the vSphere Client, do the following:
Go to your host, select the Configuration tab.
Select Software > Advanced Settings
Go to Syslog
Change Syslog.global.logDir to [datastore] /folder
e.g. [xylos-san.lun0] /log_xylos-esxi1
The log files will be created on the datastore of your choice.
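
The same change can probably be made from the ESXi Shell with esxcli. The sketch below converts the "[datastore] /folder" notation into a /vmfs/volumes path and prints the matching commands for review; it does not run them. The function name logdir_commands is made up for this example, and the assumption is that --logdir takes the /vmfs/volumes form of the path.

```shell
# Print the esxcli commands that set the local log directory.
# Argument: the location in vSphere Client notation, "[datastore] /folder".
logdir_commands() {
  ds=${1#\[}           # drop the leading "["
  ds=${ds%%\]*}        # keep everything before the first "]"
  dir=${1#*\] }        # keep everything after "] "
  echo "esxcli system syslog config set --logdir=/vmfs/volumes/$ds$dir"
  echo "esxcli system syslog reload"
}

logdir_commands "[xylos-san.lun0] /log_xylos-esxi1"
```

Using a separate folder per host, as in the example from the slide, keeps the logs of different hosts from mixing on shared storage.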

Configure Syslog server


Configure the VMware Syslog Collector from the VMware vCenter setup.

Configure Syslog server




The Syslog server can accept connections on three different ports:
UDP 514
TCP 514
Encrypted SSL 1514

Configure Syslog server




In the vCenter inventory, select the ESXi 5.0 host
Navigate to the Configuration tab > Software > Advanced Settings > Syslog
Enter the Syslog server address in the Syslog.global.logHost field. The format is <protocol>://<FQDN>:<port>.
In our lab we use ssl://xylos-vcenter.xylos.training:1514
Do not forget to enable the syslog rule in the firewall!
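
The steps above can also be sketched for the ESXi Shell. The function below only prints the esxcli commands for review and rejects protocols other than the three the collector listens on; its name, loghost_commands, is made up for this example.

```shell
# Print the esxcli equivalent of the logHost and firewall steps.
# Arguments: protocol (udp, tcp or ssl), syslog server FQDN, port.
loghost_commands() {
  case "$1" in
    udp|tcp|ssl) ;;
    *) echo "unsupported protocol: $1" >&2; return 1 ;;
  esac
  echo "esxcli system syslog config set --loghost=$1://$2:$3"
  echo "esxcli system syslog reload"
  echo "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true"
  echo "esxcli network firewall refresh"
}

loghost_commands ssl xylos-vcenter.xylos.training 1514
```

Printing instead of executing makes it easy to paste the same four lines into an SSH session on every host.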

Troubleshooting vCenter Server 5




Most common issues:









Hostname or IP of database server changed
vCenter database name changed
Username / password to connect to the vCenter database changed
DSN deleted
Type of DSN is incorrect (32-bit instead of 64-bit)
Out of disk space on SQL server for DB or transaction logs (or database full if Express)

Troubleshooting vCenter Server 5









Check if the vCenter Server service is running!
Verify that the vCenter service is using the correct credentials
Verify the vCenter Server system DSN
If the database is corrupt, we need to create a new one!
Check the registry for the correct settings
Recreate the database repository

Troubleshooting vCenter Server 5


Rule of thumb no. 1:
Always make sure you monitor the VMware vCenter service (and keep a backup of the database close by)!

CPU and memory performance


Critical Resources - CPU and Memory


Just like with any typical physical server, there are 4 critical resources:
1. CPU
2. Memory (the no. 1 most constrained resource)
3. Disk
4. Network
You must know how to monitor CPU and memory both from the vSphere GUI and from the CLI.

CPU and memory performance


Analyzing CPU and Memory Performance in vCenter


CPU and Memory performance


LAB2: Monitoring an ESXi Server with the
vSphere Client.





Connect to the vCenter Server and select an ESXi host.
See the Resources box and the default Performance overview page.
Select a virtual machine and select the Performance tab
View and modify the statistics logging level
It does not provide you with as many in-depth troubleshooting statistics as you might need

CPU Ready

CPU and memory performance


Command-line monitoring: (r)esxtop
esxtop - in Remote Tech Support Mode (SSH)
resxtop - in vMA
Both provide real-time resource utilization


CPU Ready
High CPU Ready Time


Problematic if it is sustained for long periods
Possible contention for CPU resources among VMs
Workload variability? Fix with vMotion/DRS.
Resource limits on VMs? Check limits, reservations and shares.
Actual overcommitment? Fix with vMotion/DRS/more CPUs.
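
To judge whether a CPU Ready value is high, it helps to turn the chart value into a percentage. vSphere Client charts report CPU Ready as a summation in milliseconds per sample, so dividing by the sample length gives the share of time the VM waited for a physical CPU. A minimal sketch, assuming the 20-second interval of the real-time charts; the function name cpu_ready_pct is made up for this example.

```shell
# Convert a CPU Ready summation (ms) into a percentage of the sample interval.
# Arguments: ready time in ms, chart interval in seconds (20 for real-time charts).
cpu_ready_pct() {
  awk -v ms="$1" -v s="$2" 'BEGIN { printf "%.1f\n", ms / (s * 1000) * 100 }'
}

cpu_ready_pct 2000 20   # 2000 ms of ready time in a 20 s sample prints 10.0
```

For a VM with several vCPUs the chart value is typically the sum over all of them, so divide by the vCPU count before comparing VMs of different sizes.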

Memory troubleshooting


ESX must balance memory usage
Page sharing to reduce memory footprint of virtual machines
Ballooning to relieve memory pressure in a graceful way
Host swapping to relieve memory pressure when ballooning is insufficient
Compression to relieve memory pressure without host-level swapping

ESX allows overcommitment of memory
Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit

Memory also has limits, shares, and reservations
Host swapping can cause performance degradation

CPU and memory performance


Memory Management with ESXTOP


MEM overcommit - a value of 0.20 is a 20% overcommitment!
PMEM/MB - Physical memory in the host.
VMKMEM/MB - Less important; this shows how the VMkernel itself uses memory and is only used for troubleshooting with VMware support.
SWAP/ZIP/MEMCTL - should be zero at all times. If not, this indicates that there is not enough memory to assign to the guests.


CPU and memory performance


Using vSphere Hot-Add to Dynamically Add CPU and RAM
Typically, USB, Ethernet and hard drives are the only hardware that can be hot-added
You cannot hot-remove any hardware
When a VM is powered off, you can add or remove virtual hardware
Keep in mind that your OS and apps also need to recognize it.

vNetwork Distributed Virtual Switch


Why the Distributed Virtual Switch is so important







The vNetwork is critical to the virtual infrastructure
As the number of hosts increases, so does the complexity of networking configuration and troubleshooting.
With dvSwitch, configuration for all hosts is done in vCenter (= centralized admin, config and monitoring)
dvSwitch enables numerous other features
dvSwitch requires vCenter and vSphere Enterprise Plus licensing.


vSphere Networking overview


Standard vSwitch versus dvSwitch


dvSwitch
Eases network troubleshooting
Centralized administration
Enterprise Plus required!
Private VLANs
Network vMotion (port state follows VM)
NIC teaming based on load
Requirement for Cisco Nexus 1000V

Standard Switch Architecture

Network configuration at the host level
[Figure: ESXi Host 1 and ESX Host 2, each with its own vNICs, port groups (vMotion port, VM port group), vSwitch and physical NICs uplinked to the physical switches]

Distributed Switch Architecture

[Figure: a Distributed Switch (control plane) on vCenter Server with distributed port groups (vMotion, Virtual Machines, Service Console) spanning the hidden vSwitches (I/O plane) on ESXi Host 1 and ESX Host 2]

vNetwork Distributed Switch

Aggregated datacenter-level virtual networking
Simplified setup and change
Easy troubleshooting, monitoring and debugging
Enables transparent third-party management of virtual environments
[Figure: per-host vSwitches replaced by a vNetwork Distributed Switch or Cisco Nexus 1000V spanning the VMware vSphere hosts]

vSphere Networking overview


Summary of the dvSwitch


Control plane = vCenter

I/O plane = ESX(i) host

vCenter updates dvSwitch every 5 minutes

Stored in /etc/vmware/dvsdata.db and as .dvsData on the datastore

vSphere Networking overview


Locating vSwitch Entries in VM Configuration Files


[VM name]/[VM name].vmx


vSphere Networking overview


Locating vSwitch Entries in Host Configuration Files


/etc/vmware/esx.conf

This only shows very basic information and no information about the individual ports

vSphere Networking overview


Examining net-dvs Output


In ESXi, the net-dvs command displays local information about dvSwitches

Troubleshooting networking via CLI


Troubleshooting the vNetwork with esxtop


Powerful tool to monitor the virtual network
Shows all virtual machines, VMkernel ports and uplink ports.
You can sort all the columns!


Troubleshooting networking via CLI


Using esxcfg to troubleshoot vNetworks



esxcfg-* commands are your CLI commands in ESXi
Some important commands are:
esxcfg-nics
esxcfg-route
esxcfg-vswitch
esxcli network vswitch standard (for standard switches)
esxcli network vswitch dvs (for distributed switches)
esxcfg-info -n
esxcfg-* commands are found in /usr/bin
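
A first pass over a host's network configuration could use the commands above in their listing form, roughly as sketched below. The run helper is a hypothetical wrapper that only prints a command when the tool is missing, so the sketch can be dry-run on a machine that is not an ESXi host.

```shell
# run: execute a command if it exists, otherwise only print it.
# (Hypothetical helper for this sketch, not part of ESXi.)
run() {
  if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi
}

run esxcfg-nics -l                          # physical NICs: link state, speed, duplex
run esxcfg-vswitch -l                       # vSwitches, port groups, VLAN IDs, uplinks
run esxcfg-route                            # VMkernel default gateway
run esxcli network vswitch standard list    # standard vSwitch details
run esxcli network vswitch dvs vmware list  # distributed switches as seen by this host
```

Working from the physical NICs up to the port groups follows the bottom-up approach used throughout this section.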

Troubleshooting networking via CLI


Understanding dvSwitch Sync and Timeouts


Control plane in vCenter syncs with I/O plane on ESXi hosts, thus it is possible for these two to get out of sync.
ESXi hosts will continue to function with backup copies even when vCenter is down!
The file /etc/vmware/dvsdata.db is a binary file that can be viewed with net-dvs.

Troubleshooting networking via CLI


Understanding dvSwitch Sync and Timeouts


vSphere Client error:
Configuration issues: The Distributed Virtual Switch configuration on some hosts differed from that of the vCenter server
This tells you that the databases are out of sync.
It will go away automatically after the next 5-minute sync window.
Ports in use when trying to remove dvSwitch or dvPortGroup


Troubleshooting networking via CLI


Understanding dvSwitch Sync and Timeouts


Ports in use when trying to remove dvSwitch or dvPortGroup
VMware KB1010913 - Changing the default timeout for locked dvPorts
To decrease the timeout, modify the vpxd.cfg file on vCenter in C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg
Add these lines inside the <config> element (where XX is minutes) and restart vCenter:
<dvs>
  <portReserveTimeout>XX</portReserveTimeout>
</dvs>

Troubleshooting networking via CLI


Restoring a Standard switch on an ESXi Server


Troubleshooting networking via CLI


Troubleshooting VMkernel Issues






Management traffic is on a VMkernel port (vmk0)
These interfaces are in a port group called Management Network with just one port
You should create redundant management interfaces
If all management interfaces are down, you will have to go to the console of the server to reestablish connectivity

Troubleshooting networking via CLI


Troubleshooting VMkernel Issues

[Figure: vCenter Server connected to the hostd and vpxa agents running on the ESXi host]

Troubleshooting networking via CLI


Troubleshooting VMkernel Issues


Host Agent


Executes commands from vSphere Client & vCenter server agent.

vmware-hostd program

Uses TCP port 443

Logs to hostd.log

Can be restarted with /etc/init.d/hostd restart (in SSH)


Troubleshooting networking via CLI


Troubleshooting VMkernel Issues


vCenter Server Agent




Collects actions from vCenter server and sends them to hostd

Uses TCP port 443

Logs to vpxa.log

Can be restarted with /etc/init.d/vpxa restart (in SSH)

Troubleshooting networking via CLI


Troubleshooting VMkernel Issues


Can also be done from the DCUI

Does not interrupt Virtual Machine networking

Troubleshooting networking via CLI


Determine the Root Cause of vNetwork Trouble


Bottom-up troubleshooting methodology

Understand how VMs and management interfaces are in port groups, port groups are in vSwitches and vSwitches have vmnics that are the physical uplinks
dvSwitches can standardize configurations across all hosts as well as complicate troubleshooting
Avoid the urge to reboot; keep searching for the root cause


Using a Network Packet Analyzer


Packet Capture Concepts


The network (vNetwork in this case) is a critical communication path
Once you go beyond summary stats, visibility into that path comes from packet analysis
Packet analysis is useful for troubleshooting a variety of issues - vMotion, DHCP, DNS, iSCSI, etc.
Sometimes an excess or a lack of packets helps us solve a problem
Packet/protocol analyzers are called sniffers

Using a Network Packet Analyzer


Differences between packet capture on virtual and physical networks

Physical
Port mirror
SPAN (Switched Port Analyzer)
Promiscuous port

Virtual
Promiscuous port group on a vSwitch; all traffic on the vSwitch will be sent to that port group

Using a Network Packet Analyzer


Packet Analyzer Options


Wireshark (from www.wireshark.org) running in a VM
There are virtual appliances to transport the packets on the vNetwork to the physical network, like Solera's virtual appliance (www.soleranetworks.com)
If you are using the Cisco Nexus 1000V in your virtual infrastructure, it supports SPAN and ERSPAN.


Using a Network Packet Analyzer


Configuring the vNetwork for Packet Capture


Configure promiscuous mode on the port group to send traffic from all ports on the vSwitch to all ports in your port group
Your packet capture VM needs to be in the port group where promiscuous mode is configured!
It is recommended to create a new port group and enable promiscuous mode on that port group

Storage Troubleshooting
Reviewing vSphere 5 Storage Maximums


VM maximum VMDK size = 2 TB minus 512 bytes

Volume Size = 64TB

Virtual machines per VMFS volume = 2048

Volumes per host = 256

LUNs per server = 256


Storage Troubleshooting
Storage Terms - PSA, MPP, NMP, SATP, PSP & ALUA
VMkernel has a special layer called the Pluggable Storage Architecture (PSA)
PSA makes multipathing flexible and allows for 3rd party multipathing plugins (MPP)
The native multipathing plugin is the VMware NMP
NMP manages sub-plugins, with SATPs (storage array type plugins) and PSPs (path selection plugins) being the defaults
PSA takes effect when VMkernel sends a SCSI command to access data on a block device

Storage Troubleshooting
Storage Terms - PSA, MPP, NMP, SATP, PSP & ALUA
MPP = multipathing plugin - can coexist with NMP and can be used on a per-LUN or per-array basis
An MPP's job is to discover physical storage devices and determine claim rules to export a logical device
Claim rules are found in /etc/vmware/esx.conf
Path failover is delegated to SATP
Path load balancing is delegated to PSP
EMC PowerPath is an example of a 3rd party MPP
3rd party MPPs can offer better availability options, better performance, improved monitoring and better load balancing

Storage Troubleshooting
Storage Terms - PSA, MPP, NMP, SATP, PSP & ALUA
VMware offers a SATP for every array they support
SATP options include VMware's default SATP for local storage, default SATP for generic active/active storage and default SATP for ALUA storage
ALUA = Asymmetric Logical Unit Access - what mid-range arrays do to offer active/active paths (note that ALUA must be configured on the array)
Once the SATP is chosen, it will monitor the health of each storage path, report changes in the path status, and perform array-specific operations such as activating passive paths if needed.

Storage Troubleshooting
Storage Terms - PSA, MPP, NMP, SATP, PSP & ALUA


PSP (Path Selection Plugin) is responsible for the actual path selected for every disk I/O
Three PSPs are available by default:
MRU (Most Recently Used) - default for active/passive
Fixed - default for active/active
Round Robin - basic load balancing
It is possible that an array has an incorrectly assigned MPP, SATP or PSP.


Storage Troubleshooting
Storage Friendly Names, Identifiers and Runtime Names

Storage friendly names -> can be changed by admin
Storage identifiers -> usually NAA; depends on storage
Storage runtime names -> adapter, channel, target, LUN
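
Runtime names follow the pattern vmhbaN:C<channel>:T<target>:L<LUN>, so the four parts can be split mechanically, for example when matching a path from a log line to a LUN on the array. A small sketch; the function name and the sample runtime name are made up for illustration.

```shell
# Split a runtime name (vmhbaN:Cx:Ty:Lz) into adapter, channel, target and LUN.
parse_runtime_name() {
  echo "$1" | awk -F: '{
    printf "adapter=%s channel=%s target=%s lun=%s\n",
           $1, substr($2, 2), substr($3, 2), substr($4, 2)
  }'
}

parse_runtime_name vmhba1:C0:T3:L12
```

Unlike the NAA identifier, a runtime name describes only the current path and should not be relied on to stay the same across reboots.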

Storage Troubleshooting
Identify Log Files Used to Troubleshoot Storage


List of PSA modules loaded at boot is found in /var/log/sysboot.log
Critical storage events are logged in /var/log/vmkernel.log

Storage Troubleshooting
Identifying and setting PSP via Command Line


List all available SATPs -> esxcli storage nmp satp list
List all device details -> esxcli storage nmp device list
Set PSP to Round Robin -> esxcli storage nmp device set --device <device> --psp VMW_PSP_RR
Set default PSP for all new LUNs -> esxcli storage nmp satp set -s VMW_SATP_DEFAULT_AA -P VMW_PSP_RR
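
When many LUNs need the same policy, the device list can be turned into one "device set" command per device. The sketch below reads the output of esxcli storage nmp device list on stdin and prints the commands for review instead of running them. The function name rr_commands and the sample device are made up, and the filter assumes that device IDs start at the beginning of a line with their properties indented beneath.

```shell
# Print a Round Robin "device set" command for every NAA device on stdin.
rr_commands() {
  grep '^naa\.' | while read -r dev; do
    echo "esxcli storage nmp device set --device $dev --psp VMW_PSP_RR"
  done
}

# Made-up sample in the shape of the device list output.
printf '%s\n' \
  'naa.600508b4000971fa0000a00000210000' \
  '   Device Display Name: HP Fibre Channel Disk' \
  '   Path Selection Policy: VMW_PSP_MRU' | rr_commands
```

On a host this would be fed with esxcli storage nmp device list | rr_commands. Check with the array vendor first, since Round Robin is not recommended for every array.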


Storage Troubleshooting
Identifying Storage Performance Issues with esxtop


esxtop only offers storage latency and throughput for iSCSI and FC
Use the d key to view the adapters
Use the v key to list the virtual machines' hard disks
Use the u key to list the physical disks

Setting I/O Controls

Enabling Storage I/O Control


Datastore Activity Per Host

Datastore Activity per VM

Virtual Disk Activity per VM

Troubleshooting vMotion


Enables live migration of running virtual machines from one physical server to another with zero downtime, continuous service availability, and complete transaction integrity.
It is transparent to users.
vMotion lets you:
Automatically optimize and allocate entire pools of resources for maximum hardware utilization and availability.
Perform hardware maintenance without any scheduled downtime.
Proactively migrate virtual machines away from failing or underperforming servers.

Troubleshooting vMotion
Host & VM Requirements for vMotion


Source and destination host requirements for vMotion:
Supported by your version of vSphere
Storage needs to be visible to all hosts
VMkernel NIC configured for vMotion with all hosts on the same LAN
Compatible CPUs on all hosts
EVC may help


Enhanced vMotion Compatibility Improvements


Usability improvements
Preparation for AMD Next Generation without 3DNow!: Future AMD CPUs may not support 3DNow!. To prevent vMotion incompatibilities, a new EVC mode is introduced.
Better handling of powered-on VMs: vCenter Server now uses a live VM's CPU feature set (instead of the host's CPU features) to determine migration into an EVC cluster. This will provide better granularity in error detection.

Troubleshooting vMotion
Host & VM Requirements for vMotion


VM Requirements for vMotion




Not connected to an internal-only switch

Not connected to a virtual floppy or CD

No CPU affinity configured

Swap accessible by destination ESXi host

No VMDirectPath or Fault Tolerance enabled

If using RDM, must be accessible by destination host

Not clustered using MS Clustering with another VM (or physical server)

Troubleshooting vMotion







The IP address and subnet mask should match the correct network configuration of the local LAN.
DNS and routing configuration should be correct for the local LAN.
VLAN settings should match the VLAN configuration of the local LAN.
Both ESXi VMkernel ports should be vMotion-enabled.
Both ESX/ESXi hosts must have a VMkernel port (vmk) on the same LAN.
Test by using vmkping on all hosts
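
The vmkping test can be repeated for every peer from each host's shell, roughly as sketched below. The addresses are placeholders from a documentation range, and the run helper is a hypothetical wrapper that only prints the command when vmkping is not available, so the sketch can be dry-run elsewhere.

```shell
# run: execute a command if it exists, otherwise only print it.
# (Hypothetical helper for this sketch, not part of ESXi.)
run() {
  if command -v "$1" >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi
}

# Placeholder vMotion VMkernel addresses of the other hosts.
VMOTION_IPS="192.0.2.11 192.0.2.12"

for ip in $VMOTION_IPS; do
  run vmkping -c 3 "$ip" ||
    echo "vmkping to $ip failed: check IP, subnet mask, VLAN and vMotion port settings"
done
```

A plain ping from the host is not a substitute, because vmkping sends the test packets through the VMkernel network stack that vMotion itself uses.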


What is Storage vMotion?




Live migration of virtual machine disk files across heterogeneous storage arrays with complete transaction integrity and no interruption in service for critical applications.

Troubleshooting Storage vMotion


svMotion Requirements





Source ESXi host must have access to both source and target datastore
If you migrate to both another host and another datastore, you must power off the VM
Up to four concurrent svMotion operations can occur at one time
VMs with snapshots can be migrated!

Troubleshooting Storage vMotion


Common svMotion problems and resolutions
Have the basic requirements for Storage vMotion migration been met?
Have you exceeded a Storage vMotion limit?
Are the virtual machine disks in persistent mode?
Does the virtual machine have any raw device mappings?
Is VMware Tools being installed?

DRS Ensures Capacity on Demand

Shrink and grow of applications based on demand and priority
Dynamic and responsive load balancing

DRS Overview


Balances load using
Initial placement
Dynamic balancing (vMotion)

vMotion requirements
VM disk & config files accessible by source and destination (all hosts in the DRS cluster)
Swap file path can be local
Access to same virtual network
CPU compatibility (EVC)
ESX hosts have VMkernel interface enabled for vMotion

DRS Host Affinity


Required rules

Preferential rules
Rule enforcement: 2 options

Required: DRS/HA will never violate the rule; an event is generated if it is violated manually. Only advised for enforcing host-based licensing of ISV apps.
Preferential: DRS/HA will violate the rule if necessary for failover or for maintaining availability

DRS Host Affinity

[Figure: a 4-host DRS/HA cluster spread over ChassisA and ChassisB; VMs are placed in VM groups, hosts in host groups (A = ChassisA, B = ChassisB), and rules link VM groups to host groups]

VM-VM anti-affinity rule enhancement


VM-VM anti-affinity rules can now incorporate more than 2 VMs

Troubleshooting DRS Clusters




Cluster Problems


Load imbalance -> migration threshold too high, rules, DRS disabled for a VM, device mounted on a VM, VMs not compatible with hosts, vMotion is not configured correctly
Cluster is yellow -> not enough resources to satisfy reservations of resource pools (host removed/crashed)
Cluster is red -> resource pool tree not consistent or failover capacity violated.

Troubleshooting DRS Clusters




Common resolutions to check












Cluster errors / status
DRS enabled? Threshold?
DRS rules?
Resource pool configuration
vMotion configured correctly?
Devices mounted?
VM pinned (with RDMs)
VM not compatible with host?
Host state

vSphere HA Architecture

[Figure: three ESXi hosts (one master, two slaves), each running FDM, hostd and vpxa and attached to shared datastores; vpxd on vCenter Server reaches the hosts over the management network]

Troubleshooting HA Failovers


High Availability overview

vSphere HA provides high availability for virtual machines by pooling them and the hosts that they reside on into a cluster. Hosts in the cluster are monitored and in the event of a failure, the virtual machines on a failed host are restarted on alternate hosts.


Troubleshooting HA Failovers


Incorrect Virtual Machine Protection State


When a virtual machine has been powered on for several minutes, yet its vSphere HA protection state remains unprotected, vSphere HA might not attempt to restart the virtual machine if a failure occurs.

Cause
vSphere HA master host has not been elected or vCenter Server is unable to communicate with it.
Multiple master hosts exist and the one with which vCenter Server is communicating is not responsible for the virtual machine.
Agent is unable to access the datastore on which the configuration file of the virtual machine is stored.

Troubleshooting VM Power-On


VM Swapfile location & diskspace




VM swap files (.vswp) are created at VM power on
By default, they are stored in the same folder as the VM disk file
Swap files are the VM's configured memory less its reserved memory (2 GB configured - 1 GB reserved = 1 GB vswp)
VM swap files can be configured in 3 places:
On each VM
On the host
On the cluster
If there isn't enough space for the swap file to be created, then the VM can't be powered on
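
The swap file arithmetic can be checked before powering on a VM. A minimal sketch with made-up function names; all sizes are in MB, and the free space figure would come from the datastore that holds the swap file.

```shell
# Size of the .vswp file in MB: configured memory minus memory reservation.
vswp_size_mb() {
  echo $(( $1 - $2 ))
}

# Succeeds if the free space (third argument, MB) can hold the swap file.
can_power_on() {
  [ "$3" -ge "$(vswp_size_mb "$1" "$2")" ]
}

vswp_size_mb 2048 1024   # the 2 GB configured / 1 GB reserved example: 1024
```

Setting the reservation equal to the configured memory brings the swap file down to zero, which is one way to get a VM started on a nearly full datastore.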

Troubleshooting VM Power-On


Investigating virtual machine file locks on ESXi

Powering on a virtual machine fails
When powering on the virtual machine, you see one of these errors:
Unable to open Swap File
Unable to access a file since it is locked
Unable to access a file <filename> since it is locked
Unable to access Virtual machine configuration
Powering on the virtual machine results in the power on task remaining at 95% indefinitely


Troubleshooting VM Power-On


Solution

To identify the locked file, attempt to power on the virtual machine. During the power on process, an error may display or be written to the virtual machine's logs. The error and the log entry identify the virtual machine and files:
1. Where applicable, open and connect the VMware Infrastructure (VI) or vSphere Client to the respective ESX host, VirtualCenter Server, or the vCenter Server hostname or IP address.
2. Locate the affected virtual machine, and attempt to power it on.
3. Open a remote console window for the virtual machine.
4. If the virtual machine is unable to power on, an error on the remote console screen displays with the name of the affected file.

Troubleshooting VM Power-On


Identifying the locked file




To prevent concurrent changes to critical virtual machine files and file systems, ESX hosts establish locks on these files. In certain circumstances these locks may not be released when the virtual machine is powered off. The files cannot be accessed by the servers while locked, and the virtual machine is unable to power on.

These virtual machine files are commonly locked for runtime:
<VMNAME>.vswp
<DISKNAME>-flat.vmdk
<DISKNAME>-<ITERATION>-delta.vmdk
<VMNAME>.vmx
<VMNAME>.vmxf
vmware.log

Troubleshooting VM Power-On


Using the touch utility to determine if the file can be locked

To test the file or directory locking functionality, run this command:
# touch <filename>
Note: Performing a "touch *" command performs the operation on all files in the current directory.

If the touch * command succeeds, then the command successfully made changes to the date/time stamp and has verified that the file can and has been locked (then unlocked). At this point, retry the virtual machine power-on operation to see if it succeeds.

If the touch * command fails with a device or resource busy message, it indicates that a process is maintaining a lock on the file or directory.


Troubleshooting VM Power-On


Locating the lock and removing it

Start with identifying the server whose VMkernel may be locking the file. To identify the server:

Report the MAC address of the lock holder by running the command (except on NFS volumes):
# vmkfstools -D /vmfs/volumes/<UUID>/<VMDIR>/<LOCKEDFILE.xxx>

Note: If this process does not reveal the MAC address, or the owner identifier is all zeroes, it is possible that it is a Service Console-based lock, an NFS lock, or a lock generated by another system or product that can use or read VMFS file systems. In other circumstances, the file is locked by a VMkernel child or cartel world and the offending host running the process/world must be rebooted to clear it.

To check for Service Console-based locks on non-ESXi servers, run this command:
# lsof | grep <name of locked file>

Stop the process and release its lock using the kill command. For example, if the process ID is 3631:
# kill 3631
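
The MAC address sits in the owner field that vmkfstools -D reports for the lock. The sketch below pulls it out of a lock line read from stdin, assuming the last 12 hex digits of the owner field are the MAC address of the host holding the lock. The function name and the sample line are made up for illustration.

```shell
# Extract the lock holder's MAC address from a vmkfstools -D lock line (stdin).
owner_mac() {
  sed -n 's/.*owner [0-9a-f-]*-\([0-9a-f]\{12\}\).*/\1/p' |
    sed 's/\(..\)/\1:/g; s/:$//'
}

# Made-up sample line in the shape of the lock information.
echo 'gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 mtime 1174669462]' | owner_mac
```

The result can then be compared with the NIC MAC addresses of each host, for example from esxcfg-nics -l. An all-zero address corresponds to the cases described in the note above.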

Troubleshooting VM Power-On


Locating the lock and removing it

In ESXi 5.0, to find the owner of the locked file of a virtual machine, run this command:
# vmkvsitools lsof | grep <Virtual Machine Name>
You see an output similar to:
11773 vmx 12 46 /vmfs/volumes/<Datastore Name>/VirtualMachineName/VirtualMachineName-flat.vmdk
You can then run this command to confirm the process for the virtual machine, using the PID from the first column:
# ps | grep <PID>
You can kill the process with this command:
# kill -9 <PID>

Questions?
