
Storage Optimization

Module 8

© 2015 VMware, Inc. All rights reserved.


You Are Here

1. Course Introduction
2. vSphere Security
3. VMware Management Resources
4. Performance in a Virtualized Environment
5. Network Scalability
6. Network Optimization
7. Storage Scalability
8. Storage Optimization
9. CPU Optimization
10. Memory Optimization
11. Virtual Machine and Cluster Optimization
12. Host and Management Scalability

VMware vSphere: Optimize and Scale 8-2


Importance
Storage can limit the performance of enterprise workloads. You should know how to monitor a host's storage throughput. You should also know how to troubleshoot problems that result in overloaded storage and slow storage performance.

Module Lessons
Lesson 1: Storage Virtualization Concepts
Lesson 2: Monitoring Storage Activity
Lesson 3: Troubleshooting Storage Performance Problems

Lesson 1:
Storage Virtualization Concepts

Learner Objectives
By the end of this lesson, you should be able to meet the following
objective:
Describe factors that affect storage performance

Storage Performance Overview
Performance of centralized storage in VMware vSphere hinges on many factors:
Storage protocols that are used: Fibre Channel, Fibre Channel over Ethernet (FCoE), hardware iSCSI, software iSCSI, NFS
Proper storage configuration
Load balancing
Queuing and LUN queue depth
VMware vSphere VMFS configuration: choosing between VMFS and raw device mappings (RDMs), SCSI reservations
Virtual disk types

Storage Protocol Performance
VMware ESXi supports Fibre Channel, FCoE, hardware iSCSI,
software iSCSI, and NFS.
All storage protocols are capable of delivering high throughput
performance:
When CPU is not a bottleneck, software iSCSI and NFS can be part of a high-
performance solution.
ESXi hosts provide support for high-performance hardware features:
16 Gb Fibre Channel
Software and hardware iSCSI and NFS support for jumbo frames:
Using Gigabit, 10 Gb, and 40 Gb Ethernet NICs
Using 10 Gb iSCSI hardware initiators

About SAN Configuration
Proper SAN configuration can help eliminate performance issues.
Each LUN should have the correct RAID level and storage
characteristics for the applications in virtual machines that use it.
Choice of path selection policy greatly affects SAN performance:
Most Recently Used
Fixed
Round Robin
Optional third-party path selection policy

Storage Queues
Several storage queue types exist:
Queuing at the host:
The device driver queue controls the number of active commands on the LUN at any time.
The VMkernel queue is an overflow queue for the device driver queue.
Queuing at the storage array:
Queuing occurs when the number of active commands to a LUN is too high for the storage array to handle.
Latency increases with excessive queuing at the host or at the storage array.

Device Driver Queue Depth
Device driver queue depth determines how many commands to a given LUN can be active at one time.
Set the device driver queue depth properly to decrease disk latency:
The default queue depth for QLogic adapters is 64.
The default queue depth for other brands is 32.
The maximum recommended queue depth is 64.
Set Disk.SchedNumReqOutstanding to the same value as the queue depth.
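The relationship between the two host-side queues can be sketched as follows. This is a hypothetical illustration, not a VMware API: commands issued beyond the device driver queue depth spill into the VMkernel overflow queue, where they wait and accumulate kernel latency.

```python
# Illustrative model only: commands beyond the device driver queue
# depth wait in the VMkernel overflow queue.
def split_commands(issued, queue_depth=64):
    """Return (active_in_device_queue, queued_in_vmkernel)."""
    active = min(issued, queue_depth)
    queued = issued - active
    return active, queued
```

For example, 80 concurrent commands against a queue depth of 64 leaves 16 commands waiting in the VMkernel queue.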

Network Storage: iSCSI and NFS
Avoid oversubscribing links to storage:
Applications or systems that write large amounts of data to storage should not
share Ethernet links to a storage device.
Oversubscribed links cause packet drops, which degrade performance.
VLANs do not solve the problem of oversubscription, but isolating iSCSI
traffic and NFS traffic on private VLANs is helpful.
For software iSCSI and NFS, protocol processing uses CPU resources
on the host.
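A back-of-envelope check makes the oversubscription point concrete. The numbers below are made up for illustration: total demand from all hosts sharing a link is compared with the link's capacity, and a result above 1.0 means the link is oversubscribed and packet drops become likely.

```python
# Illustrative arithmetic with assumed numbers, not measured data.
def link_utilization(hosts, mb_per_sec_per_host, link_capacity_mb_per_sec):
    """Fraction of link capacity demanded; > 1.0 means oversubscribed."""
    return hosts * mb_per_sec_per_host / link_capacity_mb_per_sec
```

For example, four hosts each writing 40 MB/s through a single 1 Gb/s link (roughly 125 MB/s) demand 1.28 times the link's capacity.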

VMFS Versus RDMs
Choosing the right disk-access method can be a key factor in achieving high performance for enterprise applications:
VMFS is the preferred option for most enterprise applications.
RDM is preferred when raw disk access is necessary.

I/O Characteristic: Random reads/writes
  Choice for better performance: VMFS and RDM yield similar I/O operations per second.
I/O Characteristic: Sequential reads/writes at small I/O block sizes
  Choice for better performance: VMFS and RDM yield similar performance.
I/O Characteristic: Sequential reads/writes at larger I/O block sizes
  Choice for better performance: VMFS.

Review of Learner Objectives
You should be able to meet the following objective:
Describe factors that affect storage performance

Lesson 2:
Monitoring Storage Activity

Learner Objectives
By the end of this lesson, you should be able to meet the following
objectives:
Determine which disk metrics to monitor
Identify metrics in vCenter Server and resxtop
Demonstrate how to monitor disk throughput

Performance Charts and Space Utilization Data
The overview charts on the Performance tab show usage details for a datastore.
Several charts are available by default:
Space Utilization
By Virtual Machines (Top 5)
1 Day Summary

About Disk Capacity Metrics
To identify disk-related performance problems, determine the available
bandwidth on your host and compare it with your expectations.
vSphere includes key metrics to determine available disk bandwidth and
capacity:
Disk throughput
Latency (device, kernel)
Number of aborted disk commands
Number of active disk commands
Number of active commands queued

Monitoring Disk Throughput with vSphere Web Client
Advanced disk performance chart metrics to monitor include Disk Read Rate, Disk Write Rate, and Disk Usage.

Monitoring Disk Throughput with resxtop
Use the resxtop command to monitor disk throughput. Enter d for the Adapter view.
Disk throughput appears in various columns:
READs/s and WRITEs/s
MBREAD/s and MBWRTN/s

Disk Throughput Example
Several views are available for disk throughput.
Adapter view: Enter d.

Device view: Enter u.

Virtual machine view: Enter v.

Monitoring Disk Latency with vSphere Web Client
Disk latency is the time taken to complete an I/O request and is most
commonly measured in milliseconds. Several types of counters measure
disk latency:
Physical device latency counters
Kernel latency counters

Monitoring Disk Latency with resxtop
In addition to disk throughput, the disk adapter screen of resxtop enables you to monitor disk latency. Host bus adapters include SCSI, iSCSI, RAID, and FC-HBA adapters. Latency statistics come from the device, the VMkernel, and the guest:
DAVG/cmd: Average latency (ms) of the device (LUN)
KAVG/cmd: Average latency (ms) in the VMkernel, also called queuing time
GAVG/cmd: Average latency (ms) in the guest: GAVG = DAVG + KAVG
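The decomposition above can be sketched in a few lines. The > 20 ms device threshold comes from the guidance later in this module; the ~2 ms KAVG rule of thumb is an assumption commonly used in the field, not an official limit.

```python
# GAVG = DAVG + KAVG: guest-observed latency is device latency plus
# VMkernel queuing time.
def guest_latency_ms(davg_ms, kavg_ms):
    return davg_ms + kavg_ms

def diagnose(davg_ms, kavg_ms):
    # Assumed thresholds: KAVG > 2 ms suggests host-side queuing,
    # DAVG > 20 ms suggests a slow device or path.
    if kavg_ms > 2:
        return "queuing in the VMkernel"
    if davg_ms > 20:
        return "slow device (LUN)"
    return "normal"
```

High DAVG points at the array or path; high KAVG points at queuing on the host itself.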

Monitoring Commands and Command Queuing
Metrics for monitoring the number of active disk commands and the number of disk commands that are queued are available.

Performance Metric: Number of active commands (I/O operations currently active)
  Name in vSphere Web Client: Commands issued
  Name in resxtop or esxtop: ACTV
Performance Metric: Number of commands queued (I/O operations that require processing)
  Name in vSphere Web Client: Queue command latency
  Name in resxtop or esxtop: QUED
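A quick way to use these counters is to flag any LUN whose queued count is nonzero. The samples below are hypothetical, shaped like resxtop's ACTV and QUED columns; sustained QUED > 0 means the device queue is full and commands are waiting in the VMkernel.

```python
# Hypothetical per-LUN samples (illustrative names and values).
samples = [
    {"lun": "naa.6001", "ACTV": 12, "QUED": 0},
    {"lun": "naa.6002", "ACTV": 64, "QUED": 9},
]

def luns_with_queuing(samples):
    """Return LUNs whose device queue has overflowed into the VMkernel."""
    return [s["lun"] for s in samples if s["QUED"] > 0]
```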

Disk Latency and Queuing Example
You use resxtop disk latency readings to indicate device queuing.
Adapter view (enter d): shows normal VMkernel latency.
Device view (enter u): shows queuing at the device.

Monitoring Severely Overloaded Storage
Aborted commands are a sign that storage hardware is overloaded and unable to handle requests in line with the host's expectations. The Disk Command Aborts performance counter and the ABRTS/s resxtop counter show the number of aborted commands.

About Datastore Alarms
Alarms can be configured to detect when specific conditions or events
occur against a selected datastore. Monitoring for events offers more
options.

About Datastore Alarms
Use the Alarm sidebar panel (right side of the vSphere Web Client
window) to quickly check for alarms. After a triggered alarm is
discovered, you must gather information about the reason for the alarm.

Introduction to Lab 9: Monitoring Storage Performance
You use various methods to monitor storage performance.
The test virtual machine Linux01 runs the scripts fileserver1.sh, fileserver2.sh, datawrite.sh, and logwrite.sh against three virtual disks: Linux01.vmdk (system disk) and Linux01_1.vmdk (local data disk) on a local VMFS datastore, and Linux01.vmdk (remote data disk) on a VMFS datastore on shared storage, backed by your assigned LUN.
Lab 9: Monitoring Storage Performance
Use a vSphere advanced chart to monitor disk performance across a series
of tests
1. Prepare the Test Virtual Machine
2. Prepare the IP Storage Network for Testing
3. Create a Real-Time Disk I/O Performance Chart
4. Prepare to Run Tests
5. Measure Continuous Sequential Write Activity to a Virtual Disk on a Remote
Datastore
6. Measure Continuous Random Write Activity to a Virtual Disk on a Remote
Datastore
7. Measure Continuous Random Read Activity to a Virtual Disk on a Remote
Datastore
8. Measure Continuous Random Read Activity to a Virtual Disk on a Local Datastore
9. Analyze the Test Results
10. Clean Up for the Next Lab

Review of Lab 9: Monitoring Storage Performance
Record the latest write rate and the latest read rate for each test:
Test 1: Sequential Writes to a Virtual Disk on a Remote Datastore
Test 2: Random Writes to a Virtual Disk on a Remote Datastore
Test 3: Random Reads from a Virtual Disk on a Remote Datastore
Test 4: Random Reads from a Virtual Disk on a Local Datastore

Review of Learner Objectives
You should be able to meet the following objectives:
Determine which disk metrics to monitor
Identify metrics in vCenter Server and resxtop
Demonstrate how to monitor disk throughput

Lesson 3:
Troubleshooting Storage
Performance Problems

Learner Objectives
By the end of this lesson, you should be able to meet the following
objectives:
Describe storage performance problems
Discuss causes of storage performance problems
Propose solutions to correct storage performance problems
Discuss examples of troubleshooting storage performance problems

Review: Basic Troubleshooting Checklist for ESXi Hosts

Definite problems:
1. Check for VMware Tools status.
2. Check for resource pool CPU saturation.
3. Check for host CPU saturation.
4. Check for guest CPU saturation.
5. Check for active virtual machine memory swapping.
6. Check for virtual machine swap wait.
7. Check for active virtual machine memory compression.
8. Check for an overloaded storage device.
9. Check for dropped receive packets.
10. Check for dropped transmit packets.

Likely problems:
11. Check for using only one vCPU in a Virtual SMP virtual machine.
12. Check for high CPU ready time on virtual machines running in underutilized hosts.
13. Check for a slow storage device.
14. Check for an unexpected increase in I/O latency on a shared storage device.
15. Check for an unexpected increase in data transfer rate on network controllers.

Possible problems:
16. Check for low guest CPU utilization.
17. Check for past virtual machine memory swapping.
18. Check for high memory demand in a resource pool.
19. Check for high memory demand in a host.
20. Check for high guest memory demand.

Overloaded Storage and Command Aborts
If a storage device is experiencing command aborts, the cause of these aborts must be identified and corrected.
To monitor the number of disk commands aborted on the host:
1. Select the host, click the Monitor tab, and click the Performance tab.
2. On the Chart Options page, select the Commands Aborted counter.
If Commands Aborted > 0 for any LUN, then storage is overloaded on that LUN.

Causes of Overloaded Storage
Overloaded storage is generally caused by excessive demand being
placed on the storage device or by storage being misconfigured.
Storage misconfiguration occurs in the configuration of LUN
characteristics:
Number of disks per LUN
RAID level of a LUN
Assignment of array cache to a LUN

Slow Storage
Slow storage is the most common cause of performance problems in a vSphere environment.
For a host's LUNs, monitor the Physical Device Read Latency and Physical Device Write Latency counters:
If the average > 10 ms or the peak > 20 ms for any LUN, then storage might be slow on that LUN.
You can also monitor device latency (DAVG/cmd) in resxtop:
If the value > 10 ms, a problem might exist.
If the value > 20 ms, a problem exists.
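The DAVG/cmd thresholds above can be expressed as a simple classifier, useful when scanning many LUNs at once:

```python
# Thresholds taken from the guidance above for DAVG/cmd (ms).
def classify_device_latency(davg_ms):
    if davg_ms > 20:
        return "problem"
    if davg_ms > 10:
        return "possible problem"
    return "ok"
```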

Factors Affecting Storage Response Time
Storage subsystem response time is affected by the I/O arrival rate, I/O size, and I/O locality.
Use the storage device's monitoring tools to collect data to characterize the workload and see whether it indicates a performance problem.
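Little's Law ties the arrival rate and response time together and is a quick sanity check on the collected data: the mean number of outstanding I/Os equals the arrival rate multiplied by the mean response time. The worked numbers below are illustrative.

```python
# Little's Law: outstanding I/Os = IOPS x mean latency (seconds).
def outstanding_ios(iops, latency_seconds):
    return iops * latency_seconds
```

For example, 2,000 IOPS at an 8 ms average latency keeps about 16 commands in flight, comfortably under a queue depth of 64.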

Unexpected Increase in I/O Latency on Shared Storage
Applications, when consolidated, share expensive physical resources such as storage.
Situations might occur in which high I/O activity on shared storage affects the performance of latency-sensitive applications:
A very high number of I/O requests is issued concurrently.
Operations, such as a backup operation in a virtual machine, use up the I/O bandwidth of a shared storage device.
To resolve these situations, use VMware vSphere Storage I/O Control to control each virtual machine's access to the I/O resources of a shared datastore.
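The idea behind Storage I/O Control is proportional-share allocation. The sketch below is a simplification under stated assumptions: the real feature also honors a congestion threshold and manages per-host device queues, and the VM names and share values here are invented for illustration.

```python
# Simplified proportional-share split of a device queue, in the spirit
# of Storage I/O Control (not the actual algorithm).
def allocate_queue_slots(shares, total_slots):
    """Divide queue slots among VMs in proportion to their shares."""
    total = sum(shares.values())
    return {vm: total_slots * s // total for vm, s in shares.items()}
```

With two VMs at 1,000 shares and one at 2,000 shares splitting a 64-slot queue, the high-share VM receives twice the slots of each of the others.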

Example 1: Bad Disk Throughput
View latency values to determine whether a performance problem truly exists.
resxtop output 1: good throughput with low device latency.
resxtop output 2: bad throughput with high device latency due to a disabled cache.

Example 2: Slow Virtual Machine Power On
When powering on a virtual machine takes longer than usual, check disk
metrics on the host.
Because powering on a virtual machine requires disk activity on the host,
check for disk latency.

Monitoring Disk Latency with vSphere Web Client
The vSphere Web Client overview chart for an ESXi host's disk gives a quick look at the highest disk latency experienced by the host. In this example, maximum disk latencies range from 100 ms to 1100 ms, which is very high.

Monitoring Disk Latency with resxtop
To further investigate disk latency, use resxtop to view latency information. In general, if the value of GAVG/cmd is greater than 20 milliseconds, latency is high. In this example, the values of DAVG/cmd and GAVG/cmd are very large.

Solving the Problem of Slow Virtual Machine Power On
Solving the problem of slow virtual machine power on involves not only looking at performance statistics but also checking whether any errors are encountered on the host.

Example 3: Slow Login to Virtual Machines
When users suddenly cannot log in to any virtual machines on a single ESXi host, check to see whether the virtual machines are saturating a resource.
Resource saturation can occur when each virtual machine's application reads from and writes to the same NAS device, and the problem is compounded when the NAS device is also a virtual machine.

Monitoring Host CPU Usage
High or chaotic CPU usage can show that a host is saturated.

Monitoring Host Disk Usage
Uneven, reduced disk throughput can show that a host's disk is saturated.

Monitoring Disk Throughput
Comparing read and write traffic helps identify what is saturating a disk.

Solving the Problem of Slow Virtual Machine Login
An application bug in a virtual machine can cause unwanted conditions
in a group of virtual machines:
CPU usage increases per virtual machine.
Write traffic increases per virtual machine.
Write traffic to the NAS virtual machine significantly increases.
Virtual machines are so busy performing writes that they never perform reads.

Resolving Storage Performance Problems
When resolving storage performance problems, follow these guidelines:
Check your hardware for proper operation and optimal configuration.
Reduce the need for storage by your hosts and virtual machines.
Balance the load across available storage.
Understand the load being placed on storage devices.

Checking Storage Hardware
When resolving problems of slow or overloaded storage, follow these
guidelines:
Ensure that hardware is working properly.
Configure the HBAs and RAID controllers for optimal use.
Upgrade your hardware, if possible.

Storage Performance Best Practices
To get the best performance from storage, follow these practices:
Configure each LUN with the correct RAID level and storage characteristics for
applications and virtual machines that use the LUN.
Avoid oversubscribing paths (SAN) and links (iSCSI and NFS).
Use Storage DRS and Storage I/O Control whenever applicable.
Isolate iSCSI and NFS traffic.
Applications that write a lot of data to storage should not share Ethernet links
to a storage device.
Postpone major storage maintenance until off-peak hours.
Eliminate all possible swapping to reduce the burden on the storage
subsystem.
In SAN configurations, spread I/O loads over the available paths to the storage
devices.
Strive for complementary workloads.

Review of Learner Objectives
You should be able to meet the following objectives:
Describe storage performance problems
Discuss causes of storage performance problems
Propose solutions to correct storage performance problems
Discuss examples of troubleshooting storage performance problems

Key Points
Flash Read Cache enables you to accelerate virtual machine performance
through the use of host-resident flash devices as a cache.
Virtual SAN is a hybrid storage system that uses and aggregates local SSDs
and local HDDs to provide a clustered, shared datastore.
Factors that affect storage performance include storage protocols, storage
configuration, queuing, and VMFS configuration.
Disk throughput and latency are key metrics to monitor storage performance.
Overloaded storage and slow storage are common storage performance
problems.
Questions?
