VMware vSphere Data Protection (VDP) Technical Deep Dive and Troubleshooting Session

Darryl Hing, VMware Canada
Jacy Townsend, VMware
BCO4756
#BCO4756
2
Agenda
What Is VDP?
Concepts
Gathering the log bundle
Log Analysis
Backup Best Practices
Commands
Resources

3
Overview
File and image level; full and incremental backups
Variable-length block deduplication
4
Overview
Replacement for VDR
Optimized for Virtual
Advanced Dedupe
Backup and Recovery
5
Overview
Next Gen: Next-generation backup and recovery solution with superior capabilities.
Integration: Tightly integrated with the vSphere Web Client.
6
Key Features
100 VMs: Up to 100 VMs per appliance.
8 TB Dedupe: Up to 8 TB of deduplicated backup data capacity per appliance.
10 Appliances: Up to 10 VDP virtual appliances are supported per vCenter.
7
Key Features
Powered by EMC Avamar
Bundled with vSphere 5.1 Essentials Plus, Standard, Enterprise, and Enterprise Plus
Variable-length dedupe
8
Prerequisites
vSphere Web Client
vCenter Server 5.1
SSO & Inventory Services
VDP Appliance
VDP Plugin
9
Important URLs
Configuration: https://<VDP_IP>:8543/vdp-configure
Management URL: https://<vCenter_IP>:9443/vsphere-client
FLR Portal: https://<VDP_IP>:8543/flr
Default Credentials: root/changeme
11
Terminologies - General Backup
Snapshot: Preserves the state of a VM at a point in time, including its power state.
Full Backup: Complete backup of a VM.
Differential: Files changed since the last full backup.
Incremental: Files changed since the last backup.
File Level Restore (FLR): Restores files individually.
(Diagram labels: VMware ESXi & ESX, Snapshot, Full Backup, Differential, Incremental.)
12
Terminologies - Backup Types
Full Backup
Cumulative or Differential
Incremental
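The difference between the three backup types comes down to which reference point a file's change time is compared against. The sketch below is a minimal illustration of that rule, not how VDP selects data; `select_files` and the numeric timeline are hypothetical names and values introduced only for this example.

```python
def select_files(mtimes, kind, last_full, last_backup):
    """Pick the files a backup of the given kind would copy.

    mtimes maps a file path to its modification time; all times are
    plain numbers (for example, epoch seconds).
    """
    if kind == "full":
        return sorted(mtimes)              # everything, changed or not
    if kind == "differential":
        since = last_full                  # changed since the last FULL backup
    elif kind == "incremental":
        since = last_backup                # changed since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(path for path, t in mtimes.items() if t > since)


# Hypothetical timeline: full backup at t=100, an incremental at t=200.
mtimes = {"a.txt": 50, "b.txt": 150, "c.txt": 250}
for kind in ("full", "differential", "incremental"):
    print(kind, select_files(mtimes, kind, last_full=100, last_backup=200))
```

A differential grows until the next full backup, while each incremental stays small but a restore needs the whole chain back to the last full.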
13
Terminologies - VMware Specific
CBT (Changed Block Tracking): Identifies the disk sectors that have been altered.
Microsoft VSS (Volume Shadow Copy Service): Takes automatic or manual backups and snapshots of data.
Quiescing: Pausing or altering running processes that can modify the disk during a backup.
Steady State: When the data being imported to the dedupe store is less than or equal to the amount of data being pruned.
14
Terminologies - RPO & RTO
Recovery Time Objective (RTO): How quickly you need to have applications back up and running after downtime.
Recovery Point Objective (RPO): The point to which data must be restored to successfully resume work.
(Timeline diagram: Last Backup, Major Incident, Data Restored, with the RPO and RTO intervals marked.)
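As a worked example of the two objectives, the sketch below measures the data-loss window (last backup to incident) and the downtime (incident to data restored) for one incident and compares them with the targets. The function names and the timestamps are hypothetical, chosen only to illustrate the arithmetic.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, incident, restored):
    """Return (data-loss window, downtime) for a single incident."""
    return incident - last_backup, restored - incident


def meets_objectives(last_backup, incident, restored, rpo, rto):
    """True when the data-loss window fits the RPO and the downtime fits the RTO."""
    loss, downtime = recovery_metrics(last_backup, incident, restored)
    return loss <= rpo and downtime <= rto


# Hypothetical incident: nightly backup at 22:00, failure at 09:30, restored by 13:30.
last_backup = datetime(2013, 3, 4, 22, 0)
incident = datetime(2013, 3, 5, 9, 30)
restored = datetime(2013, 3, 5, 13, 30)

loss, downtime = recovery_metrics(last_backup, incident, restored)
print("data-loss window:", loss)
print("downtime:", downtime)
```

With a daily backup schedule the worst-case data-loss window approaches 24 hours, so a tighter RPO requires more frequent backups.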
15
Terminologies - Deduplication
(Diagram: source objects VM-A and VM-B contain the same blocks, A-F and 1-3, in a different order. The dedupe store keeps pointers plus one compressed copy of each unique block.)
Identifies duplicate or redundant data
Only unique data is stored
Saves pointers instead of multiple copies
Consumes less disk space
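The idea on this slide can be sketched with a store keyed by content hash: each unique block is kept once, and every backup is just a list of pointers. This is a minimal illustration under stated assumptions, not the Avamar implementation; `DedupStore`, the use of SHA-1 as the key, and the one-byte "blocks" are choices made for this example.

```python
import hashlib


class DedupStore:
    """Keeps one copy of each unique block, addressed by its hash."""

    def __init__(self):
        self.blocks = {}                      # digest -> unique block data

    def put(self, block):
        digest = hashlib.sha1(block).hexdigest()
        self.blocks.setdefault(digest, block)  # store only if not seen before
        return digest                          # the pointer kept by the backup

    def backup(self, blocks):
        return [self.put(b) for b in blocks]

    def restore(self, pointers):
        return [self.blocks[p] for p in pointers]


# The two VMs from the slide: same blocks, different order.
vm_a = [b"A", b"B", b"C", b"D", b"E", b"F", b"1", b"2", b"3"]
vm_b = [b"F", b"E", b"D", b"3", b"1", b"2", b"C", b"A", b"B"]

store = DedupStore()
ptrs_a = store.backup(vm_a)
ptrs_b = store.backup(vm_b)
print(len(ptrs_a) + len(ptrs_b), "blocks backed up,", len(store.blocks), "stored")
```

Backing up VM-B adds pointers only, because every one of its blocks is already in the store.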
16
Backup Process
1. Sticky Byte Factoring
2. Compression
3. Hashing
4. Store hash and data on GSAN
17
Sticky Byte Algorithm
Average data chunk size is 24 kB.
Data chunks vary in size between 1 and 64 kB.
(Diagram: the same 40 kB of data is chunked in two backups. First backup: chunks of 10 kB, 25 kB, and 5 kB. Subsequent backup, after a change in the VM: chunks of 5 kB, 10 kB, and 25 kB.)
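Variable-length chunking places chunk boundaries based on the data itself, so a change in one place does not shift every later boundary. The sketch below shows the general technique (content-defined chunking with a fingerprint over a trailing window); it is not the sticky byte algorithm itself, whose details the slides do not give. The CRC32 fingerprint, the `divisor` and `window` parameters, and the function name `chunk` are assumptions for illustration; only the 1 kB and 64 kB size limits come from the slide.

```python
import random
import zlib


def chunk(data, min_size=1024, max_size=64 * 1024, divisor=24 * 1024, window=48):
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks, start = [], 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        # Fingerprint of the last `window` bytes: depends only on nearby content.
        fingerprint = zlib.crc32(data[max(0, i - window + 1): i + 1])
        if fingerprint % divisor == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks


# Hypothetical data: a "first backup" and the same data after a small change.
rng = random.Random(0)
first = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
second = b"changed" + first

already_stored = set(chunk(first))
new_chunks = chunk(second)
reused = sum(1 for c in new_chunks if c in already_stored)
print(f"{reused} of {len(new_chunks)} chunks were already stored")
```

With fixed-size blocks, the seven inserted bytes would shift every block and nothing would deduplicate; with content-defined boundaries the chunking typically realigns shortly after the change.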
18
Terminologies - Compression
Chunks are compressed to 30%-50% of their original size.
Average compressed chunk size is 12 kB-16 kB.
Compression occurs only when at least 25% compression can be achieved.
(Diagram: the 10 kB, 25 kB, and 5 kB chunks from 40 kB of data shown at their compressed sizes.)
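The 25% rule can be sketched as a simple check: compress the chunk, and keep the compressed form only if it saves at least a quarter of the original size. This is a minimal sketch using zlib as a stand-in compressor; the compressor VDP actually uses is not named in the slides, and `maybe_compress` is a hypothetical helper.

```python
import os
import zlib


def maybe_compress(chunk, min_saving=0.25):
    """Return (bytes_to_store, was_compressed) for one chunk."""
    packed = zlib.compress(chunk)
    if len(packed) <= len(chunk) * (1 - min_saving):
        return packed, True                  # saving is large enough
    return chunk, False                      # store the chunk as-is


# Repetitive data compresses well; random data does not.
text_chunk = b"VDP backup data " * 1024
random_chunk = os.urandom(16 * 1024)

for name, data in (("text", text_chunk), ("random", random_chunk)):
    stored, compressed = maybe_compress(data)
    print(name, len(data), "->", len(stored), "compressed" if compressed else "stored raw")
```

Skipping compression for chunks that barely shrink avoids spending CPU on decompression for little space gain.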
19
Terminologies - Hashing
1. Data is used to create the hash, but it is not converted into the hash.
2. The hash created from each data object is called an atomic hash.
3. Atomic hashes are combined to create composites.
4. Hashing continues until a single root hash for the backup is created.
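The atomic, composite, and root hash relationship forms a hash tree. The sketch below builds one under assumptions of its own: SHA-1 as the hash function and a fixed `fanout` of four children per composite are illustrative choices, not details taken from the slides.

```python
import hashlib


def atomic_hash(chunk):
    """Hash of a single data chunk."""
    return hashlib.sha1(chunk).digest()


def composite_hash(child_hashes):
    """Hash computed over a group of lower-level hashes."""
    return hashlib.sha1(b"".join(child_hashes)).digest()


def root_hash(chunks, fanout=4):
    """Combine hashes level by level until a single root hash remains."""
    if not chunks:
        raise ValueError("a backup needs at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [atomic_hash(c) for c in chunks]
    while len(level) > 1:
        level = [composite_hash(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


backup_chunks = [b"chunk-%d" % n for n in range(10)]
print(root_hash(backup_chunks).hex())
```

Because every level depends on the one below it, changing any chunk changes the root hash, which makes the root a compact identifier for the whole backup.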
20
VMware Backup History
VDP: 2013 -> TBA
VDR: 2009 -> TBA
VCB: 2006 - 2010
22
Log Procurement
1. Open the VDP configure URL.
2. Click Collect Logs.
3. Name the log bundle appropriately.
23
How To Scope a VDP Issue
Who?
What?
When?
Where?
24
Core Services
MCS (Scheduler)
/usr/local/avamar/var/mc/server_log/mcserver.log
AvaAgent (Worker Thread)
/usr/local/avamarclient/var-proxy-N/avagent.log
AvVcbImage (VMware API Module)
/usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l].log
25
Core Services
AvTar (Deduplication and Compression)
/usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l]_avtar.log
GSAN (Storage)
/data01/cur/gsan.log
26
Log Locations
/usr/local/avamar/var/vdr/server_logs/vdr-configure.log
Installation
/usr/local/avamar/var/avi/server_log/avinstaller.log*
/usr/local/avamar/var/avi/server_log/AvamarInstallSles*.log
Configuration
27
Log Locations
/usr/local/avamar/var/mc/server_log/mcserver.log*
/usr/local/avamar/var/vdr/server_logs/vdr-server*
/usr/local/avamar/var/log/dpnctl.log*
/usr/local/avamarclient/var-proxy-N/avagent*.log
/usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l].log
/usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l]_avtar.log
/data01/cur/gsan.log
Backup and Restore
28
Log Locations
/usr/local/avamar/var/flr/server_log/flr-server.log
/usr/local/avamarclient/bin/logs/FlrMerged.log
/usr/local/avamarclient/bin/logs/VmwareFlr.log
/usr/local/avamarclient/bin/logs/VmwareFlrWs.log
File Level Restore(FLR)
29
ALG File About the Job
<proxyDirectives>
<flag type="string" value="vm-221" name="vm_moref" />
<flag type="string" value="Windows Server 2008 R2" name="guest_fullname" />
<flag type="string" value="VDPTest" name="vmname" />
<flag type="string" value="[VMStore1] VDPTest/VDPTest.vmx" name="vmx_path" />
<flag type="string" value="/VDP_Lab" name="vmware_datacenter" />
<flag type="string" value="192.168.8.31" name="esxserver" />
<flag type="string" value="192.168.8.43" name="vmware_server" />
</proxyDirectives>

ALG File
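The job details in a proxyDirectives block can be pulled out with a few lines of XML parsing. This is a minimal sketch that assumes the block has already been extracted from the .alg file as well-formed XML (the rest of the .alg file's layout is not shown in the slides); `read_flags` is a hypothetical helper, and the snippet repeats three of the flags from the slide.

```python
import xml.etree.ElementTree as ET

# Three of the flags shown on the slide.
ALG_SNIPPET = """<proxyDirectives>
  <flag type="string" value="vm-221" name="vm_moref" />
  <flag type="string" value="VDPTest" name="vmname" />
  <flag type="string" value="192.168.8.31" name="esxserver" />
</proxyDirectives>"""


def read_flags(xml_text):
    """Return a {name: value} mapping for every <flag> element."""
    root = ET.fromstring(xml_text)
    return {flag.get("name"): flag.get("value") for flag in root.iter("flag")}


flags = read_flags(ALG_SNIPPET)
print(flags["vmname"], "on host", flags["esxserver"])
```

Knowing the VM name, its managed object reference, and the ESX host up front narrows down which vCenter and host logs to correlate with the job.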
30
LOG File About the Process
2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, Performance: 297.5 MB/minute, Duration: 05:44:15
2013-03-04 16:38:53 avvcbimage Warning <14654>: The in-use blocks (pass 1) could not be found for 'VDP-136243273610203b57a3b4bb8946f82f4a78bdb8e0d0da870a', using disk extents.
2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for spawned avtar process to complete
2013-03-05 01:09:25 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.
LOG File
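When a job log is long, filtering by severity is a quick first pass. The sketch below parses lines in the format shown on the slide (timestamp, module, level, message code, text) and keeps the warnings and errors. It assumes each entry sits on one line in the real file; `problems` and the regular expression are illustrative, not part of any VDP tooling.

```python
import re

LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<module>\S+) "
    r"(?P<level>Info|Warning|Error|FATAL) "
    r"<(?P<code>\d+)>: (?P<text>.*)$"
)


def problems(lines, levels=("Warning", "Error", "FATAL")):
    """Return (time, level, code, text) for every entry at the given levels."""
    found = []
    for line in lines:
        match = LINE.match(line.strip())
        if match and match["level"] in levels:
            found.append((match["time"], match["level"], match["code"], match["text"]))
    return found


# Two entries taken from the slide.
SAMPLE = [
    "2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, "
    "Performance: 297.5 MB/minute, Duration: 05:44:15",
    "2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for "
    "spawned avtar process to complete",
]

for entry in problems(SAMPLE):
    print(entry)
```

The message code in angle brackets is a useful search key, since the same code appears for every occurrence of that condition.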
31
Finding the Work Order Logs Quickly
# cd /usr/local/avamarclient/var-proxy-3
# for i in *.alg; do grep -m 1 " START" "$i" | rev | awk '{print $4" "$5}' | rev; grep vmname "$i" | awk -F\" '{print $4}'; echo "$i"; echo; done

2013-03-04 16:32:14
VM_Name_1
Daily 5 Day Retention-1362432700504-618a82a5277ebb1dd536b018a407a21582926e6a-3016-vmimagew.alg

2013-03-05 16:07:30
VM_Name_1
Daily 5 Day Retention-1362517629476-6acb4658af622ac48a52d73247aad95b1887af7c-3016-vmimagew.alg
Finding Work Orders
32
Scenario 1
/usr/local/avamar/var/mc/server_log/mcserver.log*
/usr/local/avamar/var/vdr/server_logs/vdr-server*
/usr/local/avamar/var/log/dpnctl.log*
/usr/local/avamarclient/var-proxy-N/avagent*.log
/data01/cur/gsan.log
Logs
33
Scenario 1
2013-03-05 23:01:35 avvcbimage Info <16001>: Found 1 disk(s), 0 snapshots, and 1 snapshot ctk files, on the VMs datastore.
2013-03-05 23:01:35 avvcbimage Warning <16002>: Too many extra snapshot files (1) were found on the VMs datastore. This can cause a problem for the backup or restore.
2013-03-05 23:01:35 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.
2013-03-05 23:01:35 avvcbimage Info <0000>: Starting graceful (staged) termination, Too many pre-existing snapshots will not permit a restore. (wrap-up stage)
2013-03-05 23:01:35 avvcbimage Error <9759>: createSnapshot: snapshot creation failed
LOG File
34
Scenario 2
$ grep "Node restarted" ./data01/cur/err.log
2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted
When?
2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint cp.20130223140423 3300 out of 3590 stripes complete
2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server node 0.0 is swapping: check configuration
2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted
Why?
35
Scenario 2 Successful Checkpoint Sample
2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint cp.20130227141853 started
2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint cp.20130227141853 300 out of 3595 stripes complete
2013/02/27-14:19:00.72912 {0.0} [strtask.2:3483] <0055> checkpoint cp.20130227141853 600 out of 3595 stripes complete
<SNIP>
2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint cp.20130227141853 completed
2013/02/27-14:19:27.50773 {0.0} [sched.cp:3263] <4301> completed checkpoint maintenance
/data01/cur/err.log
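To tell a finished checkpoint from one that was interrupted, as in Scenario 2, it helps to track the last event seen for each checkpoint name. The sketch below does that for lines in the format shown on these slides; it assumes one entry per line, and `checkpoint_status` is a hypothetical helper rather than an existing tool.

```python
import re

EVENT = re.compile(
    r"checkpoint (cp\.\d+) (started|completed|(\d+) out of (\d+) stripes)"
)


def checkpoint_status(lines):
    """Map each checkpoint name to the last state seen in the log."""
    status = {}
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        name, what, done, total = match.groups()
        if what in ("started", "completed"):
            status[name] = what
        else:
            status[name] = f"{done}/{total} stripes"
    return status


# Entries taken from the Scenario 2 slides.
GOOD = [
    "2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint cp.20130227141853 started",
    "2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint cp.20130227141853 300 out of 3595 stripes complete",
    "2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint cp.20130227141853 completed",
    "2013/02/27-14:19:27.50773 {0.0} [sched.cp:3263] <4301> completed checkpoint maintenance",
]
INTERRUPTED = [
    "2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint cp.20130223140423 3300 out of 3590 stripes complete",
    "2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted",
]

print(checkpoint_status(GOOD))
print(checkpoint_status(INTERRUPTED))
```

A checkpoint whose last state is a stripe count rather than "completed" is worth checking against nearby "Node restarted" or swapping messages.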
36
Scenario 3 Storage Performance

2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> perfbeat::outoftolerance mask=[backup,restore] average=2191.09 limit=219.1092 mbpersec=0.04
/data01/cur/gsan.log
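The numeric fields in a perfbeat warning can be extracted for trending over time. The sketch below pulls every numeric `key=value` pair out of a line in the format shown above; it assumes the entry is on a single line, and it does not interpret what `average` and `limit` mean beyond what the log prints. `perfbeat_metrics` is a hypothetical helper.

```python
import re


def perfbeat_metrics(line):
    """Return the numeric key=value pairs from one perfbeat log line."""
    pairs = re.findall(r"(\w+)=(\d+(?:\.\d+)?)", line)
    return {key: float(value) for key, value in pairs}


# The warning shown on the slide, joined onto one line.
LINE = (
    "2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> "
    "perfbeat::outoftolerance mask=[backup,restore] "
    "average=2191.09 limit=219.1092 mbpersec=0.04"
)

metrics = perfbeat_metrics(LINE)
print(metrics)
```

Non-numeric fields such as `mask=[backup,restore]` are skipped, so the result can be fed straight into a spreadsheet or plot.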
37
Scenario 2
# grep perfbeat /data01/cur/err.log | awk '{print $1"="$10}' | awk -F= '{print $1" - "$3}'
2013/02/18-13:16:05.93532 - 10.95
2013/02/18-13:19:40.12223 -
2013/02/18-13:20:44.07831 - 25.40
Performance Data
2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> server node 0.0 is swapping: check configuration
Swapping
38
What Next?
1. Review the monitor logs (vmware.log) at the time of the incident for both the VDP appliance and the target VM.
2. Review the vCenter logs at the time of the incident.
3. Review the ESX logs (hostd/vmkernel) at the time of the incident.
39
Agenda
What Is VDP?
Concepts
Gathering the log bundle
Log Analysis
Backup Best Practices
Troubleshooting
Administration
Commands
Resources

40
Backup Best Practices - Troubleshooting
Redeploy VDP: Should only be used to resume daily backups. Should not be used as a workaround except in extreme conditions.
Scope (W5): Define who, what, when, where, and why.
Communications: Understand how the product works and which modules communicate with other modules.
41
Backup Best Practices - Administration
Plan: Plan your deployment.
Storage: Ensure your storage infrastructure can handle the capacity and load. Always use HCL hardware.
Separate: Separate and group the workload between appliances or deduplication stores.
42
Backup Best Practices - Administration
Set and Forget: Check backups regularly; do not set and forget.
Single Points of Failure: Think about single points of failure and consider correcting these conditions.
Consumption: At 60% space utilization or above, be mindful of storage consumption.
43
Backup Best Practices - Administration
On-Demand Backups: Limit on-demand backups during the maintenance window.
On-Demand Maintenance: Avoid initiating on-demand maintenance activities: checkpoint (CP), CP validation, or garbage collection (GC).
44
Backup Best Practices - Administration
Weekly:
Check the status of the deduplication store (checkpoints).
Check the status of the backup subsystems.
Review any failed backups.
Monthly / Quarterly:
Test the restore plan. Ensure business continuity.
Review and correct any new trends.
Review storage performance and storage growth.
46
Commands - MCCLI
root@vdp:~/#: mccli server show-prop

State Full Access
Total capacity 535.7 GB
Capacity used 1.7 GB
Server utilization 0.3%
Bytes protected 10.0 GB
Time since Server initialization 21 days 21h:48m
Last checkpoint 2013-03-27 11:26:37 PDT
Last validated checkpoint 2013-03-27 11:26:37 PDT
System Name vdp.vdp.lab
IP address 192.168.2.99:26000
show-prop
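The "Server utilization" line in this output can be checked against the 60% guideline from the best-practices slides. The sketch below parses saved `mccli server show-prop` output in the layout shown above; it assumes the output was captured to a string beforehand, and `server_utilization` and `needs_attention` are hypothetical helpers, not mccli features.

```python
import re

# Part of the output shown on the slide.
SHOW_PROP = """State Full Access
Total capacity 535.7 GB
Capacity used 1.7 GB
Server utilization 0.3%
Bytes protected 10.0 GB
"""


def server_utilization(show_prop_output):
    """Return the utilization percentage, or None if the line is missing."""
    match = re.search(r"Server utilization\s+([\d.]+)%", show_prop_output)
    return float(match.group(1)) if match else None


def needs_attention(show_prop_output, threshold=60.0):
    """True when utilization is at or above the threshold."""
    utilization = server_utilization(show_prop_output)
    return utilization is not None and utilization >= threshold


print(server_utilization(SHOW_PROP), needs_attention(SHOW_PROP))
```

Running a check like this on a schedule is one way to avoid the "set and forget" pattern for storage consumption.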
47
Commands - MCCLI
root@vdp:~/#: mccli server show-services

Name Status
-------------------------------- ---------------------------
Hostname vdp.vdp.lab
IP Address 192.168.2.99
Load Average 0.97
Last Admin Datastore Flush 2013-04-18 07:45:00 PDT
PostgreSQL database Running
192.168.2.103 All vCenter connections OK.
show-services
49
VMware Backup History - VDP
References
Datasheet: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-with-Operations-Management-Datasheet.pdf
Admin Guide: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Administration-Guide.pdf
VDDK Guide: https://www.vmware.com/support/developer/vddk/vddk-511-releasenotes.html
50
Other VMware Activities Related to This Session
HOL:
HOL-SDC-1305
Business Continuity and Disaster Recovery In Action
Group Discussions:
BCO1002-GD
Data Protection and Backup with Jeff Hunter


THANK YOU