Darryl Hing, VMware Canada
Jacy Townsend, VMware
BCO4756 #BCO4756

Agenda
- What Is VDP?
- Concepts
- Gathering the log bundle
- Log Analysis
- Backup Best Practices
- Commands
- Resources

Overview
- File and image level; full and incremental backups.
- Variable-length block deduplication.

Overview
1. Replacement for VDR
2. Optimized for virtual
3. Advanced dedupe
4. Backup and recovery

Overview
- Next Gen: Next-generation backup and recovery solution with superior capabilities.
- Integration: Tightly integrated with the vSphere Web Client.

Key Features
- 100 VMs: Up to 100 VMs per appliance.
- 8 TB dedupe: Up to 8 TB of deduplicated backup data capacity per appliance.
- 10 appliances: Up to 10 VDP virtual appliances are supported per vCenter.

Key Features
- Powered by EMC Avamar.
- Bundled with vSphere 5.1 Essentials and Essentials Plus, Standard, Enterprise, and Enterprise Plus.
- Variable-length dedupe.

Prerequisites
- vSphere Web Client
- vCenter Server 5.1
- SSO & Inventory Services
- VDP Appliance
- VDP Plugin

Important URLs
- Configuration: https://<VDP_IP>:8543/vdp-configure
- Management URL: https://<vCenter_IP>:9443/vsphere-client
- FLR Portal: https://<VDP_IP>:8543/flr
- Default Credentials: root/changeme

Terminologies - General Backup
- Snapshot: Preserves the state of a VM at a point in time, including its power state.
- Full Backup: A complete backup of the VM.
- Differential: Files changed since the last full backup.
- Incremental: Files changed since the last backup.
- File Level Restore (FLR): Restores files individually.

Terminologies - Backup Types
- Full
- Cumulative or Differential
- Incremental

Terminologies - VMware Specific
- CBT (Changed Block Tracking): Identifies the disk sectors that have been altered.
- Microsoft VSS (Volume Shadow Copy Service): Automatic or manual backups and snapshots of data.
- Quiescing: Pausing or altering running processes that could modify the disk during a backup.
- Steady State: When the data being imported to the dedupe store is less than or equal to the amount of data being pruned.

Terminologies - RPO & RTO
- Recovery Time Objective (RTO): How quickly applications must be back up and running after downtime.
- Recovery Point Objective (RPO): The point to which data must be restored to successfully resume work.
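
As a worked illustration of the two definitions, the sketch below measures the intervals actually achieved in one incident: the gap between the last backup and the incident (to compare against the RPO) and the gap between the incident and the completed restore (to compare against the RTO). The three timestamps and the helper name `elapsed_minutes` are hypothetical, and the script assumes GNU `date` with the `-d` option.

```shell
#!/usr/bin/env bash
# Sketch only: timestamps below are made-up examples, not from a real incident.
# Requires GNU date (-d).

# elapsed_minutes START END -> whole minutes between two timestamps
elapsed_minutes() {
    local start end
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $(( (end - start) / 60 ))
}

last_backup="2013-03-04 22:00:00"   # last good backup (hypothetical)
incident="2013-03-05 09:30:00"      # major incident (hypothetical)
restored="2013-03-05 11:00:00"      # backup data restored (hypothetical)

echo "Data at risk (compare with RPO): $(elapsed_minutes "$last_backup" "$incident") minutes"
echo "Downtime (compare with RTO): $(elapsed_minutes "$incident" "$restored") minutes"
```

If either figure exceeds the agreed objective, the backup schedule or the restore procedure is the place to look.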
[Figure: timeline showing Last backup, Major Incident, and Backup Data Restored, with the RPO and RTO intervals marked]

Terminologies - Deduplication
[Figure: blocks from VM-A and VM-B (source objects) reduced to pointers and unique, compressed data]
- Identifies duplicate or redundant data
- Only unique data is stored
- Saves pointers instead of multiple copies
- Consumes less disk space

Backup Process
1. Sticky Byte Factoring
2. Compression
3. Hashing
4. Store hash and data on GSAN

Sticky Byte Algorithm
- Average data chunk size is 24 kB.
- Data chunks vary in size between 1 and 64 kB.
[Figure: First Backup chunked into 10 kB, 25 kB, 5 kB, 40 kB; Subsequent Backup, after a change in the VM, chunked into 5 kB, 10 kB, 25 kB, 40 kB]

Terminologies - Compression
- Chunks are compressed to 30% - 50% of their original size.
- Average compressed chunk size: 12 kB.
- Compression occurs only when at least 25% compression can be achieved.
[Figure: chunks of 10 kB, 25 kB, 5 kB, 40 kB shown with compressed sizes 2 kB, 1 kB, 5 kB, 8 kB; label 16 kB]

Terminologies - Hashing
- Data is used to create the hash, but it is not converted into the hash.
- The hash created from each data object is called an atomic hash.
- Atomic hashes are combined to create composites.
- Hashing continues until a single root hash for the backup is created.

VMware Backup History
- VCB: 2006 - 2010
- VDR: 2009 -> TBA
- VDP: 2013 -> TBA

Log Procurement
1. Open the VDP configure URL.
2. Click Collect Logs.
3. Name the log bundle appropriately.

How To Scope a VDP Issue
1. Who?
2. What?
3. When?
4. Where?

Core Services
- Scheduler: /usr/local/avamar/var/mc/server_log/mcserver.log

LOG File - About the Process
2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, Performance: 297.5 MB/minute, Duration: 05:44:15
2013-03-04 16:38:53 avvcbimage Warning <14654>: The in-use blocks (pass 1) could not be found for 'VDP-136243273610203b57a3b4bb8946f82f4a78bdb8e0d0da870a', using disk extents.
2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for spawned avtar process to complete
2013-03-05 01:09:25 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.

Finding the Work Order Logs Quickly
For each .alg file, print the start date and time, the VM name, and the file name:
# cd /usr/local/avamarclient/var-proxy-3
# for i in *.alg; do
    grep -m 1 " START" "$i" | rev | awk '{print $4" "$5}' | rev
    grep vmname "$i" | awk -F\" '{print $4}'
    echo "$i"; echo
  done

2013-03-04 16:32:14
VM_Name_1
Daily 5 Day Retention-1362432700504-618a82a5277ebb1dd536b018a407a21582926e6a-3016-vmimagew.alg

2013-03-05 16:07:30
VM_Name_1
Daily 5 Day Retention-1362517629476-6acb4658af622ac48a52d73247aad95b1887af7c-3016-vmimagew.alg

Scenario 1 - Logs
/usr/local/avamar/var/mc/server_log/mcserver.log*
/usr/local/avamar/var/vdr/server_logs/vdr-server*
/usr/local/avamar/var/log/dpnctl.log*
/usr/local/avamarclient/var-proxy-N/avagent*.log
/data01/cur/gsan.log

Scenario 1 - LOG File
2013-03-05 23:01:35 avvcbimage Info <16001>: Found 1 disk(s), 0 snapshots, and 1 snapshot ctk files, on the VMs datastore.
2013-03-05 23:01:35 avvcbimage Warning <16002>: Too many extra snapshot files (1) were found on the VMs datastore. This can cause a problem for the backup or restore.
2013-03-05 23:01:35 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.
2013-03-05 23:01:35 avvcbimage Info <0000>: Starting graceful (staged) termination, Too many pre-existing snapshots will not permit a restore. (wrap-up stage)
2013-03-05 23:01:35 avvcbimage Error <9759>: createSnapshot: snapshot creation failed

Scenario 2 - When?
$ grep "Node restarted" ./data01/cur/err.log
2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted

2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint cp.20130223140423 3300 out of 3590 stripes complete
2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server node 0.0 is swapping: check configuration
2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> perfbeat::outoftolerance mask=[backup,restore] average=2191.09 limit=219.1092 mbpersec=0.04
(from /data01/cur/gsan.log)

Scenario 2 - Performance Data
# grep perfbeat /data01/cur/err.log | awk '{print $1"="$10}' | awk -F= '{print $1" - "$3}'
2013/02/18-13:16:05.93532 - 10.95
2013/02/18-13:19:40.12223 -
2013/02/18-13:20:44.07831 - 25.40

Scenario 2 - Swapping
2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> server node 0.0 is swapping: check configuration

What Next?
1. Review the monitor logs (vmware.log) at the time of the incident for both the VDP appliance and the target VM.
2. Review the vCenter logs at the time of the incident.
3. Review the ESX logs (hostd/vmkernel) at the time of the incident.
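
Reviewing several logs "at the time of the incident" is easier with a time-window filter. The sketch below is one possible approach, not a VDP tool: the function name `log_window` is hypothetical, and it assumes each relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as the avvcbimage log lines in this session do. Logs with a different timestamp layout (for example the `2013/02/26-17:52:38` form in gsan.log) would need the pattern and bounds adapted.

```shell
#!/usr/bin/env bash
# Sketch only: prints lines whose leading timestamp falls inside [FROM, TO].
# Assumes lines begin with "YYYY-MM-DD HH:MM:SS"; other lines are skipped.

# log_window FROM TO FILE...
log_window() {
    local from=$1 to=$2
    shift 2
    awk -v from="$from" -v to="$to" '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            ts = substr($0, 1, 19)          # fixed-width timestamp prefix
            # Fixed-width timestamps sort correctly as plain strings.
            if (ts >= from && ts <= to) print FILENAME ": " $0
        }
    ' "$@"
}

# Example (file names are placeholders):
#   log_window "2013-03-05 23:00:00" "2013-03-05 23:05:00" first.log second.log
```

Prefixing each match with its file name keeps the source visible when several logs are passed at once.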

Backup Best Practices - Troubleshooting
- Redeploy VDP: Should only be used to resume daily backups. Should not be used as a workaround except in extreme conditions.
- Scope (W5): Define who, what, when, where, and why.
- Communications: Understand how the product works and which modules communicate with other modules.

Backup Best Practices - Administration
- Plan: Plan your deployment.
- Storage: Ensure your storage infrastructure can handle the capacity and load. Always use HCL hardware.
- Separate: Separate and group the workload between appliances or deduplication stores.
- Set And Forget: Check backups regularly; do not set and forget.
- Single Points Of Failure: Think about single points of failure and consider correcting these conditions.
- Consumption: At >= 60% space utilization, be mindful of storage consumption.
- On Demand Backups: Limit on-demand backups during the maintenance window.
- On Demand Maintenance: Avoid initiating on-demand maintenance activities (CP, CP Validation, or GC).

Weekly
- Check the status of the deduplication store (checkpoints).
- Check the status of the backup subsystems.
- Review any failed backups.

Monthly / Quarterly
- Test the restore plan. Ensure business continuity.
- Review and correct any new trends.
- Review storage performance and storage growth.

Commands - MCCLI
root@vdp:~/#: mccli server show-prop
State                             Full Access
Total capacity                    535.7 GB
Capacity used                     1.7 GB
Server utilization                0.3%
Bytes protected                   10.0 GB
Time since Server initialization  21 days 21h:48m
Last checkpoint                   2013-03-27 11:26:37 PDT
Last validated checkpoint         2013-03-27 11:26:37 PDT
System Name                       vdp.vdp.lab
IP address                        192.168.2.99:26000

Commands - MCCLI
root@vdp:~/#: mccli server show-services
Name                              Status
--------------------------------  ---------------------------
Hostname                          vdp.vdp.lab
IP Address                        192.168.2.99
Load Average                      0.97
Last Admin Datastore Flush        2013-04-18 07:45:00 PDT
PostgreSQL database               Running
192.168.2.103                     All vCenter connections OK.
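
The "Server utilization" value from `mccli server show-prop` can be checked against the 60% guideline from the best practices. The sketch below is an illustration under assumptions: the function name `utilization_status` is hypothetical, and it assumes the command prints one property per line with the percentage as the last field, as in the sample output shown in this session.

```shell
#!/usr/bin/env bash
# Sketch only: reads show-prop style output on stdin and compares
# "Server utilization" with a threshold. The line layout is an assumption.

# utilization_status THRESHOLD_PERCENT
utilization_status() {
    awk -v limit="$1" '
        /Server utilization/ {
            pct = $NF                 # last field, e.g. "0.3%"
            sub(/%$/, "", pct)        # strip the trailing percent sign
            found = 1
            if (pct + 0 >= limit + 0)
                print "WARN: utilization " pct "% >= " limit "%"
            else
                print "OK: utilization " pct "% < " limit "%"
        }
        END { if (!found) { print "UNKNOWN: no utilization line found"; exit 2 } }
    '
}

# On an appliance this might be used as:
#   mccli server show-prop | utilization_status 60
```

A non-zero exit status when the line is missing makes it harder to mistake a parsing failure for a healthy result.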

Other VMware Activities Related to This Session
- HOL: HOL-SDC-1305 Business Continuity and Disaster Recovery In Action
- Group Discussions: BCO1002-GD Data Protection and Backup with Jeff Hunter

THANK YOU

VMware vSphere Data Protection (VDP) Technical Deep Dive And Troubleshooting Session
Darryl Hing, VMware Canada
Jacy Townsend, VMware
BCO4756 #BCO4756