
My Data Domain Notes

Data Domain:
DeDuplication Types:
- File based DeDuplication
- Fixed-Length Segment DeDuplication
- Variable-Length Segment DeDuplication
- Post-Process DeDuplication
- In-Line DeDuplication
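To see why the difference between fixed-length and variable-length segments matters, here is a minimal Python sketch. It is not Data Domain's algorithm: the window size, the divisor, and the use of CRC32 to pick boundaries are assumptions chosen for illustration. It inserts one byte at the front of a buffer and counts how many segments still match.

```python
import random
import zlib

WINDOW = 16    # bytes of context examined at each position (assumed value)
DIVISOR = 64   # a boundary is declared roughly once every 64 bytes (assumed value)


def fixed_segments(data: bytes, size: int = 64) -> list[bytes]:
    """Cut data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_segments(data: bytes) -> list[bytes]:
    """Cut data wherever a checksum of the preceding WINDOW bytes hits a target value."""
    segments, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])  # trailing bytes after the last boundary
    return segments


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Count distinct segments that appear in both lists."""
    return len(set(a) & set(b))


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4000))
shifted = b"X" + original  # one byte inserted at the front

fixed_shared = shared(fixed_segments(original), fixed_segments(shifted))
variable_shared = shared(variable_segments(original), variable_segments(shifted))
print("fixed-length segments still matching:", fixed_shared)
print("variable-length segments still matching:", variable_shared)
```

Because the variable-length boundaries depend only on nearby content, they move with the data after the insertion, so every segment after the first one is found again. The fixed-length cut points all slide by one byte, so the old segments no longer line up.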
Data Domain System Introduction
DeDuplicating hardware system
Inline
Variable-length segments
Fingerprints
Controller
Processors and RAM
Ethernet and Fibre Channel connections
Storage
-Low-cost SATA disk drives
-RAID 6 in software
-NVRAM used to protect unwritten data
Data Domain DeDuplication:
# Source Based DeDuplication
Uses DD Boost with DSP (distributed segment processing)
# Target Based DeDuplication
accessible through CIFS, NFS, and VTL protocols
Data Domain Global Compression:
# Global Compression Equivalent to deduplication; it cannot be turned off.
# Local Compression Compresses data segments before they are written to disk. Equivalent to file compression (uses the lz, gz, and gzfast algorithms); it can be turned off.
Stream-Informed Segment Layout (SISL) scaling architecture:
# SISL architecture provides fast and efficient deduplication:
99% of duplicate data segments are identified inline in RAM before they are stored to disk.
System throughput increases directly as CPU performance increases.
Minimizes the disk footprint by minimizing disk access.
The Data Domain system DeDuplication How it works:
1. Segment Data sliced into segments
2. Fingerprint Segments given fingerprint ID (segment ID)
3. Filter Fingerprint IDs compared to fingerprints in cache
   1. If fingerprint ID new, continue
   2. If fingerprint ID duplicate, reference, then delete
4. Compress Groups of new segments compressed using common technique (lz, gz, gzfast)
5. Write Segments (including fingerprints, metadata, & logs) written to containers, containers written to disk
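The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: segments are a tiny fixed size instead of variable-length, SHA-256 stands in for whatever fingerprint function the system really uses, zlib stands in for lz/gz/gzfast, and the DedupStore class and its "containers" list are hypothetical names, not Data Domain structures.

```python
import hashlib
import zlib

SEGMENT_SIZE = 8  # tiny fixed size for readability; real segments are variable-length


class DedupStore:
    def __init__(self) -> None:
        self.index: dict[str, int] = {}    # fingerprint -> position in containers
        self.containers: list[bytes] = []  # compressed unique segments ("disk")

    def write(self, data: bytes) -> list[str]:
        """Store data and return the file's recipe (ordered list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):            # 1. segment
            segment = data[i:i + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()           # 2. fingerprint
            if fp not in self.index:                           # 3. filter
                self.index[fp] = len(self.containers)
                self.containers.append(zlib.compress(segment)) # 4. compress, 5. write
            recipe.append(fp)  # a duplicate only adds a reference to the recipe
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(zlib.decompress(self.containers[self.index[fp]]) for fp in recipe)


store = DedupStore()
monday = store.write(b"AAAAAAAABBBBBBBBCCCCCCCC")
stored_after_monday = len(store.containers)
tuesday = store.write(b"AAAAAAAABBBBBBBBDDDDDDDD")  # only the last segment is new
print(stored_after_monday, len(store.containers))
```

The second write stores one new segment and references the two it shares with the first write, which is the whole point of the filter step.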
Data Invulnerability Architecture:
- Data Invulnerability Architecture is an important EMC Data Domain technology that provides safe and
reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise an
architectural design whose goal is data invulnerability. There are four technologies within the Data
Invulnerability Architecture that fight data loss:
1. End-to-end verification
# Verify stripe integrity
# Verify user data integrity
# Verify file system metadata integrity
2. Fault avoidance and containment
# New data never overwrites good data. (The system never puts existing data at risk.)
# There are fewer complex data structures
# The system includes non-volatile RAM (NVRAM) for fast, safe restarts
3. Continuous fault detection and healing
# periodically rechecks the integrity of the RAID stripes and container logs
# uses RAID system redundancy to heal faults
# During every read, data integrity is re-verified
# Any errors are healed as they are encountered

4. File system recoverability
# Reconstructs lost or corrupted file system metadata. It includes file system check tools.
Data Domain file systems:
# The administrative file system (called ddvar) /ddvar
# The storage file system (called Mtree) /backup (/data/col1)
Data Domain System Protocols:
# NFS Network File System (NFS) clients can have access to Data Domain system directories and MTrees.
# CIFS Common Internet File System (CIFS) clients can have access to Data Domain system directories and MTrees.
# VTL The virtual tape library (VTL) protocol enables backup applications to connect to and manage
Data Domain system storage as if it were a tape library. All of the functionality generally supported by a
physical tape library is available with a Data Domain system configured as a VTL. The movement of data
from a system configured as a VTL to a physical tape is managed by backup software (not by the Data
Domain system). The VTL protocol is used with Fibre Channel networking.
# DD Boost The DD Boost protocol enables backup servers to communicate with storage systems
without the need for Data Domain systems to emulate tape. There are two components to DD Boost: one
component that runs on the backup server and another component that runs on a Data Domain system
# NDMP If the VTL communication between a backup server and a Data Domain system is through NDMP, no Fibre Channel (FC) is required. When you use NDMP, initiator and port functionality does not apply.
Data Domain Data Paths:
# Data Domain data paths over Fibre Channel networks VTL
# Data Domain data paths over Ethernet networks NFS, CIFS, DD Boost and NDMP
Data Domain administration interfaces:
# The Enterprise Manager, which is the graphical user interface (GUI)
# The command line interface (CLI) Access CLI via SSH, serial console, telnet, keyboard & monitor
Data Domain Initial Configuration using Enterprise Manager Configuration Wizard: (CLI equivalent: the config setup command)
Configuration Wizard consists of these sections:
1. Licenses,
2. Network,
3. File system,
4. System,
5. CIFS,
6. NFS.
Data Domain Manage System Access:
# User Privileges: 3 types of classes
1. admin
2. user
3. security
# Administration access:
Services:
1. telnet
2. ftp
3. http
4. https
5. ssh
Hardware:
Storage Disks:
1. Active tier
2. Usable disks
3. Failed/Foreign/Absent Disks
Foreign Disks The foreign state indicates that the disk contains valid Data Domain file system data and
alerts the administrator to the presence of this data to make sure it is attended properly. This commonly
happens during chassis swaps, or when new shelves are added to an active system.
NetWork Interfaces:
>Link Aggregation Definition:

Using multiple Ethernet network cables, ports, and interfaces (links) in parallel, link aggregation
increases network throughput, across a LAN or LANs, until the maximum computer speed is reached.
>Link Aggregation Bonding Types:
1. Round robin Transmits packets in sequential order from first available link through the last in the
aggregated group.
2. Balanced Data sent over the interfaces as determined by the hash method you select.
3. LACP Similar to balanced except for the control protocol that communicates with the other end and
coordinates what links, within the bond, are available. It provides heartbeat fail-over.
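As a rough illustration of the first two bonding types, the sketch below picks a link for each packet. The interface names are hypothetical, and CRC32 over the source and destination addresses is only a stand-in for "the hash method you select"; the actual hash options on the system are not covered in these notes.

```python
import itertools
import zlib

LINKS = ["eth0a", "eth0b", "eth1a"]  # hypothetical interface names


def round_robin(links):
    """Yield links in order, wrapping around after the last one."""
    return itertools.cycle(links)


def balanced(links, src_ip: str, dst_ip: str) -> str:
    """Pick a link from a hash of the flow, so one flow always uses the same link."""
    key = f"{src_ip}->{dst_ip}".encode()
    return links[zlib.crc32(key) % len(links)]


rr = round_robin(LINKS)
print([next(rr) for _ in range(5)])
print(balanced(LINKS, "10.0.0.5", "10.0.0.9"))
```

Round robin spreads packets evenly but can reorder packets within a flow, while a hash keeps each flow on one link at the cost of less even distribution.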
>Link Failover Definition:
# Definition
A virtual interface may include both physical and virtual interfaces as members (called interface group
members).
# How It Works
Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost, the active interface is changed to another standby interface. An Address Resolution Protocol (ARP) message is sent to indicate that the data must flow to the new interface.
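The failover behaviour can be simulated with a small Python sketch. It is a model of the logic only, not the bonding driver: the Interface and FailoverBond classes are hypothetical, the carrier flag is set by hand, and the ARP announcement is represented by a log entry.

```python
from dataclasses import dataclass

CHECK_INTERVAL_S = 0.9  # carrier check interval given in the notes


@dataclass
class Interface:
    name: str
    carrier: bool = True


class FailoverBond:
    def __init__(self, members: list[Interface]) -> None:
        self.members = members
        self.active = members[0]
        self.events: list[str] = []

    def check(self) -> None:
        """One carrier check; a real driver would repeat this every CHECK_INTERVAL_S."""
        if self.active.carrier:
            return
        for candidate in self.members:
            if candidate.carrier:
                self.active = candidate
                # Stand-in for the ARP announcement that redirects traffic.
                self.events.append(f"ARP: traffic now flows via {candidate.name}")
                return
        self.events.append("no standby interface has carrier")


bond = FailoverBond([Interface("eth0a"), Interface("eth0b")])
bond.check()                     # carrier present, nothing changes
bond.members[0].carrier = False  # simulate a pulled cable
bond.check()
print(bond.active.name, bond.events)
```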
>Manage VLAN and IP Alias:
VLAN and IP alias network interfaces are used:
# For network security
# To segregate network traffic
# To speed up network traffic
# To organize a network
How It Works:
If you're not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less expensive than VLANs, but they are not true VLANs. For example, you must use one IP address for management and another IP address to back up or archive data. You can combine VLANs and IP aliases.
Data Management:
>Snapshot:
Snapshot location: /data/col1/backup/
ex: /data/col1/backup/austin/.snapshot ; /data/col1/backup/scla/.snapshot
where, .snapshot is a directory
# Replication does not replicate the snapshots of a volume; snapshot replication has to be configured manually.
>Fast Copy:
A fast copy copies files and directory trees of a source directory to a target directory on a Data Domain system. You can use the fast copy operation to retrieve data stored in snapshots. Fastcopy takes space (it's like a clone).
>Retention Lock: Licensed feature
- Retention lock is an optional, system-licensed software feature that enables organizations to protect
their data in non-writeable and non-erasable formats for a specified length of time, up to 70 years.
Retention lock protects against:
Accidents and user errors
Malicious activity
Locking data with the retention lock feature makes it non-writeable and non-erasable. Files cannot be modified even after the retention time for the file expires. The retention period of a retention-locked file can be extended but not reduced.
- In order for a file to become locked with the retention lock, the file's access time (called atime) must be set to a future date that is beyond the minimum retention period configured on the Data Domain system.
The act of setting the atime is the signal to the Data Domain system to lock the file. As soon as this value is set, the file is locked and cannot be deleted or modified before that date.
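From a client, setting the atime is an ordinary file operation. The sketch below shows the client-side step only, using Python's os.utime on a temporary local file; the 30-day period and the request_retention helper are hypothetical, and on a local disk nothing is actually locked. On a retention-lock-enabled share the same call would be made against the file's path on the mounted Data Domain export.

```python
import os
import tempfile
import time

RETENTION_SECONDS = 30 * 24 * 3600  # hypothetical 30-day retention period


def request_retention(path: str, retain_until: int) -> None:
    """Set atime to the retention date while leaving mtime untouched."""
    mtime = os.stat(path).st_mtime
    os.utime(path, (retain_until, mtime))


# Demonstrate on a throwaway local file (no lock is enforced here).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"backup image")
    demo_path = handle.name

retain_until = int(time.time()) + RETENTION_SECONDS
request_retention(demo_path, retain_until)
print(int(os.stat(demo_path).st_atime) == retain_until)
os.remove(demo_path)
```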
Data sanitization:
- Data sanitization is sometimes referred to as electronic shredding
With the data sanitization function, deleted files can be overwritten using a DoD/NIST compliant
algorithm and procedures
It removes any trace of deleted files with no residual remains preventing normally deleted data from
being recovered.
5 phases of sanitization:
1. Merge
2. Analysis
3. Enumeration
4. Copy
5. Zero
Data Encryption:
- Also called inline data encryption
Protects data on a Data Domain system from unauthorized access or accidental exposure
Requires software license
When data is backed-up, data enters via NFS, CIFS, VTL, DD Boost and NDMP Tape Server protocols. It is
then:
Segmented
Fingerprinted
Deduplicated (or globally compressed)
Grouped
Locally compressed
Encrypted
Important: encryption at a more granular level is not possible. Once enabled, all incoming data will be encrypted.

File System Cleaning:


- Cleaning reclaims physical storage occupied by expired data. For example, as retention periods on
backup software expire data, old backups are removed from the backup catalog. Space from expired
backups becomes available only after a system cleaning process reclaims the disk space.
- When application software expires backup or archive data, the data is deleted in the sense that it is no longer accessible or available for recovery from the application. The data is not deleted immediately; it is removed during a cleaning operation. In the case of retention lock, expired files will not be deleted until the retention lock period ends.
- The default schedule for file system cleaning is every Tuesday at 6 am, and the default CPU throttle is 50%.
- Navigate to Data Management > File System > Configuration > Clean Schedule to view or change the schedule.
Data Domain Replication:
Types of Data Domain Replication:
- Directory Replication: For partial site, single directory backup
- MTree Replication: For partial site, point-in-time backup
- Pool Replication: In a VTL setting, specified pools of virtual cartridges are treated as a directory (Destination does not require a VTL license)
- Collection Replication: For whole system mirroring (The fastest and lightest impact replication type)
# One fundamental difference between MTree replication and directory replication is the method used for determining what needs to be replicated between the source and destination. MTree replication creates periodic snapshots at the source and transmits the differences between two consecutive snapshots to the destination.
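The snapshot-difference idea behind MTree replication can be sketched as a comparison of two snapshots. This is a simplification: each snapshot is modelled here as a dictionary from file path to a content fingerprint, and the paths and fingerprint strings are made up for the example.

```python
def snapshot_diff(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots, each mapping file path -> content fingerprint."""
    return {
        "created": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "deleted": sorted(p for p in previous if p not in current),
    }


snap1 = {"/db/full.bak": "f1", "/db/log.001": "a7"}
snap2 = {"/db/full.bak": "f1", "/db/log.001": "b2", "/db/log.002": "c9"}
print(snapshot_diff(snap1, snap2))
```

Only the entries listed in the result would need to travel to the destination; the unchanged full backup is skipped.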
Replication Topologies:
- 1 to 1
- bidirectional
- many to 1
- 1 to many
- cascaded
- cascaded 1 to many
Replication Seeding:
If the source Data Domain system has a lot of data, the initial replication seeding can take some time
over a slow link. To expedite the initial seeding, you can bring the destination system to the same
location as the source system to use a high-speed, low-latency link. Once data is initially seeded using
the high-speed network, move the system back to its intended location. As data is initially seeded, only
new data is sent from that point onwards.
Low-bandwidth Optimization:
An option that reduces WAN bandwidth utilization
Useful if using a low-bandwidth network link.
Provides additional compression
Only for replication with <6 Mb/s available bandwidth
Use bandwidth and network-delay settings together to calculate the proper TCP buffer size for replication
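A common way to combine bandwidth and delay into a buffer size is the bandwidth-delay product. The sketch below uses that general networking formula; whether the Data Domain system applies exactly this calculation is an assumption, and the function name and units are chosen for the example.

```python
def tcp_buffer_bytes(bandwidth_bits_per_s: int, round_trip_delay_ms: int) -> int:
    """Bandwidth-delay product: bytes that can be in flight on the link at once."""
    # bits/s * ms / 1000 gives bits in flight; dividing by 8 converts to bytes.
    return round(bandwidth_bits_per_s * round_trip_delay_ms / 8000)


# A 6 Mb/s link with 100 ms of round-trip delay.
print(tcp_buffer_bytes(6_000_000, 100))
```

A buffer smaller than this value leaves the link idle while the sender waits for acknowledgements; a much larger one only consumes memory.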
#Low Bandwidth Optimization Using Delta Compression:

- Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm looks for previous similar segments using a sketch-like technique and sends only the difference between previous and new segments.
- Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by
eliminating the transfer of redundant data found with replicated deduplicated data. This feature is
typically beneficial to remote sites with lower Data Domain models
Resynchronize Recovered Data:
Resynchronization is the process of recovering (or bringing back into sync) the data between a source
and destination replication pair after a manual break in replication.
EMC DD Boost:
- EMC Data Domain Boost extends the backup optimization benefits of Data Domain deduplication
storage solutions by distributing parts of the deduplication process to the backup server or application
client. DD Boost dramatically increases throughput speeds, minimizes backup LAN load, and improves
backup server utilization.
- In a typical backup environment using in-line deduplication, client data is sent to a Data Domain system
where the data is identified in segments. These segments are identified to be unique data or duplicate
segments. If they are unique, they are compressed and written to the storage subsystem on the Data
Domain.
DD Boost Features:
- Centralized replication awareness and management: the backup application is aware of the replication enabled on the Data Domain side, so data can easily be recovered from the copy residing on the failover node.
- Distributed segment processing (DSP)
- Advanced load balancing and failover via interface groups
DD Boost Deduplication and Distributed Segment Processing:
Steps:
1. Segment the data
2. Mark a fingerprint for each data segment
3. Compare the fingerprinted segments with the Data Domain system
4. Filter the unique data
5. Send and write the unique data to the Data Domain system
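The division of work in distributed segment processing can be sketched as a short exchange between the backup server and the storage system. This is not the DD Boost protocol or API: the StorageSystem class, its unknown and store methods, the fixed 8-byte segments, and the SHA-256 fingerprints are all assumptions made for the illustration.

```python
import hashlib

SEGMENT_SIZE = 8  # tiny fixed size for readability; real segments are variable-length


class StorageSystem:
    """Stand-in for the Data Domain side of the conversation."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def unknown(self, fingerprints: list[str]) -> set[str]:
        """Report which fingerprints are not stored yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, new_segments: dict[str, bytes]) -> None:
        self.segments.update(new_segments)


def boost_backup(data: bytes, target: StorageSystem) -> int:
    """Run the steps on the backup-server side; return the payload bytes sent."""
    pieces = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]  # 1. segment
    by_fp = {hashlib.sha256(p).hexdigest(): p for p in pieces}                      # 2. fingerprint
    missing = target.unknown(list(by_fp))                                           # 3. compare
    unique = {fp: by_fp[fp] for fp in missing}                                      # 4. filter
    target.store(unique)                                                            # 5. send and write
    return sum(len(p) for p in unique.values())


dd = StorageSystem()
first = boost_backup(b"AAAAAAAABBBBBBBBCCCCCCCC", dd)
second = boost_backup(b"AAAAAAAABBBBBBBBDDDDDDDD", dd)
print(first, second)
```

The second backup sends only the one segment the storage system has not seen, which is how the backup LAN load drops when most of the data is unchanged.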

DD Boost Configuration Symantec NetBackup
Backup Host:
1. License as required
2. Create devices, pools through backup server management console
3. Configure backup policies and groups to use Data Domain configured devices
4. Configure duplicate to use Data Domain configured devices on desired Data Domain systems.

Source DD:
1. License DD Boost.
2. Enable DD Boost
3. Set a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units

Replica DD:
1. License DD Boost
2. Enable DD Boost
3. Set a Data Domain local user as a DD Boost user.
4. Create DD Boost storage units .
# NetBackup console: Configure Data Domain systems as disk storage servers
a.Install Data Domain OST plug-in
b.Configure disk storage servers type OST
c.Create storage lifecycle policy
# Configure Data Domain systems (A and B) for Boost
a.Enable DD Boost
b.Set user
c.Create storage unit and CIFS share
# NBU Console: Configure Backup Policy
a.Create a Backup Policy
b.Apply Storage Lifecycle Policy to Backup Policy
# NBU Console: Monitor Activity for Backup and Opt. Duplication
a.Start backup policy and monitor activity
b.Monitor file replication

# NBU Console: Restore files from system B
a.Restore from secondary copy
b.Verify Restored Files
# Verify Files on Data Domain systems A and B
a.Verify File Replication/Space Usage Stats
b.Validate backup files and file replication files
Data Domain System Performance Metrics:
# system show performance command
Utilization columns: proc, recv, send, idle
proc - percent of time spent processing network requests
recv - percent of time spent receiving requests over the network
send - percent of time spent sending requests over the network
idle - percent of time waiting for network data transfers
# system show performance
Utilization and State columns: CPU (avg/max), disk (max), State ('CDBVMSF')
Sample values:
92%/ 94%[3] 57%[14] V
34%/ 36%[3] 66%[06] V
State:
C cleaning
D disk reconstruction
B currently unused
V verification (used in the deduplication process)
M fingerprint merge (used in the deduplication process)
S summary vector checkpoint (used in the deduplication process)
F currently unused
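When scanning saved performance output, the State letters can be translated with a small lookup. The sketch below assumes the State value has already been extracted from a line as a string of letters; the parsing of the full output is not attempted, and the decode_state helper is a hypothetical name.

```python
STATE_FLAGS = {
    "C": "cleaning",
    "D": "disk reconstruction",
    "V": "verification",
    "M": "fingerprint merge",
    "S": "summary vector checkpoint",
}  # B and F are listed in the notes as currently unused


def decode_state(flags: str) -> list[str]:
    """Translate the letters of a State column value into activity names."""
    return [STATE_FLAGS[ch] for ch in flags if ch in STATE_FLAGS]


print(decode_state("CV"))
```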
Monitor Throughput:
# system show stats 2
Tuning Solutions:
Reduce stream count
Don't clean during heavy input
Don't replicate during heavy input
Consider using link aggregation
Isolate network
Consider implementing DD Boost
Monitor a Data Domain System:
SNMP
Syslog
Support bundle
Autosupport logs and alert messages
Autosupport logs and alert messages:
Report the system status and identify potential system problems
Provide daily notification of the system's condition
Send email notifications to specific recipients for quicker, targeted responses
Supply critical system data to aid support case triage and management
DD Operating System Upgrade:
Release Types
RA, IA, and GA: Restricted Availability, Initial Availability, and General Availability
There is no down-grade path
Read all release notes before upgrading
When in doubt, contact Support before installing an upgrade
Preparing for DDOS Upgrade:
- Are you upgrading more than two release families at a time?
## 4.7 to 4.9 is considered two families
## 4.7 to 5.0 is more than two families and requires two upgrades
Time required
## Single upgrades can take about 45 minutes or more
## During the upgrade, the Data Domain file system is unavailable
## Shutting down processes, rebooting after upgrade, and checking the upgrade all take time
Replication
## Do not disable replication on either system in the pair
## Upgrade the destination (replica) before upgrading the source (originator)
Stop any CIFS client connections before beginning the upgrade
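The release-family rule above (at most two families per upgrade, no down-grade path) can be turned into a small planning helper. The list of families below is hypothetical and contains only enough entries to reproduce the two examples in the notes; the real sequence of families and the right intermediate release should come from the release notes or Support.

```python
# Hypothetical ordered list of release families, chosen only to match the examples in the notes.
FAMILIES = ["4.7", "4.8", "4.9", "5.0"]
MAX_STEP = 2  # "4.7 to 4.9 is considered two families" and fits in one upgrade


def plan_upgrades(current: str, target: str) -> list[str]:
    """Return the families to install, in order, never jumping more than MAX_STEP."""
    position, goal = FAMILIES.index(current), FAMILIES.index(target)
    if goal < position:
        raise ValueError("there is no down-grade path")
    hops = []
    while position < goal:
        position = min(position + MAX_STEP, goal)
        hops.append(FAMILIES[position])
    return hops


print(plan_upgrades("4.7", "4.9"))
print(plan_upgrades("4.7", "5.0"))
```

With a replication pair, each hop in the plan would be applied to the destination first and then to the source.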
Working on VTL Configuration:
Setting Up a Virtual Tape Library:

Enable VTL
Create a Library
Create Tapes
Import Tapes

#I# Enable VTL:
1. In the More Tasks menu, select Service > Enable.
The Enable Service dialog box appears.
2. In the Enable VTL dialog box, click OK.
The Enable Service Status dialog box appears.
3. When the Enable Service Status dialog box displays Completed, click Close.
#II# Create a Library
1. In the More Tasks menu, click Library Create.
2. Enter the VTL library information:
Library Name Name can be from 1 to 32 alphanumeric characters.
Number of Drives From 1 to 256 tape drives. Systems with 4 GB of memory (DD4xx, DD510 and DD530) can have a maximum of 64 drives. Systems with 8 GB to 24 GB (DD560 to DD690) can have a maximum of 128 drives. The DD880 with 48 GB of memory can have up to 256 tape drives.
Drive Model IBM-LTO-1
IBM-LTO-2
IBM-LTO-3
Number of Slots Number of slots in the library:
Up to 32,000 slots per library
Up to 64,000 slots per system
This should be equal or greater than the number of drives.
Number of CAPs (Optional) Number of cartridge access ports (CAPs):
Up to 100 CAPs per library
Up to 2000 CAPs per system
Changer Model Name Click the drop-down list and select the model:
L180
RESTORER-L180
TS3500
Check the backup software application documentation on the Data Domain support site for the model
name that you should use.
3. Click OK.
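Before filling in the Create Library dialog, the per-library limits listed above can be checked with a short script. This sketch covers only the limits stated in these notes for a single library; it does not check the per-model drive maximums or the per-system totals, and the validate_library function is a hypothetical helper, not part of any Data Domain tool.

```python
def validate_library(name: str, drives: int, slots: int, caps: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the values fit the stated limits."""
    problems = []
    if not (1 <= len(name) <= 32 and name.isalnum()):
        problems.append("name must be 1 to 32 alphanumeric characters")
    if not 1 <= drives <= 256:
        problems.append("drives must be between 1 and 256")
    if slots > 32_000:
        problems.append("at most 32,000 slots per library")
    if slots < drives:
        problems.append("slots must be equal to or greater than drives")
    if not 0 <= caps <= 100:
        problems.append("at most 100 CAPs per library")
    return problems


print(validate_library("lib01", drives=4, slots=20, caps=1))
print(validate_library("lib-01", drives=8, slots=4))
```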
#III# Creating Tapes
The default capacities for each IBM LTO drive type are as follows:
LTO-1 drive: 100 GB
LTO-2 drive: 200 GB
LTO-3 drive: 400 GB
#IV# Importing tapes
Importing moves existing tapes from the vault to a library slot, drive, or cartridge access port (CAP). The
number of tapes that you can import at one time is limited by the number of empty slots in the library.
(You cannot import more tapes than the number of currently empty slots.)
1. In the Tapes view, either:
a. Enter search information about the tapes to import and
click Search:
2. From the Import Tapes: library view, verify the summary information and the tape list, and click Next.
3. Click Close on the status window.
Working with Access Groups:
A VTL access group (or VTL group) is created to hold a collection of initiator WWPNs or aliases and the drives and changers they are allowed to access. As well, a default group exists named TapeServer, where you can add devices that will support NDMP-based backup applications.
Access group configuration allows initiators (in general backup applications) to read and write data to the devices that are also in the access group.
Access groups allow clients to access only selected LUNs (media changers or virtual tape drives) on a system. A client that is set up for an access group can access only devices that are in its access group.
Note: Avoid making access group changes on a Data Domain system during active backup or restore
jobs. A change may cause an active job to fail. The impact of changes during active jobs depends on a
combination of backup software and host configurations.
View Access Group Information:
LUNs Tab LUN, Library, Device, In-Use Ports, Primary Ports, Secondary Ports
Initiators Tab Initiator, WWPN
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Data Domain Encryption: Encryption of Data at Rest, or inline data encryption
Protects data against loss or theft of the system, accidental exposure through a lost drive, or intrusion
Requires a license
Enables data on system drives or external storage to be encrypted while being saved, and locked before it's moved to another location
All ingested data is encrypted
Data that exists on the Data Domain before enabling encryption is not automatically encrypted but can be encrypted later
Inline Encryption happens during the Data Domain SISL Process:

Segment>fingerprint>Deduplicate (globally compress)>Group>Locally compress>Encrypt


The following protocols can be encrypted as data is ingested: NFS, CIFS, VTL, DD Boost and NDMP tape server

The available types of Encryption are:


128-bit or 256-bit AES (Advanced Encryption Standard)
CBC mode
Or both CBC (Cipher Block Chaining) and GCM (Galois/Counter Mode)
*One important thing to remember is that all data entering DD system will be encrypted; there are NO
other granular levels of encryption available

The feature can be enabled on the Encryption tab in File System, which also shows the encryption status

Also, do not forget the encryption passphrase; it is required when locking or unlocking the file system or disabling encryption. Do not lose your passphrase, this is imperative
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Data Domain DD860 Technical Specifications (Real Size: 64 TB)
- Applied: the backup read throughput we are getting is 100 GB/hour
Logical Capacity (Standard): 1.4 to 5.7 PB
Logical Capacity (Redundant): 7.1 to 28.5 PB
Max. Throughput (Other): 5.1 TB/hr (maximum throughput achieved using Symantec OpenStorage and 10 Gb Ethernet)
Max. Throughput (DD Boost): 9.8 TB/hr
Power Dissipation: 608 W
Cooling Requirement: 2,075 BTU/hr
Data Domain DD7200 Technical Specifications (Real Size: 96 TB)
Capacity (Raw): Max. Usable: 428 TB; Max. Usable w/ DD Extended Retention: 1.7 PB
Logical Capacity (Standard): 4.2 to 21.4 PB
Logical Capacity (Redundant) w/ DD Extended Retention: 17.1 to 85.6 PB
Max. Throughput (Other): 11.9 TB/hr (maximum throughput achieved using NFS and 10 Gb Ethernet)
Max. Throughput (DD Boost): 26.0 TB/hr (maximum throughput achieved using DD Boost and 10 Gb Ethernet)
