
Effective Planning and Use of TSM V6 and V7 Deduplication

Effective Planning and Use of IBM Tivoli Storage Manager V6 and V7 Deduplication
12/09/2013
2.0


Authors:
Jason Basler
Dan Wolfe
Document: Effective Planning and use of TSM V6 and V7 Deduplication
Date: 12/09/2013
Version: 2.0
Page 1 of 50
Document Location
This is a snapshot of an on-line document. Paper copies are valid only on the day they are printed. The
document is stored at the following location:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli Storage Manager/page/Deduplication
Revision History
Revision Number | Revision Date | Summary of Changes
1.0 | 01/17/12 | Initial publication
1.1 | 01/31/12 | Clarification on deduprequiresbackup option and other minor edits
1.2 | 06/10/13 | General updates on best practices
1.3 | 06/27/13 | Add information covering deduplication of Exchange data
2.0 | 12/09/13 | Major revision to reflect scalability and best practice improvements provided by TSM 6.3.4.200 and 7.1.0
Disclaimer
The information contained in this document is distributed on an "as is" basis without any warranty either
expressed or implied.
This document has been made available as part of IBM developerWorks WIKI, and is hereby governed by the
terms of use of the WIKI as defined at the following location:
https://www.ibm.com/developerworks/community/terms/
Acknowledgements
The authors would like to express their gratitude to the following people for contributions in the form of adding
content, editing, and providing insight into TSM technology.
Matt Anglin, Tivoli Storage Manager Server Development
Dave Cannon, Tivoli Storage Manager Architect
Robert Elder, Tivoli Storage Manager Performance Evaluation
Tom Hughes, Executive, WW Storage Software
Kathy Mitton, Tivoli Storage Manager Server Development
Harley Puckett, Tivoli Storage Software Development - Executive Consultant
Michael Sisco, Tivoli Storage Manager Server Development
Richard Spurlock, CEO and Founder, Cobalt Iron
Contents
1 Introduction
1.1 Overview
1.1.1 Description of deduplication technology
1.1.2 Data reduction and data deduplication
1.1.3 Server-side and client-side deduplication
1.1.4 Pre-requisites for configuring TSM deduplication
1.1.5 Comparing TSM deduplication and appliance deduplication
1.2 Conditions for effective use of TSM deduplication
1.2.1 Traditional TSM architectures compared with deduplication architectures
1.2.2 Examples of appropriate use of TSM deduplication
1.2.3 Data characteristics for effective deduplication
1.3 When is it not appropriate to use TSM deduplication?
1.3.1 Primary storage of backup data is on VTL or physical tape
1.3.2 No flexibility with the backup processing window
1.3.3 Restore performance considerations
2 Resource requirements for TSM deduplication
2.1 Database and log size requirements
2.1.1 TSM database capacity estimation
2.1.2 TSM database log size estimation
2.2 Estimating capacity for deduplicated storage pools
2.2.1 Estimating storage pool capacity requirements
2.3 Hardware recommendations and requirements
2.3.1 Database I/O requirements
2.3.2 CPU
2.3.3 Memory
2.3.4 Considerations for the storage pool disk
2.3.5 Hardware requirements for TSM client deduplication
3 Implementation guidelines
3.1 Deciding between client and server deduplication
3.2 TSM Deduplication configuration recommendations
3.2.1 Recommendations for deduplicated storage pools
3.2.2 Recommended options for deduplication
3.2.3 Best practices for ordering backup ingestion and data maintenance tasks
4 Estimating deduplication savings
4.1 Factors that influence the effectiveness of deduplication
4.1.1 Characteristics of the data
4.1.2 Impacts from backup strategy decisions
4.2 Effectiveness of deduplication combined with progressive incremental backup
4.3 Interaction of compression and deduplication
4.3.1 How deduplication and compression interact with TSM
4.3.2 Considerations related to compression when choosing between client-side and server-side deduplication
4.4 Understanding the TSM deduplication tiering implementation
4.4.1 Controls for deduplication tiering
4.4.2 The impact of tiering to deduplication storage reduction
4.4.3 Client controls that optimize deduplication efficiency
4.5 What kinds of savings can I expect for different application types
4.5.1 IBM DB2
4.5.2 Microsoft Exchange
4.5.3 Microsoft SQL
4.5.4 Oracle
4.5.5 VMware
5 How to determine deduplication results
5.1 Simple TSM Server Queries
5.1.1 QUERY STGPOOL
5.1.2 Other server queries affected by deduplication
5.2 TSM client reports
5.3 TSM deduplication report script
1 Introduction
Data deduplication is a technology that removes redundant data to reduce the storage capacity requirement
for retaining the data. When deduplication technology is applied to data protection, it can provide a highly
effective means for reducing the overall cost of a data protection solution. Tivoli Storage Manager introduced
deduplication technology beginning with TSM V6.1. This document describes the benefits of deduplication
and provides guidance on how to make effective use of the TSM deduplication feature as part of a well-designed
data protection solution. The information provided by this document is relevant to both TSM
Version 6 and Version 7. Significant enhancements that affect the scalability of TSM deduplication were made
beginning in TSM server levels 6.3.4.200 and 7.1.0. Many of the recommendations throughout
the document assume you are running one of these levels or newer.
Following are key points regarding TSM deduplication:
- TSM deduplication is an effective tool for reducing the overall cost of a backup solution.
- Additional resources (DB capacity, CPU, and memory) must be configured for a TSM server that is
enabled with TSM deduplication. However, when the server is properly configured, the reduction in storage pool
capacity results in a significant cost reduction.
- Cost reduction is the result of data reduction. Deduplication is just one of several methods that TSM
provides for data reduction (another is progressive incremental backup). The goal should be overall data
reduction when all of the techniques are combined, rather than the deduplication ratio alone.
- TSM deduplication can operate on backup, archive, and HSM data. This includes data that is
stored via the TSM API.
- TSM deduplication is an appropriate data reduction method for many situations. It can also be used
as a cost-effective option for backing up a subset of an environment that uses a deduplication
appliance for the remaining backups.
This document is intended to provide guidance specific to the use of TSM deduplication. The document does
not provide comprehensive instruction and guidance for the administration of TSM, and should be used in
addition to the TSM product documentation.
1.1 Overview
1.1.1 Description of deduplication technology
Deduplication technology detects patterns within data that appear multiple times within the scope of a
collection of data. For the purposes of this document, the collection of data consists of TSM backup, archive,
and HSM data (all of these types of data will be referred to as "backup data" throughout this document). The
patterns that are detected are represented as a hash value that is much smaller than the original pattern,
specifically 20 bytes. Except for the original instance of the pattern, subsequent instances of the chunk are
referenced by the hash value. As a result, for a pattern that appears many times throughout a given
collection of data, significant reduction in storage can be achieved.
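The hash-reference idea can be illustrated with a minimal Python sketch. It keeps each distinct chunk once, keyed by a 20-byte digest, and records every later occurrence only as a reference to that digest. This is an illustration under stated assumptions, not TSM's implementation. The `ChunkStore` class is hypothetical, SHA-1 is used only because its digest happens to be 20 bytes, and the input is assumed to be already split into chunks.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each distinct chunk is kept once, keyed by a
    20-byte SHA-1 digest (SHA-1 is assumed here only for its 20-byte size)."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}       # digest -> chunk data, stored once
        self.objects: dict[str, list[bytes]] = {}  # object name -> ordered digests

    def store(self, name: str, chunks: list[bytes]) -> None:
        refs = []
        for chunk in chunks:
            digest = hashlib.sha1(chunk).digest()  # 20-byte reference to the chunk
            self.chunks.setdefault(digest, chunk)  # keep only the first instance
            refs.append(digest)
        self.objects[name] = refs

    def restore(self, name: str) -> bytes:
        # Reassemble the object by following its references in order.
        return b"".join(self.chunks[d] for d in self.objects[name])

    def stored_bytes(self) -> int:
        # Counts chunk data only; the 20-byte references are not included.
        return sum(len(c) for c in self.chunks.values())


a, b = b"A" * 4096, b"B" * 4096
store = ChunkStore()
store.store("file1", [a, b, a])
store.store("file2", [b, a])
print(store.stored_bytes())                 # 8192: two distinct chunks, five references
print(store.restore("file1") == a + b + a)  # True
```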
Unlike compression, deduplication can take advantage of a pattern that occurs multiple times within a
collection of data. With compression, a single instance of a pattern is represented by a smaller amount of
data that is used to algorithmically recreate the original data pattern. Compression cannot take advantage of
common data patterns that reoccur throughout the collection of data, and this significantly limits its
potential reduction capability. However, compression can be combined with deduplication to take advantage
of both techniques and reduce the required amount of data storage further than either technique can
alone.
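The following Python sketch uses the standard `hashlib` and `zlib` modules to illustrate why the two techniques complement each other. For simplicity, it assumes that each file is a single chunk and that every client compresses its files independently. The function names are hypothetical, and the numbers say nothing about real TSM reduction ratios.

```python
import hashlib
import zlib


def compressed_only(files: list[bytes]) -> int:
    # Each file is compressed independently, so repeats across files remain.
    return sum(len(zlib.compress(f)) for f in files)


def dedup_only(files: list[bytes]) -> int:
    # Each distinct file body is stored once (whole files act as chunks here).
    unique = {hashlib.sha1(f).digest(): f for f in files}
    return sum(len(f) for f in unique.values())


def dedup_and_compress(files: list[bytes]) -> int:
    # Deduplicate first, then compress each unique chunk that remains.
    unique = {hashlib.sha1(f).digest(): f for f in files}
    return sum(len(zlib.compress(f)) for f in unique.values())


# Fifty copies of the same compressible 4 KiB file, e.g. on fifty client nodes.
block = (b"customer record; status=active; region=emea\n" * 200)[:4096]
files = [block] * 50

print(dedup_only(files))  # 4096: one copy kept
print(dedup_and_compress(files) < min(compressed_only(files), dedup_only(files)))  # True
```

Compressing each file on its own cannot detect that the fifty files are identical. Deduplication alone keeps the single copy uncompressed. Applying both keeps one compressed copy.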
1.1.1.1 TSM deduplication use compared with other deduplication approaches
Deduplication technology of any sort requires CPU and memory resources to detect and replace duplicate
chunks of data, as described throughout this section. Software-based technologies such as TSM
deduplication create similar outcomes to hardware-based or appliance technologies.
A software-based solution removes the need to procure specialized, and therefore comparatively expensive,
dedicated hardware. TSM-based deduplication can use standard hardware components such as servers and
storage. Because TSM has significant data efficiencies compared to other software-based deduplication
technologies (see section 1.1.2 of this document), there is less duplicate data to detect, process, and remove.
Therefore, all other things being equal, TSM requires less of this standard hardware resource than other
software-based deduplication technologies.
Care should still be taken in planning and implementing this technology, but for the majority of use cases
TSM deduplication provides a viable, proven technical platform. Where it is not available, such as when
performing backups over the storage area network (SAN), alternate technologies such as a VTL provide an
appropriate architectural solution.
The diagram below outlines the reference architectures for these use cases and highlights some key
considerations.
1.1.1.2 How does TSM perform deduplication
TSM uses an algorithm to analyze variable-sized, contiguous segments of data, called "chunks", for patterns
that are likely to be duplicated within the same TSM storage pool. This process is explained in more detail in
a later section of this document. As described above, the repeated identical chunks of data are removed and
replaced with a smaller pointer.
TSM deduplication applies only to storage pools of the FILE device class (sequential-access disk), and can
be used with primary, copy, or active-data pools.
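TSM's own chunking method is covered later in the document. The Python sketch below is a generic illustration of variable-sized, content-defined chunking. It places a chunk boundary wherever a rolling hash of the most recent bytes matches a bit pattern, so boundaries follow the data rather than fixed offsets. It is a sketch under assumptions, not the TSM algorithm. The gear-style hash, the table seed, and the `MIN_SIZE`, `MAX_SIZE`, and `MASK` values are all hypothetical.

```python
import hashlib
import random

# Hypothetical parameters chosen for illustration only.
MIN_SIZE = 512      # never cut a chunk shorter than this
MAX_SIZE = 8192     # always cut a chunk at this length
MASK = 0x7FF << 21  # cut when these 11 hash bits are all zero (about 1 in 2048)

# "Gear" table: one pseudo-random 32-bit value per possible byte value.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Split data into variable-sized chunks whose boundaries depend on content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rolling gear hash: older bytes shift out of the 32-bit value.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


data = random.Random(7).randbytes(64 * 1024)
edited = b"inserted!" + data  # small insertion at the front of the "file"

before = {hashlib.sha1(c).digest() for c in content_defined_chunks(data)}
after = {hashlib.sha1(c).digest() for c in content_defined_chunks(edited)}
print(len(before), len(before & after))
```

A boundary depends only on the bytes just before it. As a result, an insertion near the start of a file normally disturbs only the first chunk or two, and later boundaries realign. With fixed-size chunks, every chunk after the insertion point would shift and no longer match.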
1.1.2 Data reduction and data deduplication
When using data deduplication to substantially reduce storage capacity requirements, it is important to
consider the other data reduction techniques that are available. The deduplication ratio, or percentage of
reduction, is often treated as the ultimate measurement of deduplication effectiveness. Unlike other backup
products, however, TSM provides a substantial advantage in data reduction through its native capability to
back up data only once (rather than create duplicate data by repeatedly backing up unchanged files and other
data). TSM therefore provides a genuine measure of success, rather than claiming efficiencies that merely
remove inefficiencies the backup process created in the first place.
This inherent efficiency, combined with deduplication, compression, exclusion of specified objects, and
appropriate retention policies, enables TSM to provide highly effective data reduction. If reduction of storage
and infrastructure costs is the goal, the focus should be on overall data reduction effectiveness, with data
deduplication effectiveness as one component. The following table provides a summary of the data reduction
technologies that TSM offers:
Characteristic | Client compression | Incremental forever | Subfile backup | Deduplication
How data reduction is achieved | Client compresses files | Client only sends changed files | Client only sends changed regions of a file | Eliminates redundant data chunks
Conserves network bandwidth? | Yes | Yes | Yes | When client-side deduplication is used
Data supported | Backup, archive, HSM, API | Backup | Backup (Windows only) | Backup, archive, HSM, API (HSM supported only for server-side deduplication)
Scope of data reduction | Redundant data within same file on client node | Files that do not change between backups | Unchanged regions within previously backed up files | Redundant data from any data in storage pool
Avoids storing identical files renamed, copied, or relocated on client node? | No | No | No | Yes
Removes redundant data for files from different client nodes? | No | No | No | Yes
Can be used with any type of storage pool configuration? | Yes | Yes | Yes | No
1.1.3 Server-side and client-side deduplication
TSM provides two options for performing deduplication: client-side and server-side deduplication. Both
methods use the same algorithm to identify redundant data; however, the "when" and "where" of the
deduplication processing differ.
1.1.3.1 Server-side deduplication
With server-side deduplication, all of the processing of redundant data occurs on the TSM server, after the
data has been backed up. Server-side deduplication is also called "target-side" deduplication. The key
characteristics of server-side deduplication are:
- Duplicate data is identified after backup data has been transferred to the storage pool volume.
- The duplicate identification processing must run regularly on the server, and will consume TSM
server memory, CPU, and TSM database resources.
- Storage pool data reduction is not realized until data from the deduplicated storage pool is moved to
another storage pool volume. This usually happens through a reclamation process, but can also occur
during a TSM "MOVE DATA" process.
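The two-phase behaviour described above can be modelled in a few lines of Python: duplicates are identified first, and space is only recovered when the data is later moved. This is a conceptual sketch only. The `Volume` class and the `identify_duplicates` and `reclaim` functions are hypothetical names that loosely mirror the TSM identify duplicates and reclamation processes. They do not represent TSM internals.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Volume:
    """A storage pool volume holding chunks exactly as they were ingested."""
    chunks: list[bytes] = field(default_factory=list)

    def used_bytes(self) -> int:
        return sum(len(c) for c in self.chunks)


def identify_duplicates(volume: Volume, index: dict[bytes, int]) -> list[int]:
    """Phase 1: record which chunk positions duplicate an earlier chunk.
    Nothing is deleted, so the volume's usage is unchanged."""
    duplicates = []
    for pos, chunk in enumerate(volume.chunks):
        digest = hashlib.sha1(chunk).digest()
        if digest in index:
            duplicates.append(pos)
        else:
            index[digest] = pos
    return duplicates


def reclaim(volume: Volume, duplicates: list[int]) -> Volume:
    """Phase 2: copy only non-duplicate chunks to a new volume; this data
    movement is when the space savings are actually realized."""
    dup = set(duplicates)
    return Volume([c for pos, c in enumerate(volume.chunks) if pos not in dup])


vol = Volume([b"x" * 1000, b"y" * 1000, b"x" * 1000, b"x" * 1000])
dups = identify_duplicates(vol, {})
print(vol.used_bytes())                 # 4000: identification alone frees nothing
print(reclaim(vol, dups).used_bytes())  # 2000: savings appear after the move
```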
1.1.3.2 Client-side deduplication
Client-side deduplication processes the redundant data during the backup process on the host system where
the source data is located. The net results of deduplication are virtually the same as with server-side
deduplication, except that the storage savings are realized immediately, since only the unique data needs to
be sent to the server in its entirety. Data that is duplicated requires only a small signature to be sent to the
TSM server. Client-side deduplication is especially effective when it is important to conserve bandwidth
between the TSM client and server. In some cases, client-side deduplication has the potential to be more
scalable than server-side deduplication. Redundant data is removed before it is sent to the TSM server, which
reduces I/O demands on the server. A number of conditions must exist for this to be the case:
- Sufficient client CPU resource to perform the duplicate identification processing that occurs in-line
during backup.
- The ability to drive parallel client sessions, where the number of client sessions exceeds the number
of identify duplicates processes the server is capable of running.
- The combination of the TSM database running on fast disk, and a high-bandwidth, low-latency
network between the clients and server.
1.1.3.2.1 Client deduplication cache
The backup client must "check in" with the server to determine whether a chunk is
unique or a duplicate, but the amount of data transferred is small. The client must query the server for each chunk
of data that is processed. The overhead associated with this query process can be reduced substantially by
configuring a cache on the client. The cache allows chunks previously discovered on the client (during the backup
session) to be identified without a query to the TSM server. For the backup-archive client (including VMware
backup), it is recommended to always configure a cache when using client-side deduplication. For
applications that use the TSM API, the deduplication cache should not be used. The cache can get out of sync
with the TSM server and cause backup failures. If multiple, concurrent TSM
client sessions are configured (such as with a TSM for VMware vStorage backup server), there must be a
separate cache configured for each session. There are also conditions where faster performance is
possible with the deduplication cache disabled. When the network between the clients and server has
high bandwidth and low latency, and the TSM server database is on fast storage, deduplication queries sent
directly to the TSM server can outperform queries to the local cache.
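The following Python sketch shows two effects: the client cache reduces the number of server queries, and client-side deduplication reduces the bytes sent. It is a simplified model under stated assumptions. `FakeServer`, `backup`, and the fixed 20-byte signature cost are hypothetical, and the real client/server protocol is not shown.

```python
import hashlib

SIGNATURE_BYTES = 20  # assumed size of the signature sent for a known chunk


class FakeServer:
    """Stand-in for the server's chunk index (hypothetical)."""

    def __init__(self) -> None:
        self.known: set[bytes] = set()
        self.queries = 0

    def has_chunk(self, digest: bytes) -> bool:
        self.queries += 1  # every call models a round trip to the server
        return digest in self.known


def backup(chunks: list[bytes], server: FakeServer, use_cache: bool) -> int:
    """Return the bytes sent to the server for one backup session."""
    cache: set[bytes] = set()  # per-session cache of chunks already resolved
    sent = 0
    for chunk in chunks:
        digest = hashlib.sha1(chunk).digest()
        if use_cache and digest in cache:
            sent += SIGNATURE_BYTES  # resolved locally, no server query
            continue
        if server.has_chunk(digest):
            sent += SIGNATURE_BYTES  # duplicate: only the signature is sent
        else:
            sent += len(chunk)  # unique: the whole chunk is sent
            server.known.add(digest)
        cache.add(digest)
    return sent


chunks = [b"a" * 4096, b"b" * 4096] * 5  # 10 chunks, only 2 unique

no_cache = FakeServer()
print(backup(chunks, no_cache, use_cache=False), no_cache.queries)      # 8352 10
with_cache = FakeServer()
print(backup(chunks, with_cache, use_cache=True), with_cache.queries)  # 8352 2
```

In both runs only the two unique chunks travel in full. The cache changes how many round trips are needed to learn that. The text above also explains why a cache may still be the slower option on a fast, low-latency network.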
1.1.4 Pre-requisites for configuring TSM deduplication
This section provides a general description of pre-requisites when using TSM deduplication. For a complete list
of pre-requisites, refer to the TSM administrator documentation.
1.1.4.1 Pre-requisites common to client and server-side deduplication
- The destination storage pool must be of type "FILE" (sequential disk).
- The target storage pool must have the deduplication setting enabled.
- The TSM database must be configured according to best practices for high performance.
1.1.4.2 Pre-requisites specific to client-side deduplication
When configuring client-side TSM deduplication, the following requirements must be met:
- The client and server must be at version 6.2.0 or later. The latest maintenance version should
always be used.
- The client must have the client-side deduplication option enabled (DEDUPLICATION YES).
- The server must enable the node for client-side deduplication with the DEDUP=CLIENTORSERVER
parameter, using either the REGISTER NODE or UPDATE NODE command.
- Files must be bound to a management class with the destination parameter pointing to a storage pool
that is enabled for deduplication.
- By default, all client files that are at least 2KB and smaller than the value specified by the server
clientdeduptxlimit option are processed with deduplication. The exclude.dedup client option provides
a way to selectively exclude certain files from client-side deduplication processing.
The following TSM features are incompatible with TSM client-side deduplication:
- Client encryption
- LAN-free/storage agent
- UNIX HSM client
- Subfile backup
- Simultaneous storage pool write
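The requirements and restrictions listed above can be expressed as a small decision function. Such a function could be used, for example, to review which files in a backup would be deduplicated on the client. This is a hypothetical sketch, and the function and parameter names are illustrative. The 2KB lower bound comes from the text and is treated as 2048 bytes (an assumption). `client_dedup_tx_limit` stands in for the value of the server clientdeduptxlimit option, expressed in bytes here for simplicity. `excluded` stands in for a match against an exclude.dedup rule. The TSM documentation remains the authoritative source.

```python
from dataclasses import dataclass

MIN_DEDUP_BYTES = 2 * 1024  # "at least 2KB" (KB assumed to be 1024 bytes)

# Features listed as incompatible with client-side deduplication.
INCOMPATIBLE = {
    "client encryption",
    "lan-free/storage agent",
    "unix hsm client",
    "subfile backup",
    "simultaneous storage pool write",
}


@dataclass
class FileCandidate:
    size_bytes: int
    excluded: bool = False                  # hypothetical: matched an exclude.dedup rule
    features: frozenset[str] = frozenset()  # features in use for this file's backup


def client_side_dedup_applies(
    f: FileCandidate,
    client_dedup_tx_limit: int,
    deduplication_option: bool = True,       # client option DEDUPLICATION YES
    node_dedup_clientorserver: bool = True,  # node registered with DEDUP=CLIENTORSERVER
    pool_dedup_enabled: bool = True,         # destination FILE pool has dedup enabled
) -> bool:
    """True if the file would be processed by client-side deduplication."""
    if not (deduplication_option and node_dedup_clientorserver and pool_dedup_enabled):
        return False
    if f.excluded or (f.features & INCOMPATIBLE):
        return False
    return MIN_DEDUP_BYTES <= f.size_bytes < client_dedup_tx_limit


limit = 50 * 1024**3  # hypothetical clientdeduptxlimit value, in bytes
print(client_side_dedup_applies(FileCandidate(1024), limit))              # False: under 2KB
print(client_side_dedup_applies(FileCandidate(10 * 1024 * 1024), limit))  # True
```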
1.1.5 Comparing TSM deduplication and appliance deduplication
TSM's deduplication provides the most cost-effective solution for reducing backup storage costs, since there
is no additional software license charge for it, and it does not require special-purpose deduplicating hardware
appliances. Deduplication of backup data can also be accomplished by using a deduplicating storage device
in the TSM storage pool hierarchy. Deduplication appliances such as IBM's ProtecTIER and EMC's Data
Domain provide deduplication capability at the storage device level. NAS devices are also available that
provide NFS- or CIFS-mounted storage that removes redundant data through deduplication.
An optimal balance can be struck between TSM deduplication and storage appliance deduplication. Both
techniques can be used in the same environment, for separate storage hierarchies or in separate TSM server
instances. For example, TSM client-side deduplication is an ideal choice for backing up remote environments,
either to a local TSM server or to a central datacenter. TSM node replication can then take advantage of the
deduplicated storage pools to reduce data transfer requirements between TSM servers for disaster recovery
purposes. Alternatively, within a large datacenter, a separate TSM server may be designated for backing up a
critical subset of all hosts using TSM deduplication. The remaining hosts would back up to a separate TSM
server instance that uses a deduplicating appliance such as ProtecTIER for its primary storage pool and also
supports replication of the deduplicated data.
TSM deduplication should not be used in the same storage hierarchy as a deduplicating appliance. For a
deduplicating VTL, the TSM storage pool data would need to be "rehydrated" before moving to the VTL (as
with any tape device). The TSM deduplication would therefore produce no data reduction; instead, the data
would be deduplicated again by the VTL. For a deduplicating NAS device, a FILE device type could be created
on the NAS. However, since the data is already deduplicated by TSM, there would be little to no additional
data reduction possible by the NAS device.
1.1.5.1 Factors to consider when comparing TSM and appliance deduplication
There are three major factors to consider when deciding which deduplication technology to use:
- Scale
- Scope
- Cost
1.1.5.1.1 Scale
The TSM deduplication technology is a scalable solution that uses software technology that makes heavy
use of TSM database transactions. The deduplication processing has an impact on daily server processes
such as reclamation and storage pool backup. For a specific TSM server hardware configuration (for
example, TSM database disk speed, processor and memory capability, and storage pool device speeds),
there is a practical limit to the amount of data that can be backed up using deduplication.
The two primary points of scalability to consider are the daily amount of new data that is ingested and the
total amount of data that will be protected over time. The practical limits described are not hard limits in
the product, and will vary based on the capabilities of the hardware that is used. The limit on the amount of
protected data is presented as a guideline with the purpose of keeping the size of the TSM database below
the recommended limit of 4TB. A 4TB database corresponds roughly to 400TB of protected data (original
data plus all retained versions). The limit for daily ingest is prescribed with the goal of allowing enough time
each day for the TSM server's maintenance tasks to run efficiently, and there is no harm in occasionally
exceeding it. Regularly exceeding the practical limit on daily ingest for your specific hardware may reduce the
amount of data reduction that can be achieved, or cause backups to run longer than desired.
Deduplication appliances have dedicated resources for deduplication processing and do not have a direct
impact on TSM server performance and scalability. If it is desired to scale up a single TSM server instance as
much as possible, beyond approximately 400TB of protected data (original data plus all retained versions),
then appliance deduplication may be considered. However, a more cost-effective approach is often to scale
out with additional TSM server instances, which together can manage many multiples of 400TB of
protected data.
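The guideline that a 4TB database corresponds roughly to 400TB of protected data implies a database-to-protected-data ratio of about 1:100. The Python sketch below applies that rule of thumb to estimate database size and the number of server instances needed when scaling out. It is a rough planning aid that rests on this single assumption, and the function names are hypothetical. Actual database growth depends on the data and on the deduplication results achieved.

```python
import math

# Guideline values from this section (rules of thumb, not product limits).
DB_LIMIT_TB = 4                # recommended maximum TSM database size
PROTECTED_TB_PER_SERVER = 400  # protected data that roughly fills a 4TB database


def estimated_db_tb(protected_tb: float) -> float:
    """Rough database size for a given amount of protected data
    (original data plus all retained versions), assuming the ~1:100 ratio."""
    return protected_tb * DB_LIMIT_TB / PROTECTED_TB_PER_SERVER


def servers_needed(protected_tb: float) -> int:
    """Server instances needed to keep each database under the guideline."""
    return max(1, math.ceil(protected_tb / PROTECTED_TB_PER_SERVER))


print(estimated_db_tb(250))  # 2.5
print(servers_needed(1000))  # 3
```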
In addition to the scale of data stored, the daily amount of data backed up also has a practical limit
with TSM. The achievable daily ingest is determined by the capabilities of system resources, and also by the
secondary processes, such as replication and storage pool backup, that must run as well. Since deduplicating
appliances are single-purpose devices, there is the potential for greater throughput due to the use of
dedicated resources. A cost/benefit analysis should be performed to determine the appropriate choice or
mix of deduplication technologies. The following table provides some general guidelines for daily ingest
ranges for each TSM server relative to hardware configuration choices.
Ingest range | Server requirements | Storage requirements
Up to 4TB per day | 12 CPU cores, 64 GB RAM | Database and active log on SAS/FC 15K rpm; storage pool on NL-SAS/SATA or SAS
4-8 TB per day | 24 CPU cores, 128 GB RAM | Database and active log on SAS/FC 15K rpm; storage pool on NL-SAS/SATA or SAS
8-20 TB per day, and up to 30TB per day with client-side deduplication | 32 CPU cores, 192 GB RAM | Database and active log on SSD/flash storage; storage pool on SAS
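The guidelines in the table can be encoded as a simple lookup, for example in a capacity-planning script. This is a hypothetical helper that only restates the table. It applies the 20TB upper bound without client-side deduplication and the 30TB bound with it. Beyond the documented ranges, it returns `None`.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    cpu_cores: int
    ram_gb: int
    db_and_log_storage: str
    storage_pool: str


# (upper bound in TB per day, hardware guideline), restating the table above.
TIERS = [
    (4, Tier(12, 64, "SAS/FC 15K rpm", "NL-SAS/SATA or SAS")),
    (8, Tier(24, 128, "SAS/FC 15K rpm", "NL-SAS/SATA or SAS")),
    (20, Tier(32, 192, "SSD/flash", "SAS")),
]


def tier_for_daily_ingest(tb_per_day: float, client_side_dedup: bool = False) -> Optional[Tier]:
    """Return the hardware guideline for a daily ingest, or None if the ingest
    exceeds the documented ranges for a single server."""
    for upper_tb, tier in TIERS:
        # The top tier extends to 30TB per day when client-side dedup is used.
        limit = 30 if (upper_tb == 20 and client_side_dedup) else upper_tb
        if tb_per_day <= limit:
            return tier
    return None


print(tier_for_daily_ingest(6).cpu_cores)  # 24
print(tier_for_daily_ingest(25))           # None
```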
1.1.5.1.2 Scope
The scope of TSM deduplication is limited to a single TSM server instance, and more precisely to a single TSM
storage pool. A single, shared deduplication appliance can provide deduplication across multiple TSM
servers.
When TSM node replication is used in a many-to-one architecture, such as with branch offices, the
deduplicated storage pool on the replication target can deduplicate across the data incoming from the
multiple source servers.
1.1.5.1.3 Cost
TSM deduplication functionality is embedded in the product without an additional software license cost. In fact,
TSM software license costs are reduced when capacity-based licensing is in force, because the capacity is
calculated after deduplication has occurred. It is important to consider that hardware resources must be
appropriately sized and configured, and additional expense should be anticipated when planning a TSM server
configuration that will be used with deduplication. However, these additional costs can easily be offset by the
savings in disk storage.
Deduplication appliances are priced for the performance and capability that they provide, and are generally
considered more expensive per GB than the hardware requirements for TSM native deduplication. A detailed
cost comparison should be done to determine the most cost-effective solution.
1.2 Conditions for effective use of TSM deduplication
Although TSM deduplication provides a cost-effective and convenient method for reducing the amount of disk
storage required for backups, it provides the most benefit under specific conditions. Conversely, there are
conditions where TSM deduplication will not be effective and in fact may reduce the efficiency of a backup
operation.
Conditions that lead to effective use of TSM deduplication include the following:
- Need for reduction of the disk space required for backup storage.
- Need for remote backups over limited bandwidth connections.
- Use of TSM node replication for disaster recovery across geographically dispersed locations.
- Total amount of backup data and data backed up per day are within the recommended limits of less
than 400TB total and 30TB per day for each TSM server instance.
- Either a disk-to-disk backup should be configured (where the final destination of backup data is on a
deduplicating disk storage pool), or data should reside in the FILE storage pool for a significant time
(e.g., 30 days), or until expiration. The deduplication storage pools should not be used as a
temporary staging pool before moving to tape or another non-deduplicating storage pool, since this
can be highly inefficient.
- Backup data should be a good candidate for data reduction through deduplication. This topic is
covered in greater detail in later sections.
- High performance disk must be used for the TSM database to provide acceptable TSM deduplication
performance.
1.2.1 Traditional TSM architectures compared with deduplication architectures
A traditional TSM architecture ingests data into disk storage pools, and moves this data to tape on a frequent
basis to maintain adequate free space on disk for continued ingestion. An architecture that includes
deduplication changes this model to store the primary copy of data in a sequential file storage pool for its
entire life cycle. Deduplication provides enough storage savings to make keeping the primary copy on disk
an affordable possibility.
Tape storage pools still have a place in this architecture for maintaining a secondary storage pool backup
copy for disaster recovery purposes, or for data with very long retention periods (for example, 7 years or forever).
Other architectures are possible where data remains in deduplicated storage pools for only a portion of its life
cycle, but this requires reconstructing the deduplicated objects and can defeat the purpose of spending the
processing resources that are required to deduplicate the data.
Tip: Avoid architectures where data is moved from a deduplicated storage pool to a non-deduplicated
storage pool. Such moves force the deduplicated data to be reconstructed, and the storage savings that were
previously gained are lost.
1.2.2 Examples of appropriate use of TSM deduplication
This section contains examples of TSM architectures that can make the most effective use of TSM
deduplication.
1.2.2.1 Deduplication with a secondary storage pool backup architecture
In this example, the primary storage pool is a file-sequential disk storage pool configured for TSM
deduplication. The deduplication storage pool is backed up to a tape library copy storage pool using the
storage pool backup capability. The use of a secondary copy storage pool is an optional feature that
provides an extra level of protection against disk failure in your primary storage pool. Here are some general
considerations when a copy storage pool will be used:
- Having a second copy storage pool using disk (which can also be deduplicated) is another option.
- Server-side deduplication is a two-step process: duplicate identification, followed by
removal of the excess data during a subsequent data movement process such as reclamation or
migration. The second step can be deferred until after a storage pool backup copy is created. See
the description of the deduprequiresbackup option in a later section for additional considerations.
- When using server-side deduplication, schedule the storage pool backup process prior to the
reclamation processing to ensure that there is minimal overhead when copying the data. After
identify duplicates has run, the data is not yet deduplicated, but it is redefined such that it can be
reconstructed and dehydrated during the subsequent data movement operation. Sufficient time must
be allotted for the scheduled storage pool backup to complete before the start of the schedule for
reclamation.
- When using client-side deduplication, the storage pool backup processing will always occur after data
has been deduplicated. This requires deduplicated data to be reconstructed during the copy (if the
copy storage pool is not also deduplicated). The reconstruction processing can make storage pool
backup slower than storage pool backup of data that has not been deduplicated. For planning
purposes, estimate that the duration of storage pool backup will be doubled for data that is already
deduplicated.
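The planning rule in the last item can be applied with simple arithmetic. The sketch below takes as input a baseline throughput for storage pool backup of data that has not been deduplicated. This figure is something you would measure, not a value from this document. The sketch doubles the estimate when the data was already deduplicated and the copy pool is not deduplicated. Restricting the doubling to that case is an interpretation of the guideline, and the function name is hypothetical.

```python
def stgpool_backup_hours(
    data_tb: float,
    baseline_tb_per_hour: float,
    already_deduplicated: bool,
    copy_pool_deduplicated: bool = False,
) -> float:
    """Planning estimate for storage pool backup duration.

    baseline_tb_per_hour: measured throughput for data that is not deduplicated.
    Deduplicated data copied to a non-deduplicated copy pool must be
    reconstructed, so the guideline doubles the estimate in that case."""
    hours = data_tb / baseline_tb_per_hour
    if already_deduplicated and not copy_pool_deduplicated:
        hours *= 2
    return hours


print(stgpool_backup_hours(6, 1.5, already_deduplicated=False))  # 4.0
print(stgpool_backup_hours(6, 1.5, already_deduplicated=True))   # 8.0
```

If the resulting window does not fit before the reclamation schedule starts, the schedule or the copy pool design needs to change.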
1.2.2.2 Deduplication with node replication copy
The TSM 6.3 release provides a node replication capability, which allows for an alternative architecture where deduplicated data is replicated to a second server in an incremental fashion that takes advantage of deduplication by only replicating unique data not previously replicated. The reconstruction penalty described in a previous section for storage pool backup of deduplicated data is also avoided.
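To make the incremental behavior concrete, the following sketch models replication at the chunk level: the source sends only chunks whose fingerprint the target does not already hold. It is a conceptual illustration, not TSM's implementation. Fixed-size chunking, SHA-256 fingerprints, and the names `split_chunks` and `chunks_to_replicate` are hypothetical choices made to keep the example short.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (a simplification, not TSM's chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunks_to_replicate(data: bytes, target_index: set[str], size: int = 4) -> list[bytes]:
    """Return only the chunks that the target server does not already hold.

    target_index is a hypothetical set of chunk fingerprints known to the
    target. It is updated as chunks are sent, so a chunk repeated within
    the same object is also sent only once.
    """
    to_send = []
    for chunk in split_chunks(data, size):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in target_index:
            target_index.add(fingerprint)
            to_send.append(chunk)
    return to_send


if __name__ == "__main__":
    target: set[str] = set()
    first = chunks_to_replicate(b"AAAABBBBAAAA", target)   # sends AAAA and BBBB
    second = chunks_to_replicate(b"AAAABBBBCCCC", target)  # only CCCC is new
    print(len(first), second)  # prints: 2 [b'CCCC']
```

The second call sends a single chunk even though the object is 12 bytes long, which is the bandwidth saving the paragraph above describes.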
1.2.2.3 Disk-to-disk backup
"Disk-to-disk backup" refers to the scenario where the preferred backup storage device is disk-based, as opposed to tape or a virtual tape library (VTL). Disk-based backup has become more popular as the unit cost of disk storage has fallen. It has also become more common as companies distinguish between backup data, which is kept for a relatively short amount of time, and archive data, which has long-term retention.
Disk-to-disk backup still requires a backup of the storage pool data, and the backup or copy destination may be tape or disk. However, with disk-to-disk backup, the primary storage pool data remains on disk until it expires. A significant reduction of disk storage can be achieved if the primary storage pool is configured for deduplication.
1.2.3 Data characteristics for effective deduplication
When considering the use of TSM deduplication, you should assess whether the characteristics of the backup data are appropriate for deduplication. A more detailed description of data characteristics for deduplication is provided in the section on estimating deduplication efficiency. General types of structured and unstructured data are good candidates for deduplication, but if your backup data consists mostly of unique binary images
or encrypted data, you may wish to exclude these data types from a management class that uses a deduplicated storage pool.
1.3 When is it not appropriate to use TSM deduplication?
TSM deduplication can provide significant benefits and cost savings, but it does not apply to all situations. TSM deduplication is not appropriate in the following situations:
1.3.1 Primary storage of backup data is on VTL or physical tape
Movement to tape requires "rehydration" of the deduplicated data. This takes extra time and requires processing resources. If regular migration to tape is required, the benefits of using TSM deduplication may be reduced, since the goal is to reduce disk storage as the primary location of the backup data.
1.3.2 No flexibility with the backup processing window
TSM deduplication processing requires additional resources, which can extend backup windows or server processing times for daily backup activities. For example, a duplicate identification process must run for server-side deduplication, and additional reclamation activity is required to remove the duplicate data from a storage pool after duplicate identification completes. For client-side deduplication, the client backup speed will generally be reduced for local clients (remote clients may not be impacted if there is a bandwidth constraint).
If the backup window has already reached the limit for service level agreements, TSM deduplication could extend the backup window further unless careful planning is done.
1.3.3 Restore performance considerations
Restore performance from deduplicated storage pools is slower than from a comparable disk storage pool that does not use deduplication. However, restore from a deduplicated storage pool can compare favorably to restore from tape devices for certain workloads.
If fastest restore performance from disk is a high priority, then restore performance benchmarking should be done to determine whether the effects of deduplication can be accommodated. The following table compares the restore performance of small and large object workloads across several storage scenarios.

| Storage pool type | Small object workload | Large object workload |
|---|---|---|
| Tape | Typically slower due to tape mounts and seeks | Typically faster due to streaming capabilities of modern tape drives |
| Non-deduplicated disk | Typically faster due to absence of tape mounts and quick seek times | Comparable to or slightly slower than tape |
| Deduplicated disk | Faster than tape, slower than non-deduplicated disk | Slowest, since data must be rehydrated, when compared to tape, which is fast for streaming large objects that are not spread across many tapes |
2 Resource requirements for TSM deduplication
TSM deduplication provides significant benefits as a result of its data reduction technology, particularly when combined with other data reduction techniques available with TSM. However, the use of deduplication in TSM adds requirements for hardware and database/log storage, which are essential for a successful implementation. When configuring TSM to use deduplication, you must ensure that proper resources have been allocated to support the use of the technology. The resources include hardware requirements necessary to meet the additional processing performed during deduplication, additional storage requirements for handling the TSM database records used to store the deduplication catalog, and additional storage requirements for the TSM server database logs.
The TSM internal database plays a central role in enabling the deduplication technology. Deduplication requires additional database capacity to be available. In addition, there is a significant increase in the frequency of references to records in the database during many TSM operations, including backup, restore, duplicate identification, reclamation, and expiration. These demands on the database require that the database disk storage be capable of sustaining higher rates of I/O operations than would be required without the use of deduplication.
As a result, planning for resources used by the TSM database is critical for a successful deduplication deployment. This section guides you through the estimation of resource requirements to support TSM deduplication.
2.1 Database and log size requirements
2.1.1 TSM database capacity estimation
Use of TSM deduplication significantly increases the capacity requirements of the TSM database. This section provides some guidelines for estimating the capacity requirements of the database. It is important to plan ahead for the database capacity so an adequate amount of higher-performing disk can be reserved for the database (refer to the next section for performance requirements).
The estimation guidelines are approximate, since actual requirements will depend on many factors, including ones that cannot be predicted ahead of time (for example, a change in the data backup rate, the exact amount of backup data, and other factors).
2.1.1.1 Planning database space requirements
The use of deduplication in TSM requires more storage space in the TSM server database than without the use of deduplication. One important point to note is that when using deduplication, the TSM database grows proportionally to the amount of data that is stored in deduplicated storage pools. This is because each "chunk" of data that is stored in a deduplicated storage pool is referenced by an entry in the database. Without deduplication, each backed-up object (typically a file) is referenced by a database entry, and the database grows proportionally to the number of objects that are stored. With deduplication, the database grows proportionally to the total amount of data backed up. The following table provides an example to illustrate this point:

| | Number of objects stored | Amount of data being managed | Database storage requirements |
|---|---|---|---|
| Without deduplication | 500 million | 200 TB | 475 GB * |
| With deduplication | 500 million | 200 TB | 2000 GB ** |

* Using rule-of-thumb of 1KB of database space per object stored
** Using rule-of-thumb of 100GB of database space per 10TB of data managed
The document Determining the impact of deduplication on TSM server database and storage pools provides detailed information for estimating the amount of disk storage that will be required for your TSM database. The document provides formulas for estimating database size based on the volume of data to be stored.
As a simplified rule-of-thumb for taking a rough estimate, you can plan for 100GB of database storage for every 10TB of data that will be protected in deduplicated storage pools.
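The rule of thumb above is simple proportional arithmetic, so it is easy to script for planning. The following sketch applies it. The function name is hypothetical, and it deliberately ignores the more detailed formulas in the referenced sizing document. For 200 TB it reproduces the 2000 GB figure from the table.

```python
def dedup_db_estimate_gb(protected_tb: float, gb_per_10tb: float = 100.0) -> float:
    """Rough TSM database size for data held in deduplicated storage pools.

    Applies the rule of thumb of about 100 GB of database space for every
    10 TB of protected data. Because each deduplicated chunk has a database
    entry, the estimate scales with data volume, not with object count.
    """
    if protected_tb < 0:
        raise ValueError("protected_tb must be non-negative")
    return protected_tb / 10.0 * gb_per_10tb


if __name__ == "__main__":
    for tb in (50, 200, 500):
        print(f"{tb} TB protected -> ~{dedup_db_estimate_gb(tb):.0f} GB database")
```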
2.1.1.2 Database reorganization
The TSM server uses a process called reorganization to remove fragmentation that can accumulate in the database over time. When deduplication is used, the number of database records increases significantly for storing information about data chunks, and as data is expired on the TSM server, significant amounts of deletion occur within the database, increasing the need for reorganization. Reorganization can be processed online while the TSM server is running, or offline while the server is halted. Depending on your server workloads, you might need to disable both table and index reorganization to maintain server stability and to reliably complete daily server activities. With reorganization disabled, if you experience unacceptable database growth or server performance degradation, you will need to plan offline reorganization for those tables.
The TSM database sizing guidelines given in the previous section include additional space to accommodate database fragmentation, which can grow the database used space between reorganizations.
For additional information on best practices related to reorganization and data deduplication, see the technote titled Database size, database reorganization, and performance considerations for Tivoli Storage Manager V6 and V7 servers.
2.1.2 TSM database log size estimation
The use of deduplication adds additional requirements for the TSM server database, active log, and archive log storage. Properly sizing the storage capacity for these components is essential for a successful implementation of deduplication.
2.1.2.1 Planning active log space requirements
The database active log stores information about database transactions that are in progress. With deduplication, transactions can run longer, requiring more space to store the active transactions.
Tip: Use the maximum allowed size for the active log, which is 128GB.
2.1.2.2 Planning archive log space requirements
The archive log stores older log files for completed transactions until they are cleaned up as part of the TSM server database backup processing. The file system holding the archive log must be given sufficient capacity to avoid running out of space, which can cause the TSM server to be halted. Space is freed in the archive log every time a full backup of the TSM server's database is performed.
See the document on Sizing the TSM archive log for detailed information on how to carefully calculate the space requirements for the TSM server archive log.
Tip: A file system with 500GB of free space has proven to be more than adequate for a large-scale TSM server that ingests several terabytes a day of new data into deduplicated storage pools and performs a full TSM database backup once a day. For a server that will ingest more than 4TB of new data each day, an archive log with 1TB of free space is recommended.
2.2 Estimating capacity for deduplicated storage pools
TSM deduplication ratios typically range from 2:1 (50% reduction) to 15:1 (93% reduction) and are data dependent. Lower ratios are associated with backups of unique data (for example, progressive incremental data), and higher ratios are associated with backups that are repeated, such as repeated full backups of databases or virtual machine images. Mixtures of unique and repeated data will result in ratios within that range. If you aren't sure what type of data you have and how well it will reduce, use 3:1 for planning purposes when comparing with non-deduplicated TSM storage pool occupancy. This ratio corresponds to an overall data reduction ratio of over 15:1 when factoring in the data reduction benefits of progressive incremental backups.
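An N:1 deduplication ratio means the stored data is 1/N of the original, so the percentage reduction is (1 - 1/N) x 100. The short sketch below (function names are illustrative) checks the figures quoted above and shows what the 3:1 planning ratio implies for storage pool occupancy.

```python
def percent_reduction(ratio: float) -> float:
    """Convert an N:1 deduplication ratio into a percentage reduction."""
    if ratio < 1:
        raise ValueError("a deduplication ratio is at least 1:1")
    return (1.0 - 1.0 / ratio) * 100.0


def deduplicated_occupancy_tb(non_dedup_tb: float, ratio: float = 3.0) -> float:
    """Estimated occupancy after deduplication, defaulting to the 3:1 planning ratio."""
    return non_dedup_tb / ratio


if __name__ == "__main__":
    for r in (2, 3, 15):
        print(f"{r}:1 -> {percent_reduction(r):.0f}% reduction")  # 50%, 67%, 93%
    print(deduplicated_occupancy_tb(30.0))  # 30 TB non-deduplicated -> 10.0
```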
2.2.1 Estimating storage pool capacity requirements
2.2.1.1 Delayed release of storage pool data
Because data chunks with multiple references are deleted only after a delay, a deduplicated storage pool needs "transient" storage for data chunks that must remain in a storage pool volume even though their associated file or object has been deleted or expired. As a result of this behavior, storage pool capacity sizing must account for some percentage of data that is retained because of references by other objects. This latency can also delay the deletion of storage pool volumes.
2.2.1.2 Delayed effect of post-identification processing
Storage reduction does not always occur immediately with TSM deduplication. In the case of server-side deduplication, sufficient storage pool capacity is required to ingest the full amount of daily backup data. With server-side deduplication, removal of redundant data does not occur until after storage pool reclamation completes, which in turn may not complete until after a storage pool backup is done. If client-side deduplication is used, this delay does not apply. Sufficient storage pool free capacity must be maintained to accommodate continued backup ingestion.
2.2.1.3 Estimating storage pool capacity requirements
You can roughly estimate storage pool capacity requirements for a deduplicated storage pool using the following technique:
Estimate the base size of the source data
Estimate the daily backup size, using an estimated change and growth rate
Determine retention requirements
Estimate the total amount of source data by factoring in the base size, daily backup size, and retention requirements
Apply the deduplication ratio factor
Uplift the estimate to consider transient storage pool usage
The following example illustrates the estimation method:

| Parameter | Value | Notes |
|---|---|---|
| Base size of the source data | 40TB | Data from all clients that will be backed up to the deduplicated storage pool |
| Estimated daily change rate | 2% | Includes new and changed data |
| Retention requirement | 30 days | |
| Estimated deduplication ratio | 3:1 | 3:1 assumes compression is used with client-side deduplication |
| Uplift for "transient" storage pool volumes | 30% | |

Computed Values:

| Parameter | Computation | Result |
|---|---|---|
| Base source data | 40TB | 40TB |
| Estimated daily backup amount | 40TB * 0.02 change rate | 0.8TB |
| Total changed data retained | 30 * 0.8TB daily backup | 24TB |
| Total data retained | 40TB base data + 24TB retained | 64TB |
| Retained data after deduplication (3:1 ratio) | 64TB / 3 | 21.3TB |
| Uplift for delays in chunk deletion (30%) | 21.3TB * 1.3 | 27.69TB |
| Add full daily backup amount | 27.69TB + 0.8TB | 28.49TB |
| Round up: Storage pool capacity requirement | | 29TB |
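The same calculation can be expressed as a small Python function, which makes it easy to rerun the estimate with your own inputs. The function and parameter names are illustrative. Unlike the table, it does not round intermediate values, so it yields about 28.53TB before rounding instead of 28.49TB. The rounded-up requirement is still 29TB.

```python
import math


def dedup_pool_capacity_tb(base_tb: float, daily_change_rate: float,
                           retention_days: int, dedup_ratio: float,
                           transient_uplift: float) -> float:
    """Estimate deduplicated storage pool capacity using the steps in 2.2.1.3."""
    daily_backup_tb = base_tb * daily_change_rate               # new and changed data per day
    retained_changes_tb = retention_days * daily_backup_tb      # changed data kept for the retention period
    total_retained_tb = base_tb + retained_changes_tb           # base data plus retained changes
    after_dedup_tb = total_retained_tb / dedup_ratio            # apply the deduplication ratio
    with_uplift_tb = after_dedup_tb * (1.0 + transient_uplift)  # allow for delayed chunk deletion
    # Add a full day's ingest, which must land before duplicates are removed.
    return with_uplift_tb + daily_backup_tb


if __name__ == "__main__":
    estimate = dedup_pool_capacity_tb(base_tb=40, daily_change_rate=0.02,
                                      retention_days=30, dedup_ratio=3,
                                      transient_uplift=0.30)
    print(f"{estimate:.2f} TB -> plan for {math.ceil(estimate)} TB")  # 28.53 TB -> plan for 29 TB
```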
2.3 Hardware recommendations and requirements
The use of deduplication requires additional processing, which increases the TSM server hardware requirements beyond what is required without the use of deduplication. The most critical hardware requirement when using deduplication is the I/O capability of the disk system that is used for the TSM database.
You should begin by understanding the base hardware recommendations for the TSM server, which are described in the following documents: AIX, HP-UX, Linux x86, Linux on Power, Linux on System z, Solaris, Windows.
Additional hardware recommendations are made in the TSM Version 6 deployment guide: TSM V6 Deployment Recommendations
The Optimizing Performance guide also provides configuration best practices for the use of deduplication.
2.3.1 Database I/O requirements
For optimal performance, fast disk storage is always recommended for the TSM database, as measured in terms of Input/Output Operations Per Second (IOPS). Due to the random access I/O patterns of the TSM database, minimizing the latency of operations that access the database volumes is critical for optimizing the performance of the TSM server. The large tables used for storing deduplication information in the TSM database bring about an even more significant demand for disk storage that can handle a large number of IOPS.
In general, systems based on solid-state disk technology and SAS/FC provide the best capabilities in terms of increased IOPS. Because the claims of disk manufacturers are not always reliable, we recommend measuring the actual IOPS of a disk system before implementing a new TSM database.
Details about how to configure high-performing disk storage are beyond the scope of this document. The following key points should be considered when configuring disk storage for the TSM database:
The disk used for the TSM database should be configured according to best practices for a transactional database.
Low-latency, high-performance disk devices or storage subsystems should be used for the TSM database storage volumes and the active log. Slower disk technology is acceptable for the archive log.
Disk devices or storage systems that are capable of a minimum of approximately 3000 IOPS are suggested for the TSM database disk device. An additional 1000 IOPS per TB of daily ingested data (pre-deduplication) should be considered. Lower-performing disk devices can be used, but performance may not be optimal. Refer to the Deduplication FAQs for an example configuration.
Disk I/O should be distributed over as many disk devices and controllers as possible.
The TSM database and logs should be configured on separate disk volumes (LUNs), and should not share disk volumes with the TSM storage pool or any other application or file system.
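The IOPS guideline above translates into a simple planning target. The sketch below assumes that the 3000 IOPS baseline and the 1000 IOPS per TB of daily pre-deduplication ingest are additive, which is how the guideline reads. The function name is hypothetical, and the result is a starting point to compare with measured IOPS, not a guarantee of performance.

```python
def db_iops_target(daily_ingest_tb: float, base_iops: int = 3000,
                   iops_per_tb: int = 1000) -> float:
    """Suggested minimum IOPS for the TSM database disk.

    daily_ingest_tb is the daily ingest in TB, measured before deduplication.
    """
    if daily_ingest_tb < 0:
        raise ValueError("daily_ingest_tb must be non-negative")
    return base_iops + iops_per_tb * daily_ingest_tb


if __name__ == "__main__":
    for tb in (2, 8, 20):
        print(f"{tb} TB/day -> at least {db_iops_target(tb):,.0f} IOPS")
```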
2.3.1.1 Using flash storage for the TSM database
Lab testing has demonstrated a significant benefit to deduplication and node replication scalability when using flash storage for the TSM database. There are many choices available when moving to flash technology. Large ingest deduplication testing has been performed with the following classes of flash-based storage:
Flash acceleration using in-server PCIe adapters. For example, the High IOPS MLC (Multi Level Cell) and Enterprise Value Flash adapters for IBM System x servers are available in capacities from 365GB to 3.2TB. These adapters appear as block storage in the operating system and provide persistent, low-latency storage.
Solid state drive modules (SSDs) as part of a disk array. For example, the SSD options for the IBM Storwize family of disk arrays are currently available in capacities of 400GB and 800GB each and can be used to build arrays of larger capacity.
Flash memory appliances, which allow flash storage to be shared across more than one TSM server. For example, the IBM FlashSystem family of products is currently available in sizes ranging from 5TB to 20TB.
The following are some general guidelines to consider when implementing the TSM database using solid-state storage technologies:
Solid state storage provides the most significant benefit for the database containers and active log. Testing has demonstrated a substantial improvement from moving the active log to solid state storage versus moving only the database containers.
There is no substantial benefit to placing the archive log on solid state storage.
Although it is a costly design decision, testing has demonstrated a 5-10% improvement in daily ingest capability when using RAID10 rather than RAID5 for the database container arrays.
When using solid state technology for the database, faster storage pool disk such as SAS 10K may be required to gain the full benefit of the faster database storage. This is particularly true when using server-side deduplication.
Faster database access from solid state technology allows the parallelism of tasks such as backup sessions, identify duplicates processes, reclamation processes, and expire inventory resources to be pushed to the limit with tuning parameters.
2.3.2 CPU
The use of deduplication requires additional CPU resources on the TSM server, particularly for performing the task of duplicate identification. You should consider using at least 8 processor cores (2.2GHz or equivalent) in any TSM server that is configured for deduplication. The following table provides CPU recommendations for different ranges of daily ingest.

| Daily ingest | Recommended CPU cores |
|---|---|
| Up to 4TB | 12 |
| 4TB to 8TB | 16 |
| 8TB to 30TB | 32 |
2.3.3 Memory
For the highest performance of a large-scale TSM server using deduplication, additional memory is recommended. The memory is used to optimize the frequent lookup of deduplication chunk information stored in the TSM database.
A minimum of 64GB of system memory should be considered for TSM servers using deduplication. If the retained capacity of backup data grows, the memory requirement may need to be as high as 192GB. It is beneficial to monitor memory utilization on a regular basis to determine if additional memory is required. The following table provides system memory guidance for different ranges of daily ingest.
| Daily ingest | Recommended system memory |
|---|---|
| Up to 4TB | 64GB |
| 4TB to 8TB | 128GB |
| 8TB to 30TB | 192GB |
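Because the CPU and memory tables use the same daily-ingest tiers, a planning script can look both up at once. In this sketch the tier values come from the two tables. Placing an ingest of exactly 4TB or 8TB in the lower tier is an assumption, since the tables do not say which tier a boundary value belongs to. The function name is hypothetical, and ingest above 30TB per day is rejected because the tables do not cover it.

```python
# Tiers from the CPU and memory tables: (upper bound of daily ingest in TB,
# recommended CPU cores, recommended system memory in GB).
SERVER_TIERS = [
    (4.0, 12, 64),
    (8.0, 16, 128),
    (30.0, 32, 192),
]


def recommended_server(daily_ingest_tb: float) -> tuple[int, int]:
    """Return (CPU cores, memory in GB) for a daily ingest, per the tables.

    A value exactly on a boundary (4 or 8 TB) goes to the lower tier,
    which is an assumption the tables do not settle.
    """
    if daily_ingest_tb < 0:
        raise ValueError("daily_ingest_tb must be non-negative")
    for upper_tb, cores, memory_gb in SERVER_TIERS:
        if daily_ingest_tb <= upper_tb:
            return cores, memory_gb
    raise ValueError("the tables only cover daily ingest up to 30TB")


if __name__ == "__main__":
    for tb in (3, 6, 25):
        cores, mem = recommended_server(tb)
        print(f"{tb} TB/day -> {cores} cores, {mem} GB memory")
```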
2.3.4 Considerations for the storage pool disk
The speed of the disk technology used for the deduplicated storage pool also has significant implications for the overall performance of a deduplication solution. In general, using cheaper disk such as SATA is desirable for the storage pool to keep the overall cost down. To prevent slower disk technology from impacting performance, it is important to distribute the storage pool I/O across a very large number of disks. This can be accomplished by:
1. Creating a large number of volumes within the storage array. Although the optimal number of volumes depends on the environment, testing has shown that 32 volumes can provide an effective configuration.
2. Presenting all of these volumes as file systems to the TSM server's device class definition, so that I/O from activities such as backup ingest is distributed across all of the volumes in parallel.
3. Pushing the parallelism of tasks such as duplicate identification and reclamation to the upper limits, using the options that control the number of processes used by these tasks, to drive I/O across all of the disks in the storage pool. More information on this topic follows in later sections.
For systems that will handle very large daily ingests beyond 8TB per day, faster SAS or FC 10K disk is recommended for the storage pool. This is particularly true when using server-side deduplication, to accommodate the additional I/O required for identify duplicates and reclamation processing.
2.3.5 Hardware requirements for TSM client deduplication
Client-side deduplication (and compression, if used with deduplication) requires processing resources on the client system. Before deciding to use client-side deduplication, you should verify that client systems have adequate resources available during the backup window to perform the deduplication processing. A suggested minimum CPU requirement is the equivalent of one 2.2GHz CPU core per backup process with client-side deduplication. As an example, a system with a single-socket, quad-core, 2.2GHz processor that is utilized 75% or less during the backup window would be a good candidate for client-side deduplication. One CPU core should also be planned for each parallel backup stream within a process for client types that support this, such as TSM for Virtual Environments. Testing has demonstrated that adding CPU sockets lowers CPU usage during client-side deduplication with a benefit similar to that of using more CPU cores per socket.
There is no significant additional memory requirement for client systems that use client-side deduplication.
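The client CPU guidance above can be turned into a quick screening check. This sketch reflects one reading of the text, stated as an assumption: it requires one 2.2GHz-equivalent core per backup process, or one per parallel stream for clients that support streams, and compares that with the idle cores implied by the client's peak CPU utilization during the backup window. The function names are hypothetical.

```python
def client_cores_needed(backup_processes: int, streams_per_process: int = 1) -> int:
    """One core per backup process, or per parallel stream within a process."""
    return backup_processes * max(1, streams_per_process)


def has_dedup_headroom(cores: int, peak_utilization: float,
                       backup_processes: int, streams_per_process: int = 1) -> bool:
    """True if the idle cores during the backup window cover the deduplication work.

    Assumes 2.2GHz-equivalent cores and treats the unused fraction of all
    cores as available capacity; both are simplifications.
    """
    idle_cores = cores * (1.0 - peak_utilization)
    return idle_cores >= client_cores_needed(backup_processes, streams_per_process)


if __name__ == "__main__":
    # The quad-core, 75%-utilized example from the text, assuming one backup process.
    print(has_dedup_headroom(cores=4, peak_utilization=0.75, backup_processes=1))  # True
```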
3 Implementation guidelines
A successful implementation of TSM deduplication requires careful planning in the following areas:
Implementing an appropriate architecture suitable for using deduplication
Properly sizing your TSM server hardware and storage
Configuring TSM following best practices for separating data ingestion and data maintenance tasks
3.1 Deciding between client and server deduplication
After you decide on an architecture using deduplication for your TSM server, you need to decide whether you will perform deduplication on the TSM clients, on the TSM server, or using a combination of the two. The TSM deduplication implementation allows storage pools to manage deduplication performed by both clients and the TSM server. The server is optimized to perform deduplication only on data that has not been deduplicated by the TSM clients. Furthermore, duplicate data can be identified across objects regardless of whether the deduplication is performed on the client or the server. These benefits allow for hybrid configurations that efficiently apply client-side deduplication to a subset of clients and use server-side deduplication for the remaining clients.
Typically, a combination of both client-side and server-side data deduplication is the most appropriate. Here are some further points to consider:
Server-side deduplication is a two-step process of duplicate data identification followed by reclamation to remove the duplicate data. Client-side deduplication stores the data directly in a deduplicated format, eliminating the need for the extra reclamation processing.
Deduplication on the client can be combined with compression to provide the largest possible storage savings.
Client-side deduplication processing can increase backup durations. Expect increased backup durations if network bandwidth is not restrictive. A doubling of backup durations is a reasonable estimate when using client-side deduplication in an environment that is not constrained by the network. In addition, if you will be creating a secondary copy using storage pool backup where the copy storage pool is not using deduplication, the data movement will take longer due to the extra processing required to reconstruct the deduplicated data.
Client-side deduplication can outperform server-side deduplication with a high-performing TSM server configuration and a low-latency network connection between the clients and server. In addition, when deduplication is combined with node replication, client-side deduplication stores data on the TSM server in a deduplicated state that is ready for immediate replication, and node replication then conserves bandwidth by not sending data chunks that have previously been replicated.
Client-side deduplication can place a significant load on the TSM server in cases where a large number of clients are simultaneously driving deduplication processing. The load is a result of the TSM server processing duplicate chunk queries from the clients. Server-side deduplication, on the other hand, typically has a relatively small number of identification processes running in a controlled fashion.
Client-side deduplication cannot be combined with LAN-free data movement using the Tivoli Storage Manager for SAN feature. If you are implementing one of TSM's supported LAN-free to disk solutions, you can still consider using server-side deduplication.
Tips:
Perform deduplication at the client in combination with compression in the following circumstances:
1. Your backup network speed is a bottleneck.
2. Increased backup durations can be tolerated, and the maximum storage savings is more important than having the fastest possible backup elapsed times.
3. V6 servers only: the client does not typically send objects larger than 500GB in size, or client configuration options can be used to break up large objects into smaller objects. These options are discussed in a later section.
3.2 TSM Deduplication configuration recommendations
3.2.1 Recommendations for deduplicated storage pools
The TSM deduplication feature is turned on at the storage pool level. The TSM server can be configured with more than one deduplicated storage pool, but duplicate data will not be identified across different storage pools. In most cases, using a single large deduplicated storage pool is recommended.
The following commands provide an example of setting up a deduplicated storage pool on the TSM server. Some parameters are explained in further detail to give the rationale behind the values used, and later sections build upon those settings.
3.2.1.1 Device class
A device class is used to define the storage that will be used for sequential file volumes by the deduplicated storage pool. Each of the directories specified should be backed by a separate file system, which corresponds to a distinct logical volume on the disk storage subsystem. By using multiple directories backed by different storage elements on the subsystem, the TSM round-robin implementation for volume allocation is able to achieve more throughput by spreading I/O across a large pool of physical disks.
Here are some considerations for parameters of the DEFINE DEVCLASS command:
The mountlimit parameter limits the number of volumes that can be simultaneously mounted by all storage pools that use this device class. Typically, client sessions sending data to the server use the most mount points, so you will want to set this parameter high enough to handle the expected number of simultaneous client sessions.
This parameter needs to be set very high for deduplicated storage pools to avoid having client sessions and server processes wait for available mount points. The setting is influenced by the numopenvolsallowed option, which is discussed in a later section. To estimate the setting of this parameter, use the following formula, where numprocs is the largest number of processes used for a data copy/movement task such as reclamation or migration:
mountlimit = (numprocs * numopenvolsallowed) + max_backup_sessions +
(restore_sessions * numopenvolsallowed) + buffer
The maxcapacity parameter controls the size of each file volume that will be created for your storage pool. This parameter takes some planning. The goal is to avoid a volume size that is too small, which results in frequent end-of-volume processing and spanning of larger objects across multiple volumes, and also to avoid a volume size so large that too few writeable volumes are available to handle your expected number of client backup sessions. The following example shows a volume size of 50GB, which has proven to be optimal in many environments.
> define devclass dedupfile devtype=file mountlimit=4000 maxcapacity=50G
directory=/tsmdedup1,/tsmdedup2,/tsmdedup3,/tsmdedup4,...,/tsmdedup32
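The mountlimit formula above is easy to evaluate in a few lines of Python. The following is a minimal sketch. The function name is hypothetical, and so are the sample inputs: 32 reclamation processes, NumOpenVolsAllowed of 20, 500 concurrent backup sessions, 10 restore sessions and a buffer of 100. They illustrate the arithmetic and are not sizing recommendations.

```python
def estimate_mountlimit(numprocs: int, numopenvolsallowed: int,
                        max_backup_sessions: int, restore_sessions: int,
                        buffer: int) -> int:
    """Evaluate the mountlimit estimate for a deduplicated FILE device class.

    numprocs is the largest number of processes used by one data copy or
    movement task, such as reclamation or migration.
    """
    # Each data-movement process may hold up to numopenvolsallowed volumes open.
    process_mounts = numprocs * numopenvolsallowed
    # Restore sessions are weighted the same way in the formula.
    restore_mounts = restore_sessions * numopenvolsallowed
    # Backup sessions count once each, plus a safety buffer.
    return process_mounts + max_backup_sessions + restore_mounts + buffer


if __name__ == "__main__":
    # Hypothetical workload, for illustration only.
    print(estimate_mountlimit(numprocs=32, numopenvolsallowed=20,
                              max_backup_sessions=500, restore_sessions=10,
                              buffer=100))
```

With these assumed inputs, the estimate is 1440. That is well below the mountlimit=4000 used in the example command, which leaves headroom for growth. Substitute your own session and process counts.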
3.2.1.2 Storage pools
The storage pool is the repository for deduplicated storage and uses the device class previously defined. An example command for defining a deduplicated storage pool is given below, with explanations for parameters that vary from the defaults. There are two methods for allocating volumes in a file-based storage pool. With the first method, volumes are pre-allocated and remain assigned to the same storage pool after they are reclaimed. The second method uses scratch volumes, which are allocated as needed and return to the scratch pool once they are reclaimed. The examples below set up a storage pool using scratch volumes. This approach is more convenient and has been shown in testing to distribute the load more efficiently across multiple storage containers within a disk subsystem.
- The deduplicate parameter is required to enable deduplication for the storage pool.
- The maxscratch parameter defines the maximum number of volumes that can be created for the storage pool. This parameter is used with the scratch method of volume allocation, and should be set to 0 when using pre-allocated volumes. Each volume has a size determined by the maxcapacity parameter of the device class. In our example, 200 volumes multiplied by 50GB per volume requires 10TB of free space to be available across the 32 file systems used by the device class.
- The identifyprocess parameter is set to 0 to prevent duplicate identification processes from starting automatically. This supports scheduling when duplicate identification runs, which is described in more detail in a later section.
- The reclaim parameter is set to 100 to prevent automatic storage pool reclamation from running. This supports the best practice of scheduling when reclamation runs, which is described in more detail in a later section. The actual threshold used for reclamation is defined as part of the scheduled reclamation command, which is also defined in a later section.
- The reclaimprocess parameter is set to a value higher than the default of 1. A deduplicated storage pool requires a large volume of reclamation processing to keep up with the daily ingestion of new backups. As a rule of thumb, allow one process for every file system defined to the device class. The example value of 32 is likely to be sufficient for large-scale implementations, but you may need to tune this setting after monitoring system usage during reclamation.
> define stgpool deduppool dedupfile maxscratch=200 deduplicate=yes
identifyprocess=0 reclaim=100 reclaimprocess=32
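The maxscratch arithmetic and the reclaimprocess rule of thumb can be checked together. The following is a small sketch. The function name is hypothetical, it uses decimal terabytes (1TB = 1000GB) to match the "10TB" figure in the text, and it assumes round-robin allocation spreads volumes evenly across the directories.

```python
def pool_space_plan(maxscratch, maxcapacity_gb, filesystems):
    """Space a scratch-volume FILE pool may need once every volume is full."""
    total_gb = maxscratch * maxcapacity_gb
    return {
        "total_gb": total_gb,
        "total_tb": total_gb / 1000,  # decimal TB, as in "10TB" above
        # Assumes round-robin allocation spreads volumes evenly.
        "per_filesystem_gb": total_gb / filesystems,
        # Rule of thumb from the text: one reclamation process per file system.
        "reclaimprocess": filesystems,
    }


if __name__ == "__main__":
    # Values from the examples: maxscratch=200, maxcapacity=50G, 32 directories.
    print(pool_space_plan(200, 50, 32))
```

For the example pool, this works out to roughly 312.5GB of free space per file system. Real allocation may not be perfectly even, so leave some margin.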
3.2.1.3 Policy settings
The final configuration step involves defining policy settings on the TSM server that allow data to ingest directly into the deduplicated storage pool that has been created. Policy requirements vary for each customer, but the following example shows policy that retains extra backup versions for 30 days.
> define domain DEDUPDISK
> define policyset DEDUPDISK POLICY1
> define mgmtclass DEDUPDISK POLICY1 STANDARD
> assign defmgmtclass DEDUPDISK POLICY1 STANDARD
> define copygroup DEDUPDISK POLICY1 STANDARD type=backup destination=DEDUPPOOL
VEREXISTS=nolimit VERDELETED=10 RETEXTRA=30 RETONLY=80
> define copygroup DEDUPDISK POLICY1 STANDARD type=archive destination=DEDUPPOOL
RETVER=30
> activate policyset DEDUPDISK POLICY1
3.2.2 Recommended options for deduplication
The server has several tuning options that control deduplication processing. The following table summarizes these options and explains those options for which we recommend overriding the default values.
Option: DedupRequiresBackup
Allowed values: Yes | No (default: Yes)
Recommended value: Default
Explanation: This option delays the completion of server-side deduplication processing until after a secondary copy of the data has been made with storage pool backup. This option does not influence whether client-side deduplication is performed. The TSM server offers many levels of protection, including the ability to create a secondary copy of your data. Creating a secondary copy is optional, but is always a best practice for any storage pool, regardless of whether it is deduplicated.
Note: See the section which follows this table for additional information regarding this option.

Option: ClientDedupTxnLimit
Allowed values: Min: 32, Max: 2048 (default: 300)
Recommended value: Default
Explanation: Specifies the largest object size in gigabytes that can be processed using client-side deduplication. This can be increased up to 2TB, but this does not guarantee that the TSM server will be able to process objects up to this size in all environments.

Option: ServerDedupTxnLimit
Allowed values: Min: 32, Max: 2048 (default: 300)
Recommended value: Default
Explanation: Specifies the largest object size in gigabytes that can be processed using server-side deduplication. This can be increased up to 2TB, but this does not guarantee that the TSM server will be able to process objects up to this size in all environments.

Option: DedupTier2FileSize
Allowed values: Min: 20, Max: 9999 (default: 100)
Recommended value: Default
Explanation: Changing the default tier settings is not recommended. Small changes may be tolerated, but avoid frequent changes to these settings, as changes will prevent matches between previously ingested backups and future backups.

Option: DedupTier3FileSize
Allowed values: Min: 90, Max: 9999 (default: 400)
Recommended value: Default
Explanation: Changing the default tier settings is not recommended. Small changes may be tolerated, but avoid frequent changes to these settings, as changes will prevent matches between previously ingested backups and future backups.

Option: NumOpenVolsAllowed
Allowed values: Min: 3, Max: 999 (default: 10)
Recommended value: 20
Explanation: This option controls the number of volumes that a process such as reclamation, or a client session, can hold open at the same time. A small increase to this option is recommended, and some trial and error may be needed. Note: The device class mountlimit parameter may need to be increased if this option is increased.

Option: EnableNasDedup
Allowed values: Yes | No (default: No)
Recommended value: Default
Explanation: If you are using NDMP backup of NetApp file servers in your environment, change this option to Yes.
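Before editing the server options file, it can help to sanity-check proposed values. The following is a minimal sketch of such a check. The dictionary, the function name and the rule that the tier-3 threshold must exceed the tier-2 threshold are assumptions made for illustration. The minimum/maximum ranges are assumed from the options table, so verify them against the documentation for your server level.

```python
# (minimum, maximum) ranges assumed from the options table; verify for your level.
OPTION_RANGES = {
    "ClientDedupTxnLimit": (32, 2048),
    "ServerDedupTxnLimit": (32, 2048),
    "DedupTier2FileSize": (20, 9999),
    "DedupTier3FileSize": (90, 9999),
    "NumOpenVolsAllowed": (3, 999),
}
DEFAULTS = {"DedupTier2FileSize": 100, "DedupTier3FileSize": 400}


def check_dedup_options(options):
    """Return a list of problems found in a dict of option name -> integer value."""
    problems = []
    for name, value in options.items():
        if name not in OPTION_RANGES:
            problems.append(f"{name}: not covered by this check")
            continue
        low, high = OPTION_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value}: outside {low}-{high}")
    # Sanity rule assumed by this sketch: tier 3 must start above tier 2.
    tier2 = options.get("DedupTier2FileSize", DEFAULTS["DedupTier2FileSize"])
    tier3 = options.get("DedupTier3FileSize", DEFAULTS["DedupTier3FileSize"])
    if tier3 <= tier2:
        problems.append("DedupTier3FileSize should be larger than DedupTier2FileSize")
    return problems


if __name__ == "__main__":
    # The recommended NumOpenVolsAllowed value with default tier thresholds.
    print(check_dedup_options({"NumOpenVolsAllowed": 20}))
```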
3.2.2.1 Additional information regarding the deduprequiresbackup option
Backup of the TSM primary storage pool is optional, as determined by environment-specific risk mitigation requirements. The table which follows summarizes the appropriate value for the deduprequiresbackup option in different situations. With a non-deduplicated copy storage pool, the storage pool backup should be performed before running reclamation processing. Storage pool backup may instead run after reclamation processing, or together with client-side deduplication. In those cases the copy process will take longer, since the deduplicated data must be reassembled into full objects.
Architecture for secondary copy of backup data | Appropriate setting for deduprequiresbackup
A secondary copy is created using the storage pool backup capability to a non-deduplicated copy pool, such as a copy pool using tape. | Yes
A secondary copy is created using the storage pool backup capability to a deduplicated copy pool. | No
No secondary copy is created. | No
A secondary copy is created on another TSM server using the node replication feature. | No
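The table can be expressed as a lookup. The following is a minimal sketch. The enum and function names are hypothetical, and the returned string is the option value, not a command to run.

```python
from enum import Enum


class SecondaryCopy(Enum):
    NON_DEDUP_COPY_POOL = "storage pool backup to a non-deduplicated copy pool (e.g. tape)"
    DEDUP_COPY_POOL = "storage pool backup to a deduplicated copy pool"
    NONE = "no secondary copy"
    NODE_REPLICATION = "node replication to another TSM server"


def deduprequiresbackup_for(copy):
    """deduprequiresbackup value suggested by the table for each architecture."""
    # Only a non-deduplicated copy pool calls for Yes; every other case is No.
    return "Yes" if copy is SecondaryCopy.NON_DEDUP_COPY_POOL else "No"


if __name__ == "__main__":
    for copy in SecondaryCopy:
        print(f"{copy.value}: deduprequiresbackup {deduprequiresbackup_for(copy)}")
```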
3.2.3 Best practices for ordering backup ingestion and data maintenance tasks
A successful implementation of deduplication with TSM requires separating the tasks of ingesting client data and performing server data maintenance tasks into separate time windows. Furthermore, the server data maintenance tasks have an optimal ordering, and in some cases need to be performed without overlap to avoid resource contention problems.
TSM has the ability to schedule all of these activities to follow these best practices. There is a variation on the recommended ordering for use when storage pool backup is combined with server-side deduplication. In this variation, duplicate identification is delayed to allow the fastest possible backup ingestion on systems that are I/O constrained and cannot handle backup ingestion overlapping with duplicate identification. This alternate variation can also be followed any time server-side deduplication is used and the fastest possible backup ingestion is desired. A later section provides sample commands to implement the ordering with delayed duplicate identification through command scripts and scheduling. Two suggested task sequences, "A" and "B", are described. Refer to the table below to determine the preferred task sequence.
Type of deduplication used (server or client side) | Is node replication used? (yes or no) | Are you doing storage pool backup to a non-deduplicated copy storage pool? | Suggested task sequence
Client side | Either yes or no | Either yes or no | A
Server side | No | No | A
Server side | Yes | No | A
Server side | No | Yes | B, if the fastest possible backup ingest is required
Please note that the list focuses on those tasks pertinent to deduplication. Please consult the product documentation for additional commands which you may also need to include in the daily maintenance tasks.
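The choice between the two task sequences in the table reduces to one condition. The following is a minimal sketch with a hypothetical function name. Node replication does not change the answer in any listed row, so it is not an argument here. The table does not list server-side deduplication combined with both node replication and storage pool backup, so this sketch assumes that case follows the storage pool backup row.

```python
def suggested_task_sequence(server_side_dedup, stgpool_backup_to_nondedup_copy,
                            fastest_ingest_required):
    """Return "A" or "B" following the task-sequence table (a sketch)."""
    if (server_side_dedup and stgpool_backup_to_nondedup_copy
            and fastest_ingest_required):
        # Sequence B delays IDENTIFY DUPLICATES until after storage pool backup.
        return "B"
    # All other listed combinations use sequence A.
    return "A"


if __name__ == "__main__":
    print(suggested_task_sequence(True, True, True))
```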
3.2.3.1 Suggested task sequence A
1. The following tasks can run in parallel:
   a. Client data ingestion.
   b. Perform server-side duplicate identification by running the IDENTIFY DUPLICATES command. This processes data that was not already deduplicated on the clients.
2. Optional: Create the secondary disaster recovery (DR) copy using the REPLICATE NODE command or the BACKUP STGPOOL command.
3. Create a DR copy of the TSM database by running the BACKUP DATABASE command. Following the completion of the database backup, the DELETE VOLHISTORY command can be used to remove older versions of database backups which are no longer required.
4. Remove objects that have exceeded their allowed retention using the EXPIRE INVENTORY command.
5. Reclaim unused space from storage pool volumes that has been released through deduplication and inventory expiration using the RECLAIM STGPOOL command.
6. Back up the volume history and device configuration using the BACKUP VOLHISTORY and BACKUP DEVCONFIG commands.
3.2.3.2 Suggested task sequence B
1. Client data ingestion.
2. Create the secondary disaster recovery (DR) copy using the BACKUP STGPOOL command.
3. Create a DR copy of the TSM database by running the BACKUP DATABASE command. Following the completion of the database backup, the DELETE VOLHISTORY command can be used to remove older versions of database backups which are no longer required.
4. Perform server-side duplicate identification by running the IDENTIFY DUPLICATES command. This processes data that was not already deduplicated on the clients.
5. Remove objects that have exceeded their allowed retention using the EXPIRE INVENTORY command.
6. Reclaim unused space from storage pool volumes that has been released through deduplication and inventory expiration using the RECLAIM STGPOOL command.
7. Back up the volume history and device configuration using the BACKUP VOLHISTORY and BACKUP DEVCONFIG commands.
3.2.3.3 Define scripts that run each required maintenance task
The following scripts, once defined, can be called by scheduled administrative commands. Here are a few points to note regarding these scripts:
- The storage pool backup script assumes you have already defined a copy storage pool named copypool, which uses tape storage. NOTE: Storage pool backup is optional, as determined by environment-specific risk mitigation requirements.
- The database backup script requires a device class that typically also uses tape storage.
- The script for reclamation gives an example of how the parallel command can be used to process more than one storage pool simultaneously.
- The number of processes to use for identifying duplicates should not exceed the number of CPU cores available on your TSM server. This command also does not have a wait=yes parameter, so it is necessary to define a duration limit.
- If you have a large TSM database, you can further optimize the BACKUP DATABASE command by using multiple streams with TSM 6.3 and later.
- A deduplicated storage pool is typically reclaimed to a threshold lower than the default of 60 to allow more of the identified duplicate chunks to be removed. Some experimenting will be needed to find a value for which reclamation can complete within the available time. Tip: A reclamation setting of 40 or less is usually sufficient.
define script STGBACKUP "/* Run stg pool backups */"
update script STGBACKUP "backup stgpool DEDUPPOOL copypool maxprocess=10 wait=yes" line=020
define script DEDUP "/* Run identify duplicate processes */"
update script DEDUP "identify duplicates DEDUPPOOL numprocess=12 duration=660" line=010
set dbrecovery TAPEDEVC numstreams=3
define script DBBACKUP "/* Run DB backups */"
update script DBBACKUP "backup db devclass=TAPEDEVC type=full numstreams=3 wait=yes" line=010
update script DBBACKUP "if(error) goto done" line=020
update script DBBACKUP "backup volhistory" line=030
update script DBBACKUP "backup devconfig" line=040
update script DBBACKUP "delete volhistory type=dbbackup todate=today-7 totime=now" line=050
update script DBBACKUP "done:exit" line=060
define script RECLAIM "/* Run stg pool reclamation */"
update script RECLAIM "parallel" line=010
update script RECLAIM "reclaim stgpool DEDUPPOOL threshold=40 wait=yes" line=020
update script RECLAIM "reclaim stgpool COPYPOOL threshold=60 wait=yes" line=030
define script EXPIRE "/* Run expiration processes. */"
update script EXPIRE "expire inventory resources=8 wait=yes" line=010
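Typing each UPDATE SCRIPT line by hand is error-prone. The following hypothetical Python helper generates DEFINE SCRIPT and UPDATE SCRIPT command text in the same form as the scripts above, ready to paste into an administrative client session. It is a sketch. The helper name and the numbering scheme (first command at line=010, in steps of 10) are assumptions, and the helper rejects commands containing double quotes rather than escaping them.

```python
def script_commands(name, comment, commands, start=10, step=10):
    """Yield DEFINE SCRIPT and UPDATE SCRIPT text for one server script."""
    yield f'define script {name} "/* {comment} */"'
    for index, command in enumerate(commands):
        if '"' in command:
            # The generated text wraps each command in double quotes.
            raise ValueError(f"double quote in script command: {command}")
        yield f'update script {name} "{command}" line={start + index * step:03d}'


if __name__ == "__main__":
    for text in script_commands("RECLAIM", "Run stg pool reclamation", [
        "parallel",
        "reclaim stgpool DEDUPPOOL threshold=40 wait=yes",
        "reclaim stgpool COPYPOOL threshold=60 wait=yes",
    ]):
        print(text)
```

Run as shown, it reproduces the RECLAIM script above, line for line.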
3.2.3.4 Define schedules to run the data maintenance tasks
The TSM server has the ability to schedule commands to run, where the scheduled action is to run the various scripts that were defined in the previous sections. The examples below give specific start times that have proven to be successful in environments where backups run from midnight until 07:00 AM on the same day. You will need to change the start times to appropriate values for your environment.
NOTE: Storage pool backup is optional, as determined by environment-specific
risk mitigation requirements.
define schedule STGBACKUP type=admin cmd="run STGBACKUP" active=yes -
desc="Run all stg pool backups." startdate=today starttime=08:00:00 -
duration=15 durunits=minutes period=1 perunits=day
define schedule DEDUP type=admin cmd="run DEDUP" active=yes -
desc="Run identify duplicates." startdate=today starttime=00:00:00 -
duration=15 durunits=minutes period=1 perunits=day
define schedule EXPIRATION type=admin cmd="run EXPIRE" active=yes -
desc="Run expiration." startdate=today starttime=14:00:00 -
duration=15 durunits=minutes period=1 perunits=day
define schedule DBBACKUP type=admin cmd="run DBBACKUP" active=yes -
desc="Run database backup." startdate=today starttime=12:00:00 -
duration=15 durunits=minutes period=1 perunits=day
define schedule RECLAIM type=admin cmd="run RECLAIM" active=yes -
desc="Reclaim space from storage pools." startdate=today starttime=16:00 -
duration=15 durunits=minutes period=1 perunits=day
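The start times above encode sequence A. DEDUP runs alongside client ingestion, and the remaining tasks follow in order once backups finish. The following is a minimal sketch of a sanity check for an adapted schedule. The names are hypothetical, and the backup window of 00:00 to 07:00 comes from the text. It compares start times only, so it ignores how long each task actually runs and does not handle schedules that cross midnight.

```python
from datetime import time

# Start times of the sample administrative schedules above.
SCHEDULES = {
    "DEDUP": time(0, 0),
    "STGBACKUP": time(8, 0),
    "DBBACKUP": time(12, 0),
    "EXPIRATION": time(14, 0),
    "RECLAIM": time(16, 0),
}
# Sequence A steps that follow the parallel ingest/identify step.
SEQUENCE_A = ["STGBACKUP", "DBBACKUP", "EXPIRATION", "RECLAIM"]
BACKUP_WINDOW_END = time(7, 0)  # backups assumed to run 00:00-07:00


def check_schedule(schedules, order, window_end):
    """Report ordering and backup-window conflicts (start times only)."""
    problems = []
    starts = [schedules[name] for name in order]
    if starts != sorted(starts):
        problems.append("maintenance tasks do not start in the suggested order")
    for name in order:
        if schedules[name] < window_end:
            problems.append(f"{name} starts inside the backup window")
    return problems


if __name__ == "__main__":
    print(check_schedule(SCHEDULES, SEQUENCE_A, BACKUP_WINDOW_END) or "no conflicts")
```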
4 Estimating deduplication savings
If you ask someone in the data deduplication business to estimate the savings to expect for your specific data, the answer will often be "it depends." The reality is that TSM, like every other data protection product, cannot guarantee a certain level of deduplication. A variety of factors unique to your data influence the results.
Since deduplication requires computational resources, it is important to consider which environments and circumstances can benefit most from deduplication, and when other data reduction techniques may be more appropriate. What we can do is provide an understanding of the factors that influence deduplication effectiveness when using TSM. We also provide some examples of observed behaviors for specific types of data, which can be used as a reference for planning purposes.
4.1 Factors that influence the effectiveness of deduplication
The following factors influence how effectively TSM reduces the amount of data to be stored using deduplication.
4.1.1 Characteristics of the data
4.1.1.1 Uniqueness of the data
The first factor to consider is the uniqueness of the data. Much of the deduplication savings comes from repeated backups of the same objects. Some savings, however, result from having data in common with backups of other objects, or even within the same object. The uniqueness of the data is the portion of an object that has never been stored by a previous backup. Duplicate data can be found within the same object, across different objects stored by the same client, and in objects stored by different clients.
4.1.1.2 Response to fingerprinting
The next factor is how data responds to the deduplication fingerprinting processing used by TSM. During deduplication, TSM breaks objects into chunks, which are examined to determine whether they have been previously stored. These chunks are variable in size and are identified using a process called fingerprinting. The purpose of fingerprinting is to ensure that the same chunk will always be identified, even if it shifts to a different position within the object between successive backups.
The TSM fingerprinting implementation uses a probability-based algorithm for identifying chunk boundaries within an object. The algorithm strives to make the chunks created for an object average out, in terms of size, to a target average for all chunks. The actual size of each chunk varies within two constraints: it must be larger than the minimum chunk size, and it cannot be larger than the object itself. The fingerprinting implementation results in average chunk sizes that vary for different kinds of data. For data that fingerprints to average chunk sizes significantly larger than the target average, the deduplication efficiency is more sensitive to changes. More details are given in the later section that discusses tiering.
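Content-defined chunk boundaries are what let a chunk be recognized after it shifts position. The following is a minimal sketch of a generic gear-style rolling-hash chunker. It is not the TSM fingerprinting algorithm, whose internals this document does not describe. The gear table, seed and minimum/average/maximum sizes are arbitrary assumptions chosen for illustration. The demo inserts 100 bytes near the start of a random object and compares how many chunks can be reused with content-defined versus fixed-size chunking.

```python
import hashlib
import random

# Gear table: one pseudo-random 32-bit value per byte value (arbitrary seed).
_rng = random.Random(2013)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, min_size=2048, avg_size=8192, max_size=65536):
    """Split data at content-defined boundaries (avg_size must be a power of two)."""
    mask = avg_size - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rolling update: older bytes are shifted out of the 32-bit state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size < min_size:
            continue  # never cut below the minimum chunk size
        if (h & mask) == 0 or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def fixed_chunks(data, size=8192):
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old_chunks, new_chunks):
    """Fraction of new chunks whose content already exists in old_chunks."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(hashlib.sha256(c).digest() in seen for c in new_chunks) / len(new_chunks)


def insertion_demo(size=256 * 1024, insert_at=1000, insert_len=100, seed=1):
    rng = random.Random(seed)
    old = rng.getrandbits(8 * size).to_bytes(size, "little")
    new = old[:insert_at] + bytes(insert_len) + old[insert_at:]
    return (shared_fraction(cdc_chunks(old), cdc_chunks(new)),
            shared_fraction(fixed_chunks(old), fixed_chunks(new)))


if __name__ == "__main__":
    cdc, fixed = insertion_demo()
    print(f"content-defined: {cdc:.0%} of chunks reused, fixed-size: {fixed:.0%}")
```

With fixed-size chunks, every chunk after the insertion point shifts and no longer matches. With content-defined boundaries, the chunking resynchronizes shortly after the insertion, so nearly all chunks are found again.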
4.1.1.3 Volatility of the data
The final factor is the volatility of the data. A significant amount of deduplication savings results from similar objects being backed up repeatedly over time. Objects that undergo only minor changes between backups will have a significant percentage of chunks that are unchanged since the last backup, and hence do not need to be stored again. Conversely, an object can undergo a pattern of change that alters a large percentage of its chunks. In these cases, deduplication realizes very little savings. It is important to note that this effect does not necessarily relate to the amount of data being written to an object. Instead, it depends on how pervasively the changes are scattered throughout the object. Some change patterns, such as appending new data at the end of an object, respond very favorably to deduplication.
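The point that scatter matters more than the amount written can be made concrete by counting the chunks that a set of writes touches. The following is a simplified sketch. It assumes fixed 64KiB chunks, whereas TSM chunks are variable in size. That simplification is safe here only because neither pattern shifts existing data. Both patterns write exactly 1MiB to a 64MiB object, and the names are hypothetical.

```python
MIB = 1024 * 1024
CHUNK = 64 * 1024  # assumed fixed chunk size; TSM chunks are variable in size


def touched_chunks(writes, chunk=CHUNK):
    """Chunk indices modified by a list of (offset, length) writes."""
    touched = set()
    for offset, length in writes:
        touched.update(range(offset // chunk, (offset + length - 1) // chunk + 1))
    return touched


def new_chunk_fraction(object_size, writes, chunk=CHUNK):
    """Share of the object's chunks that would have to be stored again."""
    final_size = max([object_size] + [offset + length for offset, length in writes])
    total_chunks = -(-final_size // chunk)  # ceiling division
    return len(touched_chunks(writes, chunk)) / total_chunks


if __name__ == "__main__":
    size = 64 * MIB
    appended = [(size, MIB)]                              # 1 MiB added at the end
    scattered = [(i * CHUNK, 1024) for i in range(1024)]  # 1 KiB in every chunk
    print(f"append:    {new_chunk_fraction(size, appended):.1%} of chunks new")
    print(f"scattered: {new_chunk_fraction(size, scattered):.1%} of chunks new")
```

In this model, appending touches about 1.5% of the chunks, while the same megabyte spread across the object touches all of them.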
4.1.1.4 Examples of workloads that respond well to deduplication
The following are general examples of backup workloads that respond well to deduplication:
- Backup of workstations with multiple copies or versions of the same file.
- Backup of objects with regions that repeat the same chunks of data (for example, regions with zeros).
- Multiple full backups of different versions of the same database.
- Operating system files across multiple systems. For example, Windows systemstate backup is a common source of duplicate data. Another example is virtual machine image backups with TSM for Virtual Environments.
- Backup of workstations with versions or copies of the same application data (for example, documents, presentations, or images).
- Periodic full backups taken of systems using a new nodename for the purpose of creating an out-of-cycle backup with special retention criteria.
4.1.1.5 Deduplication efficiency of some data types
The following table shows some common data types along with their expected deduplication efficiency.
Data type | Deduplication efficiency
Audio (mp3, wma), video (mp4), images (jpeg) | Poor
Human generated/consumer data: text documents, source code | Good
Office documents: spreadsheets, presentations | Poor
Common operating system files | Good
Large repeated backups of databases (Oracle, DB2, etc.) | Good
Objects with embedded control structures | Poor
TSM data stored in non-native storage pools (for example, NDMP data) | None
4.1.2 Impacts from backup strategy decisions
The gains realized from deduplication are also influenced by two implementation choices in how backups are taken and managed.
4.1.2.1 Backup model
For TSM, a very common backup model is the use of incremental-forever backups. In this case, each subsequent backup achieves significant storage savings by not having to send unchanged objects. Objects that are not re-sent also do not need to go through deduplication processing, which turns out to be a very efficient method of reducing data. Other data types, on the other hand, use a backup model that always runs a full backup, or a periodic full backup. In these cases, there will typically be significant reductions in the data to be stored, because of the significant duplication across subsequent backups of similar objects. The following table illustrates some examples of deduplication savings in the full and incremental backup models:
Does deduplication offer savings in the case where...

File-level backups are taken using the backup-archive client.
  Full backup: Yes, when:
    - There is data in common from other nodes, such as operating system files.
    - Periodic full backups are taken for a system. This is occasionally performed using a different node name for the purpose of establishing a different retention scheme.
  Incremental backup: Yes for files that are being re-sent due to changes (depends on volatility). No for new files that are being sent for the first time (depends on uniqueness).

Database backups are taken using a data protection client.
  Full backup: Yes, when subsequent full backups are taken (depends on volatility). No when the first backup is taken, since databases are typically unique.
  Incremental backup: Typically no. The database incremental mechanism sends only changed regions of the object, which typically have not been stored before.

Virtual machine backups are taken using the Data Protection for VMware product.
  Full backup: Yes. VMware full backups often experience savings with matches from the backups of other virtual machines, as well as from regions of the same virtual disk that are in common.
4.1.2.2 Retention settings
In general, the more versions you set TSM policy to retain, the more savings you will realize from TSM deduplication, as a percentage of the total you would have needed to store without deduplication. Users who want to retain more versions of objects in TSM storage find this to be more cost effective when using deduplication. Consider the example below, which shows the accumulated storage used over a series of backups using the Data Protection for Oracle product. You can see that ten backup versions stored with deduplication use less capacity than three backup versions require without deduplication.
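Why more retained versions raise the relative savings can be seen from a simple model. The following sketch assumes that every backup is a full backup of the same size, and that each new version contributes the same fraction of never-before-seen chunks. Real databases will not behave this uniformly. The function names and the 20% change fraction are hypothetical, and no numbers are taken from the Oracle example. Under this model, ten deduplicated versions fit in the space of three plain versions whenever each version adds at most 2/9 (about 22%) new data.

```python
def stored_without_dedup(versions, full_size):
    """Every retained version is a full copy."""
    return versions * full_size


def stored_with_dedup(versions, full_size, new_fraction):
    """First version stored in full; each later version adds only its new chunks."""
    return full_size + (versions - 1) * new_fraction * full_size


def savings(versions, full_size, new_fraction):
    with_dedup = stored_with_dedup(versions, full_size, new_fraction)
    return 1 - with_dedup / stored_without_dedup(versions, full_size)


def breakeven_new_fraction(dedup_versions, plain_versions):
    """Largest per-version new fraction at which dedup_versions deduplicated
    versions fit in the space of plain_versions non-deduplicated versions."""
    return (plain_versions - 1) / (dedup_versions - 1)


if __name__ == "__main__":
    for n in (3, 10):
        print(f"{n} versions: {savings(n, 100.0, 0.2):.0%} saved")
    print(f"break-even new fraction: {breakeven_new_fraction(10, 3):.3f}")
```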
4.2 Effectiveness of deduplication combined with progressive incremental backup
The progressive incremental backup technology in TSM provides a very effective method of reducing the amount of data processed in each backup. This technology can also be effectively combined with deduplication. When the two are used in combination, data is first reduced by the incremental processing, which skips unchanged objects without applying deduplication processing to them. Deduplication is then applied to those objects which do require a backup. A very typical pattern seen with incremental backup is the existence of certain files which change continuously. For these objects, incremental backup cannot provide any savings, as they always require backup. These files are a significant source of reduction for deduplication. Although they change frequently, the changes can be minimal in terms of volatility from a deduplication perspective.
The following example shows how this works for one common file type which changes continuously. The following chart shows a Lotus Notes mail replica file which undergoes a series of ten daily backups. In this case, deduplication is able to provide a cumulative savings of 71% after the series of backups.
4.3 Interaction of compression and deduplication
The TSM client provides the ability to compress data, with the potential to provide additional storage savings by combining compression and deduplication. With TSM deduplication, you will need to decide whether to perform deduplication at the client, at the server, or in some combination. This section guides you through the analysis needed to make that decision, taking into consideration the fact that combining deduplication and compression is only possible on the clients.
4.3.1 How deduplication and compression interact with TSM
In general, deduplication technologies are not very effective when applied to data that has previously been compressed. However, compressing data after it has been deduplicated can yield additional savings. When deduplication and compression are both performed by the TSM client, the operations are sequenced in the desirable order: deduplication first, followed by compression. The following list summarizes key points of the TSM implementation, which will help explain other information to follow in this section:
- The TSM client can perform deduplication combined with compression.
- The TSM server can perform deduplication, but cannot perform compression.
- If data is compressed before being passed to the TSM client, it is not possible to perform deduplication before compression. For example, certain databases provide the ability to compress a backup stream before passing the stream to a Tivoli for Data Protection client. In these cases, the data will be compressed before TSM performs deduplication.
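Why compressing first hurts deduplication can be illustrated with a toy experiment using Python's standard zlib module. The following is a sketch only. It uses fixed 4KiB chunks and a synthetic, highly compressible "backup" built from a small word list. Both are assumptions for illustration, not how TSM chunks or compresses data. One byte in the middle of the data is changed in place. On the raw data, only the chunk containing that byte differs. After compression, the compressed stream generally differs from the change point onward, so far fewer chunks can be reused.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # assumed fixed chunk size, for this illustration only


def fixed_chunks(data, size=CHUNK):
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old, new, size=CHUNK):
    """Fraction of new's chunks whose content already appears in old."""
    seen = {hashlib.sha256(c).digest() for c in fixed_chunks(old, size)}
    new_chunks = fixed_chunks(new, size)
    return sum(hashlib.sha256(c).digest() in seen for c in new_chunks) / len(new_chunks)


def two_versions(seed=7, size=100 * CHUNK):
    """A compressible synthetic 'backup' and a copy with one byte changed."""
    rng = random.Random(seed)
    words = [b"backup", b"restore", b"volume", b"pool",
             b"node", b"server", b"chunk", b"tape"]
    text = bytearray()
    while len(text) < size:
        text += rng.choice(words) + b" "
    old = bytes(text[:size])
    new = bytearray(old)
    new[10 * CHUNK] ^= 0x20  # in-place edit inside chunk 10; nothing shifts
    return old, bytes(new)


if __name__ == "__main__":
    old, new = two_versions()
    raw = shared_fraction(old, new)
    packed = shared_fraction(zlib.compress(old), zlib.compress(new))
    print(f"deduplicate uncompressed data: {raw:.0%} of chunks reusable")
    print(f"deduplicate compressed data:   {packed:.0%} of chunks reusable")
```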
The most significant reduction in data size typically results from combining client-side deduplication and compression. The additional savings provided by compression will vary depending on how well the specific data responds to the TSM client compression mechanism.
4.3.2 Considerations related to compression when choosing between client-side and server-side deduplication
Typically, the decision of whether to use data reduction technologies on the TSM client depends on your backup window requirements, and on whether your environment is network-constrained. With constrained networks, using data reduction technologies on the client may actually improve backup elapsed times. Without a constrained network, the use of client-side data reduction technologies will typically result in longer backup elapsed times. The following questions are important to consider when choosing whether to implement client-side data reduction technologies:
1. Is the speed of your backup network limiting backup elapsed times?
2. What is more important to your business: the amount of storage savings you achieve through data reduction technologies, or how quickly backups complete?
If the answer to the first question is yes, using data reduction technologies on the client may result in both faster backups and increased storage savings on the TSM server. More often, the answer to this question is no. In that case, you need to weigh the trade-offs between having the fastest possible backup elapsed times and gaining the maximum amount of storage pool savings.
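The two questions, together with the bottom-line guidance at the end of this section, condense into a small decision function. The following is a sketch with hypothetical names. Real decisions also depend on factors it ignores, such as spare client CPU capacity.

```python
def recommend_data_reduction(network_limits_backup: bool,
                             storage_savings_first: bool) -> str:
    """Condense the two questions into one recommendation (a sketch)."""
    if network_limits_backup:
        # Less data on a slow network can shorten backups and raise savings.
        return "client-side deduplication with compression"
    if storage_savings_first:
        # Largest savings, at the cost of longer backup elapsed times.
        return "client-side deduplication with compression"
    # Fast network and backup speed matters most.
    return "server-side deduplication, without client compression"


if __name__ == "__main__":
    print(recommend_data_reduction(network_limits_backup=False,
                                   storage_savings_first=False))
```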
The graph above shows a 20GB object going through a series of ten backups. For each of the ten backups, the object in the same state was run through different data reduction mechanisms in TSM so that the behavior of each could be compared. The table summarizes the cumulative totals stored and saved for each technique, along with elapsed times in some cases. The following observations can be made from these results:
- The most significant storage savings, 86%, is seen with the combination of client-side deduplication and compression. This comes at the cost of a 1.5 times increase in backup elapsed time, compared with a backup with no client-side data reduction. Adding compression provides an additional 11% savings beyond the 75% that is possible using deduplication alone.
- With compression alone, there is a savings of 47%. This is a fairly typical savings seen with TSM compression.
- With deduplication alone (either client-side or server-side), there is a savings of 75%. There was no savings for the first backup with deduplication alone. This is typical with unique objects such as databases. The initial backup is one area in which compression provides substantial savings beyond what deduplication provides.
- Applying server-side deduplication to data that is already compressed by the client results in a savings of only 51%, lower than the 75% that can be achieved using server-side deduplication alone. Caution: Your application may compress data before it is passed to the TSM client. This will result in similarly less efficient deduplication savings. In these cases, it is best either to disable the application compression, or to send this data to a storage pool that does not use deduplication.
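The savings percentages are relative to the original data, so percentage points do not directly show how much compression shrank the already-deduplicated data. The following small worked calculation uses the 75% and 86% figures from the first observation. The names are hypothetical. Deduplication alone stores 25% of the original and the combination stores 14%, which means compression removed roughly 44% of what deduplication left behind.

```python
def stored_fraction(savings_percent):
    """Portion of the original data that still has to be stored."""
    return 1 - savings_percent / 100


dedup_only = stored_fraction(75)           # deduplication alone
dedup_then_compress = stored_fraction(86)  # client-side deduplication + compression

# Relative reduction that compression applied to the deduplicated data.
compression_on_dedup = 1 - dedup_then_compress / dedup_only

if __name__ == "__main__":
    print(f"stored: {dedup_only:.0%} vs {dedup_then_compress:.0%}")
    print(f"compression removed {compression_on_dedup:.0%} of what dedup kept")
```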
The bottom line: For the fastest backups on a fast network, choose server-side deduplication. For the largest storage savings, choose client-side deduplication combined with compression. Avoid performing client compression in combination with server-side deduplication.
4.4 Understanding the TSM deduplication tiering implementation
The deduplication implementation in TSM uses a tiered model, in which larger objects are processed with larger average chunk sizes. The goal is to limit the number of chunks that an object is split into. The tiering model avoids operational problems that arise when the TSM server needs to operate on objects consisting of very large numbers of chunks, and also limits the growth of the TSM database. Larger average chunk sizes have the trade-off of limiting the savings achieved by deduplication. The TSM server provides three tiers, which are used for different ranges of object sizes.
4.4.1 Controls for deduplication tiering
Two options on the TSM server control the object size thresholds at which objects are processed in tier2 or tier3. All objects smaller than the tier2 threshold are processed in tier1. By default, objects under 100GB in size are processed in tier1, objects from 100GB to under 400GB are processed in tier2, and all objects 400GB and larger are processed in tier3.
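The threshold rules above can be expressed as a small classification function. This is a minimal Python sketch, assuming sizes are given in GB and using the default thresholds described in this section; the function and constant names are illustrative only and are not part of any TSM interface.

```python
# Default tier thresholds in GB (DedupTier2FileSize and DedupTier3FileSize).
DEFAULT_TIER2_GB = 100
DEFAULT_TIER3_GB = 400


def dedup_tier(object_size_gb: float,
               tier2_gb: float = DEFAULT_TIER2_GB,
               tier3_gb: float = DEFAULT_TIER3_GB) -> int:
    """Return the deduplication tier (1, 2, or 3) for an object of this size."""
    if object_size_gb < tier2_gb:
        return 1  # smaller than the tier2 threshold
    if object_size_gb < tier3_gb:
        return 2  # from the tier2 threshold up to, but not including, tier3
    return 3      # at or above the tier3 threshold


if __name__ == "__main__":
    for size_gb in (50, 100, 399, 400, 1000):
        print(f"{size_gb} GB -> tier {dedup_tier(size_gb)}")
```

Note that the boundaries are half-open: an object of exactly 100GB is already tier2, and one of exactly 400GB is already tier3.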
Avoid making adjustments to the options that control the deduplication tier thresholds. Changing the thresholds after data has been stored can prevent newly stored data from matching data stored in previous backups, and can also cause operational problems if the changes cause larger objects to be processed in the lower tiers.
Very large objects can be excluded from deduplication using the clientdeduptxnlimit and serverdeduptxnlimit options. The storage pool parameter maxsize can also be used to prevent large objects from being stored in a deduplicated storage pool.
Beginning with TSM version 7.1, a new feature transparently segments large objects on the TSM server into fragments of 10GB. Each fragment is deduplicated independently as a separate transaction, which helps avoid operational and performance problems previously experienced with large objects. The client is unaware of this fragmentation, and the server reports on the objects normally, as if each were one large object. This capability is available for both client-side and server-side deduplication, and is enabled by default. It can be disabled selectively for specific nodes by updating a node's SPLITLARGEOBJECTS setting.
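To illustrate the arithmetic, the following Python sketch splits an object size into 10GB fragments and reports how many separate transactions that would imply. It assumes every fragment except the last is exactly 10GB; the function name is hypothetical, and the real segmentation logic is internal to the TSM server and is not exposed this way.

```python
from __future__ import annotations

FRAGMENT_GB = 10  # fragment size used by TSM V7.1, per the text above


def split_into_fragments(object_size_gb: float,
                         fragment_gb: float = FRAGMENT_GB) -> list[float]:
    """Return the sizes of the fragments an object would be divided into."""
    if object_size_gb <= 0:
        return []
    full_count, remainder = divmod(object_size_gb, fragment_gb)
    sizes = [fragment_gb] * int(full_count)
    if remainder:
        sizes.append(remainder)  # final, partial fragment
    return sizes


if __name__ == "__main__":
    fragments = split_into_fragments(1000)
    # Each fragment is deduplicated as its own transaction.
    print(f"{len(fragments)} transactions, largest fragment {max(fragments)} GB")
```

A 1000GB object, for example, becomes 100 independent transactions instead of one very large one.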
Option: DedupTier2FileSize
Allowed values (GB): Minimum 20, Maximum 9999, Default 100
Implications of the default: Objects that are smaller than 100GB are processed in tier1. Objects from 100GB up to the tier3 setting are processed in tier2.

Option: DedupTier3FileSize
Allowed values (GB): Minimum 90, Maximum 9999, Default 400
Implications of the default: Objects that are 400GB and larger are processed in tier3. Objects smaller than 400GB are processed in tier2 down to the tier2 threshold, below which they are processed in tier1.
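Because this section advises against changing these thresholds, a configuration review might flag both out-of-range values and departures from the defaults. The Python sketch below encodes the allowed ranges (20 to 9999 GB for DedupTier2FileSize, 90 to 9999 GB for DedupTier3FileSize) and the defaults of 100GB and 400GB; the function name and warning wording are illustrative assumptions, not TSM output.

```python
from __future__ import annotations

# option name: (minimum GB, maximum GB, default GB)
TIER_OPTION_LIMITS = {
    "DedupTier2FileSize": (20, 9999, 100),
    "DedupTier3FileSize": (90, 9999, 400),
}


def review_tier_settings(settings: dict[str, int]) -> list[str]:
    """Return warnings for tier threshold settings that deserve attention."""
    warnings = []
    for name, value in settings.items():
        if name not in TIER_OPTION_LIMITS:
            warnings.append(f"{name}: not a tier threshold option")
            continue
        low, high, default = TIER_OPTION_LIMITS[name]
        if not low <= value <= high:
            warnings.append(f"{name}={value}: outside allowed range {low}-{high} GB")
        elif value != default:
            # Changing thresholds after data is stored can hurt chunk matching.
            warnings.append(f"{name}={value}: differs from default {default} GB")
    return warnings
```

An empty list means both options are at their defaults, which is the configuration the text recommends keeping.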
4.4.2 The impact of tiering to deduplication storage reduction
The chart below gives an example of the impact that tiering has on deduplication savings. For this test, the same DB2 database was processed through a series of ten sets of backups, with a varying change pattern applied after each set. Within each set, the object in the same state was backed up using each of the three deduplication tiers, with each tier stored in its own storage pool. The table below gives the cumulative savings for each tier across the ten backups. The following observations can be made:
Deduplication is always more effective at reducing data in the lower tiers.
The difference in data reduction between the tiers depends on how the objects change between backups. For data with low volatility, tiering has less impact on savings.
As a general rule of thumb, you can estimate an approximately 17% loss of deduplication savings as you move up through each tier.
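The rule of thumb can be applied as a quick estimate. The Python sketch below assumes the 17% loss is relative, so each higher tier retains about 83% of the savings of the tier below it; the text does not say whether the loss is relative or in percentage points, so treat this interpretation, and the 75% starting value in the usage example, as assumptions.

```python
from __future__ import annotations


def estimate_tier_savings(tier1_savings_pct: float,
                          loss_per_tier: float = 0.17) -> dict[int, float]:
    """Estimate deduplication savings for tiers 1-3 from a tier1 measurement."""
    estimates = {1: tier1_savings_pct}
    for tier in (2, 3):
        # Assumed relative loss: each tier keeps (1 - loss) of the previous savings.
        estimates[tier] = estimates[tier - 1] * (1 - loss_per_tier)
    return estimates


if __name__ == "__main__":
    for tier, pct in estimate_tier_savings(75.0).items():
        print(f"tier{tier}: about {pct:.1f}% savings")
```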
4.4.3 Client controls that optimize deduplication efficiency
Controls are available on some TSM client types that prevent objects from becoming too large. These allow large objects to be processed as multiple smaller objects that fall into the tier1 range. This is not possible for every client type, but the following strategies have proven effective at keeping objects within the tier1 threshold:
For Oracle database backups, use the RMAN MAXPIECESIZE option to prevent any individual object from crossing the tier2 size threshold. More recommendations on this topic follow in a later section.
For Microsoft SQL database backups that use the legacy backup API, the database can be split across multiple streams. Each stream results in a separate object being stored on the TSM server. A 200GB database, for example, can be backed up with four streams, which results in approximately four 50GB objects that all fit within the default tier1 size threshold.
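For the SQL multi-stream approach, the minimum stream count follows from simple division. The Python sketch below assumes the database is split evenly across streams, which is only approximately true in practice, and that each object must be strictly smaller than the tier2 threshold to stay in tier1; the helper name is hypothetical.

```python
import math


def min_streams_for_tier1(db_size_gb: float, tier2_gb: float = 100) -> int:
    """Smallest stream count that keeps each (evenly split) object below tier2."""
    # An object of exactly tier2_gb would already be tier2, so use floor + 1.
    return math.floor(db_size_gb / tier2_gb) + 1


if __name__ == "__main__":
    size_gb = 200
    for streams in (min_streams_for_tier1(size_gb), 4):
        print(f"{streams} streams -> about {size_gb / streams:.1f} GB per object")
```

With the default 100GB threshold, a 200GB database needs at least three streams; the four streams used in the text leave more headroom below the threshold.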
4.5 What kinds of savings can I expect for different application types
No specific guarantee of TSM deduplication data reduction can be made for specific application types. For any of the applications discussed in this section, it is possible to construct an implementation, with initial data and subsequent changes to that data, for which any deduplication system would show poor results. What this section does provide is examples of how specific implementations of these applications, undergoing reasonable patterns of change, respond to TSM deduplication. This information can be considered a likely outcome of using TSM deduplication in your environment. More specific results for your environment can only be obtained by testing your real data with TSM over a period of time.
In the sections that follow, sample deduplication savings are given for specific applications, based on a series of backups taken with TSM. Each of these examples shows results from using deduplication only, so improved results are possible by combining deduplication and compression. Comparisons across the three deduplication tiers are given, except for applications where use of the higher tiers can be avoided. Client-side deduplication was used for all of the tests.
Tables in the following sections include elapsed times. These are given so that you can make relative comparisons, and should not be considered indicators of the performance you will see. Many factors, including network performance, influence actual backup elapsed times.
4.5.1 IBM DB2
4.5.2 Microsoft Exchange
The test environment consisted of a Microsoft Exchange Server 2010 system with five databases, each with a starting size of approximately 25GB and containing 305 users per database. All backups were full VSS backups performed using TSM Data Protection for Microsoft Exchange 6.4.
Between backups, change activity was driven against the databases using the Microsoft Load Generator. The load profile was set to generate 13 tasks per user per day, consisting of receive, send, delete, and open/read activity.
[Chart: Cumulative Data Stored (MB), MS Exchange full backups. X-axis: backup number (B1 to B10); y-axis: data stored (MB); series: Tier 1, No Dedup]
4.5.3 Microsoft SQL
4.5.4 Oracle
Backups using the Data Protection for Oracle product can achieve similar deduplication storage savings with the proper configuration. The test results summarized in the following charts give values for tier 1 only. The other tiers were not tested because the RMAN MAXPIECESIZE option can be used to prevent objects from reaching sizes that require the higher tiers.
The following RMAN settings are recommended when performing deduplicated backups of Oracle databases with TSM:
Use the maxpiecesize RMAN parameter to keep the objects sent to TSM within the tier 1 size range. Oracle backups can be broken into multiple objects of a specified size, which allows larger databases to be processed safely with tier1 deduplication. The parameter must be set to a value that is less than the TSM server DedupTier2FileSize parameter (which defaults to 100GB).
Recommended value: A maxpiecesize setting of 10GB provides a good balance between keeping each piece at an optimal size for handling by the TSM server and avoiding too many resulting objects.
Oracle RMAN provides the capability to multiplex the backups of database filesets across multiple channels. Using this feature typically results in less effective TSM deduplication data reduction. Use the filesperset RMAN parameter to avoid splitting a fileset across multiple channels.
Recommended value: A filesperset setting of 1 should be used for optimal deduplication data reduction.
Following is a sample RMAN script that includes the recommended values for use with TSM deduplication:
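run
{
  # Channel that sends backups to TSM through Data Protection for Oracle.
  # maxopenfiles=1 reads one datafile at a time, so files are not multiplexed;
  # maxpiecesize 10G keeps every backup piece (TSM object) within tier 1.
  allocate channel ch1 type 'SBT_TAPE' maxopenfiles=1 maxpiecesize 10G
    parms 'ENV=(TDPO_OPTFILE=/home/orc11/tdpo_10g.opt)';
  # filesperset 1 places each datafile in its own backup set.
  backup filesperset 1 (tablespace tbsp_dd);
  release channel ch1;
}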
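With filesperset 1, each datafile is written to its own backup set, and maxpiecesize caps the size of each piece, so the number of objects sent to TSM can be estimated per datafile. The Python sketch below gives an upper-bound estimate under that assumption; it ignores RMAN skipping unused blocks and any compression, and the datafile sizes in the usage example are hypothetical.

```python
from __future__ import annotations

import math


def estimate_rman_pieces(datafile_sizes_gb: list[float],
                         maxpiecesize_gb: float = 10.0) -> int:
    """Upper bound on backup pieces (TSM objects) when filesperset is 1."""
    return sum(math.ceil(size / maxpiecesize_gb) for size in datafile_sizes_gb)


def maxpiecesize_is_tier1(maxpiecesize_gb: float, tier2_gb: float = 100.0) -> bool:
    """maxpiecesize must be below the server's DedupTier2FileSize setting."""
    return maxpiecesize_gb < tier2_gb


if __name__ == "__main__":
    datafiles_gb = [32, 8, 10]  # hypothetical datafile sizes
    print(estimate_rman_pieces(datafiles_gb), maxpiecesize_is_tier1(10))
```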
4.5.5 VMware
VMware backup using TSM for Virtual Environments is one area that is commonly deployed with TSM deduplication. VMware backups typically show very substantial savings when client-side deduplication and compression are used in combination. The following factors contribute to these savings:
There is often significant data in common across virtual machines, partly because the same operating systems are installed and cloned across multiple virtual machines. Although the initial full backup results in the highest reduction in data, subsequent incremental backups (with incremental forever) can still benefit from deduplication.
Some duplicate data exists within the same virtual machine on the initial backup.
An example of the savings achieved with VMware backup using the combination of incremental forever, client-side deduplication, and client compression is 25:1.
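Reduction ratios such as 25:1 and percentage savings such as the 75% and 86% figures earlier in this document express the same quantity in different forms, related by savings = 1 - 1/ratio. A minimal Python sketch of the conversion in both directions; the function names are illustrative.

```python
def ratio_to_savings_pct(ratio: float) -> float:
    """Convert an N:1 data reduction ratio to percent storage savings."""
    return (1 - 1 / ratio) * 100


def savings_pct_to_ratio(savings_pct: float) -> float:
    """Convert percent storage savings to an N:1 data reduction ratio."""
    return 1 / (1 - savings_pct / 100)


if __name__ == "__main__":
    print(f"25:1 -> {ratio_to_savings_pct(25):.0f}% savings")
    print(f"75% savings -> {savings_pct_to_ratio(75):.1f}:1")
```

So a 25:1 ratio corresponds to 96% savings, and 75% savings corresponds to a 4:1 ratio.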
5 How to determine deduplication results
It is useful to evaluate the actual data reduction results from TSM deduplication to determine whether the expected storage savings have been achieved. In addition to the data reduction results, other key operational factors, such as database utilization, should be checked to ensure that they are consistent with expectations.
Deduplication results can be determined by various queries to the TSM server from the administrative command line or the Operations Center interface. It is important to recognize the dynamic nature of deduplication: its benefits are not always realized immediately after data is backed up. Also, because the scope of deduplication includes multiple backups across multiple hosts, it takes time to accumulate enough data in the TSM storage pool for deduplication to be effective at eliminating duplicates. It is therefore important to sample results at regular intervals, such as weekly, to obtain a valid report of the results.
In addition to checking data reduction results, TSM provides queries that show pending activity for deduplication processing. These queries can be issued to obtain an overall assessment of deduplication processing in the server. A script has been developed to assist administrators with monitoring deduplication-related processing. The script source is provided in the appendix of this document.
5.1 Simple TSM Server Queries
5.1.1 QUERY STGPOOL
The QUERY STGPOOL command provides a basic and quick method for evaluating deduplication results. However, if the query is run before reclamation of the storage pool, the "Duplicate Data Not Stored" value will be inaccurate and will not reflect the most recent data reduction.
Example command:
Query stgpool format=detailed
Example output:
Estimated Capacity: ... G
Space Trigger Util: ...
Pct Util: ...
Pct Migr: ...
Pct Logical: ...
. . .
Deduplicate Data?: Yes
Processes For Identifying Duplicates: 0
Duplicate Data Not Stored: ... G (...%)
Auto-copy Mode: Client
Contains Data Deduplicated by Client?: Yes
The "Duplicate Data Not Stored" value shows the actual reduction of data in megabytes or gigabytes, and the percentage of reduction for the storage pool. If reclamation has not yet occurred, the following example shows the pending amount of data that will be removed:
In this example, "backuppool-file" is the name of the deduplicating storage pool.
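When results are sampled at regular intervals, it can help to extract the "Duplicate Data Not Stored" line from QUERY STGPOOL FORMAT=DETAILED output programmatically. The Python sketch below parses that line from saved command output; it assumes the "amount unit (percent%)" layout of the example output in this section, and the sample values in the test are made up. As noted above, the value is only meaningful after reclamation has run.

```python
from __future__ import annotations

import re

# Matches, for example, "Duplicate Data Not Stored: 1,234 G (56%)"
DUP_PATTERN = re.compile(
    r"Duplicate Data Not Stored:\s*([\d,.]+)\s*([KMGT])?\s*\(([\d.]+)%\)")


def parse_duplicate_data_not_stored(output: str) -> tuple[float, str, float] | None:
    """Return (amount, unit, percent) from QUERY STGPOOL detailed output."""
    match = DUP_PATTERN.search(output)
    if match is None:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = match.group(2) or ""
    percent = float(match.group(3))
    return amount, unit, percent
```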
5.1.2 Other server queries affected by deduplication
5.1.2.1 QUERY OCCUPANCY
When a filespace is backed up to a deduplicated storage pool, the QUERY OCCUPANCY command shows the logical amount of storage per filespace. The physical space is displayed as "0.00" because this information cannot be determined on an individual filespace basis. An example is shown below:
Early versions of the TSM V6 server incorrectly maintained occupancy records in certain cases, which can result in an incorrect report of the amount of stored data. The following technote provides information on how to repair the occupancy information if necessary:
%ttp://(((#i)m#com/suppo"t/docvie(#(ss+uid,s(g21$7$00
5.2 TSM client reports
When client-side deduplication is used, the client summary report shows the data reduction associated with deduplication as well as compression. An example is shown here:
Total number of objects inspected: ...
Total number of objects backed up: ...
Total number of objects updated: 0
Total number of objects rebound: 0
Total number of objects deleted: 0
Total number of objects expired: ...
Total number of objects failed: 0
Total objects deduplicated: ...
Total number of bytes inspected: ... TB
Total number of bytes processed: ... MB
Total bytes before deduplication: ... GB
Total bytes after deduplication: ... MB
Total number of bytes transferred: ... MB
Data transfer time: ... sec
Network data transfer rate: ... KB/sec
Aggregate data transfer rate: ... KB/sec
Objects compressed by: 0%
Deduplication reduction: ...%
Total data reduction ratio: ...%
Elapsed processing time: ...
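The deduplication figure in this report can be cross-checked from the before and after byte counts. The Python sketch below parses "label: value" lines from a saved client summary and computes (before - after) / before; it assumes that this is how the client derives "Deduplication reduction" and that units are binary (1 KB = 1024 bytes). The function names and the sample values are hypothetical.

```python
from __future__ import annotations

UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_summary(report: str) -> dict[str, str]:
    """Map each 'Label: value' line of a client summary to its value."""
    fields = {}
    for line in report.splitlines():
        label, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def to_bytes(size_text: str) -> float:
    """Convert a size such as '1,024.00 MB' to bytes."""
    number, unit = size_text.split()
    return float(number.replace(",", "")) * UNIT_BYTES[unit.upper()]


def dedup_reduction_pct(report: str) -> float:
    """Percent reduction implied by the before/after deduplication byte counts."""
    fields = parse_summary(report)
    before = to_bytes(fields["Total bytes before deduplication"])
    after = to_bytes(fields["Total bytes after deduplication"])
    return (1 - after / before) * 100 if before else 0.0


if __name__ == "__main__":
    sample = (
        "Total bytes before deduplication: 1.00 GB\n"
        "Total bytes after deduplication: 256.00 MB\n"
        "Elapsed processing time: 00:01:30\n"
    )
    print(f"{dedup_reduction_pct(sample):.2f}%")
```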
5.3 TSM deduplication report script
A script has been developed to provide detailed information on deduplication results for a TSM server. In addition to providing summary information on the effectiveness of TSM deduplication, it can be used to gather diagnostics if deduplication results are not consistent with expectations. The script and usage instructions can be obtained from the TSM support site:
http://www.ibm.com/support/docview.wss?uid=swg21596944
An example of the summary data provided by this report is shown below:
The report also provides details of deduplication-related utilization of the TSM database.
< End of Document >