Database High Availability/Business Intelligence

ORACLE DATABASE 10g DATA WAREHOUSE BACKUP & RECOVERY:
AUTOMATIC, SIMPLE, AND RELIABLE
George Lumpkin, Oracle
Tammy Bednar, Oracle
BACKUP & RECOVERY OF VLDB
Backup and recovery is one of the most crucial and important jobs for a DBA to protect their business's assets:
its data. When data is not available, companies lose credibility, money, and possibly the whole business. As the
data store grows larger each year, you are continually challenged to ensure that critical data is backed up and that
it can be recovered quickly and easily to meet your business needs.
Data warehouses are unique in that they are large and data may come from a myriad of resources and it is
transformed before finally being inserted into the database. While it may be possible to glean data from these data
sources to repopulate tables in case of a loss, this does not imply that the data in a warehouse is any less important
to protect. Data warehouses present challenges to implement a backup and recovery strategy to meet the needs of
its users. The key focus of this paper is to propose a more efficient backup and recovery strategy for data
warehouses and reduce the overall resources necessary to support backup and recovery by leveraging some of the
special characteristics that differentiate data warehouses from OLTP systems.
DATA WAREHOUSING
A data warehouse is a system which is designed to support analysis and decision-making. In a typical enterprise,
hundreds or thousands of users may rely on their data warehouse to provide the information to help them
understand their business and make better decisions. Therefore availability is a key requirement for data
warehousing. This paper will address one key aspect of data-warehousing availability: the recovery of data after a
data-loss.
Before looking at the backup and recovery techniques in detail, it is important to understand why we would even
discuss specific techniques for backup and recovery of a data warehouse. In particular, one legitimate question
might be: why shouldn't a data warehouse's backup and recovery strategy be just like that of every other database
system?
Indeed, any DBA should initially approach the task of data warehouse backup and recovery by applying the same
techniques that are used in OLTP systems: the DBA must decide what information they want to protect and
quickly recover when media recovery is required, prioritizing data according to its importance and the degree to
which it changes. However, the issue that commonly arises for data warehouses is that an approach that is efficient
and cost-effective for a 100GB OLTP system may not be viable for a 10TB data warehouse. The backup and
recovery may take 100 times longer or require 100 times more tape drives. In this paper, we will examine the
unique characteristics of a data warehouse, and discuss efficient strategies for backing up and recovering even very
large amounts of data cost-effectively and in the time required to meet the needs of the business.
Paper 40179
DATA WAREHOUSE CHARACTERISTICS
There are four key differences between data warehouses and operational systems that have significant impacts on
backup and recovery.
First, a data warehouse is typically much larger than an operational system. Data warehouses over a terabyte are
not uncommon and the largest data warehouses running Oracle8i range into the 10's of terabytes. Data warehouses
built on Oracle9i and Oracle10g grow to orders of magnitude larger. Thus, scalability is a particularly important
consideration for data warehouse backup and recovery.
Second, a data warehouse often has lower availability requirements than an operational system. While data
warehouses are mission critical, there is also a significant cost associated with the ability to recover multiple
terabytes in a few hours vs. recovering in a day. Some organizations may determine that in the unlikely event of a
failure requiring the recovery of a significant portion of the data warehouse, they may tolerate an outage of a day
or more if they can save significant expenditures in backup hardware and storage.
Third, a data warehouse is typically updated via a controlled process called the ETL (Extract, Transform, Load)
process, unlike in OLTP systems where end-users are modifying data themselves. Because the data-modifications
are done in a controlled process, the updates to a data warehouse are often known and reproducible from sources
other than database logs.
Fourth, a data warehouse contains historical information, and often, significant portions of the older data in a data
warehouse are static. For example, a data warehouse may track five years of historical sales data. While the most
recent year of data may still be subject to modifications (due to returns, restatements and so forth), the earlier four
years of data may be entirely static. The advantage of static data is that it does not need to be backed up frequently.
These four characteristics are key considerations when devising a backup and recovery strategy that is optimized
for data warehouses.
ORACLE BACKUP AND RECOVERY
In general, backup and recovery refers to the various strategies and procedures involved in protecting your
database against data loss and reconstructing the database after any kind of data loss. A backup is a representative
copy of data. This copy can include important parts of a database such as the control file, redo logs, and data files.
A backup protects data from application error and acts as a safeguard against unexpected data loss, by providing a
way to restore original data.
PHYSICAL DATABASE STRUCTURES USED IN RECOVERING DATA
Before you begin to think seriously about backup and recovery strategy, the physical data structures relevant for
backup and recovery operations must be identified. The files and other structures that make up an Oracle database
store data and safeguard it against possible failures. Three basic components are required for an Oracle database
recovery: datafiles, control files, and redo logs.
DATAFILES
An Oracle database consists of one or more logical storage units called tablespaces. Each tablespace in an Oracle
database consists of one or more files called datafiles, which are physical files located on or attached to the host
operating system in which Oracle is running.
A database's data is collectively stored in the datafiles that constitute each tablespace of the database. The simplest
Oracle database would have one tablespace, stored in one datafile. Copies of the datafiles of a database are a
critical part of any backup to recover the data quickly.
REDO LOGS
Redo logs record all changes made to a database's data files. With a complete set of redo logs and an older copy
of a datafile, Oracle can reapply the changes recorded in the redo logs to recreate the database at any point
between the backup time and the end of the last redo log. Each time data is changed in an Oracle database, that
change is recorded in the online redo log first, before it is applied to the datafiles.
An Oracle database requires at least two online redo log groups, and in each group there is at least one online redo
log member, an individual redo log file where the changes are recorded. At intervals, Oracle rotates through the
online redo log groups, storing changes in the current online redo log while the groups not in use can be copied to
an archive location, where they are called archived redo logs (or, collectively, the archived redo log). Preserving
the archived redo log is a major part of your backup strategy, as it contains a record of all updates to datafiles.
Backup strategies often involve copying the archived redo logs to disk or tape for longer-term storage.
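The rotation through online redo log groups can be pictured with a small model. The following Python sketch is only an illustration of the idea described above: the class name RedoLog, the per-group capacity of three changes, and the synchronous archiving step are assumptions made for the example and do not describe Oracle's internal implementation.

```python
class RedoLog:
    """Toy model: a ring of online redo log groups that are archived on switch."""

    def __init__(self, groups=2, capacity=3):
        self.groups = [[] for _ in range(groups)]  # online redo log groups
        self.capacity = capacity                   # changes per group (made-up size)
        self.current = 0                           # index of the current group
        self.archived = []                         # archived redo logs, oldest first

    def record(self, change):
        """Record a change in the current group, switching first if it is full."""
        if len(self.groups[self.current]) == self.capacity:
            self._switch()
        self.groups[self.current].append(change)

    def _switch(self):
        # Copy the filled group to the archive, then move to the next group.
        self.archived.append(list(self.groups[self.current]))
        self.current = (self.current + 1) % len(self.groups)
        # The next group can be overwritten because it was archived earlier.
        self.groups[self.current].clear()


log = RedoLog(groups=2, capacity=3)
for n in range(1, 9):
    log.record(f"change-{n}")

# Every change is either in an archived log or in the current online group.
all_changes = [c for g in log.archived for c in g] + log.groups[log.current]
print(all_changes)
```

With only two groups, the model makes the dependency visible: a group can be reused safely only because its contents were copied to the archive first.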
CONTROL FILES
The control file contains a crucial record of the physical structures of the database and their status. Several types of
information stored in the control file are related to backup and recovery:
Database information required to recover from crashes, media recovery, etc.
Database structure information such as datafile details
Redo log details
Archive log records
A record of past RMAN backups
Oracle's datafile recovery process is in part guided by status information in the control file, such as the database
checkpoints, current online redo log file, and the datafile header checkpoints for the datafiles. Loss of the control
file makes recovery from a data loss much more difficult.
BACKUP TYPE
Backups are divided into physical backups and logical backups. Physical backups are backups of the physical files
used in storing and recovering your database, such as datafiles, control files, and archived redo logs. Ultimately,
every physical backup is a copy of files storing database information to some other location, whether on disk or
some offline storage such as tape.
Logical backups contain logical data (for example, tables or stored procedures) extracted from a database with the
Oracle Data Pump (export/import) utility. The data is stored in a binary file that can be used for re-importing into
an Oracle database. Physical backups are the foundation of any sound backup and recovery strategy. Logical
backups are a useful supplement to physical backups in many circumstances but are not sufficient protection
against data loss without physical backups.
Reconstructing the contents of all or part of a database from a backup typically involves two phases: retrieving a
copy of the datafile from a backup, and reapplying the changes made to the file since the backup from the archived
and online redo logs, to bring the database to the desired recovery point in time. To restore a datafile or control file
from backup is to retrieve the file onto disk from a backup location on tape, disk or other media, and make it
available to the Oracle database server.
To recover a datafile is to take a restored copy of the datafile and apply to it changes recorded in the database's
redo logs. To recover a whole database is to perform recovery on each of its datafiles.
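As a rough illustration of the two phases, restore and then recover, the following Python sketch rolls a restored copy of a datafile forward by re-applying redo up to a chosen point. The names RedoRecord and recover, the block dictionary, and the integer change numbers (standing in for Oracle's system change numbers) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    scn: int      # hypothetical change number; higher means later
    block: int    # which block of the datafile was changed
    value: str    # new contents of that block


def recover(restored_blocks, backup_scn, redo, until_scn):
    """Roll a restored datafile copy forward by re-applying redo.

    restored_blocks: block contents as they were when the backup was taken
    backup_scn:      change number at which the backup was taken
    redo:            archived and online redo records, in any order
    until_scn:       desired recovery point (inclusive)
    """
    blocks = dict(restored_blocks)  # leave the backup copy itself untouched
    for rec in sorted(redo, key=lambda r: r.scn):
        # Changes at or before the backup are already in the restored copy;
        # changes after the recovery point are deliberately left out.
        if backup_scn < rec.scn <= until_scn:
            blocks[rec.block] = rec.value
    return blocks


backup = {0: "a", 1: "b"}                      # restored copy, taken at SCN 100
redo_log = [
    RedoRecord(scn=90, block=0, value="a"),    # already reflected in the backup
    RedoRecord(scn=110, block=1, value="b2"),
    RedoRecord(scn=120, block=2, value="c"),
    RedoRecord(scn=130, block=0, value="a3"),
]

complete = recover(backup, 100, redo_log, until_scn=130)       # complete recovery
point_in_time = recover(backup, 100, redo_log, until_scn=120)  # stop earlier
```

Choosing a smaller until_scn corresponds to recovering to an earlier point in time: the last change is simply never re-applied.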
BACKUP TOOLS
Oracle provides tools to manage backup and recovery of Oracle databases. Each tool gives you a choice of several
basic methods for making backups. The methods include:
Recovery Manager (RMAN) - RMAN reduces the administration work associated with your backup strategy.
RMAN keeps an extensive record of metadata about backups, archived logs, and its own activities. In restore
operations, RMAN can use this information to eliminate the need for you to identify backup files for use in
restores. You can also generate reports of backup activity using the information in the repository.
Oracle Enterprise Manager - Oracle's GUI interface that invokes Recovery Manager.
Oracle Data Pump - A new feature of Oracle Database 10g that provides high speed, parallel, bulk data and
metadata movement of Oracle database contents. This utility makes logical backups by writing data from an
Oracle database to operating system files in a proprietary format. This data can later be imported into a
database.
User Managed - The database is backed up manually by executing commands specific to the user's
operating system.
RECOVERY MANAGER (RMAN)
Recovery Manager is Oracle's utility to manage the backup, and more importantly the recovery, of the database. It
eliminates operational complexity while providing superior performance and availability of the database.
Recovery Manager debuted with Oracle8 to provide DBAs an integrated backup and recovery solution.
Recovery Manager determines the most efficient method of executing the requested backup, restore, or recovery
operation and then executes these operations in concert with the Oracle database server. Recovery Manager and the
server automatically identify modifications to the structure of the database and dynamically adjust the required
operation to adapt to the changes.
Recovery Manager is a powerful and versatile utility that allows users to make a backup or image copy of their
data. When the user specifies files or archived logs using the Recovery Manager backup command, Recovery
Manager creates a backup set as output. A backup set is a file or files in a Recovery Manager-specific format that
requires the use of the Recovery Manager restore command for recovery operations.
When a Recovery Manager command is issued, such as backup or copy, Recovery Manager establishes a
connection to an Oracle server process. The server process then backs up the specified datafile, control file, or
archived log from the Oracle database.
Recovery Manager automatically establishes the names and locations of all the files needed to back up. Recovery
Manager also supports incremental backups: backups of only those blocks that have changed since a previous
backup. In traditional backup methods, all the data blocks ever used in a datafile must be backed up.
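The difference between a traditional backup and an incremental backup can be sketched in a few lines of Python. This is a conceptual model only: the dictionary layout, the function names, and the per-block change numbers are assumptions for illustration, not RMAN's backup set format.

```python
def full_backup(datafile):
    """Copy every used block, as a traditional backup has to do."""
    return dict(datafile)


def incremental_backup(datafile, since_scn):
    """Copy only the blocks whose last change is newer than since_scn."""
    return {blk: (scn, data) for blk, (scn, data) in datafile.items() if scn > since_scn}


def restore(full, incrementals):
    """Rebuild the file: start from the full copy, overlay increments in order."""
    image = dict(full)
    for inc in incrementals:
        image.update(inc)
    return image


# block number -> (change number of last modification, contents); values are made up
datafile = {0: (10, "jan"), 1: (10, "feb"), 2: (10, "mar")}
level0 = full_backup(datafile)               # full backup taken at change number 10

datafile[1] = (25, "feb-restated")           # one block changes afterwards
datafile[3] = (30, "apr")                    # one new block is added
level1 = incremental_backup(datafile, since_scn=10)

print(len(level1), "of", len(datafile), "blocks copied by the incremental backup")
```

Only the changed and new blocks are copied, yet the full backup plus the increment is enough to rebuild the current file.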
ENTERPRISE MANAGER
Although Recovery Manager is commonly used as a command-line utility, the Oracle Enterprise Manager is the
GUI interface that enables backup and recovery via a point-and-click method. Oracle Enterprise Manager (EM)
supports Backup and Recovery features commonly used by users.
Backup Configurations to customize and save commonly used configurations for repeated use
Backup and Recovery wizards to walk the user through the steps of creating a backup script and submitting it
as a scheduled job
Backup Job Library to save commonly used Backup jobs that can be retrieved and applied to multiple targets
Backup Job Task to submit any RMAN job using a user-defined RMAN script.
BACKUP MANAGEMENT
Enterprise Manager provides the ability to view and perform maintenance against RMAN backups. You can view
the RMAN backups, archive logs, control file backups, and image copies. If you select the link on the RMAN
backup, it will display all files that are located in that backup.
ORACLE DATA PUMP
Physical backups can be supplemented by using the Data Pump (export/import) utility to make logical backups of
data. Logical backups store information about the schema objects created for a database. Data Pump is a utility for
unloading data and metadata into a set of operating system files that can be imported on the same system or
moved to another system and loaded there.
The dump file set is made up of one or more disk files that contain table data, database object metadata, and
control information. The files are written in a proprietary, binary format. During an import operation, the Data
Pump Import utility uses these files to locate each database object in the dump file set.
USER MANAGED BACKUPS
If the user does not want to use Recovery Manager, operating system commands such as the UNIX dd or tar
command can be used to make backups. In order to create a user managed online backup, the database must
manually be placed into hot backup mode. Hot backup mode can cause additional writes to the online log files.
Backup operations can also be automated by writing scripts. The user can make a backup of the whole database at
once or back up individual tablespaces, datafiles, control files, or archived logs. A whole database backup can be
supplemented with backups of individual tablespaces, datafiles, control files, and archived logs. O/S commands or
3rd party backup software can be used to perform database backups. Conversely, the restore of those backups must
then be performed by you or by the 3rd party software.
DATA WAREHOUSE BACKUP & RECOVERY
Data warehouse recovery is not any different from an OLTP system. But a data warehouse may not require all of
the data to be recovered in the traditional method, i.e. from a backup. An efficient and fast recovery of a data
warehouse begins with a well-planned backup. The next several sections will help you to identify what data
should be backed up and guide you to the method and tools that will allow you to recover critical data in the
shortest amount of time.
RECOVERY TIME OBJECTIVE (RTO)
A Recovery Time Objective, or RTO, is the number of hours in which you want to be able to recover your data.
Your backup and recovery plan should be designed to meet the RTOs your company chooses for its data warehouse.
For example, you may determine that 50% of the data must be available within 5 days after a complete loss of the
Oracle database. The remainder of the data should then be available within a further 14 days. In this particular
case you have two RTOs. Your total RTO is 19 days.
To determine what your RTO should be, you must first identify the impact of the data not being available. To
establish an RTO follow these four steps.
1. Analyze & Identify: Understand your recovery readiness, risk areas, and the business costs of unavailable
data. In a data warehouse, you should identify critical data that must be recovered in the n days after an
outage.
2. Design: Transform the recovery requirements into backup and recovery strategies. This can be
accomplished by organizing the data into their logical relationships and criticality.
3. Build & Integrate: Deploy and integrate the solution into your environment to backup and recover your
data. Document the backup and recovery plan!
4. Manage & Evolve: Test your recovery plans at regular intervals. Implement a change management
process to refine and update it as your data, IT infrastructure, and business processes change.
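To make the two-tier RTO example in this section concrete, the following Python sketch converts each tier into the sustained restore rate it implies. The 10TB warehouse size and the function name are assumptions chosen for illustration; only the 50% split and the 5-day and 14-day limits come from the example.

```python
def required_rate_gb_per_hour(size_gb, days):
    """Sustained restore rate needed to bring size_gb back within `days`."""
    return size_gb / (days * 24)


warehouse_gb = 10_000  # assumed 10TB warehouse

# (share of the data, days allowed for this tier), taken from the RTO example
tiers = [(0.50, 5), (0.50, 14)]

total_rto_days = sum(days for _, days in tiers)
rates = [required_rate_gb_per_hour(warehouse_gb * share, days) for share, days in tiers]

for (share, days), rate in zip(tiers, rates):
    print(f"{share:.0%} of the data within {days} days needs about {rate:.1f} GB/hour")
print("total RTO:", total_rto_days, "days")
```

A calculation of this kind belongs to the Design step: it shows which tier actually drives the hardware requirement.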
RECOVERY POINT OBJECTIVE (RPO)
Recovery Point Objective describes the age of the data you want the ability to restore in the event the Oracle
database files are corrupted or lost. For example, if your RPO is 1 week, you want to be able to restore the
database back to the state it was 1 week ago or less. To achieve this, you should create backups at least once per
week. Any data created or modified inside your recovery point objective will be either lost or must be recreated
during the recovery interval. A short RTO and a low RPO generally cause recovery measures to be more
expensive, but for critical business processes, expense is not an issue.
MORE DATA MEANS A LONGER BACKUP WINDOW
The most obvious characteristic of the data warehouse is the size of the database. This can range from 500GB to
100s of TB of data. DBAs who have waited around for a tape backup to complete on a 5GB database are
probably saying to themselves that there is no way to backup this much database in a reasonable timeframe using
the traditional backup method to tape. Hardware is the limiting factor to a fast backup and more importantly the
recovery.

However, today's tape storage continues to evolve to accommodate the amount of data that needs to be offloaded
to tape. Tapes can now backup the database at speeds of 6MB/sec to 24MB/sec. Moreover, tapes can hold up to
200 GB of data. Oracle's RMAN can fully utilize, in parallel, all available tape devices to maximize backup and
recovery performance.
HP and Oracle teamed up in a recent test utilizing HP Storage Data Protector, RMAN, and the Oracle Database to
demonstrate that large amounts of data can be backed up in a short period of time [1]. A 3.6TB/hour backup was
achieved using the HP ESL Ultrium 460 Library with 16 drives.
Essentially, the time required to back up a large database is a matter of simple arithmetic. This time is dependent
on the hardware, type of tape library, and the number of tape devices. Backup and recovery windows can be
reduced to fit any business's requirements, when suitable hardware resources are available. Each data warehouse
team will make its own tradeoff for backup performance versus total cost, based on its availability requirements
and budgetary constraints. If you want a fast backup and recovery, you must invest in the hardware required to
meet that backup window.
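The "simple arithmetic" can be written down directly. The Python sketch below estimates a backup window from database size, number of tape drives, and per-drive throughput. The 10TB database size is an assumption for illustration, and the model ignores compression, tape changes, and any bottleneck other than the drives.

```python
def backup_hours(db_gb, drives, mb_per_sec_per_drive):
    """Hours to stream db_gb to tape with all drives running in parallel."""
    aggregate_mb_per_sec = drives * mb_per_sec_per_drive
    seconds = db_gb * 1024 / aggregate_mb_per_sec
    return seconds / 3600


# Assumed 10TB warehouse, 16 drives, each at the 24MB/sec quoted in this section
at_drive_speed = backup_hours(10 * 1024, drives=16, mb_per_sec_per_drive=24)

# The same 10TB at the 3.6TB/hour aggregate rate reported for the HP test
at_test_rate = 10 / 3.6

print(f"{at_drive_speed:.1f} hours at 16 x 24MB/sec")
print(f"{at_test_rate:.1f} hours at 3.6TB/hour")
```

Because the drives work in parallel, doubling the number of drives halves the estimate, which is the hardware-versus-cost tradeoff described above.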
DIVIDE AND CONQUER
In a data warehouse, there may be times when the database is not being fully utilized, i.e. loading of data. While
this window of time may be several contiguous hours, it is not enough to backup the entire database. Therefore,
you may want to consider breaking the database backup over a number of days.
Oracle Database 10g RMAN extended the BACKUP capability that allows you to specify how long a given
backup job is allowed to run. When using BACKUP ... DURATION you can choose between running the backup
to completion as quickly as possible and running the backup more slowly to minimize the load the backup may
impose on your database. In the following example, RMAN will backup all database files that have not been
backed up in the last 24 hours first, run for 4 hours, and read the blocks as fast as possible.
BACKUP DATABASE NOT BACKED UP SINCE TIME 'SYSDATE - 1' DURATION 4:00 PARTIAL MINIMIZE TIME;
Each time this RMAN command is run, it will backup the datafiles that have not been backed up in the last 24
hours first. You do not need to manually specify the tablespaces or datafiles to be backed up each night. Over the
course of several days, all of your database files will have been backed up.
While this is a simplistic approach to database backup, it is easy to implement and provides more flexibility in
backing up large amounts of data.
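The effect of running such a time-limited backup every night can be simulated. The Python sketch below is a simplified model, not RMAN's scheduling logic: the file names, sizes, throughput, and the choice to process the least recently backed up files first are assumptions for illustration.

```python
def nightly_backup(files, last_backup, now, window_hours, gb_per_hour):
    """Pick tonight's files: those not backed up in 24 hours, within a time limit.

    files:       {name: size in GB}
    last_backup: {name: hour of last backup}; a missing name means never backed up
    Returns the names backed up tonight and updates last_backup in place.
    """
    never = float("-inf")
    # Candidates: not backed up in the last 24 hours, oldest backup first.
    stale = [f for f in files if now - last_backup.get(f, never) >= 24]
    stale.sort(key=lambda f: last_backup.get(f, never))

    done, budget_gb = [], window_hours * gb_per_hour
    for name in stale:
        if files[name] > budget_gb:
            break                      # window used up; the rest waits for tomorrow
        budget_gb -= files[name]
        last_backup[name] = now
        done.append(name)
    return done


files = {"sales_2001": 300, "sales_2002": 300, "sales_2003": 300, "sales_2004": 300}
last_backup = {}
nights = [
    nightly_backup(files, last_backup, now=24 * n, window_hours=4, gb_per_hour=150)
    for n in range(1, 4)
]
for n, done in enumerate(nights, start=1):
    print("night", n, done)
```

In this model a 4-hour window covers half of the files each night, so every file is backed up once every two nights without anyone naming the files explicitly.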
THE DATA WAREHOUSE RECOVERY METHODOLOGY
Devising a backup and recovery strategy can be a daunting task. And when you have 100s of Gigs of data that
must be protected and recovered in the case of a failure, the strategy can be very complex. Below are several best
practices that can be implemented to ease the administration of backup and recovery.
[1] "hp enterprise libraries reach new performance levels", 04/2003
BEST PRACTICE #1: USE ARCHIVELOG MODE
Archived redo logs are crucial for recovery when no data can be lost, since they constitute a record of changes to
the database. Oracle can be run in either of two modes:
ARCHIVELOG -- Oracle archives the filled online redo log files before reusing them in the cycle.
NOARCHIVELOG -- Oracle does not archive the filled online redo log files before reusing them in the cycle.
Running the database in ARCHIVELOG mode has the following benefits:
The database can be completely recovered from both instance and media failure.
The user can perform backups while the database is open and available for use.
Oracle supports multiplexed archive logs to avoid any possible single point of failure on the archive logs.
The user has more recovery options, such as the ability to perform tablespace-point-in-time recovery
(TSPITR).
Archived redo logs can be transmitted and applied to the physical standby database, which is an exact replica
of the primary database.
Running the database in NOARCHIVELOG mode has the following consequences:
The user can only back up the database while it is completely closed after a clean shutdown.
Typically, the only media recovery option is to restore the whole database, which causes the loss of all
transactions since the last backup.
IS DOWNTIME ACCEPTABLE?
Oracle database backups can be made while the database is open or closed. Planned downtime of the database can
be disruptive to operations, especially in global enterprises that support users in multiple time zones, up to 24
hours per day. In these cases it is important to design a backup plan to minimize database interruptions.
Depending on your business, some enterprises can afford downtime. If your overall business strategy requires
little or no downtime, then your backup strategy should implement an online backup. The database needs never to
be taken down for a backup. An online backup requires the database to be in ARCHIVELOG mode.
There is essentially no reason not to use ARCHIVELOG mode. All data warehouses (and for that matter, all
mission-critical databases) should use ARCHIVELOG mode. Specifically, given the size of a data warehouse (and
consequently the amount of time to back up a data warehouse), it is generally not viable to make an offline backup
of a data warehouse, which would be necessitated if one were using NOARCHIVELOG mode.
Of course, large-scale data warehouses may undergo large amounts of data-modification, which in turn will
generate large volumes of log files. To accommodate the management of large volumes of archived log files,
Oracle Database 10g RMAN provides the option to compress log files as they are archived. This will allow you to
keep more archive logs on disk for faster accessibility for recovery.
BEST PRACTICE #2: USE RMAN
Many data warehouses, which were originally developed on Oracle8 and even Oracle8i, may not have integrated
RMAN for backup and recovery. However, just as there is a preponderance of reasons to leverage ARCHIVELOG
mode, there is a similarly compelling list of reasons to adopt RMAN. A few of the RMAN differentiators are
listed here.
Top 10 Reasons to integrate Recovery Manager into your Backup and Recovery Strategy
10. Extensive Reporting                  5. Easily integrates with Media Managers
 9. Incremental Backups                  4. Block Media Recovery (BMR)
 8. Downtime Free Backups                3. Archive log validation and management
 7. Backup and Restore Validation        2. Corrupt Block Detection
 6. Backup and Restore Optimization      1. Trouble Free Backup and Recovery
BEST PRACTICE #3: LEVERAGE READ-ONLY TABLESPACES
One of the biggest issues facing a data warehouse is the sheer size of a typical data warehouse. Even with powerful
backup hardware, backups may still take several hours. Thus, one important consideration in improving backup
performance is minimizing the amount of data to be backed up. Read-only tablespaces are the simplest
mechanism to reduce the amount of data to be backed up in a data warehouse.
The advantage of a read-only tablespace is that the data only needs to be backed up once. So, if a data warehouse
contains five years of historical data, and the first four years of data can be made read-only, then theoretically the
regular backup of the database would only back up 20% of the data. This can dramatically reduce the amount of
time required to back up the data warehouse.
Most data warehouses store their data in tables that have been range-partitioned by time. In a typical data
warehouse, data is generally 'active' for a period ranging anywhere from 30 days to one year. During this period,
the historical data can still be updated and changed (for example, a retailer may accept returns up to 30 days
beyond the date of purchase, so that sales data records could change during this period). However, once data has
reached a certain age, it is often known to be static.
By leveraging partitioning, users can make the static portions of their data read-only. Currently, Oracle supports
read-only tablespaces rather than read-only partitions or tables. To take advantage of the read-only tablespaces and
reduce the backup window, a strategy of storing constant data partitions in a read-only tablespace should be
devised. Here are two strategies for implementing a rolling window.
1. Implement a regularly scheduled process to move partitions from a read-write tablespace to a read-only
tablespace when the data matures to the point where it is entirely static.
2. Create a series of tablespaces, each containing a small number of partitions, and regularly modify one
tablespace from read-write to read-only as the data in that tablespace ages.
Best Practice: Put the database in archive log mode to provide online backups and point-in-time recovery options.
One consideration is that backing up data is only half of the recovery process. If you configure a tape system so
that it can backup the read-write portions of a data warehouse in 4 hours, the corollary is that a tape system might
take 20 hours to recover the database if a complete recovery is necessary when 80% of the database is read-only.
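The 4-hour backup versus 20-hour recovery corollary follows from the read-only percentage alone. The Python sketch below reproduces it under two stated assumptions that are not in the paper: a 10TB warehouse, and a tape system that restores at the same rate at which it backs up.

```python
def backup_and_restore_hours(total_gb, read_only_percent, gb_per_hour):
    """Return (regular backup hours, complete restore hours).

    The regular backup covers only the read-write data; a complete restore
    has to bring back everything, read-only tablespaces included.
    """
    read_write_gb = total_gb * (100 - read_only_percent) / 100
    return read_write_gb / gb_per_hour, total_gb / gb_per_hour


# Assumed figures: 10TB warehouse, tape system sized for a 4-hour nightly backup
backup_h, restore_h = backup_and_restore_hours(10_000, read_only_percent=80, gb_per_hour=500)

print("regular backup:", backup_h, "hours")
print("complete restore:", restore_h, "hours")
```

The ratio between the two figures is simply 100 / (100 - read-only percent), so the more data is made read-only, the more the restore time should be checked against the RTO rather than inferred from the backup window.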
BEST PRACTICE #4: PLAN FOR NOLOGGING OPERATIONS IN YOUR BACKUP/RECOVERY STRATEGY
In general, one of the highest priorities for a data warehouse is performance. Not only must the data warehouse
provide good query performance for online users, but the data warehouse must also be efficient during the ETL
process so that large amounts of data can be loaded in the shortest amount of time.
One common optimization leveraged by data warehouses is to execute bulk-data operations using the 'nologging'
mode. The database operations which support nologging modes are direct-path loads and inserts, index creation,
and table creation. When an operation runs in 'nologging' mode, data is not written to the redo log (or more
precisely, only a small set of metadata is written to the redo log). This mode is widely used within data warehouses
and can improve the performance of bulk data operations by up to 50%.
However, the tradeoff is that a nologging operation cannot be recovered using conventional recovery mechanisms,
since the necessary data to support the recovery was never written to the log file. Moreover, subsequent operations
to the data upon which a nologging operation has occurred also cannot be recovered even if those operations were
not using nologging mode. Because of the performance gains provided by nologging operations, it is generally
recommended that data warehouses utilize nologging mode in their ETL process.
The presence of nologging operations must be taken into account when devising the backup and recovery strategy.
When a database is relying on nologging operations, the conventional recovery strategy (of recovering from the
latest tape backup and applying the archived logfiles) is no longer applicable because the log files will not be able
to recover the nologging operation.
The first principle to remember is, don't make a backup when a nologging operation is occurring. Oracle does not
currently enforce this rule, so the DBA must schedule the backup jobs and the ETL jobs such that the nologging
operations do not overlap with backup operations.
There are two approaches to backup and recovery in the presence of nologging operations: ETL or incremental
backups. If you are not using nologging operations in your data warehouse, then you do not have to choose either
of the following options: you can recover your data warehouse using archived logs. However, the following
options may offer some performance benefits over an archive-log-based approach in the event of recovery.
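Since the database does not enforce the rule that backups and nologging operations must not overlap, a schedule can be checked for conflicts before it is put into production. The following Python sketch shows one way to do that; the job names, times, and the tuple layout are hypothetical and would come from whatever scheduler is actually in use.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def conflicts(backup_jobs, nologging_jobs):
    """List (backup, etl) name pairs whose scheduled windows overlap."""
    return [
        (b_name, e_name)
        for b_name, b_start, b_end in backup_jobs
        for e_name, e_start, e_end in nologging_jobs
        if overlaps(b_start, b_end, e_start, e_end)
    ]


def t(hour, minute=0):
    return datetime(2004, 1, 10, hour, minute)   # arbitrary example date


backups = [("nightly_rman", t(1), t(5))]
etl = [
    ("load_sales", t(4, 30), t(6)),      # starts before the backup has finished
    ("build_indexes", t(6), t(7)),       # starts after the backup has finished
]

clashes = conflicts(backups, etl)
print(clashes)
```

Any pair reported here means either the ETL job or the backup should be moved, because a backup taken during the nologging load could not be relied on for that data.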
EXTRACT, TRANSFORM, & LOAD
The ETL process uses several Oracle features/tools and a combination of methods to load (re-load) data into a
data warehouse. These features or tools may consist of:
Transportable Tablespaces: The Oracle Transportable Tablespace feature allows users to quickly move a
tablespace across Oracle databases. It is the most efficient way to move bulk data between databases. Oracle
Database 10g provides the ability to transport tablespaces across platforms. If the source platform and the
target platform are of different endianness, then RMAN will convert the tablespace being transported to the
target format.
SQL*Loader: SQL*Loader loads data from external flat files into tables of an Oracle database. It has a
powerful data-parsing engine that puts little limitation on the format of the data in the datafile.
Data Pump (export/import): Oracle Database 10g introduces the new Oracle Data Pump technology, which
enables very high-speed movement of data and metadata from one database to another. This technology is the
basis for Oracle's new data movement utilities, Data Pump Export and Data Pump Import.
External Tables: The external tables feature is a complement to existing SQL*Loader functionality. It enables
you to access data in external sources as if it were in a table in the database.
Best Practice: Place static tables and partitions into read-only tablespaces. A read-only tablespace
needs to be backed up only one time.
Best Practice: Restore a backup that does not contain non-recoverable (nologging) transactions.
Then replay the ETL process to reload the data.
THE ETL STRATEGY
One approach is to take regular database backups and also store the data files needed to re-run the ETL process
for that entire week. If a recovery is necessary, the data warehouse can be recovered from the
most recent backup. Then, instead of "rolling forward" by applying the archived redo logs (as would be done in a
conventional recovery scenario), the data warehouse can be rolled forward by re-running the ETL processes.
This paradigm assumes that the ETL processes can be easily replayed, which would typically involve storing a set
of extract files for each ETL process (many data warehouses do this already as a best practice, in order to be able
to identify and repair a bad data feed, for example).
A sample implementation of this approach is to make a backup of the data warehouse every weekend, and then store
the necessary files to support the ETL process for each night. Thus, at most, 7 days of ETL processing would need
to be re-applied in order to recover a database. The data warehouse administrator can easily project the length of
time to recover the data warehouse, based upon the recovery speeds from tape and performance data from
previous ETL runs.
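The projection described above can be sketched as a small calculation. This is only an illustrative sketch: the function name, the parameters, and the sample figures (a 2,000 GB warehouse, a 250 GB/hour tape restore rate, three nightly ETL runs) are assumptions chosen for the example, not measurements from any real system.

```python
def projected_recovery_hours(db_size_gb, restore_gb_per_hour, etl_run_hours):
    """Estimate recovery time for the "replay ETL" strategy.

    db_size_gb          -- size of the weekend backup to restore (hypothetical input)
    restore_gb_per_hour -- observed restore speed from tape (hypothetical input)
    etl_run_hours       -- durations of the nightly ETL runs since the last backup
    """
    if len(etl_run_hours) > 7:
        raise ValueError("with weekly backups, at most 7 nightly runs are replayed")
    restore_hours = db_size_gb / restore_gb_per_hour
    # Roll forward by replaying each nightly ETL run instead of applying redo.
    return restore_hours + sum(etl_run_hours)


# Hypothetical example: failure after the third nightly load of the week.
hours = projected_recovery_hours(2000, 250, [1.5, 2.0, 1.5])
print(f"Projected recovery time: {hours} hours")
```

The estimate assumes that a replayed ETL run takes about as long as the original run did, which is the same assumption the administrator makes when using performance data from previous runs.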
Essentially, the data warehouse administrator is gaining better performance in the ETL process via nologging
operations, at the price of a slightly more complex and less-automated recovery process. Many data warehouse
administrators have found that this is a desirable trade-off.
One downside to this approach is that the burden is upon the data warehouse administrator to track all of the
relevant changes that have occurred in the data warehouse. This approach will not capture changes that fall outside
of the ETL process. For example, in some data warehouses, end-users may create their own tables and data
structures. Those changes will be lost in the event of a recovery. This restriction needs to be conveyed to the
end-users. Alternatively, one could also mandate that end-users create all of their private database objects in a separate
tablespace, and during recovery, the DBA could recover this tablespace using conventional recovery while
recovering the rest of the database using the approach of replaying the ETL process.
INCREMENTAL BACKUP
A more automated backup and recovery strategy in the presence of nologging operations leverages RMAN's
incremental backup capability. Incremental backups have been part of RMAN since it was first released in
Oracle8.0. Incremental backups provide the capability to back up only the blocks changed since the previous
backup. Incremental backups of datafiles capture data changes on a block-by-block basis, rather than requiring the
backup of all used blocks in a datafile. The resulting backup sets are generally smaller and more efficient than full
datafile backups, unless every block in the datafile has changed.
Oracle Database 10g delivers the ability for faster incrementals with the implementation of the change tracking file
feature. When you enable block change tracking, Oracle tracks the physical location of all database changes.
RMAN automatically uses the change tracking file to determine which blocks need to be read during an incremental
backup and directly accesses those blocks to back them up.
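The idea behind change tracking can be illustrated with a toy model. The sketch below is not Oracle's implementation or file format; the class `TrackedDatafile` and its methods are hypothetical names invented for this example. It only shows why recording changed block numbers lets an incremental backup read the changed blocks directly instead of scanning the whole datafile.

```python
class TrackedDatafile:
    """Toy datafile that records which blocks changed since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.changed = set()  # stand-in for the change tracking file

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.changed.add(block_no)  # remember the location of the change

    def incremental_backup(self):
        # Read only the tracked blocks; untouched blocks are never visited.
        backup = {n: self.blocks[n] for n in sorted(self.changed)}
        self.changed.clear()  # the next incremental starts from this point
        return backup


datafile = TrackedDatafile(num_blocks=1000)
datafile.write(3, b"row data")
datafile.write(42, b"more row data")

first = datafile.incremental_backup()
print(sorted(first))                      # block numbers that were backed up
print(len(datafile.incremental_backup())) # nothing changed since, so no blocks
```

In this model a direct load that bypasses redo would still pass through `write`, which mirrors the point that the changed blocks are captured by the incremental backup even though they are absent from the archived logs.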
Best Practice: Implement Block Change Tracking functionality and make an incremental backup after
a direct load that leaves objects unrecoverable due to nologging operations.
SIZING THE BLOCK CHANGE TRACKING FILE
The size of the block change tracking file is proportional to:
Database size in bytes: The block change tracking file contains data representing data file blocks in the
database. The data is approximately 1/250000 of the total size of the database.
The number of enabled threads. All Real Application Cluster (RAC) instances have access to the same block
change tracking file; however, the instances update different areas of the tracking file without any locking or
inter-node block swapping. You enable block change tracking for the entire database and not for individual
instances.
Changed Block Metadata: The block change tracking file keeps a record of all changes between previous
backups, in addition to the modifications since the last backup. The tracking file retains the change history for
a maximum of eight backups. If the tracking file contains the change history for eight backups then the Oracle
database overwrites the oldest change history information.
As an example, a 500 GB database with only one thread and eight backups kept in the RMAN
repository will require a block change tracking file of 20 MB:
((Threads * 2) + number of old backups) * (database size in bytes) / 250000 = 20 MB
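The sizing arithmetic can be checked with a few lines of code. This is a minimal sketch of the formula as stated in this paper; the function name is hypothetical, and it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), which is the interpretation that reproduces the 20 MB figure exactly.

```python
GB = 10**9  # decimal gigabyte (assumption; reproduces the paper's 20 MB result)
MB = 10**6  # decimal megabyte


def bct_file_size_bytes(threads, old_backups, database_size_bytes):
    """((threads * 2) + old_backups) * database_size_bytes / 250000"""
    return ((threads * 2) + old_backups) * database_size_bytes // 250_000


# The example from the text: 500 GB database, one thread, eight backups kept.
size = bct_file_size_bytes(threads=1, old_backups=8, database_size_bytes=500 * GB)
print(size // MB, "MB")  # prints "20 MB"
```

Because the thread count is doubled in the formula, adding a second enabled thread to the same database raises the estimate from 20 MB to 24 MB.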
THE INCREMENTAL APPROACH
A typical backup and recovery strategy using this approach is to back up the data warehouse every weekend, and
then take incremental backups of the data warehouse every night following the completion of the ETL process.
Note that incremental backups, like conventional backups, must not be run concurrently with nologging
operations. In order to recover the data warehouse, the database backup would be restored, and then each night's
incremental backups would be re-applied. Although the nologging operations were not captured in the archivelogs,
the data from the nologging operations is present in the incremental backups. Moreover, unlike the previous
approach, this backup and recovery strategy can be completely managed using RMAN.
The "replay ETL" approach and the "incremental backup" approach are both recommended solutions for efficiently
and safely backing up and recovering a database whose workload consists of many nologging operations.
The most important consideration is that your backup and recovery strategy must take these nologging operations
into account.
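The restore sequence for the incremental approach (restore the weekend backup, then re-apply each night's incremental) can be sketched as follows. In practice RMAN manages this sequence itself; the function `restore_plan`, the tuple layout, and the dates below are hypothetical and serve only to make the ordering explicit.

```python
from datetime import date


def restore_plan(backups, target):
    """Return the backups to apply, in order, to recover up to `target`.

    backups -- list of (backup_date, kind) tuples, kind is "full" or "incremental"
    target  -- date of the last backup we want to recover through
    """
    eligible = sorted(b for b in backups if b[0] <= target)
    fulls = [b for b in eligible if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target date")
    base = fulls[-1]  # most recent weekend backup
    # Every nightly incremental taken after the base backup is re-applied.
    incrementals = [b for b in eligible
                    if b[1] == "incremental" and b[0] > base[0]]
    return [base] + incrementals


# Hypothetical catalog: weekend full backups plus nightly incrementals.
catalog = [
    (date(2024, 1, 7), "full"),
    (date(2024, 1, 8), "incremental"),
    (date(2024, 1, 9), "incremental"),
    (date(2024, 1, 10), "incremental"),
    (date(2024, 1, 14), "full"),
]

for backup_date, kind in restore_plan(catalog, target=date(2024, 1, 10)):
    print(backup_date.isoformat(), kind)
```

With weekly full backups the plan never contains more than one full backup and a handful of incrementals, which is what keeps the recovery both bounded and fully automated.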
BEST PRACTICE #5: NOT ALL TABLESPACES ARE CREATED EQUAL
DBAs are not the founding fathers of a new country. Not all of the tablespaces in a data warehouse are equally
significant from a backup and recovery perspective. DBAs can leverage this information to devise more efficient
backup and recovery strategies when necessary. The basic granularity of backup and recovery is a tablespace, so
different tablespaces can potentially have different backup and recovery strategies. On the most basic level,
temporary tablespaces never need to be backed up (a rule which RMAN enforces).
Moreover, in some data warehouses, there may be tablespaces which are not explicit temporary tablespaces but
are essentially functioning as temporary tablespaces, as they are dedicated to "scratch" space for end-users to store
temporary tables and incremental results. Depending upon the business requirements, these tablespaces may not
need to be backed up and restored; instead, in the case of a loss of these tablespaces, the end-users would recreate
their own data objects.
In many data warehouses, some data is more important than other data. For example, the sales data in a data
warehouse may be crucial and in a recovery situation this data must be online as soon as possible. But, in the same
data warehouse, a table storing clickstream data from the corporate website may be much less mission-critical. The
business may tolerate this data being offline for a few days or may even be able to accommodate the loss of
several days of clickstream data in the event of a loss of database files. In this scenario, the tablespaces containing
sales data must be backed up often, while the tablespaces containing clickstream data need only to be backed up
once every week or two.
While the simplest backup and recovery scenario is to treat every tablespace in the database the same, Oracle
provides the flexibility for a DBA to devise a backup and recovery scenario for each tablespace as needed.
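A per-tablespace strategy of this kind can be written down as a simple policy table. The sketch below is illustrative only: the tablespace names, the intervals, and the function `tablespaces_due` are hypothetical, and the actual backups would still be taken with RMAN.

```python
from datetime import date

# Hypothetical policy: days between backups, or None for "never back up".
POLICY_DAYS = {
    "SALES": 1,          # mission-critical, back up often
    "CLICKSTREAM": 14,   # tolerable loss, back up every two weeks
    "SCRATCH": None,     # end-users recreate their own objects
    "TEMP": None,        # temporary tablespaces are never backed up
}


def tablespaces_due(last_backup, today, policy=POLICY_DAYS):
    """Return the tablespaces whose backup interval has elapsed."""
    due = []
    for name, interval in policy.items():
        if interval is None:
            continue  # excluded from backups by policy
        last = last_backup.get(name)
        if last is None or (today - last).days >= interval:
            due.append(name)
    return sorted(due)


last_backup = {"SALES": date(2024, 1, 9), "CLICKSTREAM": date(2024, 1, 3)}
print(tablespaces_due(last_backup, today=date(2024, 1, 10)))
```

Keeping the policy in one table makes the business decision explicit: anyone reviewing it can see which data the business has agreed to lose or rebuild, and which data must always be recoverable.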
CONCLUSION
Backup and recovery is one of the most crucial and important jobs for a DBA to protect their business's assets:
its data. When data is not available, companies lose credibility, money, and possibly the whole business. Data
warehouses are unique in that the data may come from a myriad of resources and it is transformed before finally
being inserted into the database, but mostly because they can be very large. Managing the recovery of a large data
warehouse can be a daunting task, and traditional OLTP backup and recovery strategies may not meet the needs of
a data warehouse.
Understanding the characteristics of a data warehouse and how it differs from OLTP systems is the first step
in implementing an efficient recovery strategy. Recovery-time requirements for a data warehouse are less stringent, and
recovery can take several days. Integrating RMAN into your backup and recovery strategy reduces the complexity of
protecting your data since RMAN knows what needs to be backed up. Traditional recovery of data from a backup may not
be required for 25% to 50% of the data warehouse since it can be recreated using ETL processes and methods. Implementing
operational best practices for efficient recovery begins with a backup.