Sunteți pe pagina 1din 70

Oracle Data Archiving

Taming the Beast

Dave Moore
Neon Enterprise Software

INTELLIGENCE. INNOVATION. INTEGRITY

Agenda
Archiving Defined
Requirements and Solutions
Oracle Archiving Strategies
Oracle Row Removal Options
Oracle Post Archive Operations

INTELLIGENCE. INNOVATION. INTEGRITY

Dave
Oracle ACE
Using Oracle since 1991
Product Author at Neon Enterprise Software
Creator of OracleUtilities.com
Author of Oracle Utilities from Rampant
Tech Press
Core competencies include performance,
utilities and data management
INTELLIGENCE. INNOVATION. INTEGRITY

Database Archiving
Database Archiving:
The process of removing selected data records from
operational databases that are not expected to be referenced
again and storing them in an archive data store where
they can be retrieved if needed.

Purge
INTELLIGENCE. INNOVATION. INTEGRITY

Trends Impacting Archive Needs


Data Retention Issues:

Amount of Data

Volume of data

m
o
C

ia
l
p

ce

ot
r
P

ti
c
e

Length of retention
requirement

on

Time Required

Varied types of data


Security issues

30+ Yrs

INTELLIGENCE. INNOVATION. INTEGRITY

Archiving All Types of Data

Paper
Blueprints
Forms
Claims

Word
Excel
PDF
XML

IMS
DB2
ORACLE
SYBASE
SQL Server
IDMS

VSAM
Programs
UNIX Files

Outlook
Lotus Notes
Attachments

Sound
Pictures
Video

INTELLIGENCE. INNOVATION. INTEGRITY

Data Archiving and ILM

Create

Operational

Needed for
completing
business
transactions

Reference

Archive

Needed for
reporting
or expected
queries

Needed for
compliance
and business
protection

Discard

Mandatory Retention Period

INTELLIGENCE. INNOVATION. INTEGRITY

Some Sample Regulations


Impacting Data Retention

INTELLIGENCE. INNOVATION. INTEGRITY

What Does It All Mean?


Enterprises must recognize that there is a
business value in organizing their information
and data.
Organizations that fail to respond run the risk
of seeing more of their cases decided on
questions of process rather than merit.
(Gartner, 20-April-2007, Research Note G00148170:
Cost of E-Discovery Threatens to Skew Justice System)

INTELLIGENCE. INNOVATION. INTEGRITY

Operational Efficiency
Database Archiving can be undertaken to
improve operational efficiency
Large volumes of data can interfere with
production operations
efficiency of transactions
efficiency of utilities: BACKUP/RESTORE, REORG, etc.
Storage
Gartner: databases copied an average of 6 times!

INTELLIGENCE. INNOVATION. INTEGRITY

What Solutions Are Out There?


Keep Data in Operational Database

Problems with authenticity of large amounts of


data over long retention times

Store Data in UNLOAD files (or backups)

Problems with schema change and reading


archived data;
using backups poses even more serious problems

Move Data to a Parallel Reference Database

Combines problems of the previous two

Move Data to a Database Archive

INTELLIGENCE. INNOVATION. INTEGRITY

Components of a
Database Archiving Solution
Production
Database
Data
Extract

Captured Structure
Archive Policies
Data Retention

Archive Data Store


and Retrieve

Recall
Database

Data
Recall

Archive Store
Archive Data
Query Access

Metadata
Policies
History

Data &
Metadata

Archive
Administration

INTELLIGENCE. INNOVATION. INTEGRITY

Archiving Requirements
Policy based archiving: logical selection
Keep data for very long periods of time
Store very large amounts of data in archive
Maintain Archives for ever changing operational systems
Become independent from Applications/DBMS/Systems
Protect authenticity of data
Access data when needed; as needed
Discard data after retention period automatically

INTELLIGENCE. INNOVATION. INTEGRITY

Policy based archiving


Why :

Business objects are archived, not files

Rules for when something is ready can be complex

Data ready to be archived is distributed over database

Implications:

User must provide policies for when something is to


be archived

How:

Full metadata description of data

Flexible specification of policy : WHERE clause

INTELLIGENCE. INNOVATION. INTEGRITY

For Example
Parts Master is
the parent table
to all other tables

STORAGE INFO

Part Number
Bin Number
Qty on Hand
Qty on Order
Qty Backorder

PARTS MASTER
Part Number
Type
Description
Unit Type
Cost
Price
Substitute Parts

ORDER INFO

Part Number
PO Number
Vendor ID
Quantity Ordered
Unit Cost
Date Ordered
Date Received

SUMMARY BY
QUARTER

Part Number
Year
Q1 Disbursed
Q2 Disbursed
Q3 Disbursed
Q4 Disbursed

DISBURSEMENT

Part Number
Dept. ID
CHIT ID
Qty Disbursed
Date Disbursed

INTELLIGENCE. INNOVATION. INTEGRITY

Keep Data for a Long Time


Why: retention requirements in decades
Implications:

Archive will outlive applications/DBMS/systems that


generated them
Archive will outlive people who designed and managed
operational systems
Archive will outlive media we store it on

How:

Unique data store


Application/DBMS/system independence
Metadata independence
Continuous management of storage
Continuous management of archive content

INTELLIGENCE. INNOVATION. INTEGRITY

Maintain Archive for


Changing Operational Systems
Why :

Metadata changes frequently


Applications are re-engineered periodically

Change DBMS platform


Change System platform
Replace with new application
Consolidate after merger or acquisition

Implications:

Archive must support multiple variations of an application


Archive must deal with metadata changes

How:

Manage applications as major archive streams having


multiple minor streams with metadata differences
Achieve independence from operating environment
INTELLIGENCE. INNOVATION. INTEGRITY

Achieve Metadata Independence


Why :

Operational metadata is inadequate


Operational metadata changes
Operational systems keep only the current metadata
Data in archive often does not mirror data in operational
structures

Implications:

Archive must encapsulate metadata


Metadata must be improved

How:

Metadata Capture, Validate, Enhance capabilities


Store structure that encapsulates with data
Keeps multiple versions of metadata
INTELLIGENCE. INNOVATION. INTEGRITY

Protect Authenticity of Data


Why :

Potential use in lawsuits/ investigations


Potential use in business analysis

Implications:

Protect from unwanted changes


Show original input
Cannot be managed in operational environment

How:

SQL Access that does not support I/U/D


Do not modify archive data on metadata changes
Encryption as stored
Checksum for detection of sabotage
Limit access to functions
Audit use of functions
Maintain offsite backup copies for restore if sabotaged
INTELLIGENCE. INNOVATION. INTEGRITY

Access Data Directly From Archive


Why :

Cannot depend on application environment

Implications:

Full access capability within archive system

How:

Industry standard interface (e.g. JDBC)


LOAD format output for
For load into a database
May be different from source database

Requires full and accurate metadata


Ability to review metadata
Ability to function across metadata changes
INTELLIGENCE. INNOVATION. INTEGRITY

Discard Function
Why :

Legal exposure for data kept too long

Implications:

Data cannot be kept in archive beyond retention period

Must be removed with no exposure to forensic software

How:

Policy based discard

System level function

Tightly controlled and audited

True zero out capability

Discard from backups as well

INTELLIGENCE. INNOVATION. INTEGRITY

Database or Archive?
Keep in DB

Keep in Archive

Performance
Space
Compliance

INTELLIGENCE. INNOVATION. INTEGRITY

Based on Data Availability


Keep in DB

Keep in Archive

Purge

Must be Available to App

Must be Available
Must Be Secure
Not Needed

INTELLIGENCE. INNOVATION. INTEGRITY

Oracle Archiving Strategies


Designed Up Front (Yeah, right)
Determined by Application Owner
Implemented by ____________
Utilize Oracle Features

INTELLIGENCE. INNOVATION. INTEGRITY

Finding Large Tables


DBA_SEGMENTS (bytes)
DBA_TABLES (num_rows)
or based on I/O

INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows
Self Managing
Mostly based on DATE
Utilize DBMS Features

Partitioning

Transportable Tablespaces

Exchange Partition
Set tablespace read only
Expdp
Copy export file and data file

INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows via Partitioning


P1

Data Profile

Storage Profile

Probably
Never
Accessed

Cheap as
you can get

Read Only /
Compressed

P47
Rarely
Accessed

Heavily
Accessed

Not so fast
or expensive

Fast,
expensive

Read Only

Read / Write

INTELLIGENCE. INNOVATION. INTEGRITY

Why not use transportable tablespaces


or Oracle exports for data retention?

INTELLIGENCE. INNOVATION. INTEGRITY

The Problem with Oracle Files


Transportable Tablespaces
Exports
Backups
Oracle
Export Files &
Datafiles

Import
Trans Tsp

Version 16Z

Year 2007
Year 2030

Not a good method for LT Data Retention


INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old ways)


Range Partitioning
Data is distributed based on partition key range
of values usually a date.
Good When: Data is date-based.

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways)


Hash Partitioning
Uses hash algorithm to create equally sized
buckets of data.
Good When: No natural partition key and
desire I/O balancing (hot spots).

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways)


List Partitioning
Data is distributed based on LIST of values in
partition key.
Good When: Have short list of values (States,
Regions, Account Types)

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways 11G)


Interval Partitioning
Initial Partition is created manually, the rest are
automatically created as new data arrives.
Good When: Need a rolling window!

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways 11G)


REF partitioning
Related Tables benefit from same partitioning strategy,
whether column exists in children or not!
Good When: Desire related data to be partitioned in
the same manner.

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways 11G)


Virtual Column Partitioning
Partition key may be based on virtual column
Good When: Virtual column is required for
partition key.

INTELLIGENCE. INNOVATION. INTEGRITY

Rows Gotta Go

INTELLIGENCE. INNOVATION. INTEGRITY

Row Removal Options


SQL DELETE
CTAS / DROP / RENAME
TRUNCATE
Row Marking

INTELLIGENCE. INNOVATION. INTEGRITY

SQL DELETE
Good for small number of rows
RI handled automatically
Oracle was born to DELETE, better than any
PL/SQL that you write.
Issue with Un-indexed Foreign Keys

INTELLIGENCE. INNOVATION. INTEGRITY

DELETE Optimization
Work in batches, committing (only when
programmatically DELETING)
Use parallel DML (Partitioned tables only)
Drop Indexes before (if possible)
Index FK columns

INTELLIGENCE. INNOVATION. INTEGRITY

CTAS
Works well for PURGE, not archive
Perfect when you want to keep low percentage of
rows in the table
Doesnt handle RI no DELETE was issued.
Process
1.

Create table with rows you want to keep

2.

Drop old table

3.

Rename table

4.

Recreate indexes

create table new_table unrecoverable as select * from old_table where ...

INTELLIGENCE. INNOVATION. INTEGRITY

TRUNCATE
Congratulations if your application lends
itself to TRUNCATE without losing new data
What about RI?
May truncate or drop individual partitions

INTELLIGENCE. INNOVATION. INTEGRITY

DROP
DROP PARTITION
What would you do before you drop it?
Exchange partition with table
Transportable tablespace.

INTELLIGENCE. INNOVATION. INTEGRITY

Things to Remember
Benchmark the best way for you
Benchmark against real data if possible
Use parallel DML

INTELLIGENCE. INNOVATION. INTEGRITY

Design Summary
Create an architecture that lends itself to
aging, archiving, deleting
This architecture should compensate for
business requirements

For instance, customer orders not accessible after


6 months or

top query performance needed for all ACTIVE


accounts etc

Implement it THE EASY PART


INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

I have successfully deleted 10


billion rows from the table.
HoooAhhhh! Performance will be
great, space will be available, and
I will get credit for optimizing our
data warehouse application, saving
the company billions of dollars

INTELLIGENCE. INNOVATION. INTEGRITY

2 Days Later

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

Hmmmmm. It looks like


- Queries are not any faster . . .
- The Select count(*) took the same amount of time . . .
- Space was not freed in Oracle (DBA_FREE_SPACE) . . .
- Space was not freed in the operating system . . .
WHY NOT ????? Where are the benefits ???

INTELLIGENCE. INNOVATION. INTEGRITY

From Swiss to Provolone

After DELETE

After Maintenance

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges


Statistics are not fresh
High Water Marks are very high
Space has not been freed within
Oracle (if thats what you want)
Space has not been freed to the OS

INTELLIGENCE. INNOVATION. INTEGRITY

Refresh Statistics
Help the optimizer, easy enough
dbms_stats provides many options

INTELLIGENCE. INNOVATION. INTEGRITY

Automatic Stats
Recommended by Oracle
Calls DBMS_STATS_JOB_PROC
Enabled via:
Begin
dbms_auto_task_admin.enable(
client_name => auto optimizer stats collection,
operation => NULL,
window_name => NULL);
END;
/

INTELLIGENCE. INNOVATION. INTEGRITY

When do you go manual ?


High transaction DELETEs or TRUNCATEs
Bulk loads which add more than 10% of table
size

So theres our answer go manual.

INTELLIGENCE. INNOVATION. INTEGRITY

How do we Gather Them?


NOT the Analyze Command
Instead DBMS_STATS package
exec dbms_stats.gather_table_stats(ownname => 'BDB',
tabname => 'MASTER', estimate_percent =>
dbms_stats.auto_sample_size);

INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark

INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark

INTELLIGENCE. INNOVATION. INTEGRITY

Reset High Water Mark (HWM)


DROP or TRUNCATE
Multiple OTHER ways to do this depending on version
In v9 alter table move tablespace [tsp name];

Row movement must be enabled

Tablespace must be a LMT

Can move into same tablespace

Will occupy 2X space temporarily

Must then rebuild indexes

In v10 alter table <table_name> shrink space;


INTELLIGENCE. INNOVATION. INTEGRITY

Freeing Allocated Space

INTELLIGENCE. INNOVATION. INTEGRITY

Create table, check space


SQL> create table space_example as select * from dba_source;
Table created.
SQL> select count(*) from space_example;
COUNT(*)
---------296463
SQL> exec dbms_space.unused_space(DAVE', 'SPACE_EXAMPLE');
Total blocks: 6328
Unused blocks: 1
Unused bytes: 8192
Last Used Block: 55
Last Used Block ID: 10377
Last Used Ext File ID: 4
INTELLIGENCE. INNOVATION. INTEGRITY

Check datafile space

Size Current
Poss.
FILE_NAME
Poss.
Size Savings
-------------------------------------------------- -------- -------- -------/export/home/ora102/oradata/ora102/qasb001.dbf
29
46
17
/export/home/ora102/oradata/ora102/example01.dbf
69
100
31
/export/home/ora102/oradata/ora102/qasb002.dbf
41
41
0
/export/home/ora102/oradata/ora102/system01.dbf
493
500
7
/export/home/ora102/oradata/ora102/sysaux01.dbf
430
430
0
/export/home/ora102/oradata/ora102/undotbs01.dbf
91
175
84
/export/home/ora102/oradata/ora102/users01.dbf
44
83
39
/export/home/ora102/oradata/ora102/test.dbf
51
70
19

INTELLIGENCE. INNOVATION. INTEGRITY

Delete rows, check space


SQL> delete from space_example;
296463 rows deleted.
SQL> commit;
SQL> exec dbms_space.unused_space(DAVE', 'SPACE_EXAMPLE');
Total blocks: 6328
Unused blocks: 1
Unused bytes: 8192
Last Used Block: 55
Last Used Block ID: 10377
Last Used Ext File ID: 4

Nothing Changed !
INTELLIGENCE. INNOVATION. INTEGRITY

Shrink it, check space


SQL> alter table space_example enable row movement;
SQL> alter table space_example shrink space;
SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE');
Total blocks: 8
Unused blocks: 4
Unused bytes: 32768
Last Used Block: 4
Last Used Block ID: 5129
Last Used Ext File ID: 4

Space Freed From Table, but still in Oracle


INTELLIGENCE. INNOVATION. INTEGRITY

Check space again

Size Current
Poss.
FILE_NAME
Poss.
Size Savings
-------------------------------------------------- -------- -------- -------/export/home/ora102/oradata/ora102/qasb001.dbf
29
46
17
/export/home/ora102/oradata/ora102/example01.dbf
69
100
31
/export/home/ora102/oradata/ora102/qasb002.dbf
41
41
0
/export/home/ora102/oradata/ora102/system01.dbf
493
500
7
/export/home/ora102/oradata/ora102/sysaux01.dbf
430
430
0
/export/home/ora102/oradata/ora102/undotbs01.dbf
171
175
4
/export/home/ora102/oradata/ora102/users01.dbf
44
83
39
/export/home/ora102/oradata/ora102/test.dbf
1
70
69

This datafile should be resized to save 69 MB


SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;

INTELLIGENCE. INNOVATION. INTEGRITY

Free the Space


Space is still reserved for future inserts and
updates, just not freed back to the OS
Space will not be automatically freed
confirm by checking DBA_FREE_SPACE
Ways to set it free

drop

truncate

alter table move

alter table shrink space


INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys Example


1 Million Rows

PARENT
COL1

ON DELETE CASCADE

CHILD

COL1
COL1_PARENT

1 Million Rows

SQL> DELETE FROM PARENT WHERE COL1 < 1000;


Fky.sql

INTELLIGENCE. INNOVATION. INTEGRITY

Before Index
delete from parent
where
col1 < 1000
call
count
------- -----Parse
1
Execute
1
Fetch
0
------- -----total
2

cpu
elapsed
disk
query
current
-------- ---------- ---------- ---------- ---------0.01
0.08
2
27
0
0.90
0.80
4
2208799
6062
0.00
0.00
0
0
0
-------- ---------- ---------- ---------- ---------0.91
0.88
6
2208826
6062

rows
---------0
999
0
---------999

delete from "DAVE"."CHILD"


where
"COL1_PARENT" = :1
call
count
------- -----Parse
1
Execute
999
Fetch
0
------- -----total
1000

cpu
elapsed
disk
query
current
-------- ---------- ---------- ---------- ---------0.00
0.00
0
0
0
285.94
293.11
1543900
2208789
1029
0.00
0.00
0
0
0
-------- ---------- ---------- ---------- ---------285.94
293.11
1543900
2208789
1029

rows
---------0
999
0
---------999

INTELLIGENCE. INNOVATION. INTEGRITY

SQL> create index prnt_ndx on child(col1_parent);

delete from parent


where
col1 < 1000
call
count
------- -----Parse
1
Execute
1
Fetch
0
------- -----total
2

cpu
elapsed
disk
query
current
-------- ---------- ---------- ---------- ---------0.00
0.00
0
0
0
0.53
0.47
7
13
7053
0.00
0.00
0
0
0
-------- ---------- ---------- ---------- ---------0.53
0.47
7
13
7053

rows
---------0
999
0
---------999

delete from "DAVE"."CHILD"


where
"COL1_PARENT" = :1
call
count
------- -----Parse
1
Execute
999
Fetch
0
------- -----total
1000

cpu
elapsed
disk
query
current
-------- ---------- ---------- ---------- ---------0.00
0.00
0
0
0
0.42
0.46
2
3002
4058
0.00
0.00
0
0
0
-------- ---------- ---------- ---------- ---------0.42
0.46
2
3002
4058

rows
---------0
999
0
---------999

INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys


Problem is not limited to DELETE statements
Search database for unindexed FK columns
Script is on asktom

Search for unindex.sql

INTELLIGENCE. INNOVATION. INTEGRITY

Summary Points
Create sound Archiving strategy based on Oracle
technical features as well as business and/or
legal requirements
Leverage partitioning
Move partitions to cheap disk when appropriate
Make partitions read only and compressed
Remove data via DROP or TRUNCATE if possible
If SQL DELETE, make sure to perform
maintenance operations
Consider 3rd party solutions
INTELLIGENCE. INNOVATION. INTEGRITY

Well done is better than well said


Ben Franklin
Questions?

INTELLIGENCE. INNOVATION. INTEGRITY

S-ar putea să vă placă și