Sunteți pe pagina 1din 36

Scaling To Infinity

Partitioning Data Warehouses on


Oracle Database
Session UGF-3587
Sunday 28-Sep 2014
Tim Gorman - www.Delphix.com
Oracle Open World 2014

Agenda

Stating the problem


Brief overview of star schemas
Brief overview of star transformations
The virtuous cycle
The death spiral
EXCHANGE PARTITION load technique in detail
More EXCHANGE PARTITION ideas

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

A short story
Background: DBA support for about 70 data warehouse databases at a
large telecommunications company
o Emails about out of space in TEMP tablespace scraped out of alert.log
file in a QA/TEST database
Offshore DBAs already reacting by killing sessions

o QA/TEST database already has 56G temp, and PROD has almost 300G
temp
o Reviewing AWR reports reveals a parallel aggregated query against a view
comprising a star schema between a fact table and 33 dimension tables
Further analysis reveals that none of the dimension-key columns on the fact
table supported by bitmap indexes
Thus, the star join consists of a partition-pruned FULL table scan on the fact,
followed by 33 HASH joins in parallel to the dimensions

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

A short story
When asked about the removed bitmap indexes
o An Informatica ETL developer commented
I dont know why all those indexes were removed

It was because the bitmap indexes made ETL data loading into the fact
table utterly impossible
o Project manager confirmed this, adding
Once the indexes were dropped, everything worked great

o DW architect added
This setup has been running in PROD for 10 months!
o In other words, everything is OK! Stop causing trouble!

o When asked how the end-users felt about performance


not good at all bring on Netezza, Exadata, etc...

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

Why Star Schemas?


BI analysts just want a big spreadsheet
o Lots and lots of attribute and measure columns
Attributes characterize the data
Measures are usually additive and numeric

Dimensional data model is really just that spreadsheet


o Normalized to recursive depth of one
More sophisticated models might normalize further (snowflake)
Normalized entities are dimension tables
o Columns are primary key and attribute columns
Original spreadsheet is the fact table
o Columns are foreign-keys to dimensions and measures

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

Why Star Schemas?


Transactional
Operational
Entity-Relational
Modeling

Dimensional
Modeling

Customers

Suppliers

Suppliers Dim Products Dim

Orders

Products

Order Facts

Order Lines

www.Delphix.com

Customers Dim

Oracle Open World 2014

Time Dim

28-Sep 2014

Why Star Transformations?


Star transformation
compared to other join
methods (NL, SM, HA):

Dim

Table 1

Dim

Table 2

Dim

Table 3

Dim

Table 4

www.Delphix.com

Fact

table

Filter result set in one of


the dimension tables
Join from that dimension
table to the fact along a
low-cardinality dimension
key
Join back from fact to
other dimensions using
dimension PK
o Filtering rows retrieved
along the way

Oracle Open World 2014

Wasteful and inefficient


28-Sep 2014

Why Star Transformations?


Dim

Table 1

Dim

Table 2

Dim

Table 3

Dim

Table 4

www.Delphix.com

Star transformation:
Filter on query set in
each dimension
Merge result set
from all dimensions
Join to the fact from
merged result set,
using BITMAP
MERGE index scan

Oracle Open World 2014

28-Sep 2014

Why Star Transformations?


Point: Single-column bitmap indexes on dimension-key
columns are required for star transformation
Counter-point: Bitmap indexes become impossible to
load/maintain when data volume increases past dozens of
Gb
Catch-22? Does this mean that Oracle cannot handle
large data warehouses?

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

The Virtuous Cycle


Non-volatile time-variant data implies
o Data warehouses are INSERT only

Insert-only data warehouses implies


o Tables and indexes range-partitioned by a DATE column

Tables range-partitioned by DATE enables


o Data loading using EXCHANGE PARTITION load technique
o Incremental statistics gathering and summarization

Data loading using EXCHANGE PARTITION enables


o
o
o
o
o

Direct-path (a.k.a. append) inserts


Partitions organized into time-variant tablespaces
Data purging using DROP/TRUNCATE PARTITION instead of DELETE
Bitmap indexes and bitmap-join indexes
Elimination of ETL load window and 24x7 availability for queries

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

10

The Virtuous Cycle


Direct-path (a.k.a. append) inserts enable
o
o
o
o

Load more data, faster, more efficiently


Optional NOLOGGING on inserts
Basic table compression (9i and above)
Eliminates contention in Oracle Buffer Cache during data loading

Optional NOLOGGING inserts enable


o Option to generate less redo during data loads

Basic table compression enables


o Less space consumed for tables and indexes
o Fewer I/O operations during queries

Partitions organized into time-variant tablespaces enable


o READ ONLY tablespaces for older, less-volatile data

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

11

The Virtuous Cycle


READ ONLY tablespaces for older less-volatile data enables
o Tiered storage
o Backup efficiencies

Data purging using DROP/TRUNCATE PARTITION enables


o Faster more efficient data purging than using DELETE statements

Bitmap indices enable


o Star transformations

Star transformations enable


o Optimal query-execution plan for dimensional data models
o Bitmap-join indexes

Bitmap-join indexes enable


o Further optimization of star transformations

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

12

The Death Spiral


ETL using conventional-path INSERT, UPDATE, and DELETE operations
Conventional-path operations are trouble with:
o High-volume data loads
o Contention in Shared Pool, Buffer Cache, global structures
Mixing of queries and loads simultaneously on table and indexes
Periodic rebuilds/reorgs of tables if deletions occur
Full redo and undo generation for all inserts, updates, and deletes

o Bitmap indexes and bitmap-join indexes


Modifying bitmap indexes is slow
Prone to locking issues in concurrency situations

ETL will dominate the workload in the database


o Queries will consist mainly of dumps or extracts to downstream systems
o Query performance will be abysmal and worsening

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

13

The Death Spiral


Without partitioning and insert-only (insert mostly?)
o Query performance worsens as tables/indexes grow larger
o ETL performance worsens as
Loads must be performed into live tables
o Users must be locked out during load cycle
o In-flight queries must be killed during load cycle
o Bitmap indexes must be dropped/rebuilt during load cycle
o Entire tables must be re-analyzed during load cycle

o Entire database must be backed up frequently


No change for setting tablespaces to READ ONLY
Data cannot be right-sized to storage options according to IOPS

Everything just gets harder and harder to do


o and that stupid Oracle software is to blame
o Bring on Exadata or Netezza or <expensive-flavor-of-the-month>
www.Delphix.com

Oracle Open World 2014

28-Sep 2014

14

Exchange Partition
The technique of bulk-loading new data into a temporary swap table,
which is then published using the EXCHANGE PARTITION operation,
should be the default load technique for all large tables in a data
warehouse
o fact tables
o slowly-changing dimensions

Assumptions for this upcoming example:


o Composite partitioned fact table named TXN
Range partitioned on DATE column TXN_DATE
Hash sub-partitioned on NUMBER column ACCTKEY
Data to be loaded into partition P20140225 on TXN

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

15

Exchange Partition, step 1


Hash-partitioned

table TXN_SWAP

Composite-partitioned

table TXN

CREATE TABLE
TXN_SWAP
AS SELECT
FROM TXN
PARTITION
(P20140225)

22-Feb

2014

www.Delphix.com

23-Feb

2014

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

16

Exchange Partition, step 2


Hash-partitioned

table TXN_SWAP

Composite-partitioned

table TXN

Load

Load

Load

22-Feb

2014

www.Delphix.com

23-Feb

2014

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

17

Exchange Partition, step 3


Hash-partitioned

table TXN_SWAP

Composite-partitioned

table TXN

CREATE
INDEX

CREATE
INDEX

CREATE
INDEX
22-Feb

2014

www.Delphix.com

23-Feb

2014

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

18

Exchange Partition, step 4


Hash-partitioned

table TXN_SWAP

Composite-partitioned

table TXN

EXCHANGE
PARTITION

22-Feb

2014

www.Delphix.com

23-Feb

2014

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

19

Exchange Partition, step 5


Hash-partitioned

table TXN_SWAP

Composite-partitioned

table TXN

Gather
partition
statistics
for table,
columns,
indexes

22-Feb

2014

www.Delphix.com

23-Feb

2014

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

20

Exchange Partition
1. Create temporary table TXN_SWAP as a hash-partitioned table
2. Perform parallel, direct-path load of new data into TXN_SWAP
Perform any other DML needed to prepare data in TXN_SWAP
for publishing into the TXN table
3. Create indexes on TXN_SWAP corresponding to the local indexes
on TXN
4. Exchange partition to publish new data to TXN
alter table TXN
exchange partition P20140225 with table TXN_SWAP
including indexes update global indexes;

5. Gather CBO statistics on table TXN partition P20140225


DBMS_STATS preference INCREMENTAL will gather partitionlevel statistics as well as updating global-level statistics

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

21

Exchange Partition
It is a good idea to encapsulate this logic inside PL/SQL packaged- or
stored-procedures:
SQL>
2
SQL>
SQL>
2
3
4
SQL>
SQL>

execute exchpart.prepare(TXN,_SWAP, -!
25-FEB-2014);!
alter session enable parallel dml;!
insert /*+ append parallel(n,4) */!
into txn_swap n!
select /*+ full(x) parallel(x,4) */ *!
from
stage_txn_fact x;!
commit;!
execute exchpart.finish(TXN,_SWAP);!

EXCHPART package (exchpart.sql) posted at http://www.EvDBT.com/scripts!

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

22

The dribble effect


In real-life, data loading is often much messier
than shown here
o For example, for a daily load frequency, data to be
loaded during the 25-Feb load cycle might consist of:

950,000 rows for the 25-Feb partition


4,500 rows for the 24-Feb partition
400 rows for the 23-Feb partition
700 rows for the 22-Feb partition
200 rows for the 21-Feb partition
100 rows for the 20-Feb partition
and 3 rows for the 07-Jan partition

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

23

The dribble effect


How can this be handled?
o One suggestion:

1.Use EXCHPART package to load the data for the 25-Feb and 24Feb partitions
2.Load the data to the remainder of the partitions by just
inserting (conventional-path) directly into the partitioned table

o Must determine a threshold when to use EXCHPART,


and when to simply insert rows
Data volume is the metric
Threshold value of N rows varies due to many factors
o Number of bitmap indexes on partitioned table?
o Using compression during load or not?
o Degree of parallelism?
o Are there any global indexes?
www.Delphix.com

Oracle Open World 2014

28-Sep 2014

24

The dribble effect


Example: Use EXCHANGE PARTITION when rows-to-be-loaded >
1000, else just use conventional INSERT
for d in (select trunc(txn_dt) dt, count(*) cnt from EXT_STAGE_TXN
group by trunc(txn_dt)) loop
-if d.cnt > 1000 then
-exchpart.prepare(TXN,_SWAP||to_char(d.dt,YYYYMMDD), d.dt);
insert /*+ append parallel(n,16) */ into TXN_20140224 n
select /*+ parallel(x,16) */ * from EXT_STAGE x
where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
exchpart.finish(TXN, TXN_||to_char(d.dt,YYYYMMDD));
exchpart.drop_indexes(TXN_||to_char(d.dt,YYYYMMDD));
insert /*+ append parallel(n,16) */ into TXN_20140224 n
select /*+ parallel(x,16) */ * from EXT_STAGE x
where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
-else
-insert into TXN
select * from ext_stage
where txn_dt >= d.dt and txn_dt < d.dt + 1;
-end if;
-end loop;
www.Delphix.com

Oracle Open World 2014

28-Sep 2014

25

Slowly-changing dimensions
Loading time-variant fact and dimension tables is not the
only load activity in most data warehouses
o Often, some tables contain current or point-in-time data
Example: type-1 dimension derived from type-2 dimension

With each load cycle loading new data into a new partition
of the type-2 dimension, the type-1 dimension needs to be
updated
o Instead of performing transactional MERGE (i.e. Update or Insert)
logic directly on the table
Rebuild the table into a temporary table, then swap it in using
EXCHANGE PARTITION

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

26

Merge logic to update SCDs


The MERGE statement to update the type-1 dimension
(CURR_ACCT_DIM) from the just-loaded partition of the
type-2 dimension (ACCT_DIM) could look like
merge into curr_acct_dim
using (select * from acct_dim
where eff_dt >= 25-FEB-2014
and
eff_dt < 26-FEB-2014)
when matched then update set ...
when not matched then insert ...;

Simple to write, but sloooowwwwwwwww


www.Delphix.com

Oracle Open World 2014

28-Sep 2014

27

ExchPart instead of MERGE


CURR_ACCT_DIM

(type-1 dimension)

ACCT_DIM

(type-2 dimension)

ACCT_DIM_SWAP

Truncated as
we begin

Data
current
as of 24Feb

22-Feb

2014

23-Feb

2014

www.Delphix.com

24-Feb

2014

25-Feb

2014

Just loaded 25-Feb data



Oracle Open World 2014

28-Sep 2014

28

ExchPart instead of MERGE


INSERT /*+ append parallel(t,8) */ INTO TMP_CURR_ACCOUNT_DIM T!
SELECT /*+ parallel(x,8) */ (list of columns)
FROM !(SELECT /*+ parallel(y,8) */
(list of columns),
ROW_NUMBER() over (PARTITION BY acctkey!
ORDER BY effdt desc) rn!
FROM (SELECT /*+ parallel(z1,8) */
(list of columns)
FROM CURR_ACCOUNT_DIM z1!
UNION ALL!
SELECT /*+ parallel(z2,8) */
(list of columns)
FROM ACCOUNT_DIM partition(P20140225) z2) y) x!
WHERE x.RN = 1;!
!
1. Inner-most query pulls changed data from type-2, merged with existing data from
type-1
2. Middle query ranks within ACCTKEY values, sorted by EFFDT DESC
3. Outer-most query selects only latest row for each ACCTKEY and passes to INSERT

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

29

ExchPart instead of MERGE


CURR_ACCT_DIM

(type-1 dimension)

ACCT_DIM

(type-2 dimension)

All rows
All rows

22-Feb

2014

23-Feb

2014

www.Delphix.com

24-Feb

2014

ACCT_DIM_SWAP

union
all

filter

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

30

ExchPart instead of MERGE


CURR_ACCT_DIM

(type-1 dimension)

ACCT_DIM

(type-2 dimension)

ACCT_DIM_SWAP

CREATE
INDEX

CREATE
INDEX

CREATE
INDEX
22-Feb

2014

23-Feb

2014

www.Delphix.com

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

31

ExchPart instead of MERGE


CURR_ACCT_DIM

(type-1 dimension)

ACCT_DIM

(type-2 dimension)

ACCT_DIM_SWAP

EXCHANGE
PARTITION

22-Feb

2014

23-Feb

2014

www.Delphix.com

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

32

ExchPart instead of MERGE


CURR_ACCT_DIM

(type-1 dimension)

ACCT_DIM

(type-2 dimension)

ACCT_DIM_SWAP

Gather
partition
statistics
for table,
columns,
indexes

22-Feb

2014

23-Feb

2014

www.Delphix.com

24-Feb

2014

25-Feb

2014

Oracle Open World 2014

28-Sep 2014

33

Summary
1. Data warehouses use star schemas
2. Star schemas are best queried using star
transformations
3. Star transformation requires bitmap indexes
o

Bitmap indexes or bitmap-join indexes

4. Large bitmap indexes become infeasible without


using the EXCHANGE PARTITION load technique
5. INSERT is always faster than UPDATE, DELETE, or
MERGE
Data loading using EXCHANGE PARTITION is the key to
unlock infinite scalability
www.Delphix.com

Oracle Open World 2014

28-Sep 2014

34

KScope15

http://www.KScope15.com
#ODTUG
#KScope15
Conference for EPM, APEX, ADF, BI, Oracle developers and DBAs

www.Delphix.com

Oracle Open World 2014

28-Sep 2014

35

Q&A

Session:
Email:
Twitter:
Blog:
o Papers:
o Scripts:
o Videos:
www.Delphix.com

UGF-3587
Tim.Gorman@Delphix.com
@TimothyJGorman
EvDBT.com
EvDBT.com/papers
EvDBT.com/scripts
EvDBT.com/videos

including this presentation!

Oracle Open World 2014

28-Sep 2014

36

S-ar putea să vă placă și