
Oracle GoldenGate Best Practices – Oracle GoldenGate for ETL Tools

An Oracle White Paper


April 2011


Introduction ....................................................................................... 3
Data Staging Methods ....................................................................... 4
Live Reporting Database Considerations ....................................... 4
Historical Database Considerations ............................................... 4
Flat ASCII File Considerations ....................................................... 5
Data Staging Configurations .............................................................. 6
Live Reporting Database ............................................................... 6
Live Reporting Database Sample Schema: ........................................... 6
Historical Database ....................................................................... 7
Historical Database Sample Schema: ..................................................... 7

Historical Database – Sample Stored Procedure Lookup: ...................... 8


Oracle GoldenGate Change Data Capture Process........................... 9
Capture Configuration ................................................................... 9
GGHEADER Record Information: .......................................................... 9

Sample Capture Parameter File: ............................................................ 9


Oracle GoldenGate Delivery Process .............................................. 10
Delivery Process – Live Reporting Database Configuration ......... 10
Sample Live Reporting Database Delivery Parameter File: ................. 10
Delivery Process – Historical Database Configuration ................. 10
Sample Historical Database Delivery Parameter File: ............................ 11
Delivery Process – Flat ASCII File Configuration ......................... 12
Typical Oracle GoldenGate Configuration for Flat File writing: ............ 12

Sample Flat File Delivery Parameter File: .............................................. 12


Conclusion ...................................................................................... 13

Introduction

This document outlines best practices for implementing an Oracle GoldenGate real-time data
integration solution to stage data for consumption by an ETL tool.

The three different approaches that will be discussed are:

 Live Reporting database - The basic assumption with this approach is that there is already a
Live Reporting database being maintained that can be modified to support the additional
requirements for feeding data into the ETL tool being used.

 Historical database - The basic assumption with this approach is that there is a need to have
a chronology of the database changes. This approach turns all database change operations
into inserts and creates a staging database specifically to feed data into the ETL tool being
used.

 Batch mode flat files - The basic assumption here is that the desired delivery medium is flat
ASCII-style files that are created at user-defined intervals to feed data into the ETL tool
being used.

The primary focus of this document is the recommended “Best Practices” configuration of
Oracle GoldenGate sending data to the ETL tool of choice. There is no reference to any
specific ETL tool or any special requirements to support specific tools. This is meant to be a
starting point and it is expected that changes will need to be made to support specific user
requirements and tools.


Data Staging Methods


This section goes into more detail about the things that need to be considered when determining which
method is best suited for staging data to feed into the ETL tool being used.

Live Reporting Database Considerations


When using a Live Reporting configuration (the assumption is that the database schema approximates the
operational database and is fully populated), certain guidelines need to be followed:
 Ensure that every table contains a column holding the transaction commit timestamp or
the application's last update timestamp.
 Ensure that an index is created on that additional column (transaction commit
timestamp or application last update timestamp).
 Determine the desired approach to processing delete operations. If the requirement is to
send the last values of a deleted row to the ETL tool for processing, then additional steps will be
needed to support "logical delete" processing. These steps will most likely include adding an
additional column for "operation type" or a "logical delete flag", and perhaps a view to hide the
deleted rows from the Live Reporting application. The specific business requirements will determine
which approach is best.
The periodic load using the ETL tool is driven off the timestamp column so that only the
incremental changes are applied.
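To make the timestamp-driven incremental load concrete, here is a minimal Python sketch using SQLite as a stand-in staging database. The table contents and window boundaries are invented for illustration; a real ETL tool would issue the equivalent "where timestamp between A and B" query against Oracle.

```python
import sqlite3

# Hypothetical staging table following the pattern above: every table
# carries an indexed gg_commit_time column that drives the ETL pull.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ds_tcustmer (
        gg_commit_time TEXT NOT NULL,
        gg_tran_type   TEXT NOT NULL,
        cust_code      TEXT PRIMARY KEY,
        name           TEXT
    )
""")
conn.execute("CREATE INDEX ds_tcustmer_idx ON ds_tcustmer (gg_commit_time)")
rows = [
    ("2011-04-01 09:00:00", "INSERT", "ANN",  "Ann's Boats"),
    ("2011-04-01 10:30:00", "UPDATE", "JANE", "Rocky Flyer Inc."),
    ("2011-04-02 08:15:00", "DELETE", "WILL", "BG Software Co."),
]
conn.executemany("INSERT INTO ds_tcustmer VALUES (?, ?, ?, ?)", rows)

def incremental_batch(conn, low, high):
    """Return the rows changed in the window (low, high] for one ETL pass."""
    cur = conn.execute(
        "SELECT cust_code, gg_tran_type FROM ds_tcustmer "
        "WHERE gg_commit_time > ? AND gg_commit_time <= ? "
        "ORDER BY gg_commit_time",
        (low, high),
    )
    return cur.fetchall()

batch = incremental_batch(conn, "2011-04-01 00:00:00", "2011-04-01 23:59:59")
print(batch)  # only the first day's two changes
```

Each ETL pass remembers the high boundary of its last window and uses it as the low boundary of the next, so no change is processed twice.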

Historical Database Considerations


When building a Historical Database where all database operations are represented as inserts into the
history table, three additional columns will be required to properly maintain transaction order and one
optional column that could be used for downstream processing.
Required Columns:
 Transaction Commit Timestamp
 Log Sequence Number
 Log Relative Byte Address (RBA)
Optional Column:
 Transaction Type (Insert, Update, Delete)
Another item to be determined by the business requirements and the tool being used is the
contents of the update image. Is a full image required for every update, or should the image only
include the columns that were actually updated in the operational database?

If the requirement is to have the full image for every update, this can be achieved by executing a
stored procedure that looks up the prior image in the staging database before applying the new
operation. This minimizes the impact on the source system by not requiring supplemental logging to
be enabled for all columns.


It is important to note that if only the changed data is desired for updates, the schema for
the staging database will need to allow null values in all non-key columns.

Flat ASCII File Considerations


Staging data into flat files to be consumed by an ETL tool can be done using the
Oracle GoldenGate Flat File User Exit.

The Flat File User Exit is used to output transactional data captured by Oracle GoldenGate to rolling
flat files to be consumed by a third party product.

[Figure 1. Flat File User Exit Overview: an extract process, driven by its parameter file and source definitions, loads the flatfilewriter shared library (.so or .dll) with its properties file and writes rolling data files (e.g. data.ldv, per-table or combined) plus a data control file (data.ctrl).]

The Flat File User Exit is provided as a shared library (.so or .dll) that integrates into the dataflow via
the Oracle GoldenGate extract process.
The user exit supports two modes of output:
 DSV – Delimiter Separated Values (for example, comma-separated)
 LDV – Length Delimited Values
It can output data either:
 All to one file
 Or one file per table
The user exit can roll over based on time and/or size criteria; it flushes files and maintains
checkpoints when Oracle GoldenGate checkpoints to ensure recovery. In addition, it writes a control


file containing a list of rolled-over files for synchronization with data integration products, and it can
also produce a summary file for use in auditing. Additional properties control formatting (delimiters
and other values), directories, file extensions, meta columns (such as table name, file position, etc.) and data
options.
For more detailed information on the Flat File User Exit method, such as installation and configuration,
please refer to the Oracle GoldenGate Flat File User Exit Guide.
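As an illustration of consuming the user exit's output, the Python sketch below parses a hypothetical DSV file. The ';' delimiter and the leading meta columns (operation type, table name, commit timestamp) are assumptions for this example; the actual record layout is controlled by the user exit's properties file.

```python
import csv
import io

# A hypothetical DSV extract as the Flat File User Exit might produce it.
# The delimiter and meta-column layout are invented for illustration.
sample = io.StringIO(
    "I;SRC.TCUSTMER;2011-04-01 09:00:00;ANN;Ann's Boats\n"
    "U;SRC.TCUSTMER;2011-04-01 10:30:00;JANE;Rocky Flyer Inc.\n"
    "D;SRC.TCUSTMER;2011-04-02 08:15:00;WILL;\n"
)

def read_dsv(stream, delimiter=";"):
    """Split each record into its meta columns and the row data."""
    out = []
    for op, table, ts, *data in csv.reader(stream, delimiter=delimiter):
        out.append({"op": op, "table": table, "ts": ts, "data": data})
    return out

records = read_dsv(sample)
print(len(records), records[0]["op"])
```

A downstream consumer would typically read the control file first to learn which rolled-over data files are complete and safe to process.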

Data Staging Configurations


This section gives example configurations used to support the methods discussed earlier.

Live Reporting Database


The tables for this example have two additional columns added to them.
 Transaction Commit Timestamp – GG_COMMIT_TIME
 Transaction Type – GG_TRAN_TYPE
The Timestamp column will be used by the ETL tool to load the database changes in an incremental
fashion (where timestamp between A and B). The transaction type column is used to allow for logical
deletes.

In this mode, a “DELETE” operation is converted into an update operation, which in turn serves as
the logical delete. You can create a view for the live reporting system to run against that hides
the deleted rows (where the transaction type column is not equal to “DELETE”).
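As a concrete illustration of this logical-delete flow, here is a small Python sketch (not GoldenGate itself; the table contents and helper names are invented): a captured DELETE becomes an update that stamps the transaction-type column, and a view-like filter hides those rows from live reporting.

```python
# In-memory stand-in for the staging table, keyed by cust_code.
staging = {
    "ANN":  {"name": "Ann's Boats",      "gg_tran_type": "INSERT"},
    "JANE": {"name": "Rocky Flyer Inc.", "gg_tran_type": "INSERT"},
}

def apply_change(staging, op, cust_code, values=None):
    """Apply one captured change; DELETE is converted to a logical delete."""
    if op == "DELETE":
        # Keep the row, just flag it (the UPDATEDELETES-style behavior).
        staging[cust_code]["gg_tran_type"] = "DELETE"
    else:
        row = dict(values or {})
        row["gg_tran_type"] = op
        staging[cust_code] = row

def live_view(staging):
    """Equivalent of the view hiding logically deleted rows."""
    return {k: v for k, v in staging.items() if v["gg_tran_type"] != "DELETE"}

apply_change(staging, "DELETE", "ANN")
print(sorted(live_view(staging)))  # -> ['JANE']
```

The ETL tool, reading the base table rather than the view, still sees the flagged row and can propagate the delete downstream.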

Live Reporting Database Sample Schema:

CREATE TABLE ds_tcustmer


( gg_commit_time TIMESTAMP NOT NULL,
gg_tran_type VARCHAR2(20) NOT NULL,
cust_code VARCHAR2(4),
name VARCHAR2(30),
city VARCHAR2(20),
state CHAR(2),
PRIMARY KEY (cust_code)
USING INDEX);
CREATE INDEX ds_tcustmer_idx
ON ds_tcustmer (gg_commit_time);
CREATE or REPLACE VIEW v$ds_tcustmer AS
SELECT cust_code, name, city, state
FROM ds_tcustmer
WHERE gg_tran_type <> 'DELETE';

CREATE TABLE ds_tcustord


( gg_commit_time TIMESTAMP NOT NULL,
gg_tran_type VARCHAR2(20) NOT NULL,


cust_code VARCHAR2(4),
order_date DATE,
product_code VARCHAR2(8),
order_id NUMBER,
product_price NUMBER(8,2),
product_amount NUMBER(6),
transaction_id NUMBER,
PRIMARY KEY (cust_code, order_date, product_code, order_id)
USING INDEX);
CREATE INDEX ds_tcustord_idx
ON ds_tcustord (gg_commit_time);
CREATE or REPLACE VIEW v$ds_tcustord AS
SELECT cust_code, order_date, product_code, order_id,
product_price, product_amount, transaction_id
FROM ds_tcustord
WHERE gg_tran_type <> 'DELETE';

Historical Database
The tables for this example have four new columns added to them.
 Transaction Commit Timestamp – GG_COMMIT_TIME
 Log Sequence Number – GG_SEQNO
 Log Relative Byte Address (RBA) Number – GG_RBA
 Transaction Type – GG_TRAN_TYPE
For these tables, all database operations are turned into inserts. Again, the Timestamp column will be
used by the ETL tool to load the database changes in an incremental fashion (where timestamp
between A and B). The Log Sequence Number and Log RBA number enforce processing order and
ensure uniqueness.

In this example a stored procedure is used to minimize the impact on the source system for lookup
operations when querying the previous image to complete missing columns in an update or delete
operation.
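The ordering-key idea can be illustrated in Python: among all history rows for a business key, the current image is the one with the lexicographically highest (commit timestamp, sequence number, RBA), mirroring the ORDER BY ... DESC / ROWNUM = 1 lookup performed by the stored procedures below. The data values here are invented.

```python
# History rows: every operation is an insert, ordered by the
# (gg_commit_time, gg_seqno, gg_rba) key that the schema makes unique.
history = [
    # (gg_commit_time,      gg_seqno, gg_rba, cust_code, name)
    ("2011-04-01 09:00:00", 101, 5000, "ANN",  "Ann's Boats"),
    ("2011-04-01 09:00:00", 101, 7200, "ANN",  "Ann's Boats and Motors"),
    ("2011-04-02 11:00:00", 102,  300, "ANN",  "Ann's Marine Supply"),
    ("2011-04-01 10:30:00", 101, 6100, "JANE", "Rocky Flyer Inc."),
]

def hist_lookup(history, cust_code):
    """Return the latest image for cust_code, or None if never seen."""
    rows = [r for r in history if r[3] == cust_code]
    if not rows:
        return None
    # Lexicographic max over the three-part ordering key.
    return max(rows, key=lambda r: (r[0], r[1], r[2]))

latest = hist_lookup(history, "ANN")
print(latest[4])  # the most recent name for ANN
```

Note that the two rows sharing a commit timestamp and sequence number are disambiguated by the RBA, which is why all three columns belong in the primary key.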

Historical Database Sample Schema:


CREATE TABLE ds_tcustmer_history
( gg_commit_time TIMESTAMP NOT NULL,
gg_seqno NUMBER(12) NOT NULL,
gg_rba NUMBER(12) NOT NULL,
gg_tran_type VARCHAR2(20) NOT NULL,
cust_code VARCHAR2(4),
name VARCHAR2(30),
city VARCHAR2(20),
state CHAR(2),
PRIMARY KEY (gg_commit_time, gg_seqno, gg_rba)
USING INDEX);
CREATE INDEX ds_tcustmer_history_idx


ON ds_tcustmer_history (cust_code);
CREATE TABLE ds_tcustord_history
( gg_commit_time TIMESTAMP NOT NULL,
gg_seqno NUMBER(12) NOT NULL,
gg_rba NUMBER(12) NOT NULL,
gg_tran_type VARCHAR2(20) NOT NULL,
cust_code VARCHAR2(4),
order_date DATE,
product_code VARCHAR2(8),
order_id NUMBER,
product_price NUMBER(8,2),
product_amount NUMBER(6),
transaction_id NUMBER,
PRIMARY KEY (gg_commit_time, gg_seqno, gg_rba)
USING INDEX);
CREATE INDEX ds_tcustord_history_idx
ON ds_tcustord_history
(cust_code, order_date, product_code, order_id);

Historical Database – Sample Stored Procedure Lookup:


create or replace procedure tcustmer_hist_lookup
(i_cust_code IN varchar2,
o_name OUT varchar2,
o_city OUT varchar2,
o_state OUT varchar2
)
IS
BEGIN
SELECT name, city, state
INTO o_name, o_city, o_state
FROM
(SELECT name, city, state
FROM ds_tcustmer_history
WHERE cust_code = i_cust_code
ORDER BY gg_commit_time desc, gg_seqno desc, gg_rba desc)
WHERE ROWNUM = 1;
END tcustmer_hist_lookup;
/
create or replace procedure tcustord_hist_lookup
(i_cust_code IN varchar2,
i_order_date IN date,
i_product_code IN varchar2,
i_order_id IN number,
o_product_price OUT number,
o_product_amount OUT number,
o_transaction_id OUT number
)
IS
BEGIN
SELECT product_price, product_amount, transaction_id


INTO o_product_price, o_product_amount, o_transaction_id


FROM
(SELECT product_price, product_amount, transaction_id
FROM ds_tcustord_history
WHERE cust_code = i_cust_code
AND order_date = i_order_date
AND product_code = i_product_code
AND order_id = i_order_id
ORDER BY gg_commit_time desc, gg_seqno desc, gg_rba desc)
WHERE ROWNUM = 1;
END tcustord_hist_lookup;
/

Oracle GoldenGate Change Data Capture Process


In this section we discuss the steps necessary to configure Change Data Capture on the source
system. One example can be used for all the approaches discussed so far, as there are only very minor
differences in the parameter files. This capture configuration uses the standard Oracle GoldenGate
TCUSTMER and TCUSTORD tables.

Capture Configuration
The additional columns that will be used by the ETL tool are extracted from the Oracle
GoldenGate header (GGHEADER) record information via the column conversion function
@GETENV. The @GETENV function writes the necessary header record information into the
trail in Oracle GoldenGate token format, which in turn is processed downstream by
Replicat via another column conversion function, @TOKEN.

GGHEADER Record Information:

 “COMMITTIMESTAMP” – Returns the transaction timestamp

 “LOGRBA” – Returns the sequence number of the transaction log (this is only on the transactional log-
based Oracle GoldenGate product)

 “LOGPOSITION” – Returns the relative byte address within the transaction log (this is only on the
transactional log-based Oracle GoldenGate product)
 “OPTYPE” – Returns the type of operation

Sample Capture Parameter File:


EXTRACT GGSCUST
USERID mpapio, PASSWORD mpapio
RMTHOST amber, MGRPORT 9020
RMTTRAIL ./dirdat/ds
TABLE TCUSTMER,
TOKENS (TKN-GG-COMMIT-TIME = @GETENV("GGHEADER",
"COMMITTIMESTAMP"),
TKN-GG-SEQNO = @GETENV("GGHEADER", "LOGRBA"),


TKN-GG-RBA = @GETENV("GGHEADER", "LOGPOSITION"),


TKN-GG-TRAN-TYPE = @GETENV("GGHEADER", "OPTYPE")
);
TABLE TCUSTORD,
TOKENS (TKN-GG-COMMIT-TIME = @GETENV("GGHEADER",
"COMMITTIMESTAMP"),
TKN-GG-SEQNO = @GETENV("GGHEADER", "LOGRBA"),
TKN-GG-RBA = @GETENV("GGHEADER", "LOGPOSITION"),
TKN-GG-TRAN-TYPE = @GETENV("GGHEADER", "OPTYPE")
);

Oracle GoldenGate Delivery Process


Each of the proposed solutions requires a different configuration for the Oracle GoldenGate Delivery
process. In fact, the third method (flat ASCII files) uses a second capture process with an ETL user
exit to create, format, and write the files.

Delivery Process – Live Reporting Database Configuration


The delivery process for feeding the Live Reporting database uses two additional columns: the
timestamp and the transaction type.

These two columns are retrieved via the @TOKEN function and populated into the
staging database.

Sample Live Reporting Database Delivery Parameter File:


REPLICAT GGSCUST
SOURCEDEFS ./dirdef/dscust.def
USERID mpapio, PASSWORD mpapio
DISCARDFILE ./dirrpt/rdscust.dsc, PURGE
UPDATEDELETES

MAP TCUSTMER, TARGET DS_TCUSTMER,


COLMAP (USEDEFAULTS,
gg_commit_time = @TOKEN("TKN-GG-COMMIT-TIME"),
gg_tran_type = @TOKEN("TKN-GG-TRAN-TYPE"));
MAP TCUSTORD, TARGET DS_TCUSTORD,
COLMAP (USEDEFAULTS,
gg_commit_time = @TOKEN("TKN-GG-COMMIT-TIME"),
gg_tran_type = @TOKEN("TKN-GG-TRAN-TYPE"));

Delivery Process – Historical Database Configuration


The delivery process for feeding the Historical database uses four additional columns:
 Timestamp
 Sequence Number


 Relative Byte Address (RBA)
 Transaction Type
These four columns are retrieved via the @TOKEN function and populated into the
Historical database.

A separate stored procedure is called for each table to look up the prior image of the
record to be inserted. The stored procedure supplies values for each missing column found in the Oracle
GoldenGate trail record: for the TCUSTMER table, the tcustmer_hist_lookup stored procedure is called,
and for the TCUSTORD table, the tcustord_hist_lookup stored procedure is called.
Three additional column conversion functions are used in the sample parameter file
below to check for missing columns in the trail record: @IF, @COLTEST
and @GETVAL. The SQLEXEC parameter is also needed to execute the stored procedures
within the Delivery process.
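The missing-column logic can be sketched in plain Python (a rough analogue, not GoldenGate syntax; the MISSING sentinel and dict records are invented stand-ins for trail columns and the stored procedure's output parameters):

```python
MISSING = object()  # stand-in for a column absent from the trail record

def coalesce_missing(trail_record, prior_image):
    """For each column, keep the trail value if present, otherwise fall
    back to the lookup's prior image -- the same decision made by
    @IF(@COLTEST(col, MISSING), @GETVAL(lookup.o_col), col)."""
    out = {}
    for col, val in trail_record.items():
        out[col] = prior_image[col] if val is MISSING else val
    return out

# An update that changed only the city; name and state were not logged.
trail = {"name": MISSING, "city": "DENVER", "state": MISSING}
prior = {"name": "Ann's Boats", "city": "SEATTLE", "state": "WA"}
print(coalesce_missing(trail, prior))
```

The result is a full image assembled from the partial trail record plus the lookup, which is what allows supplemental logging to stay limited to key columns on the source.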

Sample Historical Database Delivery Parameter File:


REPLICAT GGSCUST
SOURCEDEFS ./dirdef/dscust.def
USERID mpapio, PASSWORD mpapio
DISCARDFILE ./dirrpt/rdscust.dsc, PURGE
NOUPDATEDELETES
INSERTALLRECORDS
MAP TCUSTMER, TARGET DS_TCUSTMER_HISTORY,
SQLEXEC (SPNAME tcustmer_hist_lookup, PARAMS (i_cust_code = cust_code)),
COLMAP (USEDEFAULTS,
gg_commit_time = @TOKEN("TKN-GG-COMMIT-TIME"),
gg_seqno = @TOKEN("TKN-GG-SEQNO"),
gg_rba = @TOKEN("TKN-GG-RBA"),
gg_tran_type = @TOKEN("TKN-GG-TRAN-TYPE"),
name = @if (@coltest (name, MISSING),
@getval (tcustmer_hist_lookup.o_name), name),
city = @if (@coltest (city, MISSING),
@getval (tcustmer_hist_lookup.o_city), city),
state = @if (@coltest (state, MISSING),
@getval (tcustmer_hist_lookup.o_state), state)
);
MAP TCUSTORD, TARGET DS_TCUSTORD_HISTORY,
SQLEXEC (SPNAME tcustord_hist_lookup,
PARAMS (i_cust_code = cust_code,
i_order_date = order_date,
i_product_code = product_code,
i_order_id = order_id)),
COLMAP (USEDEFAULTS,
gg_commit_time = @TOKEN("TKN-GG-COMMIT-TIME"),
gg_seqno = @TOKEN("TKN-GG-SEQNO"),
gg_rba = @TOKEN("TKN-GG-RBA"),
gg_tran_type = @TOKEN("TKN-GG-TRAN-TYPE"),
product_price = @if (@coltest (product_price, MISSING),
@getval (tcustord_hist_lookup.o_product_price), product_price),


product_amount = @if (@coltest (product_amount, MISSING),


@getval (tcustord_hist_lookup.o_product_amount), product_amount),
transaction_id = @if (@coltest (transaction_id, MISSING),
@getval (tcustord_hist_lookup.o_transaction_id), transaction_id)
);

For more information on column mapping and data transformation functions, please refer to the
Oracle GoldenGate Reference Guide.

Delivery Process – Flat ASCII File Configuration


The Delivery process for flat ASCII files uses a second capture process in which the Oracle GoldenGate
Flat File User Exit formats and writes the files.

Typical Oracle GoldenGate Configuration for Flat File writing:


[Figure 2. Typical Configuration: an extract on the DB system writes a local trail (./dirdat/aa); a data pump sends it to a remote trail (./dirdat/bb) on the Data Integration system, where the ffwriter extract, using the Flat File User Exit (ffue), writes the rolling data files (e.g. data.ldv).]

DB System GGSCI:
add extract pump, exttrailsource ./dirdat/aa
add rmttrail ./dirdat/bb, extract pump, megabytes 20

Data Integration System GGSCI:


add extract ffwriter, exttrailsource ./dirdat/bb
add exttrail ./dirdat/ff, extract ffwriter, megabytes 20
The sample process names and trail names used above can be replaced with any valid names. Process
names must be eight characters or less; trail names must be exactly two characters.

Sample Flat File Delivery Parameter File:


EXTRACT FFWRITER
SETENV (GG_USEREXT_PROPFILE = "ffwriter_client.properties")
SOURCEDEFS ./dirdef/dscust.def
CUSEREXIT ./flatfilewriter.so CUSEREXIT
EXTTRAIL ./dirdat/bb
REPORTCOUNT EVERY 5 SECONDS, RATE
TABLE SCHEMA.*, GETUPDATEBEFORES;

For more detailed information on configuring the Delivery process for the Flat File
User Exit, please refer to the Oracle GoldenGate Flat File User Exit Guide.


Conclusion

This document has outlined best practices for implementing an Oracle GoldenGate real-time data
integration solution to stage data for consumption by an ETL tool, using three general
approaches: a Live Reporting database, a Historical database, and batch-mode flat files.

White Paper Title
June 2010
Author: Mike Papio
Contributing Authors:

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0410
