Introduction
Data Staging Methods
  Live Reporting Database Considerations
  Historical Database Considerations
  Flat ASCII File Considerations
Data Staging Configurations
  Live Reporting Database
    Live Reporting Database Sample Schema
  Historical Database
    Historical Database Sample Schema
Introduction
This document outlines best practices for implementing the Oracle GoldenGate real-time data
integration solution to stage data for consumption by an ETL tool. Three general approaches
are covered:
Live Reporting database - The basic assumption with this approach is that there is already a
Live Reporting database being maintained that can be modified to support the additional
requirements for feeding data into the ETL tool being used.
Historical database - The basic assumption with this approach is that there is a need to have
a chronology of the database changes. This approach turns all database change operations
into inserts and creates a staging database specifically to feed data into the ETL tool being
used.
Batch mode flat files - The basic assumption here is that the desired delivery medium is flat
ASCII-style files that are created at user-defined intervals to feed data into the ETL tool
being used.
The primary focus of this document is the recommended “Best Practices” configuration of
Oracle GoldenGate sending data to the ETL tool of choice. There is no reference to any
specific ETL tool or any special requirements to support specific tools. This is meant to be a
starting point and it is expected that changes will need to be made to support specific user
requirements and tools.
Oracle GoldenGate Best Practices – Oracle GoldenGate for ETL Tools
If the full image is required for every update, a stored procedure can look up the prior image in the
staging database before the new operation is applied. This minimizes the impact on the source system
because supplemental logging does not have to be enabled for all columns.
Note that if only the changed data is wanted for updates, the schemas for the staging database must
allow null values for all non-key columns.
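The nullability requirement can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the staging database (an assumption made for illustration; the paper targets Oracle). The table names strict_stage and sparse_stage and the columns name and city are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical staging tables: one forces every column to be present,
# the other allows NULL in the non-key columns.
conn.execute("""CREATE TABLE strict_stage (
    cust_code TEXT NOT NULL PRIMARY KEY,
    name      TEXT NOT NULL,
    city      TEXT NOT NULL)""")
conn.execute("""CREATE TABLE sparse_stage (
    cust_code TEXT NOT NULL PRIMARY KEY,
    name      TEXT,
    city      TEXT)""")

# An update that carries only the key and the one changed column.
changed_only = {"cust_code": "WILL", "city": "SEATTLE"}


def stage(table, record):
    """Insert the record; return True if the table accepted it."""
    cols = ", ".join(record)
    marks = ", ".join("?" for _ in record)
    try:
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     tuple(record.values()))
        return True
    except sqlite3.IntegrityError:
        return False


strict_ok = stage("strict_stage", changed_only)  # rejected: name is absent
sparse_ok = stage("sparse_stage", changed_only)  # accepted: name stays NULL
```

With changed-data-only updates, the unchanged columns simply do not arrive, so a NOT NULL constraint on any of them would cause the apply to fail.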
The Flat File User Exit outputs transactional data captured by Oracle GoldenGate to rolling
flat files that are consumed by a third-party product.
[Figure: Flat File User Exit data flow. The extract process loads the flatfilewriter shared library (.so or .dll), which writes rolling data files (for example, data.ldv) and a control file (data.ctrl).]
The Flat File User Exit is provided as a shared library (.so or .dll) that integrates into the dataflow via
the Oracle GoldenGate extract process.
The user exit supports two modes of output:
DSV – Delimiter Separated Values (commas are an example)
LDV – Length Delimited Values
And it can output data:
All to one file
Or one file per table
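The difference between the two output modes can be sketched in a few lines of Python. This is only an illustration: the 4-digit decimal length prefix used for LDV is an assumption made here, not the user exit's documented layout (see the Flat File User Exit Guide for the real format), and the function names are hypothetical.

```python
def to_dsv(values, delimiter=","):
    """Join field values with a delimiter (DSV)."""
    return delimiter.join(values)


def to_ldv(values, width=4):
    """Prefix each field with its length as a fixed-width decimal (LDV).

    The 4-digit prefix is an assumption of this sketch, not the
    user exit's documented layout.
    """
    return "".join(f"{len(v):0{width}d}{v}" for v in values)


def from_ldv(record, width=4):
    """Read back the fields of a record written by to_ldv."""
    fields, pos = [], 0
    while pos < len(record):
        length = int(record[pos:pos + width])
        pos += width
        fields.append(record[pos:pos + length])
        pos += length
    return fields


row = ["WILL", "BG SOFTWARE CO., INC.", "SEATTLE"]
dsv_line = to_dsv(row)
ldv_line = to_ldv(row)
```

Because the length prefix states exactly where each field ends, the LDV record can be read back unchanged even though one value contains a comma, whereas naively splitting the DSV line on commas yields four fields instead of three. In practice a DSV consumer would rely on a delimiter that cannot occur in the data, or on quoting.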
The user exit can roll over files based on time and/or size criteria. It flushes files and maintains
checkpoints whenever Oracle GoldenGate checkpoints, to ensure recovery. In addition, it writes a control
file containing a list of rolled-over files for synchronization with data integration products, and it can
also produce a summary file for use in auditing. Additional properties control formatting (delimiters and
other values), directories, file extensions, meta columns (such as table name and file position) and data
options.
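On the consuming side, the control file lets a loader pick up only files that the writer has finished with. The sketch below shows one way a loader might do that. It assumes the control file holds file names separated by commas and/or newlines; the actual layout is defined in the Flat File User Exit Guide. The data file names and both helper functions are hypothetical.

```python
import os
import tempfile


def rolled_over_files(control_path):
    """Return the data file names listed in the control file, in order."""
    with open(control_path, encoding="ascii") as fh:
        text = fh.read()
    # Assumed layout: names separated by commas and/or newlines.
    names = text.replace("\n", ",").split(",")
    return [n.strip() for n in names if n.strip()]


def pending_files(control_path, already_loaded):
    """Files the writer has rolled over that the loader has not yet read."""
    return [n for n in rolled_over_files(control_path)
            if n not in already_loaded]


# Demonstration with a throwaway control file (hypothetical file names).
workdir = tempfile.mkdtemp()
control = os.path.join(workdir, "data.ctrl")
with open(control, "w", encoding="ascii") as fh:
    fh.write("data_0001.dsv,data_0002.dsv\n")

first_batch = pending_files(control, already_loaded=set())

# The writer rolls over another file and appends it to the control file.
with open(control, "a", encoding="ascii") as fh:
    fh.write("data_0003.dsv\n")

second_batch = pending_files(control, already_loaded=set(first_batch))
```

The file currently being written never appears in the control file, so the loader does not read a partially written file.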
For more detailed information on the Flat File User Exit, such as installation and configuration,
please refer to the Oracle GoldenGate Flat File User Exit Guide.
In the Live Reporting database configuration, each “DELETE” operation is converted into an update
operation, which serves as a logical delete. A view can then be created for the live reporting system to
query; it hides the deleted rows by returning only rows whose transaction type column is not equal to
“DELETE”.
cust_code VARCHAR2(4),
order_date DATE,
product_code VARCHAR2(8),
order_id NUMBER,
product_price NUMBER(8,2),
product_amount NUMBER(6),
transaction_id NUMBER,
PRIMARY KEY (cust_code, order_date, product_code, order_id)
USING INDEX);
CREATE INDEX ds_tcustord_idx
ON ds_tcustord (gg_commit_time);
CREATE OR REPLACE VIEW v$ds_tcustord AS
SELECT cust_code, order_date, product_code, order_id,
product_price, product_amount, transaction_id
FROM ds_tcustord
WHERE gg_tran_type <> 'DELETE';
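The logical-delete behaviour of this schema can be tried out with a short sketch. It uses Python's built-in sqlite3 module in place of Oracle (an assumption made for illustration), a reduced column list, and a view named v_ds_tcustord instead of v$ds_tcustord. The row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ds_tcustord (
    gg_commit_time TEXT,
    gg_tran_type   TEXT,
    cust_code      TEXT,
    order_id       INTEGER,
    product_price  REAL,
    PRIMARY KEY (cust_code, order_id)
);
CREATE VIEW v_ds_tcustord AS
    SELECT cust_code, order_id, product_price
    FROM ds_tcustord
    WHERE gg_tran_type <> 'DELETE';
""")

# Two rows arrive as ordinary inserts (made-up values).
conn.executemany(
    "INSERT INTO ds_tcustord VALUES (?, ?, ?, ?, ?)",
    [("2010-06-01 10:00:00", "INSERT", "WILL", 144, 17520.00),
     ("2010-06-01 10:00:05", "INSERT", "JANE", 256, 14000.00)])

# A source DELETE is applied as an UPDATE that only marks the row.
conn.execute(
    "UPDATE ds_tcustord SET gg_tran_type = 'DELETE', gg_commit_time = ? "
    "WHERE cust_code = ? AND order_id = ?",
    ("2010-06-01 10:05:00", "JANE", 256))

table_rows = conn.execute("SELECT COUNT(*) FROM ds_tcustord").fetchone()[0]
visible = conn.execute("SELECT cust_code FROM v_ds_tcustord").fetchall()
```

The marked row stays in the table, where a query on gg_commit_time can still find the delete, while reports that run against the view no longer see it.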
Historical Database
The tables for this example have four new columns added to them.
Transaction Commit Timestamp – GG_COMMIT_TIME
Log Sequence Number – GG_SEQNO
Log Relative Byte Address (RBA) Number – GG_RBA
Transaction Type – GG_TRAN_TYPE
For these tables, all database operations are turned into inserts. Again, the Timestamp column is
used by the ETL tool to load the database changes incrementally (where timestamp
between A and B). The Log Sequence Number and Log RBA Number enforce processing order and
ensure uniqueness.
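An incremental load against such a history table might look like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made for illustration), a reduced column list and made-up rows. The function load_window is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ds_tcustord_history (
    gg_commit_time TEXT    NOT NULL,
    gg_seqno       INTEGER NOT NULL,
    gg_rba         INTEGER NOT NULL,
    gg_tran_type   TEXT    NOT NULL,
    cust_code      TEXT,
    product_amount INTEGER,
    PRIMARY KEY (gg_commit_time, gg_seqno, gg_rba))""")

# Every source operation has been staged as an insert (made-up values).
conn.executemany(
    "INSERT INTO ds_tcustord_history VALUES (?, ?, ?, ?, ?, ?)",
    [("2010-06-01 10:00:00", 12, 9000, "UPDATE", "WILL", 5),
     ("2010-06-01 10:00:00", 12, 4000, "INSERT", "WILL", 3),
     ("2010-06-01 10:07:00", 13, 500, "DELETE", "WILL", None),
     ("2010-06-01 09:59:00", 12, 1000, "INSERT", "JANE", 1)])


def load_window(start, end):
    """Return changes committed in [start, end), in log order."""
    return conn.execute(
        "SELECT gg_tran_type, cust_code, gg_seqno, gg_rba "
        "FROM ds_tcustord_history "
        "WHERE gg_commit_time >= ? AND gg_commit_time < ? "
        "ORDER BY gg_commit_time, gg_seqno, gg_rba",
        (start, end)).fetchall()


batch = load_window("2010-06-01 10:00:00", "2010-06-01 10:05:00")
```

The sketch uses a half-open window rather than BETWEEN so that consecutive windows do not pick up the same row twice. That is a design choice of this example, not something the paper prescribes. Two operations with the same commit timestamp are ordered by sequence number and RBA.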
In this example, a stored procedure looks up the previous image in the staging database to complete
the columns that are missing from an update or delete operation. Performing the lookup on the staging
side minimizes the impact on the source system.
ON ds_tcustmer_history (cust_code);
CREATE TABLE ds_tcustord_history
( gg_commit_time TIMESTAMP NOT NULL,
  gg_seqno NUMBER(12) NOT NULL,
  gg_rba NUMBER(12) NOT NULL,
  gg_tran_type VARCHAR2(20) NOT NULL,
  cust_code VARCHAR2(4),
  order_date DATE,
  product_code VARCHAR2(8),
  order_id NUMBER,
  product_price NUMBER(8,2),
  product_amount NUMBER(6),
  transaction_id NUMBER,
  PRIMARY KEY (gg_commit_time, gg_seqno, gg_rba)
  USING INDEX);
CREATE INDEX ds_tcustord_history_idx
  ON ds_tcustord_history
  (cust_code, order_date, product_code, order_id);
Capture Configuration
The additional columns that will be used by the ETL tool are extracted from the Oracle
GoldenGate Header (GGHEADER) record information via the column conversion function
@GETENV. The @GETENV function writes the necessary header record information to the trail
as Oracle GoldenGate tokens, which are in turn processed downstream by
Replicat via another column conversion function, @TOKEN.
“LOGRBA” – Returns the sequence number of the transaction log (this applies only to the transactional
log-based Oracle GoldenGate product)
“LOGPOSITION” – Returns the relative byte address within the transaction log (this applies only to the
transactional log-based Oracle GoldenGate product)
“OPTYPE” – Returns the type of operation
The two additional columns are retrieved via the @TOKEN function and populated in the
staging database.
A separate stored procedure is called for each table to look up the prior image of the
record to be inserted. The stored procedure is called for each missing column found in the Oracle
GoldenGate trail record. For the TCUSTMER table, the tcustmer_hist_lookup stored procedure is called,
and for the TCUSTORD table, the tcustord_hist_lookup stored procedure is called, for each missing
column of that table.
Three additional column conversion functions are used in the sample parameter file
to check for missing columns in the trail record: @IF, @COLTEST
and @GETVAL. The SQLEXEC parameter is also needed to execute the stored procedure
within the Delivery process.
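The logic that these functions and the stored procedure implement together can be sketched outside GoldenGate: if a column is missing from the trail record, take its value from the prior image, otherwise keep the value that arrived. The following Python sketch uses the built-in sqlite3 module as a stand-in for the staging database. The columns name and city, the row values, and the functions hist_lookup and complete are assumptions made for illustration; they are not the paper's stored procedures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ds_tcustmer_history (
    gg_commit_time TEXT    NOT NULL,
    gg_seqno       INTEGER NOT NULL,
    gg_rba         INTEGER NOT NULL,
    gg_tran_type   TEXT    NOT NULL,
    cust_code      TEXT,
    name           TEXT,
    city           TEXT,
    PRIMARY KEY (gg_commit_time, gg_seqno, gg_rba))""")
conn.execute(
    "INSERT INTO ds_tcustmer_history VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2010-06-01 10:00:00", 12, 4000, "INSERT",
     "WILL", "BG SOFTWARE", "SEATTLE"))

NON_KEY_COLUMNS = ("name", "city")  # assumed columns, for illustration


def hist_lookup(cust_code):
    """Stand-in for a lookup procedure: latest staged image for the key."""
    row = conn.execute(
        "SELECT name, city FROM ds_tcustmer_history WHERE cust_code = ? "
        "ORDER BY gg_commit_time DESC, gg_seqno DESC, gg_rba DESC LIMIT 1",
        (cust_code,)).fetchone()
    return dict(zip(NON_KEY_COLUMNS, row)) if row else {}


def complete(record):
    """Fill each column missing from the record from the prior image."""
    full = dict(record)
    for col in NON_KEY_COLUMNS:
        if col not in full:  # column absent from the trail record
            full[col] = hist_lookup(full["cust_code"]).get(col)
    return full


# An update that changed only the city; name is absent from the record.
update = {"cust_code": "WILL", "city": "PORTLAND"}
full_image = complete(update)
```

As in the description above, the lookup runs once per missing column. Columns that did arrive in the record are left untouched, so the new value of city is kept and only name is taken from the prior image.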
For more information on column mapping and data transformation functions, please refer to the
Oracle GoldenGate Reference Guide.
DB System GGSCI:
add extract pump, exttrailsource ./dirdat/aa
add rmttrail ./dirdat/bb, extract pump, megabytes 20
For more detailed information on configuring the Delivery Process for the Flat File
User Exit, please refer to the Oracle GoldenGate Flat File User Exit Guide.
Conclusion
This document outlined best practices for implementing the Oracle GoldenGate real-time data
integration solution to stage data for consumption by an ETL tool, using three general
approaches: Live Reporting database, Historical database and Batch mode flat files.
June 2010
Author: Mike Papio

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.