Requirements have been defined, the data model is complete, source systems have been identified, tool selections have been made, and now the only thing left to do is connect the dots. Sounds easy, right? Charged with creating the extract, transform and load (ETL) architecture to move the source data into the warehouse, you begin drawing boxes and lines depicting the individual ETL processes that must be created in order to reconcile the idiosyncratic source system data into its generic business view of the world. The solution seems obvious: create an ETL process for each source you must introduce.
And they're off! The ETL analysts charge forward, creating the detail design (source-to-target mappings) for each of the processes you identified. However, upon reviewing those designs, you begin to discover that many of them repeat the same actions (and not always consistently). Maybe there's a better way.
ETL architects and data warehouse designers are faced with the
task of homogenizing data into a stan-
[Figure: Source-specific versus non-source-specific processing of customer data.
SRC X Customer (CUST_STATUS, BILLDT). SRC X Rules: If CUST_STATUS = A or X, then STATUS = ACTIVE, else INACTIVE. LAST_BILL_DATE = BILLDT.
SRC Y Customer (DISCONNECT_DATE, LAST_BILL). SRC Y Rules: If DISCONNECT_DATE = NULL, then STATUS = ACTIVE, else INACTIVE. LAST_BILL_DATE = LAST_BILL.]
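The SRC X and SRC Y rules above can be sketched in code. This is a minimal illustration, not the article's implementation: the function names `from_src_x` and `from_src_y`, the `Customer` type and the use of Python dictionaries as rows are all assumptions made for the example. Only the column names and the rules themselves come from the figure.

```python
from datetime import date
from typing import Optional, TypedDict


class Customer(TypedDict):
    """Hypothetical standardized customer shape shared by both sources."""
    STATUS: str
    LAST_BILL_DATE: Optional[date]


def from_src_x(row: dict) -> Customer:
    # SRC X rule: CUST_STATUS of A or X means ACTIVE, anything else INACTIVE.
    status = "ACTIVE" if row["CUST_STATUS"] in ("A", "X") else "INACTIVE"
    # SRC X rule: LAST_BILL_DATE = BILLDT.
    return {"STATUS": status, "LAST_BILL_DATE": row["BILLDT"]}


def from_src_y(row: dict) -> Customer:
    # SRC Y rule: a NULL (None) DISCONNECT_DATE means ACTIVE, else INACTIVE.
    status = "ACTIVE" if row["DISCONNECT_DATE"] is None else "INACTIVE"
    # SRC Y rule: LAST_BILL_DATE = LAST_BILL.
    return {"STATUS": status, "LAST_BILL_DATE": row["LAST_BILL"]}
```

Once both sources are expressed in the same shape, anything that happens after this point no longer needs to know which system a record came from.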
Traditional Versus Conformed ETL Architecture
Following is a simple example as
context for this discussion.
Company ABC has two sales systems
(systems X and Y), which contain information surrounding the sale of widgets.
Each sale of a widget needs to be captured as part of the data warehouse implementation. The sales are qualified by the date of the sale, the customer purchasing the widget, the type of widget purchased (commercial or residential) and the total dollar amount of the sale (see Figure 2).
[Figure 2: Source and target data for the widget sales example.
Source X Data: Date of Sale, Customer Number, Sale Category, Sale Amount, Sales Tax.
Source Y Data: Customer Code, Sales Date, Widget, Amount.
Target: Customer (PK Customer ID), Calendar (PK Calendar ID), Widget Type (PK Widget Type ID), Widget Sales (PK, FK1 Widget Type ID; PK, FK2 Calendar ID; PK, FK3 Customer ID).]
www.dmreview.com
When faced with the issue of how
to create the necessary ETL processes
to converge like data sources into
standardized entities within the warehouse, there are two choices:
Traditional ETL Architecture:
Create individual ETL processes for
each source system (as shown in
Figure 3).
A traditional ETL architecture would create one ETL process per source system to perform all of the logic necessary to transform that source's data into its target destination. The advantages of this approach are that there are fewer
[Figure 3: Traditional ETL architecture. Boxes shown: Source X Data (Date of Sale, Customer Number, Sale Category, Sale Amount, Sales Tax); ETL; Customer ID (PK); Calendar (PK Calendar ID); Widget Type (PK Widget Type ID); Widget Sales (PK, FK1 Customer ID; PK, FK2 Calendar ID; PK, FK3 Widget Type ID).]
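The traditional approach, one end-to-end process per source system, might look like the sketch below. Everything beyond the source column names is an assumption made for illustration: the in-memory lookup dictionaries stand in for the warehouse dimension tables, the `Amount` measure on the fact row is hypothetical, Sales Tax is ignored, and it is assumed that Sale Category (Source X) and Widget (Source Y) carry the commercial or residential widget type.

```python
# Hypothetical stand-ins for the Customer, Calendar and Widget Type tables.
customer_dim = {"1001": 1, "C-1001": 1}       # source identifier -> Customer ID
calendar_dim = {"2024-01-15": 20240115}       # ISO date -> Calendar ID
widget_type_dim = {"COMMERCIAL": 1, "RESIDENTIAL": 2}


def etl_source_x(rows):
    """Source X process: rename columns, look up keys, build fact rows."""
    facts = []
    for r in rows:
        facts.append({
            "Customer ID": customer_dim[r["Customer Number"]],
            "Calendar ID": calendar_dim[r["Date of Sale"]],
            "Widget Type ID": widget_type_dim[r["Sale Category"].upper()],
            "Amount": r["Sale Amount"],
        })
    return facts


def etl_source_y(rows):
    """Source Y process: the same key lookups, written out a second time."""
    facts = []
    for r in rows:
        facts.append({
            "Customer ID": customer_dim[r["Customer Code"]],
            "Calendar ID": calendar_dim[r["Sales Date"]],
            "Widget Type ID": widget_type_dim[r["Widget"].upper()],
            "Amount": r["Amount"],
        })
    return facts
```

The two functions differ only in the source column names they read. The key lookups and fact-row construction are duplicated, which is the kind of repetition (and opportunity for inconsistency) noted earlier when reviewing the detail designs.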
[Figure: Source-specific and non-source-specific ETL processes. Boxes shown: Source X Data (Date of Sale, Customer Number, Sale Category, Sale Amount, Sales Tax); Source Y Data (Customer Code, Sales Date, Widget, Amount); ETL; Calendar (PK Calendar ID); Customer (PK Customer ID); Widget Type (PK Widget Type ID); Widget Sales.]