Sunteți pe pagina 1din 26

6

Extraction, Transformation,
and Loading (ETL)
Transformation

Objectives
After completing this lesson, you should be able to do
the following:
Explain transformation using SQL

6-2

CREATE TABLE AS SELECT (CTAS)


UPDATE
MERGE
Multitable INSERT

Describe transformation using table functions and


PL/SQL
Implement DML Error Logging

Copyright 2005, Oracle. All rights reserved.

Data Transformation

Data transformations are often the most complex


and costly part of the ETL process.
Transformations can range from simple data
conversion to complex scrubbing techniques.
Many, if not all, transformations can occur within
the Oracle database.
External transformations outside the
database are supported (flat files, for
instance).
Data can be transformed in two ways:
Multistage data transformation
Pipelined data transformation

6-3

Copyright 2005, Oracle. All rights reserved.

Multistage Data Transformation


Load into
staging
table.

NEW_SALES_STEP1

Flat files

Validate customer
keys (look up in
customer
dimension table).

NEW_SALES_STEP3

6-4

Convert source
product keys
to warehouse
product keys.

NEW_SALES_STEP2

Insert into SALES


warehouse table.

SALES

Copyright 2005, Oracle. All rights reserved.

Pipelined Data Transformation

External table

Validate customer
keys (look up in
customer
dimension table).

Flat files

Convert source
product keys
to warehouse
product keys.

6-5

Insert into SALES


warehouse table.

Copyright 2005, Oracle. All rights reserved.

SALES

Transformation Mechanisms
You have the following choices for transforming data
inside the database:
SQL
PL/SQL
Table functions

6-6

Copyright 2005, Oracle. All rights reserved.

Transformation Using SQL


After data is loaded into the database, transformations
can be executed using SQL operations:
CREATE TABLE ... AS SELECT
INSERT /*+APPEND*/ AS SELECT
UPDATE
MERGE
Multitable INSERT

6-7

Copyright 2005, Oracle. All rights reserved.

CREATE TABLE AS SELECT


and UPDATE
DESC sales_activity_direct
Name
Null? Type
------------ ----- --------SALES_DATE
DATE
PRODUCT_ID
NUMBER
CUSTOMER_ID
NUMBER
PROMOTION_ID
NUMBER
AMOUNT
NUMBER
QUANTITY
NUMBER
INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */
INTO sales as
SELECT product_id, customer_id, TRUNC(sales_date),
3,
promotion_id, quantity, amount
FROM sales_activity_direct;
6-8

Copyright 2005, Oracle. All rights reserved.

MERGE Statement: Overview

MERGE statements provide the ability to


conditionally UPDATE/INSERT into the database.
The DELETE clause for UPDATE branch is provided
for implicit data maintenance.
Conditions are specified in the ON clause.
MERGE statements do an UPDATE if the
row exists and an INSERT if it is a new
row.
Using MERGE statements avoids
multiple updates.
MERGE statements use a single SQL
statement to complete an UPDATE or
INSERT or both.

6-10

Copyright 2005, Oracle. All rights reserved.

Data Warehousing MERGE: Example

MERGE INTO customer C


USING cust_src S
ON (c.customer_id = s.src_customer_id)
WHEN MATCHED THEN
UPDATE SET c.cust_address = s.cust_address
WHEN NOT MATCHED THEN
INSERT ( Customer_id, cust_first_name,)
VALUES (src_customer_id, src_first_name,);

6-11

Copyright 2005, Oracle. All rights reserved.

Data Maintenance with MERGE/DELETE

MERGE USING product_changes S


INTO products D ON (d.prod_id = s.prod_id)
WHEN MATCHED THEN
UPDATE SET d.prod_list_price = s.prod_new_price,
d.prod_status = s.prod_newstatus
DELETE WHERE (d.prod_status = "OBSOLETE")
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_list_price, prod_status)
VALUES (s.prod_id, s.prod_new_price,
s.prod_new_status);

6-12

Copyright 2005, Oracle. All rights reserved.

Overview of Multitable INSERT Statements

Allows the INSERT AS SELECT statement to


insert rows into multiple tables as part of a single
DML statement
Can be used in data warehousing systems to
transfer data from one or more operational sources
to a set of target tables
Types of multitable INSERT statements:

6-13

Unconditional INSERT
Pivoting INSERT
Conditional ALL INSERT
Conditional FIRST INSERT

Copyright 2005, Oracle. All rights reserved.

Example of Unconditional INSERT

INSERT ALL
INTO sales VALUES (product_id, customer_id, today, 3,
promotion_id, quantity_per_day, amount_per_day)
INTO costs VALUES (product_id, today, promotion_id, 3,
product_cost, product_price)
SELECT TRUNC(s.sales_date) AS today, s.product_id,
s.customer_id, s.promotion_id, SUM(s.amount)
AS amount_per_day, SUM(s.quantity) quantity_per_day,
p.prod_min_price*0.8 AS product_cost, p.prod_list_price
AS product_price
FROM sales_activity_direct s, products p
WHERE s.product_id = p.prod_id AND TRUNC(sales_date) =
TRUNC(SYSDATE)
GROUP BY TRUNC(sales_date), s.product_id, s.customer_id,
s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;

6-14

Copyright 2005, Oracle. All rights reserved.

Example of Conditional ALL INSERT

INSERT ALL
WHEN product_id IN (SELECT product_id
FROM

promotional_items) THEN

INTO promotional_sales
VALUES(product_id,list_price)
WHEN order_mode = 'online' THEN
INTO web_orders VALUES(product_id, order_total)
SELECT product_id, list_price, order_total,
order_mode
FROM orders;

6-15

Copyright 2005, Oracle. All rights reserved.

Example of Pivoting INSERT


INSERT ALL
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date, sales_sun)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+1, sales_mon)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+2, sales_tue)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+3, sales_wed)
...
SELECT product_id, customer_id, weekly_start_date, sales_sun,
sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat
FROM sales_summary;

6-16

Copyright 2005, Oracle. All rights reserved.

Example of Conditional FIRST INSERT


INSERT FIRST
WHEN order_total > 10000 THEN
INTO priority_handling VALUES (id)
WHEN order_total > 5000 THEN
INTO special_handling VALUES (id)
WHEN total > 3000 THEN
INTO privilege_handling VALUES (id)
ELSE
INTO regular_handling VALUES (id)
SELECT order_total , order_id id
FROM orders

6-17

Copyright 2005, Oracle. All rights reserved.

Overview of Table Functions

A table function is a function that can produce a


set of rows as output.
Input can be a set of rows.

Table functions support pipelined and parallel


execution of transformations using:
PL/SQL
C programming language
Java

6-18

Table functions are used in the FROM clause of a


SELECT statement.

Copyright 2005, Oracle. All rights reserved.

Creating Table Functions

CREATE OR REPLACE FUNCTION transform(p ref_cur_type)


RETURN table_order_items_type
PIPELINED
PARALLEL_ENABLE( PARTITION p BY ANY) IS
BEGIN
FOR rec IN p LOOP
-- Transform this record
PIPE ROW (rec);
END LOOP;
RETURN;
END;

6-19

Copyright 2005, Oracle. All rights reserved.

Using Table Functions

SELECT *
FROM TABLE(transform(cursor(SELECT *
FROM
order_items_ext)));
INSERT /*+ APPEND, PARALLEL(order_items) */
INTO order_items
SELECT *
FROM TABLE(transform(cursor(SELECT *
FROM
order_items_ext)));

6-21

Copyright 2005, Oracle. All rights reserved.

DML Error Logging: Overview

Aborting long-running bulk DML operations is


wasteful of time and system resources.
DML Error Logging allows bulk DML operations to
continue processing after a DML error occurs.
Errors are logged to an error-logging table.

6-22

DML Error Logging combines the power and speed


of bulk processing with the functionality of rowlevel error handling.

Copyright 2005, Oracle. All rights reserved.

DML Error Logging Concepts

An error-logging table is created and associated


with an existing table by using the DBMS_ERRLOG
package.

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');

The LOG ERRORS INTO... clause is added to the


bulk DML load statement providing:
Error-logging table name
Statement tag
Reject limit

INSERT INTO bonuses


SELECT employee_id, salary*1.1 FROM employees
WHERE commission_pct > .2
LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10;

6-23

Copyright 2005, Oracle. All rights reserved.

Error-Logging Table

The error-logging table logically consists of two


parts:
Fixed: Columns that describe the error
Variable: Columns that contain data values
corresponding to the row in error

6-25

Column names for the variable part determine the


DML table columns logged when an error occurs.
The columns of the DML table to be logged is the
intersection of the column names of the DML table
and the error-logging table.

Copyright 2005, Oracle. All rights reserved.

Error-Logging Table Format


Fixed Portion: Error information columns
Column name

Data type

Information

ORA_ERR_NUMBER$

varchar2(10)

Oracle error number

ORA_ERR_MESG$

varchar2(4000)

Oracle error message text

ORA_ERR_ROWID$

row ID

Row ID of row in error


(UPDATE/DELETE)

ORA_ERR_OPTYP$

varchar2(2)

Type of operation: (I)nsert,


(U)pdate, (D)elete

ORA_ERR_TAG$

varchar2(4000)

User-supplied tag value


via DML statement

6-27

Copyright 2005, Oracle. All rights reserved.

Inserting into a Table


with Error Logging
Example:
1

CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER


CHECK(sal > 8000));

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses',
'errlog');

INSERT INTO bonuses


SELECT employee_id, salary*1.1 FROM employees
WHERE commission_pct > .2
LOG ERRORS INTO errlog (my_bad) REJECT LIMIT 10;

6-28

Copyright 2005, Oracle. All rights reserved.

Summary
In this lesson, you should have learned how to:
Explain transformation using SQL

6-29

CREATE TABLE AS SELECT (CTAS)


UPDATE
MERGE
Multitable INSERT

Describe transformation using table functions


Implement DML Error Logging

Copyright 2005, Oracle. All rights reserved.

Practice 6: Overview
This practice covers the following topics:
Loading data using multitable inserts
Loading data using multitable conditional insert
Loading data using MERGE
Logging DML load errors

6-30

Copyright 2005, Oracle. All rights reserved.

S-ar putea să vă placă și