
Working with Flat File Source, LookUp & Filter Transformation

This tutorial shows the process of creating an Informatica PowerCenter mapping and workflow that pulls data from a flat file source and uses LookUp and Filter transformations. For demonstration purposes, let's consider a flat file with a list of existing and potential customers. We need to create a mapping which loads only the potential customers, not the existing customers, into a relational target table. While creating the mapping we will cover the following.

Create a mapping which reads from a flat file and creates a relational table consisting of new customers

Analyze a fixed-width flat file
Configure a Connected Lookup transformation
Use a Filter transformation to exclude records from the pipeline

I. Connect to the Repository


1. Connect to the repository.
2. Open the folder where you need the mapping built.

II. Analyze the source files


1. Import the flat file definition (say Nielsen.dat) into the repository.
2. Select SOURCES | IMPORT FROM FILE from the menu.
3. Select Nielsen.dat from the source file directory path.

Hint : Be sure to set Files of type: to All files (*.*) from the pull-down list before clicking OK.

Set the following options in the Flat File Wizard:

1. Select Fixed Width and check the Import field names from first line box. This option will extract the field names from the first record in the file.
2. Create a break line or separator between the fields.
3. Click on NEXT to continue.

Refer to the Appendix to see the structure of the NIELSEN.DAT flat file.

4. Change field name St to State and Code to Postal_Code.

Note : The physical data file will be present on the Server. At runtime, when the Server is ready to process the data (which is now defined by this new source definition called Nielsen.dat), it will look for the flat file that contains the data in Nielsen.dat.

5. Click Finish.
6. Name the new source definition NIELSEN. This is the name that will appear as metadata in the repository for the source definition.

III. Design the Target Schema


Assumption: The target table does not exist in the database.

1. Switch to the Target Designer.
2. Select EDIT | CLEAR if necessary to clear the workspace. Any objects you clear from the workspace will still be available for use in the Designer's Navigator Window, in the Targets node.
3. Drag the NIELSEN source definition from the Navigator Window into the workspace to automatically create a target table definition. You have just created a target definition based on the structure of the source file definition. You now need to edit the target table definition.
4. Rename the table as Tgt_New_Cust_x.
5. Enter the field names as mentioned in the figure below. Change the Key Type for Customer_ID to Primary Key. The Not Null option will automatically be checked. Save the repository.
6. The target table definition should look like this.
7. Create the physical table in the Oracle database so that you can load data.

Hint : From the Edit Table properties in the Target Designer, change the database type to Oracle.
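Step 7 relies on the Designer to generate the DDL; if you prefer to create the table by hand, a minimal sketch is below. Only Customer_ID, State, and Postal_Code are named in the text (the full column list is in a figure), so the remaining columns and all data types are assumptions:

-- Sketch only: besides Customer_ID, State, and Postal_Code, the columns and types are assumed.
CREATE TABLE Tgt_New_Cust_x (
    Customer_ID  NUMBER(10)    NOT NULL,
    Company      VARCHAR2(50),
    City         VARCHAR2(30),
    State        VARCHAR2(2),
    Postal_Code  VARCHAR2(10),
    CONSTRAINT PK_Tgt_New_Cust_x PRIMARY KEY (Customer_ID)
);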

IV. Create the mapping and drag the Source and Target
1. Create a new mapping with the name M_New_Customer_x.
2. Drag the source into the Mapping Designer workspace. The Source Qualifier should be automatically created.
3. Rename the Source Qualifier as SQ_NIELSEN_x.
4. Drag the target (Tgt_New_Cust_x) into the Mapping Designer workspace.

V. Create a Lookup Transformation


1. Select TRANSFORMATION | CREATE.
2. Select Lookup from the pull-down list.
3. Name the new Lookup transformation Lkp_New_Customer_x.

4. You need to identify the Lookup table in the Lookup transformation. Use the CUSTOMERS table from the source database to serve as the Lookup table and import it from the database.
5. Select Import to import the Lookup table.

6. Enter the ODBC Data Source, Username, Owner name, and Password for the Source Database, and click Connect.
7. In the Select Tables box, expand the owner name until you see a TABLES listing.
8. Select the CUSTOMERS table.
9. Click OK.

10. Click Done to close the Create Transformation dialog box.

Note : All the columns from the CUSTOMERS table are seen in the transformation.

11. Create an input-only port in Lkp_New_Customer_x to hold the Customer_Id value coming from SQ_NIELSEN_x:
1. Highlight the Cust_Id column from SQ_NIELSEN_x.
2. Drag/drop it to Lkp_New_Customer_x.
3. Double-click on Lkp_New_Customer_x to edit the Lookup transformation.
4. Click the Ports tab and make Cust_Id an input-only port.
5. Make CUSTOMER_Id a lookup and output port.

12. Create the lookup condition.

1. Click the Condition tab.
2. Click on the Add icon.
3. Add the lookup condition: CUSTOMER_ID = Cust_Id.

Note : Informatica takes its best guess at the lookup condition you intend, based on the data type and precision of the ports now in the Lookup transformation.

13. Click the Properties tab.
14. At line 6, as shown in the figure below, note the Connection Information.

VI. Create a Filter Transformation

1. Create a Filter transformation that will filter through those records that do not match the lookup condition, and name it Fil_New_Cust_x.
2. Drag all the ports from the Source Qualifier to the new Filter. The next step is to create an input-only port to hold the result of the lookup.
3. Highlight the CUSTOMER_ID port from Lkp_New_Customer_x.
4. Drag it to an empty port in Fil_New_Cust_x.
5. Double-click Fil_New_Cust_x to edit the filter.
6. Click the Properties tab.
7. Enter the filter condition: ISNULL(CUSTOMER_ID). This condition will allow only those records whose CUSTOMER_ID value is NULL to pass through the filter.
8. Click OK twice to exit the transformation.
9. Link all ports except CUSTOMER_ID from the Filter to the Target table.

Hint : Select the LAYOUT | AUTOLINK menu options, or right-click in the workspace background, and choose Auto link. In the Auto link box, select the Name radio button. This will link the corresponding columns based on their names.

10. Click OK.
11. Save the repository.
12. Check the Output window to verify that the mapping is valid.
13. Given below is the final mapping.
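Before moving on, it may help to see what the Lookup-plus-Filter combination computes. Logically it is an anti-join: keep only the flat-file customers that have no match in CUSTOMERS. Below is a rough SQL sketch of that logic, purely for illustration (NIELSEN is a flat file, not a table; the column names are the ones used in this tutorial):

-- Illustrative only: pretend the NIELSEN flat file were staged as a table.
SELECT n.*
FROM NIELSEN n
LEFT OUTER JOIN CUSTOMERS c
  ON c.CUSTOMER_ID = n.Cust_Id   -- the lookup condition
WHERE c.CUSTOMER_ID IS NULL;     -- the filter condition ISNULL(CUSTOMER_ID)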

VII. Create the Workflow and Set Session Task Properties


1. Launch the Workflow Manager and connect to the repository.
2. Select your folder.
3. Select WORKFLOWS | CREATE to create a Workflow as wf_New_Customer_x.
4. Select TASKS | CREATE to create a Session Task as s_New_Customer_x.
5. Select the M_New_Customer_x mapping.
6. Set the following options in the Session Edit Task:
1. Select the Properties tab. Leave all defaults.
7. Select the Mapping tab.
1. Select the Source folder. On the right-hand side, under Properties, verify the attribute settings are set to the following:
1. Source Directory path = $PMSourceFileDir\
2. File Name = Nielsen.dat (use the same case as that present on the server)
3. Source Type: Direct

Note : For the session you are creating, the Server needs the exact path, file name, and extension of the file as it resides on the Server, to use at run time.

2. Click on the Set File Properties button.
3. Click on Advanced.

4. Check the Line sequential file format check box.

5. Select the Targets folder.
1. Under Connections on the right-hand side, select the value of the Target Relational Database Connection.
6. In the Transformations folder, select the Lkp_New_Customer transformation.
1. On the right-hand side, under Connections, select the Relational Database Connection for the Lookup table.

8. Run the Workflow.
9. Monitor the Workflow.
10. View the Session Details and Session Log.
11. Verify the results from the target table by running the query SELECT * FROM Tgt_New_Cust_x;

SCD Type 1 Implementation using Informatica PowerCenter

Unlike SCD Type 2, Slowly Changing Dimension Type 1 does not preserve any history versions of data. This methodology overwrites old data with new data, and therefore stores only the most current information. In this article let's discuss the step-by-step implementation of SCD Type 1 using Informatica PowerCenter.

The number of records we store in SCD Type 1 does not increase exponentially, as this methodology overwrites old data with new data. Hence we may not need the performance improvement techniques used in the SCD Type 2 tutorial.

Understand the Staging and Dimension Tables.


Slowly Changing Dimension Series
Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

For our demonstration purpose, let's consider the CUSTOMER dimension. Below is the detailed structure of both the staging and dimension tables.

Staging Table
In our staging table, we have all the columns required for the dimension table attributes, so no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.

Key Points
1. The staging table will have only one day's data. Change Data Capture is not in scope.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.

Dimension Table
Here is the structure of our Dimension table.

Key Points
1. CUST_KEY is the surrogate key.
2. CUST_ID is the natural key, hence the unique record identifier.
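The dimension's column list appears only as an image in the original post, so here is a minimal DDL sketch; the attribute columns and Oracle data types are assumptions inferred from the staging structure used later in this series:

-- Sketch only: attribute columns and types are assumed, not taken from the original figure.
CREATE TABLE T_DIM_CUST (
    CUST_KEY   NUMBER(10)     NOT NULL,  -- surrogate key
    CUST_ID    NUMBER(10)     NOT NULL,  -- natural key
    CUST_NAME  VARCHAR2(100),
    ADDRESS1   VARCHAR2(100),
    ADDRESS2   VARCHAR2(100),
    CITY       VARCHAR2(50),
    STATE      VARCHAR2(2),
    ZIP        VARCHAR2(10),
    CREATE_DT  DATE,
    UPDATE_DT  DATE,
    CONSTRAINT PK_T_DIM_CUST PRIMARY KEY (CUST_KEY)
);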

Mapping Building and Configuration


Step 1

Let's start the mapping building process. For that, pull the CUST_STAGE source definition into the Mapping Designer.

Step 2

Now, using a LookUp Transformation, fetch the existing customer columns from the dimension table T_DIM_CUST. This lookup will return NULL values if the customer does not already exist in the dimension table.

LookUp Condition : IN_CUST_ID = CUST_ID
Return Columns : CUST_KEY
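For context, a connected lookup with a static cache behaves roughly like the query below, built once and then probed once per row. This is a sketch of the idea, not the exact SQL PowerCenter generates:

-- Approximate cache-build query for the lookup (sketch).
SELECT CUST_KEY, CUST_ID
FROM T_DIM_CUST;
-- Each incoming IN_CUST_ID probes this cache; no match returns NULL for CUST_KEY.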

Step 3

Use an Expression Transformation to identify the records for insert and update, using the expression below.

INS_UPD :- IIF(ISNULL(CUST_KEY),'INS','UPD')

Additionally, create two output ports.

CREATE_DT :- SYSDATE
UPDATE_DT :- SYSDATE

See the structure of the mapping in the image below.

Step 4

Map the columns from the Expression Transformation to a Router Transformation and create two groups (INSERT, UPDATE) in the Router Transformation using the expressions below. The mapping will look like the one shown in the image.

INSERT :- IIF(INS_UPD='INS',TRUE,FALSE)
UPDATE :- IIF(INS_UPD='UPD',TRUE,FALSE)

INSERT Group
Step 5

Every record coming through the 'INSERT Group' will be inserted into the dimension table T_DIM_CUST.

Use a Sequence Generator transformation to generate the surrogate key CUST_KEY, as shown in the image below, and map the columns from the Router Transformation to the target as shown.

Note : An Update Strategy is not required if the records are set for Insert.

UPDATE Group
Step 6

Records coming from the 'UPDATE Group' will update the customer dimension with the latest customer attributes. Add an Update Strategy Transformation before the target instance and set it to DD_UPDATE. Below is the structure of the mapping.

We are done with the mapping building; below is the structure of the completed mapping.
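For readers who prefer SQL, the net effect of this SCD Type 1 mapping is roughly what an Oracle MERGE on the natural key would do. This is only a sketch of the logic, not what PowerCenter executes; the sequence name CUST_KEY_SEQ is hypothetical, and the attribute columns are assumed from the staging structure used in this series:

-- Rough SQL equivalent of the SCD Type 1 load (sketch; CUST_KEY_SEQ is hypothetical).
MERGE INTO T_DIM_CUST d
USING CUST_STAGE s
ON (d.CUST_ID = s.CUST_ID)
WHEN MATCHED THEN UPDATE SET
    d.CUST_NAME = s.CUST_NAME,
    d.ADDRESS1  = s.ADDRESS1,
    d.ADDRESS2  = s.ADDRESS2,
    d.CITY      = s.CITY,
    d.STATE     = s.STATE,
    d.ZIP       = s.ZIP,
    d.UPDATE_DT = SYSDATE            -- overwrite in place: no history kept
WHEN NOT MATCHED THEN INSERT
    (d.CUST_KEY, d.CUST_ID, d.CUST_NAME, d.ADDRESS1, d.ADDRESS2,
     d.CITY, d.STATE, d.ZIP, d.CREATE_DT, d.UPDATE_DT)
VALUES
    (CUST_KEY_SEQ.NEXTVAL, s.CUST_ID, s.CUST_NAME, s.ADDRESS1, s.ADDRESS2,
     s.CITY, s.STATE, s.ZIP, SYSDATE, SYSDATE);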

Workflow and Session Creation


There are no specific properties required to be set during the session configuration.

Below is a sample data set taken from the dimension table T_DIM_CUST. Initial inserted value for CUST_ID 1003:

Updated value for CUST_ID 1003:
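A quick way to verify the overwrite behavior, assuming the sample natural key 1003 shown above:

-- After the second run this should return a single, updated row: Type 1 keeps no history.
SELECT * FROM T_DIM_CUST WHERE CUST_ID = 1003;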

Hope you guys enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.

Slowly Changing Dimension Type 2, also known as SCD Type 2, is one of the most commonly used types of dimension table in a Data Warehouse. SCD Type 2 dimension loads are considered complex, mainly because of the data volume we process and the number of transformations used in the mapping. Here in this article, we will be building an Informatica PowerCenter mapping to load an SCD Type 2 dimension.

Understand the Data Warehouse Architecture


Before we go to the mapping design, let's understand the high-level architecture of our Data Warehouse.

Slowly Changing Dimension Series
Part I : SCD Type 1.
Part II : SCD Type 2.
Part III : SCD Type 3.
Part IV : SCD Type 4.
Part V : SCD Type 6.

Here we have a staging schema, which is loaded from different data sources after the required data cleansing. Warehouse tables are loaded directly from the staging schema. Both the staging tables and the warehouse tables are in two different schemas within a single database instance.

Understand the Staging and Dimension Tables.


Staging Table
In our staging table, we have all the columns required for the dimension table attributes, so no tables other than the dimension table will be involved in the mapping. Below is the structure of our staging table.

CUST_ID CUST_NAME ADDRESS1 ADDRESS2 CITY STATE ZIP

Key Points :
1. The staging table will have only one day's data.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.

Dimension Table
Here is the structure of our Dimension table.

CUST_KEY AS_OF_START_DT AS_OF_END_DT CUST_ID CUST_NAME ADDRESS1 ADDRESS2 CITY STATE ZIP CHK_SUM_NB CREATE_DT UPDATE_DT

Key Points :
1. CUST_KEY is the surrogate key.
2. CUST_ID, AS_OF_END_DT is the natural key, hence the unique record identifier.
3. Record versions are kept based on a time range using AS_OF_START_DT and AS_OF_END_DT.
4. The active record will have an AS_OF_END_DT value of 12-31-4000.
5. The checksum value of all dimension attribute columns is stored in the column CHK_SUM_NB.
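For reference, a minimal DDL sketch of this dimension; the Oracle data types are assumptions, since the original post gives only the column names:

-- Sketch only: data types are assumed.
CREATE TABLE T_DIM_CUST (
    CUST_KEY        NUMBER(10)   NOT NULL,  -- surrogate key
    AS_OF_START_DT  DATE         NOT NULL,  -- version effective start
    AS_OF_END_DT    DATE         NOT NULL,  -- version effective end; 12-31-4000 = active
    CUST_ID         NUMBER(10)   NOT NULL,  -- natural key
    CUST_NAME       VARCHAR2(100),
    ADDRESS1        VARCHAR2(100),
    ADDRESS2        VARCHAR2(100),
    CITY            VARCHAR2(50),
    STATE           VARCHAR2(2),
    ZIP             VARCHAR2(10),
    CHK_SUM_NB      VARCHAR2(32),           -- MD5 hex string of the attribute columns
    CREATE_DT       DATE,
    UPDATE_DT       DATE,
    CONSTRAINT PK_T_DIM_CUST PRIMARY KEY (CUST_KEY)
);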

Mapping Building and Configuration


Now that we understand the ETL architecture, staging table, dimension table, and the design considerations, we can move on to the mapping development. We are splitting the mapping development into six steps.

1. Join Staging Table and Dimension Table
2. Data Transformation
   - Generate Surrogate Key
   - Generate Checksum Number
   - Other Calculations
3. Identify Insert/Update
4. Insert the New Records
5. Update (Expire) the Old Version
6. Insert the New Version of the Updated Record

1. Join Staging Table and Dimension Table


We are going to OUTER JOIN the Staging (Source) table and the Dimension (Target) table using the SQL override below. An OUTER JOIN gives you all the records from the staging table and the corresponding records from the dimension table. If there is no corresponding record in the dimension table, it returns NULL values for the dimension table columns.

SELECT
  -- Columns from the Staging (Source) table
  CUST_STAGE.CUST_ID,
  CUST_STAGE.CUST_NAME,
  CUST_STAGE.ADDRESS1,
  CUST_STAGE.ADDRESS2,
  CUST_STAGE.CITY,
  CUST_STAGE.STATE,
  CUST_STAGE.ZIP,
  -- Columns from the Dimension (Target) table
  T_DIM_CUST.CUST_KEY,
  T_DIM_CUST.CHK_SUM_NB
FROM CUST_STAGE
LEFT OUTER JOIN T_DIM_CUST
  ON CUST_STAGE.CUST_ID = T_DIM_CUST.CUST_ID -- Join on the natural key
 AND T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY') -- Get the active record

2. Data Transformation
Now map the columns from the Source Qualifier to an Expression Transformation. When you map the columns to the Expression Transformation, rename the ports from the dimension table to OLD_CUST_KEY and OLD_CHK_SUM_NB, and add the expressions below.

Generate Surrogate Key : A surrogate key will be generated for each and every record inserted into the dimension table.

CUST_KEY : This is the surrogate key; it will be generated using a Sequence Generator Transformation.

Generate Checksum Number : A checksum of all dimension attributes. A difference between the checksum of the incoming record and the checksum stored on the dimension record indicates a changed column value. This is an easier way to identify changes in the columns than comparing each and every column.

CHK_SUM_NB : MD5(TO_CHAR(CUST_ID) || CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))
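If you want to sanity-check the checksums from the database side, something like the following works on Oracle 12c and later. Treat it as an assumption-laden sketch: Informatica's MD5() returns a 32-character hex string, hex case conventions may differ, and Oracle's || treats NULL as an empty string while PowerCenter's || yields NULL, so NULL handling must match before the two values can be compared.

-- Oracle-side checksum for comparison (sketch; verify NULL and case handling first).
SELECT CUST_ID,
       LOWER(RAWTOHEX(STANDARD_HASH(
           TO_CHAR(CUST_ID) || CUST_NAME || ADDRESS1 || ADDRESS2 ||
           CITY || STATE || TO_CHAR(ZIP), 'MD5'))) AS CHK_SUM_NB
FROM CUST_STAGE;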

Other Calculations :

Effective start date : Effective start date of the record.
AS_OF_START_DT : TRUNC(SYSDATE)

Effective end date : Effective end date of the record.
AS_OF_END_DT : TO_DATE('12-31-4000','MM-DD-YYYY')

Record creation date : Record creation timestamp; this will be used for the records inserted.
CREATE_DT : TRUNC(SYSDATE)

Record updating date : Record updating timestamp; this will be used for records updated.
UPDATE_DT : TRUNC(SYSDATE)

3. Identify Insert/Update
In this step we will identify the records for INSERT and UPDATE.

INSERT : A record will be set for INSERT if it does not exist in the dimension table. We can identify new records when OLD_CUST_KEY is NULL, since that column comes from the dimension table.

UPDATE : A record will be set for UPDATE if it already exists in the dimension table and any incoming column from the staging table has a new value. If the column OLD_CUST_KEY is not NULL and the checksum of the incoming record is different from the checksum of the existing record (OLD_CHK_SUM_NB <> CHK_SUM_NB), the record will be set for UPDATE.
The following expression will be used in the Expression Transformation port INS_UPD_FLG shown in the previous step.

INS_UPD_FLG : IIF(ISNULL(OLD_CUST_KEY), 'I', IIF(NOT ISNULL(OLD_CUST_KEY) AND OLD_CHK_SUM_NB <> CHK_SUM_NB, 'U'))

Now map all the columns from the Expression Transformation to a Router and add two groups as below.

INSERT : IIF(INS_UPD_FLG = 'I', TRUE, FALSE)
UPDATE : IIF(INS_UPD_FLG = 'U', TRUE, FALSE)
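As a cross-check, the router's classification can be expressed in SQL roughly as below. This sketch assumes the incoming checksum has already been computed (the CHK_SUM_NB expression above), which is why it appears on the staging side here:

-- Sketch of the insert/update classification; stg.CHK_SUM_NB stands in for the
-- checksum computed in the Expression Transformation.
SELECT stg.CUST_ID,
       CASE
           WHEN dim.CUST_KEY IS NULL             THEN 'I'  -- not in the dimension: insert
           WHEN dim.CHK_SUM_NB <> stg.CHK_SUM_NB THEN 'U'  -- attributes changed: update
       END AS INS_UPD_FLG
FROM CUST_STAGE stg
LEFT OUTER JOIN T_DIM_CUST dim
  ON  dim.CUST_ID = stg.CUST_ID
  AND dim.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY');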

4. Insert the New Records


Now map all the columns from the INSERT group to the dimension table instance T_DIM_CUST. While mapping the columns, we don't need any of the columns prefixed OLD_, which were pulled from the dimension table.

5. Update (Expire) the Old Version


The records identified for UPDATE will be inserted into a temporary table T_DIM_CUST_TEMP. These records will then be updated into T_DIM_CUST via a post-session SQL. You can learn more about this performance improvement technique in one of our previous posts.

We will be mapping the below columns from the UPDATE group of the Router Transformation to the target table. To update (expire) the old record we just need the columns in the list below.

OLD_CUST_KEY : To uniquely identify the dimension record.
UPDATE_DT : Audit column to record the update date.
AS_OF_END_DT : The record will be expired with the previous day's date.

While we map the columns, AS_OF_END_DT will be calculated as ADD_TO_DATE(TRUNC(SYSDATE),'DD',-1) in an Expression Transformation. The image below gives the picture of the mapping.

6. Insert the New Version of the Updated Record


The records identified as UPDATE will have to have a new (active) version inserted. Map all the ports from the UPDATE group of the Router Transformation to the target instance T_DIM_CUST. While mapping the columns, we don't need any of the columns prefixed OLD_, which were pulled from the dimension table.

Workflow and Session Creation


During the session configuration, add the below SQL as the post-session SQL statement. This correlated update SQL will update the records in the T_DIM_CUST table with the values from T_DIM_CUST_TEMP. As we mentioned previously, this is a performance improvement technique used to update huge tables.

UPDATE T_DIM_CUST
SET (T_DIM_CUST.AS_OF_END_DT, T_DIM_CUST.UPDATE_DT) =
    (SELECT T_DIM_CUST_TEMP.AS_OF_END_DT, T_DIM_CUST_TEMP.UPDATE_DT
     FROM T_DIM_CUST_TEMP
     WHERE T_DIM_CUST_TEMP.CUST_KEY = T_DIM_CUST.CUST_KEY)
WHERE EXISTS
    (SELECT 1
     FROM T_DIM_CUST_TEMP
     WHERE T_DIM_CUST_TEMP.CUST_KEY = T_DIM_CUST.CUST_KEY)

Now let's look at the data and see how it looks in the image below.

TARGET UPDATE OVERRIDE - INFORMATICA


When you use an Update Strategy transformation in the mapping, or specify the "Treat Source Rows As" option as Update, the Informatica Integration Service updates a row in the target table whenever a primary key match is found in the target table. The update strategy works only when a primary key is defined in the target definition, and it updates the target table based on that primary key.

What if you want to update the target table by matching a column other than the primary key? In this case the update strategy won't work. Informatica provides a feature, "Target Update Override", to update based on columns that are not the primary key. You can find the Target Update Override option in the target definition Properties tab. The syntax of the update statement to be specified in Target Update Override is:

UPDATE TARGET_TABLE_NAME
SET TARGET_COLUMN1 = :TU.TARGET_PORT1,
    [Additional update columns]
WHERE TARGET_COLUMN = :TU.TARGET_PORT
AND [Additional conditions]

Here TU means Target Update and is used to refer to the target ports. Example: Consider the EMPLOYEES table. In the EMPLOYEES table, the primary key is EMPLOYEE_ID. Let's say we want to update the salary of the employees whose employee name is MARK. In this case we have to use the Target Update Override. The update statement to be specified is:

UPDATE EMPLOYEES SET SALARY = :TU.SAL WHERE EMPLOYEE_NAME = :TU.EMP_NAME

Target Update Override in Informatica


By default, the Informatica Server updates targets based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.

For a mapping without an Update Strategy transformation, configure the session to mark source records as Update. If your mapping includes an Update Strategy transformation, the Target Update option only affects source records marked as Update; the Informatica Server processes all records marked as Insert, Delete, or Reject normally. When you configure the session, mark source records as Data Driven. The Target Update Override then only affects source rows marked as Update by the Update Strategy transformation.

Overriding the WHERE Clause

Default:

UPDATE T_EMP_UPDATE_OVERRIDE
SET ENAME = :TU.ENAME, JOB = :TU.JOB, SAL = :TU.SAL
WHERE ENAME = :TU.ENAME
You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_EMP_UPDATE_OVERRIDE
SET EMPNO = :TU.EMPNO, ENAME = :TU.ENAME, JOB = :TU.JOB, SAL = :TU.SAL
WHERE ENAME = :TU.ENAME AND ENAME = 'SMITH'
Entering a Target Update Statement

Follow these instructions to create an update statement.

1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field. The SQL Editor displays.
4. Select Generate SQL. The default UPDATE statement appears.
5. Modify the update statement. You can override the WHERE clause to include non-key columns.
6. Click OK.

NOTES:
1. :TU is a reserved keyword in Informatica, used to match target port names with the target table's column names.
2. A common error when doing this is: "TE_7023 Transformation Parse Fatal Error; transformation stopped... error constructing sql statement". If you hit it, check the override statement; among other things, you have to keep a space before the :TU.
