This tutorial walks through creating an Informatica PowerCenter mapping and workflow that pulls data from a flat file source and uses Lookup and Filter transformations. For demonstration purposes, consider a flat file containing a list of existing and potential customers. We need to create a mapping that loads only the potential customers, not the existing customers, into a relational target table. While building the mapping we will cover the following.
Create a mapping that reads from a flat file and loads only new customers into a relational table
Analyze a fixed-width flat file
Configure a Connected Lookup transformation
Use a Filter transformation to exclude records from the pipeline
2. Select Fixed Width and check the Import field names from first line box. This option will extract the field names from the first record in the file.
3. Create a break line or separator between the fields.
4. Click NEXT to continue.
5. Refer to the Appendix to see the structure of the NIELSEN.DAT flat file.
4. Change the field name St to State and Code to Postal_Code.
Note : The physical data file will be present on the Server. At run time, when the Server is ready to process the data (which is now defined by this new source definition), it will look for the flat file that contains the data, Nielsen.dat.
5. Click Finish.
6. Name the new source definition NIELSEN. This is the name that will appear as metadata in the repository for the source definition.
5. Enter the field names as shown in the figure below. Change the Key Type for Customer_ID to Primary Key; the Not Null option will be checked automatically. Save the repository.
6. The target table definition should look like this:
7. Create the physical table in the Oracle database so that you can load data.
Hint : From the Edit Table properties in the Target Designer, change the database type to Oracle.
IV. Create the mapping and drag the Source and Target
1. Create a new mapping named M_New_Customer_x.
2. Drag the source into the Mapping Designer workspace. The Source Qualifier should be created automatically.
3. Rename the Source Qualifier SQ_NIELSEN_x.
4. Drag the target (Tgt_New_Cust_x) into the Mapping Designer workspace.
3. Name the new Lookup transformation Lkp_New_Customer_x.
4. You need to identify the Lookup table in the Lookup transformation. Use the CUSTOMERS table from the source database as the Lookup table and import it from the database.
5. Select Import to import the Lookup table.
6. Enter the ODBC Data Source, Username, Owner name, and Password for the source database, and Connect.
7. In the Select Tables box, expand the owner name until you see a TABLES listing.
8. Select the CUSTOMERS table.
9. Click OK.
10. Click Done to close the Create Transformation dialog box.
Note : All the columns from the CUSTOMERS table now appear in the transformation.
11. Create an input-only port in Lkp_New_Customer_x to hold the Customer_Id value coming from SQ_NIELSEN_x:
1. Highlight the Cust_Id column in SQ_NIELSEN_x.
2. Drag and drop it to Lkp_New_Customer_x.
3. Double-click Lkp_New_Customer_x to edit the Lookup transformation.
4. On the Ports tab, make Cust_Id an input-only port.
5. Make CUSTOMER_Id a lookup and output port.
1. Click the Condition tab.
2. Click on the Add icon.
3. Add the lookup condition: CUSTOMER_ID = Cust_Id.
Note : Informatica takes its best guess at the lookup condition you intend, based on data type and precision of the ports now in the Lookup transformation.
13. Click the Properties tab.
14. At line 6, as shown in the figure below, note the Connection Information.
1. Create a Filter transformation that will pass only those records that do not match the lookup condition, and name it Fil_New_Cust_x.
2. Drag all the ports from the Source Qualifier to the new Filter. The next step is to create an input-only port to hold the result of the lookup.
3. Highlight the CUSTOMER_ID port in Lkp_New_Customer_x.
4. Drag it to an empty port in Fil_New_Cust_x.
5. Double-click Fil_New_Cust_x to edit the filter.
6. Click the Properties tab.
7. Enter the filter condition: ISNULL(CUSTOMER_ID). This condition allows only records whose CUSTOMER_ID value is NULL to pass through the filter.
8. Click OK twice to exit the transformation.
9. Link all ports except CUSTOMER_ID from the Filter to the target table.
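The lookup-then-filter logic above can be illustrated outside PowerCenter. The following is a minimal Python sketch, not Informatica code; the customer IDs and field names are made up for the example:

```python
# Sketch: keep only records whose Cust_Id has no match in the lookup table.
existing_customers = {101, 102, 103}       # CUSTOMER_ID values already in CUSTOMERS

source_rows = [
    {"Cust_Id": 101, "Name": "Acme"},      # existing customer -> filtered out
    {"Cust_Id": 250, "Name": "Globex"},    # new customer -> passes the filter
]

def lookup(cust_id):
    """Connected lookup: return the matched key, or None (NULL) on a lookup miss."""
    return cust_id if cust_id in existing_customers else None

# Filter condition ISNULL(CUSTOMER_ID): pass rows where the lookup returned NULL.
new_customers = [row for row in source_rows if lookup(row["Cust_Id"]) is None]
print(new_customers)  # only the Globex record remains
```

The key design point is the same as in the mapping: a lookup miss produces a NULL, and the filter keeps exactly the NULL cases, so only customers absent from the lookup table reach the target.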
Hint : Select the LAYOUT | AUTOLINK menu options, or right-click in the workspace background, and choose Auto link. In the Auto link box, select the Name radio button. This will link the corresponding columns based on their names.
10. Click OK.
11. Save the repository.
12. Check the Output window to verify that the mapping is valid.
13. The final mapping is shown below.
VII.
Note : For the session you are creating, the Server needs the exact path, file name, and extension of the file as it resides on the Server, for use at run time.
2. Click on the Set File Properties button.
3. Click on Advanced.
5. Select the Targets folder.
1. Under Connections, on the right-hand side, select the value of the Target Relational Database Connection.
6. In the Transformations folder, select the Lkp_New_Customer transformation.
1. On the right-hand side, under Connections, select the Relational Database Connection for the Lookup table.
8. Run the Workflow.
9. Monitor the Workflow.
10. View the Session Details and Session Log.
11. Verify the results in the target table by running the query SELECT * FROM Tgt_New_Cust_x;
Unlike SCD Type 2, Slowly Changing Dimension Type 1 does not preserve any historical versions of the data. This methodology overwrites old data with new data and therefore stores only the most current information. In this article, let's discuss the step-by-step implementation of SCD Type 1 using Informatica PowerCenter.
The number of records stored in an SCD Type 1 dimension does not grow with every change, because this methodology overwrites old data with new data. Hence we may not need the performance improvement techniques used in the SCD Type 2 tutorial.
Staging Table
Our staging table has all the columns required for the dimension table attributes, so no tables other than the dimension table are involved in the mapping. Below is the structure of our staging table.
Key Points
1. The staging table holds only one day's data; Change Data Capture is out of scope.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.
Dimension Table
Here is the structure of our Dimension table.
Key Points
1. CUST_KEY is the surrogate key.
2. CUST_ID is the natural key, hence the unique record identifier.
Step 2
Now, using a Lookup transformation, fetch the existing customer columns from the dimension table T_DIM_CUST. This lookup will return NULL values if the customer does not already exist in the dimension table.
Step 3
Use an Expression Transformation to flag the records for insert and update. Based on the lookup result, the INS_UPD port is set to 'INS' when the looked-up key is NULL (a new customer) and to 'UPD' when the customer already exists.
Step 4
Map the columns from the Expression Transformation to a Router Transformation and create two groups (INSERT, UPDATE) in the Router Transformation using the expressions below. The mapping will look as shown in the image.

INSERT :- IIF(INS_UPD='INS',TRUE,FALSE)
UPDATE :- IIF(INS_UPD='UPD',TRUE,FALSE)
INSERT Group
Step 5 Every record coming through the 'INSERT Group' will be inserted into the dimension table T_DIM_CUST.
Use a Sequence Generator transformation to generate the surrogate key CUST_KEY, and map the columns from the Router Transformation to the target, as shown in the image below.
Note : An Update Strategy is not required if the records are set for insert.
UPDATE Group
Step 6 Records coming from the 'UPDATE Group' will update the customer dimension with the latest customer attributes. Add an Update Strategy transformation before the target instance and set it to DD_UPDATE. Below is the structure of the mapping.
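The overall SCD Type 1 behavior described in the steps above amounts to an upsert keyed on the natural key: matched records are overwritten, unmatched records are inserted with a new surrogate key, and no history is kept. Here is a minimal Python sketch of that logic; the customer IDs and attribute values are invented for illustration:

```python
# Sketch of SCD Type 1: overwrite on match, insert on miss, no history kept.
# The dimension is keyed by the natural key CUST_ID.
dimension = {1001: {"CUST_KEY": 1, "CUST_NAME": "John", "CITY": "Dallas"}}
next_key = 2  # stands in for the Sequence Generator transformation

incoming = [
    {"CUST_ID": 1001, "CUST_NAME": "John", "CITY": "Austin"},  # existing -> UPDATE group
    {"CUST_ID": 1002, "CUST_NAME": "Mary", "CITY": "Denver"},  # new -> INSERT group
]

for row in incoming:
    attrs = {k: v for k, v in row.items() if k != "CUST_ID"}
    if row["CUST_ID"] in dimension:        # lookup hit: overwrite the old attributes
        dimension[row["CUST_ID"]].update(attrs)
    else:                                  # lookup miss: insert with a new surrogate key
        dimension[row["CUST_ID"]] = {"CUST_KEY": next_key, **attrs}
        next_key += 1

# Customer 1001 now shows Austin only; the Dallas version is gone (no history).
```

Note that the update path reuses the existing CUST_KEY, which is why the Type 1 dimension never grows with changes, only with genuinely new customers.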
That completes the mapping; below is the structure of the completed mapping.
Below is a sample data set taken from the dimension table T_DIM_CUST. Initial inserted value for CUST_ID 1003:
Hope you enjoyed this. Please leave us a comment in case you have any questions or difficulties implementing this.
Slowly Changing Dimension Type 2, also known as SCD Type 2, is one of the most commonly used types of dimension table in a data warehouse. SCD Type 2 dimension loads are considered complex, mainly because of the data volume processed and the number of transformations used in the mapping. In this article, we will build an Informatica PowerCenter mapping to load an SCD Type 2 dimension.
Slowly Changing Dimension Series Part I : SCD Type 1. Part II : SCD Type 2. Part III : SCD Type 3. Part IV : SCD Type 4. Part V : SCD Type 6.
Here we have a staging schema, which is loaded from different data sources after the required data cleansing. Warehouse tables are loaded directly from the staging schema. Both the staging tables and the warehouse tables are in two different schemas within a single database instance.
Key Points :
1. The staging table holds only one day's data.
2. Data is uniquely identified using CUST_ID.
3. All attributes required by the dimension table are available in the staging table.
Dimension Table
Here is the structure of our Dimension table.
CUST_KEY, AS_OF_START_DT, AS_OF_END_DT, CUST_ID, CUST_NAME, ADDRESS1, ADDRESS2, CITY, STATE, ZIP, CHK_SUM_NB, CREATE_DT, UPDATE_DT
Key Points :
1. CUST_KEY is the surrogate key.
2. CUST_ID, AS_OF_END_DT is the natural key, hence the unique record identifier.
3. Record versions are kept by time range using AS_OF_START_DT and AS_OF_END_DT.
4. The active record has an AS_OF_END_DT value of 12-31-4000.
5. The checksum of all dimension attribute columns is stored in the column CHK_SUM_NB.
3. Identify Insert/Update
4. Insert the new records
5. Update (expire) the old version
6. Insert the new version of the updated record
SELECT
  -- Columns from the staging (source) table
  CUST_STAGE.CUST_ID,
  CUST_STAGE.CUST_NAME,
  CUST_STAGE.ADDRESS1,
  CUST_STAGE.ADDRESS2,
  CUST_STAGE.CITY,
  CUST_STAGE.STATE,
  CUST_STAGE.ZIP,
  -- Columns from the dimension (target) table
  T_DIM_CUST.CUST_KEY,
  T_DIM_CUST.CHK_SUM_NB
FROM CUST_STAGE
LEFT OUTER JOIN T_DIM_CUST
  ON CUST_STAGE.CUST_ID = T_DIM_CUST.CUST_ID -- Join on the natural key
  AND T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY') -- Get the active record
2. Data Transformation
Now map the columns from the Source Qualifier to an Expression Transformation. When you map the columns to the Expression Transformation, rename the ports coming from the dimension table to OLD_CUST_KEY and OLD_CHK_SUM_NB, and add the expressions below.
Generate Surrogate Key : a surrogate key will be generated for each record inserted into the dimension table.

CUST_KEY : the surrogate key, generated using a Sequence Generator transformation.

Generate Checksum Number : the checksum of all dimension attributes. A difference between the checksum of the incoming record and the checksum stored on the dimension record indicates a changed column value. This is an easier way to identify changes in the columns than comparing each and every column.
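The checksum idea can be sketched in a few lines of Python. This is an illustration of the technique, not the PowerCenter expression itself; the hash function (MD5) and the column separator are assumptions for the example:

```python
import hashlib

def checksum(row, attrs):
    """Concatenate the dimension attribute values and hash them, so a single
    checksum comparison replaces column-by-column comparison."""
    joined = "|".join(str(row[a]) for a in attrs)
    return hashlib.md5(joined.encode()).hexdigest()

ATTRS = ["CUST_NAME", "ADDRESS1", "CITY", "STATE", "ZIP"]
old = {"CUST_NAME": "John", "ADDRESS1": "1 Main St",
       "CITY": "Dallas", "STATE": "TX", "ZIP": "75001"}
new = dict(old, CITY="Austin")  # one attribute changed

# A single inequality test detects that some attribute changed.
changed = checksum(new, ATTRS) != checksum(old, ATTRS)
print(changed)  # True: the CITY change shows up as a checksum difference
```

The separator character matters: without it, adjacent values could concatenate into the same string for different rows, so any real implementation should pick a delimiter that cannot occur in the data.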
Other Calculations :

AS_OF_START_DT : TRUNC(SYSDATE)
AS_OF_END_DT : TO_DATE('12-31-4000','MM-DD-YYYY')
CREATE_DT : TRUNC(SYSDATE), the record creation timestamp, used for inserted records.
UPDATE_DT : TRUNC(SYSDATE), the record update timestamp, used for updated records.
3. Identify Insert/Update
In this step we will identify the records for INSERT and UPDATE.
INSERT : a record will be set for INSERT if it does not exist in the dimension table. New records are identified when OLD_CUST_KEY, the column coming from the dimension table, is NULL.
UPDATE : a record will be set for UPDATE if it already exists in the dimension table and any incoming column from the staging table has a new value. If the column OLD_CUST_KEY is not NULL and the checksum of the incoming record differs from the checksum of the existing record (OLD_CHK_SUM_NB <> CHK_SUM_NB), the record will be set for UPDATE.
The following expression will be used in the Expression Transformation:

INS_UPD_FLG : IIF(ISNULL(OLD_CUST_KEY), 'I', IIF(NOT ISNULL(OLD_CUST_KEY) AND OLD_CHK_SUM_NB <> CHK_SUM_NB, 'U'))
Now map all the columns from the Expression Transformation to a Router and add two groups as below:

INSERT : IIF(INS_UPD_FLG = 'I', TRUE, FALSE)
UPDATE : IIF(INS_UPD_FLG = 'U', TRUE, FALSE)
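The flag expression and the two router groups above can be mirrored in a small Python sketch (the port names follow the article; the function itself is illustrative, not Informatica syntax):

```python
def ins_upd_flg(old_cust_key, old_chk_sum_nb, chk_sum_nb):
    """Mirror of the Expression Transformation flag:
    'I' for brand-new records, 'U' for changed records, None otherwise."""
    if old_cust_key is None:          # no match found in T_DIM_CUST
        return "I"
    if old_chk_sum_nb != chk_sum_nb:  # match found, but attributes changed
        return "U"
    return None                       # unchanged record: dropped by both router groups

print(ins_upd_flg(None, None, 123))  # 'I'  -> INSERT group
print(ins_upd_flg(42, 111, 123))     # 'U'  -> UPDATE group
print(ins_upd_flg(42, 123, 123))     # None -> routed to neither group
```

This also makes explicit a property the mapping relies on: an unchanged record satisfies neither group condition, so it simply drops out of the pipeline instead of touching the target.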
We will map the columns below from the UPDATE group of the Router Transformation to the target table. To update (expire) the old record, we need only the following columns:

OLD_CUST_KEY : to uniquely identify the dimension record.
UPDATE_DT : audit column recording the record update date.
AS_OF_END_DT : the record will be expired with the previous day's date.

While we map the columns, AS_OF_END_DT will be calculated as ADD_TO_DATE(TRUNC(SYSDATE),'DD',-1) in an Expression Transformation. The image below gives a picture of the mapping.
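The expire-then-insert pattern of steps 5 and 6 can be sketched as follows. This is a Python illustration under assumed data; the column names follow the article, and "yesterday" mirrors ADD_TO_DATE(TRUNC(SYSDATE),'DD',-1):

```python
from datetime import date, timedelta

END_OF_TIME = date(4000, 12, 31)  # the article's active-record end date, 12-31-4000

def apply_scd2_update(history, new_attrs, next_key, today):
    """Expire the active version (AS_OF_END_DT = yesterday) and append the
    new version as the active record. `history` is the list of versions
    for one CUST_ID."""
    for version in history:
        if version["AS_OF_END_DT"] == END_OF_TIME:        # the current active record
            version["AS_OF_END_DT"] = today - timedelta(days=1)
    history.append({"CUST_KEY": next_key,                 # new surrogate key
                    "AS_OF_START_DT": today,
                    "AS_OF_END_DT": END_OF_TIME,          # becomes the active version
                    **new_attrs})
    return history

history = [{"CUST_KEY": 1, "AS_OF_START_DT": date(2012, 1, 1),
            "AS_OF_END_DT": END_OF_TIME, "CITY": "Dallas"}]
apply_scd2_update(history, {"CITY": "Austin"}, next_key=2, today=date(2013, 6, 10))
# Two versions now exist: the expired Dallas row and the active Austin row.
```

Because the old version's AS_OF_END_DT is set to the day before the new version's AS_OF_START_DT, the date ranges never overlap, which is what makes point-in-time queries against the dimension unambiguous.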
While mapping the columns to T_DIM_CUST, we do not need any column prefixed OLD_, since those are pulled from the dimension table.
T_DIM_CUST_TEMP
Now let's look at the data and see how it appears in the image below.
By default, updates work when there is a primary key defined in the target definition and you want to update the target table based on that primary key.
What if you want to update the target table by a matching column other than the primary key? In this case the Update Strategy alone won't work. Informatica provides a feature, "Target Update Override", to update based on columns that are not part of the primary key. You can find the Target Update Override option on the Properties tab of the target definition. The syntax of the update statement to be specified in Target Update Override is:
UPDATE TARGET_TABLE_NAME SET TARGET_COLUMN1 = :TU.TARGET_PORT1, [Additional update columns] WHERE TARGET_COLUMN = :TU.TARGET_PORT AND [Additional conditions]
Here TU means Target Update and is used to reference the target ports. Example: Consider the employees table, where the primary key is employee_id. Say we want to update the salary of the employees whose name is SMITH. In this case we have to use the Target Update Override. The update statement to be specified is:
Default:
UPDATE T_EMP_UPDATE_OVERRIDE SET ENAME = :TU.ENAME, JOB = :TU.JOB, SAL = :TU.SAL WHERE ENAME = :TU.ENAME
You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Smith only. To do this, you edit the WHERE clause as follows:
UPDATE T_EMP_UPDATE_OVERRIDE SET EMPNO = :TU.EMPNO, ENAME = :TU.ENAME, JOB = :TU.JOB, SAL = :TU.SAL where ENAME = :TU.ENAME AND ENAME='SMITH'
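To see what the overridden statement does at run time, here is a sketch using Python's sqlite3 in place of the target database (an assumption for illustration; PowerCenter would bind the :TU ports against the real target). Table and column names follow the article's example:

```python
import sqlite3

# In-memory stand-in for the target table from the article's example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T_EMP_UPDATE_OVERRIDE (EMPNO INT, ENAME TEXT, JOB TEXT, SAL INT)")
con.executemany("INSERT INTO T_EMP_UPDATE_OVERRIDE VALUES (?,?,?,?)",
                [(1, "SMITH", "CLERK", 800), (2, "MARK", "ANALYST", 1600)])

# The :TU.* references become bind values for each incoming row. Only rows
# matching BOTH the incoming ENAME and the hard-coded 'SMITH' literal change.
incoming = {"ENAME": "SMITH", "JOB": "MANAGER", "SAL": 900}
con.execute("""UPDATE T_EMP_UPDATE_OVERRIDE
               SET JOB = :JOB, SAL = :SAL
               WHERE ENAME = :ENAME AND ENAME = 'SMITH'""", incoming)

rows = con.execute(
    "SELECT ENAME, JOB, SAL FROM T_EMP_UPDATE_OVERRIDE ORDER BY EMPNO").fetchall()
print(rows)  # SMITH updated to MANAGER/900; MARK untouched
```

The point of the sketch is the WHERE clause: the match is driven by ENAME, a non-key column, which is exactly what Target Update Override allows and a plain primary-key update does not.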
Entering a Target Update Statement
Follow these instructions to create an update statement:
1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field. The SQL Editor displays.
4. Select Generate SQL. The default UPDATE statement appears.
5. Modify the update statement. You can override the WHERE clause to include non-key columns.
6. Click OK.
Notes:
1. :TU is a reserved keyword in Informatica, used to match target port names with the target table's column names.
2. A common error when doing this is: "TE_7023 Transformation Parse Fatal Error; transformation stopped... error constructing sql statement". To resolve it, check the override statement; you have to keep a space before the :TU.