
Informatica Interview Questions

1. How to join two tables without common columns?

Create a dummy port in both pipelines and assign the same constant value (e.g. 1) to both ports in an Expression transformation placed before the Joiner.
Now use this dummy port in the join condition to join both tables.
Or
Join the tables on a dummy key in the same way, passing the same constant value to the key port on both sides.
2. How to generate a sequence of keys or numbers in the target without using the Sequence Generator transformation?

It can be done using the SETVARIABLE function. We need to add a mapping variable (say $$SEQ_NO) with the initial value given as 0.
A common pattern in the Expression transformation is:
1. V_Seq_No (variable port) --> $$SEQ_NO + 1
2. Out_Seq_No (output port) --> SETVARIABLE($$SEQ_NO, V_Seq_No)
For every row the value of the sequence is incremented by 1, and the final value of the mapping variable is saved to the repository at the end of the run, so the next run continues from where the previous run stopped.
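If the source is relational, another option is to generate the sequence directly in the Source Qualifier SQL override. A sketch is below; the table and column names src_table and emp_id are placeholders:

SELECT ROW_NUMBER() OVER (ORDER BY emp_id) AS seq_no, s.*
FROM   src_table s;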

3. How do you take only duplicate rows into the target table in Informatica?

Use this condition in the SQL override:
select * from table_name where rowid not in (select max(rowid) from table_name group by key_column_name);
Or
Use a Rank transformation: group by and rank on the field that identifies the duplicates; rows that get a rank greater than 1 are the duplicates, so pass only those rows to the target table.
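For reference, an equivalent single SQL override using an analytic function is sketched below; it returns every row whose key occurs more than once (table_name and key_column_name as above):

SELECT *
FROM (
    SELECT t.*, COUNT(*) OVER (PARTITION BY key_column_name) AS key_cnt
    FROM   table_name t
)
WHERE key_cnt > 1;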
4. When we use only an Aggregator transformation in our mapping for approx 5 million records it takes 40-42 min, but when we use it with a Sorter transformation the time reduces to 12-13 min. We have also noticed that the throughput of the select statement from the source was also very high. That it aggregates grouped data quickly is understandable, but why was the throughput of the select statement also higher when the sorter transformation was used?

When an Aggregator transformation is used without a sorter, it caches all the data before performing the grouping operation. But when a sorter is used before the aggregator (with Sorted Input enabled), the data reaches the Aggregator already sorted on the group-by ports. As the source records pass through the aggregator transformation, it can close a group as soon as all the rows for that group-by value have passed through, instead of holding the whole data set in cache. Because rows keep flowing through the pipeline instead of piling up in the aggregator cache, the reader is not blocked and the throughput of the source select statement is also higher.
e.g.
eno ename
1   A
2   B
1   C
After sorting, the rows arrive as 1 A, 1 C, 2 B. Once the record "2 B" is passed to the aggregator transformation, the group for eno "1" is complete and can be flushed, which is not the case when a sorter is not used.
5. How to find out duplicate records using an Aggregator?

It is similar to the SQL query:
SELECT col1, col2, ...
FROM table_name
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1
Similarly, in the Informatica Aggregator transformation, select group by for all the columns and add one output port, OUT_CNT_RCRDS = COUNT(col1).
In the next step, use a Router transformation and put the conditions:
G1: OUT_CNT_RCRDS = 1 --> TGT_NO_DUPLICATES
G2: OUT_CNT_RCRDS > 1 --> TGT_DUPLICATES

6. Delete first 3 rows & last 3 rows in target table

How to delete the first 3 rows & last 3 rows in the target table in Informatica?

select * from
(select * from table_name where rownum <= (select count(1) - 3 from table_name)
minus
select * from table_name where rownum <= 3);

This query returns the data without the first 3 and the last 3 records (taking ROWNUM order as the load order).
Or
For deleting the first 3 rows in the target table, you can write a Post-SQL in the Session task: delete from table_name where rownum < 4;
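A hedged sketch of a Post-SQL that removes both the first 3 and the last 3 rows in one statement (assuming Oracle, and assuming a column order_key exists that defines what "first" and "last" mean, since a table has no inherent order):

DELETE FROM table_name
WHERE ROWID IN (
    SELECT rid
    FROM (
        SELECT ROWID AS rid,
               ROW_NUMBER() OVER (ORDER BY order_key)      AS rn_first,
               ROW_NUMBER() OVER (ORDER BY order_key DESC) AS rn_last
        FROM   table_name
    )
    WHERE rn_first <= 3 OR rn_last <= 3
);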

7. If a record is updated multiple times before the session is run, how do you track those changes in SCD type 2?

You can track the changes in SCD type 2 in several ways (see the sketch after this list):
--> with a time/effective-date column
--> by maintaining a version number
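As a sketch of how the intermediate changes can be ordered before the SCD2 load (stg_customer, cust_id and update_ts are assumed staging names), each source image of a key gets its own change sequence, which then becomes a new version row with effective dates derived from update_ts:

SELECT cust_id,
       update_ts,
       ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY update_ts) AS change_seq
FROM   stg_customer
ORDER  BY cust_id, update_ts;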

8. What is the way to add the total number of records that have been read from the source in the target file as the last line?

This can be achieved using an Aggregator transformation.
In the Aggregator transformation, check the group by option for all the source columns and add one extra output port, OUT_TTL_RECORDS, that counts the rows.
Pass this port value as the last record of the flat file target.
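The same idea expressed in SQL terms, as a rough sketch (emp and ename are placeholder names); the detail rows are followed by a single trailer row carrying the total count:

SELECT 1 AS line_order, ename AS line FROM emp
UNION ALL
SELECT 2, 'TOTAL RECORDS: ' || TO_CHAR(COUNT(*)) FROM emp
ORDER BY line_order;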

9. If there are multiple source flat files with different names but the same file structure, how do we load all those files to the target in one step?
1. Create the mapping as if there is only single source and target.
2. Now create an additional file on the server which will list the multiple
source file names along with their paths.
3. Specify the path and name of this file in the "Source File" under session
properties.
4. Now the most important thing - Set "Source Filetype" as "indirect" under
session properties.

10. Write a query to retrieve the latest records from the target table. For example, if we have used an SCD2 version-type dimension, retrieve the record with the highest version number for each id:

verno  id   loc
1      100  bang
2      100  kol
1      101  bang
2      101  chen

We have to retrieve 100/kol and 101/chen. How is it possible through a query?

select * from table_name t
where verno = (select max(verno) from table_name where id = t.id);
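An alternative that does not depend on insertion order is an analytic query (a sketch over the same table_name):

SELECT id, loc
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY verno DESC) AS rn
    FROM   table_name t
)
WHERE rn = 1;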

11. I have a scenario where I have ten flat files to be loaded into the target, but I need to load the file names into a table at mapping level. I thought we could achieve this through a Transaction Control transformation, but unfortunately I did not get it to work. Please advise how to implement the logic (I need to capture it at mapping level only). This question has been asked in interviews.

If you are loading a target table from multiple flat files and looking to add the source file name as a field in each row in the target table, then Transaction Control will not help you here. You have to load all the source files using the Indirect option at session level: list all the source file names to be loaded in one flat file and give that as the input source file. Then in the PowerCenter Designer, go to the Source Definition and enable the property Add Currently Processed Flat File Name Port. This will add an additional port in the source definition. Pass that port to the target table's filename field.
12. Suppose I have one source which is linked to 3 targets. When the workflow runs for the first time, only the first target should be populated and the other two (second and third) should not be. When the workflow runs for the second time, only the second target should be populated, and when it runs for the third time, only the third target should be populated.

You can use the 3 target tables as lookups. If an incoming row from the file is in a target, set flags accordingly. In the next step you evaluate the flags and then use a Router:
if in target 1, set flag1 = Y, else N
if in target 2, set flag2 = Y, else N
if in target 3, set flag3 = Y, else N
Now if flag1 = N, route to target 1;
if flag1 = Y and flag2 = N, route to target 2;
if flag1 = Y, flag2 = Y and flag3 = N, route to target 3.
Of course this only applies if you are inserting rows into the targets. If you have updates, the logic gets more complicated because you have to check for changed values, but the concept would still be the same.

Or
Declare a workflow variable, say counter, with a default value of 1, and increment it (counter + 1) each time the workflow runs.
When you run the workflow the first time, counter mod 3 = 1, so load the first target.
During the second run, counter mod 3 = 2, so load the data into the second target table.
During the third run, counter mod 3 = 0, so load the data into the third target table.
The repository automatically stores the counter value when the workflow finishes successfully; on the next run the Integration Service reads the most recent value from the repository.

In my mapping I have multiple source files and only one target output flat file, and I need to implement the logic below. Can anyone suggest an idea how to do it?

input:
file1, file2 and file3, each with the same layout: field1, field2, field3

Here I am reading the three files in the order file3, file2, file1. The logic needed is: if the record corresponding to a key value (say '1') is present in multiple files, I need to write the record from the first file read and discard the records corresponding to that key in the rest of the files. My target is a flat file; I tried an Update Strategy but later found that the update concept does not work with flat files. Please suggest another way to implement this logic.

output (sample):
1 A B
2 C

Instead of having a fixed number of source pipelines (the number of files to be placed is not known in advance), it is better to read all the files by indirect listing, then rank the rows on the source file name port while grouping on field1, and keep only the top-ranked row per key. By using indirect listing we stay independent of the number of files coming from the source and can avoid UNION operations as well.

13. Informatica partitioning: see question 18 below, which covers session partitioning in detail.

14. Adding a header and footer in Informatica?

You can get the column heading for a flat file target using the session configuration shown below (the flat file target's header options/header command). This session setting will give a file with the header record 'Cust ID,Name,Street #,City,State,ZIP'.

Custom flat file footer:

You can get the footer for a flat file using the session configuration given in the image below (the flat file target's footer command). This configuration will give you a file with ***** End Of The Report ***** as the last row of the file.

15. To read a compressed source file:

Before the file is read, it needs to be unzipped, but we do not need any pre-session script to achieve this. It can be done easily with the session setting below: the source file is read through a command (e.g. something like zcat <source file>.gz) instead of directly as a file.
This command configuration generates rows to stdout and the flat file reader reads directly from stdout, which removes the need for staging the data.

16. Reading multiple files

Generating a file list:

For reading multiple file sources with the same structure, we use the indirect file method. Indirect file reading is made easy using the file command property in the session configuration as shown below: the command writes the list of file names to stdout (for example, an ls of the source directory) and PowerCenter interprets this output as a file list.

17. Zip the output target file:

We can zip the target file using a post-session script, but this can also be done without a post-session script, as shown in the session configuration below, by writing the target through an output file command that compresses the data as it is written.

18. Informatica PowerCenter Partitioning for Parallel Processing and Faster Delivery

In addition to a better ETL design, it is essential to have a session optimized with no bottlenecks to get the best session performance. After optimizing the session performance, we can further improve performance by exploiting the under-utilized hardware power. This refers to parallel processing, and we can achieve this in Informatica PowerCenter using session partitioning.

What is Session Partitioning


The Informatica PowerCenter Partitioning Option increases the performance of
PowerCenter through parallel data processing. Partitioning option will let you split
the large data set into smaller subsets which can be processed in parallel to get
a better session performance.

Partitioning Terminology
Let's understand some partitioning terminology before we get into more details.

Partition : A partition is a subset of the data that executes in a single thread.

Number of partitions : We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.

Stage : A stage is the portion of a pipeline which is implemented at run time as a thread.

Partition Point : This is the boundary between two stages and divides the pipeline into stages. A partition point is always associated with a transformation.

Partition Type : It is an algorithm for distributing data among partitions, which is always associated with a partition point. The partition type controls how the Integration Service distributes data among partitions at partition points.

Below image shows the points we discussed above. We have three partitions and
three partition points in below session demo.

Types of Session Partitions

Different types of partition algorithms are available.

Database partitioning : The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.

Round-Robin Partitioning : Using this partitioning algorithm, the Integration Service distributes data evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.

Hash Auto-Keys Partitioning : The PowerCenter Server uses a hash function to group rows of data among partitions. When hash auto-keys partitioning is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

Hash User-Keys Partitioning : The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.

Key Range Partitioning : With this type of partitioning, you specify one
or more ports to form a compound partition key for a source or target. The
Integration Service then passes data to each partition depending on the
ranges you specify for each port.

Pass-through Partitioning : In this type of partitioning, the Integration Service passes all rows at one partition point to the next partition point without redistributing them.

Setting Up Session Partitions

Let's see what is required to set up a session with partitioning enabled.

We can invoke the user interface for session partition as shown in below image
from your session using the menu Mapping -> Partitions.

The interface will let you Add/Modify Partitions, Partition Points and Choose the
type of partition Algorithm. Choose any transformation from the mapping and the
"Add Partition Point" button will let you add additional partition points.

Choose any transformation from the mapping and the "Delete Partition Point"
or "Edit Partition Point" button will let you modify partition points.

The "Add/Delete/Edit Partition Point" opens up an additional window which let


you modify the partition and choose the type of the partition algorithm as shown

in below image.

Example:

Business Use Case


Let's consider a business use case to explain the implementation of the appropriate partition algorithms and configuration.
Daily sales data generated from three sales regions needs to be loaded into an Oracle data warehouse. The sales volume from the three regions varies a lot, hence the number of records processed for each region varies a lot. The warehouse target table is partitioned based on product line.

Below is the simple structure of the mapping to get the assumed functionality.

Pass-through Partition

A pass-through partition at the Source Qualifier transformation is used to split the source data into three parallel processing data sets. The image below shows how to set up a pass-through partition for the three sales regions.

Once the partition is setup at the source qualifier, you get additional Source Filter
option to restrict the data which corresponds to each partition. Be sure to provide
the filter condition such that same data is not processed through more than one
partition and data is not duplicated. Below image shows three additional Source
Filters, one per each partition.

Round Robin Partition

Since the data volume from the three sales regions is not the same, use the round-robin partition algorithm at the next transformation in the pipeline, so that the data is equally distributed among the three partitions and the processing load is evenly spread. Round-robin partitioning can be set up as shown in the image below.

Hash Auto Key Partition

At the Aggregator transformation, the data needs to be redistributed across the partitions to avoid the potential splitting of aggregator groups. The hash auto-keys partition algorithm makes sure that the data from the different partitions is redistributed such that records with the same key end up in the same partition. This algorithm identifies the keys based on the group-by ports of the transformation. Processing records of the same aggregator group in different partitions would produce wrong results.

Key Range Partition

Use Key range partition when required to distribute the records among partitions
based on the range of values of a port or multiple ports.

Here the target table is range partitioned on product line. Create a range
partition on target definition on PRODUCT_LINE_ID port to get the best write
throughput.

Below images shows the steps involved in setting up the key range partition.
Click on Edit Keys to define the ports on which the key range partition is defined.

A pop-up window shows the list of ports in the transformation; choose the ports on which the key range partition is required.

Now give the start and end range values for each partition as shown below.

We did not have to use Hash User Key Partition and Database Partition algorithm
in the use case discussed here.

Hash User Key partition algorithm will let you choose the ports to group rows
among partitions. This algorithm can be used in most of the places where hash
auto key algorithm is appropriate.
Database partition algorithm queries the database system for table partition
information. It reads partitioned data from the corresponding nodes in the
database. This algorithm can be applied either on the source or target definition.

19. Change Data Capture:

A heavyweight Change Data Capture framework is not a recommended way to handle such a project, simply because the effort required to build the framework may not be justified. In this article let's discuss a simple, easy approach to handle Change Data Capture.
We will be using Informatica mapping variables to build our Change Data Capture logic. Before we even talk about the implementation, let's understand the mapping variable.

Informatica Mapping Variable


What is Mapping Variable

These are variables created in PowerCenter Designer, which you can use in any
expression in a mapping, and you can also use the mapping variables in a source
qualifier filter, user-defined join, or extract override, and in the Expression Editor
of reusable transformations.
Mapping Variable Starting Value

Mapping variable can take the starting value from


1. Parameter file
2. Pre-session variable assignment
3. Value saved in the repository
4. Initial value
5. Default Value

The Integration Service looks for the start value in the order mentioned above.
The value of the mapping variable can be changed within the session using an expression, and the final value of the variable will be saved into the repository.
The saved value from the repository is retrieved in the next session run and used
as the session start value.
Setting Mapping Variable Value

You can change the mapping variable value within the mapping or session using
the Set Function. We need to use the set function based on the Aggregation Type
of the variable. Aggregation Type of the variable can be set when the variable is
declared in the mapping.

SetMaxVariable. Sets the variable to the maximum value of a group of


values. To use the SetMaxVariable with a mapping variable, the
aggregation type of the mapping variable must be set to Max.

SetMinVariable. Sets the variable to the minimum value of a group of


values. To use the SetMinVariable with a mapping variable, the aggregation type of the mapping variable must be set to Min.

SetCountVariable. Increments the variable value by one. In other words,


it adds one to the variable value when a row is marked for insertion, and
subtracts one when the row is marked for deletion. To use the
SetCountVariable with a mapping variable, the aggregation type of the
mapping variable must be set to Count.

SetVariable. Sets the variable to the configured value. At the end of a


session, it compares the final current value of the variable to the start
value of the variable. Based on the aggregate type of the variable, it saves
a final value to the repository.

Change Data Capture Implementation


Now we understand the mapping variables, lets go ahead and start building our
mapping with Change Data Capture.
Here we are going to implement Change Data Capture for CUSTOMER data
load. We need to load any new customer or changed customers data to a flat file.
Since the column UPDATE_TS value changes for any new or updated customer
record, we will be able to find the new or changed customer records using
UPDATE_TS column.
As the first step lets start the mapping and create a mapping variable as shown
in below image.
o

$$M_DATA_END_TIME as Date/Time

Now bring the source and Source Qualifier into the Mapping Designer workspace. Open the Source Qualifier and give the filter condition to get the latest data from the source, as shown below.

STG_CUSTOMER_MASTER.UPDATE_TS > CONVERT(DATETIME, '$$M_DATA_END_TIME')

Note : This filter condition makes sure that the latest data is pulled from the source table each and every time. The latest value for the variable $$M_DATA_END_TIME is retrieved from the repository every time the session is run.

Now map the column UPDATE_TS to an Expression transformation and create a variable port with the expression below.

SETMAXVARIABLE($$M_DATA_END_TIME, UPDATE_TS)

Note : This expression makes sure that the latest value from the column UPDATE_TS is stored into the repository after the successful completion of the session run.

Now you can map all the remaining columns to the downstream transformations and complete all the other transformations required in the mapping.

That's all you need to configure Change Data Capture. Now create your workflow and run it.

Once you look into the session log file you can see the mapping variable value is
retrieved from the repository and used in the source SQL, just like shown in the
image below.
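The generated source query looks roughly like the sketch below (the column list and the resolved timestamp value are illustrative, not taken from an actual log):

SELECT CUST_ID, CUST_NAME, UPDATE_TS
FROM   STG_CUSTOMER_MASTER
WHERE  STG_CUSTOMER_MASTER.UPDATE_TS > CONVERT(DATETIME, '01/03/2013 18:45:00');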

You can look at the mapping variable value stored in the repository, from
workflow manager. Choose the session from the workspace, right click and select
'View Persistent Value'. You get the mapping variable in a pop up window, like
shown below.

20. Difference between STOP and ABORT

Stop - If the Integration Service is executing a Session task when you issue the stop
command, the Integration Service stops reading data. It continues processing and writing
data and committing data to targets. If the Integration Service cannot finish processing and
committing data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the stop
command, except it has a timeout period of 60 seconds. If the Integration Service cannot
finish processing and committing data within the timeout period, it kills the DTM process
and terminates the session.
In short: Stop halts the reader immediately but lets the data already read finish processing and committing; Abort does the same but allows a timeout of 60 seconds, after which it kills the DTM process.
21. What are the join types in joiner transformation?

There are 4 types of joins in the Joiner transformation:


1) Normal
2) Master Outer
3) Detail Outer
4) Full Outer
Note: A normal or master outer join performs faster than a full outer or
detail outer join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In
DEPT, we have DEPTNO 10, 20, 30 and 40. DEPT will be MASTER table as
it has less rows.
Normal Join: With a normal join, the Power Center Server discards all
rows of data from the master and detail source that do not match, based
on the condition.
All employees of 10, 20 and 30 will be there as only they are matching.
Master Outer Join: This join keeps all rows of data from the detail source
and the matching rows from the master source. It discards the unmatched
rows from the master source.
All data of employees of 10, 20 and 30 will be there.
There will be employees of DEPTNO 50 and corresponding DNAME and
LOC Columns will be NULL.
Detail Outer Join: This join keeps all rows of data from the master source
and the matching rows from the detail source. It discards the unmatched
rows from the detail source.
All employees of 10, 20 and 30 will be there.
There will be one record for DEPTNO 40 and corresponding data of EMP
columns will be NULL.
Full Outer Join: A full outer join keeps all rows of data from both the master
and detail sources.
All data of employees of 10, 20 and 30 will be there.
There will be employees of DEPTNO 50 and corresponding DNAME and
LOC Columns will be NULL.
There will be one record for DEPTNO 40 and corresponding data of EMP
Columns will be NULL
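For reference, the four join types map onto SQL roughly as below (with DEPT as master and EMP as detail; a sketch, not generated PowerCenter SQL):

-- Normal join: matching rows only
SELECT e.*, d.dname, d.loc FROM emp e JOIN dept d ON e.deptno = d.deptno;

-- Master outer: all detail (EMP) rows plus matching master (DEPT) rows
SELECT e.*, d.dname, d.loc FROM emp e LEFT OUTER JOIN dept d ON e.deptno = d.deptno;

-- Detail outer: all master (DEPT) rows plus matching detail (EMP) rows
SELECT e.*, d.dname, d.loc FROM emp e RIGHT OUTER JOIN dept d ON e.deptno = d.deptno;

-- Full outer: all rows from both sides
SELECT e.*, d.dname, d.loc FROM emp e FULL OUTER JOIN dept d ON e.deptno = d.deptno;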
22. How to enter the same record twice in the target table? Give me the syntax.

In the mapping, drag the source twice and make sure that the source and target don't have any key constraints.
Then add a Union transformation, link both sources to the Union, and link the output ports from the Union to the target.
or
You can use a Normalizer transformation to achieve the desired output. There is an "Occurs" option in the Normalizer in which you can mention the number of times you want to load the same source data into the target.

23. How to get a particular record from the table in Informatica?

We can use the REG_MATCH function in Informatica, or we can use the SUBSTR and INSTR functions to match particular records.

24. How to create a primary key only on odd numbers?

Use the MOD function to separate odd and even numbers, then filter the records with the odd numbers and use a Sequence Generator. Alternatively, a Sequence Generator with Start Value = 1 and Increment By = 2 generates only odd numbers.
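If the source is relational, the same effect can be sketched in a SQL override that derives an odd-only key (src_table and emp_id are placeholder names):

SELECT s.*, (2 * ROW_NUMBER() OVER (ORDER BY emp_id)) - 1 AS odd_key
FROM   src_table s;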

25. Why is the Sorter transformation an active transformation?

It allows you to sort data either in ascending or descending order according to a specified field, to configure case-sensitive sorting, and to specify whether the output rows should be distinct; with distinct enabled it will not return all the rows.
So: any transformation that has a distinct option can be an active one, because an active transformation is one which can change the number of output records, and distinct filters out duplicate rows, which in turn decreases the number of output records compared to the input records.
One more thing: "an active transformation can also behave like a passive one."

26. How can we validate all mappings at a time?

In the Repository Manager, go to the Tools menu and then Queries; the Query Browser dialog box will appear. Click the New button.
In the query editor, choose the folder name and object type, then execute the query (by clicking the blue arrow button); the query results window will appear. Select a single mapping or all mappings (by pressing Ctrl+A), then go to Tools > Validate to validate them.

27. What is the difference between index cache and data cache?

INDEX CACHE: stores the key columns of the transformation, i.e. the ports used in the condition (the group by ports of an Aggregator, the join condition ports of a Joiner, the lookup condition ports of a Lookup).
DATA CACHE: stores the remaining connected output port values, i.e. the row data associated with each key.

These caches exist purely to improve performance. The cache is created as 2 files, an index file and a data file. The index file stores only the frequently accessed key columns of the transformation, where most of the I/O and comparison work happens.
Assume Informatica stored all the data in a single cache file for a table of 100 columns; it might create a file of, say, 100 MB. We would then be reading the whole file when we actually only want the data of the 1 key column used for joining or sorting; the other 99 columns just have to be passed to the downstream transformation without any other operation on them.
Now consider the same scenario with the file split into two, where one file stores only the data of the 1 key column of the Joiner or Sorter. The file that has to be read for comparisons is then far smaller than 100 MB (say 10 MB). So think about reading a 10 MB file instead of a 100 MB file just for comparison, when the remaining 99 columns' data is not needed for the comparison at all.

28. How to format the phone number 9999999999 into (999)999-9999 in Informatica?

'(' || SUBSTR(sample, 1, 3) || ')' || SUBSTR(sample, 4, 3) || '-' || SUBSTR(sample, 7, 4)

29. Different types of dimensions:

The commonly used types of dimensions are:
1) Degenerate dimensions
2) Junk dimensions (a dimension which groups low-cardinality values such as flags and indicators)
3) Conformed dimensions (a dimension which is shared by multiple fact tables)
4) Slowly changing dimensions (the dimension values change over a period of time)
   a) SCD1 (only the most recent values in the target)
   b) SCD2 (current + history data)
   c) SCD3 (just partial history)
5) Causal dimensions
6) Dirty dimensions

30. What is the difference between a summary filter and a detail filter?

Summary filter: applied to a group of records that share common values (i.e. after grouping/aggregation).
Detail filter: applied to each and every record in the database.

31. Data movement mode in Informatica:

The data movement mode is set at the Integration Service level and can be ASCII or Unicode. In ASCII mode the service allocates one byte per character, while in Unicode mode it allocates up to two bytes per character, which is required when processing multibyte character sets.

32. Types of load in Informatica:

Incremental load:
Incremental means: suppose today we processed 100 records; for tomorrow's run we need to extract only whatever records were inserted or updated after the previous (yesterday's) run, based on the last-updated timestamp. This process is called incremental or delta load.
Normal load:
In normal load we process the entire source data into the target with constraint-based checking and database logging.
Bulk load:
In bulk load we process the entire source data into the target without checking constraints in the target (bypassing database logging), which is faster.

What is a Cold Start in Informatica Workflow?


Cold Start means that Integration Service will restart a task or workflow without
recovery. You can restart task or workflow without recovery by using a cold start.
Now Recovering a workflow means to restart processing of the workflow or tasks
from the point of interruption of the workflow or task. By default, the recovery
strategy for Workflow tasks is to fail the task and continue running the workflow.
Else you need to configure the recovery strategy.

To restart a task or workflow without recovery:


1. You can select the task or workflow that you want to restart.

2. Right click > Cold Start Task or Cold Start Workflow.

What is a FACTLESS FACT TABLE? Where do we use a factless fact table?

We know that a fact table is a collection of many facts and measures having multiple keys joined with one or more dimension tables. Facts contain both numeric and additive fields. But a factless fact table is different from all these.
A factless fact table is a fact table that does not contain facts. It contains only dimensional keys and captures events that happen only at an information level, not at the calculation level: just information about an event that happens over a period.

A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include:

Identifying product promotion events (to determine promoted products that didn't sell)

Tracking student attendance or registration events

Tracking insurance-related accident events

Identifying building, facility, and equipment schedules for a hospital or university

Factless fact tables are used for tracking a process or collecting stats. They are called so because the fact table does not have aggregatable numeric values or information. There are two types of factless fact tables: those that describe events, and those that describe conditions. Both may play important roles in your dimensional models.

Factless fact tables for Events

The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Sometimes there seem to be no facts associated with an important business process. Events or activities occur that you wish to track, but you find no measurements. In situations like this, build a standard transaction-grained fact table that contains no facts.
For example:

The above fact is used to capture the leave taken by an employee. Whenever an employee takes leave, a record is created with the dimension keys. Using the fact FACT_LEAVE we can answer many questions (see the query sketch after this list), such as:

Number of leaves taken by an employee

The type of leave an employee takes

Details of the employee who took leave
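As a sketch of the first question, counting rows of the factless fact answers "how many leaves did a given employee take in a given year". The dimension and column names dim_employee, dim_date, employee_key and date_key are assumptions; only FACT_LEAVE comes from the text above:

SELECT COUNT(*) AS leaves_taken
FROM   fact_leave f
JOIN   dim_employee e ON f.employee_key = e.employee_key
JOIN   dim_date d     ON f.date_key     = d.date_key
WHERE  e.employee_id   = 1001
AND    d.calendar_year = 2012;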

Factless fact tables for Conditions

Factless fact tables are also used to model conditions or other important relationships among dimensions. In these cases, there are no clear transactions or events. This kind of table is used to support negative-analysis reports, for example a store that did not sell a product for a given period. To produce such a report, you need a fact table that captures all the possible combinations; you can then figure out what is missing.
For example, FACT_PROMO gives the information about the products which have promotions but still did not sell.

This fact answers the questions below (see the query sketch after this list):

To find out products that have promotions.

To find out products that have promotions and sold.

The list of products that have promotions but did not sell.
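A sketch of the negative-analysis query: products that were on promotion on a given day but have no matching sales row. The names fact_sales, dim_product and the key columns are assumptions; only FACT_PROMO comes from the text:

SELECT p.product_name
FROM   fact_promo fp
JOIN   dim_product p ON fp.product_key = p.product_key
WHERE  NOT EXISTS (SELECT 1
                   FROM   fact_sales fs
                   WHERE  fs.product_key = fp.product_key
                   AND    fs.date_key    = fp.date_key);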

This kind of factless fact table is used to track conditions, coverage or eligibility.
In Kimball terminology, it is called a "coverage table."
Note:
We may ask why we cannot include this information in the actual fact table. The problem is that if we did so, the fact table would grow enormously.

Factless fact tables are crucial in many complex business processes. By applying them you can design a dimensional model for processes that have no clear facts and still produce meaningful information; the factless fact table itself can be used to generate useful reports.

The different types of ETL Testing are,

1. Requirements Testing
2. Data Validation Testing
3. Integration Testing
4. Report Testing
5. User Acceptance Testing
6. Performance Testing
7. Regression Testing

Requirements Testing Phase in ETL Testing


The steps are,

Are the requirements complete?

Are the requirements testable?

Are the requirements clear (is there any ambiguity)?

Data Validation Testing Phase in ETL Testing

Compare record counts between data sources

Ensure that the ETL application properly rejects, replaces with default
values and reports invalid data

Verify that data is transformed correctly according to system requirements


and business rules

Compare unique values of key fields between source data and warehouse
data

Ensure that all projected data is loaded into the data warehouse without
any data loss or truncation

Test the boundaries of each field to find any database limitations

Integration Testing Phase in ETL Testing


The steps are,

Verify the sequence and outcome of ETL batch jobs

Verify that ETL processes function with upstream and downstream


processes

Verify the initial load of records on data warehouse

Verify any incremental loading of records at a later date for newly inserted
or updated data

Test the rejected records that fail ETL rules

Test error log generation

Report Testing Phase in ETL Testing


The steps are,

Verify report data with the data source

Create SQL queries to verify source/target data

Verify field-level data

User Acceptance Testing(UAT) Phase in ETL Testing


The steps are,

Verify that the business rules have been met

Confirm that the system is acceptable to the client

Performance Testing Phase in ETL Testing


The steps are,

Verify that data loads and queries are executed within anticipated time
frames

Verify that maximum anticipated volume of data is loaded within an


acceptable time frame

Verify load times with various amounts of data to predict scalability


Regression Testing Phase in ETL Testing

The steps are,

Ensure that current functionality stays intact whenever new code is released

Informatica Java Transformation Practical Example


Feel the power of the Java programming language to transform data in PowerCenter Informatica.
The Java transformation in Informatica can be used in either Active or Passive mode.
Suppose I have a requirement where my source data looks like this:
Source Data

NAME   CUST_ID  SVC_ST_DT   SVC_END_DT
TOM    ...      31/08/2009  23/03/2011
DICK   ...      01/01/2004  31/05/2010
HARRY  ...      28/02/2007  31/12/2009

Here I have a service start date and service end date tied to a customer.
Now I want my target table data in a flattened manner like this:
Target Data
NAME   CUST_ID  SVC_ST_DT   SVC_END_DT
TOM    ...      31/08/2009  31/12/2009
TOM    ...      01/01/2010  31/12/2010
TOM    ...      01/01/2011  23/03/2011
DICK   ...      01/01/2004  31/12/2004
DICK   ...      01/01/2005  31/12/2005
DICK   ...      01/01/2006  31/12/2006
DICK   ...      01/01/2007  31/12/2007
DICK   ...      01/01/2008  31/12/2008
DICK   ...      01/01/2009  31/12/2009
DICK   ...      01/01/2010  31/05/2010
HARRY  ...      28/02/2007  31/12/2007
HARRY  ...      01/01/2008  31/12/2008
HARRY  ...      01/01/2009  31/12/2009
i.e. I want to split the service start date and service end date on a yearly basis.
The first thing that comes to mind in this situation is to use the Informatica Normalizer. That is true. But if you think twice, you will find that we would need to assume or hard-code one thing: that the time span has a fixed maximum value, say a maximum span of 5 years between the start and end date. Knowingly, you are trying to fix the number of occurrences of the Normalizer. Next you would use an Expression transformation followed by a Filter to achieve the requirement. But in this manner the requirement would not be satisfied for a customer having a tenure of more than 5 years.
So here I will be using a small portion of Java code. The real raw power of the Java programming language, called from Informatica PowerCenter, will do the data transformation.
Let's go straight to the mapping and the code.

Find the Java Code:

try
{

DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy");


Calendar cal1 = Calendar.getInstance();
Calendar cal2 = Calendar.getInstance();

int st_yr, ed_yr, st_mon, ed_mon, st_date, ed_date, st_ldm, ed_ldm;


String str;
Date st_dt = (Date)formatter.parse(SVC_ST_DT);
Date ed_dt = (Date)formatter.parse(SVC_END_DT);
cal1.clear();
cal1.setTime(st_dt);
cal2.clear();
cal2.setTime(ed_dt);
st_yr = cal1.get(Calendar.YEAR);
ed_yr = cal2.get(Calendar.YEAR);
do
{
OUT_NAME = NAME;
OUT_CUST_ID = CUST_ID;
OUT_SVC_ST_DT = formatter.format(st_dt);
if(ed_yr != st_yr)
{
str = "31/12/" + st_yr;
st_dt = (Date)formatter.parse(str);
cal1.setTime(st_dt);
OUT_SVC_END_DT = formatter.format(st_dt);
}
else
OUT_SVC_END_DT = formatter.format(ed_dt);
generateRow();
st_yr = st_yr + 1;
str = "01/01/" + st_yr;
st_dt = (Date)formatter.parse(str);
cal1.setTime(st_dt);
st_yr = cal1.get(Calendar.YEAR);
}while(ed_yr >= st_yr);
}
catch (ParseException e)
{
System.out.println(e);
}

Next, if we want to transform and load the data on a monthly basis, simply find the mapping and the code below.
Find the Java Code:

try
{
    DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy");
    DateFormat formatter1 = new SimpleDateFormat("dd/M/yyyy");
    Calendar cal1 = Calendar.getInstance();
    Calendar cal2 = Calendar.getInstance();
    int yr, ed_yr, st_mon, ed_mon, st_ldm;
    String str;
    Date st_dt = (Date)formatter.parse(SVC_ST_DT);
    Date ed_dt = (Date)formatter.parse(SVC_END_DT);
    cal1.clear();
    cal1.setTime(st_dt);
    cal2.clear();
    cal2.setTime(ed_dt);
    yr = cal1.get(Calendar.YEAR);
    ed_yr = cal2.get(Calendar.YEAR);
    st_mon = cal1.get(Calendar.MONTH) + 1;
    ed_mon = cal2.get(Calendar.MONTH) + 1;
    st_ldm = cal1.getActualMaximum(Calendar.DAY_OF_MONTH);
    // Emit one row per month until we reach the month of the end date
    while (yr < ed_yr || st_mon != ed_mon)
    {
        OUT_NAME = NAME;
        OUT_CUST_ID = CUST_ID;
        OUT_SVC_ST_DT = formatter.format(st_dt);
        // Close this segment on the last day of the current month
        str = st_ldm + "/" + st_mon + "/" + yr;
        st_dt = (Date)formatter1.parse(str);
        cal1.clear();
        cal1.setTime(st_dt);
        OUT_SVC_END_DT = formatter.format(st_dt);
        generateRow();
        // Move the start to the first day of the next month
        // (lenient parsing rolls month 13 over into January of the next year)
        st_mon = st_mon + 1;
        str = "01/" + st_mon + "/" + yr;
        st_dt = (Date)formatter1.parse(str);
        cal1.clear();
        cal1.setTime(st_dt);
        yr = cal1.get(Calendar.YEAR);
        st_mon = cal1.get(Calendar.MONTH) + 1;
        st_ldm = cal1.getActualMaximum(Calendar.DAY_OF_MONTH);
    }
    // Last segment: from the start of the final month up to the actual end date
    OUT_NAME = NAME;
    OUT_CUST_ID = CUST_ID;
    OUT_SVC_ST_DT = formatter.format(st_dt);
    OUT_SVC_END_DT = formatter.format(ed_dt);
    generateRow();
}
catch (ParseException e)
{
    System.out.println(e);
}

Note: You can extend PowerCenter functionality with the Java transformation which provides
a simple native programming interface to define transformation functionality with the Java
programming language. You can use the Java transformation to quickly define simple or
moderately complex transformation functionality without advanced knowledge of the Java
programming language.
For example, you can define transformation logic to loop through input rows and generate multiple output rows based on a specific condition. You can also use expressions, user-defined functions, unconnected transformations, and mapping variables in the Java code.

Implementing Informatica Incremental Aggregation



Using incremental aggregation, we apply captured changes in the source data (CDC part) to
aggregate calculations in a session. If the source changes incrementally and we can capture
the changes, then we can configure the session to process those changes. This allows the
Integration Service to update the target incrementally, rather than forcing it to delete the previously loaded data and process and recalculate the entire source data each time the session runs.

Incremental Aggregation

When the session runs with incremental aggregation enabled for the first time say 1st week of
Jan, we will use the entire source. This allows the Integration Service to read and store the
necessary aggregate data information. On 2nd week of Jan, when we run the session again,
we will filter out the CDC records from the source i.e the records loaded after the initial load.
The Integration Service then processes these new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation; in that case, drop and recreate the target with the entire source data and recalculate the same aggregations.
Incremental aggregation may be helpful, for example, when we need to load data into monthly facts on a weekly basis.
Sample Mapping

Let us see a sample mapping to implement incremental aggregation:

Look at the Source Qualifier query to fetch the CDC part using a
BATCH_LOAD_CONTROL table that saves the last successful load date for the particular
mapping.
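Such a Source Qualifier override might look roughly like the sketch below; sales_src, last_load_date and the mapping name are assumptions, and only BATCH_LOAD_CONTROL comes from the text:

SELECT s.customer_key, s.invoice_key, s.amount, s.load_date
FROM   sales_src s
WHERE  s.load_date > (SELECT MAX(b.last_load_date)
                      FROM   batch_load_control b
                      WHERE  b.mapping_name = 'm_load_sales_fact');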

Look at the ports tab of Expression transformation.

Look at the ports tab of Aggregator Transformation.

Now the most important session property configuration to implement incremental aggregation is to enable the Incremental Aggregation option in the session properties.

If we want to reinitialize the aggregate cache, say during the first week of every month, we configure the same session in a new workflow with the Reinitialize aggregate cache property checked.

Example with Data

Now have a look at the source table data:


CUSTOMER_KEY  INVOICE_KEY  AMOUNT  LOAD_DATE
1111          5001         100     01/01/2010
2222          5002         250     01/01/2010
3333          5003         300     01/01/2010
1111          6007         200     07/01/2010
1111          6008         150     07/01/2010
2222          6009         250     07/01/2010
4444          1234         350     07/01/2010
5555          6157         500     07/01/2010
After the first Load on 1st week of Jan 2010, the data in the target is as follows:
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT
1111          5001         201001   100
2222          5002         201001   250
3333          5003         201001   300
Now during the 2nd week's load it will process only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of the incremental source data with the aggregate cache file data will update the target table with the following dataset:
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT  Remarks/Operation
1111          6008         201001   450     The cache file is updated after aggregation
2222          6009         201001   500     The cache file is updated after aggregation
3333          5003         201001   300     The cache file remains the same as before
4444          1234         201001   350     New group row inserted in cache file
5555          6157         201001   500     New group row inserted in cache file

Understanding Incremental Aggregation Process

The first time we run an incremental aggregation session, the Integration Service processes
the entire source. At the end of the session, the Integration Service stores aggregate data for
that session run in two files, the index file and the data file. The Integration Service creates
the files in the cache directory specified in the Aggregator transformation properties.
Each subsequent time we run the session with incremental aggregation, we use the
incremental source changes in the session. For each input record, the Integration Service
checks historical information in the index file for a corresponding group. If it finds a
corresponding group, the Integration Service performs the aggregate operation incrementally,
using the aggregate data for that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target.
It saves modified aggregate data in the index and data files to be used as historical data the
next time you run the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service
creates a backup of the incremental aggregation files. The cache directory for the Aggregator
transformation must contain enough disk space for two sets of the files.
The Integration Service creates new aggregate data, instead of using historical data, when we
configure the session to reinitialize the aggregate cache, Delete cache files etc.
When the Integration Service rebuilds incremental aggregation files, the data in the previous
files is lost.

Pushdown Optimization, which is a newer feature in Informatica PowerCenter, allows developers to balance the data transformation load among servers. This article describes pushdown techniques.
What is Pushdown Optimization?

Pushdown optimization is a way of


load-balancing among servers in order
to achieve optimal performance. Veteran
ETL developers often come across
issues when they need to determine the
appropriate place to perform ETL logic.
Suppose an ETL logic needs to filter out
data based on some condition. One can
either do it in database by using
WHERE condition in the SQL query or
inside Informatica by using Informatica
Filter transformation.
Sometimes, we can even "push" some transformation logic to the target database instead of
doing it in the source side (Especially in the case of EL-T rather than ETL). Such
optimization is crucial for overall ETL performance.
How does Push-Down Optimization work?

One can push transformation logic to the source or target database using pushdown
optimization. The Integration Service translates the transformation logic into SQL queries and
sends the SQL queries to the source or the target database which executes the SQL queries to
process the transformations. The amount of transformation logic one can push to the database
depends on the database, transformation logic, and mapping and session configuration. The
Integration Service analyzes the transformation logic it can push to the database and executes
the SQL statement generated against the source or target tables, and it processes any
transformation logic that it cannot push to the database.
Using Pushdown Optimization

Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic
that the Integration Service can push to the source or target database. You can also use the
Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Let us take an example:

Filter Condition used in this mapping is: DEPTNO>40


Suppose a mapping contains a Filter transformation that filters out all employees except those
with a DEPTNO greater than 40. The Integration Service can push the transformation logic to
the database. It generates the following SQL statement to process the transformation logic:
INSERT INTO EMP_TGT(EMPNO, ENAME, SAL, COMM, DEPTNO)
SELECT
EMP_SRC.EMPNO,
EMP_SRC.ENAME,
EMP_SRC.SAL,
EMP_SRC.COMM,
EMP_SRC.DEPTNO
FROM EMP_SRC
WHERE (EMP_SRC.DEPTNO >40)

The Integration Service generates an INSERT SELECT statement and it filters the data using
a WHERE clause. The Integration Service does not extract data from the database at this
time.
We can configure pushdown optimization in the following ways:
Using source-side pushdown optimization:

The Integration Service pushes as much transformation logic as possible to the source
database. The Integration Service analyzes the mapping from the source to the target or until
it reaches a downstream transformation it cannot push to the source database and executes the
corresponding SELECT statement.
Using target-side pushdown optimization:

The Integration Service pushes as much transformation logic as possible to the target
database. The Integration Service analyzes the mapping from the target to the source or until
it reaches an upstream transformation it cannot push to the target database. It generates an
INSERT, DELETE, or UPDATE statement based on the transformation logic for each
transformation it can push to the database and executes the DML.
Using full pushdown optimization:

The Integration Service pushes as much transformation logic as possible to both source and
target databases. If you configure a session for full pushdown optimization, and the
Integration Service cannot push all the transformation logic to the database, it performs
source-side or target-side pushdown optimization instead. Also the source and target must be
on the same database. The Integration Service analyzes the mapping starting with the source
and analyzes each transformation in the pipeline until it analyzes the target.
When it can push all transformation logic to the database, it generates an INSERT SELECT
statement to run on the database. The statement incorporates transformation logic from all the
transformations in the mapping. If the Integration Service can push only part of the
transformation logic to the database, it does not fail the session, it pushes as much
transformation logic to the source and target database as possible and then processes the
remaining transformation logic.

For example, a mapping contains the following transformations:


SourceDefn -> SourceQualifier -> Aggregator -> Rank -> Expression -> TargetDefn
SUM(SAL), SUM(COMM) Group by DEPTNO
RANK PORT on SAL
TOTAL = SAL+COMM

The Rank transformation cannot be pushed to the database. If the session is configured for
full pushdown optimization, the Integration Service pushes the Source Qualifier
transformation and the Aggregator transformation to the source, processes the Rank
transformation, and pushes the Expression transformation and target to the target database.
When we use pushdown optimization, the Integration Service converts the expression in the
transformation or in the workflow link by determining equivalent operators, variables, and
functions in the database. If there is no equivalent operator, variable, or function, the
Integration Service itself processes the transformation logic. The Integration Service logs a
message in the workflow log and the Pushdown Optimization Viewer when it cannot push an
expression to the database. Use the message to determine the reason why it could not push
the expression to the database.
How does Integration Service handle Push Down Optimization

To push transformation logic to a database, the Integration Service might create temporary
objects in the database. The Integration Service creates a temporary sequence object in the
database to push Sequence Generator transformation logic to the database. The Integration
Service creates temporary views in the database while pushing a Source Qualifier
transformation or a Lookup transformation with a SQL override to the database, an
unconnected relational lookup, filtered lookup.
1. To push Sequence Generator transformation logic to a database, we must
configure the session for pushdown optimization with Sequence.
2. To enable the Integration Service to create the view objects in the
database we must configure the session for pushdown optimization
with View.

After the database transaction completes, the Integration Service drops sequence and view
objects created for pushdown optimization.
Configuring Parameters for Pushdown Optimization

Depending on the database workload, we might want to use source-side, target-side, or full
pushdown optimization at different times and for that we can use the $$PushdownConfig
mapping parameter. The settings in the $$PushdownConfig parameter override the pushdown
optimization settings in the session properties. Create $$PushdownConfig parameter in the

Mapping Designer; in the session properties, for the Pushdown Optimization attribute select $$PushdownConfig, and define the parameter in the parameter file.
The possible values may be,
1. none i.e the integration service itself processes all the transformations.
2. Source [Seq View],
3. Target [Seq View],
4. Full [Seq View]
Using Pushdown Optimization Viewer

Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to
the database. Select a pushdown option or pushdown group in the Pushdown Optimization
Viewer to view the corresponding SQL statement that is generated for the specified
selections. When we select a pushdown option or pushdown group, we do not change the
pushdown configuration. To change the configuration, we must update the pushdown option
in the session properties.
Database that supports Informatica Pushdown Optimization

We can configure sessions for pushdown optimization having any of the databases like
Oracle, IBM DB2, Teradata, Microsoft SQL Server, Sybase ASE or Databases that use
ODBC drivers.
When we use native drivers, the Integration Service generates SQL statements using native
database SQL. When we use ODBC drivers, the Integration Service generates SQL
statements using ANSI SQL. The Integration Service can generate more functions when it
generates SQL statements using native language instead of ANSI SQL.
Pushdown Optimization Error Handling

When the Integration Service pushes transformation logic to the database, it cannot track
errors that occur in the database.
When the Integration Service runs a session configured for full pushdown optimization and
an error occurs, the database handles the errors. When the database handles errors, the
Integration Service does not write reject rows to the reject file.
If we configure a session for full pushdown optimization and the session fails, the Integration
Service cannot perform incremental recovery because the database processes the
transformations. Instead, the database rolls back the transactions. If the database server fails,
it rolls back transactions when it restarts. If the Integration Service fails, the database server
rolls back the transaction.

Aggregation with out Informatica Aggregator



Since Informatica processes data on a row-by-row basis, it is generally possible to handle a data aggregation operation even without an Aggregator transformation. In certain cases, you may get a huge performance gain using this technique!
General Idea of Aggregation without Aggregator Transformation

Let us take an example: Suppose we want to find the SUM of SALARY for Each Department
of the Employee Table. The SQL query for this would be:
SELECT DEPTNO, SUM(SALARY)
FROM EMP_SRC
GROUP BY DEPTNO;

If we need to implement this in Informatica, it would be very easy as we would obviously go


for an Aggregator Transformation. By taking the DEPTNO port as GROUP BY and one
output port as SUM(SALARY) the problem can be solved easily.
But we want to achieve this without aggregator transformation!
We will use only an Expression transformation to achieve the functionality of the Aggregator expression. The trick is to use the ability of the Expression transformation to hold the value of an attribute from the previous row (via variable ports).
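In SQL terms the same row-by-row idea looks roughly like the sketch below (emp_src and empno are assumed names): carry a running total per DEPTNO and keep only the last row of each group, which is what the Expression plus Filter combination does in the mapping:

SELECT deptno, sum_salary
FROM (
    SELECT deptno,
           SUM(salary) OVER (PARTITION BY deptno ORDER BY empno
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_salary,
           ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY empno DESC) AS rn
    FROM   emp_src
)
WHERE rn = 1;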
But wait... why would we do this? Aren't we complicating things here?
Yes, we are! But as it appears, in many cases it can have a performance benefit (especially if the input is already sorted or when you know the input data will not violate the order, like when you are loading daily data and want to sort it by day). Please see this article to know more about how to improve the performance of the Aggregator transformation.
Remember Informatica holds all the
rows in Aggregator cache for
aggregation operation. This needs time
and cache space and this also voids the
normal row by row processing in
Informatica. By removing the
Aggregator with an Expression, we
reduce cache space requirement and
ease out row by row processing. The
mapping below will show how to do
this.
Mapping for Aggregation with Expression and Sorter only:
Sorter (SRT_SAL) Ports Tab
Note: a Sorter is shown here just to illustrate the concept. If you already receive sorted data from the source, you do not need it, which increases the performance benefit.
Expression (EXP_SAL) Ports Tab
Sorter (SRT_SAL1) Ports Tab
Expression (EXP_SAL2) Ports Tab
Filter (FIL_SAL) Properties Tab
This is how we can implement aggregation without using the Informatica Aggregator transformation.
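Since the screenshots are not reproduced here, a minimal sketch of the idea follows. The port names and exact layout are illustrative, not necessarily identical to the original mapping; ports in an Expression transformation are evaluated top to bottom, and variable ports retain their value from the previous row. Assume the rows arrive sorted by DEPTNO:

Expression (EXP_SAL), working on data sorted by DEPTNO:
    v_SAL_RUNNING  (variable) = IIF(DEPTNO = v_PREV_DEPTNO, v_SAL_RUNNING + SALARY, SALARY)
    v_PREV_DEPTNO  (variable) = DEPTNO
    o_SAL_RUNNING  (output)   = v_SAL_RUNNING

Sorter (SRT_SAL1): sort by DEPTNO ascending and o_SAL_RUNNING descending, so the row carrying the complete departmental total comes first in each group.

Expression (EXP_SAL2), to flag the first row of each group:
    v_IS_FIRST       (variable) = IIF(DEPTNO = v_PREV_DEPTNO2, 0, 1)
    v_PREV_DEPTNO2   (variable) = DEPTNO
    o_KEEP_ROW       (output)   = v_IS_FIRST

Filter (FIL_SAL) condition: o_KEEP_ROW = 1

The rows that survive the filter carry DEPTNO and o_SAL_RUNNING, which at that point equals SUM(SALARY) for the department.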

Approach to send an Email Notification when a Job runs for a Long time

Description:
Here is an approach to send an email notification if a desired task runs for a long time, i.e. exceeds a stipulated time. This approach does not send an email notification when the desired task completes normally, within the stipulated time.
Approach:
This approach sends an email notification if a task runs for more than a stipulated time (say, 20 minutes). In the scenario below, consider an Event-Wait task whose run time we want to check.
Create a workflow variable $$GO_SIGNAL_FOR_EMAIL with nstring as its datatype. Set the default value of this variable to the character N and validate it.
Create an Assignment task next to the task whose delay has to be notified, connect the parent task to the Assignment task with a link, and from the Assignment task connect to the rest of the tasks in the workflow.
Inside the Assignment task, assign the character Y to the workflow variable $$GO_SIGNAL_FOR_EMAIL.
Now connect a Timer task to the Start task (or to the task whose delay is to be notified) and set the Timer task with the time it has to wait before a notification is sent.
Connect an Email task to the Timer task with a link. In the link between the Timer and Email tasks, define the condition:
$Timer.Status=SUCCEEDED AND $$GO_SIGNAL_FOR_EMAIL != Y
Validate it, save the whole workflow, and run it. (A sketch of these conditions is given just below.)
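For clarity, here is a minimal sketch of the key expressions, assuming the monitored task is called s_long_running_task and the Timer task is called Timer_Delay (both names are illustrative, not from the original workflow; string literals are quoted here, which the expression editor expects):

Link from the monitored task to the Assignment task (runs when that task finishes):
    $s_long_running_task.Status = SUCCEEDED
Expression inside the Assignment task (sets the flag):
    $$GO_SIGNAL_FOR_EMAIL = 'Y'
Link from the Timer task to the Email task (fires only if the flag was never set):
    $Timer_Delay.Status = SUCCEEDED AND $$GO_SIGNAL_FOR_EMAIL != 'Y'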
Advantages:
Does not impact the rest of the workflow. Sends an email notification only when the desired task runs for more than the stipulated time.
Limitations:
The overall status of the workflow is shown as Running until the Timer task has SUCCEEDED.
Note: even though the Timer task succeeds, the approach sends an email notification only when the desired task exceeds the stipulated time.

How can you complete an unrecoverable session?

Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning. Run the session from the beginning when the Informatica server cannot run recovery or when running recovery might result in inconsistent data.

How to recover sessions in concurrent batches?

If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if one session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the failed session as a standalone session.
To recover a session in a concurrent batch:
1. Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch so that it becomes a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.

Explain about perform recovery?

When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery it bypasses the rows up to 10,000 and starts loading with row 10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable recovery in the Informatica Server setup before you run a session, so that the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.

Explain about recovering sessions?

If you stop a session, or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, the session, and the Informatica Server configuration.
Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.

What is the difference between the Stored Procedure transformation and the External Procedure transformation?

In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source, i.e. you need to build it as a DLL to access it in your mapping, and no database connection is needed.

What is incremental aggregation?

When using incremental aggregation, you apply captured changes in the source to the aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
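As a conceptual analogy only (the Integration Service actually maintains its own incremental aggregate cache files rather than issuing SQL like this; table and column names are illustrative), the effect for a SUM is roughly:

MERGE INTO dept_salary_agg t
USING (
    SELECT deptno, SUM(salary) AS delta_sal   -- aggregate only the newly captured rows
    FROM   emp_src
    WHERE  last_updated > :last_run_date
    GROUP  BY deptno
) s
ON (t.deptno = s.deptno)
WHEN MATCHED     THEN UPDATE SET t.total_sal = t.total_sal + s.delta_sal
WHEN NOT MATCHED THEN INSERT (deptno, total_sal) VALUES (s.deptno, s.delta_sal);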

How can you access a remote source in your session?

Relational source: to access a relational source that sits in a remote location, you need to configure a database connection to the data source.
File source: to access a remote source file, you must configure an FTP connection to the host machine before you create the session.
Heterogeneous: when your mapping contains more than one source type, the server manager creates a heterogeneous session that displays source options for all types.

What are the output files that the Informatica server creates during a session run?

Informatica server log: the Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: the Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for the reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.
Session detail file: this file contains load statistics for each target in the mapping. Session detail includes information such as table name and the number of rows written or rejected. You can view this file by double-clicking the session in the Monitor window.
Performance detail file: this file contains session performance details, which help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: this file contains the rows of data that the writer does not write to the targets.
Control file: the Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: if the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: when the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation

What are the necessary tasks you have to do to achieve session partitioning?

Configure the session to partition the source data, and install the Informatica server on a machine with multiple CPUs.

Describe the two levels at which the update strategy can be set.

Within a session: when you configure a session, you can instruct the Informatica Server either to treat all records in the same way (for example, treat all records as inserts), or to use the instructions coded into the session mapping to flag records for different database operations.
Within a mapping: within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

What are the rank caches?

During the session, the Informatica server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index cache and row data in a data cache.

Why do we use session partitioning in Informatica?

Partitioning improves session performance by reducing the time needed to read the source and load the data into the target. Performance is improved by processing data in parallel in a single session, by creating multiple partitions of the pipeline. The Informatica server can achieve high performance by partitioning the pipeline and performing the extract, transformation and load for each partition in parallel.

Which transformation should we use to normalize COBOL and relational sources?

The Normalizer transformation. When you drag a COBOL source into the Mapping Designer workspace, a Normalizer transformation automatically appears, creating input and output ports for every column in the source.

What is the RANKINDEX in the Rank transformation?

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the rank index port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
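The equivalent SQL, just to illustrate what RANKINDEX represents (the table and column names are illustrative, not part of the product):

SELECT *
FROM (
    SELECT quarter, salesperson, sales,
           RANK() OVER (PARTITION BY quarter ORDER BY sales DESC) AS rank_index
    FROM   sales_summary
)
WHERE rank_index <= 5;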

What are the different types of Type 2 dimension mappings?

Type 2 Dimension/Version Data mapping: in this mapping, an updated dimension row from the source gets inserted into the target along with a new version number, and a newly added dimension row in the source is inserted into the target with a new primary key.
Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions. In addition, it creates a flag value for changed or new dimension rows. The flag indicates whether the dimension row is new or newly updated: recent dimension rows are saved with the current flag value 1, and updated (historical) rows are saved with the value 0.
Type 2 Dimension/Effective Date Range mapping: this is another flavour of Type 2 mapping used for slowly changing dimensions. This mapping also inserts both new and changed dimension rows into the target, and changes are tracked by an effective date range for each version of each dimension row.
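As a rough illustration of the effective date range flavour, the load for a changed row boils down to something like this (dim_customer, its columns, and the 9999-12-31 "open" end date are illustrative choices, not fixed by Informatica):

-- Close the current version of the changed customer
UPDATE dim_customer
SET    eff_end_date = SYSDATE
WHERE  customer_id  = 101
AND    eff_end_date = DATE '9999-12-31';

-- Insert the new version with an open-ended date range
INSERT INTO dim_customer
       (customer_key, customer_id, customer_name, eff_start_date, eff_end_date)
VALUES (dim_customer_seq.NEXTVAL, 101, 'New Name', SYSDATE, DATE '9999-12-31');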

How does the Informatica server sort string values in a Rank transformation?

When the Informatica server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
When the Informatica server runs in Unicode data movement mode, it uses the sort order configured in the session properties.

What is a time dimension? Give an example.

The time dimension is one of the most important dimensions in a data warehouse: whenever you generate a report, you access the data through the time dimension.
Example fields: date key, full date, day of week, day, month, quarter, fiscal year.
In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup and week lookup are not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table called the TIME DIMENSION, for performance and for slicing data.
This dimension helps to find the sales done on a daily, weekly, monthly and yearly basis. We can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
A TIME DIMENSION is a table that contains the detailed information about the time at which a particular 'transaction' or 'sale' (event) has taken place. The TIME DIMENSION holds details such as DAY, WEEK, MONTH, QUARTER and YEAR.
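A minimal sketch of such a table (column names and datatypes are illustrative):

CREATE TABLE time_dim (
    date_key      NUMBER        PRIMARY KEY,  -- surrogate key, e.g. 20130331
    full_date     DATE,
    day_of_week   VARCHAR2(10),
    day_of_month  NUMBER,
    week_of_year  NUMBER,
    month_name    VARCHAR2(10),
    quarter       NUMBER,
    fiscal_year   NUMBER
);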

Can I start and stop a single session in a concurrent batch?

Yes, sure. Just right-click on the particular session and go to the recovery option, or use Event-Wait and Event-Raise tasks.

Difference between static cache and dynamic cache

Static cache: you cannot insert or update rows in the cache. The Informatica server returns a value from the lookup table or cache when the condition is true; when the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic cache: you can insert rows into the cache as you pass them to the target. The Informatica server inserts rows into the cache when the condition is false; this indicates that the row is not in the cache or the target table, and you can pass these rows on to the target table.

How do you use mapping parameters, and what is their use?

In the Designer you will find the Mapping Parameters and Variables option, and you can assign a value to them there. As for their use: suppose you are doing incremental extraction daily and your source system contains a day column. Every day you would have to open the mapping and change the day so that the right data is extracted; doing that by hand is a layman's approach. This is where mapping parameters and variables come in: once you assign a value to a mapping variable, it can change between sessions.
Mapping parameters and variables make mappings more flexible and avoid creating multiple mappings; they help in adding incremental data. They are created in the Mapping Designer through the menu option Mappings > Parameters and Variables. Enter a name for the variable or parameter (it has to be preceded by $$), choose the type as parameter or variable, and set the datatype. Once defined, the variable/parameter can be used in any expression, for example in the Source Qualifier transformation's source filter property: just enter the filter condition. Finally, create a parameter file to assign a value to the variable/parameter and configure it in the session properties. This final step is optional: if the parameter file is not present, the initial value assigned at the time of creating the variable is used.
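A minimal sketch of this incremental-extraction setup, assuming a mapping parameter named $$LAST_EXTRACT_DATE, a source column LOAD_DATE, and folder/workflow/session names that are purely illustrative:

Source Qualifier source filter:
    LOAD_DATE > TO_DATE('$$LAST_EXTRACT_DATE', 'YYYY-MM-DD')

Parameter file (referenced in the session properties):
    [MyFolder.WF:wf_daily_load.ST:s_m_incremental_load]
    $$LAST_EXTRACT_DATE=2013-03-30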

What are the options in the target session for the Update Strategy transformation?

Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

Update as Insert: this option specifies that all the update records from the source are to be flagged as inserts in the target. In other words, instead of updating the records in the target, they are inserted as new records.
Update else Insert: this option enables Informatica to flag the records either for update, if they already exist, or for insert, if they are new records from the source.

How do you create the staging area in your database?

A staging area in a data warehouse is used as a temporary space to hold all the records from the source systems. So, more or less, it should be an exact replica of the source systems, except for the load strategy, where we use truncate-and-reload options.
So create the staging tables using the same layout as your source tables, or use the Generate SQL option in the Warehouse Designer tab. Creating staging tables/areas is usually the work of the data modeller/DBA; the tables are created with plain "CREATE TABLE ..." statements and are given names that identify them as staging, like dwc_tmp_asset_eval (tmp indicating a temporary, i.e. staging, table).
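A minimal sketch, assuming a source table named src_asset_eval (the staging table name follows the convention mentioned above; both names are illustrative):

-- Create the staging table as an empty copy of the source layout
CREATE TABLE dwc_tmp_asset_eval AS
SELECT * FROM src_asset_eval WHERE 1 = 0;

-- Typical load strategy: truncate and reload on every run
TRUNCATE TABLE dwc_tmp_asset_eval;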

What is the difference between connected and unconnected Stored Procedure transformations?

Unconnected: the unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Connected: the flow of data through the mapping also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of the stored procedure sent as an output parameter to another transformation.

Typical uses of a Stored Procedure transformation, and the mode required for each:
Run a stored procedure before or after your session - Unconnected
Run a stored procedure once during your mapping, such as pre- or post-session - Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation - Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value - Unconnected
Pass parameters to the stored procedure and receive a single output parameter - Connected or Unconnected
Pass parameters to the stored procedure and receive multiple output parameters - Connected or Unconnected (Note: to get multiple output parameters from an unconnected Stored Procedure transformation, you must create variables for each output parameter. For details, see Calling a Stored Procedure From an Expression.)
Run nested stored procedures - Unconnected
Call multiple times within a mapping - Unconnected

While running multiple sessions in parallel that load data into the same table, the throughput of each session becomes very low and almost the same for each session. How can we improve the performance (throughput) in such cases?

This is largely handled by the database we use. While a load on the table is in progress, the table will be locked. If we try to load the same table with different partitions, we can run into ROWID errors if the database is Oracle 9i; a patch can be applied to resolve this issue.

How can you delete duplicate rows without using a dynamic lookup? Tell me any other way of deleting duplicate rows using a lookup.

For example, you have a source table Emp_Name with two columns, Fname and Lname, which contains duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator transformation, go to the Ports tab, select Fname, tick the GroupBy check box and untick the output (O) port; do the same for Lname. Then create two new ports, untick their input (I) port and click Expression on each port: in the first new port's expression type Fname, and in the second type Lname. Then close the Aggregator transformation and link it to the target table.
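The Aggregator set up this way behaves like the following SQL, which keeps exactly one row per (Fname, Lname) combination:

SELECT Fname, Lname
FROM   Emp_Name
GROUP  BY Fname, Lname;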
In a Joiner transformation, you should specify the source with fewer rows as the master source. Why?

In a Joiner transformation, the Informatica server reads all the records from the master source and builds index and data caches based on the master table rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
The Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.

What are data merging, data cleansing and sampling?

Cleansing: identifying and removing redundancy and inconsistency in the data.
Sampling: sending just a sample of the data from the source to the target.

What is tracing level?

The tracing level determines the amount of information that the Informatica server writes in the session log; it is essentially the level of detail stored there. The option appears on the Properties tab of transformations and defaults to "Normal". It can be:
Verbose Initialisation
Verbose Data
Normal
Terse

How can we join 3 sources, like a flat file, Oracle and DB2, in Informatica?
You have to use two Joiner transformations: the first one joins two of the sources, and the next one joins the third source with the output of the first Joiner.

How do we analyse the data at the database level?

Data can be viewed using Informatica's Designer tool. If you want to view the data on the source/target, you can preview the data, but with some limitations. We can use data profiling too.

How can we eliminate duplicate rows from a flat file?

Place an Aggregator transformation between the Source Qualifier and the target and group by the key field(s); this will eliminate the duplicate records.

What indexes have you used? What is a bitmap join index?

Bitmap indexes are used in data warehouse environments to increase query response time, since DWH columns tend to have low cardinality and low update rates; they are very efficient for WHERE-clause filtering.
A bitmap join index is used to join a dimension and a fact table so that the join can be answered from one index instead of reading two different indexes.
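For reference, Oracle's bitmap join index syntax looks roughly like this (the sales/customers tables and column names are illustrative):

CREATE BITMAP INDEX sales_cust_city_bjix
ON sales (customers.cust_city)
FROM sales, customers
WHERE sales.cust_id = customers.cust_id;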

What is data driven?

Data driven is a mode in which rows are inserted, deleted or updated based on the data itself. It is not predefined whether a row is to be inserted, deleted or updated; that is known only when the data is processed (the Update Strategy transformation flags each row at run time).

What is a batch? Explain the types of batches.

Session: a session is a set of instructions that tells the server how to move data to the target.
Batch: a batch is a set of tasks that may include one or more tasks (sessions, event-wait, email, command, etc.).
There are two types of batches in Informatica:
1. Sequential: the sessions run one after another from source to target.
2. Concurrent: the sessions run simultaneously from source to target.

What types of metadata does the repository store?

Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations

Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use mapping parameters or variables only in the transformations of the same mapping or mapplet in which they were created. If you want a value to be visible to other mappings/sessions, you might want to use a workflow parameter/variable instead.
Why did we use stored procedures in our ETL application?
Stored procedures can play an important role. Suppose you are using an Oracle database and need to make some ETL changes: with Informatica, every row of the table has to pass through Informatica and undergo the ETL changes defined in the transformations. If you use a stored procedure instead, i.e. an Oracle PL/SQL package, it runs on the Oracle database (the database where the changes are needed) and will be faster compared to Informatica, because it runs directly on the database. Some things that we cannot do with the tool we can do with packages, and some jobs may take hours to run, so in order to save time and database usage we can go for stored procedures.

What is the default join operation performed by the Lookup transformation?

An equi-join.

What is hash partitioning in Informatica?

Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.

Difference between cached lookup and un-cached lookup?

For a cached lookup, the entire lookup table is read into the lookup cache once, and the incoming rows are compared against these cached rows. For an un-cached lookup, the lookup queries the lookup table for every input row and fetches the matching rows.
So, for performance: go for a cached lookup if the lookup table is smaller than the number of mapping rows, and go for an un-cached lookup if the lookup table is larger than the number of mapping rows.
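Roughly, the SQL issued differs like this (lkp_items and its columns are illustrative):

-- Cached lookup: one query at session initialization builds the cache
SELECT item_id, item_name FROM lkp_items;

-- Un-cached lookup: one query per input row
SELECT item_name FROM lkp_items WHERE item_id = ?;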
What is polling?
Polling displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica server.

What is the rank cache?

The Integration Service compares input rows with the rows in the data cache; if an input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service stores group information in the index cache and row data in the data cache.
