1. How do you join two sources that do not have a common column?
Create a dummy port in both pipelines and assign the same constant value (e.g. 1) to both ports in an Expression transformation placed before the Joiner. Then use this dummy port in the join condition to join the two sources.
Or
Join the tables using a null key, i.e. pass {} to the key.
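For reference, the dummy-port approach is the ETL equivalent of a cross join on a constant key. A minimal SQL sketch (table names are only illustrative, not from the original question):
SELECT a.*, b.*
FROM   table_a a
JOIN   table_b b
  ON   1 = 1;   -- every row of table_a pairs with every row of table_b, the same effect as joining on matching dummy ports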
2. How do you generate a sequence of keys or numbers in the target without using the Sequence Generator transformation?
It can be done using the SETVARIABLE function. Add a mapping variable with an initial value of 0, then in an Expression transformation:
1. Seq_No -->
2. Out_Seq_No --> SETVARIABLE(,)
At every run, the value of the mapping variable is incremented by 1.
7. If a record is updated multiple times before the session is run, how do you track those changes in SCD type 2?
We can track the changes in SCD type 2 in several ways:
--> with an effective-date (time) column
--> by maintaining a version number
8. How do you add the total number of records read from the source as the last line of the target file?
This can be achieved using an Aggregator transformation.
In the Aggregator, leave the group-by boxes unchecked so that all rows collapse into a single output row, and add one extra output port:
OUT_TTL_RECORDS = COUNT(*)
Pass this port's value as the last record of the flat-file target.
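In plain SQL the same idea of appending the record count as a trailing row could be sketched like this (the table and column names are assumed for illustration, and the SQL flavor may vary):
SELECT 1 AS sort_key, cust_name AS line FROM src_customers
UNION ALL
SELECT 2, 'TOTAL RECORDS: ' || CAST(COUNT(*) AS VARCHAR(20)) FROM src_customers
ORDER BY sort_key;   -- the sort key keeps the count row as the last record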
9. If there are multiple source flat files with different names but the same file structure, how do we load all those files to the target in one step?
1. Create the mapping as if there is only single source and target.
2. Now create an additional file on the server which will list the multiple
source file names along with their paths.
3. Specify the path and name of this file in the "Source File" under session
properties.
4. Now the most important thing - Set "Source Filetype" as "indirect" under
session properties.
10. Write a query to retrieve the latest records from the target table; that is, if we have used an SCD2 version-type dimension, retrieve the record with the highest version number for each id. For example:
verno  id   loc
1      100  bang
2      100  kol
1      101  bang
2      101  chen
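A possible query for this, assuming the SCD2 target table is called cust_dim (the table name is illustrative; the columns id, loc and verno are taken from the sample above):
SELECT t.id, t.loc, t.verno
FROM   cust_dim t
WHERE  t.verno = (SELECT MAX(i.verno)
                  FROM   cust_dim i
                  WHERE  i.id = t.id);   -- keeps only the highest-version row per id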
11. I have a scenario where the same source has to be loaded into three targets, one per run. When the workflow runs for the first time, only the first target should be populated and the other two (second and third) should not be. When the workflow runs for the second time, only the second target should be populated and the other two (first and third) should not be. When the workflow runs for the third time, only the third target should be populated and the other two (first and second) should not be.
You can use the three target tables as lookups. If an incoming row from the file is already in a target, set flags accordingly; in the next step evaluate the flags and use a Router:
if in target 1, set flag1 = Y, else N
if in target 2, set flag2 = Y, else N
if in target 3, set flag3 = Y, else N
Now if flag1 = N, route to target 1;
if flag1 = Y and flag2 = N, route to target 2;
if flag1 = Y, flag2 = Y and flag3 = N, route to target 3.
Of course this only covers inserting rows into the targets. If you have updates, the logic gets more complicated because you have to check for changed values, but the concept stays the same.
Or
Declare a workflow variable, e.g. counter, with a default value of 1, and increment it (counter + 1) each time the workflow runs.
On the first run, counter mod 3 = 1, so load the first target.
On the second run, counter mod 3 = 2, so load the data into the second target table.
On the third run, counter mod 3 = 0, so load the data into the third target table.
The repository server automatically updates the counter value in the repository when the workflow finishes successfully; on the next run it reads the most recent value from the repository.
12. In my mapping I have multiple source files and only one target flat file, and I need to implement the logic below. Can anyone suggest how to do it?
Input: three files (file1, file2, file3), each with the same layout of columns field1, field2, field3.
Here I am reading the three files in the order file3, file2, file1. The logic I need is: if the record corresponding to a given key (e.g. '1') is present in multiple files, I need to write the record from the first file read and discard the corresponding records in the rest of the files. My target is a flat file. I tried an Update Strategy, but later found that the update concept does not work with flat files, so please suggest another way to implement this logic.
Output (sample):
1  A  B
2  C
At the Informatica level we can do this, but instead of having a fixed number of source pipelines (the number of files to be processed is not known in advance), it is better to read all the files through an indirect file list and then rank on the source file name port, grouping on field1.
With indirect listing we are independent of the number of files coming from the source, and we also avoid the UNION operations.
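The ranking idea maps naturally to an analytic query. A hedged SQL sketch of the same logic, assuming the indirect read exposes the file name in a column called src_file_name, that the combined rows are staged in a table called all_files_stage, and that file3 takes priority over file2 and file1:
SELECT field1, field2, field3
FROM (
      SELECT field1, field2, field3,
             ROW_NUMBER() OVER (PARTITION BY field1
                                ORDER BY src_file_name DESC) AS rn   -- file3 sorts first with DESC
      FROM   all_files_stage
     ) ranked
WHERE rn = 1;   -- keep only the record from the highest-priority file for each field1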
13. Informatica partitioning (discussed in the Partitioning Terminology section below).
14. How do you add a header and footer in Informatica?
You can get the column heading for a flat file using the session configuration described below. This session setting produces a file whose header record is 'Cust ID,Name,Street #,City,State,ZIP'.
You can get the footer for a flat file using the session configuration shown in the image below. This configuration produces a file with ***** End Of The Report ***** as the last row.
Before the file is read it needs to be unzipped. We do not need a separate pre-session script to achieve this; it can be done easily with the session setting below. The command configuration generates rows to stdout and the flat-file reader reads directly from stdout, which removes the need for staging the data.
For reading multiple source files with the same structure, we use the indirect file method. Indirect file reading is made easy using the File Command property in the session configuration as shown below: the command writes a list of file names to stdout and PowerCenter interprets this as a file list.
We can zip the target file using a post-session script, but this can also be done without a post-session script using the session configuration shown below.
Partitioning Terminology
Let's understand some partitioning terminology before we get into more details.
Number of partitions: We can divide the data set into smaller subsets by increasing the number of partitions. When we add partitions, we increase the number of processing threads, which can improve session performance.
Partition point: This is the boundary between two stages; partition points divide the pipeline into stages. A partition point is always associated with a transformation.
The image below shows the points discussed above; the session demo has three partitions and three partition points.
Key Range Partitioning : With this type of partitioning, you specify one
or more ports to form a compound partition key for a source or target. The
Integration Service then passes data to each partition depending on the
ranges you specify for each port.
We can invoke the user interface for session partitioning from the session using the menu Mapping -> Partitions, as shown in the image below.
The interface lets you add or modify partitions and partition points and choose the partitioning algorithm. Choose any transformation from the mapping and the "Add Partition Point" button will let you add additional partition points; the "Delete Partition Point" and "Edit Partition Point" buttons let you modify them, as shown in the image below.
Example:
Below is the simple structure of the mapping to get the assumed functionality.
Pass-through Partition
Once the partition is set up at the Source Qualifier, you get an additional Source Filter option to restrict the data that corresponds to each partition. Be sure to write the filter conditions so that the same data is not processed through more than one partition and no data is duplicated. The image below shows the three additional Source Filters, one per partition.
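Since the screenshot is not reproduced here, the three Source Filter conditions might look like the following sketch; the SALES_REGION column and its values are only assumed for illustration. The point is that each row qualifies for exactly one partition:
CUSTOMER_SALES.SALES_REGION = 'EAST'                      -- partition 1
CUSTOMER_SALES.SALES_REGION = 'WEST'                      -- partition 2
CUSTOMER_SALES.SALES_REGION NOT IN ('EAST', 'WEST')       -- partition 3, everything else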
Since the data volume from the three sales regions is not the same, use the round-robin partition algorithm at the next transformation in the pipeline, so that the data is distributed equally among the three partitions and the processing load is balanced. Round-robin partitioning can be set up as shown in the image below.
Use Key range partition when required to distribute the records among partitions
based on the range of values of a port or multiple ports.
Here the target table is range partitioned on product line. Create a range
partition on target definition on PRODUCT_LINE_ID port to get the best write
throughput.
The images below show the steps involved in setting up the key range partition.
Click on Edit Keys to define the ports on which the key range partition is defined. A pop-up window shows the list of ports in the transformation; choose the ports on which the key range partition is required.
Now give the start and end range values for each partition as shown below.
We did not have to use Hash User Key Partition and Database Partition algorithm
in the use case discussed here.
Hash User Key partition algorithm will let you choose the ports to group rows
among partitions. This algorithm can be used in most of the places where hash
auto key algorithm is appropriate.
Database partition algorithm queries the database system for table partition
information. It reads partitioned data from the corresponding nodes in the
database. This algorithm can be applied either on the source or target definition.
A full Change Data Capture framework is not the recommended way to handle such a project, simply because the effort required to build the framework may not be justified. In this article let's discuss a simple, easy approach to handling Change Data Capture.
We will be using Informatica mapping variables to build our Change Data Capture logic. Before we talk about the implementation, let's understand mapping variables.
These are variables created in PowerCenter Designer which you can use in any expression in a mapping; you can also use them in a Source Qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.
Mapping Variable Starting Value
The Integration Service looks for the start value of a mapping variable in this order: the value in the parameter file, the value saved in the repository, the initial value, and finally the datatype default value.
The value of the mapping variable can be changed within the session using an expression, and the final value of the variable is saved into the repository. The saved value is retrieved from the repository in the next session run and used as the start value.
Setting Mapping Variable Value
You can change the mapping variable value within the mapping or session using a Set function. We need to use the Set function that matches the Aggregation Type of the variable; the Aggregation Type is set when the variable is declared in the mapping.
$$M_DATA_END_TIME as Date/Time
Now bring the source and Source Qualifier into the Mapping Designer workspace. Open the Source Qualifier and give the filter condition to get the latest data from the source, as shown below.
STG_CUSTOMER_MASTER.UPDATE_TS > CONVERT(DATETIME, '$$M_DATA_END_TIME')
Note: This filter condition makes sure that the latest data is pulled from the source table each and every time. The latest value of the variable $$M_DATA_END_TIME is retrieved from the repository every time the session is run.
In an Expression transformation, set the variable using SETMAXVARIABLE($$M_DATA_END_TIME, UPDATE_TS).
Note: This expression makes sure that the latest value from the column UPDATE_TS is stored into the repository after the successful completion of the session run.
Now you can map all the remaining columns to the downstream transformations and complete all the other transformations required in the mapping.
That's all you need to configure Change Data Capture. Now create your workflow and run it.
Once you look into the session log file you can see the mapping variable value is
retrieved from the repository and used in the source SQL, just like shown in the
image below.
You can look at the mapping variable value stored in the repository, from
workflow manager. Choose the session from the workspace, right click and select
'View Persistent Value'. You get the mapping variable in a pop up window, like
shown below.
Stop - If the Integration Service is executing a Session task when you issue the stop
command, the Integration Service stops reading data. It continues processing and writing
data and committing data to targets. If the Integration Service cannot finish processing and
committing data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the stop
command, except it has a timeout period of 60 seconds. If the Integration Service cannot
finish processing and committing data within the timeout period, it kills the DTM process
and terminates the session.
Stop: stops reading source data immediately, but lets the service finish processing and committing the data already read.
Abort: behaves like stop but with a 60-second timeout; if processing and committing cannot finish within 60 seconds, the DTM process is killed and the session terminates.
21. What are the join types in a Joiner transformation?
Normal join, master outer join, detail outer join, and full outer join.
To load the same source data into the target more than once: in the mapping, drag the source in twice and make sure the source and target do not have any key constraints. Then add a Union transformation, link both source pipelines to the Union, and link the output ports from the Union to the target.
or
You can use a Normalizer transformation to achieve the desired output; there is an "Occurs" option in the Normalizer in which you can mention the number of times you want to load the same source data into the target.
(...arrow button) the query results window will appear. Select a single mapping, or select all mappings (by pressing Ctrl+A), then go to Tools -> Validate to validate them.
All these properties are just for improving performance. The cache creates two files: an index cache file and a data cache file. The index file stores only the frequently accessed key columns for the transformation, which is where most of the I/O and comparisons happen.
Assume Informatica stored all data in a single cache file for a table of 100 columns; that file might be, say, 100 MB. We would then read the whole file even though we only need the data of one key column for joining or sorting; the other 99 columns just have to be passed to the downstream transformation without any operation on them.
Now consider the same scenario with the cache split into two files, where one file stores only the data of the key column used by the Joiner or Sorter. The size of the file to be read is far smaller than 100 MB (say 10 MB). Reading a 10 MB file instead of a 100 MB file for the comparison is much cheaper, since the remaining 99 columns are not needed for the comparison at all.
fact tables)
4) Slowly changing dimension (the dimension values change over a period of time):
a) SCD1 (most recent values only in the target)
b) SCD2 (current + history data)
c) SCD3 (partial history)
5) Causal dimension
6) Dirty dimension
Summary filter --- applied to a group of records, i.e. after grouping on common values.
Detail filter --- applied to each and every record in the database.
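In SQL terms, a detail filter corresponds to a WHERE clause and a summary filter to a HAVING clause applied after grouping. A small sketch with assumed table and column names:
SELECT   deptno,
         SUM(salary)
FROM     emp
WHERE    salary > 1000          -- detail filter: evaluated for every individual row
GROUP BY deptno
HAVING   SUM(salary) > 50000;   -- summary filter: evaluated per group after aggregation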
Factless fact tables are used for tracking a process or collecting statistics. They are called factless because the fact table does not contain aggregatable numeric values. There are two types of factless fact tables: those that describe events, and those that describe conditions. Both may play important roles in your dimensional models.
Factless fact tables for Events
The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Sometimes there seem to be no facts associated with an important business process: events or activities occur that you wish to track, but you find no measurements. In situations like this, build a standard transaction-grained fact table that contains no facts.
Factless fact tables for Conditions
Factless fact tables are also used to model conditions or other important relationships among dimensions. In these cases there are no clear transactions or events. They are used to support negative analysis reports, for example a store that did not sell a product for a given period. To produce such a report, you need a fact table that captures all the possible combinations; you can then figure out what is missing.
For example, fact_promo gives the information about the products which have promotions but still did not sell, i.e. the list of products that have a promotion but no sales.
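A hedged sketch of how such a report could be produced from the coverage table; fact_promo is named in the text, while fact_sales and the column names are assumed for illustration:
SELECT p.product_id
FROM   fact_promo p              -- coverage table: all products on promotion in the period
LEFT JOIN fact_sales s
       ON  s.product_id = p.product_id
       AND s.period_key = p.period_key
WHERE  s.product_id IS NULL;     -- on promotion but no matching sale in that period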
This kind of factless fact table is used to track conditions, coverage or eligibility.
In Kimball terminology, it is called a "coverage table."
Note:
One may ask why we cannot include this information in the actual fact table. The problem is that if we did, the fact table size would increase enormously.
Ensure that the ETL application properly rejects, replaces with default
values and reports invalid data
Compare unique values of key fields between source data and warehouse
data
Ensure that all projected data is loaded into the data warehouse without
any data loss or truncation
Verify any incremental loading of records at a later date for newly inserted
or updated data
Verify that data loads and queries are executed within anticipated time
frames
NAME    CUST_ID  SVC_ST_DT    SVC_END_DT
TOM        -     31/08/2009   23/03/2011
DICK       -     01/01/2004   31/05/2010
HARRY      -     28/02/2007   31/12/2009
Here I have a service start date and service end date tied to a customer.
Now I want my target table data in a flattened manner like this:
Target Data
NAME    CUST_ID  SVC_ST_DT    SVC_END_DT
TOM        -     31/08/2009   31/12/2009
TOM        -     01/01/2010   31/12/2010
TOM        -     01/01/2011   23/03/2011
DICK       -     01/01/2004   31/12/2004
DICK       -     01/01/2005   31/12/2005
DICK       -     01/01/2006   31/12/2006
DICK       -     01/01/2007   31/12/2007
DICK       -     01/01/2008   31/12/2008
DICK       -     01/01/2009   31/12/2009
DICK       -     01/01/2010   31/05/2010
HARRY      -     28/02/2007   31/12/2007
HARRY      -     01/01/2008   31/12/2008
HARRY      -     01/01/2009   31/12/2009
i.e. I want to split the service start date and service end date on a yearly basis.
The first thing that comes to mind here is the Informatica Normalizer, and that is true. But if you think twice, you will find that we would need to assume or hard-code one thing: the time span would need a fixed maximum value, say a maximum of 5 years between the start and end date, because that is what you set as the number of occurrences in the Normalizer. You would then use an Expression transformation followed by a Filter to achieve the requirement. But done this way, the requirement is not satisfied when a customer has a tenure of more than 5 years.
Here I will instead use a small piece of Java code; the raw power of the Java programming language, called from Informatica PowerCenter, will do the data transformation. Let's go straight to the mapping and the code.
Next, if we want to transform and load the data on a monthly basis instead, here are the mapping and the code.
// Excerpt from the "On Input Row" code of the Java transformation (monthly split).
// The start of the try block, the loop over the months and the declarations of the helper
// variables (formatter, formatter1, cal1, st_dt, ed_dt, st_mon, yr, st_ldm) are not shown here.
cal1.setTime(st_dt);
OUT_SVC_END_DT = formatter.format(st_dt);
}
else
{
OUT_SVC_ST_DT = formatter.format(ed_dt);
}
generateRow();                                // emit one output row for the current monthly slice
st_mon = st_mon + 1;                          // advance to the next month
str = "01/" + st_mon + "/" + yr;              // first day of the next month
st_dt = (Date) formatter1.parse(str);
cal1.clear();
cal1.setTime(st_dt);
st_mon = cal1.get(Calendar.MONTH) + 1;
st_ldm = cal1.getActualMaximum(Calendar.DAY_OF_MONTH);   // last day of the new month
OUT_NAME = NAME;                              // pass the input ports through to the output
OUT_CUST_ID = CUST_ID;
OUT_SVC_ST_DT = formatter.format(st_dt);      // remaining slice: from the new start date
OUT_SVC_END_DT = formatter.format(ed_dt);     // ... up to the original service end date
generateRow();
}
catch (ParseException e)
{
System.out.println(e);
}
Note: You can extend PowerCenter functionality with the Java transformation, which provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of Java.
For example, you can define transformation logic to loop through input rows and generate multiple output rows based on a specific condition. You can also use expressions, user-defined functions, unconnected transformations, and mapping variables in the Java code.
Using incremental aggregation, we apply the captured changes in the source data (the CDC part) to aggregate calculations in a session. If the source changes incrementally and we can capture the changes, we can configure the session to process only those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to delete the previous load's data, process the entire source and recalculate the same aggregations on every session run.
Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we use the entire source. This allows the Integration Service to read and store the necessary aggregate data. In the 2nd week of Jan, when we run the session again, we filter out the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes this new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half of the existing target, the session may not benefit from incremental aggregation; in that case, drop and recreate the target with the entire source data and recalculate the same aggregations.
Incremental aggregation can be helpful, for example, when we need to load data into monthly facts on a weekly basis.
Sample Mapping
Look at the Source Qualifier query to fetch the CDC part using a
BATCH_LOAD_CONTROL table that saves the last successful load date for the particular
mapping.
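The query itself is not reproduced here, but a sketch of such a Source Qualifier override, assuming a BATCH_LOAD_CONTROL table with a LAST_LOAD_DATE column keyed by mapping name (the source table and names below are only illustrative), could look like:
SELECT s.*
FROM   invoice_detail s
WHERE  s.load_date > (SELECT c.last_load_date
                      FROM   batch_load_control c
                      WHERE  c.mapping_name = 'm_fact_invoice_monthly');   -- fetch only the CDC part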
If we want to reinitialize the aggregate cache, say during the first week of every month, we can configure the same session in a new workflow with the Reinitialize aggregate cache session property checked.
CUSTOMER_KEY  INVOICE_KEY  AMOUNT  LOAD_DATE
1111          5001         100     01/01/2010
2222          5002         250     01/01/2010
3333          5003         300     01/01/2010
1111          6007         200     07/01/2010
1111          6008         150     07/01/2010
2222          6009         250     07/01/2010
4444          1234         350     07/01/2010
5555          6157         500     07/01/2010
After the first Load on 1st week of Jan 2010, the data in the target is as follows:
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT
1111          5001         201001   100
2222          5002         201001   250
3333          5003         201001   300
Now during the 2nd week's load, the session processes only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of the incremental source data with the aggregate cache file data updates the target table with the following dataset:
CUSTOMER_KEY  INVOICE_KEY  MON_KEY  AMOUNT  Remarks/Operation
1111          6008         201001   450     Cache file updated after aggregation
2222          6009         201001   500     Cache file updated after aggregation
3333          5003         201001   300     Cache file remains the same as before
4444          1234         201001   350     New group row inserted in cache file
5555          6157         201001   500     New group row inserted in cache file
The first time we run an incremental aggregation session, the Integration Service processes
the entire source. At the end of the session, the Integration Service stores aggregate data for
that session run in two files, the index file and the data file. The Integration Service creates
the files in the cache directory specified in the Aggregator transformation properties.
Each subsequent time we run the session with incremental aggregation, we use the
incremental source changes in the session. For each input record, the Integration Service
checks historical information in the index file for a corresponding group. If it finds a
corresponding group, the Integration Service performs the aggregate operation incrementally,
using the aggregate data for that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target.
It saves modified aggregate data in the index and data files to be used as historical data the
next time you run the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service
creates a backup of the incremental aggregation files. The cache directory for the Aggregator
transformation must contain enough disk space for two sets of the files.
The Integration Service creates new aggregate data, instead of using historical data, when we
configure the session to reinitialize the aggregate cache, Delete cache files etc.
When the Integration Service rebuilds incremental aggregation files, the data in the previous
files is lost.
One can push transformation logic to the source or target database using pushdown
optimization. The Integration Service translates the transformation logic into SQL queries and
sends the SQL queries to the source or the target database which executes the SQL queries to
process the transformations. The amount of transformation logic one can push to the database
depends on the database, transformation logic, and mapping and session configuration. The
Integration Service analyzes the transformation logic it can push to the database and executes
the SQL statement generated against the source or target tables, and it processes any
transformation logic that it cannot push to the database.
Using Pushdown Optimization
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic
that the Integration Service can push to the source or target database. You can also use the
Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Let us take an example:
The Integration Service generates an INSERT SELECT statement and it filters the data using
a WHERE clause. The Integration Service does not extract data from the database at this
time.
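As an illustration of the kind of statement generated (the exact SQL depends on the mapping; the table and column names here are assumed), a simple source-to-target filter mapping pushed down in full might translate to something like:
INSERT INTO tgt_customers (cust_id, cust_name, country)
SELECT src.cust_id, src.cust_name, src.country
FROM   src_customers src
WHERE  src.country = 'US';   -- the filter logic is pushed into the WHERE clause; no data is extracted by the Integration Service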
We can configure pushdown optimization in the following ways:
Using source-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the source
database. The Integration Service analyzes the mapping from the source to the target or until
it reaches a downstream transformation it cannot push to the source database and executes the
corresponding SELECT statement.
Using target-side pushdown optimization:
The Integration Service pushes as much transformation logic as possible to the target
database. The Integration Service analyzes the mapping from the target to the source or until
it reaches an upstream transformation it cannot push to the target database. It generates an
INSERT, DELETE, or UPDATE statement based on the transformation logic for each
transformation it can push to the database and executes the DML.
Using full pushdown optimization:
The Integration Service pushes as much transformation logic as possible to both source and
target databases. If you configure a session for full pushdown optimization, and the
Integration Service cannot push all the transformation logic to the database, it performs
source-side or target-side pushdown optimization instead. Also the source and target must be
on the same database. The Integration Service analyzes the mapping starting with the source
and analyzes each transformation in the pipeline until it analyzes the target.
When it can push all transformation logic to the database, it generates an INSERT SELECT
statement to run on the database. The statement incorporates transformation logic from all the
transformations in the mapping. If the Integration Service can push only part of the
transformation logic to the database, it does not fail the session, it pushes as much
transformation logic to the source and target database as possible and then processes the
remaining transformation logic.
The Rank transformation cannot be pushed to the database. If the session is configured for
full pushdown optimization, the Integration Service pushes the Source Qualifier
transformation and the Aggregator transformation to the source, processes the Rank
transformation, and pushes the Expression transformation and target to the target database.
When we use pushdown optimization, the Integration Service converts the expression in the
transformation or in the workflow link by determining equivalent operators, variables, and
functions in the database. If there is no equivalent operator, variable, or function, the
Integration Service itself processes the transformation logic. The Integration Service logs a
message in the workflow log and the Pushdown Optimization Viewer when it cannot push an
expression to the database. Use the message to determine the reason why it could not push
the expression to the database.
How does the Integration Service handle pushdown optimization?
To push transformation logic to a database, the Integration Service might create temporary objects in the database. It creates a temporary sequence object in the database to push Sequence Generator transformation logic, and it creates temporary views in the database when pushing a Source Qualifier transformation or a Lookup transformation with a SQL override, an unconnected relational lookup, or a filtered lookup.
1. To push Sequence Generator transformation logic to a database, we must
configure the session for pushdown optimization with Sequence.
2. To enable the Integration Service to create the view objects in the
database we must configure the session for pushdown optimization
with View.
After the database transaction completes, the Integration Service drops sequence and view
objects created for pushdown optimization.
Configuring Parameters for Pushdown Optimization
Depending on the database workload, we might want to use source-side, target-side, or full
pushdown optimization at different times and for that we can use the $$PushdownConfig
mapping parameter. The settings in the $$PushdownConfig parameter override the pushdown
optimization settings in the session properties. Create $$PushdownConfig parameter in the
Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to
the database. Select a pushdown option or pushdown group in the Pushdown Optimization
Viewer to view the corresponding SQL statement that is generated for the specified
selections. When we select a pushdown option or pushdown group, we do not change the
pushdown configuration. To change the configuration, we must update the pushdown option
in the session properties.
Database that supports Informatica Pushdown Optimization
We can configure sessions for pushdown optimization having any of the databases like
Oracle, IBM DB2, Teradata, Microsoft SQL Server, Sybase ASE or Databases that use
ODBC drivers.
When we use native drivers, the Integration Service generates SQL statements using native
database SQL. When we use ODBC drivers, the Integration Service generates SQL
statements using ANSI SQL. The Integration Service can generate more functions when it
generates SQL statements using native language instead of ANSI SQL.
Pushdown Optimization Error Handling
When the Integration Service pushes transformation logic to the database, it cannot track
errors that occur in the database.
When the Integration Service runs a session configured for full pushdown optimization and
an error occurs, the database handles the errors. When the database handles errors, the
Integration Service does not write reject rows to the reject file.
If we configure a session for full pushdown optimization and the session fails, the Integration
Service cannot perform incremental recovery because the database processes the
transformations. Instead, the database rolls back the transactions. If the database server fails,
it rolls back transactions when it restarts. If the Integration Service fails, the database server
rolls back the transaction.
Since Informatica processes data on a row-by-row basis, it is generally possible to handle a data aggregation operation even without an Aggregator transformation. In certain cases you may get a significant performance gain using this technique!
General Idea of Aggregation without Aggregator Transformation
Let us take an example: Suppose we want to find the SUM of SALARY for Each Department
of the Employee Table. The SQL query for this would be:
SELECT DEPTNO, SUM(SALARY)
FROM EMP_SRC
GROUP BY DEPTNO;
I am showing a Sorter here just to illustrate the concept. If you already have sorted data coming from the source, you need not use it, which increases the performance benefit.
Connect an Email task to the Timer task using a link. In the link between the Timer and Email tasks, define the condition:
$Timer.Status=SUCCEEDED AND $$GO_SIGNAL_FOR_EMAIL != Y
Validate it, save the workflow, and run it.
Advantages:
Does not impact the rest of the workflow. Sends an email notification only
when the desired Task is running for more than the stipulated time.
Limitations:
The overall status of the workflow is shown as Running until the Timer task has succeeded.
Note: Even though the Timer task succeeds, the approach sends an email notification only when the desired task exceeds the stipulated time.
What are the output files that the Informatica server creates while a session is running?
Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping, such as the table name and the number of rows written or rejected. You can view this file by double-clicking on the session in the Monitor window.
Performance detail file: This file contains session performance details which help you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file such as the data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number indicating whether the row was marked for insert, update, delete, or reject.
Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Configure the session to partition source data. Install the Informatica server on a
machine with multiple cpus.
Use instructions coded into the session and mapping to flag records for different database operations. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
During the session, the Informatica server compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index cache and row data in a data cache.
dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.
Static cache
Dynamic cache
Mapping parameters and variables make mappings more flexible and avoid creating multiple mappings; they also help in loading incremental data. Mapping parameters and variables are created in the Mapping Designer by choosing the menu option Mappings -> Parameters and Variables, entering a name for the variable or parameter (it must be preceded by $$), and choosing the type (parameter/variable) and the datatype. Once defined, the variable/parameter can be used in any expression, for example in the Source Qualifier transformation's source filter property: just enter the filter condition. Finally, create a parameter file to assign a value to the variable/parameter and configure it in the session properties; this final step is optional. If the parameter is not present in the parameter file, the initial value assigned at the time of creating the variable is used.
Update as Insert:
This option specifies that all update records from the source are to be flagged as inserts in the target. In other words, instead of updating the existing records in the target, they are inserted as new records.
Update else Insert:
This option enables Informatica to flag records for update if they already exist in the target, or for insert if they are new records from the source.
Run a stored procedure once during your mapping, such as pre- or post-session. : Unconnected
Run a stored procedure every time a row passes through the Stored Procedure transformation. : Connected or Unconnected
Run a stored procedure based on data that passes through the mapping, such as when a specific port does not contain a null value. : Unconnected
Pass parameters to the stored procedure and receive a single output parameter. : Connected or Unconnected
While running multiple sessions in parallel that load data into the same table, the throughput of each session becomes very low and almost the same for each session. How can we improve the performance (throughput) in such cases?
I think this will be handled by the database we use. When loading on the table is in progress, the table will be locked. If we try to load the same table with different partitions we run into ROWID errors if the database is Oracle 9i, and we can apply a patch to resolve this issue.
How can you delete duplicate rows without using a dynamic lookup? Tell me any other way, using a lookup, to delete the duplicate rows.
For example, you have a table Emp_Name with two columns, Fname and Lname, and the source table has duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator, select the Ports tab, select Fname, check GroupBy and uncheck the output (O) port; select Lname, check GroupBy and uncheck the output (O) port. Then create two new ports, uncheck the input (I) option, and add an expression on each port: in the first new port's expression type Fname, in the second type Lname. Then close the Aggregator transformation and link it to the target table.
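The Aggregator set up this way does in the mapping what the following SQL does in the database (a sketch, using the Emp_Name table from the question):
SELECT   Fname, Lname
FROM     Emp_Name
GROUP BY Fname, Lname;   -- one row per distinct (Fname, Lname) pair, duplicates collapsed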
In a Joiner transformation, you should specify the source with fewer rows as the master source. Why?
In the Joiner transformation, the Informatica server reads all the records from the master source and builds the index and data caches based on the master rows. After building the caches, the Joiner reads records from the detail source and performs the join.
Joiner transformation compares each row of the master source against the detail source.
The fewer unique rows in the master, the fewer iterations of the join comparison occur,
which speeds the join process.
How can we join three sources such as a flat file, Oracle and DB2 in Informatica?
You have to use two Joiner transformations: the first one joins two of the sources, and the second one joins the third source with the result of the first Joiner.
Data driven is a process in which data is inserted/deleted/updated based on the data itself. Here it is not predefined whether data is to be inserted, deleted or updated; it is known only when the data is processed.
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Shortcuts
Source definitions
Target definitions
Transformations
Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use mapping parameters or variables only in transformations of the same mapping or mapplet in which the mapping parameters or variables were created.
You might want to use a workflow parameter/variable instead if you want the value to be visible to other mappings/sessions.
Why do we use stored procedures in an ETL application?
Stored procedures play an important role. Suppose you are using an Oracle database and doing some ETL changes with Informatica: every row of the table has to pass through Informatica and undergo the ETL changes specified in the transformations. If you use a stored procedure instead, i.e. an Oracle PL/SQL package, it runs on the Oracle database (the database where the changes are needed) and will be faster compared to Informatica, because it runs directly on the database. Some things we cannot do with the tool we can do with packages, and some jobs may take hours to run, so in order to save time and database usage we can go for stored procedures.
Whereas with an uncached lookup, the lookup queries the lookup table for every input row and fetches the matching rows.
So, for performance: go for a cached lookup if the lookup table size is smaller than the number of rows flowing through the mapping, and for an uncached lookup if the lookup table size is larger than the number of mapping rows.
What is polling?
Polling displays updated information about the session in the Monitor window. The Monitor window displays the status of each session when you poll the Informatica server.
The Integration Service compares input rows with rows in the data cache; if an input row out-ranks a cached row, the Integration Service replaces the cached row with the input row. If you configure the Rank transformation to rank across multiple groups, the Integration Service ranks incrementally for each group it finds. The Integration Service stores group information in the index cache and row data in the data cache.