Functional layer: Core functional ETL processing (extract, transform, and load).
Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring,
communication and alerting.
Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, reject and error handling, codes management.
Better performance if coded properly, and can take advantage of parallel processing capabilities when the need
arises.
Consideration for data-driven tables to support more complex code-mappings and business-rule application.
Performance
Scalable
Migratable
What is Informatica?
Informatica Power Center is a powerful ETL tool from Informatica Corporation.
Informatica Corporation products are:
Informatica on demand
Informatica Power Center is a single, unified enterprise data integration platform for accessing, discovering, and integrating
data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.
Informatica Power Center Editions:
Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project's and organization's specific needs.
Standard Edition
Advanced Edition
Administration Console
Repository Service
Integration Service
SAP BW Service
Data Analyzer
Metadata Manager
POWERCENTER CLIENT
The Power Center Client consists of the following applications that we use to manage the repository, design mappings and mapplets, and create sessions to load the data:
1. Designer
2. Data Stencil
3. Repository Manager
4. Workflow Manager
5. Workflow Monitor
1. Designer:
Use the Designer to create mappings that contain transformation instructions for the Integration Service.
The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:
Mapping Designer: Create mappings that the Integration Service uses to extract, transform, and load data.
2. Data Stencil
Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by developers.
3. Repository Manager
Use the Repository Manager to administer repositories. You can navigate through multiple folders and repositories, and complete the following tasks:
Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.
Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, we can configure a folder to be shared.
View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects. We create repository objects using the Designer and Workflow Manager client tools.
We can view the following objects in the Navigator window of the Repository Manager:
Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions: Definitions of database objects or files that contain the target data.
Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Integration Service uses to transform and move data.
Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
4. Workflow Manager:
Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how
and when to run tasks related to extracting, transforming, and loading data.
The Workflow Manager has the following tools to help us develop a workflow:
Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.
Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. You can also
create tasks in the Workflow Designer as you develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes
tasks, such as the Session task, the Command task, and the Email task so you can design a workflow. The Session task is
based on a mapping we build in the Designer.
We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and
workflow variables to create branches in the workflow.
5. Workflow Monitor
Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.
Services Behind the Scenes
INTEGRATION SERVICE PROCESS
The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process
accepts requests from the Power Center Client and from pmcmd. It performs the following tasks:
Runs workflow tasks and evaluates the conditional links connecting tasks.
LOAD BALANCER
The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and
scalability. When we run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait
tasks within the workflow.
The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session
and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become
available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.
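As a rough illustration of this dispatch-and-queue behavior, the following Python sketch models a load balancer with a fixed capacity and a queue ordered by service level. The class, the task names, and the convention that a lower number means a higher service level are all assumptions made for the example, not Informatica APIs.

```python
# Toy model of the Load Balancer: dispatch up to `capacity` tasks, queue the
# rest, and release queued tasks by service level when a node frees up.
import heapq

class LoadBalancer:
    def __init__(self, capacity):
        self.capacity = capacity   # max tasks the Integration Service can run
        self.running = []          # currently dispatched tasks
        self.queue = []            # heap ordered by (service_level, arrival)
        self._arrival = 0

    def dispatch(self, task, service_level=5):
        """Dispatch a task, or queue it when no slot is free."""
        if len(self.running) < self.capacity:
            self.running.append(task)
        else:
            # Assumption: lower number = higher service-level priority.
            heapq.heappush(self.queue, (service_level, self._arrival, task))
            self._arrival += 1

    def node_freed(self, finished_task):
        """When a node becomes available, dispatch the best queued task."""
        self.running.remove(finished_task)
        if self.queue:
            _, _, task = heapq.heappop(self.queue)
            self.running.append(task)

lb = LoadBalancer(capacity=2)
lb.dispatch("s_load_customers", service_level=5)
lb.dispatch("s_load_orders", service_level=5)
lb.dispatch("s_load_audit", service_level=9)    # queued: no capacity left
lb.dispatch("s_load_finance", service_level=1)  # queued, but higher priority
lb.node_freed("s_load_customers")               # finance dispatched before audit
```

Note how the queue preserves arrival order between tasks of equal service level, matching the "in the order it receives them" behavior described above.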
DTM PROCESS
When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process
associated with the session task. The DTM process performs the following tasks:
Performs pushdown optimization when the session is configured for pushdown optimization.
Adds partitions to the session when the session is configured for dynamic partitioning.
Expands the service process variables, session parameters, and mapping variables and parameters.
Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
PROCESSING THREADS
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The
default memory allocation is 12,000,000 bytes.
The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread.
The master thread can create the following types of threads: mapping, pre- and post-session, reader, transformation, and writer threads.
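The reader/transformation/writer pipeline described in this section can be sketched with a bounded queue standing in for the DTM's buffer memory; this is an illustrative toy model, not PowerCenter internals.

```python
# Toy model of the DTM thread pipeline: a reader thread fills a bounded
# buffer pool while a transformation/writer thread drains it concurrently.
import queue
import threading

buffers = queue.Queue(maxsize=4)   # bounded, like a fixed pool of buffer blocks
out = []

def reader():
    for row in range(5):
        buffers.put(row)           # blocks when all buffers are full
    buffers.put(None)              # end-of-data marker

def transform_and_write():
    while True:
        row = buffers.get()
        if row is None:
            break
        out.append(row * 10)       # stand-in for the transformation logic

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=transform_and_write)
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the queue is bounded, the reader naturally throttles when the downstream threads fall behind, which is the same back-pressure role buffer memory plays in a session.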
Add a repository to the Navigator, and then configure the domain connection information when we connect to the
repository.
1. Adding a Repository to the Navigator:
1. In any of the Power Center Client tools, click Repository > Add.
2. Enter the name of the repository and a valid repository user name.
3. Click OK.
Before we can connect to the repository for the first time, we must configure the Connection information for the domain that
the repository belongs to.
3. Click OK.
Difference Between 7.1 and 8.6
1. Target from Transformation: In Informatica 8X we can create a target from a transformation by dragging the transformation into the Target Designer.
2. Pushdown optimization: Increases performance by pushing transformation logic to the database: the Integration Service analyzes the transformations and issues SQL statements to sources and targets, and it processes only the transformation logic that it cannot push to the database.
3. New functions in the expression editor: New functions have been introduced in Informatica 8X, such as reg_extract and reg_match.
4. Repository query: Available in both versioned and non-versioned repositories; previously it was available only for versioned repositories.
5. UDF (user-defined function): Similar to a macro in Excel.
6. FTP: We can have partitioned FTP targets and indirect FTP file sources (with a file list).
7. Propagating Port Descriptions: In Informatica 8 we can edit a port description and propagate the description to other
transformations in the mapping.
8. Environment SQL Enhancements: Environment SQL can still be used to execute an SQL statement at the start of a connection to the database. We can use SQL commands that depend upon a transaction being open during the entire read or write process. For example, the following SQL command modifies how the session handles characters:
ALTER SESSION SET NLS_DATE_FORMAT='DD/MM/YYYY';
9. Concurrently write to multiple files in a session with partitioned targets.
10. Flat File Enhancements:
Flat files can now have integer and double data types
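The pushdown optimization idea in item 2 above can be illustrated by generating a SQL statement for the source database to execute, instead of transforming rows inside the ETL engine. The function, table, and column names below are invented for the example, not Informatica APIs.

```python
# Minimal sketch of the pushdown idea: express a projection, a derived
# expression, and a filter as SQL that the database executes itself.

def pushdown_sql(table, columns, filter_expr):
    """Build a SELECT that pushes the transformation logic to the database."""
    select_list = ", ".join(columns)
    return f"SELECT {select_list} FROM {table} WHERE {filter_expr}"

# The derived column and the filter both run in the database engine.
sql = pushdown_sql("ORDERS",
                   ["ORDER_ID", "AMOUNT * 1.1 AS AMOUNT_TAXED"],
                   "ORDER_DATE >= DATE '2024-01-01'")
```

Anything that cannot be expressed this way (for example, a call to an engine-only function) would still have to be processed row by row in the engine, which is the "only processes the logic it cannot push" behavior described above.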
Informatica Power Center 8 has the following features, which make it more powerful and easier to use and manage compared to previous versions.
High availability
Pushdown optimization
Dynamic partitioning
New transformations
23 New functions
Enterprise GRID
Testing
Unit Testing
Unit testing can be broadly classified into two categories: quantitative and qualitative.
Quantitative Testing
Validate your Source and Target
a) Ensure that your connectors are configured properly.
b) If you are using flat files, make sure you have enough read/write permissions on the file share.
c) You need to document all the connector information.
Analyze the Load Time
a) Execute the session and review the session statistics.
b) Check the Read and Write counters and note how long it takes to perform the load.
c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information.
Analyze the success rows and rejections.
a) Use customized SQL queries to check the source/targets; here we will perform the record count verification.
b) Analyze the rejections and build a process to handle them. This requires a clear business requirement from the business on how to handle data rejections: do we need to reload, or reject and inform? Discussions are required and an appropriate process must be developed.
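The record count verification above amounts to checking that every source row is accounted for as either loaded or rejected. A minimal sketch, using an in-memory sqlite3 database as a stand-in for your actual source and target systems (the table names and queries are placeholders):

```python
# Source/target/reject balance check: source count should equal
# target count plus reject count.
import sqlite3

def record_counts(conn, src_query, tgt_query, rej_query):
    cur = conn.cursor()
    src = cur.execute(src_query).fetchone()[0]
    tgt = cur.execute(tgt_query).fetchone()[0]
    rej = cur.execute(rej_query).fetchone()[0]
    return {"source": src, "target": tgt, "rejected": rej,
            "balanced": src == tgt + rej}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src(id INT);  INSERT INTO src VALUES (1),(2),(3),(4);
    CREATE TABLE tgt(id INT);  INSERT INTO tgt VALUES (1),(2),(3);
    CREATE TABLE rej(id INT);  INSERT INTO rej VALUES (4);
""")
result = record_counts(conn,
                       "SELECT COUNT(*) FROM src",
                       "SELECT COUNT(*) FROM tgt",
                       "SELECT COUNT(*) FROM rej")
```

An unbalanced result is the trigger for the rejection analysis discussed in point b): each missing row should be traceable to a documented reject reason.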
Performance Improvement
a) Network Performance
b) Session Performance
c) Database Performance
d) Analyze and if required define the Informatica and DB partitioning requirements.
Qualitative Testing
Analyze and validate your transformation business rules. This is more of a functional test.
e) You need to review field by field from source to target and ensure that the required transformation logic is applied.
f) If you are making changes to existing mappings, make use of the data lineage feature available with Informatica Power Center. This will help you find the consequences of altering or deleting a port from an existing mapping.
g) Ensure that appropriate dimension lookups have been used and your development is in sync with your business requirements.
Integration Testing
After unit testing is complete, it should form the basis for starting integration testing. Integration testing should test out initial and incremental loading of the data warehouse.
Integration testing will involve the following:
1. Sequence of ETL jobs in batch.
2. Initial loading of records on data warehouse.
3. Incremental loading of records at a later date to verify the newly inserted or updated data.
When you validate the calculations, you don't need to load all the rows into the target and validate them.
Instead, use the Enable Test Load feature available in Informatica Power Center.
Property: Number of Rows to Test
Description: Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number of rows you configure for the test load.
Data Quality Validation
Check for missing data, negatives, and consistency. Field-by-field data verification can be done to check the consistency of source and target data.
Overflow checks: This is a limit check based on the capacity of a data field or data file area to accept data. This
programming technique can be used to detect the truncation of a financial or quantity data field value after computation
(e.g., addition, multiplication, and division). Usually, the first digit is the one lost.
Format checks: These are used to determine that data are entered in the proper mode, as numeric or alphabetical characters,
within designated fields of information. The proper mode in each case depends on the data field definition.
Sign test: This is a test for a numeric data field containing a designation of an algebraic sign, + or - , which can be used to
denote, for example, debits or credits for financial data fields.
Size test: This test can be used to test the full size of the data field. For example, a social security number in the United States should have nine digits.
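The overflow, format, sign, and size checks above can be expressed as small field-level validators. A hedged sketch; the digit limits and patterns below are illustrative examples, not fixed rules.

```python
# Field-level data quality checks mirroring the four tests described above.
import re

def overflow_check(value, max_digits):
    """Limit check: does the computed value still fit the field capacity?"""
    return len(str(abs(int(value)))) <= max_digits

def format_check(value, pattern):
    """Mode check: value must match the field's designated format."""
    return re.fullmatch(pattern, value) is not None

def sign_test(value):
    """Algebraic-sign test, e.g. credits must not be negative."""
    return value >= 0

def size_test(value, size):
    """Full-size test, e.g. a US social security number has nine digits."""
    return len(value) == size

assert overflow_check(99_999, max_digits=6)         # fits a 6-digit field
assert not overflow_check(1_234_567, max_digits=6)  # truncation would occur
assert format_check("2024-01-31", r"\d{4}-\d{2}-\d{2}")
assert sign_test(150.00) and not sign_test(-25.50)
assert size_test("123456789", size=9)               # nine-digit SSN
```

In practice these would run against each mapped field during the field-by-field verification described earlier, with failures routed to the reject-handling process.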
Granularity
Validate at the lowest granular level possible
Other validations
Audit Trails, Transaction Logs, Error Logs and Validity checks.
Note: Based on your project and business needs you might have additional testing requirements.
User Acceptance Test
In this phase you will involve the users to test the end results and ensure that the business is satisfied with the quality of the data. Any changes to the business requirements will follow the change management process, and eventually those changes have to follow the SDLC process.
Optimize Development, Testing, and Training Systems
Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller
targeted data subsets for development, testing, and training systems, while maintaining full data integrity.
Quickly build and update nonproduction systems with a small subset of production data and replicate current
subsets of nonproduction copies faster.
Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT
infrastructure and maintenance costs.
Reduce application and upgrade deployment risks by properly testing configuration updates with up-to-date, realistic data before introducing them into production.
Easily customize provisioning rules to meet each organization's changing business requirements.
Untangle complex operational systems and separate data along business lines to quickly build the divested organization's system.
Accelerate the provisioning of new systems by using only data that's relevant to the divested organization.
Decrease the cost and time of data divestiture with no reimplementation costs.
Dramatically increase an IT team's productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand, which is expensive, resource intensive, and time consuming.
Accelerate application delivery by decreasing R&D cycle time and streamlining test data management.
Improve the reliability of application delivery by ensuring IT teams have ready access to updated quality production
data.
Lower administration costs by centrally managing data growth solutions across all packaged and custom
applications.
Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production.
Troubleshoot the Integration Service. Configure the Integration Service to fail over in safe mode and
troubleshoot errors when you migrate or test a production environment configured for high availability. After the
Integration Service fails over in safe mode, you can correct the error that caused the Integration Service to fail over.
Syntax Testing: Test your customized queries using your source qualifier before executing the session.
Performance Testing: identify the following bottlenecks:
Target
Source
Mapping
Session
System
Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to
identify source and target bottlenecks.
Analyze performance details. Analyze performance details, such as performance counters, to determine where
session performance decreases.
Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.
Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits,
and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.
Use Power Center conditional filter in the Source Qualifier to improve performance.
Share metadata. You can share metadata with a third party. For example, you want to send a mapping to someone
else for testing or analysis, but you do not want to disclose repository connection information for security reasons.
You can export the mapping to an XML file and edit the repository connection information before sending the
XML file. The third party can import the mapping from the XML file and analyze the metadata.
Debugger
You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping,
you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on
the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation
output data.
Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you
create and configure a session in the Workflow Manager.
After you run a session. If a session fails or if you receive unexpected results in the target, you can run the
Debugger against the session. You might also want to run the Debugger against a session if you want to debug the
mapping using the configured session properties.
Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration
properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing
workflow. The Debugger does not suspend on error.
Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties.
When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.
Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug session and creates and runs a debug workflow for the session.
Debug Process
To debug a mapping, complete the following steps:
1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error
conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type
the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session
properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target
data.
3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer
connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and
workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays debug indicators and the Instance, Target, and Output windows.
5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations,
mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.
The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint
information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power
Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center
Client machine.
Running the Debugger:
When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After
initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you
issue from the Mapping Designer. The Debugger can be in one of the following states:
Paused. The Integration Service encounters a break and pauses the Debugger.
Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port
numbers in the Tools > Options > Debug tab.
The Debugger does not use the high availability functionality.
The Mapping Designer displays windows and debug indicators that help you monitor the session:
Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.
Instance window. When the Debugger pauses, you can view transformation data and row information in the
Instance window.
Target window. View target data for each target in the mapping.
Output window. The Integration Service writes messages to the following tabs in the Output window:
Session Log tab. The session log displays in the Session Log tab.
While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent
transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the
session more closely.
Restrictions
You cannot change data for the following output ports:
Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.
Custom transformation. Ports in output groups other than the current output group.
Java transformation. Ports in output groups other than the current output group.
Constraint-Based Loading:
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the
Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the
Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables.
Constraint-based loading depends on the following requirements:
Active source. Related target tables must have the same active source.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those
tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive
data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline
contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the
Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example,
you have one target containing a primary key and a foreign key related to the primary key in a second target. The second
target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce
constraint-based loading for these tables. It reverts to a normal load.
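The behavior above amounts to ordering targets by their primary-key/foreign-key dependencies and falling back to a normal load when the relationships are circular. A hedged sketch of that logic; the table names and the `fk_refs` structure are invented for the example.

```python
# Derive a constraint-based load order from key dependencies, or return None
# (i.e., revert to a normal load) when a circular relationship exists.

def load_order(tables, fk_refs):
    """fk_refs maps each table to the tables whose primary keys it references.
    Returns a valid load order, or None when the relationships are circular."""
    order, state = [], {}          # state: 1 = visiting, 2 = done

    def visit(t):
        if state.get(t) == 2:
            return True
        if state.get(t) == 1:      # cycle detected: cannot enforce ordering
            return False
        state[t] = 1
        for parent in fk_refs.get(t, []):
            if not visit(parent):
                return False
        state[t] = 2
        order.append(t)            # parents are appended before children
        return True

    for t in tables:
        if not visit(t):
            return None
    return order

# T_2 and T_3 reference T_1; T_4 references T_3: loads as T_1, T_2, T_3, T_4.
ok = load_order(["T_1", "T_2", "T_3", "T_4"],
                {"T_2": ["T_1"], "T_3": ["T_1"], "T_4": ["T_3"]})
# Circular: each table references the other, so the result is None.
circular = load_order(["A", "B"], {"A": ["B"], "B": ["A"]})
```

This mirrors the worked example later in this section, where T_1 loads first and T_4 loads last.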
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to
specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the
tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different
target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To
verify that all targets are in the same target connection group, complete the following tasks:
Verify all targets are in the same target load order group and receive data from the same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to
load the primary table.
Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the
Integration Service reads the sources in each target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading
establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a
single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the
following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by
T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not
loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source,
the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you
use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition properties. The Integration Service includes T_5 and
T_6 in a different target connection group because they are in a different target load order group from the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable
constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
Target Load Order
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the
mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set
the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target
load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set
the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the
primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups
sequentially.
To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within
a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the
mapping.
The following figure shows two target load order groups in one mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order
group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes
the first target load order group, and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that
receive data from each source qualifier.
MAPPING PARAMETERS
A mapping parameter represents a constant value that we can define before running a session.
A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during the ETL process, we will create a mapping parameter of date type and use it in the query to compare it with the timestamp field in the SQL override.
We can then use the parameter in any expression in the mapplet or mapping.
We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression
Editor of reusable transformations.
MAPPING VARIABLES
Unlike mapping parameters, mapping variables are values that can change between sessions.
The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful
session.
We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For example, we have a source table
containing time stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering
a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime.
In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set
the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows
dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the
repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
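The incremental-read behaviour described above can be sketched outside Informatica as follows. This is a simplified Python illustration only, not Informatica syntax: the `repository` dict stands in for the repository's saved variable value, and the source rows are invented for the example.

```python
from datetime import date, timedelta

# Simulated repository storage for the mapping variable $$IncludeDateTime.
repository = {"$$IncludeDateTime": date(2004, 8, 1)}

# Simulated source table of time-stamped transactions.
source_rows = [
    {"id": 1, "timestamp": date(2004, 8, 1)},
    {"id": 2, "timestamp": date(2004, 8, 2)},
    {"id": 3, "timestamp": date(2004, 8, 2)},
]

def run_session():
    """One session run: apply the source filter, then increment the
    variable and save it back, as the Integration Service does."""
    include = repository["$$IncludeDateTime"]
    # Source qualifier filter: TIMESTAMP = $$IncludeDateTime
    read = [r for r in source_rows if r["timestamp"] == include]
    # Variable function increments the value; it is saved at session end.
    repository["$$IncludeDateTime"] = include + timedelta(days=1)
    return read

first = run_session()   # reads only rows dated 8/1/2004
second = run_session()  # reads only rows dated 8/2/2004
```

Each run reads only the rows for the saved date and leaves the next date in the "repository" for the following run.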
Used in following transformations:
Expression
Filter
Router
Update Strategy
Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is
saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except binary data type are valid.
Min: All transformation data types except binary data type are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete,
or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or
reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the
variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository.
Creating Mapping Parameters and Variables
1. Open the folder where we want to create parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet
> Parameters and Variables.
3. Click the add button.
4. Enter a name. Do not remove the $$ from the name.
5. Select Type and Data type. Select Aggregation type for mapping variables.
6. Give Initial Value. Click ok.
Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL,
MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$BONUS ($$Bonus is a mapping parameter that changes every month)
Creating Mapping
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create > Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when datatype is
INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK. Expression Transformation below:
21. Link all ports from expression to target and Validate Mapping and Save it.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
Parameter files provide flexibility to change these variables each time we run a workflow or session.
We can create multiple parameter files and change the file we use for a session or workflow. We can create a
parameter file using a text editor such as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session properties.
Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a
parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database connection or file
name.
We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
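A parameter file is plain text, so a minimal sketch might look like the following. The section heading names the folder, workflow, and session the values apply to; the folder and session names here are invented for illustration, and the parameter values are taken from the examples in this document.

```
[Practice.WF:wf_sample_email.ST:s_m_filter_example]
$$Bonus=200
$$IncludeDateTime=08/01/2004
$InputFileName=D:\Files\sample.txt
```

Each workflow or session picks up only the values in its own section when the file is referenced from the properties described above.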
MAPPLETS
A mapplet contains a set of transformations and lets us reuse that transformation logic in multiple mappings.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building the 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use it in all 10 mappings. Example: To create a surrogate key in the target, we create a mapplet using a stored procedure that creates the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a
mapping.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each
Output transformation in a mapplet represents one output group in a mapplet.
Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create
multiple pipelines in a mapplet.
Mapplet Output:
The output of a mapplet is not connected to any target table.
A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give the output to mapplet out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets-> Create-> Give name. Ex: mplt_example1
4. Drag EMP and DEPT table.
5. Use Joiner transformation as described earlier to join them.
6. Transformation -> Create -> Select Expression for list -> Create -> Done
7. Pass all ports from joiner to expression and then calculate total salary as described in expression transformation.
8. Now Transformation -> Create -> Select Mapplet Out from list > Create -> Give name and then done.
9. Pass all ports from expression to Mapplet output.
10. Mapplet -> Validate
11. Repository -> Save
Use of mapplet in mapping:
We can use a mapplet in a mapping by simply dragging the mapplet from the mapplet folder in the left pane, just as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output
transformations. These are referred to as the mapplet input and mapplet output ports.
PARTITIONING
A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
PARTITIONING ATTRIBUTES
1. Partition points
Partition points mark thread boundaries and divide the pipeline into stages.
2. Number of Partitions
When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
Increasing the number of partitions or partition points increases the number of threads.
The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.
3. Partition types
The Integration Service creates a default partition type at each partition point.
If we have the Partitioning option, we can change the partition type. This option is purchased separately.
The partition type controls how the Integration Service distributes data among partitions at partition points.
PARTITIONING TYPES
1. Round Robin Partition Type
In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among
partitions.
2. Pass-Through Partition Type
In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
All rows in a single partition stay in that partition after crossing a pass-through partition point.
Use pass-through partitioning when we want to increase data throughput, but do not want to increase the number of partitions.
3. Database Partition Type
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.
Use any number of pipeline partitions and any number of database partitions.
We can improve performance when the number of pipeline partitions equals the number of database partitions.
4. Hash Auto-Keys Partition Type
The Integration Service uses a hash function to group rows of data among partitions, and uses all grouped or sorted ports as a compound partition key.
Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
5. Key Range Partition Type
The Integration Service passes data to each partition depending on the ranges we specify for each port.
Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.
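The distribution behaviour of round-robin and key-range partitioning can be sketched as follows. This is a simplified Python illustration of the two schemes, not Integration Service internals; the customer rows and ranges are invented for the example.

```python
def round_robin(rows, n_partitions):
    """Distribute rows evenly across partitions, ignoring their content."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

def key_range(rows, ranges, key):
    """Route each row to the partition whose [low, high) range holds its key."""
    partitions = [[] for _ in ranges]
    for row in rows:
        for i, (low, high) in enumerate(ranges):
            if low <= row[key] < high:
                partitions[i].append(row)
                break
    return partitions

customers = [{"cust_id": c} for c in (5, 150, 42, 101, 199, 77)]

# Even distribution, no grouping guarantee.
rr = round_robin(customers, 2)
# Customers 1-100 in one partition, 101-200 in another.
kr = key_range(customers, [(1, 101), (101, 201)], "cust_id")
```

Round-robin balances load regardless of values; key range keeps related keys together, which matters when the targets are themselves partitioned by that key.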
The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow
can run only once.
Valid Workflow :
Example of loop:
Once we create links between tasks, we can specify conditions for each link to determine the order of execution in
the workflow.
If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default.
Steps:
1. In the Workflow Designer workspace, double-click the link you want to specify.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables,
user-defined workflow variables, variable functions, and Boolean and arithmetic operators.
4. Validate the expression using the Validate button.
Link conditions
Decision task
Assignment task
SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow.
The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change
schedule settings, the Integration Service reschedules the workflow according to the new settings.
The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.
If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all
workflows.
If we delete a folder, the Integration Service removes workflows from the schedule.
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of
scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the
schedule.
There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of
the workflow as soon as it finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of
the workflow according to settings in Schedule Options.
Schedule options for Run on Server initialization:
Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog
box.
End After: The Integration Service stops scheduling the workflow after the set number of workflow runs.
Forever: The Integration Service schedules the workflow as long as the workflow does not fail.
To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose
Unscheduled Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose
Schedule Workflow.
Task          Tool                Reusable
Session       Task Developer      Yes
Email         Workflow Designer   Yes
Command       Worklet Designer    Yes
Event-Raise   Workflow Designer   No
Event-Wait    Worklet Designer    No
Timer         -                   No
Decision      -                   No
Assignment    -                   No
Control       -                   No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data from sources to
targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently,
depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the transformations and options
used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.
It is usually created by the Administrator, and we simply drag it into our workflow and use it.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
3. Edit Session task and go to Components tab.
4. See On Success Email Option there and configure it.
5. In Type select reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save
We can also drag the email task and use as per need.
We can set the option to send email on success or failure in components tab of a session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands (on UNIX) or DOS commands (on Windows) to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a
Session task. This is done in COMPONENTS TAB of a session. We can run it in Pre-Session Command or Post Session
Success Command or Post Session Failure Command. Select the Value and Type option as we did in Email task.
Example: To copy a file sample.txt from the D drive to the E drive, the Windows command is:
COPY D:\sample.txt E:\
Steps for creating command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click done.
4. Double-click the Command task. Go to commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to create the workflow using command task:
1. Create a task using the above steps to copy a file in Task Developer.
2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.
3. Start is displayed. Drag session say s_m_Filter_example and command task.
4. Link Start to Session task and Session to Command Task.
5. Double-click the link between Session and Command and give the condition in the editor as $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
6. Workflow -> Validate
7. Repository -> Save
Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as needed.
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a user-defined event.
EVENT WAIT: The Event-Wait task waits for a file-watch event or user-defined event to occur before executing the next session in the workflow.
Example1: Use an event wait task and make sure that session s_filter_example runs when abc.txt file is present in
D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait task.
5. Right click on event wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and file name to watch. Example:
D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user defined event when session s_m_filter_example succeeds. Capture this event in event wait task
and run session S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give the name ER_Example. Click Create and then Done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and Select EVENT1
from the list displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give the condition
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT-> EVENTS tab.
11. Select User Defined there. Select the Event1 by clicking Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Workflow -> Validate
15. Repository -> Save.
16. Run workflow and see.
TIMER TASK
Absolute time: We specify the exact date and time, or choose a user-defined workflow variable that specifies the exact time. The next task in the workflow runs at the date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task starts.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link
condition.
The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the
decision condition.
The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to
True (1) or False (0).
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input
link condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
Control Option      Description
Fail Me             Marks the Control task as failed.
Fail Parent         Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent         Stops the workflow or worklet that contains the Control task.
Abort Parent        Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF   Fails the workflow that is running.
ASSIGNMENT TASK
To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
SCD Type 1
Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a time-based, regular schedule.
For example, you may have a dimension in your database that tracks the sales records of your company's salespeople.
Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another. How do
you record such a change in your sales dimension?
You could sum or average the sales by salesperson, but if you use that to compare the performance of salespeople, it might give misleading information. If the salesperson who was transferred used to work in a hot market where sales were easy, and now works in a market where sales are infrequent, her totals will look much stronger than those of the other salespeople in her new region, even if they are just as good. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that also creates problems.
Dealing with these issues involves SCD management methodologies:
Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most
appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to
know how it used to be misspelled in the past.)
Here is an example of a database table that keeps supplier information:
Supplier_Key  Supplier_Code  Supplier_Name    Supplier_State
123           ABC            Acme Supply Co   CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the surrogate key is not
necessary, since the table will be unique by the natural key (Supplier_Code). However, the joins will perform better on an
integer than on a character string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this record:
Supplier_Key  Supplier_Code  Supplier_Name    Supplier_State
123           ABC            Acme Supply Co   IL
The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse.
You can't tell if your suppliers are tending to move to the Midwest, for example. But an advantage to Type 1 SCDs is that
they are very easy to maintain.
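The Type 1 overwrite can be sketched as a simple upsert keyed on the natural key. This is a minimal Python illustration of the concept only; the dimension dict and the helper function are not part of any Informatica API.

```python
# Dimension keyed by surrogate key; Supplier_Code is the natural key.
dimension = {
    123: {"Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co",
          "Supplier_State": "CA"},
}

def scd_type1_apply(dim, incoming):
    """Overwrite the existing row in place; no history is kept."""
    for skey, row in dim.items():
        if row["Supplier_Code"] == incoming["Supplier_Code"]:
            row.update(incoming)   # old attribute values are lost
            return skey
    # New natural key: assign the next surrogate key and insert.
    new_key = max(dim) + 1 if dim else 1
    dim[new_key] = dict(incoming)
    return new_key

# The supplier moves to Illinois: the record is simply overwritten.
scd_type1_apply(dimension, {"Supplier_Code": "ABC",
                            "Supplier_Name": "Acme Supply Co",
                            "Supplier_State": "IL"})
```

After the call the row holds IL and the CA value is gone, which is exactly the "no history" trade-off described above.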
Explanation with an Example:
Source Table (01-01-11):
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table (01-01-11):
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000
The necessity of the lookup transformation is illustrated using the above source and target table.
Source Table (01-02-11):
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table (01-02-11):
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000
In the second month we have one more employee added to the table, with the Ename D, and the salary of employee 102 is changed to 2500 instead of 2000.
Create a table named emp_source with three columns, as shown above, in Oracle.
In the same way as above, create two target tables with the names emp_target1 and emp_target2.
Go to the Targets menu and click Generate and Execute to confirm the creation of the target tables.
The snapshot of the connections using the different kinds of transformations is shown below.
In this mapping we use four kinds of transformations, namely: Lookup transformation, Expression transformation, Filter transformation, and Update Strategy transformation. The necessity and usage of each transformation is discussed in detail below.
Lookup Transformation: The purpose of this transformation is to determine whether to insert, delete, update or reject the rows into the target table.
The first thing we do is create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation.
What the Lookup transformation does in our mapping is look into the target table (emp_table) and compare it with the Source Qualifier to determine whether to insert, update, delete or reject rows.
In the Ports tab we should add a new column and name it empno1; this is the column we connect from the Source Qualifier.
The Input port for the first column should be unchecked, whereas the other ports like Output and Lookup should be checked. For the newly created column, only the Input and Output boxes should be checked.
The Lookup Table Column should be Empno, the Transformation Port should be Empno1, and the Operator should be =.
Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to check whether we need to insert the records or update them. The steps to create an Expression transformation are shown below.
Drag all the columns from both the source and the Lookup transformation and drop them onto the Expression transformation.
Now double-click the transformation, go to the Ports tab, and create two new columns named insert and update. Both these columns are output data, so we need a check mark only in the Output check box.
The snapshot of the Edit Transformation window is shown below.
The conditions we want to pass through to our output ports are listed below:
Insert: ISNULL(EMPNO1)
Update: IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL,SAL1,1,0)=0, 1, 0)
Filter Transformation: We have two Filter transformations, one for inserts and the other for updates.
Connect the insert column from the Expression transformation to the insert column in the first Filter transformation, and in the same way connect the update column in the Expression transformation to the update column in the second Filter.
Then connect the Empno, Ename and Sal ports from the Expression transformation to both Filter transformations.
If a row is new, Filter transformation 1 forwards it to Update Strategy transformation 1, and the row is inserted into the target table.
If a row has changed, Filter transformation 2 forwards it to Update Strategy transformation 2, which forwards the updated row to the target table.
Update Strategy Transformation: Determines whether to insert, delete, update or reject the rows.
Drag the respective Empno, Ename and Sal ports from the Filter transformations and drop them on the respective Update Strategy transformations.
In the Properties tab, set the value of the update strategy expression to 0 (DD_INSERT) on the 1st Update Strategy transformation.
In the Properties tab, set the value of the update strategy expression to 1 (DD_UPDATE) on the 2nd Update Strategy transformation.
Finally, connect the outputs of the Update Strategy transformations to the target tables.
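The insert/update decision computed in the Expression transformation above can be sketched as follows. This is a Python rendering of the two flag conditions only; `lookup_empno` and `lookup_sal` play the roles of EMPNO1 and SAL1 returned by the Lookup, and `None` stands for a lookup miss.

```python
def scd1_flags(src_sal, lookup_empno, lookup_sal):
    """Mirror the two Expression output ports: insert fires when the lookup
    finds no matching Empno; update fires when the key exists but SAL differs."""
    insert = 1 if lookup_empno is None else 0                      # ISNULL(EMPNO1)
    update = 1 if (lookup_empno is not None and src_sal != lookup_sal) else 0
    return insert, update

new_emp = scd1_flags(4000, None, None)    # Empno 104: not in target yet
changed = scd1_flags(2500, 102, 2000)     # Empno 102: SAL 2000 -> 2500
unchanged = scd1_flags(1000, 101, 1000)   # Empno 101: no change
```

The first flag routes the row to the insert filter, the second to the update filter; a row with neither flag set is dropped by both filters.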
Type 2
Let us drive the point home using a simple scenario. For example, in the current month, i.e. 01-01-2010, we are provided with a source table with three columns and three rows in it (Empno, Ename, Sal). A new employee is added and one record changes in the month 01-02-2010. We use the SCD Type 2 method to extract and load the records into the target table.
The thing to notice here is that if there is any update in the salary of an employee, the history of that employee is kept, with the current date as the start date and the previous date as the end date.
Source Table (01-01-2010):
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table (01-01-2010):
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-01-10  Null
200   102    B      2000  01-01-10  Null
300   103    C      3000  01-01-10  Null

Source Table (01-02-2010):
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table (01-02-2010):
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-02-10  Null
200   102    B      2000  01-02-10  Null
300   103    C      3000  01-02-10  Null
201   102    B      2500  01-02-10  01-01-10
400   104    D      4000  01-02-10  Null
In the second month we have one more employee added to the table, with the Ename D, and the salary of employee 102 is changed to 2500 instead of 2000.
Step 1: Import the Source and Target tables.
Create a table named emp_source with three columns, as shown above, in Oracle.
Drag the target table twice onto the Mapping Designer to facilitate the insert and update processes.
Go to the Targets menu and click Generate and Execute to confirm the creation of the target tables.
The snapshot of the connections using the different kinds of transformations is shown below.
In the target table we add five columns (Skey, Version, Flag, S_date, E_Date).
In this mapping we use four kinds of transformations, namely: Lookup transformation (1), Expression transformation (3), Filter transformation (2), and Sequence Generator. The necessity and usage of each transformation is discussed in detail below.
Lookup Transformation: The purpose of this transformation is to look up the target table and compare it with the source using the lookup condition.
The first thing we do is create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation.
Drag the Empno column from the Source Qualifier to the Lookup transformation.
The Lookup Table Column should be Empno, the Transformation Port should be Empno1, and the Operator should be =.
Expression Transformation: After we are done with the Lookup transformation, we use an Expression transformation to find whether the data in the source table matches the target table. We specify the condition here whether to insert or to update the table. The steps to create an Expression transformation are shown below.
Drag all the columns from both the source and the Lookup transformation and drop them onto the Expression transformation.
Now double-click the transformation, go to the Ports tab, and create two new columns named insert and update. Both these columns are output data, so the Input check box must be unchecked for them.
The snapshot of the Edit Transformation window is shown below.
The conditions we want to pass through to our output ports are listed below:
Insert: ISNULL(EMPNO1)
Update: IIF(NOT ISNULL(Skey) AND DECODE(SAL,SAL1,1,0)=0, 1, 0)
Filter transformation: We need two Filter transformations; the first filters out the records we are going to insert, and the second does the reverse.
If there is no change in the input data, filter transformation 1 forwards the complete input to Exp 1, and the same output appears in the target table.
If there is any change in the input data, filter transformation 2 forwards the complete input to Exp 2, which then forwards the updated input to the target table.
A closer view of the connections from the Expression to the filters is shown below.
Sequence Generator: We use this to generate an incremental cycle of sequential numbers. Its purpose in this mapping is to increment the Skey in steps of 100 (a bandwidth of 100).
Expression transformations:
Exp 1: Updates the target table with the Skey values. Note that the Skey gets multiplied by 100 and a new row is generated whenever a new employee is added to the list; otherwise no modification is made to the target table.
Add a new column named N_skey; its expression is Nextval1*100.
Make S_date an output port; its expression is sysdate.
Exp 2: If the same employee is found with any update in his records, the Skey is incremented by 1 and the version changes to the next higher number.
Add a new column named N_skey; its expression is Skey+1.
Exp 3: If any record in the source table gets updated, we make it output-only.
Update Strategy: This is the place where the update instruction is set on the target table.
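The routing logic described above (look up the target on Empno, compare salaries, then send each row down the insert or update path) can be sketched outside Informatica. This is a minimal illustrative Python sketch, not PowerCenter code; the dictionaries standing in for the Lookup cache and the route names are assumptions for illustration only.

```python
def scd2_route(source_row, target_lookup):
    """Decide whether a source row is an insert, an update, or unchanged,
    mirroring the Insert/Update conditions in the Expression transformation."""
    existing = target_lookup.get(source_row["Empno"])
    if existing is None:
        # IsNull(EMPNO1): employee not found in the target -> insert
        return "insert"
    if existing["Sal"] != source_row["Sal"]:
        # Not IsNull(Skey) and Decode(SAL, SAL1, 1, 0) = 0 -> salary changed
        return "update"
    return "no_change"

# Month-1 target rows (as the Lookup would return them) and month-2 source rows
target = {101: {"Skey": 100, "Sal": 1000},
          102: {"Skey": 200, "Sal": 2000},
          103: {"Skey": 300, "Sal": 3000}}
source = [{"Empno": 101, "Sal": 1000}, {"Empno": 102, "Sal": 2500},
          {"Empno": 103, "Sal": 3000}, {"Empno": 104, "Sal": 4000}]
routes = {row["Empno"]: scd2_route(row, target) for row in source}
```

In this sketch, 102 is routed to the update filter (new Skey = 200 + 1 = 201) and 104 to the insert filter (new Skey from the Sequence Generator, NEXTVAL * 100 = 400), matching the target table shown earlier.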
SCD Type 3
This method has limited history preservation, and we use Skey as the primary key here.
Source table: (01-01-2011)

Empno | Ename | Sal
101   |       | 1000
102   |       | 2000
103   |       | 3000

Target table:

Empno | Ename | C-sal | P-sal
101   |       | 1000  | Null
102   |       | 2000  | Null
103   |       | 3000  | Null
Source table (second load):

Empno | Ename | Sal
101   |       | 1000
102   |       | 4566
103   |       | 3000

Target table:

Empno | Ename | C-sal | P-sal
101   |       | 1000  |
102   |       | 4566  | Null
103   |       | 3000  |
102   |       | 4544  | 4566
Step 2: Here we see the purpose and usage of all the transformations used in the above mapping.
Lookup transformation: The Lookup transformation looks up the target table and compares it with the source table. Based on the lookup condition, it decides whether the data needs to be updated, inserted, or deleted before being loaded into the target table.
As usual, we connect the Empno column from the Source Qualifier to the Lookup transformation. Before this, the Lookup transformation has to look at the target table.
Next, we specify the lookup condition empno = empno1.
Finally, specify the connection information (Oracle) and the lookup policy on multiple mismatches (Use Last Value) on the Properties tab.
Expression transformation:
We use the Expression transformation to logically separate the insert and update flows.
Drag all the ports from the Source Qualifier and the Lookup into the Expression.
Create two ports, Insert and Update; both are output-only ports. Specify the conditions below in the Expression Editor for the respective ports:
Insert: IsNull(ENO1)
Update: iif(not IsNull(ENO1) and Decode(SAL,Curr_Sal,1,0)=0, 1, 0)
Filter transformation: We use two Filter transformations to physically separate the data into two sections, one for the insert process and the other for the update process.
Filter 1:
Drag the Insert port and the three other ports that came from the Source Qualifier through the Expression into the first filter.
Filter 2:
Drag the Update port and the four other ports that came from the Lookup through the Expression into the second filter.
Update Strategy: Finally, we need Update Strategy transformations to insert into or update the target table.
Update Strategy 1: This is intended to insert into the target table. Drag all the ports except Insert from the first filter into this.
Update Strategy 2: This is intended to update the target table. Drag all the ports except Update from the second filter into this.
Finally, connect both Update Strategy transformations to the two instances of the target.
Step 3: Create a session for this mapping and run the workflow.
Step 4: Observe the output; it should be the same as the second target table.
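The SCD Type 3 behavior above (keep only the current and previous salary per employee, shifting C-sal into P-sal on a change) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not PowerCenter code; the dictionary layout and column names simply mirror the tables above.

```python
def scd3_apply(source_row, target):
    """Apply one source row to an SCD Type 3 target with limited history.

    target maps Empno -> {"C_sal": current, "P_sal": previous}. On a salary
    change, the current salary moves to P_sal and the new one becomes C_sal.
    """
    empno, sal = source_row["Empno"], source_row["Sal"]
    row = target.get(empno)
    if row is None:
        # isnull(ENO1): new employee -> insert with no previous salary
        target[empno] = {"C_sal": sal, "P_sal": None}
    elif row["C_sal"] != sal:
        # decode(SAL, Curr_Sal, 1, 0) = 0: salary changed -> shift history
        row["P_sal"] = row["C_sal"]
        row["C_sal"] = sal

# Target after the initial load, then the second-load source rows applied
target = {101: {"C_sal": 1000, "P_sal": None},
          102: {"C_sal": 2000, "P_sal": None},
          103: {"C_sal": 3000, "P_sal": None}}
for row in [{"Empno": 101, "Sal": 1000},
            {"Empno": 102, "Sal": 4566},
            {"Empno": 103, "Sal": 3000}]:
    scd3_apply(row, target)
```

After the second load, employee 102 carries C_sal 4566 with P_sal 2000, while unchanged employees keep P_sal as Null; note that only one level of history survives, which is the Type 3 limitation mentioned above.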
Incremental Aggregation:
When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the
source changes incrementally and you can capture changes, you can configure the session to process those changes. This
allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and
recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can capture those incremental
changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You
then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This
allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again,
you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly. Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new source data each time
you run the session. Use a Stored Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use incremental aggregation when the changes do
not significantly change the target. If processing the incrementally changed source alters more than half the existing
target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the
target with complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service
uses system memory to process these functions in addition to the cache memory you configure in the session properties. As
a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk
caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end
of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file.
The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the
session. For each input record, the Integration Service checks historical information in the index file for a corresponding
group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the
aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration
Service creates a new group and saves the record data.
(iii)When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate
data in the index and data files to be used as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future
incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the
incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for
two sets of the files.
(v)When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for
each partition.
The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following
tasks:
Move the aggregate files without correcting the configured path or directory for the files in the session properties.
Change the configured path or directory for the aggregate files without moving the files to the new location.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
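Steps (i) through (iii) above amount to a persisted group cache: look up each input group in the historical data, fold in the change if found, or create the group if not. The following is a toy Python model of that idea, assuming a simple count/sum aggregation; a plain dictionary stands in for PowerCenter's index and data files.

```python
def incremental_aggregate(cache, new_rows):
    """Incrementally update running aggregates (a toy model of steps i-iii).

    cache maps a group key to its running (count, total), standing in for
    the index and data files; new_rows are only the changed source rows.
    """
    for group, amount in new_rows:
        count, total = cache.get(group, (0, 0))     # check historical group
        cache[group] = (count + 1, total + amount)  # aggregate incrementally
    return cache

cache = {}  # first run: the entire source is processed and cached
incremental_aggregate(cache, [("east", 10), ("west", 5), ("east", 20)])
# subsequent run: only the incremental source changes pass through
incremental_aggregate(cache, [("east", 7), ("north", 3)])
```

Existing groups ("east") are updated in place and a new group ("north") is created and saved, rather than recomputing every group from the full source on each run.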
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session properties:
Configure the session for incremental aggregation and verify that the file directory has enough disk space for the
aggregate files.
The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to
store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate
cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service
rebuilds incremental aggregation files, it loses aggregate history.
You can configure the session for incremental aggregation in the Performance settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the
Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a
reminder to clear this option after running the session.
Mapping Templates
A mapping template is a drawing in Visio that represents a PowerCenter mapping. You can configure rules and parameters
in a mapping template to specify the transformation logic.
Use the Informatica Stencil and the Informatica toolbar in the Mapping Architect for Visio to create a mapping template.
The Informatica Stencil contains shapes that represent mapping objects that you can use to create a mapping template. The Informatica toolbar contains buttons for the tasks you can perform on a mapping template.
You can create a mapping template manually, or you can create a mapping template by importing a Power Center mapping.
Creating a Mapping Template Manually:
You can use the Informatica Stencil and the Informatica toolbar to create a mapping template. Save and publish a mapping
template to create the mapping template files.
To create a mapping template manually, complete the following steps:
1. Start Mapping Architect for Visio.
2. Verify that the Informatica Stencil and Informatica toolbar are available.
3. Drag the mapping objects from the Informatica Stencil to the drawing window:- Use the mapping objects to create
visual representation of the mapping.
4. Create links:- Create links to connect mapping objects.
5. Configure link rules:- Configure rules for each link in the mapping template to indicate how data moves from one
mapping object to another. Use parameters to make the rules flexible.
6. Configure the mapping objects:- Add a group or expression required by the transformations in the mapping template.
To create multiple mappings, set a parameter for the source or target definition.
7. Declare mapping parameters and variables to use when you run sessions in Power Center:- After you import the
mappings created from the mapping template into Power Center, you can use the mapping parameters and variables in the
session or workflow.
8. Validate the mapping template.
9. Save the mapping template:- Save changes to the mapping template drawing file.
10. Publish the mapping template:- When you publish the mapping template, Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param.xml). If you edit the mapping template drawing file after you publish it, you need to publish it again. Do not edit the mapping template XML file.
Importing a Mapping Template from a Power Center Mapping:
If you have a Power Center mapping that you want to use as a basis for a mapping template, export the mapping to a
mapping XML file and then use the mapping XML file to create a mapping template.
Note: Export the mapping XML file within the current Power Center release. Informatica does not support imported objects
from a different release.
To import a mapping template from a Power Center mapping, complete the following steps:
1. Export a Power Center mapping. In the Designer, select the mapping that you want to base the mapping template on
and export it to an XML file.
2. Start Mapping Architect for Visio.
3. Verify that the Informatica stencil and Informatica toolbar are available.
4. Import the mapping. On the Informatica toolbar, click the Create Template from Mapping XML button. Mapping
Architect for Visio determines the mapping objects and links included in the mapping and adds the appropriate objects to
the drawing window.
5. Verify links. Create or verify links that connect mapping objects.
6. Configure link rules. Configure rules for each link in the mapping template to indicate how data moves from one
mapping object to another. Use parameters to make the rules flexible.
7. Configure the mapping objects. Add a group or expression required by the transformations in the mapping template. To
create multiple mappings, set a parameter for the source or target definition.
8. Declare mapping parameters and variables to use when you run the session in Power Center. After you import the
mappings created from the mapping template into Power Center, you can use the mapping parameters and variables in the
session or workflow.
Note: If the Power Center mapping contains mapping parameters and variables, it is possible that the mapping parameters
and variables ($$ParameterName) may not work for all mappings you plan to create from the mapping template. Modify or
declare new mapping parameters and variables appropriate for running the new mappings created from the mapping
template.
9. Validate the mapping template.
10. Save the mapping template. Save changes to the mapping template drawing file.
11. Publish the mapping template. When you publish the mapping template, Mapping Architect for Visio generates a
mapping template XML file and a mapping template parameter file (param.xml).
If you make any change to the mapping template after publishing, you need to publish the mapping template again. Do not
edit the mapping template XML file.
Note: Mapping Architect for Visio fails to create a mapping template if you import a mapping that includes an unsupported
source type, target type, or mapping object.
Grid Processing
When a Power Center domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When
you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase
performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to
multiple DTM processes on nodes in the grid to increase performance and scalability.
You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you
configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the
session to run on the grid.
The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session
to run:
Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also
distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.
Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.
Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the
Session on Grid option.
Running Workflows on a Grid:
When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command,
and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration
Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The
Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start
scheduled workflows.
The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.
For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource
requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs
the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to
nodes on the grid based on resource availability and node availability.
Running Sessions on a Grid:
When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command,
and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service
process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the
Load Balancer distributes session threads to DTM processes running on different nodes.
When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:
Node availability :- The Load Balancer verifies which nodes are currently running, enabled, and available for task
dispatch.
Resource availability :- If the Integration Service is configured to check resources, it identifies nodes that have
resources required by mapping objects in the session.
Partitioning configuration. The Load Balancer dispatches groups of session threads to separate nodes based on
the partitioning configuration.
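The first two dispatch factors above can be sketched as a simple eligibility check: a node must be available, and if the task names required resources, the node must provide them. This is a toy Python model for illustration only; the node names, resource names, and first-fit choice are assumptions, not the Load Balancer's actual dispatch rules.

```python
def dispatch(task_resources, nodes):
    """Pick a node for a task: only available nodes are considered, and the
    node must provide every resource the task requires (first eligible wins)."""
    for name, info in nodes.items():
        if not info["available"]:
            continue                          # node is down or disabled
        if task_resources <= info["resources"]:
            return name                       # node has all required resources
    return None  # no eligible node; the task would wait in the dispatch queue

# A grid like the example above: node4 is unavailable
nodes = {
    "node1": {"available": True,  "resources": {"oracle"}},
    "node2": {"available": True,  "resources": {"oracle", "sap"}},
    "node4": {"available": False, "resources": {"sap"}},
}
session_node = dispatch({"sap"}, nodes)  # Session task with a resource requirement
command_node = dispatch(set(), nodes)    # Command task with no requirement
```

The Session task skips node1 (missing the required resource) and node4 (unavailable) and lands on node2, while the unconstrained Command task can go to the first available node.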
You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.
Grid Connectivity and Recovery
When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network
failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or
you may disable the Integration Service or service processes while a workflow or session is running. The Integration
Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or
loses connectivity. Recovery behavior also depends on the following factors:
High availability option:-When you have high availability, workflows fail over to another node if the node or
service shuts down. If you do not have high availability, you can manually restart a workflow on another node to
recover it.
Recovery strategy:- You can configure a workflow to suspend on error. You configure a recovery strategy for
tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you
configure for each task in the workflow.
Shutdown mode:- When you disable an Integration Service or service process, you can specify that the service
completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration
Service or you disable a service process. Behavior also differs when you disable a master service process or a
worker service process. The Integration Service or service process may also shut down unexpectedly. In this case,
the failover and recovery behavior depend on which service process shuts down and the configured recovery
strategy.
Running mode:-If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another
node. If a session runs on a grid, you cannot configure a resume recovery strategy.
Operating mode:- If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.
Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.
Workflow Variables
You can create and use variables in a workflow to reference values and record information. For example, use a Variable in a
Decision task to determine whether the previous task ran properly. If it did, you can run the next task.
If not, you can stop the workflow. Use the following types of workflow variables:
Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within
a workflow.
User-defined workflow variables. You create user-defined workflow variables when you create a workflow. Use
workflow variables when you configure the following types of tasks:
Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For Example,
you can increment a user-defined counter variable by setting the variable to its current value plus 1.
Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status
variable to run a second session only if the first session completes successfully.
Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For
example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and
another link to follow when the decision condition evaluates to false.
Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a
user-defined date/time variable to specify the time the Integration Service starts to run the next task.
Use the following keywords to write expressions for user-defined and predefined workflow variables:
AND
OR
NOT
TRUE
FALSE
NULL
SYSDATE
Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the
workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when
running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression
Editor.
Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder
name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in
variables under the Built-in node in the Expression Editor.
Task-Specific Variable | Task Types | Data type
Condition              | Decision   | Integer
  Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>
EndTime                |            | Date/Time
ErrorCode              |            | Integer
ErrorMsg               |            | Nstring
PrevTaskStatus         |            | Integer
SrcFailedRows          | Session    | Integer
StartTime              |            | Date/Time
Status                 |            | Integer
TgtFailedRows          | Session    | Integer
  Sample syntax: $s_dist_loc.TgtFailedRows = 0
TgtSuccessRows (total number of rows successfully written to the target) | Session | Integer
  Sample syntax: $s_dist_loc.TgtSuccessRows > 0
Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.
To configure user-defined workflow variables, complete the following steps:
1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.
2. Add a Start task and both sessions to the workflow.
3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this.
4. Create an Assignment task to increment the $$WorkflowCount variable by one.
5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to
true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables
using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the
database at headquarters runs every 10th time the workflow runs.
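The decision condition in step 3 boils down to MOD($$WorkflowCount, 10) = 0. A minimal Python sketch of the run-selection logic follows; the session names are hypothetical stand-ins for the two sessions described above.

```python
def next_run(workflow_count):
    """Decide which sessions run, given the persistent $$WorkflowCount.

    The local-orders session runs every time; the headquarters session
    runs only when the counter is evenly divisible by 10.
    """
    sessions = ["s_update_local_orders"]        # runs on every workflow run
    if workflow_count % 10 == 0:                # MOD($$WorkflowCount, 10) = 0
        sessions.append("s_update_hq_orders")   # every 10th run only
    return sessions
```

For example, next_run(7) runs only the local session, while next_run(10) also runs the headquarters session; in the workflow itself, the Assignment task then increments $$WorkflowCount for the next run.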
Creating User-Defined Workflow Variables :
You can create workflow variables for a workflow in the workflow properties.
To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add.
4. Enter the following fields and click OK: Name, Data type, Persistent, Default Value, Is Null, and Description.
5. To validate the default value of the new workflow variable, click the Validate button.
6. Click Apply to save the new workflow variable.
7. Click OK.
Interview Zone
Hi readers. These are the questions I would normally expect an interviewee to know when I sit on a panel. I request my readers to start posting your answers to these questions in the discussion forum under the Informatica technical interview guidance tag; I will review them, keep only the valid answers, and delete the rest.
1. Explain your Project?
2. What are your Daily routines?
3. How many mappings have you created altogether in your project?
4. In which account does your Project Fall?
5. What is your Reporting Hierarchy?
6. How many complex mappings have you created? Could you please explain the situation for which you developed that complex mapping?
7. What is your Involvement in Performance tuning of your Project?
8. What is the Schema of your Project? And why did you opt for that particular schema?
9. What are your Roles in this project?
10. Can I have one situation which you have adopted by which performance has improved dramatically?
11. Were you involved in more than two projects simultaneously?
12. Do you have any experience in the Production support?
13. What kinds of testing have you done on your project (Unit, Integration, System, or UAT)? And what enhancements were done after testing?
14. How many Dimension Table are there in your Project and how are they linked to the fact table?
15. How do we do the Fact Load?
16. How did you implement CDC in your project?
17. How does your Mapping in File to Load look like?
18. How does your Mapping in Load to Stage look like?
19. How does your Mapping in Stage to ODS look like?
20. What is the size of your Data warehouse?
21. What is your Daily feed size and weekly feed size?
22. Which Approach (Top down or Bottom Up) was used in building your project?
23. How do you access your sources (are they Flat files or Relational)?
24. Have you developed any Stored Procedure or triggers in this project? How did you use them and in which situation?
25. Did your Project go live? What are the issues that you have faced while moving your project from the Test
Environment to the Production Environment?
26. What is the biggest Challenge that you encountered in this project?
27. What is the scheduler tool you have used in this project? How did you schedule jobs using it?
Informatica Experienced Interview Questions part 1
1. Difference between Informatica 7x and 8x?
2. Difference between connected and unconnected lookup transformation in Informatica?
3. Difference between stop and abort in Informatica?
4. Difference between Static and Dynamic caches?
5. What is Persistent Lookup cache? What is its significance?
6. Difference between and reusable transformation and mapplet?
7. How does the Informatica server sort string values in a Rank transformation?
8. Is Sorter an active or passive transformation? When do we consider it active, and when passive?
9. Explain about Informatica server Architecture?
10. In an update strategy, which gives more performance: a relational table or a flat file? Why?
11. What are the output files that the Informatica server creates while running a session?
12. Can you explain what error tables in Informatica are and how we do error handling in Informatica?
13. Difference between constraint-based loading and target load plan?
14. Difference between IIF and DECODE function?
15. How to import oracle sequence into Informatica?
16. What is parameter file?
17. Difference between Normal load and Bulk load?
18. How will you create a header and footer in the target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view them?
21. What is difference between partitioning of relational target and file targets?
22. What are mapping parameters and variables? In which situations can we use them?
23. What do you mean by direct loading and Indirect loading in session properties?
24. How do we implement recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica?
26. What is Data driven?
27. What is batch? Explain the types of the batches?
28. What are the types of meta data repository stores?
29. Can you use the mapping parameters or variables created in one mapping into another mapping?
30. Why did we use stored procedure in our ETL Application?
31. When we can join tables at the Source qualifier itself, why do we go for joiner transformation?
32. What is the default join operation performed by the look up transformation?
33. What is hash table Informatica?
34. In a joiner transformation, you should specify the table with lesser rows as the master table. Why?
35. Difference between Cached lookup and Un-cached lookup?
36. Explain what the DTM does when you start a workflow?
37. Explain what the Load Manager does when you start a workflow?
38. In a sequential batch, how do I stop one particular session from running?
39. What are the types of the aggregations available in Informatica?
40. How do I create Indexes after the load process is done?
41. How do we improve the performance of the aggregator transformation?
42. What are the different types of the caches available in Informatica? Explain in detail?
43. What is polling?
44. What are the limitations of the joiner transformation?
45. What is Mapplet?
46. What are active and passive transformations?
47. What are the options in the target session for the Update Strategy transformation?
48. What is a code page? Explain the types of code pages.
49. What do you mean by rank cache?
50. How can you delete duplicate rows without using a Dynamic Lookup? Tell me any other ways of using a lookup to delete duplicate rows.
51. Can you copy the session into a different folder or repository?
52. What is tracing level and what are its types?
53. What is the command used to run a batch?
54. What are the unsupported repository objects for a mapplet?
55. If your workflow is running slow, what is your approach towards performance tuning?
56. What are the types of mapping wizards available in Informatica?
57. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can we map these three ports directly to the target?
58. Why we use stored procedure transformation?
59. Which object is required by the debugger to create a valid debug session?
60. Can we use an active transformation after update strategy transformation?
61. Explain how we set the update strategy transformation at the mapping level and at the session level?
62. What is the exact use of the 'Online' and 'Offline' server connect options while defining a workflow in the Workflow Monitor? The system hangs with the 'Online' server connect option; Informatica is installed on a personal laptop.
63. What is change data capture?
64. Write a session parameter file that will change the sources and targets for every session, i.e., different sources and targets for each session run.
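For the parameter-file question above, a minimal sketch of a PowerCenter session parameter file; the folder, workflow, session, connection, and file names are all hypothetical:

```
[MyFolder.WF:wf_daily_load.ST:s_m_load_orders]
$DBConnection_Source=CONN_ORACLE_SRC
$DBConnection_Target=CONN_ORACLE_TGT
$InputFile_Orders=/data/incoming/orders_today.dat
```

Each session reads its own `[folder.WF:workflow.ST:session]` section, so adding one section per session (or regenerating the file between runs) lets every session run pick up different sources and targets.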
65. What are partition points?
66. What are the different threads in DTM process?
67. Can we do ranking on two ports? If yes, explain how.
68. What is Transformation?
69. What does the Stored Procedure transformation do that is special compared to other transformations?
70. How do you recognize whether the newly added rows got inserted or updated?
71. What is data cleansing?
72. My flat file's size is 400 MB and I want to see the data inside the flat file without opening it. How do I do that?
73. Difference between Filter and Router?
74. How do you handle the decimal places when you are importing the flat file?
75. What is the difference between $ & $$ in mapping or parameter file? In which case they are generally used?
76. While importing a relational source definition from the database, what metadata of the source do you import?
77. Difference between PowerMart & PowerCenter?
78. What kinds of sources and targets can be used in Informatica?
79. If a Sequence Generator (with an increment of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port, what value will each target get?
80. What do you mean by SQL override?
81. What is a shortcut in Informatica?
82. How does Informatica do variable initialization? Number/String/Date
83. How many different locks are available for repository objects?
84. What are the transformations that use cache for performance?
85. What is the use of Forward/Reject rows in Mapping?
86. In how many ways can you filter records?
87. How do you delete duplicate records from a source database/flat files? Can we use post-SQL to delete these records? In the case of a flat file, how can you delete duplicates before loading starts?
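For the flat-file part of the question above, one common approach is a small pre-load script that drops duplicate rows before the session reads the file. This is a minimal sketch, not Informatica-specific; the key column index and the row layout are assumptions you would adapt to your file:

```python
# Sketch: remove duplicate rows from flat-file data before loading,
# keeping the first occurrence of each key. In practice the rows would
# come from csv.reader over the source file; here they are inlined.

def dedupe_rows(rows, key_index=0):
    """Yield rows whose key column value has not been seen before."""
    seen = set()
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row

if __name__ == "__main__":
    sample = [
        ["101", "alice"],
        ["102", "bob"],
        ["101", "alice"],  # duplicate key, dropped
    ]
    print(list(dedupe_rows(sample)))
```

Because the first occurrence wins, the order of the surviving rows matches the source file, which keeps the load deterministic.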
88. You are required to perform bulk loading into Oracle using Informatica; what actions would you perform at the Informatica and Oracle levels for a successful load?
89. What precautions do you need to take when you use a reusable Sequence Generator transformation for concurrent sessions?
90. Is a negative increment possible in the Sequence Generator? If yes, how would you accomplish it?
91. In which directory does Informatica look for the parameter file, and what happens if it is missing when you start the session? Does the session stop after it starts?
92. Informatica is complaining that the server could not be reached. What steps would you take?
93. You have more than five mappings that use the same lookup. How can you manage the lookup?
94. What will happen if you copy the mapping from one repository to another repository and there is no identical source?
95. How can you limit number of running sessions in a workflow?
96. An Aggregator transformation has 4 ports (sum(col1), group by col2, col3); which port should be the output?
97. What is a dynamic lookup and what is the significance of NewLookupRow? How will you use them for rejecting duplicate records?
98. If you have more than one pipeline in your mapping, how will you change the order of load?
99. When you export a workflow from Repository Manager, what does this xml contain? Workflow only?
100. Your session failed, and when you try to open a log file it complains that the session details are not available. How would you trace the error? What log file would you look for?
101. You want to attach a file as an email attachment from a particular directory using the Email task in Informatica. How will you do it?
102. You have a requirement to be alerted of any long-running sessions in your workflow. How can you create a workflow that will send you an email for sessions running more than 30 minutes? You can use any method: a shell script, a procedure, or an Informatica mapping or workflow control.
Data warehousing Concepts Based Interview Questions
1. What is a data-warehouse?
2. What are Data Marts?
3. What is ER Diagram?
4. What is a Star Schema?
5. What is Dimensional Modelling?
6. What is a Snowflake Schema?
7. What are the Different methods of loading Dimension tables?
8. What are Aggregate tables?
9. What is the Difference between OLTP and OLAP?
10. What is ETL?
11. What are the various ETL tools in the Market?
12. What are the various Reporting tools in the Market?
13. What is Fact table?
14. What is a dimension table?
15. What is a lookup table?
16. What is a general purpose scheduling tool? Name some of them?
17. What are modeling tools available in the Market? Name some of them?
18. What is real time data-warehousing?
19. What is data mining?
20. What is Normalization? First Normal Form, Second Normal Form, Third Normal Form?
21. What is ODS?
22. What type of indexing mechanism do we need to use for a typical data warehouse?
23. Which columns go to the fact table and which columns go to the dimension table? (My user needs to see <data element> broken by <data element>.)
All elements before "broken by" = fact measures
All elements after "broken by" = dimension elements
24. What is the level of granularity of a fact table? What does this signify? (At a weekly level of summarization there is no need to have the invoice number in the fact table anymore.)
25. How are dimension tables designed? (De-normalized, wide, short, using surrogate keys, containing additional date fields and flags.)
26. What are slowly changing dimensions?
27. What are non-additive facts? (Inventory levels, account balances in a bank)
28. What are conformed dimensions?
29. What is a VLDB? (If a database is too large to back up in the available time frame, it's a VLDB.)
30. What are SCD1, SCD2 and SCD3?
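The SCD question above is usually answered by contrasting how each type handles a changed attribute. A minimal sketch of Type 1 (overwrite, no history) versus Type 2 (expire the current row and append a new version); the dictionary layout and column names are illustrative, not an Informatica-specific structure:

```python
# SCD Type 1: overwrite the attribute in place, losing history.
def scd1_update(dim, key, new_city):
    dim[key]["city"] = new_city

# SCD Type 2: flag the current row as expired and append a new
# version row, preserving full history.
def scd2_update(dim_rows, key, new_city):
    for row in dim_rows:
        if row["key"] == key and row["current"]:
            row["current"] = False
    dim_rows.append({"key": key, "city": new_city, "current": True})
```

Type 3 (not shown) instead keeps limited history in place, e.g. a `previous_city` column alongside `city` holding only the prior value.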