
Informatica PowerCenter

1. PowerCenter Domain
2. PowerCenter Repository
3. Administration Console
4. PowerCenter Client
5. Repository Service
6. Integration Service
1. PowerCenter Domain
A domain is the primary unit for management and administration of services in
PowerCenter. Node, Service Manager and Application Services are components of a
domain.
Node
A node is the logical representation of a machine in a domain. The machine on which
PowerCenter is installed hosts the domain and also acts as the primary node. We can add other
machines as nodes in the domain and configure them to run application services such
as the Integration Service or Repository Service. All service requests from other nodes in
the domain go through the primary node, also called the master gateway.
The Service Manager
The Service Manager runs on each node within a domain and is responsible for starting
and running the application services. The Service Manager performs the following
functions:

Alerts. Provides notifications of events such as shutdowns and restarts
Authentication. Authenticates user requests from the Administration Console,
PowerCenter Client, Metadata Manager, and Data Analyzer
Domain configuration. Manages configuration details of the domain, such as machine
name and port
Node configuration. Manages the configuration metadata of a node, such as
machine name and port
Licensing. When an application service connects to the domain for the first time,
the licensing registration is performed; for subsequent connections, the
licensing information is verified
Logging. Manages the event logs from each service; message severities are
Fatal, Error, Warning, and Info
User management. Manages users, groups, roles, and privileges

Application services

The services that perform data movement, connect to different data sources,
and manage data are called application services. They are the Repository Service,
Integration Service, Web Services Hub, SAP BW Service, Reporting Service, and
Metadata Manager Service. Which application services run on a node depends on how
we configure the node and the application service.
Domain Configuration
Configuring a domain involves assigning host names and port numbers to
the nodes, setting up Resilience Timeout values, and providing connection information for the
metadata database, SMTP details, etc. All the configuration information for a domain is
stored in a set of relational database tables within the repository. Some global
properties that apply to application services, such as Maximum Restart Attempts and
Dispatch Mode (Round Robin, Metric Based, or Adaptive), are also configured under
Domain Configuration.
2. PowerCenter Repository
The PowerCenter repository is one of the stronger metadata stores among ETL products.
The repository is normalized enough to store metadata at a very detailed level, so
updates to the repository are quick and team-based development runs smoothly.
The repository data structure is also useful for analysis and reporting.
Access to the repository through MX views and the SDK extends the repository's
role from simple storage of technical data to a database for analyzing ETL
metadata.
The PowerCenter repository is a collection of 355 tables that can be created on any major
relational database. The kinds of information stored in the repository are:
1. Repository configuration details
2. Mappings
3. Workflows
4. User security
5. Process data of session runs

For a quick understanding, here are a few of the repository tables:

When a user creates a folder, corresponding entries are made in table OPB_SUBJECT;
attributes such as folder name, owner id, and whether the folder is shared are all stored.
When we create or import sources and define field names, datatypes, etc. in the Source Analyzer,
entries are made in OPB_SRC and OPB_SRC_FLD.
When a target and its fields are created or imported from any database, entries are made
in tables like OPB_TARG and OPB_TARG_FLD.
Table OPB_MAPPING stores mapping attributes like mapping name, folder id, valid
status, and mapping comments.
Table OPB_WIDGET stores attributes like widget type, widget name, comments, etc.
Widgets are simply transformations; Widget is the name Informatica uses for them
internally.
Table OPB_SESSION stores configurations related to a session task, and table
OPB_CNX_ATTR stores information related to connection objects.
Table OPB_WFLOW_RUN stores process details like workflow name, workflow start
time, workflow completion time, the server node it ran on, etc.
REP_ALL_SOURCES, REP_ALL_TARGETS, and REP_ALL_MAPPINGS are a few of
the many views created over these tables.
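To make the idea of analyzing ETL metadata concrete, here is a minimal sketch in Python. It does not connect to a real repository: it builds toy copies of OPB_SUBJECT and OPB_MAPPING in an in-memory SQLite database and lists the mappings that are flagged invalid, per folder. The column names (SUBJ_ID, SUBJ_NAME, SUBJECT_ID, IS_VALID) and the sample rows are assumptions made for illustration, so verify them against the actual repository schema or the MX views before reusing the query.

```python
import sqlite3

# Toy stand-ins for two repository tables. Column names and rows are
# assumptions for illustration, not the documented repository schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJ_NAME TEXT);
    CREATE TABLE OPB_MAPPING (
        MAPPING_ID   INTEGER PRIMARY KEY,
        MAPPING_NAME TEXT,
        SUBJECT_ID   INTEGER,   -- folder the mapping belongs to
        IS_VALID     INTEGER    -- 1 = valid, 0 = invalid
    );
    INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DEV'), (2, 'SHARED');
    INSERT INTO OPB_MAPPING VALUES
        (10, 'm_load_orders',    1, 1),
        (11, 'm_load_customers', 1, 0),
        (12, 'm_common_lookup',  2, 1);
""")


def invalid_mappings(connection):
    """Return (folder name, mapping name) pairs for invalid mappings."""
    cursor = connection.execute(
        """
        SELECT s.SUBJ_NAME, m.MAPPING_NAME
        FROM OPB_MAPPING m
        JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID
        WHERE m.IS_VALID = 0
        ORDER BY s.SUBJ_NAME, m.MAPPING_NAME
        """
    )
    return cursor.fetchall()


print(invalid_mappings(conn))
```

Against a real repository, the same kind of query is usually better written against the REP_ views than against the OPB_ tables directly.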
PowerCenter applications access the PowerCenter repository through the Repository
Service. The Repository Service protects metadata in the repository by managing
repository connections and using object-locking to ensure object consistency.
We can create a repository as global or local. We use a global repository to store common
objects that multiple developers can use through shortcuts, and a local repository to
develop mappings and workflows. From a local repository, we can create
shortcuts to objects in shared folders in the global repository. PowerCenter also supports
versioning: a versioned repository can store multiple versions of an object.
3. Administration Console
The Administration Console is a web application that we use to administer the PowerCenter
domain and PowerCenter security. There are two pages in the console: the Domain Page and the
Security Page.
We can do the following in the Domain Page:
o Create & manage application services like Integration Service and Repository Service
o Create and manage nodes, licenses and folders
o Restart and shutdown nodes
o View log events
o Other domain management tasks like applying licenses and managing grids and
resources
We can do the following in Security Page:
o Create, edit and delete native users and groups
o Configure a connection to an LDAP directory service. Import users and groups from the
LDAP directory service
o Create, edit and delete Roles (Roles are collections of privileges)
o Assign roles and privileges to users and groups
o Create, edit, and delete operating system profiles. An operating system profile is a level
of security that the Integration Service uses to run workflows
4. PowerCenter Client
Designer, Workflow Manager, Workflow Monitor, Repository Manager, and Data Stencil are the five
client tools used to design mappings and mapplets, create sessions to load data, and
manage the repository.
A mapping is ETL code that pictorially depicts the logical data flow from source to target,
including the transformations applied to the data. Designer is the tool used to create mappings.
Designer has five window panes, Source Analyzer, Warehouse Designer, Transformation
Developer, Mapping Designer and Mapplet Designer.
Source Analyzer:
Allows us to import source table metadata from relational databases, flat files, XML, and
COBOL files. Note that the Source Analyzer imports only the source definition, not the
source data itself. The Source Analyzer also allows us to define our own
source data definitions.
Warehouse Designer:
Allows us to import target table definitions from relational databases, flat files,
XML, and COBOL files. We can also create target definitions manually and group them
into folders. Unlike the Source Analyzer, it offers an option to create the tables physically
in the database. The Warehouse Designer doesn't allow two tables with the same
name, even if their column names differ or they come from different
databases/schemas.
Transformation Developer:
Transformations like Filters, Lookups, Expressions, etc. that have scope for reuse are
developed in this pane. Alternatively, a transformation developed in the Mapping Designer can
be made reusable by checking the re-use option, after which it is displayed under the
Transformation Developer folders.
Mapping Designer:
This is the place where we actually depict our ETL process; we bring in source definitions,
target definitions, transformations like filter, lookup, aggregate and develop a logical ETL
program. In this place it is only a logical program because the actual data load can be done
only by creating a session and workflow.
Mapplet Designer:
Here we create a mapplet, a set of transformations that can be used and re-used across mappings.
Workflow Manager: In the Workflow Manager, we define a set of instructions called a
workflow to execute the mappings we build in the Designer. Generally, a workflow contains a session
and any other task we may want to perform when we run a session. Tasks can include a session,
email notification, or scheduling information.
A set of tasks grouped together becomes a worklet. After we create a workflow, we run the
workflow in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Manager has
the following three window panes:
Task Developer. Create the tasks we want to accomplish in the workflow.
Worklet Designer. Create a worklet, an object that groups a set of tasks. A worklet is similar
to a workflow, but without scheduling information. We can nest worklets inside a workflow.
Workflow Designer. Create a workflow by connecting tasks with links. We can also create tasks
in the Workflow Designer as we develop the workflow.
The ODBC connection details are defined in the Workflow Manager Connections menu.
Workflow Monitor: We can monitor workflows and tasks in the Workflow Monitor. We can
view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort,
and resume workflows from the Workflow Monitor. We can view session and workflow log events
in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor
continuously receives information from the Integration Service and Repository Service. It also
fetches information from the repository to display historic information.
The Workflow Monitor consists of the following windows:
Navigator window. Displays monitored repositories, servers, and repository
objects.
Output window. Displays messages from the Integration Service and Repository
Service.
Time window. Displays progress of workflow runs.
Gantt Chart view. Displays details about workflow runs in chronological format.
Task view. Displays details about workflow runs in a report format.
Repository Manager
We can navigate through multiple folders and repositories and perform basic repository tasks
with the Repository Manager. We use the Repository Manager to complete the following tasks:
2. Add and connect to a repository. We can add repositories to the Navigator window
and client registry and then connect to the repositories.
3. Work with PowerCenter domain and repository connections. We can edit or remove
domain connection information. We can connect to one repository or multiple
repositories. We can export repository connection information from the client registry to a
file. We can import the file on a different machine and add the repository connection
information to the client registry.
4. Change the password. We can change the password for our user account.
5. Search for repository objects or keywords. We can search for repository objects
containing specified text. If we add keywords to target definitions, we can use a keyword to
search for a target definition.
6. View object dependencies. Before we remove or change an object, we can view
dependencies to see the impact on other objects.
7. Compare repository objects. In the Repository Manager, we can compare two
repository objects of the same type to identify differences between the objects.
8. Truncate session and workflow log entries. We can truncate the list of session and
workflow logs that the Integration Service writes to the repository. We can truncate all
logs, or truncate all logs older than a specified date.
5. Repository Service
Having discussed the metadata repository, we now turn to the Repository Service: a
separate, multi-threaded process that retrieves, inserts, and updates metadata in the
repository database tables.
The Repository Service manages connections to the PowerCenter repository from
PowerCenter client applications such as the Designer, Workflow Manager, Workflow Monitor,
Repository Manager, and Administration Console, and from the Integration Service. The
Repository Service is responsible for ensuring the consistency of metadata in the repository.
Creation & Properties:
Use the PowerCenter Administration Console Navigator window to create a Repository
Service. The properties needed to create it are:
Service Name: name of the service, like rep_SalesPerformanceDev
Location: domain and folder where the service is created
License: license service name
Node, Primary Node & Backup Nodes: node on which the service process runs
CodePage: the Repository Service uses the character set encoded in the repository
code page when writing data to the repository
Database type & details: type of database, username, password, connect string, and
tablespace name
The above properties are sufficient to create a Repository Service; however, the
following properties are important for better performance and
maintenance.
General Properties
> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform
administrative tasks like enabling version control or promoting a local repository to a
global repository
> EnableVersionControl: Creates a versioned repository
Node Assignments: The high availability option is a licensed feature that allows us to
choose primary and backup nodes for continuous running of the Repository Service.
Under a normal license we would see only one node to select from
Database Properties
> DatabaseArrayOperationSize: Number of rows to fetch each time an array database
operation is issued, such as insert or fetch. Default is 100
> DatabasePoolSize: Maximum number of connections to the repository database that
the Repository Service can establish. If the Repository Service tries to establish more
connections than specified for DatabasePoolSize, it times out the connection attempt
after the number of seconds specified for DatabaseConnectionTimeout
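The interaction between DatabasePoolSize and DatabaseConnectionTimeout can be pictured as a bounded connection pool. The sketch below is a generic Python illustration of that idea, not the Repository Service's actual implementation: the class name BoundedPool, the PoolTimeout exception, and the use of plain objects in place of database connections are all assumptions.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """A pool that never hands out more than pool_size connections."""

    def __init__(self, pool_size, connect_timeout_s, factory):
        self._timeout = connect_timeout_s
        self._free = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._free.put(factory())  # pre-create the connections

    def acquire(self):
        # Wait up to the timeout for a free connection, then give up.
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout("no free connection within timeout") from None

    def release(self, conn):
        self._free.put(conn)


# Usage: a pool of two "connections" (plain objects stand in for them).
pool = BoundedPool(pool_size=2, connect_timeout_s=0.1, factory=object)
first = pool.acquire()
second = pool.acquire()

try:
    pool.acquire()          # a third request exceeds the pool size
    timed_out = False
except PoolTimeout:
    timed_out = True

pool.release(first)         # once a connection is returned...
reused = pool.acquire()     # ...the next request succeeds
print(timed_out, reused is first)
```

A larger pool reduces the chance of timeouts under load, at the cost of more open connections on the repository database.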
Advanced Properties
> CommentsRequiredForCheckin: Requires users to add comments when checking in
repository objects.
> Error Severity Level: Level of error messages written to the Repository Service log.
Specify one of the following message levels: Fatal, Error, Warning, Info, Trace, or
Debug
> EnableRepAgentCaching: Enables repository agent caching. Repository agent
caching provides optimal performance of the repository when you run workflows.
When you enable repository agent caching, the Repository Service …
… the Integration Service process running on the same node.
For example, the Integration Service reads from and writes to databases using the
UTF-8 code page. The Integration Service requires that the code page environment
variable be set to UTF-8. However, you have a Shift-JIS repository that requires that
the code page environment variable be set to Shift-JIS. Set the environment variable
on the node to UTF-8. Then add the environment variable to the Repository Service
process properties and set the value to Shift-JIS.
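The precedence in this example (a process-level value shadows the node-level value) can be sketched in a few lines of Python. The variable name DB_CLIENT_CODE_PAGE is hypothetical, since the real environment variable depends on the database client, and the function effective_env is only an illustration of the lookup order.

```python
from collections import ChainMap

# Hypothetical variable name; the real one depends on the database client.
CODE_PAGE_VAR = "DB_CLIENT_CODE_PAGE"

# Environment set on the node, shared by every service process on it.
node_env = {CODE_PAGE_VAR: "UTF-8", "PATH": "/usr/bin"}

# Extra variables set in the Repository Service process properties.
repo_service_overrides = {CODE_PAGE_VAR: "Shift-JIS"}


def effective_env(node, process_overrides=None):
    """Process-level settings shadow node-level ones; the rest is inherited."""
    return dict(ChainMap(process_overrides or {}, node))


integration_env = effective_env(node_env)
repository_env = effective_env(node_env, repo_service_overrides)

print(integration_env[CODE_PAGE_VAR])  # value seen by the Integration Service
print(repository_env[CODE_PAGE_VAR])   # value seen by the Repository Service
```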

6. Integration Service (IS)


The key functions of the IS are:

Interpreting the workflow and mapping metadata from the repository
Executing the instructions in the metadata
Managing the data from the source system to the target system within memory and
on disk

The three main components of the Integration Service that enable data movement are:

Integration Service Process
Load Balancer
Data Transformation Manager

6.1 Integration Service Process (ISP)


The Integration Service starts one or more Integration Service processes to run and
monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs
the workflow tasks, and starts the process to run sessions. The functions of the Integration
Service Process are:

Locks and reads the workflow
Manages workflow scheduling, i.e., maintains session dependency
Reads the workflow parameter file
Creates the workflow log
Runs workflow tasks and evaluates the conditional links
Starts the DTM process to run the session
Writes historical run information to the repository
Sends post-session emails

6.2 Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks
to a single node or across the nodes in a grid after performing a sequence of steps. Before
looking at these steps, we need to know about resources, resource provision
thresholds, dispatch modes, and service levels.

Resources. We can configure the Integration Service to check the resources
available on each node and match them with the resources required to run the
task. For example, if a session uses an SAP source, the Load Balancer dispatches
the session only to nodes where the SAP client is installed.
Resource provision thresholds. There are three: Maximum CPU Run Queue
Length, the maximum number of runnable threads waiting for CPU resources on the
node; Maximum Memory %, the maximum percentage of virtual memory allocated on
the node relative to the total physical memory size; and Maximum Processes, the
maximum number of running Session and Command tasks allowed for each
Integration Service process running on the node.
Dispatch modes. There are three. Round-robin: the Load Balancer dispatches tasks to
available nodes in a round-robin fashion after checking the Maximum Processes
threshold. Metric-based: checks all three resource provision thresholds and
dispatches tasks in round-robin fashion. Adaptive: checks all three resource
provision thresholds and also ranks nodes according to current CPU availability.
Service levels. Establish priority among tasks that are waiting to be dispatched.
The three components of a service level are name, dispatch priority, and maximum
dispatch wait time. Maximum dispatch wait time is the amount of time a task
can wait in the queue, which ensures that no task waits forever.
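One way to picture how dispatch priority and maximum dispatch wait time work together is a queue in which overdue tasks jump ahead of everything else. The Python sketch below is an illustration under stated assumptions, not the Load Balancer's actual algorithm: it assumes a lower priority number means "dispatch first" and that a task whose wait exceeds its maximum is served before any task that is not yet overdue. The names QueuedTask and next_task are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class QueuedTask:
    name: str
    dispatch_priority: int      # assumption: lower number = dispatched first
    max_dispatch_wait_s: float  # maximum dispatch wait time of its service level
    enqueued_at: float          # time the task entered the dispatch queue


def next_task(waiting, now):
    """Pick the next task to dispatch from a non-empty list of waiting tasks."""
    def sort_key(task):
        overdue = (now - task.enqueued_at) >= task.max_dispatch_wait_s
        # Overdue tasks first, then by dispatch priority, then oldest first.
        return (not overdue, task.dispatch_priority, task.enqueued_at)

    return min(waiting, key=sort_key)


waiting = [
    QueuedTask("high_priority_load", 1, 1800.0, enqueued_at=90.0),
    QueuedTask("low_priority_report", 5, 60.0, enqueued_at=0.0),
]

early_pick = next_task(waiting, now=30.0)    # nobody is overdue yet
late_pick = next_task(waiting, now=100.0)    # the report has waited 100 s > 60 s
print(early_pick.name, late_pick.name)
```

Without the overdue rule, a steady stream of high-priority tasks could keep the low-priority task in the queue indefinitely.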

A. Dispatching tasks on a node

1. The Load Balancer checks different resource provision thresholds on the node
depending on the dispatch mode set. If dispatching the task causes any threshold
to be exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process

B. Dispatching tasks on a grid

1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required
by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a threshold to be
exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
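
The four grid steps can be sketched as a chain of filters followed by a selection rule. This is a simplified Python illustration, not Informatica's implementation: the Node and Thresholds classes, the default threshold values, the cpu_available field, and the turn counter used for round-robin are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    running: bool = True
    resources: set = field(default_factory=set)
    cpu_run_queue: int = 0      # runnable threads waiting for CPU
    memory_pct: float = 0.0     # virtual memory as % of physical memory
    processes: int = 0          # running Session and Command tasks
    cpu_available: float = 1.0  # free CPU fraction (used by adaptive mode)


@dataclass
class Thresholds:               # illustrative default values
    max_cpu_run_queue: int = 10
    max_memory_pct: float = 150.0
    max_processes: int = 10


def within_thresholds(node, t, mode):
    # Dispatching one more task must not exceed Maximum Processes.
    if node.processes + 1 > t.max_processes:
        return False
    if mode == "round-robin":
        return True             # round-robin checks only Maximum Processes
    return (node.cpu_run_queue <= t.max_cpu_run_queue
            and node.memory_pct <= t.max_memory_pct)


def select_node(nodes, required, t, mode, turn=0):
    """Return the chosen node, or None if the task must wait in the queue.

    turn is the caller's running dispatch counter for round-robin order.
    """
    candidates = [n for n in nodes if n.running]                      # step 1
    candidates = [n for n in candidates if required <= n.resources]   # step 2
    candidates = [n for n in candidates
                  if within_thresholds(n, t, mode)]                   # step 3
    if not candidates:
        return None
    if mode == "adaptive":                                            # step 4
        return max(candidates, key=lambda n: n.cpu_available)
    return candidates[turn % len(candidates)]


grid = [
    Node("n1", resources={"sap"}, cpu_available=0.2),
    Node("n2", resources={"sap"}, cpu_available=0.7),
    Node("n3", running=False, resources={"sap"}),
    Node("n4", resources={"sap"}, processes=10),
    Node("n5"),
]
limits = Thresholds()
print(select_node(grid, {"sap"}, limits, "adaptive").name)
print(select_node(grid, {"sap"}, limits, "round-robin", turn=1).name)
print(select_node(grid, {"mainframe"}, limits, "metric-based"))
```

In this sample grid, n3 is dropped because it is not running, n5 because it lacks the SAP resource, and n4 because one more task would exceed Maximum Processes.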

6.3 Data Transformation Manager (DTM) Process


When the workflow reaches a session, the Integration Service Process starts the DTM
process. The DTM is the process associated with the session task. The DTM process
performs the following tasks:

Retrieves and validates session information from the repository.
Validates source and target code pages.
Verifies connection object permissions.
Performs pushdown optimization when the session is configured for pushdown
optimization.
Adds partitions to the session when the session is configured for dynamic
partitioning.
Expands the service process variables, session parameters, and mapping variables
and parameters.
Creates the session log.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes when the session is
configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract,
transform, and load data.
Runs post-session stored procedures, SQL, and shell commands, and sends post-session email.
After the session is complete, reports the execution result to the ISP.
Pictorial representation of workflow execution:

1. A PowerCenter Client requests the IS to start a workflow
2. The IS starts the ISP
3. The ISP consults the LB to select a node
4. The ISP starts the DTM on the node selected by the LB
