While every attempt has been made to ensure that the information in this document is accurate and complete,
some typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any
kind of loss resulting from the use of information contained in this document. The information contained in this
document is subject to change without notice.
The incorporation of the product attributes discussed in these materials into any release or upgrade of any
Informatica software product—as well as the timing of any such release or upgrade—is at the sole discretion of
Informatica.
Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775;
6,044,374; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents:
09/644,280; 10/966,046; 10/727,700.
Pushdown Optimization
Executive Summary
Over the next five to ten years and beyond, the two dominant variables in the enterprise data
integration equation are painfully clear: more data and less time. Given these constraints, what's the right
data integration strategy to effectively manage terabytes or even hundreds of terabytes of data
with enough flexibility and adaptability to cope with future growth?
Historically, data integration was performed by developing hand-coded programs that extract
data from source systems, apply business/transformation logic and then populate the appropriate
downstream system, be it a staging area, data warehouse or other application interface.
"Helping to overcome the challenges of implementing data integration as an enterprise-wide function, PowerCenter 8 offers key new features that can enable near-universal data access, deliver greater performance and scalability, and significantly increase developer productivity. The push-down logic will allow us to take further advantage of our database processing power."
Mark Cothron, Data Integration Architect, Ace Hardware

Hand-coding has been replaced, in many instances, by data integration software that performs
the access, discovery, integration, and delivery of data using an "engine" or "data integration
server" and visual tools to map and execute the desired process. Driven by accelerated
productivity gains and ever-increasing performance, "state of the art" data integration platforms,
such as Informatica® PowerCenter®, handle the vast majority of today's scenarios quite effectively.

PowerCenter has enjoyed wide acceptance and use by high-volume customers representing
companies and government organizations of all sizes. Based on this use, Informatica has
identified performance scenarios where processing data in a source or target database, instead
of within the data integration server, can lead to significant performance gains. These scenarios
arise primarily where data is "co-located" within a common database instance, such as when
staging and production reside in a single Oracle relational database management system
(RDBMS), or where a large investment has been made in database hardware and software that
can provide additional processing power.

With these scenarios in mind, Informatica Corporation set out to deliver a solution that
provides the best of both worlds without incurring undue configuration and management burden;
a solution that best leverages the performance capabilities of its data integration server
and/or the processing power of a relational database interchangeably to optimize the use
of available resources.
White Paper
Informatica has developed a solution that offers IT architects flexibility and ease of performance
optimization through “push down” processing into a relational database using the same
metadata-driven mapping and execution architecture: the PowerCenter Pushdown Optimization
Option now available through Informatica PowerCenter 8. PowerCenter 8 is the latest release of
Informatica’s single, unified enterprise data integration platform for accessing and integrating
data from virtually any business system, in any format, and delivering that data throughout the
enterprise at any speed.
This white paper describes the flexibility, performance optimization, and leverage provided by the
PowerCenter 8 Pushdown Optimization Option. It examines the historical approaches to data
integration and describes how a combined engine- and RDBMS-based approach to data
integration can help the enterprise:
• Cost-effectively scale by using a flexible, adaptable data integration architecture
• Increase developer and team productivity
• Save costs through greater leverage of RDBMS and hardware investments
• Eliminate the need to write custom-coded solutions
• Easily adapt to changes in underlying RDBMS architecture
• Maintain visibility and control of data integration processes
After reading this paper, you will understand how pushdown processing works, the option’s
technical capabilities, and how these capabilities will benefit your environment.
Historical Approaches to Data Integration
Historically, there have been four approaches to data integration:
1. Hand-coding. Since the early days of data processing, IT has attempted to solve integration
problems through the development of hand-coded programs. These efforts still proliferate in many
mainframe environments, data migration projects, and other scenarios where manual labor is
applied to extract, transform, and move data for the purposes of integration. The high risks,
escalating costs, and lack of compliance associated with hand-coded efforts are well
documented, especially in today's environment of heightened regulatory oversight and the
need for data transparency. Early on, solutions for automation emerged to replace
hand-coding as a cost-effective alternative.

2. Code generators. The first attempts at increasing IT efficiency led to the development of
code generation frameworks that leveraged visual tools to map out processes and data flow
but then generated and compiled code as the resultant run-time solution. Code generators
were a step up from hand-coding for developers, but the approach did not gain widespread
adoption: as solution requirements and IT architecture complexity grew, issues around
code maintenance, lack of visibility through metadata, and inaccuracies in the generation
process led to higher rather than lower costs.

3. RDBMS-centric SQL code generators. An offspring of early-generation code generators
emerged from the database vendors themselves. Using the database as an "engine" and SQL
as a language, RDBMS vendors delivered offerings that centered on their "flavor" of database
programming. Unfortunately, these products exposed the inability of the SQL
language and its database-specific extensions (e.g., PL/SQL, stored procedures) to handle
cross-platform data issues; XML data; the full range of functions such as data quality,
profiling, and conditional aggregation; and the rest of the complete range of business
logic needed for enterprise data integration. What these products did prove was that, for
certain scenarios, the horsepower of the relational database can be effectively used for
data integration.

4. Metadata-driven engines. Informatica pioneered a data integration approach that leveraged
a data server, or "engine," powered by open, interpreted metadata as the workhorse for
transformation processing. This approach addressed complexity and met the needs for
performance. It also provided the added benefit of reuse and openness due to its
metadata-centricity. Others have since copied this approach through other types of engines
and languages, but it wasn't until this metadata-driven, engine-based approach was widely
adopted by the market as the preferred method for saving costs and rapidly delivering
on data integration requirements that extraction, transformation, and loading (ETL) was
established as a proven technology. Figure 1 shows this engine-based data integration approach.

THE POWERCENTER PUSHDOWN OPTIMIZATION OPTION
Automatically generates and "pushes down" mapping logic:
• Generates database-specific logic that represents the overall data flow
• Pushes the execution of the logic into the database to perform data transformation processing
• Provides a single design environment with an easy-to-use GUI
• Decouples data transformation logic from the physical execution plan
• Controls where processing takes place
• Dynamically creates and executes database-specific transformation language
• Allows you to preview the processing you can push to the database
• Leverages a single, unified data integration platform

Figure 1: The Engine-Based Data Integration Approach (metadata repository, data sources)
Overview of Pushdown Processing
Separating business logic from physical run-time execution, the Pushdown Optimization
Option is coupled with the creation and management of workflows. Workflows tie the execution
of a metadata-based mapping to an actual physical environment. This environment spans not
only the PowerCenter Data Integration Services that may reside on multiple hardware systems,
but also the relational databases where pushdown processing will occur. As shown in Figure 2,
data integration solution architects can configure the pushdown strategy through a simple drop-
down menu in the PowerCenter 8 Workflow Manager.
Figure 2: Data Integration Solution Architects Can Configure the Pushdown Strategy through a Simple Drop-Down
Menu in the PowerCenter 8 Workflow Manager
Pushdown optimization can be used to push data transformation logic to the source or target
database. The amount of work data integration solution architects can push to the database
depends on the pushdown optimization configuration, the data transformation logic, and the
mapping configuration.
When pushdown optimization is used, PowerCenter writes one or more SQL statements to the
source or target database based on the data transformation logic. PowerCenter analyzes the
data transformation logic and mapping configuration to determine the data transformation logic
it can push to the database. At run time, PowerCenter executes any SQL statement generated
against the source or target tables, and it processes any data transformation logic within
PowerCenter that it cannot push to the database.
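The split between pushed and engine-side work can be sketched in a few lines. The following toy Python sketch is an illustrative assumption, not PowerCenter's internals: it walks a mapping's transformations in order, translates the longest SQL-expressible prefix into a single source-side statement, and leaves the rest for the integration engine.

```python
# Toy sketch of the core pushdown decision (illustrative names, not
# Informatica's implementation): push the longest SQL-expressible prefix
# of the pipeline into the source database; the engine runs the rest.

PUSHABLE = {"filter", "expression", "sorter", "joiner", "aggregator"}

def split_pipeline(transformations):
    """Return (pushable_prefix, engine_remainder), preserving order."""
    prefix = []
    for i, t in enumerate(transformations):
        if t["type"] not in PUSHABLE:
            return prefix, transformations[i:]
        prefix.append(t)
    return prefix, []

def generate_sql(table, columns, pushed):
    """Render the pushed prefix as a single SELECT against the source table."""
    where, order = [], []
    for t in pushed:
        if t["type"] == "filter":
            where.append(t["condition"])
        elif t["type"] == "sorter":
            order.append(t["key"])
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order:
        sql += " ORDER BY " + ", ".join(order)
    return sql

pipeline = [
    {"type": "filter", "condition": "ITEMS.ITEM_ID > 1005"},
    {"type": "sorter", "key": "ITEMS.ITEM_NAME"},
    {"type": "xml_writer"},  # no SQL equivalent: stays in the engine
]
pushed, remainder = split_pipeline(pipeline)
sql = generate_sql("ITEMS", ["ITEM_ID", "ITEM_NAME"], pushed)
```

A real optimizer must also account for database-specific SQL dialects and the mapping configuration, as described above; the sketch captures only the prefix-splitting idea.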
Using pushdown processing can improve performance and optimize available resources. For
example, PowerCenter can push the data transformation logic for the mapping seen in Figure 2
to the source database.
The mapping contains a Filter transformation that filters out all items except those with an ID
greater than 1005. PowerCenter can push this data transformation logic to the database,
generating the following SQL statement to process it:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER)
FROM ITEMS
WHERE (ITEMS.ITEM_ID > 1005)
PowerCenter generates an INSERT SELECT statement that selects the ITEM_ID, ITEM_NAME, and
ITEM_DESC columns (along with the cast price) from the source table, filters the data using a
WHERE clause, and inserts the results into the target. PowerCenter does not extract any data
from the database during this process. Because PowerCenter does not need to extract and load
data, performance improves and available resources are used more efficiently.
Partial Pushdown Processing
Partial pushdown processing occurs when either the source and target systems are in different
database instances, or only some of the data transformation logic can be represented in SQL.
In such cases, some processing may be pushed into the source database, some occurs inside
PowerCenter, and some may be pushed to the target database.
Figure 4 shows an example of partial pushdown processing. All transformations up to and
including the Aggregator transformation are pushed into the source database, the Update
Strategy transformation is executed within PowerCenter, and the Expression transformation
is executed inside the target database.
In Figure 5, the sources and targets reside in the same database instance, and all of the data
transformation logic can be pushed to the database. The work of filtering, joining, and sorting
the data is performed by the database, freeing PowerCenter resources to perform other tasks. However, the
transformation logic is represented in PowerCenter, so it is platform independent and easy to
modify. The visual representation makes it simple to review the flow of logic, and the Pushdown
Optimizer Viewer allows you to preview the SQL statements PowerCenter will execute at run time.
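The three-way split behind partial pushdown can be sketched as follows. This is an illustrative Python sketch under assumed names, not PowerCenter's actual planner; it mirrors the Figure 4 example, where the Aggregator runs in the source database, the Update Strategy in PowerCenter, and the Expression in the target database.

```python
# Illustrative partition of a pipeline into source-pushed, engine-side,
# and target-pushed segments (hypothetical names, not a PowerCenter API).

SQL_EXPRESSIBLE = {"source_qualifier", "filter", "aggregator", "expression"}

def partition(pipeline):
    """Split into (source_push, engine, target_push), preserving order."""
    n = len(pipeline)
    i = 0
    while i < n and pipeline[i] in SQL_EXPRESSIBLE:
        i += 1                       # longest pushable prefix -> source DB
    j = n
    while j > i and pipeline[j - 1] in SQL_EXPRESSIBLE:
        j -= 1                       # longest pushable suffix -> target DB
    return pipeline[:i], pipeline[i:j], pipeline[j:]

stages = ["source_qualifier", "filter", "aggregator",
          "update_strategy",         # no SQL equivalent: engine-side
          "expression"]
src, engine, tgt = partition(stages)
```

Anything without a SQL equivalent anchors the engine-side middle segment; everything before it can run as SQL in the source database, and everything after it as SQL in the target database.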
Figure 6: Transformation Types That Can Be Pushed to the Database Using PowerCenter
With the PowerCenter Pushdown Optimization Option, data integration solution architects
can leverage both the database's and PowerCenter's capabilities by pushing some transformation
logic to the database and processing other data transformation logic using PowerCenter.
For example, users might have a mapping that filters and sorts data and then outputs the data
to an XML target. To utilize database and PowerCenter capabilities to their fullest potential, data
integration solution architects might push the transformation logic for the Source Qualifier, Filter,
and Sorter transformations to the source database, and then extract the data to output it to
the XML target.
Figure 7 shows a mapping that uses database capabilities and PowerCenter’s XML capabilities.
Figure 7: Mapping Pushes Transformation Logic to the Source and Writes to an XML Target
Increased Performance
The PowerCenter Pushdown Optimization Option increases system performance by providing
the flexibility to push data transformation processing to the most appropriate processing
resource, whether within a source or target database or through the PowerCenter server. With
this option, PowerCenter is the only enterprise data integration software on the market that
allows data integration solution architects to choose when pushing down processing offers
a performance advantage.
With the PowerCenter Pushdown Optimization Option, data integration solution architects can
choose to push all or part of the data transformation logic to the source or target database.
Data integration solution architects can select the database they want to push transformation
logic to, and they can choose to push some sessions to the database, while allowing
PowerCenter to process other sessions.
For example, let’s say an IT organization has an Oracle source database with very low user
activity. This organization may choose to push transformation logic for all sessions that run on
this database. In contrast, let’s say an IT organization has a Teradata® source database with
heavy user activity. This organization may choose to allow PowerCenter to process the
transformation logic for sessions that run on this database. In this way, the sessions can
be tuned to work with the load on each database, optimizing performance.
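A per-database policy like the Oracle/Teradata example above can be expressed as a simple lookup. The following Python sketch is purely illustrative (the database names, load inventory, and helper are assumptions, not a PowerCenter API):

```python
# Hypothetical per-database pushdown policy: push sessions that run on
# low-activity databases; keep processing in the engine for heavily
# loaded ones. The inventory below is an assumed example.
DB_LOAD = {"oracle_staging": "low", "teradata_dw": "high"}

def session_strategy(database):
    """Push transformation logic only where user activity is low."""
    return "pushdown" if DB_LOAD.get(database) == "low" else "engine"
```

Under this rule, sessions against the quiet Oracle source would be pushed down, while sessions against the busy Teradata source would run in PowerCenter.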
With the PowerCenter Pushdown Optimization Option, data integration solution architects can
also use variables to choose to push different volumes of transformation logic to the source or
target database at different times during the day. For example, partial pushdown optimization
may be used during the peak hours of the day, but full pushdown optimization is used from
midnight until 2 a.m. when activity is low.
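The time-windowed strategy above might be sketched like this; the strategy names and helper are hypothetical, not actual PowerCenter session parameters:

```python
# Illustrative time-windowed pushdown selection: full pushdown in the
# quiet midnight-to-2 a.m. window, partial pushdown during peak hours.
from datetime import time

def pushdown_strategy(session_start):
    """Pick how much transformation logic to push, by session start time."""
    if time(0, 0) <= session_start < time(2, 0):
        return "full"      # low activity: let the databases do all the work
    return "partial"       # peak hours: keep more work in PowerCenter
```

A scheduler could evaluate such a rule as each session starts and set the corresponding pushdown option for that run.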
Reduced Risk and Enhanced Flexibility
IT organizations typically support several different relational databases. Even when they are
able to standardize on a single RDBMS, changing business conditions—resulting from mergers
and acquisitions, cost cutting, etc.—dictate that they need to be prepared to support multiple
relational database architectures. IT organizations need to be able to fully leverage the
capabilities of each type of database, and yet stay agile enough to rapidly integrate other
types of databases as the need arises. New regulatory and governance requirements also
demand increased visibility into, and control over, the business rules applied to data as it moves
throughout the enterprise.
PowerCenter reduces the risk of changing database architectures and enhances flexibility by being
database-neutral. PowerCenter’s metadata-driven architecture extends to mappings that leverage
the Pushdown Optimization Option. The appropriate database-specific logic can easily be
regenerated after a database change, providing flexibility of choice and ease of change.
Leveraging metadata analysis and reporting, rather than tying business logic to vendor-specific
hand-coded programs, enables effective data governance and transparency.
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com
Informatica Offices Around The Globe: Australia • Belgium • Canada • China • France • Germany • Japan • Korea • the Netherlands • Singapore • Switzerland • United Kingdom • USA
© 2006 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and PowerCenter are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions
throughout the world. All other company and product names may be tradenames or trademarks of their respective owners.
J50701 6650 (04/25/06)