
Informatica Developer (Version 9.0.1 HotFix 2)

User Guide

Informatica Developer User Guide Version 9.0.1 HotFix 2 November 2010 Copyright (c) 1998-2010 Informatica. All rights reserved. This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing. Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica On Demand are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved.Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. All rights reserved. Copyright DataArt, Inc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved. Copyright Rouge Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright Glyph & Cog, LLC. All rights reserved. This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http:// www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright () 1993-2006, all rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org. This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. The product includes software copyright 2001-2005 () MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/ license.html. The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http:// svn.dojotoolkit.org/dojo/trunk/LICENSE. This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html. This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http:// www.gnu.org/software/ kawa/Software-License.html. This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project Copyright 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php. This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http:/ /www.boost.org/LICENSE_1_0.txt. This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http:// www.pcre.org/license.txt. This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http:// www.eclipse.org/org/documents/epl-v10.php. 
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/ license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/ license.html, http://jung.sourceforge.net/license.txt , http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, and http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3license-agreement. This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php) and the BSD License (http:// www.opensource.org/licenses/bsd-license.php). This product includes software copyright 2003-2006 Joe WaInes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/. This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; and 7,584,422, international Patents and other Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice. NOTICES This Informatica product (the Software) includes certain drivers (the DataDirect Drivers) from DataDirect Technologies, an operating company of Progress Software Corporation (DataDirect) which are subject to the following terms and conditions: 1. THE DATADIRECT DRIVERS ARE PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS. Part Number: IN-DUG-90100HF2-0001

Table of Contents
Preface
    Informatica Resources
        Informatica Customer Portal
        Informatica Documentation
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Multimedia Knowledge Base
        Informatica Global Customer Support

Part I: Informatica Developer Concepts

Chapter 1: Working with Informatica Developer
    Working with Informatica Developer Overview
        Informatica Data Quality and Informatica Data Explorer AE
        Informatica Data Services
    Informatica Developer Interface
        Informatica Developer Welcome Page
        Cheat Sheets
    Setting Up Informatica Developer
    Domains
        Adding a Domain
    The Model Repository
        Objects in Informatica Developer
        Adding a Model Repository
        Connecting to a Model Repository
    Projects
        Creating a Project
        Assigning Permissions
    Folders
        Creating a Folder
    Search
        Searching for Objects and Properties
    Configuring Validation Preferences
    Copy
        Copying an Object
        Copying an Object as a Link

Chapter 2: Connections
    Connections Overview
    DB2 for i5/OS Connection Properties
    DB2 for z/OS Connection Properties
    IBM DB2 Connection Properties
    IMS Connection Properties
    Microsoft SQL Server Connection Properties
    ODBC Connection Properties
    Oracle Connection Properties
    SAP Connection Properties
    Sequential Connection Properties
    VSAM Connection Properties
    Connection Explorer View
    Creating a Connection

Chapter 3: Physical Data Objects
    Physical Data Objects Overview
    Relational Data Objects
        Key Relationships
        Creating a Read Transformation from Relational Data Objects
        Importing a Relational Data Object
    Customized Data Objects
        Default Query
        Key Relationships
        Select Distinct
        Filter
        Sorted Ports
        User-Defined Joins
        Custom Queries
        Outer Join Support
        Informatica Join Syntax
        Pre- and Post-Mapping SQL Commands
        Customized Data Objects Write Properties
        Creating a Customized Data Object
        Adding Relational Resources to a Customized Data Object
        Adding Relational Data Objects to a Customized Data Object
    Nonrelational Data Objects
        Importing a Nonrelational Data Object
    Flat File Data Objects
        Flat File Data Object Overview Properties
        Flat File Data Object Read Properties
        Flat File Data Object Write Properties
        Flat File Data Object Advanced Properties
        Creating a Flat File Data Object
        Importing a Fixed-Width Flat File Data Object
        Importing a Delimited Flat File Data Object
    SAP Data Objects
        Creating an SAP Data Object
    Synchronization
    Troubleshooting Physical Data Objects

Chapter 4: Mappings
    Mappings Overview
        Object Dependency in a Mapping
    Developing a Mapping
    Creating a Mapping
    Mapping Objects
        Adding Objects to a Mapping
        One to One Links
        One to Many Links
    Linking Ports
        Manually Linking Ports
        Automatically Linking Ports
        Rules and Guidelines for Linking Ports
    Propagating Port Attributes
        Dependency Types
        Link Path Dependencies
        Implicit Dependencies
        Propagated Port Attributes by Transformation
    Mapping Validation
        Connection Validation
        Expression Validation
        Object Validation
        Validating a Mapping
    Running a Mapping
    Segments
        Copying a Segment

Chapter 5: Performance Tuning
    Performance Tuning Overview
    Optimization Methods
        Early Projection Optimization Method
        Early Selection Optimization Method
        Predicate Optimization Method
        Semi-Join Optimization Method
    Setting the Optimizer Level for a Developer Tool Mapping
    Setting the Optimizer Level for a Deployed Mapping

Chapter 6: Filter Pushdown Optimization
    Filter Pushdown Optimization Overview
    Pushdown Optimization Process
        Pushdown Optimization to Native Sources
        Pushdown Optimization to PowerExchange Nonrelational Sources
        Pushdown Optimization to ODBC Sources
        Pushdown Optimization to SAP Sources
    Pushdown Optimization Expressions
        Functions
        Operators
    Comparing the Output of the Data Integration Service and Sources

Chapter 7: Mapplets
    Mapplets Overview
    Mapplet Types
    Mapplets and Rules
    Mapplet Input and Output
        Mapplet Input
        Mapplet Output
    Creating a Mapplet
    Validating a Mapplet
    Segments
        Copying a Segment

Chapter 8: Object Import and Export
    Object Import and Export Overview
    The Import/Export XML File
    Dependent Objects
    Exporting Objects
    Importing Objects
    Importing Application Archives

Chapter 9: Export to PowerCenter
    Export to PowerCenter Overview
    PowerCenter Release Compatibility
        Setting the Compatibility Level
    Mapplet Export
    Export to PowerCenter Options
    Exporting an Object to PowerCenter
    Export Restrictions
    Rules and Guidelines for Exporting to PowerCenter
    Troubleshooting Exporting to PowerCenter

Chapter 10: Deployment
    Deployment Overview
    Creating an Application
    Deploying an Object to a Data Integration Service
    Deploying an Object to a File
    Updating an Application
    Mapping Deployment Properties
    Application Changes

Chapter 11: Parameters and Parameter Files
    Parameters and Parameter Files Overview
    Parameters
        Where to Create Parameters
        Where to Assign Parameters
        Creating a Parameter
        Assigning a Parameter
    Parameter Files
        Parameter File Structure
        Parameter File Schema Definition
        Creating a Parameter File

Chapter 12: Viewing Data
    Viewing Data Overview
    Selecting a Default Data Integration Service
    Configurations
        Data Viewer Configurations
        Mapping Configurations
        Updating the Default Configuration Properties
        Configuration Properties
        Troubleshooting Configurations
    Exporting Data
    Logs

Part II: Informatica Data Services

Chapter 13: Logical View of Data
    Logical View of Data Overview
    Developing a Logical View of Data
    Logical Data Object Models
        Creating a Logical Data Object Model
        Importing a Logical Data Object Model
    Logical Data Objects
        Logical Data Object Properties
        Attribute Relationships
        Creating a Logical Data Object
    Logical Data Object Mappings
        Logical Data Object Read Mappings
        Logical Data Object Write Mappings
        Creating a Logical Data Object Mapping

Chapter 14: Virtual Data
    Virtual Data Overview
    SQL Data Services
        Defining an SQL Data Service
        Creating an SQL Data Service
    Virtual Tables
        Data Access Methods
        Creating a Virtual Table from a Data Object
        Creating a Virtual Table Manually
        Defining Relationships between Virtual Tables
        Running an SQL Query to Preview Data
    Virtual Table Mappings
        Defining a Virtual Table Mapping
        Creating a Virtual Table Mapping
        Validating a Virtual Table Mapping
        Previewing Virtual Table Mapping Output
    Virtual Stored Procedures
        Defining a Virtual Stored Procedure
        Creating a Virtual Stored Procedure
        Validating a Virtual Stored Procedure
        Previewing Virtual Stored Procedure Output
    SQL Query Plans
        SQL Query Plan Example
        Viewing an SQL Query Plan

Part III: Informatica Data Quality

Chapter 15: Profiles
    Profiles Overview
    Profile Features
    Creating a Column Profile for a Data Object
    Creating a Profile for Join Analysis
    Adding a Rule to Profile
    Running a Saved Profile
    Profiling a Mapplet or Mapping Object
    Profile Results
        Column Profiling Results
        Join Analysis Results
    Exporting Profile Results

Chapter 16: Scorecards
    Scorecards Overview
    Creating a Scorecard
    Viewing Column Data in a Scorecard

Chapter 17: Reference Data
    Reference Data Overview
    Types of Reference Data

Appendix A: Datatype Reference
    Datatype Reference Overview
    Flat File and Transformation Datatypes
    IBM DB2 and Transformation Datatypes
        Unsupported IBM DB2 Datatypes
    Microsoft SQL Server and Transformation Datatypes
        Unsupported Microsoft SQL Server Datatypes
    ODBC and Transformation Datatypes
    Oracle and Transformation Datatypes
        Number(P,S) Datatype
        Char, Varchar, Clob Datatypes
        Unsupported Oracle Datatypes
    Converting Data
        Port-to-Port Data Conversion

Index

Preface
The Informatica Developer User Guide is written for data services and data quality developers. This guide assumes that you have an understanding of flat file and relational database concepts, the database engines in your environment, and data quality concepts.

Informatica Resources
Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at infa_documentation@informatica.com. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments. The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

Informatica Web Site


You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library


As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.


Informatica Knowledge Base


As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at KB_Feedback@informatica.com.

Informatica Multimedia Knowledge Base


As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at KB_Feedback@informatica.com.

Informatica Global Customer Support


You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com. Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America
    Toll Free
        Brazil: 0800 891 0202
        Mexico: 001 888 209 8853
        North America: +1 877 463 2435
    Standard Rate
        North America: +1 650 653 6332

Europe / Middle East / Africa
    Toll Free
        France: 00800 4632 4357
        Germany: 00800 4632 4357
        Israel: 00800 4632 4357
        Italy: 800 915 985
        Netherlands: 00800 4632 4357
        Portugal: 800 208 360
        Spain: 900 813 166
        Switzerland: 00800 4632 4357 or 0800 463 200
        United Kingdom: 00800 4632 4357 or 0800 023 4632
    Standard Rate
        Belgium: +31 30 6022 797
        France: 0805 804632
        Germany: 01805 702702
        Netherlands: 030 6022 797

Asia / Australia
    Toll Free
        Australia: 1 800 151 830
        New Zealand: 1 800 151 830
        Singapore: 001 800 4632 4357
    Standard Rate
        India: +91 80 4112 5738


Part I: Informatica Developer Concepts


This part contains the following chapters:
- Working with Informatica Developer
- Connections
- Physical Data Objects
- Mappings
- Performance Tuning
- Filter Pushdown Optimization
- Mapplets
- Object Import and Export
- Export to PowerCenter
- Deployment
- Parameters and Parameter Files
- Viewing Data

CHAPTER 1

Working with Informatica Developer


This chapter includes the following topics:
- Working with Informatica Developer Overview
- Informatica Developer Interface
- Setting Up Informatica Developer
- Domains
- The Model Repository
- Projects
- Folders
- Search
- Configuring Validation Preferences
- Copy

Working with Informatica Developer Overview


The Developer tool is an application that you use to design and implement data quality and data services solutions. Use Informatica Data Quality and Informatica Data Explorer Advanced Edition for data quality solutions. Use Informatica Data Services for data services solutions. You can also use the Profiling option with Informatica Data Services to profile data.

Informatica Data Quality and Informatica Data Explorer AE


Use the data quality capabilities in the Developer tool to analyze the content and structure of your data and enhance the data in ways that meet your business needs. Use the Developer tool to design and run processes to complete the following tasks:
- Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.
- Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.
- Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
- Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.
- Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
- Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.
- Create reference data tables. Informatica provides reference data that can enhance several types of data quality process, including standardization and parsing. You can create reference tables using data from profile results.
- Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.
- Collaborate with Informatica users. The Model Repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.
- Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.

Informatica Data Services


Data services are a collection of reusable operations that you can run against sources to access, transform, and deliver data. Use the data services capabilities in the Developer tool to complete the following tasks:
- Define logical views of data. A logical view of data describes the structure and use of data in an enterprise. You can create a logical data object model that shows the types of data your enterprise uses and how that data is structured.
- Map logical models to data sources or targets. Create a mapping that links objects in a logical model to data sources or targets. You can link data from multiple, disparate sources to create a single view of the data. You can also load data that conforms to a model to multiple, disparate targets.
- Create virtual views of data. You can deploy a logical model to a virtual federated database. End users can run SQL queries against the virtual data without affecting the actual source data (see the example query after this list).
- Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
- Create and deploy mappings that end users can query. You can create mappings and deploy them so that end users can run SQL queries against the mapping results.
- Profile data. If you use the Profiling option, profile data to reveal the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.
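For example, after you deploy an SQL data service, an end user can query a virtual table with standard SQL from a client tool. The following query is a minimal sketch and is not taken from this guide: the virtual schema name customer_ds and virtual table name customer are illustrative and depend on how you define the SQL data service.

    -- Hypothetical end-user query against a deployed SQL data service.
    -- "customer_ds" (virtual schema) and "customer" (virtual table) are
    -- illustrative names, not objects defined in this guide.
    SELECT customer_id, first_name, last_name, city
    FROM customer_ds.customer
    WHERE city = 'Boston'
    ORDER BY last_name;

The Data Integration Service resolves the query through the virtual table mapping, so the query reads from, but does not change, the underlying source data.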


Informatica Developer Interface


The Developer tool lets you design and implement data quality and data services solutions. You can work on multiple tasks in the Developer tool at the same time. You can also work in multiple folders and projects at the same time. To work in the Developer tool, you access the Developer tool workbench. The following figure shows the Developer tool workbench:

The Developer tool workbench includes an editor and views. You edit objects, such as mappings, in the editor. The Developer tool displays views, such as the default view, based on which object is open in the editor. The Developer tool also includes the following views that appear independently of the objects in the editor:
- Cheat Sheets. Shows cheat sheets.
- Connection Explorer. Shows connections to relational databases.
- Data Viewer. Shows the results of a mapping, data preview, or an SQL query.
- Object Explorer. Shows projects, folders, and the objects they contain.
- Outline. Shows dependent objects in an object.
- Progress. Shows the progress of operations in the Developer tool, such as a mapping run.
- Properties. Shows object properties.
- Search. Shows search options.
- Validation Log. Shows object validation errors.

You can hide views and move views to another location in the Developer tool workbench. You can also display other views. Click Window > Show View to select the views you want to display.


Informatica Developer Welcome Page


The first time you open the Developer tool, the Welcome page appears. Use the Welcome page to learn more about the Developer tool, set up the Developer tool, and to start working in the Developer tool. The Welcome page displays the following options:
- Overview. Click the Overview button to get an overview of data quality and data services solutions.
- First Steps. Click the First Steps button to learn more about setting up the Developer tool and accessing Informatica Data Quality and Informatica Data Services lessons.
- Tutorials. Click the Tutorials button to see cheat sheets for the Developer tool and for data quality and data services solutions.
- Web Resources. Click the Web Resources button for a link to my.informatica.com. From my.informatica.com, you can access the Informatica How-To Library. The Informatica How-To Library contains articles about the Developer tool, Informatica Data Quality, Informatica Data Services, and other Informatica products.
- Workbench. Click the Workbench button to start working in the Developer tool.

Cheat Sheets
The Developer tool includes cheat sheets as part of the online help. A cheat sheet is a step-by-step guide that helps you complete one or more tasks in the Developer tool. When you follow a cheat sheet, you complete the tasks and see the results. For example, you can complete a cheat sheet to import and preview a relational physical data object. To access cheat sheets, click Help > Cheat Sheets.

Setting Up Informatica Developer


To set up the Developer tool, you add a domain. You create a connection to a Model repository, and you create a project and folder to store your work. You also select a default Data Integration Service. To set up the Developer tool, complete the following tasks:
1. Add a domain.
2. Connect to a Model repository.
3. Create a project.
4. Optionally, create a folder.
5. Select a default Data Integration Service.

Domains
The Informatica domain is a collection of nodes and services that define the Informatica environment. You add a domain in the Developer tool. You can also edit the domain information or remove a domain. You manage domain information in the Developer tool preferences.


Adding a Domain
Add a domain in the Developer tool to access a Model repository. Before you add a domain, verify that you have a domain name, host name, and port number to connect to a domain. You can get this information from an administrator.
1. Click Window > Preferences. The Preferences dialog box appears.
2. Select Informatica > Domains.
3. Click Add. The New Domain dialog box appears.
4. Enter the domain name, host name, and port number.
5. Click Finish.
6. Click OK.

The Model Repository


The Model repository is a relational database that stores the metadata for projects and folders. When you set up the Developer tool, you need to add a Model repository. Each time you open the Developer tool, you connect to the Model repository to access projects and folders.

Objects in Informatica Developer


You can create, manage, or view certain objects in a project or folder in the Developer tool. The following table lists the objects in a project or folder and the operations you can perform:
- Application: Create, edit, and delete applications.
- Connection: Create, edit, and delete connections.
- Folder: Create, edit, and delete folders.
- Logical data object: Create, edit, and delete logical data objects in a logical data object model.
- Logical data object mapping: Create, edit, and delete logical data object mappings for a logical data object.
- Logical data object model: Create, edit, and delete logical data object models.
- Mapping: Create, edit, and delete mappings.
- Mapplet: Create, edit, and delete mapplets.
- Physical data object: Create, edit, and delete physical data objects. Physical data objects can be flat file, nonrelational, relational, or SAP.
- Profile: Create, edit, and delete profiles.
- Reference table: View and delete reference tables.
- Rule: Create, edit, and delete rules.
- Scorecard: Create, edit, and delete scorecards.
- SQL data service: Create, edit, and delete SQL data services.
- Transformation: Create, edit, and delete transformations.
- Virtual schema: Create, edit, and delete virtual schemas in an SQL data service.
- Virtual stored procedure: Create, edit, and delete virtual stored procedures in a virtual schema.
- Virtual table: Create, edit, and delete virtual tables in a virtual schema.
- Virtual table mapping: Create, edit, and delete virtual table mappings for a virtual table.

Adding a Model Repository


Add a Model repository to access projects and folders. Before you add a Model repository, verify the following prerequisites:
- An administrator has configured a Model Repository Service in the Administrator tool.
- You have a user name and password to access the Model Repository Service. You can get this information from an administrator.

1. Click File > Connect to Repository. The Connect to Repository dialog box appears.
2. Click Browse to select a Model Repository Service.
3. Click OK.
4. Click Next.
5. Enter your user name and password.
6. Click Finish. The Model Repository appears in the Object Explorer view.

Connecting to a Model Repository


Each time you open the Developer tool, you connect to a Model repository to access projects and folders. When you connect to a Model repository, you enter connection information to access the domain that includes the Model Repository Service that manages the Model repository.
1. In the Object Explorer view, right-click a Model repository and click Connect. The Connect to Repository dialog box appears.
2. Enter the domain user name and password.
3. Click OK. The Developer tool connects to the Model repository. The Developer tool displays the projects in the repository.

Projects
A project is the top-level container that you use to store folders and objects in the Developer tool. Use projects to organize and manage the objects that you want to use for data services and data quality solutions. You manage and view projects in the Object Explorer view. When you create a project, the Developer tool stores the project in the Model repository. Each project that you create also appears in the Analyst tool. The following table describes the tasks that you can perform on a project:
- Manage projects: Manage project contents. You can create, duplicate, rename, and delete a project. You can view project contents.
- Manage folders: Organize project contents in folders. You can create, duplicate, rename, move, and delete folders within projects.
- Manage objects: You can view object contents, duplicate, rename, move, and delete objects in a project or in a folder within a project.
- Search projects: You can search for folders or objects in projects. You can view search results and select an object from the results to view its contents.
- Assign permissions: You can add users to a project. You can assign the read, write, and grant permissions to users on a project to restrict or provide access to objects within the project.
- Share projects: Share project contents to collaborate with other users on the project. The contents of a shared project are available for other users to use. For example, when you create a profile in the project Customers_West, you can add a physical data object from the shared folder Customers_East to the profile.

Creating a Project
Create a project to store objects and folders.
1. Select a Model Repository Service in the Object Explorer view.
2. Click File > New > Project. The New Project dialog box appears.
3. Enter a name for the project.
4. Click Shared if you want to use objects in this project in other projects.
5. Click Finish. The project appears under the Model Repository Service in the Object Explorer view.


Assigning Permissions
You can add users to a project and assign permissions for the user. Assign permissions to determine the tasks that users can complete on a project and objects in the project.
1. Select a project in the Object Explorer view.
2. Click File > Permissions. The Permissions dialog box appears.
3. Click Add to add a user and assign permissions for the user. The Domain Users dialog box appears. The dialog box shows a list of users.
4. To filter the list of users, enter a name or string. Optionally, use the wildcard characters in the filter.
5. To filter by security domain, click the Filter by Security Domain button.
6. Select Native to show users in the native security domain. Or, select All to show all users.
7. Select a user and click OK. The user appears with the list of users in the Permissions dialog box.
8. Select Allow or Deny for each permission for the user.
9. Click OK.

Folders
Use folders to organize objects in a project. Create folders to group objects based on business needs. For example, you can create a folder to group objects for a particular task in a project. You can create a folder in a project or in another folder. Folders appear within projects in the Object Explorer view. A folder can contain other folders, data objects, and object types. You can perform the following tasks on a folder:
- Create a folder.
- View a folder.
- Rename a folder.
- Duplicate a folder.
- Move a folder.
- Delete a folder.

Creating a Folder
Create a folder to store related objects in a project. You must create the folder in a project or another folder.
1. In the Object Explorer view, select the project or folder where you want to create a folder.
2. Click File > New > Folder. The New Folder dialog box appears.
3. Enter a name for the folder.
4. Click Finish. The folder appears under the project or parent folder.

Search
You can search for objects and object properties in the Developer tool. You can create a search query and then filter the search results. You can view search results and select an object from the results to view its contents. Search results appear on the Search view. You can use the following search options:
- Containing text: Object or property that you want to search for. Enter an exact string or use a wildcard. Not case sensitive.
- Name patterns: One or more objects that contain the name pattern. Enter an exact string or use a wildcard. Not case sensitive.
- Search for: One or more object types to search for.
- Scope: Search the workspace or an object that you selected.

The Model Repository Service uses a search analyzer to index the metadata in the Model repository. The Developer tool uses the search analyzer to perform searches on objects contained in projects in the Model repository. You must save an object before you can search on it. You can search in different languages. To search in a different language, an administrator must change the search analyzer and configure the Model repository to use the search analyzer.

Searching for Objects and Properties


Search for objects and properties in the Model repository.
1. Click Search > Search. The Search dialog box appears.
2. Enter the object or property you want to search for. Optionally, include wildcard characters.
3. If you want to search for a property in an object, optionally enter one or more name patterns separated by a comma.
4. Optionally, choose the object types you want to search for.
5. Choose to search the workspace or the object you selected.
6. Click Search. The search results appear in the Search view.
7. In the Search view, double-click an object to open it in the editor.


Configuring Validation Preferences


Configure validation preferences to set error limits and limit the number of visible items per group.
1. Click Window > Preferences. The Preferences dialog box appears.
2. Select Informatica > Validation.
3. Optionally, select Use Error Limits.
4. Enter a value for Limit visible items per group to. Default is 100.
5. To restore the default values, click Restore Defaults.
6. Click Apply.
7. Click OK.

Copy
You can copy objects within a project or to a different project. You can also copy objects to folders in the same project or to folders in a different project. You can also copy an object as a link to view the object in the Analyst tool or to provide a link to the object in another medium, such as an email message. You can copy the following objects to another project or folder or as a link:
- Application
- Data service
- Logical data object model
- Mapping
- Mapplet
- Physical data object
- Profile
- Reference table
- Reusable transformation
- Rule
- Scorecard
- Virtual stored procedure

Use the following guidelines when you copy objects:


- You can copy segments of mappings, mapplets, rules, and virtual stored procedures.
- You can copy a folder to another project.
- You can copy a logical data object as a link.
- You can paste an object multiple times after you copy it.
- If the project or folder contains an object with the same name, you can rename or replace the object.


Copying an Object
Copy an object to make it available in another project or folder.
1. Select an object in a project or folder.
2. Click Edit > Copy.
3. Select the project or folder that you want to copy the object to.
4. Click Edit > Paste.

Copying an Object as a Link


Copy an object as a link to view the object in the Analyst tool. You can paste the link into a web browser or in another medium, such as a document or an email message. When you click the link, it opens the Analyst tool in the default web browser configured for the machine. You must log in to the Analyst tool to access the object.
1. Right-click an object in a project or folder.
2. Click Copy as Link.
3. Paste the link into another application, such as Microsoft Internet Explorer or an email message.


CHAPTER 2

Connections
This chapter includes the following topics:
- Connections Overview
- DB2 for i5/OS Connection Properties
- DB2 for z/OS Connection Properties
- IBM DB2 Connection Properties
- IMS Connection Properties
- Microsoft SQL Server Connection Properties
- ODBC Connection Properties
- Oracle Connection Properties
- SAP Connection Properties
- Sequential Connection Properties
- VSAM Connection Properties
- Connection Explorer View
- Creating a Connection

Connections Overview
A connection is a repository object that defines a connection in the domain configuration repository. Create a connection to import relational or nonrelational data objects, preview data, profile data, and run mappings. The Developer tool uses the connection when you import a data object. The Data Integration Service uses the connection when you preview data or run mappings. The Developer tool stores connections in the Model repository. Any connection that you create in the Developer tool is available in the Analyst tool or the Administrator tool. Create and manage connections in the connection preferences or in the Connection Explorer view. You can create the following types of connection:
- DB2/I5OS
- DB2/ZOS
- IBM DB2
- IMS
- Microsoft SQL Server
- ODBC
- Oracle
- SAP
- Sequential
- VSAM

DB2 for i5/OS Connection Properties


Use a DB2 for i5/OS connection to access tables in DB2 for i5/OS. The Data Integration Service connects to DB2 for i5/OS through PowerExchange. The following table describes the DB2 for i5/OS connection properties:
Database Name: Name of the database instance.
Location: Location of the PowerExchange Listener node that can connect to DB2. The location is defined in the first parameter of the NODE statement in the PowerExchange dbmover.cfg configuration file.
Username: Database user name.
Password: Password for the user name.
Environment SQL: SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Database File Overrides: Specifies the i5/OS database file override. The format is:
from_file/to_library/to_file/to_member
Where:
- from_file is the file to be overridden
- to_library is the new library to use
- to_file is the file in the new library to use
- to_member is optional and is the member in the new library and file to use. *FIRST is used if nothing is specified.
You can specify up to 8 unique file overrides on a single connection. A single override applies to a single source or target. When you specify more than one file override, enclose the string of file overrides in double quotes and include a space between each file override.
Note: If you specify both Library List and Database File Overrides and a table exists in both, the Database File Overrides takes precedence.
Library List: List of libraries that PowerExchange searches to qualify the table name for Select, Insert, Delete, or Update statements. PowerExchange searches the list if the table name is unqualified. Separate libraries with semicolons.
Note: If you specify both Library List and Database File Overrides and a table exists in both, Database File Overrides takes precedence.
Code Page: Database code page.
SQL identifier character: The type of character used for the Support Mixed-Case Identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
Isolation Level: Commit scope of the transaction. Select one of the following values:
- None
- CS. Cursor stability.
- RR. Repeatable Read.
- CHG. Change.
- ALL
Default is CS.
Encryption Type: Type of encryption that the Data Integration Service uses. Select one of the following values:
- None
- RC2
- DES
Default is None.
Level: Level of encryption that the Data Integration Service uses. If you select RC2 or DES for Encryption Type, select one of the following values to indicate the encryption level:
- 1. Uses a 56-bit encryption key for DES and RC2.
- 2. Uses 168-bit triple encryption key for DES. Uses a 64-bit encryption key for RC2.
- 3. Uses 168-bit triple encryption key for DES. Uses a 128-bit encryption key for RC2.
Ignored if you do not select an encryption type. Default is 1.
Pacing Size: Amount of data the source system can pass to the PowerExchange Listener. Configure the pacing size if an external application, database, or the Data Integration Service node is a bottleneck. The lower the value, the faster the performance. Minimum value is 0. Enter 0 for maximum performance. Default is 0.
Interpret as Rows: Interprets the pacing size as rows or kilobytes. Select to represent the pacing size in number of rows. If you clear this option, the pacing size represents kilobytes. Default is Disabled.
Compression: Select to compress source data when reading from the database.
Array Size: Number of records of the storage array size for each thread. Use if the number of worker threads is greater than 0. Default is 25.
Write Mode: Mode in which Data Integration Service sends data to the PowerExchange Listener. Configure one of the following write modes:
- CONFIRMWRITEON. Sends data to the PowerExchange Listener and waits for a response before sending more data. Select if error recovery is a priority. This option might decrease performance.
- CONFIRMWRITEOFF. Sends data to the PowerExchange Listener without waiting for a response. Use this option when you can reload the target table if an error occurs.
- ASYNCHRONOUSWITHFAULTTOLERANCE. Sends data to the PowerExchange Listener without waiting for a response. This option also provides the ability to detect errors. This provides the speed of confirm write off with the data integrity of confirm write on.
Default is CONFIRMWRITEON.
Async Reject File: Overrides the default prefix of PWXR for the reject file. PowerExchange creates the reject file on the target machine when the write mode is asynchronous with fault tolerance. Specifying PWXDISABLE prevents the creation of the reject files.
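As a minimal sketch of the Database File Overrides format, a single override that redirects a hypothetical file CUSTFILE to a hypothetical library TESTLIB might look like the following; the file and library names are placeholders, not values from your environment:
CUSTFILE/TESTLIB/CUSTFILE/*FIRST
To specify two hypothetical overrides, enclose the string in double quotes and separate the overrides with a space, for example "CUSTFILE/TESTLIB/CUSTFILE ORDFILE/TESTLIB/ORDFILE".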

DB2 for z/OS Connection Properties


Use a DB2 for z/OS connection to access tables in DB2 for z/OS. The Data Integration Service connects to DB2 for z/OS through PowerExchange. The following table describes the DB2 for z/OS connection properties:
DB2 Subsystem ID: Name of the DB2 subsystem.
Location: Location of the PowerExchange Listener node that can connect to DB2. The location is defined in the first parameter of the NODE statement in the PowerExchange dbmover.cfg configuration file.
Username: Database user name.
Password: Password for the user name.
Environment SQL: SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Correlation ID: Value to be concatenated to prefix PWX to form the DB2 correlation ID for DB2 requests.
Code Page: Database code page.
SQL identifier character: The type of character used for the Support Mixed-Case Identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
Encryption Type: Type of encryption that the Data Integration Service uses. Select one of the following values:
- None
- RC2
- DES
Default is None.
Level: Level of encryption that the Data Integration Service uses. If you select RC2 or DES for Encryption Type, select one of the following values to indicate the encryption level:
- 1. Uses a 56-bit encryption key for DES and RC2.
- 2. Uses 168-bit triple encryption key for DES. Uses a 64-bit encryption key for RC2.
- 3. Uses 168-bit triple encryption key for DES. Uses a 128-bit encryption key for RC2.
Ignored if you do not select an encryption type. Default is 1.
Pacing Size: Amount of data the source system can pass to the PowerExchange Listener. Configure the pacing size if an external application, database, or the Data Integration Service node is a bottleneck. The lower the value, the faster the performance. Minimum value is 0. Enter 0 for maximum performance. Default is 0.
Interpret as Rows: Interprets the pacing size as rows or kilobytes. Select to represent the pacing size in number of rows. If you clear this option, the pacing size represents kilobytes. Default is Disabled.
Compression: Select to compress source data when reading from the database.
Offload Processing: Moves data processing for bulk data from the source system to the Data Integration Service machine. Default is No.
Worker Threads: Number of threads that the Data Integration Service uses on the Data Integration Service machine to process data. For optimal performance, do not exceed the number of installed or available processors on the Data Integration Service machine. Default is 0.
Array Size: Number of records of the storage array size for each thread. Use if the number of worker threads is greater than 0. Default is 25.
Write Mode: Configure one of the following write modes:
- CONFIRMWRITEON. Sends data to the PowerExchange Listener and waits for a response before sending more data. Select if error recovery is a priority. This option might decrease performance.
- CONFIRMWRITEOFF. Sends data to the PowerExchange Listener without waiting for a response. Use this option when you can reload the target table if an error occurs.
- ASYNCHRONOUSWITHFAULTTOLERANCE. Sends data to the PowerExchange Listener without waiting for a response. This option also provides the ability to detect errors. This provides the speed of confirm write off with the data integrity of confirm write on.
Default is CONFIRMWRITEON.
Async Reject File: Overrides the default prefix of PWXR for the reject file. PowerExchange creates the reject file on the target machine when the write mode is asynchronous with fault tolerance. Specifying PWXDISABLE prevents the creation of the reject files.

IBM DB2 Connection Properties


Use an IBM DB2 connection to access tables in an IBM DB2 database. The following table describes the IBM DB2 connection properties:
User name: Database user name.
Password: Password for the user name.
Connection String for metadata access: Connection string to import physical data objects. Use the following connection string:
jdbc:informatica:db2://<host>:50000;databaseName=<dbname>
Connection String for data access: Connection string to preview data and run mappings. Enter dbname from the alias configured in the DB2 client.
Code Page: Database code page.
Environment SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Transaction SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the transaction environment SQL at the beginning of each transaction.
Retry Period: Number of seconds the Data Integration Service attempts to reconnect to the database if the connection fails. If the Data Integration Service cannot connect to the database in the retry period, the session fails. Default is 0.
Tablespace: Tablespace name of the IBM DB2 database.
SQL identifier character: The type of character used for the Support Mixed-Case Identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
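As a hedged illustration, for a hypothetical host named db2host and a database named SALESDB, the metadata access connection string might look like the following, and the data access connection string would then be the DB2 client alias, for example SALESDB. The host and database names are placeholders only; substitute the values for your environment:
jdbc:informatica:db2://db2host:50000;databaseName=SALESDB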


IMS Connection Properties


Use an IMS connection to access an IMS database. The Data Integration Service connects to IMS through PowerExchange. The following table describes the IMS connection properties:
Location: Location of the PowerExchange Listener node that can connect to IMS. The location is defined in the first parameter of the NODE statement in the PowerExchange dbmover.cfg configuration file.
User Name: Database user name.
Password: Password for the database user name.
Code Page: Required. Code to read from or write to the database. Use the ISO code page name, such as ISO-8859-6. The code page name is not case sensitive.
Encryption Type: Type of encryption that the Data Integration Service uses. Select one of the following values:
- None
- RC2
- DES
Default is None.
Encryption Level: Level of encryption that the Data Integration Service uses. If you select RC2 or DES for Encryption Type, select one of the following values to indicate the encryption level:
- 1. Uses a 56-bit encryption key for DES and RC2.
- 2. Uses 168-bit triple encryption key for DES. Uses a 64-bit encryption key for RC2.
- 3. Uses 168-bit triple encryption key for DES. Uses a 128-bit encryption key for RC2.
Ignored if you do not select an encryption type. Default is 1.
Pacing Size: Amount of data the source system can pass to the PowerExchange Listener. Configure the pacing size if an external application, database, or the Data Integration Service node is a bottleneck. The lower the value, the faster the performance. Minimum value is 0. Enter 0 for maximum performance. Default is 0.
Interpret as Rows: Interprets the pacing size as rows or kilobytes. Select to represent the pacing size in number of rows. If you clear this option, the pacing size represents kilobytes. Default is Disabled.
Compression: Optional. Compresses the data to decrease the amount of data Informatica applications write over the network. True or false. Default is false.
OffLoad Processing: Optional. Moves bulk data processing from the IMS source to the Data Integration Service machine. Enter one of the following values:
- Auto. The Data Integration Service determines whether to use offload processing.
- Yes. Use offload processing.
- No. Do not use offload processing.
Default is Auto.
Worker Threads: Number of threads that the Data Integration Service uses to process bulk data when offload processing is enabled. For optimal performance, this value should not exceed the number of available processors on the Data Integration Service machine. Valid values are 1 through 64. Default is 0, which disables multithreading.
Array Size: Determines the number of records in the storage array for the threads when the worker threads value is greater than 0. Valid values are from 1 through 100000. Default is 25.
Write Mode: Mode in which Data Integration Service sends data to the PowerExchange Listener. Configure one of the following write modes:
- CONFIRMWRITEON. Sends data to the PowerExchange Listener and waits for a response before sending more data. Select if error recovery is a priority. This option might decrease performance.
- CONFIRMWRITEOFF. Sends data to the PowerExchange Listener without waiting for a response. Use this option when you can reload the target table if an error occurs.
- ASYNCHRONOUSWITHFAULTTOLERANCE. Sends data to the PowerExchange Listener without waiting for a response. This option also provides the ability to detect errors. This provides the speed of confirm write off with the data integrity of confirm write on.
Default is CONFIRMWRITEON.

Microsoft SQL Server Connection Properties


Use a Microsoft SQL Server connection to access tables in a Microsoft SQL Server database. The following table describes the Microsoft SQL Server connection properties:
User name: Database user name.
Password: Password for the user name.
Use Trusted Connection: Optional. When enabled, the Data Integration Service uses Windows authentication to access the Microsoft SQL Server database. The user name that starts the Data Integration Service must be a valid Windows user with access to the Microsoft SQL Server database.
Connection String for metadata access: Connection string to import physical data objects. Use the following connection string:
jdbc:informatica:sqlserver://<host>:<port>;databaseName=<dbname>
Connection String for data access: Connection string to preview data and run mappings. Enter
<ServerName>@<DBName>
Domain Name: Optional. Name of the domain where Microsoft SQL Server is running.
Packet Size: Required. Optimize the ODBC connection to Microsoft SQL Server. Increase the packet size to increase performance. Default is 0.
Code Page: Database code page.
Environment SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Transaction SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the transaction environment SQL at the beginning of each transaction.
Retry Period: Number of seconds the Data Integration Service attempts to reconnect to the database if the connection fails. If the Data Integration Service cannot connect to the database in the retry period, the session fails. Default is 0.
SQL identifier character: The type of character used for the Support Mixed-Case Identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
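As a hedged illustration, for a hypothetical server named sqlhost listening on port 1433 and a database named SALESDB, the metadata access and data access connection strings might look like the following. The server, port, and database names are placeholders only:
jdbc:informatica:sqlserver://sqlhost:1433;databaseName=SALESDB
sqlhost@SALESDB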

ODBC Connection Properties


Use an ODBC connection to access tables in a database through ODBC. The following table describes the ODBC connection properties:
User name: Database user name.
Password: Password for the user name.
Connection String: Connection string to connect to the database.
Code Page: Database code page.
Environment SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Transaction SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the transaction environment SQL at the beginning of each transaction.
Retry Period: Number of seconds the Data Integration Service attempts to reconnect to the database if the connection fails. If the Data Integration Service cannot connect to the database in the retry period, the session fails. Default is 0.
SQL identifier character: Type of character used for the Support mixed-case identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and the Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
ODBC Provider: Type of database that ODBC connects to. For pushdown optimization, specify the database type to enable the Data Integration Service to generate native database SQL. Default is Other.

Oracle Connection Properties


Use an Oracle connection to access tables in an Oracle database. The following table describes the Oracle connection properties:
User name: Database user name.
Password: Password for the user name.
Connection String for metadata access: Connection string to import physical data objects. Use the following connection string:
jdbc:informatica:oracle://<host>:1521;SID=<sid>
Connection String for data access: Connection string to preview data and run mappings. Enter dbname.world from the TNSNAMES entry.
Code Page: Database code page.
Environment SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.
Transaction SQL: Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the transaction environment SQL at the beginning of each transaction.
Retry Period: Number of seconds the Data Integration Service attempts to reconnect to the database if the connection fails. If the Data Integration Service cannot connect to the database in the retry period, the session fails. Default is 0.
Parallel Mode: Optional. Enables parallel processing when loading data into a table in bulk mode. Default is disabled.
SQL identifier character: The type of character used for the Support Mixed-Case Identifiers property. Select the character based on the database in the connection.
Support mixed-case identifiers: Enables the Developer tool and Analyst tool to place quotes around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. Also, use if the object names contain SQL keywords, such as WHERE.
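As a hedged illustration, for a hypothetical host named orahost and an Oracle SID of ORCL, the metadata access connection string might look like the following, and the data access connection string might be ORCL.world if that is the TNSNAMES entry. The host name and SID are placeholders only:
jdbc:informatica:oracle://orahost:1521;SID=ORCL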


SAP Connection Properties


The following table describes the SAP connection properties:
User name: SAP source system connection user name.
Password: Password for the user name.
Trace: Select this option to track the RFC calls that the SAP system makes. SAP stores the information about the RFC calls in a trace file. You can access the trace files from the server/bin directory on the Informatica server machine and the client/bin directory on the client machine.
Connection type: Select Type A to connect to one SAP system. Select Type B when you want to use SAP load balancing.
Host name: Host name or IP address of the SAP server. Informatica uses this entry to connect to the SAP server.
R3 name: Name of the SAP system.
Group: Group name of the SAP application server.
System number: SAP system number.
Client number: SAP client number.
Language: Language that you want for the mapping. Must be compatible with the Developer tool code page. If you leave this option blank, Informatica uses the default language of the SAP system.
Code page: Code page compatible with the SAP server. Must also correspond to the language code.
Staging directory: Path in the SAP system where the staging file will be created.
Source directory: The Data Integration Service path containing the source file.
Use FTP: Enables FTP access to SAP.
FTP user: User name to connect to the FTP server.
FTP password: Password for the FTP user.
FTP host: Host name or IP address of the FTP server. Optionally, you can specify a port number from 1 through 65535, inclusive. Default for FTP is 21. Use the following syntax to specify the host name:
hostname:port_number
or
IP address:port_number
When you specify a port number, enable that port number for FTP on the host machine. If you enable SFTP, specify a host name or port number for an SFTP server. Default for SFTP is 22.
Retry period: Number of seconds that the Data Integration Service attempts to reconnect to the FTP host if the connection fails. If the Data Integration Service cannot reconnect to the FTP host in the retry period, the session fails. Default value is 0 and indicates an infinite retry period.
Use SFTP: Enables SFTP access to SAP.
Public key file name: Public key file path and file name. Required if the SFTP server uses public key authentication. Enabled for SFTP.
Private key file name: Private key file path and file name. Required if the SFTP server uses public key authentication. Enabled for SFTP.
Private key file name password: Private key file password used to decrypt the private key file. Required if the SFTP server uses public key authentication and the private key is encrypted. Enabled for SFTP.
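As a hedged illustration of the FTP host syntax, a hypothetical host entry that uses a non-default port might look like the following; the host name and port are placeholders only:
ftpserver01:2121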

Sequential Connection Properties


Use a sequential connection to access z/OS sequential data sets. The Data Integration Service connects to the data sets through PowerExchange. The following table describes the sequential connection properties:
Code Page: Required. Code to read from or write to the sequential data set. Use the ISO code page name, such as ISO-8859-6. The code page name is not case sensitive.
Array Size: Determines the number of records in the storage array for the threads when the worker threads value is greater than 0. Valid values are from 1 through 100000. Default is 25.
Compression: Compresses the data to decrease the amount of data Informatica applications write over the network. True or false. Default is false.
Encryption Level: Level of encryption that the Data Integration Service uses. If you select RC2 or DES for Encryption Type, select one of the following values to indicate the encryption level:
- 1. Uses a 56-bit encryption key for DES and RC2.
- 2. Uses 168-bit triple encryption key for DES. Uses a 64-bit encryption key for RC2.
- 3. Uses 168-bit triple encryption key for DES. Uses a 128-bit encryption key for RC2.
Ignored if you do not select an encryption type. Default is 1.
Encryption Type: Type of encryption that the Data Integration Service uses. Select one of the following values:
- None
- RC2
- DES
Default is None.
Interpret as Rows: Interprets the pacing size as rows or kilobytes. Select to represent the pacing size in number of rows. If you clear this option, the pacing size represents kilobytes. Default is Disabled.
Location: Location of the PowerExchange Listener node that can connect to the data object. The location is defined in the first parameter of the NODE statement in the PowerExchange dbmover.cfg configuration file.
OffLoad Processing: Moves bulk data processing from the source machine to the Data Integration Service machine. Enter one of the following values:
- Auto. The Data Integration Service determines whether to use offload processing.
- Yes. Use offload processing.
- No. Do not use offload processing.
Default is Auto.
Pacing Size: Amount of data that the source system can pass to the PowerExchange Listener. Configure the pacing size if an external application, database, or the Data Integration Service node is a bottleneck. The lower the value, the faster the performance. Minimum value is 0. Enter 0 for maximum performance. Default is 0.
Worker Threads: Number of threads that the Data Integration Service uses to process bulk data when offload processing is enabled. For optimal performance, this value should not exceed the number of available processors on the Data Integration Service machine. Valid values are 1 through 64. Default is 0, which disables multithreading.
Write Mode: Mode in which Data Integration Service sends data to the PowerExchange Listener. Configure one of the following write modes:
- CONFIRMWRITEON. Sends data to the PowerExchange Listener and waits for a response before sending more data. Select if error recovery is a priority. This option might decrease performance.
- CONFIRMWRITEOFF. Sends data to the PowerExchange Listener without waiting for a response. Use this option when you can reload the target table if an error occurs.
- ASYNCHRONOUSWITHFAULTTOLERANCE. Sends data to the PowerExchange Listener without waiting for a response. This option also provides the ability to detect errors. This provides the speed of confirm write off with the data integrity of confirm write on.
Default is CONFIRMWRITEON.

VSAM Connection Properties


Use a VSAM connection to connect to a VSAM data set. The following table describes the VSAM connection properties:
Code Page: Required. Code to read from or write to the VSAM file. Use the ISO code page name, such as ISO-8859-6. The code page name is not case sensitive.
Array Size: Determines the number of records in the storage array for the threads when the worker threads value is greater than 0. Valid values are from 1 through 100000. Default is 25.
Compression: Compresses the data to decrease the amount of data Informatica applications write over the network. True or false. Default is false.
Encryption Level: Level of encryption that the Data Integration Service uses. If you select RC2 or DES for Encryption Type, select one of the following values to indicate the encryption level:
- 1. Uses a 56-bit encryption key for DES and RC2.
- 2. Uses 168-bit triple encryption key for DES. Uses a 64-bit encryption key for RC2.
- 3. Uses 168-bit triple encryption key for DES. Uses a 128-bit encryption key for RC2.
Ignored if you do not select an encryption type. Default is 1.
Encryption Type: Enter one of the following values for the encryption type:
- None
- RC2
- DES
Default is None.
Interpret as Rows: Interprets the pacing size as rows or kilobytes. Select to represent the pacing size in number of rows. If you clear this option, the pacing size represents kilobytes. Default is Disabled.
Location: Location of the PowerExchange Listener node that can connect to the VSAM file. The location is defined in the first parameter of the NODE statement in the PowerExchange dbmover.cfg configuration file.
OffLoad Processing: Moves bulk data processing from the VSAM source to the Data Integration Service machine. Enter one of the following values:
- Auto. The Data Integration Service determines whether to use offload processing.
- Yes. Use offload processing.
- No. Do not use offload processing.
Default is Auto.
Pacing Size: Amount of data the source system can pass to the PowerExchange Listener. Configure the pacing size if an external application, database, or the Data Integration Service node is a bottleneck. The lower the value, the faster the performance. Minimum value is 0. Enter 0 for maximum performance. Default is 0.
Worker Threads: Number of threads that the Data Integration Service uses to process bulk data when offload processing is enabled. For optimal performance, this value should not exceed the number of available processors on the Data Integration Service machine. Valid values are 1 through 64. Default is 0, which disables multithreading.
Write Mode: Mode in which Data Integration Service sends data to the PowerExchange Listener. Configure one of the following write modes:
- CONFIRMWRITEON. Sends data to the PowerExchange Listener and waits for a response before sending more data. Select if error recovery is a priority. This option might decrease performance.
- CONFIRMWRITEOFF. Sends data to the PowerExchange Listener without waiting for a response. Use this option when you can reload the target table if an error occurs.
- ASYNCHRONOUSWITHFAULTTOLERANCE. Sends data to the PowerExchange Listener without waiting for a response. This option also provides the ability to detect errors. This provides the speed of confirm write off with the data integrity of confirm write on.
Default is CONFIRMWRITEON.

Connection Explorer View


Use the Connection Explorer view to view relational database connections and to create relational data objects. You can complete the following tasks in the Connection Explorer view:
- Add a connection to the view. Click the Select Connection button to choose one or more connections to add to the Connection Explorer view.
- Connect to a relational database. Right-click a relational database and click Connect.
- Disconnect from a relational database. Right-click a relational database and click Disconnect.
- Create a relational data object. After you connect to a relational database, expand the database to view tables. Right-click a table and click Add to Project to open the New Relational Data Object dialog box.
- Refresh a connection. Right-click a connection and click Refresh.
- Show only the default schema. Right-click a connection and click Show Default Schema Only. Default is enabled.
- Delete a connection from the Connection Explorer view. The connection remains in the Model repository. Right-click a connection and click Delete.

Note: When you use a Microsoft SQL Server connection to access tables in a Microsoft SQL Server database, the Developer tool does not display the synonyms for the tables.

Creating a Connection
Create a connection before you import relational data objects, preview data, profile data, or run mappings.
1. Click Window > Preferences.
2. Select Informatica > Connections.
3. Expand the domain.
4. Select Databases and click Add.
5. Enter a connection name.
6. Optionally, enter a connection description.
7. Select the type of database that you want to connect to.
8. Click Next.
9. Configure the connection properties.
10. Click Test Connection to verify that you entered the connection properties correctly and that you can connect to the database.
11. Click Finish.

After you create a connection, you can add it to the Connection Explorer view.


CHAPTER 3

Physical Data Objects


This chapter includes the following topics:
- Physical Data Objects Overview
- Relational Data Objects
- Customized Data Objects
- Nonrelational Data Objects
- Flat File Data Objects
- SAP Data Objects
- Synchronization
- Troubleshooting Physical Data Objects

Physical Data Objects Overview


A physical data object is the representation of data that is based on a flat file, relational database, nonrelational database, or SAP resource. Create a physical data object to read data from resources or write data to resources. A physical data object can be one of the following types:
- Relational data object. A physical data object that uses a relational table, view, or synonym as a source. For example, you can create a relational data object from a DB2 i5/OS table or an Oracle view.
- Customized data object. A physical data object that uses one or multiple related relational resources or relational data objects as sources. Relational resources include tables, views, and synonyms. For example, you can create a customized data object from two Microsoft SQL Server tables that have a primary key-foreign key relationship. Create a customized data object if you want to perform operations such as joining data, filtering rows, sorting ports, or running custom queries when the Data Integration Service reads source data.
- Nonrelational data object. A physical data object that uses a nonrelational database resource as a source. For example, you can create a nonrelational data object from a VSAM source.
- Flat file data object. A physical data object that uses a flat file as a source. You can create a flat file data object from a delimited or fixed-width flat file.
- SAP data object. A physical data object that uses an SAP source.

If the data object source changes, you can synchronize the physical data object. When you synchronize a physical data object, the Developer tool reimports the object metadata.

You can create any physical data object in a project or folder. Physical data objects in projects and folders are reusable objects. You can use them in any type of mapping, mapplet, or profile, but you cannot change the data object within the mapping, mapplet, or profile. To update the physical data object, you must edit the object within the project or folder.

You can include a physical data object in a mapping, mapplet, or profile. You can add a physical data object to a mapping or mapplet as a read, write, or lookup transformation. You can add a physical data object to a logical data object mapping to map logical data objects. You can also include a physical data object in a virtual table mapping when you define an SQL data service.

Relational Data Objects


Import a relational data object to include in a mapping, mapplet, or profile. A relational data object is a physical data object that uses a relational table, view, or synonym as a source. You can create primary key-foreign key relationships between relational data objects. You can create key relationships between relational data objects whether or not the relationships exist in the source database. You can include relational data objects in mappings and mapplets. You can add a relational data object to a mapping or mapplet as a read, write, or lookup transformation. You can add multiple relational data objects to a mapping or mapplet as sources. When you add multiple relational data objects at the same time, the Developer tool prompts you to add the objects in either of the following ways:
- As related data objects. The Developer tool creates one read transformation. The read transformation has the same capabilities as a customized data object.
- As independent data objects. The Developer tool creates one read transformation for each relational data object. The read transformations have the same capabilities as relational data objects.

You can import the following types of relational data object:
- DB2 for i5/OS
- DB2 for z/OS
- IBM DB2
- Microsoft SQL Server
- ODBC
- Oracle
Key Relationships
You can create key relationships between relational data objects. Key relationships allow you to join relational data objects when you use them as sources in a customized data object or as read transformations in a mapping or mapplet. When you import relational data objects, the Developer tool retains the primary key information defined in the database. When you import related relational data objects at the same time, the Developer tool also retains foreign keys and key relationships. However, if you import related relational data objects separately, you must re-create the key relationships after you import the objects.


To create key relationships between relational data objects, first create a primary key in the referenced object. Then create the relationship in the relational data object that contains the foreign key. The key relationships that you create exist in the relational data object metadata. You do not need to alter the source relational resources.

Creating Keys in a Relational Data Object


Create key columns to identify each row in a relational data object. You can create one primary key in each relational data object.
1. Open the relational data object.
2. Select the Keys view.
3. Click Add. The New Key dialog box appears.
4. Enter a key name.
5. If the key is a primary key, select Primary Key.
6. Select the key columns.
7. Click OK.
8. Save the relational data object.

Creating Relationships between Relational Data Objects


You can create key relationships between relational data objects. You cannot create key relationships between a relational data object and a customized data object. The relational data object that you reference must have a primary key.
1. Open the relational data object where you want to create a foreign key.
2. Select the Relationships view.
3. Click Add. The New Relationship dialog box appears.
4. Enter a name for the foreign key.
5. Select a primary key from the referenced relational data object.
6. Click OK.
7. In the Relationships properties, select the foreign key columns.
8. Save the relational data object.

Creating a Read Transformation from Relational Data Objects


You can add a relational data object to a mapping or mapplet as a read transformation. When you add multiple relational data objects at the same time, you can add them as related or independent objects.
1. Open the mapping or mapplet in which you want to create a read transformation.
2. In the Object Explorer view, select one or more relational data objects.
3. Drag the relational data objects into the mapping editor. The Add to Mapping dialog box appears.
4. Select the Read option.
5. If you add multiple data objects, select one of the following options:
- As related data objects. The Developer tool creates one read transformation. The read transformation has the same capabilities as a customized data object.
- As independent data objects. The Developer tool creates one read transformation for each relational data object. Each read transformation has the same capabilities as a relational data object.
6. If the relational data objects use different connections, select the default connection.
7. Click OK. The Developer tool creates one or multiple read transformations in the mapping or mapplet.

Importing a Relational Data Object


Import a relational data object to add to a mapping, mapplet, or profile. Before you import a relational data object, you must configure a connection to the database.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object. The New dialog box appears.
3. Select Relational Data Object and click Next. The New Relational Data Object dialog box appears.
4. Click Browse next to the Connection option and select a connection to the database.
5. Click Create data object from existing resource.
6. Click Browse next to the Resource option and select the table, view, or synonym that you want to import.
7. Enter a name for the physical data object.
8. Click Browse next to the Location option and select the project where you want to import the relational data object.
9. Click Finish. The data object appears under Physical Data Objects in the project or folder in the Object Explorer view.

Customized Data Objects


Create a customized data object to include in a mapping, mapplet, or profile. Customized data objects are physical data objects that use relational resources as sources. Customized data objects allow you to perform tasks that relational data objects do not allow you to perform, such as joining data from related resources and filtering rows. When you create a customized data object, the Data Integration Service generates a default SQL query that it uses to read data from the source relational resources. The default query is a SELECT statement for each column that it reads from the sources. Create a customized data object to perform the following tasks:
- Join source data that originates from the same source database. You can join multiple tables with primary key-foreign key relationships whether or not the relationships exist in the database.
- Select distinct values from the source. If you choose Select Distinct, the Data Integration Service adds a SELECT DISTINCT statement to the default SQL query.
- Filter rows when the Data Integration Service reads source data. If you include a filter condition, the Data Integration Service adds a WHERE clause to the default query.
- Specify sorted ports. If you specify a number for sorted ports, the Data Integration Service adds an ORDER BY clause to the default SQL query.
- Specify an outer join instead of the default inner join. If you include a user-defined join, the Data Integration Service replaces the join information specified by the metadata in the SQL query.
- Create a custom query to issue a special SELECT statement for the Data Integration Service to read source data. The custom query replaces the default query that the Data Integration Service uses to read data from sources.
- Add pre- and post-mapping SQL commands. The Data Integration Service runs pre-mapping SQL commands against the source database before it reads the source. It runs post-mapping SQL commands against the source database after it writes to the target.
- Define parameters for the data object. You can define and assign parameters in a customized data object to represent connections. When you run a mapping that uses the customized data object, you can define different values for the connection parameters at runtime.
- Retain key relationships when you synchronize the object with the sources. If you create a customized data object that contains multiple tables, and you define key relationships that do not exist in the database, you can retain the relationships when you synchronize the data object.

You can create customized data objects in projects and folders. The customized data objects that you create in projects and folders are reusable. You can use them in multiple mappings, mapplets, and profiles. You cannot change them from within a mapping, mapplet, or profile. If you change a customized data object in a project or folder, the Developer tool updates the object in all mappings, mapplets, and profiles that use the object.

You can create customized data objects from the following types of connections and objects:
- DB2 i5/OS connections
- DB2 z/OS connections
- IBM DB2 connections
- Microsoft SQL Server connections
- ODBC connections
- Oracle connections
- Relational data objects

You can also add sources to a customized data object through a custom SQL query.
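As a hedged illustration of the filter option, a source filter on a hypothetical CUSTOMER source might be entered as follows; the Data Integration Service appends the condition to the default query as a WHERE clause. The table and column names are illustrative only, not values from your environment:
CUSTOMER.COUNTRY = 'US'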

Default Query
When you create a customized data object, the Data Integration Service generates a default SQL query that it uses to read data from the source relational resources. The default query is a SELECT statement for each column that it reads from the sources. You can override the default query through the simple or advanced query. Use the simple query to select distinct values, enter a source filter, sort ports, or enter a user-defined join. Use the advanced query to create a custom SQL query for reading data from the sources. The custom query overrides the default and simple queries. If any table name or column name contains a database reserved word, you can create and maintain a reserved words file, reswords.txt. Create the reswords.txt file on any machine the Data Integration Service can access.


When the Data Integration Service runs a mapping, it searches for the reswords.txt file. If the file exists, the Data Integration Service places quotes around matching reserved words when it executes SQL against the database. If you override the default query, you must enclose any database reserved words in quotes. When the Data Integration Service generates the default query, it delimits table and field names containing the following characters with double quotes:
/ + - = ~ ` ! % ^ & * ( ) [ ] { } ' ; ? , < > \ | <space>
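For example, assume a hypothetical customized data object that reads the columns ORDER_ID and AMOUNT from a source table named ORDER DETAIL. The table name contains a space, which is one of the characters listed above, so the generated default query might resemble the following sketch. The table and column names are illustrative only; the exact query depends on the source metadata and the database type:
SELECT "ORDER DETAIL".ORDER_ID, "ORDER DETAIL".AMOUNT FROM "ORDER DETAIL"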

Creating a Reserved Words File


Create a reserved words file if any table name or column name in the customized data object contains a database reserved word. You must have administrator privileges to configure the Data Integration Service to use the reserved words file.
1. Create a file called "reswords.txt."
2. Create a section for each database by entering the database name within square brackets, for example, [Oracle].
3. Add the reserved words to the file below the database name. For example:
[Oracle]
OPTION
START
where
number
[SQL Server]
CURRENT
where
number

Entries are not case-sensitive.
4. Save the reswords.txt file.
5. In the Administration Console, select the Data Integration Service.
6. Edit the custom properties.
7. Add the following custom property:
Name: Reserved Words File
Value: <path>\reswords.txt

8. Restart the Data Integration Service.

Key Relationships
You can create key relationships between sources in a customized data object when the sources are relational resources. Key relationships allow you to join the sources within the customized data object.

Note: If a customized data object uses relational data objects as sources, you cannot create key relationships within the customized data object. You must create key relationships between the relational data objects instead.

When you import relational resources into a customized data object, the Developer tool retains the primary key information defined in the database. When you import related relational resources into a customized data object at the same time, the Developer tool also retains key relationship information. However, if you import related relational resources separately, you must re-create the key relationships after you import the objects into the customized data object.


When key relationships exist between sources in a customized data object, the Data Integration Service joins the sources based on the related keys in each source. The default join is an inner equijoin that uses the following syntax in the WHERE clause:
Source1.column_name = Source2.column_name

You can override the default join by entering a user-defined join or by creating a custom query. To create key relationships in a customized data object, first create a primary key in the referenced source transformation. Then create the relationship in the source transformation that contains the foreign key. The key relationships that you create exist in the customized data object metadata. You do not need to alter the source relational resources.

Creating Keys in a Customized Data Object


Create key columns to identify each row in a source transformation. You can create one primary key in each source transformation.
1. Open the customized data object.
2. Select the Read view.
3. Select the source transformation where you want to create a key.
   The source must be a relational resource, not a relational data object. If the source is a relational data object, you must create keys in the relational data object.
4. Select the Keys properties.
5. Click Add.
   The New Key dialog box appears.
6. Enter a key name.
7. If the key is a primary key, select Primary Key.
8. Select the key columns.
9. Click OK.
10. Save the customized data object.

Creating Relationships within a Customized Data Object


You can create key relationships between sources in a customized data object. The source transformation that you reference must have a primary key.
1. Open the customized data object.
2. Select the Read view.
3. Select the source transformation where you want to create a foreign key.
   The source must be a relational resource, not a relational data object. If the source is a relational data object, you must create relationships in the relational data object.
4. Select the Relationships properties.
5. Click Add.
   The New Relationship dialog box appears.
6. Enter a name for the foreign key.
7. Select a primary key from the referenced source transformation.
8. Click OK.


9. In the Relationships properties, select the foreign key columns.
10. Save the customized data object.

Select Distinct
You can select unique values from sources in a customized data object through the select distinct option. When you use select distinct, the Data Integration Service adds a SELECT DISTINCT statement to the default SQL query. Use the select distinct option in a customized data object to filter out unnecessary source data. For example, you might use the select distinct option to extract unique customer IDs from a table that lists total sales. When you use the customized data object in a mapping, the Data Integration Service filters out unnecessary data earlier in the data flow, which can increase performance.
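As an illustration only, assume a customized data object that reads a single CUST_ID column from a hypothetical TOTAL_SALES table. With Select Distinct enabled, the query that the Data Integration Service issues might resemble the following sketch:
SELECT DISTINCT TOTAL_SALES.CUST_ID FROM TOTAL_SALES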

Using Select Distinct


You can configure a customized data object to select unique values from the source relational resource. The Data Integration Service filters out unnecessary data when you use the customized data object in a mapping.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Enable the Select Distinct option.
7. Save the customized data object.

Filter
You can enter a filter condition in a customized data object read operation. The filter specifies the WHERE clause of the SELECT statement that the Data Integration Service uses to read source data. Use a filter to reduce the number of rows that the Data Integration Service reads from the source relational resource. When you enter a source filter, the Developer tool adds a WHERE clause to the default query.
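For illustration only, assume a hypothetical ORDERS source. A filter condition entered in the Filter field might look like the following; the Data Integration Service appends the condition to the default query as a WHERE clause:
ORDERS.ORDER_STATUS = 'SHIPPED' AND ORDERS.ORDER_AMOUNT > 100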

Entering a Source Filter


Enter a source filter to reduce the number of rows the Data Integration Service reads from the source relational resource.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Filter field.
   The SQL Query dialog box appears.
7. Enter the filter condition in the SQL Query field.


   You can select column names from the Columns list.
8. Click OK.
9. Click Validate to validate the filter condition.
10. Save the customized data object.

Sorted Ports
You can use sorted ports in a customized data object to sort rows queried from the sources. The Data Integration Service adds the ports to the ORDER BY clause in the default query. When you use sorted ports, the Data Integration Service creates the SQL query used to extract source data, including the ORDER BY clause. The database server performs the query and passes the resulting data to the Data Integration Service. You might use sorted ports to increase performance when you include any of the following transformations in a mapping:
- Aggregator. When you configure an Aggregator transformation for sorted input, you can send sorted data by using sorted ports. The group by ports in the Aggregator transformation must match the order of the sorted ports in the customized data object.
- Joiner. When you configure a Joiner transformation for sorted input, you can send sorted data by using sorted ports. Configure the order of the sorted ports the same in each customized data object.

Note: You can also use the Sorter transformation to sort relational and flat file data before Aggregator and Joiner transformations.
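As a sketch of the behavior, assume a customized data object that reads a hypothetical ORDERS table and that ORDER_DATE and CUST_ID are specified as sorted ports. The query that the Data Integration Service creates might resemble the following, with the sorted ports appended in an ORDER BY clause:
SELECT ORDERS.ORDER_DATE, ORDERS.CUST_ID, ORDERS.AMOUNT FROM ORDERS ORDER BY ORDERS.ORDER_DATE, ORDERS.CUST_ID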

Using Sorted Ports


Use sorted ports to sort column data in a customized data object. When you use the customized data object as a read transformation in a mapping or mapplet, you can send sorted data to transformations downstream from the read transformation.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Sort field.
   The Sort dialog box appears.
7. To specify a column as a sorted port, click the New button.
8. Select the column and sort type, either ascending or descending.
9. Repeat steps 7 and 8 to select other columns to sort.
   The Developer tool sorts the columns in the order in which they appear in the Sort dialog box.
10. Click OK.
    In the Query properties, the Developer tool displays the sort columns in the Sort field.
11. Click Validate to validate the sort syntax.
12. Save the customized data object.


User-Defined Joins
You can enter a user-defined join in a customized data object. A user-defined join specifies the condition used to join data from multiple sources in the same customized data object.

You can use a customized data object with a user-defined join as a read transformation in a mapping. The source database performs the join before it passes data to the Data Integration Service. This can improve mapping performance when the source tables are indexed.

Enter a user-defined join in a customized data object to join data from related sources. The user-defined join overrides the default inner equijoin that the Data Integration Service creates based on the related keys in each source. When you enter a user-defined join, enter the contents of the WHERE clause that specifies the join condition. If the user-defined join performs an outer join, the Data Integration Service might insert the join syntax in the WHERE clause or the FROM clause, based on the database syntax.

You might need to enter a user-defined join in the following circumstances:
- Columns do not have a primary key-foreign key relationship.
- The datatypes of columns used for the join do not match.
- You want to specify a different type of join, such as an outer join.

Use the following guidelines when you enter a user-defined join in a customized data object:
- Do not include the WHERE keyword in the user-defined join.
- Enclose all database reserved words in quotes.
- If you use Informatica join syntax, and Enable quotes in SQL is enabled for the connection, you must enter quotes around the table names and the column names if you enter them manually. If you select tables and columns when you enter the user-defined join, the Developer tool places quotes around the table names and the column names.

User-defined joins join data from related resources in a database. To join heterogeneous sources, use a Joiner transformation in a mapping that reads data from the sources. To perform a self-join, you must enter a custom SQL query that includes the self-join.
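For illustration only (the resource names are hypothetical, not from this guide), a user-defined join between a CUSTOMERS source and an ORDERS source might be entered as the following condition. It contains the contents of the WHERE clause without the WHERE keyword:
CUSTOMERS.CUST_ID = ORDERS.CUST_ID AND ORDERS.ORDER_AMOUNT > 0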

Entering a User-Defined Join


Enter a user-defined join in a customized data object to specify the join condition for the customized data object sources.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the simple query.
6. Click Edit next to the Join field.
   The SQL Query dialog box appears.
7. Enter the user-defined join in the SQL Query field.
   You can select column names from the Columns list.
8. Click OK.
9. Click Validate to validate the user-defined join.
10. Save the customized data object.


Custom Queries
You can create a custom SQL query in a customized data object. When you create a custom query, you issue a special SELECT statement that the Data Integration Service uses to read source data. You can create a custom query to add sources to an empty customized data object. You can also use a custom query to override the default SQL query. The custom query you enter overrides the default SQL query that the Data Integration Service uses to read data from the source relational resource. The custom query also overrides the simple query settings you specify when you enter a source filter, use sorted ports, enter a user-defined join, or select distinct ports. You can use a customized data object with a custom query as a read transformation in a mapping. The source database executes the query before it passes data to the Data Integration Service. Use the following guidelines when you create a custom query in a customized data object:
- In the SELECT statement, list the column names in the order in which they appear in the source transformation.
- Enclose all database reserved words in quotes.

If you use a customized data object to perform a self-join, you must enter a custom SQL query that includes the self-join.
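For illustration only, assume a customized data object whose source transformation contains the columns CUST_ID, FIRST_NAME, and LAST_NAME of a hypothetical CUSTOMERS table. A custom query might look like the following sketch; the SELECT list follows the order of the columns in the source transformation:
SELECT CUSTOMERS.CUST_ID, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME FROM CUSTOMERS WHERE CUSTOMERS.CUST_ID IS NOT NULL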

Creating a Custom Query


Create a custom query in a customized data object to issue a special SELECT statement for reading data from the sources. The custom query overrides the default query that the Data Integration Service issues to read source data.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Query properties.
5. Select the advanced query.
6. Select Use custom query.
   The Data Integration Service displays the query it issues to read source data.
7. Change the query or replace it with a custom query.
8. Save the customized data object.

Outer Join Support


You can use a customized data object to perform an outer join of two sources in the same database. When the Data Integration Service performs an outer join, it returns all rows from one source resource and rows from the second source resource that match the join condition. Use an outer join when you want to join two resources and return all rows from one of the resources. For example, you might perform an outer join when you want to join a table of registered customers with a monthly purchases table to determine registered customer activity. You can join the registered customer table with the monthly purchases table and return all rows in the registered customer table, including customers who did not make purchases in the last month. If you perform a normal join, the Data Integration Service returns only registered customers who made purchases during the month, and only purchases made by registered customers. With an outer join, you can generate the same results as a master outer or detail outer join in the Joiner transformation. However, when you use an outer join, you reduce the number of rows in the data flow which can increase performance.


You can enter two kinds of outer joins:


- Left. The Data Integration Service returns all rows for the resource to the left of the join syntax and the rows from both resources that meet the join condition.
- Right. The Data Integration Service returns all rows for the resource to the right of the join syntax and the rows from both resources that meet the join condition.

Note: Use outer joins in nested query statements when you override the default query. You can enter an outer join in a user-defined join or in a custom SQL query.

Informatica Join Syntax


When you enter join syntax, use the Informatica or database-specific join syntax. When you use the Informatica join syntax, the Data Integration Service translates the syntax and passes it to the source database during a mapping run.

Note: Always use database-specific syntax for join conditions.

When you use Informatica join syntax, enclose the entire join statement in braces ({Informatica syntax}). When you use database syntax, enter syntax supported by the source database without braces.

When you use Informatica join syntax, use table names to prefix column names. For example, if you have a column named FIRST_NAME in the REG_CUSTOMER table, enter REG_CUSTOMER.FIRST_NAME in the join syntax. Also, when you use an alias for a table name, use the alias within the Informatica join syntax to ensure the Data Integration Service recognizes the alias.

You can combine left outer or right outer joins with normal joins in a single customized data object. You cannot combine left and right outer joins. Use multiple normal joins and multiple left outer joins. Some databases limit you to using one right outer join. When you combine joins, enter the normal joins first.

Normal Join Syntax


You can create a normal join using the join condition in a customized data object. However, if you create an outer join, you must override the default join. As a result, you must include the normal join in the join override. When you include a normal join in the join override, list the normal join before outer joins. You can enter multiple normal joins in the join override. To create a normal join, use the following syntax:
{ source1 INNER JOIN source2 on join_condition }

The following table displays the syntax for normal joins in a join override:
source1 - Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
source2 - Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
join_condition - Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.

For example, you have a REG_CUSTOMER table with data for registered customers:

CUST_ID  FIRST_NAME  LAST_NAME
00001    Marvin      Chi
00002    Dinah       Jones
00003    John        Bowden
00004    J.          Marks

The PURCHASES table, refreshed monthly, contains the following data:

TRANSACTION_NO  CUST_ID  DATE       AMOUNT
06-2000-0001    00002    6/3/2000   55.79
06-2000-0002    00002    6/10/2000  104.45
06-2000-0003    00001    6/10/2000  255.56
06-2000-0004    00004    6/15/2000  534.95
06-2000-0005    00002    6/21/2000  98.65
06-2000-0006    NULL     6/23/2000  155.65
06-2000-0007    NULL     6/24/2000  325.45

To return rows displaying customer names for each transaction in the month of June, use the following syntax:
{ REG_CUSTOMER INNER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }

The Data Integration Service returns the following data:


CUST_ID  DATE       AMOUNT  FIRST_NAME  LAST_NAME
00002    6/3/2000   55.79   Dinah       Jones
00002    6/10/2000  104.45  Dinah       Jones
00001    6/10/2000  255.56  Marvin      Chi
00004    6/15/2000  534.95  J.          Marks
00002    6/21/2000  98.65   Dinah       Jones

The Data Integration Service returns rows with matching customer IDs. It does not include customers who made no purchases in June. It also does not include purchases made by non-registered customers.

Left Outer Join Syntax


You can create a left outer join with a join override. You can enter multiple left outer joins in a single join override. When using left outer joins with other joins, list all left outer joins together, after any normal joins in the statement. To create a left outer join, use the following syntax:
{ source1 LEFT OUTER JOIN source2 on join_condition }

The following table displays syntax for left outer joins in a join override:

source1 - Source resource name. With a left outer join, the Data Integration Service returns all rows in this resource.
source2 - Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
join_condition - Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.

For example, using the same REG_CUSTOMER and PURCHASES tables described in Normal Join Syntax on page 39, you can determine how many customers bought something in June with the following join override:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID }

The Data Integration Service returns the following data:


CUST_ID  FIRST_NAME  LAST_NAME  DATE       AMOUNT
00001    Marvin      Chi        6/10/2000  255.56
00002    Dinah       Jones      6/3/2000   55.79
00003    John        Bowden     NULL       NULL
00004    J.          Marks      6/15/2000  534.95
00002    Dinah       Jones      6/10/2000  104.45
00002    Dinah       Jones      6/21/2000  98.65


The Data Integration Service returns all registered customers in the REG_CUSTOMERS table, using null values for the customer who made no purchases in June. It does not include purchases made by non-registered customers. Use multiple join conditions to determine how many registered customers spent more than $100.00 in a single purchase in June:
{REG_CUSTOMER LEFT OUTER JOIN PURCHASES on (REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID AND PURCHASES.AMOUNT > 100.00) }

The Data Integration Service returns the following data:


CUST_ID  FIRST_NAME  LAST_NAME  DATE       AMOUNT
00001    Marvin      Chi        6/10/2000  255.56
00002    Dinah       Jones      6/10/2000  104.45
00003    John        Bowden     NULL       NULL
00004    J.          Marks      6/15/2000  534.95

You might use multiple left outer joins if you want to incorporate information about returns during the same time period. For example, the RETURNS table contains the following data:
CUST_ID  RET_DATE   RETURN
00002    6/10/2000  55.79
00002    6/21/2000  104.45

To determine how many customers made purchases and returns for the month of June, use two left outer joins:
{ REG_CUSTOMER LEFT OUTER JOIN PURCHASES on REG_CUSTOMER.CUST_ID = PURCHASES.CUST_ID LEFT OUTER JOIN RETURNS on REG_CUSTOMER.CUST_ID = RETURNS.CUST_ID }

The Data Integration Service returns the following data:


CUST_ID  FIRST_NAME  LAST_NAME  DATE       AMOUNT  RET_DATE   RETURN
00001    Marvin      Chi        6/10/2000  255.56  NULL       NULL
00002    Dinah       Jones      6/3/2000   55.79   NULL       NULL
00003    John        Bowden     NULL       NULL    NULL       NULL
00004    J.          Marks      6/15/2000  534.95  NULL       NULL
00002    Dinah       Jones      6/10/2000  104.45  NULL       NULL
00002    Dinah       Jones      6/21/2000  98.65   NULL       NULL
00002    Dinah       Jones      NULL       NULL    6/10/2000  55.79
00002    Dinah       Jones      NULL       NULL    6/21/2000  104.45

The Data Integration Service uses NULLs for missing values.

Right Outer Join Syntax


You can create a right outer join with a join override. The right outer join returns the same results as a left outer join if you reverse the order of the resources in the join syntax. Use only one right outer join in a join override. If you want to create more than one right outer join, try reversing the order of the source resources and changing the join types to left outer joins. When you use a right outer join with other joins, enter the right outer join at the end of the join override. To create a right outer join, use the following syntax:
{ source1 RIGHT OUTER JOIN source2 on join_condition }


The following table displays syntax for a right outer join in a join override:
source1 - Source resource name. The Data Integration Service returns rows from this resource that match the join condition.
source2 - Source resource name. With a right outer join, the Data Integration Service returns all rows in this resource.
join_condition - Condition for the join. Use syntax supported by the source database. You can combine multiple join conditions with the AND operator.

Pre- and Post-Mapping SQL Commands


You can create SQL commands in a customized data object that the Data Integration Service runs against the source relational resource. When you use the customized data object in a mapping, the Data Integration Service runs pre-mapping SQL commands against the source database before it reads the source. It runs post-mapping SQL commands against the source database after it writes to the target. Use the following guidelines when you enter pre- and post-mapping SQL commands in a customized data object:
- Use any command that is valid for the database type. The Data Integration Service does not allow nested comments, even though the database might.
- Use a semicolon (;) to separate multiple statements. The Data Integration Service issues a commit after each statement.
- The Data Integration Service ignores semicolons within /* ... */.
- If you need to use a semicolon outside comments, you can escape it with a backslash (\). When you escape the semicolon, the Data Integration Service ignores the backslash, and it does not use the semicolon as a statement separator.
- The Developer tool does not validate the SQL.
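For illustration only (the table names are hypothetical and the syntax assumes an Oracle source), a pre-mapping SQL entry with two statements separated by a semicolon might look like the following. The Data Integration Service issues a commit after each statement:
DELETE FROM STAGE_ORDERS; INSERT INTO RUN_LOG (RUN_DATE) VALUES (SYSDATE)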

Adding Pre- and Post-Mapping SQL Commands


You can add pre- and post-mapping SQL commands to a customized data object. The Data Integration Service runs the SQL commands when you use the customized data object in a mapping.
1. Open the customized data object.
2. Select the Read view.
3. Select the Output transformation.
4. Select the Advanced properties.
5. Enter a pre-mapping SQL command in the PreSQL field.
6. Enter a post-mapping SQL command in the PostSQL field.
7. Save the customized data object.

Customized Data Objects Write Properties


The Data Integration Service uses write properties when it writes data to relational resources. To edit write properties, select the Input transformation in the Write view, and then select the Advanced properties.


The following table describes the write properties that you configure for customized data objects:
Load type - Type of target loading. Select Normal or Bulk. If you select Normal, the Data Integration Service loads targets normally. You can choose Bulk when you load to DB2, Sybase, Oracle, or Microsoft SQL Server. If you specify Bulk for other database types, the Data Integration Service reverts to a normal load. Bulk loading can increase mapping performance, but it limits the ability to recover because no database logging occurs. Choose Normal mode if the mapping contains an Update Strategy transformation. If you choose Normal and the Microsoft SQL Server target name includes spaces, configure the following environment SQL in the connection object:
SET QUOTED_IDENTIFIER ON
Update override - Overrides the default UPDATE statement for the target.
Delete - Deletes all rows flagged for delete. Default is enabled.
Insert - Inserts all rows flagged for insert. Default is enabled.
Truncate target table - Truncates the target before it loads data. Default is disabled.
Update strategy - Update strategy for existing rows. You can select one of the following strategies:
- Update as update. The Data Integration Service updates all rows flagged for update.
- Update as insert. The Data Integration Service inserts all rows flagged for update. You must also select the Insert target option.
- Update else insert. The Data Integration Service updates rows flagged for update if they exist in the target and then inserts any remaining rows marked for insert. You must also select the Insert target option.
PreSQL - SQL command the Data Integration Service runs against the target database before it reads the source. The Developer tool does not validate the SQL.
PostSQL - SQL command the Data Integration Service runs against the target database after it writes to the target. The Developer tool does not validate the SQL.

Creating a Customized Data Object


Create a customized data object to add to a mapping, mapplet, or profile. After you create a customized data object, add sources to it.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
   The New dialog box appears.
3. Select Relational Data Object and click Next.
   The New Relational Data Object dialog box appears.
4. Click Browse next to the Connection option and select a connection to the database.
5. Click Create customized data object.
6. Enter a name for the customized data object.
7. Click Browse next to the Location option and select the project where you want to create the customized data object.


8. Click Finish.
   The customized data object appears under Physical Data Objects in the project or folder in the Object Explorer view.

Add sources to the customized data object. You can add relational resources or relational data objects as sources. You can also use a custom SQL query to add sources.

Adding Relational Resources to a Customized Data Object


After you create a customized data object, add sources to it. You can use relational resources as sources. Before you add relational resources to a customized data object, you must configure a connection to the database.
1. In the Connection Explorer view, select one or more relational resources in the same relational connection.
2. Right-click in the Connection Explorer view and select Add to project.
   The Add to Project dialog box appears.
3. Select Add as related resource(s) to existing customized data object and click OK.
   The Add to Data Object dialog box appears.
4. Select the customized data object and click OK.
5. If you add multiple resources to the customized data object, the Developer tool prompts you to select the resource to write to. Select the resource and click OK.
   If you use the customized data object in a mapping as a write transformation, the Developer tool writes data to this resource.
The Developer tool adds the resources to the customized data object.

Adding Relational Data Objects to a Customized Data Object


After you create a customized data object, add sources to it. You can use relational data objects as sources.
1. Open the customized data object.
2. Select the Read view.
3. In the Object Explorer view, select one or more relational data objects in the same relational connection.
4. Drag the objects from the Object Explorer view to the customized data object Read view.
5. If you add multiple relational data objects to the customized data object, the Developer tool prompts you to select the object to write to. Select the object and click OK.
   If you use the customized data object in a mapping as a write transformation, the Developer tool writes data to this relational data object.
The Developer tool adds the relational data objects to the customized data object.

Nonrelational Data Objects


Import a nonrelational data object to include in a mapping, mapplet, or profile. You can import nonrelational data objects for the following types of connection:
- IMS
- Sequential
- VSAM

When you import a nonrelational data object, the Developer tool reads the metadata for the object from its data map. A data map maps nonrelational records to relational tables so that the product can use the SQL language to access the data. Use the PowerExchange Navigator to create data maps. For more information, see the PowerExchange Navigator Guide.

Importing a Nonrelational Data Object


Import a nonrelational data object to add to a mapping, mapplet, or profile. Before you import a nonrelational data object, you need to configure a connection to the database or data set. You also need to create a data map for the object.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
3. Select Non-Relational Data Object and click Next.
   The New Non-Relational Data Object dialog box appears.
4. Click Browse next to the Connection option and select a connection.
5. Click Browse next to the Resource option and select the data map that you want to import.
   If the data map includes multiple tables, you can select the data map or an individual table.
6. Enter a name for the physical data object.
7. Click Finish.
   The data object appears under Data Object in the project or folder in the Object Explorer view.

Flat File Data Objects


Create or import a flat file data object to include in a mapping, mapplet, or profile. You can use flat file data objects as sources, targets, and lookups in mappings and mapplets. You can create profiles on flat file data objects. A flat file physical data object can be delimited or fixed-width. You can import fixed-width and delimited flat files that do not contain binary data. After you import a flat file data object, you might need to create parameters or configure file properties. Create parameters through the Parameters view. Edit file properties through the Overview, Read, Write, and Advanced views. The Overview view allows you to edit the flat file data object name and description. It also allows you to update column properties for the flat file data object. The Read view controls the properties that the Data Integration Service uses when it reads data from the flat file. The Read view contains the following transformations:
- Source transformation. Defines the flat file that provides the source data. Select the source transformation to edit properties such as the name and description, column properties, and source file format properties.
- Output transformation. Represents the rows that the Data Integration Service reads when it runs a mapping. Select the Output transformation to edit the file run-time properties such as the source file name and directory.


The Write view controls the properties that the Data Integration Service uses when it writes data to the flat file. The Write view contains the following transformations:
- Input transformation. Represents the rows that the Data Integration Service writes when it runs a mapping. Select the Input transformation to edit the file run-time properties such as the target file name and directory.
- Target transformation. Defines the flat file that accepts the target data. Select the target transformation to edit the name and description and the target file format properties.

The Advanced view controls format properties that the Data Integration Service uses when it reads data from and writes data to the flat file.

When you create mappings that use file sources or file targets, you can view flat file properties in the Properties view. You cannot edit file properties within a mapping, except for the reject file name, reject file directory, and tracing level.

Flat File Data Object Overview Properties


The Data Integration Service uses overview properties when it reads data from or writes data to a flat file. Overview properties include general properties, which apply to the flat file data object. They also include column properties, which apply to the columns in the flat file data object. The Developer tool displays overview properties for flat files in the Overview view. The following table describes the general properties that you configure for flat files:
Name - Name of the flat file data object.
Description - Description of the flat file data object.

The following table describes the column properties that you configure for flat files:
Name - Name of the column.
Native type - Native datatype of the column.
Bytes to process (fixed-width flat files) - Number of bytes that the Data Integration Service reads or writes for the column.
Precision - Maximum number of significant digits for numeric datatypes, or maximum number of characters for string datatypes. For numeric datatypes, precision includes scale.
Scale - Maximum number of digits after the decimal point for numeric values.
Format - Column format for numeric and datetime datatypes. For numeric datatypes, the format defines the thousand separator and decimal separator. Default is no thousand separator and a period (.) for the decimal separator. For datetime datatypes, the format defines the display format for year, month, day, and time. It also defines the field width. Default is "A 19 YYYY-MM-DD HH24:MI:SS."
Visibility - Determines whether the Data Integration Service can read data from or write data to the column. For example, when the visibility is Read, the Data Integration Service can read data from the column. It cannot write data to the column. For flat file data objects, this property is read-only. The visibility is always Read and Write.
Description - Description of the column.

Flat File Data Object Read Properties


The Data Integration Service uses read properties when it reads data from a flat file. Select the source transformation to edit general, column, and format properties. Select the Output transformation to edit run-time properties.

General Properties
The Developer tool displays general properties for flat file sources in the source transformation in the Read view. The following table describes the general properties that you configure for flat file sources:
Name - Name of the flat file. This property is read-only. You can edit the name in the Overview view. When you use the flat file as a source in a mapping, you can edit the name within the mapping.
Description - Description of the flat file.

Columns Properties
The Developer tool displays column properties for flat file sources in the source transformation in the Read view. The following table describes the column properties that you configure for flat file sources:
Name - Name of the column.
Native type - Native datatype of the column.
Bytes to process (fixed-width flat files) - Number of bytes that the Data Integration Service reads for the column.
Precision - Maximum number of significant digits for numeric datatypes, or maximum number of characters for string datatypes. For numeric datatypes, precision includes scale.
Scale - Maximum number of digits after the decimal point for numeric values.
Format - Column format for numeric and datetime datatypes. For numeric datatypes, the format defines the thousand separator and decimal separator. Default is no thousand separator and a period (.) for the decimal separator. For datetime datatypes, the format defines the display format for year, month, day, and time. It also defines the field width. Default is "A 19 YYYY-MM-DD HH24:MI:SS."
Shift key (fixed-width flat files) - Allows the user to define a shift-in or shift-out statefulness for the column in the fixed-width flat file.
Description - Description of the column.

Format Properties
The Developer tool displays format properties for flat file sources in the source transformation in the Read view. The following table describes the format properties that you configure for delimited flat file sources:
Start import at line - Row at which the Data Integration Service starts importing data. Use this option to skip header rows. Default is 1.
Row delimiter - Octal code for the character that separates rows of data. Default is line feed, \012 LF (\n).
Escape character - Character used to escape a delimiter character in an unquoted string if the delimiter is the next character after the escape character. If you specify an escape character, the Data Integration Service reads the delimiter character as a regular character embedded in the string. Note: You can improve mapping performance slightly if the source file does not contain quotes or escape characters.
Retain escape character in data - Includes the escape character in the output string. Default is disabled.
Treat consecutive delimiters as one - Causes the Data Integration Service to treat one or more consecutive column delimiters as one. Otherwise, the Data Integration Service reads two consecutive delimiters as a null value. Default is disabled.
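For example, assuming a comma delimiter and a backslash (\) escape character, a source line such as the following is read as three fields, and the second field contains the value Smith, John when Retain escape character in data is disabled:
1001,Smith\, John,New York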

The following table describes the format properties that you configure for fixed-width flat file sources:
Start import at line - Row at which the Data Integration Service starts importing data. Use this option to skip header rows. Default is 1.
Number of bytes to skip between records - Number of bytes between the last column of one row and the first column of the next. The Data Integration Service skips the entered number of bytes at the end of each row to avoid reading carriage return characters or line feed characters. Enter 1 for UNIX files and 2 for DOS files. Default is 2.
Line sequential - Causes the Data Integration Service to read a line feed character or carriage return character in the last column as the end of the column. Select this option if the file uses line feeds or carriage returns to shorten the last column of each row. Default is disabled.
Strip trailing blanks - Strips trailing blanks from string values. Default is disabled.
User defined shift state - Allows you to select the shift state for source columns in the Columns properties. Select this option when the source file contains both multibyte and single-byte data, but does not contain shift-in and shift-out keys. If a multibyte file source does not contain shift keys, you must select a shift key for each column in the flat file data object. Select the shift key for each column to enable the Data Integration Service to read each character correctly. Default is disabled.

Run-time Properties
The Developer tool displays run-time properties for flat file sources in the Output transformation in the Read view. The following table describes the run-time properties that you configure for flat file sources:
Input type - Type of source input. You can choose the following types of source input:
- File. For flat file sources.
- Command. For source data or a file list generated by a shell command.
Source type - Indicates whether the source file contains the source data or a list of files with the same file properties. You can choose the following source file types:
- Direct. For source files that contain the source data.
- Indirect. For source files that contain a list of files. The Data Integration Service finds the file list and reads each listed file when it runs the mapping.
Source file name - File name of the flat file source.
Source file directory - Directory where the flat file source exists. The machine that hosts Informatica Services must be able to access this directory.
Command - Command used to generate the source file data. Use a command to generate or transform flat file data and send the standard output of the command to the flat file reader when the mapping runs. The flat file reader reads the standard output as the flat file source data. Generating source data with a command eliminates the need to stage a flat file source. Use a command or script to send source data directly to the Data Integration Service instead of using a pre-mapping command to generate a flat file source. You can also use a command to generate a file list. For example, to use a directory listing as a file list, use the following command:
cd MySourceFiles; ls sales-records-Sep-*-2005.dat
Truncate string null - Strips the first null character and all characters after the first null character from string values. Enable this option for delimited flat files that contain null characters in strings. If you do not enable this option, the Data Integration Service generates a row error for any row that contains null characters in a string. Default is disabled.
Line sequential buffer length - Number of bytes that the Data Integration Service reads for each line. This property, together with the total row size, determines whether the Data Integration Service drops a row. If the row exceeds the larger of the line sequential buffer length or the total row size, the Data Integration Service drops the row and writes it to the mapping log file. To determine the total row size, add the column precision and the delimiters, and then multiply the total by the maximum bytes for each character. Default is 1024.


Configuring Flat File Read Properties


Configure read properties to control how the Data Integration Service reads data from a flat file.
1. Open the flat file data object.
2. Select the Read view.
3. To edit general, column, or format properties, select the source transformation.
4. To edit run-time properties, select the Output transformation. In the Properties view, select the properties you want to edit.
   For example, click Columns properties or Runtime properties.
5. Edit the properties.
6. Save the flat file data object.

Flat File Data Object Write Properties


The Data Integration Service uses write properties when it writes data to a flat file. Select the Input transformation to edit run-time properties. Select the target transformation to edit general and column properties.

Run-time Properties
The Developer tool displays run-time properties for flat file targets in the Input transformation in the Write view. The following table describes the run-time properties that you configure for flat file targets:
Append if exists - Appends the output data to the target files and reject files. If you do not select this option, the Data Integration Service truncates the target file and reject file before writing data to them. If the files do not exist, the Data Integration Service creates them. Default is disabled.
Create directory if not exists - Creates the target directory if it does not exist. Default is disabled.
Header options - Creates a header row in the file target. You can choose the following options:
- No header. Does not create a header row in the flat file target.
- Output field names. Creates a header row in the file target with the output port names.
- Use header command output. Uses the command in the Header Command field to generate a header row. For example, you can use a command to add the date to a header row for the file target.
Default is no header.
Header command - Command used to generate the header row in the file target.
Footer command - Command used to generate the footer row in the file target.
Output type - Type of target for the mapping. Select File to write the target data to a flat file. Select Command to output data to a command.
Output file directory - Output directory for the flat file target. The machine that hosts Informatica Services must be able to access this directory. Default is ".", which stands for the following directory:
<Informatica Services Installation Directory>\tomcat\bin
Output file name - File name of the flat file target.
Command - Command used to process the target data. On UNIX, use any valid UNIX command or shell script. On Windows, use any valid DOS command or batch file. The flat file writer sends the data to the command instead of a flat file target. You can improve mapping performance by pushing transformation tasks to the command instead of the Data Integration Service. You can also use a command to sort or to compress target data. For example, use the following command to generate a compressed file from the target data:
compress -c - > MyTargetFiles/MyCompressedFile.Z
Reject file directory - Directory where the reject file exists. Note: This field appears when you edit a flat file target in a mapping.
Reject file name - File name of the reject file. Note: This field appears when you edit a flat file target in a mapping.

General Properties
The Developer tool displays general properties for flat file targets in the target transformation in the Write view. The following table describes the general properties that you configure for flat file targets:
Name - Name of the flat file. This property is read-only. You can edit the name in the Overview view. When you use the flat file as a target in a mapping, you can edit the name within the mapping.
Description - Description of the flat file.

Columns Properties
The Developer tool displays column properties for flat file targets in the target transformation in the Write view. The following table describes the column properties that you configure for flat file targets:
Name - Name of the column.
Native type - Native datatype of the column.
Bytes to process (fixed-width flat files) - Number of bytes that the Data Integration Service writes for the column.
Precision - Maximum number of significant digits for numeric datatypes, or maximum number of characters for string datatypes. For numeric datatypes, precision includes scale.
Scale - Maximum number of digits after the decimal point for numeric values.
Format - Column format for numeric and datetime datatypes. For numeric datatypes, the format defines the thousand separators and decimal separators. Default is no thousand separator and a period (.) for the decimal separator. For datetime datatypes, the format defines the display format for year, month, day, and time. It also defines the field width. Default is "A 19 YYYY-MM-DD HH24:MI:SS."
Description - Description of the column.

Configuring Flat File Write Properties


Configure write properties to control how the Data Integration Service writes data to a flat file.
1. Open the flat file data object.
2. Select the Write view.
3. To edit run-time properties, select the Input transformation.
4. To edit general or column properties, select the target transformation. In the Properties view, select the properties you want to edit.
   For example, click Runtime properties or Columns properties.
5. Edit the properties.
6. Save the flat file data object.

Flat File Data Object Advanced Properties


The Data Integration Service uses advanced properties when it reads data from or writes data to a flat file. The Developer tool displays advanced properties for flat files in the Advanced view. The following table describes the advanced properties that you configure for flat files:
Code page - Code page of the flat file data object. For source files, use a source code page that is a subset of the target code page. For lookup files, use a code page that is a superset of the source code page and a subset of the target code page. For target files, use a code page that is a superset of the source code page. Default is "MS Windows Latin 1 (ANSI), superset of Latin 1."
Format - Format for the flat file, either delimited or fixed-width.
Delimiters (delimited flat files) - Character used to separate columns of data.
Null character type (fixed-width flat files) - Null character type, either text or binary.
Null character (fixed-width flat files) - Character used to represent a null value. The null character can be any valid character in the file code page or any binary value from 0 to 255.
Repeat null character (fixed-width flat files) - For source files, causes the Data Integration Service to read repeat null characters in a single field as one null value. For target files, causes the Data Integration Service to write as many null characters as possible into the target field. If you do not enable this option, the Data Integration Service enters one null character at the beginning of the field to represent a null value. Default is disabled.
Datetime format - Defines the display format and the field width for datetime values. Default is "A 19 YYYY-MM-DD HH24:MI:SS."
Thousand separator - Thousand separator for numeric values. Default is None.
Decimal separator - Decimal separator for numeric values. Default is a period (.).
Tracing level - Controls the amount of detail in the mapping log file. Note: This field appears when you edit a flat file source or target in a mapping.

Creating a Flat File Data Object


Create a flat file data object to define the data object columns and rows.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
3. Select Physical Data Objects > Flat File Data Object and click Next.
   The New Flat File Data Object dialog box appears.
4. Select Create as Empty.
5. Enter a name for the data object.
6. Optionally, click Browse to select a project or folder for the data object.
7. Click Next.
8. Select a code page that matches the code page of the data in the file.
9. Select Delimited or Fixed-width.
10. If you selected Fixed-width, click Finish. If you selected Delimited, click Next.
11. Configure the following properties:
Delimiters - Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the configured escape character and the quote character. You cannot select unprintable multibyte characters as delimiters.
Text Qualifier - Quote character that defines the boundaries of text strings. If you select a quote character, the Developer tool ignores delimiters within a pair of quotes.

12. Click Finish.
    The data object appears under Data Object in the project or folder in the Object Explorer view.

Importing a Fixed-Width Flat File Data Object


Import a fixed-width flat file data object when you have a fixed-width flat file that defines the metadata you want to include in a mapping, mapplet, or profile.


1. Click File > New > Data Object.
   The New dialog box appears.
2. Select Physical Data Objects > Flat File Data Object and click Next.
   The New Flat File Data Object dialog box appears.
3. Enter a name for the data object.
4. Click Browse and navigate to the directory that contains the file.
5. Click Open.
   The wizard names the data object the same name as the file you selected.
6. Optionally, edit the data object name.
7. Click Next.
8. Select a code page that matches the code page of the data in the file.
9. Select Fixed-Width.
10. Optionally, edit the maximum number of rows to preview.
11. Click Next.
12. Configure the following properties:
    Import Field Names From First Line - If selected, the Developer tool uses data in the first row for column names. Select this option if column names appear in the first row.
    Start Import At Row - Row number at which the Data Integration Service starts reading when it imports the file. For example, if you specify to start at the second row, the Developer tool skips the first row before reading.
13. Click Edit Breaks to edit column breaks. Or, follow the directions in the wizard to manipulate the column breaks in the file preview window. You can move column breaks by dragging them. Or, double-click a column break to delete it.
14. Click Next to preview the physical data object.
15. Click Finish.
    The data object appears under Data Object in the project or folder in the Object Explorer view.

Importing a Delimited Flat File Data Object


Import a delimited flat file data object when you have a delimited flat file that defines the metadata you want to include in a mapping, mapplet, or profile.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
   The New dialog box appears.
3. Select Physical Data Objects > Flat File Data Object and click Next.
   The New Flat File Data Object dialog box appears.
4. Enter a name for the data object.
5. Click Browse and navigate to the directory that contains the file.


6. Click Open.
   The wizard names the data object the same name as the file you selected.
7. Optionally, edit the data object name.
8. Click Next.
9. Select a code page that matches the code page of the data in the file.
10. Select Delimited.
11. Optionally, edit the maximum number of rows to preview.
12. Click Next.
13. Configure the following properties:
- Delimiters. Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the configured escape character and the quote character. You cannot select nonprinting multibyte characters as delimiters.
- Text Qualifier. Quote character that defines the boundaries of text strings. If you select a quote character, the Developer tool ignores delimiters within pairs of quotes.
- Import Field Names From First Line. If selected, the Developer tool uses data in the first row for column names. Select this option if column names appear in the first row. The Developer tool prefixes "FIELD_" to field names that are not valid.
- Row Delimiter. Specify a line break character. Select from the list or enter a character. Preface an octal code with a backslash (\). To use a single character, enter the character. The Data Integration Service uses only the first character when the entry is not preceded by a backslash. The character must be a single-byte character, and no other character in the code page can contain that byte. Default is line-feed, \012 LF (\n).
- Escape Character. Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string. When you specify an escape character, the Data Integration Service reads the delimiter character as a regular character.
- Start Import At Row. Row number at which the Data Integration Service starts reading when it imports the file. For example, if you specify to start at the second row, the Developer tool skips the first row before reading.
- Treat Consecutive Delimiters as One. If selected, the Data Integration Service reads one or more consecutive column delimiters as one. Otherwise, the Data Integration Service reads two consecutive delimiters as a null value.
- Remove Escape Character From Data. Removes the escape character in the output string.
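
For example, a hypothetical file that uses a comma delimiter, a double-quote text qualifier, and a backslash escape character might contain the following rows. The column names and values are illustrative only:

   FIRSTNAME,LASTNAME,ADDRESS
   "Jane","Smith","100 Main St, Apt 4"
   John,Doe,12 Elm St\, Suite 2

If you select Import Field Names From First Line, FIRSTNAME, LASTNAME, and ADDRESS become the column names. In the second row, the comma inside the quoted ADDRESS value is treated as data because it appears within the text qualifier. In the third row, the escape character causes the comma after "St" to be read as a regular character instead of a delimiter. If you select Remove Escape Character From Data, the backslash does not appear in the output string.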

14. Click Next to preview the data object.
15. Click Finish.
    The data object appears under Data Object in the project or folder in the Object Explorer view.


SAP Data Objects


Import an SAP data object to include in a mapping, mapplet, or profile. SAP data objects are physical data objects that use SAP as the source.

Creating an SAP Data Object


Create an SAP data object to add to a mapping, mapplet, or profile. Before you create an SAP data object, you need to configure a connection to the enterprise application.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Object.
3. Select SAP Data Object and click Next.
   The New SAP Data Object dialog box appears.
4. Click Browse next to the Location option and select the target project or folder.
5. Click Browse next to the Connection option and select an SAP connection from which you want to import the SAP table metadata.
6. Click Add next to the Resource option.
   The Add sources to the data object dialog box appears.
7. Enter the table names or select them to add to the data object:
   - Navigate to the SAP table or tables that you want to import and click OK.
   - Enter the table name or the description of the table you want to import in the Resource field.
   When you enter a table name, you can include wildcard characters and separate multiple table names with a comma.
8. Optionally enter a name for the physical data object.
9. Click Finish.
   The data object appears under Data Object in the project or folder in the Object Explorer view.
   You can also add tables to an SAP data object after you create it.

Synchronization
You can synchronize physical data objects when their sources change. When you synchronize a physical data object, the Developer tool reimports the object metadata from the source you select.
You can synchronize all physical data objects. When you synchronize relational data objects or customized data objects, you can retain or overwrite the key relationships you define in the Developer tool.
You can configure a customized data object to be synchronized when its sources change. For example, a customized data object uses a relational data object as a source, and you add a column to the relational data object. The Developer tool adds the column to the customized data object.
To synchronize a customized data object when its sources change, select the Synchronize input and output option in the Overview properties of the customized data object.
To synchronize any physical data object, right-click the object in the Object Explorer view, and select Synchronize.


Troubleshooting Physical Data Objects


I am trying to preview a relational data object or a customized data object source transformation and the preview fails.
Verify that the resource owner name is correct. When you import a relational resource, the Developer tool imports the owner name when the user name and schema from which the table is imported do not match. If the user name and schema from which the table is imported match, but the database default schema has a different name, preview fails because the Data Integration Service executes the preview query against the database default schema, where the table does not exist. Update the relational data object or the source transformation and enter the correct resource owner name. The owner name appears in the relational data object or the source transformation Advanced properties.

I am trying to preview a flat file data object and the preview fails. I get an error saying that the system cannot find the path specified.
Verify that the machine that hosts Informatica Services can access the source file directory. For example, you create a flat file data object by importing the following file on your local machine, MyClient:
C:\MySourceFiles\MyFile.csv

In the Read view, select the Runtime properties in the Output transformation. The source file directory is "C:\MySourceFiles." When you preview the file, the Data Integration Service tries to locate the file in the "C:\MySourceFiles" directory on the machine that hosts Informatica Services. If the directory does not exist on the machine that hosts Informatica Services, the Data Integration Service returns an error when you preview the file.
To work around this issue, use the network path as the source file directory. For example, change the source file directory from "C:\MySourceFiles" to "\\MyClient\MySourceFiles." Share the "MySourceFiles" directory so that the machine that hosts Informatica Services can access it.


CHAPTER 4

Mappings
This chapter includes the following topics:
- Mappings Overview
- Developing a Mapping
- Creating a Mapping
- Mapping Objects
- Linking Ports
- Propagating Port Attributes
- Mapping Validation
- Running a Mapping
- Segments

Mappings Overview
A mapping is a set of inputs and outputs that represent the data flow between sources and targets. They can be linked by transformation objects that define the rules for data transformation. The Data Integration Service uses the instructions configured in the mapping to read, transform, and write data. The type of input and output you include in a mapping determines the type of mapping. You can create the following types of mapping in the Developer tool:
- Mapping with physical data objects as the input and output
- Logical data object mapping with a logical data object as the mapping input or output
- Virtual table mapping with a virtual table as the mapping output

Object Dependency in a Mapping


A mapping is dependent on some objects that are stored as independent objects in the repository. When object metadata changes, the Developer tool tracks the effects of these changes on mappings. Mappings might become invalid even though you do not edit the mapping. When a mapping becomes invalid, the Data Integration Service cannot run it. The following objects are stored as independent objects in the repository:
- Logical data objects
- Physical data objects
- Reusable transformations
- Mapplets

A mapping is dependent on these objects. The following objects in a mapping are stored as dependent repository objects:
- Virtual tables. Virtual tables are stored as part of an SQL data service.
- Non-reusable transformations that you build within the mapping. Non-reusable transformations are stored within the mapping only.

Developing a Mapping
Develop a mapping to read, transform, and write data according to your business needs.
1. Determine the type of mapping you want to create: logical data object, virtual table, or a mapping with physical data objects as input and output.
2. Create input, output, and reusable objects that you want to use in the mapping. Create physical data objects, logical data objects, or virtual tables to use as mapping input or output. Create reusable transformations that you want to use. If you want to use mapplets, you must create them also.
3. Create the mapping.
4. Add objects to the mapping. You must add input and output objects to the mapping. Optionally, add transformations and mapplets.
5. Link ports between mapping objects to create a flow of data from sources to targets, through mapplets and transformations that add, remove, or modify data along this flow.
6. Validate the mapping to identify errors.
7. Save the mapping to the Model repository.

After you develop the mapping, run it to see mapping output.

Creating a Mapping
Create a mapping to move data between flat file or relational sources and targets and transform the data.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Mapping.
3. Optionally, enter a mapping name.
4. Click Finish.
   An empty mapping appears in the editor.


Mapping Objects
Mapping objects determine the data flow between sources and targets. Every mapping must contain the following objects:
- Input. Describes the characteristics of the mapping source.
- Output. Describes the characteristics of the mapping target.

A mapping can also contain the following components:


- Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
- Mapplet. A reusable object containing a set of transformations that you can use in multiple mappings.

When you add an object to a mapping, you configure the properties according to how you want the Data Integration Service to transform the data. You also connect the mapping objects according to the way you want the Data Integration Service to move the data. You connect the objects through ports.
The editor displays objects in the following ways:
- Iconized. Shows an icon of the object with the object name.
- Normal. Shows the columns and the input and output port indicators. You can connect objects that are in the normal view.

Adding Objects to a Mapping


Add objects to a mapping to determine the data flow between sources and targets.
1. Open the mapping.
2. Drag a physical data object to the editor and select Read to add the data object as a source.
3. Drag a physical data object to the editor and select Write to add the data object as a target.
4. To add a Lookup transformation, drag a physical data object from the Data Sources folder in the Object Explorer view to the editor and select Lookup.
5. To add a reusable transformation, drag the transformation from the Transformation folder in the Object Explorer view to the editor.
   Repeat this step for each reusable transformation you want to add.
6. To add a non-reusable transformation, select the transformation on the Transformation palette and drag it to the editor.
   Repeat this step for each non-reusable transformation you want to add.
7. Configure ports and properties for each non-reusable transformation.
8. Optionally, drag a mapplet to the editor.

One to One Links


Link one port in an input object or transformation to one port in an output object or transformation.


One to Many Links


When you want to use the same data for different purposes, you can link the port providing that data to multiple ports in the mapping. You can create a one to many link in the following ways:
- Link one port to multiple transformations or output objects.
- Link multiple ports in one transformation to multiple transformations or output objects.

For example, you want to use salary information to calculate the average salary in a bank branch through the Aggregator transformation. You can use the same information in an Expression transformation configured to calculate the monthly pay of each employee.

Linking Ports
After you add and configure input, output, transformation, and mapplet objects in a mapping, complete the mapping by linking ports between mapping objects. Data passes into and out of a transformation through the following ports:
- Input ports. Receive data.
- Output ports. Pass data.
- Input/output ports. Receive data and pass it unchanged.

Every input object, output object, mapplet, and transformation contains a collection of ports. Each port represents a column of data:
- Input objects provide data, so they contain only output ports.
- Output objects receive data, so they contain only input ports.
- Mapplets contain only input ports and output ports.
- Transformations contain a mix of input, output, and input/output ports, depending on the transformation and its application.

To connect ports, you create a link between ports in different mapping objects. The Developer tool creates the connection only when the connection meets link validation and concatenation requirements. You can leave ports unconnected. The Data Integration Service ignores unconnected ports.
When you link ports between input objects, transformations, mapplets, and output objects, you can create the following types of link:
- One to one
- One to many

You can manually link ports or link ports automatically.

Manually Linking Ports


You can manually link one port or multiple ports. Drag a port from an input object or transformation to the port of an output object or transformation. Use the Ctrl or Shift key to select multiple ports to link to another transformation or output object. The Developer tool links the ports, beginning with the top pair. It links all ports that meet the validation requirements. When you drag a port into an empty port, the Developer tool copies the port and creates a connection.

Automatically Linking Ports


When you link ports automatically, you can link by position or by name. When you link ports automatically by name, you can specify a prefix or suffix by which to link the ports. Use prefixes or suffixes to indicate where ports occur in a mapping.

Linking Ports by Name


When you link ports by name, the Developer tool adds links between input and output ports that have the same name. Link by name when you use the same port names across transformations.
You can link ports based on prefixes and suffixes that you define. Use prefixes or suffixes to indicate where ports occur in a mapping. Link by name and prefix or suffix when you use prefixes or suffixes in port names to distinguish where they occur in the mapping or mapplet. Linking by name is not case sensitive.
1. Click Mapping > Auto Link.
   The Auto Link dialog box appears.
2. Select an object in the From window to link from.
3. Select an object in the To window to link to.
4. Select Name.
5. Optionally, click Advanced to link ports based on prefixes or suffixes.
6. Click OK.

Linking Ports by Position


When you link by position, the Developer tool links the first output port to the first input port, the second output port to the second input port, and so forth. Link by position when you create transformations with related ports in the same order.
1. Click Mapping > Auto Link.
   The Auto Link dialog box appears.
2. Select an object in the From window to link from.
3. Select an object in the To window to link to.
4. Select Position and click OK.
   The Developer tool links the first output port to the first input port, the second output port to the second input port, and so forth.

Rules and Guidelines for Linking Ports


Certain rules and guidelines apply when you link ports. Use the following rules and guidelines when you connect mapping objects:
- If the Developer tool detects an error when you try to link ports between two mapping objects, it displays a symbol indicating that you cannot link the ports.
- Follow the logic of data flow in the mapping. You can link the following types of port:
  - The receiving port must be an input or input/output port.
  - The originating port must be an output or input/output port.
  - You cannot link input ports to input ports or output ports to output ports.
- You must link at least one port of an input group to an upstream transformation.
- You must link at least one port of an output group to a downstream transformation.
- You can link ports from one active transformation or one output group of an active transformation to an input group of another transformation.
- You cannot connect an active transformation and a passive transformation to the same downstream transformation or transformation input group.
- You cannot connect more than one active transformation to the same downstream transformation or transformation input group.
- You can connect any number of passive transformations to the same downstream transformation, transformation input group, or target.
- You can link ports from two output groups in the same transformation to one Joiner transformation configured for sorted data if the data from both output groups is sorted.
- You can only link ports with compatible datatypes. The Developer tool verifies that it can map between the two datatypes before linking them. The Data Integration Service cannot transform data between ports with incompatible datatypes.
- The Developer tool marks some mappings invalid if the mapping violates data flow validation.

Propagating Port Attributes


Propagate port attributes to pass changed attributes to a port throughout a mapping.
1. In the mapping canvas, select a port in a transformation.
2. Click Mapping > Propagate Attributes.
   The Propagate Attributes dialog box appears.
3. Select a direction to propagate attributes.
4. Select the attributes you want to propagate.
5. Optionally, preview the results.
6. Click Apply.
   The Developer tool propagates the port attributes.

Dependency Types
When you propagate port attributes, the Developer tool updates dependencies. The Developer tool can update the following dependencies:
- Link path dependencies
- Implicit dependencies

Link Path Dependencies


A link path dependency is a dependency between a propagated port and the ports in its link path.


When you propagate dependencies in a link path, the Developer tool updates all the input and input/output ports in its forward link path and all the output and input/output ports in its backward link path. The Developer tool performs the following updates:
- Updates the port name, datatype, precision, scale, and description for all ports in the link path of the propagated port.
- Updates all expressions or conditions that reference the propagated port with the changed port name.
- Updates the associated port property in a dynamic Lookup transformation if the associated port name changes.

Implicit Dependencies
An implicit dependency is a dependency within a transformation between two ports based on an expression or condition. You can propagate datatype, precision, scale, and description to ports with implicit dependencies. You can also parse conditions and expressions to identify the implicit dependencies of the propagated port. All ports with implicit dependencies are output or input/output ports. When you include conditions, the Developer tool updates the following dependencies:
- Link path dependencies
- Output ports used in the same lookup condition as the propagated port
- Associated ports in dynamic Lookup transformations that are associated with the propagated port
- Master ports used in the same join condition as the detail port

When you include expressions, the Developer tool updates the following dependencies:
- Link path dependencies
- Output ports containing an expression that uses the propagated port

The Developer tool does not propagate to implicit dependencies within the same transformation. You must propagate the changed attributes from another transformation. For example, when you change the datatype of a port that is used in a lookup condition and propagate that change from the Lookup transformation, the Developer tool does not propagate the change to the other port dependent on the condition in the same Lookup transformation.

Propagated Port Attributes by Transformation


The Developer tool propagates dependencies and attributes for each transformation. The following list describes, for each transformation, each dependency and the attributes that the Developer tool propagates for that dependency:

Address Validator
- Dependency: None. Propagated attributes: None. This transform has predefined port names and datatypes.

Aggregator
- Ports in link path: Port name, datatype, precision, scale, description
- Expression: Port name
- Implicit dependencies: Datatype, precision, scale

Association
- Ports in link path: Port name, datatype, precision, scale, description

Case Converter
- Ports in link path: Port name, datatype, precision, scale, description

Comparison
- Ports in link path: Port name, datatype, precision, scale, description

Consolidator
- Dependency: None. Propagated attributes: None. This transform has predefined port names and datatypes.

Expression
- Ports in link path: Port name, datatype, precision, scale, description
- Expression: Port name
- Implicit dependencies: Datatype, precision, scale

Filter
- Ports in link path: Port name, datatype, precision, scale, description
- Condition: Port name

Joiner
- Ports in link path: Port name, datatype, precision, scale, description
- Condition: Port name
- Implicit dependencies: Datatype, precision, scale

Key Generator
- Ports in link path: Port name, datatype, precision, scale, description

Labeler
- Ports in link path: Port name, datatype, precision, scale, description

Lookup
- Ports in link path: Port name, datatype, precision, scale, description
- Condition: Port name
- Associated ports (dynamic lookup): Port name
- Implicit dependencies: Datatype, precision, scale

Match
- Ports in link path: Port name, datatype, precision, scale, description

Merge
- Ports in link path: Port name, datatype, precision, scale, description

Parser
- Ports in link path: Port name, datatype, precision, scale, description

Rank
- Ports in link path: Port name, datatype, precision, scale, description
- Expression: Port name
- Implicit dependencies: Datatype, precision, scale

Router
- Ports in link path: Port name, datatype, precision, scale, description
- Condition: Port name

Sorter
- Ports in link path: Port name, datatype, precision, scale, description

SQL
- Ports in link path: Port name, datatype, precision, scale, description

Standardizer
- Ports in link path: Port name, datatype, precision, scale, description

Union
- Ports in link path: Port name, datatype, precision, scale, description
- Implicit dependencies: Datatype, precision, scale

Update Strategy
- Ports in link path: Port name, datatype, precision, scale, description
- Expression: Port name
- Implicit dependencies: Datatype, precision, scale

Weighted Average
- Ports in link path: Port name, datatype, precision, scale, description


Mapping Validation
When you develop a mapping, you must configure it so the Data Integration Service can read and process the entire mapping. The Developer tool marks a mapping invalid when it detects errors that will prevent the Data Integration Service from running sessions associated with the mapping. The Developer tool considers the following types of validation:
- Connection
- Expression
- Object
- Data flow

Connection Validation
The Developer tool performs connection validation each time you connect ports in a mapping and each time you validate a mapping. When you connect ports, the Developer tool verifies that you make valid connections. When you validate a mapping, the Developer tool verifies that the connections are valid and that all required ports are connected. The Developer tool makes the following connection validations:
- At least one input object and one output object are connected.
- At least one mapplet input port and output port is connected to the mapping.
- Datatypes between ports are compatible. If you change a port datatype to one that is incompatible with the port it is connected to, the Developer tool generates an error and invalidates the mapping. You can, however, change the datatype if it remains compatible with the connected ports, such as Char and Varchar.


Expression Validation
You can validate an expression in a transformation while you are developing a mapping. If you do not correct the errors, error messages appear in the Validation Log view when you validate the mapping. If you delete input ports used in an expression, the Developer tool marks the mapping as invalid.

Object Validation
When you validate a mapping, the Developer tool verifies that the definitions of the independent objects, such as Input transformations or mapplets, match the instance in the mapping. If any object changes while you configure the mapping, the mapping might contain errors. If any object changes while you are not configuring the mapping, the Developer tool tracks the effects of these changes on the mappings.

Validating a Mapping
Validate a mapping to ensure that the Data Integration Service can read and process the entire mapping.
1. Click Edit > Validate.
   Errors appear in the Validation Log view.
2. Fix errors and validate the mapping again.

Running a Mapping
Run a mapping to move output from sources to targets and transform data. Before you can run a mapping, you need to configure a Data Integration Service in the Administrator tool. You also need to select a default Data Integration Service. If you have not selected a default Data Integration Service, the Developer tool prompts you to select one.
- Right-click an empty area in the editor and click Run Mapping.
  The Data Integration Service runs the mapping and writes the output to the target.

Segments
A segment consists of one or more objects in a mapping, mapplet, rule, or virtual stored procedure. A segment can include a source, target, transformation, or mapplet.

Copying a Segment
You can copy a segment when you want to reuse a portion of the mapping logic in another mapping, mapplet, rule, or virtual stored procedure.
1. Open the object that contains the segment you want to copy.
2. Select a segment by highlighting each object you want to copy.
   Hold down the Ctrl key to select multiple objects. You can also select segments by dragging the pointer in a rectangle around objects in the editor.
3. Click Edit > Copy to copy the segment to the clipboard.
4. Open a target mapping, mapplet, rule, or virtual stored procedure.
5. Click Edit > Paste.


CHAPTER 5

Performance Tuning
This chapter includes the following topics:
- Performance Tuning Overview
- Optimization Methods
- Setting the Optimizer Level for a Developer Tool Mapping
- Setting the Optimizer Level for a Deployed Mapping

Performance Tuning Overview


The Developer tool contains features that allow you to tune the performance of mappings. You might be able to improve mapping performance by updating the mapping optimizer level through the mapping configuration or mapping deployment properties.
If you notice that a mapping takes an excessive amount of time to run, you might want to change the optimizer level for the mapping. The optimizer level determines which optimization methods the Data Integration Service applies to the mapping at run-time. You can choose one of the following optimizer levels:
- None. The Data Integration Service does not optimize the mapping. It runs the mapping exactly as you designed it.
- Minimal. The Data Integration Service applies the early projection optimization method to the mapping.
- Normal. The Data Integration Service applies the early projection, early selection, and predicate optimization methods to the mapping. This is the default optimizer level.
- Full. The Data Integration Service applies the early projection, early selection, predicate optimization, and semi-join optimization methods to the mapping.

You set the optimizer level for a mapping in the mapping configuration or mapping deployment properties. The Data Integration Service applies different optimizer levels to the mapping depending on how you run the mapping. You can run a mapping in the following ways:
- From the Run menu or mapping editor. The Data Integration Service uses the normal optimizer level.
- From the Run dialog box. The Data Integration Service uses the optimizer level in the selected mapping configuration.
- From the command line. The Data Integration Service uses the optimizer level in the application's mapping deployment properties.

You can also preview mapping output. When you preview mapping output, the Developer tool uses the optimizer level in the selected data viewer configuration.


Optimization Methods
To increase mapping performance, select an optimizer level for the mapping. The optimizer level controls the optimization methods that the Data Integration Service applies to a mapping. The Data Integration Service can apply the following optimization methods:
- Early projection. The Data Integration Service attempts to reduce the amount of data that passes through a mapping by identifying unused ports and removing the links between those ports. The Data Integration Service applies this optimization method when you select the minimal, normal, or full optimizer level.
- Early selection. The Data Integration Service attempts to reduce the amount of data that passes through a mapping by applying the filters as early as possible. The Data Integration Service applies this optimization method when you select the normal or full optimizer level.
- Predicate optimization. The Data Integration Service attempts to improve mapping performance by inferring new predicate expressions and by simplifying and rewriting the predicate expressions generated by a mapping or the transformations within the mapping. The Data Integration Service applies this optimization method when you select the normal or full optimizer level.
- Semi-join. The Data Integration Service attempts to reduce the amount of data extracted from the source by decreasing the size of one of the join operand data sets. The Data Integration Service applies this optimization method when you select full optimizer level.

The Data Integration Service can apply multiple optimization methods to a mapping at the same time. For example, it applies the early projection, early selection, and predicate optimization methods when you select the normal optimizer level.

Early Projection Optimization Method


The early projection optimization method causes the Data Integration Service to identify unused ports and remove the links between those ports. Identifying and removing links between unused ports improves performance by reducing the amount of data the Data Integration Service moves across transformations.
When the Data Integration Service processes a mapping, it moves the data from all connected ports in a mapping from one transformation to another. In large, complex mappings, or in mappings that use nested mapplets, some ports might not ultimately supply data to the target. The early projection method causes the Data Integration Service to identify the ports that do not supply data to the target. After the Data Integration Service identifies unused ports, it removes the links between all unused ports from the mapping.
The Data Integration Service does not remove all links. For example, it does not remove the following links:
- Links connected to a Custom transformation
- Links connected to transformations that call an ABORT() or ERROR() function, send email, or call a stored procedure

If the Data Integration Service determines that all ports in a transformation are unused, it removes all transformation links except the link to the port with the least data. The Data Integration Service does not remove the unused transformation from the mapping.
The Developer tool enables this optimization method by default.

Early Selection Optimization Method


The early selection optimization method applies the filters in a mapping as early as possible.


Filtering data early increases performance by reducing the number of rows that pass through the mapping. In the early selection method, the Data Integration Service splits, moves, splits and moves, or removes the Filter transformations in a mapping.
The Data Integration Service might split a Filter transformation if the filter condition is a conjunction. For example, the Data Integration Service might split the filter condition "A>100 AND B<50" into two simpler conditions, "A>100" and "B<50." When the Data Integration Service can split a filter, it attempts to move the simplified filters up the mapping pipeline, closer to the mapping source. Splitting the filter allows the Data Integration Service to move the simplified filters up the pipeline separately. Moving the filter conditions closer to the source reduces the number of rows that pass through the mapping.
The Data Integration Service might also remove Filter transformations from a mapping. It removes a Filter transformation when it can apply the filter condition to the transformation logic of the transformation immediately upstream of the original Filter transformation.
The Data Integration Service cannot always move a Filter transformation. For example, it cannot move a Filter transformation upstream of the following transformations:
- Custom transformations
- Transformations that call an ABORT() or ERROR() function, send email, or call a stored procedure
- Transformations that maintain count through a variable port, for example, COUNT=COUNT+1
- Transformations that create branches in the mapping. For example, the Data Integration Service cannot move a Filter transformation upstream if it is immediately downstream of a Router transformation with two output groups.

The Data Integration Service does not move a Filter transformation upstream in the mapping if doing so changes the mapping results.
The Developer tool enables this optimization method by default. You might want to disable this method if it does not increase performance. For example, a mapping contains source ports "P1" and "P2." "P1" is connected to an Expression transformation that evaluates "P2=f(P1)." "P2" is connected to a Filter transformation with the condition "P2>1." The filter drops very few rows. If the Data Integration Service moves the Filter transformation upstream of the Expression transformation, the Filter transformation must evaluate "f(P1)>1" for every row in source port "P1." The Expression transformation also evaluates "P2=f(P1)" for every row. If the function is resource intensive, moving the Filter transformation upstream nearly doubles the number of times it is called, which might degrade performance.
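
If both simplified filters can be moved next to a relational source and then pushed to the database with filter pushdown optimization, the generated source query might resemble the following sketch. The table name SRC and the columns A and B are hypothetical, and the exact SQL depends on the source:

   -- Hypothetical query after the split filter conditions are applied at the source
   SELECT A, B
   FROM SRC
   WHERE A > 100
     AND B < 50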

Predicate Optimization Method


The predicate optimization method causes the Data Integration Service to examine the predicate expressions generated by a mapping or the transformations within a mapping to determine whether the expressions can be simplified or rewritten to increase performance of the mapping.
When the Data Integration Service runs a mapping, it generates queries against the mapping sources and performs operations on the query results based on the mapping logic and the transformations within the mapping. The generated queries and operations often involve predicate expressions. Predicate expressions represent the conditions that the data must satisfy. The filter and join conditions in Filter and Joiner transformations are examples of predicate expressions.
The Data Integration Service also attempts to apply predicate expressions as early as possible to improve mapping performance.


This method also causes the Data Integration Service to infer relationships implied by existing predicate expressions and create new predicate expressions based on the inferences. For example, a mapping contains a Joiner transformation with the join condition "A=B" and a Filter transformation with the filter condition "A>5." The Data Integration Service might be able to add the inference "B>5" to the join condition. A SQL sketch of this inference appears at the end of this section.
The Data Integration Service uses the predicate optimization method with the early selection optimization method when it can apply both methods to a mapping. For example, when the Data Integration Service creates new filter conditions through the predicate optimization method, it also attempts to move them upstream in the mapping through the early selection method. Applying both optimization methods improves mapping performance when compared to applying either method alone.
The Data Integration Service applies this optimization method when it can run the mapping more quickly. It does not apply this method when doing so changes mapping results or worsens mapping performance.
When the Data Integration Service rewrites a predicate expression, it applies mathematical logic to the expression to optimize it. For example, the Data Integration Service might perform any or all of the following actions:
- Identify equivalent variables across predicate expressions in the mapping and generate simplified expressions based on the equivalencies.
- Identify redundant predicates across predicate expressions in the mapping and remove them.
- Extract subexpressions from disjunctive clauses and generate multiple, simplified expressions based on the subexpressions.
- Normalize a predicate expression.
- Apply predicate expressions as early as possible in the mapping.

The Data Integration Service might not apply predicate optimization to a mapping when the mapping contains transformations with a datatype mismatch between connected ports. The Data Integration Service might not apply predicate optimization to a transformation when any of the following conditions are true:
- The transformation contains explicit default values for connected ports.
- The transformation calls an ABORT() or ERROR() function, sends email, or calls a stored procedure.
- The transformation does not allow predicates to be moved. For example, a developer might create a Custom transformation that has this restriction.

The Developer tool enables this optimization method by default.
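
The following sketch expresses the join-condition inference described earlier in this section in SQL terms. The table and column names are hypothetical, and the actual query that the Data Integration Service generates depends on the sources and the mapping:

   -- Join condition A = B with the filter A > 5; the inferred predicate B > 5 is added
   SELECT M.A, D.B
   FROM MASTER_SRC M
   JOIN DETAIL_SRC D
     ON M.A = D.B
   WHERE M.A > 5
     AND D.B > 5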

Semi-Join Optimization Method


The semi-join optimization method attempts to reduce the amount of data extracted from the source by modifying join operations in the mapping.
The Data Integration Service applies this method to a Joiner transformation when one input group has many more rows than the other and when the larger group has many rows with no match in the smaller group based on the join condition. The Data Integration Service attempts to decrease the size of the data set of one join operand by reading the rows from the smaller group, finding the matching rows in the larger group, and then performing the join operation. Decreasing the size of the data set improves mapping performance because the Data Integration Service no longer reads unnecessary rows from the larger group source. The Data Integration Service moves the join condition to the larger group source and reads only the rows that match the smaller group.
Before applying this optimization method, the Data Integration Service performs analyses to determine whether semi-join optimization is possible and likely to be worthwhile. If the analyses determine that this method is likely to increase performance, the Data Integration Service applies it to the mapping. The Data Integration Service then reanalyzes the mapping to determine whether there are additional opportunities for semi-join optimization. It performs additional optimizations if appropriate. The Data Integration Service does not apply semi-join optimization unless the analyses determine that there is a high probability for improved performance.
For the Data Integration Service to apply the semi-join optimization method to a join operation, the Joiner transformation must meet the following requirements:
- The join type must be normal, master outer, or detail outer. The Joiner transformation cannot perform a full outer join.
- The detail pipeline must originate from a relational source.
- The join condition must be a valid sort-merge-join condition. That is, each clause must be an equality of one master port and one detail port. If there are multiple clauses, they must be joined by AND.
- If the mapping does not use target-based commits, the Joiner transformation scope must be All Input.
- The master and detail pipelines cannot share any transformation.
- The mapping cannot contain a branch between the detail source and the Joiner transformation.

The semi-join optimization method might not be beneficial in the following circumstances:
- The Joiner transformation master source does not contain significantly fewer rows than the detail source.
- The detail source is not large enough to justify the optimization. Applying the semi-join optimization method adds some overhead time to mapping processing. If the detail source is small, the time required to apply the semi-join method might exceed the time required to process all rows in the detail source.
- The Data Integration Service cannot get enough source row count statistics for a Joiner transformation to accurately compare the time requirements of the regular join operation against the semi-join operation.

The Developer tool does not enable this method by default.
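
As a rough illustration only, the effect of semi-join optimization on the larger (detail) source resembles reading only the detail rows whose join keys match the smaller (master) group. The table and column names in this sketch are hypothetical, and the Data Integration Service determines the actual operations through the analyses described above:

   -- Hypothetical sketch: read only the detail rows that match the smaller master group
   SELECT D.*
   FROM DETAIL_SRC D
   WHERE D.JOIN_KEY IN (SELECT M.JOIN_KEY FROM MASTER_SRC M)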

Setting the Optimizer Level for a Developer Tool Mapping


When you run a mapping through the Run menu or mapping editor, the Developer tool runs the mapping with the normal optimizer level. To run the mapping with a different optimizer level, run the mapping through the Run dialog box.
1. Open the mapping.
2. Select Run > Open Run Dialog.
   The Run dialog box appears.
3. Select a mapping configuration that contains the optimizer level you want to apply or create a mapping configuration.
4. Click the Advanced tab.
5. Change the optimizer level, if necessary.
6. Click Apply.
7. Click Run to run the mapping.
   The Developer tool runs the mapping with the optimizer level in the selected mapping configuration.


Setting the Optimizer Level for a Deployed Mapping


Set the optimizer level for a mapping you run from the command line by changing the mapping deployment properties in the application. The mapping must be in an application.
1. Open the application that contains the mapping.
2. Click the Advanced tab.
3. Select the optimizer level.
4. Save the application.

After you change the optimizer level, you must redeploy the application.


CHAPTER 6

Filter Pushdown Optimization


This chapter includes the following topics:
- Filter Pushdown Optimization Overview
- Pushdown Optimization Process
- Pushdown Optimization Expressions
- Comparing the Output of the Data Integration Service and Sources

Filter Pushdown Optimization Overview


When the Data Integration Service uses filter pushdown optimization, it pushes Filter transformation logic to the source. The Data Integration Service can push Filter transformation logic to the source when you run mappings, profiles, and SQL data services. The amount of Filter transformation logic that the Data Integration Service can push to the source depends on the location of the Filter transformation in the mapping, the source type, and the transformation logic. Filter pushdown optimization provides the following performance benefits:
- The source may be able to process Filter transformation logic faster than the Data Integration Service.
- The Data Integration Service reads less data from the source.

Pushdown Optimization Process


The Data Integration Service uses early selection optimization to move Filter transformations immediately after the source in a mapping. The Data Integration Service can use pushdown optimization for Filter transformations that are immediately after the source. The Data Integration Service pushes transformation logic to the source when the transformation contains operators and functions that the source supports. The Data Integration Service processes transformation logic that it cannot push to the source. To push transformation logic to the source, the Data Integration Service translates the Filter transformation logic into a query that uses the native syntax and sends the query to the source system. The source system runs the query to process the transformation. Note: The Data Integration Service does not push down the filter transformation logic for a mapping if the mapping optimizer level is None or Minimal.
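
For example, suppose a mapping reads from a relational table named ORDERS and a Filter transformation immediately after the source uses the condition ORDER_AMOUNT > 1000 AND STATUS = 'SHIPPED'. The table, columns, and condition are hypothetical, but the Data Integration Service might translate the transformation logic into a query similar to the following sketch and send it to the source database, so that only rows that satisfy the filter are returned:

   -- Hypothetical query generated for a pushed-down Filter transformation
   SELECT ORDER_ID, ORDER_AMOUNT, STATUS
   FROM ORDERS
   WHERE ORDER_AMOUNT > 1000
     AND STATUS = 'SHIPPED'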


Pushdown Optimization to Native Sources


When the Data Integration Service pushes transformation logic to relational sources using the native drivers, the Data Integration Service generates SQL statements that use the native database SQL. The Data Integration Service can push Filter transformation logic to the following native sources:
- IBM DB2 for i5/OS
- IBM DB2 for Linux, UNIX, and Windows ("DB2 for LUW")
- IBM DB2 for z/OS
- Microsoft SQL Server
  The Data Integration Service can use a native connection to Microsoft SQL Server when the Data Integration Service runs on Windows.
- Oracle

Pushdown Optimization to PowerExchange Nonrelational Sources


For PowerExchange nonrelational data sources on z/OS systems, the Data Integration Service pushes transformation logic to PowerExchange. PowerExchange translates the logic into a query that the source can process. Pushdown optimization is supported for the following types of nonrelational sources:
- IBM IMS
- Sequential data sets
- VSAM

Pushdown Optimization to ODBC Sources


When you use ODBC to connect to a source, the Data Integration Service can generate SQL statements using ANSI SQL or native database SQL. The Data Integration Service can push more transformation logic to the source when it generates SQL statements using the native database SQL. The source can process native database SQL faster than it can process ANSI SQL.
You can specify the ODBC provider in the ODBC connection object. When the ODBC provider is database specific, the Data Integration Service can generate SQL statements using native database SQL. When the ODBC provider is Other, the Data Integration Service generates SQL statements using ANSI SQL.
You can configure a specific ODBC provider for the following ODBC connection types:
- Sybase ASE
- Microsoft SQL Server
  Use an ODBC connection to connect to Microsoft SQL Server when the Data Integration Service runs on UNIX or Linux. Use a native connection to Microsoft SQL Server when the Data Integration Service runs on Windows.

Pushdown Optimization to SAP Sources


The Data Integration Service can push Filter transformation logic to SAP sources for expressions that contain a column name, an operator, and a literal string. When the Data Integration Service pushes transformation logic to SAP, the Data Integration Service converts the literal string in the expressions to an SAP datatype.


The Data Integration Service can push down transformation logic that contains the TO_DATE function when TO_DATE converts a DATS, TIMS, or ACCP datatype character string to one of the following date formats:
- 'MM/DD/YYYY'
- 'YYYY/MM/DD'
- 'YYYY-MM-DD HH24:MI:SS'
- 'YYYY/MM/DD HH24:MI:SS'
- 'MM/DD/YYYY HH24:MI:SS'

The Data Integration Service processes the transformation logic if you apply the TO_DATE function to a datatype other than DATS, TIMS, or ACCP or if TO_DATE converts a character string to a format that the Data Integration Service cannot push to SAP. The Data Integration Service processes transformation logic that contains other Informatica functions.
Filter transformation expressions can include multiple conditions separated by AND or OR. If conditions apply to multiple SAP tables, the Data Integration Service can push transformation logic to SAP when the SAP data object uses the Open SQL ABAP join syntax. Configure the Select syntax mode in the read operation of the SAP data object.

SAP Datatype Exceptions


The Data Integration Service processes Filter transformation logic when the source cannot process the transformation logic. The Data Integration Service processes Filter transformation logic for an SAP source when the transformation expression includes the following datatypes:
- RAW
- LRAW
- LCHR

Pushdown Optimization Expressions


The Data Integration Service can push transformation logic to the source database when the transformation contains operators and functions that the source supports. The Data Integration Service translates the transformation expression into a query by determining equivalent operators and functions in the database. If there is no equivalent operator or function, the Data Integration Service processes the transformation logic. If the source uses an ODBC connection and you configure a database-specific ODBC provider in the ODBC connection object, the Data Integration Service considers the source to be the native source type.

Functions
The following table summarizes the availability of Informatica functions for pushdown optimization. In each column, an X indicates that the function can be pushed to the source.


Note: These functions are not available for nonrelational sources on z/OS.
Function DB2 for i5/OS DB2 for LUW X X X X X X X X X X X X X X X X X X X X X X X X X DB2 for z/OS Microsoft SQL Server X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X ODBC Oracle SAP Sybase ASE X X X X X X X X X X X X X X X

ABS() ADD_TO_DATE() ASCII() CEIL() CHR() CONCAT() COS() COSH() DATE_COMPARE() DECODE() EXP() FLOOR() GET_DATE_PART() IIF() IN() INITCAP() INSTR() ISNULL() LAST_DAY() LENGTH() LN() LOG() LOOKUP() LOWER() LPAD() LTRIM()

X X X X X X X X X X


Function

DB2 for i5/OS X X

DB2 for LUW X X

DB2 for z/OS X X X

Microsoft SQL Server X X

ODBC

Oracle

SAP

Sybase ASE X X

MOD() POWER() ROUND(DATE) ROUND(NUMBER) RPAD() RTRIM() SIGN() SIN() SINH() SOUNDEX() SQRT() SUBSTR() SYSDATE() SYSTIMESTAMP() TAN() TANH() TO_BIGINT TO_CHAR(DATE)

X X X

X X

X X X X

X X X X X X

X X X X

X X X X X X X X

X X X X X X X X X X X X X X X X X X X X X

X X X X X X X X X X X X X X X X X X

X X X X X X X

X X X X X X X X X X X X

X X X X X X X X X X X X

X X X X X X X X X X X X

TO_CHAR(NUMBER) X TO_DATE() TO_DECIMAL() TO_FLOAT() TO_INTEGER() TRUNC(DATE) TRUNC(NUMBER) UPPER() X X X X X X

X X

X X

X X X

X X

X X


IBM DB2 Function Exceptions


The Data Integration Service cannot push supported functions to IBM DB2 for i5/OS, DB2 for LUW, and DB2 for z/OS sources under certain conditions. The Data Integration Service processes transformation logic for IBM DB2 sources when expressions contain supported functions with the following logic:
- ADD_TO_DATE or GET_DATE_PART returns results with millisecond or nanosecond precision.
- LENGTH includes more than three arguments.
- LTRIM includes more than one argument.
- RTRIM includes more than one argument.
- TO_BIGINT includes more than one argument.
- TO_BIGINT converts a string to a bigint value on a DB2 for LUW source.
- TO_CHAR converts a date to a character string without the format argument.
- TO_CHAR converts a date to a character string and specifies a format that is not supported by DB2.
- TO_DATE converts a character string to a date without the format argument.
- TO_DATE converts a character string to a date and specifies a format that is not supported by DB2.
- TO_DECIMAL converts a string to a decimal value.
- TO_FLOAT converts a string to a double-precision floating point number.
- TO_INTEGER includes more than one argument.
- TO_INTEGER converts a string to an integer value on a DB2 for LUW source.

Microsoft SQL Server Function Exceptions


The Data Integration Service cannot push supported functions to Microsoft SQL Server sources under certain conditions. The Data Integration Service processes transformation logic for Microsoft SQL Server sources when expressions contain supported functions with the following logic:
- IN includes the CaseFlag argument.
- INSTR includes more than three arguments.
- LTRIM includes more than one argument.
- RTRIM includes more than one argument.
- TO_BIGINT includes more than one argument.
- TO_INTEGER includes more than one argument.

Oracle Function Exceptions


The Data Integration Service cannot push supported functions to Oracle sources under certain conditions. The Data Integration Service processes transformation logic for Oracle sources when expressions contain supported functions with the following logic:
- ADD_TO_DATE or GET_DATE_PART returns results with subsecond precision.
- ROUND rounds values to seconds or subseconds.
- SYSTIMESTAMP returns the date and time with microsecond precision.
- TRUNC truncates seconds or subseconds.


ODBC Function Exception


The Data Integration Service processes Filter transformation logic for ODBC when the CaseFlag argument for the IN function is a number other than zero. Note: When the ODBC connection object properties include a database-specific ODBC provider, the Data Integration Service considers the source to be the native source type.

Sybase ASE Function Exceptions


The Data Integration Service cannot push supported functions to Sybase ASE sources under certain conditions. The Data Integration Service processes transformation logic for Sybase ASE sources when expressions contain supported functions with the following logic:
- IN includes the CaseFlag argument.
- INSTR includes more than two arguments.
- LTRIM includes more than one argument.
- RTRIM includes more than one argument.
- TO_BIGINT includes more than one argument.
- TO_INTEGER includes more than one argument.
- TRUNC(Numbers) includes more than one argument.

Operators
The following table summarizes the availability of Informatica operators by source type. In each column, an X indicates that the operator can be pushed to the source. Note: Nonrelational sources are IMS, VSAM, and sequential data sets on z/OS.
Operator DB2 for i5/OS, LUW, or z/OS X Microsoft SQL Server X Nonrelational ODBC Oracle SAP Sybase ASE X

+ * / % || = > < >= <= <> != ^=

X X X X

X X X X X

X X X

X X X X X

X X X

X X X X

X X X

X X X

X X X

X X X


Operator

DB2 for i5/OS, LUW, or z/OS X

Microsoft SQL Server X

Nonrelational

ODBC

Oracle

SAP

Sybase ASE X

AND OR NOT

Comparing the Output of the Data Integration Service and Sources


The Data Integration Service and sources can produce different results when processing the same transformation logic. When the Data Integration Service pushes transformation logic to the source, the output of the transformation logic can be different.

Case sensitivity
The Data Integration Service and a database can treat case sensitivity differently. For example, the Data Integration Service uses case-sensitive queries and the database does not. A Filter transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the database to return rows that match 'CA'. However, if you push this transformation logic to a database that is not case sensitive, it returns rows that match the values Ca, ca, cA, and CA. A sketch of the pushed-down query appears at the end of this section.

Numeric values converted to character values
The Data Integration Service and a database can convert the same numeric value to a character value in different formats. The database might convert numeric values to an unacceptable character format. For example, a table contains the number 1234567890. When the Data Integration Service converts the number to a character value, it inserts the characters 1234567890. However, a database might convert the number to 1.2E9. The two sets of characters represent the same value.

Date formats for TO_CHAR and TO_DATE functions
The Data Integration Service uses the date format in the TO_CHAR or TO_DATE function when the Data Integration Service pushes the function to the database. Use the TO_DATE functions to compare date or time values. When you use TO_CHAR to compare date or time values, the database can add a space or leading zero to values such as a single-digit month, single-digit day, or single-digit hour. The database comparison results can be different from the results of the Data Integration Service when the database adds a space or a leading zero.

Precision
The Data Integration Service and a database can have different precision for particular datatypes. Transformation datatypes use a default numeric precision that can vary from the native datatypes. The results can vary if the database uses a different precision than the Data Integration Service.

SYSDATE or SYSTIMESTAMP function
When you use the SYSDATE or SYSTIMESTAMP function, the Data Integration Service returns the current date and time for the node that runs the service process. However, when you push the transformation logic to the database, the database returns the current date and time for the machine that hosts the database. If the time zone of the machine that hosts the database is not the same as the time zone of the machine that runs the Data Integration Service process, the results can vary.


If you push SYSTIMESTAMP to an IBM DB2 or a Sybase ASE database, and you specify the format for SYSTIMESTAMP, the database ignores the format and returns the complete time stamp.

LTRIM, RTRIM, or SOUNDEX function
When you push LTRIM, RTRIM, or SOUNDEX to a database, the database treats the argument (' ') as NULL, but the Data Integration Service treats the argument (' ') as spaces.

LAST_DAY function on Oracle source
When you push LAST_DAY to Oracle, Oracle returns the date up to the second. If the input date contains subseconds, Oracle trims the date to the second.
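
As a sketch of the case-sensitivity example earlier in this section, a pushed-down filter might produce a query similar to the following. The table name is hypothetical:

   -- Pushed-down filter; a database that is not case sensitive also matches 'Ca', 'ca', and 'cA'
   SELECT *
   FROM CUSTOMER_SRC
   WHERE col_varchar2 = 'CA'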


CHAPTER 7

Mapplets
This chapter includes the following topics:
- Mapplets Overview, 84
- Mapplet Types, 84
- Mapplets and Rules, 85
- Mapplet Input and Output, 85
- Creating a Mapplet, 86
- Validating a Mapplet, 86
- Segments, 86

Mapplets Overview
A mapplet is a reusable object containing a set of transformations that you can use in multiple mappings. Use a mapplet in a mapping. Or, validate the mapplet as a rule.

Transformations in a mapplet can be reusable or non-reusable. When you use a mapplet in a mapping, you use an instance of the mapplet. Any change made to the mapplet is inherited by all instances of the mapplet.

Mapplets can contain other mapplets. You can also use a mapplet more than once in a mapping or mapplet. You cannot have circular nesting of mapplets. For example, if mapplet A contains mapplet B, mapplet B cannot contain mapplet A.

Mapplet Types
The mapplet type is determined by the mapplet input and output. You can create the following types of mapplet:
- Source. The mapplet contains a data source as input and an Output transformation as output.
- Target. The mapplet contains an Input transformation as input and a data source as output.
- Midstream. The mapplet contains an Input transformation and an Output transformation. It does not contain a data source for input or output.


Mapplets and Rules


A rule is business logic that defines conditions applied to source data when you run a profile. It is a midstream mapplet that you use in a profile.

A rule must meet the following requirements:
- It must contain an Input and Output transformation. You cannot use data sources in a rule.
- It can contain Expression transformations, Lookup transformations, and passive data quality transformations. It cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation, as it is an active transformation.
- It does not specify cardinality between input groups.

Note: Rule functionality is not limited to profiling. You can add any mapplet that you validate as a rule to a profile in the Analyst tool. For example, you can evaluate postal address data quality by selecting a rule configured to validate postal addresses and adding it to a profile.

Mapplet Input and Output


To use a mapplet in a mapping, you must configure it for input and output. A mapplet has the following input and output components:
- Mapplet input. You can pass data into a mapplet from data sources or Input transformations or both. If you validate the mapplet as a rule, you must pass data into the mapplet through an Input transformation. When you use an Input transformation, you connect it to a source or upstream transformation in the mapping.
- Mapplet output. You can pass data out of a mapplet from data sources or Output transformations or both. If you validate the mapplet as a rule, you must pass data from the mapplet through an Output transformation. When you use an Output transformation, you connect it to a target or downstream transformation in the mapping.
- Mapplet ports. You can see mapplet ports in the mapping canvas. Mapplet input ports and output ports originate from Input transformations and Output transformations. They do not originate from data sources.

Mapplet Input
Mapplet input can originate from a data source or from an Input transformation.

You can create multiple pipelines in a mapplet. Use multiple data sources or Input transformations. You can also use a combination of data sources and Input transformations.

Use one or more data sources to provide source data in the mapplet. When you use the mapplet in a mapping, it is the first object in the mapping pipeline and contains no input ports.

Use an Input transformation to receive input from the mapping. The Input transformation provides input ports so you can pass data through the mapplet. Each port in the Input transformation connected to another transformation in the mapplet becomes a mapplet input port. Input transformations can receive data from a single active source. Unconnected ports do not appear in the mapping canvas.

You can connect an Input transformation to multiple transformations in a mapplet. You can also connect one port in an Input transformation to multiple transformations in the mapplet.


Mapplet Output
Use a data source as output when you want to create a target mapplet. Use an Output transformation in a mapplet to pass data through the mapplet into a mapping.

Use one or more data sources to provide target data in the mapplet. When you use the mapplet in a mapping, it is the last object in the mapping pipeline and contains no output ports.

Use an Output transformation to pass output to a downstream transformation or target in a mapping. Each connected port in an Output transformation appears as a mapplet output port in a mapping. Each Output transformation in a mapplet appears as an output group. An output group can pass data to multiple pipelines in a mapping.

Creating a Mapplet
Create a mapplet to define a reusable object containing a set of transformations that you can use in multiple mappings.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Mapplet.
3. Enter a mapplet name.
4. Click Finish.
   An empty mapplet appears in the editor.
5. Add mapplet inputs, outputs, and transformations.

Validating a Mapplet
Validate a mapplet before you add it to a mapping. You can also validate a mapplet as a rule to include it in a profile.
1. Right-click the mapplet canvas.
2. Select Validate As > Mapplet or Validate As > Rule.
   The Validation Log displays mapplet error messages.

Segments
A segment consists of one or more objects in a mapping, mapplet, rule, or virtual stored procedure. A segment can include a source, target, transformation, or mapplet.

Copying a Segment
You can copy a segment when you want to reuse a portion of the mapping logic in another mapping, mapplet, rule, or virtual stored procedure.


1. Open the object that contains the segment you want to copy.
2. Select a segment by highlighting each object you want to copy. Hold down the Ctrl key to select multiple objects.
   You can also select segments by dragging the pointer in a rectangle around objects in the editor.
3. Click Edit > Copy to copy the segment to the clipboard.
4. Open a target mapping, mapplet, rule, or virtual stored procedure.
5. Click Edit > Paste.


CHAPTER 8

Object Import and Export


This chapter includes the following topics:
- Object Import and Export Overview, 88
- The Import/Export XML File, 89
- Dependent Objects, 89
- Exporting Objects, 90
- Importing Objects, 91
- Importing Application Archives, 91

Object Import and Export Overview


You can export objects to an XML file and then import objects from the XML file. When you export objects, the Developer tool creates an XML file that contains the metadata of the exported objects. Use this file to import the objects into a project or folder. You can also import application archives into a repository. Export and import objects to accomplish the following tasks:
- Deploy metadata into production. After you test a mapping in a development repository, you can export it to an XML file and then import it from the XML file into a production repository.
- Archive metadata. You can export objects that you no longer need to an XML file before you remove them from the repository.
- Share metadata. You can share metadata with a third party. For example, you can send a mapping to someone else for testing or analysis.


- Copy metadata between repositories. You can copy objects between repositories that you cannot connect to from the same client. Export the object and transfer the XML file to the target machine. Then import the object from the XML file into the target repository.

You can choose the objects to export. The Developer tool exports the objects and the dependent objects. The Developer tool exports the last saved version of the object. You can export multiple objects from a project to one XML file. When you import objects, you import all objects in the XML file.

You can export and import the following objects:
- Projects
- Folders
- Applications
- Reference tables
- Physical data objects
- Logical data object models
- Reusable transformations
- Mapplets
- Mappings
- SQL data services
- Profiles
- Scorecards

You can also import application archive files into a repository. Application archive files contain deployed applications.

You cannot export empty projects or empty folders.

The Import/Export XML File


When you export objects, the Developer tool creates an XML file that contains the metadata of the objects. The Developer tool includes Cyclic Redundancy Checking Value (CRCVALUE) codes in the elements in the XML file. If you modify attributes in an element that contains a CRCVALUE code, you cannot import the object. Therefore, do not modify any exported object in the XML file.
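For illustration only, an element in the export file that carries a CRCVALUE code might look like the following sketch. All names other than the CRCVALUE attribute are hypothetical and do not show the exact structure of the export XML:

    <!-- Hypothetical element; only the CRCVALUE attribute is the point of the example. -->
    <exportedObject name="Customer_LDO" CRCVALUE="3108520154">
        ...
    </exportedObject>

If you edit any attribute inside such an element, the CRCVALUE code no longer matches the content and you cannot import the object.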

Dependent Objects
When you export an object, the Developer tool also exports the dependent objects. A dependent object is an object that is used by another object. For example, a physical data object used as a mapping input is a dependent object of that mapping. The following table lists the dependent objects that the Developer tool includes in an export file:
- Application: SQL data services and their dependent objects
- Logical data object model: Logical data objects; Physical data objects; Reusable transformations and their dependent objects; Mapplets and their dependent objects
- Transformation: Reference tables
- Mapplet: Logical data objects; Physical data objects; Reusable transformations and their dependent objects
- Mapping: Logical data objects; Physical data objects; Reusable transformations and their dependent objects; Mapplets and their dependent objects
- SQL data service: Logical data objects; Physical data objects; Reusable transformations and their dependent objects; Mapplets and their dependent objects
- Profile: Logical data objects; Physical data objects
- Scorecard: Profiles and their dependent objects

Projects and folders contain other objects, but they do not have dependent objects. When you export or import objects in a project or folder, the Developer tool preserves the object hierarchy.

Exporting Objects
You can export objects to an XML file to use in another project or folder.
1. Click File > Export.
   The Export wizard appears.
2. Select Informatica > Object Export File.
3. Click Next.
4. Click Browse to select a project from which to export objects. If you are exporting reference table data, complete the following fields:
   - Reference data location. Location where you want to save reference table data. Enter a path that the Data Integration Service can write to. The Developer tool saves the reference table data as one or more dictionary .dic files.
   - Data service. Data Integration Service on which the reference table staging database runs.
   - Code page. Code page of the destination repository for the reference table data.
5. Click Next.
6. Select the objects to export.
7. Enter the export file name and location.
8. To view the dependent objects that the Export wizard exports with the objects you selected, click Next.
   The Export wizard displays the dependent objects.
9. Click Finish.
   The Developer tool exports the objects to the XML file.


Importing Objects
You can import objects from a Developer tool XML file or application archive file. You import the objects and any dependent objects into a project or folder.
1. Click File > Import.
   The Import wizard appears.
2. Select Informatica > Object Import File.
3. Click Next.
4. Click Browse to select the export file that you want to import.
5. Select the project from which to import objects.
6. Click Browse to select the target project or folder.
7. Specify how to handle duplicate objects. You can either replace existing objects with the imported objects or rename the imported objects.
8. To view all of the objects the Import wizard imports from the file, click Next.
9. Click Finish.

If you choose to rename duplicate objects, the Import wizard names the imported objects "CopyOf_<Original Name>." You can rename the objects after you import them.

Importing Application Archives


You can import objects from an application archive file. You import the application and any dependent objects into a repository.
1. Click File > Import.
   The Import wizard appears.
2. Select Informatica > Application Archive.
3. Click Next.
4. Click Browse to select the application archive file.
   The Developer tool lists the application archive file contents.
5. Select the repository into which you want to import the application.
6. Click Finish.
   The Developer tool imports the application into the repository. If the Developer tool finds duplicate objects, it renames the imported objects.


CHAPTER 9

Export to PowerCenter
This chapter includes the following topics:
- Export to PowerCenter Overview, 92
- PowerCenter Release Compatibility, 93
- Mapplet Export, 93
- Export to PowerCenter Options, 94
- Exporting an Object to PowerCenter, 95
- Export Restrictions, 96
- Rules and Guidelines for Exporting to PowerCenter, 97
- Troubleshooting Exporting to PowerCenter, 98

Export to PowerCenter Overview


You can export objects from the Developer tool to use in PowerCenter. You can export the following objects:
- Mappings. Export mappings to PowerCenter mappings or mapplets.
- Mapplets. Export mapplets to PowerCenter mapplets.
- Logical data object read mappings. Export the logical data object read mappings within a logical data object model to PowerCenter mapplets. The export process ignores logical data object write mappings.

You export objects to a PowerCenter repository or to an XML file. Export objects to PowerCenter to take advantage of capabilities that are exclusive to PowerCenter such as partitioning, web services, and high availability.

When you export objects, you specify export options such as the PowerCenter release, how to convert mappings and mapplets, and whether to export reference tables. If you export objects to an XML file, PowerCenter users can import the file into the PowerCenter repository.

Example
A supermarket chain that uses PowerCenter 8.6 wants to create a product management tool to accomplish the following business requirements:
- Create a model of product data so that each store in the chain uses the same attributes to define the data.
- Standardize product data and remove invalid and duplicate entries.
- Generate a unique SKU for each product.
- Migrate the cleansed data to another platform.
- Ensure high performance of the migration process by performing data extraction, transformation, and loading in parallel processes.
- Ensure continuous operation if a hardware failure occurs.

The developers at the supermarket chain use the Developer tool to create mappings that standardize data, generate product SKUs, and define the flow of data between the existing and new platforms. They export the mappings to XML files. During export, they specify that the mappings be compatible with PowerCenter 8.6. Developers import the mappings into PowerCenter and create the associated sessions and workflows. They set partition points at various transformations in the sessions to improve performance. They also configure the sessions for high availability to provide failover capability if a temporary network, hardware, or service failure occurs.

PowerCenter Release Compatibility


To verify that objects are compatible with a certain PowerCenter release, set the PowerCenter release compatibility level. The compatibility level applies to all mappings, mapplets, and logical data object models you can view in the Developer tool.

You can configure the Developer tool to validate against a particular release of PowerCenter, or you can configure it to skip validation for release compatibility. By default, the Developer tool does not validate objects against any release of PowerCenter. Set the compatibility level to a PowerCenter release before you export objects to PowerCenter.

If you set the compatibility level, the Developer tool performs two validation checks when you validate a mapping, mapplet, or logical data object model. The Developer tool first verifies that the object is valid in the Developer tool. If the object is valid, the Developer tool then verifies that the object is valid for export to the selected release of PowerCenter. You can view compatibility errors in the Validation Log view.

Setting the Compatibility Level


Set the compatibility level to validate mappings, mapplets, and logical data object models against a PowerCenter release. If you select none, the Developer tool skips release compatibility validation when you validate an object.
1. Click Edit > Compatibility Level.
2. Select the compatibility level.
   The Developer tool places a dot next to the selected compatibility level in the menu. The compatibility level applies to all mappings, mapplets, and logical data object models you can view in the Developer tool.

Mapplet Export
When you export a mapplet or you export a mapping as a mapplet, the export process creates objects in the mapplet. The export process also renames some mapplet objects. The export process might create the following mapplet objects in the export XML file:


Expression transformations
The export process creates an Expression transformation immediately downstream from each Input transformation and immediately upstream from each Output transformation in a mapplet. The export process names the Expression transformations as follows:
Expr_<InputOrOutputTransformationName>
The Expression transformations contain pass-through ports.

Output transformations
If you export a mapplet and convert targets to Output transformations, the export process creates an Output transformation for each target. The export process names the Output transformations as follows:
<MappletInstanceName>_<TargetName>

The export process renames the following mapplet objects in the export XML file:

Mapplet Input and Output transformations
The export process names mapplet Input and Output transformations as follows:
<TransformationName>_<InputOrOutputGroupName>

Mapplet ports
The export process renames mapplet ports as follows:
<PortName>_<GroupName>
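As an illustration of these naming patterns, the object names below are hypothetical examples rather than names taken from this guide:

    Input transformation in_Orders                      ->  Expression transformation Expr_in_Orders
    Mapplet instance mplt_Clean with target T_Customer  ->  Output transformation mplt_Clean_T_Customer
    Input transformation in_Orders with group Group1    ->  renamed to in_Orders_Group1
    Port CUST_ID in group Group1                        ->  mapplet port CUST_ID_Group1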

Export to PowerCenter Options


When you export an object for use in PowerCenter, you must specify the export options. The following table describes the export options:
- Project. Project in the model repository from which to export objects.
- Target release. PowerCenter release number.
- Export selected objects to file. Exports objects to a PowerCenter XML file. If you select this option, specify the export XML file name and location.
- Export selected objects to PowerCenter repository. Exports objects to a PowerCenter repository. If you select this option, you must specify the following information for the PowerCenter repository:
  - Host name. PowerCenter domain gateway host name.
  - Port number. PowerCenter domain gateway HTTP port number.
  - User name. Repository user name.
  - Password. Password for repository user name.
  - Security domain. LDAP security domain name, if one exists. Otherwise, enter "Native."
  - Repository name. PowerCenter repository name.
- Send to repository folder. Exports objects to the specified folder in the PowerCenter repository.
- Use control file. Exports objects to the PowerCenter repository using the specified pmrep control file.


- Convert exported mappings to PowerCenter mapplets. Converts Developer tool mappings to PowerCenter mapplets. The Developer tool converts sources and targets in the mappings to Input and Output transformations in a PowerCenter mapplet.
- Convert target mapplets. Converts targets in mapplets to Output transformations in the PowerCenter mapplet. PowerCenter mapplets cannot contain targets. If you export mapplets that contain targets and you do not select this option, the export process fails.
- Export reference data. Exports any reference table data used by a transformation in an object you export.
- Reference data location. Location where you want to save reference table data. Enter a path that the Data Integration Service can write to. The Developer tool saves the reference table data as one or more dictionary .dic files.
- Data service. Data Integration Service on which the reference table staging database runs.
- Code page. Code page of the PowerCenter repository.

Exporting an Object to PowerCenter


When you export mappings, mapplets, or logical data object read mappings to PowerCenter, you can export the objects to a file or to the PowerCenter repository.

Before you export an object, set the compatibility level to the appropriate PowerCenter release. Validate the object to verify that it is compatible with the PowerCenter release.
1. Click File > Export.
   The Export dialog box appears.
2. Select Informatica > PowerCenter.
3. Click Next.
   The Export to PowerCenter dialog box appears.
4. Select the project.
5. Select the PowerCenter release.
6. Choose the export location, a PowerCenter import XML file or a PowerCenter repository.
7. If you export to a PowerCenter repository, select the PowerCenter folder or the pmrep control file that defines how to import objects into PowerCenter.
8. Specify the export options.
9. Click Next.
   The Developer tool prompts you to select the objects to export.
10. Select the objects to export and click Finish.
    The Developer tool exports the objects to the location you selected.

If you exported objects to a file, you can import objects from the XML file into the PowerCenter repository.

If you export reference data, copy the reference table files to the PowerCenter dictionary directory on the machine that hosts Informatica Services:


<PowerCenter Installation Directory>\services\<Informatica Developer Project Name>\<Informatica Developer Folder Name>

Export Restrictions
Some Developer tool objects are not valid in PowerCenter. The following objects are not valid in PowerCenter:

Objects with long names
PowerCenter users cannot import a mapping, mapplet, or object within a mapping or mapplet if the object name exceeds 80 characters.

Mappings or mapplets that contain a Custom Data transformation
You cannot export mappings or mapplets that contain Custom Data transformations.

Mappings or mapplets that contain a Joiner transformation with certain join conditions
The Developer tool does not allow you to export mappings and mapplets that contain a Joiner transformation with a join condition that is not valid in PowerCenter. In PowerCenter, a user defines join conditions based on equality between the specified master and detail sources. In the Developer tool, you can define other join conditions. For example, you can define a join condition based on equality or inequality between the master and detail sources. You can define a join condition that contains transformation expressions. You can also define a join condition, such as 1 = 1, that causes the Joiner transformation to perform a cross-join. These types of join conditions are not valid in PowerCenter. Therefore, you cannot export mappings or mapplets that contain Joiner transformations with these types of join conditions to PowerCenter. Example join conditions appear after this list of restrictions.

Mappings or mapplets that contain a Lookup transformation with renamed ports
The PowerCenter Integration Service queries the lookup source based on the lookup ports in the transformation and a lookup condition. Therefore, the port names in the Lookup transformation must match the column names in the lookup source.

Mappings or mapplets that contain a Lookup transformation that returns all rows
The export process might fail if you export a mapping or mapplet with a Lookup transformation that returns all rows that match the lookup condition. The export process fails when you export the mapping or mapplet to PowerCenter 8.x. The Return all rows option was added to the Lookup transformation in PowerCenter 9.0. Therefore, the option is not valid in earlier versions of PowerCenter.

Mappings or mapplets that contain PowerExchange data objects
If you export a mapping that includes a PowerExchange data object, the Developer tool does not export the PowerExchange data object.

Mapplets that concatenate ports
The export process fails if you export a mapplet that contains a multigroup Input transformation and the ports in different input groups are connected to the same downstream transformation or transformation output group.

Nested mapplets with unconnected Lookup transformations
The export process fails if you export any type of mapping or mapplet that contains another mapplet with an unconnected Lookup transformation.


Nested mapplets with Update Strategy transformations when the mapplets are upstream from a Joiner transformation
Mappings and mapplets that contain an Update Strategy transformation upstream from a Joiner transformation are not valid in the Developer tool or in PowerCenter. Verify that mappings or mapplets to export do not contain an Update Strategy transformation in a nested mapplet upstream from a Joiner transformation.

Mappings with an SAP source
When you export a mapping with an SAP source, the Developer tool exports the mapping without the SAP source. When you import the mapping into the PowerCenter repository, the PowerCenter Client imports the mapping without the source. The output window displays a message indicating the mapping is not valid. You must manually create the SAP source in PowerCenter and add it to the mapping.
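To illustrate the Joiner transformation restriction, the following join conditions are sketches with hypothetical port names; only conditions based on equality between master and detail ports can be exported to PowerCenter:

    Exported (equality between master and detail):   master_CustomerID = detail_CustomerID
    Not exported (inequality):                        master_Amount >= detail_Amount
    Not exported (cross-join condition):              1 = 1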

Rules and Guidelines for Exporting to PowerCenter


Due to differences between the Developer tool and PowerCenter, some Developer tool objects might not be compatible with PowerCenter. Use the following rules and guidelines when you export objects to PowerCenter:

Verify the PowerCenter release.
When you export to PowerCenter 9.0.1, the Developer tool and PowerCenter must be running the same HotFix version. You cannot export mappings and mapplets to PowerCenter version 9.0.

Verify that object names are unique.
If you export an object to a PowerCenter repository, the export process replaces the PowerCenter object if it has the same name as an exported object.

Verify that the code pages are compatible.
The export process fails if the Developer tool and PowerCenter use code pages that are not compatible.

Verify precision mode.
By default, the Developer tool runs mappings and mapplets with high precision enabled and PowerCenter runs sessions with high precision disabled. If you run Developer tool mappings and PowerCenter sessions in different precision modes, they can produce different results. To avoid differences in results, run the objects in the same precision mode.

Copy reference data.
When you export mappings or mapplets with transformations that use reference tables, you must copy the reference tables to a directory where the PowerCenter Integration Service can access them. Copy the reference tables to the directory defined in the INFA_CONTENT environment variable. If INFA_CONTENT is not set, copy the reference tables to the following PowerCenter services directory:
$INFA_HOME\services\<Developer Tool Project Name>\<Developer Tool Folder Name>


Troubleshooting Exporting to PowerCenter


The export process fails when I export a mapplet that contains objects with long names.
When you export a mapplet or you export a mapping as a mapplet, the export process creates or renames some objects in the mapplet. The export process might create Expression or Output transformations in the export XML file. The export process also renames Input and Output transformations and mapplet ports.

To generate names for Expression transformations, the export process appends characters to Input and Output transformation names. If you export a mapplet and convert targets to Output transformations, the export process combines the mapplet instance name and target name to generate the Output transformation name. When the export process renames Input transformations, Output transformations, and mapplet ports, it appends group names to the object names.

If an existing object has a long name, the exported object name might exceed the 80 character object name limit in the export XML file or in the PowerCenter repository. When an object name exceeds 80 characters, the export process fails with an internal error.

If you export a mapplet, and the export process returns an internal error, check the names of the Input transformations, Output transformations, targets, and ports. If the names are long, shorten them.


CHAPTER 10

Deployment
This chapter includes the following topics:
- Deployment Overview, 99
- Creating an Application, 100
- Deploying an Object to a Data Integration Service, 100
- Deploying an Object to a File, 101
- Updating an Application, 101
- Mapping Deployment Properties, 102
- Application Changes, 103

Deployment Overview
Deploy objects to make them accessible to end users. You can deploy physical and logical data objects, mappings, SQL data services, and applications. Deploy objects to allow users to query the objects through a third-party client tool or run mappings at the command line. When you deploy an object, you isolate the object from changes in data structures. If you make changes to an object in the Developer tool after you deploy it, you must redeploy the application that contains the object for the changes to take effect. To deploy an object, perform one of the following actions:
- Deploy an object directly. Deploy an object directly when you want to make the object available to end users without modifying it. You can deploy a physical or logical data object, an SQL data service, or a mapping directly. The Developer tool prompts you to create an application. The Developer tool adds the object to the application. When you deploy a data object, the Developer tool also prompts you to create an SQL data service based on the data object.
- Deploy an application that contains the object. Create an application when you want to deploy multiple objects at the same time. When you create an application, you select the objects to include in the application.

When you deploy objects, you must choose the deployment location. You can deploy objects to a Data Integration Service or a network file system. When you deploy an application to a Data Integration Service, end users can connect to the application and run queries against the objects or run mappings. The end users must have the appropriate permissions in the Administrator tool to query objects or run mappings.

When you deploy an object to a network file system, the Developer tool creates an application archive file. Deploy an object to a network file system if you want to check the application into a version control system. You can also deploy an object to a file if your organization requires that administrators deploy objects to Data Integration Services. An administrator can deploy application archive files to Data Integration Services through the Administrator tool.

Creating an Application
Create an application when you want to deploy multiple objects at the same time. When you create an application, you select the objects to include in the application.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Application.
   The New Application dialog box appears.
3. Enter a name for the application.
4. Click Browse to select the application location.
   You must create the application in a project or a folder.
5. Click Next.
   The Developer tool prompts you for the objects to include in the application.
6. Click Add.
   The Add Objects dialog box appears.
7. Select one or more SQL data services, mappings, or reference tables and click OK.
   The Developer tool lists the objects you select in the New Application dialog box.
8. If the application contains mappings, choose whether to override the default mapping configuration when you deploy the application. If you select this option, choose a mapping configuration.
   The Developer tool sets the mapping deployment properties for the application to the same values as the settings in the mapping configuration.
9. Click Finish.
   The Developer tool adds the application to the project or folder.

After you create an application, you must deploy the application so end users can query the objects or run the mappings.

Deploying an Object to a Data Integration Service


Deploy an object to a Data Integration Service so end users can query the object through a JDBC or ODBC client tool or run mappings from the command line.
1. Right-click an object in the Object Explorer view and click Deploy.
   The Deploy dialog box appears.
2. Select Deploy to Service.
3. Click Browse to select the domain.
   The Choose Domain dialog box appears.
4. Select a domain and click OK.
   The Developer tool lists the Data Integration Services associated with the domain in the Available Services section of the Deploy Application dialog box.
5. Select the Data Integration Services to which you want to deploy the application.
6. If you deploy a data object or an SQL data service, click Next and enter an application name.
7. If you deploy a data object, click Next and enter an SQL data service name.
8. If you deploy a data object, click Next and add virtual tables to the SQL data service.
   By default, the Developer tool creates one virtual table based on the data object you deploy.
9. Click Finish.

The Developer tool deploys the application to the Data Integration Services. End users can query the objects or run the mappings in the application.

Deploying an Object to a File


Deploy an object to an application archive file if you want to check the application into version control or if your organization requires that administrators deploy objects to Data Integration Services.
1. Right-click the application in the Object Explorer view and click Deploy.
   The Deploy dialog box appears.
2. Select Deploy to File System.
3. Click Browse to select the directory.
   The Choose a Directory dialog box appears.
4. Select the directory and click OK.
5. If you deploy a data object or an SQL data service, click Next and enter an application name.
6. If you deploy a data object, click Next and enter an SQL data service name.
7. If you deploy a data object, click Next and add virtual tables to the SQL data service.
   By default, the Developer tool creates one virtual table based on the data object you deploy.
8. Click Finish.
   The Developer tool deploys the application to an application archive file.

Before end users can access the application, you must deploy the application to a Data Integration Service. Or, an administrator must deploy the application to a Data Integration Service through the Administrator tool.

Updating an Application
Update an application when you want to add objects to an application, remove objects from an application, or update mapping deployment properties.
1. Open the application you want to update.
2. To add or remove objects, click the Overview view.
3. To add objects to the application, click Add.
   The Developer tool prompts you to choose the SQL data services, mappings, or reference tables to add to the application.
4. To remove an object from the application, select the object, and click Remove.
5. To update mapping deployment properties, click the Advanced view and change the properties.
6. Save the application.


Redeploy the application if you want end users to be able to access the updated application.

Mapping Deployment Properties


When you update an application that contains a mapping, you can set the deployment properties the Data Integration Service uses when end users run the mapping. Set mapping deployment properties on the Advanced view of the application.

You can set the following properties:
- Default date time format. Date/time format the Data Integration Service uses when the mapping converts strings to dates. Default is MM/DD/YYYY HH24:MI:SS.
- Override tracing level. Overrides the tracing level for each transformation in the mapping. The tracing level determines the amount of information the Data Integration Service sends to the mapping log files. Choose one of the following tracing levels:
  - None. The Data Integration Service does not override the tracing level that you set for each transformation.
  - Terse. The Data Integration Service logs initialization information, error messages, and notification of rejected data.
  - Normal. The Data Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. It summarizes mapping results, but not at the level of individual rows.
  - Verbose Initialization. In addition to normal tracing, the Data Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
  - Verbose Data. In addition to verbose initialization tracing, the Data Integration Service logs each row that passes into the mapping. The Data Integration Service also notes where it truncates string data to fit the precision of a column and provides detailed transformation statistics. The Data Integration Service writes row data for all rows in a block when it processes a transformation.
  Default is None.
- Sort order. Order in which the Data Integration Service sorts character data in the mapping. Default is Binary.
- Optimizer level. Controls the optimization methods that the Data Integration Service applies to a mapping as follows:
  - None. The Data Integration Service does not optimize the mapping.
  - Minimal. The Data Integration Service applies the early projection optimization method to the mapping.
  - Normal. The Data Integration Service applies the early projection, early selection, and predicate optimization methods to the mapping.
  - Full. The Data Integration Service applies the early projection, early selection, predicate optimization, and semi-join optimization methods to the mapping.
  Default is Normal.
- High precision. Runs the mapping with high precision. High precision data values have greater accuracy. Enable high precision if the mapping produces large numeric values, for example, values with precision of more than 15 digits, and you require accurate values. Enabling high precision prevents precision loss in large numeric values. Default is enabled.

Application Changes
When you change an application or change an object included in the application and you want end users to access the latest version of the application, you must deploy the application again. When you change an application or its contents and you deploy the application to the same Data Integration Service, the Data Integration Service replaces the objects and the mapping deployment properties in the application. Additionally, the Developer tool allows you to preserve or reset the SQL data service and virtual table properties for the application in the Administrator tool. The Developer tool gives you the following choices:
- Update. If the application contains an SQL data service and an administrator changed the SQL data service or virtual table properties in the Administrator tool, the Data Integration Service preserves the properties in the Administrator tool.
- Replace. If the application contains an SQL data service and an administrator changed the SQL data service or virtual table properties in the Administrator tool, the Data Integration Service resets the properties in the Administrator tool to the default values.

When you change an application that contains a mapping and an administrator changed the mapping properties in the Administrator tool, the Data Integration Service replaces the Administrator tool mapping properties with the Developer tool mapping deployment properties.

When you change an application and deploy it to a network file system, the Developer tool allows you to replace the application archive file or cancel the deployment. If you replace the application archive file, the Developer tool replaces the objects in the application and the mapping deployment properties.


CHAPTER 11

Parameters and Parameter Files


This chapter includes the following topics:
- Parameters and Parameter Files Overview, 104
- Parameters, 104
- Parameter Files, 107

Parameters and Parameter Files Overview


Parameters and parameter files allow you to define mapping values and update those values each time you run a mapping. The Data Integration Service applies parameter values when you run a mapping from the command line and specify a parameter file.

Create parameters so you can rerun a mapping with different relational connection, flat file, or reference table values. You define the parameter values in a parameter file. When you run the mapping from the command line and specify a parameter file, the Data Integration Service uses the parameter values defined in the parameter file.

To run mappings with different parameter values, perform the following tasks:
1. Create a parameter and assign it a default value.
2. Apply the parameter to a data object or to a transformation in the mapping.
3. Add the mapping to an application and deploy the application.
4. Create a parameter file that contains the parameter value.
5. Run the mapping from the command line with the parameter file.

For example, you create a mapping that processes customer orders. The mapping reads customer information from a relational table that contains customer data for one country. You want to use the mapping for customers in the United States, Canada, and Mexico. Create a parameter that represents the connection to the customers table. Create three parameter files that set the connection name to the U.S. customers table, the Canadian customers table, and the Mexican customers table. Run the mapping from the command line, using a different parameter file for each mapping run.
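A parameter file for the U.S. run of this example might look like the following sketch. The application, project, mapping, parameter, and connection names (CustomerOrders_App, Sales, m_CustomerOrders, p_CustomerConnection, US_Customers_Conn) are hypothetical; the element hierarchy follows the parameter file structure described later in this chapter.

    <?xml version="1.0"?>
    <root description="U.S. customer run"
          xmlns="http://www.informatica.com/Parameterization/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <application name="CustomerOrders_App">
            <mapping name="m_CustomerOrders">
                <project name="Sales">
                    <mapping name="m_CustomerOrders">
                        <!-- Connection parameter that points to the U.S. customers table -->
                        <parameter name="p_CustomerConnection">US_Customers_Conn</parameter>
                    </mapping>
                </project>
            </mapping>
        </application>
    </root>

The Canadian and Mexican parameter files would differ only in the connection value.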

Parameters
Parameters represent values that change between mapping runs. You can create parameters that represent relational connections, flat file names, flat file directories, reference table names, and reference table directories.


Create parameters so you can rerun a mapping with different values. For example, create a mapping parameter that represents a reference table name if you want to run a mapping with different reference tables. All parameters in the Developer tool are user-defined. You can create the following types of parameters:
- Connection. Represents a relational connection. You cannot create connection parameters for nonrelational database or SAP physical data objects.
- String. Represents a flat file name, flat file directory, reference table name, or reference table directory.

When you create a parameter, you enter the parameter name and optional description, select the parameter type, and enter the default value. Each parameter must have a default value. When you run a mapping from the command line with a parameter file, the Data Integration Service resolves all parameters to the values set in the parameter file.

The Data Integration Service resolves parameters to the default values in the following circumstances:
- You run a mapping or preview mapping results within the Developer tool.
- You query an SQL data service that uses a data source that contains parameters.
- You run a mapping from the command line without a parameter file.
- You copy a mapping fragment from a mapping that has parameters defined and some of the transformations in the mapping use the parameters. The Developer tool does not copy the parameters to the target mapping.
- You export a mapping or mapplet for use in PowerCenter.

Where to Create Parameters


Create parameters to define values that change between mapping runs. Create connection parameters to define connections. Create string parameters to define flat file and reference table names and file paths. The following table lists the objects in which you can create parameters:
- Flat file data objects. Parameter type: String. You can use the parameter in the data object.
- Customized data objects (reusable). Parameter type: Connection. You can use the parameter in the customized data object.
- Mappings. Parameter types: Connection, String. You can use the parameter in any nonreusable data object or transformation in the mapping that accepts parameters.
- Mapplets. Parameter types: Connection, String. You can use the parameter in any nonreusable data object or transformation in the mapplet that accepts parameters.
- Case Converter transformation (reusable). Parameter type: String. You can use the parameter in the transformation.
- Labeler transformation (reusable). Parameter type: String. You can use the parameter in the transformation.
- Parser transformation (reusable). Parameter type: String. You can use the parameter in the transformation.
- Standardizer transformation (reusable). Parameter type: String. You can use the parameter in the transformation.


Where to Assign Parameters


Assign a parameter to a field when you want the Data Integration Service to replace the parameter with the value defined in a parameter file.

The following list shows the objects and fields where you can assign parameters:
- Flat file data objects: Source file name, Source file directory, Output file name, Output file directory
- Customized data objects: Connection
- Read transformation created from related relational data objects: Connection
- Case Converter transformation (reusable and nonreusable): Reference table
- Labeler transformation (reusable and nonreusable): Reference table
- Lookup transformation (nonreusable): Connection
- Parser transformation (reusable and nonreusable): Reference table
- Standardizer transformation (reusable and nonreusable): Reference table

Creating a Parameter
Create a parameter to represent a value that changes between mapping runs.
1. Open the physical data object, mapping, or transformation where you want to create a parameter.
2. Click the Parameters view.
3. Click Add.
   The Add Parameter dialog box appears.
4. Enter the parameter name.
5. Optionally, enter a parameter description.
6. Select the parameter type. Select Connection to create a connection parameter. Select String to create a file name, file path, reference table name, or reference table path parameter.
7. Enter a default value for the parameter. For connection parameters, select a connection. For string parameters, enter a file name or file path.
8. Click OK.
   The Developer tool adds the parameter to the list of parameters.

Assigning a Parameter
Assign a parameter to a field so that when you run a mapping from the command line, the Data Integration Service replaces the parameter with the value defined in the parameter file.


1. Open the field in which you want to assign a parameter.
2. Click Assign Parameter.
   The Assign Parameter dialog box appears.
3. Select the parameter.
4. Click OK.

Parameter Files
A parameter file is an XML file that lists parameters and their assigned values. The parameter values define properties for a data object, transformation, mapping, or mapplet. The Data Integration Service applies these values when you run a mapping from the command line and specify a parameter file.

Parameter files provide you with the flexibility to change parameter values each time you run a mapping. You can define parameters for multiple mappings in a single parameter file. You can also create multiple parameter files and use a different file each time you run a mapping. The Data Integration Service reads the parameter file at the start of the mapping run to resolve the parameters.

To run a mapping with a parameter file, use the infacmd ms RunMapping command. The -pf argument specifies the parameter file name. The machine from which you run the mapping must have access to the parameter file.

The Data Integration Service fails the mapping if you run it with a parameter file and any of the following circumstances are true:
- The Data Integration Service cannot access the parameter file.
- The parameter file is not valid or does not exist.
- Objects of the same type exist in the same project or folder, have the same name, and use parameters. For example, a folder contains Labeler transformation "T1" and Standardizer transformation "T1." If both transformations use parameters, the Data Integration Service fails the mapping when you run it with a parameter file. If the objects are in different folders, or if one object does not use parameters, the Data Integration Service does not fail the mapping.

Parameter File Structure


A parameter file is an XML file that contains at least one parameter and its assigned value. The Data Integration Service uses the hierarchy defined in the parameter file to identify parameters and their defined values. The hierarchy identifies the physical data object or the transformation that uses the parameter. You define parameter values within the following top-level elements:
- Application/mapping/project elements. When you define a parameter within the application/mapping/project elements, the Data Integration Service applies the parameter value when you run the specified mapping in the application. For example, you want the Data Integration Service to apply a parameter value when you run mapping "MyMapping" in deployed application "MyApp." You do not want to use the parameter value when you run a mapping in any other application or when you run another mapping in "MyApp." Define the parameter within the following elements:

      <application name="MyApp">
          <mapping name="MyMapping">
              <project name="MyProject">
                  <!-- Define the parameter here. -->
              </project>
          </mapping>
      </application>

- Project element. When you omit the application/mapping/project element and define a parameter within a project top-level element, the Data Integration Service applies the parameter value when you run any mapping that has no application/mapping/project element defined in the parameter file.

The Data Integration Service searches for parameter values in the following order:
1. The value specified within an application element.
2. The value specified within a project element.
3. The parameter default value.

Use the infacmd ms ListMappingParams command to list the parameters used in a mapping with the default values. You can use the output of this command as a parameter file template. Observe the following rules when you create a parameter file:
- Parameter values cannot be empty. For example, the Data Integration Service fails the mapping run if the parameter file contains the following entry:
  <parameter name="Param1"> </parameter>
- Within an element, artifact names are not case-sensitive. Therefore, the Data Integration Service interprets <application name="App1"> and <application name="APP1"> as the same application.

The following example shows a sample parameter file:
<?xml version="1.0"?>
<root description="Sample Parameter File"
      xmlns="http://www.informatica.com/Parameterization/1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!--
    The Data Integration Service uses this section only when you run mapping "Map1" or "Map2"
    in deployed application "App1."

    This section assigns values to parameters created in mappings "Map1" and "Map2."
    -->
    <application name="App1">
        <mapping name="Map1">
            <project name="Project1">
                <mapping name="Map1">
                    <parameter name="MAP1_PARAM1">MAP1_PARAM1_VAL</parameter>
                    <parameter name="MAP1_PARAM2">MAP1_PARAM2_VAL</parameter>
                </mapping>
            </project>
        </mapping>
        <mapping name="Map2">
            <project name="Project1">
                <mapping name="Map2">
                    <parameter name="MAP2_PARAM1">MAP2_PARAM1_VAL</parameter>
                    <parameter name="MAP2_PARAM2">MAP2_PARAM2_VAL</parameter>
                </mapping>
            </project>
        </mapping>
    </application>

    <!--
    The Data Integration Service uses this section only when you run mapping "Map1" in deployed
    application "App2."

    This section assigns values to parameters created in the following objects:
    * Data source "DS1" in mapping "Map1"
    * Mapping "Map1"
    -->
    <application name="App2">
        <mapping name="Map1">
            <project name="Project1">
                <dataSource name="DS1">
                    <parameter name="PROJ1_DS1">PROJ1_DS1_APP2_MAP1_VAL</parameter>
                    <parameter name="PROJ1_DS1">PROJ1_DS1_APP2_MAP1_VAL</parameter>
                </dataSource>
                <mapping name="Map1">
                    <parameter name="MAP1_PARAM2">MAP1_PARAM2_VAL</parameter>
                </mapping>
            </project>
        </mapping>
    </application>

    <!--
    The Data Integration Service uses this section when you run any mapping other than "Map1" in
    application "App1," "Map2" in application "App1," or "Map1" in application "App2."

    This section assigns values to parameters created in the following objects:
    * Reusable data source "DS1"
    * Mapplet "DS1"
    -->
    <project name="Project1">
        <dataSource name="DS1">
            <parameter name="PROJ1_DS1">PROJ1_DS1_VAL</parameter>
            <parameter name="PROJ1_DS1_PARAM1">PROJ1_DS1_PARAM1_VAL</parameter>
        </dataSource>
        <mapplet name="DS1">
            <parameter name="PROJ1_DS1">PROJ1_DS1_VAL</parameter>
            <parameter name="PROJ1_DS1_PARAM1">PROJ1_DS1_PARAM1_VAL</parameter>
        </mapplet>
    </project>

    <!--
    The Data Integration Service uses this section when you run any mapping other than "Map1" in
    application "App1," "Map2" in application "App1," or "Map1" in application "App2."

    This section assigns values to parameters created in the following objects:
    * Reusable transformation "TX2"
    * Mapplet "MPLT1" in folder "Folder2"
    * Mapplet "RULE1" in nested folder "Folder2_1_1"
    -->
    <project name="Project2">
        <transformation name="TX2">
            <parameter name="RTM_PATH">Project1\Folder1\RTM1</parameter>
        </transformation>
        <folder name="Folder2">
            <mapplet name="MPLT1">
                <parameter name="PROJ2_FOLD2_MPLT1">PROJ2_FOLD2_MPLT1_VAL</parameter>
            </mapplet>
            <folder name="Folder2_1">
                <folder name="Folder2_1_1">
                    <mapplet name="RULE1">
                        <parameter name="PROJ2_RULE1">PROJ2_RULE1_VAL</parameter>
                    </mapplet>
                </folder>
            </folder>
        </folder>
    </project>
</root>

Parameter File Schema Definition


A parameter file must conform to the structure of the parameter file XML schema definition (XSD). If the parameter file does not conform to the schema definition, the Data Integration Service fails the mapping run. The parameter file XML schema definition appears in the following directories:
- On the machine that hosts the Developer tool: <Informatica Installation Directory>\clients\DeveloperClient\infacmd\plugins\ms\parameter_file_schema_1_0.xsd
- On the machine that hosts Informatica Services: <Informatica Installation Directory>\isp\bin\plugins\ms\parameter_file_schema_1_0.xsd

The following example shows the parameter file XML schema definition:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.informatica.com/Parameterization/1.0"
        xmlns:pf="http://www.informatica.com/Parameterization/1.0"
        elementFormDefault="qualified">
    <simpleType name="nameType">
        <restriction base="string">
            <minLength value="1"/>
        </restriction>
    </simpleType>
    <complexType name="parameterType">
        <simpleContent>
            <extension base="string">
                <attribute name="name" type="pf:nameType" use="required"/>
            </extension>
        </simpleContent>
    </complexType>
    <complexType name="designObjectType" abstract="true">
        <sequence>
            <element name="parameter" type="pf:parameterType" minOccurs="1" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="name" type="pf:nameType" use="required"/>
    </complexType>
    <complexType name="dataSourceType">
        <complexContent>
            <extension base="pf:designObjectType"/>
        </complexContent>
    </complexType>
    <complexType name="mappletType">
        <complexContent>
            <extension base="pf:designObjectType"/>
        </complexContent>
    </complexType>
    <complexType name="transformationType">
        <complexContent>
            <extension base="pf:designObjectType"/>
        </complexContent>
    </complexType>
    <complexType name="mappingType">
        <complexContent>
            <extension base="pf:designObjectType"/>
        </complexContent>
    </complexType>
    <complexType name="deployedObjectType" abstract="true">
        <sequence>
            <element name="project" type="pf:designContainerType" minOccurs="1" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="name" type="pf:nameType" use="required"/>
    </complexType>
    <complexType name="deployedMappingType">
        <complexContent>
            <extension base="pf:deployedObjectType"/>
        </complexContent>
    </complexType>
    <complexType name="containerType" abstract="true">
        <attribute name="name" type="pf:nameType" use="required"/>
    </complexType>
    <complexType name="designContainerType">
        <complexContent>
            <extension base="pf:containerType">
                <choice minOccurs="1" maxOccurs="unbounded">
                    <element name="dataSource" type="pf:dataSourceType"/>
                    <element name="mapplet" type="pf:mappletType"/>
                    <element name="transformation" type="pf:transformationType"/>
                    <element name="mapping" type="pf:mappingType"/>
                    <element name="folder" type="pf:designContainerType"/>
                </choice>
            </extension>
        </complexContent>
    </complexType>
    <complexType name="applicationContainerType">
        <complexContent>
            <extension base="pf:containerType">
                <sequence>
                    <element name="mapping" type="pf:deployedMappingType" minOccurs="1" maxOccurs="unbounded"/>
                </sequence>
            </extension>
        </complexContent>
    </complexType>
    <element name="root">
        <complexType>
            <choice minOccurs="1" maxOccurs="unbounded">
                <element name="application" type="pf:applicationContainerType"/>
                <element name="project" type="pf:designContainerType"/>
            </choice>
            <attribute name="description" type="string" use="optional"/>
        </complexType>
    </element>
</schema>

Creating a Parameter File


The infacmd ms ListMappingParams command lists the parameters used in a mapping and the default value for each parameter. Use the output of this command to create a parameter file.
1. Run the infacmd ms ListMappingParams command to list all parameters used in a mapping and the default value for each parameter.
   The -o argument sends command output to an XML file. For example, the following command lists the parameters in mapping "MyMapping" in application "MyApplication" and writes the output to the file "MyOutputFile.xml":
infacmd ms ListMappingParams -dn MyDomain -sn MyDataIntSvs -un MyUser -pd MyPassword -a MyApplication -m MyMapping -o "MyOutputFile.xml"

The Data Integration Service lists all parameters in the mapping with their default values.
2. If you did not specify the -o argument, copy the command output to an XML file and save the file.
3. Edit the XML file and replace the parameter default values with the values you want to use when you run the mapping, as in the sketch after these steps.
4. Save the XML file.
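The following sketch shows what an edited parameter file for a deployed mapping might look like, assuming the deployed-mapping structure defined by the parameter file schema. The application, project, mapping, and parameter names are placeholders; use the names and parameters that ListMappingParams returns for your mapping.

<?xml version="1.0"?>
<root xmlns="http://www.informatica.com/Parameterization/1.0" description="Parameter values for MyMapping">
    <application name="MyApplication">
        <mapping name="MyMapping">
            <project name="MyProject">
                <mapping name="MyMapping">
                    <parameter name="SRC_CONNECTION">Oracle_Dev</parameter>
                    <parameter name="TGT_FILE_DIR">C:\target_files</parameter>
                </mapping>
            </project>
        </mapping>
    </application>
</root>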

Run the mapping with the infacmd ms RunMapping command. Use the -pf argument to specify the parameter file name.
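For example, the following command sketch runs the mapping with the parameter file created in the previous steps. It reuses the connection options from the ListMappingParams example; verify the exact options for your environment in the infacmd ms RunMapping command reference.

infacmd ms RunMapping -dn MyDomain -sn MyDataIntSvs -un MyUser -pd MyPassword -a MyApplication -m MyMapping -pf "MyOutputFile.xml"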


CHAPTER 12

Viewing Data
This chapter includes the following topics:
Viewing Data Overview, 112
Selecting a Default Data Integration Service, 112
Configurations, 113
Exporting Data, 117
Logs, 117

Viewing Data Overview


You can run a mapping, run a profile, preview data, or execute an SQL query. You can run mappings from the command line, from the Run dialog box, or from the Data Viewer view. You can run a profile, preview data, and execute an SQL query from the Data Viewer view. Before you can view data, you need to select a default Data Integration Service. You can also add other Data Integration Services to use when you view data. You can create configurations to control settings that the Developer tool applies when you run a mapping or preview data. When you view data in the Data Viewer, you can export the data to a file. You can also access logs that show log events.

Selecting a Default Data Integration Service


The Data Integration Service performs data integration tasks in the Developer tool. You can select any Data Integration Service that is available in the domain. Select a default Data Integration Service. You can override the default Data Integration Service when you run a mapping or preview data.
Add a domain before you select a Data Integration Service.
1. Click Window > Preferences.
   The Preferences dialog box appears.
2. Select Informatica > Data Integration Services.
3. Expand the domain.
4. Select a Data Integration Service.
5. Click Set as Default.
6. Click OK.

Configurations
A configuration is a group of settings that the Developer tool applies when you run a mapping or preview output. A configuration controls settings such as the default Data Integration Service, number of rows to read from a source, default date/time format, and optimizer level. The configurations that you create apply to your installation of the Developer tool. You can create the following configurations:
Data viewer configurations. Control the settings the Developer tool applies when you preview output in the Data Viewer view.
Mapping configurations. Control the settings the Developer tool applies when you run mappings through the Run dialog box or from the command line.

Data Viewer Configurations


Data viewer configurations control the settings that the Developer tool applies when you preview output in the Data Viewer view. You can select a data viewer configuration when you preview output for the following objects:
Custom data objects
Logical data objects
Logical data object read mappings
Logical data object write mappings
Mappings
Mapplets
Physical data objects
Virtual stored procedures
Virtual tables
Virtual table mappings

Creating a Data Viewer Configuration


Create a data viewer configuration to control the settings the Developer tool applies when you preview output in the Data Viewer view.
1. Click Run > Open Run Dialog.
   The Run dialog box appears.
2. Click Data Viewer Configuration.
3. Click the New button.
4. Enter a name for the data viewer configuration.
5. Configure the data viewer configuration properties.
6. Click Apply.
7. Click Close.
   The Developer tool creates the data viewer configuration.

Mapping Configurations
Mapping configurations control the mapping deployment properties that the Developer tool uses when you run a mapping through the Run dialog box or from the command line. To apply a mapping configuration to a mapping that you run through the Developer tool, you must run the mapping through the Run dialog box. If you run the mapping through the Run menu or mapping editor, the Developer tool runs the mapping with the default mapping deployment properties. To apply mapping deployment properties to a mapping that you run from the command line, select the mapping configuration when you add the mapping to an application. The mapping configuration that you select applies to all mappings in the application. You can change the mapping deployment properties when you edit the application. An administrator can also change the mapping deployment properties through the Administrator tool. You must redeploy the application for the changes to take effect.

Creating a Mapping Configuration


Create a mapping configuration to control the mapping deployment properties that the Developer tool uses when you run mappings through the Run dialog box or from the command line.
1. Click Run > Open Run Dialog.
   The Run dialog box appears.
2. Click Mapping Configuration.
3. Click the New button.
4. Enter a name for the mapping configuration.
5. Configure the mapping configuration properties.
6. Click Apply.
7. Click Close.
   The Developer tool creates the mapping configuration.

Updating the Default Configuration Properties


You can update the default data viewer and mapping configuration properties.
1. Click Window > Preferences.
   The Preferences dialog box appears.
2. Click Informatica > Run Configurations.
3. Select the Data Viewer or Mapping configuration.
4. Configure the data viewer or mapping configuration properties.
5. Click Apply.
6. Click OK.
   The Developer tool updates the default configuration properties.


Configuration Properties
The Developer tool applies configuration properties when you preview output or you run mappings. Set configuration properties for the Data Viewer view or mappings in the Run dialog box.

Data Integration Service Properties


The Developer tool displays the Data Integration Service tab for data viewer and mapping configurations. The following table displays the properties that you configure for the Data Integration Service:
Use default Data Integration Service. Uses the default Data Integration Service to run the mapping. Default is enabled.
Data Integration Service. Specifies the Data Integration Service that runs the mapping if you do not use the default Data Integration Service.

Source Properties
The Developer tool displays the Source tab for data viewer and mapping configurations. The following table displays the properties that you configure for sources:
Read all rows. Reads all rows from the source. Default is enabled.
Read up to how many rows. Specifies the maximum number of rows to read from the source if you do not read all rows. The Data Integration Service ignores this property for SAP sources. Note: If you enable this option for a mapping that writes to a customized data object, the Data Integration Service does not truncate the target table before it writes to the target. Default is 1000.
Read all characters. Reads all characters in a column. Default is disabled.
Read up to how many characters. Specifies the maximum number of characters to read in each column if you do not read all characters. Default is 4000.

Results Properties
The Developer tool displays the Results tab for data viewer configurations. The following table displays the properties that you configure for results in the Data Viewer view:
Show all rows. Displays all rows in the Data Viewer view. Default is disabled.
Show up to how many rows. Specifies the maximum number of rows to display if you do not display all rows. Default is 1000.
Show all characters. Displays all characters in a column. Default is disabled.
Show up to how many characters. Specifies the maximum number of characters to display in each column if you do not display all characters. Default is 4000.

Advanced Properties
The Developer tool displays the Advanced tab for data viewer and mapping configurations. The following table displays the advanced properties:
Default date time format. Date/time format the Data Integration Service uses when the mapping converts strings to dates. Default is MM/DD/YYYY HH24:MI:SS.
Override tracing level. Overrides the tracing level for each transformation in the mapping. The tracing level determines the amount of information that the Data Integration Service sends to the mapping log files. Choose one of the following tracing levels:
- None. The Data Integration Service uses the tracing levels set in the mapping.
- Terse. The Data Integration Service logs initialization information, error messages, and notification of rejected data.
- Normal. The Data Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. It summarizes mapping results, but not at the level of individual rows.
- Verbose initialization. In addition to normal tracing, the Data Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
- Verbose data. In addition to verbose initialization tracing, the Data Integration Service logs each row that passes into the mapping. It also notes where the Data Integration Service truncates string data to fit the precision of a column and provides detailed transformation statistics.
Default is None.
Sort order. Order in which the Data Integration Service sorts character data in the mapping. Default is Binary.
Optimizer level. Controls the optimization methods that the Data Integration Service applies to a mapping as follows:
- None. The Data Integration Service does not optimize the mapping.
- Minimal. The Data Integration Service applies the early projection optimization method to the mapping.
- Normal. The Data Integration Service applies the early projection, early selection, and predicate optimization methods to the mapping.
- Full. The Data Integration Service applies the early projection, early selection, predicate optimization, and semi-join optimization methods to the mapping.
Default is Normal.
High precision. Runs the mapping with high precision. High precision data values have greater accuracy. Enable high precision if the mapping produces large numeric values, for example, values with precision of more than 15 digits, and you require accurate values. Enabling high precision prevents precision loss in large numeric values. Default is enabled.
Send log to client. Allows you to view log files in the Developer tool. If you disable this option, you must view log files through the Administrator tool. Default is enabled.

Troubleshooting Configurations
I created two configurations with the same name but with different cases. When I close and reopen the Developer tool, one configuration is missing.
Data viewer and mapping configuration names are not case sensitive. If you create multiple configurations with the same name but different cases, the Developer tool deletes one of the configurations when you exit. The Developer tool does not consider the configuration names unique.

I tried to create a configuration with a long name, but the Developer tool displays an error message that says it cannot write the file.
The Developer tool stores data viewer and mapping configurations in files on the machine that runs the Developer tool. If you create a configuration with a long name, for example, more than 100 characters, the Developer tool might not be able to save the file to the hard drive. To work around this issue, shorten the configuration name.

Exporting Data
You can export the data that displays in the Data Viewer view to a tab-delimited flat file, such as a TXT or CSV file. Export data when you want to create a local copy of the data.
1. In the Data Viewer view, right-click the results and select Export Data.
2. Enter a file name and extension.
3. Select the location where you want to save the file.
4. Click OK.

Logs
The Data Integration Service generates log events when you run a mapping, run a profile, preview data, or run an SQL query. Log events include information about the tasks performed by the Data Integration Service, errors, and load summary and transformation statistics.


When you run a profile, preview data, or run an SQL query, you can view log events in the editor. To view log events, click the Show Log button in the Data Viewer view.


Part II: Informatica Data Services


This part contains the following chapters:
Logical View of Data, 120
Virtual Data, 126


CHAPTER 13

Logical View of Data


This chapter includes the following topics:
Logical View of Data Overview, 120
Developing a Logical View of Data, 121
Logical Data Object Models, 121
Logical Data Objects, 122
Logical Data Object Mappings, 124

Logical View of Data Overview


Develop a logical view of data to describe how to represent and access data in an enterprise. You can achieve the following goals:
Use common data models across an enterprise so that you do not have to redefine data to meet different business needs. If data attributes change, you can apply the change one time and use one mapping to propagate the change to all databases that use this data.
Find relevant sources of data and present the data in a single view. Data resides in various places in an enterprise, such as relational databases and flat files. You can access all data sources and present the data in one view.
Expose logical data as relational tables to promote reuse.

American Bank acquires California Bank. After the acquisition, American Bank has the following goals:
Present data from both banks in a business intelligence report, such as a report on the top 10 customers.
Consolidate data from both banks into a central data warehouse.

Traditionally, American Bank would consolidate the data into a central data warehouse in a development environment, verify the data, and move the data warehouse to a production environment. This process might take several months or longer. The bank could then run business intelligence reports on the data warehouse in the production environment.
In the Developer tool, a developer at American Bank can create a model of customer, account, branch, and other data in the enterprise. The developer can link the relational sources of American Bank and California Bank to a single view of the customer. The developer can then make the data available for business intelligence reports before creating a central data warehouse.


Developing a Logical View of Data


Develop a logical view of data to represent and access data in an enterprise. After you develop a logical view of data, you can add it to a data service to make virtual data available for end users.
Before you develop a logical view of data, you can define the physical data objects that you want to use in a logical data object mapping. You can also profile the physical data sources to analyze data quality.
1. Create or import a logical data object model.
2. Optionally, add logical data objects to the logical data object model and define relationships between objects.
3. Create a logical data object mapping to read data from a logical data object or write data to a logical data object. A logical data object mapping can contain transformation logic to transform the data. The transformations can include data quality transformations to validate and cleanse the data.
4. View the output of the logical data object mapping.

Logical Data Object Models


A logical data object model describes the structure and use of data in an enterprise. The model contains logical data objects and defines relationships between them. Define a logical data object model to create a unified model of data in an enterprise. The data in an enterprise might reside in multiple disparate source systems such as relational databases and flat files. A logical data object model represents the data from the perspective of the business regardless of the source systems. For example, customer account data from American Bank resides in an Oracle database, and customer account data from California Banks resides in an IBM DB2 database. You want to create a unified model of customer accounts that defines the relationship between customers and accounts. Create a logical data object model to define the relationship. You can import a logical data object model from an XSD file that you created in a modeling tool, such as ERwin. Or, you can manually create a logical data object model in the Developer tool. You add a logical data object model to a project or folder and store it in the Model repository.

Creating a Logical Data Object Model


Create a logical data object model to define the structure and use of data in an enterprise. When you create a logical data object model, you can add logical data objects and create logical data object mappings.
When you add a logical data object to the model, you can associate a physical data object with the logical data object. The Developer tool creates a logical data object read mapping and a logical data object write mapping that includes the logical and physical data objects as input and output.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Logical Data Object Model.
   The New Logical Data Object Model dialog box appears.
3. Select Create as Empty.
4. Enter a name for the logical data object model.
5. To create logical data objects, click Next. To create an empty logical data object model, click Finish.
   If you click Next, the Developer tool prompts you to add logical data objects to the model.


6. To create a logical data object, click the New button.
   The Developer tool adds a logical data object to the list.
7. Enter a name in the Name column.
8. Optionally, click the Open button in the Source column to associate a physical data object with the logical data object.
   The Select a Data Object dialog box appears.
9. Select a physical data object and click OK.
10. Optionally, enter a description in the Description column.
11. Repeat steps 6 through 10 to add logical data objects.
12. Click Finish.
    The logical data object model opens in the editor. If you created logical data objects, they appear in the editor.

Importing a Logical Data Object Model


You can import a logical data object model from an XSD file. Import a logical data object model to use an existing model of the structure and use of data in an enterprise.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Logical Data Object Model.
   The New Logical Data Object Model dialog box appears.
3. Enter a name for the logical data object model.
4. Select Create Existing Model from File.
5. Browse to the XSD file that you want to import, select the file, and click Open.
6. Click Next.
7. Add logical data objects to the logical data object model.
8. Click Finish.
   The logical data objects appear in the editor.

Logical Data Objects


A logical data object is an object in a logical data object model that describes a logical entity in an enterprise. It has attributes, keys, and it describes relationships between attributes. You include logical data objects that relate to each other in a data object model. For example, the logical data objects Customer and Account appear in a logical data object model for a national bank. The logical data object model describes the relationship between customers and accounts. In the model, the logical data object Account includes the attribute Account_Number. Account_Number is a primary key, because it uniquely identifies an account. Account has a relationship with the logical data object Customer, because the Customer data object needs to reference the account for each customer.


Logical Data Object Properties


A logical data object contains properties that define the data object and its relationship to other logical data objects in a logical data object model. A logical data object contains the following properties:
General. Name and description of the logical data object.
Attributes. Comprise the structure of data in a logical data object.
Keys. One or more attributes in a logical data object can be primary keys or unique keys.
Relationships. Associations between logical data objects.
Access. Type of access for a logical data object and each attribute of the data object.
Mappings. Logical data object mappings associated with a logical data object.

Attribute Relationships
A relationship is an association between primary or foreign key attributes of one or more logical data objects. You can define the following types of relationship between attributes:
Identifying
A relationship between two attributes where an attribute is identified through its association with another attribute. For example, the relationship between the Branch_ID attribute of the logical data object Branch and the Branch_Location attribute of the logical data object Customer is identifying. This is because a branch ID is unique to a branch location.
Non-Identifying
A relationship between two attributes that identifies an attribute independently of the other attribute. For example, the relationship between the Account_Type attribute of the Account logical data object and the Account_Number attribute of the Customer logical data object is non-identifying. This is because you can identify an account type without having to associate it with an account number.
When you define relationships, the logical data object model indicates an identifying relationship as a solid line between attributes. It indicates a non-identifying relationship as a dotted line between attributes.

Creating a Logical Data Object


You can create a logical data object in a logical data object model to define a logical entity in an enterprise.
1. Click File > New > Other.
2. Select Informatica > Data Objects > Data Object and click Next.
3. Enter a data object name.
4. Select the data object model for the data object and click Finish.
   The data object appears in the data object model canvas.
5. Select the data object and click the Properties tab.
6. On the General tab, optionally edit the logical data object name and description.
7. On the Attributes tab, create attributes and specify their datatype and precision.
8. On the Keys tab, optionally specify primary and unique keys for the data object.
9. On the Relationships tab, optionally create relationships between logical data objects.
10. On the Access tab, optionally edit the type of access for the logical data object and each attribute in the data object.
    Default is read only.
11. On the Mappings tab, optionally create a logical data object mapping.

Logical Data Object Mappings


A logical data object mapping is a mapping that links a logical data object to one or more physical data objects. It can include transformation logic. A logical data object mapping can be of the following types:
Read
Write

You can associate each logical data object with one logical data object read mapping or one logical data object write mapping.

Logical Data Object Read Mappings


A logical data object read mapping contains one or more physical data objects as input and one logical data object as output. The mapping can contain transformation logic to transform the data. It provides a way to access data without accessing the underlying data source. It also provides a way to have a single view of data coming from more than one source. For example, American Bank has a logical data object model for customer accounts. The logical data object model contains a Customers logical data object. American Bank wants to view customer data from two relational databases in the Customers logical data object. You can use a logical data object read mapping to perform this task and view the output in the Data Viewer view.

Logical Data Object Write Mappings


A logical data object write mapping contains a logical data object as input. It provides a way to write to targets from a logical data object. The mapping can contain transformation logic to transform the data.


Creating a Logical Data Object Mapping


You can create a logical data object mapping to link data from a physical data object to a logical data object and transform the data.
1. In the Data Object Explorer view, select the logical data object model that you want to add the mapping to.
2. Click File > New > Other.
3. Select Informatica > Data Objects > Data Object Mapping and click Next.
4. Select the logical data object you want to include in the mapping.
5. Select the mapping type.
6. Optionally, edit the mapping name.
7. Click Finish.
   The editor displays the logical data object as the mapping input or output, based on whether the mapping is a read or write mapping.
8. Drag one or more physical data objects to the mapping as read or write objects, based on whether the mapping is a read or write mapping.
9. Optionally, add transformations to the mapping.
10. Link ports in the mapping.
11. Right-click the mapping canvas and click Validate to validate the mapping.
    Validation errors appear on the Validation Log view.
12. Fix validation errors and validate the mapping again.
13. Optionally, click the Data Viewer view and run the mapping.
    Results appear in the Output section.


CHAPTER 14

Virtual Data
This chapter includes the following topics:
Virtual Data Overview, 126
SQL Data Services, 127
Virtual Tables, 128
Virtual Table Mappings, 130
Virtual Stored Procedures, 132
SQL Query Plans, 135

Virtual Data Overview


Create a virtual database to define uniform views of data and make the data available for end users to query. End users can query the virtual tables as if they were physical database tables.
Create a virtual database to accomplish the following tasks:
Define a uniform view of data that you can expose to end users.
Define the virtual flow of data between the sources and the virtual tables. Transform and standardize the data.
Provide end users with access to the data. End users can use a JDBC or ODBC client tool to query the virtual tables as if they were actual, physical database tables.
Isolate the data from changes in data structures. You can add the virtual database to a self-contained application. If you make changes to the virtual database in the Developer tool, the virtual database in the application does not change until you redeploy it.
To create a virtual database, you must create an SQL data service. An SQL data service contains the virtual schemas and the virtual tables or stored procedures that define the database structure. If the virtual schema contains virtual tables, the SQL data service also contains virtual table mappings that define the flow of data between the sources and the virtual tables.
After you create an SQL data service, you add it to an application and deploy the application to make the SQL data service accessible by end users. End users can query the virtual tables or run the stored procedures in the SQL data service by entering an SQL query in a third-party client tool. When the user enters the query, the Data Integration Service retrieves virtual data from the sources or from cache tables, if an administrator specifies that any of the virtual tables should be cached.

Example
Two companies that store customer data in multiple, heterogeneous data sources merge. A developer at the merged company needs to make a single view of customer data available to other users at the company. To accomplish this goal, the developer creates an SQL data service that contains virtual schemas and virtual tables that define a unified view of a customer. The developer creates virtual table mappings to link the virtual tables of the customer with the sources and to standardize the data. To make the virtual data accessible by end users, the developer includes the SQL data service in an application and deploys the application. After the developer deploys the application, end users can make queries against the standardized view of the customer through a JDBC or ODBC client tool.
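For example, after the application is deployed, an end user might run a query such as the following from a JDBC or ODBC client tool. The virtual schema, table, and column names are hypothetical:

SELECT CUSTOMER_ID, FIRST_NAME, LAST_NAME, EMAIL
FROM Customer_Schema.CUSTOMERS
WHERE STATE = 'CA'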

SQL Data Services


An SQL data service is a virtual database that end users can query. It contains a schema and other objects that represent underlying physical data. An SQL data service can contain the following objects:
Virtual schemas. Schemas that define the virtual database structure.
Virtual tables. The virtual tables in the database. You can create virtual tables from physical or logical data objects, or you can create virtual tables manually.
Virtual table mappings. Mappings that link a virtual table to source data and define the data flow between the sources and the virtual table. If you create a virtual table from a data object, you can create a virtual table mapping to define data flow rules between the data object and the virtual table. If you create a virtual table manually, you must create a virtual table mapping to link the virtual table with source data and define data flow.
Virtual stored procedures. Sets of data flow instructions that allow end users to perform calculations or retrieve data.

Defining an SQL Data Service


To define an SQL data service, create an SQL data service and add objects to it.
1. Create an SQL data service.
   You can create virtual tables and virtual table mappings during this step.
2. Create virtual tables in the SQL data service.
   You can create a virtual table from a data object, or you can create a virtual table manually.
3. Define relationships between virtual tables.
4. Create or update virtual table mappings to define the data flow between data objects and the virtual tables.
5. Optionally, create virtual stored procedures.
6. Optionally, preview virtual table data.

Creating an SQL Data Service


Create an SQL data service to define a virtual database that end users can query. When you create an SQL data service, you can create virtual schemas, virtual tables, and virtual table mappings that link virtual tables with source data.
1. Select a project or folder in the Object Explorer view.
2. Click File > New > Data Service.
   The New dialog box appears.
3. Select SQL Data Service.
4. Click Next.
5. Enter a name for the SQL data service.
6. To create virtual tables in the SQL data service, click Next. To create an SQL data service without virtual tables, click Finish.
   If you click Next, the New SQL Data Service dialog box appears.
7. To create a virtual table, click the New button.
   The Developer tool adds a virtual table to the list of virtual tables.
8. Enter a virtual table name in the Name column.
9. Click the Open button in the Data Object column.
   The Select a Data Object dialog box appears.
10. Select a physical or logical data object and click OK.
11. Enter the virtual schema name in the Virtual Schema column.
12. Select Read in the Data Access column to link the virtual table with the data object. Select None if you do not want to link the virtual table with the data object.
13. Repeat steps 7 through 12 to add more virtual tables.
14. Click Finish.
    The Developer tool creates the SQL data service.

Virtual Tables
A virtual table is a table in a virtual database. Create a virtual table to define the structure of the data.
Create one or more virtual tables within a schema. If a schema contains multiple virtual tables, you can define primary key-foreign key relationships between tables.
You can create virtual tables manually or from physical or logical data objects. Each virtual table has a data access method. The data access method defines how the Data Integration Service retrieves data. When you manually create a virtual table, the Developer tool creates an empty virtual table and sets the data access method to none.
When you create a virtual table from a data object, the Developer tool creates a virtual table with the same columns and properties as the data object. The Developer tool sets the data access method to read. If you change columns in the data object, the Developer tool updates the virtual table with the same changes. The Developer tool does not update the virtual table if you change the data object name or description.
To define data transformation rules for the virtual table, set the data access method to custom. The Developer tool prompts you to create a virtual table mapping.
You can preview virtual table data when the data access method is read or custom.

Data Access Methods


The data access method for a virtual table defines how the Data Integration Service retrieves data.


When you create a virtual table, you must choose a data access method. The following table describes the data access methods:
None. The virtual table is not linked to source data. If you change the data access method to none, the Developer tool removes the link between the data object and the virtual table. If the virtual table has a virtual table mapping, the Developer tool deletes the virtual table mapping. The Data Integration Service cannot retrieve data for the table.
Read. The virtual table is linked to a physical or logical data object without data transformation. If you add, remove, or change a column in the data object, the Developer tool makes the same change to the virtual table. However, if you change primary key-foreign key relationships, change the name of the data object, or change the data object description, the Developer tool does not update the virtual table. If you change the data access method to read, the Developer tool prompts you to choose a data object. If the virtual table has a virtual table mapping, the Developer tool deletes the virtual table mapping. When an end user queries the virtual table, the Data Integration Service retrieves data from the data object.
Custom. The virtual table is linked to a physical or logical data object through a virtual table mapping. If you update the data object, the Developer tool does not update the virtual table. If you change the data access method to custom, the Developer tool prompts you to create a virtual table mapping. When an end user queries the virtual table, the Data Integration Service applies any transformation rule defined in the virtual table mapping to the source data. It returns the transformed data to the end user.

Creating a Virtual Table from a Data Object


Create a virtual table from a physical or logical data object when the virtual table structure matches the structure of the data object. The Developer tool creates a virtual table mapping to read data from the data object.
1. Open an SQL data service.
2. Click the Schema view.
3. Drag a physical or logical data object from the Object Explorer view to the editor.
   The Add Data Objects to SQL Data Service dialog box appears. The Developer tool lists the data object in the Data Object column.
4. Enter the virtual schema name in the Virtual Schema column.
5. Click Finish.
   The Developer tool places the virtual table in the editor and sets the data access method to read.

Creating a Virtual Table Manually


Create a virtual table manually when the virtual table structure does not match the structure of an existing data object. The Developer tool sets the data access method for the virtual table to none, which indicates the virtual table is not linked to a source.
1. Open an SQL data service.
2. In the Overview view Tables section, click the New button.
   The New Virtual Table dialog box appears.
3. Enter a name for the virtual table.
4. Enter a virtual schema name or select a virtual schema.
5. Click Finish.
   The virtual table appears in the Schema view.
6. To add a column to the virtual table, right-click Columns and click New.
7. To make a column a primary key, click the blank space to the left of the column name.

Defining Relationships between Virtual Tables


You can define primary key-foreign key relationships between virtual tables in an SQL data service to show associations between columns in the virtual tables.
1. Open an SQL data service.
2. Click the Schema view.
3. Click the column you want to assign as a foreign key in one table. Drag the pointer from the foreign key column to the primary key column in another table.
   The Developer tool uses an arrow to indicate a relationship between the tables. The arrow points to the primary key table.
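The relationships document how end users are expected to join the virtual tables. For example, with hypothetical CUSTOMERS and ACCOUNTS virtual tables related on CUSTOMER_ID, an end user might run a query such as the following:

SELECT c.CUSTOMER_ID, c.LAST_NAME, a.ACCOUNT_NUMBER
FROM Customer_Schema.CUSTOMERS c
JOIN Customer_Schema.ACCOUNTS a ON a.CUSTOMER_ID = c.CUSTOMER_ID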

Running an SQL Query to Preview Data


Run an SQL query against a virtual table to preview the data. For the query to return results, the virtual table must be linked to source data. Therefore, the virtual table must be created from a data object or it must be linked to source data in a virtual table mapping.
1. Open an SQL data service.
2. Click the Schema view.
3. Select the virtual table in the Outline view.
   The virtual table appears in the Schema view.
4. Click the Data Viewer view.
5. Enter an SQL statement in the Input window. For example:
   select * from <schema>.<table>
6. Click Run.
   The query results appear in the Output window.
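For example, for a hypothetical virtual table named CUSTOMERS in the virtual schema Customer_Schema, the preview query might be:

select * from Customer_Schema.CUSTOMERS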

Virtual Table Mappings


A virtual table mapping defines the virtual data flow between sources and a virtual table in an SQL data service. Use a virtual table mapping to transform the data. Create a virtual table mapping to link a virtual table in an SQL data service with source data and to define the rules for data transformation. When an end user queries the virtual table, the Data Integration Service applies the transformation rules defined in the virtual table mapping to the source data. It returns the transformed data to the end user.


If you do not want to transform the data, you do not have to create a virtual table mapping. When an end user queries the virtual table, the Data Integration Service retrieves data directly from the data object. You can create one virtual table mapping for each virtual table in an SQL data service. You can preview virtual table data as you create and update the mapping. A virtual table mapping contains the following components:
Sources. Physical or logical data objects that describe the characteristics of source tables or files. A virtual table mapping must contain at least one source.
Transformations. Objects that define the rules for data transformation. Use different transformation objects to perform different functions. Transformations are optional in a virtual table mapping.
Virtual table. A virtual table in an SQL data service.
Links. Connections between columns that define virtual data flow between sources, transformations, and the virtual table.

Example
You want to make order information available to one of your customers. The orders information is stored in a relational database table that contains information for several customers. The customer is not authorized to view the orders information for other customers. Create an SQL data service to retrieve the orders information. Create a virtual table from the orders table and set the data access method to custom. Add a Filter transformation to the virtual table mapping to remove orders data for the other customers. After you create and deploy an application that contains the SQL data service, the customer can query the virtual table that contains his orders information.
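As a sketch, the Filter transformation in this example might use a condition such as CUSTOMER_ID = 'C001' so that queries return only that customer's orders. The effect is similar to the following SQL, although the filter is applied inside the virtual table mapping rather than by the end user. The schema, table, and column names are hypothetical:

SELECT * FROM Orders_Schema.CUSTOMER_ORDERS
WHERE CUSTOMER_ID = 'C001'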

Defining a Virtual Table Mapping


To define a virtual table mapping, create a virtual table mapping, add sources and transformations, and validate the mapping.
1. Create a mapping from a virtual table in an SQL data service.
2. Add sources and transformations to the mapping and link columns.
3. Validate the mapping.
4. Optionally, preview the mapping data.

Creating a Virtual Table Mapping


Create a virtual table mapping to define the virtual data flow between source data and a virtual table in an SQL data service. You can create one virtual table mapping for each virtual table.
1. Open the SQL data service that contains the virtual table for which you want to create a virtual table mapping.
2. Click the Overview view.
3. In the Tables section, change the data access method for the virtual table to Custom.
   The New Virtual Table Mapping dialog box appears.
4. Enter a name for the virtual table mapping.
5. Click Finish.
   The Developer tool creates a view for the virtual table mapping and places the virtual table in the editor. If you created the virtual table from a data object, the Developer tool adds the data object to the mapping as a source.
6. To add sources to the mapping, drag data objects from the Object Explorer view into the editor.
   You can add logical or physical data objects as sources.
7. Optionally, add transformations to the mapping by dragging them from the Object Explorer view or Transformation palette into the editor.
8. Link columns by selecting a column in a source or transformation and dragging it to a column in another transformation or the virtual table.
   The Developer tool uses an arrow to indicate the columns are linked.

Validating a Virtual Table Mapping


Validate a virtual table mapping to verify that the Data Integration Service can read and process the entire virtual table mapping.
1. Open an SQL data service.
2. Select the virtual table mapping view.
3. Select Edit > Validate.
   The Validation Log view opens. If no errors appear in the view, the virtual table mapping is valid.
4. If the Validation Log view lists errors, correct the errors and revalidate the virtual table mapping.

Previewing Virtual Table Mapping Output


As you develop a virtual table mapping, preview the output to verify the virtual table mapping produces the results you want. The virtual table must be linked to source data.
1. Open the SQL data service that contains the virtual table mapping.
2. Click the virtual table mapping view.
3. Select the object for which you want to preview output. You can select a transformation or the virtual table.
4. Click the Data Viewer view.
5. Click Run.
   The Developer tool displays results in the Output section.

Virtual Stored Procedures


A virtual stored procedure is a set of procedural or data flow instructions in an SQL data service. When you deploy an application that contains an SQL data service, end users can access and run the virtual stored procedures in the SQL data service through a JDBC client tool. Create a virtual stored procedure to allow end users to perform calculations, retrieve data, or write data to a data object. End users can send data to and receive data from the virtual stored procedure through input and output parameters.


Create a virtual stored procedure within a virtual schema in an SQL data service. You can create multiple stored procedures within a virtual schema. A virtual stored procedure contains the following components:
Inputs. Objects that pass data into the virtual stored procedure. Inputs can be input parameters, Read transformations, or physical or logical data objects. Input parameters pass data to the stored procedure. Read transformations extract data from logical data objects. A virtual stored procedure must contain at least one input.
Transformations. Objects that define the rules for data transformation. Use different transformation objects to perform different functions. Transformations are optional in a virtual stored procedure.
Outputs. Objects that pass data out of a virtual stored procedure. Outputs can be output parameters, Write transformations, or physical or logical data objects. Output parameters receive data from the stored procedure. Write transformations write data to logical data objects. A virtual stored procedure must contain at least one output.
Links. Connections between ports that define virtual data flow between inputs, transformations, and outputs.
Example
An end user needs to update customer email addresses for customer records stored in multiple relational databases. To allow the end user to update the email addresses, first create a logical data object model to define a unified view of the customer. Create a logical data object that represents a union of the relational tables. Create a logical data object write mapping to write to the relational tables. Add a Router transformation to determine which relational table contains the customer record the end user needs to update. Next, create an SQL data service. In the SQL data service, create a virtual stored procedure that contains input parameters for the customer ID and email address. Create a Write transformation based on the logical data object and add it to the virtual stored procedure as output. Finally, deploy the SQL data service. The end user can call the virtual stored procedure through a third-party client tool. The end user passes the customer ID and updated email address to the virtual stored procedure. The virtual stored procedure uses the Write transformation to update the logical data object. The logical data object write mapping determines which relational table to update based on the customer ID and updates the customer email address in the correct table.
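The following sketch shows how the end user might call such a procedure from an SQL client. The schema name, procedure name, parameter order, and exact call syntax are assumptions; they depend on how you define the virtual stored procedure and on the client tool:

CALL Customer_Schema.Update_Customer_Email(100001, 'customer@example.com')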

Defining a Virtual Stored Procedure


To define a virtual stored procedure, create a virtual stored procedure, add inputs, transformations, and outputs, and validate the stored procedure.
1. Create a virtual stored procedure in an SQL data service.
2. Add inputs, transformations, and outputs to the virtual stored procedure, and link the ports.
3. Validate the virtual stored procedure.
4. Optionally, preview the virtual stored procedure output.

Creating a Virtual Stored Procedure


Create a virtual stored procedure to allow an end user to access the business logic within the procedure through a JDBC or ODBC client tool. You must create a virtual stored procedure within a virtual schema.


1. In the Object Explorer view or Outline view, right-click an SQL data service and select New > Virtual Stored Procedure.
   The New Virtual Stored Procedure dialog box appears.
2. Enter a name for the virtual stored procedure.
3. Enter a virtual schema name or select a virtual schema.
4. If the virtual stored procedure has input parameters or output parameters, select the appropriate option.
5. Click Finish.
   The Developer tool creates an editor for the virtual stored procedure. If you select input parameters or output parameters, the Developer tool adds an Input Parameter transformation or an Output Parameter transformation, or both, in the editor.
6. Add input parameters or sources to the virtual stored procedure.
7. Add output parameters or targets to the virtual stored procedure.
8. Optionally, add transformations to the virtual stored procedure by dragging them from the Object Explorer view or the Transformation palette into the editor.
9. Link ports by selecting a port in a source or transformation and dragging it to a port in another transformation or target.
   The Developer tool uses an arrow to indicate the ports are linked.

Validating a Virtual Stored Procedure


Validate a virtual stored procedure to verify that the Data Integration Service can read and process the virtual stored procedure.
1. Open a virtual stored procedure.
2. Select Edit > Validate.
   The Validation Log view opens. If no errors appear in the view, the virtual stored procedure is valid.
3. If the Validation Log view lists errors, correct the errors and revalidate the virtual stored procedure.

Previewing Virtual Stored Procedure Output


Preview the output of a virtual stored procedure to verify that it produces the results you want. The virtual stored procedure must contain at least one input parameter or source and one output parameter or target.
1. Open a virtual stored procedure.
2. Select the Data Viewer view.
3. If the virtual stored procedure contains input parameters, enter them in the Input section.
4. Click Run.
   The Developer tool displays results in the Output section.


SQL Query Plans


An SQL query plan enables you to view a mapping-like representation of the SQL query you enter when you preview virtual table data.
When you view the SQL query plan for a query, the Developer tool displays a graphical representation of the query that looks like a mapping. The graphical representation has a source, transformations, links, and a target. The Developer tool allows you to view the graphical representation of your original query and the graphical representation of the optimized query. The optimized query view contains different transformations or transformations that appear in a different order than the transformations in the original query. The optimized query produces the same results as the original query, but usually runs more quickly.
View the query plan to troubleshoot queries end users run against a deployed SQL data service. You can also use the query plan to help you troubleshoot your own queries and understand the log messages.
The Developer tool uses optimizer levels to produce the optimized query. Different optimizer levels might produce different optimized queries, based on the complexity of the query. For example, if you enter a simple SELECT statement such as "SELECT * FROM <schema.table>" against a virtual table in an SQL data service without a user-generated virtual table mapping, the Developer tool might produce the same optimized query for each optimizer level. However, if you enter a query with many clauses and subqueries, or if the virtual table mapping is complex, the Developer tool produces a different optimized query for each optimizer level.

SQL Query Plan Example


When you view the SQL query plan for a query you enter in the Data Viewer view, you can view the original query and the optimized query. The optimized query displays the query as the Data Integration Service executes it. For example, you want to query the CUSTOMERS virtual table in an SQL data service. The SQL data service does not contain a user-generated virtual table mapping. In the Data Viewer view, you choose the default data viewer configuration settings, which sets the optimizer level for the query to normal. You enter the following query in the Data Viewer view:
select * from CUSTOMERS where CUSTOMER_ID > 150000 order by LAST_NAME

When you view the SQL query plan, the Developer tool displays the following graphical representation of the query:

The non-optimized view displays the query as you enter it. The Developer tool displays the WHERE clause as a Filter transformation and the ORDER BY clause as a Sorter transformation. The Developer tool uses the passthrough Expression transformation to rename ports. When you view the optimized query, the Developer tool displays the following graphical representation of the query:

The optimized view displays the query as the Data Integration Service executes it. Because the optimizer level is normal, the Data Integration Service pushes the filter condition to the source data object. Pushing the filter condition improves query performance because it reduces the number of rows that the Data Integration Service reads from the source data object. As in the non-optimized query, the Developer tool displays the ORDER BY clause as a Sorter transformation. It uses pass-through Expression transformations to enforce the data types you specify in the logical transformations.

Viewing an SQL Query Plan


Display the SQL query plan to view a mapping-like representation of the SQL query you enter when you preview virtual table data.
1. Open an SQL data service that contains at least one virtual table.
2. Click the Data Viewer view.
3. Enter an SQL query in the Input window.
4. Optionally, select a data viewer configuration that contains the optimizer level you want to apply to the query.
5. Click Show Query Plan.
   The Developer tool displays the SQL query plan for the query as you entered it on the Non-Optimized tab.
6. To view the optimized query, click the Optimized tab.
   The Developer tool displays the optimized SQL query plan.


Part III: Informatica Data Quality


This part contains the following chapters:
Profiles, 138
Scorecards, 146
Reference Data, 148


CHAPTER 15

Profiles
This chapter includes the following topics:
Profiles Overview, 138
Profile Features, 139
Creating a Column Profile for a Data Object, 139
Creating a Profile for Join Analysis, 140
Adding a Rule to Profile, 141
Running a Saved Profile, 141
Profiling a Mapplet or Mapping Object, 141
Profile Results, 142
Exporting Profile Results, 144

Profiles Overview
A profile is an analysis of the content and structure of data. Create and run a profile to identify data quality issues in data. Use profiles to create scorecards and to create and update reference data tables. Data profiling is often the first step in a project. You can run a profile to evaluate the structure of data and verify that data columns are populated with the types of information you expect. If a profile reveals problems in data, you can define steps in your project to fix those problems. There are two types of profiling: column profiling and join analysis. Column profiling provides the following facts about data:
The number of unique and null values in each column, expressed as a number and a percentage.
The patterns of data in each column, and the frequencies with which these values occur.
Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
Join analysis describes the degree of overlap between two data columns. It displays results as a Venn diagram and as a percentage value. Use join analysis profiles to validate or identify problems in column join conditions.
You can perform column profiling on a data object and on an object in a mapping. You can perform join analysis on a data object.
Note: You can also work with profiles in Informatica Analyst. Changes that you make to profiles in the Analyst tool do not appear in the Developer tool until you refresh the Developer tool connection to the repository. Disconnect from the repository and then reconnect to it to refresh the connection.


Profile Features
You can create a profile and save it to the Profiles folder of your project in the Model repository. You can edit or run the profile at any time. The Profile Warehouse stores the results for the profile.

Profiles and Rules


You can add rules to a profile. A rule is business logic that defines conditions applied to source data when you run a profile. A profile can read rules created from mapplets in the Developer tool or expression-based rules defined in the Analyst tool. You can apply rules to columns. Rules appear as virtual columns in the Columns tab of the profile.

Incremental Profiling
When you create a profile on a data object, you can run the profile multiple times, changing the profile settings in each run. When you create a profile on multiple columns, you can select or clear different columns each time you run the profile. You can also set the profile options to save or discard the results of previous profiles. When you use these options together, you can build a detailed picture of the data object while controlling the amount of data processed in each profile run. You can also add multiple rules to the profile and select or clear different rules each time you run the profile. This enables you to view the data with different rules without running every rule each time you run the profile.

Drill-Down on Profile Data


You can drill down on the columns added to a profile to examine the raw data values. You can drill down on the latest data in the data source or on data saved to the staging database. You can choose to save data to the staging database when you define the profile. You can also drill down on columns from the data source that are not analyzed in the profile. In this way you can minimize the quantity of data that is processed when the profile runs. Select columns for drill-down when you define the profile.

Creating a Column Profile for a Data Object


You can create a profile for one or more columns in a data object and store the profile object in the Model repository.
1. In the Object Explorer view, select the data object on which to create the profile.
2. Click File > New > Profile to open the New Profile wizard.
3. Enter a name for the profile and verify the project location. If required, browse to a new location.
4. Optionally, enter a text description of the profile.
5. Verify that the name of the data object you selected appears under Data Objects in the wizard.
6. Select Column Profiling.
7. Select or clear Run Profile on finish.
8. Click Next.
9. Select the data columns to profile, and click Next.
   Select or clear the option to Discard profiling results for columns or rules not selected for re-profiling. If you select this option, the Profile Warehouse saves only the results of the latest profile run. If you clear this option, the Profile Warehouse saves all profiling results. Clear this option if you will run the profile on different rules or columns and want to save all results.
10. Set the sampling options for the profile.
    These options determine the number of rows that are read when the profile runs. You can select all rows or a subset of rows.
11. Set the drilldown options.
    These options determine how the profile reads column data when you drill down on the profile results. You can drill down on live data from the data source or on staged data from the most recent profiling operation. You can select columns for drilldown that you did not select for profiling. Click the Select button to choose these columns in addition to the profiled columns.
12. Click Finish.
The profile is ready to run.

Creating a Profile for Join Analysis


You can analyze potential joins on columns in two data objects and store the analysis in the Model repository.
1. Click File > New > Profile to open the New Profile wizard.
2. Enter a name for the profile and verify the project location. If required, browse to a new location.
3. Optionally, enter a text description of the profile. Click Add.
   The Data Objects dialog box opens.
4. Browse the repository and select a data object for join analysis. Click OK.
5. Click Add to open the Data Objects dialog box and select additional data objects. Click OK.
6. Verify that the names of the data objects appear under Data Objects in the wizard.
7. Select Join Analysis.
8. Select or clear Run Profile on finish.
9. Click Next.
10. Select the data columns to include in the profile results, and click Next.
    If required, scroll down the data objects to view all available columns.
11. Click Add.
    The Join Condition dialog box opens.
12. Click the New button to activate the column selection fields.
13. Select the data objects and columns to validate.
14. Verify that the Left and Right join columns are prefixed with the correct data object names.
15. Click Finish.



Adding a Rule to a Profile


You can add a rule to a saved profile. You cannot add a rule to a profile configured for join analysis. Complete these steps to add a rule to a profile:
1. Browse the Object Explorer and find the profile you need.
2. Right-click the profile and select Open.
   The profile opens in the editor.
3. Click the Rules tab.
4. Click Add.
   The Apply Rule dialog box opens.
5. Click Browse to find the rule you want to apply.
   You can select rules from the current project.
6. Click the Value column under Input Values to select an input port for the rule.
7. Click the Value column under Output Values to edit the name of the rule output port.
   The rule appears in the Rules tab.
8. Save the profile.

Running a Saved Profile


Run a profile that you saved to the Model repository when you want to save the results in the Profile Warehouse.
1. In the Object Explorer view, browse to the Profiles folder containing the saved profile.
2. Right-click the profile and select Run Profile.
   The profile results appear on the Results tab of the profile.

Profiling a Mapplet or Mapping Object


Run a profile on a mapplet or mapping object when you want to verify the design of the mapping or mapplet and you do not need to save the profile results. This profiling operation runs on all data columns and enables drill-down operations on data that has been staged for the data object.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Right-click a data object or transformation and select Profile Now.
   The profile results appear on the Results tab of the profile.
The profile traces the source data through the mapping to the output ports of the object you selected. It analyzes the data that would appear on those ports if you ran the mapping.



Profile Results
Click the Results tab when you run a profile to display the result of the profiling operation.
For column profiling, the Results tab displays the following types of information:
The number and percentage of unique values and null values in columns, and the inferred datatypes for column values.
The frequency and character patterns of data values in a selected column and a statistical summary for the column.
For join analysis, the Results tab displays the following types of information:
A Venn diagram that shows the relationships between columns.
The number and percentage of orphaned, null, and joined values in columns.

Column Profiling Results


The Column Profiling area on the Results tab includes information about the number of unique and null values, the inferred datatypes, and the last run date and time. The following table describes the properties in the Column Profiling area:
Property | Description
Drilldown | If selected, enables drilldown on live data for the column.
Column | Name of the column in the profile.
Unique Values | Number of unique values for the column.
Unique % | Percentage of unique values for the column.
Null Values | Number of null values for the column.
Null % | Percentage of null values for the column.
Datatype | Data type derived from the values for the column. The Analyst tool can derive the following datatypes from the datatypes of values in columns: String, Varchar, Decimal, Integer, and "-" for nulls.
Documented Datatype | Data type declared for the column in the profiled object.
Max Value | Maximum value in the column.
Min Value | Minimum value in the column.
Last Profiled | Date and time you last ran the profile.

You can also use the Show menu to view information about frequencies, patterns, and statistics for values in a column.



Column Statistics
The Statistics selection in the profile results provides column statistics, such as maximum and minimum lengths of values and first and last values. To view statistical information, select Statistics from the Show menu. The following table describes the column statistics:
Property | Description
Maximum Length | Length of the longest value in the column.
Minimum Length | Length of the shortest value in the column.
Bottom | Last three values in the column.
Top | First three values in the column.
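As a rough SQL illustration of the length statistics only, with hypothetical table and column names (most databases provide a LENGTH or LEN function; the profile computes these statistics for you):

    -- Conceptual sketch: maximum and minimum value lengths for one column.
    SELECT MAX(LENGTH(cust_phone)) AS maximum_length,
           MIN(LENGTH(cust_phone)) AS minimum_length
    FROM   customer_src;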

Column Value Patterns


The Patterns area in the profile results shows the patterns of data in the profiled columns and the frequency with which the patterns appear in each column. The patterns are shown as a number, a percentage, and a bar chart. To view pattern information, select Patterns from the Show menu. The following table describes the properties for column value patterns:
Property | Description
Patterns | Pattern for the selected column.
Frequency | Number of times a pattern appears in a column.
Percent | Number of times a pattern appears in a column, expressed as a percentage of all values in the column.
Chart | Bar chart for the percentage.
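A character pattern typically replaces letters with a letter placeholder and digits with a digit placeholder, so that, for example, the values 555-1234 and 555-9876 report the same pattern. The exact pattern notation the tool uses may differ; the following Oracle-style TRANSLATE sketch, with hypothetical names, only approximates the idea.

    -- Conceptual sketch: approximate character patterns and their frequencies.
    -- Uppercase letters map to 'X' and digits map to '9'.
    SELECT TRANSLATE(UPPER(cust_phone),
                     'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                     'XXXXXXXXXXXXXXXXXXXXXXXXXX9999999999') AS pattern,
           COUNT(*) AS frequency
    FROM   customer_src
    GROUP  BY TRANSLATE(UPPER(cust_phone),
                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                        'XXXXXXXXXXXXXXXXXXXXXXXXXX9999999999')
    ORDER  BY frequency DESC;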

Column Value Frequencies


The Frequency area in the profile results shows the values in the profiled columns and the frequency with which each value appears in each column. The frequencies are shown as a number, a percentage, and a bar chart. To view frequency information, select Frequency from the Show menu. The following table describes the properties for column value frequencies:
Property | Description
Values | List of all values for the column in the profile.
Frequency | Number of times a value appears in a column.
Percent | Number of times a value appears in a column, expressed as a percentage of all values in the column.
Chart | Bar chart for the percentage.
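A rough SQL equivalent of the value frequency view for one column, using hypothetical names (the profile produces this information for you):

    -- Conceptual sketch: value frequencies for one column.
    SELECT cust_country,
           COUNT(*)                                 AS frequency,
           100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percent
    FROM   customer_src
    GROUP  BY cust_country
    ORDER  BY frequency DESC;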

Join Analysis Results


The join analysis Results tab provides information about the number and percentage of parent orphan rows, child orphan rows, and join rows. Join analysis results also include Venn diagrams that show the relationships between columns. The following table describes the properties shown on the Results tab.
Property | Description
Left Table | Name of the left table and columns used in the join analysis.
Right Table | Name of the right table and columns used in the join analysis.
Parent Orphan Rows | Number of rows in the left table that cannot be joined.
Child Orphan Rows | Number of rows in the right table that cannot be joined.
Join Rows | Number of rows included in the join.

Select a join condition to view a Venn diagram that shows the relationships between columns. The area below the Venn diagram also displays the number and percentage of orphaned, null, and joined values in columns. Double-click a section in the Venn diagram to view the records that the section represents. These records open in the Data Viewer view. You can export the list of records from the Data Viewer view to a flat file.
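Conceptually, the parent orphan, child orphan, and join row counts correspond to a full outer join between the left and right columns. The following sketch uses hypothetical customer (left) and orders (right) tables joined on cust_id; the profile computes these counts for you, and not every database supports FULL OUTER JOIN.

    -- Conceptual sketch: parent orphan, child orphan, and join row counts.
    SELECT SUM(CASE WHEN o.cust_id IS NULL THEN 1 ELSE 0 END) AS parent_orphan_rows,
           SUM(CASE WHEN c.cust_id IS NULL THEN 1 ELSE 0 END) AS child_orphan_rows,
           SUM(CASE WHEN c.cust_id IS NOT NULL
                     AND o.cust_id IS NOT NULL THEN 1 ELSE 0 END) AS join_rows
    FROM   customer c
    FULL OUTER JOIN orders o
           ON o.cust_id = c.cust_id;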

Exporting Profile Results


You can export column values and column pattern data from profile results. Export column values in Distinct Value Count format. Export pattern values in Domain Inference format.
1. In the Object Explorer view, select and open a profile.
2. Optionally, run the profile to update the profile results.
3. Select the Results view.
4. Select the column that contains the data you will export.
5. Under Details, select Values or select Patterns and click the Export button.
   The Export data to a file dialog box opens. The dialog box displays the value or pattern data you selected.
6. Accept or edit the file name.
   The default name is NewProfile_[column_name]_DVC for column value data and NewProfile_[column_name]_DI for pattern data.
7. Select or clear the option to export field names as the first row.
8. Click OK.

The Developer tool writes the file to the /tomcat/bin/ProfileExport directory of the Informatica Data Quality installation.



CHAPTER 16

Scorecards
This chapter includes the following topics:
Scorecards Overview, 146
Creating a Scorecard, 146
Viewing Column Data in a Scorecard, 147

Scorecards Overview
A scorecard is the graphical representation of valid values for a column in a profile. You can create scorecards to drill down on live data or staged data. Scorecards display the value frequency for columns in a profile as scores. The scores reflect the percentage of valid values in the columns. After you run a profile, you can add columns from the profile to a scorecard. You can create and view a scorecard in the Developer tool. You can run and edit the scorecard in the Analyst tool. Use scorecards to measure data quality progress. For example, you can create a scorecard to measure data quality before you apply data quality rules. After you apply data quality rules, you can create another scorecard to compare the effect of the rules on data quality.
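A score is simply the percentage of rows in a column that satisfy the rule applied in the profile. As a rough SQL illustration with a hypothetical table, column, and rule condition (the scorecard computes scores for you):

    -- Conceptual sketch: a score as the percentage of valid rows in a column.
    SELECT 100.0 * SUM(CASE WHEN LENGTH(postal_code) = 5 THEN 1 ELSE 0 END) / COUNT(*)
               AS postal_code_score
    FROM   customer_src;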

Creating a Scorecard
Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add columns to the scorecard. Complete these steps to create a scorecard:
1. In Object Explorer, select the project or folder where you want to create the scorecard.
2. Click File > New > Scorecard.
   The New Scorecard dialog box appears.
3. Click Add.
   The Select Profile dialog box appears. Select the profile that contains the columns you want to add.
4. Click OK, then click Next.
5. Select the columns that you want to add to the scorecard.
   By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns that are not included in the profile.
6. Click Finish.
   The Developer tool creates the scorecard.
7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the Analyst tool.
You can run and edit the scorecard in the Analyst tool. You can run the scorecard on current data in the data object or on data stored in the staging database.

Viewing Column Data in a Scorecard


Use a scorecard to view statistics on the valid and invalid data in a data object. A scorecard determines data to be valid or invalid based on the rule that the profile applies to the data source. Complete these steps to view scorecard data in the Developer tool:
1. Browse the Object Explorer and find the scorecard you need.
2. Right-click the scorecard and select Open.
   The scorecard columns and statistics appear in the editor. The scorecard contains the following information:
   The names of the columns in the scorecard.
   The number of rows in each column.
   The number of invalid rows in each column according to the rule applied in the profile.
   The score for each column. This is the percentage of valid rows in the column.
   A bar chart representation of the score for each column.
   The data object read by the underlying profile.
   The source column in the data object.
   The data source type.
   The drilldown setting for the column.
3. Use the Data Viewer to drill down on the data values for a column.
   You can view the valid values or invalid values in the Data Viewer.



CHAPTER 17

Reference Data
This chapter includes the following topics:
Reference Data Overview, 148
Types of Reference Data, 148

Reference Data Overview


Several transformations read reference data to perform data quality tasks. The following transformations can read reference data:
Address Validator. Reads address reference data to verify the accuracy of addresses.
Case Converter. Reads reference data tables to identify strings that must change case.
Comparison. Reads identity population data during duplicate analysis.
Labeler. Reads reference data tables to identify and label strings.
Match. Reads identity population data during duplicate analysis.
Parser. Reads reference data tables to parse strings.
Standardizer. Reads reference data tables to standardize strings to a common format.

Use the Data Quality Content Installer to install reference data. You can create reference data tables from the results of column profiling. You can export reference tables as XML files.

Types of Reference Data


Reference data installs through the Data Quality Content Installer. The Content Installer installs the following types of reference data:
Reference data tables. Contain information on common business terms from several countries. The types of reference information include telephone area codes, postcode formats, first names, social security number formats, occupations, and acronyms. The Content Installer writes the table structure to the Model Repository and the table data to the staging database defined during installation. You can view and edit these tables in the Developer tool.


Address reference data files. Contain information on all valid addresses in a country. The Address Validator transformation reads this data. You purchase an annual subscription to address data for a country. The Content Installer installs files for the countries that you have purchased. Address reference data is current for a defined period and you must refresh your data regularly, for example every quarter. You cannot view or edit address reference data.
Identity populations. Contain information on types of personal, household, and corporate identities. The Match transformation and the Comparison transformation use this data to parse potential identities from input fields. The Content Installer writes population files to the file system.

Note: The Content Installer user downloads and installs reference data separately from the applications. The Content Installer can also install prebuilt rules to the Model Repository. Contact an Administrator tool user for information about the reference data installed on your system.



APPENDIX A

Datatype Reference
This appendix includes the following topics:
Datatype Reference Overview, 150
Flat File and Transformation Datatypes, 150
IBM DB2 and Transformation Datatypes, 151
Microsoft SQL Server and Transformation Datatypes, 152
ODBC and Transformation Datatypes, 154
Oracle and Transformation Datatypes, 155
Converting Data, 157

Datatype Reference Overview


When you create a mapping, you create a set of instructions for the Data Integration Service to read data from a source, transform it, and write it to a target. The Data Integration Service transforms data based on dataflow in the mapping, starting at the first transformation in the mapping, and the datatype assigned to each port in a mapping. The Developer tool displays two types of datatypes:
Native datatypes. Specific to the relational table or flat file used as a physical data object. Native datatypes appear in the physical data object column properties.
Transformation datatypes. Set of datatypes that appear in the transformations. They are internal datatypes based on ANSI SQL-92 generic datatypes, which the Data Integration Service uses to move data across platforms. The transformation datatypes appear in all transformations in a mapping.

When the Data Integration Service reads source data, it converts the native datatypes to the comparable transformation datatypes before transforming the data. When the Data Integration Service writes to a target, it converts the transformation datatypes to the comparable native datatypes. When you specify a multibyte character set, the datatypes allocate additional space in the database to store characters of up to three bytes.

Flat File and Transformation Datatypes


Flat file datatypes map to transformation datatypes that the Data Integration Service uses to move data across platforms.


The following table compares flat file datatypes to transformation datatypes:


Flat File | Transformation | Range
Bigint | Bigint | Precision of 19 digits, scale of 0
Datetime | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Double | Double | Precision of 15 digits
Int | Integer | -2,147,483,648 to 2,147,483,647
Nstring | String | 1 to 104,857,600 characters
Number | Decimal | Precision 1 to 28, scale 0 to 28
String | String | 1 to 104,857,600 characters

When the Data Integration Service reads non-numeric data in a numeric column from a flat file, it drops the row and writes a message in the log. Also, when the Data Integration Service reads non-datetime data in a datetime column from a flat file, it drops the row and writes a message in the log.

IBM DB2 and Transformation Datatypes


IBM DB2 datatypes map to transformation datatypes that the Data Integration Service uses to move data across platforms. The following table compares IBM DB2 datatypes and transformation datatypes:
Datatype | Range | Transformation | Range
Bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | Bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; precision 19, scale 0
Blob | 1 to 2,147,483,647 bytes | Binary | 1 to 104,857,600 bytes
Char | 1 to 254 characters | String | 1 to 104,857,600 characters
Char for bit data | 1 to 254 bytes | Binary | 1 to 104,857,600 bytes
Clob | 1 to 2,147,483,647 bytes | Text | 1 to 104,857,600 characters
Date | 0001 to 9999 A.D.; precision 19, scale 0 (precision to the day) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Decimal | Precision 1 to 31, scale 0 to 31 | Decimal | Precision 1 to 28, scale 0 to 28
Float | Precision 1 to 15 | Double | Precision 15
Integer | -2,147,483,648 to 2,147,483,647 | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Smallint | -32,768 to 32,767 | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Time | 24-hour time period; precision 19, scale 0 (precision to the second) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Timestamp | 26 bytes; precision 26, scale 6 (precision to the microsecond) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Varchar | Up to 4,000 characters | String | 1 to 104,857,600 characters
Varchar for bit data | Up to 4,000 bytes | Binary | 1 to 104,857,600 bytes

Unsupported IBM DB2 Datatypes


The Developer tool does not support certain IBM DB2 datatypes. The Developer tool does not support the following IBM DB2 datatypes:
Dbclob
Graphic
Long Varchar
Long Vargraphic
Numeric
Vargraphic

Microsoft SQL Server and Transformation Datatypes


Microsoft SQL Server datatypes map to transformation datatypes that the Data Integration Service uses to move data across platforms. The following table compares Microsoft SQL Server datatypes and transformation datatypes:
Microsoft SQL Server | Range | Transformation | Range
Binary | 1 to 8,000 bytes | Binary | 1 to 104,857,600 bytes
Bit | 1 bit | String | 1 to 104,857,600 characters
Char | 1 to 8,000 characters | String | 1 to 104,857,600 characters
Datetime | Jan 1, 1753 A.D. to Dec 31, 9999 A.D.; precision 23, scale 3 (precision to 3.33 milliseconds) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Decimal | Precision 1 to 38, scale 0 to 38 | Decimal | Precision 1 to 28, scale 0 to 28
Float | -1.79E+308 to 1.79E+308 | Double | Precision 15
Image | 1 to 2,147,483,647 bytes | Binary | 1 to 104,857,600 bytes
Int | -2,147,483,648 to 2,147,483,647 | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Money | -922,337,203,685,477.5807 to 922,337,203,685,477.5807 | Decimal | Precision 1 to 28, scale 0 to 28
Numeric | Precision 1 to 38, scale 0 to 38 | Decimal | Precision 1 to 28, scale 0 to 28
Real | -3.40E+38 to 3.40E+38 | Double | Precision 15
Smalldatetime | Jan 1, 1900, to June 6, 2079; precision 19, scale 0 (precision to the minute) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Smallint | -32,768 to 32,767 | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Smallmoney | -214,748.3648 to 214,748.3647 | Decimal | Precision 1 to 28, scale 0 to 28
Sysname | 1 to 128 characters | String | 1 to 104,857,600 characters
Text | 1 to 2,147,483,647 characters | Text | 1 to 104,857,600 characters
Timestamp | 8 bytes | Binary | 1 to 104,857,600 bytes
Tinyint | 0 to 255 | Small Integer | Precision 5, scale 0
Varbinary | 1 to 8,000 bytes | Binary | 1 to 104,857,600 bytes
Varchar | 1 to 8,000 characters | String | 1 to 104,857,600 characters

Unsupported Microsoft SQL Server Datatypes


The Developer tool does not support certain Microsoft SQL Server datatypes. The Developer tool does not support the following Microsoft SQL Server datatypes:
Bigint
Nchar
Ntext
Numeric Identity
Nvarchar
Sql_variant



ODBC and Transformation Datatypes


ODBC datatypes map to transformation datatypes that the Data Integration Service uses to move data across platforms. The following table compares ODBC datatypes, such as Microsoft Access or Excel, to transformation datatypes:
Datatype | Transformation | Range
Bigint | Bigint | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807; precision 19, scale 0
Binary | Binary | 1 to 104,857,600 bytes
Bit | String | 1 to 104,857,600 characters
Char | String | 1 to 104,857,600 characters
Date | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Decimal | Decimal | Precision 1 to 28, scale 0 to 28
Double | Double | Precision 15
Float | Double | Precision 15
Integer | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Long Varbinary | Binary | 1 to 104,857,600 bytes
Nchar | String | 1 to 104,857,600 characters
Nvarchar | String | 1 to 104,857,600 characters
Ntext | Text | 1 to 104,857,600 characters
Numeric | Decimal | Precision 1 to 28, scale 0 to 28
Real | Double | Precision 15
Smallint | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Text | Text | 1 to 104,857,600 characters
Time | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Timestamp | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Tinyint | Integer | -2,147,483,648 to 2,147,483,647; precision 10, scale 0
Varbinary | Binary | 1 to 104,857,600 bytes
Varchar | String | 1 to 104,857,600 characters

Oracle and Transformation Datatypes


Oracle datatypes map to transformation datatypes that the Data Integration Service uses to move data across platforms. The following table compares Oracle datatypes and transformation datatypes:
Oracle | Range | Transformation | Range
Blob | Up to 4 GB | Binary | 1 to 104,857,600 bytes
Char(L) | 1 to 2,000 bytes | String | 1 to 104,857,600 characters
Clob | Up to 4 GB | Text | 1 to 104,857,600 characters
Date | Jan. 1, 4712 B.C. to Dec. 31, 4712 A.D.; precision 19, scale 0 | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Long | Up to 2 GB | Text | 1 to 104,857,600 characters. If you include Long data in a mapping, the Integration Service converts it to the transformation String datatype and truncates it to 104,857,600 characters.
Long Raw | Up to 2 GB | Binary | 1 to 104,857,600 bytes
Nchar | 1 to 2,000 bytes | String | 1 to 104,857,600 characters
Nclob | Up to 4 GB | Text | 1 to 104,857,600 characters
Number | Precision of 1 to 38 | Double | Precision of 15
Number(P,S) | Precision of 1 to 38, scale of 0 to 38 | Decimal | Precision of 1 to 28, scale of 0 to 28
Nvarchar2 | 1 to 4,000 bytes | String | 1 to 104,857,600 characters
Raw | 1 to 2,000 bytes | Binary | 1 to 104,857,600 bytes
Timestamp | Jan. 1, 4712 B.C. to Dec. 31, 9999 A.D.; precision 19 to 29, scale 0 to 9 (precision to the nanosecond) | Date/Time | Jan 1, 0001 A.D. to Dec 31, 9999 A.D. (precision to the nanosecond)
Varchar | 1 to 4,000 bytes | String | 1 to 104,857,600 characters
Varchar2 | 1 to 4,000 bytes | String | 1 to 104,857,600 characters
XMLType | Up to 4 GB | Text | 1 to 104,857,600 characters

Number(P,S) Datatype
The Developer tool supports Oracle Number(P,S) values with negative scale. However, it does not support Number(P,S) values with scale greater than precision 28 or a negative precision. If you import a table with an Oracle Number with a negative scale, the Developer tool displays it as a Decimal datatype. However, the Data Integration Service converts it to a double.
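For example, a negative scale rounds values to digits left of the decimal point. The following Oracle statements are a hypothetical illustration:

    -- Hypothetical Oracle table with a negative-scale NUMBER column.
    -- NUMBER(4,-2) rounds to the nearest hundred and allows up to four
    -- significant digits, so 1234.56 is stored as 1200.
    CREATE TABLE sales_summary (
        region_id    NUMBER(10),
        approx_total NUMBER(4,-2)
    );

    INSERT INTO sales_summary (region_id, approx_total) VALUES (1, 1234.56);
    -- approx_total now holds 1200. When the table is imported, the Developer
    -- tool displays the column as a Decimal datatype.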

Char, Varchar, Clob Datatypes


When the Data Integration Service uses the Unicode data movement mode, it reads the precision of Char, Varchar, and Clob columns based on the length semantics that you set for columns in the Oracle database. If you use the byte semantics to determine column length, the Data Integration Service reads the precision as the number of bytes. If you use the char semantics, the Data Integration Service reads the precision as the number of characters.
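For example, Oracle lets you declare length semantics explicitly in the column definition. In this hypothetical table, the first column's precision is read as 10 bytes and the second column's precision is read as 10 characters:

    -- Hypothetical Oracle table that mixes byte and char length semantics.
    CREATE TABLE customer_names (
        short_code VARCHAR2(10 BYTE),   -- precision read as a number of bytes
        full_name  VARCHAR2(10 CHAR)    -- precision read as a number of characters
    );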

Unsupported Oracle Datatypes


The Developer tool does not support certain Oracle datatypes. The Developer tool does not support the following Oracle datatypes:
Bfile
Interval Day to Second
Interval Year to Month
Mslabel
Raw Mslabel
Rowid
Timestamp with Local Time Zone
Timestamp with Time Zone



Converting Data
You can convert data from one datatype to another. To convert data from one datatype to another, use one of the following methods:
Pass data between ports with different datatypes (port-to-port conversion).
Use transformation functions to convert data.
Use transformation arithmetic operators to convert data.

Port-to-Port Data Conversion


The Data Integration Service converts data based on the datatype of the port. Each time data passes through a port, the Data Integration Service looks at the datatype assigned to the port and converts the data if necessary.

When you pass data between ports of the same numeric datatype and the data is transferred between transformations, the Data Integration Service does not convert the data to the scale and precision of the port that the data is passed to. For example, you transfer data between two transformations in a mapping. If you pass data from a decimal port with a precision of 5 to a decimal port with a precision of 4, the Data Integration Service stores the value internally and does not truncate the data.

You can convert data by passing data between ports with different datatypes. For example, you can convert a string to a number by passing it to an Integer port. The Data Integration Service performs port-to-port conversions between transformations and between the last transformation in a dataflow and a target.

The following table describes the port-to-port conversions that the Data Integration Service performs:
Datatype | Bigint | Integer | Decimal | Double | String, Text | Date/Time | Binary
Bigint | No | Yes | Yes | Yes | Yes | No | No
Integer | Yes | No | Yes | Yes | Yes | No | No
Decimal | Yes | Yes | No | Yes | Yes | No | No
Double | Yes | Yes | Yes | No | Yes | No | No
String, Text | Yes | Yes | Yes | Yes | Yes | Yes | No
Date/Time | No | No | No | No | Yes | Yes | No
Binary | No | No | No | No | No | No | Yes
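As an analogy only, the explicit conversions that the table above allows resemble SQL CAST expressions. The example below uses hypothetical column names and plain SQL; it is not how the Data Integration Service performs port-to-port conversion, but it shows the same kinds of datatype changes.

    -- Analogy only: explicit datatype conversions expressed in SQL.
    SELECT CAST(order_code AS INTEGER)       AS order_number,   -- string to integer
           CAST(qty        AS DECIMAL(10,2)) AS qty_decimal,    -- integer to decimal
           CAST(qty        AS VARCHAR(20))   AS qty_text        -- integer to string
    FROM   orders_src;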



INDEX

A
applications creating 100 mapping deployment properties 102 replacing 103 updating 101, 103 attributes relationships 123

C
cheat sheets description 5 configurations troubleshooting 117 connections Connection Explorer view 26 creating 27 DB2 for i5/OS properties 14 DB2 for z/OS properties 16 IBM DB2 properties 18 IMS properties 19 Microsoft SQL Server properties 20 ODBC properties 21 Oracle properties 22 overview 13 SAP properties 23 sequential properties 24 VSAM properties 25 copy description 11 objects 12 objects as links 12 custom queries Informatica join syntax 39 left outer join syntax 40 normal join syntax 39 outer join support 38 right outer join syntax 41 custom SQL queries creating 38 customized data objects 38 customized data objects adding pre- and post-mapping SQL commands 42 adding relational data objects 44 adding relational resources 44 advanced query 32 creating 43 creating a custom query 38 creating key relationships 34 creating keys 34 custom SQL queries 38 default query 32 description 31

entering source filters 35 entering user-defined joins 37 key relationships 33 pre- and post-mapping SQL commands 42 reserved words file 32 select distinct 35 simple query 32 sorted ports 36 troubleshooting 57 user-defined joins 37 using select distinct 35 using sorted ports 36 write properties 43

D
Data Integration Service selecting 112 data viewer configuration properties 115 configurations 113 creating configurations 113 troubleshooting configurations 117 datatypes flat file 151 IBM DB2 151 Microsoft SQL Server 152 ODBC 154 Oracle 155 overview 150 port-to-port data conversion 157 default SQL query viewing 38 dependencies implicit 64 link path 64 deployment mapping properties 102 overview 99 replacing applications 103 to a Data Integration Service 100 to file 101 updating applications 103 domains adding 6 description 5

E
export dependent objects 89 objects 90 overview 88 to PowerCenter 92


XML file 89 export to PowerCenter export restrictions 96 exporting objects 95 options 94 overview 92 release compatibility 93 rules and guidelines 97 setting the compatibility level 93 troubleshooting 98 expressions pushdown optimization 77

K
key relationships creating between relational data objects 30 creating in customized data objects 34 customized data objects 33 relational data objects 29

L
logical data object mappings creating 125 read mappings 124 types 124 write mappings 124 logical data object models creating 121 description 121 importing 122 logical data objects attribute relationships 123 creating 123 description 122 properties 123 logical view of data developing 121 overview 120 logs description 118

F
Filter transformation pushdown optimization 75 filters 35 flat file data objects advanced properties 52 column properties 46 configuring read properties 50 configuring write properties 52 creating 53 delimited, importing 54 description 45 fixed-width, importing 54 general properties 46 read properties 47, 50 folders creating 9 description 9 functions available in sources 77 pushdown optimization 77

M
mappings adding objects 60 configuration properties 115 configurations 113, 114 connection validation 66 creating 59 creating configurations 114 deployment properties 102 developing 59 early projection optimization method 70 early selection optimization method 71 expression validation 67 object dependency 58 object validation 67 objects 60 optimization methods 70 overview 58 performance 69 predicate optimization method 71 running 67 semi-join optimization method 72 troubleshooting configurations 117 validating 67 validation 66 mapplets creating 86 exporting to PowerCenter 93 input 85 output 86 overview 84 rules 85 types 84 validating 86

I
IBM DB2 sources pushdown optimization 76 identifying relationships description 123 import application archives 91 dependent objects 89 objects 91 overview 88 XML file 89 Informatica Data Quality overview 2 Informatica Data Services overview 3 Informatica Developer overview 2 setting up 5

J
join syntax customized data objects 39 Informatica syntax 39 left outer join syntax 40 normal join syntax 39 right outer join syntax 41



Microsoft SQL Server sources pushdown optimization 76 Model repository adding 7 connecting 7 description 6 objects 6

N
non-identifying relationships description 123 non-relational data objects description 44 importing 45 nonrelational sources pushdown optimization 76

O
objects copying 12 copying as a link 12 operators available in sources 81 pushdown optimization 81 Oracle sources pushdown optimization 76 outer join support customized data objects 38

P
parameter files creating 111 overview 104 purpose 107 running mappings with 107 structure 107 XML schema definition 109 parameters assigning 107 creating 106 overview 104 purpose 105 types 105 where to apply 106 where to create 105 performance tuning creating data viewer configurations 113 creating mapping configurations 114 data viewer configurations 113 early projection optimization method 70 early selection optimization method 71 mapping configurations 114 optimization methods 70 overview 69 predicate optimization method 71 semi-join optimization method 72 permissions assigning 9 physical data objects customized data objects 31 description 28 flat file data objects 45 non-relational data objects 44

relational data objects 29 SAP data objects 56 synchronization 56 troubleshooting 57 port attributes propogating 63 ports connection validation 66 linking 61 linking automatically 62 linking by name 62 linking by position 62 linking manually 61 linking rules and guidelines 62 propagated attributes by transformation 64 pre- and post-mapping SQL commands adding to customized data objects 42 customized data objects 42 primary keys creating in customized data objects 34 creating in relational data objects 30 projects assigning permissions 9 creating 8 description 8 sharing 8 pushdown optimization expressions 77 overview 75 process 75 SAP sources 76 Filter transformation 75 functions 77 IBM DB2 sources 76 Microsoft SQL Server sources 76 nonrelational sources on z/OS 76 ODBC sources 76 operators 81 Oracle sources 76 relational sources 76 Sybase ASE sources 76

R
read transformations creating from relational data objects 30 relational connections adding to customized data objects 44 relational data objects adding to customized data objects 44 creating key relationships 30 creating keys 30 creating read transformations 30 description 29 importing 31 key relationships 29 troubleshooting 57 relational sources pushdown optimization 76 reserved words file creating 33 customized data objects 32



S
SAP data objects description 56 importing 56 SAP sources pushdown optimization 76 search description 10 searching for objects and properties 10 segments copying 67, 87 select distinct customized data objects 35 using in customized data objects 35 self-joins custom SQL queries 38 sorted ports customized data objects 36 using in customized data objects 36 source filters entering 35 SQL data services creating 127 defining 127 overview 127 previewing data 130 SQL queries previewing data 130 SQL query plans example 135 overview 135 viewing 136 Sybase ASE sources pushdown optimization 76 synchronization customized data objects 56 physical data objects 56

V
validation configuring preferences 11 views Connection Explorer view 26 description 4 virtual data overview 126 virtual stored procedures creating 134 defining 133 overview 132 previewing output 134 validating 134 virtual table mappings creating 131 defining 131 description 130 previewing output 132 validating 132 virtual tables creating from a data object 129 creating manually 129 data access methods 129 defining relationships 130 description 128

W
Welcome page description 5 workbench description 4

T
troubleshooting exporting objects to PowerCenter 98

U
user-defined joins customized data objects 37 entering 37 Informatica syntax 39 left outer join syntax 40 normal join syntax 39 outer join support 38 right outer join syntax 41


