Informatica PowerCenter
(Version 7.1.1)
Table of Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
New Features and Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
PowerCenter 7.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
PowerCenter 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxviii
PowerCenter 7.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlii
About Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlviii
About this Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xlix
Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l
Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . l
Visiting the Informatica Webzine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l
Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . l
Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . . . l
Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . li
Indicator File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Output File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Cache Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Server Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Steps for Registering a PowerCenter Server . . . . . . . . . . . . . . . . . . . . . . 48
Deleting a PowerCenter Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Configuring Connection Object Permissions . . . . . . . . . . . . . . . . . . . . . . . . 51
Connection Object Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Setting Up a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . . 53
Database Connect Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Database Connection Code Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Configuring Environment SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Configuring a Relational Database Connection . . . . . . . . . . . . . . . . . . . 56
Deleting Connection Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Copying a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . 59
Replacing a Relational Database Connection . . . . . . . . . . . . . . . . . . . . . . . . 62
Resumeworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
Resumeworklet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
Scheduleworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
Setfolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
Setnowait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Setwait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Showsettings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Shutdownserver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Starttask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Startworkflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
Stoptask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Stopworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Unscheduleworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
Unsetfolder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Waittask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Waitworkflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
List of Figures
List of Tables
Preface
Welcome to PowerCenter, Informatica's software product that delivers an open, scalable data
integration solution addressing the complete life cycle for all data integration projects,
including data warehouses and data marts, data migration, data synchronization, and
information hubs. PowerCenter combines the latest technology enhancements for reliably
managing data repositories and delivering information resources in a timely, usable, and
efficient manner.
The PowerCenter metadata repository coordinates and drives a variety of core functions,
including extracting, transforming, loading, and managing data. The PowerCenter Server can
extract large volumes of data from multiple platforms, handle complex transformations on the
data, and support high-speed loads. PowerCenter can simplify and accelerate the process of
moving data warehouses from development to test to production.
New Features and Enhancements
PowerCenter 7.1.1
This section describes new features and enhancements to PowerCenter 7.1.1.
Data Profiling
Data sampling. You can create a data profile for a sample of source data instead of the
entire source. You can view a profile from a random sample of data, a specified percentage
of data, or for a specified number of rows starting with the first row.
Verbose data enhancements. You can specify the type of verbose data you want the
PowerCenter Server to write to the Data Profiling warehouse. The PowerCenter Server can
write all rows, the rows that meet the business rule, or the rows that do not meet the
business rule.
Session enhancement. You can save sessions that you create from the Profile Manager to
the repository.
Domain Inference function tuning. You can configure the Data Profiling Wizard to filter
the Domain Inference function results. You can configure a maximum number of patterns
and a minimum pattern frequency. You may want to narrow the scope of patterns returned
to view only the primary domains, or you may want to widen the scope of patterns
returned to view exception data.
Row Uniqueness function. You can determine unique rows for a source based on a
selection of columns for the specified source.
Define mapping, session, and workflow prefixes. You can define default mapping,
session, and workflow prefixes for the mappings, sessions, and workflows generated when
you create a data profile.
Profile mapping display in the Designer. The Designer displays profile mappings under a
profile mappings node in the Navigator.
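The three sampling modes described above (random sample, specified percentage, first N rows) can be pictured with a small, generic Python sketch. This is a conceptual illustration only, not PowerCenter code; the function name and mode labels are invented for this example:

```python
import random

def sample_rows(rows, mode, value, seed=0):
    """Illustrate three data-profiling sampling modes.

    'first'   -> the first `value` rows
    'percent' -> roughly `value` percent of rows, chosen at random
    'random'  -> a random sample of `value` rows
    """
    if mode == "first":
        return rows[:value]
    if mode == "percent":
        k = max(1, round(len(rows) * value / 100))
        return random.Random(seed).sample(rows, k)
    if mode == "random":
        return random.Random(seed).sample(rows, value)
    raise ValueError(f"unknown sampling mode: {mode}")

data = list(range(100))
print(len(sample_rows(data, "first", 10)))    # 10
print(len(sample_rows(data, "percent", 25)))  # 25
```

Profiling a sample rather than the full source trades completeness for speed, which is why the first-N mode is cheapest: it avoids reading the rest of the source at all.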
PowerCenter Server
Code page. PowerCenter supports additional Japanese language code pages, such as JIPSE-kana, JEF-kana, and MELCOM-kana.
Flat file partitioning. When you create multiple partitions for a flat file source session, you
can configure the session to create multiple threads to read the flat file source.
pmcmd. You can use parameter files that reside on a local machine with the Startworkflow
command in the pmcmd program. When you use a local parameter file, pmcmd passes
variables and values in the file to the PowerCenter Server.
SuSE Linux support. The PowerCenter Server runs on SuSE Linux. On SuSE Linux, you
can connect to IBM DB2, Oracle, and Sybase sources, targets, and repositories using
native drivers. Use ODBC drivers to access other sources and targets.
Reserved word support. If any source, target, or lookup table name or column name
contains a database reserved word, you can create and maintain a file, reswords.txt,
containing reserved words. When the PowerCenter Server initializes a session, it searches
for reswords.txt in the PowerCenter Server installation directory. If the file exists, the
PowerCenter Server places quotes around matching reserved words when it executes SQL
against the database.
Teradata external loader. When you load to Teradata using an external loader, you can
now override the control file. Depending on the loader you use, you can also override the
error, log, and work table names by specifying different tables on the same or different
Teradata database.
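The reserved-word file (reswords.txt) described above is a plain text file kept in the PowerCenter Server installation directory. A minimal sketch of what such a file might look like, with database-type section headers; the specific section names and words below are illustrative assumptions, not a definitive list:

```
[Teradata]
MONTH
DATE
[Oracle]
OPTION
START
[SQL Server]
CURRENT
```

When a session runs against a table or column whose name matches an entry under the relevant database section, the PowerCenter Server quotes that identifier in the SQL it generates.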
Repository
Exchange metadata with other tools. You can exchange source and target metadata with
other BI or data modeling tools, such as Business Objects Designer. You can export or
import multiple objects at a time. When you export metadata, the PowerCenter Client
creates a file format recognized by the target tool.
Repository Server
Enable enhanced security when you create a relational source or target connection in the
repository.
SuSE Linux support. The Repository Server runs on SuSE Linux. On SuSE Linux, you
can connect to IBM DB2, Oracle, and Sybase repositories.
Web Services Provider
Attachment support. When you import web service definitions with attachment groups,
you can pass attachments through the requests or responses in a service session. The
document type you can attach is based on the mime content of the WSDL file. You can
attach document types such as XML, JPEG, GIF, or PDF.
Pipeline partitioning. You can create multiple partitions in a session containing web
service source and target definitions. The PowerCenter Server creates a connection to the
Web Services Hub based on the number of sources, targets, and partitions in the session.
XML
Multi-level pivoting. You can now pivot more than one multiple-occurring element in an
XML view. You can also pivot the view row.
PowerCenter 7.1
This section describes new features and enhancements to PowerCenter 7.1.
Data Profiling
Data Profiling for VSAM sources. You can now create a data profile for VSAM sources.
Support for verbose mode for source-level functions. You can now create data profiles
with source-level functions and write data to the Data Profiling warehouse in verbose
mode.
Aggregator function in auto profiles. Auto profiles now include the Aggregator function.
Creating auto profile enhancements. You can now select the columns or groups you want
to include in an auto profile and enable verbose mode for the Distinct Value Count
function.
Purging data from the Data Profiling warehouse. You can now purge data from the Data
Profiling warehouse.
Source View in the Profile Manager. You can now view data profiles by source definition
in the Profile Manager.
PowerCenter Data Profiling report enhancements. You can now view PowerCenter Data
Profiling reports in a separate browser window, resize columns in a report, and view
verbose data for Distinct Value Count functions.
Prepackaged domains. Informatica provides a set of prepackaged domains that you can
include in a Domain Validation function in a data profile.
Documentation
Web Services Provider Guide. This is a new book that describes the functionality of Real-time
Web Services. It also includes information from the version 7.0 Web Services Hub Guide.
XML User Guide. This book consolidates XML information previously documented in the
Designer Guide, Workflow Administration Guide, and Transformation Guide.
Licensing
Informatica provides licenses for each CPU and each repository rather than for each
installation. Informatica provides licenses for product, connectivity, and options. You store
the license keys in a license key file. You can manage the license files using the Repository
Server Administration Console, the PowerCenter Server Setup, and the command line
program, pmlic.
PowerCenter Server
64-bit support. You can now run 64-bit PowerCenter Servers on AIX and HP-UX
(Itanium).
Partitioning enhancements. If you have the Partitioning option, you can define up to 64
partitions at any partition point in a pipeline that supports multiple partitions.
CLOB/BLOB datatype support. You can now read and write CLOB/BLOB datatypes.
Repository Server
Updating repository statistics. PowerCenter now identifies and updates statistics for all
repository tables and indexes when you copy, upgrade, and restore repositories. This
improves performance when PowerCenter accesses the repository.
pmrep. You can use pmrep to back up, disable, or enable a repository, delete a relational
connection from a repository, delete repository details, truncate log files, and run multiple
pmrep commands sequentially. You can also use pmrep to create, modify, and delete a
folder.
Repository
Exchange metadata with business intelligence tools. You can export metadata to and
import metadata from other business intelligence tools, such as Cognos Report Net and
Business Objects.
Object import and export enhancements. You can compare objects in an XML file to
objects in the target repository when you import objects.
MX views. MX views have been added to help you analyze metadata stored in the
repository. REP_SERVER_NET and REP_SERVER_NET_REF views allow you to see
information about server grids. REP_VERSION_PROPS allows you to see the version
history of all objects in a PowerCenter repository.
Transformations
Flat file lookup. You can now perform lookups on flat files. When you create a Lookup
transformation using a flat file as a lookup source, the Designer invokes the Flat File
Wizard. You can also use a lookup file parameter if you want to change the name or
location of a lookup between session runs.
Dynamic lookup cache enhancements. When you use a dynamic lookup cache, the
PowerCenter Server can ignore some ports when it compares values in lookup and input
ports before it updates a row in the cache. Also, you can choose whether the PowerCenter
Server outputs old or new values from the lookup/output ports when it updates a row. You
might want to output old values from lookup/output ports when you use the Lookup
transformation in a mapping that updates slowly changing dimension tables.
Union transformation. You can use the Union transformation to merge multiple sources
into a single pipeline. The Union transformation is similar to using the UNION ALL SQL
statement to combine the results from two or more SQL statements.
Midstream XML transformations. You can now create an XML Parser transformation or
an XML Generator transformation to parse or generate XML inside a pipeline. The XML
transformations enable you to extract XML data stored in relational tables, such as data
stored in a CLOB column. You can also extract data from messaging systems, such as
TIBCO or IBM MQSeries.
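The UNION ALL analogy for the Union transformation can be seen with a small Python sqlite3 sketch; the table and column names here are invented for illustration:

```python
import sqlite3

# Two source tables with matching column layouts, merged into one
# result set -- analogous to merging two pipelines with a Union
# transformation. UNION ALL keeps duplicate rows, just as the Union
# transformation does not remove duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_east (order_id INTEGER, amount REAL);
    CREATE TABLE orders_west (order_id INTEGER, amount REAL);
    INSERT INTO orders_east VALUES (1, 10.0), (2, 20.0);
    INSERT INTO orders_west VALUES (3, 30.0), (1, 10.0);
""")
rows = conn.execute(
    "SELECT order_id, amount FROM orders_east "
    "UNION ALL "
    "SELECT order_id, amount FROM orders_west"
).fetchall()
print(rows)  # four rows: duplicates are preserved
```

Note the contrast with plain UNION, which would collapse the duplicate (1, 10.0) row; the Union transformation behaves like UNION ALL.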
Usability
Viewing active folders. The Designer and the Workflow Manager highlight the active
folder in the Navigator.
Version Control
You can run object queries that return shortcut objects. You can also run object queries based
on the latest status of an object. The query can return local objects that are checked out, the
latest version of checked in objects, or a collection of all older versions of objects.
Web Services Provider
Real-time Web Services. Real-time Web Services allows you to create services using the
Workflow Manager and make them available to web service clients through the Web
Services Hub. The PowerCenter Server can perform parallel processing of both request-response and one-way services.
Web Services Hub. The Web Services Hub now hosts Real-time Web Services in addition
to Metadata Web Services and Batch Web Services. You can install the Web Services Hub
on a JBoss application server.
Note: PowerCenter Connect for Web Services allows you to create sources, targets, and
transformations to call web services hosted by other providers. For more information, see
PowerCenter Connect for Web Services User and Administrator Guide.
Workflow Monitor
The Workflow Monitor includes the following performance and usability enhancements:
When you connect to the PowerCenter Server, you no longer need to distinguish between
online and offline mode.
You can open multiple instances of the Workflow Monitor on one machine.
You can simultaneously monitor multiple PowerCenter Servers registered to the same
repository.
The Workflow Monitor includes improved options for filtering tasks by start and end
time.
The Workflow Monitor displays workflow runs in Task view chronologically with the most
recent run at the top. It displays folders alphabetically.
XML Support
PowerCenter XML support now includes the following features:
Enhanced datatype support. You can use XML schemas that contain simple and complex
datatypes.
Additional options for XML definitions. When you import XML definitions, you can
choose how you want the Designer to represent the metadata associated with the imported
files. You can choose to generate XML views using hierarchy or entity relationships. In a
view with hierarchy relationships, the Designer expands each element and reference under
its parent element. When you create views with entity relationships, the Designer creates
separate entities for references and multiple-occurring elements.
Synchronizing XML definitions. You can synchronize one or more XML definitions when
the underlying schema changes. You can synchronize an XML definition with any
repository definition or file used to create the XML definition, including relational sources
or targets, XML files, DTD files, or schema files.
XML workspace. You can edit XML views and relationships between views in the
workspace. You can create views, add or delete columns from views, and define
relationships between views.
Support for circular references. Circular references occur when an element is a direct or
indirect child of itself. PowerCenter now supports XML files, DTD files, and XML
schemas that use circular definitions.
Increased performance for large XML targets. You can create XML files of several
gigabytes in a PowerCenter 7.1 XML session by using the following enhancements:
Spill to disk. You can specify the size of the cache used to store the XML tree. If the size
of the tree exceeds the cache size, the XML data spills to disk in order to free up
memory.
User-defined commits. You can define commits to trigger flushes for XML target files.
Support for multiple XML output files. You can output XML data to multiple XML
targets. You can also define the file names for XML output files in the mapping.
PowerCenter 7.0
This section describes new features and enhancements to PowerCenter 7.0.
Data Profiling
If you have the Data Profiling option, you can profile source data to evaluate source data and
detect patterns and exceptions. For example, you can determine implicit data type, suggest
candidate keys, detect data patterns, and evaluate join criteria. After you create a profiling
warehouse, you can create profiling mappings and run sessions. Then you can view reports
based on the profile data in the profiling warehouse.
The PowerCenter Client provides a Profile Manager and a Profile Wizard to complete these
tasks.
Documentation
Glossary. The Installation and Configuration Guide contains a glossary of new PowerCenter
terms.
Upgrading metadata. The Installation and Configuration Guide now contains a chapter
titled "Upgrading Repository Metadata." This chapter describes changes to repository
objects impacted by the upgrade process. The change in functionality for existing objects
depends on the version of the existing objects. Consult the upgrade information in this
chapter for each upgraded object to determine whether the upgrade applies to your current
version of PowerCenter.
Functions
Soundex. The Soundex function encodes a string value into a four-character string.
SOUNDEX works for characters in the English alphabet (A-Z). It uses the first character
of the input string as the first character in the return value and encodes the remaining
three unique consonants as numbers.
Metaphone. The Metaphone function encodes string values. You can specify the length of
the string that you want to encode. METAPHONE encodes characters of the English
language alphabet (A-Z). It encodes both uppercase and lowercase letters in uppercase.
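Soundex is a standard encoding scheme, so the behavior described above can be sketched generically. The following Python implementation is a conceptual illustration of the algorithm, not Informatica's own code:

```python
def soundex(name: str) -> str:
    """Encode a string as a four-character Soundex code (English A-Z)."""
    codes = {}
    for digit, letters in [("1", "BFPV"), ("2", "CGJKQSXZ"),
                           ("3", "DT"), ("4", "L"), ("5", "MN"), ("6", "R")]:
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    # Keep the first letter, then encode the remaining consonants,
    # skipping vowels and collapsing runs of the same digit.
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":  # H and W do not break duplicate runs
            prev = digit
    return (result + "000")[:4]  # pad or truncate to four characters

print(soundex("Robert"))  # R163
```

The first character of the input becomes the first character of the code, and the next three distinct consonant sounds become digits, matching the description above.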
Installation
Remote PowerCenter Client installation. You can create a control file containing
installation information, and distribute it to other users to install the PowerCenter Client.
You access the Informatica installation CD from the command line to create the control
file and install the product.
Metadata browsing. You can use PowerCenter Metadata Reporter to browse PowerCenter
7.0 metadata, such as workflows, worklets, mappings, source and target tables, and
transformations.
Metadata analysis. You can use PowerCenter Metadata Reporter to analyze operational
metadata, including session load time, server load, session completion status, session
errors, and warehouse growth.
PowerCenter Server
DB2 bulk loading. You can enable bulk loading when you load to IBM DB2 8.1.
Distributed processing. If you purchase the Server Grid option, you can group
PowerCenter Servers registered to the same repository into a server grid. In a server grid,
PowerCenter Servers balance the workload among all the servers in the grid.
Row error logging. The session configuration object has new properties that allow you to
define error logging. You can choose to log row errors in a central location to help
understand the cause and source of errors.
External loading enhancements. When using external loaders on Windows, you can now
choose to load from a named pipe. When using external loaders on UNIX, you can now
choose to load from staged files.
External loading using Teradata Warehouse Builder. You can use Teradata Warehouse
Builder to load to Teradata. You can choose to insert, update, upsert, or delete data.
Additionally, Teradata Warehouse Builder can simultaneously read from multiple sources
and load data into one or more tables.
Mixed mode processing for Teradata external loaders. You can now use data driven load
mode with Teradata external loaders. When you select data driven loading, the
PowerCenter Server flags rows for insert, delete, or update. It writes a column in the target
file or named pipe to indicate the update strategy. The control file uses these values to
determine how to load data to the target.
Concurrent processing. The PowerCenter Server now reads data concurrently from
sources within a target load order group. This enables more efficient joins with minimal
usage of memory and disk cache.
Real time processing enhancements. You can now use real-time processing in sessions that
also process active transformations, such as the Aggregator transformation. You can apply
the transformation logic to rows defined by transaction boundaries.
Repository Server
Object export and import enhancements. You can now export and import objects using
the Repository Manager and pmrep. You can export and import multiple objects and
objects types. You can export and import objects with or without their dependent objects.
You can also export objects from a query result or objects history.
pmrep commands. You can use pmrep to perform change management tasks, such as
maintaining deployment groups and labels, checking in, deploying, importing, exporting,
and listing objects. You can also use pmrep to run queries. The deployment and object
import commands require you to use a control file to define options and resolve conflicts.
Trusted connections. You can now use a Microsoft SQL Server trusted connection to
connect to the repository.
Security
LDAP user authentication. You can now use default repository user authentication or
Lightweight Directory Access Protocol (LDAP) to authenticate users. If you use LDAP, the
repository maintains an association between your repository user name and your external
login name. When you log in to the repository, the security module passes your login name
to the external directory for authentication. The repository maintains a status for each
user. You can now enable or disable users from accessing the repository by changing the
status. You do not have to delete user names from the repository.
Use Repository Manager privilege. The Use Repository Manager privilege allows you to
perform tasks in the Repository Manager, such as copy object, maintain labels, and change
object status. You can perform the same tasks in the Designer and Workflow Manager if
you have the Use Designer and Use Workflow Manager privileges.
Audit trail. You can track changes to repository users, groups, privileges, and permissions
through the Repository Server Administration Console. The Repository Agent logs
security changes to a log file stored in the Repository Server installation directory. The
audit trail log contains information, such as changes to folder properties, adding or
removing a user or group, and adding or removing privileges.
Transformations
Joiner transformation. You can use the Joiner transformation to join two data streams that
originate from the same source.
Version Control
The PowerCenter Client and repository introduce features that allow you to create and
manage multiple versions of objects in the repository. Version control allows you to maintain
multiple versions of an object, control development on the object, track changes, and use
deployment groups to copy specific groups of objects from one repository to another. Version
control in PowerCenter includes the following features:
Object versioning. Individual objects in the repository are now versioned. This allows you
to store multiple copies of a given object during the development cycle. Each version is a
separate object with unique properties.
Check out and check in versioned objects. You can check out and reserve an object you
want to edit, and check in the object when you are ready to create a new version of the
object in the repository.
Compare objects. The Repository Manager and Workflow Manager allow you to compare
two repository objects of the same type to identify differences between them. You can
compare Designer objects and Workflow Manager objects in the Repository Manager. You
can compare tasks, sessions, worklets, and workflows in the Workflow Manager. The
PowerCenter Client tools allow you to compare objects across open folders and
repositories. You can also compare different versions of the same object.
Delete or purge a version. You can delete an object from view and continue to store it in
the repository. You can recover or undelete deleted objects. If you want to permanently
remove an object version, you can purge it from the repository.
Deployment. Unlike copying a folder, copying a deployment group allows you to copy a
select number of objects from multiple folders in the source repository to multiple folders
in the target repository. This gives you greater control over the specific objects copied from
one repository to another.
Deployment groups. You can create a deployment group that contains references to
objects from multiple folders across the repository. You can create a static deployment
group that you manually add objects to, or create a dynamic deployment group that uses a
query to populate the group.
Labels. A label is an object that you can apply to versioned objects in the repository. This
allows you to associate multiple objects in groups defined by the label. You can use labels
to track versioned objects during development, improve query results, and organize groups
of objects for deployment or export and import.
Queries. You can create a query that specifies conditions to search for objects in the
repository. You can save queries for later use. You can make a private query, or you can
share it with all users in the repository.
Track changes to an object. You can view a history that includes all versions of an object
and compare any version of the object in the history to any other version. This allows you
to see the changes made to an object over time.
XML Support
PowerCenter contains XML features that allow you to validate an XML file against an XML
schema, declare multiple namespaces, use XPath to locate XML nodes, increase performance
for large XML files, format your XML file output for increased readability, and parse or
generate XML data from various sources. XML support in PowerCenter includes the
following features:
XML schema. You can use an XML schema to validate an XML file and to generate source
and target definitions. XML schemas allow you to declare multiple namespaces so you can
use prefixes for elements and attributes. XML schemas also allow you to define some
complex datatypes.
XPath support. The XML wizard allows you to view the structure of XML schema. You
can use XPath to locate XML nodes.
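As a rough illustration of XPath-style node location (using Python's standard library rather than the PowerCenter XML wizard itself):

```python
# Illustrative only: locating XML nodes with XPath expressions, using
# Python's xml.etree.ElementTree (which supports a limited XPath subset).
# This demonstrates the concept, not PowerCenter's own XPath engine.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<orders>"
    "<order id='1'><item sku='A'/><item sku='B'/></order>"
    "<order id='2'><item sku='C'/></order>"
    "</orders>"
)

# Select every <item> node anywhere under the root.
skus = [item.get("sku") for item in doc.findall(".//item")]

# Select items only inside the order with id='2' (predicate syntax).
order2 = doc.findall(".//order[@id='2']/item")
```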
Increased performance for large XML files. When you process an XML file or stream, you
can set commits and periodically flush XML data to the target instead of writing all the
output at the end of the session. You can choose to append the data to the same target file
or create a new target file after each flush.
XML target enhancements. You can format the XML target file so that you can easily view
the XML file in a text editor. You can also configure the PowerCenter Server to not output
empty elements to the XML target.
Usability
Copying objects. You can now copy objects from all the PowerCenter Client tools using
the copy wizard to resolve conflicts. You can copy objects within folders, to other folders,
and to different repositories. Within the Designer, you can also copy segments of
mappings to a workspace in a new folder or repository.
Comparing objects. You can compare workflows and tasks from the Workflow Manager.
You can also compare all objects from within the Repository Manager.
Change propagation. When you edit a port in a mapping, you can choose to propagate
changed attributes throughout the mapping. The Designer propagates ports, expressions,
and conditions based on the direction that you propagate and the attributes you choose to
propagate.
Revert to saved. You can now revert to the last saved version of an object in the Workflow
Manager. When you do this, the Workflow Manager accesses the repository to retrieve the
last-saved version of the object.
Enhanced validation messages. The PowerCenter Client writes messages in the Output
window that describe why it invalidates a mapping or workflow when you modify a
dependent object.
Validate multiple objects. You can validate multiple objects in the repository without
fetching them into the workspace. You can save and optionally check in objects that
change from invalid to valid status as a result of the validation. You can validate sessions,
mappings, mapplets, workflows, and worklets.
View dependencies. Before you edit or delete versioned objects, such as sources, targets,
mappings, or workflows, you can view dependencies to see the impact on other objects.
You can view parent and child dependencies and global shortcuts across repositories.
Viewing dependencies helps you modify objects and composite objects without breaking
dependencies.
Refresh session mappings. In the Workflow Manager, you can refresh a session mapping.
PowerCenter documentation includes the following guides:
Data Profiling Guide. Provides information about how to profile PowerCenter sources to
evaluate source data and detect patterns and exceptions.
Designer Guide. Provides information needed to use the Designer. Includes information to
help you create mappings, mapplets, and transformations. Also includes a description of
the transformation datatypes used to process and transform source data.
PowerCenter Connect for JMS User and Administrator Guide. Provides information
to install PowerCenter Connect for JMS, build mappings, extract data from JMS messages,
and load data into JMS messages.
Repository Guide. Provides information needed to administer the repository using the
Repository Manager or the pmrep command line program. Includes details on
functionality available in the Repository Manager and Administration Console, such as
creating and maintaining repositories, folders, users, groups, and permissions and
privileges.
Transformation Language Reference. Provides syntax descriptions and examples for each
transformation function provided with PowerCenter.
Transformation Guide. Provides information on how to create and configure each type of
transformation in the Designer.
Troubleshooting Guide. Lists error messages that you might encounter while using
PowerCenter. Each error message includes one or more possible causes and actions that
you can take to correct the condition.
Web Services Provider Guide. Provides information you need to install and configure the Web
Services Hub. This guide also provides information about how to use the web services that the
Web Services Hub hosts. The Web Services Hub hosts Real-time Web Services, Batch Web
Services, and Metadata Web Services.
Workflow Administration Guide. Provides information to help you create and run
workflows in the Workflow Manager, as well as monitor workflows in the Workflow
Monitor. Also contains information on administering the PowerCenter Server and
performance tuning.
XML User Guide. Provides information you need to create XML definitions from XML,
XSD, or DTD files, and relational or other XML definitions. Includes information on
running sessions with XML data. Also includes details on using the midstream XML
transformations to parse or generate XML data within a pipeline.
Document Conventions
This guide uses the following formatting conventions: italicized text and boldfaced text
(for emphasized subjects), monospaced text, and paragraphs flagged Note:, Tip:, and
Warning:.
Visiting the Informatica Developer Network
The site contains information on how to create, market, and support customer-oriented
add-on solutions based on Informatica's interoperability interfaces.
Obtaining Technical Support
Informatica Corporation
2100 Seaport Blvd.
Redwood City, CA 94063
Phone: 866.563.6332 or 650.385.5800
Fax: 650.213.9489
Hours: 6 a.m. - 6 p.m. (PST/PDT)
email: support@informatica.com
Chapter 1
Overview, 2
Running a Workflow, 7
System Resources, 24
Overview
The PowerCenter Server moves data from sources to targets based on workflow and
mapping metadata stored in a repository. You can register multiple PowerCenter Servers to
a repository.
A workflow is a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks
according to the conditional links connecting the tasks. You can run a task by placing it in a
workflow.
When you have multiple PowerCenter Servers, you can assign a server to start a workflow or a
session. This allows you to distribute the workload. You can increase performance by using a
server grid to balance the workload. A server grid is a server object that allows you to
automate the distribution of sessions across multiple servers. For more information about
server grids, see Working with Server Grids on page 446.
A session is a type of workflow task. A session is a set of instructions that describes how to
move data from sources to targets using a mapping. Other workflow tasks include commands,
decisions, timers, pre-session SQL commands, post-session SQL commands, and email
notification. For details on workflow tasks, see Working with Tasks on page 131.
Use the Designer to import source and target definitions into the repository and to build
mappings. A mapping is a set of source and target definitions linked by transformation
objects that define the rules for data transformation. Use the Workflow Manager to develop
and manage workflows. Use the Workflow Monitor to monitor workflows and stop the
PowerCenter Server.
When a workflow starts, the PowerCenter Server retrieves mapping, workflow, and session
metadata from the repository to extract data from the source, transform it, and load it into
the target. It also runs the tasks in the workflow. The PowerCenter Server uses Load Manager
and Data Transformation Manager (DTM) processes to run the workflow.
Figure 1-1 shows the processing path between the PowerCenter Server, repository, source, and
target:
Figure 1-1. PowerCenter Server and Data Movement (the PowerCenter Server reads source
data from the source, follows instructions from metadata in the repository, and writes
transformed data to the target)
The PowerCenter Server can combine data from different platforms and source types. For
example, you can join data from a flat file and an Oracle source. The PowerCenter Server can
also load data to different platforms and target types. For example, you can load transformed
data to both a flat file target and a Microsoft SQL Server database in the same session.
Workflow Processes
The PowerCenter Server uses both process memory and system shared memory to perform
these tasks. It runs as a daemon on UNIX and a service on Windows. The PowerCenter Server
uses the following processes to run a workflow:
The Load Manager process. Starts and locks the workflow, runs workflow tasks, and starts
the DTM to run sessions.
The Data Transformation Manager (DTM) process. Performs session validations. Creates
threads to initialize the session, read, write, and transform data, and handle pre- and
post-session operations.
Pipeline Partitioning
When running sessions, the PowerCenter Server can achieve high performance by
partitioning the pipeline and performing the extract, transformation, and load for each
partition in parallel. To accomplish this, use the following session and server configuration:
You can configure the partition type at most transformations in the pipeline. The
PowerCenter Server can partition data using round-robin, hash, key-range, database
partitioning, or pass-through partitioning.
For relational sources, the PowerCenter Server creates multiple database connections to a
single source and extracts a separate range of data for each connection. For XML or file
sources, the PowerCenter Server reads multiple files concurrently. The files must have the
same structure or hierarchy.
When the PowerCenter Server transforms the partitions concurrently, it passes data between
the partitions as needed to perform operations such as aggregation. When the PowerCenter
Server loads relational data, it creates multiple database connections to the target and loads
partitions of data concurrently. When the PowerCenter Server loads data to file targets, it
creates a separate file for each partition. You can choose to merge the target files.
Figure 1-2 shows a mapping that contains two partitions:
Figure 1-2. Partitioned Mapping (source, transformations, and target split into two
parallel partitions)
For more information about pipeline partitioning, see Pipeline Partitioning on page 345.
The PowerCenter Client, Repository Server, and Repository Agent communicate with the
PowerCenter Server over TCP/IP. The PowerCenter Server connects to sources and targets
through native drivers or ODBC, and the Repository Agent connects to the PowerCenter
repository through a native connection.
Table 1-1 summarizes the software you need to connect the PowerCenter Server to the
platform components, source databases, and target databases:
Table 1-1. PowerCenter Server Connectivity Requirements

PowerCenter Server Connection    Connectivity Requirement
PowerCenter Client               TCP/IP
Repository Server                TCP/IP
Repository Agent                 TCP/IP
Note: Both the Windows and UNIX versions of the PowerCenter Server can use ODBC drivers to connect to
databases. However, Informatica recommends using native drivers when possible to improve performance.
Running a Workflow
The PowerCenter Server uses the Load Manager process and the Data Transformation
Manager Process (DTM) to run the workflow and carry out workflow tasks.
When the PowerCenter Server runs a workflow, the Load Manager performs a series of
tasks, including starting and locking the workflow, running workflow tasks, and starting
the DTM to run sessions. For details on the Load Manager process, see Load Manager
Process on page 8.
When the PowerCenter Server runs a session, the DTM performs a series of tasks,
including the following:
Validates session code pages if data code page validation is enabled. Checks query
conversions if data code page validation is disabled.
Creates and runs mapping, reader, writer, and transformation threads to extract,
transform, and load data.
The Load Manager schedules workflows at the following times:
When you start the PowerCenter Server. The Load Manager launches and queries the
repository for a list of workflows configured to run on the PowerCenter Server.
When you save a workflow. The Load Manager adds the workflow to or removes the
workflow from the schedule queue when you save a workflow assigned to a PowerCenter
Server.
For more information on workflow log files, see Log Files on page 455.
When the Load Manager distributes a session to a worker server, the Load Manager on the
worker server machine starts a DTM process to run the session. For more information
about creating and using server grids, see Working with Server Grids on page 446. For
more information on session log files, see Log Files on page 455.
The DTM verifies that the following code page requirements are met:
Source code pages. Must be a subset of the PowerCenter Server code page.
Target code pages. Must be a superset of the PowerCenter Server code page.
Repository Agent code page. Must be compatible with the PowerCenter Server code page.
Repository Server code page. Must be compatible with the PowerCenter Server code page.
Lookup database code page. Must be compatible with the PowerCenter Server code page.
Stored procedure database code page. Must be compatible with the PowerCenter Server
code page.
PowerCenter Server code page. Must be registered with the Workflow Manager.
If the DTM cannot validate the code pages, it writes the error into the session log and fails the
session. If you disable data code page validation, the PowerCenter Server does not enforce
code page compatibility.
The PowerCenter Server processes data internally using the UCS-2 character set. When you
disable data code page validation, the PowerCenter Server verifies that the source query,
target query, lookup database query, and stored procedure call text convert from the
source, target, lookup, or stored procedure data code page to the UCS-2 character set
without loss of data in conversion. If the PowerCenter Server encounters an error when
converting data, it writes an error message to the session log.
For more information about code pages, see Globalization Overview and Code Pages in
the Installation and Configuration Guide.
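The loss-free conversion check can be approximated like this; using UTF-16 as a stand-in for the UCS-2 internal form, and the `converts_without_loss` helper itself, are simplifications for the sketch, not server internals:

```python
# Sketch of the loss-check idea: verify that text in a source code page
# round-trips through a UCS-2-style internal form without losing characters.
# UTF-16 little-endian stands in for UCS-2 here (an approximation).
def converts_without_loss(raw_bytes, codepage):
    try:
        text = raw_bytes.decode(codepage)        # source code page -> internal
        internal = text.encode("utf-16-le")      # internal UCS-2-style form
        # Convert back out to the source code page and compare byte-for-byte.
        return internal.decode("utf-16-le").encode(codepage) == raw_bytes
    except UnicodeError:
        return False
```

A failure of this round-trip is the kind of condition that would produce an error message in the session log.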
Thread Types
The master thread creates different types of threads for a session. The types of threads the
master thread creates depend on the following factors:
Table 1-2 lists the types of threads that the master thread can create:
Table 1-2. Processing Threads

Thread Type            Description
Mapping Thread         One thread for each session. Fetches session and mapping
                       information, compiles the mapping, and cleans up after session
                       execution.
Reader Thread          One thread for each partition for each source pipeline. Reads from
                       sources. Relational sources use relational reader threads, and file
                       sources use file reader threads.
Transformation Thread  One thread for each partition. Transforms data received in buffers
                       and moves it from transformation to transformation.
Writer Thread          One thread for each partition, if a target exists in the source
                       pipeline. Writes to targets. Relational targets use relational
                       writer threads, and file targets use file writer threads.
Figure 1-4 shows the threads the master thread creates for a simple mapping that contains one
target load order group:
Figure 1-4. Thread Creation for a Simple Mapping (1 reader thread, 1 transformation
thread, 1 writer thread)
The mapping in Figure 1-4 contains a single partition. In this case, the master thread creates
one reader, one transformation, and one writer thread to process the data. The reader thread
controls how the PowerCenter Server extracts source data and passes it to the source qualifier,
the transformation thread controls how the PowerCenter Server processes the data, and the
writer thread controls how the PowerCenter Server loads data to the target.
When the pipeline contains only a source definition, source qualifier, and a target definition,
the data bypasses the transformation threads, proceeding directly from the reader buffers to
the writer. This type of pipeline is a pass-through pipeline.
Figure 1-5 shows the threads for a pass-through pipeline with one partition:
Figure 1-5. Thread Creation for a Pass-through Pipeline (1 reader thread, bypassed
transformation thread, 1 writer thread)
Note: The previous examples assume that each session contains a single partition. For
information on how partitions and partition points affect thread creation, see Threads and
Partitioning on page 16.
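The reader/transformation/writer thread structure described above can be mimicked with a toy sketch; the function names, queues, and sentinel scheme are illustrative choices, not PowerCenter internals:

```python
# Toy sketch (not PowerCenter code): one reader, one transformation, and one
# writer thread passing rows through buffers (queues), as in Figure 1-4.
import queue
import threading

SENTINEL = object()  # marks end of data in each buffer

def run_pipeline(rows, transform):
    q_read, q_write, out = queue.Queue(), queue.Queue(), []

    def reader():
        # Reader thread: extract source rows into the reader buffer.
        for row in rows:
            q_read.put(row)
        q_read.put(SENTINEL)

    def transformer():
        # Transformation thread: apply logic, pass rows downstream.
        while (row := q_read.get()) is not SENTINEL:
            q_write.put(transform(row))
        q_write.put(SENTINEL)

    def writer():
        # Writer thread: load transformed rows to the "target".
        while (row := q_write.get()) is not SENTINEL:
            out.append(row)

    threads = [threading.Thread(target=t) for t in (reader, transformer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```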
Reader Threads
The master thread creates reader threads to extract source data. The number of reader threads
depends on the partitioning information for each pipeline. The number of reader threads
equals the number of partitions. For more information, see Threads and Partitioning on
page 16.
The PowerCenter Server creates an SQL statement for each reader thread to extract data from
a relational source. For file sources, the PowerCenter Server can create multiple threads to
read a single source.
Transformation Threads
The master thread creates transformation threads to transform data received in buffers by the
reader thread, move the data from transformation to transformation, and create memory
caches when necessary. The number of transformation threads depends on the partitioning
information for each pipeline. For more information, see Threads and Partitioning on
page 16.
The transformation threads store fully-transformed data in a buffer drawn from the memory
pool for subsequent access by the writer thread.
If the pipeline contains a Rank, Joiner, Aggregator, Sorter, or a cached Lookup
transformation, the transformation thread uses cache memory until it reaches the configured
cache size limits. If the transformation thread requires more space, it pages to local cache files
to hold additional data.
When the PowerCenter Server runs in ASCII mode, the transformation threads pass character
data in single bytes. When the PowerCenter Server runs in Unicode mode, the transformation
threads use double bytes to move character data.
Writer Threads
The master thread creates writer threads to load target data. The number of writer threads
depends on the partitioning information for each pipeline. If the pipeline contains one
partition, the master thread creates one writer thread. If it contains multiple partitions, the
master thread creates multiple writer threads. For more information, see Threads and
Partitioning on page 16.
Each writer thread creates connections to the target databases to load data. If the target is a
file, each writer thread creates a separate file. You can configure the session to merge these
files.
If the target is relational, the writer thread takes data from buffers and commits it to session
targets. When loading targets, the writer commits data based on the commit interval in the
session properties. You can configure a session to commit data based on the number of source
rows read, the number of rows written to the target, or the number of rows that pass through
a transformation that generates transactions, such as a Transaction Control transformation.
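The source-row commit interval can be sketched as a counter that triggers a commit callback; this is a simplified illustration of the idea, not the actual writer implementation:

```python
# Hedged sketch of a source-row commit interval: count rows as they are
# written and issue a commit every `interval` rows, plus a final commit
# for any remainder. `commit` stands in for a database commit call.
def load_with_commits(rows, interval, commit):
    pending = 0
    for row in rows:
        # ... write the row to the target here ...
        pending += 1
        if pending == interval:
            commit(pending)   # commit a full batch
            pending = 0
    if pending:
        commit(pending)       # commit the remaining rows at end of load
```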
Threads and Partitioning
The number and types of threads that the master thread creates depend on the following
factors:
The partition points. Controls the thread boundaries and pipeline stages.
The number of partitions. Controls the number of threads the master thread creates for
each pipeline stage.
The number of source pipelines. Controls the number of reader threads and the number
of transformation threads downstream from the sources.
Partition Points
By default, the Workflow Manager places partition points at certain transformations in each
source pipeline. Partition points mark the thread boundaries in a source pipeline and divide
the pipeline into stages. A pipeline stage is the section of a pipeline executed between any two
partition points. When you set a partition point at a transformation, the new pipeline stage
includes that transformation.
The PowerCenter Server can redistribute rows of data at partition points. For example, if you
place a partition point at a Sorter transformation and specify multiple partitions, the
PowerCenter Server redistributes rows among all partitions before the rows enter the Sorter
transformation. The rows stay in the same partitions until they reach the next partition point.
For more information, see Pipeline Partitioning on page 345.
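Redistribution at a partition point can be illustrated with a simple hash partitioner; the `redistribute` function and its CRC-based hash are assumptions made for the sketch, not PowerCenter's actual algorithm:

```python
# Illustrative redistribution at a partition point: hash-partition rows on a
# key so that rows with equal keys land in the same downstream partition.
import zlib

def redistribute(rows, key, n_partitions):
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        # Stable hash (zlib.crc32) so the assignment is reproducible.
        bucket = zlib.crc32(str(key(row)).encode()) % n_partitions
        partitions[bucket].append(row)
    return partitions
```

Because equal keys always hash to the same bucket, an aggregation downstream of the partition point sees every row for a given group in one partition.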
By default, the Workflow Manager places a partition point at each of the following
transformations:
Source qualifier. Marks the reader stage. You cannot delete this partition point.
Target instance. Marks the writer stage. You cannot delete this partition point.
Figure 1-6 shows the pipeline stages for a mapping that contains an unsorted Aggregator
transformation:
Figure 1-6. Pipeline Stages in a Mapping With an Unsorted Aggregator Transformation
(four stages: first through fourth)
The mapping in Figure 1-6 contains four stages by default. The partition point at the source
qualifier marks the boundary between the first (reader) and second (transformation) stages.
The partition point at the Aggregator transformation marks the boundary between the second
and third (transformation) stages. The partition point at the target instance marks the
boundary between the third (transformation) and the fourth (writer) stages.
If you use PowerCenter, you can add and delete partition points at other transformations. For
information on valid partition points, see Pipeline Partitioning on page 345. When you add
a partition point, you increase the number of pipeline stages by one. When you remove a
partition point, you decrease the number of pipeline stages by one.
Figure 1-7 shows the pipeline stages if you add a partition point at the Filter transformation:
Figure 1-7. Pipeline Stages in a Mapping with an Additional Partition Point (five stages,
with partition points marked)
Number of Partitions
The number of threads that process each pipeline stage depends on the number of partitions.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread.
The number of partitions in any pipeline stage equals the number of threads in that stage. If
you do not specify otherwise, the PowerCenter Server creates one partition in every pipeline
stage. If you purchased the partitioning option, you can configure multiple partitions for a
single pipeline stage.
You can specify the number of partitions at any partition point. The number of partitions
must be consistent across a pipeline. Therefore, if you define two partitions at the source
qualifier, the Workflow Manager sets two partitions at all transformations that are partition
points, and two partitions at the target instances.
For example, suppose you need to use the mapping in Figure 1-6 on page 17 to read data from
three flat files. To do this, you need to specify three partitions at the source qualifier. When
you do this, the Workflow Manager sets three partitions at all other partition points in the
pipeline.
The master thread creates three sets of threads. Figure 1-8 shows thread creation for a
mapping with three partitions:
Figure 1-8. Thread Creation for a Mapping with Three Partitions (3 reader threads in the
first stage, 6 transformation threads across the second and third stages, and 3 writer
threads in the fourth stage)
When you define three partitions across the mapping in Figure 1-8, the master thread creates
three threads at each pipeline stage, for a total of 12 threads. If you need to read data from
four file sources, you would specify four partitions at the source qualifier. The master thread
would create a fourth thread at each stage, for a total of 16 threads.
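The arithmetic in this example is simply stages times partitions, assuming (as the text states) the same partition count at every stage:

```python
# Thread-count arithmetic from the examples above: with a uniform partition
# count, total threads = pipeline stages x partitions per stage.
def total_threads(n_stages, n_partitions):
    return n_stages * n_partitions
```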
The PowerCenter Server processes partitions concurrently. When you run a session with
multiple partitions, the threads run as follows:
1.
The reader threads run concurrently to extract data from the source.
2.
3.
Note: Increasing the number of partitions or partition points increases the number of threads.
Therefore, increasing the number of partitions or partition points also increases the load on
the server machine. If the server machine contains ample CPU bandwidth, processing rows of
data in a session concurrently can increase session performance. However, if you create a large
number of partitions or partition points in a session that processes large amounts of data, you
can overload the system.
You add a partition point at the multiple input group transformation. The PowerCenter
Server creates a new pipeline stage and creates one transformation thread downstream
from the partition point. The PowerCenter Server creates one transformation thread
regardless of the number of output groups the transformation contains.
You do not add a partition point at the multiple input group transformation. The
PowerCenter Server maintains the same number of transformation threads downstream
from the partition point until it reaches the next partition point. However, for each
partition at the multiple input group transformation and its downstream transformations,
only one thread actively processes a row of data at any given time.
Figure 1-9 shows the thread creation for a mapping that contains a Joiner transformation
configured for sorted input:
Figure 1-9. Thread Creation with Joiner Transformation (two source pipelines, each with
1 reader thread and 1 transformation thread; 1 writer thread; asterisks mark partition
points)
Each source pipeline in Figure 1-9 contains a transformation thread. The Joiner
transformation is not a partition point, so both transformation threads can process data at the
Joiner and Expression transformations. However, only one transformation thread processes a
row at any given time. The target load order group contains one target, so the master thread
creates only one writer thread.
Suppose you add a partition point at the Joiner transformation in Figure 1-9. Figure 1-10
shows the mapping in Figure 1-9 with a partition point at the Joiner transformation:
Figure 1-10. Thread Creation with a Partition Point at a Joiner Transformation (two
source pipelines, each with 1 reader thread and 1 transformation thread; 1 transformation
thread created after the partition point; 1 writer thread; asterisks mark partition
points)
Each source pipeline in Figure 1-10 contains a transformation thread. However, the
transformation threads end at the Joiner transformation. The Joiner transformation is a
partition point, so the master thread creates a new transformation thread starting at the
partition point.
Note: If any source qualifier in either Figure 1-9 or Figure 1-10 feeds a target other than the
target associated with the Joiner transformation, the master thread creates an additional writer
thread.
Reading source data. The PowerCenter Server reads the sources in a mapping at different
times depending on how you configure the sources, transformations, and targets in the
mapping. For more information on reading data, see Reading Source Data on page 22.
Blocking data. The PowerCenter Server sometimes blocks the flow of data at a
transformation in the mapping while it processes a row of data from a different source. For
more information on blocking data, see Blocking Data on page 23.
Block processing. The PowerCenter Server reads and processes a block of rows at a time.
For more information, see Block Processing on page 23.
Figure 1-11 shows a mapping with two target load order groups. Target Load Order Group 1
contains pipelines that read Source A and Source B and load targets T1 and T2. Target
Load Order Group 2 contains Pipeline C, which reads Source C and loads target T3.
In the mapping shown in Figure 1-11, the PowerCenter Server processes the target load order
groups sequentially. It first processes Target Load Order Group 1 by reading Source A and
Source B at the same time. When it finishes processing Target Load Order Group 1, the
PowerCenter Server begins to process Target Load Order Group 2 by reading Source C.
Blocking Data
You can include multiple input group transformations in a mapping. The PowerCenter Server
passes data to the input groups concurrently. However, sometimes the transformation logic of
a multiple input group transformation requires that the PowerCenter Server block data on
one input group while it waits for a row from a different input group.
Blocking is the suspension of the data flow into an input group of a multiple input group
transformation. When the PowerCenter Server blocks data, it reads data from the source
connected to the input group until it fills the reader and transformation buffers. Once the
PowerCenter Server fills the buffers, it does not read more source rows until the
transformation logic allows the PowerCenter Server to stop blocking the source. When the
PowerCenter Server stops blocking a source, it processes the data in the buffers and continues
to read from the source.
The PowerCenter Server blocks data at one input group when it needs a specific row from a
different input group to perform the transformation logic. Once the PowerCenter Server
reads and processes the row it needs, it stops blocking the source.
Block Processing
The PowerCenter Server reads and processes a block of rows at a time. The number of rows in
the block depends on the row size and the DTM buffer size. In the following circumstances,
the PowerCenter Server processes one row in a block:
- Log row errors. When you log row errors, the PowerCenter Server processes one row in a block.
- Connect CURRVAL. When you connect the CURRVAL port in a Sequence Generator transformation, the session processes one row in a block. For optimal performance, Informatica recommends that you connect only the NEXTVAL port in mappings. For more information, see Sequence Generator Transformation in the Transformation Guide.
- Configure row-based mode for a Custom transformation procedure. When you configure the data access mode for a Custom transformation procedure to be row-based, the PowerCenter Server processes one row in a block. By default, the data access mode is array-based, and the PowerCenter Server processes multiple rows in a block. For more information, see Custom Transformation Functions in the Transformation Guide.
System Resources
To allocate system resources for read, transformation, and write processing, you should
understand how the PowerCenter Server allocates and uses system resources. The
PowerCenter Server uses the following system resources:
- CPU
- Cache memory
CPU Usage
The PowerCenter Server performs read, transformation, and write processing for a pipeline in
parallel. It can process multiple partitions of a pipeline within a session, and it can process
multiple sessions in parallel.
If you have a symmetric multi-processing (SMP) platform, you can use multiple CPUs to
concurrently process session data or partitions of data. This provides increased performance,
as true parallelism is achieved. On a single processor platform, these tasks share the CPU, so
there is no parallelism.
The PowerCenter Server can use multiple CPUs to process a session that contains multiple
partitions. The number of CPUs used depends on factors such as the number of partitions,
the number of threads, the number of available CPUs, and the amount of resources required to
process the mapping.
For more information about partitioning, see Pipeline Partitioning on page 345.
You can configure the following parameters in the PowerCenter Server configuration to
control how the Load Manager allocates shared memory to sessions and the number of
sessions the PowerCenter Server runs simultaneously:
- MaxSessions. The maximum sessions parameter indicates the maximum number of session slots available to the Load Manager at one time for running or repeating sessions. For example, if you select the default MaxSessions of 10, the Load Manager allocates 10 session slots. This parameter helps you control the number of sessions the PowerCenter Server can run simultaneously.
- LMSharedMemory. Set the Load Manager shared memory parameter in conjunction with the MaxSessions parameter to ensure that the Load Manager has enough memory for each session. The Load Manager requires approximately 200,000 bytes of shared memory for each session slot. The default setting is 2,000,000 bytes. For each increase of 10 sessions in the MaxSessions setting, you need to increase LMSharedMemory by 2,000,000 bytes.
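The sizing rule above lends itself to a quick calculation. This is an illustrative sketch only; the actual parameters are set in the PowerCenter Server configuration, not through code:

```python
# Load Manager shared memory sizing per the rule above:
# approximately 200,000 bytes of shared memory per session slot.

BYTES_PER_SESSION_SLOT = 200_000

def lm_shared_memory(max_sessions: int) -> int:
    """Return a suggested LMSharedMemory value for a MaxSessions setting."""
    return max_sessions * BYTES_PER_SESSION_SLOT

print(lm_shared_memory(10))  # default MaxSessions of 10 -> 2,000,000 bytes
print(lm_shared_memory(20))  # 10 more sessions -> increase by 2,000,000
```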
Cache Memory
The DTM process creates in-memory index and data caches to temporarily store data used by
the following transformations:
- Rank transformation
- Joiner transformation
You configure memory size for the index and data cache in the transformation properties. By
default, the PowerCenter Server allocates 1,000,000 bytes for the index cache and 2,000,000
bytes for the data cache.
By default, the DTM creates cache files in the directory configured for the $PMCacheDir
server variable. If the DTM requires more space than it allocates, it pages to local index and
data files.
The DTM process also creates an in-memory cache to store data used by a Sorter
transformation. You configure the memory size for the cache in the transformation properties.
By default, the PowerCenter Server allocates 8,388,608 bytes for the cache, and the DTM
creates cache files in the directory configured for the $PMTempDir server variable. If the
DTM requires more cache space than it allocates, it pages to local cache files.
When processing large amounts of data, the DTM may create multiple index and data files.
The session does not fail if it runs out of cache memory and pages to the cache files. It does
fail, however, if the local directory for cache files runs out of disk space.
After the session completes, the DTM releases memory used by the index and data caches and
deletes any index and data files. However, if the session is configured to perform incremental
aggregation or if a Lookup transformation is configured for a persistent lookup cache, the
DTM saves all index and data cache information to disk for the next session run.
For more information about caching, see Session Caches on page 613.
ASCII Mode
Use ASCII mode when all sources and targets are 7-bit ASCII or EBCDIC character sets. In
ASCII mode, the PowerCenter Server recognizes 7-bit ASCII and EBCDIC characters and
stores each character in a single byte. When the PowerCenter Server runs in ASCII mode, it
does not validate session code pages. It reads all character data as ASCII characters and does
not perform code page conversions. It also treats all numerics as U.S. Standard and all dates as
binary data.
Unicode Mode
Use Unicode mode when sources or targets use 8-bit or multibyte character sets and contain
character data. In Unicode mode, the PowerCenter Server recognizes multibyte character sets
as defined by supported code pages.
If you configure the PowerCenter Server to validate data code pages, the PowerCenter Server
validates source and target code page compatibility when you run a session. If you configure
the PowerCenter Server for relaxed data code page validation, the PowerCenter Server lifts
source and target compatibility restrictions.
When reading a source, the PowerCenter Server converts data from the source character set to
Unicode based on the source code page. The PowerCenter Server allots two bytes for each
character when moving data through a mapping. The PowerCenter Server converts data from
Unicode to the target character set based on the target code page when writing to the target. It
also treats all numerics as U.S. Standard and all dates as binary data.
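The two-bytes-per-character behavior described above can be illustrated with UTF-16, a two-byte encoding for characters in the Basic Multilingual Plane. This is an analogy only; the PowerCenter Server's internal representation is not exposed through any API:

```python
# Each character in "données" occupies exactly two bytes in UTF-16-LE,
# mirroring the two-bytes-per-character rule described above.
text = "données"
encoded = text.encode("utf-16-le")
print(len(text), "characters ->", len(encoded), "bytes")  # 7 characters -> 14 bytes
```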
The PowerCenter Server code page must be compatible with the code pages of the
PowerCenter Client.
For details on code page compatibility and validation, see Globalization Overview in the
Installation and Configuration Guide.
When it runs workflows and sessions, the PowerCenter Server creates output such as the
following:
- Reject files
- Control file
- Post-session email
- Output file
- Cache files
When the PowerCenter Server on UNIX creates any file other than a recovery file, it sets the
file permissions according to the umask of the shell that starts the PowerCenter Server. For
example, when the umask of the shell that starts the PowerCenter Server is 022, the
PowerCenter Server creates files with rw-r--r-- permissions. To change the file permissions,
you must change the umask of the shell that starts the PowerCenter Server and then restart it.
The PowerCenter Server on UNIX creates recovery files with rw------- permissions.
The PowerCenter Server on Windows creates files with read and write permissions.
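The umask behavior described above can be sketched as follows. This is a general UNIX rule, not PowerCenter-specific code: a newly created file typically starts from mode 666 and the umask bits are cleared from it:

```python
# How the umask of the shell that starts the PowerCenter Server
# determines the permissions of the files it creates on UNIX.
import stat

def effective_mode(umask: int, base: int = 0o666) -> str:
    """Return the rwx permission string for a file created under the given umask."""
    return stat.filemode(base & ~umask)[1:]  # drop the file-type character

print(effective_mode(0o022))  # rw-r--r--  (the example in the text)
print(effective_mode(0o077))  # rw-------  (like the recovery files)
```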
On UNIX, the default name of the PowerCenter Server log file is pmserver.log. You configure
the PowerCenter Server log file name with the LogFileName option in the PowerCenter
Server setup program.
On Windows, the PowerCenter Server logs status and error messages in the event log. Use the
Event Viewer to access those messages. You can also configure the PowerCenter Server on
Windows to write status and error messages to a file.
Some message codes are embedded within other codes, for example:
CMN_1050 [LM 2041 Received request to start session]
You can also configure the PowerCenter Server on Windows to write error messages to the
Application Log, which you can view with the Event Viewer. Messages sent from the
PowerCenter Server display PowerCenter in the Source column, the code prefix in the
Category column, and the code number in the Event column. However, since some message
codes are embedded within other codes, to ensure you are viewing the true message code, you
must view the text of the message.
Figure 1-12 shows a sample application log:
Figure 1-12. Event Viewer Application Log Message
Figure 1-13 shows how you can view the text of the message by selecting the message and
using the Enter key:
Figure 1-13. Application Log Message Detail
Error Messages
Using the listed error code, consult the Troubleshooting Guide for probable causes and actions
to correct the problem.
- By cycle, saving the configured number of workflow logs, replacing the older logs with new logs. You can use the server variable $PMWorkflowLogCount to set the number of logs the PowerCenter Server archives for the workflow. For more information about the workflow log, see Log Files on page 455.
- By cycle, saving the configured number of session logs, replacing the older logs with new logs. You can use the server variable $PMSessionLogCount to set the number of logs the PowerCenter Server archives for the session. For more information about the session log, see Log Files on page 455.
Session Details
When you run a session, the Workflow Manager creates session details that provide load
statistics for each target in the mapping. You can monitor session details during the session or
after the session completes. Session details include information such as table name, number of
rows written or rejected, and read and write throughput. You can view this information by
double-clicking the session in the Workflow Monitor.
For more information on the session details file, see Monitoring Session Details on page 434.
Reject Files
By default, the PowerCenter Server creates a reject file for each target in the session. The
reject file contains rows of data that the writer does not write to targets.
The writer may reject a row in the following circumstances:
- A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data.
By default, the PowerCenter Server saves the reject file in the directory entered for the server
variable $PMBadFileDir in the Workflow Manager, and names the reject file
target_table_name.bad.
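The naming convention above can be sketched as a small helper. The directory below is a hypothetical example; the real value comes from the $PMBadFileDir server variable configured in the Workflow Manager:

```python
# Reject file naming as described above: $PMBadFileDir/target_table_name.bad

def reject_file_path(bad_file_dir: str, target_table: str) -> str:
    """Build the default reject file path for a target table."""
    return f"{bad_file_dir}/{target_table}.bad"

print(reject_file_path("/opt/pc/BadFiles", "T_CUSTOMERS"))
# -> /opt/pc/BadFiles/T_CUSTOMERS.bad
```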
Note: If you enable row error logging, the PowerCenter Server does not create a reject file.
For more information about the reject file, see Log Files on page 455.
Control File
When you run a session that uses an external loader, the PowerCenter Server creates a control
file and a target flat file. The control file contains information about the target flat file such as
data format and loading instructions for the external loader. The control file has an extension
of .ctl. You can view the control file and the target flat file in the target file directory (default:
$PMTargetFilesDir).
For more information about external loading and control files, see External Loading on
page 523.
Email
You can compose and send email messages by creating an Email task in the Workflow
Designer or Task Developer. You can place the Email task in a workflow, or you can associate
it with a session. The Email task allows you to automatically communicate information about
a workflow or session run to designated recipients.
Email tasks in the workflow send email depending on the conditional links connected to the
task. For post-session email, you can create two different messages, one to be sent if the
session completes successfully, the other if the session fails. You can also use variables to
generate information about the session name, status, and total rows loaded.
For example, if your database administrator wants to track how long a session takes to
complete, you can configure the session to send an email containing the time and date the
session starts and completes. Or, if you want to notify your Informatica administrator when a
session fails, you can configure the session to send an email only if it fails and attach the
session log to the email.
For more information, see Sending Email on page 319.
Indicator File
If you use a flat file as a target, you can configure the PowerCenter Server to create an
indicator file for target row type information. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete, or reject. The
PowerCenter Server names this file target_name.ind and stores it in the same directory as the
target file. For more information about configuring the PowerCenter Server, see the
Installation and Configuration Guide.
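A reader for the indicator file described above might look like the sketch below. The 0-3 mapping is an assumption based on PowerCenter's conventional row indicator codes; verify it against the Installation and Configuration Guide for your version:

```python
# One number per target row in the indicator file marks the row type.
# Assumed mapping (verify for your PowerCenter version):
ROW_TYPES = {0: "insert", 1: "update", 2: "delete", 3: "reject"}

def read_indicators(lines):
    """Translate indicator numbers into row-type names."""
    return [ROW_TYPES[int(line.strip())] for line in lines if line.strip()]

print(read_indicators(["0", "0", "1", "3"]))
# ['insert', 'insert', 'update', 'reject']
```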
Output File
If the session writes to a target file, the PowerCenter Server creates the target file based on a
file target definition. By default, the PowerCenter Server names the target file based on the
target definition name. If a mapping contains multiple instances of the same target, the
PowerCenter Server names the target files based on the target instance name.
The PowerCenter Server creates this file in the PowerCenter Server variable directory,
$PMTargetFileDir, by default. For more information about working with target files, see
Working with Targets on page 233.
Cache Files
When the PowerCenter Server creates memory cache it also creates cache files. The
PowerCenter Server creates index and data cache files for the following transformations in a
mapping:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and
Lookup transformations in the directory configured for the $PMCacheDir server variable.
The PowerCenter Server names the index file PM*.idx, and the data file PM*.dat. The
PowerCenter Server creates the index and data files for the Sorter transformation in the
$PMTempDir server variable directory.
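The naming patterns above (index files PM*.idx, data files PM*.dat) can be used to spot leftover cache files. A sketch with illustrative file names:

```python
# Match files against the cache file naming patterns described above.
import fnmatch

def is_cache_file(name: str) -> bool:
    """True if the name matches the PM*.idx or PM*.dat cache file patterns."""
    return fnmatch.fnmatch(name, "PM*.idx") or fnmatch.fnmatch(name, "PM*.dat")

files = ["PMAGG12345.dat", "PMAGG12345.idx", "pmserver.log", "PMJNR1.dat"]
print([f for f in files if is_cache_file(f)])
# ['PMAGG12345.dat', 'PMAGG12345.idx', 'PMJNR1.dat']
```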
The PowerCenter Server writes to the cache files during the session in the following cases:
- The mapping contains one or more Aggregator transformations configured without sorted ports.
- The DTM runs out of cache memory and pages to the local cache files. The DTM may create multiple files when processing large amounts of data. The session fails if the local directory runs out of disk space.
After the session completes, the DTM generally deletes the overflow index and data files. It
does not delete the cache files under the following circumstances:
The PowerCenter Server names these files PMAGG*.dat and PMAGG*.idx and saves them to
the cache directory.
For more information about incremental aggregation, see Using Incremental Aggregation
on page 573.
Chapter 2
Overview
Before you can use the Workflow Manager to create workflows and sessions, you must
configure the Workflow Manager. You can configure display options and connection
information in the Workflow Manager. You must register a PowerCenter Server before you
can start it or create a workflow to run against it.
You can configure the following information in the Workflow Manager:
- Configure Workflow Manager options. You can configure options such as grouping sessions or docking and undocking windows. For details, see Customizing the Workflow Manager Options on page 39.
- Register PowerCenter Servers. Before you can start a PowerCenter Server, you must register it with the repository. For details, see Registering the PowerCenter Server on page 46.
- Create a server grid. When you have multiple PowerCenter Servers registered to the same repository, you can create a server grid to balance workloads. For details, see Working with Server Grids on page 446.
- Create source and target database connections. Create connections to each source and target database. You must create connections to a database before you can create a session that accesses the database. For details, see Setting Up a Relational Database Connection on page 53.
- Create connection objects. Create connection objects in the repository when you define database, FTP, and external loader connections. For details, see Configuring Connection Object Permissions on page 51.
- General. You can configure workspace options, display options, and other general options on the General tab. For more information about the General tab, see Configuring General Options on page 39.
- Format. You can configure font, color, and other format options on the Format tab. For more information about the Format tab, see Configuring Format Options on page 42.
- Miscellaneous. You can configure Copy Wizard and Versioning options on the Miscellaneous tab. For more information about the Miscellaneous tab, see Configuring Miscellaneous Options on page 43.
- Advanced. You can configure enhanced security for connection objects on the Advanced tab. For more information about the Advanced tab, see Enabling Enhanced Security on page 44.
Table 2-1 describes general options you can configure in the Workflow Manager:
Table 2-1. Workflow Manager General Options
Option
Description
Reload Tasks/
Workflows When
Opening a Folder
Reloads the last view of a tool when you open it. For example, if you have a workflow open
when you disconnect from a repository, select this option so that the same workflow displays
the next time you open the folder and Workflow Designer. Enabled by default.
Appears only when you select Reload tasks/workflows when opening a folder. Select this
option if you want the Workflow Manager to prompt you to reload tasks, workflows, and
worklets each time you open a folder. Disabled by default.
By default, when you drag the focus of the Overview window, the focus of the workbook
moves concurrently. When you select this option, the focus of the workspace does not
change until you release the mouse button. Disabled by default.
Arrange Workflows/
Worklets Vertically By
Default
By default, you can press F2 to edit objects directly in the workspace instead of opening the
Edit Task dialog box. Select this option so you can also click the object name in the
workspace to edit the object. Disabled by default.
Opens the Edit Task dialog box when you create a task. By default, the Workflow Manager
creates the task in the workspace. If you do not enable this option, double-click the task to
open the Edit Task dialog box. Disabled by default.
Workspace File
Directory
The directory for workspace files created by the Workflow Manager. Workspace files
maintain the last task or workflow you saved. This directory should be local to the
PowerCenter Client to prevent file corruption or overwrites by multiple users. By default, the
Workflow Manager creates files in the PowerCenter Client installation directory.
Displays the name of the tool in the upper left corner of the workspace or workbook. Enabled
by default.
Shows the full name of a task when you select it. By default, the Workflow Manager
abbreviates the task name in the workspace. Enabled by default.
Shows the link condition in the workspace. If you do not enable this option, the Workflow
Manager abbreviates the link condition in the workspace. Enabled by default.
Launch Workflow
Monitor when Workflow
is Started
The Workflow Monitor launches when you start a workflow or a task. Enabled by default.
Receive Notifications
from Server
Allows you to receive notification messages from the Repository Server. The Repository
Server sends notification about actions performed on repository objects. Enabled by default.
For details, see Understanding the Repository in the Repository Guide.
Table 2-2 describes the format options for the Workflow Manager:
Table 2-2. Workflow Manager Format Options
Option
Description
Displays links as solid lines. By default, the Workflow Manager displays links as dotted lines.
Workspace Colors
Displays all items that you can customize in the selected tool. Select an item to change its
color.
Color
Font Categories
Select the Workflow Manager tool for which you want to customize the display font.
Change Font
Select to change the display font and language script for the Workflow Manager tool you
choose from the Categories menu.
Reset All
Table 2-3 describes the options for the Copy Wizard, Versioning, and Target Load Type:
Table 2-3. Workflow Manager Miscellaneous Options
Option
Description
Generates unique names for copied objects if you select the Rename option. For
example, if the workflow wf_Sales has the same name as a workflow in the destination
folder, the Rename option generates the unique name wf_Sales1. Enabled by
default.
Uses the object with the same name in the destination folder if you select the
Choose option.
Displays the Check Out icon when an object has been checked out. Enabled by
default.
Reset All
Resets all Copy Wizard and Versioning options to their default values.
Sets default load type for sessions. You can choose normal or bulk loading.
Any change you make takes effect after you restart the Workflow Manager.
You can override this setting in the session properties. Default is Bulk.
For more information on normal and bulk loading, see Table A-15 on page 697.
If you enable enhanced security, the Workflow Manager assigns the following default
permissions for connection objects:
- Owner: Read, Write, and Execute
- Owner Group: Read and Execute
- World: No permissions
If you do not enable enhanced security, the Workflow Manager assigns Read, Write, and
Execute permissions to all users or groups for the connection.
Enabling enhanced security does not lock the restricted access settings for connection objects.
You can continue to change the permissions for connection objects after enabling enhanced
security.
If you delete the Owner from the repository, the Workflow Manager automatically assigns
ownership of the object to Administrator.
To enable enhanced security for connection objects:
1. Choose Tools-Options.
2.
3.
4. Click OK.
- Host name.
- Code page identifying the character set associated with the PowerCenter Server.
- Default directories you want the PowerCenter Server to use for workflow files and caches.
You can perform the following registration tasks for a PowerCenter Server:
- Edit a PowerCenter Server. When you edit a PowerCenter Server, all workflows and sessions using that PowerCenter Server use the updated server connection information, including the updated code page settings. You do not need to restart the Workflow Manager to use the updated information.
- Delete a PowerCenter Server. When you delete a PowerCenter Server, you must assign another PowerCenter Server for the workflows and sessions using the deleted server before you can run the workflow. To assign a PowerCenter Server to a workflow or to a session, choose Connections-Assign.
Server Variables
You can define server variables for each PowerCenter Server you register. Some server variables
define the path and directories for workflow output files and caches. By default, the
PowerCenter Server places output files in these directories when you run a workflow. Other
server variables define server attributes such as log file count. In a server grid, you must use
the same server variables for each server.
The installation process creates directories in the location where you install the PowerCenter
Server. To use these directories as the default location for the session output files, you must
first set the server variable $PMRootDir to define the path to the directories.
By using server variables, you simplify the process of changing the PowerCenter Server that
runs a workflow. If each workflow in a folder uses server variables, then when you copy the
folder to a production repository, the PowerCenter Server in production can run the
workflows using its own server variable values in place of those defined for the PowerCenter
Server running against the test repository. The PowerCenter Server reads and writes the files
to the directories in the $PMRootDir path. To ensure a workflow successfully completes,
relocate any necessary file source or incremental aggregation file to the default directories of
the new PowerCenter Server.
Table 2-5 lists the server variables you configure when you register a PowerCenter Server:
Table 2-5. Server Variables
- $PMRootDir (Required)
- $PMSessionLogDir (Required)
- $PMBadFileDir (Required)
- $PMCacheDir (Required). Default directory for the index and data cache files. Defaults to $PMRootDir/Cache. To avoid performance problems, always use a drive local to the PowerCenter Server for the cache directory. Do not use a mapped or mounted drive for cache files.
- $PMTargetFileDir (Required)
- $PMSourceFileDir (Required)
- $PMExtProcDir (Required)
- $PMTempDir (Required)
- $PMSuccessEmailUser (Optional)
- $PMFailureEmailUser (Optional). Email address to receive post-session email when the session fails. The default value is an empty string. Use to address post-session email.
- $PMSessionLogCount (Optional). Number of session logs the PowerCenter Server archives for the session. Use to archive session logs. For details, see Viewing Session Logs on page 474. Defaults to 0.
- $PMSessionErrorThreshold (Optional)
- $PMWorkflowLogDir (Required)
- $PMWorkflowLogCount (Optional)
- $PMLookupFileDir (Optional)
3.
4.
5.
6.
If you do not know the IP address, enter the host name and use the Resolve Server button
to resolve the IP address. You can also enter the IP address in the Host Name/IP Address
field and use the Resolve Server button to resolve the host name.
The Workflow Manager can only resolve the host name or IP address if you enter the
information in the Host Name/IP Address field.
The Workflow Manager also resolves the host name or IP address when you click OK.
Table 2-6 describes the settings required to register a PowerCenter Server using TCP/IP:
Table 2-6. TCP/IP Settings to Register a Server
- Server Name (Required)
- Host Name or IP Address (Required)
- Resolved IP Address (read-only)
- Port Number (Required)
- Timeout (Required)
- Code Page (Required)
7.
For $PMRootDir, enter a valid root directory for the PowerCenter Server platform.
Informatica recommends using the PowerCenter Server installation directory as the root
directory because the PowerCenter Server installation creates the default server directories
there. If you enter a different root directory, make sure to create the necessary directories.
8.
9. Click OK.
The new PowerCenter Server appears in the Navigator below the repository.
To delete a server:
1.
2.
3. Click Delete.
4. Click OK.
- Relational. Database connections for relational source or target databases. For more information about relational database connections, see Setting Up a Relational Database Connection on page 53.
- Queue. Database connections for message queues. For more information about message queues, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.
- FTP. Connection to access source or target files using File Transfer Protocol (FTP). For more information about using FTP, see Using FTP on page 559.
- Application. Database connection to access databases such as SAP R/3 and PeopleSoft. For more information, see your PowerCenter Connect documentation.
- Loader. Connection to access target databases using external loaders. For more information about using external loaders, see External Loading on page 523.
With correct permissions, you can access these objects from all folders in the repository and
use them in any session.
Read. View the connection object in the Workflow Manager and Repository Manager.
When you have read permission, you can perform tasks in which you view, copy, or edit
repository objects associated with the connection object.
For information on tasks you can perform with user privileges, folder permissions, and
connection object permissions, see Repository Security in the Repository Guide.
To manage connection permissions, you must have Super User privileges or be the owner of
the connection. If you do not have the privilege to manage connection permissions, the
Permissions dialog box is read-only. If you can manage permissions, you can change the
owner of the object, add or remove users and groups in the permissions list, and change the
permissions for each user or group.
To view or delete a connection, you must have at least read permission for the connection. To
edit a connection, you must have read and write permissions for the connection.
You add permissions from the Connection Browser dialog box.
To configure permissions for connection objects:
1. Open the Connection Browser dialog box for the connection object. For example, choose Connections-Relational to open the Connection Browser dialog box for a relational database connection.
2. Select the connection object you want to configure in the Connection Browser dialog box.
3.
4.
5. Add the user or group you want to assign permissions for the connection, and click OK.
Database username. Name of a user who has the appropriate database permissions to read
from and write to the database.
Some database drivers, such as ISG Navigator, do not allow user names and passwords. Since
the Workflow Manager requires a database user name and password, PowerCenter provides
two reserved words to register databases that do not allow user names and passwords:
- PmNullUser
- PmNullPasswd
Use the PmNullUser user name if you are using Oracle OS Authentication. Oracle OS
Authentication allows you to log on to an Oracle database if you have a logon to the operating
system. You do not need to know a database user name and password. PowerCenter uses
Oracle OS Authentication when the connection user name is PmNullUser and the connection
is for an Oracle database.
You can change connection information at any time. If you edit a Workflow Manager
connection used by a workflow, the PowerCenter Server uses the updated connection
information the next time the workflow runs. You might use this functionality when moving
from test to production.
Tip: If you edit a database connection, all sessions using the named connection then use the
updated connection.
To create a database connection, you must have one of the following privileges:
- Super User
Table 2-7 lists the native connect string syntax for each supported database when you create
or update connections:
Table 2-7. Native Connect String Syntax
- IBM DB2: dbname (example: mydatabase)
- Informix: dbname@servername (example: mydatabase@informix)
- Microsoft SQL Server: servername@dbname (example: sqlserver@mydatabase)
- Oracle: dbname.world, the same as the TNSNAMES entry (example: oracle.world)
- Sybase: servername@dbname (example: sambrown@mydatabase)
- Teradata*: ODBC_data_source_name, ODBC_data_source_name@db_name, or ODBC_data_source_name@db_user_name (examples: TeradataODBC, TeradataODBC@mydatabase, TeradataODBC@jsmith)
The target database code page must be a superset of the source database code page and the
PowerCenter Server code page.
The source database code page must be a subset of the target database code page and the
PowerCenter Server code page.
For example, if the source database code page is 7-bit ASCII and the PowerCenter Server code
page is Latin 1, the target database code page must be Latin 1, which is a superset of 7-bit
ASCII.
Table 2-8 summarizes code page compatibility between the source and target code pages when
you configure the PowerCenter Client and PowerCenter Server for data code page validation:
Table 2-8. Source and Target Code Page Compatibility

Source    Target
When you change the code page in a database connection, you must choose one that is
compatible with the previous code page. If the code pages are incompatible, the Workflow
Manager invalidates all sessions using that database connection.
If you configure the PowerCenter Client and PowerCenter Server for relaxed data code page
validation, you can select any supported code page for source and target database connections.
If you are familiar with your data and are confident that it will convert safely from one code
page to another, you can run sessions with incompatible source and target data code pages. It
is your responsibility to ensure your data will convert properly.
For details, see Globalization Overview and Code Pages in the Installation and
Configuration Guide.
You can enter any SQL command that is valid in the database associated with the
connection object. The PowerCenter Server does not allow nested comments, even though
the database might.
When you enter SQL in the SQL Editor, you manually type in the SQL statements.
The PowerCenter Server ignores semi-colons within single quotes, double quotes, or
within /* ...*/.
If you need to use a semi-colon outside of quotes or comments, you can escape it with a
back slash (\).
You can configure the table owner name using sqlid in the environment SQL for a DB2
connection. However, the table owner name in the target instance overrides the SET sqlid
statement in environment SQL. To use the table owner name specified in the SET sqlid
statement, do not enter a name in the target name prefix.
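As an illustrative sketch (the schema and table names are assumptions, not from this guide), environment SQL for a DB2 connection might set the table owner and then run a second statement, with an escaped semi-colon inside the string literal:

```
SET CURRENT SQLID = 'PROD';
UPDATE t_audit SET note = 'load started\; phase 1'
```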
2. Choose Connections-Relational.
A dialog box appears, listing all the registered source and target database connections.
4. Click New.
5. For relational database connections, enter the connection information listed in Table 2-9:
Table 2-9. Relational Database Connection Information

Database Connection Option   Required/Optional   Description
Name                         Required
Type                         Required            Type of database.
User Name                    Required
Password                     Required
Connect String
Code Page                    Required

6. For each type of relational database connection, enter the attributes listed in Table 2-10:
Table 2-10. Relational Database Connection Attributes

Attribute Name     Relational Database Type
Rollback Segment   Oracle
Environment SQL    Oracle
Database Name      Teradata
Server Name
Packet Size
Domain Name

7. Click OK.
The new database connection appears in the Connection Browser list.
Choose Connections-Relational.
The Relational Connection Browser appears.
Click OK.
The Workflow Manager retains connection properties that apply to the relational
database type.
If a required connection property does not exist, the Workflow Manager displays a
warning message.
If the copied connection is invalid, click the Edit button to enter required connection properties.
Source connection
Target connection
If the repository contains both relational and application connections with the same name, the Workflow Manager replaces the relational connection only if you specified the connection type as relational in all locations in the repository.
For example, you have a relational and an application source, each called ITEMS. In one
session, you specified the name ITEMS for a source connection instead of Relational:ITEMS.
When you replace the relational connection ITEMS with another relational connection, the
Workflow Manager does not replace any relational connection in the repository because it
cannot determine the connection type for the source connection entered as ITEMS.
The PowerCenter Server uses the updated connection information the next time the workflow
runs.
To replace connections in the Workflow Manager, you must have Super User privilege.
You must first close all folders before replacing a relational database connection.
To replace a relational database connection:
2. Choose Connections-Replace.
4. In the From list, choose a relational database connection you want to replace.
6. Click Replace.
All sessions in the repository that use the From connection now use the connection you choose in the To list.
Chapter 3
Overview, 66
Overview
In the Workflow Manager, you define a set of instructions called a workflow to execute
mappings you build in the Designer. Generally, a workflow contains a session and any other
task you may want to perform when you execute a session. Tasks can include a session, email
notification, or scheduling information. You connect each task with links in the workflow.
You can also create a worklet in the Workflow Manager. A worklet is an object that groups a
set of tasks. A worklet is similar to a workflow, but without scheduling information. You can
execute a batch of worklets inside a workflow.
After you create a workflow, you run the workflow in the Workflow Manager and monitor it
in the Workflow Monitor. For details on the Workflow Monitor, see Monitoring Workflows
on page 401.
Task Developer. Use the Task Developer to create tasks you want to execute in the
workflow.
Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks
with links. You can also create tasks in the Workflow Designer as you develop the
workflow.
Figure 3-1 shows what a workflow might look like if you want to run a session, perform a
shell command after the session completes, and then stop the workflow:
Figure 3-1. Sample Workflow
Workflow Tasks
You can create the following types of tasks in the Workflow Manager:
Assignment. Assigns a value to a workflow variable. For details, see Working with the
Assignment Task on page 140.
Command. Specifies a shell command to run during the workflow. For details, see Working with the Command Task.
Control. Stops or aborts the workflow. For details on the Control task, see Stopping or
Aborting the Workflow on page 129.
Decision. Specifies a condition to evaluate. For details, see Working with the Decision
Task on page 149.
Email. Sends email during the workflow. For details on the Email task, see Sending
Email on page 319.
Event-Raise. Notifies the Event-Wait task that an event has occurred. For details, see
Working with Event Tasks on page 153.
Event-Wait. Waits for an event to occur before executing the next task. For details, see
Working with Event Tasks on page 153.
Session. Runs a mapping you create in the Designer. For details on the Session task, see
Working with Sessions on page 173.
Timer. Waits for a timed event to trigger. For details, see Working with the Timer Task on page 161.
Navigator. Allows you to connect to and work in multiple repositories and folders. In the
Navigator, the Workflow Manager displays a red icon over invalid objects.
Workspace. Allows you to create, edit, and view tasks, workflows, and worklets.
Output. Contains tabs to display different types of output messages. The Output window
contains the following tabs:
Save. Displays messages when you save a workflow, worklet, or task. The Save tab
displays a validation summary when you save a workflow or a worklet.
Fetch Log. Displays messages when the Workflow Manager fetches objects from the
repository.
Overview. An optional window that allows you to easily view large workflows in the
workspace. Outlines the visible area in the workspace and highlights selected objects in
color. Choose View-Overview Window to display this window.
You can view a list of open windows and switch from one window to another in the Workflow
Manager. To view the list of open windows, choose Window-Windows.
The Workflow Manager also displays a status bar that shows the status of the operation you
perform.
Customize windows.
Customize toolbars.
Display a window. From the menu, choose View. Then select the window you want to
open.
Close a window. Click the small x in the upper right corner of the window.
Dock or undock a window. Double-click the title bar, or drag the title bar toward or away
from the workspace.
Using Toolbars
The Workflow Manager can display the following toolbars to help you select tools and
perform operations quickly:
Standard. Contains buttons to connect to and disconnect from repositories and folders,
toggle windows, zoom in and out, pan the workspace, and find objects.
Repository. Contains buttons to connect to, disconnect from, and add repositories, open
folders, close tools, save changes to repositories, and print the workspace.
View. Contains buttons to customize toolbars, toggle the status bar and windows, toggle
full-screen view, create a new workbook, and view the properties of objects.
Layout. Contains buttons to arrange and restore objects in the workspace, find objects,
zoom in and out, and pan the workspace.
Run. Contains buttons to schedule the workflow, start the workflow, or start a task.
For details on how to perform these toolbar operations, see Using the Designer in the
Designer Guide.
Find in Workspace. Searches multiple items at once and returns a list of all task names,
link conditions, event names, or variable names that contain the search string.
Find Next. Searches through items one at a time and highlights the first task, link, event,
variable, or text string that contains the search string. If you repeat the search, the
Workflow Manager highlights the next item that contains the search string.
In any Workflow Manager tool, click the Find in Workspace toolbar button or choose
Edit-Find in Workspace.
The Find in Workspace dialog box opens:
2. Choose whether you want to search for tasks, links, variables, or events.
4. Specify whether or not to match whole words and whether or not to perform a case-sensitive search.
6. Click Close.
To search for a task, link, event, or variable, open the appropriate Workflow Manager
tool and click a task, link, or event. To search for text in the Output window, click the
appropriate tab in the Output window.
3. Choose Edit-Find Next, click the Find Next button on the toolbar, or press Enter or F3 to search for the string.
The Workflow Manager highlights the first task name, link condition, event name, or
variable name that contains the search string, or the first string in the Output window
that matches the search string.
Zoom Point In/Out by 10%. Uses a point you select as the center point and increases or
decreases the magnification by 10% increments.
Zoom Rectangle. Increases the current magnification of a rectangular area you select.
Degree of magnification depends upon the size of the area you select, workspace size, and
current magnification.
Zoom Percent. Sets the zoom level to the percent you choose while maintaining the center
of the view.
To maximize the size of the workspace window, choose View-Full Screen. To go back to
normal view, click the Close Full Screen button or press Esc.
To pan the workspace, click Layout-Pan or click the Pan button on the toolbar. Drag the
focus of the workspace window and release the mouse button when it is in the appropriate
position. Double-click the workspace to stop panning.
Rename an object.
To edit any repository object, you must first add a repository in the Navigator so you can
access the repository object. To add a repository in the Navigator, choose Repository-Add or
click the Add Repository button on the Repository toolbar. Enter the repository name and
user name and click OK.
Checking In Objects
You commit changes to the repository by checking in objects. When you check in an object,
the repository creates a new version of the object and assigns it a version number. The
repository increments the version number by one each time it creates a new version.
You can check in an object from the Workflow Manager workspace. To do this, select the
object and choose Versioning-Check in.
You can check in an object when you review the results of the following tasks:
View object history. You can check in an object from the View History window when you
view the history of an object.
View checkouts. You can check in an object from the View Checkouts window when you
search for checked out objects.
View query results. You can check in an object from the Query Results window when you
search for object dependencies or run an object query.
To check in an object, select the object or objects and choose Versioning-Check in.
Enter text into the comment field in the Check In dialog box.
Track repository objects during development. You can add Label, User, Last saved, or
Comments parameters to queries to track objects during development. For more
information about creating object queries, see Grouping Versioned Objects in the
Repository Guide.
Associate a query with a deployment group. When you create a dynamic deployment
group, you can associate an object query with it. For more information about working
with deployment groups, see Copying Folders and Deployment Groups in the Repository
Guide.
Edit a query.
Delete a query.
Create a query.
Configure permissions.
Run a query.
From the Query Browser, you can create, edit, and delete queries. You can also configure
permissions for each query from the Query Browser. You can run any queries for which you
have read permissions from the Query Browser.
For information about working with object queries, see Grouping Versioned Objects in the
Repository Guide.
from an XML file. The Import Wizard provides the same options to resolve conflicts as the
Copy Wizard. For details, see Exporting and Importing Objects in the Repository Guide.
Copying Sessions
When you copy a Session task, the Copy Wizard looks for the database connection and
associated mapping in the destination folder. If the mapping or connection does not exist in
the destination folder, you can select a new mapping or connection. If the destination folder
does not contain any mapping, you must first copy a mapping to the destination folder in the
Designer before you can copy the session.
When you copy a session that has mapping variable values saved in the repository, the
Workflow Manager either copies or retains the saved variable values.
2. Select a segment by highlighting each task you want to copy. You can select multiple reusable or non-reusable objects. You can also select segments by dragging the pointer in a rectangle around objects in the workspace.
4. Open the workflow or worklet into which you want to paste the segment. You can also copy the object into the Workflow or Worklet Designer workspace.
The Copy Wizard opens, and notifies you if it finds copy conflicts.
Note: You can copy individual non-reusable tasks by selecting the individual task and
Tasks
Sessions
Worklets
Workflows
You can also compare instances of the same type. For example, if the workflows you compare
contain worklet instances with the same name, you can compare the instances to see if they
differ. The Workflow Manager also allows you to compare the following instances and
attributes:
Instances of sessions and tasks in a workflow or worklet comparison. For example, when
you compare workflows, you can compare task instances that have the same name.
The attributes of instances of the same type within a mapping comparison. For example,
when you compare flat file sources, you can compare attributes, such as file type (delimited
or fixed), delimiters, escape characters, and optional quotes.
You can compare schedulers and session configuration objects in the Repository Manager. You
cannot compare objects of different types. For example, you cannot compare an Email task
with a Session task.
When you compare objects, the Workflow Manager displays the results in the Diff Tool
window. The Diff Tool output contains different nodes for different types of objects.
When you import Workflow Manager objects, you can compare object conflicts. For more
information, see Exporting and Importing Objects in the Repository Guide.
Open the folders that contain the objects you want to compare.
5. Click Compare.
Tip: You can also compare objects from the Navigator or workspace. In the Navigator,
select the objects, right-click and choose Compare Objects. In the workspace, select the
objects, right-click and choose Compare Objects.
In the Diff Tool window, differences between objects are highlighted and the nodes are flagged, differences between object properties are marked, and the window displays the properties of the node you select.
You can further compare differences between object properties by clicking the Compare
Further icon or by right-clicking the differences.
6. If you want to save the comparison as a text or HTML file, choose File-Save to File.
Comparing Repository Objects
Sessions
Workflows
Worklets
You can create both reusable and non-reusable metadata extensions. You associate reusable
metadata extensions with all repository objects of a certain type such as all sessions or all
worklets. You associate non-reusable metadata extensions with a single repository object such
as one workflow. For more information about metadata extensions, see Metadata Extensions
in the Repository Guide.
To create, edit, and delete user-defined metadata extensions in the Workflow Manager, you
must have read and write permissions on the folder.
This tab lists the existing user-defined and vendor-defined metadata extensions. User-defined metadata extensions appear in the User Defined Metadata Domain. If they exist, vendor-defined metadata extensions appear in their own domains.
Field            Required/Optional
Extension Name   Required
Datatype         Required
Precision
Description
Field         Required/Optional   Description
Value         Optional            An optional value. For a numeric metadata extension, the value must be an integer between -2,147,483,647 and 2,147,483,647. For a boolean metadata extension, choose true or false. For a string metadata extension, click the Open button in the Value field to enter a value of more than one line, up to 2,147,483,647 bytes.
Reusable      Required
UnOverride    Optional
Description   Optional

Click OK.
Keyboard Shortcuts
When editing a repository object or maneuvering around the Workflow Manager, use the following keyboard shortcuts to help you complete different operations quickly.
Table 3-2 lists the Workflow Manager keyboard shortcuts for editing a repository object:
Table 3-2. Workflow Manager Keyboard Shortcuts
To
Press
Esc
Space Bar
Ctrl+C
Ctrl+X
Ctrl+F
Ctrl+directional arrows
Ctrl+V
F2
Table 3-3 lists the Workflow Manager keyboard shortcuts for navigating in the workspace:
Table 3-3. Keyboard Shortcuts for Navigating the Workspace
To
Press
Create links.
F2
Tab
Ctrl+mouse click
Chapter 4
Overview, 88
Developing Workflows, 91
Overview
A workflow is a set of instructions that tells the PowerCenter Server how to execute tasks such
as sessions, email notifications, and shell commands. After you create tasks in the Task
Developer and Workflow Designer, you connect the tasks with links to create a workflow.
In the Workflow Designer, you can specify conditional links and use workflow variables to create branches in the workflow. The Workflow Manager also provides Event-Wait and Event-Raise tasks so you can control the sequence of task execution in the workflow. You can also create worklets and nest them inside the workflow.
Every workflow contains a Start task, which represents the beginning of the workflow.
Figure 4-1 shows a sample workflow:
Figure 4-1. Sample Workflow (workflow tasks shown include the Start task, a Session task, an Assignment task, a Command task, and the links that connect them)
After you create a workflow, select a PowerCenter Server to run the workflow. You can then
start the workflow using the Workflow Manager, Workflow Monitor, or pmcmd.
Use the Workflow Monitor to see the progress of a workflow during its run. The Workflow
Monitor can also show the history of a workflow. For more information about the Workflow
Monitor, see Monitoring Workflows on page 401.
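For example, a hedged sketch of starting a workflow from the command line with pmcmd (the user, host, port, folder, and workflow names are all illustrative, and exact option syntax can vary by PowerCenter version):

```
pmcmd startworkflow -u Administrator -p mypassword -s infaserver:4001 -f Sales wf_nightly_load
```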
Use the following guidelines when you develop a workflow:
1. Create a new workflow. Create a new workflow in the Workflow Designer. For details on creating a new workflow, see Creating a New Workflow on page 91.
2. Add tasks in the workflow. You might have already created tasks in the Task Developer. Or, you can add tasks to the workflow as you develop the workflow in the Workflow Designer. For details on workflow tasks, see Working with Tasks on page 131.
3. Connect tasks with links. After you add tasks in the workflow, connect them with links to specify the order of execution in the workflow. For details on links, see Working with Links on page 92.
4. Specify conditions for each link. You can specify conditions on the links to create branches and dependencies. For details, see Working with Links on page 92.
5. Validate workflow. Validate the workflow in the Workflow Designer to identify errors. For details on validation rules, see Validating a Workflow on page 119.
6. Save workflow. When you save the workflow, the Workflow Manager validates the workflow and updates the repository.
7. Run workflow. In the workflow properties, select a PowerCenter Server to run the workflow. Run the workflow from the Workflow Manager, Workflow Monitor, or pmcmd. You can monitor the workflow in the Workflow Monitor. For details on starting a workflow, see Running the Workflow on page 122.
For a complete list of workflow properties, see Workflow Properties Reference on page 721.
Workflow Privileges
You need one of the following privileges to create a workflow:
Use Workflow Manager privilege with read and write folder permissions
You need one of the following privileges to run, schedule, and monitor the workflow:
Developing Workflows
The first step to develop a workflow is to create a new workflow in the Workflow Designer. A
workflow must contain a Start task. The Start task represents the beginning of a workflow.
When you create a workflow, the Workflow Designer creates a Start task and adds it to the
workflow. You cannot delete the Start task.
After you create a new workflow, the next step is to add tasks to the workflow. The Workflow
Manager includes tasks such as the Session task, the Command task, and the Email task so
you can design your workflow.
Finally, you connect workflow tasks with links to specify the order of execution in the
workflow. You can add conditions to links.
2. Choose Workflows-Create.
4. Click OK.
The Workflow Designer creates a Start task in the new workflow.
For information on using the Workflow Wizard, see Using the Workflow Wizard on
page 99.
5. Click OK.
The Workflow Designer creates a workflow for the session.
The Workflow Manager does not allow you to create a workflow that contains a loop. Figure 4-4 shows a loop in which the three sessions may run multiple times:
Figure 4-4. Example of a Loop
Use the following procedure to link tasks in the Workflow Designer or the Worklet Designer.
To link two tasks:
2. In the workspace, click the first task you want to connect and drag it to the second task.
If you have a number of tasks that you want to link concurrently, you may not wish to
connect each link manually. To quickly link tasks concurrently, use the following procedure.
To link several tasks concurrently:
A link appears between the first task you selected and each task you added. The first task
you selected links to each task concurrently.
If you have a number of tasks that you want to link sequentially, you may not wish to connect
each link manually. To quickly link tasks sequentially, use the following procedure.
To link several tasks sequentially:
2. Ctrl-click the next task you want to connect. Continue to add tasks in the order you want them to run.
Links appear in sequential order between the first task and each subsequent task you
added.
Figure 4-5 shows how to set the link condition using the target failed rows variable for
S_STORES_CA:
Figure 4-5. Setting Link Condition
After you specify the link condition in the Expression Editor, the Workflow Manager validates
the link condition and displays it next to the link in the workflow.
Figure 4-6 shows the link condition displayed in the workspace:
Figure 4-6. Displaying Link Condition in the Workflow
In the Workflow Designer workspace, double-click the link you want to specify.
or
Right-click the link and choose Edit. The Expression Editor displays.
Developing Workflows
95
3. Validate the expression using the Validate button. The Workflow Manager displays error messages in the Output window.
Tip: Click and drag the end point of a link to move it from one task to another without losing the link condition.
Use the Expression Editor to enter expressions in the following types of objects:
Link conditions
Decision task
Assignment task
The Expression Editor displays system variables, user-defined, and pre-defined workflow
variables such as $Session.status. For details on workflow variables, see Using Workflow
Variables on page 103.
The Expression Editor also displays a list of functions. PowerCenter uses a SQL-like language
that contains many functions designed to handle common expressions. For example, you can
use the ABS function to find the absolute value. For a complete list of functions, see the
Transformation Language Reference.
Adding Comments
The Expression Editor also allows you to add comments using -- or // comment indicators.
You can use comments to give descriptive information about the expression, or you can
specify a valid URL to access business documentation about the expression.
For examples on adding comments to expressions, see The Transformation Language in the
Transformation Language Reference.
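Putting these pieces together, a link condition with a comment might look like the following sketch (the session name s_m_LoadOrders is an illustrative assumption, not from this guide):

```
-- follow this link only when the load succeeded with no failed target rows
$s_m_LoadOrders.Status = SUCCEEDED AND $s_m_LoadOrders.TgtFailedRows = 0
```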
Validating Expressions
You can use the Validate button to validate an expression. If you do not validate an
expression, the Workflow Manager validates it when you close the Expression Editor. You
cannot run a workflow with invalid expressions.
Expressions in link conditions and Decision task conditions must evaluate to a numerical
value. Workflow variables used in expressions must exist in the workflow.
Deleting a Workflow
You may decide to delete a workflow that you no longer use. When you delete a workflow,
you delete all non-reusable tasks and reusable task instances associated with the workflow.
Reusable tasks used in the workflow remain in the folder when you delete the workflow.
If you delete a workflow that is running, the PowerCenter Server aborts the workflow. If you
delete a workflow that is scheduled to run, the PowerCenter Server removes the workflow
from the schedule.
You can delete a workflow in the Navigator window, or you can delete the workflow currently
displayed in the Workflow Designer workspace.
To delete a workflow from the Navigator window, open the folder, select the workflow and
press the Delete key.
Editing a Workflow
When you edit a workflow, the repository updates the workflow information when you save
the workflow. If a workflow is running when you make edits, the PowerCenter Server uses the
updated information the next time you run the workflow.
In the Worklet Designer or Workflow Designer, right-click a task and choose Highlight
Path.
2.
In the Worklet Designer or Workflow Designer, select all links you want to delete.
Tip: You can use the mouse to click and drag the selection, or you can Ctrl-click the tasks
and links.
2. Create a session.
In the Workflow Manager, open the folder containing the mapping you want to use in
the workflow.
3. Choose Workflows-Wizard.
Choose the PowerCenter Server to run the workflow, and click Next.
In the second step of the Workflow Wizard, select a valid mapping and click the right
arrow button.
The Workflow Wizard creates a Session task in the right pane using the selected mapping
and names it s_MappingName by default.
You can select additional mappings to create more Session tasks in the workflow.
When you add multiple mappings to the list, the Workflow Wizard creates sequential
sessions in the order you add them.
Specify how you want the PowerCenter Server to run the workflow.
You can specify that the PowerCenter Server runs sessions only if previous sessions
complete, or you can specify that the PowerCenter Server always runs each session. When
you select this option, it applies to all sessions you create using the Workflow Wizard.
To schedule a workflow:
1.
In the third step of the Workflow Wizard, configure the scheduling and run options. For
more information about scheduling a workflow, see Scheduling a Workflow on
page 112.
2.
Click Next.
The Workflow Wizard displays the settings for the workflow:
3.
Verify the workflow settings and click Finish. To edit settings, click Back.
The completed workflow opens in the Workflow Designer workspace. From the
workspace, you can add tasks, create concurrent sessions, add conditions to links, or
modify properties.
User-defined workflow variables. You create user-defined workflow variables when you
create a workflow. For more information, see User-Defined Workflow Variables on
page 108.
You can use workflow variables when you configure the following types of tasks:
Assignment tasks. You can use an Assignment task to assign a value to a user-defined
workflow variable. For example, you can increment a user-defined counter variable by
setting the variable to its current value plus 1. For information on using workflow variables
in Assignment tasks, see Working with the Assignment Task on page 140.
Decision tasks. Decision tasks determine how the PowerCenter Server executes a
workflow. For example, you can use the Status variable to run a second session only if the
first session completes successfully. For information on using workflow variables in
Decision tasks, see Working with the Decision Task on page 149.
Links. Links connect each workflow task. You can use workflow variables in links to create
branches in the workflow. For example, after a Decision task, you can create one link to
follow when the decision condition evaluates to true, and another link to follow when the
decision condition evaluates to false. For information on using workflow variables in Link
tasks, see Working with Links on page 92.
Timer tasks. Timer tasks specify when the PowerCenter Server begins to execute the next
task in the workflow. You can use a user-defined date/time variable to specify the exact
time the PowerCenter Server starts to execute the next task. For information on using
workflow variables in Timer tasks, see Working with the Timer Task on page 161.
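As illustrative sketches of the uses above (the variable and task names are assumptions, not from this guide): an Assignment task expression that increments a user-defined counter, and a link condition that tests the result of a Decision task:

```
-- Assignment task: increment a user-defined counter variable
$$ErrorCount = $$ErrorCount + 1

-- Link condition: follow this link when the Decision task evaluates to true
$Dec_CheckStatus.Condition = TRUE
```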
You can use the Expression Editor to create an expression that uses variables.
When you build an expression, you can select pre-defined variables on the Pre-Defined tab.
You can select user-defined variables on the User-Defined tab. The Functions tab contains
functions that you can use with workflow variables.
Use the point-and-click method to enter an expression using a variable. For information on
using the Expression Editor, see Using the Expression Editor on page 96.
You can use the following keywords to write expressions for user-defined and pre-defined workflow variables:
- AND
- OR
- NOT
- TRUE
- FALSE
- NULL
- SYSDATE
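For example, a link condition might combine two pre-defined Status variables with the AND keyword. This is an illustrative sketch; the session names are placeholders, not from this guide:

```
$Session1.Status = SUCCEEDED AND $Session2.Status = SUCCEEDED
```

The PowerCenter Server follows the link only when both sessions complete successfully.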
- Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. You can use task-specific variables in a link condition to control the path the PowerCenter Server takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.
- System variables. You can use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow. For more information on system variables, see Variables in the Transformation Language Reference. The Workflow Manager lists system variables under the Built-in node in the Expression Editor.
Table 4-1 lists the task-specific workflow variables available in the Workflow Manager:
Table 4-1. Task-Specific Workflow Variables

  Task-Specific Variable   Task Types   Datatype
  Condition                Decision     Integer
  EndTime                  All tasks    Date/time
  ErrorCode                All tasks    Integer
  ErrorMsg                 All tasks    Nstring*
  FirstErrorCode           Session      Integer
  FirstErrorMsg            Session      Nstring*
  PrevTaskStatus           All tasks    Integer
  SrcFailedRows            Session      Integer
  SrcSuccessRows           Session      Integer
  StartTime                All tasks    Date/time
  Status                   All tasks    Integer
  TgtFailedRows            Session      Integer
  TgtSuccessRows           Session      Integer
  TotalTransErrors         Session      Integer
All pre-defined workflow variables except Status have a default value of null. The
PowerCenter Server uses the default value of null when it encounters a pre-defined variable
from a task that has not yet run in the workflow. Therefore, expressions and link conditions
that depend upon tasks not yet run are valid. The default value of Status is NOTSTARTED.
The Expression Editor displays the pre-defined workflow variables on the Pre-defined tab.
The Workflow Manager groups task-specific variables by task and lists system variables under
the Built-in node. To use a variable in an expression, double-click the variable. The
Expression Editor displays task-specific variables in the Expression field in the following
format:
$<TaskName>.<Pre-definedVariable>
Figure 4-9 shows the Expression Editor with an expression using a task-specific workflow
variable and keyword:
Figure 4-9. Expression Using a Pre-Defined Workflow Variable
Link condition:
$Session2.Status = SUCCEEDED
When you run the workflow, the PowerCenter Server evaluates the link condition and returns
the value based on the status of Session2.
Link condition:
$Session2.PrevTaskStatus = SUCCEEDED
When you run the workflow, the PowerCenter Server skips Session2 because the session is
disabled. When the PowerCenter Server evaluates the link condition, it returns the value
based on the status of Session1.
Tip: If you do not disable Session2, the PowerCenter Server returns the value based on the
status of Session2. You do not need to change the link condition when you enable and disable
Session2.
You can use a user-defined variable to determine when to run the session that updates the
orders database at headquarters.
To do this, set up the workflow as follows:
1. Place a Decision task after the session that updates the local orders database.
2. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. You can use the modulus (MOD) function to do this.
3. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false.
When you do this, the session that updates the local database runs every time the workflow
runs. The session that updates the database at headquarters runs every 10th time the
workflow runs.
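As a sketch, the decision condition for this example could use the MOD function against a user-defined counter variable. The variable name $$WorkflowCount is illustrative, not from this guide:

```
MOD($$WorkflowCount, 10) = 0
```

When the counter is evenly divisible by 10, the condition evaluates to true and the PowerCenter Server follows the link to the session that updates the headquarters database.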
The start value is the value of the variable at the start of the workflow. The start value could
be a value defined in the parameter file for the variable, a value saved in the repository from
the previous run of the workflow, a user-defined initial value for the variable, or the default
value based on the variable datatype.
The PowerCenter Server looks for the start value of a variable in the following order:
1. Value defined for the variable in the parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-defined initial value for the variable
4. Datatype default value
For a list of datatype default values, see Table 4-2 on page 110.
For example, you create a workflow variable in a workflow and enter a default value, but you
do not define a value for the variable in a parameter file. The first time the PowerCenter
Server runs the workflow, it evaluates the start value of the variable to the user-defined default
value.
If you declare the variable as persistent, the PowerCenter Server saves the value of the variable
to the repository at the end of the workflow run. The next time the workflow runs, the
PowerCenter Server evaluates the start value of the variable as the value saved in the
repository.
If the variable is non-persistent, the PowerCenter Server does not save the value of the variable.
The next time the workflow runs, the PowerCenter Server evaluates the start value of the
variable as the user-specified default value.
If you want to override the value saved in the repository before running a workflow, you need
to define a value for the variable in a parameter file. When you define a workflow variable in
the parameter file, the PowerCenter Server uses this value instead of the value saved in the
repository or the configured initial value for the variable.
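For example, a parameter file entry that overrides the saved value might look like the following. The folder, workflow, and variable names here are illustrative placeholders:

```
[Orders.WF:wf_UpdateOrders]
$$WorkflowCount=0
```

The section heading names the folder and workflow, and each line below it assigns a value to a workflow variable.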
The current value is the value of the variable as the workflow progresses. When a workflow
starts, the current value of a variable is the same as the start value. The value of the variable
can change as the workflow progresses if you create an Assignment task that updates the value
of the variable.
If the variable is persistent, the PowerCenter Server saves the current value of the variable to
the repository at the end of a successful workflow run. If the workflow fails to complete, the
PowerCenter Server does not update the value of the variable in the repository.
The PowerCenter Server states the value saved to the repository for each workflow variable in
the workflow log.
Table 4-2. Datatype Default Values

  Datatype    Default Value
  Date/time   1/1/1753 A.D.
  Double
  Integer
  Nstring     Empty string
4. In the Datatype field, select the datatype for the new variable. You can select from the following datatypes:
   - Date/time
   - Double
   - Integer
   - Nstring
5. Enable the Persistent option if you want the value of the variable retained from one execution of the workflow to the next. For more information, see Start and Current Values on page 109.
6. Enter the default value for the variable in the Default field. If the default value is a null value, enable the Is Null option.
7. To validate the default value of the new workflow variable, click the Validate button.
Scheduling a Workflow
You can schedule a workflow to run continuously, to repeat at a given time or interval, or you can start it manually. The PowerCenter Server runs a scheduled workflow as configured.
By default, the workflow runs on demand. You can change the schedule settings by editing the
scheduler. If you change schedule settings, the PowerCenter Server reschedules the workflow
according to the new settings.
Each workflow has an associated scheduler. A scheduler is a repository object that contains a
set of schedule settings. You can create a non-reusable scheduler for the workflow. Or, you can
create a reusable scheduler so you can use the same set of schedule settings for workflows in
the folder.
The Workflow Manager marks a workflow invalid if you delete the scheduler associated with
the workflow.
If you choose a different PowerCenter Server for the workflow or restart the PowerCenter
Server, it reschedules all workflows. This includes workflows that are scheduled to run
continuously but whose start time has passed. You must manually reschedule workflows
whose start time has passed if they are not scheduled to run continuously.
The PowerCenter Server does not run the workflow if:
- The prior workflow run fails. When a workflow fails, the PowerCenter Server removes the workflow from the schedule, and you must manually reschedule it. You can reschedule the workflow in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Schedule Workflow. For more information about the pmcmd scheduleworkflow command, see Scheduleworkflow on page 604.
- You remove the workflow from the schedule. You can remove the workflow from the schedule in the Workflow Manager or using pmcmd. In the Workflow Manager Navigator window, right-click the workflow and select Unschedule Workflow. For more information about the pmcmd unscheduleworkflow command, see Unscheduleworkflow on page 610.
Note: The PowerCenter Server schedules the workflow in the time zone of the PowerCenter
Server machine. For example, the PowerCenter Client is in your current time zone and the
PowerCenter Server is in a time zone two hours later. If you schedule the workflow to start at
9 a.m., it starts at 9 a.m. in the time zone of the PowerCenter Server machine and 7 a.m.
current time.
To schedule a workflow:
2. Choose Workflows-Edit.
3. In the Scheduler tab, choose Non-reusable if you want to create a non-reusable set of schedule settings for the workflow. Choose Reusable if you want to select an existing reusable scheduler for the workflow.
   Note: If you do not have a reusable scheduler in the folder, you must create one before you choose Reusable. The Workflow Manager displays a warning message if you do not have an existing reusable scheduler.
4. Click the right side of the Scheduler field to edit scheduling settings for the scheduler.
5. If you select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.
To remove a workflow from its schedule, right-click the workflow in the Navigator window
and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator
window and choose Schedule Workflow.
Configure the scheduler settings in the Scheduler tab. For a complete list of scheduler settings, see Table 4-3 on page 115.
Table 4-3. Scheduler Settings

- Run Options (Run On Server Initialization/Run On Demand/Run Continuously). Optional.
- Schedule Options (Run Once/Run Every/Customized Repeat). Optional.
- Repeat Every. Required. Enter the numeric interval at which you would like the PowerCenter Server to schedule the workflow, and then select Days, Weeks, or Months, as appropriate. If you select Days, select the appropriate Daily Frequency settings. If you select Weeks, select the appropriate Weekly and Daily Frequency settings. If you select Months, select the appropriate Monthly and Daily Frequency settings.
- Weekly. Optional. Required to enter a weekly schedule. Select the day or days of the week on which you would like the PowerCenter Server to run the workflow.
- Monthly. Optional.
- Daily. Optional. Enter the number of times you would like the PowerCenter Server to run the workflow on any day the session is scheduled. If you select Run Once, the PowerCenter Server schedules the workflow once on the selected day, at the time entered on the Start Time setting on the Time tab. If you select Run Every, enter Hours and Minutes to define the interval at which the PowerCenter Server runs the workflow. The PowerCenter Server then schedules the workflow at regular intervals on the selected day, using the Start Time setting for the first scheduled workflow of the day.
- Reusable schedulers. When you edit settings for a reusable scheduler, the repository creates a new version of the scheduler and increments the version number by one. To update a workflow with the latest schedule, check in the scheduler after you edit it. When you configure a reusable scheduler for a new workflow, you must check in both the workflow and the scheduler for the schedule to take effect. Thereafter, when you check in the scheduler after revising it, the workflow schedule is updated automatically even if the workflow is checked out. If you do not check in the scheduler, you need to update the workflow schedule manually: right-click the workflow in the Navigator, and select Schedule Workflow. Note that the new schedule is implemented only for the latest version of the workflow that is checked in. Workflows that are checked out are not updated with the new schedule.
Disabling Workflows
You may want to disable the workflow while you edit it. This prevents the PowerCenter Server
from running the workflow on its schedule. Select the Disable Workflows option on the
General tab of the workflow properties. The PowerCenter Server does not run disabled
workflows until you clear the Disable Workflows option. Once you clear the Disable
Workflows option, the PowerCenter Server reschedules the workflow.
Validating a Workflow
Before you can run a workflow, you must validate it. When you validate the workflow, you
validate all task instances in the workflow, including nested worklets.
The Workflow Manager validates the following properties:
- Tasks. Non-reusable task and reusable task instances in the workflow must follow validation rules.
- Scheduler. If the workflow uses a reusable scheduler, the Workflow Manager verifies that the scheduler exists.
The Workflow Manager also verifies that you linked each task properly. For example, you
must link the Start task to at least one task in the workflow.
Note: The Workflow Manager validates Session tasks separately. If a session is invalid, the
workflow may still be valid. For more information about session validation, see Validating a
Session on page 195.
Expression Validation
The Workflow Manager validates all expressions in the workflow. You can enter expressions in
the Assignment task, Decision task, and link conditions. The Workflow Manager writes any
error message to the Output window.
Expressions in link conditions and Decision task conditions must evaluate to a numerical
value. Workflow variables used in expressions must exist in the workflow.
The Workflow Manager marks the workflow invalid if a link condition is invalid.
Task Validation
The Workflow Manager validates each task in the workflow as you create it. When you save or
validate the workflow, the Workflow Manager validates all tasks in the workflow except
Session tasks. It marks the workflow invalid if it detects any invalid task in the workflow.
The Workflow Manager verifies that attributes in the tasks follow validation rules. For
example, the user-defined event you specify in an Event task must exist in the workflow. The
Workflow Manager also verifies that you linked each task properly. For example, you must
link the Start task to at least one task in the workflow. For details on task validation rules, see
Validating Tasks on page 139.
When you delete a reusable task, the Workflow Manager removes the instance of the deleted
task from workflows. The Workflow Manager also marks the workflow invalid when you
delete a reusable task used in a workflow.
The Workflow Manager verifies that there are no duplicate task names in a folder, and that
there are no duplicate task instances in the workflow.
Running Validation
When you validate a workflow, you validate worklet instances, worklet objects, and all other
nested worklets in the workflow. You validate task instances and worklets, regardless of
whether you have edited them.
The Workflow Manager validates the worklet object using the same validation rules for
workflows. The Workflow Manager validates the worklet instance by verifying attributes in
the Parameter tab of the worklet instance. For details on validating worklets, see Validating
Worklets on page 171.
If the workflow contains nested worklets, you can select a worklet to validate the worklet and
all other worklets nested under it. To validate a worklet and its nested worklets, right-click the
worklet and choose Validate.
Example
For example, you have a workflow that contains a non-reusable worklet called Worklet_1.
Worklet_1 contains a nested worklet called Worklet_a. The workflow also contains a reusable
worklet instance called Worklet_2. Worklet_2 contains a nested worklet called Worklet_b.
In the example workflow in Figure 4-15, the Workflow Manager validates links, conditions,
and tasks in the workflow. The Workflow Manager validates all tasks in the workflow,
including tasks in Worklet_1, Worklet_2, Worklet_a, and Worklet_b.
You can validate a part of the workflow. Right-click Worklet_1 and choose Validate. The
Workflow Manager validates all tasks in Worklet_1 and Worklet_a.
Figure 4-15 shows the example workflow:
Figure 4-15. Example Workflow - Validation. Worklet_1 is a non-reusable worklet that contains a nested worklet called Worklet_a. Worklet_2 is a reusable worklet that contains a nested worklet called Worklet_b.
Note: If you are using the Repository Manager, you can select and validate multiple workflows from a query results view or a view dependencies list. When you validate multiple workflows, the validation does not include sessions, nested worklets, or reusable worklet objects in the workflows. Choose whether to save objects and check in objects that you validate.
Click the Select Server button on the General tab. A list of registered servers appears. Select a server.
editing each workflow property individually. To assign the PowerCenter Server to multiple workflows, you must first close all folders in the repository.
You can also choose a PowerCenter Server to run a specific workflow by editing the workflow property. For details, see Running a Workflow on page 124.
To assign the PowerCenter Server to workflows, you must have Super User privilege.
To assign the PowerCenter Server:
3. From the Choose Server list, select the server you want to assign.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the Select check box for each workflow you want to run on the PowerCenter Server.
6. Click Assign.
To remove the PowerCenter Server assignment from workflows:
4. From the Show Folder list, select the folder you want to view. Or, choose All to view workflows in all folders in the repository.
5. Select the workflows from which you want to remove the assigned server.
6. Click Assign.
Running a Workflow
When you choose Workflows-Start, the PowerCenter Server runs the entire workflow.
To run a workflow from pmcmd, use the startworkflow command. For details on using
pmcmd, see Using pmcmd on page 581.
To start a workflow with the Workflow Manager:
2. From the Navigator, select the workflow that you want to start.
3. Choose Workflows-Start.
When you choose Start Workflow, the workflow runs on the PowerCenter Server you selected
in the workflow properties. You can also use the Choose Server toolbar button to run the
workflow on a different server.
After the Workflow Manager sends a request to the PowerCenter Server, the Output window displays the PowerCenter Server response. If an error appears, check the workflow log or session log for error messages.
You can also manually start a workflow by right-clicking in the Workflow Designer workspace
and choosing Start Workflow.
To run a part of a workflow from pmcmd, use the startfrom flag of the startworkflow
command. For details on using pmcmd, see Using pmcmd on page 581.
To run a part of a workflow:
2. In the Navigator window, drill down the Workflow folder to show the tasks in the workflow. Or, in the Workflow Designer workspace, select the task from which you want the PowerCenter Server to begin running.
3. Right-click the task from which you want the PowerCenter Server to begin running.
For example, you have a workflow with multiple tasks. The example workflow in Figure 4-16
contains two branches. If you want to run the tasks commandtask2, e_email2, and
command3, you start the workflow from commandtask2. All subsequent tasks in the branch
will run.
Figure 4-16. Running Part of a Workflow - Example
To start a task in a workflow from pmcmd, use the starttask command. For details on using
pmcmd, see Using pmcmd on page 581.
- Session
- Command
- Worklet
When a task fails in the workflow, the PowerCenter Server stops running tasks in its path.
The PowerCenter Server does not evaluate the output link of the failed task. If no other task is
running in the workflow, the Workflow Monitor displays the status of the workflow as
Suspended.
If one or more tasks are still running in the workflow when a task fails, the PowerCenter
Server stops running the failed task and continues running tasks in other paths. The
Workflow Monitor displays the status of the workflow as Suspending.
When the status of the workflow is Suspended or Suspending, you can fix the error, such
as a target database error, and resume or recover the workflow in the Workflow Monitor.
When you resume or recover a workflow, the PowerCenter Server restarts the failed tasks and
continues evaluating the rest of the tasks in the workflow. The PowerCenter Server does not
run any task that already completed successfully.
Note: Do not edit a workflow or the tasks inside a workflow when the PowerCenter Server
suspends a workflow.
For details about resuming the workflow, see Resuming a Workflow or Worklet on
page 417. For details about recovering the workflow, see Recovering a Workflow or Worklet
on page 417.
To suspend a workflow:
2. Choose Workflows-Edit.
4. Click OK.
Use one of the following methods to stop or abort a workflow:
- Use a Control task in the workflow. For details, see Working with the Control Task on page 147.
- Issue a stop or abort command in the Workflow Monitor. For details, see Monitoring Workflows on page 401.
- Issue a stop or abort command in pmcmd. For details, see pmcmd Reference on page 594.
You can also stop or abort a task within a workflow. For details on stopping the Session task, see Stopping and Aborting a Session on page 200.
You can stop or abort the following types of tasks in a workflow:
- Session
- Command
- Timer
- Event-Wait
- Worklet
When you stop a Command task that contains multiple commands, the PowerCenter Server
finishes executing the current command and does not execute the rest of the commands. The
PowerCenter Server cannot stop tasks such as the Email task. For example, if the PowerCenter
Server has already started sending an email when you issue the stop command, the
PowerCenter Server finishes sending the email before it stops running the workflow.
The PowerCenter Server aborts the workflow if the Repository Server process shuts down.
When you stop a task, the PowerCenter Server stops processing the task and continues processing concurrent tasks in the workflow. If the PowerCenter Server cannot stop the task, you can abort the task.
When you abort a task, the PowerCenter Server kills the process on the task. The
PowerCenter Server continues processing concurrent tasks in the workflow when you abort a
task.
You can also stop or abort a worklet. The PowerCenter Server stops and aborts a worklet
similar to stopping and aborting a task. The PowerCenter Server stops the worklet while
executing concurrent tasks in the workflow. You can also stop or abort tasks within a worklet.
Chapter 5
Overview
The Workflow Manager contains many types of tasks to help you build workflows and
worklets. You can create reusable tasks in the Task Developer. Or, create and add tasks in the
Workflow or Worklet Designer as you develop the workflow.
Table 5-1 summarizes workflow tasks available in Workflow Manager:
Table 5-1. Workflow Tasks

  Task Name     Tool                                                   Reusable
  Assignment    Workflow Designer, Worklet Designer                    No
  Command       Task Developer, Workflow Designer, Worklet Designer    Yes
  Control       Workflow Designer, Worklet Designer                    No
  Decision      Workflow Designer, Worklet Designer                    No
  Email         Task Developer, Workflow Designer, Worklet Designer    Yes
  Event-Raise   Workflow Designer, Worklet Designer                    No
  Event-Wait    Workflow Designer, Worklet Designer                    No
  Session       Task Developer, Workflow Designer, Worklet Designer    Yes
  Timer         Workflow Designer, Worklet Designer                    No
The Workflow Manager validates task attributes and links. If a task is invalid, the workflow becomes invalid. Workflows containing invalid sessions may still be valid. For details on validating tasks, see Validating Tasks on page 139.
Creating a Task
You can create tasks in the Task Developer, or you can create them in the Workflow Designer
or the Worklet Designer as you develop the workflow or worklet. Tasks you create in the Task
Developer are reusable. Tasks you create in the Workflow Designer and Worklet Designer are
non-reusable by default.
For details on reusable tasks, see Reusable Workflow Tasks on page 135.
To create a task in the Task Developer:
1. In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.
2. Select the task type you want to create: Command, Session, or Email.
4. For Session tasks, select the mapping you want to associate with the session.
5. Click Create. The Task Developer creates the workflow task.
Perform the following steps to create tasks in the Workflow Designer or Worklet Designer.
To create tasks in the Workflow Designer or Worklet Designer:
2. Choose Tasks-Create.
5. Click Create. The Workflow Designer or Worklet Designer creates the task and adds it to the workspace.
6. Click Done.
You can also use the Tasks toolbar to create and add tasks to the workflow. Click the button
on the Tasks toolbar for the task you want to create. Click again in the Workflow Designer or
Worklet Designer workspace to create and add the task. The Workflow Designer or Worklet
Designer creates the task with a default task name when you use the Tasks toolbar.
Configuring Tasks
After you create the task, you can configure general task options on the General tab. For each task instance in the workflow, you can configure how the PowerCenter Server runs the task and the other objects associated with the selected task. You can also disable the task so you can run the rest of the workflow without the selected task.
Figure 5-1 displays the General tab in the Edit Tasks dialog box:
Figure 5-1. General Tab - Edit Tasks Dialog Box
When you use a task in the workflow, you can edit the task in the Workflow Designer and configure the following task options in the General tab:
- Treat input link as AND or OR. Choose to have the PowerCenter Server run the task when all or one of the input link conditions evaluates to True.
- Disable this task. Choose to disable the task so you can run the rest of the workflow without the task.
- Fail parent if this task fails. Choose to fail the workflow or worklet containing the task if the task fails.
- Fail parent if this task does not run. Choose to fail the workflow or worklet containing the task if the task does not run.
You have the option to create any task as non-reusable or reusable. Tasks you create in the
Task Developer are reusable. Tasks you create in the Workflow Designer are non-reusable by
default. However, you can edit the general properties of a task to promote it to a reusable task.
The Workflow Manager stores each reusable task separate from the workflows that use the
task. You can view a list of reusable tasks in the Tasks node in the Navigator window. You can
see a list of all reusable Session tasks in the Sessions node in the Navigator window.
To promote a non-reusable workflow task:
1. In the Workflow Designer, double-click the task you want to make reusable.
2. In the General tab of the Edit Task dialog box, check the Make Reusable option.
3. When prompted whether you are sure you want to promote the task, click Yes.
5. Choose Repository-Save.
The newly promoted task appears in the list of reusable tasks in the Tasks node in the
Navigator window.
Figure 5-2 displays the Revert button in the Mapping tab of a Session task:
Figure 5-2. Revert Button in Session Properties
Disabling Tasks
In the Workflow Designer, you can disable a workflow task so that the PowerCenter Server
runs the workflow without the disabled task. The status of a disabled task is DISABLED.
Disable a task in the workflow by selecting the Disable This Task option in the Edit Tasks
dialog box.
Validating Tasks
You can validate reusable tasks in the Task Developer. Or, you can validate task instances in the Workflow Designer. When you validate a task, the Workflow Manager validates task attributes and links. For example, the user-defined event you specify in an Event task must exist in the workflow.
The Workflow Manager uses the following rules to validate tasks:
- Assignment. The Workflow Manager validates the expression you enter for the Assignment task. For example, the Workflow Manager verifies that you assigned a matching datatype value to the workflow variable in the assignment expression.
- Command. The Workflow Manager does not validate the shell command you enter for the Command task.
- Event-Wait. If you choose to wait for a pre-defined event, the Workflow Manager verifies that you specified a file to watch. If you choose to use the Event-Wait task to wait for a user-defined event, the Workflow Manager verifies that you specified an event.
- Event-Raise. The Workflow Manager verifies that you specified a user-defined event for the Event-Raise task.
- Timer. The Workflow Manager verifies that the variable you specified for the Absolute Time setting has the Date/Time datatype.
- Start. The Workflow Manager verifies that you linked the Start task to at least one task in the workflow.
When a task instance is invalid, the workflow using the task instance becomes invalid. When
a reusable task is invalid, it does not affect the validity of the task instance used in the
workflow. However, if a Session task instance is invalid, the workflow may still be valid. The
Workflow Manager validates sessions differently. For details, see Validating a Session on
page 195.
To validate a task, select the task in the workspace and choose Tasks-Validate. Or, right-click
the task in the workspace and choose Validate.
To create an Assignment task:
1. In the Workflow Designer, click the Assignment icon on the Tasks toolbar, or choose Tasks-Create and select Assignment Task for the task type.
2. Enter a name for the Assignment task. Click Create. Then click Done. The Workflow Designer creates and adds the Assignment task to the workflow.
3. Double-click the Assignment task to open the Edit Task dialog box.
4. Add an assignment.
5. Click the Open button.
6. Select the variable for which you want to assign a value. Click OK.
7. Click the Edit button in the Expression field to open the Expression Editor. The Expression Editor shows pre-defined workflow variables, user-defined workflow variables, variable functions, and boolean and arithmetic operators.
8. Enter the value or expression you want to assign. For example, if you want to assign the value 500 to the user-defined variable $$custno1, enter the number 500 in the Expression Editor. Validate the expression before you close the Expression Editor.
9. Repeat steps 5-7 to add more variable assignments as necessary. Use the up and down arrows in the Expressions tab to change the order of the variable assignments.
10. Click OK.
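As a sketch, the counter increment described earlier in this chapter could be entered as the assignment expression for a user-defined variable. The variable name $$WorkflowCount is illustrative:

```
$$WorkflowCount + 1
```

Each time the Assignment task runs, the PowerCenter Server sets the variable to its current value plus 1.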
- Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.
- Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task. For more information about specifying pre-session and post-session shell commands, see Using Pre- or Post-Session Shell Commands on page 188.
Note: You can use server variables or session variables in pre- and post-session shell commands. You cannot use server variables or session variables in standalone Command tasks. The PowerCenter Server does not expand server variables or session variables in standalone Command tasks.
Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch
file for Windows servers.
For example, you might use a shell command to copy a file from one directory to another. For
a Windows server, you would use the following shell command to copy the SALES_ADJ file
from the source directory, L, to the target, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command to perform a similar operation:
cp sales/sales_adj marketing/
Each shell command runs in the same environment (UNIX or Windows) as the PowerCenter
Server. Environment settings in one shell command script do not carry over to other scripts.
To run all shell commands in the same environment, call a single shell script that invokes
other scripts.
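The single-wrapper-script approach described above can be sketched as follows. This is a minimal illustration, not PowerCenter syntax; the STAGE_DIR path and the child script names are assumptions invented for the example:

```shell
#!/bin/sh
# A minimal sketch of the single-wrapper-script approach described above.
# Environment settings made in this shell are visible to every script it
# invokes, unlike separate shell commands, which each start in a fresh
# environment. The STAGE_DIR path and child script names are assumptions.

STAGE_DIR=/tmp/stage_demo
export STAGE_DIR              # inherited by all child scripts run below

mkdir -p "$STAGE_DIR"

# Invoke the other scripts from this one shell so they share STAGE_DIR:
#   sh cleanup_stage.sh
#   sh copy_sources.sh
echo "staging area: $STAGE_DIR"
```

Calling this one script as the Command task then gives every invoked script the same environment.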
1. In the Workflow Designer or the Task Developer, click the Command Task icon on the
Tasks toolbar.
or
Choose Tasks-Create. Select Command Task for the task type.
2. Enter a name for the Command task. Click Create. Then click Done.
3. Double-click the Command task in the workspace to open the Edit Tasks dialog box.
4.
5.
6. In the Command field, click the Edit button to open the Command Editor.
7. Enter the command you want to perform. Enter only one command in the Command
Editor.
8.
9.
10. Click OK.
If you specify non-reusable shell commands for a session, you can promote the non-reusable
shell commands to a reusable Command task. For details, see Creating a Reusable Command
Task from Pre- or Post-Session Commands on page 191.
1. In the Workflow Designer, click the Control Task icon on the Tasks toolbar.
or
Choose Tasks-Create. Select Control Task for the task type.
2. Enter a name for the Control task. Click Create. Then click Done.
The Workflow Manager creates and adds the Control task to the workflow.
3.
4.
The Control task provides the following control options: Fail Me, Fail Parent, Stop Parent,
and Abort Parent.
Example
For example, you have a Command task that depends on the status of the three sessions in the
workflow. You want the PowerCenter Server to run the Command task when any of the three
sessions fails. To accomplish this, use a Decision task with the following decision condition:
$Q1_session.status = FAILED OR $Q2_session.status = FAILED OR
$Q3_session.status = FAILED
You can then use the pre-defined condition variable in the input link condition of the
Command task. Configure the input link with the following link condition:
$Decision.condition = True
You can configure the same logic in the workflow without the Decision task. Without the
Decision task, you need to use three link conditions and treat the input links to the
Command task as OR links.
Figure 5-5 shows the example workflow without the Decision task:
Figure 5-5. Example Workflow without a Decision Task
You can further expand the example workflow in Figure 5-4. In Figure 5-4, the PowerCenter
Server runs the Command task if any of the three Session tasks fails. Suppose now you want
the PowerCenter Server to also run an Email task if all three Session tasks succeed.
To do this, add an Email task and use the decision condition variable in the link condition.
Figure 5-6 shows the expanded example workflow using a Decision task:
Figure 5-6. Expanded Example Workflow Using a Decision Task
$Decision.condition = True
$Decision.condition = False
1. In the Workflow Designer, click the Decision Task icon on the Tasks toolbar.
or
Choose Tasks-Create. Select Decision Task for the task type.
2. Enter a name for the Decision task. Click Create. Then click Done.
The Workflow Designer creates and adds the Decision task to the workspace.
3.
4. Click the Open button in the Value field to open the Expression Editor.
5. In the Expression Editor, enter the condition you want the PowerCenter Server to
evaluate.
Validate the expression before you close the Expression Editor.
6. Click OK.
Event-Raise task. The Event-Raise task represents a user-defined event. When the PowerCenter
Server runs the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise
task with the Event-Wait task to define events.
Event-Wait task. The Event-Wait task waits for an event to occur. Once the event triggers,
the PowerCenter Server continues executing the rest of the workflow.
To coordinate the execution of the workflow, you may specify the following types of events for
the Event-Wait and Event-Raise tasks:
Pre-defined event. A pre-defined event is a file-watch event. For pre-defined events, use an
Event-Wait task to instruct the PowerCenter Server to wait for the specified indicator file
to appear before continuing with the rest of the workflow. When the PowerCenter Server
locates the indicator file, it starts the next task in the workflow.
Perform the following steps to configure the workflow shown in Figure 5-7:
1.
2.
3. Declare an event called Q1Q3_Complete in the Events tab of the workflow properties.
4.
5. Specify the Q1Q3_Complete event in the Event-Raise task properties. This allows the
Event-Raise task to trigger the event when Q1_session and Q3_session complete.
6.
7.
8. Add Q4_session after the Event-Wait task. When the PowerCenter Server processes the
Event-Wait task, it waits until the Event-Raise task triggers Q1Q3_Complete before it
runs Q4_session.
The PowerCenter Server runs the workflow shown in Figure 5-7 in the following order:
1.
2.
3.
4. The Event-Wait task waits for the Event-Raise task to trigger the event.
5.
6.
7. The PowerCenter Server runs Q4_session because the event, Q1Q3_Complete, has been
triggered.
8.
2.
3.
4. Click OK.
1. In the Workflow Designer workspace, create an Event-Raise task and place it in the
workflow to represent the user-defined event you want to trigger. A user-defined event is
the sequence of tasks in the branch from the Start task to the Event-Raise task.
2.
3. Click the Open button in the Value field on the Properties tab to open the Events
Browser for user-defined events.
4.
5.
To wait for a pre-defined event, specify an indicator file for the PowerCenter Server to watch.
The PowerCenter Server waits for the indicator file to appear. Once the indicator file appears,
the PowerCenter Server continues executing tasks after the Event-Wait task.
Do not use the Event-Raise task to trigger the event when you wait for a pre-defined event.
You can also use the Event-Wait task to wait for a user-defined event. To use the Event-Wait
task for a user-defined event, you specify the name of the user-defined event in the Event-Wait
task properties. The PowerCenter Server waits for the Event-Raise task to trigger the
user-defined event. Once the user-defined event is triggered, the PowerCenter Server
continues running tasks after the Event-Wait task.
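An upstream process outside PowerCenter typically creates the indicator file that a pre-defined event watches. The following shell sketch illustrates the idea; the directory and file names are assumptions invented for the example and must match the indicator file configured in the Event-Wait task:

```shell
#!/bin/sh
# Sketch of an upstream job signaling a pre-defined (file-watch) event.
# The directory and file names are illustrative assumptions; the
# indicator file must match the one configured in the Event-Wait task.

DATA_DIR=/tmp/eventwait_demo
mkdir -p "$DATA_DIR"

# Write the data file first...
echo "q1,q3" > "$DATA_DIR/sales.dat"

# ...then create the indicator file, so the Event-Wait task never
# releases the workflow against a half-written data file.
touch "$DATA_DIR/sales.dat.ind"
```

Creating the indicator file last is the important design choice: the event fires only when the data is complete.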
1. In the workflow, create an Event-Wait task and double-click the Event-Wait task to open
the Edit Task dialog box.
2. In the Events tab of the Edit Tasks dialog box, select User-Defined.
3. Click the Event button to open the Events Browser dialog box.
4.
5. Click OK twice.
Perform the following steps to wait for a pre-defined event in the workflow.
To wait for a pre-defined event:
1. Create an Event-Wait task and double-click the Event-Wait task to open it.
2. In the Events tab of the Edit Task dialog box, select Pre-defined.
3.
4. If you want the PowerCenter Server to delete the indicator file after it detects the file,
select the Delete Filewatch File option in the Properties tab.
5. Click OK.
When you select Enable Past Events, the PowerCenter Server continues executing the next
tasks if the event already occurred.
Select the Enable Past Events option in the Properties tab of the Event-Wait task.
Absolute time. You specify the exact time that the PowerCenter Server starts running the
next task in the workflow. You may specify the exact date and time, or you can choose a
user-defined workflow variable to specify the exact time.
Relative time. You instruct the PowerCenter Server to wait for a specified period of time
after the Timer task, the parent workflow, or the top-level workflow starts.
For example, you may have two sessions in the workflow. You want the PowerCenter Server to
wait ten minutes after the first session completes before it runs the second session. Use a
Timer task after the first session. In the Relative Time setting of the Timer task, specify ten
minutes from the start time of the Timer task.
Figure 5-8 shows the example workflow using the Timer task:
Figure 5-8. Example Workflow Using the Timer Task
You can use a Timer task anywhere in the workflow after the Start task.
To create a Timer task:
1. In the Workflow Designer, click the Timer task icon on the Tasks toolbar.
or
Choose Tasks-Create. Select Timer Task for the task type.
2.
3.
4. Click the Timer tab to specify when the PowerCenter Server starts the next task in the
workflow.
Specify attributes for Absolute Time or Relative Time described in Table 5-2:
Table 5-2. Timer Task Attributes
Absolute Time. The PowerCenter Server starts the next task in the workflow at the exact date
and time you specify.
Relative Time, from the start time of this task. Choose this option to wait a specified period
of time after the start time of the Timer task to run the next task.
Relative Time, from the start time of the parent workflow/worklet. Choose this option to wait
a specified period of time after the start time of the parent workflow/worklet to run the next
task.
Relative Time, from the start time of the top-level workflow. Choose this option to wait a
specified period of time after the start time of the top-level workflow to run the next task.
Chapter 6
Overview
A worklet is an object that represents a set of tasks. It can contain any task available in the
Workflow Manager. You can run worklets inside a workflow. The workflow that contains the
worklet is called the parent workflow. You can also nest a worklet in another worklet.
Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the
Worklet Designer to create and edit worklets.
When the PowerCenter Server runs a worklet, it expands the worklet. The PowerCenter
Server then runs the worklet as it would any other workflow, executing tasks and evaluating
links in the worklet.
The worklet does not contain any scheduling or server information. To run a worklet, include
the worklet in a workflow. The worklet runs on the PowerCenter Server you choose for the
workflow. The Workflow Manager does not provide a parameter file or log file for worklets.
The PowerCenter Server writes information about worklet execution in the workflow log.
Suspending Worklets
When you choose Suspend On Error for the parent workflow, the PowerCenter Server also
suspends the worklet if a task in the worklet fails. When a task in the worklet fails, the
PowerCenter Server stops executing the failed task and other tasks in its path. If no other task
is running in the worklet, the worklet status is Suspended. If one or more tasks are still
running in the worklet, the worklet status is Suspending. The PowerCenter Server suspends
the parent workflow when the status of the worklet is Suspended or Suspending.
For details on suspending workflows, see Suspending the Workflow on page 127.
Developing a Worklet
To develop a worklet, you must first create a worklet. After you create a worklet, configure
worklet properties and add tasks to the worklet. You can create reusable worklets in the
Worklet Designer. You can also create non-reusable worklets in the Workflow Designer as you
develop the workflow.
1. In the Worklet Designer, choose Worklets-Create. The Create Worklet dialog box
appears.
2.
3. Click OK.
The Worklet Designer creates a Start task in the worklet.
You can promote non-reusable worklets to reusable worklets by selecting the Reusable option
in the worklet properties. To rename non-reusable worklets, open the worklet properties in
the Workflow Designer.
To create a non-reusable worklet:
1.
2. Choose Tasks-Create.
3.
4.
5. Click Create.
The Workflow Designer creates the worklet and adds it to the workspace.
6. Click Done.
Worklet variables. Use worklet variables to reference values and record information. You
use worklet variables the same way you use workflow variables. You can assign a workflow
variable to a worklet variable to override its initial value.
For details on worklet variables, see Using Worklet Variables on page 169.
Events. To use the Event-Wait and Event-Raise tasks in the worklet, you must first declare
an event in the worklet properties.
1.
2.
3.
The Worklet Designer opens so you can add tasks in the worklet.
4. Add tasks in the worklet by using the Tasks toolbar or by choosing Tasks-Create in the
Worklet Designer.
5.
Nesting Worklets
You can nest a worklet within another worklet. When you run a workflow containing nested
worklets, the PowerCenter Server runs the nested worklet from within the parent worklet. You
can group several worklets together by function or simplify the design of a complex workflow
when you nest worklets.
You might choose to nest worklets to load data to fact and dimension tables. Create a nested
worklet to load fact and dimension data into a staging area. Then, create a nested worklet to
load the fact and dimension data from the staging area to the data warehouse.
You might choose to nest worklets to simplify the design of a complex workflow. Nest
worklets that can be grouped together within one worklet. In the workflow in Figure 6-1, two
worklets relate to regional sales and two worklets relate to quarterly sales.
Figure 6-1 shows a workflow that uses multiple worklets:
Figure 6-1. Workflow with Multiple Worklets
The workflow in Figure 6-2 shows the same workflow with the worklets grouped and nested
in parent worklets.
Figure 6-2 shows a workflow that uses nested worklets:
Figure 6-2. Workflow with Nested Worklets
When you run the example workflow shown in Figure 6-3, the persistent worklet variable
retains its value from Worklet1 and becomes the initial value in Worklet2. After the
PowerCenter Server executes Worklet2, it retains the value of the persistent variable in the
repository and uses the value the next time you run the workflow.
Worklet variables only persist when you run the same workflow. A worklet variable does not
retain its value when you use instances of the worklet in different workflows.
2.
3. Click the Open button in the User-Defined Worklet Variables field to select a worklet
variable.
4. Click the Open button in the Parent Workflow Variable field to select a workflow
variable to assign to the worklet variable.
5. Click Apply.
The worklet variable in this worklet instance now has the selected workflow variable as its
initial value.
Validating Worklets
The Workflow Manager validates worklets when you save the worklet in the Worklet
Designer. In addition, when you use worklets in a workflow, the PowerCenter Server validates
the workflow according to the following validation rules at runtime:
You cannot run two instances of the same worklet concurrently in the same workflow.
You cannot run two instances of the same worklet concurrently across two different
workflows.
When a worklet instance is invalid, the workflow using the worklet instance remains valid.
For details on workflow validation rules, see Validating a Workflow on page 119.
The Workflow Manager displays a red invalid icon if the worklet object is invalid. The
Workflow Manager validates the worklet object using the same validation rules for workflows.
The Workflow Manager displays a blue invalid icon if the worklet instance in the workflow is
invalid. The worklet instance may be invalid when any of the following conditions occurs:
The parent workflow or worklet variable you assign to the user-defined worklet variable
does not have a matching datatype.
The user-defined worklet variable you used in the worklet properties does not exist.
You do not specify the parent workflow or worklet variable you want to assign.
For non-reusable worklets, you may see both red and blue invalid icons displayed over the
worklet icon in the Navigator.
Chapter 7
Overview
A session is a set of instructions that tells the PowerCenter Server how and when to move data
from sources to targets. A session is a type of task, similar to other tasks available in the
Workflow Manager. In the Workflow Manager, you configure a session by creating a Session
task. To run a session, you must first create a workflow to contain the Session task.
When you create a Session task, you enter general information such as the session name,
session schedule, and the PowerCenter Server to run the session. You can also select options to
execute pre-session shell commands, send On-Success or On-Failure email, and use FTP to
transfer source and target files.
Using session properties, you can also override parameters established in the mapping, such as
source and target location, source and target type, error tracing levels, and transformation
attributes. When you assign a server in a server grid to a session, the server you specify at the
session level overrides the server you specify at the workflow level.
You can run as many sessions in a workflow as you need. You can run the Session tasks
sequentially or concurrently, depending on your needs.
The PowerCenter Server creates several files and in-memory caches depending on the
transformations and options used in the session. For more details on session output files and
caches, see Output Files and Caches on page 28.
communicate with databases and the PowerCenter Server. You must assign appropriate
permissions for any database, FTP, or external loader connections you configure. For details
on configuring the Workflow Manager, see Configuring the Workflow Manager on page 37.
Session Privileges
To create sessions, you must have one of the following sets of privileges and permissions:
Use Workflow Manager privilege with read, write, and execute permissions
Use Workflow Manager privilege with read and write permissions on the folder
You must have read permission for connection objects associated with the session in addition
to the above privileges and permissions.
PowerCenter allows you to set a read-only privilege for sessions. The Workflow Operator
privilege allows a user to view, start, stop, and monitor sessions without being able to edit
session properties.
1. In the Workflow Designer, click the Session Task icon on the Tasks toolbar.
or
Choose Tasks-Create. Select Session Task for the task type.
2.
3.
4. Select the mapping you want to use in the Session task and click OK.
5.
Editing a Session
After you create a session, you can edit it. For example, you might need to adjust the buffer
and cache sizes, modify the update strategy, or clear a variable value saved in the repository.
Double-click the Session task to open the session properties. The session has the following
tabs, and each of those tabs has multiple settings:
General tab. Enter session name, mapping name, description for the Session task, specify a
PowerCenter Server override, and configure additional task options.
Properties tab. Enter session log information, test load settings, and performance
configuration.
Config Object tab. Enter advanced settings, log options, and error handling
configuration.
Mapping tab. Enter source and target information, override transformation properties,
and configure the session for partitioning.
For a detailed description of the session properties tabs and associated options, see Session
Properties Reference on page 667.
Figure 7-1 shows the session properties:
Figure 7-1. Session Properties
You can edit session properties at any time. The repository updates the session properties
immediately.
If the session is running when you edit the session, the repository updates the session when
the session completes. If the mapping changes, the Workflow Manager might issue a warning
that the session is invalid. The Workflow Manager then allows you to continue editing the
session properties. After you edit the session properties, the PowerCenter Server validates the
session and reschedules the session as necessary. For details on session validation, see
Validating a Session on page 195.
Figure 7-2 shows the writers, connections, and properties settings for a target instance in a
session:
Figure 7-2. Session Target Object Settings
For a target instance, you can change writers, connections, and properties settings.
Table 7-1 shows the options you can use to apply attributes to objects in a session. You can
apply different options depending on whether the setting is a reader or writer, connection, or
an object property.
Table 7-1. Apply All Options
Apply Connection Data. Apply the connection value and its connection attributes to all the
other instances that have the same connection type. This option combines the connection
option and the connection attribute option.
Apply All Connection Information. Applies the connection value and its attributes to all the
other instances even if they do not have the same connection type. This option is similar to
Apply Connection Data, but it allows you to change the connection type.
Figure 7-3 illustrates the connection options by showing where they display on a connection
browser:
Figure 7-3. Connection Options
2.
3. Choose a source, target, or transformation instance from the Navigator. Settings for
properties, connections, and readers or writers might display, depending on the object
you choose.
4.
5. Select an option from the list and choose to apply it to all instances or all partitions.
6.
Click the Browse button in the Config Name field to choose a session configuration. Select a
user-defined or default session configuration object from the browser.
To create a session configuration object:
1.
2.
3.
4. In the Properties tab, configure advanced settings, log options, and error handling
options.
5. Click OK.
For session configuration object settings descriptions, see Config Object Tab on page 675.
You can use any command that is valid for the database type. However, the PowerCenter
Server does not allow nested comments, even though the database might.
You can use mapping parameters and variables in SQL executed against the source, but not
the target.
The PowerCenter Server ignores semicolons within single quotes, double quotes, or
within /* ...*/.
If you need to use a semicolon outside of quotes or comments, you can escape it with a
backslash (\).
Error Handling
You can configure error handling on the Config Object tab. You can choose to stop or
continue the session if the PowerCenter Server encounters an error issuing the pre- or
post-session SQL command.
Figure 7-6 shows how to configure error handling for pre- or post-session SQL commands:
Figure 7-6. Stop or Continue the Session on Pre- or Post-Session SQL Errors
Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch
file for Windows servers.
The Workflow Manager provides a task called the Command task that allows you to specify
shell commands anywhere in the workflow. You can choose a reusable Command task for the
pre- or post-session shell command. Or, you can create non-reusable shell commands for the
pre- or post-session shell commands. For details on the Command task, see Working with
the Command Task on page 143.
If you create a non-reusable pre- or post-session shell command, you can make it into a
reusable Command task.
The Workflow Manager allows you to choose from the following options when you configure
shell commands:
Create non-reusable shell commands. Create a non-reusable set of shell commands for the
session. Other sessions in the folder cannot use this set of shell commands.
Use an existing reusable Command task. Select an existing Command task to run as the
pre- or post-session shell command.
Configure pre- and post-session shell commands in the Components tab of the session
properties.
You cannot use server variables or session variables in standalone Command tasks in the
workflow. The PowerCenter Server does not expand server variables or session variables used
in standalone Command tasks.
Perform the following steps to create pre- or post-session shell commands for a specific
session.
1. In the Components tab of the session properties, select Non-reusable for pre- or
post-session shell command.
2. Click the Edit button in the Value field to open the Edit Pre- or Post-Session Command
dialog box.
3.
4. If you want the PowerCenter Server to perform the next command only if the previous
command completed successfully, select Run If Previous Completed in the Properties tab.
5. In the Commands tab, click the Add button to add shell commands.
Enter one command for each line.
6. Click OK.
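The Run If Previous Completed behavior described in the steps above mirrors ordinary shell chaining with &&, where each command runs only if the previous one exits with status 0. A minimal sketch; the WORK_DIR path and file names are assumptions invented for the example:

```shell
#!/bin/sh
# Sketch of "Run If Previous Completed" semantics expressed as shell
# chaining: each command after && runs only when the previous command
# exits with status 0. All paths and file names are assumptions.

WORK_DIR=/tmp/presession_demo

mkdir -p "$WORK_DIR/archive" \
  && echo "header" > "$WORK_DIR/source.dat" \
  && cp "$WORK_DIR/source.dat" "$WORK_DIR/archive/" \
  && echo "pre-session commands completed"
```

If any command in the chain fails, the commands after it are skipped, just as the option skips the remaining pre- or post-session commands.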
To create a Command Task from non-reusable pre- or post-session shell commands, click the
Edit button to open the Edit dialog box for the shell commands. In the General tab, select the
Make Reusable checkbox.
After you check the Make Reusable checkbox and click OK, a new Command task appears in
the Tasks folder in the Navigator window. You can use this Command task in other
workflows, just as you do with any other reusable workflow tasks.
1. In the Components tab of the session properties, click Reusable for the pre- or
post-session shell command.
2. Click the Edit button in the Value field to open the Task Browser dialog box.
3. Select the Command task you want to run as the pre- or post-session shell command.
4. Click the Override button in the Task Browser dialog box if you want to change the order
of the commands, or if you want to specify whether to run the next command when the
previous command fails.
Changes you make to the Command task from the session properties only apply to the
session. In the session properties, you cannot edit the commands in the Command task.
5. Click OK to select the Command task for the pre- or post-session shell command.
The name of the Command task you select appears in the Value field for the shell
command.
Stop or continue the session on pre-session shell command error.
On-Success Email. The PowerCenter Server sends the email when the session completes
successfully.
On-Failure Email. The PowerCenter Server sends the email when the session fails.
You can also use an Email task to send email anywhere in the workflow. If you already created
a reusable Email task, you can select it as the On-Success or On-Failure email for the session.
Or, you can create non-reusable emails that exist only within the Session task.
For more information about sending post-session emails, see Sending Email on page 319.
Validating a Session
The Workflow Manager validates a Session task when you save it. You can also manually
validate Session tasks and session instances. Validate reusable Session tasks in the Task
Developer. Validate non-reusable sessions and reusable session instances in the Workflow
Designer.
The Workflow Manager marks a reusable session or session instance invalid if you perform
one of the following tasks:
Edit the mapping in a way that might invalidate the session. You can edit the mapping
used by a session at any time. When you edit and save a mapping, the repository might
invalidate sessions that already use the mapping. The PowerCenter Server does not execute
invalid sessions.
You must reconnect to the folder to see the effect of mapping changes on Session tasks. For
details on validating mappings, see Mappings in the Designer Guide.
When you edit a session based on an invalid mapping, the Workflow Manager displays a
warning message:
The mapping [mapping_name] associated with the session [session_name] is
invalid.
Leave session attributes blank. For example, the session is invalid if you do not specify the
source file name.
Change the code page of a session database connection to an incompatible code page.
If you delete objects associated with a Session task such as session configuration object, Email,
or Command task, the Workflow Manager marks a reusable session invalid. However, the
Workflow Manager does not mark a non-reusable session invalid if you delete an object
associated with the session.
If you delete a shortcut to a source or target from the mapping, the Workflow Manager does
not mark the session invalid.
The Workflow Manager does not validate SQL overrides or filter conditions entered in the
session properties when you validate a session. You must validate SQL override and filter
conditions in the SQL Editor.
If a reusable session task is invalid, the Workflow Manager displays an invalid icon over the
session task in the Navigator and in the Task Developer workspace. This does not affect the
validity of the session instance and the workflows using the session instance.
If a reusable or non-reusable session instance is invalid, the Workflow Manager marks it
invalid in the Navigator and in the Workflow Designer workspace. Workflows using the
session instance remain valid.
To validate a session, select the session in the workspace and choose Tasks-Validate. Or,
right-click the session instance in the workspace and choose Validate.
2.
3.
Choose whether to save objects and check in objects that you validate.
2. Double-click the session in the workflow. The Edit Tasks dialog box appears.
3. Click the Select Server button on the General tab. A list of registered servers appears.
4.
5.
Instead of choosing a server for each session in the folder, you can assign multiple sessions to
a server.
2.
3. From the Choose Server list, select the server you want to assign.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view
workflows in all folders in the repository.
5.
6.
7. Click Assign.
You can remove an assigned server from a session in the Assign Server dialog box. Perform the
following steps to remove an assigned server from a session.
To remove an assigned server:
1.
2.
3.
4. From the Show Folder list, select the folder you want to view. Or, choose All to view
workflows in all folders in the repository.
5. Select the sessions from which you want to remove the assigned server.
6. Click Assign.
Threshold Errors
You can choose to stop a session on a designated number of non-fatal errors. A non-fatal error
is an error that does not force the session to stop on its first occurrence. Establish the error
threshold in the session properties with the Stop On option. When you enable this option,
the PowerCenter Server counts non-fatal errors that occur in the reader, writer, and
transformation threads.
The PowerCenter Server maintains an independent error count when reading sources,
transforming data, and writing to targets. The PowerCenter Server counts the following
non-fatal errors when you set the Stop On option in the session properties:
Reader errors. Errors encountered by the PowerCenter Server while reading the source
database or source files. Reader threshold errors can include alignment errors while
running a session in Unicode mode.
Writer errors. Errors encountered by the PowerCenter Server while writing to the target
database or target files. Writer threshold errors can include key constraint violations,
loading nulls into a not null field, and database trigger responses.
When you create multiple partitions in a pipeline, the PowerCenter Server maintains a
separate error threshold for each partition. When the PowerCenter Server reaches the error
threshold for any partition, it stops the session. The writer may continue writing data from
one or more partitions, but it does not affect your ability to perform a successful recovery.
Note: If alignment errors occur in a non line-sequential VSAM file, the PowerCenter Server
Fatal Error
A fatal error occurs when the PowerCenter Server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation and the PowerCenter Server loses connection to the repository, the PowerCenter Server cannot update the sequence values in the repository, and a fatal error occurs.
If the session does not use a Normalizer or Sequence Generator transformation, and the PowerCenter Server loses connection to the repository, the PowerCenter Server does not stop the session. The session completes, but the PowerCenter Server cannot log session statistics into the repository.
ABORT Function
Use the ABORT function in the mapping logic to abort a session when the PowerCenter
Server encounters a designated transformation error.
For more information about ABORT, see Functions in the Transformation Language
Reference.
User Command
You can stop or abort the session from the Workflow Manager. You can also stop the session
using pmcmd.
1. In the Navigator window of the Workflow Manager, right-click the Session task and select View Persistent Values.
2.
3.
The precision attributed to a number also includes the scale of the number. For example, the
value 11.47 has a precision of 4 and a scale of 2.
For example, you might have a mapping with Decimal (20,0) that passes the number 40012030304957666903. If you enable high precision, the PowerCenter Server passes the number as is. If you do not enable high precision, the PowerCenter Server passes 4.00120303049577 x 10^19.
If you want to process a Decimal value with a precision greater than 28 digits, the PowerCenter Server automatically treats it as a Double value. For example, if you want to process the number 2345678904598383902092.1927658, which has a precision of 29 digits, the PowerCenter Server automatically treats this number as a Double value of 2.34567890459838 x 10^21.
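The precision loss described above can be illustrated outside PowerCenter with Python's decimal module: a 20-digit value survives intact as a high-precision decimal, but handled as a double it loses the digits beyond roughly 15 significant figures.

```python
from decimal import Decimal, getcontext

value = "40012030304957666903"     # the Decimal (20,0) value from the example above

getcontext().prec = 28             # analogous to high precision: up to 28 digits
high_precision = Decimal(value)    # passed through unchanged

as_double = float(value)           # without high precision: double, trailing digits lost

print(high_precision)              # 40012030304957666903
print(as_double)                   # ~4.00120303049577e+19 (digits beyond double precision are lost)
```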
To use high precision data handling in a session:
1.
2. Enable High Precision.
3.
Chapter 8
Working with Sources
Overview, 208
Overview
In the Workflow Manager, you can create sessions with the following sources:
Relational. You can extract data from any relational database that the PowerCenter Server
can connect to. When extracting data from relational sources and Application sources, you
must configure the database connection to the data source prior to configuring the session.
File. You can create a session to extract data from a flat file, COBOL, or XML source. The
PowerCenter Server can extract data from any local directory or FTP connection for the
source file. If the file source requires an FTP connection, you need to configure the FTP
connection to the host machine before you create the session.
Heterogeneous. You can extract data from multiple sources in the same session. You can
extract from multiple relational sources, such as Oracle and SQL Server. Or, you can
extract from multiple source types, such as relational and flat file. When you configure a
session with heterogeneous sources, configure each source instance separately.
Globalization Features
You can choose a code page that you want the PowerCenter Server to use for relational sources
and flat files. You specify code pages for relational sources when you configure database
connections in the Workflow Manager. You can set the code page for file sources in the session
properties. For more information about code pages, see Globalization Overview in the
Installation and Configuration Guide.
Source Connections
Before you can extract data from a source, you must configure the connection properties the
PowerCenter Server uses to connect to the source file or database. You can configure source
database and FTP connections in the Workflow Manager.
For more information on creating database connections, see Configuring the Workflow
Manager on page 37. For more information on creating FTP connections, see Using FTP
on page 559.
additional memory blocks. If the PowerCenter Server cannot allocate enough memory blocks
to hold the data, it fails the session.
For more information on allocating buffer memory, see Optimizing the Session on
page 655.
Partitioning Sources
You can create multiple partitions for relational, Application, and file sources. For relational
or Application sources, the PowerCenter Server creates a separate connection to the source
database for each partition you set in the session properties. For file sources, you can
configure the session to read the source with one thread or multiple threads.
For more information on partitioning data, see Pipeline Partitioning on page 345.
The Sources node lists the sources used in the session and displays their settings. To view and
configure settings for a source, select the source from the list. You can configure the following
settings for a source:
Readers
Connections
Properties
Configuring Readers
You can click the Readers settings on the Sources node to view the reader the PowerCenter
Server uses with each source instance. The Workflow Manager specifies the necessary reader
for each source instance in the Readers settings on the Sources node.
Figure 8-2 shows the Readers settings in the Sources node of the Mapping tab:
Figure 8-2. Readers Settings in the Sources Node of the Mapping Tab
Configuring Connections
Click the Connections settings on the Sources node to define source connection information.
Figure 8-3 shows the Connections settings in the Sources node of the Mapping tab:
Figure 8-3. Connections Settings in the Sources Node
For relational sources, choose a configured database connection in the Value column for each
relational source instance. By default, the Workflow Manager displays the source type for
relational sources. For details on configuring database connections, see Selecting the Source
Database Connection on page 214.
For flat file and XML sources, choose one of the following source connection types in the
Type column for each source instance:
FTP. If you want to read data from a flat file or XML source using FTP, you must specify
an FTP connection when you configure source options. You must define the FTP
connection in the Workflow Manager prior to configuring the session.
You must have read permission for any FTP connection you want to associate with the
session. The user starting the session must have execute permission for any FTP
connection associated with the session. For details on using FTP, see Using FTP on
page 559.
None. Choose None when you want to read from a local flat file or XML file.
Configuring Properties
Click the Properties settings in the Sources node to define source property information. The Workflow Manager displays properties, such as source file name and location, for flat file, COBOL, and XML source file types. You do not need to define any properties on the Properties settings for relational sources.
Figure 8-4 shows the Properties settings in the Sources node of the Mapping tab:
Figure 8-4. Properties Settings in the Sources Node of the Mapping Tab
For more information on configuring sessions with relational sources, see Working with
Relational Sources on page 214. For more information on configuring sessions with flat file
sources, see Working with File Sources on page 218. For more information on configuring
sessions with XML sources, see the XML User Guide.
Source database connection. Select the database connection for each relational source. For
more information, see Selecting the Source Database Connection on page 214.
Treat source rows as. Define how the PowerCenter Server treats each source row as it reads
it from the source table. For more information, see Defining the Treat Source Rows As
Property on page 214.
Table owner name. Define the table owner name for each relational source. For more
information, see Configuring the Table Owner Name on page 216.
Override SQL query. You can override the default SQL query to extract source data. For
more information, see Overriding the SQL Query on page 216.
Figure 8-5 shows the Treat Source Rows As property on the General Options settings:
Figure 8-5. Treat Source Rows As Property
Table 8-1 describes the options you can choose for the Treat Source Rows As property:
Table 8-1. Treat Source Rows As Options
Insert. The PowerCenter Server marks all rows to insert into the target.
Delete. The PowerCenter Server marks all rows to delete from the target.
Update. The PowerCenter Server marks all rows to update the target. You can further define the update operation in the target options. For more information, see Target Properties on page 241.
Data Driven. The PowerCenter Server uses the Update Strategy transformations in the mapping to determine the operation on a row-by-row basis. You define the update operation in the target options. If the mapping contains an Update Strategy transformation, this option defaults to Data Driven. You can also use this option when the mapping contains Custom transformations configured to set the update strategy.
Once you determine how to treat all rows in the session, you also need to set update strategy
options for individual targets. For more information on setting the target update strategy
options, see Target Properties on page 241.
For more information on setting the update strategy for a session, see Update Strategy
Transformation in the Transformation Guide.
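The options above can be sketched in code. This is a hypothetical model, not PowerCenter internals: Insert, Delete, and Update flag every row the same way, while Data Driven defers to per-row logic standing in for an Update Strategy transformation.

```python
INSERT, DELETE, UPDATE = "insert", "delete", "update"

def mark_rows(rows, treat_source_rows_as, strategy=None):
    """Pair each source row with the operation the server would mark it with."""
    if treat_source_rows_as == "Data Driven":
        # strategy plays the role of an Update Strategy transformation
        return [(row, strategy(row)) for row in rows]
    op = {"Insert": INSERT, "Delete": DELETE, "Update": UPDATE}[treat_source_rows_as]
    return [(row, op) for row in rows]

rows = [{"id": 1, "new": True}, {"id": 2, "new": False}]
all_inserts = mark_rows(rows, "Insert")
data_driven = mark_rows(rows, "Data Driven",
                        strategy=lambda r: INSERT if r["new"] else UPDATE)
```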
Figure 8-7 shows the Properties settings in the Sources node where you can override the SQL
query:
Figure 8-7. SQL Query Override Property in the Session Properties
2.
3.
4. Click the Open button in the SQL Query field to open the SQL Editor.
5.
6.
Source properties. You can define source properties on the Properties settings in the
Sources node, such as source file options. For more information, see Configuring Source
Properties on page 218.
Flat file properties. You can edit fixed-width and delimited source file properties. For
more information, see Configuring Fixed-Width File Properties on page 220 and
Configuring Delimited File Properties on page 222.
Line sequential buffer length. You can change the buffer length for flat files on the
Advanced settings on the Config Object tab. For more information, see Configuring Line
Sequential Buffer Length on page 225.
Treat source rows as. Define how the PowerCenter Server treats each source row as it reads
it from the source. For more information, see Defining the Treat Source Rows As
Property on page 214.
Figure 8-8 shows the flat file source properties you define in the Properties settings of the
Sources node on the Mapping tab:
Figure 8-8. Properties Settings in the Sources Node for a Flat File Source
Table 8-2 describes the properties you define on the Properties settings for flat file source definitions:
Table 8-2. Flat File Source Properties
Source File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server looks in the server variable directory, $PMSourceFileDir, for file sources. If you specify both the directory and file name in the Source Filename field, clear this field. The PowerCenter Server concatenates this field with the Source Filename field when it runs the session. You can also use the $InputFileName session parameter to specify the file directory. For details on session parameters, see Session Parameters on page 495.
Source Filename (Required). Enter the file name, or file name and path. Optionally use the $InputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Source File Directory field when it runs the session. For example, if you have C:\data\ in the Source File Directory field, then enter filename.dat in the Source Filename field. When the PowerCenter Server begins the session, it looks for C:\data\filename.dat. By default, the Workflow Manager enters the file name configured in the source definition. For details on session parameters, see Session Parameters on page 495.
Source Filetype (Required).
Set File Properties (Optional). Opens a dialog box that allows you to override source file properties. By default, the Workflow Manager displays file properties as configured in the source definition. For more information, see Configuring Fixed-Width File Properties on page 220 and Configuring Delimited File Properties on page 222.
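The directory/filename lookup described above can be sketched as plain string concatenation, which is why the directory value should end with a path separator (for example, C:\data\). This is an illustrative helper, not a PowerCenter function.

```python
def resolve_source_file(directory, filename):
    """Concatenate Source File Directory with Source Filename, mirroring Table 8-2."""
    return directory + filename

# Directory plus file name, as in the example above:
print(resolve_source_file("C:\\data\\", "filename.dat"))   # C:\data\filename.dat

# File name and path with the directory field cleared:
print(resolve_source_file("", "d:\\data\\eastern_trans.dat"))
```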
To edit the fixed-width properties, select Fixed Width and click Advanced. The Fixed-Width
Properties dialog box appears. By default, the Workflow Manager displays file properties as
configured in the mapping. Edit these settings to override those configured in the source
definition.
Figure 8-10 shows the Fixed-Width Properties dialog box:
Figure 8-10. Fixed-Width File Properties Dialog Box
Table 8-3 describes options you can define in the Fixed-Width Properties dialog box for file sources:
Table 8-3. Fixed-Width File Properties for File Sources
Text/Binary (Required). Indicates the character representing a null value in the file. This can be any valid character in the file code page, or any binary value from 0 to 255. For more information about specifying null characters, see Null Character Handling on page 227.
Repeat Null Character (Optional).
Code Page (Required). Select the code page of the fixed-width file. The default setting is the client code page.
Number of Initial Rows to Skip (Optional). The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip header rows. One row may contain multiple records. If you select the Line Sequential File Format option, the PowerCenter Server ignores this option.
Number of Bytes to Skip Between Records (Optional).
Strip Trailing Blanks (Optional). If selected, the PowerCenter Server strips trailing blank spaces from records before passing them to the Source Qualifier transformation.
Line Sequential File Format (Optional). Select this option if the file uses a carriage return at the end of each record, shortening the final column.
To edit the delimited properties, select Delimited and click Advanced. The Delimited File
Properties dialog box appears. By default, the Workflow Manager displays file properties as
configured in the mapping. Edit these settings to override those configured in the source
definition.
Figure 8-12 shows the Delimited File Properties dialog box:
Figure 8-12. Delimited File Properties Dialog Box
Table 8-4 describes options you can define in the Delimited File Properties dialog box for file sources:
Table 8-4. Delimited File Properties for File Sources
Delimiters (Required). Character used to separate columns of data in the source file. Use the button to the right of this field to enter a different delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters. The delimiter must be in the same code page as the flat file code page.
Treat Consecutive Delimiters as One (Optional).
Optional Quotes (Required).
Code Page (Required). Select the code page of the delimited file. The default setting is the client code page.
Escape Character (Optional).
Remove Escape Character From Data (Optional). This option is selected by default. Clear this option to include the escape character in the output string.
Number of Initial Rows to Skip (Optional). The PowerCenter Server skips the specified number of rows before reading the file. Use this to skip title or header rows in the file.
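A rough analogue of the options above can be shown with Python's csv module (not PowerCenter itself): a delimiter, an optional quote character, an escape character that is removed from the data, and a number of initial rows to skip. The sample data is invented for illustration.

```python
import csv
import io
from itertools import islice

data = 'name|city\n"Doe, Jane"|Boston\nBob\\|Smith|Chicago\n'

reader = csv.reader(
    islice(io.StringIO(data), 1, None),   # Number of Initial Rows to Skip = 1
    delimiter="|",                        # Delimiters
    quotechar='"',                        # Optional Quotes
    escapechar="\\",                      # Escape Character (removed from data by default)
)
rows = list(reader)
# rows: [['Doe, Jane', 'Boston'], ['Bob|Smith', 'Chicago']]
```

The quoted field keeps its embedded comma, and the escaped delimiter in `Bob\|Smith` is read as a literal pipe with the escape character stripped.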
Character set
Tab handling
Character Set
You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data
movement mode.
Table 8-5 describes source file formats supported by each data movement path in PowerCenter:
Table 8-5. Support for ASCII and Unicode Data Movement Modes
7-bit ASCII: Supported in Unicode mode; supported in ASCII mode.
US-EBCDIC (COBOL sources only): Supported in Unicode mode; supported in ASCII mode.
8-bit ASCII: Supported in Unicode mode; supported in ASCII mode.
8-bit EBCDIC (COBOL sources only): Supported in Unicode mode; supported in ASCII mode.
ASCII-based MBCS: Supported in Unicode mode.
EBCDIC-based MBCS: Supported in Unicode mode.
If you configure a session to run in ASCII data movement mode, delimiters, escape
characters, and null characters must be valid in the ISO Western European Latin 1 code page.
Any 8-bit characters you specified in previous versions of PowerCenter are still valid. In
Unicode data movement mode, delimiters, escape characters, and null characters must be
valid in the specified code page of the flat file.
For more information about configuring and working with data movement modes, see
Globalization Overview in the Installation and Configuration Guide.
Non-line sequential file. The PowerCenter Server skips rows containing misaligned data
and resumes reading the next row. The skipped row appears in the session log with a
corresponding error message. If an alignment error occurs at the end of a row, the
PowerCenter Server skips both the current row and the next row, and writes them to the
session log.
Line sequential file. The PowerCenter Server skips rows containing misaligned data and
resumes reading the next row. The skipped row appears in the session log with a
corresponding error message.
Reader error threshold. You can configure a session to stop after a specified number of
non-fatal errors. A row containing an alignment error increases the error count by 1. The
session stops if the number of rows containing errors reaches the threshold set in the
session properties. Errors and corresponding error messages appear in the session log file.
Fixed-width COBOL sources are always byte-oriented and can be line sequential. The
PowerCenter Server handles COBOL files according to the following guidelines:
Line sequential files. The PowerCenter Server skips rows containing misaligned data and
writes the skipped rows to the session log. The session stops if the number of error rows
reaches the error threshold.
Non-line sequential files. The session stops at the first row containing misaligned data.
Table 8-6 describes how the PowerCenter Server uses the Null Character and Repeat Null Character properties to determine if a column is null:
Table 8-6. Null Character Handling
Binary null character, Repeat Null Character disabled. A column is null if the first byte in the column is the binary null character. The PowerCenter Server reads the rest of the column as text data only to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Non-binary null character, Repeat Null Character disabled. A column is null if the first character in the column is the null character. The PowerCenter Server reads the rest of the column only to determine the column alignment and track the shift state for shift sensitive code pages. If data in the column is misaligned, the PowerCenter Server skips the row and writes the skipped row and a corresponding error message to the session log.
Binary null character, Repeat Null Character enabled. A column is null if it contains only the specified binary null character. The next column inherits the initial shift state of the code page.
Non-binary null character, Repeat Null Character enabled. A column is null if the repeating null character fits into the column exactly, with no bytes leftover. For example, a five-byte column is not null if you specify a two-byte repeating null character. In shift-sensitive code pages, shift bytes do not affect the null value of a column. A column is still null if it contains a shift byte at the beginning or end of the column. Informatica recommends you specify a single-byte null character if you use repeating non-binary null characters. This ensures that repeating null characters fit into a column exactly.
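The repeating-null-character rule above can be sketched as a simple check: with repeating enabled, a column is null only if the null character repeats to fill the column exactly, with no bytes left over. This is an illustrative model, not PowerCenter code, and it ignores shift bytes.

```python
def is_null_column(column_bytes, null_char, repeat_enabled):
    """Return True if the fixed-width column reads as NULL under Table 8-6's rules."""
    if not repeat_enabled:
        # non-repeating: only the leading character matters
        return column_bytes[: len(null_char)] == null_char
    if len(column_bytes) % len(null_char) != 0:
        return False                  # e.g. a 2-byte null character in a 5-byte column
    reps = len(column_bytes) // len(null_char)
    return column_bytes == null_char * reps

# A five-byte column is not null with a two-byte repeating null character:
print(is_null_column(b"ABABA", b"AB", repeat_enabled=True))   # False
print(is_null_column(b"ABAB", b"AB", repeat_enabled=True))    # True
```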
The file is fixed-width line-sequential with a carriage return or line feed that appears
sooner than expected.
The file is fixed-width non-line sequential, and the last line in the file is shorter than
expected.
In these cases, the PowerCenter Server reads the data but does not append any blanks to fill
the remaining bytes. The PowerCenter Server reads subsequent fields as NULL. Fields
containing repeating null characters that do not fill the entire field length are not considered
NULL.
The PowerCenter Server performs incremental aggregation across all listed source files.
The file list is a text file that contains one file name, or path and file name, for each line.
The PowerCenter Server skips blank lines and ignores leading blank spaces. Any characters
indicating a new line, such as \n in ASCII files, must be valid in the code page of the
PowerCenter Server.
The following example shows a valid file list created for a PowerCenter Server on Windows.
Each of the drives listed are mapped on the server machine. The western_trans.dat file is
located in the same directory as the file list.
western_trans.dat
d:\data\eastern_trans.dat
e:\data\midwest_trans.dat
f:\data\canada_trans.dat
Once you create the file list, place it in a directory local to the PowerCenter Server.
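Reading a file list as described above can be sketched in a few lines: skip blank lines and ignore leading blank spaces on each line. This is an illustrative reader, not PowerCenter code.

```python
def read_file_list(text):
    """Return the file names from a file list, one per line."""
    files = []
    for line in text.splitlines():
        name = line.lstrip(" ")       # ignore leading blank spaces
        if name:                      # skip blank lines
            files.append(name)
    return files

file_list = """western_trans.dat

  d:\\data\\eastern_trans.dat
e:\\data\\midwest_trans.dat
"""
print(read_file_list(file_list))
```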
2.
3.
4.
5. In the Source Filename field, replace the file name with the name of the file list.
If necessary, also enter the path in the Source File Directory field.
If you enter only a file name in the Source Filename field, and you have specified a path
in the Source File Directory field, the PowerCenter Server looks for the named file in the
listed directory.
If you enter only a file name in the Source Filename field, and you do not specify a path
in the Source File Directory field, the PowerCenter Server looks for the named file in the
directory where the PowerCenter Server is installed on UNIX or in the system directory
on Windows.
6. Click OK.
Chapter 9
Working with Targets
Overview, 234
Overview
In the Workflow Manager, you can create sessions with the following targets:
Relational. You can load data to any relational database that the PowerCenter Server can
connect to. When loading data to relational targets, you must configure the database
connection to the target before you configure the session.
File. You can load data to a flat file or XML target. The PowerCenter Server can load data
to any local directory or FTP connection for the target file. If the file target requires an
FTP connection, you need to configure the FTP connection to the host machine before
you create the session.
Heterogeneous. You can output data to multiple targets in the same session. You can
output to multiple relational targets, such as Oracle and Microsoft SQL Server. Or, you
can output to multiple target types, such as relational and flat file. For more information,
see Working with Heterogeneous Targets on page 274.
Globalization Features
You can configure the PowerCenter Server to run sessions in either ASCII or Unicode data
movement mode.
Table 9-1 describes target character sets supported by each data movement mode in PowerCenter:
Table 9-1. Support for ASCII and Unicode Data Movement Modes
7-bit ASCII: Supported in Unicode mode; supported in ASCII mode.
8-bit ASCII: Supported in Unicode mode; supported in ASCII mode.
ASCII-based MBCS: Supported in Unicode mode.
UTF-8: Supported in Unicode mode.
PowerCenter allows you to work with targets that use multibyte character sets. You can choose
a code page that you want the PowerCenter Server to use for relational objects and flat files.
You specify code pages for relational objects when you configure database connections in the
Workflow Manager. The code page for a database connection used as a target must be a
superset of the repository code page.
When you change the database connection code page to one that is not two-way compatible
with the old code page, the Workflow Manager generates a warning and invalidates all
sessions that use that database connection.
Code pages you select for a file represent the code page of the data contained in these files. If
you are working with flat files, you can also specify delimiters and null characters supported
by the code page you have specified for the file.
Target code pages must be a superset of the repository code page. They must also be a superset
of the source code page and the PowerCenter Server code page.
However, if you configure the PowerCenter Server and Client for relaxed code page
validation, you can select any code page supported by PowerCenter for the target database
connection. When using relaxed code page validation, select compatible code pages for the
source and target data to prevent data inconsistencies. For more information about code page
compatibility, see Globalization Overview in the Installation and Configuration Guide.
If the target contains multibyte character data, configure the PowerCenter Server to run in
Unicode mode. When the PowerCenter Server runs a session in Unicode mode, it uses the
database code page to translate data.
If the target contains only single-byte characters, configure the PowerCenter Server to run in
ASCII mode. When the PowerCenter Server runs a session in ASCII mode, it does not
validate code pages.
Target Connections
Before you can load data to a target, you must configure the connection properties the
PowerCenter Server uses to connect to the target file or database. You can configure target
database and FTP connections in the Workflow Manager.
For details on creating database connections, see Setting Up a Relational Database
Connection on page 53. For details on creating FTP connections, see Using FTP on
page 559.
Partitioning Targets
When you create multiple partitions in a session with a relational target, the PowerCenter
Server creates multiple connections to the target database to write target data concurrently.
When you create multiple partitions in a session with a file target, the PowerCenter Server
creates one target file for each partition. You can configure the session properties to merge
these target files.
For details on configuring a session for pipeline partitioning, see Pipeline Partitioning on
page 345.
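The merge behavior described above can be sketched as follows. This is a hypothetical illustration, not the server's implementation: each partition produces its own output, and the merge concatenates the partition outputs in partition order into a single target.

```python
def merge_partition_files(partition_outputs):
    """Concatenate per-partition output lines, preserving partition order."""
    merged = []
    for lines in partition_outputs:
        merged.extend(lines)
    return merged

part1 = ["row1\n", "row2\n"]     # written by partition 1
part2 = ["row3\n"]               # written by partition 2
merged = merge_partition_files([part1, part2])
# merged: ['row1\n', 'row2\n', 'row3\n']
```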
The Targets node contains the following settings where you define properties:
Writers
Connections
Properties
Configuring Writers
Click the Writers settings in the Transformations view to define the writer to use with each
target instance.
Figure 9-2 shows the Writers settings, where you define the writer to use with each target instance:
Figure 9-2. Writers Settings on the Mapping Tab of the Session Properties
When the mapping target is a flat file, an XML file, an SAP BW target, or an IBM MQSeries
target, the Workflow Manager specifies the necessary writer in the session properties.
However, when the target in the mapping is relational, you can change the writer type to File
Writer if you plan to use an external loader.
Note: You can change the writer type for non-reusable sessions in the Workflow Designer and
for reusable sessions in the Task Developer. You cannot change the writer type for instances of
reusable sessions in the Workflow Designer.
When you override a relational target to use the file writer, the Workflow Manager changes
the properties for that target instance on the Properties settings. It also changes the
connection options you can define in the Connections settings.
After you override a relational target to use a file writer, define the file properties for the
target. Click Set File Properties and choose the target to define. For more information, see
Configuring Fixed-Width Properties on page 265 and Configuring Delimited Properties
on page 266.
Configuring Connections
View the Connections settings on the Mapping tab to define target connection information.
Figure 9-3 shows the Connections settings on the Mapping tab of the session properties:
Figure 9-3. Connections Settings on the Mapping Tab of the Session Properties
For relational targets, the Workflow Manager displays Relational as the target type by default.
In the Value column, choose a configured database connection for each relational target
instance. For details on configuring database connections, see Target Database Connection
on page 241.
For flat file and XML targets, choose one of the following target connection types in the Type
column for each target instance:
FTP. If you want to load data to a flat file or XML target using FTP, you must specify an
FTP connection when you configure target options. FTP connections must be defined in
the Workflow Manager prior to configuring sessions.
You must have read permission for any FTP connection you want to associate with the
session. The user starting the session must have execute permission for any FTP
connection associated with the session. For details on using FTP, see Using FTP on
page 559.
Loader. You can use the external loader option to improve the load speed to Oracle, DB2,
Sybase IQ, or Teradata target databases.
To use this option, you must use a mapping with a relational target definition and choose
File as the writer type on the Writers settings for the relational target instance. The
PowerCenter Server uses an external loader to load target files to the Oracle, DB2, Sybase IQ, or Teradata database. You cannot choose external loader if the target is defined in the mapping as a flat file, XML, MQ, or SAP BW target.
For details on using the external loader feature, see External Loading on page 523.
Queue. Choose Queue when you want to output to an IBM MQSeries message queue. For
details, see the PowerCenter Connect for IBM MQSeries User and Administrator Guide.
None. Choose None when you want to write to a local flat file or XML file.
Configuring Properties
View the Properties settings on the Mapping tab to define target property information. The
Workflow Manager displays different properties for the different target types: relational, flat
file, and XML.
Figure 9-4 shows the Properties settings on the Mapping tab:
Figure 9-4. Properties Settings on the Mapping Tab of the Session Properties
For more information on relational target properties, see Working with Relational Targets
on page 240. For more information on flat file target properties, see Working with File
Targets on page 261. For more information on XML target properties, see Working with
Heterogeneous Targets on page 274.
For more information on configuring sessions with multiple target types, see Working with
Heterogeneous Targets on page 274.
Target properties. You can define target properties such as target load type, target update
options, and reject options. For more information, see Target Properties on page 241.
Truncate target tables. The PowerCenter Server can truncate target tables before loading
data. For more information, see Truncating Target Tables on page 245.
Deadlock retry. You can configure the session to retry deadlocks when writing to targets.
For more information, see Deadlock Retry on page 246.
Drop and recreate indexes. Use pre- and post-session SQL to drop and recreate an index
on a relational target table to optimize query speed. For more information, see Dropping
and Recreating Indexes on page 248.
Constraint-based loading. The PowerCenter Server can load data to targets based on
primary key-foreign key constraints and active sources in the session mapping. For more
information, see Constraint-Based Loading on page 248.
Bulk loading. You can specify bulk mode when loading to DB2, Microsoft SQL Server,
Oracle, and Sybase databases. For more information, see Bulk Loading on page 252.
You can define the following properties in the session and override the properties you define
in the mapping:
- Table name prefix. You can specify the target owner name or prefix in the session properties to override the table name prefix in the mapping. For more information, see Table Name Prefix on page 254.
- Pre-session SQL. You can create SQL commands and execute them in the target database before loading data to the target. For example, you might want to drop the index for the target table before loading data into it. For more information, see Using Pre- and Post-Session SQL Commands on page 186.
- Post-session SQL. You can create SQL commands and execute them in the target database after loading data to the target. For example, you might want to recreate the index for the target table after loading data into it. For more information, see Using Pre- and Post-Session SQL Commands on page 186.
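As an illustration, the drop/recreate index example above might use pre- and post-session SQL like the following. The index and table names are hypothetical, and the exact DDL syntax varies by database:

```sql
-- Pre-session SQL: drop the index before loading (hypothetical names)
DROP INDEX idx_orders_cust;

-- Post-session SQL: recreate the index after the load completes
CREATE INDEX idx_orders_cust ON t_orders (customer_id);
```

Enter each statement in the Pre SQL and Post SQL properties for the target instance.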
If any target table or column name contains a database reserved word, you can create and
maintain a reserved words file containing database reserved words. When the PowerCenter
Server executes SQL against the database, it places quotes around the reserved words. For
more information, see Reserved Words on page 255.
When the PowerCenter Server runs a session with at least one relational target, it performs
database transactions per target connection group. For example, it commits all data to targets
in a target connection group at the same time. For more information, see Working with
Target Connection Groups on page 257.
Target Properties
You can configure session properties for relational targets in the Transformations view on the
Mapping tab, and in the General Options settings on the Properties tab. Define the properties
for each target instance in the session.
When you click the Transformations view on the Mapping tab, you can view and configure
the settings of a specific target. Select the target under the Targets node.
Figure 9-5 shows the relational target properties you define in the Properties settings on the
Mapping tab:
Figure 9-5. Properties Settings on the Mapping Tab for a Relational Target
Table 9-2 describes the properties available in the Properties settings on the Mapping tab of
the session properties:
Table 9-2. Relational Target Properties
- Target Load Type (Required). Specify normal or bulk mode. For more information, see Bulk Loading on page 252.
- Insert* (Optional). If selected, the PowerCenter Server inserts all rows flagged for insert. By default, this option is selected.
- Update (as Update)* (Optional). If selected, the PowerCenter Server updates all rows flagged for update. By default, this option is selected.
- Update (as Insert)* (Optional). If selected, the PowerCenter Server inserts all rows flagged for update. By default, this option is not selected.
- Update (else Insert)* (Optional). If selected, the PowerCenter Server updates rows flagged for update if they exist in the target, then inserts any remaining rows marked for insert. By default, this option is not selected.
- Delete* (Optional). If selected, the PowerCenter Server deletes all rows flagged for delete. By default, this option is selected.
- Truncate Table (Optional). If selected, the PowerCenter Server truncates the target table before loading data. For more information, see Truncating Target Tables on page 245.
- Reject File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see Session Parameters on page 495.
- Reject Filename (Required). Enter the file name, or file name and path. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. Optionally use the $BadFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Reject File Directory field when it runs the session. For example, if you have C:\reject_file\ in the Reject File Directory field, and enter filename.bad in the Reject Filename field, the PowerCenter Server writes rejected rows to C:\reject_file\filename.bad. For details on session parameters, see Session Parameters on page 495.

*For details on target update strategies, see Update Strategy Transformation in the Transformation Guide.
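As a sketch of the directory/filename concatenation described for the reject file fields, including $BadFileName-style session parameter resolution (the function and parameter names are illustrative, not part of the product):

```python
def reject_file_path(directory, filename, session_params=None):
    # Resolve session parameters from the parameter file before
    # concatenating, as described for session parameters.
    params = session_params or {}
    directory = params.get(directory, directory)
    filename = params.get(filename, filename)
    # The server concatenates the Reject File Directory field with
    # the Reject Filename field.
    return directory + filename

# The manual's example: C:\reject_file\ plus filename.bad
print(reject_file_path("C:\\reject_file\\", "filename.bad"))
# C:\reject_file\filename.bad
```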
Figure 9-6 shows the test load options in the General Options settings on the Properties tab:
Figure 9-6. Test Load Options
Table 9-3 describes the test load options on the General Options settings on the Properties
tab:
Table 9-3. Test Load Options
- Enable Test Load (Optional). Select this option to perform a test load.
- Number of Rows to Test (Optional). Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the exact number you configure for the test load.
The truncate command the PowerCenter Server issues depends on the target database type: DB2, Informix, ODBC, Oracle, or Sybase 11.x.
*If you use a DB2 database on AS/400, the PowerCenter Server issues a clrpfm command.
**If you use the Microsoft SQL Server ODBC driver, the PowerCenter Server issues a delete statement.
If the PowerCenter Server issues a truncate target table command and the target table instance
specifies a table name prefix, the PowerCenter Server verifies the database user privileges for
the target table by issuing a truncate command. If the database user is not specified as the
target owner name or does not have the database privilege to truncate the target table, the
PowerCenter Server automatically issues a delete command instead and writes the following
error message to the session log:
WRT_8208 Error truncating target table <target table name> trying DELETE
FROM query.
If the PowerCenter Server issues a delete command and the database has logging enabled, the
database saves all deleted records to the log for rollback. If you do not want to save deleted
records for rollback, you can disable logging to improve the speed of the delete.
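The fallback behavior can be sketched as follows; FakeDB and the function name are illustrative stand-ins, not product APIs, and only the WRT_8208 message text comes from the manual:

```python
class FakeDB:
    """Minimal stand-in for a target database connection."""
    def __init__(self, can_truncate):
        self.can_truncate = can_truncate
        self.statements = []

    def execute(self, sql):
        if sql.startswith("TRUNCATE") and not self.can_truncate:
            raise PermissionError("user lacks truncate privilege")
        self.statements.append(sql)

def truncate_target(db, table, session_log):
    # Try the truncate command first; if the database user lacks the
    # privilege, log the WRT_8208 message and issue a delete instead.
    try:
        db.execute(f"TRUNCATE TABLE {table}")
    except PermissionError:
        session_log.append(
            f"WRT_8208 Error truncating target table {table} "
            "trying DELETE FROM query.")
        db.execute(f"DELETE FROM {table}")

log = []
db = FakeDB(can_truncate=False)
truncate_target(db, "T_ORDERS", log)
# db.statements is ["DELETE FROM T_ORDERS"]; log holds the WRT_8208 message
```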
For all databases, if the PowerCenter Server fails to truncate or delete any selected table
because the user lacks the necessary privileges, the session fails.
If you use truncate target tables with one of the following functions, the PowerCenter Server
fails to successfully truncate target tables for the session:
- Incremental aggregation. When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.
- Test load. When you enable both truncate target tables and test load, the PowerCenter Server disables the truncate table function, runs a test load session, and writes the following message to the session log:
WRT_8105 Truncate target tables option turned off for test load session.
2. Click the Mapping tab, and then click the Transformations view.
3. Select the target instance under the Targets node.
4. In the Properties settings, select Truncate Target Table Option for each target table you want the PowerCenter Server to truncate before it runs the session.
5. Click OK.
Deadlock Retry
Select the Session Retry on Deadlock option in the session properties if you want the
PowerCenter Server to retry target writes on a deadlock. A deadlock might occur when the
PowerCenter Server attempts to take control of the same lock for a row when loading
partitioned targets or when running two sessions simultaneously to the same target.
If the PowerCenter Server encounters a deadlock when it tries to write to a target, the
deadlock only affects targets in the same target connection group. The PowerCenter Server
still writes to targets in other target connection groups.
Encountering deadlocks can slow session performance. To improve session performance, you
can increase the number of target connection groups the PowerCenter Server uses to write to
the targets in a session. To use a different target connection group for each target in a session,
use a different database connection name for each target instance. If you want, you can specify
the same connection information for each connection name. For more information, see
Working with Target Connection Groups on page 257.
You can only retry sessions on deadlock for targets configured for normal load. If you select
this option and configure a target for bulk mode, the PowerCenter Server does not retry target
writes on a deadlock for that target. You can also configure the PowerCenter Server to set the
number of deadlock retries and the deadlock sleep time period. For more information on
configuring the PowerCenter Server, see the Installation and Configuration Guide.
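The retry behavior can be sketched as a simple loop; the retry count and sleep period shown are illustrative defaults, not the server's actual configuration values:

```python
import time

class DeadlockError(Exception):
    """Stand-in for a database deadlock error."""

def write_with_retry(write_fn, max_retries=10, sleep_secs=0.0):
    # Attempt the target write; on deadlock, sleep and retry up to
    # the configured number of times, then re-raise.
    for attempt in range(max_retries + 1):
        try:
            return write_fn()
        except DeadlockError:
            if attempt == max_retries:
                raise
            time.sleep(sleep_secs)

# A write that deadlocks twice, then succeeds on the third attempt.
attempts = {"count": 0}
def flaky_write():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockError()
    return "committed"

print(write_with_retry(flaky_write))
# committed
```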
To retry a session on deadlock, click the Properties tab in the session properties and then
scroll down to the Performance settings.
Figure 9-7 shows how to retry sessions on deadlock:
Figure 9-7. Session Retry on Deadlock
- Using pre- and post-session SQL. The preferred method for dropping and re-creating indexes is to define a SQL statement in the Pre SQL property that drops indexes before loading data to the target. You can use the Post SQL property to recreate the indexes after loading data to the target. Define the Pre SQL and Post SQL properties for relational targets in the Transformations view on the Mapping tab in the session properties. For more information, see Using Pre- and Post-Session SQL Commands on page 186.
- Using the Designer. The same dialog box you use to generate and execute DDL code for table creation can drop and recreate indexes. However, this process is not automatic. Every time you run a session that modifies the target table, you need to launch the Designer and use this feature.
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the PowerCenter Server orders the target load on a row-by-row basis. For every row generated by an active source, the PowerCenter Server loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
- Active source. Related target tables must have the same active source.
- Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Active Source
When target tables receive rows from different active sources, the PowerCenter Server reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the PowerCenter Server reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the PowerCenter Server performs constraint-based loading: loading the primary key table first, then the foreign key table.
For more information on active sources, see Working with Active Sources on page 259.
Key Relationships
When target tables have no key relationships, the PowerCenter Server does not perform
constraint-based loading. Similarly, when target tables have circular key relationships, the
PowerCenter Server reverts to a normal load. For example, you have one target containing a
primary key and a foreign key related to the primary key in a second target. The second target
also contains a foreign key that references the primary key in the first target. The
PowerCenter Server cannot enforce constraint-based loading for these tables. It reverts to a
normal load.
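The load-ordering and circular-key rules above can be sketched with a topological sort; the function is illustrative, not the server's implementation:

```python
from graphlib import TopologicalSorter, CycleError

def constraint_based_order(fk_refs):
    # fk_refs maps each target to the targets whose primary keys it
    # references. Primary key tables come first; a circular key
    # relationship means the server reverts to a normal load (None).
    try:
        return list(TopologicalSorter(fk_refs).static_order())
    except CycleError:
        return None

# Two targets whose foreign keys reference each other: circular,
# so the server reverts to a normal load.
print(constraint_based_order({"T_A": {"T_B"}, "T_B": {"T_A"}}))
# None
```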
To verify that all targets are in the same target connection group, complete the following tasks:
- Verify all targets are in the same target load order group and receive data from the same active source.
- Use the default partition properties and do not add partitions or partition points.
- Define the same target type for all targets in the session properties.
- Define the same database connection name for all targets in the session properties.
- Choose normal mode for the target load type for all targets in the session properties.
For more information, see Working with Target Connection Groups on page 257.
- Load primary key table in one mapping and dependent tables in another mapping. You can use constraint-based loading to load the primary table.
For more information about update strategies, see Update Strategy Transformation in the
Transformation Guide.
Constraint-based loading does not affect the target load ordering of the mapping. Target load
ordering defines the order the PowerCenter Server reads the sources in each target load order
group in the mapping. A target load order group is a collection of source qualifiers,
transformations, and targets linked together in a mapping. Constraint-based loading
establishes the order in which the PowerCenter Server loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The session for the mapping in Figure 9-8 is configured to perform constraint-based loading.
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these four tables receive records from a single active source, SQ_A, the PowerCenter
Server loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The PowerCenter Server loads T_1 first because it has no foreign key dependencies and
contains a primary key referenced by T_2 and T_3. The PowerCenter Server then loads T_2
and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular
order. The PowerCenter Server loads T_4 last, because it has a foreign key that references a
primary key in T_3.
Figure 9-8. Mapping Using Constraint-Based Loading
After loading the first set of targets, the PowerCenter Server begins reading source B. If there
are no key relationships between T_5 and T_6, the PowerCenter Server reverts to a normal
load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data
from a single active source, the Aggregator AGGTRANS, the PowerCenter Server loads rows
to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are in
another target connection group together if you use the same database connection for each
target and you use the default partition properties. The PowerCenter Server includes T_5 and
T_6 in a different target connection group because they are in a different target load order
group from the first four targets.
To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
Bulk Loading
You can enable bulk loading when you load to DB2, Sybase, Oracle, or Microsoft SQL Server.
If you enable bulk loading for other database types, the PowerCenter Server reverts to a
normal load. Bulk loading improves the performance of a session that inserts a large amount
of data to the target database. Configure bulk loading on the Mapping tab.
When bulk loading, the PowerCenter Server invokes the database bulk utility and bypasses
the database log, which speeds performance. Without writing to the database log, however,
the target database cannot perform rollback. As a result, you may not be able to perform
recovery. Therefore, you must weigh the importance of improved session performance against
the ability to recover an incomplete session.
For more information on increasing session performance when bulk loading, see Bulk
Loading on page 642.
Note: When loading to DB2, Microsoft SQL Server, and Oracle targets, you must specify a
normal load for data driven sessions. When you specify bulk mode and data driven, the
PowerCenter Server reverts to normal load.
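These rules can be summarized in a small decision sketch; the function is illustrative, and "data driven" refers to the Treat Source Rows As setting:

```python
BULK_CAPABLE = {"DB2", "Microsoft SQL Server", "Oracle", "Sybase"}

def effective_load_type(db_type, requested, treat_source_rows_as):
    # Bulk mode is honored only for the supported databases; for DB2,
    # Microsoft SQL Server, and Oracle, a data driven session reverts
    # to normal load.
    if requested != "bulk":
        return requested
    if db_type not in BULK_CAPABLE:
        return "normal"
    if (treat_source_rows_as == "data driven"
            and db_type in {"DB2", "Microsoft SQL Server", "Oracle"}):
        return "normal"
    return "bulk"

print(effective_load_type("Oracle", "bulk", "data driven"))
# normal
```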
Committing Data
When bulk loading to Sybase and DB2 targets, the PowerCenter Server ignores the commit
interval you define in the session properties and commits data when the writer block is full.
When bulk loading to Microsoft SQL Server and Oracle targets, the PowerCenter Server
commits data at each commit interval. Also, Microsoft SQL Server and Oracle start a new
bulk load transaction after each commit.
Tip: When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit
interval to reduce the number of bulk load transactions and increase performance.
Oracle Guidelines
Oracle allows bulk loading for the following software versions:
- You can use the Oracle client 8.1.7 if you install the Oracle Threaded Bulk Mode patch.
Use the following guidelines when bulk loading to Oracle:
- Do not define primary and foreign keys in the database. However, you can define primary and foreign keys for the target definitions in the Designer.
- To bulk load into indexed tables, choose non-parallel mode. To do this, you must disable the Enable Parallel Mode option. For more information, see Configuring a Relational Database Connection on page 56. Note that when you disable parallel mode, you cannot load multiple target instances, partitions, or sessions into the same table.
- To bulk load in parallel mode, you must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.
- When you use the LONG datatype, verify it is the last column in the table.
- Specify the Table Name Prefix for the target when you use Oracle client 9i. If you do not specify the table name prefix, the PowerCenter Server uses the database login as the prefix.
DB2 Guidelines
Use the following guidelines when bulk loading to DB2:
- You must drop indexes and constraints in the target tables before running a bulk load session. After the session completes, you can rebuild them. If you use bulk loading with the session on a regular basis, you can use pre- and post-session SQL to drop and rebuild indexes and key constraints.
- You cannot use source-based or user-defined commit when you run bulk load sessions on DB2.
- If you create multiple partitions for a DB2 bulk load session, you must use database partitioning for the target partition type. If you choose any other partition type, the PowerCenter Server reverts to normal load and writes the following message to the session log:
ODL_26097 Only database partitioning is support for DB2 bulk load.
Changing target load type variable to Normal.
- When you bulk load to DB2, the DB2 database writes non-fatal errors and warnings to a message log file in the session log directory. The message log file name is <session_log_name>.<target_instance_name>.<partition_index>.log. You can check both the message log file and the session log when you troubleshoot a DB2 bulk load session.
environment SQL, the PowerCenter Server uses the table owner name in the target instance. To use the table owner name specified in the SET sqlid statement, do not enter a name in the target name prefix.
To specify the target owner name or prefix at the session level:
1. In the Workflow Manager, open the session properties and click the Transformations view on the Mapping tab.
2. Select the target instance under the Targets node.
3. In the Properties settings, enter the table owner name or prefix in the Table Name Prefix field, and click OK.
Reserved Words
If any table name or column name contains a database reserved word, such as MONTH or
YEAR, the session fails with database errors when the PowerCenter Server executes SQL
against the database. You can create and maintain a reserved words file, reswords.txt, in the
PowerCenter Server installation directory. When the PowerCenter Server initializes a session,
it searches for reswords.txt. If the file exists, the PowerCenter Server places quotes around
matching reserved words when it executes SQL against the database.
Use the following rules and guidelines when working with reserved words:
- The PowerCenter Server searches the reserved words file when it generates SQL to connect to source, target, and lookup databases.
- If you override the SQL for a source, target, or lookup, you must enclose any reserved word in quotes.
- You may need to enable some databases, such as Microsoft SQL Server and Sybase, to use SQL-92 standards regarding quoted identifiers. You can use environment SQL to issue the command. For example, with Microsoft SQL Server, you can use the following command:
SET QUOTED_IDENTIFIER ON
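The reswords.txt behavior can be sketched as follows (illustrative code, not the server's implementation):

```python
def quote_reserved(identifiers, reserved_words, quote='"'):
    # Place quotes around any identifier that matches an entry in
    # the reserved words file; leave other identifiers untouched.
    reserved = {w.strip().upper() for w in reserved_words}
    return [quote + ident + quote if ident.upper() in reserved else ident
            for ident in identifiers]

# MONTH and YEAR are reserved words; ORDER_ID is not.
print(quote_reserved(["MONTH", "YEAR", "ORDER_ID"], ["MONTH", "YEAR"]))
# ['"MONTH"', '"YEAR"', 'ORDER_ID']
```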
Targets in the same target connection group meet the following criteria:
- Have the same database connection name for relational targets, and Application connection name for SAP BW targets. For more information, see the PowerCenter Connect for SAP BW User and Administrator Guide.
- Have the same target load type, either normal or bulk mode.
For example, suppose you create a session based on a mapping that reads data from one source
and writes to two Oracle target tables. In the Workflow Manager, you do not create multiple
partitions in the session. You use the same Oracle database connection for both target tables
in the session properties. You specify normal mode for the target load type for both target
tables in the session properties. The targets in the session belong to the same target
connection group.
Suppose you create a session based on the same mapping. In the Workflow Manager, you do
not create multiple partitions. However, you use one Oracle database connection name for
one target, and you use a different Oracle database connection name for the other target. You
specify normal mode for the target load type for both target tables. The targets in the session
belong to different target connection groups.
Note: When you define the target database connections for multiple targets in a session using
session parameters, the targets may or may not belong to the same target connection group.
The targets belong to the same target connection group if all session parameters resolve to the
same target connection name. For example, you create a session with two targets and specify
the session parameter $DBConnection1 for one target, and $DBConnection2 for the other
target. In the parameter file, you define $DBConnection1 as Sales1 and you define
$DBConnection2 as Sales1 and run the workflow. Both targets in the session belong to the
same target connection group.
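Membership in a target connection group can be sketched by grouping on the resolved connection name and load type; partitioning and the target load order group also matter but are omitted for brevity, and the code is illustrative:

```python
def connection_groups(targets, parameter_file):
    # targets: (instance name, connection name or session parameter,
    # load type). Session parameters such as $DBConnection1 resolve
    # through the parameter file before grouping.
    groups = {}
    for name, conn, load_type in targets:
        resolved = parameter_file.get(conn, conn)
        groups.setdefault((resolved, load_type), []).append(name)
    return list(groups.values())

# The manual's example: both parameters resolve to Sales1, so both
# targets belong to the same target connection group.
params = {"$DBConnection1": "Sales1", "$DBConnection2": "Sales1"}
print(connection_groups(
    [("T_A", "$DBConnection1", "normal"),
     ("T_B", "$DBConnection2", "normal")], params))
# [['T_A', 'T_B']]
```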
- Aggregator
- Joiner
- MQ Source Qualifier
- Rank
- Sorter
- Source Qualifier
Note: Although the Filter, Router, Transaction Control, and Update Strategy transformations
are active transformations, the PowerCenter Server does not use them as active sources in a
pipeline.
Active sources affect how the PowerCenter Server processes a session when you use any of the
following transformations or session properties:
- XML targets. The PowerCenter Server can load data from different active sources to an XML target when each input group receives data from one active source. For more information on XML targets, see Working with XML Targets in the XML User Guide.
- Mapplets. An Input transformation must receive data from a single active source. For more information on connecting mapplets to active sources in mappings, see Mapplets in the Designer Guide.
- Source-based commit. Some active sources generate commits. When you run a source-based commit session, the PowerCenter Server generates a commit from these active sources at every commit interval. For more information on source-based commit sessions, see Source-Based Commits on page 278.
- Constraint-based loading. To use constraint-based loading, you must connect all related targets to the same active source. The PowerCenter Server orders the target load on a row-by-row basis based on rows generated by an active source. For more information on constraint-based loading, see Constraint-Based Loading on page 248.
- Row error logging. If an error occurs downstream from an active source that is not a source qualifier, the PowerCenter Server cannot identify the source row information for the logged error row. For more information on logging errors, see Overview on page 482.
You can output data to a flat file in the following ways:
- Use a flat file target definition. Create a mapping with a flat file target definition. Create a session using the flat file target definition. When the PowerCenter Server runs the session, it creates the target flat file based on the flat file target definition.
- Use a relational target definition. Use a relational definition to write to a flat file when you want to use an external loader to load the target. Create a mapping with a relational target definition. Create a session using the relational target definition. Configure the session to output to a flat file by specifying the File Writer in the Writers settings on the Mapping tab. For details on using the external loader feature, see External Loading on page 523.
You can configure the following properties for flat file targets:
- Target properties. You can define target properties such as partitioning options, output file options, and reject options. For more information, see Configuring Target Properties on page 261.
- Flat file properties. You can choose to create delimited or fixed-width files, and define their properties. For more information, see Configuring Fixed-Width Properties on page 265 and Configuring Delimited Properties on page 266.
Figure 9-9 shows the flat file target properties you define in the Properties settings on the
Mapping tab in the session properties:
Figure 9-9. Properties Settings on the Mapping Tab for a Flat File Target
Table 9-5 describes the properties you define in the Properties settings for flat file target
definitions:
Table 9-5. Flat File Target Properties
- Merge Partitioned Files (Optional). When selected, the PowerCenter Server merges the partitioned target files into one file when the session completes, and then deletes the individual output files. If the PowerCenter Server fails to create the merged file, it does not delete the individual output files. You cannot merge files if the session uses FTP, an external loader, or a message queue. For details on configuring a session for partitioning, see Pipeline Partitioning on page 345.
- Merge File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes the merged file in the server variable directory, $PMTargetFileDir. If you enter a full directory and file name in the Merge File Name field, clear this field.
- Output File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes output files in the server variable directory, $PMTargetFileDir. If you specify both the directory and file name in the Output Filename field, clear this field. The PowerCenter Server concatenates this field with the Output Filename field when it runs the session. You can also use the $OutputFileName session parameter to specify the file directory. For details on session parameters, see Session Parameters on page 495.
- Output Filename (Required). Enter the file name, or file name and path. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. If the target definition contains a slash character, the Workflow Manager replaces the slash character with an underscore. When you use an external loader to load to an Oracle database, you must specify a file extension. If you do not specify a file extension, the Oracle loader cannot find the flat file and the PowerCenter Server fails the session. For more information about external loading, see Loading to Oracle on page 533. Optionally use the $OutputFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Output File Directory field when it runs the session. For details on session parameters, see Session Parameters on page 495. Note: If you specify an absolute path file name when using FTP, the PowerCenter Server ignores the Default Remote Directory specified in the FTP connection. When you specify an absolute path file name, do not use single or double quotes.
- Reject File Directory (Optional). Enter the directory name in this field. By default, the PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir. If you specify both the directory and file name in the Reject Filename field, clear this field. The PowerCenter Server concatenates this field with the Reject Filename field when it runs the session. You can also use the $BadFileName session parameter to specify the file directory. For details on session parameters, see Session Parameters on page 495.
- Reject Filename (Required). Enter the file name, or file name and path. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. Optionally use the $BadFileName session parameter for the file name. The PowerCenter Server concatenates this field with the Reject File Directory field when it runs the session. For example, if you have C:\reject_file\ in the Reject File Directory field, and enter filename.bad in the Reject Filename field, the PowerCenter Server writes rejected rows to C:\reject_file\filename.bad. For details on session parameters, see Session Parameters on page 495.
- Set File Properties (Optional). Opens a dialog box that allows you to define flat file properties. For more information, see Configuring Fixed-Width Properties on page 265 and Configuring Delimited Properties on page 266. When you output to a flat file using a relational target definition in the mapping, make sure you define the flat file properties by clicking the Set File Properties link.
Figure 9-10 shows the test load options in the General Options settings on the Properties tab:
Figure 9-10. Test Load Options
Table 9-6 describes the test load options in the General Options settings on the Properties
tab:
Table 9-6. Test Load Options
- Enable Test Load (Optional). Select this option to perform a test load.
- Number of Rows to Test (Optional). Enter the number of source rows you want the PowerCenter Server to test load. The PowerCenter Server reads the number you configure for the test load.
To edit the fixed-width properties, select Fixed Width and click Advanced.
Figure 9-12 shows the Fixed Width Properties dialog box:
Figure 9-12. Fixed Width Properties Dialog Box
Table 9-7 describes the options you define in the Fixed Width Properties dialog box:
Table 9-7. Writing to a Fixed-Width Target
- Null Character (Required). Enter the character you want the PowerCenter Server to use to represent null values. You can enter any valid character in the file code page. For more information about using null characters for target files, see Null Characters in Fixed-Width Files on page 272.
- Repeat Null Character (Optional). Select this option to indicate a null value by repeating the null character to fill the field. If you do not select this option, the PowerCenter Server enters a single null character at the beginning of the field to represent a null value. For more information about specifying null characters for target files, see Null Characters in Fixed-Width Files on page 272.
- Code Page (Required). Select the code page of the fixed-width file. The default setting is the client code page.
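The repeat-null-character option can be sketched as follows; the code is illustrative, and the server's actual field handling also accounts for multibyte characters:

```python
def render_null_field(field_width, null_char, repeat_null):
    # Repeat the null character to fill the field, or write a single
    # null character padded with spaces to the field width.
    if repeat_null:
        return null_char * field_width
    return null_char.ljust(field_width)

print(repr(render_null_field(5, "*", repeat_null=True)))
# '*****'
print(repr(render_null_field(5, "*", repeat_null=False)))
# '*    '
```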
Table 9-8 describes the options you can define in the Delimited File Properties dialog box:
Table 9-8. Delimited File Properties
- Delimiters (Required). Character used to separate columns of data. Use the button to the right of this field to enter a non-printable delimiter. Delimiters can be either printable or single-byte unprintable characters, and must be different from the escape character and the quote character (if selected). You cannot select unprintable multibyte characters as delimiters.
- Optional Quotes (Required).
- Code Page (Required). Select the code page of the delimited file. The default setting is the client code page.
267
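The interplay between the delimiter and the quote character can be sketched with Python's csv module (an analogy only, not PowerCenter code; the pipe delimiter and the sample row are made up):

```python
import csv
import io

# A value that contains the delimiter must be protected by the quote
# character, which is why the table requires the two to differ.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(["100001", "Screw|driver", "9.50"])
print(buf.getvalue().strip())  # 100001|"Screw|driver"|9.50
```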
- Writing to fixed-width flat files from relational target definitions. The PowerCenter Server adds spaces to target columns based on transformation datatype.
- Writing to fixed-width flat files from flat file target definitions. You must configure the precision and field width for flat file target definitions to accommodate the total length of the target field.
- Writing multibyte data to fixed-width files. You must configure the precision of string columns to accommodate character data. When writing shift-sensitive data to a fixed-width flat file target, the PowerCenter Server adds shift characters and spaces to meet file requirements.
- Null characters in fixed-width files. The PowerCenter Server writes repeating or non-repeating null characters to fixed-width target file columns differently depending on whether the characters are single- or multibyte.
- Character set. You can write ASCII or Unicode data to a flat file target.
- Writing metadata to flat file targets. You can configure the PowerCenter Server to write the column header information when you write to flat file targets.
Table 9-9 describes the number of bytes the PowerCenter Server adds to the target column
and optional characters it uses for each datatype:
Table 9-9. Datatype Modifications for File Target Columns

Transformation datatypes connected to fixed-width flat file target columns: Decimal, Double, Float, Integer, Money, Numeric, Real. For each of these, the PowerCenter Server adds bytes to the target column and uses optional characters. If a numeric or datetime column cannot accommodate the data, the PowerCenter Server writes the row to the reject file.
Note: When the PowerCenter Server writes a row to the reject file, it writes a message in the session log.
When a session writes to a fixed-width flat file based on a fixed-width flat file target definition
in the mapping, the PowerCenter Server defines the total length of a field by the precision or
field width defined in the target.
Fixed-width files are byte-oriented, which means the total length of a field is measured in
bytes.
Table 9-10 describes how the PowerCenter Server measures the total field length for fields in a
fixed-width flat file target definition:
Table 9-10. Field Length Measurements for Fixed-Width Flat File Targets

- Number: field width
- String: precision
- Datetime: field width
Table 9-11 lists the characters you must accommodate when you configure the precision or
field width for flat file target definitions to accommodate the total length of the target field:
Table 9-11. Characters to Include when Calculating Field Length for Fixed-Width Targets

- Number: decimal separator, thousands separators, and the negative sign (-) for the mantissa.
- String: multibyte data, and shift-in and shift-out characters. For more information, see Writing Multibyte Data to Fixed-Width Flat Files on page 270.
- Datetime: date and time separators, such as slashes (/), dashes (-), and colons (:). For example, the format MM/DD/YYYY HH24:MI:SS has a total length of 19 bytes.
When you edit the flat file target definition in the mapping, define the precision or field width large enough to accommodate both the target data and the characters in Table 9-11.
For example, suppose you have a mapping with a fixed-width flat file target definition. The
target definition contains a number column with a precision of 10 and a scale of 2. You use a
comma as the decimal separator and a period as the thousands separator. You know some rows
of data might have a negative value. Based on this information, you know the longest possible
number is formatted with the following format:
-NN.NNN.NNN,NN
Open the flat file target definition in the mapping and define the field width for this number
column as a minimum of 14 bytes.
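The 14-byte minimum in this example can be checked by counting the characters the formatted number needs:

```python
# Check of the example above: precision 10 (total digits), scale 2,
# comma as the decimal separator, period as the thousands separator,
# and a possible negative sign.
precision = 10
thousands_separators = 2   # -NN.NNN.NNN,NN contains two of them
decimal_separator = 1
negative_sign = 1

field_width = precision + thousands_separators + decimal_separator + negative_sign
print(field_width)  # 14
assert len("-NN.NNN.NNN,NN") == field_width
```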
For more information on formatting numeric and datetime values, see Working with Flat
Files in the Designer Guide.
For string columns, the PowerCenter Server truncates the data if the precision is not large
enough to accommodate the multibyte data.
You might work with the following types of multibyte data:
Non shift-sensitive multibyte data. The file contains all multibyte data. Configure the
precision in the target definition to allow for the additional bytes.
For example, you know that the target data contains four double-byte characters, so you
define the target definition with a precision of 8 bytes.
If you configure the target definition with a precision of 4, the PowerCenter Server
truncates the data before writing to the target.
Shift-sensitive multibyte data. The file contains single-byte and multibyte data. When
writing to a shift-sensitive flat file target, the PowerCenter Server adds shift characters and
spaces to meet file requirements. You must configure the precision in the target definition
to allow for the additional bytes and the shift characters. For more information, see
Writing Shift-Sensitive Multibyte Data on page 271.
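The byte-oriented truncation described above can be sketched as follows (an illustration only; PowerCenter performs this through the configured precision, and EUC-JP is used here simply as an example of a double-byte code page):

```python
def truncate_to_bytes(text, max_bytes, encoding):
    """Truncate a string so its encoded form fits a byte-oriented field,
    cutting back to a whole-character boundary."""
    data = text.encode(encoding)
    if len(data) <= max_bytes:
        return text
    cut = data[:max_bytes]
    while cut:
        try:
            return cut.decode(encoding)
        except UnicodeDecodeError:
            cut = cut[:-1]   # drop a byte until we end on a full character
    return ""

# Four double-byte characters need 8 bytes; a precision of 4 keeps only two.
print(truncate_to_bytes("ああああ", 8, "euc-jp"))  # ああああ
print(truncate_to_bytes("ああああ", 4, "euc-jp"))  # ああ
```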
Note: Delimited files are character-oriented, and you do not need to allow for additional precision for multibyte data.
If a column begins or ends with a double-byte character, the PowerCenter Server adds shift
characters so the column begins and ends with a single-byte shift character.
If the data is shorter than the column width, the PowerCenter Server pads the rest of the
column with spaces.
If the data is longer than the column width, the PowerCenter Server truncates the data so
the column ends with a single-byte shift character.
To illustrate how the PowerCenter Server handles a fixed-width file containing shift-sensitive
data, say you want to output the following data to the target:
SourceCol1    SourceCol2
AAAA          aaaa

The first target column contains eight bytes and the second target column contains four bytes.

The PowerCenter Server must add shift characters to handle shift-sensitive data. Since the first target column can only handle eight bytes, the PowerCenter Server truncates the data before it can add the shift characters:

TargetCol1    TargetCol2
-oAAA-i       aaaa

where A is a double-byte character, -o is the shift-out character, and -i is the shift-in character.
For the first target column, the PowerCenter Server writes only three of the double-byte
characters to the target. It cannot write any additional double-byte characters to the output
column because the column must end in a single-byte character. If you add two more bytes to
the first target column definition, then the PowerCenter Server can add shift characters and
write all the data without truncation.
For the second target column, the PowerCenter Server writes all four single-byte characters to the target. It does not add shift characters to the column because the column begins and ends with single-byte characters.
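The byte accounting in this example can be verified with a small sketch (an illustration of the arithmetic only; real shift-out and shift-in characters are code-page specific single-byte control characters, not the -o/-i markers shown above):

```python
def fit_double_byte(n_chars, width_bytes):
    """How many double-byte characters fit in a fixed-width column when a
    one-byte shift-out and a one-byte shift-in must bracket them, so the
    column begins and ends with a single-byte character."""
    usable = width_bytes - 2           # reserve shift-out + shift-in
    return min(n_chars, usable // 2)   # each character costs two bytes

# The example above: four double-byte characters, eight-byte column.
print(fit_double_byte(4, 8))   # 3 -> shift-out + AAA + shift-in, data truncated
print(fit_double_byte(4, 10))  # 4 -> two more bytes avoid the truncation
```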
Character Set
You can configure the PowerCenter Server to run sessions with flat file targets in either ASCII
or Unicode data movement mode.
If you configure a session with a flat file target to run in Unicode data movement mode, the
target file code page must be a superset of the PowerCenter Server code page and the source
code page. Delimiters, escape, and null characters must be valid in the specified code page of
the flat file.
If you configure a session to run in ASCII data movement mode, delimiters, escape, and null
characters must be valid in the ISO Western European Latin1 code page. Any 8-bit character
you specified in previous versions of PowerCenter is still valid.
For more information about configuring and working with data movement modes and code
pages, see Globalization Overview in the Installation and Configuration Guide.
The column width for ITEM_ID is six. When you enable the Output Metadata For Flat File
Target option, the PowerCenter Server writes the following text to a flat file:
#ITEM_ITEM_NAME   PRICE
100001Screwdriver 9.50
100002Hammer      12.90
100003Small nails 3.00
For information about configuring the PowerCenter Server to output flat file metadata, see
the Installation and Configuration Guide.
- Multiple target types. You can create a session that writes to both relational and flat file targets.
- Multiple target connection types. You can create a session that writes to a target on an Oracle database and to a target on a DB2 database. Or, you can create a session that writes to multiple targets of the same type, but you specify different target connections for each target in the session.
All database connections you define in the Workflow Manager are unique to the PowerCenter
Server, even if you define the same connection information. For example, you define two
database connections, Sales1 and Sales2. You define the same user name, password, connect
string, code page, and attributes for both Sales1 and Sales2. Even though both Sales1 and
Sales2 define the same connection information, the PowerCenter Server treats them as
different database connections. When you create a session with two relational targets and
specify Sales1 for one target and Sales2 for the other target, you create a session with
heterogeneous targets.
You can create a session with heterogeneous targets in one of the following ways:
- Create a session based on a mapping with targets of different types or different database types. In the session properties, keep the default target types and database types.
- Create a session based on a mapping with the same target types. However, in the session properties, specify different target connections for the different target instances, or override the target type to a different type.
You can override the target type in the session properties. However, you can only perform
certain overrides. You can specify the following target type overrides in a session:
- Relational target to any other relational database type. Verify the datatypes used in the target definition are compatible with both databases.
Note: When the PowerCenter Server runs a session with at least one relational target, it
performs database transactions per target connection group. For example, it orders the target
load for targets in a target connection group when you enable constraint-based loading. For
more information, see Working with Target Connection Groups on page 257.
Chapter 10
Understanding Commit Points

This chapter covers the following topics:

- Overview, 276
Overview
A commit interval is the interval at which the PowerCenter Server commits data to targets
during a session. The commit point can be a factor of the commit interval, the commit
interval type, and the size of the buffer blocks. The commit interval is the number of rows
you want to use as a basis for the commit point. The commit interval type is the type of rows
that you want to use as a basis for the commit point. You can choose between the following
commit types:
- Target-based commit. The PowerCenter Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size, the commit interval, and the PowerCenter Server configuration for writer timeout.
- Source-based commit. The PowerCenter Server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
Target-Based Commits
During a target-based commit session, the PowerCenter Server commits rows based on the
number of target rows and the key constraints on the target table. The commit point depends
on the following factors:
- Commit interval. The number of rows you want to use as a basis for commits. Configure the target commit interval in the session properties.
- Writer wait timeout. The amount of time the writer waits before it issues a commit. Configure the writer wait timeout in the PowerCenter Server setup.
- Buffer blocks. Blocks of memory that hold rows of data during a session. You can configure the buffer block size in the session properties, but you cannot configure the number of rows the block holds.
When you run a target-based commit session, the PowerCenter Server may issue a commit before, on, or after the configured commit interval. The PowerCenter Server uses the following process to issue commits:

- When the PowerCenter Server reaches a commit interval, it continues to fill the writer buffer block. When the writer buffer block fills, the PowerCenter Server issues a commit.
- If the writer buffer fills before the commit interval, the PowerCenter Server writes to the target, but waits to issue a commit. It issues a commit when one of the following conditions is true:
  - The writer is idle for the amount of time specified by the PowerCenter Server writer wait timeout option.
  - The PowerCenter Server reaches the commit interval and fills another writer buffer.
For more information about configuring the writer wait timeout, see Installing and
Configuring the PowerCenter Server on Windows or Installing and Configuring the
PowerCenter Server on UNIX in the Installation and Configuration Guide.
Note: When you choose target-based commit for a session containing an XML target, the
Workflow Manager disables the On Commit session property on the Transformations view of
the Mapping tab.
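The interaction between the commit interval and buffer blocks can be sketched as a small simulation (a simplified model with made-up row counts; it ignores the writer wait timeout and key constraints):

```python
def commit_points(total_rows, commit_interval, buffer_rows):
    """Sketch of target-based commit timing: once the commit interval has
    been reached, the commit is issued when the current writer buffer
    block fills, not exactly on the interval boundary."""
    commits, last_commit = [], 0
    for row in range(1, total_rows + 1):
        interval_reached = row - last_commit >= commit_interval
        buffer_full = row % buffer_rows == 0
        if interval_reached and buffer_full:
            commits.append(row)
            last_commit = row
    return commits

# Interval 10,000 with 7,500-row buffer blocks: commits land on buffer
# boundaries (15,000, 30,000, 45,000), after the configured interval.
print(commit_points(50_000, 10_000, 7_500))  # [15000, 30000, 45000]
```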
Source-Based Commits
During a source-based commit session, the PowerCenter Server commits data to the target
based on the number of rows from some active sources in a target load order group. These
rows are referred to as source rows.
When the PowerCenter Server runs a source-based commit session, it identifies the commit source for each pipeline in the mapping. The PowerCenter Server generates a commit row from these active sources at every commit interval. The PowerCenter Server writes the name of the transformation used for source-based commit intervals into the session log:
Source-based commit interval based on... TRANSFORMATION_NAME
The PowerCenter Server might commit fewer rows to the target than the number of rows produced by the active source. For example, you have a source-based commit session that passes 10,000 rows through an active source, and 3,000 rows are dropped due to transformation logic. The PowerCenter Server issues a commit to the target when the 7,000 remaining rows reach the target.
The number of rows held in the writer buffers does not affect the commit point for a source-based commit session. For example, you have a source-based commit session with a commit interval of 10,000 that passes 40,000 rows through an active source. When the first 10,000 rows reach the targets, the PowerCenter Server issues a commit. If the session completes successfully, the PowerCenter Server issues commits after 10,000, 20,000, 30,000, and 40,000 source rows.
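The example above can be sketched as follows (a conceptual illustration of the commit schedule only, not PowerCenter internals):

```python
def source_commit_points(total_source_rows, commit_interval):
    """Source-based commit: a commit row is generated at every commit
    interval of *source* rows, regardless of how many rows the writer
    buffers are holding."""
    return list(range(commit_interval, total_source_rows + 1, commit_interval))

print(source_commit_points(40_000, 10_000))  # [10000, 20000, 30000, 40000]

# Rows dropped by transformation logic shrink what reaches the target:
# 10,000 source rows with 3,000 dropped means the commit is issued when
# the remaining 7,000 rows reach the target.
print(10_000 - 3_000)  # 7000
```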
If the targets are in the same transaction control unit, the PowerCenter Server commits data
to the targets at the same time. If the session fails or aborts, the PowerCenter Server rolls back
all uncommitted data in a transaction control unit to the same source row.
If the targets are in different transaction control units, the PowerCenter Server performs the
commit when each target receives the commit row. If the session fails or aborts, the
PowerCenter Server rolls back each target to the last commit point. It might not roll back to
the same source row for targets in separate transaction control units. For more information on
transaction control units, see Understanding Transaction Control Units on page 289.
Note: Source-based commit may slow session performance if the session uses a one-to-one
mapping. A one-to-one mapping is a mapping that moves data from a Source Qualifier, XML
Source Qualifier, or Application Source Qualifier transformation directly to a target. For
more information about performance, see Performance Tuning on page 635.
The following active sources generate commits:

- Source Qualifier
- MQ Source Qualifier
- XML Source Qualifier when you only connect ports from one output group
- Normalizer (VSAM)
- Custom with one output group and with the All Input transformation scope
- A multiple input group transformation with one output group connected to multiple upstream transaction control points

For more information on transformation scope and transaction control, see Understanding Transaction Control on page 287. For more information on active sources, see Working with Active Sources on page 259.
A mapping can have one or more target load order groups, and a target load order group can
have one or more active sources that generate commits. The PowerCenter Server uses the
commits generated by the active source that is closest to the target definition. This is known
as the commit source.
For example, you have the mapping in Figure 10-1:
Figure 10-1. Mapping with a Single Commit Source
The mapping contains a target load order group with one source pipeline that branches from
the Source Qualifier transformation to two targets. One pipeline branch contains an
Aggregator transformation with the All Input transformation scope, and the other contains an
Expression transformation. The PowerCenter Server identifies the Source Qualifier
transformation as the commit source for t_monthly_sales and the Aggregator as the commit
source for T_COMPANY_ALL. It performs a source-based commit for both targets, but uses
a different commit source for each.
- The target receives data from the XML Source Qualifier transformation, and you connect multiple output groups from an XML Source Qualifier transformation to downstream transformations. An XML Source Qualifier transformation does not generate commits when you connect multiple output groups downstream.
- The target receives data from an active source with multiple output groups other than an XML Source Qualifier transformation. For example, the target receives data from a Custom transformation that you do not configure to generate transactions. Multiple output group active sources neither generate nor propagate commits.

- You put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the commit source. The active source is the commit source for the target.
- You do not put a commit source between the XML Source Qualifier transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.
This mapping contains an XML Source Qualifier transformation with multiple output groups
connected downstream. Because you connect multiple output groups downstream, the XML
Source Qualifier transformation does not generate commits. You connect the XML Source
Qualifier transformation to two relational targets, T_STORE and T_PRODUCT. Therefore,
these targets do not receive any commit generated by an active source. The PowerCenter
Server uses target-based commit when loading to these targets.
However, the mapping includes an active source that generates commits, AGG_Sales, between
the XML Source Qualifier transformation and T_YTD_SALES. The PowerCenter Server uses
source-based commit when loading to T_YTD_SALES.
- You put a commit source between the Custom transformation and the target. The PowerCenter Server uses source-based commit for the target because it receives commits from the active source. The active source is the commit source for the target.
- You do not put a commit source between the Custom transformation and the target. The PowerCenter Server uses target-based commit for the target because it receives no commits.

You can also configure the Custom transformation so that the transformation procedure outputs transactions. When you do this, configure the session for user-defined commit. For more information on user-defined commit sessions, see User-Defined Commits on page 283.
User-Defined Commits
During a user-defined commit session, the PowerCenter Server commits and rolls back
transactions based on a row or set of rows that pass through a Transaction Control
transformation. The PowerCenter Server evaluates the transaction control expression for each
row that enters the transformation. The return value of the transaction control expression
defines the commit or rollback point.
You can also create a user-defined commit session when the mapping contains a Custom transformation configured to generate transactions. When you do this, the procedure associated with the Custom transformation defines the transaction boundaries.
When the PowerCenter Server evaluates a commit row, it commits all rows in the transaction
to the target or targets. When it evaluates a rollback row, it rolls back all rows in the
transaction from the target or targets. The PowerCenter Server writes a message to the session
log at each commit and rollback point. The session details are cumulative. The following
message is a sample commit message from the session log:
WRITER_1_1_1> WRT_8317 USER-DEFINED COMMIT POINT
===================================================
WRT_8036 Target: TCustOrders (Instance Name: [TCustOrders])
WRT_8038 Inserted rows - Requested: 1003  Rejected: 0  Affected: 1023  Applied: 1003
When the PowerCenter Server writes all rows in a transaction to all targets, it issues commits
sequentially for each target.
The PowerCenter Server rolls back data based on the return value of the transaction control
expression or error handling configuration. If the transaction control expression returns a
rollback value, the PowerCenter Server rolls back the transaction. If an error occurs, you can
choose to roll back or commit at the next commit point.
If the transaction control expression evaluates to a value other than commit, rollback, or
continue, the PowerCenter Server fails the session. For more information about valid values,
see Transaction Control Transformation in the Transformation Guide.
When the session completes, the PowerCenter Server may write data to the target that was not
bound by commit rows. You can choose to commit at end of file or to roll back that open
transaction.
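The per-row evaluation described above can be sketched as follows (a conceptual model only; the constant names mirror the Transaction Control transformation's documented return values, and the expression itself, committing whenever DEPT changes, is invented for illustration):

```python
TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE = 0, 1, 2

def evaluate(row, previous_dept):
    # Hypothetical transaction control expression: start a new
    # transaction whenever the DEPT value changes.
    if previous_dept is not None and row["dept"] != previous_dept:
        return TC_COMMIT_BEFORE
    return TC_CONTINUE_TRANSACTION

def run(rows):
    committed, open_txn, prev = [], [], None
    for row in rows:
        action = evaluate(row, prev)
        if action == TC_COMMIT_BEFORE:
            committed.extend(open_txn)   # commit all rows in the transaction
            open_txn = []
        elif action == TC_ROLLBACK_BEFORE:
            open_txn = []                # roll back all rows in the transaction
        open_txn.append(row)
        prev = row["dept"]
    # open_txn holds rows not bound by a commit row; "commit on end of
    # file" would commit them, otherwise they are rolled back.
    return committed, open_txn

rows = [{"dept": "sales", "id": 1}, {"dept": "sales", "id": 2}, {"dept": "hr", "id": 3}]
committed, open_txn = run(rows)
print(len(committed), len(open_txn))  # 2 1
```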
Note: If you use bulk loading with a user-defined commit session, the target may not recognize
the transaction boundaries. If the target connection group does not support transactions, the
PowerCenter Server writes the following message to the session log:
WRT_8234 Warning: Target Connection Groups connection doesnt support
transactions. Targets may not be loaded according to specified transaction
boundaries rules.
- Roll back on error. You choose to roll back commit transactions if the PowerCenter Server encounters a non-fatal error.
- Roll back on failed commit. If any target connection group in a transaction control unit fails to commit, the PowerCenter Server rolls back all uncommitted data to the last successful commit point.
For more information on transaction control units, see Understanding Transaction Control
Units on page 289.
Rollback Evaluation
If the transaction control expression returns a rollback value, the PowerCenter Server rolls
back the transaction and writes a message to the session log indicating that the transaction
was rolled back. It also indicates how many rows were rolled back.
The following message is a sample message that the PowerCenter Server writes to the session
log when the transaction control expression returns a rollback value:
WRITER_1_1_1> WRT_8326 User-defined rollback processed
WRITER_1_1_1> WRT_8331 Rollback statistics
WRT_8162 ===================================================
WRT_8330 Rolled back [333] inserted, [0] deleted, [0] updated rows for the
target [TCustOrders]
The following message is a sample message indicating that Commit on End of File is enabled
in the session properties:
WRITER_1_1_1> WRT_8143 Commit at end of Load Order Group
The reject file uses the following row indicators for rolled-back transactions:

- Rolled-back insert
- Rolled-back update
- Rolled-back delete
Note: The PowerCenter Server does not roll back a transaction if it encounters an error before it processes any row through the Transaction Control transformation.

1. The PowerCenter Server reaches the third commit point for all targets.
2. The PowerCenter Server rolls back TCG2_T3 and TCG3_T4 to the second commit point, but it cannot roll back TCG1_T1 and TCG1_T2 to the second commit point because it successfully committed at the third commit point.
3. The PowerCenter Server writes the rows to the reject file from TCG2_T3 and TCG3_T4. These are the rollback rows associated with the third commit point.
4. The PowerCenter Server writes the rows to the reject file from TCG1_T1 and TCG1_T2. These are the commit rows associated with the third commit point.
Figure 10-5 illustrates PowerCenter Server behavior when it rolls back on a failed commit:
Figure 10-5. Roll Back on Failed Commit Example
The following table describes row indicators in the reject file for committed transactions in a
failed transaction control unit:
- Committed insert
- Committed update
- Committed delete
Source-based commit. Some active sources generate commits. They do not generate
rollback rows. Also, transaction generators generate commit and rollback rows. For a list
of active sources that generate commits, see Determining the Commit Source on
page 278.
For a list of transaction control points, see Table 10-1 on page 288.
Transformation Scope
You can configure how the PowerCenter Server applies the transformation logic to incoming
data with the Transformation Scope transformation property. When the PowerCenter Server
processes a transformation, it either drops transaction boundaries or preserves transaction
boundaries, depending on the transformation scope and the mapping configuration.
You can choose one of the following values for the transformation scope:
- Row. Applies the transformation logic to one row of data at a time. Choose Row when a row of data does not depend on any other row.
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter Server drops incoming transaction boundaries and outputs all rows from the transformation as an open transaction. Choose All Input when a row of data depends on all rows in the source.
Table 10-1 lists the transformation scope values available for each transformation:

Table 10-1. Transformation Scope Property Values

- Application Source Qualifier, MQ Source Qualifier, Normalizer (VSAM), Source Qualifier, XML Parser: n/a. Transaction control point.
- Aggregator, Joiner, Rank, Sorter: Transaction (optional) or All Input (default; transaction control point).
- Custom*: Row (optional); Transaction (optional; transaction control point or when configured to generate commits); All Input (default; transaction control point when it has one output group or when configured to generate commits).
- XML Generator: Transaction (optional); generates transactions when the flush on commit attribute is set to create a new document.
- Expression, External Procedure, Filter, Lookup, Normalizer (relational), Router, Sequence Generator, Stored Procedure, Update Strategy: Row.
- Transaction Control: Row. Transaction control point.
- Union: All Input.

*For more information on how the Transformation Scope property affects the Custom transformation, see Custom Transformation in the Transformation Guide.
Figure 10-6 illustrates transaction control units with a Transaction Control transformation:
Figure 10-6. Transaction Control Units
Note that T5_ora1 uses the same connection name as T1_ora1 and T2_ora1. Because
T5_ora1 is connected to a separate Transaction Control transformation, it is in a separate
transaction control unit and target connection group. If you connect T5_ora1 to
tc_TransactionControlUnit1, it will be in the same transaction control unit as all targets, and
in the same target connection group as T1_ora1 and T2_ora1.
- Transformations with Transaction transformation scope must receive data from a single transaction control point.
- The PowerCenter Server uses the transaction boundaries defined by the first upstream transaction control point for transformations with Transaction transformation scope.
- Transaction generators can be effective or ineffective for a target. The PowerCenter Server uses the transaction generated by an effective transaction generator when it loads data to a target. For more information on effective and ineffective transaction generators, see Transaction Control Transformation in the Transformation Guide.
- The Workflow Manager prevents you from using incremental aggregation in a session with an Aggregator transformation with Transaction transformation scope.
- The PowerCenter Server resets any cache at the beginning of each transaction for Aggregator, Joiner, Rank, and Sorter transformations with Transaction transformation scope.
- You can only choose the Transaction transformation scope for Joiner transformations when you use sorted input.
Table 10-2 describes the session commit properties that you set in the General Options settings of the Properties tab:

Table 10-2. Session Commit Properties

- Commit Type. Target-based commit is selected by default if no transaction generator, or only ineffective transaction generators, are in the mapping. User-defined commit is selected by default if effective transaction generators are in the mapping.
- Commit Interval*. Default is 10,000 for target-based and source-based commit. n/a for user-defined commit.
- Commit on End of File.
- Roll Back Transactions on Errors. n/a for target-based commit.

*Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions and increases performance.
Chapter 11
Recovering Data

This chapter covers the following topics:

- Overview, 296
Overview
If you stop a session or if an error causes a session to stop unexpectedly, refer to the session
logs to determine the cause of the failure. Correct the errors, and then complete the session.
The method you use to complete the session depends on the configuration of the mapping
and the session, the specific failure, and how much progress the session made before it failed.
If the PowerCenter Server did not commit any data, run the session again. If the session
issued at least one commit and is recoverable, consider running the session in recovery mode.
Recovery allows you to restart a failed session and complete it as if the session had run
without pause. When the PowerCenter Server runs in recovery mode, it continues to commit
data from the point of the last successful commit. For more information on PowerCenter
Server processing during recovery, see Server Handling for Recovery on page 314.
All recovery sessions run as part of a workflow. When you recover a session, you also have the
option to run part of the workflow. Consider the configuration and design of the workflow
and the status of other tasks in the workflow before you choose a method of recovery.
Depending on the configuration and status of the workflow and session, you can choose one
or more of the following recovery methods:
Recover a suspended workflow. If the workflow suspends due to session failure, you can
recover the failed session and resume the workflow. For details, see Recovering a
Suspended Workflow on page 305.
Recover a failed workflow. If the workflow fails as a result of session failure, you can
recover the session and run the rest of the workflow. For details, see Recovering a Failed
Workflow on page 308.
Recover a session task. If the workflow completes, but a session fails, you can recover the
session alone without running the rest of the workflow. You can also use this method to
recover multiple failed sessions in a branched workflow. For details, see Recovering a
Session Task on page 311.
For more information on session failure, see Stopping and Aborting a Session on page 200.
Sort the data from the source. This guarantees that the PowerCenter Server always
receives source rows in the same order. You can do this by configuring the Sorted Ports
option in the Source Qualifier or Application Source Qualifier transformation or by
adding a Sorter transformation configured for distinct output rows to the mapping after
the source qualifier.
Verify all targets receive data from transformations that produce repeatable data. Some
transformations produce repeatable data. You can enable a session for recovery in the
Workflow Manager when all targets in the mapping receive data from transformations that
produce repeatable data. For more information on repeatable data, see Working with
Repeatable Data on page 301.
Also, to perform consistent data recovery, the source, target, and transformation properties for
the recovery session must be the same as those for the failed session. Do not change the
properties of objects in the mapping before you run the recovery session.
The previous session run failed and the recovery information is accessible.
To enable recovery, select the Enable Recovery option in the Error Handling settings of the
Configuration tab in the session properties.
If you enable recovery and also choose to truncate the target for a relational normal load
session, the PowerCenter Server does not truncate the target when you run the session in
recovery mode.
Use the following guidelines when you enable recovery for a partitioned session:
The Workflow Manager configures all partition points to use the default partitioning
scheme for each transformation when you enable recovery.
The Workflow Manager sets the partition type to pass-through unless the transformation
receiving the data is either an Aggregator transformation, a Rank transformation, or a
sorted Joiner transformation.
You can only enable recovery for unsorted Joiner transformations with one partition.
For Custom transformations, you can enable recovery only for transformations with one
input group.
The PowerCenter Server disables test load when you enable the session for recovery.
To perform consistent data recovery, the session properties for the recovery session must be
the same as the session properties for the failed session. This includes the partitioning
configuration and the session sort order.
The PowerCenter Server creates the following recovery tables in the target database:
PM_RECOVERY. This table records target load information during the session run. The
PowerCenter Server removes the information from this table after each successful session
and initializes the information at the beginning of subsequent sessions.
PM_TGT_RUN_ID. This table records information the PowerCenter Server uses to identify
each target on the database. The information in this table remains between session runs.
If you want the PowerCenter Server to create the recovery tables, you must grant table
creation privileges to the database user name for the target database connection. If you do not
want the PowerCenter Server to create the recovery tables, you must create the recovery tables
manually.
Do not edit or drop the recovery tables while recovery is enabled. If you want to disable
recovery, the PowerCenter Server does not remove the recovery tables from the target
database. You must manually remove the recovery tables.
Table 11-1 describes the format of PM_RECOVERY:
Table 11-1. PM_RECOVERY Table Definition
Column Name      Datatype
REP_GID          VARCHAR(240)
WFLOW_ID         NUMBER
SUBJ_ID         NUMBER
TASK_INST_ID     NUMBER
TGT_INST_ID      NUMBER
PARTITION_ID     NUMBER
TGT_RUN_ID       NUMBER
RECOVERY_VER     NUMBER
CHECK_POINT      NUMBER
ROW_COUNT        NUMBER
Table 11-2 describes the format of PM_TGT_RUN_ID:
Table 11-2. PM_TGT_RUN_ID Table Definition
Column Name      Datatype
LAST_TGT_RUN_ID  NUMBER
Note: If you manually create the PM_TGT_RUN_ID table, you must specify a value other
than zero in the LAST_TGT_RUN_ID column to ensure that the session runs successfully in
recovery mode.
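If you create the recovery tables manually, the columns in Tables 11-1 and 11-2 translate into DDL along these lines. This is a sketch, not an Informatica-supplied script: the Oracle-style datatypes and the choice of database client are assumptions to adjust for your target database, and the nonzero seed value follows the note above.

```shell
# Write a DDL script for the recovery tables; run it against the target
# database with your own client. Column names and types come from Tables 11-1
# and 11-2; everything else here is an assumption.
cat > create_recovery_tables.sql <<'EOF'
CREATE TABLE PM_RECOVERY (
    REP_GID       VARCHAR(240),
    WFLOW_ID      NUMBER,
    SUBJ_ID       NUMBER,
    TASK_INST_ID  NUMBER,
    TGT_INST_ID   NUMBER,
    PARTITION_ID  NUMBER,
    TGT_RUN_ID    NUMBER,
    RECOVERY_VER  NUMBER,
    CHECK_POINT   NUMBER,
    ROW_COUNT     NUMBER
);

CREATE TABLE PM_TGT_RUN_ID (
    LAST_TGT_RUN_ID NUMBER
);

-- LAST_TGT_RUN_ID must be nonzero for recovery to run successfully.
INSERT INTO PM_TGT_RUN_ID (LAST_TGT_RUN_ID) VALUES (1);
COMMIT;
EOF
```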
Return Code   Description
12            The PowerCenter Server cannot start recovery because the session or workflow is scheduled, suspending, waiting for an event, waiting, initializing, aborting, stopping, disabled, or running.
19            The PowerCenter Server cannot start the session in recovery mode because the workflow is configured to run continuously.
For details on additional pmcmd return codes, see pmcmd Return Codes on page 590.
Never. The order of the output data is inconsistent between session runs. This is the
default for active Custom transformations.
Based on input order. The output order is consistent between session runs when the input
data order for all input groups is consistent between session runs. This is the default for
passive Custom transformations.
Always. The order of the output data is consistent between session runs even if the order
of the input data is inconsistent between session runs.
Transformation          Output is Repeatable
MQ Source Qualifier     Always.
Aggregator              Always.
Custom                  Never by default for active Custom transformations; Based On Input Order by default for passive Custom transformations.
Expression
External Procedure
Filter
Joiner
Lookup
Normalizer (VSAM)       Always. You can enable the session for recovery; however, you might get inconsistent results if you run the session in recovery mode. The Normalizer transformation generates source data in the form of primary keys. Recovering a session might generate different values than if the session completed successfully. However, the PowerCenter Server continues to produce unique key values.
Normalizer (pipeline)
Rank                    Always.
Router
Sequence Generator      Always.
Stored Procedure
Transaction Control
Union                   Never.
Update Strategy
XML Generator           Always.
XML Parser              Always.
To run a session in recovery mode, you must first enable the failed session for recovery. To
enable a session for recovery, the Workflow Manager verifies that all targets in the mapping
receive data from transformations that produce repeatable data. The Workflow Manager uses
the values in Table 11-4 to determine whether or not you can enable a session for recovery.
However, the Workflow Manager cannot verify whether or not you configure some
transformations, such as the Sequence Generator transformation, correctly and always allows
you to enable these sessions for recovery. You may get inconsistent results if you do not
configure these transformations correctly.
You cannot enable a session for recovery in the Workflow Manager under the following
circumstances:
You connect a transformation that never produces repeatable data directly to a target. To
enable this session for recovery, you can add a transformation that always produces
repeatable data between the transformation that never produces repeatable data and the
target.
When a mapping contains a transformation that never produces repeatable data, you can add
a transformation that always produces repeatable data immediately after it.
Note: In some cases, you might get inconsistent data if you run some sessions in recovery
mode. For a description of circumstances that might lead to inconsistent data, see
Completing Unrecoverable Sessions on page 316.
Figure 11-1 illustrates a mapping you can enable for recovery:
Figure 11-1. Mapping You Can Enable for Recovery
The mapping contains an Aggregator transformation that always produces repeatable data.
The Aggregator transformation provides data for the Lookup and Expression transformations.
Lookup and Expression transformations produce repeatable data if they receive repeatable
data. Therefore, the target receives repeatable data, and you can enable this session for
recovery.
The mapping contains two Source Qualifier transformations that produce repeatable data.
However, the mapping contains a Union and Custom transformation downstream that never
produce repeatable data. The Lookup transformation only produces repeatable data if it
receives repeatable data. Therefore, the target does not receive repeatable data, and you
cannot enable this session for recovery.
You can modify this mapping to enable the session for recovery by adding a Sorter
transformation configured for distinct output rows immediately after transformations that
never output repeatable data. Since the Union transformation is connected directly to another
transformation that never produces repeatable data, you only need to add a Sorter
transformation after the Custom transformation, as shown in the mapping in Figure 11-3:
Figure 11-3. Modified Mapping You Can Enable for Recovery
Example
Suppose the workflow w_ItemOrders contains two sequential sessions. In this workflow,
s_ItemSales is enabled for recovery, and the workflow is configured to suspend on error.
Suppose s_ItemSales fails, and the PowerCenter Server suspends the workflow. You correct the
error and resume the workflow in recovery mode. The PowerCenter Server recovers the
session successfully, and then runs s_UpdateOrders.
If s_UpdateOrders also fails, the PowerCenter Server suspends the workflow again. You
correct the error, but you cannot resume the workflow in recovery mode because you did not
enable the session for recovery. Instead, you resume the workflow. The PowerCenter Server
starts s_UpdateOrders from the beginning, completes the session successfully, and then runs
the StopWorkflow control task.
Example
Suppose you have the workflow w_ItemsDaily, containing three concurrent sessions,
s_SupplierInfo, s_PromoItems, and s_ItemSales. In this workflow, s_SupplierInfo and
s_PromoItems are enabled for recovery, and the workflow is configured to suspend on error.
Workflow configured to suspend on error.
Suppose s_SupplierInfo fails while the PowerCenter Server is running the three sessions. The
PowerCenter Server places the workflow in a suspending state and continues running the
other two sessions. s_PromoItems and s_ItemSales also fail, and the PowerCenter Server then
places the workflow in a suspended state.
You correct the errors that caused each session to fail and then resume the workflow in
recovery mode. The PowerCenter Server starts s_SupplierInfo and s_PromoItems in recovery
mode. Since s_ItemSales is not enabled for recovery, the PowerCenter Server restarts it from the beginning.
The PowerCenter Server runs the three sessions concurrently.
After all sessions succeed, the PowerCenter Server runs the Command task.
2.
Choose Task-Resume/Recover.
The PowerCenter Server resumes the workflow.
You can also use pmcmd to resume a workflow in recovery mode. For more information, see
Using pmcmd on page 581.
Example
Suppose the workflow w_ItemOrders contains two sequential sessions. s_ItemSales is enabled
for recovery and also configured to fail the parent workflow if it fails.
Figure 11-6 illustrates w_ItemOrders:
Figure 11-6. Recovering Part of a Workflow With Sequential Sessions
Session enabled for recovery.
Suppose s_ItemSales fails, and the PowerCenter Server fails the workflow. You correct the
error and recover the workflow from s_ItemSales. The PowerCenter Server successfully
recovers the session, and then runs the next task in the workflow, s_UpdateOrders.
Suppose s_UpdateOrders also fails, and the PowerCenter Server fails the workflow again. You
correct the error, but you cannot recover the workflow from the session. Instead, you start the
workflow from the session. The PowerCenter Server starts s_UpdateOrders from the
beginning, completes the session successfully, and then runs the StopWorkflow control task.
Example
Suppose the workflow w_ItemsDaily contains three concurrent sessions, s_SupplierInfo,
s_PromoItems, and s_ItemSales. In this workflow, each session is enabled for recovery and
configured to fail the parent workflow if the session fails.
Figure 11-7 illustrates w_ItemsDaily:
Figure 11-7. Recovering Part of a Workflow with Concurrent Sessions
Sessions enabled for recovery.
Sessions configured to fail parent workflow if
the session fails.
Suppose s_SupplierInfo fails while the three concurrent sessions are running, and the
PowerCenter Server fails the workflow. s_PromoItems and s_ItemSales also fail. You correct
the errors that caused each session to fail.
In this case, you must combine two recovery methods to run all sessions before completing
the workflow. You recover s_PromoItems individually. You cannot recover s_ItemSales
because it is not enabled for recovery, but you start the session from the beginning. After the
PowerCenter Server successfully completes s_PromoItems and s_ItemSales, you recover the
workflow from s_SupplierInfo. The PowerCenter Server runs the session in recovery mode,
and then runs the Command task.
Select the failed session in the Navigator or in the Workflow Designer workspace.
2.
Right-click the failed session and choose Recover Workflow from Task.
The PowerCenter Server runs the failed session in recovery mode, and then runs the rest
of the workflow.
2.
You can also use pmcmd to recover a failed workflow. For more information, see Using
pmcmd on page 581.
Example
Suppose the workflow w_ItemsDaily contains three concurrently running sessions. Each
session is enabled for recovery and configured to fail the workflow if the session fails.
Figure 11-8 illustrates w_ItemsDaily:
Figure 11-8. Recovering Concurrent Sessions Individually
Sessions enabled for recovery.
Sessions configured to fail parent workflow if
the session fails.
Suppose s_ItemSales fails and the PowerCenter Server fails the workflow. s_PromoItems and
s_SupplierInfo also fail. You correct the errors that caused the sessions to fail.
After you correct the errors, you individually recover each failed session. The PowerCenter
Server successfully recovers the sessions. The workflow paths after the sessions converge at the
Command task, allowing you to start the workflow from the Command task and complete
the workflow.
Alternatively, after you correct the errors, you could also individually recover two of the three
failed sessions. After the PowerCenter Server successfully recovers the sessions, you can
recover the workflow from the third session. The PowerCenter Server then recovers the third
session and, on successful recovery, runs the rest of the workflow.
Select the failed session in the Navigator or in the Workflow Designer workspace.
2.
2.
You can also use pmcmd to recover a failed session. For more information, see Using pmcmd
on page 581.
Running Recovery
If a session enabled for recovery fails, you can run the session in recovery mode. The
PowerCenter Server moves a recovery session through the states of a normal session:
scheduled, waiting, running, succeeded, and failed. When the PowerCenter Server starts the
recovery session, it runs all pre-session tasks.
For relational normal load targets, the PowerCenter Server performs incremental load
recovery. It uses the recovery information created during the normal session run to determine
the point at which the session stopped committing data to the target. It then continues
writing data to the target. On successful recovery, the PowerCenter Server removes the
recovery information from the tables.
For example, if the PowerCenter Server commits 10,000 rows before the session fails, when
you run the session in recovery mode, the PowerCenter Server bypasses the rows up to 10,000
and starts loading with row 10,001.
If the session writes to a relational target in bulk mode, the PowerCenter Server performs the
entire writer run. If the Truncate Target Table option is enabled in the session properties, the
PowerCenter Server truncates the target before loading data.
If the session writes to a flat file or XML file, the PowerCenter Server performs full load
recovery. It overwrites the existing output file and performs the entire writer run. If the
session writes to heterogeneous targets, the PowerCenter Server performs incremental load
recovery for all relational normal load targets and full load recovery for all other target types.
On successful recovery, the PowerCenter Server deletes recovery cache files associated with the
session. It also performs all post-session tasks.
You change the number of partitions. If you change the number of partitions after the
session fails, the recovery session fails.
Recovery table is empty or missing from the target database. The PowerCenter Server
fails the recovery session under the following circumstances:
You deleted the table after the PowerCenter Server created it.
The session enabled for recovery succeeded, and the PowerCenter Server removed the
recovery information from the table.
Recovery cache file is missing. The PowerCenter Server fails the recovery session if the
recovery cache file is missing from the PowerCenter Server cache directory.
You might get inconsistent data if you perform recovery under the following circumstances:
You change the partitioning configuration. If you change any partitioning options after
the session fails, you may get inconsistent data.
Source data is not sorted. To perform a successful recovery, the PowerCenter Server must
process source rows during recovery in the same order it processes them during the initial
session. Use the Sorted Ports option in the Source Qualifier transformation or add a Sorter
transformation directly after the Source Qualifier transformation.
The sources or targets change after the initial session failure. If you drop or create
indexes, or edit data in the source or target tables before recovering a session, the
PowerCenter Server may return missing or repeat rows.
The session writes to a relational target in bulk mode, but the session is not configured
to truncate the target table. The PowerCenter Server may load duplicate rows to the target
during the recovery session.
You recover a session that uses a Sequence Generator transformation without resetting the
transformation properties to the same value used when you ran the failed session. If you do
not reset the Current Value, the PowerCenter Server will continue to generate unique
sequence values.
The session performs incremental aggregation and the PowerCenter Server stops
unexpectedly. If the PowerCenter Server stops unexpectedly while running an incremental
aggregation session, the recovery session cannot use the incremental aggregation cache
files. Rename the backup cache files for the session from PMAGG*.idx.bak and
PMAGG*.dat.bak to PMAGG*.idx and PMAGG*.dat before you perform recovery.
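The rename step described above can be scripted. A minimal sketch follows, using a scratch directory with stand-in file names to show the suffix change; in practice you would run the loop in the PowerCenter Server cache directory against the real backup files.

```shell
# Restore Aggregator backup cache files by stripping the .bak suffix.
# cache_dir and the file names below are stand-ins for this illustration.
cache_dir=$(mktemp -d)
touch "$cache_dir/PMAGG001.idx.bak" "$cache_dir/PMAGG001.dat.bak"

for f in "$cache_dir"/PMAGG*.bak; do
  mv "$f" "${f%.bak}"   # PMAGG*.idx.bak -> PMAGG*.idx, PMAGG*.dat.bak -> PMAGG*.dat
done
```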
The PowerCenter Server data movement mode changes after the initial session failure. If
you change the data movement mode before recovering the session, the PowerCenter
Server might return incorrect data.
The PowerCenter Server code page or source and target code pages change after the
initial session failure. If you change the source, target, or PowerCenter Server code pages,
the PowerCenter Server might return incorrect data. You can perform recovery if the new
code pages are two-way compatible with the original code pages.
The PowerCenter Server runs in Unicode mode and you change the session sort order.
When the PowerCenter Server runs in Unicode mode, it sorts character data based on the
sort order selected for the session. Do not perform recovery if you change the session sort
order after the session fails.
Chapter 12
Sending Email
This chapter covers the following topics:
Overview, 320
Tips, 342
Overview
You can send email to designated recipients when the PowerCenter Server runs a workflow.
For example, if you want to track how long a session takes to complete, you can configure the
session to send an email containing the time and date the session starts and completes. Or, if
you want the PowerCenter Server to notify you when a workflow suspends, you can configure
the workflow to send email when it suspends.
When you create a workflow or worklet, you can include the following types of email:
Email task. You can include reusable and non-reusable Email tasks anywhere in the
workflow or worklet. For more information, see Using Email Tasks in a Workflow or
Worklet on page 341.
Post-session email. You can configure the session so the PowerCenter Server sends an
email when the session completes or fails. You create an Email task and use it for post-session email. For more information, see Working with Post-Session Email on page 332.
When you configure the subject and body of post-session email, you can use email
variables to include information about the session run, such as session name, status, and
the total number of records loaded. You can also use email variables to attach the session
log or other files to email messages. For more information, see Email Variables and
Format Tags on page 333.
Suspension email. You can configure the workflow so the PowerCenter Server sends an
email when the workflow suspends. You create an Email task and use it for suspension
email. For more information, see Working with Suspension Email on page 339.
Before you can configure a session or workflow to send email, you need to create an Email
task. For more information, see Working with Email Tasks on page 328.
The PowerCenter Server on Windows sends email in MIME format. This allows you to
include characters in the subject and body that are not in 7-bit ASCII. For more information
on the MIME format or the MIME decoding process, see your email documentation.
Before creating Email tasks, configure the PowerCenter Server to send email. For more
information, see Configuring Email on UNIX on page 321 and Configuring Email on
Windows on page 322.
Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.
2.
3.
Log on to the UNIX system as the Informatica user who starts the PowerCenter Server.
2.
3.
To indicate the end of the message, type . on a line of its own and press Enter.
Or, type ^D.
You should receive a blank email from the email account of the Informatica user. If not,
locate the directory where rmail resides and add that directory to the path.
Once you verify that rmail is installed correctly, you can send email. For more information on
configuring email, see Working with Email Tasks on page 328.
Install the Microsoft Outlook mail client on the PowerCenter Server machine.
Create a Windows user account that has Log on as a service rights and a Microsoft Outlook
profile.
To configure the PowerCenter Server on Windows to send email, you must perform the
following steps:
1.
2.
Configure a Microsoft Outlook profile for the Informatica Service startup account.
3.
4.
5.
Configure the PowerCenter Server to send email using the Microsoft Outlook profile you
created in step 2.
Use the same logon name for both the Microsoft Outlook account you create and the user
you grant Log on as a service rights in the Informatica Service startup account.
Note: If you do not already have a Microsoft Outlook mailbox for the Informatica Service startup account, create one before you continue.
Open the Control Panel on the machine running the PowerCenter Server.
2.
3.
On the Services tab of the user Properties dialog box, click Show Profiles.
The Mail dialog box displays the list of profiles configured for the computer.
4.
If you have a Microsoft Outlook profile set up for the Informatica Service startup
account, skip to Step 3. Configure Logon Network Security on page 325. If you do not
already have a Microsoft Outlook profile set up for the Informatica Service startup
account, continue to the next step.
5.
6.
Select Use The Following Information Services and then select Microsoft Exchange
Server. Click Next.
7.
Enter a profile name. You can enter any name, but Informatica recommends that you
enter a text string that matches the Informatica Service startup account. Click Next.
8.
Enter the name of the Microsoft Exchange Server. Enter your mailbox name. Click Next.
9.
10.
11.
Indicate whether you want to run Outlook when you start Windows. Click Next.
12.
The Setup Wizard indicates that you have successfully configured an Outlook profile.
13.
Click Finish.
Open the Control Panel on the machine running the PowerCenter Server.
2.
Double-click the Mail (or Mail and Fax) icon. The User Properties sheet appears.
3.
On the Services tab, select Microsoft Exchange Server and click Properties.
4.
Click the Advanced tab. Set the Logon network security option to NT Password
Authentication.
5.
Click OK.
2.
In the MS Exchange Profile field, enter the name of the Microsoft Outlook profile you
created for the Informatica Service startup account.
Session properties. You can configure the session to send email when the session
completes or fails. For more information, see Working with Post-Session Email on
page 332.
Workflow properties. You can configure the workflow to send email when the workflow
suspends. For more information, see Working with Suspension Email on page 339.
Workflow or worklet. You can include an Email task anywhere in the workflow or worklet
to send email based on a condition you define. For more information, see Using Email
Tasks in a Workflow or Worklet on page 341.
Figure 12-1 shows the Edit Tasks dialog box for an Email task in the Task Developer:
Figure 12-1. Email Task
If the PowerCenter Server runs on Windows, you can enter a Microsoft Exchange Profile
name. The mail recipient must have an entry in the Global Address book of the Microsoft
Outlook profile.
If the PowerCenter Server runs on Windows, you can send email to multiple recipients by
creating a distribution list in your Personal Address book. All recipients must also be in the
Global Address book. You cannot enter multiple addresses separated by commas or semicolons.
If the PowerCenter Server runs on UNIX, you can enter multiple email addresses separated
by a comma. Do not include spaces between email addresses.
In the Task Developer, choose Tasks-Create. The Create Task dialog box appears.
2.
Select an Email task and enter a name for the task. Click Create.
The Workflow Manager creates an Email task in the workspace.
3.
Click Done.
4.
Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5.
6.
You can optionally enter a description for the task in the Description field.
7.
8.
Enter the fully qualified email address of the mail recipient in the Email User Name field.
For more information on entering the email address, see Email Address Tips and
Guidelines on page 328.
9.
Enter the subject of the email in the Email Subject field. Or, you can leave this field
blank.
10.
Click the Open button in the Email Text field to open the Email Editor.
11.
12.
On-Success Email
On-Failure Email
Figure 12-2 shows the On-Success and On-Failure email properties on the Components tab of
the session properties:
Figure 12-2. Post-Session Email Properties
Use a reusable Email task.
Select a reusable Email task.
Edit the non-reusable Email task.
Use a non-reusable Email task.
You can specify a reusable Email task you create in the Task Developer for either success email
or failure email. Or, you can create a non-reusable Email task for each session property. When
you create a non-reusable Email task for the session property, you create the Email task for
that session only. You cannot use the Email task in the workflow or worklet.
You cannot specify a non-reusable Email task you create in the Workflow or Worklet Designer
for post-session email.
Tip: When you configure an Email task for post-session email, use the following email server variables:
$PMSuccessEmailUser. Email address of the user to receive email when the session
completes successfully. Use this variable for the Email User Name for success email only.
The PowerCenter Server does not expand this variable when you use it for any other email
type.
$PMFailureEmailUser. Email address of the user to receive email when the session fails to
complete. Use this variable for the Email User Name for failure email only. The
PowerCenter Server does not expand this variable when you use it for any other email type.
When you use one of these server variables, the PowerCenter Server sends email to the address
configured for the server variable.
You might use this functionality when you have an administrator who troubleshoots all failed
sessions. Instead of entering the administrator email address for each session, you can use the
email variable $PMFailureEmailUser. If the administrator changes, you can correct all sessions
by editing the $PMFailureEmailUser server variable, instead of editing the email address in
each session.
You might also use this functionality when you have different administrators for different
PowerCenter Servers. If you deploy a folder from one repository to another or otherwise
change the PowerCenter Server that runs the session, the new server automatically sends email
to users associated with the new server when you use server variables instead of hard-coded
email addresses.
Note: $PMSuccessEmailUser and $PMFailureEmailUser are optional server variables. Verify
that these server variables are defined before you use them in post-session email.
Note: The PowerCenter Server does not limit the type or size of attached files. However, since
large attachments can cause problems with your email system, avoid attaching excessively
large files, such as session logs generated using verbose tracing. The PowerCenter Server
generates an error message in the email if an error occurs attaching the file.
Table 12-1 describes the email variables you can use in a post-session email:
Table 12-1. Email Variables for Post-Session Email
Email Variable   Description
%s               Session name.
%e               Session status.
%b               Session start time.
%c               Session completion time.
%i               Session elapsed time.
%l               Total records loaded.
%r               Total records rejected.
%t               Source and target table details, including read throughput in bytes per second and write throughput in rows per second. The PowerCenter Server includes all information displayed in the session detail dialog box.
%m               Name of the mapping used in the session.
%n               Name of the folder containing the session.
%d               Name of the repository containing the session.
%g               Attach the session log to the message.
%a<filename>     Attach the named file. The file must be local to the PowerCenter Server. The following are valid file names: %a<c:\data\sales.txt> or %a</users/john/data/sales.txt>.
                 Note: The file name cannot include the greater than character (>) or a line break.
Note: The PowerCenter Server ignores %a, %g, or %t when you include them in the email subject. Include these variables in the email message only.
Table 12-2 lists the format tags you can use in an Email task:
Table 12-2. Format Tags for Email Tasks
Formatting   Format Tag
tab          \t
new line     \n
2.
Select Reusable in the Type column for the success email or failure email field.
3.
Click the Open button in the Value column to select the reusable Email task.
4.
Select the Email task in the Object Browser dialog box and click OK.
5.
You can optionally edit the Email task for this session property by clicking the Edit
button in the Value column.
If you edit the Email task for either success email or failure email, the edits only apply to
this session.
6.
1.
2.
Select Non-Reusable in the Type column for the success email or failure email field.
3.
4.
Edit the Email task and click OK. For more information on editing Email tasks, see
Working with Email Tasks on page 328.
5.
Sample Email
The following is user-entered text from a sample post-session email configuration using
variables:
Session complete.
Session name: %s
%l
%r
%e
%b
%c
%i
%g
2.
3.
4.
Note: The Workflow Manager returns an error message if you do not have any reusable
Email tasks in the folder. Create a reusable Email task in the folder before you configure
suspension email.
5.
6.
If you use post-session email variables in an Email task outside a session, the PowerCenter
Server renders the variables related to the session as text. For example, if you use the variable
%s in an Email task in the workflow, the PowerCenter Server cannot provide a session name,
as it is not within a session.
Figure 12-4 shows a workflow that performs this operation:
Figure 12-4. Email Task in a Workflow
Configure the gen_report Command task to execute a shell script that generates the report.
Verify the shell script saves the report to a directory local to the PowerCenter Server.
Configure the em_report Email task to attach the file generated from the shell script.
Tips
The following suggestions can extend the capabilities of Email tasks.
Create a generic user for sending email.
Often there are multiple users who can start sessions on a PowerCenter Server. If you want to
avoid entering the Microsoft Outlook profile each time the PowerCenter user changes, create
a generic Microsoft Outlook profile, such as PowerCenter, then grant each PowerCenter
user rights to send mail through this profile.
Use server variables to address post-session emails.
When the server variables $PMSuccessEmailUser and $PMFailureEmailUser are configured
for the PowerCenter Server, use them to address post-session emails. This allows you to
change the recipient of post-session emails for all sessions the server runs by editing the server
variables. It can also make deploying sessions into production easier when the variables are
defined for both development and production servers.
Generate and send post-session reports.
You can use a post-session success command to generate a report file and attach that file to a
success email. For example, you create a batch file called Q3rpt.bat that generates a sales
report, and you are running Microsoft Outlook on Windows.
Figure 12-5 shows how you can configure the post-session success command to generate a
report:
Figure 12-5. Using Post-Session Commands to Generate Reports
Figure 12-6 shows how you can configure success email to attach a report file:
Figure 12-6. Using Email Variables to Attach Reports
Chapter 13
Pipeline Partitioning
This chapter covers the following subjects:
Overview, 346
Overview
You create a session for each mapping you want the PowerCenter Server to run. Every
mapping contains one or more source pipelines. A source pipeline consists of a source
qualifier and all the transformations and targets that receive data from that source qualifier.
If you purchase the Partitioning option, you can specify partitioning information for each
source pipeline in a mapping. The partitioning information for a pipeline controls the
following factors:
The number of reader, transformation, and writer threads that the master thread creates
for the pipeline. For more information, see Understanding Processing Threads on
page 14.
How the PowerCenter Server reads data from the source, including the number of
connections to the source.
How the PowerCenter Server distributes rows of data to each transformation as it processes
the pipeline.
How the PowerCenter Server writes data to the target, including the number of
connections to each target in the pipeline.
You can specify partitioning information for a pipeline by setting the following attributes:
Location of partition points. Partition points mark the thread boundaries in a pipeline
and divide the pipeline into stages. The PowerCenter Server sets partition points at several
transformations in a pipeline by default. If you have the Partitioning option, you can
define other partition points. When you add partition points, you increase the number of
transformation threads, which can improve session performance. The PowerCenter Server
can redistribute rows of data at partition points, which can also improve session
performance. For more information on partition points, see Partition Points on
page 346.
Number of partitions. A partition is a pipeline stage that executes in a single thread. If you
purchase the Partitioning option, you can set the number of partitions at any partition
point. When you add partitions, you increase the number of processing threads, which can
improve session performance. For more information, see Number of Partitions on
page 348.
Partition types. The PowerCenter Server specifies a default partition type at each partition
point. If you purchase the Partitioning option, you can change the partition type. The
partition type controls how the PowerCenter Server redistributes data among partitions at
partition points. For more information, see Partition Types on page 348.
Partition Points
By default, the PowerCenter Server sets partition points at various transformations in the
pipeline. Partition points mark thread boundaries as well as divide the pipeline into stages. A
stage is a section of a pipeline between any two partition points. When you set a partition
point at a transformation, the new pipeline stage includes that transformation.
Table 13-1 lists the partition points that the Workflow Manager creates by default:

Table 13-1. Default Partition Points

Transformation (Partition Point)               Default Partition Type   Description
Source Qualifier or Normalizer transformation  Pass-through             Controls how the PowerCenter Server reads data from the source and passes data into the source qualifier.
Rank and unsorted Aggregator transformations   Hash auto-keys           Ensures that the PowerCenter Server groups rows properly before they enter the transformation.
Target instances                               Pass-through             Controls how the writer passes data to the targets.
If you purchase the Partitioning option, you can add partition points at other transformations
and delete some partition points.
Figure 13-1 shows the default partition points and pipeline stages for a simple mapping with
one source pipeline:
Figure 13-1. Default Partition Points and Stages in a Sample Mapping
The mapping in Figure 13-1 contains four stages. The partition point at the source qualifier
marks the boundary between the first (reader) and second (transformation) stages. The
partition point at the Aggregator transformation marks the boundary between the second and
third (transformation) stages. The partition point at the target instance marks the boundary
between the third (transformation) and fourth (writer) stage.
When you add a partition point, you increase the number of pipeline stages by one. Similarly,
when you delete a partition point, you reduce the number of stages by one. For more
information, see Understanding Processing Threads on page 14.
Besides marking stage boundaries, partition points also mark the points in the pipeline where
the PowerCenter Server can redistribute data across partitions. For example, if you place a
partition point at a Filter transformation and define multiple partitions, the PowerCenter
Server can redistribute rows of data among the partitions before the Filter transformation
processes the data. The partition type you set at this partition point controls the way in which
the PowerCenter Server passes rows of data to each partition. For more information, see
Partition Types on page 348.
For more information on adding and deleting partition points, see Adding and Deleting
Partition Points on page 353.
Number of Partitions
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread.
By default, the PowerCenter Server defines a single partition in the source pipeline. If you
purchase the Partitioning option, you can increase the number of partitions. This increases
the number of processing threads, which can improve session performance.
For example, you need to use the mapping in Figure 13-1 to extract data from three flat files
of various sizes. To do this, you define three partitions at the source qualifier to read the data
simultaneously. When you do this, the Workflow Manager defines three partitions in the
pipeline.
Figure 13-2 shows the threads that the master thread creates for this mapping:
Figure 13-2. Threads Created for a Sample Mapping with Three Partitions
[The figure shows the reader threads for the first stage, 6 transformation threads for the second and third stages, and 3 writer threads for the fourth stage.]
By default, the PowerCenter Server sets the number of partitions to one. You can generally
define up to 64 partitions at any partition point. However, there are situations in which you
can define only one partition in the pipeline. For more information, see Restrictions on the
Number of Partitions on page 395.
Note: Increasing the number of partitions or partition points increases the number of threads.
Therefore, increasing the number of partitions or partition points also increases the load on
the server machine. If the server machine contains ample CPU bandwidth, processing rows of
data in a session concurrently can increase session performance. However, if you create a large
number of partitions or partition points in a session that processes large amounts of data, you
can overload the system.
For more information on adding and deleting partitions, see Adding and Deleting Partitions
on page 356.
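Because each pipeline stage runs one thread per partition, the total thread count grows multiplicatively. A minimal sketch of this arithmetic, consistent with the examples above (three partition points and three partitions yield twelve threads); the helper name is our own, not a product API:

```python
# Sketch of the thread-count arithmetic described in the text:
# a pipeline with n partition points has n + 1 stages, and each
# stage runs one thread per partition.
def total_threads(partition_points, partitions):
    stages = partition_points + 1
    return stages * partitions

# Figure 13-2's mapping: 3 default partition points, 3 partitions.
print(total_threads(3, 3))   # 12 threads
# A simple mapping with only source qualifier and target partition
# points and a single partition runs 3 threads (reader,
# transformation, writer).
print(total_threads(2, 1))   # 3 threads
```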
Partition Types
When you configure the partitioning information for a pipeline, you must specify a partition type at each partition point in the pipeline. The partition type determines how the PowerCenter Server redistributes data among partitions at partition points.
The Workflow Manager allows you to specify the following partition types:
Round-robin. The PowerCenter Server distributes data evenly among all partitions. Use
round-robin partitioning where you want each partition to process approximately the same
number of rows. For more information, see Round-Robin Partition Type on page 360.
Hash. The PowerCenter Server applies a hash function to a partition key to group data
among partitions. If you select hash auto-keys, the PowerCenter Server uses all grouped or
sorted ports as the partition key. If you select hash user keys, you specify a number of ports
to form the partition key. Use hash partitioning where you want to ensure that the
PowerCenter Server processes groups of rows with the same partition key in the same
partition. For more information, see Hash Keys Partition Types on page 361.
Key range. You specify one or more ports to form a compound partition key. The
PowerCenter Server passes data to each partition depending on the ranges you specify for
each port. Use key range partitioning where the sources or targets in the pipeline are
partitioned by key range. For more information, see Key Range Partition Type on
page 363.
Pass-through. The PowerCenter Server passes all rows at one partition point to the next
partition point without redistributing them. Choose pass-through partitioning where you
want to create an additional pipeline stage to improve performance, but do not want to
change the distribution of data across partitions. For more information, see Pass-Through
Partition Type on page 367.
Database partitioning. The PowerCenter Server queries the IBM DB2 system for table
partition information and loads partitioned data to the corresponding nodes in the target
database. Use database partitioning with IBM DB2 targets stored on a multi-node
tablespace. For more information, see Database Partitioning Partition Type on page 369.
You can specify different partition types at different points in the pipeline.
Figure 13-3 shows a mapping where you can specify different partition types to increase
session performance:
Figure 13-3. Sample Mapping
The mapping in Figure 13-3 reads data about items and calculates average wholesale costs and
prices. The mapping must read item information from three flat files of various sizes, and
then filter out discontinued items. It sorts the active items by description, calculates the
average prices and wholesale costs, and writes the results to a relational database in which the
target tables are partitioned by key range.
When you use this mapping in a session, you can increase session performance by specifying
different partition types at the following partition points in the pipeline:
Source qualifier. To read data from the three flat files concurrently, you must specify three
partitions at the source qualifier. Accept the default partition type, pass-through.
Filter transformation. Since the source files vary in size, each partition processes a
different amount of data. Set a partition point at the Filter transformation, and choose
round-robin partitioning to balance the load going into the Filter transformation.
Target. Since the target tables are partitioned by key range, specify key range partitioning
at the target to optimize writing data to the target.
For more information on specifying partition types, see Specifying Partition Types on
page 356.
Add a partition key and key ranges for certain partition types.
Figure 13-4 shows the configuration options on the Partitions view on the Mapping tab:
Figure 13-4. Session Properties Partitions View on the Mapping Tab
[The figure labels the buttons to add and delete a partition point, the partitioning workspace, the Edit Keys button, the fields to specify key ranges, and the tab that displays the Partitions view.]
Table 13-2 describes the configuration options for the Partitions view on the Mapping tab:

Table 13-2. Options on Session Properties Partitions View on the Mapping Tab

Add Partition Point. Click to add a new partition point in the mapping. When you add a partition point, the transformation name appears under the Partition Points node.
Edit Partition Point. Click to edit the selected partition point. This opens the Edit Partition Point dialog box. For more information on the options in this dialog box, see Table 13-3 on page 353.
Key Range. Displays the key and key ranges for the partition point, depending on the partition type. For key range partitioning, you specify the key ranges. For hash user keys partitioning, this field displays the partition key. The Workflow Manager does not display this area for other partition types.
Edit Keys. Click to add or remove the partition key for key range or hash user keys partitioning. You cannot create a partition key for hash auto-keys, round-robin, or pass-through partitioning.
You can configure the following information when you edit or add a partition point:
Figure 13-5 shows the configuration options in the Edit Partition Point dialog box:
Figure 13-5. Edit Partition Point Dialog Box
[The dialog box shows the selected partition point, buttons to add and delete a partition, the partition list, and a field to enter the partition description.]
Table 13-3 describes the configuration options in the Edit Partition Point dialog box:

Table 13-3. Edit Partition Point Dialog Box Options

Partition Names. Selects an individual partition to configure.
Add a Partition. Adds a partition. You can add up to 64 partitions at any partition point. The number of partitions must be consistent across the pipeline. Therefore, if you define three partitions at one partition point, the Workflow Manager defines three partitions at all partition points in the pipeline.
Delete a Partition. Deletes the selected partition. Each partition point must contain at least one partition.
Description. Enter an optional description for the selected partition.

The Workflow Manager creates the following partition points by default:

Source Qualifier or Normalizer. This partition point controls how the PowerCenter Server extracts data from the source and passes it to the source qualifier. You cannot delete this partition point.
Rank and unsorted Aggregator transformations. These partition points ensure that the PowerCenter Server groups rows properly before it sends them to the transformation. You can delete these partition points if the pipeline contains only one partition or if the PowerCenter Server passes all rows in a group to a single partition before they enter the transformation.
For example, in the mapping in Figure 13-3 on page 349, you can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. Therefore, the Aggregator transformation receives data for all items with the same description in one partition and can calculate the average costs and prices for this item correctly.
Target instances. This partition point controls how the writer passes data to the targets. You cannot delete this partition point.
You can add partition points at any other transformation provided that no partition point
receives input from more than one pipeline stage.
In this mapping, the Workflow Manager creates partition points at the source qualifier and
target instance by default. You can place an additional partition point at Expression
transformation EXP_3.
If you place a partition point at EXP_3 and define one partition, the master thread creates the
following threads:
[The figure shows the partition points, the reader thread (first stage), the transformation threads (second and third stages), and the writer thread (fourth stage).]
In this case, each partition point receives data from only one pipeline stage, so EXP_3 is a
valid partition point.
The following transformations are not valid partition points in this mapping: the source definition, SG_1, and Expression transformations EXP_1 and EXP_2. If you could place a partition point at EXP_1 or EXP_2, you would create an additional pipeline stage that processes data from the source qualifier to EXP_1 or EXP_2. In this case, EXP_3 would receive data from two pipeline stages, which is not allowed.
For more information about processing threads, see Understanding Processing Threads on
page 14.
1. On the Partitions view of the Mapping tab, select a transformation that is not already a partition point, and click the Add a Partition Point button.
   Tip: You can select a transformation from the Non-Partition Points node.
2. Select the partition type for the partition point or accept the default value. For information on specifying a valid partition type, see Specifying Partition Types on page 356.
3. Click OK.
   The transformation appears in the Partition Points node in the Partitions view on the Mapping tab of the session properties.
Table 13-4 lists valid partition types and the default partition type for different partition
points in the pipeline:
Table 13-4. Valid Partition Types for Partition Points
[The table indicates which of the round-robin, hash auto-keys, hash user keys, key range, pass-through, and database partitioning types are valid at each partition point, along with the default partition type. Source definitions and unconnected transformations are not valid partition points. Most partition points default to pass-through; the default for the unsorted Aggregator, Joiner, and Sorter transformations is based on the transformation scope*. Database partitioning is valid for relational target definitions with DB2 targets only, and is the default for DB2 targets; other relational targets default to pass-through.]

* The default partition type is pass-through when the transformation scope is Transaction, and hash auto-keys when the transformation scope is All Input.
Cache Partitioning
When you create a session with multiple partitions, the PowerCenter Server can partition
caches for the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate
cache for each partition, and each partition works with only the rows needed by that
partition. As a result, the PowerCenter Server requires only a portion of total cache memory
for each partition. When you run a session, the PowerCenter Server accesses the cache in
parallel for each partition.
After you configure the session for partitioning, you can configure memory requirements and
cache directories for each transformation in the Transformations view on the Mapping tab of
the session properties. To configure the memory requirements, calculate the total
requirements for a transformation, and divide by the number of partitions. To further
improve performance, you can configure separate directories for each partition.
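The sizing rule above is plain division: estimate the transformation's total cache requirement and split it across the partitions. A small sketch with hypothetical numbers (the helper name and the 24 MB figure are our own):

```python
# Sketch of the cache sizing rule from the text: configure each
# partition's cache as the transformation's total requirement
# divided by the number of partitions. Numbers are hypothetical.
def per_partition_cache(total_bytes, num_partitions):
    # Round up so the partition caches together still cover the total.
    return -(-total_bytes // num_partitions)

total = 24_000_000   # estimated total Aggregator data cache requirement
print(per_partition_cache(total, 3))   # 8000000 bytes per partition
```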
The guidelines for cache partitioning are different for each cached transformation:
Aggregator transformation. The PowerCenter Server uses cache partitioning for any
multi-partitioned session with an Aggregator transformation. You do not have to set a
partition point at the Aggregator transformation.
Joiner transformation. The PowerCenter Server uses cache partitioning when you create a
partition point at the Joiner transformation. For more information about partitioning with
Joiner transformations, see Partitioning Joiner Transformations on page 384.
Lookup transformation. The PowerCenter Server uses cache partitioning when you create
a hash auto-keys partition point at the Lookup transformation. For more information
about partitioning with Lookup transformations, see Partitioning Lookup
Transformations on page 391.
Rank transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with a Rank transformation. You do not have to set a partition point at the Rank transformation.
The session based on this mapping reads item information from three flat files of different
sizes:
When the PowerCenter Server reads the source data, the first partition begins processing 80%
of the data, the second partition processes 5% of the data, and the third partition processes
15% of the data.
To distribute the workload more evenly, set a partition point at the Filter transformation and
set the partition type to round-robin. The PowerCenter Server distributes the data so that
each partition processes approximately one third of the data.
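Conceptually, round-robin redistribution deals rows to the partitions in turn, which is why skewed input evens out. The following sketch is illustrative only, not the server's actual algorithm:

```python
from itertools import cycle

# Sketch of round-robin redistribution: rows are dealt to the
# partitions cyclically, so skewed input (such as the 80/5/15
# split above) leaves each partition with about 1/n of the rows.
def round_robin(rows, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    targets = cycle(range(num_partitions))
    for row in rows:
        partitions[next(targets)].append(row)
    return partitions

parts = round_robin(range(12), 3)
print([len(p) for p in parts])   # [4, 4, 4]
```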
Hash auto-keys. The PowerCenter Server uses all grouped or sorted ports as a compound
partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and
unsorted Aggregator transformations.
Hash user keys. You specify a number of ports to generate the partition key.
Table 13-4 on page 357 lists the partition points where you can specify hash partitioning.
Hash Auto-Keys
You can use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted
Aggregator transformations to ensure that rows are grouped properly before they enter these
transformations.
Figure 13-8 shows a mapping where hash auto-keys partitioning causes the PowerCenter
Server to distribute rows to each partition according to group before they enter the Sorter and
Aggregator transformations:
Figure 13-8. Mapping where Hash Partitioning Can Increase Performance
In this mapping, the Sorter transformation sorts items by item description. If items with the
same description exist in more than one source file, each partition will contain items with the
same description. Without hash auto-keys partitioning, the Aggregator transformation might
calculate average costs and prices for each item incorrectly.
To prevent errors in the cost and prices calculations, set a partition point at the Sorter
transformation and set the partition type to hash auto-keys. When you do this, the
PowerCenter Server redistributes the data so that all items with the same description reach the
Sorter and Aggregator transformations in a single partition.
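The effect of hash auto-keys partitioning can be sketched as hashing the group key to pick a partition, so rows with equal keys always land together. The hash function and row values below are stand-ins, not the server's internals:

```python
import zlib

# Sketch of hash-key partitioning: a deterministic hash of the
# partition key selects the partition, so all rows with the same
# item description go to one partition regardless of which source
# file they came from. zlib.crc32 stands in for whatever hash
# function the server actually uses; the rows are hypothetical.
def hash_partition(key, num_partitions):
    return zlib.crc32(key.encode()) % num_partitions

rows = [("Flashlight", 35.00), ("Compass", 8.00), ("Flashlight", 32.50)]
groups = {}
for desc, price in rows:
    groups.setdefault(hash_partition(desc, 3), []).append((desc, price))
# Both Flashlight rows now share a partition, so a downstream
# Aggregator sees every row for that item in one place.
```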
To rearrange the order of the ports that make up the key, select a port in the Selected Ports list
and click the up or down arrow.
Partition 1: 0001-2999
Partition 2: 3000-5999
Partition 3: 6000-9999
              Start Range   End Range
Partition #1                3000
Partition #2  3000          6000
Partition #3  6000
When you do this, the PowerCenter Server sends all items with IDs less than 3000 to the first
partition. It sends all items with IDs between 3000 and 5999 to the second partition. Items
with IDs greater than or equal to 6000 go to the third partition. For more information on key
ranges, see Adding Key Ranges on page 365.
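The assignment described above can be sketched as a scan over half-open [start, end) ranges, where a blank bound is open-ended; the function and range list are illustrative assumptions, not the product's implementation:

```python
# Sketch of key-range partitioning from the example above:
# partition 1 takes IDs below 3000, partition 2 takes 3000-5999,
# and partition 3 takes 6000 and up. None means an open bound.
RANGES = [(None, 3000), (3000, 6000), (6000, None)]   # [start, end)

def key_range_partition(item_id, ranges=RANGES):
    for i, (start, end) in enumerate(ranges):
        if (start is None or item_id >= start) and (end is None or item_id < end):
            return i + 1   # 1-based partition number
    return None   # outside every range

print(key_range_partition(2999))   # 1
print(key_range_partition(3000))   # 2
print(key_range_partition(6000))   # 3
```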
To rearrange the order of the ports that make up the partition key, select a port in the Selected
Ports list and click the up or down arrow.
In key range partitioning, the order of the ports does not affect how the PowerCenter Server
redistributes rows among partitions, but it can affect session performance. For example, you
might configure the following compound partition key:
Selected Ports
ITEMS.DESCRIPTION
ITEMS.DISCONTINUED_FLAG
Since boolean comparisons are usually faster than string comparisons, the session may run
faster if you arrange the ports in the following order:
Selected Ports
ITEMS.DISCONTINUED_FLAG
ITEMS.DESCRIPTION
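The reasoning is that compound-key comparison stops at the first field that differs, so putting the cheap boolean field first avoids many string comparisons. Python's tuple comparison short-circuits the same way, which this sketch (with hypothetical values) illustrates:

```python
# Sketch of why port order can matter for a compound partition key:
# with DISCONTINUED_FLAG first, two keys that differ in the flag are
# told apart at the first element, and the long description strings
# are never compared. Values are hypothetical.
active  = (False, "Deluxe hiking boots, size 10, brown")
dropped = (True,  "Deluxe hiking boots, size 10, brown")

# The comparison is decided at element 0 (False < True), so the
# string fields never need to be examined:
print(active < dropped)   # True
```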
You can leave the start or end range blank for a partition. When you leave the start range
blank, the PowerCenter Server uses the minimum data value as the start range. When you
leave the end range blank, the PowerCenter Server uses the maximum data value as the end
range.
For example, you can add the following ranges for a key based on CUSTOMER_ID in a
pipeline that contains two partitions:
CUSTOMER_ID   Start Range   End Range
Partition #1                135000
Partition #2  135000
When the PowerCenter Server reads the Customers table, it sends all rows that contain
customer IDs less than 135000 to the first partition, and all rows that contain customer IDs
equal to or greater than 135000 to the second partition. The PowerCenter Server eliminates
rows that contain null values or values that fall outside the key ranges.
When you configure a pipeline to load data to a relational target, if a row contains null values in any column that makes up the partition key or if a row contains a value that falls outside all of the key ranges, the PowerCenter Server sends that row to the first partition.
When you configure a pipeline to read data from a relational source, the PowerCenter Server
reads rows that fall within the key ranges. It does not read rows with null values in any
partition key column.
If you want to read rows with null values in the partition key, use pass-through partitioning
and create a SQL override.
Consider the following guidelines when you create key ranges:
Use the standard PowerCenter date format to enter dates in key ranges.
The Workflow Manager does not validate overlapping string or numeric ranges.
[The figure shows the reader thread (first stage), the transformation thread (second stage), and the writer thread (third stage).]
By default, this mapping contains partition points only at the source qualifier and target
instance. Since this mapping contains an XML target, you can configure only one partition at
any partition point.
In this case, the master thread creates one reader thread to read data from the source, one
transformation thread to process the data, and one writer thread to write data to the target.
Each pipeline stage processes the rows as follows:
Time   Source Qualifier   Transformations   Target Instance
       (First Stage)      (Second Stage)    (Third Stage)
1      Row Set 1          -                 -
2      Row Set 2          Row Set 1         -
3      Row Set 3          Row Set 2         Row Set 1
4      Row Set 4          Row Set 3         Row Set 2
...    ...                ...               ...
n      Row Set n          Row Set n-1       Row Set n-2
Because the pipeline contains three stages, the PowerCenter Server can process three sets of
rows concurrently.
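The schedule above follows a simple pattern: at time step t, stage s works on row set t - s, so a pipeline with k stages keeps up to k row sets in flight at once. A sketch of that schedule (the helper is our own, not part of the product):

```python
# Sketch of the pipelined schedule shown above: stage s (0-based)
# works on row set (t - s) at time step t, so up to `stages` row
# sets are processed concurrently and n row sets finish after
# n + stages - 1 time steps.
def in_flight(t, stages, n_row_sets):
    """Row sets being processed at time step t (1-based)."""
    return [t - s for s in range(stages) if 1 <= t - s <= n_row_sets]

print(in_flight(3, 3, 10))   # [3, 2, 1] -> three row sets at once
print(in_flight(1, 3, 10))   # [1]      -> pipeline still filling
```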
If the Expression transformations are very complicated, processing the second (transformation) stage can take a long time and cause low data throughput. To improve performance, set a partition point at Expression transformation EXP_2 and set the partition type to pass-through. This creates an additional pipeline stage. The master thread creates an additional transformation thread:
[The figure shows the reader thread (first stage), the transformation threads (second and third stages), and the writer thread (fourth stage).]
The PowerCenter Server can now process four sets of rows concurrently as follows:
Time   Source Qualifier   Transformations   Transformations   Target Instance
       (First Stage)      (Second Stage)    (Third Stage)     (Fourth Stage)
1      Row Set 1          -                 -                 -
2      Row Set 2          Row Set 1         -                 -
3      Row Set 3          Row Set 2         Row Set 1         -
4      Row Set 4          Row Set 3         Row Set 2         Row Set 1
...    ...                ...               ...               ...
n      Row Set n          Row Set n-1       Row Set n-2       Row Set n-3
By adding an additional partition point at Expression transformation EXP_2, you replace one
long running transformation stage with two shorter running transformation stages. Data
throughput depends on the longest running stage. So in this case, data throughput increases.
For more information about processing threads, see Understanding Processing Threads on
page 14.
By default, the PowerCenter Server fails the session when you use database partitioning for non-DB2 targets. However, you can configure the PowerCenter Server to default to pass-through partitioning when you use database partitioning for non-DB2 relational targets.
You cannot use database partitioning when you configure the session to use source-based
or user-defined commit, constraint-based loading, or session recovery.
The target table must contain a partition key. Also, you must link all not-null partition key
columns in the target instance to a transformation in the mapping.
You must use high precision mode when the IBM DB2 table partitioning key uses a Bigint
field. The PowerCenter Server fails the session when the IBM DB2 table partitioning key
uses a Bigint field and you use low precision mode.
If you create multiple partitions for a DB2 bulk load session, you must use database
partitioning for the target partition type. If you choose any other partition type, the
PowerCenter Server reverts to normal load and writes the following message to the session
log:
ODL_26097 Only database partitioning is support for DB2 bulk load.
Changing target load type variable to Normal.
If you configure a session for database partitioning, the PowerCenter Server reverts to pass-through partitioning under the following circumstances:
You configure the PowerCenter Server to treat the database partitioning partition type as
pass-through partitioning and you used database partitioning for a non-DB2 relational
target.
[The figure shows the Transformations view, with Browse buttons to enter SQL overrides and filter conditions.]
For more information about partitioning Application sources, refer to the PowerCenter
Connect documentation.
The SQL query also overrides any key range and filter condition that you enter for a source
partition. So, if you also enter a key range and source filter, the PowerCenter Server uses the
SQL query override to extract source data.
If you create a key that contains null values, you can extract the nulls by creating another
partition and entering an SQL query or filter to extract null values.
To enter an SQL query for each partition, click the Browse button in the SQL Query field.
Enter the query in the SQL Editor dialog box, and then click OK.
If you entered an SQL query in the Designer when you configured the Source Qualifier
transformation, that query appears in the SQL Query field for each partition. To override this
query, click the Browse button in the SQL Query field, revise the query in the SQL Editor
dialog box, and then click OK.
              Start Range   End Range
Partition #1                135000
Partition #2  135000
If you know that the IDs for customers outside the USA fall within the range for a particular
partition, you can enter a filter in that partition to exclude them. Therefore, you enter the
following filter condition for the second partition:
CUSTOMERS.COUNTRY = USA
When the session runs, the following queries for the two partitions appear in the session log:
READER_1_1_1> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT
CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM
CUSTOMERS WHERE CUSTOMER.CUSTOMER ID < 135000]
[...]
READER_1_1_2> RR_4010 SQ instance [SQ_CUSTOMERS] SQL Query [SELECT
CUSTOMERS.CUSTOMER_ID, CUSTOMERS.COMPANY, CUSTOMERS.LAST_NAME FROM
CUSTOMERS WHERE CUSTOMERS.COUNTRY = USA AND 135000 <=
CUSTOMERS.CUSTOMER_ID]
To enter a filter condition, click the Browse button in the Source Filter field. Enter the filter
condition in the SQL Editor dialog box, and then click OK.
If you entered a filter condition in the Designer when you configured the Source Qualifier
transformation, that query appears in the Source Filter field for each partition. To override
this filter, click the Browse button in the Source Filter field, change the filter condition in the
SQL Editor dialog box, and then click OK.
You can use single- or multi-threaded reading with flat file or COBOL sources.
You cannot use multi-threaded reading if the source files are non-disk files, such as FTP
files or IBM MQSeries sources.
If you use a shift-sensitive code page, you can use multi-threaded reading only if the
following conditions are true:
You did not enable user-defined shift state in the source definition.
If you configure a session for multi-threaded reading, and the PowerCenter Server cannot
create multiple threads to a file source, it writes a message to the session log and reads the
source with one thread.
When the PowerCenter Server uses multiple threads to read a source file, it may not read
the rows in the file sequentially. If sort order is important, configure the session to read the
file with a single thread. For example, sort order may be important if the mapping contains
a sorted Joiner transformation and the file source is the sort origin.
You can also use a combination of direct and indirect files to balance the load.
Session performance for multi-threaded reading is optimal with large source files.
Although the PowerCenter Server can create multiple connections to small source files,
performance may not be optimal.
When you configure single-threaded reading, the PowerCenter Server reads files as follows:
Reading direct files. You can configure the PowerCenter Server to read from one or more
direct files. If you configure the session with more than one direct file, the PowerCenter
Server creates a concurrent connection to each file. It does not create multiple connections
to a file.
Reading indirect files. When the PowerCenter Server reads an indirect file, it reads the file
list and then reads the files in the list sequentially. If the session has more than one file list,
the PowerCenter Server reads the file lists concurrently, and it reads the files in each list
sequentially.
When you configure multi-threaded reading, the PowerCenter Server reads files as follows:
Reading direct files. When the PowerCenter Server reads a direct file, it creates multiple
reader threads to read the file concurrently. You can configure the PowerCenter Server to
read from one or more direct files. For example, if a session reads from two files and you
create five partitions, the PowerCenter Server may distribute one file among two partitions
and one file among three partitions.
Reading indirect files. When the PowerCenter Server reads an indirect file, it creates
multiple threads to read the file list concurrently. It also creates multiple threads to read
the files in the list concurrently. The PowerCenter Server may use more than one thread to
read a single file.
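As a rough illustration of the "two files, five partitions" example above, the following sketch distributes one reader thread per partition across the source files round-robin, so a single file can be read by more than one thread. The round-robin policy and function name are assumptions for illustration; the actual distribution is internal to the server:

```python
def assign_threads(files, num_partitions):
    """Distribute one reader thread per partition across source files
    round-robin; a file may end up with more than one thread."""
    assignment = {name: 0 for name in files}
    for i in range(num_partitions):
        assignment[files[i % len(files)]] += 1
    return assignment

# Two direct files, five partitions: one file is read by three threads,
# the other by two.
print(assign_threads(["ProductsA.txt", "ProductsB.txt"], 5))
# {'ProductsA.txt': 3, 'ProductsB.txt': 2}
```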
Table 13-5 describes the file properties settings for file sources in a mapping:
Table 13-5. File Properties Settings for File Sources
Source File Directory: Enter the local source file directory. The default location is $PMSourceFileDir.
Source File Name: Enter the local source file name. You can also use the session variable, $InputFileName, as
defined in the parameter file. If you use a file list, enter the name of the list.
By default, the Workflow Manager uses the source file name for each partition. Edit the file
name property for partitions 2-n based on how you want the PowerCenter Server to read
the files.
For example, you can configure the source file names for three partitions in the following ways:

Partition #1: ProductsA.txt, Partition #2: empty.txt, Partition #3: empty.txt

Partition #1: ProductsA.txt, Partition #2: empty.txt, Partition #3: ProductsB.txt
The PowerCenter Server creates two threads. It creates one thread to read
ProductsA.txt, and it creates one thread to read ProductsB.txt. It reads the
files concurrently, and it reads rows in the files sequentially.
If you use FTP to access source files, you can choose a different connection for each direct
file. For more information about using FTP to access source files, see Using FTP on
page 559.
Table 13-7 describes the session configuration and the PowerCenter Server behavior when it
uses multiple threads to read source files:
Table 13-7. Configuring Source File Name for Multi-Threaded Reading
Partition #1: ProductsA.txt, Partition #2: <blank>, Partition #3: <blank>

Partition #1: ProductsA.txt, Partition #2: <blank>, Partition #3: ProductsB.txt
Table 13-8 describes the partitioning attributes for relational targets in a pipeline:
Table 13-8. Partitioning Relational Target Attributes
Reject File Name: Name of the reject file. Default is <target name><partition number>.bad. You can also use the
session variable, $BadFileName, as defined in the parameter file.
Database Compatibility
When you configure a session with multiple partitions at the target instance, the PowerCenter
Server creates one connection to the target for each partition. If you configure multiple target
partitions in a session that loads to a database or ODBC target that does not support multiple
concurrent connections to tables, the session fails.
When you create multiple target partitions in a session that loads data to an Informix
database, you must create the target table with row-level locking. If you insert data from a
session with multiple partitions into an Informix target configured for page-level locking, the
session fails and returns the following message:
WRT_8206 Error: The target table has been created with page level locking.
The session can only run with multi partitions when the target table is
created with row level locking.
Sybase IQ does not allow multiple concurrent connections to tables. If you create multiple
target partitions in a session that loads to Sybase IQ, the PowerCenter Server loads all of the
data in one partition.
FTP. Transfer the partitioned target files to another machine. You can transfer the files to
any machine to which the PowerCenter Server can connect. For more information about
using FTP to load to target files, see Using FTP on page 559.
Loader. Use an external loader that can load from multiple output files. This option
appears if the pipeline loads data to a relational target and you choose a file writer in the
Writers settings on the Mapping tab. If you choose a loader that cannot load from multiple
output files, the PowerCenter Server fails the session. For more information about
configuring external loaders for partitioning, see Partitioning Sessions with External
Loaders on page 526.
Message Queue. Transfer the partitioned target files to an IBM MQSeries message queue.
For more information about loading to message queues, refer to the PowerCenter Connect
for IBM MQSeries User and Administrator Guide.
You can merge target files only if you choose local connections for all target partitions.
Table 13-9 describes the connection options for file targets in a mapping:
Table 13-9. File Targets Connection Options
Connection Type: Choose a local, FTP, external loader, or message queue connection. Select None for a local
connection. The connection type is the same for all partitions.
Value: For an FTP, external loader, or message queue connection, click the button in this field to
select the connection object. You can specify a different connection object for each partition.
Table 13-10 describes the file properties for file targets in a mapping:
Table 13-10. Target File Properties
If you select this option, the PowerCenter Server merges the partitioned target files into one
file when the session completes, and then deletes the individual output files. It does not
delete the individual files if it fails to create the merged file.
You cannot merge files if the session uses FTP, an external loader, or an MQSeries
message queue.
Name of the target file. Default is <target name><partition number>.out. You can also use the
session variable, $OutputFileName, as defined in the parameter file.
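The merge behavior described above — combine the partition output files into one merged file, and delete the individual files only if the merge succeeds — can be sketched as follows. The file handling and concatenation policy are illustrative, not the server's actual implementation:

```python
import os

def merge_partition_files(part_files, merged_file):
    """Concatenate partition output files into one merged file, then delete
    the individual files only if the merge succeeded."""
    try:
        with open(merged_file, "wb") as out:
            for name in part_files:
                with open(name, "rb") as part:
                    out.write(part.read())
    except OSError:
        return False   # merge failed: keep the individual partition files
    for name in part_files:
        os.remove(name)
    return True
```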
When you join sorted data in a session with multiple partitions, use one of the following partition configurations:
1:n. Use one partition for the master source and multiple partitions for the detail source.
The PowerCenter Server maintains the sort order because it does not redistribute master
data among partitions.
n:n. Use an equal number of partitions for the master and detail sources. When you use
n:n partitions, the PowerCenter Server processes multiple partitions concurrently. You may
need to configure the partitions to maintain the sort order depending on the type of
partition you use at the Joiner transformation.
Note: When you use 1:n partitions, do not add a partition point at the Joiner transformation.
If you add a partition point at the Joiner transformation, the Workflow Manager adds an
equal number of partitions to both master and detail pipelines.
Use different partitioning guidelines, depending on where you sort the data:
Using sorted flat files. Use one of the following partitioning configurations:
Use 1:n partitions when you have one flat file in the master pipeline and multiple flat
files in the detail pipeline. Configure the session to use one reader thread for each file.
Use n:n partitions when you have one large flat file in both the master and detail pipelines.
Configure partitions to pass all sorted data in the first partition, and pass empty file data
in the other partitions.
Using sorted relational data. Use one of the following partitioning configurations:
Use n:n partitions. If you use a hash auto-keys partition, configure partitions to pass all
sorted data in the first partition.
Using the Sorter transformation. Use n:n partitions. If you use a hash auto-keys partition
at the Joiner transformation, configure each Sorter transformation to use hash auto-keys
partition points as well.
Note: Add only pass-through partition points between the sort origin and the Joiner
transformation.
Configure the mapping with one source and one Source Qualifier transformation in each
pipeline.
Specify the path and file name for each flat file in the Properties settings of the
Transformations view on the Mapping tab of the session properties.
Each file must use the same file properties as configured in the source definition.
The range of sorted data in the flat files can overlap. You do not need to use a unique range
of data for each file.
Figure 13-18 shows sorted file data joined using 1:n partitioning:
Figure 13-18. Sorted File Data with 1:n Partitions
[Figure: one sorted flat file in the master pipeline and three flat files (Flat File 1-3) in the detail pipeline, each read through a Source Qualifier with a pass-through partition, joined at the Joiner transformation. Sorted output depends on join type.]
The Joiner transformation may output unsorted data depending on the join type. If you use a
full outer or detail outer join, the PowerCenter Server processes unmatched master rows last,
which can result in unsorted data.
Figure 13-19 shows sorted file data passed through a single partition to maintain sort order:
Figure 13-19. Sorted File Data Passed Through a Single Partition
[Figure: two Source Qualifier transformations pass sorted data in the first partition and no data in the other partitions to a Joiner transformation with a hash auto-keys partition point.]
The example in Figure 13-19 shows sorted data passed in a single partition to maintain the
sort order. The first partition contains sorted file data while all other partitions pass empty file
data. At the Joiner transformation, the PowerCenter Server distributes the data among all
partitions while maintaining the order of the sorted data.
[Figure: sorted relational data joined using 1:n partitions. One relational source, read through a Source Qualifier transformation, passes sorted data to the Joiner transformation; the other relational source, read through a Source Qualifier transformation with a key-range or pass-through partition point, passes unsorted data. Sorted output depends on join type.]
The Joiner transformation may output unsorted data depending on the join type. If you use a
full outer or detail outer join, the PowerCenter Server processes unmatched master rows last,
which can result in unsorted data.
Figure 13-21 shows sorted relational data passed through a single partition to maintain the
sort order:
Figure 13-21. Sorted Relational Data Passed Through a Single Partition
[Figure: two relational sources, each read through a Source Qualifier transformation with a key-range partition point, pass sorted data in the first partition and no data in the other partitions to a Joiner transformation with a hash auto-keys partition point.]
The example in Figure 13-21 shows sorted relational data passed in a single partition to
maintain the sort order. The first partition contains sorted relational data while all other
partitions pass empty data. After the PowerCenter Server joins the sorted data, it redistributes
data among multiple partitions.
Figure 13-22 shows Sorter transformations used with hash auto-keys to maintain sort order:
Figure 13-22. Using Sorter Transformations with Hash Auto-Keys to Maintain Sort Order
[Figure: two sources with unsorted data, each read through a Source Qualifier transformation, pass unsorted data to Sorter transformations with hash auto-keys partition points; the Sorter transformations pass sorted data to a Joiner transformation with a hash auto-keys or pass-through partition point.]
Note: For best performance, use sorted flat files or sorted relational data. You may want to
calculate the processing overhead for adding Sorter transformations to your mapping.
You use the hash auto-keys partition type for the Lookup transformation.
For more information about cache partitioning, see Cache Partitioning on page 359.
Figure 13-23 shows where you specify the work directories in the session properties:
Figure 13-23. Session Properties - Configuring Sorter Transformations
1. It updates the current value of the variable separately in each partition according to the
variable function used in the mapping.
2. After loading all the targets in a target load order group, the PowerCenter Server
combines the current values from each partition into a single final value based on the
aggregation type of the variable.
3. If there is more than one target load order group in the session, the final current value of
a mapping variable in a target load order group becomes the current value in the next
target load order group.
4. When the PowerCenter Server completes loading the last target load order group, the
final current value of the variable is saved into the repository.
For more information about mapping variables, see Mapping Parameters and Variables
in the Designer Guide. For more information about target load order groups, see Reading
Source Data on page 22.
Use one of the following variable functions in the mapping to set the variable value:
SetCountVariable
SetMaxVariable
SetMinVariable
For more information about the variable functions, see Functions in the Transformation
Language Reference.
Table 13-11 describes how the PowerCenter Server calculates variable values across partitions:
Table 13-11. Variable Value Calculations with Partitioned Sessions

SetCountVariable: The PowerCenter Server calculates the final count values from all partitions.
SetMaxVariable: The PowerCenter Server compares the final variable value for each partition and saves the highest value.
SetMinVariable: The PowerCenter Server compares the final variable value for each partition and saves the lowest value.
Note: Use the SetVariable function only once for each mapping variable in a pipeline. When
you create multiple partitions in a pipeline, the PowerCenter Server uses multiple threads to
process that pipeline. If you use the function more than once for the same variable, the
current value of the mapping variable may be indeterminate.
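A minimal sketch of how the final per-partition values could be combined into a single value by aggregation type, mirroring Table 13-11. The function and parameter names are illustrative, not PowerCenter internals:

```python
def combine_partition_values(values, aggregation):
    """Combine the final mapping-variable value from each partition into one
    value, based on the variable's aggregation type."""
    if aggregation == "count":   # SetCountVariable: sum the per-partition counts
        return sum(values)
    if aggregation == "max":     # SetMaxVariable: keep the highest value
        return max(values)
    if aggregation == "min":     # SetMinVariable: keep the lowest value
        return min(values)
    raise ValueError(f"unknown aggregation type: {aggregation}")

# Three partitions finish with counts 40, 25, and 35:
print(combine_partition_values([40, 25, 35], "count"))  # 100
```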
Partitioning Rules
You can create multiple partitions in a pipeline if the PowerCenter Server can maintain data
consistency when it processes the partitioned data. When you create a session, the Workflow
Manager validates each pipeline for partitioning. You can change the partitioning information
for a pipeline as long as it conforms to the rules and restrictions listed in this section.
There are several types of partitioning rules and restrictions. These include restrictions on the
number of partitions, partitioning restrictions when you change a mapping, restrictions that
apply to other Informatica products, and general guidelines.
Table 13-12 describes the restrictions on the number of partitions for transformations:
Table 13-12. Restrictions on the Number of Partitions for Transformations
Custom transformation: By default, you can specify only one partition if the pipeline contains a Custom
transformation. However, this transformation contains an option on the Properties tab to allow
multiple partitions. If you enable this option, you can specify multiple partitions at this
transformation. Do not select Is Partitionable if the Custom transformation procedure
performs the procedure based on all the input data together, such as data cleansing.
External Procedure transformation: By default, you can specify only one partition if the pipeline contains an External
Procedure transformation. This transformation contains an option on the Properties tab to allow multiple
partitions. If this option is enabled, you can specify multiple partitions at this
transformation.
Joiner transformation: You can specify only one partition if the pipeline contains the master source for a
Joiner transformation and you do not add a partition point at the Joiner
transformation.
XML target: You can specify only one partition if the pipeline contains XML targets.
You change a transformation that is a partition point in any of the following ways:
You switch the master and detail source for the Joiner transformation after you create a
pipeline with multiple partitions.
The following restrictions apply when you use other Informatica products:
For MQSeries sources, you can specify multiple partitions only if there is no
associated source qualifier in the pipeline.
You cannot merge output files from sessions with multiple partitions if you
use an MQSeries message queue as the target connection type.
If the mapping contains hierarchies or IDOCs, you can specify only one
partition and the partition type must be pass-through.
If you generate the ABAP program using exec SQL, you can specify only
one partition and the partition type must be pass-through.
You must use the Informatica default date format to enter dates in key
ranges.
You can specify only one partition when the target load order group contains
an SAP BW target.
When you use a source filter in a join override, always use the following
syntax for Siebel business components:
SiebelBusinessComponentName.SiebelFieldName
When you create a source filter for a Siebel business component, always use
the following syntax:
SiebelBusinessComponentName.SiebelFieldName
If the mapping contains a multi-group target that receives data from more
than one pipeline, you can specify only one partition.
If the mapping contains a multi-group target that receives data from multiple
groups, the partition type must be pass-through.
For more information about these other products, see the product documentation.
Partitioning Guidelines
This section summarizes the other guidelines that appear throughout this chapter.
You can add a partition point at any other transformation provided that no partition point
receives input from more than one pipeline stage.
For more information, see Adding and Deleting Partition Points on page 353.
If you choose key range partitioning at any partition point, you must specify a range for
each port in the partition key.
If you choose key range partitioning and need to enter a date range for any port, use the
standard PowerCenter date format. For details on the default date format, see Dates in
the Transformation Language Reference.
The Workflow Manager does not validate overlapping string ranges, overlapping numeric
ranges, gaps, or missing ranges.
If a row contains a null value in any column that makes up the partition key, or if a row
contains values that fall outside all of the key ranges, the PowerCenter Server sends that
row to the first partition.
When connecting to file sources or targets, you must choose the same connection type for
all partitions. You may choose different connection objects as long as each object is of the
same type. For more information, see Partitioning File Sources on page 374 and
Partitioning File Targets on page 380.
You cannot merge output files from sessions with multiple partitions if you use FTP, an
external loader, or an MQSeries message queue as the target connection type. For more
information, see Partitioning File Targets on page 380.
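The key-range routing rule above — a row with a NULL value in the partition key, or a value outside every key range, goes to the first partition — can be sketched as follows. The function is illustrative, not the server's implementation:

```python
def route_row(row, partition_key, ranges):
    """Route a row to a partition index by key range. A NULL key value or a
    value outside every range sends the row to the first partition (index 0)."""
    value = row.get(partition_key)
    if value is None:
        return 0
    for index, (start, end) in enumerate(ranges):
        # Start range is inclusive, end range is exclusive; None means open-ended.
        if (start is None or start <= value) and (end is None or value < end):
            return index
    return 0

# Ranges mirror the customer-ID example: partition 1 ends at 135000,
# partition 2 starts at 135000 with no end range.
ranges = [(None, 135000), (135000, None)]
print(route_row({"CUSTOMER_ID": 99}, "CUSTOMER_ID", ranges))      # 0
print(route_row({"CUSTOMER_ID": 200000}, "CUSTOMER_ID", ranges))  # 1
print(route_row({"CUSTOMER_ID": None}, "CUSTOMER_ID", ranges))    # 0
```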
Chapter 14
Monitoring Workflows
This chapter covers the following topics:
Overview, 402
Tips, 441
Overview
You can monitor workflows and tasks in the Workflow Monitor. View details about a
workflow or task in Gantt Chart view or Task view. You can run, stop, abort, and resume
workflows from the Workflow Monitor.
The Workflow Monitor displays workflows that have run at least once. The Workflow
Monitor continuously receives information from the PowerCenter Server and Repository
Server. It also fetches information from the repository to display historic information.
The Workflow Monitor consists of the following windows:
Output window. Displays messages from the PowerCenter Server and the Repository
Server.
Gantt Chart view. Displays details about workflow runs in chronological (Gantt Chart)
format.
Task view. Displays details about workflow runs in a report format, organized by workflow
run.
The Workflow Monitor displays time relative to the time configured on the PowerCenter
Server machine. For example, a folder contains two workflows. One workflow runs on a
PowerCenter Server in your local time zone, and the other runs on a PowerCenter Server in a
time zone two hours later. If you start both workflows at 9 a.m. local time, the Workflow
Monitor displays the start time as 9 a.m. for one workflow and as 11 a.m. for the other
workflow.
[Figure: the Workflow Monitor interface, showing the Navigator window, Time window, Output window, and the Gantt Chart view and Task view tabs.]
Toggle between Gantt Chart view and Task view by clicking the tabs on the bottom of the
Workflow Monitor.
Note: You can view and hide the Output window in the Workflow Monitor.
To monitor workflows, you need the Use Workflow Manager privilege with execute permission on the folder.
You must also have execute permission for connection objects to restart, resume, stop, or
abort a workflow containing a session.
For more information on permissions and privileges necessary to use the Workflow Monitor,
see Permissions and Privileges by Task in the Repository Guide.
Configure the Workflow Manager to open the Workflow Monitor when you run a
workflow from the Workflow Manager. You can open multiple instances of the Workflow
Monitor on one machine using the Windows Start menu.
To open the Workflow Monitor when you start a workflow, select Launch Workflow
Monitor When Workflow Is Started on the General tab of the Workflow Manager options.
Connecting to Repositories
When you open the Workflow Monitor, you must connect to a repository to monitor the
objects in it. Connect to repositories by choosing Repository-Connect. Enter the repository
name and connection information.
Once you connect to a repository, the Workflow Monitor displays a list of servers available for
the repository. The Workflow Monitor can monitor multiple repositories, PowerCenter
Servers, and workflows at the same time.
Note: If you are not connected to a repository, you can remove the repository from the
Navigator. Select the repository in the Navigator and choose Edit-Delete. The Workflow
Monitor displays a message verifying that you want to remove the repository from the
Navigator list. Click Yes to remove the repository. You can connect to the repository again at
any time.
When you open a PowerCenter Server, the Workflow Monitor gets workflow run information
stored in the repository. It does not get dynamic workflow run information from currently
running workflows.
Filtering Tasks
You can view all or some workflow tasks. You can filter out tasks to view only tasks you want.
For example, if you want to view only Session tasks, you can hide all other tasks. You can view
all tasks at any time.
You can also filter deleted tasks. To filter deleted tasks, choose Filters-Deleted Tasks.
To filter tasks:
1. Choose Filters-Tasks. The Filter Tasks dialog box appears.
2. Clear the tasks you want to hide, and select the tasks you want to view.
3. Click OK.
Note: When you filter a task, the Gantt Chart view displays a red link between tasks to
indicate a filtered task. You can double-click the link to view the tasks you hid.
Filtering Servers
When you connect to a repository, the Workflow Monitor displays a list of registered servers
and deleted servers. When you register multiple servers, you can filter out servers to view only
servers you want to monitor.
When you hide a server, the Workflow Monitor hides the server from the Navigator for both
Gantt Chart and Task view. You can show the server at any time.
You can hide unconnected servers. When you hide a connected server, the Workflow Monitor
asks if you want to disconnect from the server and then filter it. You must disconnect from a
server before hiding it.
To filter servers, select the servers you want to view, clear the servers you want to filter, and
click OK. If you are connected to a server that you clear, the Workflow Monitor prompts you
to disconnect from the server before filtering.
Tip: You can also filter a server in the Navigator by right-clicking it and selecting Filter Server.
Viewing Statistics
You can view statistics about the objects you monitor in the Workflow Monitor by choosing
View-Statistics. The Statistics dialog box displays the following information:
Number of connected servers. Number of servers you connected to since you opened the
Workflow Monitor.
Number of fetched tasks. Number of tasks the Workflow Monitor fetched from the
repository during the period specified in the Time window.
Viewing Properties
You can view properties for the following items:
Tasks. You can view properties such as task name, start time, and status.
Sessions. You can view properties about the Session task and session run, such as mapping
name and number of rows successfully loaded. You can also view load statistics about the
session run. For more information on session details, see Monitoring Session Details on
page 434. You can also view performance details about the session run. For more
information, see Creating and Viewing Performance Details on page 436.
Workflows. You can view properties such as start time, status, and run type.
Links. When you double-click a link between tasks in Gantt Chart view, you can view
tasks you hide.
Servers. You can view properties such as server version and startup time. You can also view
the sessions and workflows running on the PowerCenter Server.
Folders. You can view properties such as the number of workflow runs displayed in the
Time window.
To view properties for all objects, right-click the object and select Properties. You can right-click items in the Navigator or the Time window in either Gantt Chart view or Task view.
To view link properties, double-click the link in the Time window of Gantt Chart view.
When you view link properties, you can double-click a task in the Link Properties dialog box
to view the properties for the filtered task.
General. Customize general options such as the maximum number of workflow runs to
display and whether to receive messages from the Workflow Manager. See Configuring
General Options on page 409.
Gantt Chart view. Configure Gantt Chart view options such as workspace color, status
colors, and time format. See Configuring Gantt Chart View Options on page 411.
Task view. Configure which columns to display in Task view. See Configuring Task View
Options on page 412.
Advanced. Configure advanced options such as the number of workflow runs the
Workflow Monitor holds in memory for each server. See Configuring Advanced Options on
page 412.
Table 14-1 describes the options you can configure on the General tab:
Table 14-1. Workflow Monitor General Options
Maximum Days: Specifies the maximum number of workflow runs the Workflow Monitor displays for
each folder. The default is 200.
Select this option to receive messages from the Workflow Manager. The Workflow
Manager sends messages when you start or schedule a workflow in the Workflow
Manager. The Workflow Monitor displays these messages in the Output window.
Select this option to receive notifications from the Repository Server. Notifications
from the Repository Server display in the Notifications tab of the Output window.
Enter the path and file name of the text editor to view and edit workflow and session
logs. You can browse to select an editor. By default, the Workflow Monitor uses
WordPad.
Location: The location where the Workflow Monitor stores temporary versions of log files
when you open session or workflow logs from the Workflow Monitor.
Table 14-2 describes the options you can configure on the Gantt Chart Options tab:
Table 14-2. Gantt Chart Options
Gantt Chart Option
Description
Status Color
Choose a status and configure the color for the status. The Workflow Monitor displays tasks
with the selected status in the colors you choose. You can choose two colors to display a
gradient.
Recovery Color
Configure the color for recovery sessions. The Workflow Monitor uses the status color for
the body of the status bar, and it uses the recovery color as a gradient in the status bar.
Workspace Color
Time Format
Table 14-3 describes the options you can configure on the Advanced tab:
Table 14-3. Advanced Workflow Monitor Options
Setting
Description
Hides folders or workflows under the Workflow Run column in the Time
window when you filter running or scheduled tasks.
Highlights the entire row in the Time window for selected items. When
you disable this option, the Workflow Monitor highlights the item in the
Workflow Run column in the Time window.
Specifies the number of workflow runs you can open at a time. Default is 20.
Specifies the minimum number of workflow runs per server that the
Workflow Monitor holds in memory before it starts releasing older runs
from memory.
When you connect to a server, the Workflow Monitor fetches the
number of workflow runs specified on the General tab for each folder
you connect to. When the number of runs is less than the number
specified in this option, the Workflow Monitor stores new runs in
memory until it reaches this number. Then it releases the oldest run
from memory when it fetches a new run.
When the number of workflow runs the Workflow Monitor initially
fetches exceeds the number specified in this option, the Workflow
Monitor stores all those runs and then releases the oldest run from
memory when it fetches a new run.
For details on how to perform these toolbar operations, see Using the Designer in the
Designer Guide.
By default, the Workflow Monitor displays the following toolbars:
Standard. Contains buttons to connect to and disconnect from repositories, and to zoom
and print the workspace.
Figure 14-7 displays the Standard toolbar:
Figure 14-7. Standard Toolbar
Server. Contains buttons to connect to and disconnect from PowerCenter Servers, to ping
the server, and to start and stop workflows, worklets, and tasks.
Figure 14-8 displays the Server toolbar:
Figure 14-8. Server Toolbar
View. Contains buttons to refresh the view and to open workflow and session logs.
Figure 14-9 displays the View toolbar:
Figure 14-9. View Toolbar
Filter. Contains buttons to display most recent runs, and to filter tasks, servers, and
folders.
Figure 14-10 displays the Filter toolbar:
Figure 14-10. Filter Toolbar
Right-click the task or worklet in the Navigator and choose Restart Task.
The PowerCenter Server runs the task or worklet you specify. It does not run the rest of
the workflow.
In the Navigator, select the task from which you want to run the workflow.
To resume a workflow or worklet, choose Tasks-Resume, or right-click the workflow or
worklet in the Navigator and choose Resume. The Workflow Monitor displays server
messages about the resume command in the Output window.
To recover a workflow or worklet, choose Tasks-Resume/Recover, or right-click the
workflow or worklet in the Navigator and choose Resume/Recover. The Workflow Monitor
displays server messages about the recover command in the Output window.
1. In the Navigator, select the task, workflow, or worklet you want to stop or abort.
2. Choose Tasks-Stop or Tasks-Abort, or right-click the task, workflow, or worklet in the Navigator and choose Stop or Abort.
The Workflow Monitor displays the status of the stop or abort command in the Output
window.
save logs by timestamp. For more information on workflow and session logs, see Log Files
on page 455.
Aborted. Applies to workflows and tasks.
Aborting. Applies to workflows and tasks.
Disabled. Applies to workflows and tasks. You select the Disabled option in the workflow or task properties. The PowerCenter Server does not run the disabled workflow or task until you clear the Disabled option.
Failed. Applies to workflows and tasks.
Running. Applies to workflows and tasks.
Scheduled. Applies to workflows. You schedule the workflow to run at a future date. The PowerCenter Server runs the workflow for the duration of the schedule.
Stopped. Applies to workflows and tasks. You choose to stop the workflow or task in the Workflow Monitor. The PowerCenter Server stopped the workflow or task.
Stopping. Applies to workflows and tasks.
Succeeded. Applies to workflows and tasks.
Suspended. Applies to workflows and worklets. The PowerCenter Server suspends the workflow because a task fails and no other tasks are running in the workflow. This status is available only when you choose the Suspend on Error option.
Suspending. Applies to workflows and worklets. A task fails in the workflow while other tasks are still running. The PowerCenter Server stops executing the failed task and continues executing tasks in other paths. This status is available only when you choose the Suspend on Error option.
Terminated. Applies to workflows.
Unscheduled. Applies to workflows. You removed a workflow from the schedule, or the workflow is scheduled and the PowerCenter Server is about to run the scheduled workflow.
Waiting. Applies to workflows and tasks.
To see a list of tasks by status, view the workflow in Task view and sort by status. Or, choose
Edit-List Tasks in Gantt Chart view. For details, see Listing Tasks and Workflows on
page 424.
Duration. The length of time the PowerCenter Server spends running the most recent task
or workflow.
Status. The status of the most recent task or workflow. For more information about status,
see Workflow and Task Status on page 421.
Connection between objects. The Workflow Monitor shows links between objects in the
Time window.
Organizing Tasks
In Gantt Chart view, you can organize tasks in the Navigator. You can drag and drop tasks
within a workflow to change the order they appear in the Navigator.
For example, the Workflow Monitor usually displays the Decision task as the first task in the
following workflow:
You can drag and drop the Decision task within the Navigator so the Decision task is in the
middle or at the bottom of the list of tasks for that workflow:
1. Open the Gantt Chart view and choose Edit-List Tasks. The List Tasks dialog box appears.
2. In the List What field, select the type of task status you want to list.
For example, select Failed to view a list of failed tasks and workflows.
3. Click View. The Workflow Monitor displays the tasks with the selected status in Gantt Chart view.
Right-click the task or workflow and choose Go To Next Run, or choose Go To Previous
Run.
When you choose View-Organize, the Go To field appears above the Time window. Click the
Go To field to view a calendar and select the date you want to display. When you choose a
date, the Workflow Monitor displays that date beginning at 12:00 a.m.
When you zoom the Time window to 30-minute increments, solid lines mark hour increments and dotted lines mark half-hour increments.
To zoom the Time window in Gantt Chart view, choose View-Zoom and then choose the
desired time increment.
You can also choose the time increment using the Zoom button on the toolbar.
Performing a Search
Use the search tool in the Gantt Chart view to search for tasks, workflows, and worklets in all
repositories you connect to. The Workflow Monitor searches for the word you specify in task
names, workflow names, and worklet names. You can highlight the task in Gantt Chart view
by double-clicking the task after searching.
To perform a search:
1. Open the Gantt Chart view and choose Edit-Find. The Find Object dialog box appears.
2. In the Find What field, enter the keyword you want to find.
3. Click Find Now.
Workflow run list. The list of workflow runs. The workflow run list contains folder,
workflow, worklet, and task names. The Workflow Monitor displays workflow runs
chronologically with the most recent run at the top. It displays folders and servers
alphabetically.
Start time. The time that the PowerCenter Server starts executing the task or workflow.
Completion time. The time that the PowerCenter Server finishes executing the task or
workflow.
Status message. Message from the PowerCenter Server regarding the status of the task or
workflow.
Run type. The method you used to start the workflow. You might manually start the
workflow or schedule the workflow to start.
Filter tasks. Use the Filter menu to select the tasks you want to display or hide. For more
information on filtering tasks in Task view, see Filtering in Task View on page 431.
Hide and view columns. Hide or view an entire column in Task view. For details on
hiding and viewing columns in Task view, see Configuring Task View Options on
page 412.
Hide and view the Navigator. You can hide the Navigator in Task view. Choose View-Navigator to hide or view the Navigator.
To view the tasks in Task view, select the server you want to monitor in the Navigator.
Task view displays the Navigator window, the workflow run list, the Time window, and the Output window.
By task type. You can filter out tasks to view only tasks you want. For example, if you want
to view only session task types, you can filter out all other tasks. For more information on
filtering task types and servers, see Filtering Tasks and Servers on page 405.
By nodes in the Navigator. You can filter the workflow runs the Workflow Monitor
displays in the Time window by selecting different nodes in the Navigator. For example,
when you select a repository name in the Navigator, the Time window displays all
workflow runs that ran on the PowerCenter Servers registered to that repository. When
you select a folder name in the Navigator, the Time window displays all workflow runs in
that folder.
By the most recent runs. To display by the most recent runs, choose Filters-Most Recent
Runs and choose the number of runs you want to display.
By Time window columns. You can choose Filters-Auto Filter and filter by properties you
specify in the Time window columns.
Click the Filter button in a column to select the workflows you want to display.
When you click the Filter button in either the Start Time or Completion Time column,
you can choose a custom time to filter.
4. Select Custom for either Start Time or Completion Time. The Filter Start Time or Custom Completion Time dialog box appears.
5. Choose to show tasks before, after, or between the time you specify. Select the date and time. Click OK.
When you create multiple partitions in a session, the PowerCenter Server provides session
details for each partition. You can use these details to determine if the data is evenly
distributed among the partitions. For example, if the PowerCenter Server moves more rows
through one target partition than another, or if the throughput is not evenly distributed, you
might want to adjust the data range for the partitions.
When you load data to a target with multiple groups, such as an XML target, the
PowerCenter Server provides session details for each group.
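As a quick illustration of this check, the following Python sketch (the helper name, row counts, and tolerance are invented for the example) flags partitions whose row counts deviate from an even distribution:

```python
def skewed_partitions(row_counts, tolerance=0.25):
    """Return the indexes of partitions whose row count deviates from the
    mean by more than the given tolerance (hypothetical helper)."""
    mean = sum(row_counts) / len(row_counts)
    return [i for i, n in enumerate(row_counts)
            if abs(n - mean) / mean > tolerance]

# Partition 2 received noticeably more rows than the others, so its
# data range may need adjusting.
print(skewed_partitions([1000, 1000, 1600]))  # [2]
```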
Table 14-5 lists the information on the Transformation Statistics tab:
Table 14-5. Session Details on the Transformation Statistics Tab
Session Detail
Description
Instance Name
Name of the source qualifier instance or the target instance in the mapping. If you create
multiple partitions in the source or target, the Instance Name displays the partition number.
If the source or target contains multiple groups, the Instance Name displays the group
name.
Transformation Name
Description
Applied Rows
For targets, shows the number of rows the PowerCenter Server successfully applied to the
target (that is, the target returned no errors).
For sources, shows the number of rows the PowerCenter Server successfully read from
the source.
Note: The number of applied rows equals the number of affected rows for sources.
Affected Rows
For targets, shows the number of rows affected by the specified operation. For example,
you have a table with one column called SALES_ID and five rows containing the values 1,
2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer affects three
rows, even though there was only one update request. Or, if you mark rows for update
where SALES_ID is 4, the writer affects 0 rows.
For sources, shows the number of rows the PowerCenter Server successfully read from
the source.
Note: The number of applied rows equals the number of affected rows for sources.
Rejected Rows
Number of rows the PowerCenter Server dropped when reading from the source, or the
number of rows the PowerCenter Server rejected when writing to the target.
Throughput (Rows/Sec)
Rate at which the PowerCenter Server read rows from the source or wrote rows to the target, in rows per second.
Last Error Message
The most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.
Last Error Code
The error message code of the most recent error message written to the session log. If you view details after the session completes, this field displays the last error code.
Start Time
The time the PowerCenter Server started to read from the source or write to the target.
The Workflow Monitor displays time relative to the PowerCenter Server.
End Time
The time the PowerCenter Server finished reading from the source or writing to the target.
The Workflow Monitor displays time relative to the PowerCenter Server.
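The arithmetic in the Affected Rows example above can be verified with a short Python sketch (the list literal mirrors the SALES_ID column values in the example):

```python
# The SALES_ID column values from the example table.
sales_ids = [1, 2, 3, 2, 2]

# One update request marking rows where SALES_ID is 2 affects three rows.
affected = sum(1 for sales_id in sales_ids if sales_id == 2)
print(affected)  # 3

# Marking rows where SALES_ID is 4 affects no rows.
print(sum(1 for sales_id in sales_ids if sales_id == 4))  # 0
```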
Index and data cache size for Aggregator, Rank, Lookup, and Joiner transformations
Lookup transformations
Before using performance details to improve session performance, you must do the following:
Enable monitoring
Enabling Monitoring
To view performance details, you must enable monitoring in the session properties before
running the session.
To enable monitoring, open the session properties and, in the Performance settings of the Properties tab, select Collect Performance Data. Click OK.
1. While the session is running, right-click the session in the Workflow Monitor and choose Properties.
3. Click OK.
Some transformations have counters specific to their functionality. For example, each Lookup
transformation has a counter that indicates the number of rows stored in the lookup cache.
When you read performance details, the first column displays the transformation name as it
appears in the mapping, the second column contains the counter name, and the third column
holds the resulting number or efficiency percentage.
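Because each line pairs a transformation name, a counter, and a value, saved performance details lend themselves to simple post-processing. The following Python sketch assumes a simplified tab-separated layout (the real file format may differ) and groups counter values by transformation instance:

```python
def parse_counters(lines):
    """Group tab-separated 'transformation<TAB>counter<TAB>value' records
    by transformation instance (layout assumed for illustration)."""
    details = {}
    for line in lines:
        transformation, counter, value = line.split("\t")
        details.setdefault(transformation, {})[counter] = int(value)
    return details

sample = [
    "EXPTRANS [1]\tExpression_input rows\t16",
    "EXPTRANS [1]\tExpression_output rows\t16",
    "EXPTRANS [2]\tExpression_input rows\t16",
]
print(parse_counters(sample)["EXPTRANS [2]"])
```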
When you create multiple partitions in a pipeline, the PowerCenter Server generates one set
of counters for each partition. The following performance counters illustrate two partitions
for an Expression transformation:
Transformation    Counter                   Value
EXPTRANS [1]      Expression_input rows
                  Expression_output rows
EXPTRANS [2]      Expression_input rows     16
                  Expression_output rows    16
Note: When you increase the number of partitions, the number of aggregate or rank input
rows may be different from the number of output rows from the previous transformation.
Table 14-6 lists the counters that may appear in the Session Performance Details dialog box or
in the performance details file:
Table 14-6. Performance Counters
Transformation: Aggregator and Rank Transformations
Counters:
Aggregator/Rank_inputrows
Aggregator/Rank_outputrows
Aggregator/Rank_errorrows
Aggregator/Rank_readfromcache
Aggregator/Rank_writetocache
Aggregator/Rank_readfromdisk
Aggregator/Rank_writetodisk
Aggregator/Rank_newgroupkey
Aggregator/Rank_oldgroupkey

Transformation: Lookup Transformation
Counters:
Lookup_inputrows
Lookup_outputrows
Lookup_errorrows
Lookup_rowsinlookupcache

Transformation: Joiner Transformation
Counters:
Joiner_inputMasterRows
Joiner_inputDetailRows
Joiner_outputrows
Joiner_errorrows
Joiner_readfromcache
Joiner_writetocache
Joiner_readfromdisk*
Joiner_writetodisk*
Joiner_readBlockFromDisk**
Joiner_writeBlockToDisk**
Joiner_seekToBlockInDisk**
Joiner_insertInDetailCache*
Joiner_duplicaterows
Joiner_duplicaterowsused

Transformation: All Other Transformations
Counters:
Transformation_inputrows
Transformation_outputrows
Transformation_errorrows
*The PowerCenter Server generates this counter when you use sorted input for the Joiner transformation.
**The PowerCenter Server generates this counter when you do not use sorted input for the Joiner transformation.
If you have multiple source qualifiers and targets, evaluate them as a whole. For source
qualifiers and targets, a high value is considered 80-100 percent. Low is considered 0-20
percent.
Tips
Reduce the size of the Time window.
When you reduce the size of the Time window, the Workflow Monitor refreshes the screen
faster, reducing flicker.
Use the Repository Manager to truncate the list of workflow logs.
If the Workflow Monitor takes a long time to refresh from the repository or to open folders,
truncate the list of workflow logs. When you configure a session or workflow to archive
session logs or workflow logs, the PowerCenter Server saves those logs in local directories. The
repository also creates an entry for each saved workflow log and session log. If you move or
delete a session log or workflow log from the workflow log directory or session log directory,
truncate the lists of workflow and session logs to remove the entries from the repository. The
repository always retains the most recent workflow log entry for each workflow.
Chapter 15
Overview, 444
Overview
You can register and run multiple PowerCenter Servers against a local or global repository.
When you register multiple PowerCenter Servers to the same repository, you can distribute
the workload across the servers to increase performance.
You have the following options to run workflows and sessions using multiple servers:
Use a server grid to run workflows. You can use a server grid to automate the distribution
of sessions. A server grid is a server object that distributes sessions in a workflow to servers
based on server availability. The grid maintains connections to multiple servers in the grid.
For more information about using server grids, see Working with Server Grids on
page 446.
Change the assigned server for a workflow. When you configure a workflow, you assign a
server to run that workflow. Each time the scheduled workflow runs, it runs on the
assigned server. You can change the assigned server for a workflow in the workflow
properties.
Change the assigned server for a session. When you configure a session, by default it runs
on the server assigned to the workflow. You can change the assigned server for a session in
the session properties.
Start a workflow on a non-assigned server. By default, each workflow runs on its assigned
PowerCenter Server. You can run a workflow on a non-assigned server if the workflow is
not currently running. Use the Start Workflow button on the Standard toolbar, and choose
a PowerCenter Server.
You can use the Workflow Monitor to monitor workflows running on multiple servers. For
server grids, the Workflow Monitor shows the individual status of each server in a grid. You
can identify the server grid that a server is assigned to by right-clicking the server in the
Workflow Monitor and selecting Properties. For more information about using the Workflow
Monitor, see Monitoring Workflows on page 401.
Tip: You might want to place the most CPU-intensive sessions on the more powerful servers.
If you do not use a central file server, you need to relocate input files to the default directories
of the new PowerCenter Server. Input files can include parameter files, cache files, external
procedures, and flat file sources.
Use consistent server variables. Use the same variable for $PMCacheDir for each
PowerCenter Server running incremental aggregation sessions.
Run incremental aggregation sessions on the same machine. When you run large
incremental aggregation sessions, you might want to consider assigning a server to a
session and overriding the server variable to write to a drive local to the assigned
PowerCenter Server.
Move incremental aggregation files. If you cannot make files accessible to each
PowerCenter Server, or if the files are very large, you must move them to the server
running the session.
Note: Since aggregate files can become very large, make sure the directory can accommodate them.
Distributing Sessions
In a server grid, the master server starts the workflow and then distributes sessions to worker
servers. The master server is the server that starts a workflow. A worker server is a server that
runs sessions assigned to it by a master server. By default, each PowerCenter Server in a server
grid is both a master server and a worker server. This means that a server in a grid can
distribute sessions to and receive sessions from every server in the grid. The master server
distributes sessions that are ready to run to available worker servers in a round-robin fashion
based on server availability. The starting point for the session assignment is random.
If a worker server is running the maximum number of concurrent sessions, the master server
assigns another worker server to run the session. If all worker servers are running the
maximum number of concurrent sessions, the master server places the session in its own ready
queue.
For information about configuring the maximum number of concurrent sessions, see
Installing and Configuring the PowerCenter Server on Windows and Installing and
Configuring the PowerCenter Server on UNIX in the Installation and Configuration Guide.
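The distribution policy described above can be modeled in a few lines of Python. This sketch is a simplified illustration only: it treats the master purely as a fallback queue, whereas by default a master server is also a worker, and the server names and `max_concurrent` limit are invented for the example.

```python
def assign_sessions(sessions, workers, max_concurrent, master):
    """Assign each session to the next worker in round-robin order.
    A worker at its concurrency limit is skipped; if every worker is
    full, the master keeps the session in its own ready queue."""
    load = {server: 0 for server in workers + [master]}
    assignments = {}
    index = 0
    for session in sessions:
        placed = False
        for attempt in range(len(workers)):
            candidate = workers[(index + attempt) % len(workers)]
            if load[candidate] < max_concurrent:
                assignments[session] = candidate
                load[candidate] += 1
                index = (index + attempt + 1) % len(workers)
                placed = True
                break
        if not placed:
            # All workers are at capacity: the master queues the session.
            assignments[session] = master
            load[master] += 1
    return assignments

# Two workers, each limited to one concurrent session: the third
# session falls back to the master server.
print(assign_sessions(["s1", "s2", "s3"], ["B", "C"], 1, "A"))
```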
Figure 15-1 shows how a master server distributes the sessions in Workflow1 among the
servers in a grid. The server grid contains Server A, Server B, and Server C. Server A is the
master server, and Server B and Server C are worker servers.
Figure 15-1. Distributing Sessions in a Server Grid
In Workflow1, Server A is the master server.
Figure 15-2 shows how a master server distributes sessions in a workflow where a non-session
task exists. Server C is the master server, and Server A and Server B are worker servers. Server
C runs all non-session tasks it encounters and assigns sessions in a round-robin fashion.
Figure 15-2. Running a Non-session Task on the Master Server
Server C is the master server.
Table 15-1 lists scenarios where a server grid can lose connectivity:
Table 15-1. Losing Connectivity in a Server Grid
Connectivity Loss
Server Behavior
The worker server is not available to the master servers in the server grid.
Master servers do not assign a session to the unavailable worker server and
proceed with the round-robin distribution of sessions.
The master server marks the status of the session as terminated. The worker
server stops running all sessions. The session settings you specify determine if
the workflow fails. For more information about the Fail parent if this task fails
option, Fail parent if this task does not run option, or Disable this task option,
see Configuring Tasks on page 135.
The shut down mode you specify determines how the worker server handles
sessions when it shuts down. When you shut down the worker server in
complete mode, it continues to run the sessions it started until it completes, but
does not accept sessions from master servers. For more information about
shut down modes, see pmcmd Reference on page 594.
The worker server continues to run the session and writes its status to the
session log. However, the master server marks the status of the session as
terminated.
You must resume the workflow or resume from the failed task to continue
running the workflow and update the session status. If you do not need the
session status of the previous run, you can restart the workflow or restart the
workflow from a task to start up a new workflow run. For more information, see
Working with Tasks and Workflows on page 416.
Workflow fails. You must restart the workflow on another server or wait for the
master server to become available.
The shut down mode you specify determines how the master server handles
workflows and sessions when it shuts down. When you shut down the master
server in complete mode, it continues to run the workflows and sessions it
started until they complete, but does not accept tasks from other master
servers. For more information about shut down modes, see pmcmd
Reference on page 594.
directories, and temp directories for the PowerCenter Servers, each PowerCenter Server in a
server grid must meet the following requirements:
Use the same server variables for each server in a grid, except for the $PMTempDir,
$PMSessionLogDir, and $PMWorkflowLogDir variables.
You can assign a server to run the workflow. When you assign a server to a workflow, the
server becomes the master server for the workflow.
You can configure the entire workflow to run only on the master server. By default, the
master server distributes sessions to worker servers. You can configure the session to
override this workflow configuration.
Caching. When you run sessions that access large cache files, such as incremental
aggregation files, you can increase performance by using a drive local to the PowerCenter
Server for the cache directory. Assign a server to a session and override the server variable
to write to a drive local to the PowerCenter Server.
External loader. Assign a server to run DB2 EEE external loader sessions. DB2 EEE
loaders require that the loader process runs on the PowerCenter Server running the session.
Note: If you assign a server to a session that is not in the grid, and the master server cannot
Override Examples
Table 15-2 shows a configuration where the session properties override the workflow
properties. The session runs on Server B even though you select the workflow option to run
all tasks on Server A because the session is assigned to Server B.
Table 15-2. Override Workflow Properties
Level       Configuration
Grid
Workflow    - Run on Server A.
            - Tasks must run on server.
Session     Run on Server B.
Table 15-3 shows a configuration where the session properties override the server grid
properties. The session runs on Server B, even though you configure Server B not to accept
tasks from the grid because you assigned the session to Server B.
Table 15-3. Override Server Grid Properties
Level       Configuration
Grid
Workflow    - Run on Server A.
            - Tasks can run on other servers in the grid.
Session     Run on Server B.
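The precedence shown in Tables 15-2 and 15-3, where a session-level assignment wins over the workflow and grid settings, can be summarized as a small resolution function (a sketch of the rule, not actual PowerCenter logic; the parameter names are invented):

```python
def resolve_server(session_server, workflow_server, tasks_must_run_on_workflow_server):
    """Pick the server a session runs on: an explicit session-level
    assignment overrides both the workflow assignment and the grid
    configuration (simplified model of Tables 15-2 and 15-3)."""
    if session_server is not None:
        return session_server
    if tasks_must_run_on_workflow_server:
        return workflow_server
    # Otherwise the master distributes the session across the grid.
    return "grid"

# Table 15-2: the workflow runs on Server A with "tasks must run on
# server", but the session is assigned to Server B, so it runs there.
print(resolve_server("Server B", "Server A", True))  # Server B
```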
2. Click New.
The Server Grid Editor opens with a list of available PowerCenter Servers.
4. Select the server you want to include in the server grid, and click Add. The selected server appears in the Selected Servers column.
5. Clear Accept tasks from Server Grid if you want the server to be only a master server. Leave it selected to configure the server as both a master and worker server.
6. Repeat steps 4 and 5 until you have chosen all the servers for the grid.
7. Click OK.
The server grid name appears in the Server Grid Browser. Select Show servers in grid to
view the servers in the grid.
8. Click Close.
Chapter 16
Log Files
This chapter covers the following topics:
Overview, 456
Overview
The PowerCenter Server can create log files for each workflow it runs. These files contain
information about the tasks the PowerCenter Server performs, plus statistics about the
workflow and all sessions in the workflow. If the writer or target database rejects data during a
session run, the PowerCenter Server creates a file that contains the rejected rows.
The PowerCenter Server can create the following types of log files:
Workflow log. Contains information about the workflow run such as workflow name,
tasks executed, and workflow errors. By default, the PowerCenter Server writes this
information to the server log or Windows Event Log, depending on how you configure the
PowerCenter Server. If you wish to create a workflow log, enter a workflow file name in the
workflow properties. For more information, see Workflow Logs on page 457.
Session log. Contains information about the tasks that the PowerCenter Server performs
during a session, plus load summary and transformation statistics. By default, the
PowerCenter Server creates one session log for each session it runs. If a workflow contains
multiple sessions, the PowerCenter Server creates a separate session log for each session in
the workflow. For more information, see Session Logs on page 463.
Reject file. Contains rows rejected by the writer or target file during a session run. If the
writer or target does not reject any data during a session, the PowerCenter Server does not
generate a reject file for that session. For more information, see Reject Files on page 476.
By default, the PowerCenter Server saves each type of log file in its own directory. The
PowerCenter Server represents these directories using server variables.
Table 16-1 shows the default location for each type of log file:
Table 16-1. Log File Default Locations
Log File Type    Default Directory (Server Variable)    Value
Workflow logs    $PMWorkflowLogDir                      $PMRootDir/WorkflowLogs
Session logs     $PMSessionLogDir                       $PMRootDir/SessLogs
Reject files     $PMBadFileDir                          $PMRootDir/BadFiles
You can change the default directories at the server level by editing the server connection in
the Workflow Manager. You can also override these values for individual workflows or sessions
by updating the workflow or session properties.
Workflow Logs
You can configure a workflow to create a workflow log. When you do this, the PowerCenter
Server writes information such as process initialization, workflow task run information, errors
encountered, and workflow run summary to the workflow log.
In general, a workflow log contains the following information about the workflow:
Workflow name
Workflow status
The PowerCenter Server categorizes workflow log error messages into severity levels. The
PowerCenter Server either writes or does not write an error message to the log file based on
the error severity level. You can set the Error Severity Level for Log Files in the PowerCenter
Server setup program. For more information, see Installing and Configuring the
PowerCenter Server on Windows or Installing and Configuring the PowerCenter Server on
UNIX in the Installation and Configuration Guide. You can also configure the PowerCenter
Server to suppress writing messages to the workflow log file completely.
As with PowerCenter Server logs and session logs, the PowerCenter Server enters a code
number into the workflow log file message along with message text. You can find information
on error messages in the Troubleshooting Guide.
You configure a workflow to create a workflow log by entering a workflow log file name in the
workflow properties. If you choose to create a workflow log, the PowerCenter Server saves the
workflow log in a directory entered for the server variable $PMWorkflowLogDir in the
PowerCenter Server registration. You can override the workflow log directory at the server
level or at the workflow level.
By default, the PowerCenter Server saves one workflow log for each workflow. If you want to
save multiple logs for different workflow runs, you can configure the workflow to save a
workflow log file by timestamp, which permits an unlimited number of workflow logs, or by
run, which saves a specified number of logs. To view previous workflow logs, save log files by
timestamp.
If you choose not to create workflow logs, the PowerCenter Server writes the workflow log
messages to the server log or Windows Event Log, depending on how you configure the
PowerCenter Server. For more information on configuring the PowerCenter Server, see
Installing and Configuring the PowerCenter Server on Windows or Installing and
Configuring the PowerCenter Server on UNIX in the Installation and Configuration Guide.
CMN
Messages related to databases, memory allocation, Lookup and Joiner transformations, and internal
errors.
LM
REP
TM
VAR
INFO : LM_36522 : (270|305) Started DTM process [pid = 273] for session
instance [s_PhoneList].
INFO : CMN_1760 : (273|255) Message from session: LM_36033 [Connected to
repository [SALES] running on server:port [monster]:[5001] user
[Administrator]].
INFO : CMN_1760 : (273|255) Message from session: TM_6228 [Writing session
output to log file [d:\pcserver\SessLogs\s_PhoneList.log].].
INFO : LM_36333 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of
session instance [s_PhoneList] succeeded.
INFO : LM_36318 [Tue Nov 18 11:16:43 2003] : (270|306) Execution of
workflow [wf_PhoneList] succeeded.
Location. You can configure the directory where you want the workflow log created. By
default, the PowerCenter Server creates the workflow log in the directory configured for
the $PMWorkflowLogDir server variable. You can enter a different directory, but if the
directory does not exist or is not local to the PowerCenter Server that runs the workflow,
the workflow fails.
Name. If you wish to create a workflow log, you can enter a name for the workflow log
file. If you do not enter a filename, the PowerCenter Server does not create a workflow log.
Instead, the PowerCenter Server writes workflow log messages to the Windows Event Log
or UNIX server log.
Archive. You can configure the number of workflow logs you want the PowerCenter Server
to archive for each workflow. By default, the PowerCenter Server does not archive
workflow logs.
If you configure the workflow to save a specific number of workflow logs, the PowerCenter Server names the most recent log filename.log. It then cycles through a closed naming sequence for historical logs as follows: filename.log.0, filename.log.1, filename.log.2, ..., filename.log.n-1, where n represents the number of workflow logs. Because the PowerCenter Server cycles through the numeric naming sequence, check the workflow log file timestamp to determine the chronological order of those files.
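The naming cycle can be sketched in Python (illustrative only, not actual server code; as noted above, check timestamps for the true chronological order):

```python
def archive_names(base, n):
    """Return the file names used when the server archives n workflow
    logs: the current log plus a closed cycle of numbered historical
    logs (sketch of the naming scheme)."""
    return [base] + [f"{base}.{i}" for i in range(n)]

print(archive_names("filename.log", 3))
```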
Instead of entering a specific number of workflow logs to save, you can use the server variable
$PMWorkflowLogCount. When you use $PMWorkflowLogCount server variable, the
PowerCenter Server archives the number of workflow logs configured for the server variable.
If you use $PMWorkflowLogCount for all workflows, you can increase the number of
archived workflow logs for all workflows by changing the server variable.
Note: By default, $PMWorkflowLogCount is set to 0. To archive workflow logs using
yyyy = year
To prevent filling the workflow log directory, periodically delete or back up log files when using the timestamp option.
Note: You can also truncate workflow and session log entries from the repository. For more
Description
Designates the name and directory for the parameter file. Use the parameter file to
define workflow parameters. For details on parameter files, see Parameter Files
on page 511.
Designates a location for the workflow log file. By default, the PowerCenter Server
writes the log file in the server variable directory, $PMWorkflowLogDir.
If you enter a full directory and file name in the Workflow Log File Name field, clear
this field.
Option Name
Description
If you select Save Workflow Log by Timestamp, the PowerCenter Server saves all
workflow logs, appending a timestamp to each log.
If you select Save Workflow Log by Runs, the PowerCenter Server saves a
designated number of workflow logs. Configure the number of workflow logs in the
Save Workflow Log for These Runs option.
For details on these options, see Archiving Workflow Logs on page 459.
You can also use the $PMWorkflowLogCount server variable to save the
configured number of workflow logs for the PowerCenter Server.
The number of historical workflow logs you want the PowerCenter Server to save.
The Informatica saves the number of historical logs you specify, plus the most
recent workflow log. Therefore, if you specify 5 runs, the PowerCenter Server
saves the most recent workflow log, plus historical logs 0 to 4, for a total of 6 logs.
You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the
PowerCenter Server saves only the most recent workflow log.
In the Navigator window, connect to the server on which the workflow runs.
If you save workflow logs by timestamp, you can also use the Workflow Monitor to view past
workflow logs. To do this, right-click the workflow in the Gantt chart view and choose Get
Workflow Log.
For more information about the Workflow Monitor, see Using the Workflow Monitor on
page 404.
Session Logs
The session log file contains information about all tasks the PowerCenter Server performs,
plus the load summary and transformation statistics. The amount of detail in the session log
depends on the tracing level that you set. You can define the tracing level for each
transformation or for the entire session. The session-level tracing overrides any
transformation-level tracing levels.
In general, the session log contains the following information about the session:
Load summary of reader, writer, and Data Transformation Manager (DTM) statistics
By default, the PowerCenter Server saves session logs in the directory for the PowerCenter
Server variable $PMSessionLogDir, which you define in the Workflow Manager. The default
name for the session log is s_mapping name.log. You can override the session log name and
location in the session properties.
The PowerCenter Server does not archive session logs by default. Instead, it creates one log for
each session and overwrites the existing log with the latest session log. However, you can
configure the session to archive session logs. For more information, see Archiving Session
Logs on page 471.
By default, the PowerCenter Server generates session log files based on the PowerCenter
Server code page. However, if you enable the Output Session Log in UTF-8 option on the
Configuration tab of the PowerCenter Server setup program, the PowerCenter Server writes
to the session log using the UTF-8 character set.
Note: By default, the PowerCenter Server writes row errors to the session log. However, if you
enable row error logging in the session properties, the PowerCenter Server does not write
dropped rows to the session log. When you enable row error logging, you can configure the
PowerCenter Server to write row errors to the session log in addition to the row error log by
enabling verbose data tracing.
You can configure the PowerCenter Server to write session log messages to an external library
as well as to the session log. To do this, you can set the Export Session Log Lib Name in the
PowerCenter Server setup program. For more information, see Installing and Configuring
the PowerCenter Server on Windows or Installing and Configuring the PowerCenter Server
on UNIX in the Installation and Configuration Guide.
Message Code
BLKR
CNX
CMN - Messages related to databases, memory allocation, Lookup and Joiner
transformations, and internal errors.
DBG
DBGR
EP
ES
FR
FTP
HIER
LM
NTSERV
OBJM
ODL
PETL
PMF
RAPP
REP
RR
SF - Messages related to the server framework, used by the Load Manager and Repository
Server.
SORT
TE
TM
TT
VAR
WRT
XMLR
XMLW
Thread Identification
The thread identification consists of the thread type and a series of numbers separated by
underscores. The numbers following a thread name indicate the following information:
Target load order group number
Source pipeline number
Partition number
Note: The PowerCenter Server writes an asterisk (*) as the partition point number for writer
threads.
The PowerCenter Server prints the thread identification before the log file code and the
message text in the session log. The following example illustrates a reader thread from target
load order group one, concurrent source set one, source pipeline one, and partition one:
READER_1_1_1> DBG_21438 Reader: Source is [p152636], user [jennie]
A concurrent source set is the group of sources in a target load order group the PowerCenter
Server reads concurrently. A target load order group might contain multiple concurrent
source sets if it contains a Joiner transformation and you configure the PowerCenter Server to
read Joiner transformation sources sequentially.
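The thread identification format described above lends itself to simple parsing when post-processing session logs. This is an illustrative sketch based only on the format shown here, not a PowerCenter utility; the sample line is the one from this section:

```python
import re

# Matches e.g. "READER_1_1_1>" or "WRITER_1_*_1>"; writer threads use * for
# the partition point number.
THREAD_ID = re.compile(r"^(?P<kind>[A-Z]+)_(?P<group>\d+)_(?P<set>[\d*]+)_(?P<part>\d+)>")

def parse_thread_id(log_line):
    """Return (thread type, target load order group, middle field, partition)
    parsed from the front of a session log line, or None if absent."""
    m = THREAD_ID.match(log_line)
    return m and (m.group("kind"), m.group("group"), m.group("set"), m.group("part"))

line = "READER_1_1_1> DBG_21438 Reader: Source is [p152636], user [jennie]"
print(parse_thread_id(line))  # ('READER', '1', '1', '1')
```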
Session Logs
465
To configure the PowerCenter Server to read Joiner transformation sources sequentially,
enable the PMServer 6.X Joiner source order compatibility option in the PowerCenter Server
setup.
The following excerpt from a session log shows the thread identification:
Target tables:
     Emp_target
READER_1_1_1> BLKR_16019 Read [1] rows, read [0] error rows for source
table [EMP_SRC] instance name [EMP_SRC]
READER_1_1_1> BLKR_16008 Reader run completed.
TRANSF_1_1_1> DBG_21216 Finished transformations for Source Qualifier
[SQ_EMP_SRC]. Total errors [0]
WRITER_1_*_1> WRT_8167 Start loading table [Emp_target] at: Tue Aug 03
11:30:00 2004
...
MASTER> PETL_24002 Parallel Pipeline Engine finished.
MASTER> PETL_24012 Session run completed successfully.
Some messages are embedded within other messages. For example, a message with code
CMN_1039 contains informational messages from Microsoft SQL Server as the PowerCenter
Server changes to the source database used in the session.
Note: If you configure the PowerCenter Server to run in ASCII mode, the session log file
reports the sort order as Binary, even if you select a different sort order in the session
properties.
Load Summary
The session log includes a load summary that reports the number of rows inserted, updated,
deleted, and rejected for each target as of the last commit point. The PowerCenter Server
reports the load summary for each session by default. However, you can set tracing level to
Verbose Initialization or Verbose Data to report the load summary for each transformation.
The following sample is an excerpt from a load summary:
*****START LOAD SESSION*****
Target tables:
     Emp_target
Commit on end-of-data
===================================================
LOAD SUMMARY
============
Applied: 1
...
WRITER_1_*_1> WRT_8043 *****END LOAD SESSION*****
The PowerCenter Server reports statistics for each of the following operations performed on
the target:
Inserted. Shows the number of rows the PowerCenter Server marked for insert into the
target. The number of affected rows cannot be larger than requested for this operation.
Updated. Shows the number of rows the PowerCenter Server marked for update in the
target. The number of affected rows can be different from the number of requested rows.
For example, you have a table with one column called SALES_ID and five rows containing
the values: 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID is 2. The writer
affects three rows, even though there was only one update request. Or, if you mark rows for
update where SALES_ID is 4, the writer affects 0 rows.
Deleted. Shows the number of rows the PowerCenter Server marked to remove from the
target. The number of affected rows can be different from the number of requested rows.
Rejected. Shows the number of rows the PowerCenter Server rejected during the writing
process. These rows cannot be applied to the target. For the Rejected rows category, the
number of affected and applied rows is always zero since these rows are not written to the
target.
468
Requested rows. Shows the number of rows the writer actually received for the specified
operation.
Applied rows. Shows the number of rows the writer successfully applied to the target (that
is, the target returned no errors).
Affected rows. Shows the number of rows affected by the specified operation. Depending
on the operation, the number of affected rows can be different from the number of
requested rows. For example, you have a table with one column called SALES_ID and five
rows containing the values: 1, 2, 3, 2, and 2. You mark rows for update where SALES_ID
is 2. The writer affects three rows, even though there was only one update request. Or, if
you mark rows for update where SALES_ID is 4, the writer affects 0 rows.
Rejected rows. Shows the number of rows the writer could not apply to the target. For
example, the target database rejects a row if the PowerCenter Server attempts to insert
NULL into a not-null field. The PowerCenter Server writes all rejected rows to the session
reject file, or to the row error log, depending on how you configure the session.
Mutated from update. Shows the number of rows originally flagged for update that are
instead inserted into the target when the session is configured Update Else Insert.
If the number of rows requested, applied, rejected, and affected are all zero for any of these
four operations, the operation does not appear as a line in the load summary. If no data is
passed to the target, the writer reports the following message:
No data loaded for this target.
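The requested-versus-affected distinction in the SALES_ID example above can be checked with a few lines of code. This sketch only illustrates the counting; it is not how the writer is implemented:

```python
def affected_rows(sales_ids, target_value):
    """Rows touched by one 'update where SALES_ID = target_value' request."""
    return sum(1 for v in sales_ids if v == target_value)

table = [1, 2, 3, 2, 2]          # five rows, one SALES_ID column
print(affected_rows(table, 2))   # 3: one update request affects three rows
print(affected_rows(table, 4))   # 0: no row matches, so none are affected
```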
The transformation statistics report the following information:
The number of input rows and the name of the input source
The number of output rows and the name of the output transformation or target
The following sample is an excerpt from the transformation statistics in a session log file:
DETAILED TRANSFORMATION ROW STATISTICS
for DSQ [SQ_EMPLOYEES], Partition[1]
--------------------------------
MAPPING> TT_11031 Transformation [SQ_EMPLOYEES]:
MAPPING> TT_11035 Input - 12 (__READER__)
Location. You can configure the directory where you want the session log created. By
default, the PowerCenter Server creates the session log in the directory configured for the
$PMSessionLogDir server variable. You can enter a different directory, but if the directory
does not exist or is not local to the PowerCenter Server that runs the session, the session
fails.
Name. You can name the session log or accept the default name. The default name for the
session log is s_mapping name.log.
Archive. You can configure the number of session logs you want the PowerCenter Server to
archive for each session. By default, the PowerCenter Server does not archive session logs.
Tracing levels. You can control the type of information the PowerCenter Server includes in
the session log by setting a tracing level for the session. By default, the PowerCenter Server
uses tracing levels configured in the mapping.
Session Log File Name. By default, the PowerCenter Server uses the session name for the
log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping
name.log. Optionally enter a file name, a file name and directory, or use the
$PMSessionLogFile session parameter. The PowerCenter Server appends information in
this field to that entered in the Session Log File Directory field. For example, if you have
C:\session_logs\ in the Session Log File Directory field and enter logname.txt in the
Session Log File Name field, the PowerCenter Server writes logname.txt to the
C:\session_logs\ directory. You can also use the $PMSessionLogFile session parameter to
represent the name of the session log or the name and location of the session log. For
details on session parameters, see Session Parameters on page 495.
Session Log File Directory. Location of the log file. Enter a valid directory local to the
PowerCenter Server. By default, the PowerCenter Server creates session logs in the
directory configured for the $PMSessionLogDir server variable.
By default, the PowerCenter Server does not archive session logs. It creates one session log for
each session and overwrites the existing log with the latest session log.
If you configure the session to save a specific number of session logs, it names the most recent
log s_mapping name.log. It then cycles through a closed naming sequence for historical logs as
follows: s_mapping name.log.0, s_mapping name.log.1, s_mapping name.log.2, ..., s_mapping
name.log.n-1, where n is the number of session logs. Because the PowerCenter Server cycles
through the numeric naming sequence, check the session log file timestamp to determine the
chronological order of those files.
Instead of entering a specific number of session logs to save, you can use the server variable
$PMSessionLogCount. When you use the $PMSessionLogCount server variable, the
PowerCenter Server archives the number of session logs configured for the server variable. If
you use $PMSessionLogCount for all sessions, you can increase the number of archived
session logs for all sessions by changing the server variable.
Note: By default, $PMSessionLogCount is set to 0. To archive session logs using
$PMSessionLogCount, change the value of the server variable.
You can also save all session logs by configuring a session to save logs by timestamp. When
timestamping session logs, the PowerCenter Server appends the month, day, hour, and minute
of the session completion to the log file. The resulting log file name is s_mapping
name.log.yyyymmddhhmi, where:
yyyy = year
mm = month
dd = day
hh = hour
mi = minute
To prevent filling the session log directory, periodically delete or backup log files when using
the timestamp option.
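A scheduled cleanup job is one way to keep the directory from filling. The sketch below assumes the .yyyymmddhhmi suffix described above; the retention period and file pattern are hypothetical choices, and you should verify the pattern matches your log names before deleting anything:

```python
import glob
import os
import re
from datetime import datetime, timedelta

# Timestamped logs end in .yyyymmddhhmi, e.g. s_m_sales.log.200408031130
TIMESTAMP_SUFFIX = re.compile(r"\.(\d{12})$")

def purge_old_logs(log_dir, keep_days=30, now=None):
    """Delete timestamped logs older than keep_days; return the deleted paths."""
    cutoff = (now or datetime.now()) - timedelta(days=keep_days)
    deleted = []
    for path in glob.glob(os.path.join(log_dir, "*.log.*")):
        m = TIMESTAMP_SUFFIX.search(path)
        if not m:
            continue  # skip numeric archives such as .0, .1
        if datetime.strptime(m.group(1), "%Y%m%d%H%M") < cutoff:
            os.remove(path)
            deleted.append(path)
    return deleted
```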
Note: You can also truncate workflow and session log entries from the repository.
Save Session Log By. If you select Save Session Log by Timestamp, the PowerCenter Server
saves all session logs, appending a timestamp to each log. If you select Save Session Log
by Runs, the PowerCenter Server saves a designated number of session logs. Configure the
number of session logs in the Save Session Log for These Runs option. You can also use
the $PMSessionLogCount server variable to save the configured number of session logs
for the PowerCenter Server.
Save Session Log for These Runs. The number of historical session logs you want the
PowerCenter Server to save. The PowerCenter Server saves the number of historical logs
you specify, plus the most recent session log. Therefore, if you specify 5 runs, the
PowerCenter Server saves the most recent session log, plus historical logs 0 to 4, for a
total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs,
the PowerCenter Server saves only the most recent session log.
Table 16-4 describes the session log tracing levels:
Table 16-4. Session Log Tracing Levels
None. The PowerCenter Server uses the tracing level set in the mapping.
Terse. The PowerCenter Server logs initialization information as well as error messages and
notification of rejected data.
Normal. The PowerCenter Server logs initialization and status information, errors
encountered, and skipped rows due to transformation row errors. Summarizes session
results, but not at the level of individual rows.
Verbose Initialization. In addition to normal tracing, the PowerCenter Server logs additional
initialization details, names of index and data files used, and detailed transformation
statistics.
Verbose Data. In addition to verbose initialization tracing, the PowerCenter Server logs each
row that passes into the mapping. Also notes where the PowerCenter Server truncates
string data to fit the precision of a column and provides detailed transformation statistics.
When you configure the tracing level to verbose data, the PowerCenter Server writes row
data for all rows in a block when it processes a transformation.
You can also enter tracing levels for individual transformations in the mapping. When you
enter a tracing level in the session properties, you override tracing levels configured for
transformations in the mapping.
To set the tracing level:
Select a tracing level from the Override Tracing list. Table 16-4 on page 473 describes the
session log tracing levels.
You can also view session logs through the Workflow Monitor. When you do this, the
Workflow Monitor creates a temporary file that stores the session log. You can view the
temporary file through the Workflow Monitor.
If a session fails, you can still view the session log file.
The PowerCenter Server generates the session log based on the PowerCenter Server code page.
You can specify the language in which you want to view the session log based on the locale of
the machine hosting the PowerCenter Server.
To use the Workflow Monitor to view the most recent session log:
1. In the Navigator window, connect to the server on which the workflow runs.
2. Open the workflow that contains the session whose log you wish to view.
If you save session logs by timestamp, you can also use the Workflow Monitor to view past
session logs. To do this, right-click the session in the Gantt chart view and choose Get Session
Log.
For more information about the Workflow Monitor, see Using the Workflow Monitor on
page 404.
Reject Files
During a session, the PowerCenter Server creates a reject file for each target instance in the
mapping. If the writer or the target rejects data, the PowerCenter Server writes the rejected
row into the reject file. The reject file and session log contain information that helps you
determine the cause of the reject.
Each time you run a session, the PowerCenter Server appends rejected data to the reject file.
Depending on the source of the problem, you can correct the mapping and target database to
prevent rejects in subsequent sessions.
Note: If you enable row error logging in the session properties, the PowerCenter Server does
not create a reject file. It writes the reject rows to the row error tables or file.
When you run a session that contains multiple partitions, the PowerCenter Server creates a
separate reject file for each partition.
Row indicator. The first column in each row of the reject file is the row indicator. The
numeric indicator tells whether the row was marked for insert, update, delete, or reject.
If the session is a user-defined commit session, the row indicator might tell whether the
transaction was rolled back due to a non-fatal error or if the committed transaction was in
a failed target connection group. For more information about user-defined commit
sessions and rejected rows, see User-Defined Commits on page 283.
Column indicator. Column indicators appear after every column of data. The alphabetical
character indicators tell whether the data was valid, overflow, null, or truncated.
The following sample reject file shows the row and column indicators:
0,D,1921,D,Nelson,D,William,D,415-541-5145,D
0,D,1922,D,Page,D,Ian,D,415-541-5145,D
0,D,1923,D,Osborne,D,Lyle,D,415-541-5145,D
0,D,1928,D,De Souza,D,Leo,D,415-541-5145,D
0,D,2001,D,S. MacDonald,D,Ira,D,415-541-5145,D
Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator
tells the writer what to do with the row of data.
Table 16-5 describes the row indicators in a reject file:
Table 16-5. Row Indicators in Reject File
0 - Insert (rejected by writer or target)
1 - Update (rejected by writer or target)
2 - Delete (rejected by writer or target)
3 - Reject (rejected by writer)
4 - Rolled-back insert (rejected by writer)
5 - Rolled-back update (rejected by writer)
6 - Rolled-back delete (rejected by writer)
7 - Committed insert (rejected by writer)
8 - Committed update (rejected by writer)
9 - Committed delete (rejected by writer)
If a row indicator is 3, the writer rejected the row because an update strategy expression
marked it for reject.
If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To
narrow down the reason why rows marked 0, 1, or 2 were rejected, review the column
indicators and consult the session log.
Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and
another column indicator. Column indicators appear after every column of data and define
the type of the data preceding it.
The column indicators and the type of data they represent are:
D - Valid data
O - Overflow
N - Null
T - Truncated
Null columns appear in the reject file with commas marking their column. An example of a
null column surrounded by good data appears as follows:
5,D,,N,5,D
Because either the writer or target database can reject a row, and because they can reject the
row for a number of reasons, you need to evaluate the row carefully and consult the session
log to determine the cause for reject.
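The alternating value/indicator layout shown above can be decoded mechanically. This naive sketch assumes comma-delimited fields with no commas embedded in the data values (real reject rows may need a more careful split); the indicator meanings come from the tables in this section:

```python
ROW_MEANING = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COL_MEANING = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}

def parse_reject_line(line):
    """Split a reject-file line into (row meaning, [(value, indicator), ...]).

    Fields alternate value,indicator; the row indicator is the first field.
    """
    fields = line.rstrip("\n").split(",")  # assumes no commas inside data
    values, indicators = fields[0::2], fields[1::2]
    row = ROW_MEANING.get(values[0], "unknown")
    columns = [(v, COL_MEANING.get(i, i)) for v, i in zip(values[1:], indicators[1:])]
    return row, columns

row, cols = parse_reject_line("0,D,1921,D,Nelson,D,William,D,415-541-5145,D")
print(row)      # insert
print(cols[1])  # ('Nelson', 'valid')
```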
Chapter 17
Row Error Logging
This chapter contains information on the following topic:
Overview, 482
Overview
When you configure a session, you can choose to log row errors in a central location. When a
row error occurs, the PowerCenter Server logs error information that allows you to determine
the cause and source of the error. The PowerCenter Server logs information such as source
name, row ID, current row data, transformation, timestamp, error code, error message,
repository name, folder name, session name, and mapping information.
You can log row errors into relational tables or flat files. When you enable error logging, the
PowerCenter Server creates the error tables or an error log file the first time it runs the session.
Error logs are cumulative. If the error logs exist, the PowerCenter Server appends error data to
the existing error logs.
You can choose to log source row data. Source row data includes row data, source row ID, and
source row type from the source qualifier where an error occurs. The PowerCenter Server
cannot identify the row in the source qualifier that contains an error if the error occurs after a
non pass-through partition point with more than one partition or one of the following active
sources:
Aggregator
Joiner
Normalizer (pipeline)
Rank
Sorter
By default, the PowerCenter Server logs transformation errors in the session log and reject
rows in the reject file. When you enable error logging, the PowerCenter Server does not
generate a reject file or write dropped rows to the session log. Without a reject file, the
PowerCenter Server does not log Transaction Control transformation rollback or commit
errors. If you want to write rows to the session log in addition to the row error log, you can
enable verbose data tracing.
Note: When you log row errors, session performance may decrease because the PowerCenter
Server processes one row at a time instead of a block of rows at once.
When you choose relational database error logging, the PowerCenter Server creates the
following error tables:
PMERR_DATA. Stores data and metadata about a transformation row error and its
corresponding source row.
PMERR_MSG. Stores metadata about an error and the error message.
PMERR_SESS. Stores metadata about the session where an error occurred.
PMERR_TRANS. Stores metadata about the source and transformation ports, such as
name and datatype, when a transformation error occurs.
PMERR_DATA
When the PowerCenter Server encounters a row error, it inserts an entry into the
PMERR_DATA table. This table stores data and metadata about a transformation row error
and its corresponding source row.
Table 17-1 describes the structure of the PMERR_DATA table:
Table 17-1. PMERR_DATA Table Schema
REPOSITORY_GID (Varchar)
WORKFLOW_RUN_ID (Integer)
WORKLET_RUN_ID (Integer)
SESS_INST_ID (Integer)
TRANS_MAPPLET_INST (Varchar)
TRANS_NAME (Varchar)
TRANS_GROUP (Varchar)
TRANS_PART_INDEX (Integer)
TRANS_ROW_ID (Integer)
TRANS_ROW_DATA (Long Varchar)
SOURCE_ROW_ID (Integer)
SOURCE_ROW_TYPE (Integer). The row indicator that tells whether the row was marked for
insert, update, delete, or reject: 0 - Insert, 1 - Update, 2 - Delete, 3 - Reject.
SOURCE_ROW_DATA (Long Varchar)
LINE_NO (Integer)
PMERR_MSG
When the PowerCenter Server encounters a row error, it inserts an entry into the
PMERR_MSG table. This table stores metadata about the error and the error message.
Table 17-2 describes the structure of the PMERR_MSG table:
Table 17-2. PMERR_MSG Table Schema
REPOSITORY_GID (Varchar)
WORKFLOW_RUN_ID (Integer)
WORKLET_RUN_ID (Integer)
SESS_INST_ID (Integer)
MAPPLET_INST_NAME (Varchar)
TRANS_NAME (Varchar)
TRANS_GROUP (Varchar)
TRANS_PART_INDEX (Integer)
TRANS_ROW_ID (Integer)
ERROR_SEQ_NUM (Integer)
ERROR_TIMESTAMP (Date/Time)
ERROR_UTC_TIME (Integer)
ERROR_CODE (Integer)
ERROR_MSG (Long Varchar)
ERROR_TYPE (Integer)
LINE_NO (Integer)
PMERR_SESS
When you choose relational database error logging, the PowerCenter Server inserts entries
into the PMERR_SESS table. This table stores metadata about the session where an error
occurred.
Table 17-3 describes the structure of the PMERR_SESS table:
Table 17-3. PMERR_SESS Table Schema
REPOSITORY_GID (Varchar)
WORKFLOW_RUN_ID (Integer)
WORKLET_RUN_ID (Integer)
SESS_INST_ID (Integer)
SESS_START_TIME (Date/Time)
SESS_START_UTC_TIME (Integer)
REPOSITORY_NAME (Varchar)
FOLDER_NAME (Varchar). Specifies the folder where the mapping and session are located.
WORKFLOW_NAME (Varchar)
TASK_INST_PATH (Varchar). Fully qualified session name that can span multiple rows. The
PowerCenter Server creates a new line for the session name and for each worklet in the
qualified session name. For example, for a session named WL1.WL2.S1, each component
of the name appears on a new line:
WL1
WL2
S1
The PowerCenter Server writes the line number in the LINE_NO column.
MAPPING_NAME (Varchar)
LINE_NO (Integer)
PMERR_TRANS
When the PowerCenter Server encounters a transformation error, it inserts an entry into the
PMERR_TRANS table. This table stores metadata, such as the name and datatype of the
source and transformation ports.
Table 17-4 describes the structure of the PMERR_TRANS table:
Table 17-4. PMERR_TRANS Table Schema
REPOSITORY_GID (Varchar)
WORKFLOW_RUN_ID (Integer)
WORKLET_RUN_ID (Integer)
SESS_INST_ID (Integer)
TRANS_MAPPLET_INST (Varchar)
TRANS_NAME (Varchar)
TRANS_GROUP (Varchar)
TRANS_ATTR (Varchar)
SOURCE_MAPPLET_INST (Varchar)
SOURCE_NAME (Varchar)
SOURCE_ATTR (Varchar)
LINE_NO (Integer)
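Once the error tables are populated, error messages can be joined back to the row data they describe through the shared run and row identifiers. The following sketch uses an in-memory SQLite database with a cut-down version of the documented schemas purely for illustration; the actual tables live in the relational database you configure for error logging, and the inserted sample values (error code, message, row data) are invented:

```python
import sqlite3

# Minimal stand-ins for two of the documented error tables, using column
# names from Tables 17-1 and 17-2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PMERR_MSG (
  WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, TRANS_NAME TEXT,
  TRANS_ROW_ID INTEGER, ERROR_CODE INTEGER, ERROR_MSG TEXT);
CREATE TABLE PMERR_DATA (
  WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, TRANS_NAME TEXT,
  TRANS_ROW_ID INTEGER, TRANS_ROW_DATA TEXT);
""")
conn.execute("INSERT INTO PMERR_MSG VALUES (1310, 19, 'AGG_case', 7, 11019, 'Port default value error')")
conn.execute("INSERT INTO PMERR_DATA VALUES (1310, 19, 'AGG_case', 7, '42:D|:N')")

# Join each error message to the row data it describes for one workflow run.
rows = conn.execute("""
  SELECT m.TRANS_NAME, m.ERROR_CODE, m.ERROR_MSG, d.TRANS_ROW_DATA
  FROM PMERR_MSG m
  JOIN PMERR_DATA d
    ON m.WORKFLOW_RUN_ID = d.WORKFLOW_RUN_ID
   AND m.SESS_INST_ID = d.SESS_INST_ID
   AND m.TRANS_NAME = d.TRANS_NAME
   AND m.TRANS_ROW_ID = d.TRANS_ROW_ID
  WHERE m.WORKFLOW_RUN_ID = 1310
""").fetchall()
print(rows)  # [('AGG_case', 11019, 'Port default value error', '42:D|:N')]
```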
Session header. Contains session run information. Information in the session header is
similar to the information stored in the PMERR_SESS table.
Column header. Contains the names of the data columns.
Column data. Contains actual row data and error message information.
The following sample error log file contains a session header, column header, and column
data:
**********************************************************************
Repository GID: fe4817ab-7d87-465f-9110-354222424df0
Repository: CustomerInfo
Folder: Row_Error_Logging
Workflow: wf_basic_REL_errors_AGG_case
Session: s_m_basic_REL_errors_AGG_case
Mapping: m_basic_REL_errors_AGG_case
Workflow Run ID: 1310
Worklet Run ID: 0
Session Instance ID: 19
Session Start Time: 08/03/2004 16:57:01
Session Start Time (UTC): 1067126221
**********************************************************************
Transformation
Name of the mapplet that contains the transformation. N/A appears when this information
is not available.
Transformation Group. Name of the input or output group where an error occurred.
Defaults to either input or output if the transformation does not have a group.
Partition Index
Transformation Row ID
Error Sequence. Counter for the number of errors per row in each transformation group. If
a session has multiple partitions, the PowerCenter Server maintains this counter for each
partition. For example, if a transformation generates three errors in partition 1 and two
errors in partition 2, ERROR_SEQ_NUM generates the values 1, 2, and 3 for partition 1,
and values 1 and 2 for partition 2.
Error Timestamp. The Coordinated Universal Time, also known as Greenwich Mean Time,
when the error occurred.
Error Code
Error Message. Error message.
Error Type. The type of error that occurred. The PowerCenter Server uses the following
values:
1 - Reader error
2 - Writer error
3 - Transformation error
Transformation Data. Delimited string containing all column data, including the column
indicator. Column indicators are:
D - valid
O - overflow
N - null
T - truncated
B - binary
U - data unavailable
The fixed delimiter between column data and column indicator is a colon ( : ). The
delimiter between the columns is a pipe ( | ). You can override the column delimiter in
the error handling settings. The PowerCenter Server converts all column data to text
string in the error file. For binary data, the PowerCenter Server uses only the column
indicator.
Source Name. Name of the source qualifier. N/A appears when a row error occurs
downstream of an active source that is not a source qualifier or a non pass-through
partition point with more than one partition. For a list of active sources that can affect
row error logging, see Overview on page 482.
Source Row ID. Value that the source qualifier assigns to each row it reads. If the
PowerCenter Server cannot identify the row, the value is -1.
Source Row Type. The row indicator that tells whether the row was marked for insert,
update, delete, or reject:
0 - Insert
1 - Update
2 - Delete
3 - Reject
Source Data. Delimited string containing all column data, including the column indicator.
Column indicators are:
D - valid
O - overflow
N - null
T - truncated
B - binary
U - data unavailable
The fixed delimiter between column data and column indicator is a colon ( : ). The
delimiter between the columns is a pipe ( | ). You can override the column delimiter in
the error handling settings. The PowerCenter Server converts all column data to text
string in the error table or error file. For binary data, the PowerCenter Server uses only
the column indicator.
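Given the documented defaults (pipe between columns, colon between a value and its indicator), the Transformation Data and Source Data strings can be split back into pairs. This is a sketch with an invented sample string; pass your configured delimiters if you overrode the defaults:

```python
def parse_error_row_data(data, col_delim="|", ind_delim=":"):
    """Split a Transformation Data or Source Data string into
    (value, indicator) pairs using the documented default delimiters."""
    pairs = []
    for column in data.split(col_delim):
        # rpartition keeps any earlier colons inside the value itself
        value, _, indicator = column.rpartition(ind_delim)
        pairs.append((value, indicator))
    return pairs

print(parse_error_row_data("1921:D|Nelson:D|:N|4276:O"))
# [('1921', 'D'), ('Nelson', 'D'), ('', 'N'), ('4276', 'O')]
```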
You configure error logging options on the Config Object tab. For more information on
creating a session configuration object, see Creating a Session Configuration Object on
page 183.
To configure error logging options:
Table 17-6 describes the error logging settings of the Config Object tab:
Table 17-6. Error Log Options
Error Log Type (required). Specifies the type of error log to create. You can specify
relational database, flat file, or no log. By default, the PowerCenter Server does not create
an error log.
Error Log Table Name Prefix (optional). Specifies the table name prefix for relational logs.
The PowerCenter Server appends 11 characters to the prefix name. Oracle and Sybase
have a 30 character limit for table names. If a table name exceeds 30 characters, the
session fails.
Error Log File Directory (required for flat file logging). Specifies the directory where errors
are logged. By default, the error log file directory is $PMBadFilesDir\.
Error Log File Name (required for flat file logging). Specifies the error log file name. The
character limit for the error log file name is 255. By default, the error log file name is
PMError.log.
Log Source Row Data (optional). If you choose not to log source row data, or if source row
data is unavailable, the PowerCenter Server writes an indicator such as N/A or -1,
depending on the column datatype. If you do not need to capture source row data,
consider disabling this option to increase PowerCenter Server performance.
Data Column Delimiter (required). Delimiter for string type source row data and
transformation group row data. By default, the PowerCenter Server uses a pipe ( | )
delimiter. Verify that you do not use the same delimiter for the row data as the error
logging columns. If you use the same delimiter, you may find it difficult to read the error
log file.
Click OK.
Chapter 18
Session Parameters
This chapter contains information on the following topics:
Overview, 496
Tips, 510
Overview
Session parameters, like mapping parameters, represent values you might want to change
between sessions, such as a database connection or source file. Use session parameters in the
session properties, and then define the parameters in a parameter file. You can specify the
parameter file for the session to use in the session properties. You can also specify it when you
use pmcmd to start the session.
The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With
$PMSessionLogFile, you can change the name of the session log generated for the session.
The Workflow Manager also allows you to create user-defined session parameters.
Table 18-1 describes required naming conventions for the session parameters you can define:
Table 18-1. Naming Conventions for User-Defined Session Parameters
Parameter Type - Naming Convention
Database Connection - $DBConnectionName
Source File - $InputFileName
Target File - $OutputFileName
Lookup File - $LookupFileName
Reject File - $BadFileName
Use session parameters to make sessions more flexible. For example, you have the same type of
transactional data written to two different databases, and you use the database connections
TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for
both tables. Instead of creating two sessions for the same mapping, you can create a database
connection parameter, $DBConnectionSource, and use it as the source database connection
for the session. When you create a parameter file for the session, you set
$DBConnectionSource to TransDB1 and run the session. After the session completes, you set
$DBConnectionSource to TransDB2 and run the session again.
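For the first run, the parameter file might contain the following sketch. The folder name Production and session name s_Trans are assumptions for illustration:

```
[Production.s_Trans]
$DBConnectionSource=TransDB1
```

For the second run, you edit the same file and set $DBConnectionSource=TransDB2.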
You might use several session parameters together to make session management easier. For
example, you might use source file and database connection parameters to configure a session
to read data from different source files and write the results to different target databases. You
can then use reject file parameters to write the session reject files to the target machine. You
can use the session log parameter, $PMSessionLogFile, to write to different session logs in the
target machine, as well.
When you use session parameters, you must define the parameters in the parameter file.
Session parameters do not have default values. When the PowerCenter Server cannot find a
value for a session parameter, it fails to initialize the session.
For example, in a session, you leave Session Log File Directory set to its default value, the
$PMSessionLogDir server variable. For Session Log File Name, you enter the session
parameter $PMSessionLogFile. In the parameter file, you set $PMSessionLogFile to
TestRun.txt. When you registered the PowerCenter Server, you defined $PMSessionLogDir
as C:/Program Files/Informatica/PowerCenter Server/SessLogs. When the PowerCenter Server
runs the session, it creates a session log named TestRun.txt in the C:/Program Files/
Informatica/PowerCenter Server/SessLogs directory.
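The corresponding parameter file entry might look like the following sketch, using the generic heading format described in the Parameter Files chapter:

```
[folder name.session name]
$PMSessionLogFile=TestRun.txt
```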
1.
In the session properties, click the General Options settings of the Properties tab.
2.
In the Session Log File Name field, enter the session parameter $PMSessionLogFile.
3.
If you want $PMSessionLogFile to represent both the session log name and directory,
clear the Session Log File Directory field.
4.
Enter a parameter file and directory in the Parameter File Name field.
5.
Click OK.
Before you run the session, create the parameter file in the specified directory and define
$PMSessionLogFile. For details, see Parameter Files on page 511.
1.
In the session properties, click the Mapping tab (Transformation view) and click
Connections settings for the sources or targets node.
2.
Click the Open button in the connection value field.
3.
Enter a name for the database connection parameter. Name the connection parameter
$DBConnectionName.
4.
In the General Options settings of the Properties tab, enter a parameter file and directory
in the Parameter Filename field.
The directory must be local to the PowerCenter Server.
5.
Click OK.
Before you run the session, create the parameter file in the specified directory and define the
database connection parameter. For details, see Parameter Files on page 511.
For example, in a session, you leave Source File Directory set to its default, the
$PMSourceFileDir server variable. For the source file name, you create a session parameter
named $Inputfile_products. In the parameter file, you set $Inputfile_products to
products.txt. When you registered the PowerCenter Server, you set $PMSourceFileDir to
C:/Program Files/Informatica/PowerCenter Server/SrcFiles. When the PowerCenter Server
runs the session, it reads the products.txt file in the C:/Program Files/Informatica/
PowerCenter Server/SrcFiles directory.
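A minimal parameter file entry for this example might be the following sketch (generic heading shown):

```
[folder name.session name]
$Inputfile_products=products.txt
```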
1.
In the Source Filename field, enter the source file parameter name.
Name all source file parameters $InputFileName.
2.
If you want the parameter to represent both the source file name and location, clear the
Source Directory field.
3.
In the General Options settings of the Properties tab, enter a parameter file and directory
in the Parameter Filename field.
4.
Click OK.
Before you run the session, create the parameter file in the specified directory and define the
source file parameter. For details, see Parameter Files on page 511.
For example, you want to name the target file based on the month in which the session runs.
In the session you leave the target directory set to its default, the $PMTargetFileDir server
variable. For the target file name, you create a session parameter named $OutputFileName. In
the parameter file, you set $OutputFileName to Nov2000.out. When you registered the
PowerCenter Server, you set $PMTargetFileDir to C:/Program Files/Informatica/PowerCenter
Server/TgtFiles. When the PowerCenter Server runs the session, it creates Nov2000.out in the
C:/Program Files/Informatica/PowerCenter Server/TgtFiles directory.
1.
In the Output Filename field, enter the target file parameter name.
Name all target file parameters $OutputFileName.
2.
If you want the parameter to represent both the target file name and location, clear the
Output File Directory field.
3.
In the General Options settings of the Properties tab, enter a parameter file and directory
in the Parameter Filename field.
4.
Click OK.
Before you run the session, create the parameter file in the specified directory and define the
target file parameter you created. For details, see Parameter Files on page 511.
For example, in a session, you leave Lookup File Directory set to its default, the
$PMLookupFileDir server variable. For the lookup file name, you create a session parameter
named $LookupFile_orders. In the parameter file, you set $LookupFile_orders to orders.txt.
When you registered the PowerCenter Server, you set $PMLookupFileDir to C:/Program
Files/Informatica/PowerCenter Server/LkpFiles. When the PowerCenter Server runs the
session, it reads the orders.txt file in the C:/Program Files/Informatica/PowerCenter Server/
LkpFiles directory.
1.
In the Lookup Source Filename field, enter the lookup file parameter name.
Name all lookup file parameters $LookupFileName.
2.
If you want the parameter to represent both the lookup file name and location, clear the
Lookup Directory field.
3.
In the General Options settings of the Properties tab, enter a parameter file and directory
in the Parameter Filename field.
4.
Click OK.
Before you run the session, create the parameter file in the specified directory and define the
lookup file parameter. For details, see Parameter Files on page 511.
For example, you want to rename reject files between sessions to keep rejected data from
different session runs in different files. In a session, you leave Reject File Directory set to its
default, the $PMBadFileDir server variable. For the reject file name, you create a session
parameter named $BadFileName. In the parameter file, you set $BadFileName to
FirstRun.bad. When you registered the PowerCenter Server, you set $PMBadFileDir to C:/
Program Files/Informatica/PowerCenter Server/BadFiles. When the PowerCenter Server runs
the session, it creates the FirstRun.bad file in the C:/Program Files/Informatica/PowerCenter
Server/BadFiles directory.
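The parameter file entry for the first run might look like the following sketch; before the next run, you would edit the value to a new name such as SecondRun.bad (the heading and the second file name are illustrative):

```
[folder name.session name]
$BadFileName=FirstRun.bad
```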
1.
In the Reject Filename field, enter the reject file parameter name.
Name all reject file parameters $BadFileName.
2.
If you want the parameter to represent both the reject file name and location, clear the
Reject File Directory field.
3.
In the General Options settings of the Properties tab, enter a parameter file and directory
in the Parameter Filename field.
4.
Click OK.
Before you run the session, create the parameter file in the specified directory and define the
reject file parameter. For details, see Parameter Files on page 511.
Tips
Use reject file and session log parameters in conjunction with target file or target database
connection parameters.
When you use a target file or target database connection parameter with a session, you can
keep track of reject files by using a reject file parameter to write the reject file to the target
machine. You can also use the session log parameter to write the session log to the target
machine.
Chapter 19
Parameter Files
This chapter covers the following topics:
Overview, 512
Troubleshooting, 520
Tips, 521
Overview
You can use a parameter file to define the values for parameters and variables used in a
workflow, worklet, or session. You can create a parameter file using a text editor such as
WordPad or Notepad. You list the parameters or variables and their values in the parameter
file. Parameter files can contain the following types of parameters and variables:
Workflow variables
Worklet variables
Session parameters
Mapping parameters
Mapping variables
When you use parameters or variables in a workflow, worklet, or session, the PowerCenter
Server checks the parameter file to determine the start value of the parameter or variable. You
can use a parameter file to initialize workflow variables, worklet variables, mapping
parameters, and mapping variables. If you do not define start values for these parameters and
variables, the PowerCenter Server checks for the start value of the parameter or variable in
other places. For more information, see Using Workflow Variables on page 103 and
Mapping Parameters and Variables in the Designer Guide.
You can place parameter files on the PowerCenter Server machine or on a local machine. Use
a local parameter file if you do not have access to parameter files on the PowerCenter Server
machine. When you use a local parameter file, pmcmd passes variables and values in the file to
the PowerCenter Server. Local parameter files are used with the startworkflow pmcmd
command. For more information, see pmcmd Reference on page 594.
You must define session parameters in a parameter file. Since session parameters do not have
default values, when the PowerCenter Server cannot locate the value of a session parameter in
the parameter file, it fails to initialize the session.
You can include parameter or variable information for more than one workflow, worklet, or
session in a single parameter file by creating separate sections for each object within the
parameter file.
You can also create multiple parameter files for a single workflow, worklet, or session and
change the file that these tasks use as needed. To specify the parameter file the PowerCenter
Server uses with a workflow, worklet, or session, you can do either of the following:
Enter the parameter file name and directory in the workflow, worklet, or session
properties.
Start the workflow, worklet, or session using pmcmd and enter the parameter filename and
directory in the command line. For details, see Using pmcmd on page 581.
If you enter a parameter file name and directory in both the workflow, worklet, or session
properties and in the pmcmd command line, the PowerCenter Server uses the information
you enter in the pmcmd command line.
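As a sketch, specifying a parameter file when starting a workflow from the command line might look like the following. The exact pmcmd syntax and flags depend on the pmcmd mode and version, and the user name, file path, and workflow name here are assumptions; see pmcmd Reference on page 594 for the documented syntax.

```
pmcmd startworkflow -u Administrator -p password -paramfile c:/ParamFiles/monthly.txt wf_Monthly
```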
Workflow variables:
[folder name.WF:workflow name]
Worklet variables:
[folder name.WF:workflow name.WT:worklet name]
Session parameters and variables:
[folder name.session name]
or
[session name]
Below each heading, you define parameter and variable values as follows:
parameter name=value
parameter2 name=value
variable name=value
variable2 name=value
For example, you have a session, s_MonthlyCalculations, in the Production folder. The
session uses a string mapping parameter, $$State, that you want to set to MA, and a
datetime mapping variable, $$Time. $$Time already has an initial value of 9/30/2000
00:00:00 saved in the repository, but you want to override this value to 10/1/2000
00:00:00. The session also uses session parameters to connect to source files and target
databases, as well as to write the session log to the appropriate session log file.
Table 19-1 shows the parameters and variables that you define in the parameter file:
Table 19-1. Parameters and Variables in Parameter File
Parameter or Variable     Desired Definition
$$State                   MA
$$Time                    10/1/2000 00:00:00
$InputFile1               Sales.txt
$DBConnection_Target      Sales
$PMSessionLogFile         d:/session logs/firstrun.txt
The parameter file for the session includes the folder and session name, as well as each
parameter and variable:
[Production.s_MonthlyCalculations]
$$State=MA
$$Time=10/1/2000 00:00:00
$InputFile1=sales.txt
$DBConnection_target=sales
$PMSessionLogFile=D:/session logs/firstrun.txt
The next time you run the session, you might edit the parameter file to change the state to
MD and delete the $$Time variable. This allows the PowerCenter Server to use the value for
the variable that was set in the previous session run.
Capitalize folder and session names as necessary. Folder and session names are case-sensitive in the parameter file.
Enter folder names for non-unique session names. When a session name exists more than
once in a repository, enter the folder name to indicate the location of the session.
Create one or more parameter files. You assign parameter files to workflows, worklets, and
sessions individually. You can specify the same parameter file for all of these tasks or create
several parameter files.
When you want to include parameter and variable information for more than one
session in the file, create a new section for each session as follows. The folder name is
optional.
[folder_name.session_name]
parameter_name=value
variable_name=value
mapplet_name.parameter_name=value
[folder2_name.session_name]
parameter_name=value
variable_name=value
mapplet_name.parameter_name=value
Specify headings in any order. You can place headings in any order in the parameter file.
However, if you define the same parameter or variable more than once in the file, the
PowerCenter Server assigns the parameter or variable value using the first instance of the
parameter or variable.
Specify parameters and variables in any order. Below each heading, you can specify the
parameters and variables in any order.
When defining parameter values, do not use unnecessary line breaks or spaces. The
PowerCenter Server might interpret additional spaces as part of the value.
List all necessary mapping parameters and variables. Values entered for mapping
parameters and variables become the start value for parameters and variables in a mapping.
Mapping parameter and variable names are not case sensitive.
List all session parameters. Session parameters do not have default values. An undefined
session parameter can cause the session to fail. Session parameter names are not case-sensitive.
Use correct date formats for datetime values. When entering datetime values, use the
following date formats:
MM/DD/RR
MM/DD/RR HH24:MI:SS
MM/DD/YYYY
MM/DD/YYYY HH24:MI:SS
Precede parameters and variables created in mapplets with the mapplet name as follows:
mapplet_name.parameter_name=value
mapplet2_name.variable_name=value
1.
Select Workflows-Edit.
2.
Click the Properties tab.
3.
Enter the parameter directory and name in the Parameter Filename field.
You can enter either a direct path or a server variable directory. Use the appropriate
delimiter for the PowerCenter Server operating system.
4.
Click OK.
1.
Click the Properties tab and open the General Options settings.
2.
Enter the parameter directory and name in the Parameter Filename field.
You can enter either a direct path or a server variable directory. Use the appropriate
delimiter for the PowerCenter Server operating system.
3.
Click OK.
Troubleshooting
I have a section in a parameter file for a session, but the PowerCenter Server does not seem
to read it.
In the parameter file, folder and session names are case-sensitive. Make sure to enter folder
and session names exactly as they appear in the Workflow Manager. Also, use the appropriate
prefix for all user-defined session parameters.
Table 19-2 describes required naming conventions for user-defined session parameters:
Table 19-2. Naming Conventions for User-Defined Session Parameters
Parameter Type        Naming Convention
Database Connection   $DBConnectionName
Reject File           $BadFileName
Source File           $InputFileName
Target File           $OutputFileName
Lookup File           $LookupFileName
I am trying to use a source file parameter to specify a source file and location, but the
PowerCenter Server cannot find the source file.
Make sure to clear the source file directory in the session properties. The PowerCenter Server
concatenates the source file directory with the source file name to locate the source file.
Also, make sure to enter a directory local to the PowerCenter Server and to use the
appropriate delimiter for the operating system.
I am trying to run a workflow with a parameter file and one of the sessions keeps failing.
The session might contain a parameter that is not listed in the parameter file. The
PowerCenter Server uses the parameter file to start all sessions in the workflow. Check the
session properties, then verify that all session parameters are defined correctly in the
parameter file.
Tips
Use a single parameter file to group parameter information for related sessions.
When sessions are likely to use the same database connection or directory, you might want to
include them in the same parameter file. When existing systems are upgraded, you can update
information for all sessions by editing one parameter file.
Use pmcmd and multiple parameter files for sessions with regular cycles.
When you change parameter values for a session on a regular cycle, you often reuse the same
values. For example, if you run a session against both the sales and marketing databases once a
week, you might want to create a separate parameter file for each regular session run. Then,
instead of changing the parameter file in the session properties each time you run the session,
use pmcmd to specify the parameter file to use when you start the session.
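For example, you might keep one parameter file per weekly run. The file names, headings, and connection values below are illustrative sketches:

```
sales_weekly.txt:
[folder name.session name]
$DBConnectionSource=Sales

marketing_weekly.txt:
[folder name.session name]
$DBConnectionSource=Marketing
```

You then point pmcmd at the appropriate file when you start each run.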
Chapter 20
External Loading
This chapter covers the following topics:
Overview, 524
Troubleshooting, 557
Overview
You can configure a session to use DB2, Oracle, Sybase IQ, and Teradata external loaders to
load session target files into the respective databases. External loaders can increase session
performance because these databases can load information directly from files faster than they
can run the SQL commands to insert the same data into the database.
To use an external loader for a session, you must perform the following tasks:
1.
Create an external loader connection in the Workflow Manager and configure the
external loader attributes. For details on creating external loader connections, see
Creating an External Loader Connection on page 551.
2.
Configure the session to write to a flat file instead of to a relational database. For more
information, see Configuring a Session to Write to a File on page 553.
3.
Choose an external loader connection for each target file in the session properties. For
more information, see Selecting an External Loader Connection on page 555.
When you run a session that uses an external loader, the PowerCenter Server creates a control
file and a target flat file. The control file contains information about the target flat file such as
data format and loading instructions for the external loader. The control file has an extension
of .ctl. You can view the control file and the target flat file in the target file directory (default:
$PMTargetFileDir).
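As an illustration of the kind of information a control file carries, a generic Oracle SQL*Loader control file for a comma-delimited target might resemble the following. This is a hand-written sketch, not the exact format the PowerCenter Server generates; the input file, table, and column names are assumptions.

```
LOAD DATA
INFILE 'orders1.out'
APPEND
INTO TABLE ORDERS
FIELDS TERMINATED BY ','
(ORDER_ID, ORDER_DATE DATE 'MM/DD/YYYY', AMOUNT)
```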
The PowerCenter Server waits for all external loading to complete before performing post-session commands and external procedures, and before sending post-session email.
Before you run external loaders, consider the following issues:
Disable constraints. Normally, you disable constraints built into the tables receiving the
data before performing the load. Consult your database documentation for instructions on
how to disable constraints.
Performance issues. To preserve high performance, you can increase commit intervals and
turn off database logging. However, to perform database recovery on failed sessions, you
must have database logging turned on.
Code page requirements. DB2, Oracle, Sybase IQ, and Teradata database servers must run
in the same code page as the target flat file code page. The external loaders start in the
target flat file code page. The PowerCenter Server creates the control and target flat files
using the target flat file code page. If you are using a code page other than 7-bit ASCII for
the target flat file, run the PowerCenter Server in Unicode data movement mode.
The PowerCenter Server can use multiple external loaders within one session. For example, if
the mapping contains two targets, you can create a session that uses different connection
types: one uses an Oracle external loader connection and the other uses a Sybase IQ external
loader connection.
To configure a session to use an external loader, you must have one of the following sets of
privileges and permissions:
Use Workflow Manager privilege and folder read and write permissions
Super User
If you enabled enhanced security, you must also have read permission for external loader
connections associated with the session.
Loading to named pipes. When you load data to named pipes, the external loader starts to
load data to the target database as soon as the data appears in the named pipe.
Staging data using flat files. When you stage data in flat files, the external loader starts to
load data to the target databases only after the PowerCenter Server completes writing to
the target flat files.
You can rename the output file in the session that uses the external loader.
When you use an external loader that can load data from multiple files, you can create
multiple partitions in the session. You choose an external loader connection for each
partition. The PowerCenter Server creates an output file for each partition, and the external
loader loads the output from each target file to the database.
If you use a loader that cannot load from multiple files, the session fails.
Table 20-1 lists the external loaders and loader behavior:
Table 20-1. Partitioning Guidelines for External Loaders
External Loader
Load Behavior
DB2 EE db2load
Oracle
Sybase IQ
Teradata MultiLoad
Teradata TPump
Teradata Fastload
*The PowerCenter Server cannot pass multiple output files to the DB2 EEE autoloader.
Loading to DB2
The DB2 EE external loader and DB2 EEE external loader can perform insert and replace
operations on targets. The external loaders can also restart or terminate load operations.
The DB2 EE external loader invokes the db2load executable located in the PowerCenter
Server installation directory. The DB2 EE external loader can load data to a DB2 server on a
machine that is remote to the PowerCenter Server.
The DB2 EEE external loader invokes the IBM DB2 Autoloader program to load data. The
Autoloader program uses the db2atld executable. The DB2 EEE external loader can partition
data and load the partitioned data simultaneously to the corresponding database partitions.
When you use the DB2 EEE external loader, the PowerCenter Server and the DB2 EEE server
must be on the same machine.
The DB2 external loaders load from a delimited flat file. Verify that the target table columns
are wide enough to store all of the data.
If you select a DB2 loader in a session with multiple partitions, the session fails. For more
information about partitioning sessions with external loaders, see Partitioning Sessions with
External Loaders on page 526.
If you configure multiple targets in the same pipeline to use DB2 external loaders, each loader
must load to a different tablespace on the target database. For information on selecting
external loaders, see Configuring External Loading in a Session on page 553.
When you load data to a DB2 database using the DB2 EE or DB2 EEE external loader, you
must have the correct authority levels and privileges to load data to the database tables.
Insert. Adds loaded data to the table without changing existing table data.
Replace. Deletes all existing data from the table, and inserts the loaded data. The table and
index definitions do not change.
Terminate. Terminates a previously interrupted load operation and rolls back the
operation to the starting point, even if consistency points were passed. The tablespaces
return to normal state, and all table objects are made consistent.
DB2 privileges allow you to create or access database resources. Authority levels provide a
method of grouping privileges and higher-level database manager maintenance and utility
operations. Together, these act to control access to the database manager and its database
objects. You can access objects for which you have the required privilege or authority.
To load data into a table, you must have one of the following authorities:
SYSADM authority
DBADM authority
INSERT privilege on the table when the load utility is invoked in INSERT mode,
TERMINATE mode (to terminate a previous load insert operation), or RESTART mode
(to restart a previous load insert operation)
INSERT and DELETE privilege on the table when the load utility is invoked in
REPLACE mode, TERMINATE mode (to terminate a previous load replace operation),
or RESTART mode (to restart a previous load replace operation)
In addition, you must have proper read access and read/write permissions:
The database instance owner must have read access to the external loader input files.
If you run DB2 as a service on Windows, you must configure the service start account with
a user account that has read/write permissions to use LAN resources, including drives,
directories, and files.
If you load to DB2 EEE, the database instance owner must have write access to the load
dump file and the load temporary file.
Table 20-2 describes attributes for DB2 EE external loader connections:
Table 20-2. DB2 EE External Loader Attributes

Opmode (default value: Insert). The DB2 external loader operation mode. Choose one of the
following operation modes:
- Insert
- Replace
- Restart
- Terminate
For more information about DB2 operation modes, see Setting DB2 External
Loader Operation Modes on page 528.

External Loader Executable (default value: db2load). The name of the external loader
executable file.

DB2 Server Location (default value: Remote). The location of the DB2 EE database server
relative to the PowerCenter Server. Select Local if the DB2 EE database server resides on the
PowerCenter Server machine. Select Remote if the DB2 EE Server resides on another machine.

Is Staged (default value: Disabled). The method of loading data. Select Is Staged to load data
to a flat file staging area before loading to the database. Otherwise, the data is loaded to the
database using a named pipe. For more information, see Loading Data Using Named Pipes on
page 526 or Staging Data to Flat Files on page 526.

Recoverable (default value: Enabled).
Any other return code indicates that the load operation failed. The PowerCenter Server writes
the following error message to the session log:
WRT_8047 Error: External loader process <external loader name> exited with
error <return code>.
Table 20-3 describes the return codes for the DB2 EE external loader:
Table 20-3. DB2 EE External Loader Return Codes
Code
Description
The external loader could not open the external loader log file.
The external loader could not access the control file because the control file is locked by another process.
The DB2 EEE external loader can load data to multiple partitions in the database. You can
configure the DB2 EEE external loader to use the
following loading modes:
Split and load. The DB2 EEE external loader partitions the data and loads it
simultaneously on the corresponding database partitions.
Split only. The DB2 EEE external loader partitions the data and writes the output to files
in the specified split file directory.
Load only. The DB2 EEE external loader does not partition the data. It loads data in
existing split files on the corresponding database partitions.
Analyze. The DB2 EEE external loader generates an optimal partitioning map with even
distribution across all database partitions. If you run the external loader in split and load
mode after you run it in analyze mode, the external loader uses the optimal partitioning
map to partition the data.
For more information about DB2 loading modes, consult your DB2 database documentation.
The DB2 EEE external loader also writes multiple external loader logs. The number of
external loader logs depends on the number of database partitions to which the external
loader loads data. For each partition, the external loader appends a number corresponding to
the partition number to the external loader log file name. The DB2 EEE external loader log
file format is file_name.ldrlog.partition_number.
The PowerCenter Server does not archive or overwrite DB2 EEE external loader logs. If an
external loader log of the same name exists when the external loader runs, the external loader
appends new external loader log messages to the end of the existing external loader log file.
You must manually archive or delete the external loader log files. For details on log files
generated by DB2 Autoload, consult your DB2 documentation.
For information on DB2 EEE external loader return codes, consult your DB2
documentation.
Table 20-4 describes attributes for DB2 EEE external loader connections:
Table 20-4. DB2 EEE External Loader Attributes

Opmode (default value: Insert). The DB2 external loader operation mode. Choose one of the
following operation modes:
- Insert
- Replace
- Restart
- Terminate
For more information about DB2 operation modes, see Setting DB2 External
Loader Operation Modes on page 528.

External Loader Executable (default value: db2atld). The name of the external loader
executable file.

Split File Location (default value: n/a). The location of the split files. The external loader
creates split files if you configure SPLIT_ONLY loading mode.

Output Nodes (default value: n/a).

Split Nodes (default value: n/a). The database partitions that determine how to split the data.
If you do not specify this attribute, the external loader automatically determines an optimal
splitting method.

Mode (default value: Split and load). The loading mode the external loader uses to load the
data. Choose one of the following loading modes:
- Split and load
- Split only
- Load only
- Analyze

Max Num Splitters (default value: 25).

Force (default value: No).

Status Interval (default value: 100).

Ports (default value: 6000-6063). The range of TCP ports the external loader uses to create
sockets for internal communications with the DB2 server.

Check Level (default value: Nocheck). Specifies whether the external loader should check for
record truncation during input or output.

Map File Input (default value: n/a). The name of the file that specifies the partitioning map.
If you want to use a customized partitioning map, you must specify this attribute. You can
generate a customized partitioning map when you run the external loader in Analyze
loading mode.

Map File Output (default value: n/a). The name of the partitioning map when you run the
external loader in Analyze loading mode. You must specify this attribute if you want to run
the external loader in Analyze loading mode.

Trace. The number of rows the external loader traces when you need to review a dump of the
data conversion process and output of hashing values.

Is Staged (default value: Disabled). The method of loading data. Select Is Staged to load data
to a flat file staging area before loading to the database. Otherwise, the data is loaded to the
database using a named pipe. For more information, see Loading Data Using Named Pipes on
page 526 or Staging Data to Flat Files on page 526.

Date Format (default value: mm/dd/yyyy). The date format. The date format in the
Connection Object definition must match the date format you define in the target definition.
DB2 supports the following date formats:
- mm/dd/yyyy
- yyyy-mm-dd
- dd.mm.yyyy
Loading to Oracle
The Oracle SQL loader can perform insert, update, and delete operations on targets. The
target flat file for an Oracle external loader can be fixed-width or delimited.
constraints on any columns, the session may write null data into a NOT NULL column.
If you select an Oracle external loader, the default external loader executable name is
SQLLOAD. This is accurate for most UNIX platforms, but if you use Windows, check
your Oracle documentation to find the name of the external loader executable.
Use the following guidelines when you configure parallel load for Oracle targets:
- Select Do Not Enable Parallel Load to write to a non-partitioned Oracle target table.
- To write to a partitioned Oracle target using Direct Path, you must select Enable Parallel Load and the Append load mode.
- To write to a partitioned Oracle target using Conventional Path, select Enable Parallel Load for best performance.
Tip: For optimal performance, select Direct Path when writing to a partitioned Oracle target.
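SQL*Loader reads its instructions from a control file. As a rough sketch only (the table, file, and column names are invented, and exact clause support depends on your Oracle version, so verify against the Oracle SQL*Loader documentation), a generator for a minimal Append-mode control file over a comma-delimited target file might look like this:

```python
def sqlloader_control_file(table: str, datafile: str, columns: list) -> str:
    """Build a minimal SQL*Loader control file for an APPEND load of a
    comma-delimited flat file. Illustrative sketch; check the clauses
    against your Oracle documentation before use."""
    cols = ", ".join(columns)
    return (
        "LOAD DATA\n"
        f"INFILE '{datafile}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        f"({cols})\n"
    )

print(sqlloader_control_file("customers", "customers.out", ["id", "name", "city"]))
```

Choosing Insert, Replace, or Truncate load mode changes the APPEND keyword accordingly.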
Table 20-5 describes the attributes for Oracle external loader connections:
Table 20-5. Oracle External Loader Attributes
Error Limit. Number of errors to allow before the external loader stops the load operation.

Load Mode (default: Append). The loading mode the external loader uses to load data. Choose from one of the following loading modes:
- Append
- Insert
- Replace
- Truncate

Load Method (default: Use Conventional Path). The method the external loader uses to load data. Choose from one of the following load methods:
- Use Conventional Path
- Use Direct Path (Recoverable)
- Use Direct Path (Unrecoverable)

Enable Parallel Load (default: Enable Parallel Load). Specifies whether to perform a parallel load. See the guidelines above for when to enable it.

(default: 10000) For the Conventional Path load method, this attribute specifies the number of rows in the bind array for load operations. For Direct Path load methods, this attribute specifies the number of rows the external loader reads from the target flat file before it saves the data to the database.

External Loader Executable (default: sqlload).

Is Staged (default: Disabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.
Reject File
The Oracle external loader creates a reject file for data rejected by the database. The reject file
has an extension of .ldrreject. The loader saves the reject file in the target files directory
(default location: $PMTargetFileDir).
Loading to Sybase IQ
The Sybase IQ external loader can perform insert operations on Sybase IQ targets. It cannot perform update or delete operations on targets.
Use the following rules and guidelines when you work with a Sybase IQ external loader:
- Configure a Sybase IQ user with read/write access before you use a Sybase IQ external loader.
- Target flat files for a Sybase IQ external loader can be fixed-width or delimited.
- If you select a Sybase IQ external loader in a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 526.
- If the PowerCenter Server and Sybase IQ Server are on different machines, map a drive from the machine hosting the PowerCenter Server to the machine hosting the Sybase IQ Server. In a UNIX environment, mount the drive.
When you create a Sybase IQ external loader connection, the Workflow Manager sets the
name of the external loader executable file to dbisql by default. If you use an executable file
with a different name, for example, dbisqlc, you must update the External Loader
Executable field. If the external loader executable file directory is not in the system path,
you must enter the file path and file name in this field.
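The rule above (a bare executable name works only when its directory is on the system path; otherwise a full path is required) is the same lookup the standard library exposes. A quick check, illustrative only and not part of PowerCenter:

```python
import shutil

def resolve_loader_executable(name_or_path: str):
    """Return a usable path for the loader executable, or None.
    A bare name such as 'dbisql' resolves only if its directory is on
    the system path; a full path is checked for executability directly."""
    return shutil.which(name_or_path)

# A bare name that is not on the path resolves to nothing:
print(resolve_loader_executable("dbisqlc_not_installed"))  # None, unless such a file is on your PATH
```

Running a check like this before starting a session can surface a misconfigured External Loader Executable field early.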
Table 20-6 describes the attributes for Sybase IQ external loader connections:
Table 20-6. Sybase IQ External Loader Attributes
Block Factor (default: 10000). The number of records per block in the target Sybase table. The external loader applies the Block Factor attribute to load operations for fixed-width flat file targets only.

Block Size (default: 50000).

Checkpoint (default: Enabled).

Notify Interval (default: 1000). The number of rows the Sybase IQ external loader loads before it writes a status message to the external loader log.

(default: n/a) The location of the flat file target. You must specify this attribute relative to the database server installation directory. Enter the target file directory path using the syntax for the machine hosting the database server installation. For example, if the PowerCenter Server is on a Windows machine and the Sybase IQ Server is on a UNIX machine, use UNIX syntax.

External Loader Executable (default: dbisql).

Is Staged (default: Enabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.
Loading to Teradata
When you load to Teradata, you can use the following external loaders:
- MultiLoad. Performs insert, update, delete, and upsert operations for large-volume incremental loads. You can use this loader when you run a session with a single partition. MultiLoad acquires table-level locks, making it appropriate for offline loading. For more information about configuring the MultiLoad external loader connection object, see Teradata MultiLoad External Loader Attributes on page 540.
- TPump. Performs insert, update, delete, and upsert operations for relatively low-volume updates. You can use this loader when you run a session with multiple partitions. TPump acquires row-hash locks on the table, allowing other users to access the table as TPump loads to it. For more information about configuring the TPump external loader connection object, see Teradata TPump External Loader Attributes on page 542.
- FastLoad. Performs insert operations for high-volume initial loads, or for high-volume truncate and reload operations. You can use this loader when you run a session with a single partition. You can only use this loader on empty tables with no secondary indexes. For more information about configuring the FastLoad external loader connection object, see Teradata FastLoad External Loader Attributes on page 545.
- Warehouse Builder. Performs insert, update, upsert, and delete operations on targets. You can use this loader when you run a session with multiple partitions. You can achieve the functionality of the other loaders based on the operator you use. For more information about configuring the Warehouse Builder external loader connection object, see Teradata Warehouse Builder External Loader Attributes on page 547.
If you use a Teradata external loader to perform update or upsert, you can use the Target
Update Override option in the Mapping Designer to override the UPDATE statement in the
external loader control file. For upsert, the INSERT statement in the external loader control
file remains unchanged. For details on using the Target Update Override option, see
Mappings in the Designer Guide.
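The four loaders above differ mainly in the operations they support, whether the session can be partitioned, and whether the target table must be empty. A toy decision helper (my own condensation of the descriptions, not an Informatica rule) captures the first cut:

```python
def choose_teradata_loader(partitions: int, operations: set, empty_target: bool) -> str:
    """Rough loader choice following the descriptions above. Real choices
    also weigh volume and locking: MultiLoad takes table-level locks,
    TPump takes row-hash locks."""
    if partitions > 1:
        # Only TPump and Warehouse Builder run with multiple partitions.
        return "TPump or Warehouse Builder"
    if operations == {"insert"} and empty_target:
        return "FastLoad"  # high-volume initial loads, empty tables only
    return "MultiLoad"     # large-volume incremental insert/update/delete/upsert

print(choose_teradata_loader(1, {"insert"}, True))             # FastLoad
print(choose_teradata_loader(4, {"insert", "update"}, False))  # TPump or Warehouse Builder
```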
Use the following guidelines when you use the Teradata external loaders:
- The PowerCenter Server can use Teradata external loaders to load fixed-width flat files to a Teradata database.
- The target output file name, including the file extension, must not exceed 27 characters. If the session contains multiple partitions, the target output file name, including the file extension, must not exceed 25 characters.
- You can use the Teradata external loaders to load multibyte data.
- You cannot use the Teradata external loaders to load binary data.
- When you load to Teradata using named pipes, set the checkpoint value to 0 to prevent external loaders from performing checkpoint operations.
- When you edit a session, you can specify error, log, or work table names, depending on the loader you use. You can also specify error, log, or work database names.
- When you edit a session, you can override the control file in the loader connection properties.
- You can view the Teradata control file in the target directory.
See the Teradata documentation for more information about the loaders.
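The 27/25-character file-name limits in the guidelines are easy to check mechanically. A small validator, illustrative only and not part of the product:

```python
def target_file_name_ok(name: str, partitions: int = 1) -> bool:
    """Apply the limits above: the target output file name, including its
    extension, may be at most 27 characters, or 25 when the session
    contains multiple partitions."""
    limit = 25 if partitions > 1 else 27
    return len(name) <= limit

print(target_file_name_ok("orders_target.out", 1))  # True (17 characters)
print(target_file_name_ok("a" * 26 + ".out", 2))    # False (30 characters, limit 25)
```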
In the Control File Editor dialog box, click Generate to create the default control file. The
Workflow Manager creates the default control file based on the session and loader properties.
Edit the generated control file, and click OK to save your changes.
Note that if you change a target or loader connection setting after you edit the control file,
the control file does not include those changes. If you want to include those changes, you
must generate the control file again and edit it.
Note: The Workflow Manager does not validate the control file syntax. Teradata verifies the
control file syntax when you run a session. If the control file is invalid, the session fails.
Use the following rules and guidelines when you work with the MultiLoad external loader:
- You can perform insert, update, delete, and upsert operations on targets. You can also use data driven mode to perform insert, update, or delete operations based on instructions coded in an Update Strategy or Custom transformation within a mapping.
- The MultiLoad external loader cannot load from multiple output files. If you run a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 526.
- If you invoke a greater number of sessions than the maximum number of concurrent sessions the database allows, the session may hang. You can set the minimum value for Tenacity and Sleep to ensure that sessions fail rather than hang.
Table 20-7 shows the attributes that you configure for the Teradata MultiLoad external
loader:
Table 20-7. Teradata MultiLoad External Loader Attributes
TDPID (default: n/a).

Database Name (default: n/a).

Date Format (default: n/a). The date format. The date format in the Connection Object definition must match the date format you define in the target definition. The PowerCenter Server supports the following date formats:
- dd/mm/yyyy
- mm/dd/yyyy
- yyyy/dd/mm
- yyyy/mm/dd

Error Limit. The total number of rejected records that MultiLoad can write to the MultiLoad error tables. Uniqueness violations do not count as rejected records. An error limit of 0 means that there is no limit on the number of rejected rows.

Checkpoint (default: 10,000). The interval between checkpoints. You can set the interval to the following values:
- 60 or more: MultiLoad performs a checkpoint operation after it processes each multiple of that number of records.
- 1 through 59: MultiLoad performs a checkpoint operation at the specified interval, in minutes.
- 0: MultiLoad does not perform any checkpoint operations during the import task.

Tenacity (default: 10,000). Specifies how long, in hours, MultiLoad tries to log on to the required sessions. If a logon fails, MultiLoad delays for the number of minutes specified in the Sleep attribute, and then retries the logon. MultiLoad keeps trying until the logon succeeds or the number of hours specified in the Tenacity attribute elapses.

Load Mode (default: Upsert). The mode to generate SQL commands: Insert, Delete, Update, Upsert, or Data Driven. When you select Data Driven loading, the PowerCenter Server follows instructions coded in Update Strategy or Custom transformations within the mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the target. The PowerCenter Server uses the following values to indicate the update strategy:
0 - Insert
1 - Update
2 - Delete

(default: Enabled) Specifies whether to drop the MultiLoad error tables before beginning the next session. Select this option to drop the tables, or clear it to keep them.

External Loader Executable (default: mload). The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name.

Max Sessions. The maximum number of MultiLoad sessions per MultiLoad job. Max Sessions must be between 1 and 32,767. Running multiple MultiLoad sessions causes the client and database to use more resources. Therefore, setting this value to a small number may improve performance.

Sleep. The number of minutes MultiLoad waits before retrying a logon. MultiLoad tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. Sleep must be greater than 0. If you specify 0, MultiLoad issues an error message and uses the default value, 6 minutes.

Is Staged (default: Disabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.

Error Database (default: n/a). The error database name. You can use this attribute to override the default error database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Work Table Database (default: n/a). The work table database name. You can use this attribute to override the default work table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Log Table Database (default: n/a). The log table database name. You can use this attribute to override the default log table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.
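The Tenacity/Sleep interaction described above amounts to a bounded retry loop: retry the logon, pausing Sleep minutes between attempts, until it succeeds or Tenacity hours elapse. A sketch with a stubbed logon (illustrative only, not actual MultiLoad behavior):

```python
import time

def retry_logon(logon, tenacity_hours: float, sleep_minutes: float,
                clock=time.monotonic, pause=time.sleep) -> bool:
    """Keep retrying `logon` until it succeeds or tenacity_hours elapse,
    pausing sleep_minutes between attempts (mirrors the attributes above)."""
    deadline = clock() + tenacity_hours * 3600
    while True:
        if logon():
            return True
        if clock() >= deadline:
            return False
        pause(sleep_minutes * 60)

# Stubbed demo: the logon fails twice, then succeeds. A fake clock and
# pause make the loop run instantly instead of sleeping for real.
attempts = iter([False, False, True])
t = [0.0]
ok = retry_logon(lambda: next(attempts), 4, 6,
                 clock=lambda: t[0],
                 pause=lambda s: t.__setitem__(0, t[0] + s))
print(ok)  # True
```

Setting a small Tenacity with a short Sleep, as the guidelines suggest, bounds how long a session can hang before it fails.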
Table 20-8 shows the attributes that you configure when you edit a session and override the
Teradata MultiLoad external loader connection object:
Table 20-8. Teradata MultiLoad External Loader Attributes Defined at the Session Level
Error Table 1 (default: n/a). The table name for the first error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name>.

Error Table 2 (default: n/a). The table name for the second error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses UV_<target_table_name>.

Work Table (default: n/a). The work table name. You can use this attribute to override the default work table name. If you do not specify a work table name, the PowerCenter Server uses WT_<target_table_name>.

Log Table (default: n/a). The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses ML_<target_table_name>.

Control File Content Override (default: n/a). The control file text. You can use this attribute to override the control file the PowerCenter Server uses when it loads to Teradata. For more information, see Overriding the Control File on page 539.
For more information about these attributes, consult your Teradata documentation.
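The default support-table names above follow a simple prefix convention (ET_, UV_, WT_, ML_ plus the target table name). A small helper that computes them, purely for illustration:

```python
# Default MultiLoad support-table prefixes quoted in Table 20-8.
DEFAULT_PREFIXES = {
    "error_table_1": "ET_",
    "error_table_2": "UV_",
    "work_table": "WT_",
    "log_table": "ML_",
}

def default_multiload_tables(target_table: str) -> dict:
    """Return the default support-table names for a MultiLoad target."""
    return {key: prefix + target_table for key, prefix in DEFAULT_PREFIXES.items()}

print(default_multiload_tables("ORDERS")["log_table"])  # ML_ORDERS
```

Note that TPump and Warehouse Builder use different prefixes for some tables (for example, LT_ and RL_ for log tables), as their own attribute tables show.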
Table 20-9 shows the attributes that you configure for the Teradata TPump external loader:
Table 20-9. Teradata TPump External Loader Attributes

TDPID (default: n/a).

Database Name (default: n/a).

Error Limit. Limits the number of rows rejected for errors. When the error limit is exceeded, TPump rolls back the transaction that causes the last error. An error limit of 0 causes TPump to stop processing after any error.

Checkpoint (default: 15). The number of minutes between checkpoints. You must set the checkpoint to a value between 0 and 60.

Tenacity. Specifies how long, in hours, TPump tries to log on to the required sessions. If a logon fails, TPump delays for the number of minutes specified in the Sleep attribute, and then retries the logon. TPump keeps trying until the logon succeeds or the number of hours specified in the Tenacity attribute elapses. To disable Tenacity, set the value to 0.

Load Mode (default: Upsert). The mode to generate SQL commands: Insert, Delete, Update, Upsert, or Data Driven. When you select Data Driven loading, the PowerCenter Server follows instructions coded in Update Strategy or Custom transformations within the session mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the database. The PowerCenter Server uses the following values to indicate the update strategy:
0 - Insert
1 - Update
2 - Delete

(default: Enabled) Specifies whether to drop the TPump error tables before beginning the next session. Select this option to drop the tables, or clear it to keep them.

External Loader Executable (default: tpump). The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name.

Max Sessions. The maximum number of TPump sessions per TPump job. Each partition in a session starts its own TPump job. Running multiple TPump sessions causes the client and database to use more resources. Therefore, setting this value to a small number may improve performance.

Sleep. The number of minutes TPump waits before retrying a logon. TPump tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses.

Packing Factor (default: 20). The number of rows that each session buffer holds. Packing improves network/channel efficiency by reducing the number of sends and receives between the target flat file and the Teradata database.

Statement Rate. The initial maximum rate, per minute, at which the TPump executable sends statements to the Teradata database. If you set this attribute to 0, the statement rate is unspecified.
Serialize (default: Disabled).

Robust (default: Disabled). When Robust is not selected, it signals TPump to use simple restart logic. In this case, restarts cause TPump to begin at the last checkpoint. TPump reloads any data that was loaded after the checkpoint. This method does not have the extra overhead of the additional database writes in the robust logic.

No Monitor (default: Enabled). When selected, this attribute prevents TPump from checking for statement rate changes from, or updating status information for, the TPump monitor application.

Is Staged (default: Disabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.

Error Database (default: n/a). The error database name. You can use this attribute to override the default error database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Log Table Database (default: n/a). The log table database name. You can use this attribute to override the default log table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.
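In Data Driven mode, each row the PowerCenter Server writes carries a leading flag column with the update-strategy values listed above (0 insert, 1 update, 2 delete). The real output is a fixed-width file whose layout comes from the generated control file; this delimited sketch, with an invented pipe-separated layout, only illustrates the flag semantics:

```python
# Hypothetical reader for a staging file whose first field is the
# update-strategy flag described above.
STRATEGY = {"0": "insert", "1": "update", "2": "delete"}

def split_by_strategy(lines):
    """Group rows by the loader operation encoded in the leading flag field."""
    groups = {"insert": [], "update": [], "delete": []}
    for line in lines:
        flag, _, row = line.partition("|")
        groups[STRATEGY[flag]].append(row)
    return groups

demo = ["0|101|Alice", "1|102|Bob", "2|103|Carol"]
print(split_by_strategy(demo)["update"])  # ['102|Bob']
```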
Table 20-10 shows the attributes that you configure when you edit a session and override the
Teradata TPump external loader connection object:
Table 20-10. Teradata TPump External Loader Attributes Defined at the Session Level
Error Table (default: n/a). The error table name. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name><partition_number>.

Log Table (default: n/a). The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses LT_<target_table_name><partition_number>.

Control File Content Override (default: n/a). The control file text. You can use this attribute to override the control file the PowerCenter Server uses when it loads to Teradata. For more information, see Overriding the Control File on page 539.
For more information about these attributes, consult your Teradata documentation.
Use the following rules and guidelines when you work with the FastLoad external loader:
- Each FastLoad job loads data to one Teradata database table. If you want to load data to multiple tables using FastLoad, you must create multiple FastLoad jobs.
- The FastLoad external loader cannot load from multiple output files. If you run a session with multiple partitions, the session fails. For more information about partitioning sessions with external loaders, see Partitioning Sessions with External Loaders on page 526.
- FastLoad does not load duplicate rows from the output file to the target table in the Teradata database if the target table has a primary key.
- If you load date values to the target table, you must configure the date format for the column in the target table in the format YYYY-MM-DD.
- You can view the Teradata FastLoad control file in the target directory.
Table 20-11 shows the attributes that you configure for the Teradata FastLoad external loader:
Table 20-11. Teradata FastLoad External Loader Attributes
TDPID (default: n/a).

Database Name (default: n/a).

Error Limit (default: 1,000,000). The maximum number of rows that FastLoad rejects before it stops loading data to the database table.

Checkpoint.

Tenacity. The number of hours FastLoad tries to log on to the required FastLoad sessions when the maximum number of load jobs are already running on the Teradata database. When FastLoad tries to log on for a new session, and the Teradata database indicates that the maximum number of load sessions is already running, FastLoad logs off all new sessions that were logged on, delays for the number of minutes specified in the Sleep attribute, and then retries the logon. FastLoad keeps trying until it logs on for the required number of sessions or exceeds the number of hours specified in the Tenacity attribute.

(default: Enabled) Specifies whether to drop the FastLoad error tables before beginning the next session. FastLoad will not run if non-empty error tables exist from a prior job. Select this option to drop the tables, or clear it to keep them.

External Loader Executable (default: fastload). The name and optional file path of the Teradata external loader executable. If the external loader executable directory is not in the system path, you must enter the file path and file name.

Max Sessions. The maximum number of FastLoad sessions per FastLoad job. Max Sessions must be between 1 and the total number of access module processes (AMPs) on your system.

Sleep. The number of minutes FastLoad pauses before retrying a logon. FastLoad tries until the logon succeeds or the number of hours specified in the Tenacity attribute elapses.

Truncate Target Table (default: Disabled). Specifies whether to truncate the target database table before beginning the FastLoad job. FastLoad cannot load data to non-empty tables.

Is Staged (default: Disabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.

Error Database (default: n/a). The error database name. You can use this attribute to override the default error database name. If you do not specify a database name, the PowerCenter Server uses the target table database.
Table 20-12 shows the attributes that you configure when you edit a session and override the
Teradata FastLoad external loader connection object:
Table 20-12. Teradata FastLoad External Loader Attributes Defined at the Session Level
Error Table 1 (default: n/a). The table name for the first error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name>.

Error Table 2 (default: n/a). The table name for the second error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses UV_<target_table_name>.

Control File Content Override (default: n/a). The control file text. You can use this attribute to override the control file the PowerCenter Server uses when it loads to Teradata. For more information, see Overriding the Control File on page 539.
For more information about these attributes, consult your Teradata documentation.
Teradata Warehouse Builder loads data using one of the following operators, each of which uses the protocol of another Teradata loader:
- Load. Uses FastLoad protocol. Load attributes are described in Table 20-14. For more information about how FastLoad works, see Teradata FastLoad External Loader Attributes on page 545.
- Update. Uses MultiLoad protocol. Update attributes are described in Table 20-14. For more information about how MultiLoad works, see Teradata MultiLoad External Loader Attributes on page 540.
- Stream. Uses TPump protocol. Stream attributes are described in Table 20-14. For more information about how TPump works, see Teradata TPump External Loader Attributes on page 542.
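The operator-to-protocol relationship above is a fixed mapping, and Data Driven load mode is only available with two of the three operators, as the Load Mode attribute in Table 20-14 notes. A tiny illustration (not product code):

```python
# Operator-to-protocol mapping taken directly from the list above.
WB_PROTOCOL = {"Load": "FastLoad", "Update": "MultiLoad", "Stream": "TPump"}

def data_driven_allowed(operator: str) -> bool:
    """Per the Load Mode description in Table 20-14, Data Driven mode is
    available only with the Update and Stream operators."""
    return operator in ("Update", "Stream")

print(WB_PROTOCOL["Stream"])        # TPump
print(data_driven_allowed("Load"))  # False
```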
Each Teradata Warehouse Builder operator has associated attributes. Not all attributes
available for FastLoad, MultiLoad, and TPump external loaders are available for Teradata
Warehouse Builder.
Table 20-14 shows the attributes that you configure for Teradata Warehouse Builder:
Table 20-14. Teradata Warehouse Builder External Loader Attributes
TDPID (default: n/a).

Database Name (default: n/a).

Error Database Name (default: n/a).

Operator (default: Update). The Warehouse Builder operator used to load the data. Choose Load, Update, or Stream.

Max instances.

Error Limit. The maximum number of rows that Warehouse Builder rejects before it stops loading data to the database table.

Checkpoint.

Tenacity. The number of hours Warehouse Builder tries to log on to the Warehouse Builder sessions when the maximum number of load jobs are already running on the Teradata database. When Warehouse Builder tries to log on for a new session, and the Teradata database indicates that the maximum number of load sessions is already running, Warehouse Builder logs off all new sessions that were logged on, delays for the number of minutes specified in the Sleep attribute, and then retries the logon. Warehouse Builder keeps trying until it logs on for the required number of sessions or exceeds the number of hours specified in the Tenacity attribute. To disable Tenacity, set the value to 0.

Load Mode (default: Upsert). The mode to generate SQL commands. Choose Insert, Update, Upsert, Delete, or Data Driven. When you use the Update or Stream operators, you can choose Data Driven load mode. When you select data driven loading, the PowerCenter Server follows instructions coded in Update Strategy or Custom transformations within the mapping to determine how to flag rows for insert, delete, or update. The PowerCenter Server writes a column in the target file or named pipe to indicate the update strategy. The control file uses these values to determine how to load data to the database. The PowerCenter Server uses the following values to indicate the update strategy:
0 - Insert
1 - Update
2 - Delete

(default: Enabled) Specifies whether to drop the Warehouse Builder error tables before beginning the next session. Warehouse Builder will not run if error tables containing data exist from a prior job. Clear the option to keep error tables.

Truncate Target Table (default: Disabled). Specifies whether to truncate target tables. Enable this option to truncate the target database table before beginning the Warehouse Builder job.

External Loader Executable (default: tbuild). The name and optional file path of the Teradata external loader executable file. If the external loader directory is not in the system path, enter the file path and file name.

Max Sessions. The maximum number of Warehouse Builder sessions per Warehouse Builder job. Max Sessions must be between 1 and the total number of access module processes (AMPs) on your system.

Sleep.

Serialize (default: Disabled).

Packing Factor (default: 20). The number of rows that each session buffer holds. Packing improves network/channel efficiency by reducing the number of sends and receives between the target file and the Teradata database. Enabled with Stream operator only.

Robust (default: Disabled). The recovery or restart mode. When you disable Robust, the Stream operator uses simple restart logic. The Stream operator reloads any data that was loaded after the last checkpoint. When you enable Robust, Warehouse Builder uses robust restart logic. In robust mode, the Stream operator determines how many rows were processed since the last checkpoint. The Stream operator processes all the rows that were not processed after the last checkpoint. Enabled with Stream operator only.

Is Staged (default: Disabled). The method of loading data. Select Is Staged to load data to a flat file staging area before loading to the database. Otherwise, the data is loaded to the database using a named pipe. For more information, see Loading Data Using Named Pipes on page 526 or Staging Data to Flat Files on page 526.

Error Database (default: n/a). The error database name. You can use this attribute to override the default error database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Work Table Database (default: n/a). The work table database name. You can use this attribute to override the default work table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.

Log Table Database (default: n/a). The log table database name. You can use this attribute to override the default log table database name. If you do not specify a database name, the PowerCenter Server uses the target table database.
Table 20-15 shows the attributes that you configure when you edit a session and override the Teradata Warehouse Builder external loader connection object:
Table 20-15. Teradata Warehouse Builder External Loader Attributes Defined at the Session Level
Error Table 1 (default: n/a). The table name for the first error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses ET_<target_table_name>.

Error Table 2 (default: n/a). The table name for the second error table. You can use this attribute to override the default error table name. If you do not specify an error table name, the PowerCenter Server uses UV_<target_table_name>.

Work Table (default: n/a). The work table name. You can use this attribute to override the default work table name. If you do not specify a work table name, the PowerCenter Server uses WT_<target_table_name>.

Log Table (default: n/a). The log table name. You can use this attribute to override the default log table name. If you do not specify a log table name, the PowerCenter Server uses RL_<target_table_name>.

Control File Content Override (default: n/a). The control file text. You can use this attribute to override the control file the PowerCenter Server uses when it loads to Teradata. For more information, see Overriding the Control File on page 539.
For more information about these attributes, consult your Teradata documentation.
updated connection.
To create an external loader connection:
1.
2. Click New.
3.
4.
5. prevent the password from appearing in the control file. When you do this, the PowerCenter Server writes an empty string for the password in the control file.
6.
7. Click OK.
8. To create additional connections, repeat steps 3 to 7, and then click Close to save your changes.
To change the writer type for the target, select the target instance in the Instances list. Change
the writer type from Relational Writer to File Writer.
To set the file properties, select the target instance in the Instances list.
Output File Directory. Enter the directory name in this field. By default, the PowerCenter Server writes output files to the directory $PMTargetFileDir. If you enter a full directory and file name in the Output Filename field, clear this field. External loader sessions may fail if you use double spaces in the path for the output file.

Output Filename. Enter the file name, or file name and path. By default, the Workflow Manager names the target file based on the target definition used in the mapping: target_name.out. External loader sessions may fail if you use double spaces in the path for the output file.

Reject File Directory. By default, the PowerCenter Server writes all reject files to the directory $PMBadFileDir. If you enter a full directory and file name in the Reject Filename field, clear this field.

Reject Filename. Enter the file name, or file name and directory. The PowerCenter Server appends information in this field to that entered in the Reject File Directory field. For example, if you have C:/reject_file/ in the Reject File Directory field, and enter filename.bad in the Reject Filename field, the PowerCenter Server writes rejected rows to C:/reject_file/filename.bad. By default, the PowerCenter Server names the reject file after the target instance name: target_name.bad. You can also enter a reject file session parameter to represent the reject file or the reject file and directory. Name all reject file parameters $BadFileName. For details on session parameters, see Session Parameters on page 495.

Set File Properties. Opens a dialog box that allows you to define flat file properties. When you use an external loader, you must define the flat file properties by clicking the Set File Properties button. For Oracle external loaders, the target flat file can be fixed-width or delimited. For Sybase IQ external loaders, the target flat file can be fixed-width or delimited. For Teradata external loaders, the target flat file must be fixed-width. For DB2 external loaders, the target flat file must be delimited. For more information, see Configuring Fixed-Width Properties on page 265 and Configuring Delimited Properties on page 266.
Note: Do not select Merge Partitioned Files or enter a merge file name. You cannot merge
partitioned output files when you use an external loader.
2.
3.
Click the Open button in the Value field to select the correct external loader connection
object.
4.
5.
If the session contains multiple partitions, and you choose a loader that can load from
multiple output files, you can select a different connection for each partition, but each
connection must be of the same type. For example, you can select different Teradata TPump
external loader connections for each partition, but you cannot select a Teradata TPump
connection for one partition and an Oracle connection for another partition.
If the session contains multiple partitions, and you choose a loader that can load from only
one output file, the session fails. For more information about running external loader sessions
with multiple partitions, see Partitioning Sessions with External Loaders on page 526.
Troubleshooting
I am trying to set up a session to load data to an external loader, but I cannot select an
external loader connection in the session properties.
Check your mapping to make sure you did not configure it to load to a flat file target. In
order to use an external loader, you must configure the mapping with a DB2, Oracle, Sybase
IQ, or Teradata relational target. When you create the session, select a file writer in the
Writers settings of the Mapping tab in the session properties. Then open the Connections
settings and select an external loader connection.
I am trying to run a session that uses TPump, but the session fails. The session log displays
an error saying that the Teradata output file name is too long.
The PowerCenter Server uses the Teradata output file name to generate names for the TPump
error and log files, as well as the log table name. To do this, the PowerCenter Server adds a
prefix of several characters to the output file name. It adds three characters for sessions with
one partition and five characters for sessions with multiple partitions.
Teradata allows log table names of up to 30 characters. Because the PowerCenter Server adds a
prefix, if you are running a session with a single partition, specify a target output file name
with a maximum of 27 characters, including the file extension. If you are running a session
with multiple partitions, specify a target output file name with a maximum of 25 characters,
including the file extension.
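A quick length check before configuring the session can catch this. The helper below is an illustrative sketch (not part of PowerCenter), assuming the limits quoted above: 27 characters for a single partition, 25 for multiple partitions.

```shell
# Illustrative helper: check that a Teradata target output file name leaves
# room for the prefix the PowerCenter Server adds when it generates the
# TPump log table name (Teradata's 30-character limit).
check_tpump_name() {
  name=$1
  partitions=$2
  if [ "$partitions" -gt 1 ]; then
    max=25    # five-character prefix for multiple partitions
  else
    max=27    # three-character prefix for a single partition
  fi
  if [ "${#name}" -le "$max" ]; then
    echo "ok"
  else
    echo "too long"
  fi
}
check_tpump_name "t_sales.out" 1    # prints "ok"
```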
I tried to load data to Teradata using TPump, but the session failed. I corrected the error,
but the session still fails.
Occasionally, Teradata does not drop the log table when you rerun the session. Check the
Teradata database, and manually drop the log table if it exists. Then rerun the session.
Chapter 21
Using FTP
This chapter covers the following topics:
Overview, 560
Overview
The PowerCenter Server can use File Transfer Protocol (FTP) to access source and target files.
With both source and target files, you can use FTP to transfer the files directly to the
PowerCenter Server or stage them on a local directory.
You can also stage files by creating a pre-session shell command to move the files to a
directory local to the PowerCenter Server. Accessing files directly with FTP generally
provides better session
performance than using FTP to stage the files. However, you may want to stage FTP files to
keep a local archive.
Before creating an FTP session, you must configure the FTP connection in the Workflow
Manager. For details, see Creating an FTP Connection on page 561.
When using FTP file sources and targets in a session, you should know the following
information:
Mainframe Notes
Due to mainframe restrictions, the following constraints apply when using FTP with
mainframe machines:
You cannot execute sessions concurrently if the sessions use the same FTP source file or
target file located on a mainframe.
If you abort a workflow containing a session with a staged FTP source or target from a
mainframe, you may need to wait for the connection to timeout before you can run the
workflow again.
Host name. The name or IP address of the remote machine. Optionally, you can specify a
port number between 1 and 65535 inclusive. If you do not specify a port number, the
PowerCenter Server uses the port number 21 by default. Use the following syntax for
specifying a host name:
hostname:port-number
or
IP address:port-number
When you specify a port number, enable that port number for FTP on the host machine.
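The hostname:port-number form can be split with standard shell parameter expansion if you need to script against it; the host and port below are illustrative values, not defaults.

```shell
# Illustrative: splitting a host specification of the form hostname:port-number
# with POSIX parameter expansion (values are examples only).
host_spec="ftp.example.com:2121"
host=${host_spec%%:*}    # everything before the first colon
port=${host_spec##*:}    # everything after the last colon
echo "$host $port"       # prints "ftp.example.com 2121"
```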
Default remote directory. The directory you want the PowerCenter Server to use by
default. In the session, when you enter a file name without a directory, the PowerCenter
Server appends the file name to this directory. Therefore, this path must be exact and
contain the appropriate trailing delimiters. For example, if you enter c:/data/ and in the
session specify the file FILENAME, the PowerCenter Server reads the path and file name
as c:/data/FILENAME.
If you enter the wrong delimiter for an FTP directory, the Workflow Manager does not
correct it. If the FTP host is a mainframe machine, the directory must begin with a single
quote and end with the period delimiter, for example: 'defaultdir. You can override this
option in the session properties.
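The append behavior described above amounts to plain string concatenation, which is why the trailing delimiter matters; a short sketch with illustrative paths:

```shell
# Illustrative: the server appends the file name directly to the default
# remote directory, so a missing trailing delimiter changes the result.
default_dir="c:/data/"
file_name="FILENAME"
echo "${default_dir}${file_name}"    # prints "c:/data/FILENAME"
echo "c:/data${file_name}"           # prints "c:/dataFILENAME" (delimiter missing)
```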
Depending on the remote machine you access, you might also need to enter the user name
and password. The password must be in 7-bit ASCII only. As with database connections, if
you edit an FTP connection, all sessions using the FTP connection use the updated
connection.
FTP Permissions
If you enable enhanced security, you can set FTP connection permissions in the Workflow
Manager. The Workflow Manager assigns Owner permissions to the user who registers the
connection. The Workflow Manager grants Owner Group permissions to the first group in
the Group Memberships list of the owner. You can manage FTP connection permissions if
you are the owner of the connection or if you have Super User privileges.
A registered FTP connection does not appear in the list of FTP connections if you do not
have at least read permission for the connection. If you want to edit a connection, you must
have read and write permissions for the connection. If you want to run sessions that use a
source or target FTP connection, you must have execute permission for the connection.
To create an FTP connection, you must have one of the following privileges:
Super User
1.
2.
3.
Click New.
4.
Enter the FTP connection attributes:
Name. Required.
User Name. Optional.
Password. Optional.
Host Name. Required. Use the syntax hostname:port-number or IP address:port-number.
When you specify a port number, enable that port number for FTP on the host machine.
Default Remote Directory. Required.
5.
Click OK.
6.
Repeat steps 3-5 for any other necessary FTP connection, then click Close.
Use Workflow Manager privilege with folder read and write permissions
You must have read permission for FTP connections you want to associate with the session in
addition to the privileges and permissions listed above.
2.
In the Connections settings on the Mapping tab, select FTP for Type.
3.
Click the Open button in the Value field to select an FTP connection.
4.
If you enter a file name without a leading slash or drive letter, the PowerCenter Server
appends the file name to the Default Remote Directory path entered in the FTP
Connection dialog box. For example, if your default remote directory is c:/data/, and you
enter a remote file name of FILENAME, the PowerCenter Server connects to the FTP
host and looks for c:/data/FILENAME.
If you enter a fully qualified file name in the Remote Filename field, the PowerCenter
Server uses the named path rather than the path entered in the Default Remote
Directory.
If you enter a mainframe file name for a source file in the default directory, make sure you
enter the closing quote. For example, if your default remote directory is:
'defaultdir.
To access the file, filename, from the default mainframe directory, enter the following
in the Remote Filename field:
filename'
When the PowerCenter Server begins the session, it connects to the mainframe host and
looks for:
'defaultdir.filename'
In contrast, if you want to use a file in a different directory, you must enter that directory
and file name in the Remote Filename field, like this:
'overridedir.filename'
Note: Depending on the FTP server you use, you may have limited options for entering
FTP directories. Please see your FTP server documentation for details.
5.
To store the file in a directory local to the PowerCenter Server, select Is Staged.
When you select this option for a source file, the PowerCenter Server moves the source
file from the FTP host to a local directory before the session begins, then uses the local
file during the session. If the staged file exists, the PowerCenter Server truncates the
staged file before running the session.
The location of the local file differs depending on the information entered in the
Properties settings of the Sources tab:
If you have an individual path and file name listed in the Source Filename field, the
PowerCenter Server uses that path as the local directory, and names the staged local file
after the listed file. For example, if the Source Filename field contains the path, c:/data/
sales_info, the PowerCenter Server connects to the FTP host, then moves the file to c:/
data, and names the file sales_info.
If the Source Filename field contains only a file name (and no path), the PowerCenter
Server names the file as defined in the Source Filename field, and places the file in the
directory listed in the Source file directory field. If the directory is not specified, the
PowerCenter Server stages the file in the directory where the PowerCenter Server runs on
UNIX or in the system directory on Windows.
If you do not stage the source file, the PowerCenter Server accesses the data directly from
the FTP host.
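The two staging cases above can be sketched as a simple branch on whether the Source Filename value contains a path; the field value here is the illustrative one from the example.

```shell
# Illustrative: where the staged local file lands, following the rules above.
src="c:/data/sales_info"    # example value of the Source Filename field
case $src in
  */*) echo "staged as: $src" ;;                       # path given: used as-is
  *)   echo "staged in the source directory as: $src" ;;
esac
```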
6.
Repeat steps 3-5 for each FTP source and target in the session, then click OK.
7.
2.
In the Connections settings on the Mapping tab, select FTP for Type.
3.
Click the Open button in the Value field to select an FTP connection.
4.
If you enter a file name without a leading slash or drive letter, the PowerCenter Server
appends the file name to the Default Remote Directory path entered in the FTP
Connection dialog. For example, if your default remote directory is c:/data/, and you
enter a remote file name of FILENAME, the PowerCenter Server connects to the FTP
host and looks for c:/data/FILENAME.
If you enter a fully qualified file name, the PowerCenter Server uses the named path
rather than the path entered in the Default Remote Directory. Do not enclose the fully
qualified file name in single or double quotation marks. The session may fail if you
enclose the fully qualified file name in quotation marks.
When you transfer a target file to a mainframe host, make sure you enter the opening
quote. For example, if your default remote directory is defaultdir., you enter the
following in the default remote directory field:
'defaultdir.
Note: Depending on the FTP server you use, you may have limited options for entering
FTP directories. Please see your FTP server documentation for details.
5.
To store the target file in a directory on the machine where the PowerCenter Server runs,
select Is Staged.
When you select this option, the PowerCenter Server writes to the local target file during
the session, then moves the file to the FTP host after the session is complete. The
location of the local file differs depending on the information entered in the Properties
settings of the Mapping tab:
If you have an individual path and file name listed in the Output Filename field, the
PowerCenter Server uses that path as the local directory, and names the staged local file
after the listed file. For example, if the Output Filename field contains the path, c:/data/
t_company_all.out, the PowerCenter Server connects to the FTP host, then moves the
file to c:/data, and names the file t_company_all.out.
If the Output Filename field contains only a file name (and no path), the PowerCenter
Server names the file as defined in the Output Filename field, and places the file in the
directory listed in the Output file directory field. If the directory is not specified, the
PowerCenter Server stages the file in the directory where the PowerCenter Server runs on
UNIX or the system directory on Windows.
If you do not stage the file, the PowerCenter Server accesses the data directly from the
FTP host. The local file and directory are not used.
Select the Merge Partitioned Files option and specify the merge file name and directory
when you partition your target. For more information, see Partitioning File Targets on
page 380.
6.
Repeat steps 3-5 for each FTP target in the session, and then click OK.
7.
Chapter 22
Using Incremental
Aggregation
This chapter covers the following topics:
Overview, 574
Overview
When using incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes only incrementally and you can capture
changes, you can configure the session to process only those changes. This allows the
PowerCenter Server to update your target incrementally, rather than forcing it to process the
entire source and recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You
can capture those incremental changes because you have added a filter condition to the
mapping that removes pre-existing data from the flow of data. You then enable incremental
aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1,
you use the entire source. This allows the PowerCenter Server to read and store the necessary
aggregate data. On March 2, when you run the session again, you filter out all the records
except those time-stamped March 2. The PowerCenter Server then processes only the new
data and updates the target accordingly.
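As a sketch of the idea (outside PowerCenter), the mapping filter has the same effect as keeping only rows stamped with the current run date; the file layout and dates below are illustrative.

```shell
# Illustrative only: keep rows time-stamped with the current run date and
# discard pre-existing data, as the mapping filter described above does.
run_date="2004-03-02"
printf '%s\n' \
  "2004-03-01,store1,100" \
  "2004-03-02,store1,250" \
  "2004-03-02,store2,75" | grep "^${run_date},"
# prints the two rows stamped 2004-03-02
```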
Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new
source data each time you run the session. Use a Stored Procedure or Filter transformation
to process only new data.
Incremental changes do not significantly change the target. Use incremental aggregation
when the changes do not significantly change the target. If processing the incrementally
changed source alters more than half the existing target, the session may not benefit from
using incremental aggregation. In this case, drop the table and re-create the target with
complete source data.
Note: Do not use incremental aggregation if your mapping contains percentile or median
functions. The PowerCenter Server uses system memory to process Percentile and Median
functions in addition to the cache memory you configure in the session property sheet. As a
result, the PowerCenter Server does not store incremental aggregation values for Percentile
and Median functions in disk caches.
Move the aggregate files without correcting the configured path or directory for the files in
the session property sheet.
Change the configured path or directory for the aggregate files without moving the files to
the new location.
Note: When the PowerCenter Server rebuilds incremental aggregation files, the data in the
previous files is lost.
you cannot change from a Latin1 code page to an MSLatin1 code page, even though these
code pages are compatible.
Change the PowerCenter Server data movement mode from ASCII to Unicode or from
Unicode to ASCII.
Change the session sort order when the PowerCenter Server runs in Unicode mode.
If you do not run the session using Verbose Init mode or use an identifiable transformation
naming convention, you may have difficulty determining which files belong to each session.
For more information about cache file storage and naming conventions, see Cache Files on
page 615.
Change the cache directory for a partition. If you change the directory for a partition and
you want the PowerCenter Server to reuse the cache files, you must move the cache files for
the partition associated with the changed directory.
If you change the directory for the first partition, and you do not move the cache files,
the PowerCenter Server rebuilds the cache files for all partitions.
If you change the directory for partitions 2-n, and you do not move the cache files, the
PowerCenter Server rebuilds the cache files that it cannot locate.
Decrease the number of partitions. If you delete a partition, and you want the
PowerCenter Server to reuse the cache files, you must move the cache files for the deleted
partition to the directory configured for the first partition. If you do not move the files to
the directory of the first partition, the PowerCenter Server rebuilds the cache files that it
cannot locate.
Note: If you increase the number of partitions, the PowerCenter Server realigns the index
and data cache files the next time you run a session. It does not need to rebuild the files.
Move cache files. If you move cache files for a partition and you want the PowerCenter
Server to reuse the files, you must also change the partition directory. If you do not change
the directory, the PowerCenter Server rebuilds the files the next time you run a session.
Delete cache files. If you delete cache files, the PowerCenter Server rebuilds them the next
time you run a session.
If you change the number of partitions and the cache directory, you may need to move cache
files for both. For example, if you change the cache directory for the first partition, and you
decrease the number of partitions, you need to move the cache files for the deleted partition as
well as the cache files for the partition associated with the changed directory.
Configure the session for incremental aggregation and verify that the file directory has
enough disk space for the aggregate files.
Using a filter in the mapping. You may be able to remove pre-existing source data during
a session with a filter.
Using a stored procedure. You may be able to remove pre-existing source data at the
source database with a pre-load stored procedure.
Verify the location where you want to store the aggregate files. The index and data files
grow in proportion to the source data. When denoting the directory for those files, be sure
the directory has enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want the
files stored. Then enter the appropriate directory for the server variable, $PMCacheDir, in
the Workflow Manager. You can enter session-specific directories for the index and data
files. However, by using the server variable for all sessions using incremental aggregation,
you can easily change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the PowerCenter Server to
reinitialize the aggregate cache and gather new aggregate data.
In a server grid, PowerCenter Servers rebuild incremental aggregation files they cannot
find. When a PowerCenter Server rebuilds incremental aggregation files, it loses aggregate
history. For more information about methods to save aggregate history in a server grid, see
Running Sessions with Cache Files on page 445.
Configure the session to write file names in the session log. If you want the PowerCenter
Server to write the incremental aggregation cache file names in the session log, configure
the session with Verbose Init tracing. You can override tracing in the Error Handling
settings on the Config Object tab.
Verify the incremental aggregation settings in the session properties. You can configure
the session for incremental aggregation in the Performance settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to
reinitialize the cache, the Workflow Manager displays a warning indicating the
PowerCenter Server overwrites the existing cache and a reminder to clear this option after
running the session.
To configure a session for incremental aggregation:
Figure 22-1 shows the Performance settings on the Properties tab where you configure
incremental aggregation options:
Figure 22-1. Incremental Aggregation Session Properties
Note: You cannot use incremental aggregation when the mapping includes an Aggregator
transformation with Transaction transformation scope. The Workflow Manager marks the
session invalid.
Chapter 23
Using pmcmd
This chapter covers the following topics:
Overview, 582
Overview
pmcmd is a program that you can use to communicate with the PowerCenter Server. You can
perform some of the tasks that you can also perform in the Workflow Manager such as
starting and stopping workflows and tasks.
You can use pmcmd in the following modes:
Command line mode. The command line syntax allows you to write scripts for scheduling
workflows. Each command you write in the command line mode must include connection
information to the PowerCenter Server.
Interactive mode. You establish and maintain an active connection to the PowerCenter
Server. This allows you to issue a series of commands.
You can use repository user names and passwords as environment variables with pmcmd. You
can also customize the way pmcmd displays the date and time on the machine running the
PowerCenter Server. Before you use pmcmd, configure these variables on the PowerCenter
Server. For more information, see Configuring Environment Variables on page 585.
Note: To issue the shutdownserver command, you must have the Super User privilege or
Table 23-1. pmcmd Commands
aborttask (Command line, Interactive)
abortworkflow (Command line, Interactive)
connect (Interactive)
disconnect (Interactive)
exit (Interactive)
getrunningsessionsdetails (Command line, Interactive)
getserverdetails (Command line, Interactive)
getserverproperties (Command line, Interactive)
getsessionstatistics (Command line, Interactive)
gettaskdetails (Command line, Interactive). Displays details for a task, including the
folder and workflow name, as well as the task status and run mode. In a server grid, this
command also displays the PowerCenter Server that runs each task instance. For more
information, see Gettaskdetails on page 601.
getworkflowdetails (Command line, Interactive)
help (Command line, Interactive)
pingserver (Command line, Interactive)
quit (Interactive)
resumeworkflow (Command line, Interactive)
resumeworklet (Command line, Interactive)
scheduleworkflow (Command line, Interactive)
setfolder (Interactive)
setnowait (Interactive)
setwait (Interactive)
showsettings (Interactive)
shutdownserver (Command line, Interactive)
starttask (Command line, Interactive)
startworkflow (Command line, Interactive)
stoptask (Command line, Interactive)
stopworkflow (Command line, Interactive)
unscheduleworkflow (Command line, Interactive)
unsetfolder (Interactive)
version (Command line, Interactive)
waittask (Command line, Interactive)
waitworkflow (Command line, Interactive)
PM_CODEPAGENAME
PMTOOL_DATEFORMAT
PM_HOME
Configuring PM_CODEPAGENAME
pmcmd uses the code page of the machine hosting pmcmd unless you specify the code page
environment variable, PM_CODEPAGENAME, to override it. The code page must be
compatible with the PowerCenter Server code page. pmcmd sends commands in Unicode. If
the code pages are not compatible, the PowerCenter Server might not find the workflow,
session, or task in the repository. For more information about code page compatibility, see
Globalization Overview and Code Pages in the Installation and Configuration Guide.
To configure a code page environment variable in a UNIX environment:
1.
2.
Enter a system variable named PM_CODEPAGENAME and set the value to the code
page name.
Configuring PMTOOL_DATEFORMAT
Use this environment variable to customize the way pmcmd displays the date and time. The
pmcmd program verifies that the string you specify is a valid format. If the format string is not
valid, the PowerCenter Server generates a warning message and displays the date in the format
DY MON DD HH24:MI:SS YYYY.
2.
Enter a system or user variable named PMTOOL_DATEFORMAT and set the value to
the display format string.
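On UNIX, the equivalent export might look as follows; the format string shown is the documented fallback format, used here only as an example value.

```shell
# Illustrative: set the pmcmd date display format for the current shell.
# Add the line to a profile script to make it permanent.
PMTOOL_DATEFORMAT='DY MON DD HH24:MI:SS YYYY'
export PMTOOL_DATEFORMAT
echo "$PMTOOL_DATEFORMAT"    # prints "DY MON DD HH24:MI:SS YYYY"
```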
You can assign the environment variable any valid UNIX name.
To configure a password as an environment variable on UNIX:
1.
In a UNIX session, navigate to the directory where the PowerCenter Server is installed.
2.
This command runs the encryption utility pmpasswd located in the directory where the
PowerCenter Server is installed. The encryption utility generates and displays your
encrypted password. The following is sample output. In this example, the password
entered was monday.
Encrypted string -->bX34dqq<--
Will decrypt to -->monday<--
3.
You can assign the environment variable any valid UNIX name.
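Setting the variable on UNIX might look like this; the variable name INFA_PASSWD is an example (any valid UNIX name works), and the encrypted string stands in for the output of pmpasswd shown above.

```shell
# Illustrative: export the encrypted string printed by pmpasswd under a
# variable name of your choice (INFA_PASSWD is an example, not required).
INFA_PASSWD='bX34dqq'
export INFA_PASSWD
```

pmcmd can then reference the variable with the -pv flag, for example: pmcmd startworkflow -uv INFA_USER -pv INFA_PASSWD ... (INFA_USER is likewise an illustrative user-name variable).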
To configure a username as an environment variable on Windows:
1.
2.
Enter the name of the user environment variable in the Variable field. Enter your
repository username in the Value field.
You can set these up as either a user or system variable. User variables take precedence
over system variables.
To configure a password as an environment variable on Windows:
1.
In Windows DOS, navigate to the directory where the PowerCenter Server is installed.
2.
The encryption utility generates and displays your encrypted password. The following is
sample output. In this example, the password entered was monday.
Encrypted string -->bX34dqq<--
Will decrypt to -->monday<--
4.
Enter the name of your password environment variable in the Variable field. Enter your
encrypted password in the Value field.
You can set these up as either a user or system variable. User variables take precedence
over system variables.
Configuring PM_HOME
Use the PM_HOME variable to start pmcmd from a directory other than the install directory.
On UNIX, point the PM_HOME and PATH environment variables to the PowerCenter
Server installation directory. On Windows, include the PowerCenter Server install directory
in the environment path.
Warning: If you specify an incorrect directory path for the PM_HOME environment variable,
the PowerCenter Server cannot start.
To start pmcmd from any directory on UNIX:
1.
In the system properties, add the installation directory to the path variable. For example, on
Windows 2000, configure the path variable in System settings. Click the Environment tab to
select the path variable and add the installation directory to the variable value.
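On UNIX, the two exports might look like this for a Bourne-style shell; the installation path is illustrative, not a default.

```shell
# Illustrative paths: point PM_HOME at the PowerCenter Server installation
# directory and put it on the PATH so pmcmd starts from any directory.
PM_HOME=/opt/informatica/pm
PATH="$PM_HOME:$PATH"
export PM_HOME PATH
```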
The following command immediately starts the workflow wSalesAvg, located in the east
folder, on the remote PowerCenter Server with host name SALES listening at port 6258:
pmcmd startworkflow -u seller3 -p jackson -s SALES:6258 -f east -wait wSalesAvg
The user, seller3, with the password jackson sends the request to start the workflow. When
you use the wait option, pmcmd returns to the shell or command prompt when the workflow
completes.
For a list of commands you can use in the command line mode, see Table 23-1 on page 582.
For details on each command see pmcmd Reference on page 594.
For information on defining username and password environment variables, see Configuring
Repository Username and Password on page 586.
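For scheduling, a command like the one above is typically wrapped in a small script. The sketch below builds the same command line using the environment-variable flags; INFA_USER and INFA_PASSWD are illustrative variable names, and the command is echoed rather than executed so the sketch stands alone.

```shell
#!/bin/sh
# Illustrative cron wrapper: assumes INFA_USER and INFA_PASSWD hold the
# repository user name and encrypted password. The command is built as a
# string and echoed; remove the echo to run it for real.
SERVER=SALES:6258
FOLDER=east
WORKFLOW=wSalesAvg
cmd="pmcmd startworkflow -uv INFA_USER -pv INFA_PASSWD -s $SERVER -f $FOLDER -wait $WORKFLOW"
echo "$cmd"
```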
Table 23-2 describes the connection information you enter each time you write a command in
the command line mode:
Table 23-2. Connection Information for the Command Line Mode
username (-user, -u). Required.
userEnvVar (-uservar, -uv). Required.
password (-password, -p). Required.
passwordEnvVar (-passwordvar, -pv). Required.
serveraddr (-serveraddr, -s). Required.
host (no flag). Optional.
portno (no flag). Required.
Code
Description
For all commands, a return value of zero indicates that the command ran successfully. You can issue
these commands in the wait or nowait mode: starttask, startworkflow, resumeworklet, resumeworkflow,
aborttask, and abortworkflow. If you issue a command in the wait mode, a return value of zero indicates
the command ran successfully. If you issue a command in the nowait mode, a return value of zero
indicates that the request was successfully transmitted to the PowerCenter Server, and it acknowledged
the request.
The PowerCenter Server is down, or pmcmd cannot connect to the PowerCenter Server. Check the
TCP/IP host name and port number, and look for network problems.
The specified task name, workflow name, or folder name does not exist.
An error occurred while stopping the PowerCenter Server. Contact Informatica Technical Support.
You do not have the appropriate permissions or privileges to perform this task.
The connection to the PowerCenter Server timed out while sending the request.
12
The PowerCenter Server cannot start recovery because the session or workflow is scheduled,
suspending, waiting for an event, waiting, initializing, aborting, stopping, disabled, or running.
13
14
15
16
17
18
The PowerCenter Server found the parameter file, but it did not have the initial values for the session
parameters, such as $input or $output.
19
The PowerCenter Server cannot start the session in recovery mode because the workflow is configured
to run continuously.
20
A repository error has occurred. Please make sure that the Repository Server and the database are
running and the number of connections to the database is not exceeded.
21
22
The PowerCenter Server cannot find a unique instance of the workflow or session you specified.
Enter the command again with the folder name and workflow name.
23
24
Out of memory.
25
Command is cancelled.
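In a script, the return code determines what happens next. The sketch below is generic shell error handling of the kind a scheduling script would apply after calling pmcmd; a stand-in command is used because pmcmd is not assumed to be installed here.

```shell
# Illustrative: run a command and branch on its return code, as a scheduling
# script would do after calling pmcmd. 'true' stands in for the pmcmd call.
run_and_check() {
  "$@"
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "request succeeded"
  else
    echo "failed with return code $rc"
  fi
  return "$rc"
}
run_and_check true    # stand-in for: run_and_check pmcmd startworkflow ...
```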
The following commands immediately start the workflow wSalesAvg, located in the east
folder:
pmcmd> connect -user seller3 -password jackson -serveraddr SALES:6258
pmcmd> setwait
pmcmd> setfolder east
pmcmd> startworkflow wSalesAvg
The setwait command means that for all subsequent commands, pmcmd returns the command
prompt when the workflow completes. The setfolder command means that for all subsequent
commands dealing with workflows or tasks, pmcmd uses the specified workflow or task from
the east folder.
For a list of commands you can use in the interactive mode, see Table 23-1 on page 582. For
details on each command see pmcmd Reference on page 594.
To start pmcmd in the interactive mode:
1.
In either a Windows DOS session or a UNIX session, navigate to the directory where the
PowerCenter Server is installed.
2.
This command returns the PowerCenter version number and the pmcmd prompt.
3.
Or, if you use username and password environment variables, type the following at the
pmcmd prompt:
connect -uv USERNAME -pv PASSWORD -serveraddr ServerName:PortNo
For information on defining user name and password environment variables, see
Configuring Repository Username and Password on page 586.
If you omit connection information, pmcmd prompts you to enter the correct information.
Once pmcmd successfully connects, you receive the pmcmd prompt. At the pmcmd prompt,
you can issue commands without specifying the connection information.
setfolder - Designates a folder as the default folder in which to execute all subsequent commands.
setnowait - Instructs the PowerCenter Server to execute subsequent commands in the nowait mode. The pmcmd prompt is available after the PowerCenter Server receives the previous command. The nowait mode is the default mode.
setwait - Instructs the PowerCenter Server to execute subsequent commands in the wait mode. The pmcmd prompt is available only after the PowerCenter Server completes the previous command.
showsettings - Displays the name of the PowerCenter Server and repository to which pmcmd is connected, along with the username, wait mode, and default folder.
unsetfolder - Designates no folder as the default folder.
For a list of all the commands that you can use in the interactive mode, see Table 23-1 on
page 582.
pmcmd Reference
pmcmd provides multiple ways to enter some of the parameters. For example, to enter a
repository password, use the following syntax:
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
You can use -password or -p before entering a password. Or, use -passwordvar or -pv before a
password environment variable.
To enter a password, precede the password with either the -password or the -p flag.
-password YourPassword
or
-p YourPassword
If you use a password environment variable, precede the variable name with either the -pv flag
or the -passwordvar flag.
-passwordvar PASSWORD
or
-pv PASSWORD
For a list of all the parameters you can use with pmcmd, see Table 23-5 on page 594.
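The choice between literal and environment-variable credential flags can be wrapped in a small helper. credential_args below is a hypothetical sketch, not part of pmcmd; it simply builds the argument list described above:

```python
def credential_args(user=None, user_var=None, password=None, password_var=None):
    """Build pmcmd credential arguments.

    Pass a literal username/password (-user/-password), or the names of
    environment variables that store them (-uservar/-passwordvar).
    Exactly one form of each credential is required.
    """
    if (user is None) == (user_var is None):
        raise ValueError("give exactly one of user or user_var")
    if (password is None) == (password_var is None):
        raise ValueError("give exactly one of password or password_var")
    args = ["-user", user] if user is not None else ["-uservar", user_var]
    args += ["-password", password] if password is not None else ["-passwordvar", password_var]
    return args
```

Scripts can splice the returned list into any pmcmd command line that requires a repository login.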
Command Parameters
When you use most parameters, you precede the parameter with a flag. For ease of use, you
can use a shortened version for most flags. For example, you can either use -serveraddr or its
shortened equivalent, -s.
Table 23-5 describes the parameters used in pmcmd commands and lists the associated flags:
Table 23-5. Command Parameters
folder (-folder, -f) - Name of the folder containing the workflow or task. Required if the workflow or task name is not unique in the repository.
host (no flag) - The name of the machine hosting the PowerCenter Server. If you do not specify a host name, pmcmd assumes the PowerCenter Server runs on the machine executing pmcmd.
localparamfile (-localparamfile, -lpf) - Name of the parameter file on the machine where pmcmd is invoked. Use with the startworkflow command.
paramfile (-paramfile) - Determines which parameter file is used when a task or workflow runs. It overrides the configured parameter file for the workflow or task. Use in conjunction with the starttask or startworkflow commands.
password (-password, -p) - Repository password.
passwordEnvVar (-passwordvar, -pv) - Name of the environment variable that stores the repository password.
portno (no flag) - Port number of the PowerCenter Server.
recovery (-recovery) - Runs the workflow or task in recovery mode.
serveraddr (-serveraddr, -s) - Host name and port number of the PowerCenter Server.
startfrom (-startfrom) - Task from which to start the workflow, written as a taskInstancePath.
taskInstancePath (no flag) - Indicates a task and where it appears within the workflow. A task within a workflow is indicated by its task name alone. A task within a worklet is indicated by WorkletName.TaskName.
userEnvVar (-uservar, -uv) - Name of the environment variable that stores the repository username.
username (-user, -u) - Repository username.
workflow (-workflow, -w) - Name of the workflow.
To denote an empty string, use two single quotes ('') or two double quotes (""). Be sure to match an opening quote with a closing quote.
Syntax Notation
Table 23-6 describes the notation used in pmcmd syntax:
Table 23-6. pmcmd Syntax Notation
-z - Flag placed before a parameter. This designates the parameter you enter. For example, to enter the username, type -u or -user followed by the username.
<x> - Required parameter. For the command to run, you must enter the parameter. If you omit a required parameter, pmcmd returns an error message.
<x | y> - Select between required parameters. For the command to run, you must select from the listed parameters. If you omit a required parameter, pmcmd returns an error message.
[x] - Optional parameter. The command runs whether or not you enter optional parameters. For example, the help command has the following syntax:
Help [Command]
If you enter a command, pmcmd returns information on that command only. If you omit the command name, pmcmd returns a list of all commands.
[x|y] - Select between optional parameters. The command runs whether or not you enter optional parameters. For example, many commands run in either the wait or nowait mode:
[-wait|-nowait]
The command runs in the mode you specify. If you do not specify a mode, pmcmd runs the command in the default nowait mode.
< | > (bold) - When a set contains subsets, the superset is indicated with bold brackets < >. A bold pipe symbol (|) separates the subsets.
Tip: When you enter commands in pmcmd, type the command name first, followed by the command parameters.
Aborttask
The aborttask command aborts a task. Issue this command only after the PowerCenter Server
fails to stop the task when you issue the stoptask command. For details on how the
PowerCenter Server aborts and stops tasks, see Server Handling of Stop and Abort on
page 129.
In the command line mode, use the following syntax to abort a task:
pmcmd aborttask
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait]
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to abort a task:
aborttask
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait]
taskInstancePath
Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the
string as WorkletName.TaskName. If the task is directly within a workflow, use the task name
alone.
For information on other parameters used in this command, see Table 23-5 on page 594.
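The taskInstancePath qualification rule lends itself to a small helper. task_instance_path below is a hypothetical sketch of that rule, not a pmcmd feature:

```python
def task_instance_path(task_name, worklet_name=None):
    """Return the fully qualified taskInstancePath.

    A task directly within a workflow is named by the task alone;
    a task inside a worklet is written WorkletName.TaskName.
    """
    if worklet_name:
        return f"{worklet_name}.{task_name}"
    return task_name
```

The same rule applies wherever a command takes a taskInstancePath, including stoptask, starttask, waittask, and getsessionstatistics.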
Abortworkflow
The abortworkflow command aborts a workflow. Issue this command only after the
PowerCenter Server fails to stop the workflow when you issue the stopworkflow command.
For details on how the PowerCenter Server aborts and stops workflows, see Server Handling
of Stop and Abort on page 129.
In the command line mode, use the following syntax to abort a workflow:
pmcmd abortworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
[-wait|-nowait]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to abort a workflow:
abortworkflow
[<-folder|-f> folder]
[-wait|-nowait]
workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Connect
The connect command connects the pmcmd program to the PowerCenter Server in the
interactive mode. If you omit connection information, pmcmd prompts you to enter the
correct information. Once pmcmd successfully connects, you receive the pmcmd prompt. At
the pmcmd prompt, you can issue commands without specifying the connection information.
connect
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
Note: You can use this command in the interactive mode only.
Disconnect
The disconnect command disconnects pmcmd from the PowerCenter Server. It does not close
the pmcmd program. Use this command when you want to disconnect from a PowerCenter
Server and connect to another in the interactive mode.
In the interactive mode, use the following syntax to disconnect pmcmd from a PowerCenter
Server:
disconnect
Note: You can use this command only in the pmcmd interactive mode.
Exit
The exit command disconnects pmcmd from the PowerCenter Server and closes the pmcmd
program.
In the interactive mode, use the following syntax to exit pmcmd:
exit
Note: You can use this command only in the pmcmd interactive mode.
Getrunningsessionsdetails
The getrunningsessionsdetails command returns the details for all sessions currently running
on the PowerCenter Server. Details include startup and current time, folder and workflow
names, session instance, master and execution servers, number of successful and failed rows in
sources and targets, number of transformation errors, and number of sessions running on the
PowerCenter Server.
In the command line mode, use the following syntax to get details about sessions running on
the PowerCenter Server:
pmcmd getrunningsessionsdetails
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
In the interactive mode, enter the following syntax at the pmcmd prompt to get details about
the PowerCenter Server:
getrunningsessionsdetails
Getserverdetails
The getserverdetails command returns details about workflows and tasks running on a
PowerCenter Server.
Workflow details. Workflow details include the name of the PowerCenter Server, folder,
workflow, workflow log file, and user that runs the workflow. It includes workflow run
type, start time, run status, and run error code. It also includes the number of active
workflows and the number of scheduled workflows.
Task details. In addition to workflow details, task details include folder name, workflow
name, task instance name, task type, task start time, task run status, task run error code,
and task run mode. When the task is a session, the getserverdetails command also returns
master server name, worker server name, server grid name, the number of active sessions,
and the number of waiting sessions.
In the command line mode, use the following syntax to get details about the PowerCenter
Server:
pmcmd getserverdetails
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[-all|-running|-scheduled]
In the interactive mode, enter the following syntax at the pmcmd prompt to get details about
the PowerCenter Server:
getserverdetails
[-all|-running|-scheduled]
Issue the getserverdetails command for all or some of the workflows. The -running option
returns status details on active workflows. Active workflows include running, suspending, and
suspended workflows. The -scheduled option returns status details on the scheduled
workflows. The default option is the -all option, and it returns status details on the scheduled
and running workflows.
For information on other parameters used in this command, see Table 23-5 on page 594.
Getserverproperties
The getserverproperties command returns the PowerCenter Server name, type, and version. It
returns the timestamp on the PowerCenter Server, the PowerCenter Server startup time, and
the name of the repository. It indicates the data movement mode, the PowerCenter Server
code page, and whether the PowerCenter Server can debug mappings. It also specifies the
server grid name.
In the command line mode, use the following syntax to see the PowerCenter Server
properties:
pmcmd getserverproperties
<-serveraddr|-s> [host:]portno
In the interactive mode, enter the following syntax at the pmcmd prompt to see PowerCenter
Server properties:
getserverproperties
<-serveraddr|-s> [host:]portno
Serveraddr is the host name and port number of the PowerCenter Server.
Getsessionstatistics
The getsessionstatistics command returns session details and statistics. The command returns
the following information for each partition:
Session details. Session details include the name of the folder, workflow, task instance, and
mapping. It includes the task run status, session log file name, first error code and message,
the number of transformation errors, and the number of successful and failed rows for the
sources and targets. It also includes the name of the master server, worker server, and server
grid.
In the command line mode, use the following syntax to get session statistics:
pmcmd getsessionstatistics
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to get session
statistics:
getsessionstatistics
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
When using this command, specify the workflow name. Also, write the taskInstancePath as a
fully qualified string. If the task is within a worklet, write the string as
WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.
For information on other parameters used in this command, see Table 23-5 on page 594.
Gettaskdetails
The gettaskdetails command returns the folder name, workflow name, task instance name,
task type, last execution start time, last execution complete time, task run status, and task run
mode. It also returns the run error code and message.
If you issue the gettaskdetails command for a Session task, the command also returns the
following additional information: mapping name, session log file name, first error code and
message, number of successful and failed rows from the source and target, the number of
transformation errors, master server name, worker server name, and server grid name.
In the command line mode, use the following syntax to get details on a task:
pmcmd gettaskdetails
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to get details on a
task:
gettaskdetails
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
When you use this command, specify the workflow name. Also, write the taskInstancePath as
a fully qualified string. If the task is within a worklet, write the string as
WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.
For information on other parameters used in this command, see Table 23-5 on page 594.
Getworkflowdetails
The getworkflowdetails command returns the folder name, workflow name, last start time,
last completion time, workflow status, run mode, and the username that ran the last
workflow.
In the command line mode, use the following syntax to get details on a workflow:
pmcmd getworkflowdetails
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to get details on a
workflow:
getworkflowdetails
[<-folder|-f> folder]
workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Help
The help command returns the syntax for the command you specify. If you omit the
command name, pmcmd lists each command and syntax.
In the command line mode, use the following command for help with command line
commands:
pmcmd help [command]
In the interactive mode, use the following command for help with interactive mode
commands:
help [command]
Pingserver
The pingserver command verifies that the PowerCenter Server is running.
In the command line mode, use the following syntax to ping the PowerCenter Server:
pmcmd pingserver
<-serveraddr|-s> [host:]portno
In the interactive mode, enter the following syntax at the pmcmd prompt to ping the
PowerCenter Server:
pingserver
Serveraddr is the host name and port number of the PowerCenter Server.
Quit
The quit command disconnects pmcmd from the PowerCenter Server and closes the pmcmd
program.
In the interactive mode, use the following syntax to quit pmcmd:
quit
Note: You can use this command in the pmcmd interactive mode only.
Resumeworkflow
The resumeworkflow command resumes suspended workflows. To resume a workflow, specify
the folder and workflow name. The PowerCenter Server resumes the workflow from all
suspended and failed worklets and all suspended and failed Command, Email, and Session
tasks.
In the command line mode, use the following syntax to resume a workflow:
pmcmd resumeworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
[-wait|-nowait]
[-recovery]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to resume a
workflow:
resumeworkflow
[<-folder|-f> folder]
[-wait|-nowait]
[-recovery]
workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Resumeworklet
The resumeworklet command resumes suspended worklets. To resume the workflow from a
specific worklet, specify the taskInstancePath as a fully qualified string. If you do not specify a
taskInstancePath, the workflow resumes from the suspended worklet.
In the command line mode, use the following syntax to resume a worklet:
pmcmd resumeworklet
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait]
[-recovery]
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to resume a worklet:
resumeworklet
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait]
[-recovery]
taskInstancePath
For information on other parameters used in this command, see Table 23-5 on page 594.
Scheduleworkflow
The scheduleworkflow command instructs the PowerCenter Server to schedule a workflow.
Use this command to reschedule a workflow that has been removed from the schedule.
In the command line mode, use the following syntax to schedule a workflow:
pmcmd scheduleworkflow <-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> user_env_var>
<<-password|-p> password|<-passwordvar|-pv> password_env_var>
[<-folder|-f> folder] workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to schedule a
workflow:
scheduleworkflow [<-folder|-f> folder] workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Setfolder
The setfolder command designates a folder as the default folder in which to execute all
subsequent commands. After issuing this command, you do not need to enter a folder name
for workflow, task, and session commands. If you enter a folder name in a command after the
setfolder command, that folder name overrides the default folder name for that command
only.
In the interactive mode, enter the following syntax at the pmcmd prompt to designate a folder
as the default folder:
setfolder folder
Note: You can use this command in the pmcmd interactive mode only.
Setnowait
The setnowait command instructs the PowerCenter Server to execute subsequent commands
in the nowait mode. The nowait mode is the default mode.
In the interactive mode, enter the following syntax at the pmcmd prompt to instruct the
PowerCenter Server to execute subsequent commands in the nowait mode:
setnowait
When the nowait mode is set, the pmcmd prompt is available after the PowerCenter Server
receives the previous command. No parameters are required for this command.
Note: You can use this command in the pmcmd interactive mode only.
Setwait
The setwait command instructs the PowerCenter Server to execute subsequent commands in
the wait mode. The pmcmd prompt is available only after the PowerCenter Server completes
the previous command.
In the interactive mode, enter the following syntax at the pmcmd prompt to instruct the
PowerCenter Server to execute subsequent commands in the wait mode:
setwait
Showsettings
The showsettings command returns the name of the PowerCenter Server and repository to
which pmcmd is connected. It displays the username, wait mode, and default folder. No
parameters are required for this command.
In the interactive mode, enter the following syntax at the pmcmd prompt to display interactive
mode settings:
showsettings
Note: You can use this command in the pmcmd interactive mode only.
Shutdownserver
The shutdownserver command stops the PowerCenter Server. You must have the Super User
or Administer Server privilege to use this command.
You can shut down the PowerCenter Server in the complete, stop, or abort mode. In the
complete mode, pmcmd allows currently running workflows to complete before shutting
down the PowerCenter Server. In the stop mode, the PowerCenter Server stops the running
workflows. In the abort mode, the PowerCenter Server aborts the running workflows.
In the command line mode, use the following syntax to stop the PowerCenter Server:
pmcmd shutdownserver
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
<-complete|-stop|-abort>
In the interactive mode, enter the following syntax at the pmcmd prompt to stop the
PowerCenter Server:
shutdownserver
<-complete|-stop|-abort>
For information on other parameters used in this command, see Table 23-5 on page 594.
Starttask
The starttask command starts a task.
In the command line mode, use the following syntax to start a task:
pmcmd starttask
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-paramfile paramfile]
[-wait|-nowait]
[-recovery]
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to start a task:
starttask
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-paramfile paramfile]
[-wait|-nowait]
[-recovery]
taskInstancePath
Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name.
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile "$PMRootDir\my file.txt"
When you write a pmcmd command that includes a parameter file located on another
machine, use the backslash (\) with the dollar sign ($). This ensures that the machine where
the variable is defined expands the server variable.
pmcmd starttask -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w
wSalesAvg -paramfile \$PMRootDir/myfile.txt taskA
For information on other parameters used in this command, see Table 23-5 on page 594.
Startworkflow
The startworkflow command starts a workflow.
In the command line mode, use the following syntax to start a workflow:
pmcmd startworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
[<-startfrom> taskInstancePath]
[-recovery]
[-paramfile paramfile]
[<-localparamfile|-lpf> localparamfile]
[-wait|-nowait]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to start a workflow:
startworkflow
[<-folder|-f> folder]
[<-startfrom> taskInstancePath]
[-recovery]
[-paramfile paramfile]
[<-localparamfile|-lpf> localparamfile]
[-wait|-nowait]
workflow
Use the -startfrom flag to start the workflow at a designated taskInstancePath. Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the string as WorkletName.TaskName. If the task is directly within a workflow, enter only the task name. If you do not specify a starting point, the workflow starts at the Start task.
You can specify a parameter file on the PowerCenter Server machine, on the local machine, or on a shared network drive:
PowerCenter Server machine. When you use a parameter file located on the PowerCenter
Server machine, use the -paramfile option to indicate the location and name of the
parameter file.
On UNIX, use the following syntax:
-paramfile $PMRootDir/myfile.txt
Local machine. When you use a parameter file located on the machine where pmcmd is
invoked, pmcmd passes variables and values in the file to the PowerCenter Server. When
you list a local parameter file, specify the absolute path or relative path to the file. Use the
-localparamfile or -lpf option to indicate the location and name of the local parameter file.
On UNIX, use the following syntax:
-lpf param_file.txt
On Windows, use the following syntax:
-lpf "c:\Informatica\parameterfiles\param file.txt"
-localparamfile "c:\Informatica\parameterfiles\param file.txt"
Shared network drives. When you use a parameter file located on another machine, use
the backslash (\) with the dollar sign ($). This ensures that the machine where the variable
is defined expands the server variable.
-paramfile \$PMRootDir/myfile.txt
For information on other parameters used in this command, see Table 23-5 on page 594.
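The quoting and escaping rules above can be summarized in a helper. paramfile_arg is a hypothetical sketch: remote paths escape $ as \$ so the machine where the server variable is defined expands it, and Windows names containing spaces are wrapped in double quotes:

```python
def paramfile_arg(path, remote=False, windows=False):
    """Format a -paramfile or -lpf value per the rules above.

    remote=True escapes $ as \\$ for parameter files on another machine;
    windows=True double-quotes file names that contain spaces.
    """
    if remote:
        path = path.replace("$", r"\$")
    if windows and " " in path:
        path = f'"{path}"'
    return path
```

For example, a file under $PMRootDir on the PowerCenter Server machine, referenced from another machine, comes out as \$PMRootDir/myfile.txt, matching the shared-network-drive example above.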
Stoptask
The stoptask command stops a task.
In the command line mode, use the following syntax to stop a task:
pmcmd stoptask
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait]
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to stop a task:
stoptask
[<-folder|-f> folder]
<<-workflow|-w> workflow>
[-wait|-nowait] taskInstancePath
Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the
string as WorkletName.TaskName. If the task is directly within a workflow, use the task name
alone.
For information on other parameters used in this command, see Table 23-5 on page 594.
Stopworkflow
The stopworkflow command stops a workflow.
In the command line mode, use the following syntax to stop a workflow:
pmcmd stopworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
[-wait|-nowait]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to stop a workflow:
stopworkflow
[<-folder|-f> folder]
[-wait|-nowait]
workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Unscheduleworkflow
The unscheduleworkflow command instructs the PowerCenter Server to remove the workflow
from the schedule.
In the command line mode, enter the following syntax at the pmcmd prompt to remove the
workflow from the schedule:
pmcmd unscheduleworkflow <-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> user_env_var>
<<-password|-p> password|<-passwordvar|-pv> password_env_var>
[<-folder|-f> folder] workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to remove the
workflow from the schedule:
unscheduleworkflow [<-folder|-f> folder] workflow
For information on other parameters used in this command, see Table 23-5 on page 594.
Unsetfolder
The unsetfolder command designates no folder as the default folder. After you issue this
command, you must specify a folder name each time you enter a command for a session,
workflow, or task.
In the interactive mode, enter the following syntax at the pmcmd prompt to clear the setfolder
command:
unsetfolder
Version
The version command displays the PowerCenter version and Informatica trademark and
copyright information.
In the command line mode, use the following command to verify the PowerCenter version:
pmcmd version
In the interactive mode, enter the following syntax at the pmcmd prompt to verify the
PowerCenter version:
version
Waittask
The waittask command instructs the PowerCenter Server to complete the task before
returning the pmcmd prompt to the command prompt or shell.
In the command line mode, use the following syntax to set a task in the wait mode:
pmcmd waittask
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
In the interactive mode, enter the following syntax at the pmcmd prompt to set a task in the
wait mode:
waittask
[<-folder|-f> folder]
<<-workflow|-w> workflow>
taskInstancePath
Write the taskInstancePath as a fully qualified string. If the task is within a worklet, write the
string as WorkletName.TaskName. If the task is directly within a workflow, use the task name
alone.
For information on other parameters used in this command, see Table 23-5 on page 594.
Waitworkflow
The waitworkflow command notifies you whether the specified workflow has run successfully or is not running. If the workflow is running, pmcmd returns code 0 after the workflow completes successfully. If the workflow is not running, pmcmd returns code 3. For more information on pmcmd return codes, see pmcmd Return Codes on page 590.
The waitworkflow command returns the pmcmd prompt to the command prompt or shell
when a workflow completes.
In the command line mode, use the following syntax to set a workflow to the wait mode:
pmcmd waitworkflow
<-serveraddr|-s> [host:]portno
<<-user|-u> username|<-uservar|-uv> userEnvVar>
<<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>
[<-folder|-f> folder]
workflow
In the interactive mode, enter the following syntax at the pmcmd prompt to set a workflow to
the wait mode:
waitworkflow
[<-folder|-f> folder]
workflow
You can use waitworkflow in conjunction with the startworkflow command if you are running
scripts. For example, you may want to check the status of a critical workflow that was
previously started. You can use the waitworkflow command to wait for that workflow to
complete before you start the next workflow.
For information on other parameters used in this command, see Table 23-5 on page 594.
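The scripting pattern described above can be sketched as follows. run_after_critical is a hypothetical helper; the pmcmd invocation is injectable so the control flow can be followed (and tested) without a running server. Per the return codes above, 0 means the critical workflow completed successfully and 3 means it is not running:

```python
import subprocess

def run_after_critical(critical_wf, next_wf, folder, conn_args, run=None):
    """Wait for a previously started workflow, then start the next one.

    conn_args carries the -serveraddr/-user/-password arguments; `run`
    invokes pmcmd and returns its exit code (replaceable for testing).
    """
    if run is None:
        run = lambda args: subprocess.call(["pmcmd"] + args)
    # waitworkflow blocks until the critical workflow completes.
    rc = run(["waitworkflow"] + conn_args + ["-f", folder, critical_wf])
    if rc != 0:
        return rc  # critical workflow failed or is not running
    return run(["startworkflow"] + conn_args + ["-f", folder, next_wf])
```

In a real deployment the default `run` shells out to pmcmd, so the script's exit code reflects the last pmcmd return code.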
Chapter 24
Session Caches
This chapter includes the following topics:
Overview, 614
Overview
The PowerCenter Server creates index and data caches in memory for Aggregator, Rank,
Joiner, and Lookup transformations in a mapping. The PowerCenter Server stores key values
in the index cache and output values in the data cache. You configure memory parameters for
the index and data cache in the transformation or session properties.
If the PowerCenter Server requires more memory, it stores overflow values in cache files.
When the session completes, the PowerCenter Server releases cache memory, and in most
circumstances, it deletes the cache files.
The PowerCenter Server creates cache files based on the PowerCenter Server code page.
Table 24-1 gives an overview of the type of information that the PowerCenter Server stores in
the index and data caches:
Table 24-1. Caching Storage Overview
Aggregator - Index cache: group values from the group by ports. Data cache: aggregate calculations based on the group by ports.
Rank - Index cache: group values from the group by ports. Data cache: ranking information based on the group by ports.
Joiner - Index cache: index values for the master source, based on the join condition. Data cache: master source rows.
Lookup - Index cache: lookup condition values. Data cache: lookup output values not stored in the index cache.
Memory Cache
The PowerCenter Server creates a memory cache based on the size configured in the session
properties. When you create a mapping, you specify the index and data cache size for each
transformation instance. When you create a session, you can override the index and data
cache size for each transformation instance in the session properties.
When you configure a session, you calculate the amount of memory the PowerCenter Server
needs to process the session. Calculate requirements based on factors such as processing
overhead and column size for key and output columns.
By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and
2,000,000 bytes to the data cache for each transformation instance. If the PowerCenter Server
cannot allocate the configured amount of cache memory, it cannot initialize the session and
the session fails.
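The defaults stated above lend themselves to a quick back-of-the-envelope estimate. The sketch below is a hypothetical calculator, not a PowerCenter utility; actual requirements depend on factors such as processing overhead and column sizes:

```python
# Defaults stated above: 1,000,000 bytes index cache and 2,000,000
# bytes data cache per transformation instance.
DEFAULT_INDEX_CACHE = 1_000_000
DEFAULT_DATA_CACHE = 2_000_000

def session_cache_estimate(transformations):
    """Sum configured (or default) index and data cache sizes.

    `transformations` maps a transformation instance name to an
    optional (index_bytes, data_bytes) override; None means the
    session uses the default allocation for that instance.
    """
    total = 0
    for override in transformations.values():
        idx, dat = override if override else (DEFAULT_INDEX_CACHE, DEFAULT_DATA_CACHE)
        total += idx + dat
    return total
```

An estimate like this helps check whether the configured caches fit in available memory before the PowerCenter Server attempts to initialize the session.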
If a server grid has 32-bit and 64-bit servers, and if a session exceeds 2 GB of memory, the
master server assigns it to a 64-bit server. For information on server grids, see Working with
Server Grids on page 446.
When you specify large cache sizes in transformations on 64-bit machines, the PowerCenter
Server might run out of physical memory and perform slower. If the cache size forces the
PowerCenter Server to swap virtual memory and to spill to disk, performance decreases.
Note: A PowerCenter Server running on a 32-bit machine cannot run a session if the total size of the session caches exceeds 2 GB.
Cache Files
If the PowerCenter Server requires more memory than the configured cache size, it stores
overflow values in the cache files. Since paging to disk can slow session performance, try to
configure the index and data cache sizes to store data in memory.
The PowerCenter Server creates the index and data cache files by default in the PowerCenter
Server variable directory, $PMCacheDir. If you do not define $PMCacheDir, the
PowerCenter Server saves the files in the PMCache directory specified in the UNIX
configuration file or the cache directory in the Windows registry. If the UNIX PowerCenter
Server does not find a directory there, it creates the index and data files in the installation
directory. If the PowerCenter Server on Windows does not find a directory there, it creates the
files in the system directory.
If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index
and data files. When creating these files, the PowerCenter Server appends a number to the
end of the filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and
data files is limited only by the amount of disk space available in the cache directory.
When you run a session, the PowerCenter Server writes a message in the session log indicating
the cache file name and the transformation name. When a session completes, the
PowerCenter Server typically deletes index and data cache files. However, index and data files
may remain in the cache directory in some circumstances, for example, when a session fails
unexpectedly.
The PowerCenter Server uses the following naming convention when it creates cache files:
[<Name Prefix> | <Prefix> <session ID>_<transformation ID>]_[partition
index]<suffix>.[overflow index]
Table 24-2 describes the naming convention for cache files that the PowerCenter Server
creates:

Table 24-2. Cache File Names

Name Prefix. The cache file name prefix configured for a named persistent lookup cache.
Prefix. Identifies the transformation type, such as PMLKUP for a Lookup transformation.
Session ID. Identifies the session instance.
Transformation ID. Identifies the transformation instance.
Partition Index. If the session contains more than one partition, this identifies the partition number. The partition index is zero-based, so the first partition has no partition index. Partition index 2 indicates a cache file created in the third partition.
Suffix. Identifies the type of cache file: .idx for an index cache file and .dat for a data cache file.
Overflow Index. If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends an overflow index to the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index and data files is limited by the amount of disk space available in the cache directory.
For example, in the file name, PMLKUP8_4_2.idx, PMLKUP identifies the transformation
type as Lookup, 8 is the session ID, 4 is the transformation ID, and 2 is the partition index.
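As an illustration, the naming convention above can be broken apart with a short script. The regular expression and function name below are assumptions for this sketch, not part of PowerCenter:

```python
import re

# Sketch of the cache file naming convention described above:
# <prefix><session ID>_<transformation ID>[_<partition index>].<suffix>[.<overflow index>]
CACHE_FILE_RE = re.compile(
    r"^(?P<prefix>[A-Z]+)"             # transformation type, e.g. PMLKUP, PMAGG
    r"(?P<session_id>\d+)_"
    r"(?P<transformation_id>\d+)"
    r"(?:_(?P<partition_index>\d+))?"  # absent for the first partition
    r"\.(?P<suffix>idx|dat)"
    r"(?:\.(?P<overflow_index>\d+))?$" # present only for overflow files
)

def parse_cache_file_name(name):
    """Split a cache file name into its components, or return None."""
    m = CACHE_FILE_RE.match(name)
    return m.groupdict() if m else None
```

For the example above, parse_cache_file_name("PMLKUP8_4_2.idx") yields prefix PMLKUP, session ID 8, transformation ID 4, and partition index 2.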
The cache directory should be local to the PowerCenter Server. You might encounter
performance or reliability problems when you cache large quantities of data on a mapped or
mounted drive.
For details on tuning the caches, see Performance Tuning on page 635.
Configure the index and data cache in the transformation properties. You configure cache
sizes for each transformation on the Properties tab in the mapping.
The amount of memory you configure depends on the partition properties and how much
memory cache and disk cache you want to use. If you use cache partitioning, the PowerCenter
Server requires only a portion of total cache memory for each partition. For information on
cache partitioning, see Cache Partitioning on page 620.
Cache Calculations
To determine cache requirements for a session, first add the total column size in the cache to
the row overhead. Multiply the result by the number of groups or rows in the cache. This
gives the minimum caching requirement. To determine the maximum requirement for the
index cache, multiply the minimum requirement by two.
The following tables provide the calculations for the minimum cache requirements for each
transformation:
Table 24-3. Aggregator Cache Calculations
Index (minimum): #Groups * (total group by column size + 17). Columns in cache: group by columns.
Index (maximum): 2 * the minimum index cache size.
Data: #Groups * (total column size + 7). Columns in cache: connected output ports that are not group by ports, including the ports that contain aggregate functions.*
* Each aggregate function has different cache space requirements. As a general rule, you can multiply the size of the column containing the aggregate function by three.

Table 24-4. Rank Cache Calculations
Index (minimum): #Groups * (total group by column size + 17). Columns in cache: group by columns.
Index (maximum): 2 * the minimum index cache size.
Data: #Groups * [(#Ranks * (total column size + 10)) + 20]. Columns in cache: connected ports other than the group by ports, including the rank port.

Table 24-5. Joiner Cache Calculations
Index (minimum): #Master rows * (total join condition column size + 16). Columns in cache: columns in the join condition.
Index (maximum): 2 * the minimum index cache size.
Data: #Master rows * (total column size + 8). Columns in cache: connected output ports from the master source that are not in the join condition.

Table 24-6. Lookup Cache Calculations
Index (minimum): 200 * (total column size + 16). Columns in cache: columns in the lookup condition.
Index (maximum): #Rows in lookup table * (total column size + 16) * 2.
Data: #Rows in lookup table * (total column size + 8). Columns in cache: connected output ports that are not in the lookup condition.
For more information about each cache, see the separate sections in this chapter.
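As a sketch, the cache calculations in this chapter can be written out in a few lines. The function names are illustrative; the constants come from the formulas and worked examples in this chapter:

```python
def aggregator_index_cache(groups, group_by_bytes):
    """Minimum and maximum Aggregator index cache size in bytes."""
    minimum = groups * (group_by_bytes + 17)   # 17 bytes of index row overhead
    return minimum, 2 * minimum

def aggregator_data_cache(groups, column_bytes):
    return groups * (column_bytes + 7)         # 7 bytes of data row overhead

def joiner_index_cache(master_rows, key_bytes):
    minimum = master_rows * (key_bytes + 16)   # 16 bytes of row overhead
    return minimum, 2 * minimum

def joiner_data_cache(master_rows, column_bytes):
    return master_rows * (column_bytes + 8)

def lookup_index_cache(rows, condition_bytes):
    minimum = 200 * (condition_bytes + 16)     # minimum assumes 200 rows
    maximum = rows * (condition_bytes + 16) * 2
    return minimum, maximum

def rank_data_cache(groups, ranks, column_bytes):
    return groups * (ranks * (column_bytes + 10) + 20)
```

These functions reproduce every worked example later in the chapter, for instance aggregator_index_cache(72_000, 24) returns (2_952_000, 5_904_000).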
Table 24-7. Column Sizes (bytes per column)

Datatype                                      Aggregator, Rank    Joiner, Lookup
Binary                                        precision + 2       precision + 8 (round to nearest multiple of 8)
Date/Time                                     18                  24
Decimal, high precision off                   10                  16
Decimal, high precision on (precision <= 18)  18                  24
Decimal, high precision on (precision 19-28)  22                  32
Decimal, high precision on (precision > 28)   10                  16
Double                                        10                  16
Real                                          10                  16
Integer                                       6                   16
Small integer                                 6                   16
String                                        Unicode mode:       Unicode mode:
                                              2*(precision + 2)   2*(precision + 5)
                                              ASCII mode:         ASCII mode:
                                              precision + 3       precision + 9
The column sizes include the bytes required for a null indicator.
Additionally, to increase lookup and join performance, the PowerCenter Server aligns all data
for Lookup and Joiner caches on an eight-byte boundary, so each Lookup and Joiner column
size is rounded up to the nearest multiple of eight.
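The string rules and the eight-byte alignment above can be sketched as follows; the function names are illustrative:

```python
def string_column_size(precision, cache="aggregator", mode="ascii"):
    """Cache size in bytes for a string port.

    'aggregator' covers Aggregator and Rank caches; 'joiner' covers
    Joiner and Lookup caches. Formulas follow the column size table.
    """
    if cache == "aggregator":
        return 2 * (precision + 2) if mode == "unicode" else precision + 3
    # Joiner and Lookup caches
    return 2 * (precision + 5) if mode == "unicode" else precision + 9

def align8(size):
    """Round a size up to the nearest multiple of eight bytes."""
    return -(-size // 8) * 8
```

For example, a String(15) group by port in an Aggregator running in ASCII mode needs 18 bytes, which matches the worked example later in this chapter.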
Cache Partitioning
When you create a session with multiple partitions, the PowerCenter Server can partition
caches for the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate
cache for each partition, and each partition works with only the rows needed by that
partition. As a result, the PowerCenter Server requires only a portion of total cache memory
for each partition. When you run a session, the PowerCenter Server accesses the cache in
parallel for each partition. If you do not use cache partitioning, the PowerCenter Server
accesses the cache serially for each partition.
After you configure the session for partitioning, you can configure memory requirements and
cache directories for each transformation in the Transformations view on the Mapping tab of
the session properties. To configure the memory requirements, calculate the total
requirements for a transformation, and divide by the number of partitions. To further
improve performance, you can configure separate directories for each partition.
The guidelines for cache partitioning are different for each cached transformation:
Aggregator transformation. The PowerCenter Server uses cache partitioning for any
multi-partitioned session with an Aggregator transformation. You do not have to set a
partition point at the Aggregator transformation. For more caching information, see
Aggregator Caches on page 621.
Joiner transformation. The PowerCenter Server uses cache partitioning when you create a
partition point at the Joiner transformation. For more caching information, see Joiner
Caches on page 624.
Lookup transformation. The PowerCenter Server uses cache partitioning when you create
a hash auto-keys partition point at the Lookup transformation. For more caching
information, see Lookup Caches on page 628.
Rank transformation. The PowerCenter Server uses cache partitioning for any multi-partitioned session with a Rank transformation. You do not have to set a partition point at
the Rank transformation. For more caching information, see Rank Caches on page 632.
Aggregator Caches
When the PowerCenter Server runs a session with an Aggregator transformation, it stores data
in memory until it completes the aggregation. The PowerCenter Server uses cache
partitioning when you create multiple partitions in a pipeline that contains an Aggregator
transformation. It creates one memory cache and one disk cache for each partition and routes
data from one partition to another based on group key values of the transformation.
After you configure the partitions in the session, you can configure the memory requirements
and cache directories for the Aggregator transformation on the Mappings tab in session
properties. Allocate enough disk space to hold one row in each aggregate group.
If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache
file directory. For information about caching with incremental aggregation, see Partitioning
Guidelines with Incremental Aggregation on page 578.
Note: The PowerCenter Server uses memory to process an Aggregator transformation with
sorted ports. It does not use cache memory. You do not need to configure cache memory for
Aggregator transformations that use sorted ports.
For more information about the Aggregator transformation, see Aggregator Transformation
in the Transformation Guide.
Columns in the index cache: group by columns.
Use the column sizes in Table 24-7 on page 618 to add the group by columns.
Column Name   Column Type   Datatype      Size
STORE_ID      Group by      Integer       6
ITEM          Group by      String (15)   18
Total                                     24
You know that there are 36 stores and 2,000 items, so the total number of groups is 72,000.
Use the following calculation to determine the minimum index cache requirements:
72,000 * (24 + 17) = 2,952,000
Therefore, this Aggregator transformation requires an index cache size between 2,952,000
and 5,904,000 bytes.
Columns in the data cache: connected output ports that are not group by ports, including the port containing the aggregate function.*
*The cache space requirements are different for each aggregate function. However, as a general rule, you can multiply the size of the port containing the aggregate function by three.
Use the column sizes in Table 24-7 on page 618 to add the columns in the data cache:
Column Name             Column Type   Datatype          Size
ORDER_ID                              Integer           6
SALES_PER_STORE_ITEMS                 Decimal (12, 2)   30*
Total                                                   36
*Remember to multiply the port containing the aggregate function by three. For more information, see Table 24-3 on page 617.
Note that you do not use STORE_ID and ITEM in the data cache calculation. These
columns are connected to the target, but you do not use them in the cache calculation because
they are group by ports and are used in the index cache calculation.
The total number of groups as calculated for the index cache size is 72,000. Use the following
calculation to determine the minimum data cache requirements:
72,000 * (36 + 7) = 3,096,000
Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
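The two Aggregator calculations above can be reproduced with a quick script; the variable names are illustrative, and all constants come from the example:

```python
# Reproducing the Aggregator example above.
groups = 36 * 2000      # 36 stores * 2,000 items = 72,000 groups
index_row = 24          # STORE_ID + ITEM group by column sizes
data_row = 36           # ORDER_ID + (aggregate port size * 3)

min_index = groups * (index_row + 17)   # 17 bytes of index row overhead
max_index = 2 * min_index
min_data = groups * (data_row + 7)      # 7 bytes of data row overhead

print(min_index, max_index, min_data)   # 2952000 5904000 3096000
```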
Joiner Caches
When the PowerCenter Server runs a session with a Joiner transformation, it reads rows from
the master and detail sources concurrently and builds index and data caches based on the
master rows. The PowerCenter Server then performs the join based on the detail source data
and the cache data.
The number of rows the PowerCenter Server stores in the cache depends on the partitioning
scheme, the data in the master source, and whether or not you use sorted input. For more
information on how many rows the PowerCenter Server stores, see Calculating the Number
of Master Rows on page 625.
When you create multiple partitions in a session, the PowerCenter Server processes the Joiner
transformation differently when you use n:n partitioning and when you use 1:n partitioning.
Processing master and detail data for outer joins. When you run a multi-partitioned
session with a partitioned Joiner transformation, the PowerCenter Server builds one cache
per partition. In a single-partitioned master pipeline (1:n), the PowerCenter Server
outputs unmatched master rows after it processes all detail partitions. In a multi-partitioned master pipeline (n:n), the PowerCenter Server outputs unmatched master rows
after it processes the partition for each detail cache.
Configuring memory requirements. When you run a session with a Joiner transformation,
the PowerCenter Server uses n times the memory you specify on the Transformation view
of the Mapping tab. The PowerCenter Server might page to disk if you do not specify
enough memory.
When you use 1:n partitioning, each partition requires as much memory as a 1:1 partition
session. When you configure the cache for the Joiner transformation, enter the total
transformation memory requirements for a single partition.
When you use n:n partitioning, each partition requires only a portion of the memory
required by a 1:1 partition session. When you configure the cache, divide the memory
requirements for a 1:1 partition session by the number of partitions. Enter that amount for
the cache requirements.
For example, you calculate the following cache requirements for a Joiner transformation
instance and determine that the transformation requires 2,000,000 bytes of memory for
the index cache and 4,000,000 bytes of memory for the data cache. You create four
partitions for the pipeline. If you use 1:n partitioning, you enter 2,000,000 bytes for the
index cache and 4,000,000 bytes for the data cache. If you use n:n partitioning, enter
500,000 bytes for the index cache and 1,000,000 bytes for the data cache.
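The 1:n versus n:n rule in this example can be sketched as follows; the function name and signature are assumptions for illustration:

```python
def joiner_cache_entry(index_bytes, data_bytes, partitions, scheme="n:n"):
    """Cache sizes to enter in the session properties.

    With 1:n partitioning, enter the full single-partition requirement.
    With n:n partitioning, divide the requirement by the number of
    partitions.
    """
    if scheme == "1:n":
        return index_bytes, data_bytes
    return index_bytes // partitions, data_bytes // partitions

# The example from the text: 2,000,000-byte index cache and
# 4,000,000-byte data cache with four partitions.
print(joiner_cache_entry(2_000_000, 4_000_000, 4, "1:n"))  # (2000000, 4000000)
print(joiner_cache_entry(2_000_000, 4_000_000, 4, "n:n"))  # (500000, 1000000)
```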
To increase join performance, the PowerCenter Server aligns all data for joiner caches on an
eight-byte boundary.
Note: To use n:n partitioning with a Joiner transformation, you must create a partition point
at the Joiner transformation. This allows you to create multiple partitions for both the master
and detail source of a Joiner transformation.
For more information about the Joiner transformation, see Joiner Transformation in the
Transformation Guide.
However, when you use sorted input with n:n partitioning, the PowerCenter Server caches a
different number of rows in the index and data caches:
Index cache. The PowerCenter Server caches 100 master rows with unique keys.
Data cache. The PowerCenter Server caches the master rows in the data cache that
correspond to the 100 rows in the index cache. The number of rows it stores in the data
cache depends on the data. For example, if every master row contains a unique key, the
PowerCenter Server stores 100 rows in the data cache. However, if the master data
contains multiple rows with the same key, the PowerCenter Server stores more than 100
rows in the data cache.
Columns in the index cache: columns in the join condition.
For example, the Joiner transformation, JNR_ORDERS_PRODUCTS, does not use sorted
input, and it joins the sources ORDERS and PRODUCTS on ITEM_NO:
Use the column sizes in Table 24-7 on page 618 to add the columns in the index cache:
Column Name   Column Type   Datatype       Size
ITEM_NO                     Decimal (10)   16
Total                                      16
PRODUCTS is the master source and has 90,000 rows. Use the following calculation to
determine the minimum index cache requirements:
90,000 * (16 + 16) = 2,880,000
Therefore, this Joiner transformation requires an index cache size between 2,880,000 and
5,760,000 bytes.
Columns in the data cache: connected output ports from the master source, excluding the columns in the join condition.
The following figure shows the connected output ports for JNR_ORDERS_PRODUCTS:
Use the column sizes in Table 24-7 on page 618 to add the columns for the data cache:
Column Name        Column Type   Datatype       Size
ITEM_NAME                        String (23)    32
PRODUCT_CATEGORY                 Decimal (21)   30
Total                                           62
Note that you do not use ITEM_NO in the data cache calculation because it is part of the
join condition and is used in the index cache.
The master source has 90,000 rows.
Use the following calculation to determine the minimum data cache requirements:
90,000 * (62 + 8) = 6,300,000
Lookup Caches
The PowerCenter Server builds a lookup cache in memory when it processes the first row of
data in a cached Lookup transformation. It then queries the cache for each row that enters the
transformation.
Configure the index and data cache memory for each Lookup transformation. The
PowerCenter Server caches data differently for static and dynamic caches and also for sessions
that use cache partitioning.
When you run the session, the PowerCenter Server rebuilds a persistent cache if any cache file
is missing or invalid.
For more information about configuring the lookup cache and how the PowerCenter Server
processes lookup requests, see Lookup Caches in the Transformation Guide.
Static Cache
When you use a static lookup cache, the PowerCenter Server creates one memory cache for
each partition.
If you use cache partitioning, the PowerCenter Server requires only a portion of the total
memory to cache each partition. So, when you configure cache size, you can divide the total
memory requirements by the number of partitions.
If you do not use cache partitioning, the PowerCenter Server requires as much memory for
each partition as it does for a single partition pipeline. So, when you configure cache size, you
enter the total memory requirements for the transformation.
If two Lookup transformations in a mapping share the cache, the PowerCenter Server does not
allocate additional memory for shared transformations in the same pipeline stage. For shared
transformations in a different pipeline stage, the PowerCenter Server does allocate additional
memory.
Static Lookup transformations that use the same data or a subset of data to create a disk cache
can share the disk cache. However, the lookup keys may be different, so the transformations
must have separate memory caches.
For more information about caching the Lookup transformation, see Lookup Caches in the
Transformation Guide.
Dynamic Cache
When you use a dynamic lookup cache, the PowerCenter Server creates the memory cache
based on whether you use cache partitioning or not.
If you use cache partitioning, the PowerCenter Server creates one memory cache for each
partition. It requires only a portion of the total memory to cache each partition. So, when you
configure cache size, you can divide the total memory requirements by the number of
partitions.
If you do not use cache partitioning, the PowerCenter Server creates one memory cache and
one disk cache for each transformation. All partitions share the memory and disk cache.
When you configure the cache size, enter the total memory requirements in the
transformation or on the Mapping tab in the session properties.
When Lookup transformations share a dynamic cache, the PowerCenter Server updates the
memory cache and disk cache. To keep the caches synchronized, the PowerCenter Server must
share the disk cache and the corresponding memory cache between the transformations.
Lookup transformations can share a partitioned cache if the transformations meet the
following conditions:
The cache structures are identical. The lookup/output ports for the first shared
transformation must match the lookup/output ports for the subsequent transformations.
The transformations have the same lookup conditions, and the lookup condition
columns are in the same order.
When you share Lookup caches across target load order groups, you must configure the
target load order groups with the same number of partitions.
Note: If the PowerCenter Server detects a mismatch between Lookup transformations sharing
an unnamed cache, it rebuilds the cache files. If the PowerCenter Server detects a mismatch
between Lookup transformations sharing a named cache, it fails the session.
Columns in the index cache: columns in the lookup condition.
Example
The Lookup transformation, LKP_PROMOS, looks up values based on the ITEM_ID. It
uses the following lookup condition:
ITEM_ID = IN_ITEM_ID1
Use the column sizes in Table 24-7 on page 618 to add the columns for the index cache:
Column Name   Column Type   Datatype   Size
ITEM_ID                     Integer    16
Total                                  16
The lookup condition uses one column, ITEM_ID, and the table contains 60,000 rows.
Use the following calculation to determine the minimum index cache requirements:
200 * (16 + 16) = 6,400
Use the following calculation to determine the maximum index cache requirements:
60,000 * (16 + 16) * 2 = 3,840,000
Therefore, this Lookup transformation requires an index cache size between 6,400 and
3,840,000 bytes.
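The two Lookup calculations above can be checked with a short script; the variable names are illustrative:

```python
# Reproducing the Lookup index cache example above.
rows = 60_000           # rows in the lookup table
condition_bytes = 16    # ITEM_ID, an integer lookup condition column

min_index = 200 * (condition_bytes + 16)      # minimum assumes 200 rows
max_index = rows * (condition_bytes + 16) * 2

print(min_index, max_index)   # 6400 3840000
```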
Columns in the data cache: connected output ports, excluding the columns in the lookup condition.
The following figure shows the connected output ports for LKP_PROMOS:
Use the column sizes in Table 24-7 on page 618 to add the columns for the data cache:
Column Name    Column Type   Datatype       Size
PROMOTION_ID                 Integer        16
DISCOUNT                     Decimal (10)   16
Total                                       32
Rank Caches
When the PowerCenter Server runs a session with a Rank transformation, it compares an
input row with rows in the data cache. If the input row out-ranks a stored row, the
PowerCenter Server replaces the stored row with the input row.
For example, you configure a Rank transformation to find the top three sales. The
PowerCenter Server reads the following input data:
SALES
10,000
12,210
5,000
2,455
6,324
The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000). When the
PowerCenter Server reads the next row (2,455) it compares it to the cache values. Since the
row is lower in rank than the cached rows, it discards the row with 2,455. The next row
(6,324), however, is higher in rank than one of the cached rows. Therefore, the PowerCenter
Server replaces the cached row with the higher-ranked input row.
If the Rank transformation is configured to rank across multiple groups, the PowerCenter
Server ranks incrementally for each group it finds.
The PowerCenter Server uses cache partitioning when you create multiple partitions in a
pipeline that contains a Rank transformation. It creates one memory cache and one disk cache
per partition and routes data from one partition to another based on group key values of the
transformation.
After you configure the partitions in the session, you can configure the memory requirements
and cache directories for the Rank transformation on the Mappings tab in session properties.
For more information about the Rank transformation, see Rank Transformation in the
Transformation Guide.
Columns in the index cache: group by columns.
Use the column sizes in Table 24-7 on page 618 to add the columns in the index cache:
Column Name        Column Type   Datatype      Size
PRODUCT_CATEGORY   Group by      String (21)   24
Total                                          24
There are 10,000 product categories, so the total number of groups is 10,000. Use the
following calculation to determine the minimum index cache requirements:
10,000 * (24 + 17) = 410,000
Therefore, this Rank transformation requires an index cache size between 410,000 and
820,000 bytes.
Use the following information to calculate the minimum rank data cache size:

Rank data cache = #Groups * [(#Ranks * (total column size + 10)) + 20]

Columns in the data cache: connected ports other than the group by ports, including the rank port.
Use the column sizes in Table 24-7 on page 618 to add the columns in the data cache:
Column Name   Column Type   Datatype       Size
ITEM_NO                     Decimal (10)   10
ITEM_NAME                   String (23)    26
PRICE         Rank port     Decimal (14)   10
Total                                      46
RNK_TOPTEN ranks by price, and the total number of ranks is 10. The number of groups is
10,000.
Use the following calculation to determine the minimum data cache requirements:
10,000 * [(10 * (46 + 10)) + 20] = 5,800,000
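The Rank data cache calculation above can be verified with a quick script; the variable names are illustrative:

```python
# Reproducing the Rank data cache example above.
groups, ranks = 10_000, 10   # 10,000 groups, top 10 ranks per group
row_bytes = 46               # ITEM_NO + ITEM_NAME + PRICE column sizes

min_data = groups * (ranks * (row_bytes + 10) + 20)
print(min_data)   # 5800000
```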
Chapter 25
Performance Tuning
This chapter covers the following topics:
Overview, 636
Overview
The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks. To tune the performance of a session, first you identify a
performance bottleneck, eliminate it, and then identify the next performance bottleneck until
you are satisfied with the session performance. You can use the test load option to run sessions
when you tune session performance.
The most common performance bottleneck occurs when the PowerCenter Server writes to a
target database. You can identify performance bottlenecks by the following methods:
Running test sessions. You can configure a test session to read from a flat file source or to
write to a flat file target to identify source and target bottlenecks.
Studying performance details. You can create a set of information called performance
details to identify session bottlenecks. Performance details provide information such as
buffer input and output efficiency. For details about performance details, see Creating
and Viewing Performance Details on page 436.
Monitoring system performance. You can use system monitoring tools to view percent
CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, you can eliminate the
bottleneck by following these guidelines:
Eliminate source and target database bottlenecks. Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings
and options in mappings to eliminate mapping bottlenecks.
Eliminate session bottlenecks. You can optimize the session strategy and use performance
details to help tune session configuration.
Eliminate system bottlenecks. Have the system administrator analyze information from
system monitoring tools and improve CPU and network performance.
If you tune all the bottlenecks above, you can further optimize session performance by
increasing the number of pipeline partitions in the session. Adding partitions can improve
performance by utilizing more of the system hardware while processing the session.
Because determining the best way to improve performance can be complex, change only one
variable at a time, and time the session both before and after the change. If session
performance does not improve, you might want to return to your original configurations.
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System
You can identify performance bottlenecks by running test sessions, viewing performance
details, and using system monitoring tools.
Add a filter transformation in the mapping after each source qualifier. Set the filter condition
to false so that no data is processed past the filter transformation. If the time it takes to run
the new session remains about the same, then you have a source bottleneck.
1. Make a copy of the mapping.
2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or
queries.
3. Remove all transformations.
4. Connect the source qualifiers to a file target.
Use the read test mapping in a test session. If the test session performance is similar to the
original session, you have a source bottleneck.
You can also identify mapping bottlenecks by using performance details. High errorrows and
rowsinlookupcache counters indicate a mapping bottleneck. For details on eliminating
mapping bottlenecks, see Optimizing the Mapping on page 647.
If the session performs incremental aggregation, the PowerCenter Server reads historical
aggregate data from the local disk during the session and writes to disk when saving historical
data. As a result, the Aggregator_readtodisk and Aggregator_writetodisk counters display a
number other than zero. However, since the PowerCenter Server writes the historical data to a file at the
end of the session, you can still evaluate the counters during the session. If the counters show
any number other than zero during the session run, you can increase performance by tuning
the index and data cache sizes.
To view the session performance details while the session runs, right-click the session in the
Workflow Monitor and choose Properties. Click the Properties tab in the details dialog box.
Use the Windows Performance Monitor to create a chart that provides the following
information:
Percent processor time. If you have several CPUs, monitor each CPU for percent
processor time. If the processors are utilized at more than 80%, you may consider adding
more processors.
Pages/second. If pages/second is greater than five, you may have excessive memory
pressure (thrashing). You may consider adding more physical memory.
Physical disks percent time. This is the percent time that the physical disk is busy
performing read or write requests. You may consider adding another disk device or
upgrading the disk device.
Physical disks queue length. This is the number of users waiting for access to the same
disk device. If physical disk queue length is greater than two, you may consider adding
another disk device or upgrading the disk device.
Server total bytes per second. This is the number of bytes the server has sent to and
received from the network. You can use this information to improve network bandwidth.
lsattr -E -l sys0. Use this tool to view current system settings. It shows maxuproc,
the maximum number of user background processes. You may consider reducing the
number of background processes running on your system.
iostat. Use this tool to monitor the loading operation for every disk attached to the database
server. iostat displays the percentage of time that the disk was physically active. High disk
utilization suggests that you may need to add more disks.
If you use disk arrays, use utilities provided with the disk arrays instead of iostat.
vmstat or sar -w. Use these tools to monitor disk swapping. Swapping should not
occur during the session. If it does, you may consider increasing your physical memory or
reducing the number of memory-intensive applications running on the system.
sar -u. Use this tool to monitor CPU loading. This tool provides percent usage on user,
system, idle time, and waiting time. If the percent time spent waiting on I/O (%wio) is
high, you may consider using other under-utilized disks. For example, if your source data,
target data, lookup, rank, and aggregate cache files are all on the same disk, consider
putting them on different disks.
Bulk Loading
You can use bulk loading to improve the performance of a session that inserts a large amount
of data to a DB2, Sybase, Oracle, or Microsoft SQL Server database. Configure bulk loading
on the Mapping tab.
When bulk loading, the PowerCenter Server bypasses the database log, which speeds
performance. Without writing to the database log, however, the target database cannot
perform rollback. As a result, you may not be able to perform recovery. Therefore, you must
weigh the importance of improved session performance against the ability to recover an
incomplete session.
For more information on configuring bulk loading, see Bulk Loading on page 252.
External Loading
You can use the External Loader session option to integrate external loading with a session.
If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE
external loaders to bulk load target files. The DB2 EE external loader uses the PowerCenter
Server db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader
utility.
If you have a Teradata target database, you can use the Teradata external loader utility to bulk
load target files.
If your target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk
load target files. When you load data to an Oracle database using a pipeline with multiple
partitions, you can increase performance if you create the Oracle target table with the same
number of partitions you use for the pipeline.
If your target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to
bulk load target files. If your Sybase IQ database is local to the PowerCenter Server on your
UNIX system, you can increase performance by loading data to target tables directly from
named pipes.
For details on the External Loader option, see External Loading on page 523.
When you write to Oracle target databases, the database uses rollback segments during loads.
Make sure that the database stores rollback segments in appropriate tablespaces, preferably on
different disks. The rollback segments should also have appropriate storage clauses.
You can optimize the Oracle target database by tuning the Oracle redo log. The Oracle
database uses the redo log to log loading operations. Make sure that redo log size and buffer
size are optimal. You can view redo log properties in the init.ora file.
If your Oracle instance is local to the PowerCenter Server, you can optimize performance by
using IPC protocol to connect to the Oracle database. You can set up Oracle database
connection in listener.ora and tnsnames.ora.
See your Oracle documentation for details on optimizing Oracle databases.
Change the packet size in the Workflow Manager database connection to reflect the
database server packet size.
For Oracle, increase the packet size in listener.ora and tnsnames.ora. For other databases,
check your database documentation for details on optimizing network packet size.
Optimize transformations.
Optimize expressions.
transformations, you can minimize work by subtracting the percentage before splitting the
pipeline as shown in Figure 25-1:
Figure 25-1. Single-Pass Reading
Caching Lookups
If a mapping contains Lookup transformations, you might want to enable lookup caching. In
general, you want to cache lookup tables that need less than 300MB.
When you enable caching, the PowerCenter Server caches the lookup table and queries the
lookup cache during the session. When this option is not enabled, the PowerCenter Server
queries the lookup table on a row-by-row basis. You can increase performance using a shared
or persistent cache:
Shared cache. You can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping. You can share a
named cache between transformations in the same or different mappings.
Persistent cache. If you want to save and reuse the cache files, you can configure the
transformation to use a persistent cache. Use this feature when you know the lookup table
does not change between session runs. Using a persistent cache can improve performance
because the PowerCenter Server builds the memory cache from the cache files instead of
from the database.
For more information on lookup caching options, see Lookup Transformation in the
Transformation Guide.
Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY clause. The session log contains the ORDER BY statement.
Optimizing the Mapping
649
Uncached lookups. Because the PowerCenter Server issues a SELECT statement for each
row passing into the Lookup transformation, you can improve performance by indexing
the columns in the lookup condition.
651
Optimizing Expressions
As a final step in tuning the mapping, you can focus on the expressions used in
transformations. When examining expressions, focus on complex expressions for possible
simplification. Remove expressions one-by-one to isolate the slow expressions.
Once you locate the slowest expressions, take a closer look at how you can optimize those
expressions.
If you factor out the aggregate function call, as below, the PowerCenter Server adds
COLUMN_A to COLUMN_B, then finds the sum of both.
SUM(COLUMN_A + COLUMN_B)
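The saving is plain arithmetic: one aggregation over per-row sums equals two separate aggregations, but the factored form performs only one aggregate calculation. A Python sketch (column values are illustrative):

```python
column_a = [1.5, 2.0, 3.5]
column_b = [4.0, 0.5, 2.0]

# Two aggregate calculations: SUM(COLUMN_A) + SUM(COLUMN_B)
two_sums = sum(column_a) + sum(column_b)

# Factored form: one addition per row, then a single aggregate: SUM(COLUMN_A + COLUMN_B)
one_sum = sum(a + b for a, b in zip(column_a, column_b))

assert two_sums == one_sum  # both equal 13.5
```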
652
653
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_A + VAL_B + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_A + VAL_B,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_A + VAL_C,
IIF( FLG_A = 'Y' and FLG_B = 'N' AND FLG_C = 'N',
VAL_A ,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'Y',
VAL_B + VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'Y' AND FLG_C = 'N',
VAL_B ,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'Y',
VAL_C,
IIF( FLG_A = 'N' and FLG_B = 'N' AND FLG_C = 'N',
0.0,
))))))))
You can rewrite the expression as a sum of three independent tests:
IIF(FLG_A = 'Y', VAL_A, 0.0) + IIF(FLG_B = 'Y', VAL_B, 0.0) + IIF(FLG_C = 'Y', VAL_C, 0.0)
This results in three IIFs, two comparisons, two additions, and a faster session.
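The equivalence of the nested-IIF form and a factored sum of three simple IIFs can be verified for every flag combination. A sketch in Python (the function names are hypothetical stand-ins for the expression logic):

```python
from itertools import product

def nested(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    # Original expression: eight nested IIFs, one per flag combination.
    if flg_a == 'Y' and flg_b == 'Y' and flg_c == 'Y': return val_a + val_b + val_c
    if flg_a == 'Y' and flg_b == 'Y' and flg_c == 'N': return val_a + val_b
    if flg_a == 'Y' and flg_b == 'N' and flg_c == 'Y': return val_a + val_c
    if flg_a == 'Y' and flg_b == 'N' and flg_c == 'N': return val_a
    if flg_a == 'N' and flg_b == 'Y' and flg_c == 'Y': return val_b + val_c
    if flg_a == 'N' and flg_b == 'Y' and flg_c == 'N': return val_b
    if flg_a == 'N' and flg_b == 'N' and flg_c == 'Y': return val_c
    return 0.0

def factored(flg_a, flg_b, flg_c, val_a, val_b, val_c):
    # Optimized expression: one simple IIF per flag, summed.
    return ((val_a if flg_a == 'Y' else 0.0)
            + (val_b if flg_b == 'Y' else 0.0)
            + (val_c if flg_c == 'Y' else 0.0))

# Both forms agree on all eight flag combinations.
for flags in product('YN', repeat=3):
    assert nested(*flags, 10.0, 20.0, 40.0) == factored(*flags, 10.0, 20.0, 40.0)
```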
Evaluating Expressions
If you are not sure which expressions slow performance, the following steps can help isolate
the problem.
To evaluate expression performance:
654
1. Run the session with the original expressions and time it.
2. Copy the mapping and replace half of the complex expressions with a constant.
3. Run and time the edited session.
4. Make another copy of the mapping and replace the other half of the complex expressions with a constant.
5. Run and time the edited session.
Table 25-1 lists the settings and values you can use to improve session performance:
Table 25-1. Session Tuning Parameters
DTM Buffer Size: default 12,000,000 bytes; suggested minimum 6,000,000 bytes; suggested maximum 128,000,000 bytes.
Default Buffer Block Size: default 64,000 bytes; suggested minimum 4,000 bytes; suggested maximum 128,000 bytes.
Index cache size: default 1,000,000 bytes; suggested minimum 1,000,000 bytes; suggested maximum 12,000,000 bytes.
Data cache size: default 2,000,000 bytes; suggested minimum 2,000,000 bytes; suggested maximum 24,000,000 bytes.
Commit interval: default 10,000 rows; suggested minimum N/A; suggested maximum N/A.
High Precision: default Disabled; suggested minimum N/A; suggested maximum N/A.
Tracing Level: default Normal; suggested minimum Terse; suggested maximum N/A.
Pipeline Partitioning
If you purchased the partitioning option, you can increase the number of partitions in a
pipeline to improve session performance. Increasing the number of partitions allows the
PowerCenter Server to create multiple connections to sources and process partitions of source
data concurrently.
When you create a session, the Workflow Manager validates each pipeline in the mapping for
partitioning. You can specify multiple partitions in a pipeline if the PowerCenter Server can
maintain data consistency when it processes the partitioned data.
For details on partitioning sessions, see Pipeline Partitioning on page 663.
655
By default, a session has enough buffer blocks for 83 sources and targets. If you run a session
that has more than 83 sources and targets, you can increase the number of available memory
blocks by adjusting the following session parameters:
DTM Buffer Size. Increase the DTM buffer size found in the Performance settings of the
Properties tab. The default setting is 12,000,000 bytes.
Default Buffer Block Size. Decrease the buffer block size found in the Advanced settings
of the Config Object tab. The default setting is 64,000 bytes.
To configure these settings, first determine the number of memory blocks the PowerCenter
Server requires to initialize the session. Then, based on default settings, you can calculate the
buffer size and/or the buffer block size to create the required number of session blocks.
If you have XML sources or targets in your mapping, use the number of groups in the XML
source or target in your calculation for the total number of sources and targets.
For example, you create a session that contains a single partition using a mapping that
contains 50 sources and 50 targets.
1. First, determine that the session requires 200 buffer blocks:
[(total number of sources) + (total number of targets)] * 2 = (session buffer blocks)
[50 + 50] * 2 = 200
2. Next, based on default settings, you determine that you can change the DTM Buffer Size to 15,000,000, or you can change the Default Buffer Block Size to 54,000:
(session buffer blocks) = (.9) * (DTM Buffer Size) / (Default Buffer Block Size) * (number of partitions)
200 = .9 * 14222222 / 64000 * 1
or
200 = .9 * 12000000 / 54000 * 1
If you configure the DTM Buffer Size below the required amount, the session fails because the PowerCenter Server is unable to allocate memory to the required processes.
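The buffer-block arithmetic can be sketched in Python (the helper names are hypothetical; the 0.9 factor and defaults come from the formula in this section):

```python
import math

def session_buffer_blocks(dtm_buffer_size, block_size, partitions=1):
    # (session buffer blocks) = 0.9 * (DTM Buffer Size) / (Default Buffer Block Size)
    #                           * (number of partitions)
    return 0.9 * dtm_buffer_size / block_size * partitions

def required_dtm_buffer(blocks, block_size, partitions=1):
    # Invert the formula to find the DTM Buffer Size for a target block count.
    return math.ceil(blocks * block_size / (0.9 * partitions))

sources, targets = 50, 50
required_blocks = (sources + targets) * 2            # 200 blocks for this mapping

# Default settings yield only 168 blocks, short of the 200 required:
assert int(session_buffer_blocks(12_000_000, 64_000)) == 168

# Either raise the DTM Buffer Size to about 14.3 MB (15,000,000 is a safe
# round figure) or shrink the block size to 54,000 bytes to reach 200 blocks:
assert required_dtm_buffer(required_blocks, 64_000) == 14_222_223
assert round(session_buffer_blocks(12_000_000, 54_000)) == 200
```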
656
2.
Increase the setting for DTM Buffer Size, and click OK.
The default for DTM Buffer Size is 12,000,000 bytes. Increase the setting in multiples of the buffer block size, then run and time the session after each increase.
2.
3.
4.
5.
If you have more than one target in the mapping, repeat steps 2-4 for each additional
target to calculate the precision for each target.
6.
7.
Choose the largest precision of all the source and target precisions for the total precision
in your buffer block size calculation.
The total precision represents the total bytes needed to move the largest row of data. For
example, if the total precision equals 33,000, then the PowerCenter Server requires 33,000
bytes in the buffers to move that row. If the buffer block size is 64,000 bytes, the PowerCenter
Server can move only one row at a time.
Ideally, a buffer should accommodate at least 20 rows at a time. So if the total precision is
greater than 32,000, increase the size of the buffers to improve performance.
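A quick Python check of the rows-per-block arithmetic (the helper name is hypothetical):

```python
def rows_per_block(buffer_block_size, total_precision):
    # Number of largest-size rows that fit in a single buffer block.
    return buffer_block_size // total_precision

# With a total precision of 33,000 bytes and the default 64,000-byte block,
# the PowerCenter Server can move only one row at a time:
assert rows_per_block(64_000, 33_000) == 1

# To buffer 20 such rows at once, the block must hold 20 * 33,000 = 660,000 bytes:
assert rows_per_block(660_000, 33_000) == 20
```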
To increase buffer block size:
1.
2.
Increase the setting for Default Buffer Block Size, and click OK.
The default for this setting is 64,000 bytes. Increase this setting in relation to the size of the
rows. As with DTM buffer memory allocation, increasing buffer block size should improve
657
performance. If you do not see an increase, buffer block size is not a factor in session
performance.
658
The Decimal datatype is a numeric datatype with a maximum precision of 28. To use a high
precision Decimal datatype in a session, configure the PowerCenter Server to recognize this
datatype by selecting Enable High Precision in the session properties. However, since reading
and manipulating the high precision datatype slows the PowerCenter Server, you can improve
session performance by disabling high precision.
When you disable high precision, the PowerCenter Server converts data to a double. The PowerCenter Server reads the Decimal row 3900058411382035317455530282 as 390005841138203 x 10^13. For details on high precision, see Handling High Precision Data on page 204.
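The conversion can be reproduced with Python's decimal module (a sketch; the tolerance reflects the roughly 15 significant digits a double retains):

```python
from decimal import Decimal
import math

row = Decimal("3900058411382035317455530282")   # 28-digit high precision value
as_double = float(row)                          # conversion applied when high precision is off

# Only about 15 significant digits survive: 390005841138203 x 10^13
assert math.isclose(as_double, 390005841138203 * 10**13, rel_tol=1e-14)
```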
Click the Performance settings on the Properties tab to enable high precision.
659
Improve network speed. Slow network connections can slow session performance. Have
your system administrator determine if your network runs at an optimal speed. Decrease
the number of network hops between the PowerCenter Server and databases.
Use a server grid. Use a collection of PowerCenter Servers to distribute and process the
workload of a workflow. For information on server grids, see Working with Server Grids
on page 446.
Improve CPU performance. Run the PowerCenter Server and related machines on high
performance CPUs, or configure your system to use additional CPUs.
Configure the PowerCenter Server for ASCII data movement mode. When all character
data processed by the PowerCenter Server is 7-bit ASCII or EBCDIC, configure the
PowerCenter Server for ASCII data movement mode.
Check hard disks on related machines. Slow disk access on source and target databases,
source and target file systems, as well as the PowerCenter Server and repository machines
can slow session performance. Have your system administrator evaluate the hard disks on
your machines.
Reduce paging. When an operating system runs out of physical memory, it starts paging to
disk to free physical memory. Configure the physical memory for the PowerCenter Server
machine to minimize paging to disk.
660
If you use relational source or target databases, try to minimize the number of network hops
between the source and target databases and the PowerCenter Server. Moving the target
database onto a server system might improve PowerCenter Server performance.
When you run sessions that contain multiple partitions, have your network administrator
analyze the network and make sure it has enough bandwidth to handle the data moving across
the network from all partitions.
661
Reducing Paging
Paging occurs when the PowerCenter Server operating system runs out of memory for a
particular operation and uses the local disk for memory. You can free up more memory or
increase physical memory to reduce paging and the slow performance that results from
paging. Monitor paging activity using system tools.
You might want to increase system memory in the following circumstances:
If you cannot free up memory, you might want to add memory to the system.
662
Pipeline Partitioning
Once you have tuned the application, databases, and system for maximum single-partition
performance, you may find that your system is under-utilized. At this point, you can
reconfigure your session to have two or more partitions. Adding partitions may improve
performance by utilizing more of the hardware while processing the session.
Use the following tips when you add partitions to a session:
Add one partition at a time. To best monitor performance, add one partition at a time,
and note your session settings before you add each partition.
Set DTM Buffer Memory. For a session with n partitions, this value should be at least n
times the value for the session with one partition.
Set cached values for Sequence Generator. For a session with n partitions, there should be
no need to use the Number of Cached Values property of the Sequence Generator
transformation. If you must set this value to a value greater than zero, make sure it is at
least n times the original value for the session with one partition.
Partition the source data evenly. Configure each partition to extract the same number of
rows.
Monitor the system while running the session. If there are CPU cycles available (twenty
percent or more idle time) then this session might see a performance improvement by
adding a partition.
Monitor the system after adding a partition. If the CPU utilization does not go up, the
wait for I/O time goes up, or the total data transformation rate goes down, then there is
probably a hardware or software bottleneck. If the wait for I/O time goes up a significant
amount, then check the system for hardware bottlenecks. Otherwise, check the database
configuration.
Tune databases and system. Make sure that your databases are tuned properly for parallel
ETL and that your system has no bottlenecks.
663
You can also consider adding partitions to increase the speed of your query. Each database
provides an option to separate the data into different tablespaces. If your database allows it,
you can use the SQL override feature to provide a query that extracts data from a single
partition.
To maximize a single-sorted query on your database, you need to look at options that enable
parallelization. There are many options in each database that may increase the speed of your
query.
Here are some configuration options to look for in your source database:
Check for configuration parameters that perform automatic tuning. For example, Oracle
has a parameter called parallel_automatic_tuning.
Make sure intra-parallelism (the ability to run multiple threads on a single query) is
enabled. For example, on Oracle you should look at parallel_adaptive_multi_user. On
DB2, you should look at intra_parallel.
Check the maximum number of parallel processes that are available for parallel executions. For example, on Oracle, you should look at parallel_max_servers. On DB2, you should look at max_agents.
Check the size of the various resources used in parallelization. For example, Oracle has parameters such as large_pool_size, shared_pool_size, hash_area_size, parallel_execution_message_size, and optimizer_percent_parallel. DB2 has configuration parameters such as dft_fetch_size, fcm_num_buffers, and sort_heap.
Turn off options that may affect your database scalability. For example, disable archive
logging and timed statistics on Oracle.
Note: The above examples are not a comprehensive list of all the tuning options available to
you on the databases. Check your individual database documentation for all performance
tuning configuration parameters available.
664
Look for a configuration option that needs to be set explicitly to enable parallel inserts.
For example, Oracle has db_writer_processes, and DB2 has max_agents (some databases
may have this enabled by default).
Consider partitioning your target table. If it is possible, try to have each partition write to
a single database partition. You can use the Router transformation to do this. Also, look
into having the database partitions on separate disks to prevent I/O contention among the
pipeline partitions.
Turn off options that may affect your database scalability. For example, disable archive
logging and timed statistics on Oracle.
665
666
Appendix A
Session Properties
Reference
This appendix contains a listing of settings in the session properties. These settings are
grouped by the following tabs:
667
General Tab
By default, the General tab appears when you edit a session task.
Figure A-1 displays the General tab:
Figure A-1. General Tab
On the General tab you can rename the session task and enter a description for the session
task.
Table A-1 describes settings on the General tab:
Table A-1. General Tab
668
Options
Required/Optional
Description
Rename
Optional
The Rename button allows you to enter a new name for the session task.
Description
Optional
You can enter a description for the session task in the Description field.
Mapping name
Required
Name of the mapping associated with the session task.
Server
Required
The PowerCenter Server that runs the session.
Fail Parent If This Task Does Not Run
Optional
Fails the parent worklet or workflow if this task does not run.
Treat the Input Links as AND or OR
Required
Runs the task when all or one of the input link conditions evaluate to True.
669
Properties Tab
On the Properties tab you can configure the following settings:
General Options. General Options settings allow you to configure session log file name,
session log file directory, parameter filename and other general session settings. For more
information, see General Options Settings on page 670.
Performance. The Performance settings allow you to increase memory size, collect
performance details, and set configuration parameters. For more information, see
Performance Settings on page 673.
670
Table A-2 describes the General Options settings on the Properties tab:
Table A-2. Properties Tab - General Options Settings
General Options Settings
Required/Optional
Description
Session Log File Name
Optional
By default, the PowerCenter Server uses the session name for the log file name: s_mapping name.log. For a debug session, it uses DebugSession_mapping name.log.
Optionally enter a file name, a file name and directory, or use the $PMSessionLogFile session parameter. The PowerCenter Server appends information in this field to that entered in the Session Log File Directory field. For example, if you have C:\session_logs\ in the Session Log File Directory field and enter logname.txt in the Session Log File Name field, the PowerCenter Server writes logname.txt to the C:\session_logs\ directory.
You can also use the $PMSessionLogFile session parameter to represent the name of the session log or the name and location of the session log. For details on session parameters, see Session Parameters on page 495.
Session Log File Directory
Required
Designates a location for the session log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMSessionLogFileDir.
If you enter a full directory and file name in the Session Log File Name field, clear this field.
Parameter File
Name
Optional
Designates the name and directory for the parameter file. Use the parameter
file to define session parameters. You can also use it to override values of
mapping parameters and variables. For details on session parameters, see
Session Parameters on page 495. For details on mapping parameters and
variables, see Mapping Parameters and Variables in the Designer Guide.
Enable Test Load
Optional
You can configure the PowerCenter Server to perform a test load. With a test load, the PowerCenter Server reads and transforms data without writing to targets.
Number of Rows to Test
Optional
Enter the number of source rows you want the PowerCenter Server to test load.
The PowerCenter Server reads the exact number you configure for the test
load. You cannot perform a test load when you run a session against a
mapping that contains XML sources.
671
672
General Options
Settings
Required/
Optional
$Source
Connection Value
Optional
Enter the database connection you want the PowerCenter Server to use for the
$Source variable. Choose a relational or application database connection. You
can also choose a $DBConnection parameter.
You can use the $Source variable in Lookup and Stored Procedure
transformations to specify the database location for the lookup table or stored
procedure.
If you use $Source in a mapping, you can specify the database location in this
field to ensure the PowerCenter Server uses the correct database connection
to run the session.
If you use $Source in a mapping, but do not specify a database connection in
this field, the PowerCenter Server determines which database connection to
use when it runs the session. If it cannot determine the database connection, it
fails the session. For more information, see Lookup Transformation and
Stored Procedure Transformation in the Transformation Guide.
$Target Connection
Value
Optional
Enter the database connection you want the PowerCenter Server to use for the
$Target variable. Choose a relational or application database connection. You
can also choose a $DBConnection parameter.
You can use the $Target variable in Lookup and Stored Procedure
transformations to specify the database location for the lookup table or stored
procedure.
If you use $Target in a mapping, you can specify the database location in this
field to ensure the PowerCenter Server uses the correct database connection
to run the session.
If you use $Target in a mapping, but do not specify a database connection in
this field, the PowerCenter Server determines which database connection to
use when it runs the session. If it cannot determine the database connection, it
fails the session. For more information, see Lookup Transformation and
Stored Procedure Transformation in the Transformation Guide.
Treat Source Rows As
Required
Indicates how the PowerCenter Server treats all source rows. If the mapping
for the session contains an Update Strategy transformation or a Custom
transformation configured to set the update strategy, the default option is Data
Driven.
When you select Data Driven and you load to either a Microsoft SQL Server or
Oracle database, you must use a normal load. If you bulk load, the
PowerCenter Server fails the session.
Commit Type
Required
Determines whether the PowerCenter Server uses a source-based, target-based, or user-defined commit.
Commit Interval
Required
In conjunction with the selected commit interval type, indicates the number of
rows. By default, the PowerCenter Server uses a commit interval of 10,000
rows.
This option is not available for user-defined commit.
Commit On End Of
File
Required
Rollback
Transactions on
Errors
Optional
For source-based commit, the PowerCenter Server rolls back the transaction at
the next commit point when it encounters a non-fatal writer error.
For user-defined commit, the PowerCenter Server rolls back the transaction at
the next commit point when it encounters a non-fatal error.
This option is not available for target-based commit.
*Tip: When you bulk load to Microsoft SQL Server or Oracle targets, define a large commit interval. Microsoft SQL
Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the
number of bulk load transactions and increases performance.
Performance Settings
You can configure performance settings on the Properties tab. In Performance settings you
can increase memory size, collect performance details, and set configuration parameters.
Figure A-3 displays the Performance settings on the Properties tab:
Figure A-3. Properties Tab - Performance Settings
673
674
Performance
Settings
Required/
Optional
DTM Buffer Size
Required
The amount of memory allocated to the session from the DTM process. By
default, the Workflow Manager allocates 12 MB for DTM buffer memory. If a
session contains large amounts of character data and you configure it to run in
Unicode mode, increase the DTM Buffer size to 24 MB.
Note: If a source contains a large binary object with a precision larger than the
allocated DTM buffer size, then increase the DTM buffer size to increase the
buffer memory. If you do not increase the DTM buffer memory, the session will
fail.
For information on improving session performance, see Performance Tuning
on page 635.
Collect
Performance Data
Optional
Incremental
Aggregation
Optional
Reinitialize
Aggregate Cache
Optional
Enable High
Precision
Optional
Session Retry On
Deadlock
Optional
Select this option if you want the PowerCenter Server to retry target writes on
deadlock. You can only use Session Retry on Deadlock for sessions configured
for normal load. This option is disabled for bulk mode. You can configure the
PowerCenter Server to set the number of deadlock retries and the deadlock
sleep time period.
Session Sort Order
Required
Specify a sort order for the session. The session properties display all sort
orders associated with the PowerCenter Server code page. When the
PowerCenter Server runs in Unicode mode, it sorts character data in the
session using the selected sort order. When the PowerCenter Server runs in
ASCII mode, it ignores this setting and uses a binary sort order to sort
character data.
Log Options. Log options allow you to configure how you want to save the session log. By
default, the PowerCenter Server saves only the current session log. For more information,
see Log Options Settings on page 677.
Error Handling. Error Handling settings allow you to determine if the session fails or
continues when it encounters pre-session command errors, stored procedure errors, or a
specified number of session errors. For more information see, Error Handling Settings
on page 678.
Advanced Settings
Advanced settings allow you to configure constraint-based loading, lookup caches, and buffer
sizes.
675
Figure A-4 displays the Advanced settings on the Config Object tab:
Figure A-4. Config Object Tab - Advanced Settings
Table A-4 describes the Advanced settings of the Config Object tab:
Table A-4. Config Object Tab - Advanced Settings
676
Advanced
Settings
Required/
Optional
Constraint Based
Load Ordering
Optional
Cache Lookup()
Function
Optional
Default Buffer
Block Size
Optional
Line Sequential
Buffer Length
Optional
Affects the way the PowerCenter Server reads flat files. Increase this setting
from the default of 1024 bytes per line only if source flat file records are larger
than 1024 bytes.
677
Table A-5 displays the Log Options settings of the Config Object tab:
Table A-5. Config Object Tab - Log Options Settings
Log Options Settings
Required/
Optional
Save Session Log By
Required
Configures whether the PowerCenter Server saves session logs by the number of session runs or by timestamp.
Save Session Log for These Runs
Required
The number of historical session logs you want the PowerCenter Server
to save.
The PowerCenter Server saves the number of historical logs you specify, plus the
most recent session log. Therefore, if you specify 5 runs, the
PowerCenter Server saves the most recent session log, plus historical
logs 0-4, for a total of 6 logs.
You can specify up to 2,147,483,647 historical logs. If you specify 0 logs,
the PowerCenter Server saves only the most recent session log.
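The count works out as follows (a minimal sketch with a hypothetical helper):

```python
def total_logs_saved(historical_runs):
    # The PowerCenter Server keeps the most recent session log
    # plus the specified number of historical logs.
    return historical_runs + 1

assert total_logs_saved(5) == 6   # most recent log plus historical logs 0-4
assert total_logs_saved(0) == 1   # only the most recent session log
```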
678
Figure A-6 displays the Error Handling settings on the Config Object tab:
Figure A-6. Config Object Tab - Error Handling Settings
Table A-6 describes the Error handling settings of the Config Object tab:
Table A-6. Config Object Tab - Error Handling Settings
Error Handling
Settings
Required/
Optional
Stop On Errors
Optional
Override Tracing
Optional
679
680
Error Handling
Settings
Required/
Optional
On Stored Procedure
Error
Optional
On Pre-Session
Command Task Error
Optional
Optional
Enable Recovery
Optional
Required
Specifies the type of error log to create. You can specify relational, file, or
no log. By default, the Error Log Type is set to none.
Optional
Optional
Specifies table name prefix for a relational error log. Oracle and Sybase
have a 30 character limit for table names. If a table name exceeds 30
characters, the session fails.
Optional
Specifies the directory where errors are logged. By default, the error log
file directory is $PMBadFilesDir\.
Optional
Specifies error log file name. By default, the error log file name is
PMError.log.
Optional
Specifies whether or not to log row data. By default, the check box is clear
and row data is not logged.
Optional
Specifies whether or not to log source row data. By default, the check box
is clear and source row data is not logged.
Optional
Delimiter for string type source row data and transformation group row
data. By default, the PowerCenter Server uses a pipe ( | ) delimiter. Verify
that you do not use the same delimiter for the row data as the error
logging columns. If you use the same delimiter, you may find it difficult to
read the error log file.
Connections
Sources
Targets
Transformations
Connections Node
The Connections node displays the source, target, lookup, stored procedure, FTP, external
loader, and queue connections. You can choose connection types and connection values. You
can also edit connection object values.
Figure A-7 displays the Connections settings on the Mapping tab:
Figure A-7. Mapping Tab - Connections Settings
681
682
Connections
Node Settings
Required/
Optional
Type
Required
Description
Enter the connection type for relational and non-relational sources and targets.
Specifies Relational for relational sources and targets.
You can choose the following connection types for flat file, XML, and MQSeries sources/targets:
- Queue. Select this connection type to access a MQSeries source if you are
using MQ Source Qualifiers. For static MQSeries targets, set the connection
type to FTP or Queue. For dynamic MQSeries targets, the connection type is
set to Queue. MQSeries connections must be defined in the Workflow
Manager prior to configuring sessions. For more information, see the
PowerCenter Connect for IBM MQSeries User and Administrator Guide .
- Loader. Select this connection type to use the External Loader to load output
files to Teradata, Oracle, DB2, or Sybase IQ databases. If you select this
option, select a configured loader connection in the Value column.
To use this option, you must use a mapping with a relational target definition
and choose File as the writer type on the Writers tab for the relational target
instance. As the PowerCenter Server completes the session, it uses an
external loader to load target files to the Oracle, Sybase IQ, DB2, or Teradata
database. You cannot choose external loader for flat file or XML target
definitions in the mapping.
Note to Oracle 8 users: If you configure a session to write to an Oracle 8
external loader target table in bulk mode with NOT NULL constraints on any
columns, the session may write the null character into a NOT NULL column if
the mapping generates a NULL output.
For details on using the external loader feature, see External Loading on
page 523.
- FTP. Select this connection type to use FTP to access the source/target
directory for flat file and XML sources/targets. If you select this option, select
a configured FTP connection in the Value column. FTP connections must be
defined in the Workflow Manager prior to configuring sessions. For details on
using FTP, see Using FTP on page 559.
- None. Choose None when you want to read from a local flat file or XML file, or
if you are using an associated source for a MQSeries session.
The Type column also lists the connections in the mapping, such as the $Source connection value and $Target connection value.
You can also configure connection information for Lookups and Stored
Procedures.
Partitions
N/A
Value
Required
Enter a source and target connection based on the value you choose in the
Type column. You can also specify the $Source and $Target connection value:
- $Source connection value. Enter the database connection you want the
PowerCenter Server to use for the $Source variable. Choose a relational or
application database connection. You can also choose a $DBConnection
parameter. You can use the $Source variable in Lookup and Stored
Procedure transformations to specify the database location for the lookup
table or stored procedure. If you use $Source in a mapping, you can specify
the database location in this field to ensure the PowerCenter Server uses the
correct database connection to run the session. If you use $Source in a
mapping, but do not specify a database connection in this field, the
PowerCenter Server determines which database connection to use when it
runs the session. If it cannot determine the database connection, it fails the
session. For more information, see the Transformation Guide.
- $Target connection value. Enter the database connection you want the
PowerCenter Server to use for the $Target variable. Choose a relational or
application database connection. You can also choose a $DBConnection
parameter. You can use the $Target variable in Lookup and Stored Procedure
transformations to specify the database location for the lookup table or stored
procedure. If you use $Target in a mapping, you can specify the database
location in this field to ensure the PowerCenter Server uses the correct
database connection to run the session. If you use $Target in a mapping, but
do not specify a database connection in this field, the PowerCenter Server
determines which database connection to use when it runs the session. If it
cannot determine the database connection, it fails the session. For more
information, see the Transformation Guide.
You can also specify the lookup and stored procedure location information
value, if your mapping has lookups or stored procedures.
Sources Node
The Sources node lists the sources used in the session and displays their settings. If you want
to view and configure the settings of a specific source, select the source from the list.
You can configure the following settings:
Readers. The Readers settings displays the reader the PowerCenter Server uses with each
source instance. For more information, see Readers Settings on page 684.
Connections. The Connections settings allows you to configure connections for the
sources. For more information, see Connections Settings on page 684.
Properties. The Properties settings allows you to configure the source properties. For more
information, see Properties Settings on page 686.
683
Readers Settings
You can view the reader the PowerCenter Server uses with each source instance. The
Workflow Manager specifies the necessary reader for each source instance. For relational sources, the reader is Relational Reader, and for file sources, it is File Reader.
Figure A-8 displays the Readers settings on the Mapping tab (Sources node):
Figure A-8. Mapping Tab - Sources Node - Readers Settings
Connections Settings
You can configure the connections the PowerCenter Server uses with each source instance.
684
Figure A-9 displays the Connections settings on the Mapping tab (Sources node):
Figure A-9. Mapping Tab - Sources Node - Connections Settings
Table A-8 describes the Connections settings on the Mapping tab (Sources node):
Table A-8. Mapping Tab - Sources Node - Connections Settings
Type (Required). Enter the connection type for relational and non-relational sources.
Specifies Relational for relational sources.
You can choose the following connection types for flat file, XML, and MQSeries sources:
- Queue. Select this connection type to access an MQSeries source if you are using MQ
Source Qualifiers. MQSeries connections must be defined in the Workflow Manager prior to
configuring sessions. For more information, see the PowerCenter Connect for IBM MQSeries
User and Administrator Guide.
- FTP. Select this connection type to use FTP to access the source directory for flat file and
XML sources. If you want to extract data from a flat file or XML source using FTP, you
must specify an FTP connection when you configure source options. If you select this
option, select a configured FTP connection in the Value column. FTP connections must be
defined in the Workflow Manager prior to configuring sessions. For details on using FTP,
see Using FTP on page 559.
- None. Choose None when you want to read from a local flat file or XML file, or if you are
using an associated source for an MQSeries session.
Value (Required). Enter a source connection based on the value you choose in the Type
column.
685
Properties Settings
Click the Properties settings to define source property information. The Workflow Manager
displays properties for both relational and file sources.
Figure A-10 displays the Properties settings on the Mapping tab (Sources node):
Figure A-10. Mapping Tab - Sources Node - Properties Settings
Table A-9 describes Properties settings on the Mapping tab for relational sources:
Table A-9. Mapping Tab - Sources Node - Properties Settings (Relational Sources)
686
Relational Source Options (Required/Optional):
- Owner Name (Optional)
- Tracing Level (N/A)
- Select Distinct (Optional)
- Pre SQL (Optional)
- Post SQL (Optional)
- Sql Query (Optional)
- Source Filter (Optional)
Table A-10 describes the Properties settings on the Mapping tab for file sources:
Table A-10. Mapping Tab - Sources Node - Properties Settings (File Sources)
File Source
Options
Required/
Optional
Source File
Directory
Optional
Enter the directory name in this field. By default, the PowerCenter Server looks
in the server variable directory, $PMSourceFileDir, for file sources.
If you specify both the directory and file name in the Source Filename field,
clear this field. The PowerCenter Server concatenates this field with the Source
Filename field when it runs the session.
You can also use the $InputFileName session parameter to specify the file
directory.
For details on session parameters, see Session Parameters on page 495.
Source Filename
Required
Enter the file name, or file name and path. Optionally use the $InputFileName
session parameter for the file name.
The PowerCenter Server concatenates this field with the Source File Directory
field when it runs the session. For example, if you have C:\data\ in the Source
File Directory field, then enter filename.dat in the Source Filename field.
When the PowerCenter Server begins the session, it looks for
C:\data\filename.dat.
By default, the Workflow Manager enters the file name configured in the source
definition.
For details on session parameters, see Session Parameters on page 495.
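The directory and file name concatenation described above can be modeled roughly as follows. A simplified sketch: the helper name and parameter-expansion details are assumptions, since PowerCenter resolves session parameters from the parameter file.

```python
# Rough sketch of how Source File Directory and Source Filename combine
# (a simplified model; the helper name and parameter handling are assumptions).

def resolve_source_path(directory, filename, session_parameters=None):
    """Expand session parameters such as $InputFileName, then concatenate
    the directory field with the filename field, exactly as entered."""
    session_parameters = session_parameters or {}
    directory = session_parameters.get(directory, directory)
    filename = session_parameters.get(filename, filename)
    return directory + filename   # plain concatenation; no separator is added

# C:\data\ in the directory field plus filename.dat in the filename field:
print(resolve_source_path("C:\\data\\", "filename.dat"))   # C:\data\filename.dat
# using a $InputFileName session parameter for the file name:
print(resolve_source_path("C:\\data\\", "$InputFileName1",
                          {"$InputFileName1": "orders.dat"}))
```

Because no separator is added, the directory value must end in a slash, as in the example from the text.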
687
Source Filetype (Required).
Set File Properties (Optional). Allows you to configure the file properties. For more
information, see Setting File Properties for Sources on page 688.
Datetime Format* (N/A).
Thousand Separator* (N/A).
Decimal Separator* (N/A).
*You can view the value of this attribute when you click Show all properties. This attribute is
read-only. For more information, see the Designer Guide.
Select the file type (fixed-width or delimited) you want to configure and click Advanced.
688
Note: Edit these settings only if you need to override those configured in the source definition.
Figure A-12 displays the Fixed Width Properties dialog box for flat file sources:
Figure A-12. Fixed Width Properties
Table A-11 describes the options you define in the Fixed Width Properties dialog box for
sources:
Table A-11. Fixed-Width Properties for File Sources
Null Character (Required). Indicates the character representing a null value in the file. This
can be any valid character in the file code page, or any binary value from 0 to 255. For more
information about specifying null characters, see Null Character Handling on page 227.
Repeat Null Character (Optional).
Code Page (Required). Select the code page of the fixed-width file. The default setting is the
client code page.
Number of Initial Rows to Skip (Optional). The PowerCenter Server skips the specified
number of rows before reading the file. Use this to skip header rows. One line may contain
multiple rows. If you select the Line Sequential File Format option, the PowerCenter Server
ignores this option. You can enter any integer from zero to 2147483647.
689
Number of Bytes to Skip Between Records (Optional).
Strip Trailing Blanks (Optional). If selected, the PowerCenter Server strips trailing blank
spaces from records before passing them to the Source Qualifier transformation.
Line Sequential File Format (Optional). Select this option if the file uses a carriage return at
the end of each record, shortening the final column.
Figure A-13 displays the Delimited File Properties dialog box for flat file sources:
Figure A-13. Delimited Properties for File Sources
690
Table A-12 describes the options you can define in the Delimited File Properties dialog box
for flat file sources:
Table A-12. Delimited Properties for File Sources
Delimited File
Properties Options
Required/
Optional
Delimiters (Required). Character used to separate columns of data in the source file. Use the
Browse button to the right of this field to enter a different delimiter. Delimiters can be either
printable or single-byte unprintable characters, and must be different from the escape
character and the quote character (if selected). You cannot select unprintable multibyte
characters as delimiters. The delimiter must be in the same code page as the flat file code
page.
Optional Quotes (Required).
Code Page (Required). Select the code page of the delimited file. The default setting is the
client code page.
Escape Character (Optional).
Remove Escape Character From Data (Optional). This option is selected by default. Clear
this option to include the escape character in the output string.
691
Treat Consecutive Delimiters as One (Optional).
Number of Initial Rows to Skip (Optional). The PowerCenter Server skips the specified
number of rows before reading the file. Use this to skip title or header rows in the file.
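A minimal model of the delimiter and escape-character options above. This is a sketch only; real PowerCenter parsing also involves code pages, optional quotes, and the multibyte rules the table describes.

```python
def split_delimited(line, delimiter=",", escape="\\", remove_escape=True):
    """Split one delimited record, honoring an escape character.
    Simplified sketch of the options above, not the real implementation."""
    fields, current, i = [], "", 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            if not remove_escape:
                current += escape        # keep the escape character in the output
            current += line[i + 1]       # next character is taken literally
            i += 2
        elif ch == delimiter:
            fields.append(current)
            current = ""
            i += 1
        else:
            current += ch
            i += 1
    fields.append(current)
    return fields

print(split_delimited("a\\,b,c"))                       # ['a,b', 'c']
print(split_delimited("a\\,b,c", remove_escape=False))  # ['a\\,b', 'c']
```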
Targets Node
The Targets node lists the targets used in the session and displays their settings. If you want
to view and configure the settings of a specific target, select the target from the list.
You can configure the following settings:
Writers. The Writers settings display the writer the PowerCenter Server uses with each
target instance. For more information, see Writers Settings on page 692.
Connections. The Connections settings allow you to configure connections for the
targets. For more information, see Connections Settings on page 693.
Properties. The Properties settings allow you to configure the target properties. For more
information, see Properties Settings on page 695.
Writers Settings
You can view and configure the writer the PowerCenter Server uses with each target instance.
The Workflow Manager specifies the necessary writer for each target instance. For relational
targets the writer is Relational Writer and for file targets it is File Writer.
692
Figure A-14 displays the Writers settings on the Mapping tab (Targets node):
Figure A-14. Mapping Tab - Targets Node - Writers Settings
Table A-13 describes the Writers settings on the Mapping tab (Targets node):
Table A-13. Mapping Tab - Targets Node - Writers Settings
Writers (Required).
For relational targets, choose Relational Writer or File Writer. When the target in the
mapping is a flat file, an XML file, a SAP BW target, or MQ target, the Workflow
Manager specifies the necessary writer in the session properties.
When you choose File Writer for a relational target you can use an external loader
to load data to this target. For more information, see External Loading on
page 523.
When you override a relational target to use the file writer, the Workflow Manager
changes the properties for that target instance on the Properties settings. It also
changes the connection options you can define on the Connections settings.
After you override a relational target to use a file writer, define the file properties for
the target. Click Set File Properties and choose the target to define. For more
information, see Configuring Fixed-Width Properties on page 265 and Configuring
Delimited Properties on page 266.
Connections Settings
You can enter connection types and specific target database connections on the Targets node
of the Mappings tab.
Mapping Tab (Transformations View)
693
Figure A-15 displays the Connections settings on the Mapping tab (Targets node):
Figure A-15. Mapping Tab - Targets Node - Connections Settings
694
Table A-14 describes the Connections settings on the Mapping tab (Targets node):
Table A-14. Mapping Tab - Targets Node - Connections Settings
Type (Required).
Enter the connection type for non-relational targets. Specifies Relational for
relational targets.
You can choose the following connection types for flat file, XML, and MQ
targets:
- FTP. Select this connection type to use FTP to access the target directory for
flat file and XML targets. If you want to load data to a flat file or XML target
using FTP, you must specify an FTP connection when you configure target
options. If you select this option, select a configured FTP connection in the
Value column. FTP connections must be defined in the Workflow Manager
prior to configuring sessions. For details on using FTP, see Using FTP on
page 559.
- External Loader. Select this connection type to use the External Loader to
load output files to Teradata, Oracle, DB2, or Sybase IQ databases. If you
select this option, select a configured loader connection in the Value column.
To use this option, you must use a mapping with a relational target definition
and choose File as the writer type on the Writers tab for the relational target
instance. As the PowerCenter Server completes the session, it uses an
external loader to load target files to the Oracle, Sybase IQ, DB2, or Teradata
database. You cannot choose external loader for flat file or XML target
definitions in the mapping.
Note to Oracle 8 users: If you configure a session to write to an Oracle 8
external loader target table in bulk mode with NOT NULL constraints on any
columns, the session may write the null character into a NOT NULL column if
the mapping generates a NULL output.
For details on using the external loader feature, see External Loading on
page 523.
- Queue. Choose Queue when you want to output to an MQSeries message
queue. If you select this option, select a configured MQ connection in the
Value column. For more information, see the PowerCenter Connect for IBM
MQSeries User and Administrator Guide.
- None. Choose None when you want to write to a local flat file or XML file.
Partitions (N/A).
Value (Required). Enter a target connection based on the value you choose in the Type
column.
Properties Settings
Click the Properties settings to define target property information. The Workflow Manager
displays different properties for the different target types: relational, flat file, and XML.
695
Figure A-16 displays the Properties settings on the Mapping tab for relational targets:
Figure A-16. Mapping Tab - Targets Node - Properties Settings (Relational)
696
Table A-15 describes the Properties settings on the Mapping tab for relational targets:
Table A-15. Mapping Tab - Targets Node - Properties Settings (Relational)
Insert (Optional). If selected, the PowerCenter Server inserts all rows flagged for insert. By
default, this option is selected. For details on target update strategies, see Update Strategy
Transformation in the Transformation Guide.
Update (as Update) (Optional). If selected, the PowerCenter Server updates all rows flagged
for update. By default, this option is selected. For details on target update strategies, see
Update Strategy Transformation in the Transformation Guide.
Update (as Insert) (Optional). If selected, the PowerCenter Server inserts all rows flagged
for update. By default, this option is not selected. For details on target update strategies, see
Update Strategy Transformation in the Transformation Guide.
Update (else Insert) (Optional).
Delete (Optional). If selected, the PowerCenter Server deletes all rows flagged for delete.
For details on target update strategies, see Update Strategy Transformation in the
Transformation Guide.
Truncate Table (Optional). If selected, the PowerCenter Server truncates the target before
loading. For details on this feature, see Truncating Target Tables on page 245.
697
Reject File Directory (Optional). Enter the directory name in this field. By default, the
PowerCenter Server writes all reject files to the server variable directory, $PMBadFileDir.
If you specify both the directory and file name in the Reject Filename field, clear this field.
The PowerCenter Server concatenates this field with the Reject Filename field when it runs
the session.
You can also use the $BadFileName session parameter to specify the file directory.
For details on session parameters, see Session Parameters on page 495.
Reject Filename (Required). Enter the file name, or file name and path. By default, the
PowerCenter Server names the reject file after the target instance name: target_name.bad.
Optionally use the $BadFileName session parameter for the file name.
The PowerCenter Server concatenates this field with the Reject File Directory field when it
runs the session. For example, if you have C:\reject_file\ in the Reject File Directory field,
and enter filename.bad in the Reject Filename field, the PowerCenter Server writes rejected
rows to C:\reject_file\filename.bad.
For details on session parameters, see Session Parameters on page 495.
Rejected Truncated/Overflowed rows* (Optional).
Update Override* (Optional).
Pre SQL (Optional).
Post SQL (Optional).
*You can view the value of this attribute when you click Show all properties. This attribute is
read-only. For more information, see the Designer Guide.
698
Table A-16 describes the Properties settings on the Mapping tab for file targets:
Table A-16. Mapping Tab - Targets Node - File Properties Settings
Target Property
Required/
Optional
Description
Merge Partitioned
Files
Optional
When selected, the PowerCenter Server merges the partitioned target files into
one file when the session completes, and then deletes the individual output
files. If the PowerCenter Server fails to create the merged file, it does not
delete the individual output files.
You cannot merge files if the session uses FTP, an external loader, or a
message queue.
For details on configuring a session for partitioning, see Pipeline Partitioning
on page 345.
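The merge-then-delete behavior described above can be sketched as follows. This is an illustration of the described semantics, not the real implementation; the function name is an assumption.

```python
# Sketch of the merge behavior described above: concatenate the partition
# output files into one merged file, then delete the parts. If the merged
# file cannot be created, the individual output files are left in place.
import os

def merge_partition_files(part_files, merged_path):
    try:
        with open(merged_path, "wb") as merged:
            for part in part_files:
                with open(part, "rb") as f:
                    merged.write(f.read())
    except OSError:
        return False              # merged file not created: keep individual files
    for part in part_files:
        os.remove(part)           # merge succeeded: delete individual output files
    return True
```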
Merge File
Directory
Optional
Enter the directory name in this field. By default, the PowerCenter Server
writes the merged file in the server variable directory, $PMTargetFileDir.
If you enter a full directory and file name in the Merge File Name field, clear
this field.
Merge File Name (Optional).
699
700
Output File
Directory
Optional
Enter the directory name in this field. By default, the PowerCenter Server
writes output files in the server variable directory, $PMTargetFileDir.
If you specify both the directory and file name in the Output Filename field,
clear this field. The PowerCenter Server concatenates this field with the Output
Filename field when it runs the session.
You can also use the $OutputFileName session parameter to specify the file
directory.
For details on session parameters, see Session Parameters on page 495.
Output Filename
Required
Enter the file name, or file name and path. By default, the Workflow Manager
names the target file based on the target definition used in the mapping:
target_name.out.
If the target definition contains a slash character, the Workflow Manager
replaces the slash character with an underscore.
When you use an external loader to load to an Oracle database, you must
specify a file extension. If you do not specify a file extension, the Oracle loader
cannot find the flat file and the PowerCenter Server fails the session. For more
information about external loading, see Loading to Oracle on page 533.
Optionally use the $OutputFileName session parameter for the file name.
The PowerCenter Server concatenates this field with the Output File Directory
field when it runs the session.
For details on session parameters, see Session Parameters on page 495.
Note: If you specify an absolute path file name when using FTP, the
PowerCenter Server ignores the Default Remote Directory specified in the FTP
connection. When you specify an absolute path file name, do not use single or
double quotes.
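The default naming rule above can be sketched in a few lines: target_name.out, with any slash in the target definition name replaced by an underscore. Backslash handling here is an assumption; the text mentions only "a slash character".

```python
# Sketch of the default Output Filename rule described above
# (illustrative; the backslash case is an assumption).

def default_output_filename(target_name):
    return target_name.replace("/", "_").replace("\\", "_") + ".out"

print(default_output_filename("t_orders"))     # t_orders.out
print(default_output_filename("region/east"))  # region_east.out
```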
Reject File
Directory
Optional
Enter the directory name in this field. By default, the PowerCenter Server
writes all reject files to the server variable directory, $PMBadFileDir.
If you specify both the directory and file name in the Reject Filename field,
clear this field. The PowerCenter Server concatenates this field with the Reject
Filename field when it runs the session.
You can also use the $BadFileName session parameter to specify the file
directory.
For details on session parameters, see Session Parameters on page 495.
Reject Filename
Required
Enter the file name, or file name and path. By default, the PowerCenter Server
names the reject file after the target instance name: target_name.bad.
Optionally use the $BadFileName session parameter for the file name.
The PowerCenter Server concatenates this field with the Reject File Directory
field when it runs the session. For example, if you have C:\reject_file\ in the
Reject File Directory field, and enter filename.bad in the Reject Filename
field, the PowerCenter Server writes rejected rows to
C:\reject_file\filename.bad.
For details on session parameters, see Session Parameters on page 495.
Set File Properties (Optional).
Allows you to configure the file properties. For more information, see Setting
File Properties for Targets on page 701.
Datetime Format* (N/A).
Thousand Separator* (N/A).
Decimal Separator* (N/A).
*You can view the value of this attribute when you click Show all properties. This attribute is
read-only. For more information, see the Designer Guide.
Select the file type (fixed-width or delimited) you want to configure and click Advanced.
701
Figure A-19 displays the Fixed-Width Properties dialog box for flat file targets:
Figure A-19. Fixed-Width Properties for File Targets
Table A-17 describes the options you define in the Fixed Width Properties dialog box:
Table A-17. Fixed-Width Properties for File Targets
Fixed-Width
Properties Options
Required/
Optional
Null Character
Required
Enter the character you want the PowerCenter Server to use to represent
null values. You can enter any valid character in the file code page.
For more information about specifying null characters for target files, see
Null Characters in Fixed-Width Files on page 272.
Repeat Null Character (Optional).
Select this option to indicate a null value by repeating the null character to
fill the field. If you do not select this option, the PowerCenter Server enters
a single null character at the beginning of the field to represent a null
value. For more information about specifying null characters for target
files, see Null Characters in Fixed-Width Files on page 272.
Code Page
Required
Select the code page of the fixed-width file. The default setting is the client
code page.
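The two null-representation choices described for fixed-width target fields can be sketched as follows. The '*' null character and space padding are illustrative assumptions.

```python
# Sketch of the Repeat Null Character option above: either repeat the null
# character to fill the field, or write a single null character at the
# beginning of the field. (Null character and padding are assumptions.)

def render_null(field_width, null_char="*", repeat=False):
    if repeat:
        return null_char * field_width
    return null_char + " " * (field_width - 1)

print(render_null(6, repeat=True))   # ******
print(repr(render_null(6)))          # '*     '
```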
702
Table A-18 describes the options you can define in the Delimited File Properties dialog box
for flat file targets:
Table A-18. Delimited Properties for File Targets
Edit Delimiter
Options
Required/
Optional
Delimiters
Required
Character used to separate columns of data. Use the Browse button to the right
of this field to enter a non-printable delimiter. Delimiters can be either printable
or single-byte unprintable characters, and must be different from the escape
character and the quote character (if selected). You cannot select unprintable
multibyte characters as delimiters.
Optional Quotes
Required
Code Page
Required
Select the code page of the delimited file. The default setting is the client code
page.
Description
Transformations Node
On the Transformations node, you can override properties that you configure in
transformation and target instances in a mapping. The attributes you can configure depend
on the type of transformation you select.
703
704
Partition Properties. For more information, see Partition Properties Node on page 705.
Partition Points. For more information, see Partition Points Node on page 706.
705
KeyRange Node
In the KeyRange node, you can configure the partition range for key-range partitioning.
Select Edit Keys to edit the partition key. For more information, see Edit Partition Key on
page 708.
Figure A-23 displays the KeyRange node on the Mapping tab:
Figure A-23. Mapping Tab - KeyRange Node
HashKeys Node
In the HashKeys node, you can configure hash key partitioning. Select Edit Keys to edit the
partition key. For more information, see Edit Partition Key on page 708.
706
For more information about partitioning a pipeline, see Pipeline Partitioning on page 345.
Figure A-24 displays the Partition Points node on the Mapping tab:
Figure A-24. Mapping Tab - Partition Points Node
Add Partition Point. Click to add a new partition point to the Transformation list. For
information on adding partition points, see Adding and Deleting Partition Points on
page 353.
Delete Partition
Point
Click to delete the current partition point. You cannot delete certain partition points. For details,
see Adding and Deleting Partition Points on page 353.
Edit Keys
Click to add, remove, or edit the key for key range or hash user keys partitioning. This button is
not available for auto-hash, round-robin, or pass-through partitioning.
For more information on adding keys and key ranges, see Adding Keys and Key Ranges on
page 358.
707
Table A-20 describes the options in the Edit Partition Point dialog box:
Table A-20. Edit Partition Point Dialog Box Options
Edit Partition Point
Options
Description
Add button
Click to add a partition. You can add up to 64 partitions. For more information on
adding partitions, see Adding and Deleting Partitions on page 356.
Delete button
Click to delete the selected partition. For more information on deleting partitions, see
Adding and Deleting Partitions on page 356.
Name
Partition number.
Description
Select a partition type from the list. For more information, see Specifying Partition
Types on page 356.
708
You can specify one or more ports as the partition key. To rearrange the order of the ports that
make up the key, select a port in the Selected Ports list and click the up or down arrow.
For information on adding a key for key range partitioning, see Key Range Partition Type
on page 363. For information on adding a key for hash partitioning, see Hash Keys Partition
Types on page 361.
709
Components Tab
In the Components tab, you can configure pre-session shell commands, post-session
commands, and email messages if the session succeeds or fails.
Figure A-27 displays the Components Tab:
Figure A-27. Components Tab
710
Task (n/a). Tasks you can perform in the Components tab. You can configure pre- or
post-session shell commands and success or failure email messages in the Components tab.
Type
Required
Select None if you do not want to configure commands and emails in the
Components tab.
For pre- and post-session commands, select Reusable to call an existing
reusable Command task as the pre- or post-session shell command. Select
Non-Reusable to create pre- or post-session shell commands for this session
task.
For success or failure emails, select Reusable to call an existing Email task as
the success or failure email. Select Non-Reusable to create email messages
for this session task.
Value (Optional).
Pre-Session
Command
Optional
Post-Session
Success Command
Optional
Shell commands that the PowerCenter Server performs after the session
completes successfully. For details on using pre-session shell commands, see
Using Pre- or Post-Session Shell Commands on page 188.
Post-Session
Failure Command
Optional
Shell commands that the PowerCenter Server performs after the session if the
session fails. For details on using pre-session shell commands, see Using
Pre- or Post-Session Shell Commands on page 188.
On Success Email
Optional
On Failure Email
Optional
The PowerCenter Server sends the On Failure email message if the session fails.
Description
Components Tab
711
Click the Override button to override the Run If Previous Completed option in the
Command task. For details on the Run If Previous Completed option, see Table A-24 on
page 714.
712
Table A-23 describes the General tab for editing pre- or post-session shell commands:
Table A-23. Pre- or Post-Session Commands - General Tab
General Tab for Pre- or Post-Session Commands
Required/
Optional
Description
Name
Required
Make Reusable
Required
Select Make Reusable to create a reusable Command task from the pre- or
post-session shell commands.
Clear the Make Reusable option if you do not want the Workflow Manager to
create a reusable Command task from the shell commands.
For details on creating Command tasks from pre- or post-session shell commands, see
Creating a Reusable Command Task from Pre- or Post-Session Commands on page 191.
Description
Optional
Components Tab
713
Table A-24 describes the Properties tab for editing pre- or post-session commands:
Table A-24. Pre- or Post-Session Commands - Properties Tab
Properties Tab for Pre- or Post-Session Commands
Required/
Optional
Description
Name
Required
Run If Previous
Completed
Required
Select this option if you want the PowerCenter Server to perform the next
command only if the previous command completed successfully.
Table A-25 describes the Commands tab for editing pre- or post-session commands:
Table A-25. Pre- or Post-Session Commands - Commands Tab
Commands Tab for Pre- or Post-Session Commands
Required/
Optional
Description
Name
Required
Command
Required
The shell command you want the PowerCenter Server to perform. Enter one
command for each line. You can use session parameters or server variables in
shell commands.
If your command contains spaces, enclose the command in quotes. For example, if you want
to call c:\program files\myprog.exe, you must enter "c:\program files\myprog.exe",
including the quotes.
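Why the quotes matter: without them, a path containing spaces splits into two tokens. A sketch using POSIX-style word splitting; forward slashes are used only to keep the example simple, and the path itself is illustrative.

```python
# Demonstrating the quoting rule above with POSIX-style word splitting.
import shlex

print(shlex.split("c:/program files/myprog.exe"))    # ['c:/program', 'files/myprog.exe']
print(shlex.split('"c:/program files/myprog.exe"'))  # ['c:/program files/myprog.exe']
```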
Reusable Email
Select Reusable in the Type field for the On-Success or On-Failure email if you want to select
an existing Email task as the On-Success or On-Failure email. The Email Object Browser
appears when you click the right side of the Values field.
714
Select an Email task to use as On-Success or On-Failure email. Click the Override button to
override properties of the email. For more information about email properties, see Table A-27
on page 717.
Non-Reusable Email
Select Non-Reusable in the Type field to create a non-reusable email for the session. Non-Reusable emails do not appear as Email tasks in the Task folder. Click the right side of the
Values field to edit the properties for the non-reusable On-Success or On-Failure emails. For
more information about email properties, see Table A-27 on page 717.
Email Properties
You configure email properties for On-Success or On-Failure Emails when you override an
existing Email task or when you create a non-reusable email for the session.
Components Tab
715
Figure A-31 displays the dialog box for editing the On-Success or On-Failure email
properties:
Figure A-31. On-Success or On-Failure Email - General Tab
Table A-26 describes general settings for editing On-Success or On-Failure emails:
Table A-26. On-Success or On-Failure Emails - General Tab
716
Email Settings
Required/
Optional
Description
Name
Required
Description
Required
Table A-27 describes the email properties for On-Success or On-Failure emails:
Table A-27. On-Success or On-Failure Emails - Properties Tab
Email Properties
Required/
Optional
Email User Name (Required).
Email subject
Optional
Email text
Optional
Enter the text of the email. You can use several variables when creating this
text to convey meaningful information, such as the session name and session
status. For details, see Sending Email on page 319.
Description
Components Tab
717
Metadata Extensions Tab
The Metadata Extensions tab allows you to create and promote metadata extensions. For
information on creating metadata extensions, see Metadata Extensions in the Repository
Guide.
Table A-28 describes the configuration options for the Metadata Extensions tab:
Table A-28. Metadata Extensions Tab
718
Metadata
Extensions Tab
Options
Required/
Optional
Extension Name
Required
Datatype
Required
Value
Optional
Precision
Required for
string and
XML objects
Reusable
Required
Select to make the metadata extension apply to all objects of this type
(reusable). Clear to make the metadata extension apply to this object only
(non-reusable).
Description (Optional).
719
720
Appendix B
Workflow Properties
Reference
This appendix contains a listing of settings in the workflow properties. These settings are
grouped by tab.
721
General Tab
You can change the workflow name and enter a comment for the workflow on the General
tab. By default, the General tab appears when you open the workflow properties.
Figure B-1 displays the General tab of the workflow properties:
Figure B-1. Workflow Properties - General Tab
Select a
PowerCenter Server
to run the workflow.
Select a suspension
email.
722
General Tab
Options
Required/
Optional
Description
Name
Required
Comments
Optional
Server
Required
Optional
Requires all workflow tasks to run on the PowerCenter Server that you
select.
Suspension Email
Optional
Select a reusable email task for the suspension email. When a task fails,
the PowerCenter Server suspends the workflow and sends the
suspension email.
For details on suspending workflows, see Suspending the Workflow on
page 127.
Disabled
Optional
Suspend On Error (Optional).
Web Services (Optional).
General Tab
723
Properties Tab
Configure parameter file name and workflow log options in the Properties tab.
Figure B-2 displays the Properties tab:
Figure B-2. Workflow Properties - Properties Tab
Table B-2 describes the settings on the Properties tab:

Table B-2. Workflow Properties - Properties Tab

| Properties Tab Options | Required/Optional | Description |
|---|---|---|
| Parameter File Name | Optional | Designates the name and directory for the parameter file. Use the parameter file to define workflow parameters. For details on parameter files, see Parameter Files on page 511. |
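A parameter file is a plain-text file of sections and assignments. A minimal sketch follows; the folder name, workflow name, and values here are hypothetical, so see the Parameter Files chapter for the exact syntax your version accepts.

```
[MyFolder.WF:wf_daily_load]
$$LoadDate=06/30/2004
$PMWorkflowLogDir=/data/logs
```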
| Properties Tab Options | Required/Optional | Description |
|---|---|---|
| Workflow Log File Directory | Required | Designates a location for the workflow log file. By default, the PowerCenter Server writes the log file in the server variable directory, $PMWorkflowLogDir. If you enter a full directory and file name in the Workflow Log File Name field, clear this field. |
| Save Workflow Log By | Required | If you select Save Workflow Log by Timestamp, the PowerCenter Server saves all workflow logs, appending a timestamp to each log. If you select Save Workflow Log by Runs, the PowerCenter Server saves a designated number of workflow logs. Configure the number of workflow logs in the Save Workflow Log for These Runs option. For details on these options, see Archiving Workflow Logs on page 459. You can also use the $PMWorkflowLogCount server variable to save the configured number of workflow logs for the PowerCenter Server. |
| Save Workflow Log for These Runs | Required | The number of historical workflow logs you want the PowerCenter Server to save. The PowerCenter Server saves the number of historical logs you specify, plus the most recent workflow log. Therefore, if you specify 5 runs, the PowerCenter Server saves the most recent workflow log, plus historical logs 0-4, for a total of 6 logs. You can specify up to 2,147,483,647 historical logs. If you specify 0 logs, the PowerCenter Server saves only the most recent workflow log. |
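The by-runs retention rule can be sketched in Python; the function and log names are illustrative, not part of PowerCenter:

```python
def retain_logs(logs, save_for_runs):
    """Apply the Save Workflow Log by Runs rule: keep the configured
    number of historical logs plus the most recent workflow log.

    `logs` is ordered oldest to newest; returns the surviving logs."""
    keep = save_for_runs + 1  # historical runs + the most recent log
    return logs if keep >= len(logs) else logs[-keep:]
```

With 5 runs configured, 6 logs survive: the most recent log plus historical logs 0-4.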
Scheduler Tab
The Scheduler tab allows you to schedule a workflow to run continuously, run at a given interval, or start manually. For details on scheduling workflows, see Scheduling a Workflow on page 112.
Figure B-3 displays the Scheduler tab:
Figure B-3. Workflow Properties - Scheduler Tab
Edit scheduler settings.

Table B-3 describes the settings on the Scheduler tab:

Table B-3. Workflow Properties - Scheduler Tab

| Scheduler Tab Options | Required/Optional | Description |
|---|---|---|
| Non-Reusable/Reusable | Required | Indicates whether the scheduler is non-reusable or reusable. |
| Scheduler | Required | |
| Description | Optional | |
| Summary | N/A | |
Table B-4 describes the settings on the Edit Scheduler dialog box:

Table B-4. Workflow Properties - Scheduler Tab - Edit Scheduler Dialog Box

| Scheduler Options | Required/Optional | Description |
|---|---|---|
| Edit | Optional | |
| Start Date | Optional | |
| Start Time | Optional | |
| Repeat Every | Required | Enter the numeric interval you want to schedule the workflow, then select Days, Weeks, or Months, as appropriate. If you select Days, select the appropriate Daily Frequency settings. If you select Weeks, select the appropriate Weekly and Daily Frequency settings. If you select Months, select the appropriate Monthly and Daily Frequency settings. |
| Weekly | Optional | Required to enter a weekly schedule. Select the day or days of the week on which you want to schedule the workflow. |
| Monthly | Optional | |
| Daily | Required | Enter the number of times you would like the PowerCenter Server to run the workflow on any day the session is scheduled. If you select Run Once, the PowerCenter Server schedules the workflow once on the selected day, at the time entered on the Start Time setting on the Time tab. If you select Run Every, enter Hours and Minutes to define the interval at which the PowerCenter Server runs the workflow. The PowerCenter Server then schedules the workflow at regular intervals on the selected day. The PowerCenter Server uses the Start Time setting for the first scheduled workflow of the day. |
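The Run Every behavior described above amounts to stepping from the Start Time in fixed hours/minutes intervals through the day. A rough Python sketch (illustrative names, not PowerCenter code):

```python
from datetime import datetime, timedelta

def daily_run_times(start_time, hours, minutes, end_of_day):
    """List the times a workflow is scheduled on one day under Run Every.

    The first run uses the Start Time setting; later runs repeat at the
    hours/minutes interval until the end of the scheduling window."""
    interval = timedelta(hours=hours, minutes=minutes)
    runs, t = [], start_time
    while t <= end_of_day:
        runs.append(t)
        t += interval
    return runs
```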
Variables Tab
Before you can use workflow variables, you must declare them in the Variables tab.
Figure B-6 displays the settings on the Variables tab:
Figure B-6. Workflow Properties - Variables Tab
| Variables Tab Options | Required/Optional | Description |
|---|---|---|
| Name | Required | Name of the workflow variable. |
| Datatype | Required | Datatype of the workflow variable. |
| Persistent | Required | Indicates whether the PowerCenter Server maintains the value of the variable from the previous workflow run. |
| Is Null | Required | Indicates whether the variable value is null. |
| Default | Optional | Default value of the variable. |
| Description | Optional | Description of the variable. |
Events Tab
Before you can use the Event-Raise task, declare a user-defined event in the Events tab.
Figure B-7 displays the Events Tab:
Figure B-7. Workflow Properties - Events Tab
| Events Tab Options | Required/Optional | Description |
|---|---|---|
| Events | Required | Name of the user-defined event. |
| Description | Optional | Description of the event. |
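Conceptually, a user-defined event is a named flag: one Event-Raise task sets it and any Event-Wait task blocks until it is set. A loose Python analogy using threading.Event (not PowerCenter code; the event name is hypothetical):

```python
import threading

# One flag per declared user-defined event, keyed by event name.
events = {"FileArrived": threading.Event()}

def event_raise(name):
    """Event-Raise task: mark the named event as having occurred."""
    events[name].set()

def event_wait(name, timeout=None):
    """Event-Wait task: block until the named event is raised."""
    return events[name].wait(timeout)

event_raise("FileArrived")
assert event_wait("FileArrived", timeout=1)  # returns immediately once raised
```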
The Metadata Extensions tab allows you to create and promote metadata extensions. For
information on creating metadata extensions, see Metadata Extensions in the Repository
Guide.
Table B-8 describes the configuration options for the Metadata Extensions tab:

Table B-8. Workflow Properties - Metadata Extensions Tab

| Metadata Extensions Tab Options | Required/Optional | Description |
|---|---|---|
| Extension Name | Required | Name of the metadata extension. |
| Datatype | Required | Datatype: numeric, string, boolean, or XML. |
| Value | Optional | An optional value. For a numeric metadata extension, the value must be an integer. For a boolean metadata extension, choose true or false. For a string or XML metadata extension, click the Edit button on the right side of the Value field to enter a value of more than one line. The Workflow Manager does not validate XML syntax. |
| Precision | Required for string and XML objects | Maximum length for string and XML metadata extensions. |
| Reusable | Required | Select to make the metadata extension apply to all objects of this type (reusable). Clear to make the metadata extension apply to this object only (non-reusable). |
| UnOverride | Optional | This column appears only if the value of one of the metadata extensions was changed. To restore the default value, click Revert. |
| Description | Optional | Description of the metadata extension. |
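The datatype rules for metadata extension values can be summarized in a small validator; this is an illustrative sketch, not the Workflow Manager's actual validation (in particular, XML syntax is deliberately not checked, matching the note above):

```python
def valid_extension_value(datatype, value, precision=None):
    """Mirror the value rules: numeric -> integer, boolean -> true/false,
    string and XML -> text bounded by the precision (length) setting."""
    if datatype == "numeric":
        return isinstance(value, int) and not isinstance(value, bool)
    if datatype == "boolean":
        return isinstance(value, bool)
    if datatype in ("string", "xml"):
        # XML syntax itself is not validated, only the length cap.
        return isinstance(value, str) and (precision is None or len(value) <= precision)
    return False
```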
Appendix C
Overview, 736
Overview
The Workflow Manager and Workflow Monitor replace the Server Manager in PowerCenter
5.x and PowerMart 5.x. This appendix compares session properties in the Server Manager
with session and workflow options in the Workflow Manager. It lists the session properties as
they appeared in the Server Manager, and then gives the corresponding options in the
Workflow Manager.
The session properties for the Server Manager contain the following tabs:
General tab
Time tab
Transformations tab
Partitions tab
General Tab
In the Server Manager, the General tab appeared when you opened the session properties. In
the Workflow Manager, the General tab appears when you open the session properties in the
Task Developer or the Workflow Designer.
Figure C-1 shows the Server Manager General tab:
Figure C-1. Server Manager General Tab
In the Server Manager, you configured the following options from the General tab:
General options
Source options
Target options
Session commands
Performance
General Options
In the Server Manager, you could configure the Session Name field, Server Name, and the
Session Enabled option on the General tab of the session properties.
In the Workflow Manager, these options are on either the General tab of the session
properties or in the workflow properties.
Table C-1 compares general session options for the Server Manager with the corresponding
options for the Workflow Manager:

Table C-1. General Session Options Comparison

| Server Manager General Tab Properties | Workflow Manager |
|---|---|
| Session Name | |
| Server Name | |
| Session Enabled | General tab-Disable this task. You can only view this property when you edit the session instance from the Workflow Designer. |
Source Options
In the Server Manager, Source options appeared under the Session Name field on the General
tab.
In the Workflow Manager, source options appear under the Sources node on the Mapping tab
(Transformations view). The Sources node contains connections, properties, and readers
settings.
Table C-2 compares Source options for the Server Manager with the corresponding properties
for the Workflow Manager:
Table C-2. Source Options Comparison
Server Manager General Tab-Source
Options Properties
Source Type
Treat Rows As
Source Database
Figure C-2 shows the Server Manager Source Options Dialog Box for File Sources:
Figure C-2. Server Manager Source Options Dialog Box for File Sources
Table C-3 compares source options for file sources for the Server Manager with the
corresponding options for the Workflow Manager:
Table C-3. File Source Options Comparison
Server Manager General Tab-Source
Options Properties
Source Directory
File Name
File Type
File List
FTP File
N/A
Figure C-4 shows the Server Manager Delimited File Properties dialog box:
Figure C-4. Server Manager Delimited File Properties Dialog Box
Table C-4 compares XML source options for the Server Manager with the corresponding
options for the Workflow Manager:
Table C-4. XML Sources Options Comparison
Server Manager XML Source Options
Properties
Source Directory
File Name
Code Page
File List
FTP File
FTP Properties
In the Server Manager, the FTP Properties dialog box appeared when you edited FTP
properties.
In the Workflow Manager, the FTP Connection Editor appears when you choose FTP as the
connection type from the Sources tab, click the Edit button on the right side of the Value
field, and then click Override to edit the FTP properties.
Figure C-6 shows the Server Manager FTP Properties dialog box:
Figure C-6. Server Manager FTP Properties Dialog Box
Table C-5 compares FTP properties for the Server Manager with the corresponding options
for the Workflow Manager:
Table C-5. FTP Properties Comparison
Server Manager FTP Properties
Connection Name
Target Options
In the Server Manager, target options appeared on the General tab. In the target options, you
could select the target type for the session, configure reject file names, and create database
connection session parameters.
In the Workflow Manager, the Mapping tab-Transformations view-Targets node contains
connections, properties, and writers settings.
Table C-6 compares target options for the Server Manager with the corresponding options for
Workflow Manager:
Table C-6. Target Options Comparison
| Server Manager General Tab-Target Table Properties | Workflow Manager |
|---|---|
| Target Type | |
| Target Options | Properties in the Target Options dialog box are located on the Mapping tab-Transformations view-Targets node-Properties settings. |
| Reject Options | Properties in the Rejects Options dialog box are located on the Mapping tab-Transformations view-Targets node-Properties settings. |
| Target Database | |
Table C-7 compares relational target options for the Server Manager with the corresponding
options for the Workflow Manager:
Table C-7. Relational Target Options Comparison
Insert
Delete
Truncate Table
Normal/Bulk
Test Load
Output Files
In the Server Manager, the Output Files dialog box appeared when you selected a file target
type, then clicked Target Options on the General tab.
In the Workflow Manager, output file target options appear on the Mapping tab-Transformations view. The Targets node contains connections, properties, and writers settings.
Figure C-8 shows the Server Manager Output Files dialog box:
Figure C-8. Server Manager Output Files Dialog Box
Table C-8 compares output file options for the Server Manager with the corresponding
options for the Workflow Manager:
Table C-8. File Target Output Options Comparison
Server Manager General Tab-Output
Files Properties
Directory
File Name
FTP file
Loader
Fixed Width/Delimited
Figure C-9 shows the Server Manager External Loader Properties dialog box:
Figure C-9. Server Manager External Loader Properties
Fixed-Width Properties
In the Server Manager, the Fixed-Width dialog box appeared when you configured a session
to write to a fixed-width target file, and then clicked Edit Null Character.
In the Workflow Manager, you can access the Fixed-Width Properties dialog box from the
Properties settings of the Mappings tab. Click Set File Properties, and select Fixed-Width.
Figure C-10 shows the Server Manager Fixed-Width dialog box:
Figure C-10. Server Manager Fixed-Width Dialog Box (Output Files)
Figure C-11 shows the Server Manager Delimited File Properties dialog box:
Figure C-11. Server Manager Delimited File Properties Dialog Box (Output Files)
XML Targets
In the Server Manager, the XML Target dialog box appeared when you selected an XML file
target type, then clicked Target Options.
In the Workflow Manager, you can access the XML Target dialog box from the Properties
settings of the Mappings tab. Click Set File Properties.
Figure C-12 shows the Server Manager XML Target dialog box:
Figure C-12. Server Manager XML Target Dialog Box
Table C-9 compares XML target options for the Server Manager with the corresponding
options for Workflow Manager:
Table C-9. XML Target Options Comparison
Directory
File Name
Code Page
FTP File
Reject Files
In the Server Manager, the Reject Files dialog box appeared when you clicked Reject Options
on the General tab.
In the Workflow Manager, the reject file options appear in the Targets node Properties
settings on the Mapping tab.
Figure C-13 shows the Server Manager Reject File dialog box:
Figure C-13. Server Manager Reject File Dialog Box
Table C-10 compares Reject Files options for the Server Manager with the corresponding
options for Workflow Manager:
Table C-10. Reject Files Options Comparison
Server Manager General tab-Reject
File Properties
File Name
Session Commands
In the Server Manager, session commands appeared under the Server Name field on the
General tab. You could enter pre-session shell commands, post-session commands, and
separate email messages for session success or failure.
In the Workflow Manager, session commands appear on the Components tab.
Pre-Session Commands
In the Server Manager, the Pre-Session Commands dialog box appeared when you clicked Pre-Session on the General tab of the session properties.
In the Workflow Manager, pre-session command options appear on the Components tab.
Figure C-14 shows the Server Manager Pre-Session Commands dialog box:
Figure C-14. Server Manager Pre-Session Commands Dialog Box
Table C-11 compares session command options for the Server Manager with the
corresponding options for the Workflow Manager:
Table C-11. Pre-Session Commands Comparison
| Server Manager General Tab-Session Commands Pre-Session Properties | Workflow Manager |
|---|---|
| Description | Components tab. Click the Edit button on the right side of the Value field for Pre-Session Commands. Enter the description in the General tab of the Edit Pre-Session Commands dialog box. |
| Command | Components tab. Click the Edit button on the right side of the Value field for Pre-Session Commands. Enter the command in the Command tab of the Edit Pre-Session Commands dialog box. |
Figure C-15 shows the Server Manager Post-Session Commands and Email dialog box:
Figure C-15. Server Manager Post-Session Commands and Email
Table C-12 compares post-session command and email options for the Server Manager with
the corresponding options for the Workflow Manager:
Table C-12. Post-Session Commands and Email Comparison
| Server Manager General Tab-Post-Session Commands and Email Properties | Workflow Manager |
|---|---|
| Description | Components tab. Click the Edit button on the right side of the Value field for Post-Session Commands. Enter the description in the General tab of the Edit Post-Session Commands dialog box. |
| Command | Components tab. Click the Edit button on the right side of the Value field for Post-Session Commands. Enter the command in the Command tab of the Edit Post-Session Commands dialog box. |
| Success, Failure | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email user name in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |
| Email Subject | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email subject in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |
| Email Text | Components tab. Click the Edit button on the right side of the Value field for On Success Email or On Failure Email. Enter the email text in the Properties tab of the Edit Success Email or Edit Failure Email dialog box. |
Performance Options
In the Server Manager, Performance options appeared under Session Commands on the
General tab. In the Performance options, you could increase memory size, select performance
details, and set configuration parameters. In the Workflow Manager, Performance options
appear on the Properties tab in the session properties.
Table C-13 compares performance options for the Server Manager with the corresponding
options for the Workflow Manager:
Table C-13. Performance Options Comparison
Server Manager General Tab-Performance Properties
Configuration Parameters
In the Server Manager, the Configuration Parameters dialog box appeared when you clicked
Advanced Options on the General tab. In the Configuration Parameters dialog box, you could
configure the DTM memory parameters, general parameters, reader parameters, and event-based scheduling.
In the Workflow Manager, the configuration parameters options appear on multiple tabs.
Figure C-16 shows the Server Manager Configuration Parameter dialog box:
Figure C-16. Server Manager Configuration Parameter Dialog Box
Table C-14 compares configuration parameters for the Server Manager with the
corresponding options for the Workflow Manager:
Table C-14. Configuration Parameters Comparison
Server Manager Advanced Option
Properties
Commit Interval
Event Wait Task-Events tab-Pre Defined Event. Enter the name of the
file to watch.
Time Tab
In the Server Manager, the Time tab appeared after the General tab unless the session was
heterogeneous. If the session was heterogeneous, the Time tab appeared after the Source
Location tab.
In the Workflow Manager, the Schedule tab contains workflow scheduling options. To
configure reusable scheduler options, select Workflows-Schedulers from the menu. To
configure non-reusable schedule options, select Edit-Workflow to open workflow properties
and click the Schedule tab.
Figure C-18 shows the Server Manager Time tab:
Figure C-18. Server Manager Time tab
In the Server Manager, you configured the following options from the Time tab:
Schedule options
Start options
Duration options
Batch option
Schedule Options
In the Server Manager, you used the Schedule options on the Time tab of the session
properties to schedule the frequency of a session run.
In the Workflow Manager, you use the Run Options and Schedule Options on the Schedule
tab of the Scheduler properties to schedule the frequency of a workflow run.
Repeat Options
In the Server Manager, the Repeat dialog box appeared when you selected Customized
Repeat, then clicked Edit on the Time tab.
In the Workflow Manager, the Customized Repeat dialog box appears when you schedule a
session to run on server initialization, select Customized Repeat, and then click Edit.
Figure C-19 shows the Server Manager Repeat dialog box:
Figure C-19. Server Manager Repeat Dialog Box
Start Options
In the Server Manager, the Start options appeared below the Schedule options on the Time
tab. In the Start options, you could select the session start date and session start time.
In the Workflow Manager, the Start options appear on the Schedule tab of the workflow
properties.
Duration Options
In the Server Manager, Duration options appeared next to Start options on the Time tab. In
Duration options, you could set the end date of a session run, the number of session runs, or
schedule a session to run forever as long as it was successful.
In the Workflow Manager, End options appear next to Start options on the Scheduler tab of
the workflow properties.
In the Server Manager, you could configure log file and error handling options on the Log and
Error Handling tab.
Table C-15 compares the Log File options for Server Manager with the corresponding options
for the Workflow Manager:
Table C-15. Log File Options Comparison
Server Manager Log and Error
Properties
Stop On
Perform Recovery
Override Tracing
Transformations Tab
In the Server Manager, the Transformations tab appeared on the session properties after the
Log and Error Handling tab.
In the Workflow Manager, the settings for transformations appear on the Mapping tabTransformations view.
Figure C-21 shows the Server Manager Transformations tab:
Figure C-21. Server Manager Transformations Tab
Table C-17 compares the Transformations tab options for Server Manager with the
corresponding options for the Workflow Manager:
Table C-17. Transformations Tab Options Comparison
Server Manager Transformations Tab
Properties
Aggregate Behavior
Sort Order
Partitions Tab
In the Server Manager, the Partitions tab appeared in the session properties after the
Transformations tab.
In the Workflow Manager, the settings for partitioning appear on the Mapping tab-Partitions
view. For more information about partitioning, see Configuring Partitioning Information
on page 351.
Index
A
ABORT function
See also Transformation Language Reference
session failure 200
aborted status 421
aborting
Control tasks 147
server handling 129
sessions 130
status 421
tasks 129
tasks in Workflow Monitor 418
workflows 129
Aborttask
pmcmd syntax 596
Abortworkflow
pmcmd syntax 597
absolute time
specifying 162
Timer task 161
active sources
constraint-based loading 248
defined 259
generating commits 278
row error logging 260
source-based commit 278
transaction generators 259
XML targets 259
adding
tasks 92
advanced settings
session properties 675
aggregate caches
calculating the data cache 622
calculating the index cache 621
overview 621
reinitializing 576, 674
aggregate files
deleting 577
moving 577
aggregate function calls
minimizing 652
Aggregator transformation
cache options 621
cache partitioning 621
caches 26, 34
data cache 622
index cache 621
optimizing performance 650
optimizing with Sorted Input 651
partitioning guidelines 347
performance detail 639
allocating memory
XML sources 655
AND links 137
archiving
session logs 471
B
$BadFile
definition 508
naming convention 496, 520
using 509
blocking
definition 23
blocking source data
PowerCenter Server handling 23
buffer block size
configuring 677
optimizing 655, 657
buffer memory
allocating 655
buffer blocks 25
DTM process 25
bulk loading
commit interval 253
data driven session 252
DB2 642
DB2 guidelines 253
Oracle 643
Oracle guidelines 253
session properties 252, 697
Sybase IQ 643
targets 642
test load 244
using user-defined commit 283
C
cache files
locating 577
naming convention 615
permissions 28
cache partitioning
Aggregator transformation 621
described 359
incremental aggregation 621
Joiner transformation 624
Lookup transformation 391
Rank transformation 620
caches
Aggregator transformation 621
calculating Aggregator data cache 622
calculating Aggregator index cache 621
calculating Joiner data cache 626
calculating Joiner index cache 625
calculating Lookup data cache 631
calculating Lookup index cache 629
calculating Rank data cache 633
calculating Rank index cache 632
default directory 34
files for index and data 614
files, overview 34
Joiner transformation 624
Lookup transformation 628
memory 26, 614
memory usage 26
optimizing 658
overview 28, 614
resetting with real-time sessions 288
session cache files 614
transformation 34
caching
lookup functions 676
Char datatypes
removing trailing blanks for optimization 653
check point interval
optimizing 642
checking in
versioned objects 74
checking out versioned objects 74
COBOL sources
error handling 227
numeric data handling 229
code page compatibility
See also Installation and Configuration Guide
multiple file sources 230
targets 235
code pages
See also Installation and Configuration Guide
data movement modes 27
database connections 54, 234
delimited source 224
delimited target 267, 703
external loader files 524
fixed-width sources 222
fixed-width target 266, 702
relaxed validation 55
validation 12
viewing the session log 475
color
setting 42
workspace 42
command line mode for pmcmd
connecting 589
return codes 590
using 589
command line program See pmcmd
Command task
multiple UNIX commands 145
Command tasks
creating 143
definition 143
description 132
executing commands 145
promoting to reusable 145
Run if Previous Completed 145
using server variables 188, 193
using session parameters 143
comments
adding in Expression Editor 97
commit interval
bulk loading 253
configuring 292
description 276
optimizing 655, 658
source- and target-based 276
commit source
source-based commit 278
commit type
configuring 672
committing data
target connect groups 278
transaction control 283
common logic
factoring 652
comparing objects
See also Designer Guide
See also Repository Guide
sessions 79
tasks 79
workflows 79
worklets 79
Components tab
properties 710
concurrent connections
in partitioned pipelines 379
Config Object tab
properties 675
configuring
error handling options 493
connect string
examples 54
syntax 54
connection objects
See also Repository Guide
assigning permissions 51
definition 51
deleting 59
connection settings
applying to all session instances 180
targets 695
connections
copy as 59, 60
copying a relational database connection 59
external loader 551
FTP 561
multiple targets 274
relational database 56
replacing a relational database connection 62
sources 211
targets 237
connectivity
See also Installation and Configuration Guide
connect string examples 54
overview 5
server grids 447
constraint-based loading
active sources 248
configuring 248
enabling 251
key relationships 248
session property 676
target connection groups 249
Update Strategy transformations 249
control file
overriding Teradata 539
overview 33
permissions 28
Control tasks
definition 147
description 132
options 148
stopping or aborting the workflow 129
copying
repository objects 77
counters
BufferInput_efficiency 640
BufferOutput_efficiency 640
overview 437
Rowsinlookupcache 639
Transformation_errorrows 639
Transformation_readfromdisk 639
Transformation_writetodisk 639
CPU usage
PowerCenter Server 24
creating
external loader connections 551
FTP sessions 565
server grids 451
sessions 175
workflows 91
CUME
partitioning restrictions 395
Custom transformation
partitioning guidelines 396
customized repeat
daily 117
editing 115
monthly 117
options 116
repeat every 117
weekly 117
D
data
capturing incremental source changes 574, 579
data caches
Aggregator transformation 622
description 614
for incremental aggregation 577
memory usage 26
optimizing 655, 658
Rank transformation 633
data driven
bulk loading 252
data files
creating directory 579
finding 577
data flow
See pipeline
data movement mode
See also ASCII mode
See also Installation and Configuration Guide
See also Unicode mode
affecting incremental aggregation 577
overview 27
database connections
See also Installation and Configuration Guide
configuring 56
copying a relational database connection 59
domain name 58
packet size 58
privileges required to create 53
replacing a relational database connection 62
rollback segment 58
session parameter 499
use trusted connection 58
using Oracle OS Authentication 53
databases
connection requirements 57
connectivity overview 46
environment SQL 55
optimizing sources 645
optimizing targets 642
selecting code pages 54
setting up connections 53
datatypes
See also Designer Guide
Char 653
Decimal 269
Double 269
Float 269
Integer 269
minimizing conversions 648
Money 269
Numeric 269
padding bytes for fixed-width targets 268
Real 269
Varchar 653
dates
configuring 38
formats 38
DB2
bulk loading 642
bulk loading guidelines 253
commit interval 253
See IBM DB2
$DBConnection
definition 499
naming convention 496, 520
using 499
deadlock
retry session 674
deadlock retry
See also Installation and Configuration Guide
configuring 246
target connection groups 257
Debugger
restrictions in partitioned pipelines 396
decimal arithmetic
See high precision
Decision tasks
creating 151
decision condition variable 149
definition 149
description 132
example 149
using Expression Editor 96
variables in 103
DECODE function
See also Transformation Language Reference
using for optimization 653
default remote directories
for FTP connections 561
deleting
connection objects 59
servers 50
workflows 97
delimited flat files
code page 691
code page, sources 224
code page, targets 267
consecutive delimiters 692
escape character 691
escape character, sources 224
numeric data handling 229
quote character 691
quote character, sources 224
quote character, targets 267
session properties, sources 222
session properties, targets 266
sources 691
delimited sources
number of rows to skip 692
delimited targets
session properties 703
delimiter
session properties, sources 222
E
edit
delimiter 690
edit null characters
session properties 702
editing
delimiter 702
events
in worklets 167
pre-defined events 153
user-defined events 153
Event-Wait tasks
definition 153
description 132
for pre-defined events 158
for user-defined events 157
waiting for past events 159
working with 156
Expression Editor
adding comments 97
displaying 97
syntax colors 97
using 96
validating 119
validating expressions using 97
expressions
optimizing 652
validating 97
external loader
behavior 526
code page 524
connections 551
DB2 528
error messages 527
loading multibyte data 533, 535
on Windows systems 526
Oracle 533
overview 524
performance 643
permissions 525
PowerCenter Server support 524
privileges required to create connection 525
session properties 682, 695
setting up Workflow Manager 553
Sybase IQ 535
Teradata 538
using with partitioned pipeline 380
External Procedure transformation
See also Designer Guide
partitioning guidelines 396
F
fail parent workflow 138
failed status 421
failing workflows
failing parent workflows 148
G
Gantt Chart
configuring 411
filtering 405
listing tasks and workflows 424
navigating 425
opening and closing folders 407
organizing 425
overview 402
searching 427
using 423
zooming 426
general options
arranging workflow vertically 40
configuring 39
in-place editing 40
launching Workflow Monitor 41
open editor 41
panning windows 40
receive notification from server 41
reload task or workflow 40
session properties 668
show expression on a link 41
show full name of task 41
General tab in session properties
FTP properties 742
in Server Manager 737
in Workflow Manager 668
session commands 750
source options 738
target options 743
General tab of session properties
general options 737
performance options 752
generating
commits with source-based commit 278
Getrunningsessionsdetails
pmcmd syntax 598
Getserverdetails
pmcmd syntax 599
Getserverproperties
pmcmd syntax 599
Getsessionstatistics
pmcmd syntax 600
Gettaskdetails
pmcmd syntax 601
Getworkflowdetails
pmcmd syntax 601
globalization
See also Installation and Configuration Guide
H
hash partitioning
adding hash keys 362
hash auto-keys partitioning 361
hash user keys partitioning 362
overview 348, 361
Help
pmcmd syntax 602
heterogeneous sources
defined 208
heterogeneous targets
overview 274
high precision
disabling 658
enabling 674
handling 204
optimizing 655
history names
in Workflow Monitor 419
host names
for FTP connections 561
registering the PowerCenter Server 49
I
IBM DB2
connect string example 54
icon
Workflow Monitor 404
worklet validation 171
IIF expressions
See also Transformation Language Reference
optimizing 653
incremental aggregation
See also Installation and Configuration Guide
cache partitioning 621
changing server code page 577
changing server data movement mode 577
changing session sort order 577
configuring 674
configuring the session 579
deleting files 577
files 34
moving files 577
overview 574
J
joiner cache
overview 624
Joiner transformation
cache partitioning 624
caches 26, 34, 624
joining sorted flat files 385
joining sorted relational data 387
optimizing 651
optimizing performance 650
K
key constraints
optimizing by dropping 642
key range partitioning 348, 363
keys
constraint-based loading 248
L
launch
Workflow Monitor 41, 404
line sequential buffer length
configuring 677
sources 225
links
AND 137
condition 92
example link condition 94
linking tasks concurrently 93
linking tasks sequentially 94
loops 92
OR 137
show expression on a link 41
show solid lines 42
specifying condition 94
using Expression Editor 96
variables in 103
working with 92
List Tasks
in Workflow Monitor 424
Load Manager
creating log files 11
memory usage 24
overview 3
parameters 25
post-session email 10
process 7, 8
running sessions and workflows 7
scheduling workflows 8
validating code pages 12
load summary
sessions 467
local variables
replacing sub-expressions 652
M
mapping bottlenecks
identify 638
mapping parameters
See also Designer Guide
in session properties 203
overriding 203
mapping threads
description 14
mapping variables
See also Designer Guide
in partitioned pipelines 394
mappings
definition 2
factoring common logic 652
identify bottlenecks 638
increasing performance 636
single-pass reading 647
master servers 446
master thread
description 14
Maximum Days
Workflow Monitor 410
maximum sessions
See also Installation and Configuration Guide
parameter, description 25
Maximum Workflow Runs
Workflow Monitor 410
memory
caches 614
DTM buffer 25
increasing to avoid paging 662
merge target files
session properties 699
merging target files 380, 382
message queue
using with partitioned pipeline 380
metadata extensions
creating 82
deleting 85
editing 84
overview 82
session properties 718
Microsoft Access
pipeline partitioning 379
Microsoft Outlook
configuring an email user 322, 342
configuring the PowerCenter Server 322
Microsoft SQL Server
bulk loading 642
commit interval 253
connect string syntax 54
optimizing 646
MIME format
email 320
monitoring
data flow 639
session details 434
MOVINGAVG
See also Transformation Language Reference
partitioning restrictions 395
MOVINGSUM
See also Transformation Language Reference
partitioning restrictions 395
multibyte data
character handling 227
Oracle external loader 533
Sybase IQ external loader 535
writing to files 270
multiple servers
overview 444
multiple sessions 196
N
naming convention
See also Getting Started Guide
naming conventions
session parameters 496, 520
native connect string
See connect string
navigating
workspace 69
network packets
increasing 643, 646
non-persistent variables 110
non-reusable tasks
inherited changes 136
promoting to reusable 136
normal loading
session properties 697
Normal tracing levels
definition 473
Normalizer transformation
partitioning guidelines 347
notification
general option 41
null characters
editing 702
file targets 266
server handling 227
session properties, targets 265
targets 702
numeric operations
optimizing by using 653
numeric values
reading from sources 229
O
open transaction
defined 287
operators
using for optimization 653
optimizing
block size 657
buffer block size 655
choosing numeric vs. string operations 653
commit interval 655, 658
data cache 655
data caches 658
data flow 440, 637, 639
disabling high precision 658
dropping indexes and key constraints 642
DTM Buffer Pool Size 655
eliminating transformation errors 648
expressions 652
factoring out common logic 652
filters 650
high precision 655
IIF expressions 653
increasing checkpoint interval 642
increasing network packet size 646
index cache 655, 658
Joiner transformation 651
Lookup transformation 649, 650
mapping 647
minimizing aggregate function calls 652
minimizing datatype conversions 648
minimizing error tracing 659
pipeline partitioning 663
removing trailing blank spaces 653
replacing sub-expressions with local variables 652
sessions 655
single-pass reading 647
source database 645
system-level 660
target database 642
Tracing Level 655
using DECODE vs. LOOKUP expressions 653
using operators vs. functions 653
optimizing performance
Aggregator transformation 650
OR links 137
Oracle
bulk loading 642
bulk loading guidelines 253
commit intervals 253
connect string syntax 54
connection with OS Authentication 53
Oracle external loader
attributes 533
bulk loading 643
connecting with OS Authentication 552
data precision 533
delimited flat file target 533
external loader connections 551
external loader support 524, 533
fixed-width flat file target 533
multibyte data 533
null constraint 533
partitioned target files 533
reject file 534
output files
overview 28, 33
permissions 28
session parameter 504
session properties 700
targets 263
$OutputFile
definition 504
naming convention 496, 520
using 505
override
Teradata loader control file 539
tracing levels 473, 679
owner name
truncating target tables 245
P
packet size 58
paging
eliminating 662
parameter files
format 513
location 518
session 512
specifying in session 518
using with pmcmd starttask 607
using with pmcmd startworkflow 608
parameters
session 496
partition keys
adding 358, 362, 364
adding key ranges 365
partition points
adding and deleting 353
default 17
description 17, 346
Joiner transformation 384
partition types
description 348
partitioning
See pipeline partitioning
partitioning data
incremental aggregation 578
partitioning restrictions
Debugger 396
Informix 379
numerical functions 395
PowerCenter Connect for IBM MQSeries restrictions 397
PowerCenter Connect for PeopleSoft restrictions 397
PowerCenter Connect for SAP BW 397
PowerCenter Connect for SAP R/3 397
PowerCenter Connect for Siebel 398
relational targets 395
Sybase IQ 379, 395
transformations 395
unconnected transformations 353
XML targets 396
Partitioning tab
in the Server Manager 762
in the Workflow Manager 762
Partitions
properties 352
partitions
adding and deleting 356
description 18, 348
Partitions views
properties 351
pass-through pipeline
overview 15
performance
See also optimizing
commit interval 278
detail file 31
identifying bottlenecks 637
monitoring 436
server data movement mode 661
Sybase IQ 643
tuning, overview 636
performance data
collecting 674
performance detail files
creating 436
enabling session monitoring 436
permissions 28
understanding counters 437
viewing 436
performance settings
session properties 674
permissions
connection objects 51
creating a session 175
database 51
deleting a PowerCenter Server 50
editing sessions 177
external loader 525
FTP connections 561
FTP session 565
output and log files 28
recovery files 28
scheduling 90
Workflow Monitor tasks 403
persistent lookup cache
session output 35
persistent variables 110
in worklets 169
pinging
pmcmd syntax 602
PowerCenter Server in Workflow Monitor 405
Pingserver
pmcmd syntax 602
pipeline partitioning
adding and deleting partitions 356
adding hash keys 362
adding key ranges 365
adding partition points 353
caching Lookup transformations 628
concurrent connections 379
configuring a session 351
configuring for sorted data 384
configuring to optimize join performance 384
database compatibility 379
description 346
error threshold 200
example of use 349
external loaders 380, 526
file lists 375
file sources 374
file targets 380
filter conditions 372
using 498
$PMSuccessEmailUser
definition 333
tips 342
PMTOOL_DATEFORMAT
using with pmcmd 585
$PMWorkflowLogCount
saving a number of logs 460
$PMWorkflowLogDir
definition 459
post-session command
session properties 711
shell command properties 714
post-session email
overview 33, 332
See also email
session options 716
session properties 711
post-session shell command
configuring non-reusable 189
configuring reusable 192
using 188
post-session SQL commands 186
post-session threads
description 14
PowerCenter Connect for IBM MQSeries
partitioning restrictions 397
PowerCenter Connect for PeopleSoft
partitioning restrictions 397
PowerCenter Connect for SAP BW
partitioning restrictions 397
PowerCenter Connect for SAP R/3
partitioning restrictions 397
PowerCenter Connect for Siebel
partitioning restrictions 398
PowerCenter Server 22
architecture 2
assigning sessions 198
assigning workflows 122
blocking data 23
changing servers 445
commit interval overview 276
configuring for multiple servers 445
connecting in Workflow Monitor 405
connectivity overview 5, 46
creating server grids 451
data movement modes 27
deleting 50
external loader support 524
filtering in Workflow Monitor 406
handling file targets 268
logs 28
messages 29
monitoring 436
multiple servers overview 444
multiple source file list 230
online and offline mode 405
output files 33
performance detail file 31
permissions to delete 50
pinging in Workflow Monitor 405
privileges required to register 46
processing data 22
reading sources 22
registering 46, 48
removing assigned sessions 199
removing assigned workflows 123
reporting session statistics 468
server grids overview 446
system resources 24
tracing levels 473
truncating target tables 245
using FTP 561
using multiple to increase performance 661
using server grids to increase performance 661
variables for 46
pre- and post-session SQL
entering 186
guidelines 186
precision
flat files 270
writing to file targets 269
pre-defined events
waiting for 158
pre-defined variables
in Decision tasks 149
pre-session shell command
configuring non-reusable 189
configuring reusable 192
errors 193
session properties 711
using 188
pre-session SQL commands 186
pre-session threads
description 14
privileges
See also permissions
See also Repository Guide
scheduling 90
session 175
workflow 90
Workflow Monitor tasks 403
workflow operator 90
Properties tab in session properties
in Workflow Manager 670
Q
Quit
pmcmd syntax 602
quoted identifiers
reserved words 255
R
rank cache
calculating data cache 633
calculating index cache 632
location 632
overview 632
size 632
Rank transformation
See also Transformation Guide
cache partitioning 620
caches 26, 34, 632
partitioning guidelines 347
performance detail 639
reader threads
description 14, 15
reading
sources 22
real-time sessions
transformation scope 288
recovering
pipeline partitioning 200
recovery
completing unrecoverable sessions 316
configuring mappings 297
configuring the session 297
configuring the target database 298
configuring the workflow 298
files, permissions 28
overview 296
PM_RECOVERY table format 299
PM_TGT_RUN_ID table format 299
pmcmd return codes 300
recover from task 308
recover task 311
recovering a failed workflow 308
recovering a session task 311
recovering a suspended workflow 305
recovery table layout 314
resume/recover 305
server handling 314
recovery files
permissions 28
recreating
indexes 248
registering
PowerCenter Server 46, 48
registering server
See also Installation and Configuration Guide
reinitializing
aggregate cache 576
reject file
changing names 476
column indicators 478
locating 456, 476
Oracle external loader 534
overview 32
permissions 28
pipeline partitioning 476
reading 477
row indicators 478
session parameter 508
session properties 243, 263, 698, 700
transaction control 284
viewing 476
relational connections
See relational databases
relational databases
configuring a connection 56
copying a relational database connection 59
replacing a relational database connection 62
rollback segment 58
relational sources
partitioning 371
session properties 214
relational targets
partitioning 378
partitioning restrictions 395
session properties 240, 697
Relative time
specifying 162
Timer task 161
reload task or workflow
configuring 40
rename
repository objects 73
repositories
adding 73
connecting in Workflow Monitor 405
enter description 73
repository objects
configuring 73
rename 73
Repository Server
notification 41
notification in Workflow Monitor 410
requirements
server grids 448
reserved words
generating SQL with 255
resword.txt 255
reserved words file
creating 256
reset all 42
restarting
in Workflow Monitor 416
Resumeworkflow
pmcmd syntax 603
Resumeworklet
pmcmd syntax 603
reusable tasks
inherited changes 136
reverting changes 136
reverting changes
tasks 136
rmail
See also email
configuring 321
rollback segment 58
rolling back data
transaction control 283
round-robin partitioning 348, 360
row error log files
permissions 28
row error logging
active sources 260
row indicators
reject file 478
rows to skip
delimited files 692
Run if Previous Completed
in Command Tasks 145
session command 714
run options
run continuously 115
run on demand 115
server initialization 115
running status 421
running, sessions 197
running, workflows 122
S
saving
session logs 471
workflow logs 459
scheduled status 421
scheduling
configuring 114
creating reusable scheduler 114
disabling workflows 118
editing 117
end options 116
error message 113
permission 90
run every 115
run once 115
run options 115
schedule options 115
start date 116
start time 116
workflows 112
searching
for versioned objects in the Workflow Manager 76
Workflow Manager 70
Workflow Monitor 427
Sequence Generator transformation
partitioning guidelines 353, 396
server
See PowerCenter Server
See also database-specific server
selecting 122, 197
server code page
See also PowerCenter Server
affecting incremental aggregation 577
Server Grid Browser 453
Server Grid Editor 452
server grids
connectivity 447
creating 451
definition 444
distributing sessions 446
increasing performance 661
master servers 446
overview 446
requirements 448
worker servers 446
server handling
file targets 268
fixed-width targets 269, 270
multibyte data to file targets 271
shift-sensitive data, targets 271
server logs
messages 29
overview 28
Server Manager session properties
General tab 737
Log and Error Handling tab 758
Partitioning tab 762
Source Location tab 754
Time tab 755
Transformations tab 761
server variables
description 46
email 333
for multiple servers 445
in Command tasks 188, 193
list 47
log files 46
servers
assigned 444
non-associated 444
session command settings
session properties 711
session details
monitoring sessions 434
session errors 201
session logs
archiving 471
changing location 498
changing locations 471
changing name 497
changing names 471
code page 475
codes 463
creation 11
default name 470
editing 419
external loader error messages 527
generating using UTF-8 463
load summary 467
locating 456, 469
location 671
log file settings 469, 470, 472, 474
overview 31
parameter 497
permissions 28
reading 463
sample 466
saving 678
session details 31
session parameter 497
thread identification 465
timestamp 472
tracing levels 473
transformation statistics 469
viewing 474
viewing dynamically 419
viewing in Workflow Monitor 419
session output
cache files 34
control file 33
incremental aggregation files 34
indicator file 33
performance detail file 31
persistent lookup cache 35
post-session email 33
PowerCenter Server log 28
reject file 32
session logs 31
target output file 33
session parameters
database connection parameter 499
defining 512
in Command tasks 143
naming conventions 496, 520
overview 496
reject file parameter 508
session log parameter 497
session parameter file 512
source file parameter 502
target file parameter 504
session properties
Components tab 710
Config Object tab 675
constraint-based loading 251
delimited files, sources 222
delimited files, targets 266
edit delimiter 690, 702
edit null character 702
email 332, 714
external loader 682, 695
fixed-width files, sources 220
fixed-width files, targets 265
FTP files 682, 695
general settings 668
General tab 668
log files 469, 470, 472, 474
Metadata Extensions tab 718
null character, targets 265
on failure email 332
on success email 332
output files, flat file 700
partition attributes 351, 352
sessions
metadata extensions in 82
monitoring counters 437
multiple source files 230
optimizing 636, 655
output files 28
overview 174
parameter file 512
parameters 496
performance detail file 31
performance tuning 636
properties reference 667
read-only 175
removing assigned PowerCenter Servers 199
running 197
runtime operations overview 7
session details file 31
starting 197
stopping 130, 200
test load 244, 264
truncating target tables 245
using FTP 565
validating 195
viewing performance details 436
Setfolder
pmcmd syntax 604
Setnowait
pmcmd syntax 605
Setwait
pmcmd syntax 605
shared memory
Load Manager 24
shell commands
executing in Command tasks 145
make reusable 191
post-session 188
post-session properties 714
pre-session 188
using Command tasks 143
using server variables 188, 193
using session parameters 143
Showsettings
pmcmd syntax 605
Shutdownserver
pmcmd syntax 605
single-pass reading
definition 647
sort order
See also session properties
affecting incremental aggregation 577
sorted flat files
partitioning for optimized join performance 385
sorted ports
caching requirements 621
sorted relational data
partitioning for optimized join performance 387
Sorter transformation
partitioning 392
partitioning for optimized join performance 389
$Source
session properties 672
source bottlenecks
using a database query to identify 638
using a read test session to identify 638
using filter transformation to identify 637
source data
capturing changes for aggregation 574
source databases
database connection session parameter 499
identifying bottlenecks 637
optimizing 645
optimizing by partitioning 663
optimizing the query 645
optimizing with conditional filters 646
source files
accessing through FTP 560, 565
configuring for multiple files 230, 231
delimited properties 691
fixed-width properties 689
session parameter 502
session properties 220, 687
using parameters 502, 506
source location
session properties 220, 687
Source Location tab
in the Workflow Manager 754
Server Manager session properties 754
source pipelines
description 346
pass-through 15
reading 22
stages 17
target load order groups 22
threads created 19
with Joiner transformations 19
Source Qualifier transformation
partitioning guidelines 347
source-based commit
active sources 278
description 278
sources
code page 224
code page, flat file 222
connections 211
delimiters 224
escape character 691
line sequential buffer length 225
multiple sources in a session 230
null character 689
null character handling 227
null characters 222
overriding SQL query, session 216
partitioning 371, 374
quote character 691
reading 22
session properties 210
specifying code page 689, 691
SQL
configuring environment SQL 55
guidelines for entering environment SQL 55
SQL queries
in partitioned pipelines 371
stages
description 17
staging areas
removing to improve performance 659
start date, scheduling 116
Start tasks, definition 88
start time, scheduling 116
starting
selecting a server 122, 197
sessions 197
start from task 124
starting a part of a workflow 124
starting tasks 125
starting workflows using Workflow Manager 124
Workflow Monitor 404
workflows 122
Starttask
pmcmd syntax 606
using a parameter file 607
Startworkflow
pmcmd syntax 607
using a parameter file 608
statistics
for Workflow Monitor 408
viewing 408
status
aborted 421
aborting 421
disabled 421
failed 421
in Workflow Monitor 421
running 421
scheduled 421
stopped 421
stopping 421
succeeded 421
suspended 127, 421
suspending 127, 421
tasks 421
terminated 421
unscheduled 421
waiting 421
workflows 421
stop on
$PMSessionErrorThreshold 47
error threshold 200
errors 679
pre- and post-session SQL errors 186
stopped status 421
stopping
PowerCenter Server See Installation and Configuration Guide
in Workflow Monitor 418
server handling 129
sessions 130
tasks 129
using Control tasks 147
workflows 129
stopping status 421
Stoptask
pmcmd syntax 609
Stopworkflow
pmcmd syntax 609
string operations
minimizing for performance 653
sub-expressions
replacing with local variables 652
succeeded status 421
Suspend On Error option 127
suspended status 127, 421
suspending
behavior 127
email 128
resume in Workflow Monitor 417
status 127
workflows 127
worklets 164
suspending status 421
suspension email 339
Sybase
commit interval 253
Sybase IQ
partitioning restrictions 379, 395
T
table name prefix
target owner 254
table owner name
session properties 216
targets 254
$Target
session properties 672
target connect groups
committing data 278
target connection group
Transaction Control transformation 289
target connection groups
constraint-based loading 249
defined 257
target connection settings
session properties 682, 695
target databases
bulk loading 642
database connection session parameter 499
identifying bottlenecks 637
optimizing 642
optimizing by partitioning 664
Task view
configuring 412
customizing 412
displaying 430
filtering 431
hiding 412
opening and closing folders 407
overview 402
using 430
tasks
aborted 421
aborting 129, 421
adding in workflows 92
arranging 71
Assignment tasks 140
Command tasks 143
configuring 135
Control task 147
copying 77
creating 133
creating in Task Developer 133
creating in Workflow Designer 133
Decision tasks 149
disabled 421
disabling 137
email 328
Event-Raise tasks 153
Event-Wait tasks 153
failed 421
failing parent workflow 138
in worklets 166
inherited changes 136
instances 136
list of 132
non-reusable 92
overview 132
promoting to reusable 136
restarting in Workflow Monitor 416
reusable 92
reverting changes 136
running 421
show full name 41
starting 125
status 421
stopped 421
stopping 129, 421
stopping and aborting in Workflow Monitor 418
succeeded 421
Timer tasks 161
using Tasks toolbar 92
validating 119
Tasks toolbar
creating tasks 134
TCP/IP network protocol
server settings 49
Teradata
connect string example 54
Teradata external loader
code page 538
connections 551
date format 538
FastLoad attributes 545
MultiLoad attributes 540
overriding the control file 539
support 524
Teradata Warehouse Builder attributes 547
TPump attributes 542
Teradata Warehouse Builder
attributes 547
operators 547
terminated status 421
Terse tracing levels
See also Designer Guide
defined 473
test load
bulk loading 244
enabling 671
file targets 264
number of rows to test 671
relational targets 244
thread identification
session log file 465
threads
and partitions 18
creation 13, 14
mapping 14
master 14
post-session 14
pre-session 14
reader 14, 15
transformation 14, 16
types 14
writer 14, 16
time
configuring 38
formats 38
Time tab
duration options 756
schedule options 755
Server Manager session properties 755
start options 756
use absolute time option 757
Timer tasks
absolute time 161, 162
definition 161
description 132
example 161
relative time 161, 162
variables in 103
timestamps
session logs 472
workflow logs 460, 462
Workflow Monitor 402
tool names
displaying and hiding 41
toolbars 69
adding tasks 92
creating tasks 134
using 69
Workflow Monitor 415
Tracing Level
optimizing 655
tracing levels
See also Designer Guide
Normal 473
overriding 679
session 473
Terse 473
Verbose Data 474
Verbose Initialization 474
transaction
defined 287
transaction boundary
dropping 287
transaction control 287
transaction control
bulk loading 283
end of file 284
open transaction 287
overview 287
PowerCenter Server handling 283
real-time sessions 287
reject file 284
rules and guidelines 290
transaction control points 287
transformation error 284
transformation scope 287
user-defined commit 283
transaction control point
defined 287
Transaction Control transformation
partitioning guidelines 356
target connection group 289
U
unconnected transformations
partitioning restrictions 353
Unicode mode
See also Installation and Configuration Guide
code pages 27
session behavior 16
UNIX systems
email 321
external loader behavior 526
PowerCenter Server as daemon 3
unscheduled status 421
Unsetfolder
pmcmd syntax 610
update strategy
target properties 241
Update Strategy transformation
constraint-based loading 249
updating
incrementally 579
URL
adding through business documentation links 97
user-defined commit
see also transaction control
bulk loading 283
user-defined events
declaring 155
example 153
waiting for 157
using multiple servers 444
V
validating 196
expressions 97, 119
tasks 119
workflows 119, 120
worklets 171
Varchar datatypes
See also Designer Guide
removing trailing blanks for optimization 653
variables
email 333
server 46
workflow 103
Verbose Data tracing levels
configuring session log 474
See also Designer Guide
Verbose Initialization tracing levels
configuring session log 474
See also Designer Guide
Version
pmcmd syntax 611
versioned objects
See also Repository Guide
checking in 74
checking out 74
searching for in the Workflow Manager 76
viewing
reject file 476
session logs 474
workflow logs 462
W
waiting status 421
Waittask
pmcmd syntax 611
Waitworkflow
pmcmd syntax 611
web links
adding to expressions 97
webzine l
windows
customizing 69
displaying and closing 69
docking and undocking 69
Navigator 67
Output 67
overview 67
panning 40
reloading 40
Workflow Manager 67
Workflow Monitor 402
workspace 67
Windows System Tray
accessing Workflow Monitor 404
Windows systems
email 322
external loader behavior 526
Informatica service owner 322
logon network security 325
PowerCenter Server service 3
worker servers 446
Workflow Designer
creating tasks 133
displaying and hiding tool name 41
workflow logs
archiving 459
changing locations 461
changing name 461
codes 458
configuring 460
creation 9
editing 419
enabling and disabling 459, 461
locating 456, 459
log file settings 459, 460
overview 30
permissions 28
reading 458
sample 458
timestamp 460
viewing 462
workflows
aborted 421
aborting 129, 421
adding tasks 92
assigning PowerCenter Servers 122
branches 88
copying 77
creating 91
definition 2, 88
deleting 97
developing 89, 91
disabled 421
disabling 118
editing 98
email 341
events 88
fail parent workflow 138
failed 421
guidelines 89
links 88
locking 8
metadata extensions in 82
monitor 89
overview 88
parameter file 9
privileges 90
properties reference 721
removing assigned PowerCenter Servers 123
restarting in Workflow Monitor 416
resuming in Workflow Monitor 417
running 7, 122, 421
runtime operations overview 7
scheduled 421
scheduling 112
selecting a server 89
starting 122
starting on non-associated server 444
status 127, 421
stopped 421
stopping 129, 421
stopping and aborting in Workflow Monitor 418
succeeded 421
suspended 421
suspending 127, 421
suspension email 339
terminated 421
unscheduled 421
using tasks 132
validating 119
variables 103
waiting 421
Worklet Designer
displaying and hiding tool name 41
worklets
adding tasks 166
configuring properties 166
create non-reusable worklets 165
create reusable worklets 165
declaring events 167
developing 165
email 341
fail parent worklet 138
metadata extensions in 82
overriding variable value 169
overview 164
parameters tab 169
persistent variable example 169
persistent variables 169
restarting in Workflow Monitor 416
resuming in Workflow Monitor 417
suspended 421
suspending 164, 421
unscheduled 421
validating 171
variables 169
waiting 421
workspace
color 42
navigating 69
setting colors 42
setting fonts 42
zooming 71
workspace file directory 41
writer threads
description 14, 16
writers
session properties 692
WriterWaitTimeout
target-based commit 277
writing
multibyte data to files 270
to fixed-width files 268, 269
X
XML sources
allocating memory 655
numeric data handling 229
XML targets
active sources 259
partitioning restrictions 396
Z
zooming
Workflow Manager 71
Workflow Monitor 426