
Values of Configuration Variables (to Connect to Another Server) and Their Priority

There are situations where you need to read or write
data on a server other than the run host. In those
cases, certain configuration parameters must be set
with the appropriate values. Ab Initio uses the
following priority when checking for those parameters.

1) Files specified by the value of the AB_CONFIGURATION environment variable

If the Co>Operating System does not find a value for


a configuration variable in the environment, it looks
next at the files listed in the AB_CONFIGURATION
environment variable. You set a value for
AB_CONFIGURATION as follows:

On Unix — A colon-
separated list of the URLs
of the files

On Windows — A
semicolon-separated list
of the URLs of the files

The files listed in the value of
AB_CONFIGURATION must be located on the run
host. The Co>Operating System reads the files in
the order listed (see the example below).
2) The user configuration file


The user configuration file must be
named either .abinitiorc or
abinitio.abrc and must reside:

On Unix — In the user's


home directory:

$HOME/.abinitiorc

$HOME/abinitio.abrc

On Windows — In the
user's home directory:

$HOME\.abinitiorc

$HOME\abinitio.abrc

Only one user configuration file is


allowed. If the Co>Operating
System finds more than one file
named either .abinitiorc or
abinitio.abrc in the $HOME
directory, an error results

3) The system configuration file (usually set


up by the system administrator)

The system configuration file is


named abinitiorc, and is usually set
up by the Co>Operating System
administrator.
On Unix — The pathname
of the system
configuration file is:

$AB_HOME/config/abinitiorc

On Windows — The
pathname of the system
configuration file is:

$AB_HOME\config\abinitiorc

The value of AB_HOME is the path


of the directory in which the
Co>Operating System is installed

Performance Considerations During Development

Performance Considerations During


Development

During development, here are some things to watch


out for:

 Over-reliance on databases

There are many things that you can (and


should) do outside the database. For
example, operations involving heavy
computation are usually better done with
components in the graph rather than in a
database. Sorting will almost always be
faster when you use the SORT component
rather than sorting in the database.

For other performance considerations


involving databases, see the Ab Initio
Guide>Book.

 Paging (or having very little free physical


memory, which means you’re close to paging)

Paging is often a result of:

o Phases that have too many components


trying to run at once
o Too much data parallelism

 Having too little data per run

When this is true, the graph’s startup time


will be disproportionately large in relation to
the actual run time. Can the application
process more data per run? Maybe it could
use READ MULTIPLE FILES, for example, to
read many little files per run, instead of
running many times.

 Bad placement of phase breaks

Whenever a phase break occurs in a graph,


the data in the flow is written to disk; it is
then read back into memory at the
beginning of the next phase. For example,
putting a phase break just before a FILTER
BY EXPRESSION is probably a bad idea. The
size of the data is probably going to be
reduced by the component, so why write it
all to disk just before that happens?

 Too many sorts

The SORT component breaks pipeline


parallelism and causes additional disk I/O to
happen.

Checkout Code in a Heterogeneous Environment / DBC File Parameterization

Check out Code in a Heterogeneous Environment:

When your EME and sandbox are not on the same
server, you can check out the objects from the command line or the GDE
by following these simple steps.

In your .abinitiorc file (which should be created only in
your home directory), please have the following entries:

AB_NODES @ <<any name, typically the target server>> : <<your EME server>>
AB_HOME @ <<any name, typically the target server>> : /apps/xt01/abinitio-V2-14-1
AB_AIR_ROOT @ <<any name, typically the target server>> : /apps/xt01/eme/v214/repo
AB_USERNAME @ <<any name, typically the target server>> : <<your user name on the target server>>
AB_ENCRYPTED_PASSWORD @ <<any name, typically the target server>> : <<encrypted password for the target server>>
AB_CONNECTION @ <<any name, typically the target server>> : telnet
Command line:
Type the command below in your shell:

export AB_AIR_ROOT=//<<target EME server>>/<<EME path>>

Now you are ready to do the checkout using the air export command.

GDE:
In your EME Datastore settings, please
provide the details of your target EME server. In your Run
settings, please provide the details of your Sandbox server.
Go to the Project -> Check Out screen to do
the necessary checkout of the objects.

DBC File Parameterization:

It is good practice to parameterize the values
in your DBC file as much as you can. While parameterizing, try to
use variables that already exist rather than defining them again
in your graph or project parameters.
E.g.: db_nodes in your DBC file
expects the server name. Instead of hard-coding the server
name, see whether you can use a parameter already
defined in the common projects. If your value is the current
server, AI_EXECUTION_HOST (a variable defined in the
stdenv) can be used, as sketched below.
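A minimal sketch of such an entry, assuming the usual key: value layout of a DBC file (the rest of the DBC file is unchanged):

Code:
db_nodes: $AI_EXECUTION_HOST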

Parameterization will help you shift to
new servers easily during disaster recovery or any other
server migration.

Setting Configuration Variable Values in
Configuration Files and Their Priority

Setting configuration variable values in configuration
files
If the Co>Operating System does not find a value for a
configuration variable in the environment, it looks through any
available configuration files on the run host.
Available configuration files (in the order they are searched):
1) Files named by AB_CONFIGURATION
2) User configuration file
3) System configuration file
If multiple entries for the same variable occur in any
configuration file or files, the Co>Operating System uses only
the first entry it encounters and ignores the rest.
Most common configuration variables:
AB_NODES
AB_HOME
AB_WORK_DIR
AB_CONNECTION
AB_TELNET_PORT
AB_TELNET_TERMTYPE
AB_TIMEOUT
AB_STARTUP_TIMEOUT
AB_TELNET_TIMEOUT_SECONDS
AB_TELNET_PAUSE_MSECS
AB_LOCAL_NETRC
AB_USERNAME
AB_PASSWORD
The COE team handles most of the definitions of the above
variables.
We expect AB_USERNAME and AB_PASSWORD to be
defined by the individual application projects.
In addition to the above variables, DBC file variables can
be considered configuration variables.
1) Files named by AB_CONFIGURATION:
A list of files where configuration variable and
values can be specified. Separate items in the list with a colon (:)
on Unix platforms
2) User configuration file
If the Co>Operating System does not find a value
for a configuration variable in the environment of a process or in
one of the files listed in AB_CONFIGURATION, it looks next at
the user configuration file, .abinitiorc file which should be in the
user’s home directory
3) System configuration file
If the Co>Operating System does not find a value for a
configuration variable in the environment, in one of the files
listed in AB_CONFIGURATION, or in the user
configuration file, it looks next at the system configuration
file , $AB_HOME/config/abinitiorc

Most common issues:


1) AB_CONFIGURATION is defined at graph level and you
see errors stating that AB_PASSWORD or any of your DBC file's
variables were not found, even though you set those values in your
config file and attached it to AB_CONFIGURATION.
Reason: When AB_CONFIGURATION is defined, it
should be defined as an export parameter. Please make sure the export
check box is selected.

2) AB_CONFIGURATION is defined as a project-level or
graph-level parameter with the right settings,
but I get an ambiguity error when trying to
evaluate the parameter.
Reason: As you know, AB_CONFIGURATION
contains the paths of various configuration files. If the
variable is defined in multiple places, i.e., in your
common projects and private projects, with no
proper association, you get this error. It means the
parameter evaluator could not determine which value to associate
with this variable because it found multiple values.

3) I have defined my variables, but the old
values are being picked up.
Reason: Please make sure that these variables are
not defined twice and are available only in the file you want.
As specified above, the Co>Operating System uses the first entry
found and ignores the rest, and the files are searched in the
order given above.

Job Tracking Window in the GDE

The Co>Operating system generates tracking information as


a job runs. When you run a job from the GDE, the GDE can
display this information using the Tracking Window or Text
Tracking.

The Tracking Window:

You can open one or several Tracking windows in the GDE


and track all, or any combination of, the flows and
components in a graph. If you execute the graph with
Tracking windows open, they display tracking information as
the graph runs.

How to open the Tracking window for a graph

 Do one of the following:


o Click the background of the graph for which you
want tracking information, then choose Tracking
Detail from the pop-up menu
o From the GDE main menu, choose View >
Tracking Detail
o In the GDE, press Ctrl + F2

How to open a separate Tracking window for a subgraph,


component, or flow

 Do one of the following:


o In the Tracking window for a graph, double-click a
row to open a separate window for the subgraph,
component, or flow represented by that row.
o Click a subgraph, component, or flow in the graph,
then choose Tracking Detail from the pop-up
menu.
o Select a subgraph, component, or flow in the
graph, then choose View > Tracking Detail from
the GDE main menu.

How to open a separate Tracking window for a port

 Click the component whose port you want to track, then


choose Tracking Detail for Port from the pop-up
menu.
References: Ab Initio GDE Help and Ab Initio Co>Op
Graph Developer’s Guide

Simple way to remove header and trailer records

Here is a simple way to remove header and trailer
records using an Ab Initio graph. If you are processing a
file that has a header and a trailer with no field
identifier that identifies the record type, you can
follow the simple graph below. In this example, the
first record is considered the header and the last
record is the trailer. The data file can still be in
EBCDIC format as long as the DML is already
generated. You can use the “cobol-to-dml” utility to
generate DML automatically using COBOL copybook.
Please check GDE Help for more details regarding
“cobol-to-dml” utility.

 Filter out the 1st record (the header) using next_in_sequence()

Parameters:
Name Value
----------- --------------------------
select_expr next_in_sequence()>1

 Use the DEDUP SORTED component to get the last
record (the trailer)

Parameters:
Name Value
----------- --------
key {}
keep last

With an empty key, all records form a single group: the last record (the trailer)
is kept on the out port, and all the other records flow to the dup port.

Sample Records:

Use the m_dump command to display records in
UNIX; please see the GDE help for more details. A minimal
sketch of the command form is shown below.
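For example (the file names here are hypothetical; the first argument is the DML record format and the second is the data file):

Code:
m_dump bill_file.dml bill_file.dat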

Record 1:

[record
WS_B_CLAIM_NBR "BILL FILE "
WS_B_CHECK_NBR " "
WS_B_BILL_NBR " 2009-02-20 "
WS_B_CLAIMANT_ID " "
WS_B_BILL_RECVD_DT " "
WS_B_BILL_PAID_DT " "

Record 2:

[record
WS_B_CLAIM_NBR "1890070194"
WS_B_CHECK_NBR "371521908"
WS_B_BILL_NBR
"18900701940120000131133906410"
WS_B_CLAIMANT_ID "01"
WS_B_BILL_RECVD_DT "2000-01-26"
WS_B_BILL_PAID_DT "2000-01-31"

Record 432017:

[record
WS_B_CLAIM_NBR "6524235006"
WS_B_CHECK_NBR "600230290"
WS_B_BILL_NBR
"65242350060220090217100052724"
WS_B_CLAIMANT_ID "02"
WS_B_BILL_RECVD_DT "2009-01-29"
WS_B_BILL_PAID_DT "2009-02-18"

Record 432018: (Last Record)

[record
WS_B_CLAIM_NBR "BILL FILE "
WS_B_CHECK_NBR " "
WS_B_BILL_NBR E" 2009-
02-20\x04\x32\x01%\x00\x00Â\b\x30\x20Ê@ "
WS_B_CLAIMANT_ID " "
WS_B_BILL_RECVD_DT " "
WS_B_BILL_PAID_DT " "

Sharing a subgraph across graphs

Sharing a subgraph across graphs:


We all know that when a subgraph is built, it becomes part
of the graph in which we build it. However, if we need
to use that subgraph in many other graphs, or in
other places in the original graph, we can achieve this
by saving it as a component and placing it on the server for
shared access.
How do I save a subgraph as a component?
To save a subgraph as a component:
1. If you do not have a components folder in your
sandbox, do the following:
a. Create a components folder, and a parameter
with which to reference it, in your sandbox.
b. Add the components folder to the Component
Organizer as a top-level folder.
2. Select the subgraph.
3. From the File menu, choose Save Component
"subgraph_name" As.
4. Navigate to the components folder in your sandbox.
5. In the Save as type text box, choose Program
Components (*.mpc, *.mp).
6. Click Save.
You may need to right-click the components folder you
added to the Component Organizer and then refresh it so
the subgraph will appear in the components folder. Once the
subgraph appears, you can drag it from the Component
Organizer into any graph in which you want to use it, just as
you would use a pre-built component.
A subgraph that is saved in this way becomes a Linked
Subgraph. If you insert an instance of such a subgraph into a
graph from the Component Organizer and then double-click
it, the GDE displays (linked) following the name of the
subgraph.
To make the changes made in the subgraph available in the
instances,
1. Save the desired changes to the subgraph in
the Component Organizer that you used to create
the graph.
2. Select the instances of the subgraph you want
to update, in the graph or in other graphs that use
that subgraph.
3. From the GDE Edit menu, choose Update.
Unique Identifier Function

The DML function “unique_identifier()” returns a


variable-length string of printable characters that is
guaranteed to be unique. This includes hashed
versions of the timestamp, hostname, and process
id, as well as a few other fields to guarantee
uniqueness. You can use the return string as a
unique key or to construct a unique filename. To
return parts of the identifier, use
unique_identifier_pieces to decode the output. To
test, try typing the following examples on the UNIX
prompt.

Examples
Code:
$ m_eval 'unique_identifier()'
"136fe-9a55-b-s0183dd943f-2"

$ m_eval 'unique_identifier()'
"1370e-9a55-b-s0183dd943f-2"

Conditional Components in the GDE

Can I make my graph conditional so that certain components do


not run?

You can enter a condition statement on the Condition tab for a


graph component. This statement is an expression that evaluates to
the string value for true or false; the GDE then evaluates the
expression at runtime. If the expression evaluates to true, the
component or subgraph is executed. If it is false, the component or
subgraph is not executed, and is either removed completely or
replaced with a flow between two user-designated ports.

Details

To turn on the conditional components in the GDE:


1. On the GDE menu bar, choose File > Preferences to open the
Preferences dialog.
2. Click the Conditional Components checkbox on the
Advanced tab.
Enabling this option adds a Condition tab to the Properties
dialog of your graph components.
3. Use the Condition tab to specify a conditional expression for a
subgraph or component that the GDE evaluates at runtime. If the
expression returns the string value 1, the GDE runs the subgraph or
component. If the expression returns the string value 0, you have
two choices for how the graph will behave:
 Remove Completely — Use this option to disable the entire
graph branch. This disables all upstream and downstream
components until you reach an optional port. In other words,
all components in this branch are disabled until the graph
makes sense.
 Replace With Flow — Use this option to disable only the
subgraph or component to which the condition has been
applied.
Note the following when writing conditional components:
 You must be using the Korn shell in your host profile.
 The evaluated value for the condition must be a string for
TRUE or FALSE. The following is the set of valid FALSE
values: the boolean value False, 0 (numeric zero), "0" (the
string "0"), "false", "False", "F", "f". All other string
values evaluate to TRUE.
 Be careful not to have something propagate from a
component that might not exist at runtime. This could cause
your graph to fail.
 Components or subgraphs that are excluded are displayed
with gray tracking LEDs at runtime.
It is important to use the precise syntax for if statements in the
Korn shell. The correct form is:

$( if condition ; then statement; else statement; fi)

Here are three examples:

$(if [ -n "$VARIABLE" ]; then echo 0; else echo 1; fi)

$(if [ "$LOAD_TYPE" = "INITIAL" ]; then echo 1; else echo 0; fi)

$(if [ -a "file_A.dat" ]; then echo "1"; elif [ -a "file_B.dat" ] && [ -a "file_C.dat" ]; then echo "1"; elif [ -a "file_D.dat" ] && [ -a "file_E.dat" ]; then echo "1"; else echo "0"; fi;)

Performance improvement of a graph

Improving the performance of an already-existing graph

Working on performance problems in an already-existing graph is


a lot like debugging any other problem. An important principle to
follow when making changes to a graph, and then measuring what
differences (if any) have occurred in the graph's efficiency, is to
change only one thing at a time. Otherwise, you can never be sure
which of the changes you made in the graph have changed its
performance.

Performance considerations during development

During development, here are some things to watch out for:

 Over-reliance on databases

There are many things that you can (and should) do outside the
database.

For example, operations involving heavy computation are usually


better done with components, in the graph, rather than in a
database. Sorting will almost always be faster when you use the
Sort component rather than sorting in the database.

For other performance considerations involving databases, see the


Ab Initio Guide>Book.

 Paging (or having very little free physical memory, which


means you're close to paging)
Paging is often a result of:

o phases that have too many components trying to run at


once
o too much data parallelism
 Having too little data per run

When this is true, the graph's startup time will be


disproportionately large in relation to the actual run time. Can the
application process more data per run? Maybe it could use Read
Multiple Files, for example, to read many little files per run,
instead of running so many times.

 Bad placement of phase breaks

Wherever a phase break occurs in a graph, the data in the flow is


written to disk; it is then read back into memory at the beginning
of the next phase. For example, putting a phase break just before a
Filter by Expression is probably a bad idea: the size of the data is
probably going to be reduced by the component, so why write it all
to disk just before that happens?

 Too many sorts

The Sort component breaks pipeline parallelism and causes


additional disk I/O to happen. Examples of misplaced or
unnecessary sorts can be found in the performance example graphs
in $AB_HOME/examples/basic-performance.

AB_SAS_USE_METHOD_3; Implicit Casting in Lookups

When you are creating a SAS file and have numeric
fields as part of your DML, please set
AB_SAS_USE_METHOD_3 to true in your parameters
(graph level or sandbox level); otherwise you will
end up with zero values in the numeric fields.

When matching an input field against a lookup key field, the
input expression is implicitly cast to the type of the lookup
key field. You don't need to cast it explicitly.

E.g.: the input had a date field in the format "YYYY-MM-DD
HH24:MI:SS.NNNNNN" and the lookup key field was in
"YYYYMMDD" format. When you match on these fields, you
don't need to cast explicitly; they will match
automatically, as in the sketch below.
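A minimal DML sketch of such a call (the lookup-file label and field names are hypothetical):

Code:
out.status_cd :: lookup("Claim Status Lookup", in.claim_dttm).status_cd;

Here in.claim_dttm is implicitly cast to the type of the lookup key field before the match is performed.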

The same applies to the JOIN component, in which the
format of the key field on the driving port is taken as the
reference. The key values from the non-driving ports are
converted (only to perform the join) to the type of the
driving port's key.

Below is the supporting help document.

Syntax with a LOOKUP FILE component


record lookup (string file_label, [ expression [ , expression ... ] ] )

Arguments:

file_label — A string constant representing the name of a LOOKUP
FILE component.

expression — An expression on which to base the match. Typically,
this is an expression taken from the input record. The
function implicitly casts expression to the type of the
corresponding key field(s).
The number of expression arguments must match the
number of semicolon-separated field names in the key
specifier of file_label. The maximum number is 24. Any
number of expressions can be NULL. If all expressions
match the corresponding key fields in the lookup
record, the record is considered a match.
You can omit the expression arguments if the key for
this lookup is empty: that is, if the key parameter of the
LOOKUP FILE component is set to { }. Note that in that case all
records will match and the first one will be selected.

APAD Requisition Steps

To request new software such as Ab Initio GDE, Ab
Initio Forum, or Data Profiler to be installed using
APAD, please follow the steps below:
1. Use the Service Catalog website to submit a
request

http://rc.allstate.com

2. Look for "Advanced IT Services" and click the
link "Software Packaging"

3. Find the "Software Packaging


Workstation" request and click "Proceed to Order"
4. Refer to the previous GDE package installation
request when filling out the form

View Multiple Errors at Once / View the Next Error

To view multiple errors at once:

When the AB_XFR_COLLECT_ERRORS configuration


variable is set to true, the Co>Operating System will
attempt to accumulate multiple errors encountered
during transform compilation, rather than just
stopping at the first one. If the variable is set to false
(the default), compilation will be aborted on the first
error encountered.

By setting this configuration variable to true, you


may be able to identify and fix multiple errors with
each execution.
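One hedged way to turn this on for a shell session before running your job (it could equally be defined as a graph or sandbox parameter):

Code:
export AB_XFR_COLLECT_ERRORS=true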

To view Next error:

If you have multiple errors in the GDE Application


Output: Job pane, the F4 key allows you to cycle
through the errors. Pressing F4 scrolls the next error
into view and highlights the component that
generated the error. After you've reached the end of
the errors, the feature will prompt you whether to
start again at the beginning of the output.

Simple Steps to Diagnose Checkout Issues / Know More
Details About Graph Failures

GDE Run settings:

If you face the error below either during checkout or
setup:
The GDE encountered problems while attempting to execute


the script.
Check for syntax errors in graph parameters with shell
interpretation,
or in the host setup and cleanup scripts.

The error was:

./GDE-dmei3-command0003.ksh[177]: /config.ksh: not


found

It means you are trying to check out to a path where
you don't have access.

E.g.: If I am checking out from the Dev EME to
/export/home/sven7 (a path in Dev) or
/apps/home/sven7 (a path in QA), where I don't have
access, I will get the above error.

You need to check your settings. Please make sure


you provide all the appropriate values.

Also in the checkout process, you will have the


option of selecting the run host settings. Please
select the appropriate one.

Use AB_BREAK_ON_ERROR while debugging:

When the configuration variable


AB_BREAK_ON_ERROR is set to true and debugging
is enabled, an error or reject will start the debugger
at the point of failure or rejection. This makes it
easy to examine the state of the transform at the
point of failure or rejection
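A hedged example of enabling it from the shell for a debugging session (it can also be defined as a parameter):

Code:
export AB_BREAK_ON_ERROR=true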

API vs. Utility Bulk Load

There was some confusion regarding the loading methods
(API and utility) used by the OUTPUT TABLE component when the table
has indexes and constraints defined on it. Below are the
various scenarios and the behavior of the table component.

Serial Loading
  API: 1) Record-by-record loading. 2) Checks constraints for each record.
  3) Very slow. 4) Suitable for base load.
  Utility, Direct - True: 1) Bulk load. 2) Disables index/constraints at the
  beginning, loads the data, then enables the index. 3) If the data has
  duplicates, the index becomes unusable while rebuilding.
  Utility, Direct - False: 1) Bulk loading. 2) Performance between API and
  Utility direct-true.

MFS Loading
  API: Same as above.
  Utility, Direct - True: The graph will fail, saying an index is built on the table.
  Utility, Direct - False: 1) The graph will run. 2) Slow performance.

Inference:

1) Table with constraints and indexes
a) Serial loading — Utility Direct True or API, depending on the requirement.
b) Parallel loading (huge data volumes) — Disable the index, load using
Utility Direct True, and re-enable the index; or use API.
c) Parallel loading (smaller data volumes) — API.

2) Table with no constraints and indexes
a) Serial or parallel loading — Utility Direct True.

Here’s the excerpt from Ab Initio GDE Help pertaining to API


and Utility

What is the difference between API mode and utility


mode in database components?

Short answer

API and utility are two possible interfaces to


databases from the Ab Initio software and their uses
can differ depending on the database in question.

Details

Enterprise-level database software often provides


more than one interface to its data. For example, it
usually provides an API (application programming
interface) that allows a software developer to use
database vendor-provided functions to talk directly
to the program.

In addition, the vendor usually provides small


programs, or utilities, that allow the user to
accomplish a specific task or range of tasks. For
example, the vendor might provide a utility to load
data into a table or extract table data, or provide a
command-line interface to the database engine. The
exact functionality of the utility or API varies by
database vendor; for that reason, specific details are
not provided here.

API and utility modes both have advantages and


disadvantages:

 API mode — Provides flexibility: generally, the


vendor opens up a range of functions for the
programmer to use; this permits a wide variety
of tasks to be performed against the database.
However, the tradeoff is performance; this is
often a slower process than using a utility. As an
Ab Initio user, you might use API mode when
you want to use a function that is not available
through a utility. In some instances, a
component will only run in API mode for just this
reason — the function inherent in the
component is not available through that
vendor's published utilities. In general, however,
it is useful to remember that API mode executes
SQL statements.
 Utility mode — Makes direct use of the
vendor's utilities to access the database. These
programs are generally tuned by the vendor for
optimum performance. The tradeoff here is
functionality. For example, you might not be
able to set up a commit table. In such an
instance, you must trust the ability of the utility
to do its job correctly. Because the granular
control given by API mode is not present in
utility mode, utility mode is best when your
purpose most closely resembles the purpose for
which the utility was created. For example, any
support of transactionality and record locking is
subject to the abilities of the utility in question.
Also, unlike API mode, utility mode does not
normally run SQL statements.

When choosing whether to use api or utility mode


with OUTPUT TABLE, keep the following in mind:

 api mode usually gives better diagnostics

 utility mode (the default mode for OUTPUT


TABLE) usually gives better performance.

API mode parallelization (all databases) for


UPDATE TABLE

You can apply any level of parallelism to the layout


of UPDATE TABLE, but note that each partition of an
UPDATE TABLE component running in parallel can
compete for database locks on the table it references
and deadlock can result.

You can often avoid such deadlock by partitioning


the input data on the primary key you are updating.
However, this does not necessarily always eliminate
the danger of deadlock.

If you have indexes on other columns in the update


table besides the primary key, then either inserting
rows or performing updates to the particular column
values that are indexed might cause multiple
partitions to contend for locks on the same objects.
Often these secondary keys are across relatively
small numbers of values, and the corresponding
indexes can be rebuilt quickly. In such cases, instead
of trying to update in parallel you can often get
better performance by either:

 Loading serially

OR

 Dropping and recreating any affected secondary


index

Using the NULL key to access a lookup with a single


record

Using the NULL key to access a lookup with a single


record
There are situations where you need to put a piece
of global information into a file as a single record and have that
record accessed with a lookup call. In this case use
the NULL key, that is {}, as the key on the lookup file. It is not
necessary to add a dummy key to be able to retrieve that
single-record information from the file.
For example, suppose we store the
count of records processed in a graph in an output file
which will then be used as a lookup file for balancing
purposes. Traditionally we would insert a dummy key like 1 or "X" in order to
retrieve the record count. A sample lookup file
named "Total Recs Processed Count" with the dummy key
would look like this:

Dummy_Key Recs_Count
X 2340

And we would use the following lookup call to retrieve the
Recs_Count field value:

lookup("Total Recs Processed Count", "X").Recs_Count

Instead of inserting the dummy value into the single-record lookup
file and using the dummy key to
retrieve that information, we can use the NULL key {} to
retrieve the same information. Your lookup file will now be in
the form:

Recs_Count
2340

And we can use the following lookup call to retrieve the
Recs_Count field value:

lookup("Total Recs Processed Count", {}).Recs_Count

Way to check the performance

When the configuration variable
AB_XFR_PROFILE_LEVEL is set to the value
"statement", you can get more detailed information on
how the functions in your XFR are performing.

Steps:
Define AB_XFR_PROFILE_LEVEL as a graph-level local
parameter and give it the value "statement".
Redirect the log port of your transform component to a
file; it will provide the detailed performance
information.

Note: Graph execution will be slower with this setting, so this
type of profiling should be done only in development,
as a test.

Recently the Prama application improved
its graph performance with these small
changes.

Scenario:
1) They get a number-of-days value from the mainframe and
need to compute a date by offsetting it from 18500101.

Before change:

(date("YYYYMMDD"))datetime_add((date("YYYYMMDD"))"18500101", days - 1)

After change:

(date("YYYYMMDD"))(days - 18263)

This improved the run time by around 25%.

2) They need to replace all non-printable characters
with blanks.
Before change:
re_replace(input_string, "[^ -~]", " ");

After change:

Defined the following variables in the XFR:

let integer(4) i = 0;
let integer(1) blank_char = string_char(" ", 1);
let unsigned integer(1)[256] translate_map = for (i, i < 256): if (i >= 32 && i <= 126) i else blank_char;

and replaced the re_replace function with


translate_bytes(input_string, translate_map);

With this, the time to process 100M records went
down from 770 seconds to 120 seconds, an improvement of more than
a factor of 6.

Small Useful Developer Tips

Evaluating graph parameter values without


accessing the GDE

 It can sometimes be useful to know what
values graph parameters resolve to when
you do not have access to a GDE. You can view
the resolved parameter values by using the air
sandbox run command with the -script-only
option. This command generates a deployed script
that is written to stdout. The output shows how
the parameters are resolved without actually
running the graph (see the sketch after this list).

 Alternatively, you can use air sandbox
parameter for this purpose, for both graph and
sandbox parameters. For example,

air sandbox parameter -path my_graph.input.pset -eval my_PARAM

will return the value of the graph parameter
my_PARAM.
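A hedged sketch of the -script-only usage (the pset name and output file are illustrative; only the option itself comes from the tip above):

Code:
air sandbox run my_graph.input.pset -script-only > my_graph_deployed.ksh

You can then inspect the generated script to see the resolved parameter values.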
Double Clicking the “Stop Execution” button in
GDE

 When you click the Stop Execution button in


the GDE, the GDE issues a command to
gracefully shut down the job. Double-clicking Stop
Execution issues a command to disconnect from
the remote server. This is almost always
undesirable as it can leave remote processes
running or in an inconsistent state. If you
accidentally double-click Stop Execution, the GDE
will prompt you with a dialog asking you to confirm
that you really want to disconnect. When
presented with this dialog, you should select No
and let the graph shut down cleanly.

 The GDE cleans up watcher files automatically


unless the cleanup process terminates abnormally
— for example, if you disconnect from the run host
by clicking the Stop Execution button two or more
times. Old watcher datasets that have not been
cleaned up should be removed through the GDE
main menu by selecting Debugger>Delete
Watcher Datasets. This action deletes all watcher
datasets in the current run directory.

Check parameter usages in all places of a graph

 To see all places in a graph where a parameter


is used:

1. From the GDE menu, choose Edit >


Parameters.
2. From the Parameters Editor, choose Edit >
Find All and enter the name of the parameter you
are looking for.

Inline Expansion of Small, Frequently Called Functions;
Use AB_BREAK_ON_ERROR While Debugging

1) You can sometimes improve the runtime performance of a


transform by expanding small frequently-called functions
inline. Inline expansion replaces a function call with inline
code, thus eliminating call overhead. However, this
increases the size of the generated top-level code and also
the time it takes for a component to start up, so it should only
be used with small functions that are frequently called. By
default, single-line transform functions are expanded inline
wherever they are located.

The inline keyword can be added to a function definition to


indicate that the function will be expanded inline at runtime.
For example:

out :: MyFunction( a ) inline =


begin

end;
For more information, see "Inline expansion of simple and
complex transforms" in Ab Initio Help

2) When the configuration variable AB_BREAK_ON_ERROR


is set to true and debugging is enabled, an error or reject will
start the debugger at the point of failure or rejection. This
makes it easy to examine the state of the transform at the
point of failure or rejection
Sample Memory Allocation Issue and Resolution

Error:
========= Error from
PBKS_COMPANY_ID_POLICY_NUM_SOURCE_SYSTE
M_ENTY_ID_CODE_.Sort.004 on
abihost ========= Memory allocation failed
(8388608 bytes).
Current data ulimit is 'unlimited'.
Current vmem ulimit is 'unlimited'.
Current address space ulimit is 'unlimited'.
Heap is at least 35739328 bytes already.

This graph's use of memory may be improved,
but the change that would really impact memory
allocation and avoid the graph failure is to add
phase breaks or to reduce the parallelism and run
the graph 1-way instead of 2-way.
In fact, as the graph is written now, here is the
computation of the maximum memory required by
the (only) phase:

11 SORTs * 2-way parallel = 22 SORT processes
* 100 MB max-core ≈ 2.2 GB.

Plus you have to take into account ~7 MB of overhead
per component in the phase.

So the change that would have a bigger impact


would be having the PBYK components in one phase
and all the SORT-DEDUP in the second phase.
There are other things that can be improved. The
following comments are minor changes with
decreasing order of importance that can improve
your graph and memory allocation.
1) As a general rule, Sorting is expensive. It's often
necessary, but you should always think carefully
about whether it's required. For example, don't use
Partition by Key and Sort if just a Partition by Key
will do. In your graph you have a PBKS component
that can be replaced by a PBK and SORT WITHIN
GROUP. The component is
PBKS_COMPANY_ID_POLICY_NUM_SOURCE_SYSTEM
_ENTY_ID_CODE_. Since your data are already
partitioned and sorted by the first 3 keys, you can
simply partition by {ENTY_ID_COD} and SORT WITHIN
GROUP. In this way you can at least get rid of one
SORT component.
2) Besides that, you could use a REFORMAT before
the PBKS and drop the unnecessary fields that you
are not going to have on the output port of your
JOIN. In this case you would sort a smaller amount of
data, which would be more efficient.
You may want to use a REFORMAT and its parameter
output-index (look at the GDE online help for more
details) to separate the input records in different
transforms and output ports. This Reformat would
drop ~half of the fields per record.
3) My guess is that you don't need the REFORMAT
FORMAT D if you are then going to discard these
records.

The Partition by Key and Sort done before the
Partition by Expression wouldn't improve the
performance, and neither would sorting/deduping ahead of time,
since sorting on smaller groups of data performs
better.

Parameter evaluation in a graph using PDL

Let us assume the mapping file
($AI_MAPPING/ewpoc_table_details.txt) has the following
contents:

f36:EWT_CLM_STATUS_HIST:{f36_adw_claim_id}:
f37:EWT_CLM:{f37_adw_claim_id}:

Our requirement is to extract the table name (2nd field of the
mapping file) and the key (3rd field of the mapping file) based
on the table code, which is a formal parameter to the graph.
The approach is as follows:
Step 1: Get the contents of the mapping file into a
parameter (say parameter_file)

parameter_file :
$AI_MAPPING/ewpoc_table_details.txt

Step 2: Get the corresponding row to the table code

parameter_row : $[string_split(re_get_match(parameter_file, TABLE_CD + ":.*"), "\n")[0]]

Step 3: Get the values from parameter_row as follows:

TABLE_NAME : $[string_split(parameter_row, ":")[1]]
PARTITION_KEY : $[string_split(parameter_row, ":")[2]]

Example: Suppose f36 is passed as the table-code formal parameter.

From step 1, parameter_file will have the following
content:

f36:EWT_CLM_STATUS_HIST:{f36_adw_claim_id}:
f37:EWT_CLM:{f37_adw_claim_id}:

From step-2, parameter_row will have the row related


to the passed table code

f36:EWT_CLM_STATUS_HIST:
{f36_adw_claim_id}:

From step-3,

TABLE_NAME is EWT_CLM_STATUS_HIST
PARTITION_KEY is {f36_adw_claim_id}

Use AB_DML_DEFS and AB_INCLUDE_FILES

Do not confuse the AB_DML_DEFS DML inclusion


parameter with the AB_INCLUDE_FILES
configuration variable.

AB_DML_DEFS is a graph, plan, or project parameter


that contains DML declarations and definitions for
use within inline DML in other parameter definitions.
Inline DML is evaluated during parameter evaluation.
AB_INCLUDE_FILES is a configuration variable that
specifies paths of files to include during DML
evaluation in component transforms and record
formats. This happens during runtime DML
evaluation, which occurs separately from and much
later than parameter evaluation.
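As a rough sketch, AB_DML_DEFS might hold DML declarations like the following (the type and field names here are hypothetical), which other parameter definitions could then reference in their inline DML:

Code:
type claim_key_t =
record
  string(10) claim_nbr;
  date("YYYY-MM-DD") claim_dt;
end;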

In general, always declare AB_DML_DEFS as a local


-- not an input -- parameter. The purpose of
AB_DML_DEFS is to allow you to define a self-
contained DML context for a graph or plan,
independent of the circumstances of the graph's or
plan's execution.

For more information see "AB_INCLUDE_FILES",


"The AB_DML_DEFS DML inclusion parameter", and
"Using AB_DML_DEFS" in Ab Initio Help.

Converting Datetimes Between Time Zones

Converting a date format specifier to a specifier with a UTC


time-zone offset

Sometimes we need to convert a datetime from the DDD,
DD MMM YYYY HH:MM:SS +OOOO format to YYYY-MM-DD
HH24:MI:SS with a UTC time-zone offset (for example:
Mon, 04 Jul 2005 08:52:50 -0400).

In some applications we convert the time to GMT,
CST, MST, EST, PST, or other time zones.

The time-zone value data conversion rules are:


• When you assign a datetime value without time-zone
offset information to a datetime value with a time-zone offset,
the result is assumed to be a UTC time.

Depending on the format specifier, the assignment adds a Z,


+0000, or +00:00 to the datetime value with a time-zone
offset.

• A cast to a datetime format without a timezone specifier


converts the timestamp to UTC. Compare the following:

Code:
$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS+ZONE"))"2009-08-06 18:34:23+0600"'
"2009-08-06 18:34:23+0600"

$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS"))(datetime("YYYY-MM-DD HH24:MI:SS+ZONE"))"2009-08-06 18:34:23+0600"'
"2009-08-06 12:34:23"

$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS"))(datetime("DDD, DD-MM-YYYY HH:MI:SS +ZO:NE")) "Mon, 10-08-2009 10:12:01 +06:00"'
"2009-08-10 04:12:01"

Converting Invalid Date Format To Valid Oracle Date


Format
Several times we have the scenario where we have to form
the target date field by concatenating Year, Month, and Day fields,
or a combination of two fields with the third field hard-coded,
and in many other ways.
There are also situations where we have a 2-byte input
source field and the data we receive for the Month/Day
field is, for example:
"04"
" 4" ... etc.

Or, during design, in many places we add a check: if the
month is less than 10, prepend '0' to the value to form
month values like '01', '02', and so on.

We can avoid all this extra effort with an extra type
cast of the data. See the examples below for more details:

Example 1: In this case Ab Initio evaluated the date as valid
even though the month has a space in the value.

$ m_eval '(date("YYYYMMDD"))"2005 505"'
"2005 505"

When you try to load this value into Oracle, it will fail,
saying the month is invalid. Many of you may be curious
why Ab Initio treats this as a valid date, but this is how it
works.

Example 2: Using an extra type cast

$ m_eval '(date("YYYYMMDD"))(int)(date("YYYYMMDD"))"2005 505"'
"20050505"

In the above case, if you reformat the date in any way, the
space embedded in the date is replaced with
a zero.
That is why the cast to an integer and back allows this to
work. But if you are just copying the column without
changing its format in any way, then no check is performed.

Determining Whether A Vector Contains A Given


Element
Use the member operator to determine whether a vector
contains a given element.

The member operator is highly optimized and is generally


the most efficient method of searching a vector for a given
value. The following example shows how you can determine
whether a vector of names contains the name Smith:
Code:
out.found :: "Smith" member in.names;

Examples
The following examples show the use of the member
operator.

Example 1. This example assumes that the following vector


named New_England was defined globally in
AB_INCLUDE_FILES:

let string('\0')[6] New_England = [vector "Massachusetts",


"Rhode Island", "Connecticut", "Maine", "New Hampshire",
"Vermont"];

$ m_eval "'Massachusetts' member New_England"


1

$ m_eval "'New York' member New_England"


0
Improving the Performance of Sort

Ab Initio Tip of the Week:

The Ab Initio sort algorithm is efficient, but it is still an


expensive operation in terms of CPU usage and memory. If
you wish to improve the performance of a sort operation
within your graph, there are a number of areas you can
examine.

Do you really need to Sort?

The quickest way to decrease the impact of a SORT


component on your overall graph performance is to remove
the SORT component entirely. Look at your requirements
and use of SORT components in your graph carefully. For
example, if you are sorting prior to sending records to a
ROLLUP component, it may make more sense to use an in-
memory rollup. If it isn’t possible to eliminate sorting entirely,
look at combining multiple sort operations into a single
SORT component or a series of SORT and SORT WITHIN
GROUPS components.

Record Format

The SORT component needs to parse each record in your


data stream, therefore making sure that your records and the
sort keys can be parsed efficiently is important. In general
keys should be of a fixed-width non-nullable type and are
grouped at the beginning of your record. Your record will be
parsed more quickly if it contains only fixed-width fields.

If this is not true of your existing record format, you can alter
an existing transform component or add a new one before
the SORT to optimize your record format. You can then use
another transform after your sort to return the records to your
required format. The extra overhead of reformatting is often
compensated for by the quicker sort.

Compression

If you are sorting a volume of data that will not fit within the
amount of memory specified by the max_core parameter, the
sort will need to spill all of its records to disk. If a large
volume of data needs to be written to disk, the I/O time used
for this operation may be significant enough that it makes
sense to compress the data before writing it to disk. As
changing this parameter to compress spill files can add
significant CPU time to your graph, it is important to
benchmark your graph with realistic amounts of data before
and after making this change. For example, if the disk I/O
rate is relatively high (compared to CPU), it may be that the
Sort component will run faster without the compression.

Use Of re_match_replace Function

The regular expression DML function re_match_replace,


introduced in the Co>Operating System 2.15.3, allows you to
use named capturing groups to replace substrings of a
matched pattern.

The following example shows how to reverse the order of


three short words.

Code:
m_eval 're_match_replace("Mon Tue Wed", "(.{3}) (.{3}) (.{3})", "$3 $2 $1")'
"Wed Tue Mon"

Each parenthesized sub expression used in the pattern can


be referenced in the replacement string with the format
$number– where $0 refers to the whole expression, $1
refers to the first sub expression, $2 refers to the second sub
expression and so on.
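Another sketch using the same $number references (this input and pattern are illustrative, not from the original tip):

Code:
m_eval 're_match_replace("2009-08-06", "(.{4})-(.{2})-(.{2})", "$3/$2/$1")'
"06/08/2009"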

To Format Numbers, Cast To A Decimal Type Rather


Than Using Printf

In most programming languages, you have to call a function


to convert the numeric value to a string. C programmers
often use the printf or sprintf functions for this, and users
new to Ab Initio software sometimes are drawn to the DML
printf function to perform that task. It works, but it's almost
always overkill.

In most cases, the most efficient and elegant way to get the
text form of a number is to cast it to a decimal type. For
example, if x is a real(8), you can get the text form of its
value formatted with an explicit decimal point, with four digits
to the right of the decimal, with the expression:

(decimal("".4))x

In some cases, the number you need as a string may be a


decimal number, in which case it's already in text form. You
can assign such values directly to string fields or use them
as input to various string functions. Occasionally, it may be
necessary to explicitly cast a decimal value to a string type,
as when using the + concatenation operator. For example, if
d is a decimal, you might write:
// Reject the record if d is smaller than 42:
if (d < 42)
  force_error("The value of d, " + (string(""))d + ", is too small.");

Mainframe Issues and Resolutions

Here are the most common mainframe issues you see

Scenario 1:

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsgl93'.
mvsgl93: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ...
/apps/xt01/abinitio-V2-14-1/bin/m_rtel:
pipe closed unexpectedly when processing
pattern %|#|$|> and answer cat >
/tmp/rtel.10.48.74.75.3962

failing command: /apps/xt01/abinitio-V2-14-


1/bin/m_rtel -h mvsgl93 -u hlprod
-script /apps/xt01/abinitio-V2-14-
1/lib/telnet.script -shell sh -packet
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
"7920" "7921" "022" "10.48.74.75:60686"
"m_db list - -use_args_in_config
-do_data_translation"
"AB_HOST_INTERFACE=mvsgl93"
"AB_TCP_CONNECTION_TOKEN=enabled"
"AB_LAUNCHER_VERSION=2.14.104"
"AB_LAUNCHER_PROTOCOL_VERSION=P_late_arg_pa
ssing"
-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsgl93
User name: hlprod
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.75
===========================================
=====================
ABINITIO(*): Database Package
Version 2-14-104-e11-1

Scenario 2:

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsusys'.
mvsusys: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
-f /tmp/rtel.10.48.74.11.3886 221 <
/dev/null ; rm -f
/tmp/rtel.10.48.74.11.3886 ; exit
/apps/abi/abinitio/bin/inet-exec: corrupt
argument file /tmp/rtel.10.48.74.11.3886 -
expected size 221 but actual file size 0
Possibly the value of AB_TELNET_PAUSE_MSECS
should be increased from its
current setting of 200
======= Argument file follows:

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsusys
User name: ABIEPC
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.11 (from
AB_HOST_INTERFACE)
===========================================
=====================

ABINITIO(*): Database Package


Version 2-14-104-e11-1

Scenario 3:
 
[DB00109,DB00112,DB00200,DB00113,B148,B1105
,B1108,B1101,B1,B1104,B1103]
ABINITIO(DB00109): Error getting the
database layout.
ABINITIO(DB00112): Subprocess m_db
returned with exit code 4.
ABINITIO(DB00112): It was called as: m_db
hosts
/export/home/rrudnick/sandbox/apt/hrm/ic/db
/testv_db2hrm.dbc -select SELECT
agn_agent_type_cd, agn_agent_nbr FROM
testv.P1T_TOT_AGENT WHERE agn_end_eff_dt =
'9999-12-31'
ABINITIO(DB00112): The following errors
were returned:
ABINITIO(DB00112):
-------------------------------------------
----------

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsasys'.
mvsasys: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
-f /tmp/rtel.10.48.74.75.3943 221 <
/dev/null ; rm -f
/tmp/rtel.10.48.74.75.3943 ; exit
/apps/abi/abinitio/bin/inet-exec: corrupt
argument file /tmp/rtel.10.48.74.75.3943 -
expected size 221 but actual file size 0
Possibly the value of AB_TELNET_PAUSE_MSECS
should be increased from its
current setting of 200
======= Argument file follows:

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsasys
User name: HRMABID
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.75
===========================================
=====================

ABINITIO(*): Database Package


Version 2-14-104-e11-1

ABINITIO(DB00112):
-------------------------------------------
----------
[Hide Details]

Cause of Error: [DB00112]
DB00112_1: 4
DB00112_2: m_db hosts
/export/home/rrudnick/sandbox/apt/hrm/ic/db
/testv_db2hrm.dbc -select SELECT
agn_agent_type_cd, agn_agent_nbr FROM
testv.P1T_TOT_AGENT WHERE agn_end_eff_dt =
'9999-12-31'
DB00112_0: m_db

DB00112_3: [DB00200]
Database Package Version: 2-14-104-e11-1

Base Error: [DB00113]
DB00113_0: m_db
DB00113_1: mvsasys



Execution starting...
Error reported with 'mp error' command
layout4
Error getting the database layout.
ABINITIO: Fatal Error
Script end...
ERROR : ++++ FAILED ++++ Job
clifeii_018_clifeii_018_ic_002_rfmt_af_hrm_
common_layout failed.
Failed
 
Scenario 4:

[R147,R3999]
Could not create working directory: Agent
failure
Base File =
"file://mvsasys/~mvsqds/RNN.EDW.EW368.NOVA.PR
OCESS3.SRTD.OCTQC01,%20recfm(vb),
%20varstring,%20recall,recfm(vb)
varstring
recall"
Work Dir =
"file://mvsasys/~ab_data_dir/a304a48-
48cfd68d-16c2-000"
Error details:
ABINITIO: start failed on node mvsasys
Could not start agent:
Cannot create agent data directory: No
space left on device
Path = "/apps/abi/data/a304a48-48cfd68d-
16c2-000"

Scenario 5:

cjade@gl04dm02:ewabipd2 [/allstate/log] -->


more
/apps/xt11//data/admin/ent/adw/premium_rewr
ite/error/./ewprd610_nwt_thrd_pty_unload_27
316_2008-11-17-17-42-20.err
Trouble creating layout "layout2":

Could not create working directory: Remote


process did not start correctly
Base File = "file://mvssw91/"
Work Dir =
"file://mvssw91/~ab_data_dir/a4253a0-
4921f354-6b81-001"
Error details:
mvssw91: Remote job failed to start up

===========================================
=====================
Waiting for login prompt ...
responding ... done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvssw91
User name: ABIPRM1
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.66.83.160

===========================================
=====================

cjade@gl04dm02:ewabipd2 [/allstate/log] -->

Scenario 6:

Could not create working directory: Remote


process did not start correctly
Base File =
"file://mvsasys/~mvsqds/TESTPR10.PRM.OCT17K.D
PR10001,%20recfm(vb),%20varstring,%20recall"
Work Dir =
"file://mvsasys/~ab_data_dir/a304a48-
49272e61-e1d-000"
Error details:
mvsasys: Remote job failed to start up
=============================================
===================
IKJ56644I NO VALID TSO USERID, DEFAULT USER
ATTRIBUTES USED
IKJ56621I INVALID COMMAND NAME SYNTAX

---------------------------------------------
-------------------
Trouble starting job:
Remote host: mvsasys
User name: TESTZ
Startup method: rexec
Remote AB_HOME: /apps/xt01/abinitio-V2-
14-1
Local interface: 10.48.74.72

=============================================
===================

Scenario 7:

Execution starting...


[D205]
Trouble creating layout "layout-
Unload_Products_Using_Q_Schema__table_":
[Show Details]


[R147,R3999,B148,B1105,B1108,B1101,B1,B1104,B
1103]
Could not create working directory: Remote
process did not start correctly
Base File = "file://mvsesys/"
Work Dir =
"file://mvsesys/~ab_data_dir/a311cca-
48d7400f-48c2-000"
Error details:
mvsesys: Remote job failed to start up

=============================================
===================
EZA4386E rshd: Permission denied.

---------------------------------------------
-------------------
Trouble starting job:
Remote host: mvsesys
User name: awetlrun
Startup method: rsh
Remote AB_HOME: /apps/xt01/abinitio-V2-
14-1
Local interface: 10.49.28.202

=============================================
===================

Solutions:

Please follow these simple steps and you will be able to
identify the root cause and, in many cases, solve the issue.
1) Make sure all the needed settings are provided, i.e., the
following parameters should be set before your graph
is executed.
AB_NODES @ mvshost_all :
mvsasys
AB_HOME @ mvshost_all :
/apps/abi/abinitio
AB_WORK_DIR @ mvshost_all :
/apps/abi/abi-var
AB_CONNECTION @ mvshost_all :
telnet
AB_TELNET_PORT @ mvshost_all :
1023
AB_TELNET_TERMTYPE @ mvshost_all :
vt100
AB_EBCDIC_PAGE @ mvshost_all :
ebcdic_page_1047
AB_STARTUP_TIMEOUT @ mvshost_all :
120
AB_USERNAME @ mvshost_all :
xxxxxx
AB_ENCRYPTED_PASSWORD @
mvshost_all : xxxxxxx

Usually these get set in your .abinitiorc file (in
your home directory), in a file listed in AB_CONFIGURATION, or in your
DBC file.
2) In order for Ab Initio to run its utilities on the mainframe, the
ID must have UNIX System Services (an OMVS segment).
You can check this as follows:

i) If you know the password:
Go to Start -> Run -> cmd and type "telnet <<your
mainframe server>> 1023".
It will prompt for a user/password. If you are able
to log in, it means you have the permission, i.e., an
OMVS segment has been added for your ID.
ii) If you don't know the password:
In your UNIX session type "m_ls //<<your
mainframe server>>/tmp".
If it returns information, then you have
access/an OMVS segment.

Steps 1 and 2 will identify issues with settings, user name and
password, or the OMVS segment. Common ones are
password expiration or an OMVS segment not added for your ID.
3) If you are successful in steps 1 and 2, you need to
check for space issues.
On the mainframe we write files only to /tmp or
$AB_WORK_DIR (/apps/abi/abi-var).

If you know the password:
Log in using telnet as stated in step 2 and
type the df command, or du (after going into the respective
path). This will tell you the space.
If you don't know the password:
In your UNIX session, type m_du or m_df.
Eg:
(abinitio)abinitio@xtnb1dv1 : /export/home/abinitio
=> m_df //mvsgl93/tmp
1024-Blocks      Used     Avail  Cap  Skew  Filesystem
    350,640   137,440   213,200  39%        //mvsgl93/tmp

If you confirm it is a space issue, during the daytime please raise a ticket to "3OS390_SOL"/Jack Arras. If it is off hours, please contact ATSC/DCO to raise an incident against IOM/zOS.

In the above examples, scenarios 6 and 7 are settings issues and scenarios 1-5 are space issues.

Inline Expansion Of Simple And Complex Transforms

Inline expansion of simple and complex transforms

You can improve runtime performance by expanding a function inline. Inline expansion replaces a function call
with actual function code, thus eliminating call overhead.
However, this can increase the size of the generated top-
level code and also the time it takes for a component to
start up.
Inline expansion is controlled by the inline keyword (shown below in "Expanding a particular function") and several
configuration variables. By default, single-line transform
functions are expanded inline wherever they are located.
This behavior is controlled by the default value of
AB_XFR_INLINE_SIZE_LIMIT, which is 1. For more
information on this and the other configuration variables,
see "Configuration variables affecting inline expansion".

Expanding a particular function

To expand a particular function inline at every calling location, add the word inline to the function definition.
For example:
out :: myfun(a, b, c) inline =
begin
...
end;
Transforms declared this way are expanded inline as long
as AB_XFR_EXPAND_INLINE is set to True.

Expanding all functions

To expand inline all transforms having a particular level of complexity, you can set the configuration variable
AB_XFR_INLINE_SIZE_LIMIT. For inline expansion, the
complexity of a transform is taken to mean the total
number of statements, rules, and local variable
declarations. The setting of AB_XFR_INLINE_SIZE_LIMIT
affects all transforms, regardless of whether they were
explicitly declared inline.
For example, the following transform is expanded inline if
AB_XFR_INLINE_SIZE_LIMIT is set to 4 or greater. The
complexity of the transform is four because the transform
has two local variables, one statement, and one rule
(2+1+1=4):
out :: yourfun(a, b) =
begin
let int x = a - b;
let int y = x * x;
y = y + y / 2;
out :: if (y > 2 * x) a else b;
end;
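For instance, to allow a transform of that complexity to be expanded inline everywhere, you could raise the limit in a configuration file such as .abinitiorc. This is only a sketch; where you set the variable (configuration file, sandbox, or environment) depends on your site's conventions:

AB_XFR_INLINE_SIZE_LIMIT : 4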

When To Use The Protocol Prefix (File, Mfile or Mvs)

When a GDE text box for a component parameter is labeled URL, it's a good idea to use Ab Initio URL syntax:

protocol://hostname/pathname
Where:
- The value of protocol represents the type of dataset to which the URL points: file, mfile, or mvs.
- The value of hostname specifies the computer where the file or control partition resides.
- The value of pathname is an absolute pathname indicating where on the computer the file or control partition resides. It must be in the form accepted by the native operating system of that computer.

Under most circumstances the Co>Operating System will infer the correct value for an omitted protocol, but specifying
the protocol prefix explicitly will make your graph more
readable and resolve any ambiguity of the dataset type.
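For example (etlhost and the paths below are hypothetical, shown only to illustrate the prefixes):

file://etlhost/data/serial/accounts.dat
mfile://etlhost/data/mfs/mfs_8_way/accounts.dat

The prefix makes it explicit whether the URL names a serial file or a multifile, even in cases where the Co>Operating System could have inferred it.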

When a GDE text box for a component parameter is labeled File, the value of the parameter should be simply a local file
path:

/pathname/filename

In particular, this applies to DML, XFR, DBC (and similar) files, which should all be local to the graph at startup.

Null Does Not Equal Null When Doing Field Comparisons

Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the
database. Introduced by the creator of the relational
database model, E. F. Codd, SQL Null serves to fulfill the
requirement that all true relational database management
systems (RDBMS) support a representation of "missing
information and inapplicable information". Codd also
introduced the use of the lowercase Greek omega (ω)
symbol to represent Null in database theory. NULL is also
an SQL reserved keyword used to identify the Null special
marker.

Null has been the focus of controversy and a source of debate because of its associated Three-Valued Logic (3VL),
special requirements for its use in SQL joins, and the
special handling required by aggregate functions and SQL
grouping operators. Although special functions and
predicates are provided to properly handle Nulls,
opponents feel that resolving these issues introduces
unnecessary complexity and inconsistency into the
relational model of databases.
NULL is a marker that represents missing, unknown, or
inapplicable data. Null is untyped in SQL, meaning that it is
not designated as a NUMBER, CHAR, or any other specific
data type. Do not use NULL to represent a value of zero,
because they are not equivalent.

NOT NULL constraint


Columns in a table can be defined as NOT NULL to indicate
that they may not contain NULL values (a value must be
entered). Example:

CREATE TABLE t1 (c1 NUMBER PRIMARY KEY, c2 DATE NOT NULL);

Comparisons
Any arithmetic expression containing a NULL always
evaluates to NULL. For example, 10 + NULL = NULL. In fact,
all operators (except concatenation and the DECODE
function) return null when given a null operand.
Some invalid examples:

Example 1:
A NULL is not equal to a NULL:
SELECT * FROM emp WHERE NULL = NULL;

Example 2:
A NULL cannot be "not equal" to a NULL either:
SELECT * FROM emp WHERE NULL <> NULL;

Example 3:
A NULL does not equal an empty string either:
SELECT * FROM emp WHERE NULL = '';

Valid examples

Example 1:
Select column values that are NULL:
SELECT * FROM emp WHERE comm IS NULL;

Example 2:
Select column values that are NOT NULL:
SELECT * FROM emp WHERE comm IS NOT NULL;

Example 3:
Change a column value to NULL:

UPDATE emp SET comm = NULL WHERE deptno = 20;

Handling Delimited Data with Missing and Extra Delimiters

The easiest solution to handling data with missing delimiters is to have your data provider provide you with
clean data in the first place. Otherwise, depending on the
nature of your data, you can run into issues trying to
decipher where a delimiter is supposed to be.

Often, if the incidence of bad data is low enough, you can just collect these records for manual processing through
the reject port of an early component. Keep in mind that
relying on validating the data against its type may not
catch all the bad data as shown in the examples that
follow.

Throughout the remainder of the week, we'll post simple cases demonstrating the basic techniques that can be used
with badly delimited data. An example graph and data that
implements the techniques described here and in the tips
to follow is attached.

For more information, see the REPAIR INPUT component and the "Malformed Delimited Data" topics in Ab Initio
Help. For help with more complex examples, contact Ab
Initio Support.

In this example, we'll use a record with two delimited fields defined as:

record
  string("|") code;
  string("\n") description;
end;
Here are two records provided:

AThis text describes type A
B|This text describes type B
Because the first record is missing the pipe, the data in
these two records will incorrectly be parsed as a single
record. Relying on validating data by its type will not catch
the error:

[record
code " AThis text describes type A\nB "
description "This text describes type B"
]

To repair bad input records automatically within your graph, you must understand your data and what logic you'll
need to form a good record from a bad record.

When you know that there may be missing internal delimiters, but you'll always have a line delimiter, you can
use a more generic DML record format to describe the
data, and then use a REFORMAT transform to parse the
data with explicit logic:

record
  string("\n") line;
end;

To use this type of solution, you must understand the logic behind how, in the absence of a delimiter, you could
identify which portion of the newline-delimited field goes
into the code field and which portion goes into the
description field. Here you know that along with being
delimited by a pipe, code is also always a single character.
You can write a transform that first checks for a delimiter,
then takes the first character of the line and assigns it to
the code field, and assigns the remainder to the description
field.
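A minimal sketch of such a REFORMAT transform follows. It assumes the generic one-field input format above and the two-field output format; the exact rules in your graph will depend on your data:

out :: reformat(in) =
begin
  let integer(4) p = string_index(in.line, "|");
  // If the pipe is present, split on it; otherwise assume the first
  // character is the code and the remainder is the description.
  out.code :: if (p > 0) string_substring(in.line, 1, p - 1)
              else string_substring(in.line, 1, 1);
  out.description :: if (p > 0)
      string_substring(in.line, p + 1, string_length(in.line) - p)
    else
      string_substring(in.line, 2, string_length(in.line) - 1);
end;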
Using the NORMALIZE Component To Drop Records

The NORMALIZE component allows you to output a variable number of records – including zero records – for each
incoming record. This makes it possible to use the
NORMALIZE component to drop records.

The FILTER BY EXPRESSION component is usually used to select or deselect records but there are times when the
logic required to select records cannot be written in a
single expression. The transform parameter of the
NORMALIZE component allows you to use global variables
and more complex and stateful calculations when
determining whether to drop records.

For example, consider a flow of integer values in which you want to keep only the integers that are greater than the
sum of the integers you've seen so far. Given the following
input values:

2 5 1 9 4 9 2 35 24

The correct output would be:

2 5 9 35

To do this with a NORMALIZE component, use a global variable to keep the running sum. In the length function, compare the running sum to the current value; if the current value is greater than the running sum, output 1; otherwise output 0. This drops any record whose integer is less than or equal to the running sum seen so far.
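A minimal sketch of such a transform is shown below. The field name value is an assumption for illustration; the running sum is updated in the length function so that every record, kept or dropped, is counted:

let integer(8) running_sum = 0;

out :: length(in) =
begin
  // Keep the record only if it exceeds the sum of everything seen so far.
  let integer(4) keep = if (in.value > running_sum) 1 else 0;
  running_sum = running_sum + in.value;
  out :: keep;
end;

out :: normalize(in, index) =
begin
  out.value :: in.value;
end;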
Keyword Versus Positional Parameters In Command
Lines

When using input parameters, keyword parameters offer more flexibility and clarity than positional parameters on the command line. With positional parameters, it is
important that you specify the parameters in the right
order, as prescribed in the graph. For example, command
syntax using positional parameters may look like:

my_graph.ksh 200612 some_tb some_database_name

With keyword parameters, you specify the parameter name first (preceded by a hyphen) and the value next. The order
in which the parameter names appear is not important:

my_graph.ksh -PMONTH 200612 -SOURCE_TABLE some_tb -DATABASE_NAME some_database_name

or

my_graph.ksh -SOURCE_TABLE some_tb -DATABASE_NAME some_database_name -PMONTH 200612

The keyword syntax provides more insight as to what parameters correspond to what values and tends to be
more maintainable over time, as new parameters get
added.

For more information, see “The parameter lines” in Ab Initio Help.

General Information Regarding Phases, Checkpoints and Run Program

Do not decouple phases and checkpoints:

A phase break without a checkpoint is no more efficient than a checkpoint, and in some cases a checkpoint will
actually use less disk space during the execution of a
graph. For example, if a phase writes to an output file, the
previous contents of that file can be discarded immediately
after a checkpoint, but the file contents must be retained
following a phase break without a checkpoint.

In the absence of any specific recovery requirements, a graph with all checkpointed phase breaks will use the
minimum disk resources compared to the same graph with
a combination of uncheckpointed phase breaks and
checkpoints in the same locations in the graph.

For more information, see “Phases and checkpoints” in Ab Initio Help.

Use exit codes to indicate failure in RUN PROGRAM:

When using custom components or the RUN PROGRAM component, be sure the applications you call indicate
failures by passing any errors through their exit codes.
Unless there is a side-effect on the resulting data used
downstream, the Co>Operating System can only recognize
errors through the non-zero exit status of the called
applications.
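As a minimal ksh sketch of this idea (my_app is a hypothetical application name; the point is only that the wrapper passes the real exit status back):

#!/bin/ksh
# Hypothetical wrapper called from RUN PROGRAM: run the real application
# and propagate its exit status so the Co>Operating System sees a failure.
my_app "$@"
status=$?
if [ $status -ne 0 ]; then
    print -u2 "my_app failed with exit status $status"
fi
exit $status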

XML SPLIT Component:


XML SPLIT reads, normalizes, and filters hierarchical XML
data, turning it into DML-described records that contain
only the fields you specify.

The component requires a description of the input XML, in the form of either a Schema file or an exemplar file. You
specify the Schema file or exemplar file with the Import
XML dialog, which you then use to describe and create the
DML record format for each output.

Used with XML COMBINE


XML COMBINE reverses the operations of the XML SPLIT
component, so you can use XML COMBINE to recover the
original XML input passed to XML SPLIT. That is, XML
COMBINE re-creates previously flattened hierarchies and
normalized elements, and recombines multiple input
streams.

Exceptions to this behavior can occur when XML COMBINE reads the following types of data:

- Flattened repeating elements
- Multiple inputs without a specified key

In these cases, you must use sequence numbers with both XML SPLIT and XML COMBINE to preserve hierarchical and other contextual information. You can do this in either of the following ways:

- Use the -generate-id-fields argument when you run the xml-to-dml utility.
- Select the Generate fields checkbox in the Import XML Options dialog. (This is the default.)

For more information, see "Import XML Options dialog".
Loop Expressions and Vectors

A loop expression results in a vector of values — one value per iteration of the loop. The following for loop expression
computes a vector of n elements, each of which is the
value of expression, evaluated with i set to incrementing
values from 0 to n-1.

for ( i , i < n ) : expression

For example, this expression squares the value of i:

for ( i, i < 5 ) : i*i;

It returns this vector:

[vector 0, 1, 4, 9, 16]

As the following examples demonstrate, loop expressions simplify vector-related business logic. Using a loop
expression, Example 1 builds a vector from a lookup file
using two local variables and three lines of code. Example
2 implements the same logic without a loop expression and
requires eight lines of code and three local variables. The
loop expression makes a transformation more compact and
readable, but is not necessarily more performant.

Example 1:

Code:
let integer(4) no_of_managers =
    first_defined(lookup_count("Stores Lookup", in0.store_no), 0);
let integer(4) idx = 0;
out.store_managers :: for (idx, idx < no_of_managers) :
    lookup_next("Stores Lookup").store_manager;
Example 2:

Code:
let integer(4) no_of_managers =
    first_defined(lookup_count("Stores Lookup", in0.store_no), 0);
let integer(4) idx = 0;
let string("\1")[integer(4)] store_managers = allocate();
for (idx, idx < no_of_managers)
begin
    store_managers = vector_append(store_managers,
        lookup_next("Stores Lookup").store_manager);
end

out.store_managers :: store_managers;

m_rollback versus m_cleanup


What is the difference between m_rollback and m_cleanup
and when would I use them?

Short answer
m_rollback has the same effect as an automatic rollback —
using the jobname.rec file, it rolls back a job to the last
completed checkpoint, or to the beginning if the job has
not completed any checkpoints. The m_cleanup commands
are used when the jobname.rec file doesn't exist and you
want to remove temporary files and directories left by
failed jobs.

For detailed information on using the cleanup commands, see "Cleanup" and "Cleanup commands".
Details
In the course of running a job, the Co>Operating System
creates a jobname.rec file in the working directory on the
run host.

NOTE: The script takes jobname from the value of the AB_JOB environment variable. If you have not specified a
value for AB_JOB, the GDE supplies the filename of the
graph as the default value for AB_JOB when it generates
the script.

The jobname.rec file contains a set of pointers to the internal job-specific files written by the launcher, some of
which the Co>Operating System uses to recover a job after
a failure. The Co>Operating System also creates temporary
files and directories in various locations. When a job fails,
it typically leaves the jobname.rec file, the temporary files
and directories, and many of the internal job-specific files
on disk. (When a job succeeds, these files are
automatically removed, so you don't have to worry about
them.)

If your job fails, determine the cause and fix the problem.
Then:

- If desired, restart the job. If the job succeeds, the jobname.rec file and all the temporary files and directories are cleaned up. For details, see "Automatic rollback and recovery".
- Alternatively, run m_rollback -d to clean up the files left behind by the failed job.

How Does Job Recovery Work

Synopsis
The Co>Operating System monitors and records the state of
jobs so that if a job fails, it can be restarted. This state
information is stored in files associated with the job and
enables the Co>Operating System to roll back the system to
its initial state, or to its state as of the most recent
checkpoint. Generally, if the application encounters a
failure, all hosts and their respective files will be rolled
back to their initial state or their state as of the most
recent checkpoint; you recover the job simply by rerunning
it.

Answer
An Ab Initio job is considered completed when the mp run
command returns. This means that all the processes
associated with the job — excluding commands you might
have added in the script end — have completed. These
include the process on the host system that executes the
script, and all processes the job has started on remote
computers. If any of these processes terminate abnormally,
the Co>Operating System terminates the entire job and
cleans up as much as possible.

When an Ab Initio job runs, the Co>Operating System creates a file in the working directory on the host system
with the name jobname.rec. This file contains a set of
pointers to the log files on the host and on every computer
associated with the job. The log files enable the
Co>Operating System to roll back the system to its initial
state or to its state as of the most recent checkpoint. If the
job completes successfully, the recovery files are removed
(they are also removed when a single-phase graph is rolled
back).

If the application encounters a software failure (for example, one of the processes signals an error or the
operator aborts the application), all hosts and their
respective files are rolled back to their initial state, as if
the application had not run at all. The files return to the
state they were in at the start, all temporary files and
storage are deleted, and all processes are terminated. If
the program contains checkpoint commands, the state
restored is that of the most recent checkpoint.

When a job has been rolled back, you recover it simply by rerunning it. Of course, the cause of the original failure
may also repeat itself when the failed job is rerun. You will
have to determine the cause of the failure by investigation
or by debugging.

When a checkpointed application is rerun, the Co>Operating System performs a "fast-forward" replay of
the successful phases. During this replay, no programs run
and no data flows; that is, the phases are not actually
repeated (although the monitoring system cannot detect
the difference between the replay and an actual
execution). When the replayed phases are completed, the
Co>Operating System runs the failed phase again.

Note that it may not always be possible for the Co>Operating System to restore the system to an earlier
state. For example, a failure could occur because a host or
its native operating system crashed. In this case, it is not
possible to cleanly shut down flow or file operations, nor to
roll back file operations performed in the current phase. In
fact, it is likely that intermediate or temporary files will be
left around.

To complete the cleanup and get the job running again, you must perform a manual rollback. You do this with the
command m_rollback. The syntax is:

m_rollback [-d] [-i] [-h] recoveryfile

Running m_rollback recoveryfile rolls the job back to its initial state or the last checkpoint. Using the -d option
deletes the partially run job and the recovery file.
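For example, to roll back a failed job and remove what it left behind (my_graph.rec is a hypothetical recovery file name, taken from the job's AB_JOB value):

m_rollback -d my_graph.rec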

Parallel Loading Of Oracle Tables

There are restrictions that mean you cannot load an indexed Oracle table from a multifile using utility mode. This would effectively mean multiple instances of SQL*Loader running against a table. That is not directly a problem, but the maintenance of the index is. In utility (direct) mode the index is disabled at the start of a load and rebuilt at the end of the load, but when there are multiple loads Oracle does not know which one will finish last and should rebuild the index; therefore a graph that attempts to do this will fail with the error:

SQL*Loader-951: Error calling once/load initialization
ORA-26002: Table EWTESTBM.AUDIT_EOM_LASTACCEPT has index defined upon it

To work around this, index maintenance can be turned off by specifying:

SKIP_INDEX_MAINTENANCE=TRUE

in the native_options parameter of the Output Table component used to load the Oracle table.

This means that at the end of the load any table indexes are left in an unusable state. They can be rebuilt by calling the handy stored procedure DUP_RBLD_UNUSABLE_IDX after the load has completed, e.g. using a RUN SQL component in a later phase:

exec DUP_RBLD_UNUSABLE_IDX('${SCHEMA_NAME}','$
{TABLE_NAME}');

Note that the stored procedure requires the schema name. If required, this can be read from the relevant database configuration file into a graph parameter (using shell interpretation), e.g.:

$(m_db print ${MY_DBC} -value dbms)

The issue will probably not arise if we don’t require the indexes.

Parallel Unloading From Oracle Tables

Ab Initio will allow you to parallelise the unloading in a number of different ways. You are likely to need to
experiment to find the approach that is best for you, as
this can depend on the Oracle database layout, amount of
data involved, network, etc. When testing, remember to
use a representative configuration of computers, network
and data to decide what is best.
You should also look at the log output of the Input Table
component carefully to see the queries that Ab Initio is
issuing. This is an important way to confirm that what one
wants is what one is actually getting.

You should also consider unloading the raw data from the
database and doing the join in Ab Initio. This can turn out
to be faster than doing the join in the database itself.

The following help topics (all in the on-line help) provide some additional information:
- FAQ: Degree of parallelism and the Database:default
layout
- Parallelizing Oracle queries
- Unloading data from Oracle

Some things to know are that:

1. With ablocal_expr or a serial unload Ab Initio will leave your hints completely alone and won't add any extra hints.

2. With automatic parallelism (i.e., using an MFS or database:default layout and not specifying an ablocal_expr), Ab Initio will end up specifying a ROWID hint. If you wish to specify your own hint in addition, you should explicitly use ABLOCAL(tablename), as in the sketch at the end of this section. In this case Ab Initio issues multiple queries to Oracle, each with a rowid range clause; an ABLOCAL(tablename) clause in this form tells the component which table to use when determining the rowid ranges, and the placement of the ABLOCAL clause tells the component where to put the rowid range clause in the SQL statement.
3. If you wish to specify an Oracle hint of /*+ parallel...*/,
then Oracle itself parallelises each query. Therefore if you
are running your Ab Initio INPUT TABLE component with an
n-way MFS, and your Oracle parallel query runs m ways,
you will end up running n*m ways on Oracle itself. This
may not be what you wish to do.

To summarise:
1. Test on a representative configuration, with
representative data.
2. Examine the output from the log port.
3. If you want to use the /*+ parallel */ hint, you probably
want to run the component serially.
4. If you want Ab Initio to determine the parallelism, use a
MFS layout, and don't specify the /*+ parallel */ hint.
5. Consider unloading the data from Oracle and doing the
join in Ab Initio.
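As a sketch of point 2 above, the select_sql of the INPUT TABLE component might look like the following (my_table is a hypothetical table name; any hint of your own would go in the usual /*+ ... */ position after SELECT):

select * from my_table where ABLOCAL(my_table)

At run time the component replaces the ABLOCAL(my_table) clause with a rowid range condition for each partition's query.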

Use Dynamic Script Option and PDL Instead Of Shell Interpretation

Going forward we are advising developers to use the Dynamic Script Generation feature of Ab Initio. “Dynamic
script generation is a feature of Co>Operating Systems 2.14
and higher that gives you the option of running a graph
without having to deploy it from the Graphical
Development Environment (GDE). Enabling dynamic script
generation also makes it possible to use Ab Initio's
parameter definition language (PDL) in your graphs, and to
use the Co>Operating System Component Folding feature to
improve your graphs' performance.” To find more about
Dynamic Script Generation please refer to your Ab Initio
Help and search for “dynamic script generation”.
To use the Parameter Definition Language (PDL) in a graph parameter, make sure to select PDL as the Interpretation attribute instead of Shell. Below are sample screenshots for reference.

In the Run Settings, select Dynamic instead of the default value GDE 1.13 Compatible.

This will give additional options for your graph-level parameter interpretations.

As stated above, when you define a parameter and would normally choose shell interpretation, please replace that with PDL interpretation. It can do the same things as shell interpretation, and in addition it benefits dependency analysis when you check the graph in to the EME.

Note: This option has to be set for every graph. Currently there is no way to make it the default.

With PDL interpretation, you also avoid invoking ksh for each shell interpretation (this happens in the background, which you might not have noticed).

Appending Multi Files Using AI_MFS_DEPTH Parameter

Ab Initio Tip of the Week:

We all make extensive use of the Multi File System and perform all the available operations on it, like copy, move, remove and so on. The scenario becomes a little tricky when it comes to appending to a Multi File, because generic code has to be kept in one place. As we know, the environments have different depths of parallelism, so to handle that I have come up with generic code that will append the data to the Multi File irrespective of the environment the code is running in.
Code Snippet:
The code uses the following parameters:
${AI_MFS_PARTITIONS} -> /apps/abinitio/data/mfs/parts
${AI_MFS_DEPTH} -> the value varies from environment to environment:
a) DEV -> 2
b) QA -> 8
c) PROD -> 8
${AI_MFS_NAME} -> the value varies from environment to environment:
d) DEV -> mfs_2_way
e) QA -> mfs_8_way
f) PROD -> mfs_8_way
With the help of the above code there are no conflicts between environments, and data is appended to the Multi File correctly at the partition level. I have used this code in one of my applications and it gives the required output.

Layout definition for ORACLE/DB2 Database

This is just from an information point of view, as I think most of you already know it. Whenever we use a table component, we have many options for defining the layout of the component. They are as follows:
1) Propagate from neighbors
2) Component
3) URL
4) Custom
5) Host
6) Database.
But the behavior is slightly different when we make a connection to Oracle versus DB2.
Setting the layout as a URL with a path works when the connection is made to ORACLE: whenever Ab Initio makes a connection from UNIX to Oracle, it needs to store some of the TCP/IP configuration in a file in the temp directory, so it writes to the /tmp folder a file with a name pattern something like tel.10.66.142.48.497.
But when Ab Initio makes a connection from UNIX to mainframe DB2 with the layout defined through a URL value, you will end up with a getpwnam failure. The reason is that if you want to use the mainframe dbc file, you should set the LAYOUT to database:serial and not to AI_SERIAL/AI_MFS, so that the database component runs on the mainframe and not on the UNIX box.
If you want the database component to run on UNIX, then you must use a different dbc file that uses DB2 Connect to get to the mainframe database.
So while using Oracle or DB2 databases, please make a note of these things.
On the FLY KEY DML Creation for Compare and Chaining Process in ADW

From: Dalal, Pratik (Syntel)
Sent: Wednesday, April 21, 2010 4:28 PM
To: Ab Initio Users; Ab Initio Leads
Cc: ISG-Ab Initio Support
Subject: Ab Initio Utility of the Week - On the FLY KEY DML Creation for Compare and Chaining Process in ADW
Ab Initio Utility of the Week:
All of us are aware of the Compare and Chaining process we do in our world. The process that we follow for creating the DMLs for the Compare and Chaining process is very tedious and has a couple of steps. That said, the chance of making mistakes is also very high, e.g. grouping a logical field as a compare key or a no-compare key and vice versa. So to make it robust I have come up with a utility which serves the following purposes:
1) Saves time, as the number of tables to be added during the design of any application is very high.
2) The chance of making mistakes is zero percent, unless we goofed up something in the mapping files.
3) On-the-fly generation of the code, ready for use.
4) Also doesn't require another pair of eyes to review the code.
Usage of the Utility:
The utility looks for the following inputs at the run time:
1) Project alias name like
i) prm for PRAMA
ii) slc for STAND_CLM
iii) nxg for NEXTGEN and so on.
2) Mapping file depicting the table code
3) Mapping file depicting the Logical Key Columns
One file containing all the required information will also serve the purpose.
Once the project alias name and both file names are passed to the script, the following functionality is achieved:
1) Takes the Table Code and TABLE NAME values from the mapping file holding all the table code information.
2) Creates the no-compare fields. This is needed because they vary from project to project. For example:
a) PRAMA -> <table_cd>_atomic_ts and <table_cd>_source_sys_archive_ind
b) STAND_CLM -> <table_cd>_src_sys_eff_ts and <table_cd>_src_sys_end_eff_ts
c) NEXTGEN -> <table_cd>_d_atomic_ts and <table_cd>_d_end_atomic_ts
d) Voice -> <table_cd>_process_ts and <table_cd>_process_end_ts and <table_cd>_atomic_ts
e) And it can differ for other projects.
3) Once the above steps are done, it forms three subsets:
a) Logical Keys
b) Compare Keys
c) No Compare Keys
4) It also replaces the delimiter of the last key in each of the above-mentioned subsets with a different delimiter (for example, \307\001). The reason is that once the key DML is formed and we use the reinterpret function in the transformation with the same delimiter throughout, it will just
display the value of only the first attribute of LOGICAL_KEY, COMPARE_KEY and NO_COMPARE_KEY. By flipping the delimiter of the last attribute, we get the information of all the associated attributes in a single field, which can then be used for the compare and chaining process.
Below are run-time snapshots of the utility:
Snippet 1: Run-time parameters
Snippet 2: Table Code and Logical Column Names from the mapping file
Snippet 3: Key DML
Location:
/apps/abinitio/admin/util/keydml_generic.sh on
xtabidv2 server
Please let me know in case you have any queries or
concerns.
Regards,
Pratik
ISG-Ab Initio Support
Office:(847)-402-0892

Key Creation in Multi Layout Using next_in_sequence() Function
From: Dalal, Pratik (Syntel)
Sent: Monday, April 19, 2010 5:09 PM
To: Ab Initio Users
Cc: Ab Initio Leads; ISG-Ab Initio Support
Subject: Ab Initio Tip of the Week - Creation Of Key in Multi Layout Using next_in_sequence()
Ab Initio Tip of the Week:
We all know the use of next_in_sequence(), and it is pretty straightforward when we need to use it in a serial layout. The complexity comes when we need to use it in a multifile layout, where it becomes tricky. For example, if we have a 4-way partition and each partition has 6 records, below are the two scenarios:

Scenario 1: Using next_in_sequence() alone in a component running in a multi layout:

            Record 1  Record 2  Record 3  Record 4  Record 5  Record 6
Partition 0     1         2         3         4         5         6
Partition 1     1         2         3         4         5         6
Partition 2     1         2         3         4         5         6
Partition 3     1         2         3         4         5         6
As shown above, the key values will have duplicates in a multi layout.

Scenario 2: Expected key values when the component runs in a multi layout:

            Record 1  Record 2  Record 3  Record 4  Record 5  Record 6
Partition 0     1         5         9        13        17        21
Partition 1     2         6        10        14        18        22
Partition 2     3         7        11        15        19        23
Partition 3     4         8        12        16        20        24

This can be achieved by combining next_in_sequence(), number_of_partitions() and this_partition(). The derived formula is:

[(next_in_sequence() - 1) * number_of_partitions() + this_partition()] + 1

With the help of this we will be able to generate the sequence shown above and thus avoid duplicate key values.
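As a sketch, the rule in the transform running in the multi layout would look like this (out.key is an assumed output field name; the three functions are standard DML):

out.key :: ((next_in_sequence() - 1) * number_of_partitions() + this_partition()) + 1;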
Note:
number_of_partitions() returns the number of partitions.
this_partition() returns the partition number of the component from which the function was called.
Please let me know in case you have any questions or
concerns.
Regards,
Pratik
ISG-Ab Initio Support
Office:(847)-402-0892
