
Values of Configuration Variables (to Connect to Another Server) and Their Priority

There are situations where you need to read or write
data on a server other than the run host. In those
cases, certain configuration parameters must be set
with the appropriate values. Ab Initio uses the
following priority when checking for those parameters.

1) Files specified by the value of the AB_CONFIGURATION environment variable

If the Co>Operating System does not find a value for


a configuration variable in the environment, it looks
next at the files listed in the AB_CONFIGURATION
environment variable. You set a value for
AB_CONFIGURATION as follows:

On Unix — A colon-
separated list of the URLs
of the files

On Windows — A
semicolon-separated list
of the URLs of the files

The files listed in the value of
AB_CONFIGURATION must be located on the run
host. The Co>Operating System reads the files in
the order listed (see the example below).
2) The user configuration file


The user configuration file must be
named either .abinitiorc or
abinitio.abrc and must reside:

On Unix — In the user's


home directory:

$HOME/.abinitiorc

$HOME/abinitio.abrc

On Windows — In the
user's home directory:

$HOME\.abinitiorc

$HOME\abinitio.abrc

Only one user configuration file is


allowed. If the Co>Operating
System finds more than one file
named either .abinitiorc or
abinitio.abrc in the $HOME
directory, an error results

3) The system configuration file (usually set


up by the system administrator)

The system configuration file is


named abinitiorc, and is usually set
up by the Co>Operating System
administrator.
On Unix — The pathname
of the system
configuration file is:

$AB_HOME/config/abinitiorc

On Windows — The
pathname of the system
configuration file is:

$AB_HOME\config\abinitiorc

The value of AB_HOME is the path


of the directory in which the
Co>Operating System is installed

Performance Considerations During Development

Performance Considerations During


Development

During development, here are some things to watch


out for:

 Over-reliance on databases

There are many things that you can (and


should) do outside the database. For
example, operations involving heavy
computation are usually better done with
components in the graph rather than in a
database. Sorting will almost always be
faster when you use the SORT component
rather than sorting in the database.

For other performance considerations


involving databases, see the Ab Initio
Guide>Book.

 Paging (or having very little free physical


memory, which means you’re close to paging)

Paging is often a result of:

o Phases that have too many components


trying to run at once
o Too much data parallelism

 Having too little data per run

When this is true, the graph’s startup time


will be disproportionately large in relation to
the actual run time. Can the application
process more data per run? Maybe it could
use READ MULTIPLE FILES, for example, to
read many little files per run, instead of
running many times.

 Bad placement of phase breaks

Whenever a phase break occurs in a graph,


the data in the flow is written to disk; it is
then read back into memory at the
beginning of the next phase. For example,
putting a phase break just before a FILTER
BY EXPRESSION is probably a bad idea. The
size of the data is probably going to be
reduced by the component, so why write it
all to disk just before that happens?

 Too many sorts

The SORT component breaks pipeline


parallelism and causes additional disk I/O to
happen.

Checkout Code in a Heterogeneous Environment / DBC File Parameterization

Check out Code in a Heterogeneous Environment:

When your EME and sandbox are not on the same
server, you can check out the objects from the command line or the GDE
by following these simple steps.

In your .abinitiorc file (which should be created only in
your home directory), please have the following entries:

AB_NODES @ <<any name, typically the target server>> : <<your EME server>>
AB_HOME @ <<any name, typically the target server>> : /apps/xt01/abinitio-V2-14-1
AB_AIR_ROOT @ <<any name, typically the target server>> : /apps/xt01/eme/v214/repo
AB_USERNAME @ <<any name, typically the target server>> : <<your user name on the target server>>
AB_ENCRYPTED_PASSWORD @ <<any name, typically the target server>> : <<encrypted password for the target server>>
AB_CONNECTION @ <<any name, typically the target server>> : telnet
Command line:
Type the command below in your shell:

export AB_AIR_ROOT=//<<target EME server>>/<<EME path>>

Now you are ready to do the checkout using the air export command.

GDE:
In your EME Datastore settings, please
provide the details of your target EME server. In your Run
settings, please provide the details of your Sandbox server.
Go to the Project -> Check Out screen to do
the necessary checkout of the objects.

DBC File Parameterization:

It is good practice to parameterize the values
in your DBC file as much as you can. While parameterizing, try to
use variables that already exist rather than defining them again
in your graph or project parameters.
E.g.: db_nodes in your DBC file
expects the server name. Instead of hard-coding the server
name, see whether you can use a parameter already
defined in the common projects. If your value is the current
server, AI_EXECUTION_HOST (a variable defined in the
stdenv) can be used, as sketched below.
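A minimal sketch of such an entry, assuming the usual key: value layout of a DBC file (the rest of the DBC file is unchanged):

Code:
db_nodes: $AI_EXECUTION_HOST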

Parameterization will help you shift to
new servers easily during disaster recovery or any other
server migration.

Setting Configuration Variable Values in
Configuration Files and Their Priority

Setting configuration variable values in configuration
files
If the Co>Operating System does not find a value for a
configuration variable in the environment, it looks through any
available configuration files on the run host.
Available configuration files (in the order they are searched):
1) Files named by AB_CONFIGURATION
2) User configuration file
3) System configuration file
If multiple entries for the same variable occur in any
configuration file or files, the Co>Operating System uses only
the first entry it encounters and ignores the rest.
Most common configuration variables:
AB_NODES
AB_HOME
AB_WORK_DIR
AB_CONNECTION
AB_TELNET_PORT
AB_TELNET_TERMTYPE
AB_TIMEOUT
AB_STARTUP_TIMEOUT
AB_TELNET_TIMEOUT_SECONDS
AB_TELNET_PAUSE_MSECS
AB_LOCAL_NETRC
AB_USERNAME
AB_PASSWORD
The COE team handles most of the definitions of the above
variables.
We expect AB_USERNAME and AB_PASSWORD to be
defined by the individual application projects.
In addition to the above variables, DBC file variables can
be considered configuration variables.
1) Files named by AB_CONFIGURATION:
A list of files where configuration variable and
values can be specified. Separate items in the list with a colon (:)
on Unix platforms
2) User configuration file
If the Co>Operating System does not find a value
for a configuration variable in the environment of a process or in
one of the files listed in AB_CONFIGURATION, it looks next at
the user configuration file, .abinitiorc file which should be in the
user’s home directory
3) System configuration file
If the Co>Operating System does not find a value for a
configuration variable in the environment, in one of the files
listed in AB_CONFIGURATION, or in the user
configuration file, it looks next at the system configuration
file , $AB_HOME/config/abinitiorc

Most common issues:


1) AB_CONFIGURATION is defined at graph level and you
see errors stating that AB_PASSWORD or any of your DBC file's
variables were not found, even though you set those values in your
config file and attached it to AB_CONFIGURATION.
Reason: When AB_CONFIGURATION is defined, it
should be defined as an export parameter. Please make sure the export
check box is selected.

2) AB_CONFIGURATION is defined as a project-level or
graph-level parameter with the right settings,
but I get an ambiguity error when trying to
evaluate the parameter.
Reason: As you know, AB_CONFIGURATION
contains the paths of various configuration files. If the
variable is defined in multiple places, i.e., in your
common projects and private projects, with no
proper association, you get this error. It means the
parameter evaluator could not determine which value to associate
with this variable because it found multiple values.

3) I have defined my variables, but the old
values are being picked up.
Reason: Please make sure that these variables are
not defined twice and are available only in the file you want.
As specified above, the Co>Operating System uses the first entry
found and ignores the rest, and the files are searched in the
order given above.

Job Tracking Window in the GDE

The Co>Operating system generates tracking information as


a job runs. When you run a job from the GDE, the GDE can
display this information using the Tracking Window or Text
Tracking.

The Tracking Window:

You can open one or several Tracking windows in the GDE


and track all, or any combination of, the flows and
components in a graph. If you execute the graph with
Tracking windows open, they display tracking information as
the graph runs.

How to open the Tracking window for a graph

 Do one of the following:


o Click the background of the graph for which you
want tracking information, then choose Tracking
Detail from the pop-up menu
o From the GDE main menu, choose View >
Tracking Detail
o In the GDE, press Ctrl + F2

How to open a separate Tracking window for a subgraph,


component, or flow

 Do one of the following:


o In the Tracking window for a graph, double-click a
row to open a separate window for the subgraph,
component, or flow represented by that row.
o Click a subgraph, component, or flow in the graph,
then choose Tracking Detail from the pop-up
menu.
o Select a subgraph, component, or flow in the
graph, then choose View > Tracking Detail from
the GDE main menu.

How to open a separate Tracking window for a port

 Click the component whose port you want to track, then


choose Tracking Detail for Port from the pop-up
menu.
References: Ab Initio GDE Help and Ab Initio Co>Op
Graph Developer’s Guide

Simple way to remove header and trailer records

Here is a simple way to remove header and trailer
records using an Ab Initio graph. If you are processing a
file that has a header and a trailer with no field
identifier that identifies the record type, you can
follow the simple graph below. In this example, the
first record is considered the header and the last
record is the trailer. The data file can still be in
EBCDIC format as long as the DML is already
generated. You can use the “cobol-to-dml” utility to
generate DML automatically using COBOL copybook.
Please check GDE Help for more details regarding
“cobol-to-dml” utility.

 Filter out the 1st record (the header) using next_in_sequence()

Parameters:
Name Value
----------- --------------------------
select_expr next_in_sequence()>1

 Use the DEDUP SORTED component to get the last
record (the trailer)

Parameters:
Name Value
----------- --------
key {}
keep last

With an empty key, all records form a single group: the last record (the trailer)
is kept on the out port, and all the other records flow to the dup port.

Sample Records:

Use the m_dump command to display records in
UNIX; please see the GDE help for more details. A minimal
sketch of the command form is shown below.
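For example (the file names here are hypothetical; the first argument is the DML record format and the second is the data file):

Code:
m_dump bill_file.dml bill_file.dat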

Record 1:

[record
WS_B_CLAIM_NBR "BILL FILE "
WS_B_CHECK_NBR " "
WS_B_BILL_NBR " 2009-02-20 "
WS_B_CLAIMANT_ID " "
WS_B_BILL_RECVD_DT " "
WS_B_BILL_PAID_DT " "

Record 2:

[record
WS_B_CLAIM_NBR "1890070194"
WS_B_CHECK_NBR "371521908"
WS_B_BILL_NBR
"18900701940120000131133906410"
WS_B_CLAIMANT_ID "01"
WS_B_BILL_RECVD_DT "2000-01-26"
WS_B_BILL_PAID_DT "2000-01-31"

Record 432017:

[record
WS_B_CLAIM_NBR "6524235006"
WS_B_CHECK_NBR "600230290"
WS_B_BILL_NBR
"65242350060220090217100052724"
WS_B_CLAIMANT_ID "02"
WS_B_BILL_RECVD_DT "2009-01-29"
WS_B_BILL_PAID_DT "2009-02-18"

Record 432018: (Last Record)

[record
WS_B_CLAIM_NBR "BILL FILE "
WS_B_CHECK_NBR " "
WS_B_BILL_NBR E" 2009-
02-20\x04\x32\x01%\x00\x00Â\b\x30\x20Ê@ "
WS_B_CLAIMANT_ID " "
WS_B_BILL_RECVD_DT " "
WS_B_BILL_PAID_DT " "

Sharing a subgraph across graphs

Sharing a subgraph across graphs:


We all know that when a subgraph is built, it becomes part
of the graph in which we build it. However, if we need
to use that subgraph in many other graphs, or in
other places in the original graph, we can achieve this
by saving it as a component and placing it on the server for
shared access.
How do I save a subgraph as a component?
To save a subgraph as a component:
1. If you do not have a components folder in your
sandbox, do the following:
a. Create a components folder, and a parameter
with which to reference it, in your sandbox.
b. Add the components folder to the Component
Organizer as a top-level folder.
2. Select the subgraph.
3. From the File menu, choose Save Component
"subgraph_name" As.
4. Navigate to the components folder in your sandbox.
5. In the Save as type text box, choose Program
Components (*.mpc, *.mp).
6. Click Save.
You may need to right-click the components folder you
added to the Component Organizer and then refresh it so
the subgraph will appear in the components folder. Once the
subgraph appears, you can drag it from the Component
Organizer into any graph in which you want to use it, just as
you would use a pre-built component.
A subgraph that is saved in this way becomes a Linked
Subgraph. If you insert an instance of such a subgraph into a
graph from the Component Organizer and then double-click
it, the GDE displays (linked) following the name of the
subgraph.
To make the changes made in the subgraph available in the
instances,
1. Save the desired changes to the subgraph in
the Component Organizer that you used to create
the graph.
2. Select the instances of the subgraph you want
to update, in the graph or in other graphs that use
that subgraph.
3. From the GDE Edit menu, choose Update.
Unique Identifier Function

The DML function “unique_identifier()” returns a


variable-length string of printable characters that is
guaranteed to be unique. This includes hashed
versions of the timestamp, hostname, and process
id, as well as a few other fields to guarantee
uniqueness. You can use the return string as a
unique key or to construct a unique filename. To
return parts of the identifier, use
unique_identifier_pieces to decode the output. To
test, try typing the following examples on the UNIX
prompt.

Examples
Code:
$ m_eval 'unique_identifier()'
"136fe-9a55-b-s0183dd943f-2"

$ m_eval 'unique_identifier()'
"1370e-9a55-b-s0183dd943f-2"

Conditional Components in the GDE

Can I make my graph conditional so that certain components do


not run?

You can enter a condition statement on the Condition tab for a


graph component. This statement is an expression that evaluates to
the string value for true or false; the GDE then evaluates the
expression at runtime. If the expression evaluates to true, the
component or subgraph is executed. If it is false, the component or
subgraph is not executed, and is either removed completely or
replaced with a flow between two user-designated ports.

Details

To turn on the conditional components in the GDE:


1. On the GDE menu bar, choose File > Preferences to open the
Preferences dialog.
2. Click the Conditional Components checkbox on the
Advanced tab.
Enabling this option adds a Condition tab to the Properties
dialog of your graph components.
3. Use the Condition tab to specify a conditional expression for a
subgraph or component that the GDE evaluates at runtime. If the
expression returns the string value 1, the GDE runs the subgraph or
component. If the expression returns the string value 0, you have
two choices for how the graph will behave:
 Remove Completely — Use this option to disable the entire
graph branch. This disables all upstream and downstream
components until you reach an optional port. In other words,
all components in this branch are disabled until the graph
makes sense.
 Replace With Flow — Use this option to disable only the
subgraph or component to which the condition has been
applied.
Note the following when writing conditional components:
 You must be using the Korn shell in your host profile.
 The evaluated value for the condition must be a string for
TRUE or FALSE. The following is the set of valid FALSE
values: the boolean value False, 0 (numeric zero), "0" (the
string "0"), "false", "False", "F", "f". All other string
values evaluate to TRUE.
 Be careful not to have something propagate from a
component that might not exist at runtime. This could cause
your graph to fail.
 Components or subgraphs that are excluded are displayed
with gray tracking LEDs at runtime.
It is important to use the precise syntax for if statements in the
Korn shell. The correct form is:

$( if condition ; then statement; else statement; fi)

Here are three examples:

$(if [ -n "$VARIABLE" ]; then echo 0; else echo 1; fi)

$(if [ "$LOAD_TYPE" = "INITIAL" ]; then echo 1; else echo 0; fi)

$(if [ -a "file_A.dat" ]; then echo "1"; elif [ -a "file_B.dat" ] && [ -a "file_C.dat" ]; then echo "1"; elif [ -a "file_D.dat" ] && [ -a "file_E.dat" ]; then echo "1"; else echo "0"; fi;)

Performance improvement of a graph

Improving the performance of an already-existing graph

Working on performance problems in an already-existing graph is


a lot like debugging any other problem. An important principle to
follow when making changes to a graph, and then measuring what
differences (if any) have occurred in the graph's efficiency, is to
change only one thing at a time. Otherwise, you can never be sure
which of the changes you made in the graph have changed its
performance.

Performance considerations during development

During development, here are some things to watch out for:

 Over-reliance on databases

There are many things that you can (and should) do outside the
database.

For example, operations involving heavy computation are usually


better done with components, in the graph, rather than in a
database. Sorting will almost always be faster when you use the
Sort component rather than sorting in the database.

For other performance considerations involving databases, see the


Ab Initio Guide>Book.

 Paging (or having very little free physical memory, which


means you're close to paging)
Paging is often a result of:

o phases that have too many components trying to run at


once
o too much data parallelism
 Having too little data per run

When this is true, the graph's startup time will be


disproportionately large in relation to the actual run time. Can the
application process more data per run? Maybe it could use Read
Multiple Files, for example, to read many little files per run,
instead of running so many times.

 Bad placement of phase breaks

Wherever a phase break occurs in a graph, the data in the flow is


written to disk; it is then read back into memory at the beginning
of the next phase. For example, putting a phase break just before a
Filter by Expression is probably a bad idea: the size of the data is
probably going to be reduced by the component, so why write it all
to disk just before that happens?

 Too many sorts

The Sort component breaks pipeline parallelism and causes


additional disk I/O to happen. Examples of misplaced or
unnecessary sorts can be found in the performance example graphs
in $AB_HOME/examples/basic-performance.

AB_SAS_USE_METHOD_3; Implicit Casting in Lookups

When you are creating a SAS file and have numeric
fields as part of your DML, please set
AB_SAS_USE_METHOD_3 to true in your parameters
(graph level or sandbox level); otherwise you will
end up with zero values in the numeric fields.

When matching an input field against a lookup key field, the
input expression is implicitly cast to the type of the lookup
key field. You don't need to cast it explicitly.

E.g.: the input had a date field in the format "YYYY-MM-DD
HH24:MI:SS.NNNNNN" and the lookup key field was in
"YYYYMMDD" format. When you match on these fields, you
don't need to cast explicitly; they will match
automatically, as in the sketch below.
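A minimal DML sketch of such a call (the lookup-file label and field names are hypothetical):

Code:
out.status_cd :: lookup("Claim Status Lookup", in.claim_dttm).status_cd;

Here in.claim_dttm is implicitly cast to the type of the lookup key field before the match is performed.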

The same applies to the JOIN component, in which the
format of the key field on the driving port is taken as the
reference. The key values from the non-driving ports are
converted (only to perform the join) to the type of the
driving port's key.

Below is the supporting help document.

Syntax with a LOOKUP FILE component


record lookup (string file_label, [ expression [ , expression ... ] ] )

Arguments:

file_label — A string constant representing the name of a LOOKUP
FILE component.

expression — An expression on which to base the match. Typically,
this is an expression taken from the input record. The
function implicitly casts expression to the type of the
corresponding key field(s).
The number of expression arguments must match the
number of semicolon-separated field names in the key
specifier of file_label. The maximum number is 24. Any
number of expressions can be NULL. If all expressions
match the corresponding key fields in the lookup
record, the record is considered a match.
You can omit the expression arguments if the key for
this lookup is empty: that is, if the key parameter of the
LOOKUP FILE component is set to { }. Note that in that case all
records will match and the first one will be selected.

APAD Requisition Steps

To request new software such as Ab Initio GDE, Ab
Initio Forum, or Data Profiler to be installed using
APAD, please follow the steps below:
1. Use the Service Catalog website to submit a
request

http://rc.allstate.com

2. Look for "Advanced IT Services" and click the
link "Software Packaging"

3. Find the "Software Packaging


Workstation" request and click "Proceed to Order"
4. Refer to the previous GDE package installation
request when filling out the form

View Multiple Errors at Once / View the Next Error

To view multiple errors at once:

When the AB_XFR_COLLECT_ERRORS configuration


variable is set to true, the Co>Operating System will
attempt to accumulate multiple errors encountered
during transform compilation, rather than just
stopping at the first one. If the variable is set to false
(the default), compilation will be aborted on the first
error encountered.

By setting this configuration variable to true, you


may be able to identify and fix multiple errors with
each execution.
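One hedged way to turn this on for a shell session before running your job (it could equally be defined as a graph or sandbox parameter):

Code:
export AB_XFR_COLLECT_ERRORS=true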

To view Next error:

If you have multiple errors in the GDE Application


Output: Job pane, the F4 key allows you to cycle
through the errors. Pressing F4 scrolls the next error
into view and highlights the component that
generated the error. After you've reached the end of
the errors, the feature will prompt you whether to
start again at the beginning of the output.

Simple Steps to Diagnose Checkout Issues / Know More
Details About Graph Failures

GDE Run settings:

If you face the error below either during checkout or
setup:
The GDE encountered problems while attempting to execute


the script.
Check for syntax errors in graph parameters with shell
interpretation,
or in the host setup and cleanup scripts.

The error was:

./GDE-dmei3-command0003.ksh[177]: /config.ksh: not


found

It means you are trying to check out to a path where
you don't have access.

E.g.: If I am checking out from the Dev EME to
/export/home/sven7 (a path in Dev) or
/apps/home/sven7 (a path in QA), where I don't have
access, I will get the above error.

You need to check your settings. Please make sure


you provide all the appropriate values.

Also in the checkout process, you will have the


option of selecting the run host settings. Please
select the appropriate one.

Use AB_BREAK_ON_ERROR while debugging:

When the configuration variable


AB_BREAK_ON_ERROR is set to true and debugging
is enabled, an error or reject will start the debugger
at the point of failure or rejection. This makes it
easy to examine the state of the transform at the
point of failure or rejection
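A hedged example of enabling it from the shell for a debugging session (it can also be defined as a parameter):

Code:
export AB_BREAK_ON_ERROR=true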

API vs. Utility Bulk Load

There was some confusion regarding the loading methods
(API and utility) used by the OUTPUT TABLE component when the table
has indexes and constraints defined on it. Below are the
various scenarios and the behavior of the table component.

Serial Loading
  API: 1) Record-by-record loading. 2) Checks constraints for each record.
  3) Very slow. 4) Suitable for base load.
  Utility, Direct - True: 1) Bulk load. 2) Disables index/constraints at the
  beginning, loads the data, then enables the index. 3) If the data has
  duplicates, the index becomes unusable while rebuilding.
  Utility, Direct - False: 1) Bulk loading. 2) Performance between API and
  Utility direct-true.

MFS Loading
  API: Same as above.
  Utility, Direct - True: The graph will fail, saying an index is built on the table.
  Utility, Direct - False: 1) The graph will run. 2) Slow performance.

Inference:

1) Table with constraints and indexes
a) Serial loading — Utility Direct True or API, depending on the requirement.
b) Parallel loading (huge data volumes) — Disable the index, load using
Utility Direct True, and re-enable the index; or use API.
c) Parallel loading (smaller data volumes) — API.

2) Table with no constraints and indexes
a) Serial or parallel loading — Utility Direct True.

Here’s the excerpt from Ab Initio GDE Help pertaining to API


and Utility

What is the difference between API mode and utility


mode in database components?

Short answer

API and utility are two possible interfaces to


databases from the Ab Initio software and their uses
can differ depending on the database in question.

Details

Enterprise-level database software often provides


more than one interface to its data. For example, it
usually provides an API (application programming
interface) that allows a software developer to use
database vendor-provided functions to talk directly
to the program.

In addition, the vendor usually provides small


programs, or utilities, that allow the user to
accomplish a specific task or range of tasks. For
example, the vendor might provide a utility to load
data into a table or extract table data, or provide a
command-line interface to the database engine. The
exact functionality of the utility or API varies by
database vendor; for that reason, specific details are
not provided here.

API and utility modes both have advantages and


disadvantages:

 API mode — Provides flexibility: generally, the


vendor opens up a range of functions for the
programmer to use; this permits a wide variety
of tasks to be performed against the database.
However, the tradeoff is performance; this is
often a slower process than using a utility. As an
Ab Initio user, you might use API mode when
you want to use a function that is not available
through a utility. In some instances, a
component will only run in API mode for just this
reason — the function inherent in the
component is not available through that
vendor's published utilities. In general, however,
it is useful to remember that API mode executes
SQL statements.
 Utility mode — Makes direct use of the
vendor's utilities to access the database. These
programs are generally tuned by the vendor for
optimum performance. The tradeoff here is
functionality. For example, you might not be
able to set up a commit table. In such an
instance, you must trust the ability of the utility
to do its job correctly. Because the granular
control given by API mode is not present in
utility mode, utility mode is best when your
purpose most closely resembles the purpose for
which the utility was created. For example, any
support of transactionality and record locking is
subject to the abilities of the utility in question.
Also, unlike API mode, utility mode does not
normally run SQL statements.

When choosing whether to use api or utility mode


with OUTPUT TABLE, keep the following in mind:

 api mode usually gives better diagnostics

 utility mode (the default mode for OUTPUT


TABLE) usually gives better performance.

API mode parallelization (all databases) for


UPDATE TABLE

You can apply any level of parallelism to the layout


of UPDATE TABLE, but note that each partition of an
UPDATE TABLE component running in parallel can
compete for database locks on the table it references
and deadlock can result.

You can often avoid such deadlock by partitioning


the input data on the primary key you are updating.
However, this does not necessarily always eliminate
the danger of deadlock.

If you have indexes on other columns in the update


table besides the primary key, then either inserting
rows or performing updates to the particular column
values that are indexed might cause multiple
partitions to contend for locks on the same objects.
Often these secondary keys are across relatively
small numbers of values, and the corresponding
indexes can be rebuilt quickly. In such cases, instead
of trying to update in parallel you can often get
better performance by either:

 Loading serially

OR

 Dropping and recreating any affected secondary


index

Using the NULL key to access a lookup with a single


record

Using the NULL key to access a lookup with a single


record
There are situations where you need to put a piece
of global information into a file as a single record and have that
record accessed with a lookup call. In this case use
the NULL key, that is {}, as the key on the lookup file. It is not
necessary to add a dummy key to be able to retrieve that
single-record information from the file.
For example, suppose we store the
count of records processed in a graph in an output file
which will then be used as a lookup file for balancing
purposes. Traditionally we would insert a dummy key like 1 or "X" in order to
retrieve the record count. A sample lookup file
named "Total Recs Processed Count" with the dummy key
would look like this:

Dummy_Key Recs_Count
X 2340

And we would use the following lookup call to retrieve the
Recs_Count field value:

lookup("Total Recs Processed Count", "X").Recs_Count

Instead of inserting the dummy value into the single-record lookup
file and using the dummy key to
retrieve that information, we can use the NULL key {} to
retrieve the same information. Your lookup file will now be in
the form:

Recs_Count
2340

And we can use the following lookup call to retrieve the
Recs_Count field value:

lookup("Total Recs Processed Count", {}).Recs_Count

Way to check the performance

When the configuration variable
AB_XFR_PROFILE_LEVEL is set to the value
"statement", you can get more detailed information on
how the functions in your XFR are performing.

Steps:
Define AB_XFR_PROFILE_LEVEL as a graph-level local
parameter and give it the value "statement".
Redirect the log port of your transform component to a
file; it will provide the detailed performance
information.

Note: Graph execution will be slower with this setting, so this
type of profiling should be done only in development,
as a test.

Recently the Prama application improved
its graph performance with these small
changes.

Scenario:
1) They get a number-of-days value from the mainframe and
need to compute a date by offsetting it from 18500101.

Before change:

(date("YYYYMMDD"))datetime_add((date("YYYYMMDD"))"18500101", days - 1)

After change:

(date("YYYYMMDD"))(days - 18263)

This improved the run time by around 25%.

2) They need to replace all non-printable characters
with blanks.
Before change:
re_replace(input_string, "[^ -~]", " ");

After change:

Defined the following variables in the XFR:

let integer(4) i = 0;
let integer(1) blank_char = string_char(" ", 1);
let unsigned integer(1)[256] translate_map = for (i, i < 256): if (i >= 32 && i <= 126) i else blank_char;

and replaced the re_replace function with


translate_bytes(input_string, translate_map);

With this, the time to process 100M records went
down from 770 seconds to 120 seconds, an improvement of more than
a factor of 6.

Small Useful Developer Tips

Evaluating graph parameter values without


accessing the GDE

 It can sometimes be useful to know what
values graph parameters resolve to when
you do not have access to a GDE. You can view
the resolved parameter values by using the air
sandbox run command with the -script-only
option. This command generates a deployed script
that is written to stdout. The output shows how
the parameters are resolved without actually
running the graph (see the sketch after this list).

 Alternatively, you can use air sandbox
parameter for this purpose, for both graph and
sandbox parameters. For example,

air sandbox parameter -path my_graph.input.pset -eval my_PARAM

will return the value of the graph parameter
my_PARAM.
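A hedged sketch of the -script-only usage (the pset name and output file are illustrative; only the option itself comes from the tip above):

Code:
air sandbox run my_graph.input.pset -script-only > my_graph_deployed.ksh

You can then inspect the generated script to see the resolved parameter values.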
Double Clicking the “Stop Execution” button in
GDE

 When you click the Stop Execution button in


the GDE, the GDE issues a command to
gracefully shut down the job. Double-clicking Stop
Execution issues a command to disconnect from
the remote server. This is almost always
undesirable as it can leave remote processes
running or in an inconsistent state. If you
accidentally double-click Stop Execution, the GDE
will prompt you with a dialog asking you to confirm
that you really want to disconnect. When
presented with this dialog, you should select No
and let the graph shut down cleanly.

 The GDE cleans up watcher files automatically


unless the cleanup process terminates abnormally
— for example, if you disconnect from the run host
by clicking the Stop Execution button two or more
times. Old watcher datasets that have not been
cleaned up should be removed through the GDE
main menu by selecting Debugger>Delete
Watcher Datasets. This action deletes all watcher
datasets in the current run directory.

Check parameter usages in all places of a graph

 To see all places in a graph where a parameter


is used:

1. From the GDE menu, choose Edit >


Parameters.
2. From the Parameters Editor, choose Edit >
Find All and enter the name of the parameter you
are looking for.

Inline Expansion of Small, Frequently Called Functions;
Use AB_BREAK_ON_ERROR While Debugging

1) You can sometimes improve the runtime performance of a


transform by expanding small frequently-called functions
inline. Inline expansion replaces a function call with inline
code, thus eliminating call overhead. However, this
increases the size of the generated top-level code and also
the time it takes for a component to start up, so it should only
be used with small functions that are frequently called. By
default, single-line transform functions are expanded inline
wherever they are located.

The inline keyword can be added to a function definition to


indicate that the function will be expanded inline at runtime.
For example:

out :: MyFunction( a ) inline =


begin

end;
For more information, see "Inline expansion of simple and
complex transforms" in Ab Initio Help

2) When the configuration variable AB_BREAK_ON_ERROR


is set to true and debugging is enabled, an error or reject will
start the debugger at the point of failure or rejection. This
makes it easy to examine the state of the transform at the
point of failure or rejection
Sample Memory Allocation Issue and Resolution

Error:
========= Error from
PBKS_COMPANY_ID_POLICY_NUM_SOURCE_SYSTE
M_ENTY_ID_CODE_.Sort.004 on
abihost ========= Memory allocation failed
(8388608 bytes).
Current data ulimit is 'unlimited'.
Current vmem ulimit is 'unlimited'.
Current address space ulimit is 'unlimited'.
Heap is at least 35739328 bytes already.

This graph's use of memory may be improved,
but the change that would really impact memory
allocation and avoid the graph failure is to add
phase breaks or to reduce the parallelism and run
the graph 1-way instead of 2-way.
In fact, as the graph is written now, here is the
computation of the maximum memory required by
the (only) phase:

11 SORTs * 2-way parallel = 22 SORT processes
* 100 MB max-core ≈ 2.2 GB.

Plus you have to take into account ~7 MB of overhead
per component in the phase.

So the change that would have a bigger impact


would be having the PBYK components in one phase
and all the SORT-DEDUP in the second phase.
There are other things that can be improved. The
following comments are minor changes with
decreasing order of importance that can improve
your graph and memory allocation.
1) As a general rule, Sorting is expensive. It's often
necessary, but you should always think carefully
about whether it's required. For example, don't use
Partition by Key and Sort if just a Partition by Key
will do. In your graph you have a PBKS component
that can be replaced by a PBK and SORT WITHIN
GROUP. The component is
PBKS_COMPANY_ID_POLICY_NUM_SOURCE_SYSTEM
_ENTY_ID_CODE_. Since your data are already
partitioned and sorted by the first 3 keys, you can
simply partition by {ENTY_ID_COD} and SORT WITHIN
GROUP. In this way you can at least get rid of one
SORT component.
2) Besides that, you could use a REFORMAT before
the PBKS and drop the unnecessary fields that you
are not going to have on the output port of your
JOIN. In this case you would sort a smaller amount of
data, which would be more efficient.
You may want to use a REFORMAT and its parameter
output-index (look at the GDE online help for more
details) to separate the input records in different
transforms and output ports. This Reformat would
drop ~half of the fields per record.
3) My guess is that you don't need the REFORMAT
FORMAT D if you are then going to discard these
records.

The Partition by Key and Sort done before the
Partition by Expression wouldn't improve the
performance, and neither would sorting/deduping ahead of time,
since sorting on smaller groups of data performs
better.

Parameter evaluation in a graph using PDL

Let us assume the mapping file
($AI_MAPPING/ewpoc_table_details.txt) has the following
contents:

f36:EWT_CLM_STATUS_HIST:{f36_adw_claim_id}:
f37:EWT_CLM:{f37_adw_claim_id}:

Our requirement is to extract the table name (2nd field of the
mapping file) and the key (3rd field of the mapping file) based
on the table code, which is a formal parameter to the graph.
The approach is as follows:
Step 1: Get the contents of the mapping file into a
parameter (say parameter_file)

parameter_file :
$AI_MAPPING/ewpoc_table_details.txt

Step 2: Get the corresponding row to the table code

parameter_row : $[string_split(re_get_match(parameter_file, TABLE_CD + ":.*"), "\n")[0]]

Step 3: Get the values from parameter_row as follows:

TABLE_NAME : $[string_split(parameter_row, ":")[1]]
PARTITION_KEY : $[string_split(parameter_row, ":")[2]]

Example: Suppose f36 is passed as the table-code formal parameter.

From step 1, parameter_file will have the following
content:

f36:EWT_CLM_STATUS_HIST:{f36_adw_claim_id}:
f37:EWT_CLM:{f37_adw_claim_id}:

From step-2, parameter_row will have the row related


to the passed table code

f36:EWT_CLM_STATUS_HIST:
{f36_adw_claim_id}:

From step-3,

TABLE_NAME is EWT_CLM_STATUS_HIST
PARTITION_KEY is {f36_adw_claim_id}

Use AB_DML_DEFS and AB_INCLUDE_FILES

Do not confuse the AB_DML_DEFS DML inclusion


parameter with the AB_INCLUDE_FILES
configuration variable.

AB_DML_DEFS is a graph, plan, or project parameter


that contains DML declarations and definitions for
use within inline DML in other parameter definitions.
Inline DML is evaluated during parameter evaluation.
AB_INCLUDE_FILES is a configuration variable that
specifies paths of files to include during DML
evaluation in component transforms and record
formats. This happens during runtime DML
evaluation, which occurs separately from and much
later than parameter evaluation.
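As a rough sketch, AB_DML_DEFS might hold DML declarations like the following (the type and field names here are hypothetical), which other parameter definitions could then reference in their inline DML:

Code:
type claim_key_t =
record
  string(10) claim_nbr;
  date("YYYY-MM-DD") claim_dt;
end;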

In general, always declare AB_DML_DEFS as a local


-- not an input -- parameter. The purpose of
AB_DML_DEFS is to allow you to define a self-
contained DML context for a graph or plan,
independent of the circumstances of the graph's or
plan's execution.

For more information see "AB_INCLUDE_FILES",


"The AB_DML_DEFS DML inclusion parameter", and
"Using AB_DML_DEFS" in Ab Initio Help.

Converting Datetimes Between Time Zones

Converting a date format specifier to a specifier with a UTC


time-zone offset

Sometimes we need to convert a datetime from the DDD,
DD MMM YYYY HH:MM:SS +OOOO format to YYYY-MM-DD
HH24:MI:SS with a UTC time-zone offset (for example:
Mon, 04 Jul 2005 08:52:50 -0400).

In some applications we convert the time to GMT,
CST, MST, EST, PST, or other time zones.

The time-zone value data conversion rules are:


• When you assign a datetime value without time-zone
offset information to a datetime value with a time-zone offset,
the result is assumed to be a UTC time.

Depending on the format specifier, the assignment adds a Z,


+0000, or +00:00 to the datetime value with a time-zone
offset.

• A cast to a datetime format without a timezone specifier


converts the timestamp to UTC. Compare the following:

Code:
$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS+ZONE"))"2009-08-06 18:34:23+0600"'
"2009-08-06 18:34:23+0600"

$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS"))(datetime("YYYY-MM-DD HH24:MI:SS+ZONE"))"2009-08-06 18:34:23+0600"'
"2009-08-06 12:34:23"

$ m_eval '(datetime("YYYY-MM-DD HH24:MI:SS"))(datetime("DDD, DD-MM-YYYY HH:MI:SS +ZO:NE")) "Mon, 10-08-2009 10:12:01 +06:00"'
"2009-08-10 04:12:01"

Converting Invalid Date Format To Valid Oracle Date


Format
Several times we have the scenario where we have to form
the target date field by concatenating Year, Month, and Day fields,
or a combination of two fields with the third field hard-coded,
and in many other ways.
There are also situations where we have a 2-byte input
source field and the data we receive for the Month/Day
field is, for example:
"04"
" 4" ... etc.

Or, during design, in many places we add a check: if the
month is less than 10, prepend '0' to the value to form
month values like '01', '02', and so on.

We can avoid all this extra effort with an extra type
cast of the data. See the examples below for more details:

Example 1: In this case Ab Initio evaluated the date as valid
even though the month has a space in the value.

$ m_eval '(date("YYYYMMDD"))"2005 505"'
"2005 505"

When you try to load this value into Oracle, it will fail,
saying the month is invalid. Many of you may be curious
why Ab Initio treats this as a valid date, but this is how it
works.

Example 2: Using an extra type cast

$ m_eval '(date("YYYYMMDD"))(int)(date("YYYYMMDD"))"2005 505"'
"20050505"

In the above case, if you reformat the date in any way, the
space embedded in the date is replaced with
a zero.
That is why the cast to an integer and back allows this to
work. But if you are just copying the column without
changing its format in any way, then no check is performed.

Determining Whether A Vector Contains A Given


Element
Use the member operator to determine whether a vector
contains a given element.

The member operator is highly optimized and is generally


the most efficient method of searching a vector for a given
value. The following example shows how you can determine
whether a vector of names contains the name Smith:
Code:
out.found :: "Smith" member in.names;

Examples
The following examples show the use of the member
operator.

Example 1. This example assumes that the following vector


named New_England was defined globally in
AB_INCLUDE_FILES:

let string('\0')[6] New_England = [vector "Massachusetts",


"Rhode Island", "Connecticut", "Maine", "New Hampshire",
"Vermont"];

$ m_eval "'Massachusetts' member New_England"


1

$ m_eval "'New York' member New_England"


0
Improving the Performance of Sort

Ab Initio Tip of the Week:

The Ab Initio sort algorithm is efficient, but it is still an


expensive operation in terms of CPU usage and memory. If
you wish to improve the performance of a sort operation
within your graph, there are a number of areas you can
examine.

Do you really need to Sort?

The quickest way to decrease the impact of a SORT


component on your overall graph performance is to remove
the SORT component entirely. Look at your requirements
and use of SORT components in your graph carefully. For
example, if you are sorting prior to sending records to a
ROLLUP component, it may make more sense to use an in-
memory rollup. If it isn’t possible to eliminate sorting entirely,
look at combining multiple sort operations into a single
SORT component or a series of SORT and SORT WITHIN
GROUPS components.

Record Format

The SORT component needs to parse each record in your


data stream, therefore making sure that your records and the
sort keys can be parsed efficiently is important. In general
keys should be of a fixed-width non-nullable type and are
grouped at the beginning of your record. Your record will be
parsed more quickly if it contains only fixed-width fields.

If this is not true of your existing record format, you can alter
an existing transform component or add a new one before
the SORT to optimize your record format. You can then use
another transform after your sort to return the records to your
required format. The extra overhead of reformatting is often
compensated for by the quicker sort.

Compression

If you are sorting a volume of data that will not fit within the
amount of memory specified by the max_core parameter, the
sort will need to spill all of its records to disk. If a large
volume of data needs to be written to disk, the I/O time used
for this operation may be significant enough that it makes
sense to compress the data before writing it to disk. As
changing this parameter to compress spill files can add
significant CPU time to your graph, it is important to
benchmark your graph with realistic amounts of data before
and after making this change. For example, if the disk I/O
rate is relatively high (compared to CPU), it may be that the
Sort component will run faster without the compression.

Use Of re_match_replace Function

The regular expression DML function re_match_replace,


introduced in the Co>Operating System 2.15.3, allows you to
use named capturing groups to replace substrings of a
matched pattern.

The following example shows how to reverse the order of


three short words.

Code:
m_eval 're_match_replace("Mon Tue Wed", "(.{3}) (.{3}) (.{3})", "$3 $2 $1")'
"Wed Tue Mon"

Each parenthesized sub expression used in the pattern can


be referenced in the replacement string with the format
$number– where $0 refers to the whole expression, $1
refers to the first sub expression, $2 refers to the second sub
expression and so on.
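Another sketch using the same $number references (this input and pattern are illustrative, not from the original tip):

Code:
m_eval 're_match_replace("2009-08-06", "(.{4})-(.{2})-(.{2})", "$3/$2/$1")'
"06/08/2009"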

To Format Numbers, Cast To A Decimal Type Rather


Than Using Printf

In most programming languages, you have to call a function


to convert the numeric value to a string. C programmers
often use the printf or sprintf functions for this, and users
new to Ab Initio software sometimes are drawn to the DML
printf function to perform that task. It works, but it's almost
always overkill.

In most cases, the most efficient and elegant way to get the
text form of a number is to cast it to a decimal type. For
example, if x is a real(8), you can get the text form of its
value formatted with an explicit decimal point, with four digits
to the right of the decimal, with the expression:

(decimal("".4))x

In some cases, the number you need as a string may be a


decimal number, in which case it's already in text form. You
can assign such values directly to string fields or use them
as input to various string functions. Occasionally, it may be
necessary to explicitly cast a decimal value to a string type,
as when using the + concatenation operator. For example, if
d is a decimal, you might write:
// Reject the record if d is smaller than 42:
if (d < 42)
  force_error("The value of d, " + (string(""))d + ", is too small.");

Mainframe Issues and Resolutions

Here are the most common mainframe issues you see

Scenario 1:

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsgl93'.
mvsgl93: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ...
/apps/xt01/abinitio-V2-14-1/bin/m_rtel:
pipe closed unexpectedly when processing
pattern %|#|$|> and answer cat >
/tmp/rtel.10.48.74.75.3962

failing command: /apps/xt01/abinitio-V2-14-


1/bin/m_rtel -h mvsgl93 -u hlprod
-script /apps/xt01/abinitio-V2-14-
1/lib/telnet.script -shell sh -packet
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
"7920" "7921" "022" "10.48.74.75:60686"
"m_db list - -use_args_in_config
-do_data_translation"
"AB_HOST_INTERFACE=mvsgl93"
"AB_TCP_CONNECTION_TOKEN=enabled"
"AB_LAUNCHER_VERSION=2.14.104"
"AB_LAUNCHER_PROTOCOL_VERSION=P_late_arg_pa
ssing"
-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsgl93
User name: hlprod
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.75
===========================================
=====================
ABINITIO(*): Database Package
Version 2-14-104-e11-1

Scenario 2:

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsusys'.
mvsusys: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
-f /tmp/rtel.10.48.74.11.3886 221 <
/dev/null ; rm -f
/tmp/rtel.10.48.74.11.3886 ; exit
/apps/abi/abinitio/bin/inet-exec: corrupt
argument file /tmp/rtel.10.48.74.11.3886 -
expected size 221 but actual file size 0
Possibly the value of AB_TELNET_PAUSE_MSECS
should be increased from its
current setting of 200
======= Argument file follows:

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsusys
User name: ABIEPC
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.11 (from
AB_HOST_INTERFACE)
===========================================
=====================

ABINITIO(*): Database Package


Version 2-14-104-e11-1

Scenario 3:
 
[DB00109,DB00112,DB00200,DB00113,B148,B1105
,B1108,B1101,B1,B1104,B1103]
ABINITIO(DB00109): Error getting the
database layout.
ABINITIO(DB00112): Subprocess m_db
returned with exit code 4.
ABINITIO(DB00112): It was called as: m_db
hosts
/export/home/rrudnick/sandbox/apt/hrm/ic/db
/testv_db2hrm.dbc -select SELECT
agn_agent_type_cd, agn_agent_nbr FROM
testv.P1T_TOT_AGENT WHERE agn_end_eff_dt =
'9999-12-31'
ABINITIO(DB00112): The following errors
were returned:
ABINITIO(DB00112):
-------------------------------------------
----------

ABINITIO(DB00113): Error remotely


executing 'm_db' on node 'mvsasys'.
mvsasys: Remote job failed to start up
===========================================
=====================
Waiting for login prompt ... responding ...
done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.
/apps/abi/abinitio/bin/bootstrap
/apps/abi/abinitio /NONE bin/inet-exec
-f /tmp/rtel.10.48.74.75.3943 221 <
/dev/null ; rm -f
/tmp/rtel.10.48.74.75.3943 ; exit
/apps/abi/abinitio/bin/inet-exec: corrupt
argument file /tmp/rtel.10.48.74.75.3943 -
expected size 221 but actual file size 0
Possibly the value of AB_TELNET_PAUSE_MSECS
should be increased from its
current setting of 200
======= Argument file follows:

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvsasys
User name: HRMABID
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.48.74.75
===========================================
=====================

ABINITIO(*): Database Package


Version 2-14-104-e11-1

ABINITIO(DB00112):
-------------------------------------------
----------
[Hide Details]

Cause of Error: [DB00112]
DB00112_1: 4
DB00112_2: m_db hosts
/export/home/rrudnick/sandbox/apt/hrm/ic/db
/testv_db2hrm.dbc -select SELECT
agn_agent_type_cd, agn_agent_nbr FROM
testv.P1T_TOT_AGENT WHERE agn_end_eff_dt =
'9999-12-31'
DB00112_0: m_db

DB00112_3: [DB00200]
Database Package Version: 2-14-104-e11-1

Base Error: [DB00113]
DB00113_0: m_db
DB00113_1: mvsasys



Execution starting...
Error reported with 'mp error' command
layout4
Error getting the database layout.
ABINITIO: Fatal Error
Script end...
ERROR : ++++ FAILED ++++ Job
clifeii_018_clifeii_018_ic_002_rfmt_af_hrm_
common_layout failed.
Failed
 
Scenario 4:

[R147,R3999]
Could not create working directory: Agent
failure
Base File =
"file://mvsasys/~mvsqds/RNN.EDW.EW368.NOVA.PR
OCESS3.SRTD.OCTQC01,%20recfm(vb),
%20varstring,%20recall,recfm(vb)
varstring
recall"
Work Dir =
"file://mvsasys/~ab_data_dir/a304a48-
48cfd68d-16c2-000"
Error details:
ABINITIO: start failed on node mvsasys
Could not start agent:
Cannot create agent data directory: No
space left on device
Path = "/apps/abi/data/a304a48-48cfd68d-
16c2-000"

Scenario 5:

cjade@gl04dm02:ewabipd2 [/allstate/log] -->


more
/apps/xt11//data/admin/ent/adw/premium_rewr
ite/error/./ewprd610_nwt_thrd_pty_unload_27
316_2008-11-17-17-42-20.err
Trouble creating layout "layout2":

Could not create working directory: Remote


process did not start correctly
Base File = "file://mvssw91/"
Work Dir =
"file://mvssw91/~ab_data_dir/a4253a0-
4921f354-6b81-001"
Error details:
mvssw91: Remote job failed to start up

===========================================
=====================
Waiting for login prompt ...
responding ... done.
Waiting for password prompt ...
responding ... done.
Waiting for command prompt ... got it.
Waiting for command prompt ... got it.

-------------------------------------------
---------------------
Trouble starting job:
Remote host: mvssw91
User name: ABIPRM1
Startup method: telnet
Remote AB_HOME: /apps/abi/abinitio
Local interface: 10.66.83.160

===========================================
=====================

cjade@gl04dm02:ewabipd2 [/allstate/log] -->

Scenario 6:

Could not create working directory: Remote


process did not start correctly
Base File =
"file://mvsasys/~mvsqds/TESTPR10.PRM.OCT17K.D
PR10001,%20recfm(vb),%20varstring,%20recall"
Work Dir =
"file://mvsasys/~ab_data_dir/a304a48-
49272e61-e1d-000"
Error details:
mvsasys: Remote job failed to start up
=============================================
===================
IKJ56644I NO VALID TSO USERID, DEFAULT USER
ATTRIBUTES USED
IKJ56621I INVALID COMMAND NAME SYNTAX

---------------------------------------------
-------------------
Trouble starting job:
Remote host: mvsasys
User name: TESTZ
Startup method: rexec
Remote AB_HOME: /apps/xt01/abinitio-V2-
14-1
Local interface: 10.48.74.72

=============================================
===================

Scenario 7:

Execution starting...


[D205]
Trouble creating layout "layout-
Unload_Products_Using_Q_Schema__table_":
[Show Details]


[R147,R3999,B148,B1105,B1108,B1101,B1,B1104,B
1103]
Could not create working directory: Remote
process did not start correctly
Base File = "file://mvsesys/"
Work Dir =
"file://mvsesys/~ab_data_dir/a311cca-
48d7400f-48c2-000"
Error details:
mvsesys: Remote job failed to start up

=============================================
===================
EZA4386E rshd: Permission denied.

---------------------------------------------
-------------------
Trouble starting job:
Remote host: mvsesys
User name: awetlrun
Startup method: rsh
Remote AB_HOME: /apps/xt01/abinitio-V2-
14-1
Local interface: 10.49.28.202

=============================================
===================

Solutions:

Please follow these simple steps and you will be able to
identify the root cause and, in many cases, solve the issue.
1) Make sure all the needed settings are provided, i.e., the
following parameters should be set before your graph
is executed.
AB_NODES @ mvshost_all :
mvsasys
AB_HOME @ mvshost_all :
/apps/abi/abinitio
AB_WORK_DIR @ mvshost_all :
/apps/abi/abi-var
AB_CONNECTION @ mvshost_all :
telnet
AB_TELNET_PORT @ mvshost_all :
1023
AB_TELNET_TERMTYPE @ mvshost_all :
vt100
AB_EBCDIC_PAGE @ mvshost_all :
ebcdic_page_1047
AB_STARTUP_TIMEOUT @ mvshost_all :
120
AB_USERNAME @ mvshost_all :
xxxxxx
AB_ENCRYPTED_PASSWORD @
mvshost_all : xxxxxxx

Usually these get set in your .abinitiorc file (in
your home directory), in a file listed in AB_CONFIGURATION, or in your
DBC file.
2) In order for Ab Initio to run its utilities on the mainframe, the
ID must have UNIX System Services (an OMVS segment).
You can check this as follows:

i) If you know the password:
Go to Start -> Run -> cmd and type "telnet <<your
mainframe server>> 1023".
It will prompt for a user/password. If you are able
to log in, it means you have the permission, i.e., an
OMVS segment has been added for your ID.
ii) If you don't know the password:
In your UNIX session type "m_ls //<<your
mainframe server>>/tmp".
If it returns information, then you have
access/an OMVS segment.

Steps 1 and 2 will identify issues with settings, user name and
password, or the OMVS segment. Common ones are
password expiration or an OMVS segment not added for your ID.
3) If you are successful in steps 1 and 2, you need to
check for space issues.
On the mainframe we write files only to /tmp or
$AB_WORK_DIR (/apps/abi/abi-var).

If you know the password:
Log in using telnet as stated in step 2 and
type the df command, or du (after going into the respective
path). This will tell you the space.
If you don't know the password:
In your UNIX session, type m_du or m_df.
Eg:
(abinitio)abinitio@xtnb1dv1 : /export/home/abinitio
=> m_df //mvsgl93/tmp
1024-Blocks      Used     Avail  Cap  Skew  Filesystem
    350,640   137,440   213,200  39%        //mvsgl93/tmp

If you confirm it is a space issue, during the daytime please raise a ticket to "3OS390_SOL"/Jack Arras. If it is off hours, please contact ATSC/DCO to raise an incident against IOM/zOS.

In the above examples, scenarios 6 and 7 are settings issues and scenarios 1-5 are space issues.

Inline Expansion Of Simple And Complex Transforms

Inline expansion of simple and complex transforms

You can improve runtime performance by expanding a function inline. Inline expansion replaces a function call
with actual function code, thus eliminating call overhead.
However, this can increase the size of the generated top-
level code and also the time it takes for a component to
start up.
Inline expansion is controlled by the inline keyword (shown below in "Expanding a particular function") and several
configuration variables. By default, single-line transform
functions are expanded inline wherever they are located.
This behavior is controlled by the default value of
AB_XFR_INLINE_SIZE_LIMIT, which is 1. For more
information on this and the other configuration variables,
see "Configuration variables affecting inline expansion".

Expanding a particular function

To expand a particular function inline at every calling location, add the word inline to the function definition.
For example:
out :: myfun(a, b, c) inline =
begin
...
end;
Transforms declared this way are expanded inline as long
as AB_XFR_EXPAND_INLINE is set to True.

Expanding all functions

To expand inline all transforms having a particular level of complexity, you can set the configuration variable
AB_XFR_INLINE_SIZE_LIMIT. For inline expansion, the
complexity of a transform is taken to mean the total
number of statements, rules, and local variable
declarations. The setting of AB_XFR_INLINE_SIZE_LIMIT
affects all transforms, regardless of whether they were
explicitly declared inline.
For example, the following transform is expanded inline if
AB_XFR_INLINE_SIZE_LIMIT is set to 4 or greater. The
complexity of the transform is four because the transform
has two local variables, one statement, and one rule
(2+1+1=4):
out :: yourfun(a, b) =
begin
let int x = a - b;
let int y = x * x;
y = y + y / 2;
out :: if (y > 2 * x) a else b;
end;
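For instance, to allow a transform of that complexity to be expanded inline everywhere, you could raise the limit in a configuration file such as .abinitiorc. This is only a sketch; where you set the variable (configuration file, sandbox, or environment) depends on your site's conventions:

AB_XFR_INLINE_SIZE_LIMIT : 4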

When To Use The Protocol Prefix (File, Mfile or Mvs)

When a GDE text box for a component parameter is labeled URL, it's a good idea to use Ab Initio URL syntax:

protocol://hostname/pathname
Where:
- The value of protocol represents the type of dataset to which the URL points: file, mfile, or mvs.
- The value of hostname specifies the computer where the file or control partition resides.
- The value of pathname is an absolute pathname indicating where on the computer the file or control partition resides. It must be in the form accepted by the native operating system of that computer.

Under most circumstances the Co>Operating System will infer the correct value for an omitted protocol, but specifying
the protocol prefix explicitly will make your graph more
readable and resolve any ambiguity of the dataset type.
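For example (etlhost and the paths below are hypothetical, shown only to illustrate the prefixes):

file://etlhost/data/serial/accounts.dat
mfile://etlhost/data/mfs/mfs_8_way/accounts.dat

The prefix makes it explicit whether the URL names a serial file or a multifile, even in cases where the Co>Operating System could have inferred it.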

When a GDE text box for a component parameter is labeled File, the value of the parameter should be simply a local file
path:

/pathname/filename

In particular, this applies to DML, XFR, DBC (and similar) files, which should all be local to the graph at startup.

Null Does Not Equal Null When Doing Field Comparisons

Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the
database. Introduced by the creator of the relational
database model, E. F. Codd, SQL Null serves to fulfill the
requirement that all true relational database management
systems (RDBMS) support a representation of "missing
information and inapplicable information". Codd also
introduced the use of the lowercase Greek omega (ω)
symbol to represent Null in database theory. NULL is also
an SQL reserved keyword used to identify the Null special
marker.

Null has been the focus of controversy and a source of debate because of its associated Three-Valued Logic (3VL),
special requirements for its use in SQL joins, and the
special handling required by aggregate functions and SQL
grouping operators. Although special functions and
predicates are provided to properly handle Nulls,
opponents feel that resolving these issues introduces
unnecessary complexity and inconsistency into the
relational model of databases.
NULL is a marker that represents missing, unknown, or
inapplicable data. Null is untyped in SQL, meaning that it is
not designated as a NUMBER, CHAR, or any other specific
data type. Do not use NULL to represent a value of zero,
because they are not equivalent.

NOT NULL constraint


Columns in a table can be defined as NOT NULL to indicate
that they may not contain NULL values (a value must be
entered). Example:

CREATE TABLE t1 (c1 NUMBER PRIMARY KEY, c2 DATE NOT NULL);

Comparisons
Any arithmetic expression containing a NULL always
evaluates to NULL. For example, 10 + NULL = NULL. In fact,
all operators (except concatenation and the DECODE
function) return null when given a null operand.
Some invalid examples:

Example 1:
A NULL is not equal to a NULL:
SELECT * FROM emp WHERE NULL = NULL;

Example 2:
A NULL cannot be "not equal" to a NULL either:
SELECT * FROM emp WHERE NULL <> NULL;

Example 3:
A NULL does not equal an empty string either:
SELECT * FROM emp WHERE NULL = '';

Valid examples

Example 1:
Select column values that are NULL:
SELECT * FROM emp WHERE comm IS NULL;

Example 2:
Select column values that are NOT NULL:
SELECT * FROM emp WHERE comm IS NOT NULL;

Example 3:
Change a column value to NULL:

UPDATE emp SET comm = NULL WHERE deptno = 20;

Handling Delimited Data with Missing and Extra Delimiters

The easiest solution to handling data with missing delimiters is to have your data provider provide you with
clean data in the first place. Otherwise, depending on the
nature of your data, you can run into issues trying to
decipher where a delimiter is supposed to be.

Often, if the incidence of bad data is low enough, you can just collect these records for manual processing through
the reject port of an early component. Keep in mind that
relying on validating the data against its type may not
catch all the bad data as shown in the examples that
follow.

Throughout the remainder of the week, we'll post simple cases demonstrating the basic techniques that can be used
with badly delimited data. An example graph and data that
implements the techniques described here and in the tips
to follow is attached.

For more information, see the REPAIR INPUT component and the "Malformed Delimited Data" topics in Ab Initio
Help. For help with more complex examples, contact Ab
Initio Support.

In this example, we'll use a record with two delimited fields defined as:

record
  string("|") code;
  string("\n") description;
end;
Here are two records provided:

AThis text describes type A
B|This text describes type B
Because the first record is missing the pipe, the data in
these two records will incorrectly be parsed as a single
record. Relying on validating data by its type will not catch
the error:

[record
code " AThis text describes type A\nB "
description "This text describes type B"
]

To repair bad input records automatically within your graph, you must understand your data and what logic you'll
need to form a good record from a bad record.

When you know that there may be missing internal delimiters, but you'll always have a line delimiter, you can
use a more generic DML record format to describe the
data, and then use a REFORMAT transform to parse the
data with explicit logic:

record
  string("\n") line;
end;

To use this type of solution, you must understand the logic behind how, in the absence of a delimiter, you could
identify which portion of the newline-delimited field goes
into the code field and which portion goes into the
description field. Here you know that along with being
delimited by a pipe, code is also always a single character.
You can write a transform that first checks for a delimiter,
then takes the first character of the line and assigns it to
the code field, and assigns the remainder to the description
field.
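A minimal sketch of such a REFORMAT transform follows. It assumes the generic one-field input format above and the two-field output format; the exact rules in your graph will depend on your data:

out :: reformat(in) =
begin
  let integer(4) p = string_index(in.line, "|");
  // If the pipe is present, split on it; otherwise assume the first
  // character is the code and the remainder is the description.
  out.code :: if (p > 0) string_substring(in.line, 1, p - 1)
              else string_substring(in.line, 1, 1);
  out.description :: if (p > 0)
      string_substring(in.line, p + 1, string_length(in.line) - p)
    else
      string_substring(in.line, 2, string_length(in.line) - 1);
end;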
Using the NORMALIZE Component To Drop Records

The NORMALIZE component allows you to output a variable number of records – including zero records – for each
incoming record. This makes it possible to use the
NORMALIZE component to drop records.

The FILTER BY EXPRESSION component is usually used to select or deselect records but there are times when the
logic required to select records cannot be written in a
single expression. The transform parameter of the
NORMALIZE component allows you to use global variables
and more complex and stateful calculations when
determining whether to drop records.

For example, consider a flow of integer values in which you want to keep only the integers that are greater than the
sum of the integers you've seen so far. Given the following
input values:

2 5 1 9 4 9 2 35 24

The correct output would be:

2 5 9 35

To do this with a NORMALIZE component, use a global variable to keep the running sum. In the length function, compare the running sum to the current value; if the current value is greater than the running sum, output 1; otherwise output 0. This drops any record whose integer is less than or equal to the running sum seen so far.
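A minimal sketch of such a transform is shown below. The field name value is an assumption for illustration; the running sum is updated in the length function so that every record, kept or dropped, is counted:

let integer(8) running_sum = 0;

out :: length(in) =
begin
  // Keep the record only if it exceeds the sum of everything seen so far.
  let integer(4) keep = if (in.value > running_sum) 1 else 0;
  running_sum = running_sum + in.value;
  out :: keep;
end;

out :: normalize(in, index) =
begin
  out.value :: in.value;
end;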
Keyword Versus Positional Parameters In Command
Lines

When using input parameters, keyword parameters offer more flexibility and clarity than positional parameters on the command line. With positional parameters, it is
important that you specify the parameters in the right
order, as prescribed in the graph. For example, command
syntax using positional parameters may look like:

my_graph.ksh 200612 some_tb some_database_name

With keyword parameters, you specify the parameter name first (preceded by a hyphen) and the value next. The order
in which the parameter names appear is not important:

my_graph.ksh -PMONTH 200612 -SOURCE_TABLE some_tb -DATABASE_NAME some_database_name

or

my_graph.ksh -SOURCE_TABLE some_tb -DATABASE_NAME some_database_name -PMONTH 200612

The keyword syntax provides more insight as to what parameters correspond to what values and tends to be
more maintainable over time, as new parameters get
added.

For more information, see “The parameter lines” in Ab Initio Help.

General Information Regarding Phases, Checkpoints and Run Program

Do not decouple phases and checkpoints:

A phase break without a checkpoint is no more efficient than a checkpoint, and in some cases a checkpoint will
actually use less disk space during the execution of a
graph. For example, if a phase writes to an output file, the
previous contents of that file can be discarded immediately
after a checkpoint, but the file contents must be retained
following a phase break without a checkpoint.

In the absence of any specific recovery requirements, a graph with all checkpointed phase breaks will use the
minimum disk resources compared to the same graph with
a combination of uncheckpointed phase breaks and
checkpoints in the same locations in the graph.

For more information, see “Phases and checkpoints” in Ab Initio Help.

Use exit codes to indicate failure in RUN PROGRAM:

When using custom components or the RUN PROGRAM component, be sure the applications you call indicate
failures by passing any errors through their exit codes.
Unless there is a side-effect on the resulting data used
downstream, the Co>Operating System can only recognize
errors through the non-zero exit status of the called
applications.
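As a minimal ksh sketch of this idea (my_app is a hypothetical application name; the point is only that the wrapper passes the real exit status back):

#!/bin/ksh
# Hypothetical wrapper called from RUN PROGRAM: run the real application
# and propagate its exit status so the Co>Operating System sees a failure.
my_app "$@"
status=$?
if [ $status -ne 0 ]; then
    print -u2 "my_app failed with exit status $status"
fi
exit $status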

XML SPLIT Component:


XML SPLIT reads, normalizes, and filters hierarchical XML
data, turning it into DML-described records that contain
only the fields you specify.

The component requires a description of the input XML, in the form of either a Schema file or an exemplar file. You
specify the Schema file or exemplar file with the Import
XML dialog, which you then use to describe and create the
DML record format for each output.

Used with XML COMBINE


XML COMBINE reverses the operations of the XML SPLIT
component, so you can use XML COMBINE to recover the
original XML input passed to XML SPLIT. That is, XML
COMBINE re-creates previously flattened hierarchies and
normalized elements, and recombines multiple input
streams.

Exceptions to this behavior can occur when XML COMBINE reads the following types of data:

- Flattened repeating elements
- Multiple inputs without a specified key

In these cases, you must use sequence numbers with both XML SPLIT and XML COMBINE to preserve hierarchical and other contextual information. You can do this in either of the following ways:

- Use the -generate-id-fields argument when you run the xml-to-dml utility.
- Select the Generate fields checkbox in the Import XML Options dialog. (This is the default.)

For more information, see "Import XML Options dialog".
Loop Expressions and Vectors

A loop expression results in a vector of values — one value per iteration of the loop. The following for loop expression
computes a vector of n elements, each of which is the
value of expression, evaluated with i set to incrementing
values from 0 to n-1.

for ( i , i < n ) : expression

For example, this expression squares the value of i:

for ( i, i < 5 ) : i*i;

It returns this vector:

[vector 0, 1, 4, 9, 16]

As the following examples demonstrate, loop expressions simplify vector-related business logic. Using a loop
expression, Example 1 builds a vector from a lookup file
using two local variables and three lines of code. Example
2 implements the same logic without a loop expression and
requires eight lines of code and three local variables. The
loop expression makes a transformation more compact and
readable, but is not necessarily more performant.

Example 1:

Code:
let integer(4) no_of_managers =
    first_defined(lookup_count("Stores Lookup", in0.store_no), 0);
let integer(4) idx = 0;
out.store_managers :: for (idx, idx < no_of_managers) :
    lookup_next("Stores Lookup").store_manager;
Example 2:

Code:
let integer(4) no_of_managers =
    first_defined(lookup_count("Stores Lookup", in0.store_no), 0);
let integer(4) idx = 0;
let string("\1")[integer(4)] store_managers = allocate();
for (idx, idx < no_of_managers)
begin
    store_managers = vector_append(store_managers,
        lookup_next("Stores Lookup").store_manager);
end

out.store_managers :: store_managers;

m_rollback versus m_cleanup


What is the difference between m_rollback and m_cleanup
and when would I use them?

Short answer
m_rollback has the same effect as an automatic rollback —
using the jobname.rec file, it rolls back a job to the last
completed checkpoint, or to the beginning if the job has
not completed any checkpoints. The m_cleanup commands
are used when the jobname.rec file doesn't exist and you
want to remove temporary files and directories left by
failed jobs.

For detailed information on using the cleanup commands, see "Cleanup" and "Cleanup commands".
Details
In the course of running a job, the Co>Operating System
creates a jobname.rec file in the working directory on the
run host.

NOTE: The script takes jobname from the value of the AB_JOB environment variable. If you have not specified a
value for AB_JOB, the GDE supplies the filename of the
graph as the default value for AB_JOB when it generates
the script.

The jobname.rec file contains a set of pointers to the internal job-specific files written by the launcher, some of
which the Co>Operating System uses to recover a job after
a failure. The Co>Operating System also creates temporary
files and directories in various locations. When a job fails,
it typically leaves the jobname.rec file, the temporary files
and directories, and many of the internal job-specific files
on disk. (When a job succeeds, these files are
automatically removed, so you don't have to worry about
them.)

If your job fails, determine the cause and fix the problem.
Then:

- If desired, restart the job. If the job succeeds, the jobname.rec file and all the temporary files and directories are cleaned up. For details, see "Automatic rollback and recovery".
- Alternatively, run m_rollback -d to clean up the files left behind by the failed job.

How Does Job Recovery Work

Synopsis
The Co>Operating System monitors and records the state of
jobs so that if a job fails, it can be restarted. This state
information is stored in files associated with the job and
enables the Co>Operating System to roll back the system to
its initial state, or to its state as of the most recent
checkpoint. Generally, if the application encounters a
failure, all hosts and their respective files will be rolled
back to their initial state or their state as of the most
recent checkpoint; you recover the job simply by rerunning
it.

Answer
An Ab Initio job is considered completed when the mp run
command returns. This means that all the processes
associated with the job — excluding commands you might
have added in the script end — have completed. These
include the process on the host system that executes the
script, and all processes the job has started on remote
computers. If any of these processes terminate abnormally,
the Co>Operating System terminates the entire job and
cleans up as much as possible.

When an Ab Initio job runs, the Co>Operating System creates a file in the working directory on the host system
with the name jobname.rec. This file contains a set of
pointers to the log files on the host and on every computer
associated with the job. The log files enable the
Co>Operating System to roll back the system to its initial
state or to its state as of the most recent checkpoint. If the
job completes successfully, the recovery files are removed
(they are also removed when a single-phase graph is rolled
back).

If the application encounters a software failure (for example, one of the processes signals an error or the
operator aborts the application), all hosts and their
respective files are rolled back to their initial state, as if
the application had not run at all. The files return to the
state they were in at the start, all temporary files and
storage are deleted, and all processes are terminated. If
the program contains checkpoint commands, the state
restored is that of the most recent checkpoint.

When a job has been rolled back, you recover it simply by rerunning it. Of course, the cause of the original failure
may also repeat itself when the failed job is rerun. You will
have to determine the cause of the failure by investigation
or by debugging.

When a checkpointed application is rerun, the Co>Operating System performs a "fast-forward" replay of
the successful phases. During this replay, no programs run
and no data flows; that is, the phases are not actually
repeated (although the monitoring system cannot detect
the difference between the replay and an actual
execution). When the replayed phases are completed, the
Co>Operating System runs the failed phase again.

Note that it may not always be possible for the Co>Operating System to restore the system to an earlier
state. For example, a failure could occur because a host or
its native operating system crashed. In this case, it is not
possible to cleanly shut down flow or file operations, nor to
roll back file operations performed in the current phase. In
fact, it is likely that intermediate or temporary files will be
left around.

To complete the cleanup and get the job running again, you must perform a manual rollback. You do this with the
command m_rollback. The syntax is:

m_rollback [-d] [-i] [-h] recoveryfile

Running m_rollback recoveryfile rolls the job back to its initial state or the last checkpoint. Using the -d option
deletes the partially run job and the recovery file.
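For example, to roll back a failed job and remove what it left behind (my_graph.rec is a hypothetical recovery file name, taken from the job's AB_JOB value):

m_rollback -d my_graph.rec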

Parallel Loading Of Oracle Tables

There are restrictions that mean you cannot load an indexed Oracle table from a multifile using utility mode. This would effectively mean multiple instances of SQL*Loader running against a table. That is not directly a problem, but the maintenance of the index is. In utility (direct) mode the index is disabled at the start of a load and rebuilt at the end of the load, but when there are multiple loads Oracle does not know which one will finish last and should rebuild the index; therefore a graph that attempts to do this will fail with the error:

SQL*Loader-951: Error calling once/load initialization
ORA-26002: Table EWTESTBM.AUDIT_EOM_LASTACCEPT has index defined upon it

To work around this, index maintenance can be turned off by specifying:

SKIP_INDEX_MAINTENANCE=TRUE

in the native_options parameter of the Output Table component used to load the Oracle table.

This means that at the end of the load any table indexes are left in an unusable state. They can be rebuilt by calling the handy stored procedure DUP_RBLD_UNUSABLE_IDX after the load has completed, e.g. using a RUN SQL component in a later phase:

exec DUP_RBLD_UNUSABLE_IDX('${SCHEMA_NAME}','$
{TABLE_NAME}');

Note that the stored procedure requires the schema name. If required, this can be read from the relevant database configuration file into a graph parameter (using shell interpretation), e.g.:

$(m_db print ${MY_DBC} -value dbms)

The issue will probably not arise if we don’t require the indexes.

Parallel Unloading From Oracle Tables

Ab Initio will allow you to parallelise the unloading in a number of different ways. You are likely to need to
experiment to find the approach that is best for you, as
this can depend on the Oracle database layout, amount of
data involved, network, etc. When testing, remember to
use a representative configuration of computers, network
and data to decide what is best.
You should also look at the log output of the Input Table
component carefully to see the queries that Ab Initio is
issuing. This is an important way to confirm that what one
wants is what one is actually getting.

You should also consider unloading the raw data from the
database and doing the join in Ab Initio. This can turn out
to be faster than doing the join in the database itself.

The following help topics (all in the on-line help) provide some additional information:
- FAQ: Degree of parallelism and the Database:default
layout
- Parallelizing Oracle queries
- Unloading data from Oracle

Some things to know are that:

1. With ablocal_expr or a serial unload Ab Initio will leave your hints completely alone and won't add any extra hints.

2. With automatic parallelism (i.e., using an MFS or database:default layout and not specifying an ablocal_expr), Ab Initio will end up specifying a ROWID hint. If you wish to specify your own hint in addition, you should explicitly use ABLOCAL(tablename), as in the sketch at the end of this section. In this case Ab Initio issues multiple queries to Oracle, each with a rowid range clause; an ABLOCAL(tablename) clause in this form tells the component which table to use when determining the rowid ranges, and the placement of the ABLOCAL clause tells the component where to put the rowid range clause in the SQL statement.
3. If you wish to specify an Oracle hint of /*+ parallel...*/,
then Oracle itself parallelises each query. Therefore if you
are running your Ab Initio INPUT TABLE component with an
n-way MFS, and your Oracle parallel query runs m ways,
you will end up running n*m ways on Oracle itself. This
may not be what you wish to do.

To summarise:
1. Test on a representative configuration, with
representative data.
2. Examine the output from the log port.
3. If you want to use the /*+ parallel */ hint, you probably
want to run the component serially.
4. If you want Ab Initio to determine the parallelism, use a
MFS layout, and don't specify the /*+ parallel */ hint.
5. Consider unloading the data from Oracle and doing the
join in Ab Initio.
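As a sketch of point 2 above, the select_sql of the INPUT TABLE component might look like the following (my_table is a hypothetical table name; any hint of your own would go in the usual /*+ ... */ position after SELECT):

select * from my_table where ABLOCAL(my_table)

At run time the component replaces the ABLOCAL(my_table) clause with a rowid range condition for each partition's query.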

Use Dynamic Script Option and PDL Instead Of Shell Interpretation

Going forward we are advising developers to use the Dynamic Script Generation feature of Ab Initio. “Dynamic
script generation is a feature of Co>Operating Systems 2.14
and higher that gives you the option of running a graph
without having to deploy it from the Graphical
Development Environment (GDE). Enabling dynamic script
generation also makes it possible to use Ab Initio's
parameter definition language (PDL) in your graphs, and to
use the Co>Operating System Component Folding feature to
improve your graphs' performance.” To find more about
Dynamic Script Generation please refer to your Ab Initio
Help and search for “dynamic script generation”.
To use the Parameter Definition Language (PDL) in a graph parameter, make sure to select PDL as the Interpretation attribute instead of Shell. Below are sample screenshots for reference.

In the Run Settings, select Dynamic instead of the default value GDE 1.13 Compatible.

This will give additional options for your graph-level parameter interpretations.

As stated above, when you define a parameter and would normally choose shell interpretation, please replace that with PDL interpretation. It can do the same things as shell interpretation, and in addition it benefits dependency analysis when you check the graph in to the EME.

Note: This option has to be set for every graph. Currently there is no way to make it the default.

With PDL interpretation, you also avoid invoking ksh for each shell interpretation (this happens in the background, which you might not have noticed).

Appending Multi Files Using AI_MFS_DEPTH Parameter

Ab Initio Tip of the Week:

We all make extensive use of the Multi File System and perform all the available operations on it, like copy, move, remove and so on. The scenario becomes a little tricky when it comes to appending to a Multi File, because generic code has to be kept in one place. As we know, the environments have different depths of parallelism, so to handle that I have come up with generic code that will append the data to the Multi File irrespective of the environment the code is running in.
Code Snippet:
The code uses the following parameters:
${AI_MFS_PARTITIONS} -> /apps/abinitio/data/mfs/parts
${AI_MFS_DEPTH} -> the value varies from environment to environment:
a) DEV -> 2
b) QA -> 8
c) PROD -> 8
${AI_MFS_NAME} -> the value varies from environment to environment:
d) DEV -> mfs_2_way
e) QA -> mfs_8_way
f) PROD -> mfs_8_way
With the help of the above code there are no conflicts between environments, and data is appended to the Multi File correctly at the partition level. I have used this code in one of my applications and it gives the required output.

Layout definition for ORACLE/DB2 Database

This is just from an information point of view, as I think most of you already know it. Whenever we use a table component, we have many options for defining the layout of the component. They are as follows:
1) Propagate from neighbors
2) Component
3) URL
4) Custom
5) Host
6) Database.
But the behavior is slightly different when we make a connection to Oracle versus DB2.
Setting the layout as a URL with a path works when the connection is made to ORACLE: whenever Ab Initio makes a connection from UNIX to Oracle, it needs to store some of the TCP/IP configuration in a file in the temp directory, so it writes to the /tmp folder a file with a name pattern something like tel.10.66.142.48.497.
But when Ab Initio makes a connection from UNIX to mainframe DB2 with the layout defined through a URL value, you will end up with a getpwnam failure. The reason is that if you want to use the mainframe dbc file, you should set the LAYOUT to database:serial and not to AI_SERIAL/AI_MFS, so that the database component runs on the mainframe and not on the UNIX box.
If you want the database component to run on UNIX, then you must use a different dbc file that uses DB2 Connect to get to the mainframe database.
So while using Oracle or DB2 databases, please make a note of these things.
On the FLY KEY DML Creation for Compare and Chaining Process in ADW

From: Dalal, Pratik (Syntel)
Sent: Wednesday, April 21, 2010 4:28 PM
To: Ab Initio Users; Ab Initio Leads
Cc: ISG-Ab Initio Support
Subject: Ab Initio Utility of the Week - On the FLY KEY DML Creation for Compare and Chaining Process in ADW
Ab Initio Utility of the Week:
All of us are aware of the Compare and Chaining process we do in our world. The process that we follow for creating the DMLs for the Compare and Chaining process is very tedious and has a couple of steps. That said, the chance of making mistakes is also very high, e.g. grouping a logical field as a compare key or a no-compare key and vice versa. So to make it robust I have come up with a utility which serves the following purposes:
1) Saves time, as the number of tables to be added during the design of any application is very high.
2) The chance of making mistakes is zero percent, unless we goofed up something in the mapping files.
3) On-the-fly generation of the code, ready for use.
4) Also doesn't require another pair of eyes to review the code.
Usage of the Utility:
The utility looks for the following inputs at the run time:
1) Project alias name like
i) prm for PRAMA
ii) slc for STAND_CLM
iii) nxg for NEXTGEN and so on.
2) Mapping file depicting the table code
3) Mapping file depicting the Logical Key Columns
One file containing all the required information will also serve the purpose.
Once the project alias name and both file names are passed to the script, the following functionality is achieved:
1) Takes the Table Code and TABLE NAME values from the mapping file holding all the table code information.
2) Creates the no-compare fields. This is needed because they vary from project to project. For example:
a) PRAMA -> <table_cd>_atomic_ts and <table_cd>_source_sys_archive_ind
b) STAND_CLM -> <table_cd>_src_sys_eff_ts and <table_cd>_src_sys_end_eff_ts
c) NEXTGEN -> <table_cd>_d_atomic_ts and <table_cd>_d_end_atomic_ts
d) Voice -> <table_cd>_process_ts and <table_cd>_process_end_ts and <table_cd>_atomic_ts
e) And it can differ for other projects.
3) Once the above steps are done, it forms three subsets:
a) Logical Keys
b) Compare Keys
c) No Compare Keys
4) It also replaces the delimiter of the last key in each of the above-mentioned subsets with a different delimiter (for example, \307\001). The reason is that once the key DML is formed and we use the reinterpret function in the transformation with the same delimiter throughout, it will just
display the value of only the first attribute of LOGICAL_KEY, COMPARE_KEY and NO_COMPARE_KEY. By flipping the delimiter of the last attribute, we get the information of all the associated attributes in a single field, which can then be used for the compare and chaining process.
Below are run-time snapshots of the utility:
Snippet 1: Run-time parameters
Snippet 2: Table Code and Logical Column Names from the mapping file
Snippet 3: Key DML
Location:
/apps/abinitio/admin/util/keydml_generic.sh on
xtabidv2 server
Please let me know in case you have any queries or
concerns.
Regards,
Pratik
ISG-Ab Initio Support
Office:(847)-402-0892

Key Creation in Multi Layout Using next_in_sequence() Function
From: Dalal, Pratik (Syntel)
Sent: Monday, April 19, 2010 5:09 PM
To: Ab Initio Users
Cc: Ab Initio Leads; ISG-Ab Initio Support
Subject: Ab Initio Tip of the Week - Creation Of Key in Multi Layout Using next_in_sequence()
Ab Initio Tip of the Week:
We all know the use of next_in_sequence(), and it is pretty straightforward when we need to use it in a serial layout. The complexity comes when we need to use it in a multifile layout, where it becomes tricky. For example, if we have a 4-way partition and each partition has 6 records, below are the two scenarios:

Scenario 1: Using next_in_sequence() alone in a component running in a multi layout:

            Record 1  Record 2  Record 3  Record 4  Record 5  Record 6
Partition 0     1         2         3         4         5         6
Partition 1     1         2         3         4         5         6
Partition 2     1         2         3         4         5         6
Partition 3     1         2         3         4         5         6
As shown above, the key values will have duplicates in a multi layout.

Scenario 2: Expected key values when the component runs in a multi layout:

            Record 1  Record 2  Record 3  Record 4  Record 5  Record 6
Partition 0     1         5         9        13        17        21
Partition 1     2         6        10        14        18        22
Partition 2     3         7        11        15        19        23
Partition 3     4         8        12        16        20        24

This can be achieved by combining next_in_sequence(), number_of_partitions() and this_partition(). The derived formula is:

[(next_in_sequence() - 1) * number_of_partitions() + this_partition()] + 1

With the help of this we will be able to generate the sequence shown above and thus avoid duplicate key values.
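As a sketch, the rule in the transform running in the multi layout would look like this (out.key is an assumed output field name; the three functions are standard DML):

out.key :: ((next_in_sequence() - 1) * number_of_partitions() + this_partition()) + 1;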
Note:
number_of_partitions() returns the number of partitions.
this_partition() returns the partition number of the component from which the function was called.
Please let me know in case you have any questions or
concerns.
Regards,
Pratik
ISG-Ab Initio Support
Office:(847)-402-0892
