
What is EME?

EME is an object-oriented data storage system that version-controls and manages various
kinds of information associated with Ab Initio applications, ranging from design
information to operational data. In simple terms, it is a repository that contains data
about data (metadata).


Revisiting Sandbox Concept
What is a Sandbox?
Projects held in the EME Datastore can't be manipulated directly. To work on a Project,
it must be checked out to a working area on the file system where code can be developed
and modified. This working area is known as a sandbox. It has exactly the same directory
structure as the corresponding Project in the Datastore.
Each object that needs to be worked on is checked out to a sandbox, where modifications
or enhancements are carried out. After the changes are complete, the code is checked in
from the sandbox to the EME Datastore. This action creates a new version of the code in
the EME Datastore.
Sandbox Projects vs. EME Datastore Project
Sandboxes are work areas used to develop, test, or run code associated with a given
project. A sandbox can hold only one version of the code at any time, whereas the EME
Datastore contains all versions of the code that have been checked into it. A particular
sandbox is associated with only one Project, whereas a Project can be checked out to any
number of sandboxes.
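The split between sandbox and Datastore can be illustrated with a toy Python model. It is not Ab Initio code and does not use any EME interface; the classes `Datastore` and `Sandbox`, the path `mp/load.mp`, and the project name `billing` are all hypothetical. The model captures only the rules stated above: a check-in appends a new version rather than overwriting, and a sandbox holds a single version of each file.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Toy EME Datastore (hypothetical): keeps every version ever checked in."""
    next_version: int = 1
    history: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def check_in(self, path: str, content: str) -> int:
        # A check-in never overwrites: it appends a new, numbered version.
        version = self.next_version
        self.next_version += 1
        self.history.setdefault(path, []).append((version, content))
        return version

    def latest(self, path: str) -> tuple[int, str]:
        return self.history[path][-1]


@dataclass
class Sandbox:
    """Toy sandbox (hypothetical): tied to one project, one version per file."""
    project: str
    files: dict[str, str] = field(default_factory=dict)

    def check_out(self, store: Datastore, path: str) -> None:
        # Copy only the latest version; any earlier copy in the sandbox is replaced.
        _, content = store.latest(path)
        self.files[path] = content


store = Datastore()
store.check_in("mp/load.mp", "original graph")

sandbox = Sandbox(project="billing")
sandbox.check_out(store, "mp/load.mp")
sandbox.files["mp/load.mp"] = "edited graph"  # modify the code in the sandbox
new_version = store.check_in("mp/load.mp", sandbox.files["mp/load.mp"])
```

After the check-in the Datastore holds both versions, while the sandbox still holds just the one it was working on.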

EME Datastore/Repository connection settings
An EME Datastore is a specific instance of EME in the environment. It is the repository
in which the different versions of code and its related data, such as record formats and
transformations, are maintained. A user can be connected to only one EME Datastore
instance at a time. To access an EME Datastore, go to Project > EME Datastore Settings
in the GDE menu and fill in the connection details in the dialog box that opens.

Fill in the following details in the EME Datastore Settings dialog:
Method: Remote Execution (Rexec) or Telnet
Host: the host on which the EME Datastore resides
Login and Password: Unix login credentials for the host
Co>Operating System Location: path where the Ab Initio Co>Operating System is
installed
EME Datastore Location: path where the EME Datastore is located
Mode: Source Code Control
After filling in the details, click the Connect button to test the connection. If the
details are correct, a message box confirms the connection.

Project
A Project is a collection of related graphs and their associated elements, such as DML
(record format) and XFR (transform) files, in the EME Datastore.
Project structure
Typically, a project should contain at most 5 to 10 graphs. This helps organize the code
efficiently within EME: as the number of graphs in a Project grows, so does the time
taken to perform dependency analysis on the graphs and related data.
Before adding a Project to an existing application that already has a number of Projects
in place, consider the impact it might have on the other Projects and on the application
as a whole.
Structure of a Project in EME

(Figure: directory structure of a Project in EME; the sql subdirectory holds SQL queries.)

Different Types of Projects
Private vs. Public Project
There is often information common to multiple Projects. For instance, several Projects
may share some record format files or transform files. Such elements can be made widely
reusable by placing them in a Project and including that Project in the other Projects
that need them. A Project that is included by other Projects is termed a Public Project,
and the Projects that include public Projects are known as Private Projects.
A public Project is public in the sense that its data and metadata are expected to be
shared with other Projects; a private Project is private in the sense that its data and
metadata are not expected to be shared with other Projects.
The Environment Project (Stdenv)
There is a special Project associated with every instance of the Ab Initio environment,
known as the Environment Project or stdenv (Standard Environment). Its structure is no
different from that of a regular Project. It contains machine- and application-specific
settings, such as data directory mount points and max-core settings, and application-wide
parameters, such as the current date, which are used across all Projects. When any
Project is created, stdenv is included in it by default. Only a single stdenv is required
for an entire set of applications that run on one machine and share one EME Datastore.
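The include relationships can be pictured as a small graph. The sketch below is a hypothetical model, not how EME stores projects: `create_project` and `all_included` are invented helpers, and the project names are examples. It shows stdenv being included by default, a public project being included explicitly, and stdenv appearing only once among the projects a private project can reach.

```python
def create_project(name: str, includes: dict[str, list[str]],
                   common: tuple[str, ...] = ()) -> None:
    """Register a project: stdenv is included by default, common projects on request."""
    includes[name] = ["stdenv", *common]


def all_included(name: str, includes: dict[str, list[str]]) -> list[str]:
    """List every project whose elements `name` can reach, each listed once."""
    seen: list[str] = []

    def visit(project: str) -> None:
        for inc in includes.get(project, []):
            if inc not in seen:
                seen.append(inc)
                visit(inc)  # follow includes of included projects too

    visit(name)
    return seen


includes: dict[str, list[str]] = {"stdenv": []}   # assumption: stdenv includes nothing
create_project("common_formats", includes)        # a public project
create_project("billing", includes, common=("common_formats",))  # a private project
```

Even though both `billing` and `common_formats` include stdenv, `all_included("billing", includes)` lists it once, consistent with one stdenv serving every project on the machine.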
Version Control and Tagging
Each object under EME source control, which may be a file, a directory, or a Project,
exists as a series of versions, each of which represents what some user checked in. A
version can optionally have a textual label attached to it, called a tag, and a
description in the form of a comment. Each version is separately numbered and can be
accessed either by its version number or by the tag attached to it. Version numbers,
which are integers, and tags are global to the whole EME Datastore. Tags are the basic
unit of migration when code is moved between EME Datastore instances.
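A minimal sketch of these versioning rules, assuming a simplified in-memory store: version numbers come from one counter shared by every object, and a tag records which version of each object it covers. `VersionedStore` and its methods are hypothetical names for illustration, not the `air` command-line interface or any EME API.

```python
from __future__ import annotations

from typing import Optional


class VersionedStore:
    """Hypothetical model of EME versioning: global version numbers plus tags."""

    def __init__(self) -> None:
        self._counter = 0                               # shared by all objects
        self._objects: dict[str, dict[int, str]] = {}   # path -> {version: content}
        self._tags: dict[str, dict[str, int]] = {}      # tag -> {path: version}

    def check_in(self, path: str, content: str) -> int:
        self._counter += 1
        self._objects.setdefault(path, {})[self._counter] = content
        return self._counter

    def tag(self, name: str, paths: list[str]) -> None:
        # Record the latest version of each listed object under one name.
        self._tags[name] = {p: max(self._objects[p]) for p in paths}

    def get(self, path: str, version: Optional[int] = None,
            tag: Optional[str] = None) -> str:
        # Access by tag, by explicit version number, or default to the latest.
        if tag is not None:
            version = self._tags[tag][path]
        if version is None:
            version = max(self._objects[path])
        return self._objects[path][version]


store = VersionedStore()
store.check_in("mp/load.mp", "graph A")          # version 1
store.check_in("dml/customer.dml", "format A")   # version 2 (numbering is global)
store.tag("release_1", ["mp/load.mp", "dml/customer.dml"])
store.check_in("mp/load.mp", "graph B")          # version 3
```

Retrieving `mp/load.mp` by the tag `release_1` still yields the tagged content after a newer version exists, which is what makes a tag usable as a unit of migration.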
Check out of files using GDE
The Check Out wizard is invoked by navigating to Project > Check Out.


Select the Project, directory, or file you want to check out by browsing to it.
In the Sandbox Host drop-down list, select the host on which the sandbox resides.
In the Directory field, enter the path to an existing sandbox (which must be associated
with the Project being checked out) or a new path, in which case the sandbox is created
during check-out.
Click the Advanced button to open the advanced options dialog.


The first two options specify whether to check out required files from the parent
project and whether to check out required files from the common Projects. By default,
required files are checked out from the parent project. A file is required if it is
directly referenced in a graph or if it is referenced through an include in a DML or XFR
file. When a whole project is checked out, these two options are disabled.
Run host setup script runs the host profile setup script before check-out, and Mark
files read-only on check-out marks the checked-out files as read-only. Both options are
on by default.
To check out a particular tagged version of the object, select it from the Tag drop-down
list. By default, the latest version is checked out.
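The rule for required files amounts to a transitive closure: a file is required if a graph references it, or if it is reached through an include from a DML or XFR file that is itself required. A minimal sketch, assuming a hypothetical `references` map from each file to the files it references or includes; in practice the GDE works this out itself during check-out.

```python
from collections import deque
from typing import Dict, List, Set


def required_files(graph: str, references: Dict[str, List[str]]) -> Set[str]:
    """Return every file reachable from `graph` through references or includes."""
    required: Set[str] = set()
    queue = deque(references.get(graph, []))
    while queue:
        path = queue.popleft()
        if path in required:
            continue  # already visited; this also stops include cycles
        required.add(path)
        queue.extend(references.get(path, []))  # follow includes in DML/XFR files
    return required


# Hypothetical graph and files: file -> files it references or includes.
refs = {
    "mp/load_customers.mp": ["dml/customer.dml", "xfr/clean.xfr"],
    "dml/customer.dml": ["dml/address.dml"],
    "xfr/clean.xfr": ["xfr/common_rules.xfr"],
}
needed = required_files("mp/load_customers.mp", refs)
```

Here `dml/address.dml` and `xfr/common_rules.xfr` are required even though the graph never names them directly.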

On clicking Next, if the sandbox doesn't exist, you are asked to confirm whether to
create it. Clicking Yes creates the sandbox and checks out the selected object into it.
You will be prompted to enter the sandbox locations of stdenv and of any common projects
associated with the project, unless these values are already specified for the sandbox
or the sandbox is a pre-existing one.

Clicking Do Check out performs the check-out; when it completes, a window shows the
operations performed.


Locking
After a successful check-out, a lock must be acquired on any object that is to be
modified in the sandbox. To modify a graph that has been checked out, first open the
graph in the GDE and then click the lock symbol on the menu bar. The GDE checks whether
the version in the sandbox is the latest version of the object in the Datastore; if it
is, the lock symbol turns green, showing that the graph is now locked and editable.
If the graph is already locked in some other sandbox, the lock symbol is red when the
graph is opened in the GDE, indicating that a lock already exists. A lock can be
acquired on an object only if the sandbox version and the current version of the object
in the EME are the same.
Once a lock is acquired and the changes are complete, the object must be checked in to
the Datastore to create a new version.
For non-Ab Initio objects, which cannot be locked from the GDE, a lock can be obtained
from the Unix command line using the air commands.
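The two locking rules above can be expressed as a small decision function. This is an illustrative model only: `DatastoreObject` and `try_lock` are hypothetical, the results "green" and "red" mirror the lock symbol colours described above, and "stale" is an invented label for the case where the sandbox copy is not the latest version (what the GDE actually displays in that case may differ).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatastoreObject:
    """Hypothetical state of one object in the EME Datastore."""
    current_version: int
    locked_by: Optional[str] = None  # sandbox path holding the lock, if any


def try_lock(obj: DatastoreObject, sandbox: str, sandbox_version: int) -> str:
    """Apply the locking rules described above and report the outcome."""
    if obj.locked_by is not None and obj.locked_by != sandbox:
        return "red"      # already locked in some other sandbox
    if sandbox_version != obj.current_version:
        return "stale"    # sandbox copy is not the latest version: no lock
    obj.locked_by = sandbox
    return "green"        # locked: the object is now editable in this sandbox


graph = DatastoreObject(current_version=7)
first = try_lock(graph, "/home/ann/sandbox", sandbox_version=6)   # outdated copy
second = try_lock(graph, "/home/ann/sandbox", sandbox_version=7)  # up to date
third = try_lock(graph, "/home/bob/sandbox", sandbox_version=7)   # Ann holds the lock
```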
Check in of files using GDE
Once the project files have been edited and updated, they need to be checked in to create
a new version in the EME Datastore that is available to other users. The Check In wizard
is invoked by navigating to Project > Check In. Before the wizard starts, the GDE checks
the sandbox for unsaved files and prompts you to save them.

Choose the sandbox host from the drop-down list.
In the Directory or File field, browse to the file in the sandbox that you want to check
in. You may select a single file under the sandbox, or the whole sandbox, in which case
the whole project is checked in to the EME Datastore.
In the Project Directory field, browse to the parent Project; this points to the Project
directory in the EME Datastore where the object will be checked in.
To open the advanced check-in options, click the Advanced button.
The Checkin tab controls how the check-in is performed. By default, Force overwrite is
unchecked. When it is checked, the object is checked in even if there are conflicts and
becomes the latest version in the Datastore. Run Host Setup script runs the host profile
setup script before each check-in. It is advisable not to change any settings here.

The analysis tab specifies how much dependency analysis is done and on which objects
during check in.

A tag (a descriptive piece of text) and a comment can be attached to the version being
checked in. These are entered on the Tag tab of the advanced options dialog box. The
tagging standards are described in another document.
After the tag information is filled in, clicking Next in the Check In wizard displays a
Check In Ready dialog.

Clicking Do Checkin performs the actual check-in and displays a window, similar to the
check-out finished window, with the results of the check-in and of dependency analysis
(if specified in the advanced options).
Working with previous versions of graphs/objects in EME
Check out the required previous tagged version of the graph to your sandbox (for
example, V1).
Check it back in with Force overwrite selected in the advanced options of the Check In
wizard. This makes it the current version in the Datastore (for example, V4, if V3 was
previously the latest).
Lock the graph to make the changes.
Check the graph back in to the EME Datastore. This updated version becomes the latest
version in the EME Datastore (for example, V5).
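The procedure above can be traced with a hypothetical per-object history. For readability the sketch numbers versions per object (V1, V2, ...), matching the labels in the steps, although real EME version numbers are global to the Datastore; the lock step is reduced to a comment. `ObjectHistory` and `ConflictError` are invented names, and the conflict rule is a simplification of what Force overwrite bypasses.

```python
from __future__ import annotations


class ConflictError(Exception):
    """Raised when a check-in is based on a version that is no longer the latest."""


class ObjectHistory:
    """Hypothetical history of one object: index 0 holds V1, index 1 holds V2, ..."""

    def __init__(self, *contents: str) -> None:
        self.versions = list(contents)

    def latest(self) -> int:
        return len(self.versions)

    def check_out(self, version: int) -> tuple[int, str]:
        # Return the version the sandbox copy is based on, plus its content.
        return version, self.versions[version - 1]

    def check_in(self, base: int, content: str, force_overwrite: bool = False) -> int:
        # Without Force overwrite, a copy based on an older version is a conflict.
        if base != self.latest() and not force_overwrite:
            raise ConflictError(f"based on V{base}, but latest is V{self.latest()}")
        self.versions.append(content)
        return self.latest()


graph = ObjectHistory("V1 logic", "V2 logic", "V3 logic")

base, content = graph.check_out(1)                        # 1. check out old V1
v4 = graph.check_in(base, content, force_overwrite=True)  # 2. V1 content becomes V4
# 3. lock the graph in the GDE (not modelled here) and edit the sandbox copy
edited = content + " + fix"
v5 = graph.check_in(v4, edited)                           # 4. the edit becomes V5
```

Without `force_overwrite=True`, step 2 would raise `ConflictError`, because the sandbox copy is based on V1 while V3 is the latest.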


Parameters
A parameter is a name-value pair with some additional attributes that determine when
and how to interpret or resolve its value. Parameters provide logical names for physical
locations and should always be used instead of hardcoded paths in graphs. There are two
types of parameters: graph parameters and Project parameters.
Graph parameters
Graph parameters, as the name suggests, are specific to individual graphs and private to
them. They affect the execution of the graph for which they are defined.
Graph parameters can be defined by navigating to Edit > Parameters in the GDE, which
opens the graph parameters editor.

Project parameters
Project parameters are inherited by all the graphs in the Project and are accessed from
the GDE through the sandbox parameter editor, via Project > Edit Sandbox > Parameters.
This opens a dialog box prompting for the sandbox path. Choose the correct host and
sandbox path and press OK to open the sandbox parameter editor, which looks exactly like
the graph parameter editor.

Major Parameter Attributes

Scope: The scope of a parameter can be formal or local. A local parameter is internal to
the sandbox, and most parameters have local scope; its value is taken from the Value
column in the parameter editor. A formal parameter is one whose value can be set from
outside, i.e. from the environment in which the graph is run; its value is supplied on
the command line. Formal parameters are identified by a green diamond with an arrow
mark.
Kind: If the scope is local, the kind is left unspecified; if it is formal, the kind is
automatically set to keyword.
Type: This determines the nature of the parameter. Project parameters have four types:
string, common Project, switch, and dependent. Graph parameters have a different set
of types.
Export: When this check box is checked, the corresponding parameter value is exported
as an environment variable, otherwise it is generated as a local shell variable.
Private Value: If a parameter is specified as a private value, any subsequent changes to
it remain private to the local sandbox and are not checked in to the EME.
This is useful when different users want different values for the same parameter.
Value: This column specifies the value of the parameter.
Interpretation: This determines how the parameter value is evaluated.
Constant: The value is taken literally.
$ Substitution: Variables with a $ prefix are replaced with their values.
${} Substitution: Variables enclosed in braces with a $ prefix (for example, ${AI_DML})
are replaced with their values, but other occurrences of $ are ignored.
Shell: Korn shell syntax is used to evaluate the value of the parameter.
Required: This attribute takes one of two values, required (the default) or optional. A
required parameter's value cannot be left blank; an optional parameter's value can.
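A minimal sketch of the Constant, $ and ${} interpretation rules and of the Required check, written in Python rather than evaluated by the Co>Operating System. The function names are hypothetical, Shell interpretation (Korn shell evaluation) is deliberately not modelled, and leaving an unknown variable unchanged is an assumption of this sketch rather than documented Ab Initio behaviour.

```python
import re


def interpret(value: str, interpretation: str, env: dict[str, str]) -> str:
    """Evaluate a parameter value under a simplified interpretation rule."""

    def lookup(match: re.Match) -> str:
        name = next(g for g in match.groups() if g)
        # Unknown variables are left as written (an assumption of this sketch).
        return env.get(name, match.group(0))

    if interpretation == "constant":
        return value                                        # taken literally
    if interpretation == "$":
        # Both $NAME and ${NAME} are replaced.
        return re.sub(r"\$\{(\w+)\}|\$(\w+)", lookup, value)
    if interpretation == "${}":
        # Only ${NAME} is replaced; a bare $NAME is left alone.
        return re.sub(r"\$\{(\w+)\}", lookup, value)
    raise ValueError(f"interpretation not modelled here: {interpretation}")


def check_required(name: str, value: str, required: bool = True) -> None:
    """A required parameter may not have a blank value; an optional one may."""
    if required and not value.strip():
        raise ValueError(f"parameter {name} is required but has no value")


env = {"AI_SERIAL": "/data/serial", "RUN_DATE": "20240101"}   # hypothetical values
a = interpret("$AI_SERIAL/in_$RUN_DATE.dat", "$", env)
b = interpret("${AI_SERIAL}/price_$5.txt", "${}", env)
c = interpret("$AI_SERIAL", "constant", env)
```

The `${}` rule is what lets a value contain a literal `$` (as in `price_$5.txt`) without it being treated as a variable.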
Session I (Day 1)

Introduction to Ab-Initio
What is Ab Initio?
Applications of Ab Initio
Architecture
Co>Operating system
Types of Development
GDE
Co>Op system Configuration
Sandbox Environment
Graph
Component Properties
Attribute Editor
Graph Properties
View Data Panel
Expression Editor
Session II (Day 2 & 3)
DML
Type Reference
Key Specifier Reference
Expression Reference
Transform Reference
Package Reference
Function Reference
DML Utilities
DML Examples
Components
Run SQL
Intermediate File
Lookup File
Concatenate
Gather
Interleave
Merge
Gather Logs
Redefine Format
Replicate
Filter by Expression
Join
Reformat
Rollup
Session III (Day 4 & 5)
Parallelism
Multi file system
Component Parallelism
Data Parallelism
Pipeline Parallelism
Partition and De-Partition Components
Metadata Management
Concepts
Commands
Introduction to Job Management


Ab Initio is Latin for "from the beginning."

From the beginning, the software was designed to support a complete range of
business applications, from the simplest to the most complex.
The graphical development environment and a powerful set of components allow
customers to get valuable results from the beginning.
Moving Data
Move small and large volumes of data in an efficient manner.
Deal with the complexity associated with business data.
High Performance
Scalable Solutions
Better Productivity.
Ab Initio software is a general-purpose data processing platform for mission-critical
applications such as:
Data warehousing
Batch Processing
Click-Stream Analysis
Data Movement
Data Transformation
Computers come in many shapes and sizes:
Single-CPU, Multi-CPU
Network of single-CPU computers
Network of multi-CPU computers
Multi-CPU machines are often called SMPs (for Symmetric Multi Processors).
Specially built networks of machines are often called MPPs (for Massively
Parallel Processors).
Distribution: a platform for applications to execute across a collection of processors,
within the confines of a single machine or across multiple machines.
Reduced Run Time Complexity: the ability for applications to run in parallel on any
combination of computers where the Ab Initio Co>Operating System is installed, from
a single point of control.
Ab Initio software consists of two main programs.
Co>Operating System, which your system administrator installs on a host UNIX or
Windows NT Server, as well as on processing nodes. (The host is also referred to as
the control node).
Graphical Development Environment (GDE), which you install on your PC (client
node) and configure to communicate with the host (control node).

Ab Initio Architecture

Co>Operating System
Co>Operating system is a powerful engine for every kind of data processing.
It delivers crucial facilities, including distributed and parallel execution,
platform-independent data transport, and process monitoring.
The Co>Operating System delivers:
Scalability: ideally, doubling the number of CPUs halves the execution time.
Flexibility: an open component model for extending and customizing Ab Initio's
functionality.
Portability: the Co>Operating System runs heterogeneously across a wide variety
of operating systems and hardware platforms, from OS/390 on mainframes, to 10
different implementations of Unix, to Windows NT and Windows 2000.
Parallel and distributed application execution
Control
Data Transport
Transactional semantics at the application level
Checkpointing
Monitoring and debugging
Parallel file management
Metadata-driven components
Co>Op system Configuration

Testing Co>Op systems

The Ab Initio Co>Operating System Runs on
Compaq Tru64 UNIX
DIGITAL UNIX
HP-UX
IBM AIX
NCR MP-RAS
Red Hat Linux
IBM/Sequent DYNIX/ptx
Siemens Pyramid Reliant UNIX
Silicon Graphics IRIX
Sun Solaris
Windows NT and Windows 2000
Types of Development Environment
GDE (Graphical Development Environment)
SDE (Shell Development Environment)
GDE Layout

A Sandbox Environment
A sandbox is a collection of graphs and related files that are stored in a single
directory tree, and treated as a group for purposes of version control, navigation, and
migration.
Setting up a standard working environment helps a development team work together.
The sandbox capability allows an application to be designed to be easily portable.
Managing sandbox contents is a project administration function.
Sandbox Parameters
Start the Ab Initio GDE
Go to Repository > Edit Sandbox

Environment Quick Overview
$AI_RUN: run directory
$AI_DML: record format files
$AI_XFR: transform files
$AI_MP: graphs
$AI_DB: database configuration files
$AI_SERIAL: serial source data and other serial data
$AI_MFS: Ab Initio multifile directory; in training, it also contains the partition
directories (covered later)
$AI_LOG: a location for log files, etc.

The goal is a development environment that allows a graph or set of graphs to be
migrated to any other environment with no changes at all.
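The idea behind these variables can be shown with Python's `string.Template`: the graph stores only a logical reference such as `$AI_DML/customer.dml`, and each sandbox supplies the physical directory. The subdirectory names and sandbox roots below are hypothetical, and placing every location under the sandbox root is a simplification; data locations such as `$AI_SERIAL` and `$AI_MFS` may instead point to the data mount points defined in stdenv.

```python
from string import Template

# Hypothetical subdirectory for each variable listed above.
SANDBOX_LAYOUT = {
    "AI_RUN": "run",
    "AI_DML": "dml",
    "AI_XFR": "xfr",
    "AI_MP": "mp",
    "AI_DB": "db",
    "AI_SERIAL": "serial",
    "AI_MFS": "mfs",
    "AI_LOG": "log",
}


def sandbox_environment(root: str) -> dict:
    """Map each $AI_* variable to a directory under the given sandbox root."""
    base = root.rstrip("/")
    return {name: f"{base}/{sub}" for name, sub in SANDBOX_LAYOUT.items()}


# The graph keeps only the logical reference, never a hardcoded path.
reference = Template("$AI_DML/customer.dml")

dev_path = reference.substitute(sandbox_environment("/home/dev/sand/billing"))
prod_path = reference.substitute(sandbox_environment("/apps/prod/billing"))
```

The same unchanged reference resolves correctly in both environments, which is the migration property the text describes.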






Sample Graph

1. Components
2. Datasets
3. Flows
Components
Components may run on any computer running the Co>Operating System.
The Ab Initio Component library contains a diverse built-in set of components.
The particular work a component accomplishes depends upon its parameter
settings.
Some components require a data transformation parameter, that is, a set of business
rules applied to one or more inputs to produce the required output.
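The relationship between a component and its transformation parameter can be sketched in Python. This is an analogy, not DML or the Ab Initio component library: records are dictionaries, and the functions `filter_by_expression` and `reformat` borrow the names of the Filter by Expression and Reformat components only to suggest their roles. The business rule `full_name_rule` is a hypothetical example; chaining the generators plays the part of the flows between components.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]  # a record maps field names to values


def filter_by_expression(records: Iterable[Record],
                         predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Pass through only the records for which the predicate is true."""
    for rec in records:
        if predicate(rec):
            yield rec


def reformat(records: Iterable[Record],
             transform: Callable[[Record], Record]) -> Iterator[Record]:
    """Apply a business-rule function to every input record."""
    for rec in records:
        yield transform(rec)


def full_name_rule(rec: Record) -> Record:
    # Hypothetical business rule: derive one output field from two input fields.
    return {"id": rec["id"], "name": f"{rec['first']} {rec['last']}"}


# Input dataset -> Filter by Expression -> Reformat -> output dataset.
source = [
    {"id": 1, "first": "Ada", "last": "Lovelace", "active": True},
    {"id": 2, "first": "Alan", "last": "Turing", "active": False},
]
output = list(reformat(filter_by_expression(source, lambda r: r["active"]),
                       full_name_rule))
```

Swapping in a different rule changes what the "component" does without changing the component itself, which is the point made above about parameter settings.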
Datasets
A dataset is a source or destination of data. It can be a simple file, a database table,
a SAS dataset, and so on.
Datasets may reside on any machine running the Co>Operating System.
Datasets may also reside on other machines if they are connected via FTP or database
middleware.
Data within a dataset must always be described exactly, using Ab Initio's Data
Manipulation Language (DML), to form record format metadata.
Viewing Component Properties

Viewing Port Properties

Dataset: Records and Fields
A dataset is made up of records, and a record consists of fields. The analogous database
terms are rows and columns.
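To make records and fields concrete, the sketch below reads pipe-delimited text using a record format expressed as a list of (field name, type) pairs. That list stands in for a DML record format and is not DML syntax; `CUSTOMER_FORMAT`, `read_records`, and the sample data are hypothetical.

```python
import csv
import io
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record format: each field has a name and a type converter.
CUSTOMER_FORMAT: List[Tuple[str, Callable[[str], Any]]] = [
    ("id", int),
    ("name", str),
    ("balance", float),
]


def read_records(text: str, fmt, delimiter: str = "|") -> List[Dict[str, Any]]:
    """Split text into records (one per line) and each record into typed fields."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) != len(fmt):
            raise ValueError(f"expected {len(fmt)} fields, got {len(row)}: {row}")
        records.append({name: convert(raw) for (name, convert), raw in zip(fmt, row)})
    return records


rows = read_records("1|Ada|10.5\n2|Alan|0\n", CUSTOMER_FORMAT)
```

Each line becomes one record (a row), and each delimited value becomes a typed field (a column); data that does not match the format is rejected rather than guessed at.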
