BT PROJECT TEAM
TCS, Kolkata
Co-operating System
On a typical installation, the Co>Operating System is installed on a Unix or Windows NT server, while the GDE is installed on a Pentium PC.
Figure: Co>Operating System architecture on each host machine. User programs and Ab Initio built-in component programs (partitions, transforms, etc.) run on top of the Co>Operating System, which in turn runs on the native operating system (Unix, Windows NT).
Data Transport
Checkpointing
Metadata-driven components
The GDE …
can talk to the Co-operating system using several protocols like Telnet,
release
Note: During deployment, GDE sets AB_COMPATIBILITY to the Co>Operating System version number. So, a change in the
A Graph is the logical modular unit of an application.
A Component is a program that does a specific type of job and can be controlled by its parameter settings.
A Component Organizer groups all components under different categories.
A Sample Graph …
Figure: a sample graph with its parts labeled: datasets (for example, Customers and Other), components, flows, expression metadata, ports, and layout.
Files
Formats
Components
Flows
Layouts
Setup Command
Ab Initio Host (AIH) file
Graph
End Script
Local to the Graph
DML
Ab Initio stores metadata in the form of record formats.
XFR
Data can be transformed with the help of transform functions.
Pushing the "Run" button generates a script.
The script is transmitted to the Host node.
The script is invoked, creating the Host process.
Figure sequence: the GDE, the Host process, and Agent processes at successive stages of graph execution. The host machine is drawn with CPU, bus, memory, and disk.
Sample data:
0212Sam Spade
0492Sue West
0221William Black
DML block, with the data type of each field:
record
    decimal(4) id;
    string(6) first_name;
    string(6) last_name;
end
DML Syntax
Data Types
String
Decimal
Integer
Sample data:
0322,17-01-00, 890.50Elvis,Jones
0492,25-12-02,1000.00Sue,West
Record format:
record
    decimal(",") id;
    date("DD-MM-YY")(",") join_date;
    decimal(7,2) salary_per_day;
    string(",") first_name;
    string("\n") last_name;
end
Here decimal(7,2) specifies precision and scale.
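As a rough illustration of how the record format above lines up with the sample data, the following Python sketch reads one line field by field. The function name parse_record and the returned dictionary are illustrative assumptions, not part of Ab Initio; in a real graph the DML record format itself drives the parsing.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_record(line: str) -> dict:
    """Parse one line laid out like the record format above (illustrative only)."""
    # decimal(",") id: digits up to the first comma
    id_text, rest = line.split(",", 1)
    # date("DD-MM-YY")(",") join_date: text up to the next comma
    date_text, rest = rest.split(",", 1)
    # decimal(7,2) salary_per_day: a fixed width of 7 characters
    salary_text, rest = rest[:7], rest[7:]
    # string(",") first_name: text up to the next comma
    first_name, rest = rest.split(",", 1)
    # string("\n") last_name: text up to the newline
    last_name = rest.split("\n", 1)[0]
    return {
        "id": int(id_text),
        "join_date": datetime.strptime(date_text, "%d-%m-%y").date(),
        "salary_per_day": Decimal(salary_text.strip()),
        "first_name": first_name,
        "last_name": last_name,
    }


first = parse_record("0322,17-01-00, 890.50Elvis,Jones\n")
second = parse_record("0492,25-12-02,1000.00Sue,West\n")
```

The delimited fields consume text up to their delimiter, while the salary field consumes exactly seven characters, which is why no comma separates the salary from the first name in the data.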
Input data:
0345,090297John,Smith;
Input record format:
record
    decimal(7) id;
    date("MMDDYY") join_date;
    string(",") first_name;
    string(";") last_name;
end
Operations shown in the Reformat: Drop, Reformat, Reorder, and id+1000000.
Output record format:
record
    decimal(7) id;
    string(8) last_name;
    date("DD-MM-YY")(",") join_date;
end
Output data:
1000345,Smith 1997-09-02
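The Reformat example above can be sketched in Python as a function from an input record to an output record. This is only an approximation for illustration: the dictionary representation and the name reformat are assumptions, and in Ab Initio the same logic would be written as a DML transform function.

```python
from datetime import datetime


def reformat(rec: dict) -> dict:
    """Mimic the slide's Reformat: drop first_name, reorder, add 1000000 to id."""
    return {
        "id": rec["id"] + 1000000,       # id+1000000
        "last_name": rec["last_name"],   # reordered ahead of join_date
        "join_date": rec["join_date"],   # first_name is not copied, so it is dropped
    }


# Input record from the slide: 0345,090297John,Smith;
source = {
    "id": 345,
    "join_date": datetime.strptime("090297", "%m%d%y").date(),  # MMDDYY
    "first_name": "John",
    "last_name": "Smith",
}
result = reformat(source)
```

Fields that the function does not mention simply do not appear in the output, which is how the drop is expressed.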
Ab Initio >
DAY 2
Filter by Expression
Reformat
Redefine Format
Sort
Join
Replicate
Dedup
Aggregate
Rollup
Scan
Figure: each record arriving on the input port is tested against expr (true? Yes / No).
REJECT: input records that caused an error
ERROR: associated error message
LOG: logging records
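The routing described above can be sketched in Python as follows. The output names selected and deselected are assumptions made for this sketch (the slide names only the reject, error, and log ports), and a Python exception stands in for a record that causes an error.

```python
def filter_by_expression(records, expr):
    """Route each record by the truth of expr(record); illustrative only."""
    selected, deselected, reject, error = [], [], [], []
    for rec in records:
        try:
            keep = bool(expr(rec))
        except Exception as exc:
            # The record that caused the error goes to reject,
            # and the associated message goes to error.
            reject.append(rec)
            error.append(str(exc))
            continue
        (selected if keep else deselected).append(rec)
    return selected, deselected, reject, error


rows = [{"amount": 50}, {"amount": 5}, {"amt": 1}]  # the last record lacks "amount"
selected, deselected, reject, error = filter_by_expression(
    rows, lambda r: r["amount"] > 10
)
```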
Count
Reject-Threshold
Abort
Never Abort
Limit
Ramp
Limit: number of errors to tolerate
Ramp: scale of errors to tolerate per input record
Example: Limit = 1, Ramp = 0.01. Abort if more than 1 in 100 records causes an error.
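One way to read the limit and ramp settings is as a running tolerance. The sketch below assumes the commonly described rule that a component aborts once the number of rejects exceeds limit + ramp * (records processed so far). The slide gives only the example, so treat this formula and the function name should_abort as assumptions.

```python
def should_abort(rejects: int, processed: int, limit: int, ramp: float) -> bool:
    """Return True when rejects exceed the assumed tolerance limit + ramp * processed."""
    return rejects > limit + ramp * processed


# With limit = 1 and ramp = 0.01, the tolerance after 100 records is about 2 rejects.
tolerated = should_abort(rejects=1, processed=100, limit=1, ramp=0.01)
exceeded = should_abort(rejects=3, processed=100, limit=1, ramp=0.01)
# limit = 0 and ramp = 0 behaves like aborting on the first reject.
first_reject = should_abort(rejects=1, processed=10, limit=0, ramp=0.0)
```

Under this reading, limit is an absolute allowance and ramp lets the allowance grow with the volume of input.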
Keys
A key identifies a field or set of fields to organize a dataset
Single Field: employee_number
Multiple field or Composite key: (last_name; first_name)
Modifiers: employee_number descending
Sort Component
Reads records from the input port, sorts them by key, and writes the result to the output port
Parameters
Key
Max-core
PORTS               PARAMETERS
in                  count
out                 key
unused              override key
reject (optional)   transform
error (optional)    limit
log (optional)      ramp
Join Types
Inner
Outer
Explicit
Join Methods
Merge Join
Using sorted inputs
Hash Join
Using in-memory hash tables to group input
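The two join methods can be contrasted with a small Python sketch of an inner join. The function names, the dictionary records, and the sample data are assumptions for illustration only. The merge version relies on both inputs already being sorted on the key, while the hash version builds an in-memory table from one input.

```python
def hash_join(left, right, key):
    """Inner join using an in-memory hash table built from the right input."""
    table = {}
    for rec in right:
        table.setdefault(rec[key], []).append(rec)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def merge_join(left, right, key):
    """Inner join of two inputs that are already sorted on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right records sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out


customers = [{"id": 1, "name": "Sam"}, {"id": 2, "name": "Sue"}, {"id": 4, "name": "Al"}]
orders = [{"id": 1, "amt": 10}, {"id": 2, "amt": 5}, {"id": 2, "amt": 7}, {"id": 3, "amt": 1}]
by_hash = hash_join(customers, orders, "id")
by_merge = merge_join(customers, orders, "id")
```

The merge version never holds more than one key group at a time, which is why it needs sorted inputs; the hash version trades memory for not needing a sort.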
transform function.
An example
A join component may have a transform function with
prioritized rules as
out.ssn :1: in1.ssn;
rules
Aggregate/Rollup/Scan
Generates summary records for groups of input records
Name: Description
Normalize: Generates multiple data records from each input data record. Separates a data record with a vector field into several individual records, each containing one element of the vector.
Denormalize Sorted: Consolidates groups of related data records into a single output record with a vector field for each group. Requires grouped input.
Validate Records: Separates valid data records from invalid data records.
Check Order: Tests whether data records are sorted according to a key specifier.
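To make the Normalize and Denormalize Sorted descriptions concrete, here is a minimal Python sketch. The function names, field names, and dictionary records are assumptions for illustration; the real components are configured through DML transforms rather than written this way.

```python
from itertools import groupby


def normalize(rec, vector_field, element_field):
    """Emit one record per element of the vector field."""
    base = {k: v for k, v in rec.items() if k != vector_field}
    return [{**base, element_field: item} for item in rec[vector_field]]


def denormalize_sorted(records, key, element_field, vector_field):
    """Collapse each run of records sharing a key into one record with a vector.

    The input must already be grouped on key, as groupby only merges adjacent records.
    """
    out = []
    for key_value, group in groupby(records, key=lambda r: r[key]):
        out.append({key: key_value, vector_field: [r[element_field] for r in group]})
    return out


customer = {"cust_id": "C1", "purchase_amt": [10, 20, 30]}
rows = normalize(customer, "purchase_amt", "amt")
rebuilt = denormalize_sorted(rows, "cust_id", "amt", "purchase_amt")
```

In this sketch the two operations are inverses of each other, which is a convenient way to check an understanding of both.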
Function categories
Date functions : now(), today(), date_to_int(), ..
A Vector is
number of repeats is a constant integer or the value of another field in the record.
record
    string(20) cust_id;
    decimal(3) num_purchases;
    decimal(8.2)[num_purchases] purchase_amt;
end;
Initializing a Vector
let decimal(',')[100] values = make_constant_vector(100, 0);
Input Table
Output Table
Update Table
DB table
Truncate Table
Run SQL
Ab Initio >
DAY 3
a process.
Forms of Parallelism
Component Parallelism
Inherent in Ab Initio
Pipeline Parallelism
Data Parallelism
Sorting Transactions
Processing Record 99
Global View
Multifile
Multifiles
A global view of a set of ordinary files called partitions usually
Ab Initio provides shell level utilities called “m_ commands” for
part:
mfile://pluto.us.com/usr/ed/mfs1/new.dat
Multidirectory: mfile://host1/u/jo/mfs/mydir
Control directory: //host1/u1/jo/mfs <.mdir>
Partition directories: //host1/vol4/pA/mydir, //host2/vol3/pB/mydir, //host3/vol7/pC/mydir
Multifile: mfile://host1/u/jo/mfs/mydir/myfile.dat
Control file: //host1/u1/jo/mfs/mydir/myfile.dat
Partition files: //host1/vol4/pA/mydir/myfile.dat, //host2/vol3/pB/mydir/myfile.dat, //host3/vol7/pC/mydir/myfile.dat
A multidirectory
A multifile
Control file
Partitions (Serial Files)
Broadcast
Partition by Key
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Key
Records with the same key value go into the same partition
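The property stated above can be sketched in Python by hashing the key and taking the result modulo the number of partitions. The choice of zlib.crc32 as the hash and the name partition_by_key are assumptions for this sketch; the slide does not say which hash function the component uses.

```python
import zlib


def partition_by_key(records, key, n_partitions):
    """Send each record to the partition chosen by a hash of its key value."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        # A deterministic hash, so equal keys always map to the same partition.
        index = zlib.crc32(str(rec[key]).encode("utf-8")) % n_partitions
        partitions[index].append(rec)
    return partitions


records = [{"k": k} for k in "DABDFEACG"]
parts = partition_by_key(records, "k", 3)
```

Nothing in this scheme guarantees evenly sized partitions: if a few key values dominate the input, the partitions that receive them will be larger.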
Figure: records keyed A through G, distributed so that every record with a given key lands in the same partition.
Partition by Range
Partitions records according to the ranges of key values specified for each partition.
Range boundaries come from splitters or the split port.
Partition by Range followed by Sort gives a global ordering.
Partition by Percentage
Distributes a specified percentage of the total number of input data records to each output flow.
Percentages can be supplied on the pct port.
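The claim that range partitioning plus a sort yields a global ordering can be checked with a short Python sketch. The splitter values, the function name partition_by_range, and the convention that a key equal to a splitter goes to the lower partition are all assumptions made for illustration.

```python
from bisect import bisect_left


def partition_by_range(records, key, splitters):
    """Place each record in the partition whose key range contains its key.

    With sorted splitters [s0, s1, ...], partition 0 holds keys <= s0,
    partition 1 holds keys in (s0, s1], and the last partition holds the rest.
    """
    partitions = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        partitions[bisect_left(splitters, rec[key])].append(rec)
    return partitions


records = [{"k": k} for k in "DABDFEACG"]
parts = partition_by_range(records, "k", ["C", "E"])
# Sort each partition independently, then read the partitions in order.
globally_ordered = [r["k"] for p in parts for r in sorted(p, key=lambda r: r["k"])]
```

Because every key in one partition is no larger than every key in the next, sorting inside each partition is enough; no cross-partition merge is needed.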
Partition by Expression
partitions according to a specified hash function or DML expression
Broadcast
Acts like a partitioning component when the layout changes
Figure: the same records distributed across partitions p0 to p3 in two ways: partitions evenly balanced, and partitions skewed.
Gather
Concatenate
Global View
Merge
Interleave
Expanded View
Gather
Reads data records from the flows connected to the input port
Concatenate
another
Merge
Combines data records from multiple flow partitions that have been sorted on a key
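The behavior described for Merge corresponds to a k-way merge of sorted inputs. A minimal Python sketch, using the standard library's heapq.merge and assuming dictionary records and the illustrative name merge_partitions, looks like this:

```python
import heapq


def merge_partitions(partitions, key):
    """Combine partitions that are each sorted on key into one sorted stream."""
    return list(heapq.merge(*partitions, key=lambda r: r[key]))


p0 = [{"id": 1}, {"id": 4}]
p1 = [{"id": 2}, {"id": 3}]
p2 = [{"id": 5}]
merged = merge_partitions([p0, p1, p2], "id")
```

The result stays sorted only because each input partition was sorted to begin with; unsorted inputs would simply be interleaved.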
Ab Initio >
DAY 4
Serial or Multifiles
on disks
associates key values with corresponding data values to index records and
retrieve them
Lookup parameters
Key
Record Format
Storage Methods
Lookup Functions
Name: lookup()
Arguments: file label and expression
Purpose: returns a data record from a lookup file that matches the values of the expression argument
NOTE: Data needs to be partitioned on same key before using lookup local functions
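As a loose analogy for a lookup file, the Python sketch below indexes records by key and retrieves one by value. The names build_lookup and lookup, the choice to keep the first record per key, and returning None when nothing matches are assumptions for this sketch, not a description of the real lookup() function.

```python
def build_lookup(records, key):
    """Index records by key so they can be retrieved by value."""
    index = {}
    for rec in records:
        index.setdefault(rec[key], rec)  # assumption: keep the first record per key
    return index


def lookup(index, value):
    """Return the record whose key matches value, or None when there is no match."""
    return index.get(value)


employees = [
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},
    {"id": 492, "first_name": "Sue", "last_name": "West"},
    {"id": 221, "first_name": "William", "last_name": "Black"},
]
index = build_lookup(employees, "id")
found = lookup(index, 212)
missing = lookup(index, 999)
```

If the index were built separately in each partition, a probe could only find keys held in that partition, which is the reason for the note about partitioning data on the same key before using the local lookup functions.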
next_in_sequence()
this_partition()
number_of_partitions()
last_generated_key
when the graph stops progressing because of mutual dependency of data among
components
Identified when record count in does NOT change over a period of time
Phasing
Checkpointing
Flow Buffering
Blocking on read
Blocking on write
Ab Initio >
Performance
Source/Target files
Temporary files
Phases
Checkpoints
Buffered Flows
datasets
phases
checkpoints
"spilling" to disk
sort
Overdriving machine
Symptoms
Increased processes
Memory: Consumers
Lookup Tables
In-memory components
Join
Memory: max_core
Exceeding max-core
Issues
separately
If multiple graphs require same database table, unload to a file and replicate.
Performance Enhancements:
Memory usage:
Performance Enhancements
DO NOT put phases after Replicates, across all-to-all flows, after temp
Purpose:
At job start, output datasets are copied to temporary files (in .WORK dirs)
temporary files
Same as phase
m_rollback -d graphname.rec
WARNING: rolling back old .rec files will restore the output files to old state
Vertex: component
Records: # processed
Vertex: component
Port: of component
Avoid Sorts
Use Lookups
Phasing
THANK YOU