Introduction to Datastage
History of Datastage
Architecture of Datastage
Components of Datastage
4) Go to the Projects tab and click the Add button.
6) Provide the name of the project you want to create.
Enter the DataStage user name and password, select the project you want to work in, and click OK.
What is a Job?
Types of Jobs
Creating a Datastage Job:
Business Requirement:
Source: Gosales (MSSQL), table Order Method — columns: OrderMethodCD, OrderMethodNm
Source: Gosalesdw (Oracle), table ORDER_METHOD_DIM — columns: ORDER_METHOD_ID, ORDER_METHOD_DESC
Target table (DIM_ORDER_METHOD) — columns: ORDER_METHOD_KEY (surrogate key), ORDER_METHOD_CODE, ORDER_METHOD_DESC
By using a Transformer stage in the job.
To use RCP (Runtime Column Propagation) in jobs, we need to enable the RCP flag on the source stage and the Transformer stage.
Job flow: EMP -> Transformer -> Emp_dim
EMP: R2: Wants to populate Gender Name in the target table based on Gender Code.
BR: EMP_DIM: R3: Wants to store Year of Hire and Month of Hire based on the Hire Date available in the source.
EMP_DIM: R5: Wants to populate Manager_Code1, Manager_Name1, Manager_Code2, Manager_Code3, Manager_Code4, Manager_Code5 into the Employee DIM table.
Job flow: Data Set -> Join (emp_cd = emp_cd) with EMP_HIST -> Join (mgr1_cd = emp_cd) with EMP_HIST
Parameters:
Parameters can be divided into two types: 1) Global parameters (project level). 2) Local parameters (job level).
Global parameters are created in the DataStage Administrator.
Local parameters are created in the DataStage Designer.
Global Parameters:
If you want to use these global parameters in a job, you need to include them in the Job Parameters.
Source table (Order Method):

Order Method NM   CD
Fax               601
Web               602
Email             603
Telephone         604
                  605

Target table (DIM_ORDER_METHOD):

KEY   NM          CURR_INDICATOR
1     Fax         Y
2     Web         Y
3     E-Mail      N
3     Email       Y
4     Telephone   Y
Configure the Lookup table to connect to the Order Method Dim table on the target DB:
Configure the target DB as mentioned below to connect to the target table (Order Method Dim) for inserts.
Configure the target DB to connect to the target table (Order Method Dim) for updates.
4) Execute "LIST UV.ACCOUNT <project name>" and if you see the project name,
type: "DELETE UV.ACCOUNT <project name>"
5) Execute "LIST UV_SCHEMA" to see the list of project names; if you see the
project name, type: "VERIFY.SQL SCHEMA <project_name> FIX"
6) Check that you can no longer see the project by typing "LIST UV_SCHEMA"
7) If you still see the project, then enter "DROP SCHEMA <project_name> CASCADE;"
$ cd /opt/dsSoftware/Ascential/RTIServer/bin/
3. Start or stop the server using the nohup command
For starting
$ nohup ./RTIServer.sh start &
For stopping
$ nohup ./RTIServer.sh stop &
4. Check whether the RTI server has been restarted. Execute the below command
$ ps -ef| grep RTI
Sample output for the above command:
dsadm  4977  4946  0 16:26:09 pts/7  0:00 grep RTI
dsadm 20018     1  0   Feb 27  ?  1061:21 /opt/dsSoftware/Ascential/RTIServer/apps/jre/bin/java -Xmx256m -server -Dprogra
Or open the IE browser and enter the url in the address bar
http://<Server Name >:8080/rti/
Eg : http://stvsauxpac01.corpnet2.com:8080/rti/
http://kopsapace02.corpnet2.com:8080/rti/
This document is about adding an entry in the .odbc.ini file to allow for DB2 connectivity.
The sample entry for DB2 connectivity is given below:
EnableStaticCursorsForLongData=0
ApplicationUsingThreads=1
LogonID=SCDWUSER
Password=
Package=PMARPCK
PackageOwner=SCDWUSER
TcpPort=446
WithHold=1
The entry within [ ] is the name of the entry (PMAR_JDE_446_ODBC in this case)
The driver is the location of ODBC driver for DB2. An ODBC driver is needed to allow connectivity from
Datastage to any Database.
AddStringToCreateTable is the string that should be added while issuing create table commands
Collection is the name of the Library that has tables to which the user has access. (I believe that no
matter which library you use here, you can still access the ones your DB user has privileges on.)
Location is the name of the Relational Database (RDB) on the AS/400 server
LogonID is the user Logon with which the user logs on to RDB on AS/400
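Assembling the keys discussed above, a complete .odbc.ini entry might look like the sketch below. The entry name, LogonID, Package, PackageOwner, TcpPort, and the two threading keys are the document's own examples; the Driver path is an assumption (a DataDirect DB2 driver under /branded_odbc/lib), and the angle-bracketed values are placeholders to be filled in from your environment.

```
[PMAR_JDE_446_ODBC]
Driver=/branded_odbc/lib/ivdb220.so
IpAddress=<AS/400 host or IP>
TcpPort=446
Location=<RDB name from WRKRDBDIRE>
Collection=<library name>
AddStringToCreateTable=
LogonID=SCDWUSER
Password=
Package=PMARPCK
PackageOwner=SCDWUSER
WithHold=1
EnableStaticCursorsForLongData=0
ApplicationUsingThreads=1
```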
Run the "WRKRDBDIRE" command on the AS/400 and use the entry that is marked *LOCAL.
To determine the correct port number, execute 'NETSTAT' from an AS/400 command line. Choose
option 3 to display a list of active ports on the AS/400. Find the entry for DRDA and press F14 to toggle
the display to show the port number.
Once the changes are made to .ODBC.INI file, the next step is to Bind the package. This is essential
before checking the DSN Connectivity.
$ ./bind20 PMAR_JDE_8471_ODBC
Bindings may not always succeed; the most probable reasons are
1. Wrong Port: this will cause the Bind operation to hang forever. Press Ctrl+C to break out of the
operation and re-edit the entries in .odbc.ini, specifying the correct port number using the
command listed above.
2. Incorrect UserId/Password: Attempting to bind the package with incorrect user credentials or
credentials having insufficient privileges will cause the Bind operation to fail with the error 7680.
Discuss with DB2 or AS/400 admin to get the correct user credentials and/or privileges.
3. Incorrect Port / IP Address: Trying to bind a package with incorrect IP/Port will cause the
binding to fail with the error 7505. Follow the commands listed above to identify the Port
number and discuss with DB2 or AS/400 admin to get the correct IP Address.
4. Incorrect Location/Collection: Package bind will fail with the error 1242 if the Location or
collection mentioned in the .odbc.ini file is incorrect. Use the procedure mentioned above to
find out the correct Location and contact DB2 or AS/400 admin to get the Collection name to
which you have access.
5. Network Error: Package creation and binding may sometimes fail with the error 7500 which
would indicate a network failure.
The cause for other error messages during the package binding can be found by looking at the file
ivdb220.po in the directory /branded_odbc/locale/en_US/LC_MESSAGES
Once the Bind is successful, the next step is to test the ODBC connectivity. This can be done as follows:
1. If you haven't previously done so, cd to $DSHOME and set up the
DataStage environment by running dsenv:
. ./dsenv
2. Start the DSEngine shell:
./bin/dssh
3. Log to the project:
LOGTO PMARDev
The project name is case sensitive.
4. Get a list of available DSNs by typing:
DS_CONNECT
5. Test the required connection by typing:
DS_CONNECT PMAR_JDE_8471_ODBC
6. Once the test is successful, exit by typing .Q
Once the DSN connectivity is tested from the Unix box, the next step is to import tables from Datastage
using ODBC and start using the same in Datastage Jobs.
2. Check that there are no client connections or phantom jobs running in the background
There should not be any processes as a result of the above commands. If there are any
phantom processes or client connections, they need to be killed using the process below.
Request the client (szs42740, for example) to close their client connections, or log
onto the Unix box and kill the process.
If the clients (szs42740 for example) are not traceable and/or there is a pressing need to
restart the Datastage service, issue the following commands
$ super mdc-kill-phantom
$ super mdc-kill-dsapi_slave
The first command kills all the phantom processes and the second command kills all the
dsapi_slave connections.
$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin
$ . ../dsenv
$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine
8. Change the below mentioned values in the uvconfig file using the vi editor
RLTABSZ 100
GLTABSZ 100
MAXRLOCK 99
12. Change directory to DSEngine. To confirm the changes have taken effect, issue
the commands below
$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/
$ bin/uvregen -t
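The step-8 edit can also be sketched non-interactively with sed. The sample file and its old values below are made up for illustration; on a real engine the file is uvconfig in the DSEngine directory, and uvregen must still be run afterwards as shown above.

```shell
# Create an illustrative stand-in for the three tunables
# (the real file is $DSHOME/uvconfig -- back it up first).
printf 'RLTABSZ 75\nGLTABSZ 75\nMAXRLOCK 74\n' > uvconfig.sample

# Raise the lock-table sizes as described in step 8.
sed -e 's/^RLTABSZ .*/RLTABSZ 100/' \
    -e 's/^GLTABSZ .*/GLTABSZ 100/' \
    -e 's/^MAXRLOCK .*/MAXRLOCK 99/' uvconfig.sample > uvconfig.new

cat uvconfig.new
# RLTABSZ 100
# GLTABSZ 100
# MAXRLOCK 99
```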
Step 6: Click OK
Step 7: Select the created project and click Properties
Step 8: Check the options in the Project Properties as displayed in the image below
Users who are not part of the primary group (e.g. dstage) or the secondary group (e.g. ds_scdw) entered in
the .developer.adm file cannot access the project.
$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin
$ . ../dsenv
$ ./uv
> LOGTO UV
> DEADLOCK.MENU
>Q
The deadlock daemon, on waking, found defunct processes and initiated a cleanup.
Someone (whose pid is given) started the deadlock daemon, maybe from
DEADLOCK.MENU, maybe from the command line. If pid = 1, this was the auto-start
on reboot.
LDAP Configuration
Initial Setting in the WAS for Global security
Please follow the steps below to configure LDAP for IBM Information Server.
Prerequisites: create a VSED user ID and password that has full administrative rights,
and get the Type, Host, Port, and Base Distinguished Name.
Step 3: Log in as root to the Datastage server. Stop the IBM Information Server.
cd /etc/rc2.d
# ./AppServerAdmin.sh -was -user yqz99739 -password mask31july
Info WAS instance /Node:stvus059Node01/Server:server1/ updated with new user information
Info MetadataServer daemon script updated with new user information
# ^C
# ./DirectoryAdmin.sh -delete_groups
# ./DirectoryAdmin.sh -delete_users
^C
# ./DirectoryAdmin.sh -delete_users
Eg: cd /local/apps/dsSoftware/715A/Ascential/DataStage/DSEngine
$ cd bin
$ . ../dsenv
$ ./dspackinst
Screenshot 1:
Screenshot 2:
Screenshot 3:
Screenshot 4:
Screenshot 5:
Step 7: The package installer will search for the projects on the server; select the project in which you want the plug-in to be
registered (Screenshot 6)
Screenshot 6:
Screenshot 7:
The package installer will show the installation details which you have given in the previous steps (Screenshot 8)
Screenshot 8:
Screenshot 9:
Note: Proper care has to be taken when doing an FTP of the plug-in source to the Datastage server.
Releasing Resource Locks
In DataStage Director, pull down Job -> Cleanup Resources. Choosing this option will open the Job Resources
interface.
Locate the PID in the Processes pane and select the row.
Release the lock by clicking the Logout button. This will kill the process holding the lock, thus
releasing it.
$ cd /opt/dsSoftware/Ascential/DataStage/DSEngine/bin
$ . ../dsenv
$ ./uv
$ LIST.READU EVERY
Device.... Inode.... Netnode Userno Lmode G-Address. Locks ...RD ...SH ...EX
69847306   11234     0       36986  5 IN  1000
69847306   7126      0       40649  8 IN  B800
69847306   16839     0       58192  10 IN B000
69847306   16839     0       40669  19 IN 7000
69847306   20851     0       40649  19 IN 1000
(further file-lock entries follow; the lock of interest here is on inode 6997, file aInsertLots)
To release the resource identified in the messages highlighted above, issue the
commands below
$ LOGTO UV
$ UNLOCK INODE 6997 USER 65053 ALL
Please back up and save the original project in case anything goes wrong.
Prerequisites
After logging onto the Unix box using our login credentials, we switch the user to dsadm
This can be done by issuing the command
su dsadm
Before attempting to stop Datastage service, ensure that there are no client connections or
phantom jobs running in the background.
There should not be any processes as a result of the above commands. If there are any phantom
processes or client connections, they need to be killed using the process below.
a. Find out the user of the process. This can be found by looking at the process entry.
Eg: szs42740 7854 7846 0 11:29:25 ?  0:07 dsapi_slave 9 8 0
A sample entry, as shown above, indicates that the user szs42740 has a client connection
(dsapi_slave).
In such a case, request the client (szs42740 in the example) to close their client connections, or
log onto the Unix box and kill the process.
If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue
the following commands
super mdc-kill-phantom
super mdc-kill-dsapi_slave
The first command kills all the phantom processes and the second command kills all the dsapi_slave
connections.
Attempt to stop the service after performing prerequisite activities detailed above.
The Datastage Service can be stopped by issuing the commands
cd $DSHOME
. ./dsenv
bin/uvsh
cd bin
This shuts down the server engine and frees any resources held by the server engine process.
Wait for at least 30 seconds after stopping the Datastage service before you attempt to restart the
Datastage service.
This command starts the dsrpcd daemon, which is the daemon for the server engine.
Check whether the Datastage service is running by issuing the following command
The above command may have multiple line output, but if the service is running then there should
be a row with a LISTEN.
*.31538    *.*    0    49152    0    LISTEN
Symptom
The Datastage service was stopped and restarted, but attempting to connect to the host from
the Datastage client (e.g. Designer) results in an error such as
Cause
The service was restarted without ensuring that client connections were closed. This causes the port to
be unavailable for any connections.
Remedy
Restart the service once again by issuing the commands under the sections Stop
Datastage Service and Start Datastage Service. Stopping and starting the service again is known to
resolve this issue.
Restarting a Datastage service may be necessary under various circumstances. The most common
reason for a restart of the service is a change made to the environment file dsenv.
Prerequisites
3. Disable the SiteScope Monitor for the server that you are going to re-start (e.g breus002)
http://stvsawnv0539:8888/SiteScope/accounts/
4. Login as dsadm user and switch to super root user.
After logging onto the Unix box using our login credentials, we switch to the super-user shell.
This can be done by issuing the command
super root-shell
Before attempting to stop Datastage service, ensure that there are no client connections or
phantom jobs running in the background.
There should not be any processes as a result of the above commands. If there are any phantom
processes or client connections, they need to be killed using the process below.
b. Find out the user of the process. This can be found by looking at the process entry.
Eg: szs42740 7854 7846 0 11:29:25 ?  0:07 dsapi_slave 9 8 0
A sample entry, as shown above, indicates that the user szs42740 has a client connection
(dsapi_slave).
In such a case, request the client (szs42740 in the example) to close their client connections, or
log onto the Unix box and kill the process.
If the clients are not traceable and/or there is a pressing need to restart the Datastage service, issue
the following commands
super mdc-kill-phantom
super mdc-kill-dsapi_slave
The first command kills all the phantom processes and the second command kills all the dsapi_slave
connections.
Attempt to stop the service after performing prerequisite activities detailed above.
cd /etc/rc2.d
# ./S99ds.rc 'stop'
Stopping JobMonApp
JobMonApp has been shut down.
DataStage Engine 8.1.0.0 instance "ade" has been brought down.
# ./S99ISFAgents 'stop'
Agent stopped.
LoggingAgent stopped.
# ./S99ISFServer 'stop'
ADMU0116I: Tool information is being logged in file
/local/apps/DRS_dstage/IS81/IBM/AppServer/profiles/default/logs/server1/stopServer.log
ADMU0128I: Starting tool with the default profile
ADMU3100I: Reading configuration for server: server1
ADMU3201I: Server stop request issued. Waiting for stop status.
ADMU4000I: Server server1 stop completed.
# ps -efd|grep java
root 6042 5544 0 21:39:52 pts/3
# ps -efd|grep ds
dsadm 3201 3200 0 20:27:24 ?
0:00 /opt/openssh/libexec/sftp-server
0:00 /opt/openssh/sbin/sshd -R
0:00 -ksh
0:00 -ksh
0:00 /opt/openssh/sbin/sshd -R
0:00 tail -f startServer.log
0:00 grep ds
0:00 /opt/openssh/sbin/sshd -R
0:00 /opt/openssh/sbin/sshd -R
0:00 -ksh
0:00 /opt/openssh/sbin/sshd -R
0:00 -ksh
# kill 26245
# ps -efd|grep ds
dsadm 3201 3200 0 20:27:24 ?
0:00 /opt/openssh/libexec/sftp-server
0:00 /opt/openssh/sbin/sshd -R
0:00 -ksh
0:00 -ksh
0:00 /opt/openssh/sbin/sshd -R
0:00 grep ds
0:00 tail -f startServer.log
0:00 /opt/openssh/sbin/sshd -R
0:00 /opt/openssh/sbin/sshd -R
0:00 -ksh
0:00 /opt/openssh/sbin/sshd -R
0:00 ksh
This shuts down the server engine and frees any resources held by the server engine process.
# ./S99ISFServer 'start'
# ./S99ISFAgents 'start'
LoggingAgent.pid: No such file or directory
Starting LoggingAgent...
LoggingAgent started.
Agent.pid: No such file or directory
Starting Agent...
Agent started.
# ./S99ds.rc 'start'
# ./S99dsrfcd.rc 'start'
Check whether the Datastage service is running by issuing the following command
The above command may have multiple line output, but if the service is running then there should
be a row with a LISTEN.
*.31538    *.*    0    49152    0    LISTEN
Symptom
The Datastage service was stopped and restarted, but attempting to connect to the host from
the Datastage client (e.g. Designer) results in an error such as
Cause
The service was restarted without ensuring that client connections were closed. This causes the port to
be unavailable for any connections.
Remedy
Restart the service once again by issuing the commands under the sections Stop
Datastage Service and Start Datastage Service. Stopping and starting the service again is known to
resolve this issue.
$ cd /opt/dsSoftware/Ascential/RTIAgent/bin/
For starting
$ nohup ./RTIAgent.sh start &
For stopping
$ nohup ./RTIAgent.sh stop &
4. Check whether the RTI agent has been restarted. Execute the below command
ps -efd|grep RTIAgent
dsadm 26190 26178 0 14:02:32 pts/3 0:00 grep RTIAgent
dsadm 26164 1 0 14:02:00 pts/2 0:01 /opt/dsSoftware/Ascential/RTIAgent/jre/bin/java -Djava.library.path=/opt/dsSoft
Aggregator Stage :
The Aggregator stage classifies data rows from a single input link into groups and calculates totals or other
aggregate functions for each group. The summed totals for each group are output from the stage through the
output link. A group is a set of records with the same value for one or more columns.
Example : transaction records might be grouped by both day of the week and by month. These
groupings might show that the busiest day of the week varies by season.
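The grouping the Aggregator performs can be sketched outside DataStage with awk; the column layout and data values below are made up for illustration.

```shell
# Group rows by the first field (day of week) and total the second field
# (amount), the way an Aggregator stage with one grouping key and a Sum
# aggregate function would.
printf 'Mon,10\nTue,5\nMon,7\nWed,3\nTue,2\n' |
awk -F, '{ sum[$1] += $2 } END { for (d in sum) print d "," sum[d] }' | sort
# Mon,17
# Tue,7
# Wed,3
```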
Filter Stage :
The Filter stage transfers, unmodified, the records of the input data set which satisfy the specified
requirements and filters out all other records.
The Filter stage can have a single input link, any number of output links and, optionally, a single reject
link. You can specify different requirements to route rows down different output links. The filtered-out
records can be routed to a reject link, if required.
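The output/reject split described above can be sketched with awk; the rows and the "ok" condition are made up for illustration.

```shell
# Filter-stage analogue: rows that satisfy the requirement go to the
# output (stdout here), everything else is routed to a reject file.
printf '1,ok\n2,bad\n3,ok\n' |
awk -F, '$2 == "ok" { print; next } { print > "rejected.txt" }'
# 1,ok
# 3,ok
```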
Funnel Stage :
Funnel Stage copies multiple input data sets to a single output data set. This operation is useful for
combining separate data sets into a single large data set. The stage can have any number of input links
and a single output link.
Join Stage :
Definition : Join Stage performs join operations on two or more data sets input to the stage and then
outputs the resulting data set.
The input data sets are notionally identified as the "right" set and the "left" set, and "intermediate" sets.
It has any number of input links and a single output link.
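The behaviour above can be sketched with the Unix join command; note that, like the parallel Join stage, it expects both inputs sorted on the key. The files and values are made up for illustration.

```shell
# Join-stage analogue: inner join of a "left" and a "right" data set on
# the first column. Both inputs are already sorted on the key.
printf '601,Fax\n602,Web\n'  > left.txt
printf '601,5\n603,9\n'      > right.txt
join -t, left.txt right.txt
# 601,Fax,5
```

Key 603 appears only on the right and key 602 only on the left, so neither survives the inner join; outer variants would correspond to the stage's left/right/full join types.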
Lookup Stage :
The Lookup stage is used to perform lookup operations on a data set read into memory from any
other parallel job stage that can output data.
It can also perform lookups directly in a DB2 or Oracle database or in a lookup table contained
in a Lookup File Set stage.
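The in-memory lookup described above can be sketched with awk: the reference file is loaded into an array first, then each source row is enriched by key. The files and values are made up for illustration.

```shell
# Lookup-stage analogue: lkp.txt plays the role of the in-memory
# reference; unmatched keys fall through to a default value.
printf '601,Fax\n602,Web\n' > lkp.txt
printf 'o1,601\no2,602\no3,699\n' |
awk -F, 'NR==FNR { ref[$1] = $2; next }
         { print $0 "," (($2 in ref) ? ref[$2] : "UNKNOWN") }' lkp.txt -
# o1,601,Fax
# o2,602,Web
# o3,699,UNKNOWN
```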
Merge Stage :
The Merge stage combines a sorted master data set with one or more update data sets. The columns
from the records in the master and update data sets are merged so that the output record contains all
the columns from the master record plus any additional columns from each update record.
A master record and an update record are merged only if both of them have the same values for
the merge key column(s) that you specify. Merge key columns are one or more columns that exist in
both the master and update records.
The data sets input to the Merge stage must be key partitioned and sorted. This ensures that
rows with the same key column values are located in the same partition and will be processed by the
same node.
Modify Stage
The Modify stage alters the record schema of its input data set. The modified data set is then output. It
is a processing stage. It can have a single input and single output.
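A schema-only change of the kind the Modify stage performs can be sketched with cut; the column names and values are made up for illustration.

```shell
# Modify-stage analogue: alter the record schema without otherwise
# processing the rows -- here the name column is dropped, keeping id
# and age.
printf 'id,name,age\n1,ann,30\n' | cut -d, -f1,3
# id,age
# 1,30
```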
Pivot Stage :
The Pivot stage converts columns into rows.
Eg., Mark-1 and Mark-2 are two columns.
Task : convert all the columns into one column.
Implication : can be used to convert SCD Type-3 to Type-2.
Methodology : in the derivation field of the output column, list the input columns to be combined into one
column.
Eg., column name "Marks".
Derivation : Mark-1, Mark-2.
Note : column "Marks" is derived from the input columns Mark-1 and Mark-2.
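The Mark-1/Mark-2 example above can be sketched with awk: each input row yields one output row per pivoted column. The key column and mark values are made up for illustration.

```shell
# Pivot-stage analogue: the Mark-1 and Mark-2 columns of each input row
# become two output rows with a single Marks column.
printf 's1,80,90\ns2,70,60\n' |
awk -F, '{ print $1 "," $2; print $1 "," $3 }'
# s1,80
# s1,90
# s2,70
# s2,60
```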
Switch Stage
The switch stage takes a single data set as input and assigns each input row to an output data set based
on the value of a selector field.
It can have a single input link, up to 128 output links and a single rejects link. This stage performs an
operation similar to a C switch statement. Rows that satisfy none of the cases are output on the rejects
link.
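The routing above maps naturally onto a shell case statement (itself the analogue of the C switch the text mentions); the selector field and region values are made up for illustration.

```shell
# Switch-stage analogue: each row is assigned to an output file by the
# value of its selector field; unmatched rows go to a rejects file.
while IFS=, read -r id region; do
  case "$region" in
    EU) echo "$id,$region" >> out_eu.txt ;;
    US) echo "$id,$region" >> out_us.txt ;;
    *)  echo "$id,$region" >> rejects.txt ;;
  esac
done <<'EOF'
1,EU
2,US
3,APAC
EOF
cat rejects.txt
# 3,APAC
```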
Compress Stage :
The Compress stage uses the UNIX compress or GZIP utility to compress a data set. It converts the data set
from a sequence of records into a stream of raw binary data.
A compressed data set cannot be processed by many stages until it is expanded, i.e., until its rows are
returned to their normal format. Stages that do not perform column based processing or reorder the
rows can operate on compressed data sets. For example, you can use the copy stage to create a copy of
the compressed data set.
Expand Stage
The Expand stage uses the UNIX compress or GZIP utility to expand the data set. It converts a data set
from a stream of raw binary data into a sequence of records.
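The Compress/Expand pairing can be sketched with gzip: once compressed, the data is an opaque byte stream that a simple copy can still handle, exactly as the Copy-stage example above suggests. File names and contents are made up for illustration.

```shell
# Compress-stage analogue: records become a raw binary stream.
printf 'r1\nr2\n' > data.txt
gzip -c data.txt > data.gz        # compress (Compress stage)
cp data.gz data_copy.gz           # a plain copy works on compressed data
gunzip -c data_copy.gz            # expand (Expand stage) restores the rows
# r1
# r2
```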