IBM InfoSphere DataStage Essentials v9.1

(Course code KM202)

Student Exercises
ERC 1.0

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in
many jurisdictions worldwide:
AIX AS/400 DataStage
DB2 HACMP InfoSphere
iSeries pSeries QualityStage
WebSphere xSeries zSeries
Intel and Intel Core are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Lenovo and ThinkPad are trademarks or registered trademarks of Lenovo in the United
States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other
countries, or both.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.

December 2012 edition


The information contained in this document has not been submitted to any formal IBM test and is distributed on an "as is" basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Copyright International Business Machines Corporation 2012.


This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
V7.0.1
Student Exercises

Contents

Exercises description

Exercise 1. Log onto the Information Server Web Console
Course Image Information
Task: Log onto the Information Server Web Console

Exercise 2. Administering DataStage
Task: Create a DataStage administrator and user
Task: Log onto DataStage Administrator
Task: Specify property values in DataStage Administrator
Task: Set DataStage permissions and defaults

Exercise 3. Importing and exporting DataStage objects
Task: Log onto DataStage Designer
Task: Create a Repository folder
Task: Import DataStage object files
Task: Export a folder

Exercise 4. Import a table definition
Task: Import a table definition from a sequential file

Exercise 5. Creating parallel jobs
Task: Create a parallel job
Task: Compile, run, and monitor the job
Task: Specify Extended Properties
Task: Document your job
Task: Add a job parameter
Task: Create a parameter set

Exercise 6. Reading from and writing to sequential files
Task: Read and write to a sequential file
Task: Create a job parameter for the target file
Task: Add Reject links
Task: Create a second output link from a Copy stage
Task: Read a file using multiple readers
Task: Create a job that reads multiple files

Exercise 7. Reading and writing NULL values to a sequential file
Task: Read NULL values from a sequential file
Task: Write NULL values to a sequential file

Exercise 8. Working with data sets
Task: Write to a Data Set
Task: View a data set

Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Exercise 9. Partitioning and collecting
Task: Partitioning and collecting
Task: View the OSH, Configuration File, and Score

Exercise 10. Using the Lookup stage
Task: Look up the warehouse item description
Task: Handle lookup failures
Task: Add a Reject link

Exercise 11. Range lookups
Task: Design a job with a reference link range lookup
Task: Design a job with a stream range lookup

Exercise 12. Using the Join, Merge, and Funnel stages
Task: Use the Join stage in a job
Task: Use the Merge stage in a job
Task: Use the Funnel stage in a job

Exercise 13. Group processing stages
Task: Create the job design

Exercise 14. Defining a constraint
Task: Define Transformer Constraints
Task: Use an Otherwise Link to capture range errors in the data

Exercise 15. Define derivations
Task: Build a formatting derivation
Task: Use a function in a derivation
Task: Build a conditional replacement derivation
Task: Capture rejects

Exercise 16. Loop processing
Task: Pivot

Exercise 17. Group processing in the Transformer
Task: Process groups in a Transformer
Task: Add group results to individual group records
DataStage parallel job debugger

Exercise 18. Repository functions
Task: Execute a Quick Find
Task: Execute an Advanced Find
Task: Generate a report
Task: Perform an impact analysis
Task: Find the differences between two jobs
Task: Find the differences between two table definitions

Exercise 19. Reading and writing to relational tables

Task: Create a Data Connection object
Task: Create and load a DB2 table using the DB2 Connector stage
Task: Import a table definition using ODBC
Task: Create a job that reads from a DB2 table using the ODBC Connector stage

Exercise 20. Connector stages with multiple input links
Task: Create a job with multiple Connector input links

Exercise 21. Construct an SQL statement using SQL Builder
Task: Build an SQL SELECT statement using SQL Builder
Task: Use the SQL Builder expression editor

Exercise 22. Build and run a Sequence job
Task: Build a Job Sequence
Task: Add a user variable
Task: Add a Wait for File stage
Task: Add exception handling

Exercises description


This course includes the following exercises:
One or more exercises for each unit.
The presentation slides identify which exercises to do for a particular
unit and the point at which to do them.

Exercise 1. Log onto the Information Server Web Console

What this exercise is about


This exercise introduces the Information Server Web Console.

What you should be able to do


At the end of this exercise, you should be able to:
Log onto the Information Server Web Console.

Introduction
This exercise introduces the Information Server Web Console.

Requirements
This lab must be taken using the course VMware images or the
equivalent configuration as described in the Lab Setup Guide.

Course Image Information


If you are using the VMware course images supplied with this course,
the following summarizes the user IDs and passwords (shown as user ID / password).
Server system name:
   EDSERVER.IBM.COM (alias EDSERVER)
Server system user IDs:
   Root: root / master
   DB2 administrator: db2inst1 / db2inst1
Client system user IDs:
   System: student / student
Information Server (IS) user IDs and passwords:
   WebSphere Application Server: wasadmin / wasadmin
   Information Server Administrator: isadmin / isadmin
   DataStage Administrator: student / student
   Information Server Repository owner: xmeta / xmeta

Task: Log onto the Information Server Web Console


__ 1. Open a Web browser. Enter the address of the InfoSphere Information Server
Web Console: http://edserver.ibm.com:9080/ibm/iis/console/. Here,
edserver.ibm.com is the host name of the Information Server domain system and 9080
is the port number used to communicate with it.

__ 2. Enter a Suite Administrator user ID and password, here isadmin / isadmin.

__ 3. Click Login. If you see the following window, Information Server is up and running.
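
The Web Console address entered in step 1 combines a host name, a port number, and a path. As an illustrative sketch only, the following Python splits the course URL into those parts with the standard library. The helper name describe_console_url is hypothetical and is not part of any IBM tool.

```python
from urllib.parse import urlsplit

# URL taken from step 1 of this task.
CONSOLE_URL = "http://edserver.ibm.com:9080/ibm/iis/console/"


def describe_console_url(url):
    """Split a Web Console URL into scheme, host, port, and path.

    Hypothetical helper for illustration; it only parses the string and
    does not contact the server.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


if __name__ == "__main__":
    for name, value in describe_console_url(CONSOLE_URL).items():
        print(f"{name}: {value}")
```

If the console does not load, checking the host and port separately (for example, whether the host name resolves from the client system) is usually a reasonable first step.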

Exercise 2. Administering DataStage

What this exercise is about


This exercise covers DataStage administration in both the Information
Server (IS) Web Console and in the DataStage Administrator client.

What you should be able to do


At the end of this exercise, you should be able to:
Create DataStage user IDs in the Information Server (IS) Web
Console
Use DataStage Administrator to specify the DataStage global and
project environment

Introduction
In this exercise you learn how DataStage user IDs are created in the
IS Web Console. Then you will log onto DataStage Administrator and
configure your DataStage environment.

Requirements
Exercise 1 has been completed and the Information Server (IS) Web Console is
open.

Task: Create a DataStage administrator and user


__ 1. In the Information Server Administration Console, click the Administration tab.

__ 2. Expand the Domain Management folder. Click Engine Credentials.


__ 3. Select the DataStage Server, here EDSERVER.IBM.COM.
__ 4. Click Open Configuration. In the Default Credentials boxes, type a valid operating
system user ID and password of the DataStage Server machine. Here, enter dsadm
/ dsadm. (If you are using the course images this is already done for you.)

__ 5. Click Save and Close.


__ 6. Expand the Users and Groups folder and then click Users. You should see at least
two users: isadmin is the Information Server administrator ID; wasadmin is the
WebSphere Application Server administrator ID.
__ 7. Select the isadmin user and then click Open User.

__ 8. Note the first and last names of this user. Expand the Suite and Suite Component
folders. Note what Suite roles and Product roles have been assigned to this user.

__ 9. Click Cancel to return to the Users main window.

__ 10. Click New User. Create a new user ID named dsadmin. Use dsadmin for the first
and last names and password as well. Assign the Suite User role and the DataStage and
QualityStage Administrator Suite Component role to this user.

__ 11. Scroll down to click Save and Close.

__ 12. Following the same procedure, create an additional user named dsuser. Assign
Suite User and DataStage and QualityStage User roles to dsuser (password is
also dsuser).

__ 13. Click Save and Close. Verify that dsuser and dsadmin have been created.

Task: Log onto DataStage Administrator


__ 1. Open the Administrator Client icon on your Windows Client desktop.

__ 2. Specify the host name (EDSERVER.IBM.COM). Type dsadmin in the User name
and Password boxes. Specify your DataStage server, here EDSERVER.IBM.COM.

__ 3. Click Login.

Task: Specify property values in DataStage Administrator


__ 1. Click the Projects tab. Select your project, DSProject, and then click the
Properties button.
__ 2. On the General tab select the Enable Runtime Column Propagation box (but not
for new links).

__ 3. Click the Environment button to open up the Environment Variables window. In
the Parallel folder, examine the APT_CONFIG_FILE parameter and its default.
(The configuration file is discussed in a later unit.)

__ 4. In the Reporting folder, set the APT_DUMP_SCORE, APT_STARTUP_STATUS,
and OSH_DUMP variables to true.

__ 5. Click OK.

__ 6. On the Parallel tab, check the box to make the generated OSH visible. Note the
default date and time formats. For example, the default date format is
YYYY-MM-DD, which is expressed by the format string shown.

__ 7. On the Sequence tab, check all the boxes.
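
The YYYY-MM-DD date layout noted in step 6 can be illustrated outside DataStage. The sketch below uses Python's own date pattern notation, which is an assumption made for illustration and is not the DataStage format string displayed on the Parallel tab. The function names format_date and parse_date are hypothetical.

```python
from datetime import date, datetime

# Python strftime/strptime pattern for the YYYY-MM-DD layout.
# Assumption: this is Python notation, not the DataStage format string.
ISO_DATE_PATTERN = "%Y-%m-%d"


def format_date(value):
    """Render a date as a zero-padded YYYY-MM-DD string."""
    return value.strftime(ISO_DATE_PATTERN)


def parse_date(text):
    """Parse a YYYY-MM-DD string; raises ValueError for other layouts."""
    return datetime.strptime(text, ISO_DATE_PATTERN).date()


if __name__ == "__main__":
    sample = date(2012, 12, 1)
    text = format_date(sample)
    print(text)
    print(parse_date(text) == sample)
```

A string in a different layout, such as 12/01/2012, does not match this pattern and is rejected, which is roughly the situation a job faces when incoming date strings do not match the project default.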

Task: Set DataStage permissions and defaults


__ 1. Click the Permissions tab. Notice that isadmin and dsadmin (among others)
already exist as DataStage Administrators. This is because they were assigned the
DataStage and QualityStage Administrator Suite Component role in the Information Server
Administration Console. DataStage Administrators have full developer and
administrator permissions in all DataStage projects. On the other hand, dsuser
does not receive permission to develop within a specified DataStage project unless
a DataStage Administrator explicitly gives permission. So you do not see dsuser
here.

__ 2. Click Add User or Group. Notice that dsuser is available to be added. Select
dsuser and then click Add.

__ 3. Click OK to return to the Permissions tab. Select dsuser. In the User Role box,
select the DataStage Developer role.

__ 4. Click OK and then Close to close down DataStage Administrator.


__ 5. Log back into the Administrator client using the dsuser ID.
__ 6. Select your project and then click Properties. Notice that the Permissions tab is
disabled. This is because dsuser has not been assigned the DataStage
Administrator role and therefore does not have the authority to set DataStage
permissions.

__ 7. Click on the Logs tab. Select the Auto-purge of job log box and set the
Auto-purge action to up to 2 previous job runs.

__ 8. Click OK and then close Administrator client.
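
The permission behavior observed in this task can be summarized as a small toy model. This is a sketch under stated assumptions, not an IBM API: the class ProjectPermissions, the functions can_develop and grant_role, and the set SUITE_ADMINISTRATORS are all hypothetical names, and the model covers only the DataStage Developer role used in the exercise.

```python
from dataclasses import dataclass, field

# Users who hold the suite-level DataStage administrator role in this exercise.
SUITE_ADMINISTRATORS = {"isadmin", "dsadmin"}

DEVELOPER_ROLE = "DataStage Developer"


@dataclass
class ProjectPermissions:
    """Toy stand-in for one project's Permissions tab (hypothetical)."""
    name: str
    # Maps user ID to the project role explicitly granted by an administrator.
    granted_roles: dict = field(default_factory=dict)


def can_develop(user, project):
    """Administrators can develop in every project; others need a grant."""
    if user in SUITE_ADMINISTRATORS:
        return True
    return project.granted_roles.get(user) == DEVELOPER_ROLE


def grant_role(acting_user, project, user, role):
    """Only administrators may change project permissions."""
    if acting_user not in SUITE_ADMINISTRATORS:
        raise PermissionError(f"{acting_user} cannot set permissions")
    project.granted_roles[user] = role


if __name__ == "__main__":
    project = ProjectPermissions("DSProject")
    print(can_develop("dsuser", project))   # before the grant in steps 2-3
    grant_role("dsadmin", project, "dsuser", DEVELOPER_ROLE)
    print(can_develop("dsuser", project))   # after the grant
```

The PermissionError branch mirrors step 6, where the Permissions tab is disabled for dsuser because that user lacks the administrator role.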

Exercise 3. Importing and exporting DataStage objects

What this exercise is about


This exercise covers the import and export of DataStage objects.

What you should be able to do


At the end of this exercise, you should be able to:
Import and export DataStage objects to a file.

Introduction
This exercise introduces the Designer client and covers its import and
export functionality.

Requirements
No new requirements.

Task: Log onto DataStage Designer


__ 1. Open the Designer client icon on the Windows desktop.

__ 2. Type information to log into your DataStage project:
   Host name of the services tier followed by port number:
   EDSERVER.IBM.COM:9080
   User name: student
   Password: student
   Project: EDSERVER.IBM.COM/DSProject
__ 3. Click Login.

Task: Create a Repository folder


__ 1. Select your project folder in the Repository window, click your right mouse button,
and then click New>Folder. Create a folder named _Training. Under it, create two
folders: Jobs and Metadata.
__ 2. Click Repository>Refresh, which moves the folder you created to the top.

Task: Import DataStage object files


__ 1. Click Import>DataStage Components.
__ 2. In the Import from file box, select the TableDefs.dsx file in the
C:\CourseData\DSEss_Files\dsxfiles directory on your client machine.
__ 3. Select the Import selected button.

__ 4. Click OK.

__ 5. Select the table definition and then click OK.

__ 6. Open up the table definition you have imported. You will find it in the
_Training>Metadata folder. It is named Employees.txt.

__ 7. Click the Columns tab. Note the column definitions and their types.

Task: Export a folder


In this task, you export your _Training folder into a file named Training.dsx.
__ 1. Select your _Training folder, click your right mouse button, and then click Export.
__ 2. In the Export to file box, select the DSEss_Files>dsxfiles folder. Add the file name
Training.dsx.

__ 3. Click Open. The Employees.txt file will be ready to export.

__ 4. Click Export, and then click Close.

Exercise 4. Import a table definition

What this exercise is about


This exercise covers how to import a table definition for a sequential
file.

What you should be able to do


At the end of this exercise, you should be able to:
Import a table definition for a sequential file.
View a table definition stored in the Repository.

Introduction
Table definitions are loaded into stages in a job. A table definition for a
sequential file will be loaded into a Sequential File stage in order for
the stage to read the sequential file.

Requirements
No new requirements.

Task: Import a table definition from a sequential file


__ 1. On your Client system, in a text editor, open up the Selling_Group_Mapping.txt file
in your DSEss_Files directory and examine its format and contents. Some
questions to consider:
Is the first row a row of column names?
Are the columns delimited or fixed-width?
If the columns are delimited, what is the delimiter?
How many columns? What types are they?

Note

For your convenience a copy of this file has been placed on your Client system in your
DSEss_Files folder. The file that you will import a table definition for and the file that the
DataStage job reads must be on the DataStage Server system, where the job runs.

__ 2. In Designer, click Import>Table Definitions>Sequential File Definitions.


__ 3. Click the ellipsis (...) button next to the Directory box and browse to
/CourseData>DSEss_Files. DSEss_Files will be entered in the Directory name
box.
__ 4. Click OK. The files in the DSEss_Files directory should be displayed in the Files
panel.
__ 5. Select \_Training\Metadata as the To folder.

__ 6. Select the Selling_Group_Mapping.txt file.

__ 7. Click Import.

__ 8. Specify the general format on the Format tab. Be sure to specify that the first line is
column names, if this is the case. Then DataStage can use these names in the
column definitions.

__ 9. Click Preview to view the data in your file in the specified format. This checks
whether you have defined the format correctly. If the preview looks garbled, the format
is not specified correctly. In the current case, everything looks fine.

__ 10. Click the Define tab to examine the column definitions.

__ 11. Click OK to import your table definition.


__ 12. After closing the import window, locate and then open your new table definition in
the Repository window. It is located in the folder you specified in the To folder box
during the import, namely, _Training\Metadata.

Information

If the table definition is not in _Training\Metadata in Designer, look for it in the Table
Definitions folder, where table definitions go by default. You may move the Table Definition
from there to _Training\Metadata by drag and drop.

__ 13. Click on the Columns tab to examine the imported column definitions.

__ 14. Click on the Format tab to examine the format specification. Notice the delimiter and
that the first row contains column names.
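
The questions posed in step 1 of this task (what the delimiter is, how many columns there are, and whether the first row holds column names) can also be explored with a short script. The following is a minimal sketch, assuming a small delimited text file. The sample rows are hypothetical and are not the contents of Selling_Group_Mapping.txt, and the function names are invented for illustration.

```python
import csv
import io

# Hypothetical sample data; NOT the contents of Selling_Group_Mapping.txt.
SAMPLE_TEXT = "Code,Description,Rank\nA1,First group,1\nB2,Second group,2\n"

CANDIDATE_DELIMITERS = [",", "|", "\t", ";"]


def guess_delimiter(first_line):
    """Return the candidate that occurs most often in the line, or None."""
    counts = {d: first_line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def summarize(text):
    """Report delimiter, column count, first row, and number of later rows."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("no data to inspect")
    delimiter = guess_delimiter(lines[0])
    if delimiter is None:
        # No candidate delimiter found: possibly fixed-width or one column.
        return {"delimiter": None, "column_count": 1,
                "first_row": [lines[0]], "data_rows": len(lines) - 1}
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return {"delimiter": delimiter, "column_count": len(rows[0]),
            "first_row": rows[0], "data_rows": len(rows) - 1}


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_TEXT).items():
        print(f"{key}: {value!r}")
```

The script only prints the first row; deciding whether that row holds column names or data remains a judgment for the reader, just as it is on the Format tab.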

Exercise 5. Creating parallel jobs

What this exercise is about


This exercise covers the whole process of creating, compiling,
running, and monitoring a DataStage parallel job.

What you should be able to do


At the end of this exercise, you should be able to:
Design a simple DataStage parallel job
Compile a job
Run a job
View messages written to the job log
Document a job using the Annotation stage
Define and use a job parameter in the job
Define and use a parameter set in the job

Introduction
Building a DataStage parallel job, however complex, involves the
same basic workflow. This exercise introduces you to that workflow.
Later exercises will introduce additional functionality into the workflow.

Requirements
No new requirements.

Task: Create a parallel job


__ 1. Open a new parallel job and save it under the name GenDataJob. Save it in your
_Training>Jobs folder. To accomplish this:
__ a. Click on File>New.
__ b. Click on the Parallel Job Icon and click OK.
__ c. Click on File>Save. Save as GenDataJob in your _Training>Jobs folder.
__ 2. Add a Row Generator stage and a Peek stage from the Development/Debug folder.
__ 3. Draw a link from the Row Generator stage to the Peek stage. To accomplish this,
click the right mouse button over the Row Generator stage and drag the mouse
cursor to the Peek stage.
__ 4. Name the Row Generator and link as Employees. Name the Peek stage
PeekEmployees as shown.

__ 5. Open up the Row Generator stage to the Columns tab. Click the Load button to
load the column definitions from the Employees.txt table definition you imported in
an earlier lab.

__ 6. Verify your column definitions with the following.

__ 7. On the Properties tab specify that 100 rows are to be generated.

__ 8. Click View Data to view the data that will be generated.

Task: Compile, run, and monitor the job


__ 1. Click on the Compile icon to compile your job. If your job compiles with errors, fix
the errors before continuing.

__ 2. Click your right mouse button over an empty part of the canvas. Select or verify that
Show performance statistics is enabled.
__ 3. Run your job by clicking on the Run icon.

__ 4. Move to Director from within Designer by clicking Tools>Run Director. Alternatively,
click View>Job Log to open a window within Designer to view the log messages.
__ 5. In the Director Status window select your job.
__ 6. Click the Log icon (open book).
__ 7. Scroll through the messages in the log. There should be no warnings or errors. If
there are, double-click on the messages to examine their contents. Fix the problem
and then recompile and run.
__ 8. Notice that there are one or more log messages starting with PeekEmployees, the
label on your Peek stage. Double-click on one of these to open the message
window.

Task: Specify Extended Properties


__ 1. Save your job as GenDataJobAlgor in your _Training>Jobs folder.
__ 2. Open up the Row Generator stage to the Columns tab. Double-click on the row
number to the left of the first column name.
__ 3. Specify the extended properties as shown.
__ a. Click on Type to add the Type property.
__ b. Click on Initial Value. Set its value to 10000 in the Initial value field to the right.
__ c. Select the Type property, and then add the Increment property. Set 1 as the
increment value.

__ 4. Click Apply then Next. For the Name column, specify that you want to cycle through
three names, your choice.
__ a. Select Generator in the Properties panel, and then click Algorithm.
__ b. Choose cycle from the drop down menu on the right.
__ c. Click on Value. In the Value field add a name for the first value.
__ d. Press Enter to add a second value.
__ e. Repeat to add a third value.

__ 5. Click Apply and Next.


__ 6. For the HireDate column, specify that you want the dates generated randomly.
__ a. In the Available properties to add: window on the lower right, choose Type.

__ b. In the Type field select random.

__ c. Click Close.
__ 7. Click View Data to see the data that will be generated.

__ 8. Close the stage.
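
The column generators configured in this task can be pictured with a short Python sketch. It only illustrates the three behaviors: an integer sequence that starts at 10000 and increments by 1, a cycle through three names, and randomly generated dates. It is not how DataStage implements the Row Generator stage. The function name, the FirstColumn placeholder, the sample names, and the date range are all assumptions made for the example.

```python
import itertools
import random
from datetime import date, timedelta


def generate_rows(num_rows, names, seed=0):
    """Yield rows resembling the output of the configured Row Generator stage."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    counter = itertools.count(10000, 1)  # Initial Value = 10000, Increment = 1
    name_cycle = itertools.cycle(names)  # Algorithm = cycle over the listed values
    start = date(2000, 1, 1)             # assumed lower bound for the random dates
    for _ in range(num_rows):
        yield {
            "FirstColumn": next(counter),  # placeholder name for the first column
            "Name": next(name_cycle),
            "HireDate": start + timedelta(days=rng.randrange(3650)),
        }


rows = list(generate_rows(100, ["Ann", "Bob", "Cho"]))
for row in rows[:4]:
    print(row)
```

The fourth row repeats the first name, which is the same wrap-around you should see for the Name column in View Data.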

Task: Document your job


__ 1. From the Palette General folder, add an Annotation stage to your job diagram that
describes what your job does. Open up the Annotation stage and choose another
background color. Briefly describe what the job does.

__ 2. Compile and run your job.


__ 3. In Designer, click View>Job Log to view the messages in the job log. Fix any
warnings or errors.
__ 4. Verify the data by examining the Peek stage messages in the log.

Task: Add a job parameter


__ 1. Save your job as GenDataJobParam in your _Training>Jobs folder.
__ 2. Click Edit>Job Properties to open the Job Properties window. (Alternatively, click the
Job Properties icon in the toolbar.) Click on the Parameters tab.
__ 3. Define a new parameter named NumRows with a default value of 10. Its type is
Integer.

__ 4. Open up the Properties tab of the Row Generator stage in your job. Select the
Number of Records property, and then click on the right-pointing arrow to select
your parameter. Use your NumRows job parameter.

__ 5. View the data.


__ 6. Compile and run your job. Verify the results.

Task: Create a parameter set


1. Click the New button in the toolbar.
2. Click the Other folder.

3. Double-click on the parameter set icon. Name the parameter set
RowGenTarget.

4. Click the Parameters tab. Create the NumRows parameter shown along with
the default value shown (100).

5. Click the Values tab. Create two values files. The first is named LowGen and
uses the default value of the NumRows parameter. The second is named HighGen and
changes the value of the NumRows parameter to 10000.

6. Click OK. Save your parameter set in your _Training>Metadata folder.


7. Save your job as GenDataJobParamSet.
8. Open the Job Properties window to the Parameters tab.
9. Click the Add Parameter Set button.
10. Select the RowGenTarget parameter set you created earlier.
11. Click OK to add the parameter set to the job.

12. Click OK to close the Job Properties window.


13. Open up the Row Generator stage. Then select the Number of Records
property.

14. Select the NumRows parameter from the parameter set as the value for the
property.

15. Click OK to close the stage.


16. Compile your job.
17. Click the Run button. In the Job Run Options window select the HighGen
values file.

18. Click Run. Verify that the job generates 10000 records.
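
How a values file overrides a parameter set default can be sketched in a few lines of Python. This is a conceptual model only: the dictionaries and the resolve_parameters function are hypothetical and do not correspond to any DataStage API. It assumes that LowGen keeps the default of 100 and that HighGen sets NumRows to 10000.

```python
# Defaults defined on the Parameters tab of the RowGenTarget parameter set.
DEFAULTS = {"NumRows": 100}

# Values files defined on the Values tab; an empty mapping keeps the defaults.
VALUES_FILES = {
    "LowGen": {},
    "HighGen": {"NumRows": 10000},
}


def resolve_parameters(values_file=None):
    """Return the effective parameter values for one job run."""
    resolved = dict(DEFAULTS)
    if values_file is not None:
        resolved.update(VALUES_FILES[values_file])
    return resolved


print(resolve_parameters())            # defaults only
print(resolve_parameters("HighGen"))   # HighGen overrides NumRows
```

Choosing a values file in the Job Run Options window plays the role of the values_file argument here.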

Exercise 6. Reading from and writing to sequential files

What this exercise is about


This exercise covers reading from and writing to sequential files.

What you should be able to do


At the end of this exercise, you should be able to:
Read from a sequential file using the Sequential File stage
Write to a sequential file using the Sequential File stage
Use the Copy stage in a job
Create Reject links from Sequential File stages
Use multiple readers in the Sequential File stage
Read multiple files using a file pattern

Introduction
Sequential files are one type of data that enterprises commonly need
to process. The primary way of reading and writing to sequential files
in a DataStage job uses the Sequential File stage.

Requirements
No new requirements.

Task: Read and write to a sequential file


In this task, you design a job that reads data from the Selling_Group_Mapping.txt file,
copies it through a Copy stage, and then writes the data to a new file named
Selling_Group_Mapping_Copy.txt.
__ 1. Open a new Parallel job and save it under the name CreateSeqJob. Save it in your
_Training>Jobs folder.
__ 2. Add a Sequential File stage from the Palette File folder, a Copy stage from the
Palette Processing folder, and a second Sequential stage. Draw links. Name the
stages and links as shown.

__ 3. In the source Sequential File stage Columns and Format tabs, load the format and
column definitions from the Selling_Group_Mapping.txt table definition you
imported in a previous exercise.
__ 4. On the Properties tab specify a path to the file to read, namely the
Selling_Group_Mapping.txt file. Here, also set the First Line is Column Names
property to True. If you do not, your job will have trouble reading the first row and
issue a warning message in the job log.

__ 5. Click View Data to verify that the metadata has been specified properly in the stage.

__ 6. In the Copy stage Output>Mapping tab, drag all the columns across from the
source to the target.

__ 7. In the target Sequential stage, click on Format. Confirm that Field
defaults>Delimiter=comma. Return to the Properties tab. Name the file
Selling_Group_Mapping_Copy.txt and write it to your
C:\CourseData\DSEss_Files>Temp directory. Create it with a first line of column
names. It should overwrite any existing file with the same name.

__ 8. Compile and run your job.


__ 9. View the job log. Fix any errors.
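
For readers who think in code, the job built in this task behaves roughly like the following Python sketch. It reads a comma-delimited file whose first line holds column names, passes every row through unchanged, and writes a new file that also starts with a header line, overwriting any existing file. The sample data, column names, and file names are invented for the illustration; the real job reads Selling_Group_Mapping.txt.

```python
import csv
import os
import tempfile


def copy_sequential_file(source_path, target_path):
    """Copy a comma-delimited file with a header row; return the number of data rows."""
    with open(source_path, newline="") as src:
        reader = csv.DictReader(src)          # first line is column names
        rows = list(reader)
        columns = reader.fieldnames
    with open(target_path, "w", newline="") as tgt:   # "w" overwrites an existing file
        writer = csv.DictWriter(tgt, fieldnames=columns)
        writer.writeheader()                  # create the target with a header line
        writer.writerows(rows)
    return len(rows)


# Invented sample data standing in for the course file.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "source.txt")
target = os.path.join(work_dir, "source_copy.txt")
with open(source, "w", newline="") as f:
    f.write("Code,Description\n1,First\n2,Second\n")

copied = copy_sequential_file(source, target)
print(copied, "rows copied")
```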

Task: Create a job parameter for the target file


__ 1. Save your CreateSeqJob job as CreateSeqJobParam. Rename the last link and
Sequential File stage to TargetFile.

__ 2. Open up the Job Properties window.


__ 3. On the Parameters tab, define a job parameter named TargetFile of type string.
Create an appropriate default filename, for example, TargetFile.txt.

__ 4. Open up your target stage to the Properties tab. Select the File property. In the File
text box retain the directory path. Replace the name of your file by your job
parameter.
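
The effect of placing a job parameter in the File property can be illustrated with a small substitution routine. The sketch assumes the parameter is written between # characters, the same notation (#TargetFile#) that appears later in this workbook. The function name and the sample directory are hypothetical; DataStage performs this substitution itself at run time.

```python
import re


def expand_parameters(template, params):
    """Replace each #Name# token in template with the matching parameter value."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for job parameter {name!r}")
        return params[name]
    return re.sub(r"#(\w+)#", lookup, template)


file_property = "/tmp/Temp/#TargetFile#"   # directory path kept, file name parameterized
print(expand_parameters(file_property, {"TargetFile": "TargetFile.txt"}))
```

Supplying a different TargetFile value at run time changes the output file without editing the stage.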

Task: Add Reject links


__ 1. Add a second link (which will automatically become a reject link) from the source
Sequential File stage to a Peek stage. Also add a reject link from the target
Sequential File stage to a Peek stage. Give appropriate names to these new stages
and links.

__ 2. On the Properties tab of each Sequential File stage, change the Reject Mode
property value to Output.

__ 3. Compile and run. Verify that it is running correctly. You should not have any rejects,
errors, or warnings.
__ 4. To test the rejects link, temporarily change the property First Line is Column
Names to False in the source stage and then recompile and run. This will cause the
first row to be rejected because the values in the first row, which are all strings, will
not match the column definitions, some of which are integer types.

__ 5. Examine the job log. Look for a warning message indicating an import error in the
first record read (record 0). Also open the SourceRejects Peek stage message. Note
the data in the row that was rejected.
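
The reject behavior seen in this task can be mimicked in Python. The sketch below assumes a two-column layout (an integer and a string) invented for the example. Rows whose fields cannot be converted to the declared types go to a reject list instead of the main output. When the header line is treated as data, the first record (record 0) is rejected because a column name is not a valid integer.

```python
def import_rows(lines, first_line_is_column_names):
    """Split lines into accepted (int, str) rows and rejected raw records."""
    accepted, rejected = [], []
    data = lines[1:] if first_line_is_column_names else lines
    for record_number, line in enumerate(data):
        fields = line.split(",")
        try:
            accepted.append((int(fields[0]), fields[1]))
        except (ValueError, IndexError):
            rejected.append((record_number, line))   # goes to the reject link
    return accepted, rejected


sample = ["Code,Description", "1,First", "2,Second"]   # invented sample data
print(import_rows(sample, first_line_is_column_names=True))
print(import_rows(sample, first_line_is_column_names=False))
```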

Task: Create a second output link from a Copy stage


__ 1. Add a second output link from your Copy stage to a Peek stage.

__ 2. Open the Copy stage. Click the Output>Mapping tab. Then select the link to your
Peek stage (ToPeek) from the Output name box.

__ 3. Drag the first two columns to the target link.

__ 4. Click on the Columns tab. Change the name of the second column to SG_Desc.

__ 5. Compile and run your job. View the messages written to the log by the Peek output
stage.

Task: Read a file using multiple readers


__ 1. Save your job as CreateSeqJobMultiRead.
__ 2. Click the Properties tab of your source stage.
__ 3. Click the Options folder and add the Number of Readers Per Node property. Set
this property to 2.
__ 4. Compile and run your job.
__ 5. View the job log.

Note

You will receive some warning messages related to the first row, and this row will be
rejected. You can safely ignore these warnings.

Task: Create a job that reads multiple files


__ 1. Save your job as CreateSeqJobPattern.
__ 2. Compile and then run your job twice specifying the following file names in the job
parameter for the target file: TargetFile_A.txt, TargetFile_B.txt. This writes two
files to your DSEss_Files>Temp directory.
__ 3. Edit the source Sequential stage. Change read method to File Pattern. You will get
a warning message. Click Yes to continue. Place a wildcard (?) in the last portion of
the file name: TargetFile_?.txt

__ 4. Click View Data to verify that you can read the files.
__ 5. Compile and run the job. View the job log.
__ 6. Click View Data over the target stage to verify the results. There should be two
copies of each row, since you are now reading two identical files. You can use the
Find button in the View Data window to locate both copies.
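
The file-pattern read can be approximated with Python's glob module, where ? also matches exactly one character. The sketch writes two identical files with invented contents and then reads every file matching TargetFile_?.txt, which is why each row appears twice. The header handling and the temporary directory are assumptions made for the example.

```python
import glob
import os
import tempfile

work_dir = tempfile.mkdtemp()
for suffix in ("A", "B"):
    with open(os.path.join(work_dir, f"TargetFile_{suffix}.txt"), "w") as f:
        f.write("Code,Description\n1,First\n2,Second\n")


def read_pattern(pattern):
    """Return the data rows (header skipped) of every file matching pattern."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.extend(f.read().splitlines()[1:])
    return rows


all_rows = read_pattern(os.path.join(work_dir, "TargetFile_?.txt"))
print(all_rows)
```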

Exercise 7. Reading and writing NULL values to a sequential file

What this exercise is about


This exercise covers reading NULL values in and writing NULL values
to sequential files.

What you should be able to do


At the end of this exercise, you should be able to:
Read NULL values from a sequential file
Write NULL values to a sequential file

Introduction
NULL values enter into the job stream in a number of places in
DataStage jobs. This exercise looks at how they are handled in the
context of reading from and writing to sequential files.

Requirements
No new requirements.

Task: Read NULL values from a sequential file


__ 1. Open your CreateSeqJobParam job.
__ 2. Save your job as CreateSeqJobNULL.

__ 3. Open up the Selling_Group_Mapping_Nulls.txt file in your DSEss_Files directory
on your client system. Use WordPad to view the file.

Note

Although your DataStage jobs read sequential files in your DSEss_Files directory on the
DataStage server system, copies of these files have been placed on your client system, for
your convenience.

__ 4. Notice in the data that the Special_Handling_Code column contains some integer
values of 1. Notice also that the last column (Distr_Chann_Desc) is missing some
values. To test how to read NULLs, let us assume that 1 in the third column means
NULL and that the absence of a value in the last column means NULL. In the
following steps, you will specify this.
__ 5. Open up the source Sequential stage to the Columns tab. Double-click to the left of
the Special_Handling_Code column to open up the Edit Column Meta Data
window.

__ 6. Change the Nullable field to Yes. Notice that the Nullable folder shows up in the
Properties window. Select this folder and then add the Null field value property.
Specify a value of 1 for it.

__ 7. Click Apply, and then Next.


__ 8. Move to the Distribution_Channel_Description column. Set this field to nullable.
Add the Null field value property. Here, you will treat the empty string as meaning
NULL. To do this specify back-to-back double quotes. Click Apply and then
Close.

__ 9. On the Properties tab, for the File property, select the
Selling_Group_Mapping_Nulls.txt file.
__ 10. Click the View Data button. Notice that values that are interpreted by DataStage as
NULL show up as the word NULL, regardless of their actual value in the file.

__ 11. Compile and run your job. It should abort since NULL values will be written to
non-nullable columns on your target. View the job log to see the messages.
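
The Null field value settings from this task amount to a per-column rule: when the text read from the file equals the configured value, the column is set to NULL. A minimal Python sketch follows, using None for NULL and column names taken from the steps above. The sample rows and the function name are invented for the illustration.

```python
# Null field value per column: "1" for the code column, empty string for the description.
NULL_FIELD_VALUES = {
    "Special_Handling_Code": "1",
    "Distribution_Channel_Description": "",
}


def read_with_nulls(row):
    """Return a copy of row with configured null field values replaced by None."""
    return {
        column: None if NULL_FIELD_VALUES.get(column) == text else text
        for column, text in row.items()
    }


sample_rows = [
    {"Special_Handling_Code": "1", "Distribution_Channel_Description": "Retail"},
    {"Special_Handling_Code": "6", "Distribution_Channel_Description": ""},
]
for row in sample_rows:
    print(read_with_nulls(row))
```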

Task: Write NULL values to a sequential file


__ 1. Save your job as CreateSeqJobHandleNULL.
__ 2. Open up your target stage to the Columns tab. Specify that the
Special_Handling_Code column and the Distribution_Channel_Description
column are nullable.
__ 3. Compile and run your job. What happens?
__ 4. In this case, the job does not abort, since NULL values are not being written to
non-nullable columns. But the rows with NULL values get rejected because the
NULL values are not being handled. They are written to the TargetRejects Peek
stage, where you can view them.

__ 5. Now, let us handle the NULL values. That is, we will specify values to be written to
the target file that represent NULLs. For the Special_Handling_Code column we
will specify a value of -99999. For the Distribution_Channel_Description column
we will specify a value of UNKNOWN.

__ 6. Open up the target stage and specify these values. The procedure is the same as
when the Sequential stage is used as a source. Shown below is the specification for
the Special_Handling_Code column.

__ 7. Compile and run your job. View the job log. You should not get any errors or rejects.
__ 8. Click View Data to verify the results.

Note

When you view the data in DataStage, all you will see is the word NULL, not the actual
values. To see those values you would need to open up the data file on the DataStage
server system in a text editor.
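
Writing NULLs is the reverse mapping: each NULL is replaced by the value chosen to represent it in the file, and a NULL in a column with no such value causes the row to be rejected. The sketch below uses None for NULL and the substitute values from step 5 of this task. The function name and sample rows are hypothetical.

```python
# Values written to the target file in place of NULL (from step 5 of this task).
NULL_OUTPUT_VALUES = {
    "Special_Handling_Code": "-99999",
    "Distribution_Channel_Description": "UNKNOWN",
}


def write_with_nulls(rows, null_values):
    """Return (written, rejected): rows with NULLs substituted, and rows with unhandled NULLs."""
    written, rejected = [], []
    for row in rows:
        unhandled = [c for c, v in row.items() if v is None and c not in null_values]
        if unhandled:
            rejected.append(row)          # no substitute defined, so the row is rejected
        else:
            written.append({c: null_values[c] if v is None else v for c, v in row.items()})
    return written, rejected


sample_rows = [
    {"Special_Handling_Code": None, "Distribution_Channel_Description": "Retail"},
    {"Special_Handling_Code": "6", "Distribution_Channel_Description": None},
]
print(write_with_nulls(sample_rows, NULL_OUTPUT_VALUES))   # both rows written
print(write_with_nulls(sample_rows, {}))                   # both rows rejected
```

The second call corresponds to the earlier run in which the NULL values were not handled and the rows went to the reject link.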

Exercise 8. Working with data sets

What this exercise is about


This exercise covers Data Sets in DataStage jobs.

What you should be able to do


At the end of this exercise, you should be able to:
Write to a data set
Use the Data Set Management utility to view data in a data set

Introduction
Data Sets are suitable as temporary staging files between DataStage
jobs.

Requirements
No new requirements.

Task: Write to a Data Set


__ 1. Open up your CreateSeqJob job and save it as CreateDataSetJob.
__ 2. Delete the target sequential stage leaving a dangling link.
__ 3. Drag a Data Set stage from the Palette File folder to the canvas and connect it to
the dangling link. Change the name of the target stage to
SellingGroupMappingCopy.

__ 4. Edit the Data Set stage properties. Write to a file named
Selling_Group_Mapping.ds in your DSEss_Files>Temp directory.

__ 5. Open the source stage and add the optional property to read the file using multiple
readers per node. Click Yes when confronted with the warning message. Then
change the value of the property to 2. (This will ensure that data is written to more
than one partition.)
__ 6. Compile and run your job. Check the job log for errors. You can safely ignore the
warning message about record 0.

Task: View a data set


__ 1. In Designer, click Tools > Data Set Management. Browse for the data set that was
created. Notice how many records are written to each of the two partitions.

__ 2. Click the Show Data Window icon at the top of the window. Select partition number
1. This will display the data in just the second partition.

__ 3. Click OK to view the records in that partition.


__ 4. Click the Show Schema Window icon at the top of the window to view the data set
schema. A data set contains its own column metadata in the form of a schema. A
schema is the data set version of a table definition.

Exercise 9. Partitioning and collecting

What this exercise is about


This exercise covers how to set partitioning and collecting algorithms
in DataStage jobs.

What you should be able to do


At the end of this exercise, you should be able to:
View partitioning icons.
Set partitioning algorithms in stages.
View the OSH in the job log.
View the Configuration File in the job log.
View the Score in the job log.

Introduction
The configuration file determines the number of nodes (partitions) that
a job runs under. Partitioning algorithms that can be set in each stage
determine how the data gets put into the partitions.

Requirements
No new requirements.

Task: Partitioning and collecting


1. Save your CreateSeqJobParam job as CreateSeqJobPartition.

2. Note the icon on the input link to the target stage (fan-in). It indicates that the
stage is collecting the data.
3. Open up the target Sequential File stage to the Input>Partitioning tab. Note the
collecting algorithm (Auto) that is selected.

4. Compile and run your job.

5. View the data in the target stage.


6. Open up the target Sequential stage to the Properties tab. Instead of writing to
one file, write to two files. Make sure they have different names. Create the files
in your DSEss_Files>Temp directory. To accomplish this:
Click on the folder with the File property, here Target.
Choose File from the Available properties to add panel.
For the File properties, add the directory path and the
#TargetFile# parameter for the second file.
Append something at the end of the path to distinguish the two file
names, for example, 1 and 2.
Here, 1 and 2 have been appended to the file name parameter, respectively, so
that the names of the two files are different.

7. Click on the Partitioning tab. Notice that the stage is no longer collecting, but
now is partitioning. You can see this by noting the words on top of the
Partitioning / Collecting box. If it says Partition type, then the stage is
partitioning. If it says Collector type, it is collecting.

8. Click OK to close the stage. Notice that the partitioning icon has changed. It no
longer indicates collecting. The icon you see now indicates Auto partitioning.

9. Now open the target Sequential File stage again. This time change the
partitioning type to Same.

10. Close the stage. Notice how the partitioning icon has changed.

11. Compile and run your job.

12. View the job log. Notice how the data is exported to the two different partitions (0
and 1). 23 records go into one partition (partition 1) and 24 records go into the
other (partition 0).
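
The uneven 24/23 split is what a round-robin distribution of 47 rows over two partitions produces. The following sketch assumes round-robin partitioning, which is an assumption about how the rows were distributed in this job and not something the steps above state. The function name is hypothetical.

```python
def round_robin(rows, num_partitions):
    """Deal rows to partitions in turn: row 0 to partition 0, row 1 to partition 1, and so on."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


parts = round_robin(list(range(47)), 2)   # 47 rows in total, as reported in the job log
print([len(p) for p in parts])
```

Because 47 is odd, partition 0 receives the extra row under this scheme.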

Task: View the OSH, Configuration File, and Score


1. In the job log for the last run of the CreateSeqJobPartition job, open the
message labeled OSH script. This displays the OSH script that was generated
when the job was compiled.

2. In the OSH notice the following:


Operators: These correspond to stages in the job design.
Schemas: These correspond to table definitions in the stages.
Properties: These correspond to properties defined on the Stage
Properties tab.

3. In the log open up the message labeled main_program: APT configuration
file

4. Notice the following in the configuration file:


The number of nodes and their names. In this example, there are two
nodes labeled node1 and node2.
Resource disks used by each node. These are the entries labeled resource disk.
They identify the disk space used to store the data in data sets.
Resource scratch disks used by each node. These store temporary files
created during a job run, such as those used in sorting.
5. In the log, open up the message labeled main_program: This step has X
datasets. This is the Score. The score is divided into two sections. The second
section lists the nodes each operator runs on. For example, op0 runs on just the
single node, node1. Notice that op3 (TargetFile) runs on two nodes.

Exercise 10. Using the Lookup stage

What this exercise is about


This exercise covers equality match lookups using the Lookup stage.

What you should be able to do


At the end of this exercise, you should be able to:
Use the Lookup stage to look up the warehouse item description in
a file.
Handle lookup failures.
Capture lookup failures in a reject link.

Introduction
There are several stages that can be used to combine data. This
exercise and the next explore the Lookup stage.
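
An equality-match lookup can be thought of as a dictionary lookup keyed on the matching column. The sketch below is a conceptual Python illustration, not DataStage code. The sample rows are invented, the Item, Onhand, and Description column names follow this exercise, and sending unmatched rows to a reject list is only one of several ways a lookup failure might be handled.

```python
def lookup_descriptions(warehouse_rows, item_rows):
    """Add Description to each warehouse row by matching on Item; collect failures."""
    reference = {row["Item"]: row["Description"] for row in item_rows}  # lookup table
    matched, rejected = [], []
    for row in warehouse_rows:
        if row["Item"] in reference:
            matched.append({**row, "Description": reference[row["Item"]]})
        else:
            rejected.append(row)      # lookup failure
    return matched, rejected


warehouse = [{"Item": "A1", "Onhand": 5}, {"Item": "Z9", "Onhand": 2}]   # invented data
items = [{"Item": "A1", "Description": "Widget"}]
matched, rejected = lookup_descriptions(warehouse, items)
print(matched)
print(rejected)
```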

Requirements
No new requirements.

Task: Look up the warehouse item description


__ 1. Open a new parallel job and save it under the name LookupWarehouseItem. Add
the stages and links and name them as shown. The Lookup stage is found in the
Processing section of the Palette.

__ 2. Import the table definition for the Warehouse.txt sequential file to your
_Training>Metadata folder.

__ 3. Edit the Warehouse Sequential File stage. Warehouse.txt will be the source file
from which data will be extracted. The format properties identified in the table
definition will need to be duplicated in the Sequential File stage. Be sure you can
view the data. If there are problems, check that the metadata is correct on both the
Columns and the Format tabs.
__ 4. Import the table definition for the Items.txt file.
__ 5. Edit the Items Sequential File stage to extract data from the Items.txt file. Also, on
the Format tab, change the quote character to the single quote ('). This is because
some of the data contains double quotes as part of the data.

__ 6. Again, be sure you can view the data in the Items stage before continuing.
__ 7. Open the Lookup stage. Map the Item column in the top left pane to the lookup Item
key column in the bottom left pane of the Items table panel, by dragging the one to
the other. If the Confirm Action window appears, click Yes to make the Item column
a key field.

__ 8. Drag all the Warehouse panel columns to the Warehouse_Items target link on the
right.
__ 9. Drag the Description column from the Items panel to just above the Onhand target
column in the Warehouse_Items panel.

__ 10. On the Warehouse_Items tab at the bottom of the window, change the name of the
Description target column, which you just added, to ItemDescription.

__ 11. Edit your target Sequential File stage as needed.


__ 12. Compile and run. Examine the job log. Your job probably aborted. Try to determine
why it failed and think what you might do about it. (You will fix things in the next
task.)

Task: Handle lookup failures


__ 1. Save your job as LookupWarehouseItemNoMatch.
__ 2. Open the Lookup stage. Click the Constraints icon (top, second from left). When the
lookup fails, specify that the job is to continue.

__ 3. Compile and run. Examine the log. You should not get any fatal errors this time.
__ 4. View the data in the target file. Do you find any rows in the target file in which the
lookup failed? These would be rows with missing item descriptions. Increase the
number of rows displayed to at least a few hundred, if you do not initially see any
missing items. By default, when there is a lookup failure with Continue, DataStage
outputs empty values to the lookup columns. If the columns are nullable, DataStage
outputs NULLs. If the columns are not nullable, DataStage outputs default empty
values depending on their type.

__ 5. Open up the Lookup stage. Make both the Description column on the left side and
the ItemDescription column on the right side nullable. Now, for non-matches
DataStage will return NULLs instead of empty strings.

__ 6. Since NULLs will be written to the target stage, we will need to handle them. Open
up the target Sequential File stage. Replace NULLs with the string NOMATCH. To do
this, double-click to the left of the ItemDescription column on the Columns tab. In
the extended properties, specify a null field value of NOMATCH.

__ 7. Compile and run.

__ 8. View the data in the target Sequential File stage. Click Find. Type NULL in the Find
what: box. Select ItemDescription from the In column: drop down. Click Find
Next to locate the first NULL value.

Task: Add a Reject link


__ 1. Save your job as LookupWarehouseItemReject.
__ 2. Open the Lookup stage and specify that lookup failures are to be rejected.

__ 3. Close the Lookup stage and then add a rejects link going to a Peek stage to capture
the lookup failures.

__ 4. Compile and run. Examine the Peeks in the job log to see what rows were lookup
failures.

__ 5. Examine the job log. Notice in the Peek messages that a number of rows were
rejected.
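
The three lookup failure behaviors used in this exercise (abort, continue, reject) can be pictured with a small Python sketch. This is only an illustration of the logic, not how DataStage implements the Lookup stage. The function name lookup, the on_failure values, and the sample rows are assumptions made up for the example; only the column names Item, Description, and ItemDescription and the string NOMATCH come from the exercise.

```python
def lookup(stream_rows, reference_rows, key, on_failure="fail"):
    """Equality lookup with fail / continue / reject failure handling (illustrative)."""
    # Build an in-memory table keyed on the lookup column.
    table = {ref[key]: ref for ref in reference_rows}
    output, rejects = [], []
    for row in stream_rows:
        match = table.get(row[key])
        if match is not None:
            output.append({**row, "ItemDescription": match["Description"]})
        elif on_failure == "continue":
            # Pass the row through with an empty (NULL) lookup column.
            output.append({**row, "ItemDescription": None})
        elif on_failure == "reject":
            # Send the unmatched row down the reject link instead.
            rejects.append(row)
        else:
            # Default: the failure is fatal, as in the first task.
            raise LookupError(f"no match for {key}={row[key]!r}")
    return output, rejects


# Hypothetical sample data, not taken from the lab files.
warehouse = [{"Item": "A1", "Onhand": 5}, {"Item": "Z9", "Onhand": 2}]
items = [{"Item": "A1", "Description": "Widget"}]

continued, _ = lookup(warehouse, items, "Item", on_failure="continue")
# Mimic the target stage's null field value of NOMATCH.
written = [r["ItemDescription"] if r["ItemDescription"] is not None else "NOMATCH"
           for r in continued]
kept, rejected = lookup(warehouse, items, "Item", on_failure="reject")
```

With continue, every source row reaches the target and unmatched rows carry an empty description. With reject, unmatched rows leave the main output and appear only on the reject link.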

Exercise 11. Range lookups

What this exercise is about


This exercise covers range lookups in the Lookup stage.

What you should be able to do


At the end of this exercise, you should be able to:
Design a job with a reference link range lookup.
Design a job with a stream range lookup.

Introduction
A major capability of the Lookup stage is the range lookup. Two types of
range lookups are supported: those in which the range is specified on
the reference link and those in which the range is specified on the
stream link.

Requirements
No new requirements.

Task: Design a job with a reference link range lookup


This job reads Warehouse item records from the source file. The lookup file contains start
and end item numbers with descriptions that apply to items within the specified range. The
appropriate description is added to each record, which is then written out to a sequential
file.
__ 1. Open your LookupWarehouseItem job and save it under the name
LookupWarehouseItemRangeRef. Save in the _Training>Jobs folder. Rename
the stages and links as shown.

__ 2. Import the table definition for the Range_Descriptions.txt sequential file. The
StartItem and EndItem fields should be defined like the Item field is defined in the
Warehouse stage, namely, as VarChar(255).

__ 3. Edit the Range_Description Sequential File stage to read from the
Range_Descriptions.txt file by setting the properties and changing the format settings
appropriately. When loading the new column definitions, delete the existing columns
first. Verify that you can view the data.

__ 4. Open the Lookup stage. Edit the Description column on the left and the
ItemDescription column on the right so that both are nullable.

__ 5. Select the Range checkbox to the left of the Item field in the Warehouse table
window.
__ 6. Double-click on the Key Expression cell for the Item column to open the Range
Expression editor. Specify that the Warehouse.Item column value is to be greater
than or equal to the StartItem column value and less than the EndItem column
value.

__ 7. Open the Constraints window and specify that the job is to continue if a lookup
failure occurs.
__ 8. Edit the target Sequential File stage. The ItemDescription column in the Sequential
File stage is nullable. Go to the extended properties window for this column.
Replace NULL values by the string NO_DESCRIPTION.

__ 9. Compile and run your job.


__ 10. Click the right mouse button over the stage and then select the option to view the
data in the target stage to verify the results.
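
The range condition from step 6 (Item greater than or equal to StartItem and less than EndItem) can be sketched in Python as follows. This is a rough model under stated assumptions: the function name and sample rows are hypothetical, the comparison is Python's plain string comparison (the columns are VarChar in the exercise), and the replacement of NULL by NO_DESCRIPTION, which the job does in the target stage, is folded into the same function for brevity.

```python
def reference_range_lookup(warehouse_rows, range_rows):
    """For each warehouse row, find the first range with StartItem <= Item < EndItem."""
    output = []
    for row in warehouse_rows:
        description = None
        for rng in range_rows:
            # Inclusive lower bound, exclusive upper bound, as in step 6.
            if rng["StartItem"] <= row["Item"] < rng["EndItem"]:
                description = rng["Description"]
                break
        # Lookup failure with continue: NULL, then replaced in the target stage.
        if description is None:
            description = "NO_DESCRIPTION"
        output.append({**row, "ItemDescription": description})
    return output


# Hypothetical sample data, not taken from the lab files.
ranges = [
    {"StartItem": "1000", "EndItem": "2000", "Description": "Hardware"},
    {"StartItem": "2000", "EndItem": "3000", "Description": "Garden"},
]
warehouse = [{"Item": "1500"}, {"Item": "2000"}, {"Item": "9999"}]
described = reference_range_lookup(warehouse, ranges)
```

Because the upper bound is exclusive, item 2000 falls in the second range rather than the first, and item 9999 matches no range at all.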

Task: Design a job with a stream range lookup


This job reads from the Range_Descriptions.txt file. It then does a lookup into the
Warehouse.txt file. For each row read, it selects all the records from the Warehouse.txt
file with items within the range. The appropriate description is added to each record which
is then written out to a file.
__ 1. Save your job as LookupItemsRangeStream in your _Training>Jobs folder.
__ 2. Reverse the source and lookup links. First make the source link a reference link.
Click the right mouse button and click Convert to reference. Then make the lookup
link a stream link.

__ 3. Open up your Lookup stage. Select the Item column in the Warehouse table as the
key. Specify the Key type as Range.
__ 4. Double-click on the Key Expression cell next to Item. Specify the range
expression.

__ 5. Click the Constraints icon. Specify that multiple rows are to be returned from the
Warehouse link. Also specify that the job is to continue if there is a lookup failure.

__ 6. Compile and run your job.


__ 7. View the data to verify the results.
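
When the range sits on the stream link, one input row can match many reference rows, which is why step 5 asks for multiple rows to be returned. The sketch below illustrates that idea only. The exercise does not spell out the range expression for this task, so the bounds used here (inclusive start, exclusive end) are an assumption carried over from the previous task, and the function name, the sample data, and the shape of the row emitted on a lookup failure are hypothetical.

```python
def stream_range_lookup(range_rows, warehouse_rows):
    """For each range (stream) row, return every warehouse row inside the range."""
    output = []
    for rng in range_rows:
        matches = [w for w in warehouse_rows
                   if rng["StartItem"] <= w["Item"] < rng["EndItem"]]
        if not matches:
            # Continue on lookup failure: keep the stream row, lookup column empty.
            output.append({"Item": None, "ItemDescription": rng["Description"]})
        for w in matches:
            # Multiple rows returned: one output row per matching warehouse row.
            output.append({"Item": w["Item"], "ItemDescription": rng["Description"]})
    return output


# Hypothetical sample data, not taken from the lab files.
ranges = [
    {"StartItem": "1000", "EndItem": "2000", "Description": "Hardware"},
    {"StartItem": "2000", "EndItem": "3000", "Description": "Garden"},
    {"StartItem": "5000", "EndItem": "6000", "Description": "Toys"},
]
warehouse = [{"Item": "1200"}, {"Item": "1500"}, {"Item": "2500"}]
pairs = stream_range_lookup(ranges, warehouse)
```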

Exercise 12. Using the Join, Merge, and Funnel stages

What this exercise is about


This exercise covers three other stages besides the Lookup stage that
can be used to combine data.

What you should be able to do


At the end of this exercise, you should be able to:
Use the Join stage in a job.
Use the Merge stage in a job.
Use the Funnel stage in a job.

Introduction
Several stages can be used to combine data. This exercise looks at
the Join, Merge, and Funnel stages.

Requirements
You have a working LookupWarehouseItem job.

Task: Use the Join stage in a job


__ 1. Open your LookupWarehouseItem job. Save it as JoinWarehouseItem.
__ 2. Delete the Lookup stage and replace it with a Join stage available from the
Processing folder in the palette. (Just delete the Lookup stage, drag over a Join
stage, and then reconnect the links.)

__ 3. Verify that you can view the data in the Warehouse stage.
__ 4. Verify that you can view the data in the Items stage.
__ 5. Open the Join stage. Join by Item. Specify a Right Outer join.

__ 6. Click the Link Ordering tab. Make Warehouse the Right link by selecting either
Items or Warehouse and clicking an up or down arrow as appropriate.

__ 7. Click the Output>Mapping tab. Be sure all columns are mapped to the output.

__ 8. Edit the target Sequential File stage. Edit or confirm that the job writes to a file
named WarehouseItems.txt in your lab files Temp directory.

__ 9. Compile and run. Verify that the number of records written to the target sequential
file is the same as the number read from the Warehouse.txt file, since this is a Right
Outer join.

__ 10. View the data. Verify that the description has been joined onto each Warehouse
file record.
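
The record-count check in step 9 follows from how a right outer join works: every row from the right link is kept whether or not it finds a match. The Python sketch below illustrates this. It is not the Join stage's implementation; the function name and the sample rows are hypothetical, and None stands in for whatever empty value the stage writes for unmatched columns.

```python
def right_outer_join(left_rows, right_rows, key):
    """Keep every right row; attach left columns where the key matches."""
    left_columns = {c for row in left_rows for c in row if c != key}
    left_by_key = {}
    for row in left_rows:
        left_by_key.setdefault(row[key], []).append(row)
    joined = []
    for right in right_rows:
        matches = left_by_key.get(right[key])
        if matches:
            joined.extend({**left, **right} for left in matches)
        else:
            # No match on the left: keep the right row, left columns empty.
            joined.append({**dict.fromkeys(left_columns), **right})
    return joined


# Hypothetical sample data: Items is the left link, Warehouse the right link.
items = [{"Item": "A1", "Description": "Widget"}]
warehouse = [
    {"Item": "A1", "Onhand": 5},
    {"Item": "Z9", "Onhand": 2},
    {"Item": "A1", "Onhand": 7},
]
result = right_outer_join(items, warehouse, "Item")
```

Note that the output count equals the right-link count only when the left link has no duplicate keys; a duplicated key on the left would multiply the matching right rows.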

Task: Use the Merge stage in a job


In this task, we will see if the Merge stage can be used in place of the Join stage. We will
see that it cannot be successfully used.
__ 1. Save your job as MergeWarehouseItem. Replace the Join stage by the Merge
stage. (Just delete the Join stage, drag over a Merge stage, and then reconnect the
links.)

__ 2. In the Merge stage, specify that data is to be merged, with case sensitivity, by the
key (Item). Assume that the data is sorted in ascending order. Also specify that
unmatched records from Warehouse (the master link) are to be dropped.

__ 3. On the Link Ordering tab, ensure that the Warehouse link is the master link.

__ 4. On the Output>Mapping tab, be sure that all input columns are mapped to the
appropriate output columns.

__ 5. Compile and run. View the data.


__ 6. View the job log. Notice that a number of master records have been dropped
because they are duplicates. Recall that the Merge stage requires the master data
to be duplicate-free in the key column. A number of update records have also been
dropped because they did not match master records.

__ 7. The moral here is that you cannot use the Merge stage if your Master source has
duplicates. None of the duplicate records will match with update records.
__ 8. Recall that another requirement of the Merge stage (and Join stage) is that the data
is hash partitioned and sorted by the key. We did not do this explicitly, so why did our
job not fail? Let us examine the job log for clues. Open up the Score message.
__ 9. Notice that hash partitioners and sorts (tsort operators) have been inserted by
DataStage.
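
Steps 8 and 9 rely on hash partitioning followed by a sort on the key. The reason this works is that rows with the same key value always land in the same partition, so matching master and update rows meet on the same node. The sketch below shows the idea in Python. The hash used here (zlib.crc32) is only a stand-in chosen for the example, not the function DataStage uses, and the function names and sample rows are hypothetical.

```python
import zlib


def partition_index(value, num_partitions):
    """Map a key value to a partition number (illustrative hash only)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


def hash_partition_and_sort(rows, key, num_partitions):
    """Distribute rows by key hash, then sort each partition by the key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_index(row[key], num_partitions)].append(row)
    return [sorted(part, key=lambda r: r[key]) for part in partitions]


# Hypothetical sample data for a two-node configuration.
master = [{"Item": "B2"}, {"Item": "A1"}, {"Item": "C3"}]
update = [{"Item": "C3", "Description": "Bolt"},
          {"Item": "A1", "Description": "Widget"}]
master_parts = hash_partition_and_sort(master, "Item", 2)
update_parts = hash_partition_and_sort(update, "Item", 2)
```

Because both inputs go through the same function, an update row for item A1 ends up in the same partition number as the master row for item A1.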

Task: Use the Funnel stage in a job


In this task, you will funnel rows from two input files into a single file.
__ 1. Open a new parallel job and save it as FunnelWarehouse. Add links and stages
and name them as shown.

__ 2. Edit the two source Sequential File stages to, respectively, extract data from the two
Warehouse files, Warehouse_031005_01.txt and Warehouse_031005_02.txt.
They have the same format and column definitions as the Warehouse.txt file.
__ 3. Edit the Funnel stage to combine data from the two files in Continuous Funnel
mode.

__ 4. On the Output>Mapping tab, map all columns through the stage.


__ 5. In the target stage, write to a file named TargetFile.txt in the Temp directory.

__ 6. Compile and run. Verify that the number of rows going into the target is the sum of
the number of rows coming from the two sources. Then view the result data.

Exercise 13. Group processing stages

What this exercise is about


This exercise covers stages that process groups of data, including the
Sort, Aggregator, and Remove Duplicates stages.

What you should be able to do


At the end of this exercise, you should be able to:
Create a job that uses Sort, Aggregator, and Remove Duplicates
stages.
Create a Fork-Join job design.

Introduction
In this exercise you will create a fairly complex job that contains all the
group processing stages listed above.

Requirements
No new requirements.

Task: Create the job design


__ 1. Open a new parallel job and save it as ForkJoin. Add stages and links and name
them as shown.

__ 2. Edit the Selling_Group_Mapping_Dups Sequential File stage to read from the
Selling_Group_Mapping_Dups.txt file. It has the same format as the
Selling_Group_Mapping.txt file.
__ 3. Edit the Sort_By_Code Sort stage. Perform an ascending sort by
Selling_Group_Code. The sort should not be a stable sort. Send all columns
through the stage.

__ 4. In the Copy stage, specify that all columns move through the stage to the output link
going to the Join stage. If necessary, review the instructions in Exercise 6 for
configuring a Copy stage.

__ 5. Specify that only the Selling_Group_Code column moves through the Copy stage
to the Aggregator stage.
__ 6. Edit the Aggregator stage. Specify that records are to be grouped by
Selling_Group_Code.
__ 7. Specify that the type of aggregation is Count Rows.
__ 8. Specify that the aggregation amount is to go into a column named CountGroup.
Define this column on the Output>Columns tab as an integer, length 10.
__ 9. Select Sort as the aggregation method, because the data has been sorted by the
grouping key column.

__ 10. On the Output>Mapping tab, send out the key column and the result column.

__ 11. Edit the Join stage. The join key is Selling_Group_Code. The join type is Left
Outer. Verify on the Link Ordering tab that the CopyToJoin link is the left link.

__ 12. On the Output>Mapping tab, map all columns across.

__ 13. Edit the Sort_By_Handling_Code stage. The key column of Selling_Group_Code
has already been sorted, so specify Don't Sort (Previously Sorted) for that key
column. Add Special_Handling_Code as an additional sort key. Turn off stable
sort. On the Output>Mapping tab, move all columns through the stage.

__ 14. On the Input>Partitioning tab, select Same to guarantee that the partitioning going
into the stage will not change.

__ 15. Edit the Remove Duplicates stage. Group by Selling_Group_Code. Retain the last
record in each group. On the Output>Mapping tab, move all columns through the
stage.

__ 16. Edit the target Sequential stage. Write to a file named
Selling_Group_Code_Deduped.txt in the lab files Temp directory. On the
Partitioning tab, collect the data using Sort Merge based on the two columns by
which the data has been sorted. Double-click the columns to move them to the
Selected box.

__ 17. Compile and run. View the job log to check whether there are any problems.

__ 18. View the results. There should be fewer rows going into the target stage than the
number coming out of the source stage, because the duplicate records have been
eliminated.

__ 19. View the data in the target stage. Look at the CountGroup column to see that
some rows have a count greater than one.
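
The whole Fork-Join design can be summarized as: sort, count rows per group on one branch, join the counts back onto the full rows, sort within each group, and keep the last row of each group. The Python sketch below walks through those steps on a single partition. It is an illustration only; the function name and sample rows are hypothetical, and partitioning and collecting are left out.

```python
from collections import Counter


def fork_join_dedupe(rows):
    """Single-partition model of the ForkJoin job (illustrative)."""
    # Sort_By_Code: sort by the grouping key.
    rows = sorted(rows, key=lambda r: r["Selling_Group_Code"])
    # Aggregator branch: count rows per Selling_Group_Code.
    counts = Counter(r["Selling_Group_Code"] for r in rows)
    # Join: attach the group count to every row (left outer join on the key).
    joined = [{**r, "CountGroup": counts[r["Selling_Group_Code"]]} for r in rows]
    # Sort_By_Handling_Code: order rows within each group by the second key.
    joined.sort(key=lambda r: (r["Selling_Group_Code"], r["Special_Handling_Code"]))
    # Remove Duplicates: retain the last record in each group.
    last_in_group = {}
    for r in joined:
        last_in_group[r["Selling_Group_Code"]] = r
    return list(last_in_group.values())


# Hypothetical sample data, not taken from the lab files.
source = [
    {"Selling_Group_Code": "SG1", "Special_Handling_Code": 3},
    {"Selling_Group_Code": "SG2", "Special_Handling_Code": 1},
    {"Selling_Group_Code": "SG1", "Special_Handling_Code": 5},
]
deduped = fork_join_dedupe(source)
```

In this sample, SG1 appears twice in the source, so its surviving row carries a CountGroup of 2 and the highest Special_Handling_Code of its group.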

Exercise 14. Defining a constraint

What this exercise is about


This exercise covers how to define constraints in the Transformer
stage.

What you should be able to do


At the end of this exercise, you should be able to:
Define constraints in a Transformer in a job.
Create an otherwise link.

Introduction
This unit as a whole introduces the functionality of the Transformer.
This lab exercise covers how to define constraints.

Requirements
No new requirements.

Task: Define Transformer Constraints


__ 1. Create a new parallel job and save it as TransSellingGroup.
__ 2. Add a Sequential File stage, a Transformer stage, and two target Sequential File
stages to the canvas. Name the links and stages as shown.

__ 3. Open the source Sequential File stage. Edit it to read data from the
Selling_Group_Mapping_RangeError.txt file. It has the same metadata as the
Selling_Group_Mapping.txt file.

__ 4. Open up the Transformer. Drag all the input columns across to both output link
windows.

__ 5. Double-click to the right of the word Constraint in either output link window. This
opens the Transformer Stage Constraints window.

__ 6. Double-click on the Constraint cell to the right of the LowCode link name to open
the Expression Editor. Click on the box with the ellipsis to choose pre-defined fields
and code. Use the Editor to define a condition that selects just rows with special
handling codes between 0 and 2 inclusive.
__ 7. Double-click on the Constraint cell to the right of the HighCode link name to open
the Expression Editor. Use the Editor to define a condition that selects just rows with
special handling codes between 3 and 6 inclusive.

__ 8. Edit the LowCode target Sequential File stage to write to a file named LowCode.txt
in the lab files Temp directory.
__ 9. Edit the HighCode target Sequential File stage to write to a file named
HighCode.txt in the lab files Temp directory.
__ 10. Compile and run your job.
__ 11. View the data in your target files to verify that they each contain the right rows. Here
is the LowCode.txt file data. Notice that it only contains rows with special handling
codes between 0 and 2.

Task: Use an Otherwise Link to capture range errors in the data
__ 1. Save your job as TransSellingGroupOtherwise.
__ 2. Add an additional link from the Transformer to another Sequential File stage and
label the new stage and link RangeErrors.

__ 3. In the Transformer, drag all input columns across to the new target link.

__ 4. Click on the Output Link Execution Order icon.

__ 5. Reorder the links so that the RangeErrors link is last in output link ordering.
(Depending on how you drew your links, this link may already be last.)

__ 6. Open the Constraints window. Select the Otherwise box to the right of your
RangeErrors link.

__ 7. Edit the RangeErrors Sequential File stage as needed to write to the
RangeErrors.txt file in the lab files Temp directory.

__ 8. Compile and run your job. There should be a few range errors.
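
The routing logic built in this exercise can be sketched in Python: each constraint is tested in link order, and the Otherwise link receives only the rows that satisfied none of the earlier constraints. The function name and the sample codes are hypothetical; the link names and the two ranges (0 to 2 and 3 to 6, inclusive) come from the tasks above.

```python
def route(rows):
    """Send each row to LowCode, HighCode, or the RangeErrors otherwise link."""
    low_code, high_code, range_errors = [], [], []
    for row in rows:
        code = row["Special_Handling_Code"]
        matched = False
        # LowCode constraint: codes between 0 and 2 inclusive.
        if 0 <= code <= 2:
            low_code.append(row)
            matched = True
        # HighCode constraint: codes between 3 and 6 inclusive.
        if 3 <= code <= 6:
            high_code.append(row)
            matched = True
        # Otherwise link: evaluated last, taken only if nothing else matched.
        if not matched:
            range_errors.append(row)
    return low_code, high_code, range_errors


# Hypothetical sample data, not taken from the lab files.
sample = [{"Special_Handling_Code": c} for c in (0, 2, 3, 6, 7, -1)]
low, high, errors = route(sample)
```

This is why the RangeErrors link must come last in the output link ordering: it can only be decided after every other constraint has been evaluated.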

Exercise 15. Define derivations

What this exercise is about


This exercise covers how to define derivations in the Transformer
stage.

What you should be able to do


At the end of this exercise, you should be able to:
Define a stage variable.
Build a formatting derivation.
Use functions in derivations.
Build a conditional replacement derivation.
Specify null processing options.
Capture rejects.

Introduction
This unit as a whole introduces the functionality of the Transformer.
This lab exercise covers how to define derivations.

Requirements
You have a working TransSellingGroupOtherwise job from the previous lab.

Task: Build a formatting derivation


__ 1. Open up your TransSellingGroupOtherwise job and save it as
TransSellingGroupDerivations.

__ 2. Open the Transformer.

__ 3. Click the Stage Properties icon in the top left corner. Then click the Stage
Variables tab.

__ 4. Create a stage variable named HCDesc. Set its initial value to the empty string. Its
SQL type is VarChar, precision 255.

__ 5. Close the Transformer Stage Properties window. The name of the stage variable
shows up in the Stage Variables window.

__ 6. Double-click in the cell to the left of the HCDesc stage variable. Define a derivation
that places each row's special handling code within a string of the following form:
Handling code = [xxx]. Here xxx is the value in the Special_Handling_Code
column.

__ 7. Create a new VarChar(255) column named Handling_Code_Description for each
of the LowCode and HighCode output links. You can create these on the
corresponding tabs at the bottom of the Transformer window.

__ 8. Drag the value of the HCDesc stage variable to each of these link columns.

__ 9. Compile and run. View the data in the output files.
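
The formatting derivation from step 6 is a simple string concatenation around the column value. A Python equivalent is shown below for illustration; the function name is hypothetical. In the Transformer itself the same idea would be written in the expression language rather than Python.

```python
def hc_desc(special_handling_code):
    """Build the HCDesc value: 'Handling code = [xxx]'."""
    return "Handling code = [" + str(special_handling_code) + "]"


example = hc_desc(3)
```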

Task: Use a function in a derivation


__ 1. Open the Transformer.
__ 2. In the derivation for the Distribution_Channel_Description target column in the
LowCode output link, turn the output text to uppercase and trim the string of any
blanks.

__ 3. Compile, run, and view the results.
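
The derivation in step 2 composes two functions: one that removes blanks and one that converts to uppercase. The Python sketch below shows the composition. It assumes that only leading and trailing blanks need to be removed; the exact behavior of the Transformer's trim function may differ, so treat this as an approximation. The function name and sample value are hypothetical.

```python
def clean_description(value):
    """Strip leading and trailing blanks, then convert to uppercase."""
    return value.strip(" ").upper()


cleaned = clean_description("  Food Service ")
```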

Task: Build a conditional replacement derivation


__ 1. Open the Transformer.
__ 2. Write a derivation for the target Selling_Group_Desc columns in both the
LowCode and HighCode output links that replaces SG055 by SH055, leaving
the rest of the description as it is. In other words, SG055 Live Swine, for example,
becomes SH055 Live Swine.

Hint

Use the IF THEN ELSE operator. You may also need to use the substring operator and
the Len function.

__ 3. Compile, run, and test your job. Here is some of the output from the HighCode
stage. Notice specifically, the row (550000), which shows the replacement of SG055
with SH055 in the second column.
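
The hint describes a three-part approach: test whether the value starts with SG055, and if so, glue SH055 onto the remainder of the string, whose start position is found from the length of the prefix. The Python sketch below follows that logic. The function name and the second sample value are hypothetical; the strings SG055, SH055, and "SG055 Live Swine" come from the task.

```python
def replace_prefix(description, old="SG055", new="SH055"):
    """Replace a leading SG055 with SH055, leaving the rest unchanged."""
    # IF the first Len(old) characters equal old ...
    if description[:len(old)] == old:
        # ... THEN new prefix plus the substring after the old prefix ...
        return new + description[len(old):]
    # ... ELSE the description as it is.
    return description


changed = replace_prefix("SG055 Live Swine")
unchanged = replace_prefix("SG100 Other Goods")
```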

Task: Capture rejects


__ 1. Save your job as TransSellingGroupRejects.
__ 2. Add another output link to a Peek stage. Name the link Rejects and the stage
Peek_Rejects.
__ 3. Right-click over the link and then click Convert to reject.

__ 4. Open up the Transformer and then click the Stage Properties icon (top left). Select
the Legacy null processing box (if it is not already selected).

__ 5. Compile and run your job. Your job probably will not have any rejects.

Exercise 16. Loop processing

What this exercise is about


This exercise covers how to process rows in a loop in the Transformer
stage.

What you should be able to do


At the end of this exercise, you should be able to:
Create loop variables.
Create loop conditions.
Process input rows through a loop.

Introduction
When processing input data in loops, each input row may result in
multiple output rows. In this exercise you will process input rows that
contain lists of colors. For each input row, you will extract a color from
the list and write it out as a separate row.

Requirements
No new requirements.


Task: Pivot
__ 1. The source data is contained in the ColorMappings.txt file. Each Item number is
followed by a list of colors.

__ 2. Create a new parallel job named TransPivot. Name the links and stages as shown.

__ 3. Import the table definition for the ColorMappings.txt file. Store it in your
_Training>Metadata folder.

__ 4. Open the ColorMappings stage. Edit the stage so that it reads from the
ColorMappings.txt file. Verify that you can view the data.

__ 5. Open the Transformer stage. Drag the Item column across to the output link.
__ 6. Create a new VarChar(10) column named Color.

__ 7. Create a new integer stage variable named NumColors. This will store the number
of colors in the list of colors.


__ 8. Use the Count function to count the number of occurrences of the substring | in
the Colors input column. Store the result in the NumColors stage variable. Note
that the number of | delimiters in the color list is one less than the number of
colors.

__ 9. Open the Loop Condition window. Open the Expression Editor in the Loop While
box. Specify a loop condition that will iterate for each color. The total number of
iterations is stored in the NumColors stage variable. Use the @ITERATION system
variable.

__ 10. Create a new VarChar(10) loop variable named Color.

__ 11. For each iteration, store the corresponding color from the colors list in the Color
loop variable. Use the Field function to retrieve the color from the colors list.

__ 12. Drag the Color loop variable down to the derivation cell next to the Color output link
column.

__ 13. Edit the target stage to write to a sequential file named ItemColor.txt in your lab
files Temp directory.
__ 14. Compile and run your job. You should see more rows going into the target file than
coming out of the source file.


__ 15. View the data in the target stage. You should see multiple rows for each item
number.

__ 16. Test that you have the right results. For example, count the number of rows for item
16. There should be four, because the original item 16 has a list of four colors.
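
The loop logic built in steps 7 through 12 can be sketched outside DataStage. The Python below is only an illustration of the idea, not Transformer syntax. The function name pivot_colors and the sample row are hypothetical, and the sketch assumes that NumColors holds the delimiter count plus one and that the loop runs while @ITERATION is less than or equal to NumColors.

```python
def pivot_colors(item, colors, delimiter="|"):
    """Return one (item, color) row per color in a delimited list."""
    # Stage variable NumColors: the delimiter count plus one gives the number of colors.
    num_colors = colors.count(delimiter) + 1
    rows = []
    iteration = 1  # stands in for @ITERATION, which starts at 1
    while iteration <= num_colors:  # the Loop While condition
        # Loop variable Color: 1-based field extraction, in the spirit of the Field function.
        color = colors.split(delimiter)[iteration - 1]
        rows.append((item, color))
        iteration += 1
    return rows


if __name__ == "__main__":
    # Hypothetical input row; the real data comes from ColorMappings.txt.
    for row in pivot_colors(16, "Red|Green|Blue|Black"):
        print(row)
```

Each input row yields as many output rows as it has colors, which is why the target file receives more rows than the source file supplies.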

Exercise 17. Group processing in the Transformer

What this exercise is about


The first part of this exercise covers how to process groups of data
rows in a Transformer. The second part covers the DataStage parallel
job debugger.

What you should be able to do


At the end of this exercise, you should be able to:
Use the LastRowInGroup function to determine when you are at
the end of a group.
Use the SaveInputRecord and GetSavedInputRecord functions.
Use the parallel job debugger to set and edit breakpoints on links in
a parallel job.
Run a job in the parallel job debugger
Examine the data in link columns at a breakpoint

Introduction
Several functions are available in the Transformer for processing
groups of records. This exercise demonstrates how they
can be used. In this example, a group result is added to each
individual row.
In the second part of this exercise you will use the parallel job
debugger to debug a DataStage parallel job. You set breakpoints on
the links that contain the data you want to examine. You specify
conditions for the breakpoints. Then you run the job using the
debugger.

Requirements
Your lab files folder contains the Selling_Group_Mapping_Debug.txt
file. You have a working TransSellingGroupOtherwise job.


Task: Process groups in a Transformer


__ 1. Create a new job named TransGroup. Name the links and stages as shown.

__ 2. Import a table definition for the ItemColor.txt file that you created in the previous
lab. Reminder: This file is located in the Temp directory rather than the
DSEss_Files directory. (If you did not previously create this file, you can use the
ItemColor_Copy.txt file in your lab files directory.) Below, a portion of the file is
displayed.

__ 3. Edit the source Sequential File stage to read data from the ItemColor.txt file.

__ 4. Edit the Sort stage. Sort the data by the Item column.

__ 5. On the Sort stage Output>Mapping tab, drag all columns across.


__ 6. On the Sort Input>Partitioning tab, hash partition by the Item column.

__ 7. Open the Transformer stage. Drag the Item column across to the output link. Define
a new column named Colors as a VarChar(255).

__ 8. Create a Char(1) stage variable named IsLastInGroup. Initialize it with 'N' (meaning
No).

__ 9. Create a VarChar(255) stage variable named TotalColorList. Initialize it with the
empty string.
__ 10. Create a VarChar(255) stage variable named CurrentColorList. Initialize it with the
empty string.

__ 11. For the derivation of IsLastInGroup, use the LastRowInGroup() function on the
Item column to determine if the current row is the last in the current group of Items.
If so, return 'Y' (meaning Yes); else return 'N'.

__ 12. For the derivation of TotalColorList, return the concatenation of CurrentColorList and
the current color when the last row in the group is being processed. Otherwise,
return the empty string.

__ 13. For the derivation of CurrentColorList, return the concatenation of CurrentColorList and
the current color when the last row in the group is not being processed. When
the last row is being processed, return the empty string.


__ 14. Drag the TotalColorList stage variable down to the cell next to Colors in the target
link.
__ 15. Define a constraint for the target link. Add the text IsLastInGroup = 'Y' to output a row
only when the last row in the group is being processed.

__ 16. Click OK to close the Transformer.

__ 17. Edit the target Sequential File stage. Write to a file named ColorMappings2.txt in
your lab files Temp directory.

__ 18. Compile and run your job. Check the job log for error messages.


__ 19. View the data in your target stage. For each set of Item rows in the input file, you
should have a single row in the target file followed by a comma-delimited list of its
colors.
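
The interplay of the three stage variables and the output constraint can be traced with a small simulation. This is a sketch in Python, not Transformer code. The function collapse_groups and its sample data are hypothetical. The sketch assumes the rows arrive sorted by Item, that a comma separates the colors, and that no separator is added before the first color in a group.

```python
def collapse_groups(rows):
    """rows: (item, color) pairs already sorted by item. Returns (item, colors) rows."""
    output = []
    current_color_list = ""  # stage variable CurrentColorList, initialized to empty
    for index, (item, color) in enumerate(rows):
        # IsLastInGroup: 'Y' when the next row has a different item (or there is none).
        is_next_same = index + 1 < len(rows) and rows[index + 1][0] == item
        is_last_in_group = "N" if is_next_same else "Y"

        # Append the current color to the list built so far.
        appended = color if current_color_list == "" else current_color_list + "," + color

        # TotalColorList: the full list on the last row of the group, otherwise empty.
        total_color_list = appended if is_last_in_group == "Y" else ""
        # CurrentColorList: keep accumulating, or reset once the group ends.
        current_color_list = "" if is_last_in_group == "Y" else appended

        # Output link constraint: IsLastInGroup = 'Y'
        if is_last_in_group == "Y":
            output.append((item, total_color_list))
    return output


if __name__ == "__main__":
    sample = [(1, "Red"), (1, "Blue"), (2, "Green")]  # hypothetical sorted input
    print(collapse_groups(sample))
```

The order matters: TotalColorList is derived before CurrentColorList is reset, mirroring the top-to-bottom evaluation of stage variables.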

Task: Add group results to individual group records


__ 1. Save your job as TransGroupLoop.
__ 2. Open the Transformer stage.
__ 3. Add a new integer stage variable named NumSavedRows.

__ 4. For its derivation invoke the SaveInputRecord() function, found in the Utility folder.
This saves a copy of the row into the Transformer stage queue.

__ 5. Define the loop condition. Iterate through the saved rows after the last row in the
group is reached.

__ 6. Define an integer loop variable named SavedRowIndex.


__ 7. For its derivation invoke the GetSavedInputRecord() function in the Utility folder.
This retrieves a copy of the row from the Transformer stage queue.

__ 8. Drag the Color column across from the input link to the target output link. Put the
column second in the list of output columns.
__ 9. Remove the output link constraint.

__ 10. Compile and run. Check the job log for errors. View the data in the output.
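
The save-and-replay pattern in this task can also be sketched in Python. The sketch is an approximation under stated assumptions: a plain list stands in for the Transformer's saved-row queue, appending to it stands in for SaveInputRecord(), and removing the oldest entry stands in for GetSavedInputRecord(). The function add_group_result and its sample data are hypothetical.

```python
def add_group_result(rows):
    """rows: (item, color) pairs sorted by item. Returns (item, color, colors) rows."""
    output = []
    saved_rows = []  # stands in for the Transformer's saved-row queue
    current_color_list = ""
    for index, (item, color) in enumerate(rows):
        saved_rows.append((item, color))  # plays the role of SaveInputRecord()
        num_saved_rows = len(saved_rows)  # stage variable NumSavedRows

        is_next_same = index + 1 < len(rows) and rows[index + 1][0] == item
        appended = color if current_color_list == "" else current_color_list + "," + color
        if is_next_same:
            # Not the last row in the group: the loop condition is false, so no output yet.
            current_color_list = appended
            continue

        # Last row in the group: iterate once per saved row.
        iteration = 1
        while iteration <= num_saved_rows:
            saved_item, saved_color = saved_rows.pop(0)  # plays the role of GetSavedInputRecord()
            output.append((saved_item, saved_color, appended))
            iteration += 1
        current_color_list = ""
    return output


if __name__ == "__main__":
    sample = [(1, "Red"), (1, "Blue"), (2, "Green")]  # hypothetical sorted input
    for row in add_group_result(sample):
        print(row)
```

Every input row now appears in the output, each carrying the complete color list of its group.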

DataStage parallel job debugger

Task: Debug a DataStage parallel job


__ 1. Open up your TransSellingGroupOtherwise job and save it as
TransSellingGroupDebug.

Note

If you do not have a working copy of the TransSellingGroupOtherwise job, import the
TransSellingGroupOtherwise.dsx job in your lab files dsxfiles directory.

__ 2. Open up your source stage. Set the stage to read from the
Selling_Group_Mapping_Debug.txt file.
__ 3. Create a job parameter named Channel. Make it a string with a default value of
"Food Service", with the quotes.

__ 4. In the Transformer, open up the Constraints window. Add to the LowCode and
HighCode constraints the condition that the Distribution_Channel_Description
column value matches the Channel parameter value.


__ 5. Compile the job.


__ 6. Click Debug>Debug Window.
__ 7. Select the LowCode output link and then click the Toggle Breakpoint icon in the
Debug window. Repeat for the HighCode and RangeErrors links. Verify that the
breakpoint icon has been added to the link on the diagram.

__ 8. Select the RangeErrors link and then click the Edit Breakpoints icon in the Debug
window. Set the breakpoint Expression to break when
Distribution_Channel_Description equals Food Service.

__ 9. Similarly, set the LowCode and HighCode link breakpoint expressions to break
when Distribution_Channel_Description does not equal Food Service.
__ 10. Click the Start/Continue icon in the Debug window. When prompted for the job
parameter value, accept the default and click OK.

__ 11. Notice that the debugger stops at the RangeErrors link. The column values are
displayed in the Debug window. Click on the Node 1 and Node 2 tabs to view the
data values for both nodes. Notice that each seems to have the correct value in
the Distribution_Channel_Description column, and the Special_Handling_Code
is not out of range. So why are these values going out the otherwise link instead of
down the LowCode link?

__ 12. In the Debug window, right-click over the Distribution_Channel_Description
column, and then click Add to Watch List. This way you can highlight the
values for the column in both nodes.

__ 13. Click Run to End in the Debug window to see where the other rows go. The job
finishes and all the rows go down the otherwise link. But why? This should not
happen.


__ 14. Click the Start/Continue icon in the Debug window to start the job again. This time,
remove the quotes from around Food Service when prompted for the job
parameter value.

__ 15. Things definitely look better this time. More rows have gone down the LowCode link
and the breakpoint for the LowCode link has not been activated. The breakpoint for
the otherwise link has been activated. Since the Special_Handling_Code value is
out of range, this is as things should be.

__ 16. Click the Run to End icon in the Debug window to continue the job. This time the
job completes.

__ 17. View the data in the LowCode file to verify that it contains only Food Service rows.

__ 18. View the data in the RangeErrors file to verify that it does not contain any Food
Service rows that are not out of range. There appear to be several Food Service
rows that should have gone out the LowCode link.

__ 19. See if you can fix the bugs left in the job.

Hint

Try recoding the constraints in the Transformer.
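
The first problem found in the debugging session (steps 13 and 14) comes down to a string comparison. The Python below is a small sketch of why the quoted default value matched no rows, assuming the parameter value is compared to the column value exactly as entered, quotes included. The function matches_channel is hypothetical.

```python
def matches_channel(column_value, channel_parameter):
    """Mimic a constraint that compares a column value to a job parameter value."""
    return column_value == channel_parameter


if __name__ == "__main__":
    row_value = "Food Service"          # value as it appears in the data
    quoted_default = '"Food Service"'   # parameter value entered with the quotes
    plain_value = "Food Service"        # parameter value entered without the quotes

    # The quote characters are part of the string, so the comparison fails.
    print(matches_channel(row_value, quoted_default))
    # Without the quotes the two strings are identical.
    print(matches_channel(row_value, plain_value))
```

With the quotes in place, no row can satisfy the constraint, so every row falls through to the otherwise link.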

Exercise 18. Repository functions

What this exercise is about


This exercise covers Repository functionality in DataStage.

What you should be able to do


At the end of this exercise, you should be able to:
Execute a Quick Find
Execute an Advanced Find
Generate a report
Perform an impact analysis
Find the differences between two jobs
Find the differences between two table definitions

Introduction
This exercise covers Repository functionality in DataStage. In this
exercise you will try out features including Repository Search,
Impact Analysis, and the generation of job and table difference
reports.

Requirements
The screenshots and the results you get will vary unless you have
completed all the previous exercises in this course. If you have not, in
most cases you will still be able to complete the tasks although your
results might differ somewhat.


Task: Execute a Quick Find


1. Click the Open quick find link at the top of the Repository window.
2. In the Name to find box type Lookup*.
3. In the Types to find list select just parallel jobs.
4. Select the Include descriptions box.
5. Click Find. The first found item will be highlighted.

Note

Your results might differ somewhat from what is shown here.

6. Click Next to highlight the next item.
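
The Lookup* search string uses a trailing wildcard. As a rough analogy only, the Python sketch below shows how such a pattern selects names. It uses Python's own fnmatch rules, which are an assumption here and may differ from the Repository's matching rules (for example, in case sensitivity). The function quick_find is hypothetical, and the job names are taken from this course.

```python
from fnmatch import fnmatchcase


def quick_find(names, pattern):
    """Return the names that match a wildcard pattern such as 'Lookup*'."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    job_names = [
        "LookupWarehouseItemRangeRef",
        "TransPivot",
        "TransGroup",
        "relWarehouseItems",
    ]
    print(quick_find(job_names, "Lookup*"))
```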

Task: Execute an Advanced Find


1. Click on the Adv button. This opens the Repository Advanced Find window.
2. In the Name to find field choose Lookup* from the drop down menu. If Lookup*
is not available, type it in the field.
3. Open the Last modification folder. Specify objects modified within the last week
by your user ID.
4. Open up the Where Used folder. Add the Range_Descriptions.txt table
definition. This reduces the list of found items to those that use this table
definition.
5. Select in the Type box just parallel jobs and table definitions.
6. Click Find.

7. Select the found items and then click the right mouse button over them. Export
these jobs to a file named LookupJobs.dsx in your lab files Temp folder.
8. Close the Repository Export window.


9. Click the Results Graphical tab.

10. Explore some of the graphical tools. Expand the graphic. Move the graphic
around by holding down the right mouse button over the graphic and dragging it.
Drag the graphic around by moving the icon in the Bird's Eye view window.
Explore.

Task: Generate a report


1. Click File>Generate report to open a window from which you can generate a
report describing the results of your advanced find.
2. Click on the top link to view the report. This report is saved in the Repository
where it can be viewed by logging onto the Reporting Console.

3. Scroll through the report to view its contents.


Task: Perform an impact analysis


1. In the graphical results window, right click on
LookupWarehouseItemRangeRef. Choose Show dependency path to
'Range_Descriptions.txt'.

2. If necessary, use the Zoom control to adjust the size of the dependency path so
that it fits into the window.

3. Hold your right mouse button over a graphical object and move the path around.
4. Close the Advanced Search window.

Task: Find the differences between two jobs


1. Open your LookupWarehouseItemRangeRef job. Save it as
LookupWarehouseItemRangeRefComp into your _Training>Jobs folder.
2. Make the following changes to the LookupWarehouseItemRangeRefComp
job.
Open up the Range_Description Sequential File stage on the reference
link. On the Columns tab, change the length of the first column
(StartItem) to 111. On the Properties tab, change the First Line is
Column Names to False.
Change the name of the link going to the Warehouse_Items target
Sequential File stage to WAREHOUSE_ITEMS.
Open the Lookup stage. In the constraints window, change the Lookup
Failure condition to Drop.
3. Save the changes to your job.


4. Open up both the LookupWarehouseItemRangeRef and the
LookupWarehouseItemRangeRefComp jobs. Click Tile from the Window
menu to display both jobs in a tiled manner.

5. Right-click over your LookupWarehouseItemRangeRefComp job name in the
Repository window and then select Compare Against.
6. In the Compare window select your LookupWarehouseItemRangeRef job on
the Item Selection window.

7. Click OK to display the Comparison Results window.

8. Click on a stage or link in the report, for example, Range_Description. Notice
that the stage is highlighted in both of the jobs.
9. Click on one of the underlined words. Notice that the editor is opened for the
referenced item.
10. With the Comparison Results window selected, click File>Save as and save
your report as an html file.
11. Open up the html file in a browser to see what it looks like.


Task: Find the differences between two table definitions


1. Create a copy of your Warehouse.txt table definition.
2. Make the following changes to the copy.
On the General tab, change the short description to your name.
On the Columns tab change the name of the Item column to ITEM_ZZZ.
And change its type and length to Char(33).
3. Click OK.
4. Right-click over your table definition copy and then select Compare Against.
5. In the Comparison window select your Warehouse.txt table.
6. Click OK to display the Comparison Results window.

Exercise 19. Reading and writing to relational tables

What this exercise is about


This exercise covers reading and writing to DB2 tables using the
Connector stages.

What you should be able to do


At the end of this exercise, you should be able to:
Create a Data Connection object
Create and load a DB2 table using the DB2 Connector stage
Read from a DB2 table using the ODBC Connector stage

Introduction
In this exercise you will first create a job that creates and loads a DB2
table using the DB2 Connector stage. In a later task you will read from
the table using the ODBC Connector stage.

Requirements
No new requirements.


Task: Create a Data Connection object


__ 1. Click New.
__ 2. Click the Other folder.

__ 3. Double-click on the Data Connection icon to display the Data Connection window.
__ 4. Name the data connection DB2_Connect_dsadm.

__ 5. Click on the Parameters tab. Select the DB2 Connector stage type in the Connect
using Stage Type list.

__ 6. Enter parameter values for the first three parameters:


ConnectionString: SAMPLE
Username: dsadm
Password: dsadm
__ 7. Click OK and then save the Data Connection object in your Metadata folder.


Task: Create and load a DB2 table using the DB2 Connector
stage
__ 1. Create a new parallel job named relWarehouseItems. The source stage is a
Sequential File stage. The target stage is a DB2 Connector stage. Name the links
and stages as shown.

__ 2. Edit the Warehouse Sequential File stage to read data from the Warehouse.txt file.
Be sure you can view the data.
__ 3. Edit the DB2 Connector stage as shown. First load the connection properties from
the Data Connection object you created in the previous task. This sets the
Database property to SAMPLE, and sets the user name and password properties.

In addition, set the Write mode property to Insert. Set Generate SQL to Yes. The
Table name is DSADM.ITEMS.

__ 4. Scroll down and set the Table action property to Replace. Also change the number
of rows per transaction (Record count) to 1. When this is done you must also set
the array size to 1 (because the number of rows per transaction must be a multiple
of the array size).

__ 5. Compile and run. Check the job log for errors.


__ 6. Open up the DB2 Connector. Click the View Data link.
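
The settings in steps 3 and 4 can be pictured with a small stand-in. The Python sketch below uses an in-memory SQLite database purely as an analogy. It is not the SQL the DB2 Connector generates, and the function load_with_replace, the unqualified table name, and the two columns are hypothetical. It assumes that Replace means dropping and re-creating the table, that Record count is the number of rows per commit, and that array size is the number of rows sent per batch.

```python
import sqlite3


def load_with_replace(connection, rows, record_count=1, array_size=1):
    """Drop and re-create a table, then insert rows in batches with periodic commits."""
    # Rows per transaction must be a multiple of the array size.
    if record_count % array_size != 0:
        raise ValueError("record_count must be a multiple of array_size")

    cursor = connection.cursor()
    # Analogy for Table action = Replace: drop the table if it exists, then create it.
    cursor.execute("DROP TABLE IF EXISTS ITEMS")
    cursor.execute("CREATE TABLE ITEMS (ITEM INTEGER, WAREHOUSE TEXT)")
    connection.commit()

    pending = 0
    for start in range(0, len(rows), array_size):
        batch = rows[start:start + array_size]  # one array of rows
        cursor.executemany("INSERT INTO ITEMS VALUES (?, ?)", batch)
        pending += len(batch)
        if pending >= record_count:  # commit once a full transaction is reached
            connection.commit()
            pending = 0
    connection.commit()  # commit any remaining rows

    cursor.execute("SELECT COUNT(*) FROM ITEMS")
    return cursor.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_with_replace(conn, [(1, "A"), (2, "B"), (3, "C")]))
```

Because the table is replaced on each run, running the load again leaves only the rows from the latest run.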

Task: Import a table definition using ODBC


__ 1. Click Import>Table Definitions>ODBC Table Definitions.
__ 2. Select SAMPLE in the DSN box.
__ 3. Enter dsadm / dsadm as the user name and password.

__ 4. Click OK.
__ 5. Specify the To folder to point to your _Training>Metadata folder. Select the
DSADM.ITEMS table.


Hint

If you have trouble finding it, type DSADM.ITEMS in the Name Contains box and then
click Refresh.

__ 6. Click Import.
__ 7. Open up your DSADM.ITEMS table definition in the Repository window and then
click the Columns tab to examine its column definitions.
__ 8. Click on the Locator tab and examine its contents. Verify that the Creator and
Table fields are filled in as shown. Type EDSERVER in the Computer box. This
metadata is saved in the Repository with the table definition and is used by
Information Server tools and components, including SQL Builder.

__ 9. Click OK to close the table definition.


Task: Create a job that reads from a DB2 table using the ODBC
Connector stage
__ 1. Create a new job named relReadTable_odbc. Here, use the ODBC Connector
stage to read from the ITEMS table you created in an earlier task. Write to a Data
Set stage.

__ 2. Open up the ITEMS Connector stage to the Properties tab. Type SAMPLE in the
Data source box. Specify your database user name and password, here
dsadm/dsadm. Click Test to test the connection.
__ 3. Set the Generate SQL property to Yes.

__ 4. Type the table name: DSADM.ITEMS.


__ 5. Click the Columns tab. Load your DSADM.ITEMS table definition.

__ 6. On the Properties tab, verify that you can view the data.
__ 7. In the Transformer stage map all columns across.
__ 8. In the target Data Set stage, write to a file named ITEMS.ds in your Temp directory.

__ 9. Compile and run your job. Check the job log for errors. Be sure you can view the
data in the target data set file.

Exercise 20. Connector stages with multiple input links

What this exercise is about


This exercise covers creating multiple input links into Connector
stages.

What you should be able to do


At the end of this exercise, you should be able to:
Create a job with multiple Connector input links
Create Connector stage Reject links to capture rows in which SQL
errors occur
Create and use parameters for Connector stage properties

Introduction
Multiple input links are a major new feature in DataStage. The links
can be used to update multiple relational tables within the same
transaction. The use of Connector stage Reject links is also
demonstrated in this exercise.

Requirements
No new requirements.


Task: Create a job with multiple Connector input links


__ 1. Create a new parallel job named relMultInput. Name the links and stages as
shown.

__ 2. Open the source Sequential File stage. Edit it so that it reads from the
Selling_Group_Mapping.txt file. Be sure you can view the data.

__ 3. Open the Transformer. Map the Selling_Group_Code and Selling_Group_Desc
fields to the SGM_DESC output link. Map the Selling_Group_Code,
Special_Handling_Code, and Distribution_Channel_Description fields to the
SGM_CODES output link.

__ 4. The Distribution_Channel_Description column presents a problem. The column name is
too long for DB2. So change the name of the output column to
Distribution_Channel_Desc.

__ 5. Open up the DB2 Connector stage. Click on the Stage tab at the top left. This
displays the Connection properties.


__ 6. Click the Load link. Select the DB2_Connect_dsadm Data Connection object you
created in an earlier lab.

__ 7. Click on the Input tab. Select the SGM_DESC input link in the Input name
(upstream stage) box at the top left of the stage. Set the Write mode property to
Insert, Generate SQL to Yes, and Table name to SGM_DESC as shown.

__ 8. Select the Table action cell. Click on the icon to the right of the parameter value
cell. Click New Parameter.

__ 9. Create a new job parameter named TableAction with a default value of Append.

__ 10. Click OK. This adds the job parameter, enclosed in pound signs
(#TableAction#), to the Table action property.

__ 11. Click the Columns tab. Select the Key box next to the Selling_Group_Code box.
This will define the column as a key column when the table is created.

__ 12. Select the SGM_CODES input link in the Input name (upstream stage) box at the
top left of the stage. In the Properties tab, set the Write mode property to Insert,
the Generate SQL property to Yes, the Table name property to SGM_CODES, and
Table action to #TableAction# as shown.

__ 13. Click the Columns tab. Select the Key box next to the Selling_Group_Code box.
This will define the column as a key column when the table is created.

__ 14. Click on the Output tab and select SGM_DESC_Rejects
(Peek_SGM_DESC_Rejects) from the Output name (downstream stage) drop-down
list. Select SGM_DESC in the Reject From Link box. Select the SQL error,
ERRORCODE, and ERRORTEXT boxes.

__ 15. Select SGM_CODES_Rejects (Peek_SGM_CODES_Rejects) from the drop-down
list. Select SGM_CODES in the Reject From Link box. Select the SQL error,
ERRORCODE, and ERRORTEXT boxes.

__ 16. Click OK to close the Connector stage.


__ 17. Compile your job.

__ 18. Click the Run button. The Job Run Options window is displayed. The first time you
run this job, select Create as the Table action, so that the target tables get created.

__ 19. View the job log. Notice the DB2 Connector stage messages that display information
about the numbers of rows inserted and rejected.

__ 20. In the log, open the message that describes the statement used to generate the
table. Notice that the CREATE TABLE statement includes the PRIMARY KEY
option.

__ 21. Now, let us test the reject links. Run the job again, this time selecting a Table action
of Append.

__ 22. Notice that all the rows are rejected, because they have duplicate keys.

__ 23. In the job log, open up one of the reject Peek messages and view the information it
contains. Notice that it contains two additional columns
(RejectERRORCODE, RejectERRORTEXT) that contain SQL error information.
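
The Create-then-Append behavior in steps 18 through 23 can be sketched outside DataStage. The Python example below is only an illustration under stated assumptions: it uses SQLite in place of DB2, the column types and sample rows are invented, and the load_rows function is a hypothetical stand-in for one Connector input link with a reject link. It is not how the DB2 Connector is implemented.

```python
import sqlite3

def load_rows(conn, rows, table_action):
    """Insert rows into SGM_DESC and return (inserted_count, rejected_rows)."""
    if table_action == "Create":
        # Selecting the Key box corresponds to PRIMARY KEY in the generated DDL.
        # Column types here are assumptions made for this sketch.
        conn.execute(
            "CREATE TABLE SGM_DESC ("
            "Selling_Group_Code INTEGER NOT NULL PRIMARY KEY, "
            "Selling_Group_Desc TEXT)"
        )
    inserted, rejects = 0, []
    for row in rows:
        try:
            conn.execute("INSERT INTO SGM_DESC VALUES (?, ?)", row)
            inserted += 1
        except sqlite3.IntegrityError as exc:
            # Reject link analogue: the original columns plus the error text.
            rejects.append(row + (str(exc),))
    return inserted, rejects

conn = sqlite3.connect(":memory:")
rows = [(1, "Group one"), (2, "Group two")]  # invented sample data

first = load_rows(conn, rows, "Create")    # first run creates the table
second = load_rows(conn, rows, "Append")   # second run hits duplicate keys
print(first)
print(second)
```

On the second call every row violates the primary key, so all rows land in the rejects list with an error message attached, which mirrors what the reject Peek messages show in the job log.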

Exercise 21. Construct an SQL statement using SQL Builder

What this exercise is about


This exercise covers the SQL Builder utility available in DataStage
Connector stages.

What you should be able to do


At the end of this exercise, you should be able to:
Build an SQL SELECT statement using SQL Builder
Use the SQL Builder expression editor

Introduction
In DataStage, SQL Builder is available in all Connector stages. It is a GUI
tool that you can use to construct SQL statements for reading and
writing.

Requirements
No new requirements.

Task: Build an SQL SELECT statement using SQL Builder


__ 1. Open your relReadTable_odbc job and save it as relReadTable_odbc_sqlBuild.

__ 2. Open up your DSADM.ITEMS table definition. Click on the Locator tab. Edit or
verify that the Creator and Table boxes contain the correct schema name (creator)
and table name, respectively.

__ 3. Open up the Job Properties window and create two job parameters:
WarehouseLow is an integer type with a default value of 0. WarehouseHigh is the
same type but has a default value of 999999.

__ 4. Open up the Connector source stage. In the Usage folder, set the Generate SQL
property to No. Notice that the warning icon shows up next to the Select statement
property.

__ 5. Click the Select statement cell and then click the Tools button. Click Build new
SQL (ODBC 3.52 extended syntax). This opens the SQL Builder window.
__ 6. Drag your DSADM.ITEMS table definition onto the canvas.

__ 7. Select all the columns except ALLOCATED and HARDALLOCATED and drag them
to the Select columns window.

__ 8. Sort by ITEM and WAREHOUSE in that order, ascending. To accomplish this, select
Ascending in the Sort column. Specify the sort order in the last column.

__ 9. Click the SQL tab at the bottom of the window to view the SQL based on your
specifications so far.

__ 10. Click OK to save and close your SQL statement. You may get some warning
messages. Click Yes to accept the SQL as generated and allow DataStage to
merge the SQL Builder selected columns with the columns on the Columns tab.

__ 11. In the Connector stage, click the Columns tab. Ensure that the ALLOCATED and
HARDALLOCATED columns are removed, since they are not referenced in the
SQL.

__ 12. Click the Properties tab. Notice that the SQL statement you created using SQL
Builder has been put into the Select statement property.

__ 13. Open up the Transformer. Remove the output columns in red, since they are no
longer used.
__ 14. Compile and run. View the job log.
__ 15. Verify that you can view the data in the target stage.
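
As a rough idea of the kind of statement the steps above produce, the Python sketch below runs a similar SELECT against SQLite. It is only an illustration: the ITEMS table here contains just the six columns named in this exercise, the column types and sample rows are invented, and the exact text SQL Builder generates for DB2 through ODBC may differ (for example, it would qualify the table with the DSADM schema).

```python
import sqlite3

# Columns ALLOCATED and HARDALLOCATED are left out of the select list,
# and the result is sorted by ITEM, then WAREHOUSE, both ascending.
SELECT_SQL = (
    "SELECT ITEM, WAREHOUSE, ONHAND, ONORDER "
    "FROM ITEMS "
    "ORDER BY ITEM ASC, WAREHOUSE ASC"
)

def run_select(conn):
    """Run the sketched SELECT statement and return all rows."""
    return conn.execute(SELECT_SQL).fetchall()

conn = sqlite3.connect(":memory:")
# Assumed table layout: only the columns mentioned in the exercise text.
conn.execute(
    "CREATE TABLE ITEMS (ITEM TEXT, WAREHOUSE INTEGER, ONHAND INTEGER, "
    "ONORDER INTEGER, ALLOCATED INTEGER, HARDALLOCATED INTEGER)"
)
conn.executemany(
    "INSERT INTO ITEMS VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("B-200", 2, 5, 0, 1, 0),   # invented sample rows
        ("A-100", 9, 3, 4, 0, 0),
        ("A-100", 1, 7, 2, 2, 1),
    ],
)
result = run_select(conn)
for row in result:
    print(row)
```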

Task: Use the SQL Builder expression editor


__ 1. Save your job as relReadTable_odbc_expr.
__ 2. Open up your source ODBC Connector stage. Then click on the Tools button for the
SELECT statement you previously generated.
__ 3. Choose Edit existing SQL (ODBC 3.52 extended syntax).
__ 4. Click in the empty Column Expression cell after the last listed column, ONORDER.
Select Expression Editor from the drop-down list. This opens the Expression
Editor Dialog window.
__ 5. In the Predicates box select the Functions predicate and then select the
SUBSTRING function in the Expression Editor box. Specify that it is to select the
first 15 characters of the ITEM column.

__ 6. Click OK.
__ 7. For the new calculated column, specify a column alias of SHORT_ITEM.

__ 8. In the Construct filter expression (WHERE clause) window, construct a WHERE
clause that selects the following: Warehouses with numbers between
#WarehouseLow# and #WarehouseHigh#, where #WarehouseLow# and
#WarehouseHigh# are job parameters.

__ 9. Click the Add button to add it to the SELECTION window.

__ 10. Click the SQL tab at the bottom of the SQL Builder to view the constructed SQL.
Verify that it is correct.

__ 11. Click OK to return to the Properties tab. A message is displayed informing you that
your columns in the stage do not match columns in the SQL statement. Click Yes to
add the SHORT_ITEM column to your metadata.
__ 12. On the Columns tab, specify the correct type for the SHORT_ITEM column, namely
Varchar(15).

__ 13. Open the Transformer and map the new SHORT_ITEM column across. Remove the
ONHAND and ONORDER columns from the output.

__ 14. Compile and run.


__ 15. View the results.
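
The following Python sketch shows one way to picture what happens to the statement built in this task: the #WarehouseLow# and #WarehouseHigh# tokens are replaced with the job parameter values before the SQL runs. This is an assumption-laden illustration, not DataStage code: resolve_parameters is a hypothetical helper, SQLite stands in for the ODBC source, SQLite's SUBSTR is used in place of the SUBSTRING function chosen in the Expression Editor, and the sample rows are invented.

```python
import re
import sqlite3

TEMPLATE = (
    "SELECT ITEM, WAREHOUSE, SUBSTR(ITEM, 1, 15) AS SHORT_ITEM "
    "FROM ITEMS "
    "WHERE WAREHOUSE BETWEEN #WarehouseLow# AND #WarehouseHigh# "
    "ORDER BY ITEM, WAREHOUSE"
)

def resolve_parameters(template, params):
    """Replace each #Name# token with the integer value of that parameter."""
    def substitute(match):
        # int() rejects anything that is not a whole number.
        return str(int(params[match.group(1)]))
    return re.sub(r"#(\w+)#", substitute, template)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEMS (ITEM TEXT, WAREHOUSE INTEGER)")
conn.executemany(
    "INSERT INTO ITEMS VALUES (?, ?)",
    [
        ("ABCDEFGHIJKLMNOPQRST", 50),   # invented sample rows
        ("SHORT", 500),
        ("OTHER", 5000),
    ],
)

resolved_sql = resolve_parameters(
    TEMPLATE, {"WarehouseLow": 0, "WarehouseHigh": 999}
)
result = conn.execute(resolved_sql).fetchall()
print(resolved_sql)
for row in result:
    print(row)
```

With the default values from the exercise (0 and 999999) every warehouse would pass the filter; the narrower range here is used only so that the WHERE clause visibly excludes a row.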

Exercise 22. Build and run a Sequence job

What this exercise is about


This exercise covers Sequence jobs.

What you should be able to do


At the end of this exercise, you should be able to:
Build a Job Sequence that runs three jobs
Pass parameters from the Job Sequence to the Job Activity stages
Specify custom triggers
Define a user variable
Add a Wait for File stage
Add exception handling

Introduction
Sequence jobs are master jobs that run batches of DataStage jobs,
including other Job Sequences.

Requirements
No new requirements.

Task: Build a Job Sequence


__ 1. Import the seqJobs.dsx file in your DSEss_Files>dsxfiles directory. This file
contains the jobs you will execute in your job sequence: seqJob1, seqJob2, and
seqJob3.
__ 2. Open up seqJob1. The other two jobs are similar. Compile the job.

__ 3. Right-click over seqJob2 in the Repository window. Click Multiple Job Compile.
The DataStage Compilation Wizard window is opened. Add seqJob2 and
seqJob3 to the Selected items window.

__ 4. Click Next two times to move to the Compile Process window.

__ 5. Click Start Compile. After the jobs compile successfully, click Finish.
__ 6. Return to the open seqJob1 canvas. Click the Parameters tab in the Job
Properties window, and note the parameters defined for seqJob1. The other jobs
have similar parameters.

__ 7. Open the Transformer. Notice that the job parameter PeekHeading prefixes the
column of data that will be written to the job log using the Peek stage.

__ 8. Click New and then select the Jobs folder.

__ 9. Open a new Sequence Job and save it as seq_Jobs.


__ 10. Drag three Job Activity stages to the canvas, link them, and name the stages and
links as shown. (Alternatively, you can drag seqJob1, seqJob2, and seqJob3 to
the canvas.)

__ 11. Open the General tab in the Job Properties window. Read and check all the
compilation options.

__ 12. Add job parameters to the job sequence to supply values to the job parameters in
the jobs. Click on the Add Environment Variable button and then add
$APT_DUMP_SCORE. Also add three numbered RecCount variables:
RecCount1, RecCount2, and RecCount3. All are type string with a default value of
10.

__ 13. Open up each of the Job Activity stages and set or verify that the Job name box is
set to the job the Activity stage is to run.
__ 14. For each Job Activity stage, set the job parameters to the corresponding job
parameters of the job sequence. For the PeekHeading value use a string with a
single space.

__ 15. Set the Execution action to Reset if required, then run. Shown below is the
seqJob1 Activity stage. The others are similar.

__ 16. In each of the first two Job Activity stages, set the job triggers so that later jobs only
run if earlier jobs run without errors, although possibly with warnings.
This means that the DSJS.JOBSTATUS is either DSJS.RUNOK or
DSJS.RUNWARN.
To do this, create a custom trigger such that the previous job's status is
equal to one of the above two values. Click the right mouse button in the
expression window to insert the $JobStatus Activity Variable.

__ 17. Compile and run your job sequence.


__ 18. View the job log for the sequence. Verify that each job ran successfully and examine
the job sequence summary message and the individual job report messages.
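
The custom trigger from step 16 amounts to a simple condition: continue to the next job only when the previous job's status is RUNOK or RUNWARN. The Python sketch below models that control flow. It is an illustration only: the JobStatus values are placeholders rather than the real DSJS constants, and trigger_fires and run_sequence are hypothetical names, not DataStage functions.

```python
from enum import Enum

class JobStatus(Enum):
    # Placeholder values; the real DSJS.* constants are defined by DataStage.
    RUNOK = "finished OK"
    RUNWARN = "finished with warnings"
    RUNFAILED = "aborted"

def trigger_fires(status):
    """Custom trigger: fire only if the job ran without errors."""
    return status in (JobStatus.RUNOK, JobStatus.RUNWARN)

def run_sequence(jobs):
    """Run (name, job) pairs in order; stop when a trigger does not fire."""
    executed = []
    for name, job in jobs:
        executed.append(name)
        if not trigger_fires(job()):
            break
    return executed

# A warning in seqJob1 still lets the later jobs run.
executed_ok = run_sequence([
    ("seqJob1", lambda: JobStatus.RUNWARN),
    ("seqJob2", lambda: JobStatus.RUNOK),
    ("seqJob3", lambda: JobStatus.RUNOK),
])

# A failure in seqJob2 keeps seqJob3 from running.
executed_fail = run_sequence([
    ("seqJob1", lambda: JobStatus.RUNOK),
    ("seqJob2", lambda: JobStatus.RUNFAILED),
    ("seqJob3", lambda: JobStatus.RUNOK),
])
print(executed_ok)
print(executed_fail)
```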

Task: Add a user variable


__ 1. Save your job sequence as seq_Jobs_UserVar. Add a User Variables Activity
stage as shown.

__ 2. Open the User Variables stage to the User Variables tab. Right-click in the window
and then click Add Row. Create a user variable named varMessagePrefix.
__ 3. Double-click in the Expression cell to open the Expression Editor. Concatenate the
string constant "Date is" with the DSJobStartDate DSMacro, followed by a bar
surrounded with spaces ( | ).

__ 4. Open each Job Activity stage. For each PeekHeading parameter, insert the
varMessagePrefix in the Value Expression cell.

__ 5. Compile and run. In Director, open the job log for the seqJob1 job. Verify that the
PeekHeading value is inserted before the column values in the Peek messages in
the log. Below we see that the heading (Date is...) prefixes the data
(bbbbbbbb) going into col1.
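
To make the effect of the user variable concrete, the short Python sketch below builds the same kind of prefix and prepends it to a column value. It only illustrates the string concatenation: the function names are hypothetical, the date format and the space after "Date is" are assumptions, and in DataStage the expression is written in the Expression Editor rather than in Python.

```python
from datetime import date

def message_prefix(job_start_date):
    """Build the prefix: constant text, the start date, then a spaced bar."""
    # The ISO date format is an assumption made for this sketch.
    return "Date is " + job_start_date.isoformat() + " | "

def peek_line(prefix, value):
    """Prefix a column value the way PeekHeading prefixes col1."""
    return prefix + value

prefix = message_prefix(date(2012, 12, 1))   # invented start date
print(peek_line(prefix, "bbbbbbbb"))
```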

Task: Add a Wait for File stage


In this task, you modify your design so that the job waits to be executed until the
StartRun.txt file appears in your DSEss_Files/Temp directory.
__ 1. Save your job sequence as seq_Jobs_Wait.
__ 2. Add a Wait for File Activity stage as shown.

__ 3. Add a job parameter named StartFile to pass the name of the file to wait for.

__ 4. Edit the Wait for File stage. Specify that the job is to wait forever until the
#StartFile# file appears in the DSEss_Files/Temp directory.

__ 5. On the Triggers tab, specify an unconditional trigger.


__ 6. Compile and run your job sequence. Now view the job log for the sequence. As you
can see in the log, the sequence is waiting for the file.

__ 7. Now open the seqStartSequence job that was part of the seqJobs.dsx file that
you imported earlier. This job creates the StartRun.txt file in your
DSEss_Files/Temp directory.
__ 8. Compile and run the seqStartSequence job to create the StartRun.txt file. Then
return to the log for your sequence to watch the sequence continue to the end.
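
The waiting behavior in this task can be pictured as a polling loop that checks for the file until it appears. The Python sketch below is a minimal illustration under assumptions: wait_for_file is a hypothetical function, the polling interval is arbitrary, a temporary directory stands in for DSEss_Files/Temp, and a timer thread plays the role of the seqStartSequence job. How the Wait for File stage is actually implemented is not stated in this exercise.

```python
import os
import tempfile
import threading
import time

def wait_for_file(path, poll_seconds=0.05, timeout=None):
    """Return True once path exists; with timeout=None, wait forever."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True

with tempfile.TemporaryDirectory() as temp_dir:
    start_file = os.path.join(temp_dir, "StartRun.txt")
    # Stand-in for the seqStartSequence job: create the file a bit later.
    creator = threading.Timer(0.2, lambda: open(start_file, "w").close())
    creator.start()
    # A timeout is used here only so the sketch cannot hang.
    appeared = wait_for_file(start_file, timeout=5)
    creator.join()

print(appeared)
```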

Task: Add exception handling


__ 1. Save your sequence as seq_Jobs_Exception.
__ 2. Add the Exception Handler and Terminator Activity stages as shown.

__ 3. Edit the Terminator stage so that any running jobs are stopped when an exception
occurs.

__ 4. Compile and run your job. To test that it handles exceptions, make an Activity fail. For
example, set the RecCount3 parameter to -10. Then go to the job log and open the
Summary message. Verify that the Terminator stage was executed.
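
The Exception Handler and Terminator pairing resembles a try/except block wrapped around the sequence: when any activity fails, control jumps to the handler, which stops the run. The Python sketch below illustrates that idea only. All names in it (ActivityFailed, make_job, run_with_exception_handler) are hypothetical, and the rule that a negative record count makes a job fail is an assumption taken from the test suggested in step 4.

```python
class ActivityFailed(Exception):
    """Raised by a sketched activity that cannot complete."""

def make_job(name, rec_count):
    """Return a callable standing in for a Job Activity stage."""
    def job():
        if rec_count < 0:
            # Assumed failure rule, modeled on setting RecCount3 to -10.
            raise ActivityFailed(f"{name} failed: RecCount={rec_count}")
    return job

def run_with_exception_handler(activities, terminator):
    """Run activities in order; on failure, hand the error to the terminator."""
    completed = []
    try:
        for name, activity in activities:
            activity()
            completed.append(name)
    except ActivityFailed as exc:
        # Exception Handler analogue: pass control to the Terminator.
        terminator(str(exc))
    return completed

terminator_log = []
completed = run_with_exception_handler(
    [
        ("seqJob1", make_job("seqJob1", 10)),
        ("seqJob2", make_job("seqJob2", 10)),
        ("seqJob3", make_job("seqJob3", -10)),
    ],
    terminator_log.append,
)
print(completed)
print(terminator_log)
```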
