
IBM WebSphere DataStage

DataStage EE Basics
Creating a Job

DW & BI IMPACT Training 2008


Kolkata

© 2007 IBM Corporation


IBM GBS | WebSphere DataStage PX Training

Where We Are
• MODULE 01 Introduction to DataStage
• MODULE 02 DataStage Installation on Windows Platform
• MODULE 03 Features of DataStage Clients
• MODULE 04 Creating a Job
• MODULE 05 Accessing Sequential Data
• MODULE 06 Combining Data
• MODULE 07 Splitting Data
• MODULE 08 Transforming Data
• MODULE 09 Sorting and Aggregating Data
• MODULE 10 Accessing Relational Data
• MODULE 11 Job Control
• MODULE 12 Architecture and Parallelism Concepts


Module 04
Creating a Job


Developing Jobs in DataStage

• Study the given technical specification
• Plan the job design
• Decide the stage types
• Import metadata for the source and target (and any lookup reference) in Manager
• Build the job in Designer
• Configure the stages
• Define job parameters
• Maintain design standards
• Compile the job in Designer
• Run and monitor the job in Director


Study Given Technical Specification

• Source and target column details
• Source information
• Target information
• Mapping rules – business logic


Attaching to a Project


Open DataStage Designer


Plan Job Design

1. Analyze the source data and decide which stage to use to extract it

2. Implement the business logic in DataStage

3. Load the data into the database table


Job Parameters

• Go to Job Properties
• Use parameters defined at the project level: user-defined environment variables. The default value can be hard-coded or retrieved at run time ($PROJDEF)
• Create job-level parameters
• Parameters can be passed from a sequence. A value defined at a higher level takes precedence
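
The precedence rule above can be sketched in Python. This is only an illustration under stated assumptions, not how DataStage implements it: the function `resolve_parameter`, the dictionaries, and the parameter names `$SRC_DIR` and `RunDate` are hypothetical. Only the `$PROJDEF` marker comes from the slide.

```python
PROJDEF = "$PROJDEF"  # marker from the slide: fetch the project-level default at run time


def resolve_parameter(name, project_defaults, job_defaults, sequence_values=None):
    """Return the effective value of one job parameter (illustrative only).

    Assumed precedence: a value passed from a sequence overrides the job-level
    default; a value of $PROJDEF is replaced by the project-level default.
    """
    sequence_values = sequence_values or {}
    if name in sequence_values:
        value = sequence_values[name]
    elif name in job_defaults:
        value = job_defaults[name]
    else:
        raise KeyError(f"parameter {name!r} is not defined on the job")

    if value == PROJDEF:
        if name not in project_defaults:
            raise KeyError(f"no project-level default for {name!r}")
        value = project_defaults[name]
    return value


# Hypothetical project-level and job-level settings.
project = {"$SRC_DIR": "/data/project/in"}
job = {"$SRC_DIR": PROJDEF, "RunDate": "2008-01-01"}

# No override: $PROJDEF resolves to the project-level default.
print(resolve_parameter("$SRC_DIR", project, job))
# Override passed from a sequence wins over the job-level default.
print(resolve_parameter("RunDate", project, job, {"RunDate": "2008-02-01"}))
```

Keeping `$PROJDEF` as the job-level default means a change to the project-level value reaches every job that uses it without editing the jobs themselves.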


A Sample Job in Designer

[Screenshot: a sample job on the Designer canvas, with callouts for Compile Job, Run Job, Job properties, and Annotation]


Job Properties

[Screenshot: Job Properties, with the following callouts]
• View generated script
• Job parameters in OSH
• Run any BASIC subroutine before or after the job run
• Short description of the job
• Full job description to track modification history


Job Parameters

[Screenshot: Job Parameters, with the following callouts]
• Parameter name
• Parameter prompt to be seen at run time
• Parameter default value
• Parameter type
• Project level parameters
• Job level parameters
• Add new environment variable


Design Standards – Naming Convention

• Maintain naming conventions for stages and links
• Use job and stage annotations
• Name stages after the data they access or the function they perform
• DO NOT leave default stage names like Sequential_File_0
• Use 2-character prefixes to indicate stage type
• Name links for the data they carry
• DO NOT leave default link names like DSLink1
• Prefix all link names with “lnk_”
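
The naming rules above lend themselves to an automated check. The following is a minimal sketch, assuming stage and link names are available as plain strings (for example, from a job export). The function names, the regular expressions, and the sample names `sq_customers` and `lnk_customers` are hypothetical. The slide does not list the actual 2-character prefixes, so the check only tests the shape "two letters, underscore, rest".

```python
import re

# Heuristic for default stage names such as Sequential_File_0: words ending in _<digits>.
DEFAULT_STAGE = re.compile(r"^[A-Za-z_]+_\d+$")
# Default link names such as DSLink1.
DEFAULT_LINK = re.compile(r"^DSLink\d+$")
# Assumed shape of a 2-character stage-type prefix: two letters, then an underscore.
STAGE_PREFIX = re.compile(r"^[A-Za-z]{2}_\w+$")


def check_stage_name(name):
    """Return a list of naming problems for one stage name (empty if none)."""
    problems = []
    if DEFAULT_STAGE.match(name):
        problems.append("looks like a default stage name")
    if not STAGE_PREFIX.match(name):
        problems.append("missing 2-character stage-type prefix")
    return problems


def check_link_name(name):
    """Return a list of naming problems for one link name (empty if none)."""
    problems = []
    if DEFAULT_LINK.match(name):
        problems.append("looks like a default link name")
    if not name.startswith("lnk_"):
        problems.append('missing "lnk_" prefix')
    return problems


for stage in ["sq_customers", "Sequential_File_0"]:
    print(stage, check_stage_name(stage))
for link in ["lnk_customers", "DSLink1"]:
    print(link, check_link_name(link))
```

The default-name pattern is deliberately loose: it would also flag a well-named stage that happens to end in an underscore and digits, so treat its findings as prompts for review rather than hard failures.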


Design Standards – Development Approach

• Follow iterative job design
    • Use Copy and Peek stages as stubs
    • Start small and build to the final solution
    • Start from the source and work out
• Test the job in phases
    • Small sections first, then increasing in complexity
    • Use the Peek stage or datasets to examine records
    • Check data at various locations
    • Check before and after processing
• Solve the business problem before the performance problem


Compile/Validate/Run
• Compiling a job:
    • Creates the OSH script for the job
    • Generates C++ code for the Transformer stages
• Validating a job:
    • Performs connectivity checks and authentication
    • Does everything except process rows: it opens any required files and connects to data sources
    • Prepares SQL and captures any syntax-error messages returned by the database
    • It is not necessary to validate a job every time you change its design. If the change does not affect any passive stage, re-validating proves little. Missing property values will be picked up when the job is compiled.
    • Validation is useful on initial deployment.
• Running a compiled job:
    • The DataStage engine runs the processes on the available nodes (details later)
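
The deck runs and validates jobs from Director. As a hedged aside, the same two modes can also be requested from the `dsjob` command-line client. The sketch below only builds the argument list and does not execute anything. It assumes `dsjob` is installed and that its `-run`, `-mode`, `-param` and `-jobstatus` options behave as described in the product documentation for your version. The helper `build_dsjob_command` and the project, job, and parameter names are hypothetical.

```python
def build_dsjob_command(project, job, mode="NORMAL", params=None):
    """Build (but do not run) a dsjob argument list for the given job.

    mode: "VALIDATE" checks the job without processing rows,
          "NORMAL" runs it, "RESET" resets it.
    """
    if mode not in ("NORMAL", "RESET", "VALIDATE"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["dsjob", "-run", "-mode", mode]
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    # -jobstatus makes dsjob wait and report the job's finishing status.
    cmd += ["-jobstatus", project, job]
    return cmd


# Hypothetical project and job names: validate on first deployment, then run.
validate = build_dsjob_command("dwbi_training", "LoadCustomers", mode="VALIDATE")
run = build_dsjob_command("dwbi_training", "LoadCustomers",
                          params={"RunDate": "2008-02-01"})
print(" ".join(validate))
print(" ".join(run))
```

The resulting list can be handed to `subprocess.run` on a machine where the client is available.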


Q&A

