
IBM Information Server

What's New in DataStage 8.0?

Course Contents
Module 01 : Introduction
Module 02: Deployment
Module 03 : Administering DataStage
Module 04 : DataStage Designer
Module 05 : Repository Functions
Module 06 : DataStage Utilities

What's New in DataStage V8.0?

Module 01: Introduction

Unit Objectives

After completing this unit, you should be able to:
Describe Information Server
List what's the same in DataStage version 8
List what's different in DataStage version 8

IBM Information Server

Suite of applications, including DataStage, that:
- Shares a common repository
DB2, by default
- Shares a common set of application services and functionality
Provided by Metadata Server components hosted by an application server
- IBM WebSphere Application Server
Provided services include:
- Security
- Repository
- Logging and reporting
- Metadata management
Managed using web console clients
- Administration console
- Reporting console

IBM Information Server

[Diagram: Information Server backbone. Product modules shown: WebSphere Information Services Director, WebSphere Business Glossary, WebSphere Information Analyzer, WebSphere DataStage, WebSphere QualityStage, WebSphere Federation Server. Shared components shown: Metadata Access Services, Metadata Analysis Services, Metadata Server, Information Server Console.]

What's Different?

What's the Same

DataStage Designer and Director work very much the same
- What you could do before you can still do
- Minor changes to menus and GUI
Same job types are supported
- Parallel jobs
- Server jobs
- Mainframe jobs
- Job sequences
All stages that existed in DataStage 7.5x are still supported
Previous DataStage functionality is still supported
- Export DataStage components (dsx)
Now occurs in Designer
- Import metadata
Sequential files
Database
COBOL file definitions
Job compile, execution, and run-time log are the same

What's Different? Part I

QualityStage is now embedded within DataStage Designer
DataStage and QualityStage (a.k.a. DataStage) are now hosted by the Metadata Server
- There is a layer of administration, logging, reporting, and security that occurs at the Metadata Server level (outside of DataStage)
- Managed by the administration and reporting consoles
Repository is now managed by the Metadata Server and its services
- No longer a UniVerse database
- Repository model has been completely reorganized
Installation and deployment are now more flexible (complex)
- More deployment options
- Metadata Server, repository, and DataStage engines can be on different machines and platforms

What's New or Different? Part II

DataStage Manager is gone
All Manager functionality has been moved into Designer
DataStage
- Permissions are implemented differently
- GUI: new icons, menu arrangements, etc.
- New stages
Slowly Changing Dimensions (SCD) stage
Surrogate key management
Connector stages
- Stage enhancements
Lookup stage now supports range lookups
Complex Flat File (CFF) stage now supports multiple record formats
SQL Builder accessible in Connector stages

What's New or Different? Part III

DataStage repository enhancements
- Flexible folder organization
- Repository search
- Enhanced DataStage components export
- Job and table definition difference reports
- Impact analysis
New objects
- Parameter sets: named sets of parameters
- Data connection objects: named sets of database connection property values
New utilities
- Performance Analyzer
- Resource Estimator

Information Server Administration Console

DataStage Administrator

DataStage Designer

DataStage Director

Unit Summary

Having completed this unit, you should be able to:
Describe Information Server
List what's the same in DataStage version 8
List what's different in DataStage version 8

What's New in DataStage V8?

Module 02: Deployment

Unit Objectives

After completing this unit, you should be able to:
Identify the components of Information Server that need to be installed
Describe what a deployment domain consists of
Describe different domain deployment options
Describe the installation process
Start Information Server

What Gets Deployed

An Information Server domain, consisting of the following:
Metadata Server, hosted by an IBM WebSphere Application Server instance
One or more DataStage servers
- A DataStage server includes both the parallel and server engines
One DB2 UDB instance containing the repository database
Information Server clients
- Administration console
- Reporting console
- DataStage clients
Administrator
Designer
Director
Additional Information Server applications
- Information Analyzer
- Business Glossary
- Rational Data Architect
- Information Services Director
- Federation Server

Deployment: Everything on One Machine

Here we have a single domain with the hosted applications all on one machine
Additional client workstations can connect to this machine using TCP/IP

[Diagram: a single machine hosting the Metadata Server backbone, the DataStage server, the DB2 instance with the repository, and clients; additional client workstations connect to it.]

Deployment: DataStage on a Separate Machine

Here the domain is split between two machines
- DataStage server
- Metadata Server and DB2 repository

[Diagram: one machine hosting the Metadata Server backbone and the DB2 instance with the repository, a second machine hosting the DataStage server, and clients.]

Metadata Server and DB2 on Separate Machines

Here the domain is split between three machines
- DataStage server
- Metadata Server
- DB2 repository

[Diagram: three machines hosting the Metadata Server backbone, the DataStage server, and the DB2 instance with the repository, plus clients.]

Information Server Installation


Installation Configuration Layers

Configuration layers include:
- Client
DataStage and Information Server clients
- Engine
DataStage and other application engines
- Domain
Metadata Server and hosted Metadata Server components
Installed products' domain components
- Repository
Repository database server and database
- Documentation
Selected layers are installed on the machine local to the installation
Already existing components can be configured and used
- E.g., DB2, WebSphere Application Server

Information Server Start-up

- Start the Metadata Server
From the Windows Start menu, click Start the server after selecting the profile to be used (e.g., default)
From the command line, open the profile bin directory
- Enter startServer server1
> server1 is the default name of the application server hosting the Metadata Server
- Start the ASB agent
From the Windows Start menu, click Start the agent after selecting the Information Server folder
Only required if DataStage and the Metadata Server are on different machines
- To begin work in DataStage, double-click a DataStage client icon
- To begin work in the Administration and Reporting consoles, double-click the Web Console for Information Server icon
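
The start-up rule above can be summarized in a small Python sketch. It is illustrative only: the `Domain` class, its field names, and the host names are hypothetical and are not part of any Information Server API. The only rule it encodes is the one stated above, that the ASB agent is needed when DataStage and the Metadata Server are on different machines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """Hypothetical description of where two domain components are installed."""
    metadata_server_host: str
    datastage_server_host: str


def services_to_start(domain: Domain) -> List[str]:
    """Return the services that must be running before logging on to DataStage."""
    services = ["Metadata Server (application server)"]
    # The ASB agent is only required when DataStage and the Metadata Server
    # are installed on different machines.
    if domain.metadata_server_host != domain.datastage_server_host:
        services.append("ASB agent")
    return services


# Hypothetical host names for a single-machine and a split deployment.
print(services_to_start(Domain("hostA", "hostA")))
print(services_to_start(Domain("hostA", "hostB")))
```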


Starting the Metadata Server Backbone

Starting the ASB Agent

Testing the Installation

You can partially test the installation by logging on to the Information Server Web Console
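
Before trying to log on, it can help to confirm that the application server is listening at all. The sketch below is one possible way to check this from a client workstation using only the Python standard library. The host name and port number are placeholders (assumptions, not values taken from this course); use the ones from your own installation. A successful connection only shows that something is listening on the port; it does not prove that the console logon works.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Placeholder values: replace with your application server host and port.
host, port = "localhost", 9080
print(f"{host}:{port} reachable: {port_is_open(host, port)}")
```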


Checkpoint
1. What application components make up a domain?
2. Can a domain contain multiple DataStage servers?
3. Do the DB2 instance and the repository database need to be on the same machine as the application server?
4. Suppose DataStage is on a separate machine from the application server. What two components need to be running before you log on to DataStage?

Checkpoint Solutions

1. Metadata Server hosted by the application server. One or more DataStage servers. One DB2 UDB instance containing the suite repository database.
2. Yes. The DataStage servers must be on separate machines. They can be on different platforms, e.g., one server running on Windows and another running on Linux.
3. No. The DB2 instance with the repository can reside on a different machine or platform from the application server.
4. The application server and the ASB agent.

Unit Summary
Having completed this unit, you should be able to:
Identify the components of Information Server that need to be installed
Describe what a deployment domain consists of
Describe different domain deployment options
Describe the installation process
Start Information Server

What's New in DataStage V8?

Module 03: Administering DataStage

Unit Objectives

After completing this unit, you should be able to:
Open the administrative console
Create new users and groups
Assign suite roles and product roles to users and groups
Give users DataStage credentials
Log on to DataStage Administrator
Add a DataStage administrator
Add a DataStage user on the Permissions tab and specify the user's role

Information Server Administration Console

Web application for administering Information Server
Used for:
- Domain management
- Session management
- Management of users and groups
- Logging management
- Scheduling management

Opening the Administration Web Console

User and Group Management

User and Group Management

Suite authorizations can be provided to users or groups
- Users that are members of a group acquire the authorizations of the group
Authorizations are provided in the form of roles
- Two types of roles
Suite roles: apply to the suite
Suite component roles: apply to a specific product or component of Information Server, e.g., DataStage
Suite roles
- Administrator
Perform user and group management tasks
Includes all the privileges of the Suite User role
- User
Create views of scheduled tasks and logged messages
Create and run reports
Suite component roles
- DataStage User
Permissions are assigned within DataStage
> Developer, Operator, Super Operator, Production Manager
- DataStage Administrator
Full permission to work in DataStage Administrator, Designer, and Director
- And so on for all products in the suite
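
The rule that group members acquire the authorizations of their group can be pictured with a short Python sketch. The user names, group name, and role assignments below are hypothetical, and the code models only the inheritance rule described above; it does not call any Information Server interface.

```python
from typing import Dict, Set

# Hypothetical users, groups, and role assignments for illustration only.
group_members: Dict[str, Set[str]] = {"etl_developers": {"alice", "bob"}}
group_roles: Dict[str, Set[str]] = {"etl_developers": {"Suite User", "DataStage User"}}
user_roles: Dict[str, Set[str]] = {"alice": {"Suite Administrator"}}


def effective_roles(user: str) -> Set[str]:
    """Combine roles assigned directly to the user with the roles of the user's groups."""
    roles = set(user_roles.get(user, set()))
    for group, members in group_members.items():
        if user in members:
            roles |= group_roles.get(group, set())
    return roles


print(sorted(effective_roles("alice")))
print(sorted(effective_roles("bob")))
```

Here `bob` has no roles of his own and receives both roles through the group, while `alice` keeps her directly assigned role in addition to the group's roles.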

Creating a DataStage User ID

Assigning DataStage Roles

DataStage Credential Mapping

Users given the DataStage Administrator or DataStage User product role in the suite administration console do not automatically receive DataStage credentials
- Users with the DataStage Administrator role need to be mapped to a valid user on the DataStage server machine
This DataStage user must have file access permission to the DataStage engine and project files, or Administrator rights on the operating system
- Users with the DataStage User role need to be mapped to a valid user on the DataStage server machine and need additional DataStage-assigned permissions (e.g., Developer or Operator)

DataStage Credential Mapping

DataStage Administrator

Logging On to Administrator

Fields on the logon window:
- Host name and port number of the application server
- DataStage administrator ID and password
- Name or IP address of the DataStage server machine

Permissions Tab

Adding Users and Groups

Specify DataStage Role

Checkpoint
1. Authorizations can be assigned to what two items?
2. What two types of authorization roles can be assigned to a user or group?

Checkpoint Solutions

1. Users and groups. Members of a group acquire the authorizations of the group.
2. Suite roles and product roles.

Unit Summary
Having completed this unit, you should be able to:
Open the administrative console
Create new users and groups
Assign suite roles and product roles to users and groups
Give users DataStage credentials
Log on to DataStage Administrator
Add a DataStage administrator
Add a DataStage user on the Permissions tab and specify the user's role

What's New in DataStage V8?

Module 04: DataStage Designer

Unit Objectives
After completing this unit, you should be able to:

Log on to DataStage
Navigate around DataStage Designer
Create a parameter set
Build a range lookup job
Import and export DataStage objects to a file

What's New in DataStage Designer

Some changes to logging on
Manager functionality now in Designer
- Manager is gone
QualityStage functionality now embedded in DataStage
- New Data Quality palette folder with half a dozen new stages
GUI
- New toolbar icons
- Menu organization changes
Repository enhancements
- Flexible folder organization
- Repository search
- Enhancements to DataStage component export (dsx)
- Impact analysis
- Job and table definition differences
- Parameter sets
- Data connection objects
New stages
- SCD stage
- Connector stages
Enhancements to existing stages
- Range lookups

Logging On to DataStage Designer

Designer Work Area

Parameter Sets
Store a collection of parameters in a named object
One or more values files can be named and specified
- A values file stores values for specified parameters
- Values are picked up at run time
Parameter sets can be added to the job parameters specified on the Parameters tab in the job properties
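
The relationship between a parameter set and its values files can be sketched conceptually in Python. Everything in this example is hypothetical: the parameter names, the values file names, and the assumption that a values file overrides the design-time defaults only for the parameters it lists. It is a mental model, not how DataStage stores or resolves parameter sets internally.

```python
from typing import Dict, Optional

# Hypothetical parameter set: parameter names with design-time default values.
parameter_set_defaults: Dict[str, str] = {"SourceDir": "/data/in", "TargetDb": "DEVDB"}

# Hypothetical values files: each supplies values for some or all parameters.
values_files: Dict[str, Dict[str, str]] = {
    "test": {"TargetDb": "TESTDB"},
    "prod": {"SourceDir": "/data/prod/in", "TargetDb": "PRODDB"},
}


def resolve_parameters(values_file: Optional[str] = None) -> Dict[str, str]:
    """Return the values a job would see at run time for the chosen values file."""
    resolved = dict(parameter_set_defaults)
    if values_file is not None:
        # Assumption: values from the file replace the defaults they name.
        resolved.update(values_files[values_file])
    return resolved


print(resolve_parameters())
print(resolve_parameters("test"))
print(resolve_parameters("prod"))
```

The point of the named object is that the same job design can run against different environments by choosing a different values file at run time.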


Creating a New Parameter Set

Parameters Tab

Values Tab

Adding a Parameter Set to Job Properties

Using Parameter Set Parameters

Designing a Range Lookup Job

Range Lookup Job

Range on Reference Link

Selecting the Stream Column

Range Expression Editor

Range on Stream Link

Specifying the Range Lookup

Range Expression Editor
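
For readers new to the idea, a range lookup matches a value against low and high bounds rather than against an exact key. The Python sketch below shows the concept for the case where the range is defined on the reference data; the reference rows and labels are hypothetical, the ranges are assumed to be sorted and non-overlapping, and the code is not how the Lookup stage is implemented.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical reference data: (low, high, label), sorted by low, non-overlapping.
reference_rows: List[Tuple[int, int, str]] = [
    (0, 999, "small"),
    (1000, 9999, "medium"),
    (10000, 99999, "large"),
]
lows = [low for low, _high, _label in reference_rows]


def range_lookup(value: int) -> Optional[str]:
    """Return the label of the reference row whose range contains value, if any."""
    # Find the last reference row whose low bound is <= value.
    idx = bisect.bisect_right(lows, value) - 1
    if idx < 0:
        return None
    low, high, label = reference_rows[idx]
    return label if low <= value <= high else None


for stream_value in (500, 1000, 250000):
    print(stream_value, range_lookup(stream_value))
```

A value that falls outside every range returns `None` here, which corresponds to a lookup failure in a job.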

Importing and Exporting DataStage Objects

Repository Window

Export Window

Import Options

Checkpoint
1. True or false? The directory to which you export is on the DataStage client machine, not on the DataStage server machine.

Checkpoint Solution

1. True

Unit Summary
Having completed this unit, you should be able to:
Log on to DataStage
Navigate around DataStage Designer
Create a Parameter Set
Build a Range Lookup job
Import and Export DataStage objects to a file


What's New in DataStage V8?

Module 05: Repository Functions

Unit Objectives

After completing this unit, you should be able to:
Perform a simple Find
Perform an impact analysis
Compare the differences between two table definitions
Compare the differences between two jobs

Searching the Repository

Quick Find

Found Results

Advanced Find Window

Advanced Find Filtering Options

Type: type of object
- Job, table definition, etc.
Creation: range of dates
- E.g., up to a week ago
Last modification: range of dates
- E.g., up to a week ago
Where used: objects that use specified objects
- E.g., a job that uses a specified table definition
Dependencies of: objects that are dependencies of specified objects
- E.g., a table definition that is referenced in a specified job
Options
- Case sensitive
- Search within the last result set
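
How several of these filters combine can be illustrated with a small Python sketch over an in-memory list. The `RepoObject` class and the sample objects are hypothetical, and the sketch uses the standard `fnmatch` module for the name pattern. Note that `fnmatch` also treats `?` and `[...]` as wildcards, whereas this course mentions only the asterisk for Find.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class RepoObject:
    """Hypothetical repository entry; not a DataStage data structure."""
    name: str
    kind: str            # e.g. "job" or "table definition"
    modified: datetime


def find(objects: List[RepoObject], pattern: str, kind: Optional[str] = None,
         modified_since: Optional[datetime] = None,
         case_sensitive: bool = False) -> List[RepoObject]:
    """Filter objects by name pattern (* wildcard), type, and last-modified date."""
    results = []
    for obj in objects:
        name, pat = obj.name, pattern
        if not case_sensitive:
            name, pat = name.lower(), pat.lower()
        if not fnmatchcase(name, pat):
            continue
        if kind is not None and obj.kind != kind:
            continue
        if modified_since is not None and obj.modified < modified_since:
            continue
        results.append(obj)
    return results


repo = [
    RepoObject("abcLoadCustomers", "job", datetime(2024, 1, 10)),
    RepoObject("ABC_Customers", "table definition", datetime(2024, 1, 2)),
    RepoObject("LoadOrders", "job", datetime(2023, 12, 1)),
]
print([obj.name for obj in find(repo, "abc*", kind="job")])
```

Passing an earlier result list back in as `objects` mimics the "search within the last result set" option.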

Using the Found Results

Impact Analysis

Performing an Impact Analysis

Find where table definitions are used
- Right-click a stage or table definition
- Select Find where the table definition is used, or
- Select Find where the table definition is used (deep)
Deep includes additional object types
- Displays a list of the objects using the table definition
Find object dependencies
- Select Find dependencies, or
- Select Find dependencies (deep)
- Displays a list of objects dependent on the one selected
Graphical functionality
- Displays the dependency path
- Collapse selected objects
- Move the graphical object
- Bird's-eye view
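
Conceptually, impact analysis is a traversal of "uses" relationships between repository objects. The Python sketch below shows both directions over a hypothetical set of objects. It treats the deep variant as also following indirect references; that reading is an assumption made for illustration, since the slide above says only that deep includes additional object types.

```python
from typing import Dict, Set

# Hypothetical "uses" relationships: object -> objects it references directly.
uses: Dict[str, Set[str]] = {
    "SeqNightly": {"JobLoadCustomers"},
    "JobLoadCustomers": {"TableDef_Customers", "ParamSet_Db"},
    "JobReport": {"TableDef_Customers"},
}


def where_used(target: str, deep: bool = False) -> Set[str]:
    """Objects that use target; with deep=True, also objects that use those objects."""
    found: Set[str] = set()
    frontier = {target}
    while frontier:
        direct = {obj for obj, deps in uses.items() if deps & frontier} - found
        found |= direct
        frontier = direct if deep else set()
    return found


def dependencies_of(source: str, deep: bool = False) -> Set[str]:
    """Objects that source uses; with deep=True, follow references transitively."""
    found: Set[str] = set()
    frontier = {source}
    while frontier:
        direct = set().union(*(uses.get(obj, set()) for obj in frontier)) - found
        found |= direct
        frontier = direct if deep else set()
    return found


print(sorted(where_used("TableDef_Customers")))
print(sorted(where_used("TableDef_Customers", deep=True)))
print(sorted(dependencies_of("SeqNightly", deep=True)))
```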

Initiating an Impact Analysis from a Stage

Displaying the Dependencies Graphically

Displaying the Dependency Path

Generating an HTML Report

Job and Table Difference Reports

Finding the Difference Between Two Jobs
Example: Job 1 is saved as Job 2. Changes are made to Job 2. What changes have been made?

Initiating the Comparison

Comparison Results

Saving to an HTML File

Comparing Table Definitions
Same procedure as when comparing jobs

Checkpoint
1. You can compare the differences between what two kinds of objects?
2. What wildcard characters can be used in a Find?
3. You have a job whose name begins with abc. You can't remember the rest of the name or where the job is located. What would be the fastest way to export the job to a file?
4. Name three filters you can use in an Advanced Find.

Checkpoint Solutions
1. Jobs and table definitions.
2. Asterisk (*). It stands for zero or more characters.
3. Do a Find for objects matching abc*. Filter by type job. Locate the job in the result set. Right-click it, and then click Export.
4. Type of object, creation date range, last modified date range, where used, dependencies of, and other options including case sensitivity and search within the last result set.

Unit Summary
Having completed this unit, you should be able to:
Perform a simple Find
Perform an Advanced Find
Perform an impact analysis
Compare the differences between two table definitions
Compare the differences between two jobs

What's New in DataStage V8?

Module 06: DataStage Utilities

Unit Objectives
After completing this unit, you should be able to :
Analyze the performance of a job
Estimate the resources needed by a job


Performance Analysis in the Past

Use the Director monitor to watch the throughput (rows/sec) during a job run
Compare job run durations
Turn on APT_PM_PLAYER_TIMING and APT_PM_PLAYER_MEMORY to report player calls and memory allocation

How This Fails You

Long-running jobs couldn't be watched for record throughput changes throughout the job run
The job monitor didn't allow recording for playback
Job monitor throughput rates included time waiting for data
Couldn't determine what was happening on the machines

Performance Analyzer
Visualization tool that provides deeper insight into job run-time behavior
Offers several categories of visualizations:
- Record throughput (rows/sec)
- CPU utilization
- Job timing
- Job memory utilization
- Physical machine utilization
Performance data to be visualized can be:
- Filtered in selected ways, including
Hide startup processes
Hide license operators
Hide inserted operators
- Isolated to selected stages (operators), partitions, and phases
Charts can be saved and printed
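
Why throughput over time is more informative than a single figure can be shown with a few lines of Python. The sample data is hypothetical (timestamps in seconds since job start and cumulative row counts), and the calculation is a plain illustration of rows/sec per interval, not the Performance Analyzer's own method.

```python
from typing import List, Tuple

# Hypothetical samples: (seconds since job start, cumulative rows processed).
samples: List[Tuple[int, int]] = [(0, 0), (10, 50_000), (20, 120_000), (30, 125_000)]


def interval_throughput(points: List[Tuple[int, int]]) -> List[float]:
    """Rows/sec for each interval between consecutive samples."""
    rates = []
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        rates.append((r1 - r0) / (t1 - t0))
    return rates


rates = interval_throughput(samples)
overall = samples[-1][1] / samples[-1][0]
print("per-interval rows/sec:", rates)
print("average rows/sec over the run:", overall)
```

In this made-up run the last interval is far slower than the first two, a change that the single run-wide average hides.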


Enabling Performance Data Recording

Open the job in Designer
Select Record job performance data in Job Properties
Run your job
Performance collection has little impact on overall job performance
To view the results, click the Performance Analysis icon in Designer

Example Job

Job Timeline Chart

Expanding the Job Timeline Chart

Another Job Timeline Chart

Record Throughput Chart

Displaying Selected Stages

CPU Utilization Totals Chart

Machine Utilization - CPU

Filters

Resource Estimator

Resource Estimation
To start:
- Open the job
- Click the Resource Estimation icon in the Designer toolbar
- Click Run to build statistics based on a job run
Generate models
- Static model: computed worst-case scenarios of resource usage
- Dynamic model: computed from a sampling of data
- View resource estimates by stage
- Compare model resource estimates
Generate projections
- View projection resource estimates
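
The difference between a worst-case (static) estimate and a sample-based (dynamic) estimate can be illustrated with a simple disk-space calculation in Python. The schema, the sample rows, and the formulas are all hypothetical simplifications chosen for this sketch; they are not the Resource Estimator's actual models.

```python
from typing import Dict, List

# Hypothetical schema: column name -> maximum declared length in bytes.
schema_max_bytes: Dict[str, int] = {"id": 8, "name": 100, "comment": 400}

# Hypothetical sample of actual rows, with values held as text.
sample_rows: List[Dict[str, str]] = [
    {"id": "1", "name": "Ann", "comment": ""},
    {"id": "2", "name": "Robert", "comment": "priority customer"},
]


def static_estimate(row_count: int) -> int:
    """Worst case: every column of every row is at its maximum declared length."""
    return row_count * sum(schema_max_bytes.values())


def dynamic_estimate(row_count: int) -> float:
    """Sample based: scale the average observed row size to the full row count."""
    sizes = [sum(len(v.encode("utf-8")) for v in row.values()) for row in sample_rows]
    return row_count * (sum(sizes) / len(sizes))


for rows in (1_000, 1_000_000):  # the second call acts as a simple projection
    print(rows, static_estimate(rows), dynamic_estimate(rows))
```

Calling the functions with a larger row count is a rough analogue of a projection: the model stays the same and only the input size changes.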

Example Job

Resource Estimation Window

Model Tab

Model Estimates by Stage

Comparing Model Estimates

Projection Tab

Projection Estimates

Checkpoint
1. What are the five types of visualization that can be created by the Performance Analyzer?
2. How do you enable the collection of performance data?
3. Describe the two types of models that can be generated by the Resource Estimator.

Checkpoint Solutions
1. Record throughput (rows/sec), CPU utilization, job timing, job memory utilization, and physical machine utilization.
2. Click the Execution tab in Job Properties. Select Record job performance data.
3. Static model: computed worst-case scenarios of resource usage. Dynamic model: computed from a sampling of the data.

Unit Summary
Having completed this unit, you should be able to :
Analyze the performance of a job
Estimate the resources needed by a job
