
IBM Information Server

WebSphere DataStage 8.0


Richard Hedges
Program Director, Product Management
IBM Information Server
Agenda
▪ IBM Information Server Overview & Architecture
▪ WebSphere DataStage Usability Improvements
▪ Best-in-class Data Transformation
▪ Focus on Connectivity
▪ Performance, Performance, and Performance
▪ Installation, Configuration, Administration, Reporting
▪ Upgrade to WebSphere DataStage v8.0
IBM Information Server
Delivering information you can trust
IBM Information Server

Unified Deployment

Understand: Discover, model, and govern information structure and content
Cleanse: Standardize, merge, and correct information
Transform: Combine and restructure information for new uses
Deliver: Synchronize, virtualize, and move information for in-line delivery

Unified Metadata Management

Parallel Processing
Rich Connectivity to Applications, Data, and Content
IBM Information Server Architecture
UNIFIED USER INTERFACE: Analysis Interface, Development Interface, Web Admin Interface
COMMON SERVICES: Metadata Services, Unified Service Deployment, Security Services, Logging & Reporting Services
UNIFIED PARALLEL PROCESSING: Understand, Cleanse, Transform, Deliver
UNIFIED METADATA: Design, Operational
COMMON CONNECTIVITY: Structured, Unstructured, Applications, Mainframe

Supporting IBM WebSphere Application Server
Supporting IBM DB2, Oracle, and MS SQL Server


DataStage and QualityStage Designer
Quick Find - Basic

▪ Find item in Repository tree
– In-place find
– Find by Name (Full or Partial)
– Wild card support
– Find next…
– Filter on type
Find – Advanced Search Criteria

▪ Search on the following criteria:
– Object type
• Job, Table Definition, Stage, etc.
– Creation
• Date/Time
• By User
– Last Modification
• Date/Time
• By User
– Where Used
• What other objects use this object?
– Dependencies of
• What does this object use?
▪ Options
– Case
– Match on “name & description” or “name or description”
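
The search criteria and options above can be illustrated with a small sketch. It is not the Designer's implementation; the `RepoItem` fields, the `matches` function, and the sample items are hypothetical, and wildcard matching is borrowed from Python's standard `fnmatch` module.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class RepoItem:
    # Hypothetical stand-in for a repository object.
    name: str
    description: str
    object_type: str
    created_by: str


def matches(item, pattern, *, object_type=None, case_sensitive=False, mode="or"):
    """Return True if the item satisfies the search criteria.

    mode="and" mirrors "name & description"; mode="or" mirrors "name or description".
    """
    def hit(text):
        if case_sensitive:
            return fnmatchcase(text, pattern)
        return fnmatchcase(text.lower(), pattern.lower())

    if object_type is not None and item.object_type != object_type:
        return False
    name_hit, desc_hit = hit(item.name), hit(item.description)
    return (name_hit and desc_hit) if mode == "and" else (name_hit or desc_hit)


items = [
    RepoItem("LoadCustomers", "Loads customer dimension", "Job", "alice"),
    RepoItem("CustomerTable", "Staging layout", "Table Definition", "bob"),
    RepoItem("load_orders", "Loads customer orders", "Job", "alice"),
]

both = [i.name for i in items if matches(i, "*customer*", object_type="Job", mode="and")]
either = [i.name for i in items if matches(i, "*customer*", object_type="Job", mode="or")]
```

Here `both` keeps only jobs whose name and description both match, while `either` also admits jobs matched on description alone.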
Impact Analysis – Graphical View
Impact Analysis:
– Find dependencies: What does this item depend on?
– Find where used: Where is this item used?

Results are shown using the Advanced Find window
Impact Analysis – Tabular View

Results can be saved to an HTML or XML file for additional processing or remote user viewing. Within the application, the results list can feed export, reporting, or compilation functions.
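
The two Impact Analysis questions ("what does this item depend on?" and "where is this item used?") are the two directions of a walk over a dependency graph. The sketch below shows that idea only; the `uses` mapping, the object names, and both function names are hypothetical and say nothing about how the repository stores these relationships.

```python
from collections import defaultdict

# uses[a] = objects that a uses directly (hypothetical sample data).
uses = {
    "JobA": {"CustomerTable", "ParamSet1"},
    "JobB": {"ParamSet1", "RoutineX"},
    "RoutineX": {"CustomerTable"},
}


def dependencies_of(item, graph):
    """Everything `item` uses, directly or indirectly."""
    seen, stack = set(), [item]
    while stack:
        for dep in graph.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def where_used(item, graph):
    """Everything that uses `item`, directly or indirectly."""
    used_by = defaultdict(set)
    for obj, deps in graph.items():
        for dep in deps:
            used_by[dep].add(obj)
    # Same traversal, run over the reversed edges.
    return dependencies_of(item, used_by)


job_b_deps = dependencies_of("JobB", uses)
table_users = where_used("CustomerTable", uses)
```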
Job, Table or Routine Difference
Available for Jobs, Tables & Routines

Textual report with hot links to the relevant editor in Designer.
Job Parameter Sets

▪ New object in the repository that contains the names and values of job parameters

▪ A Parameter Set can be referenced by one or more jobs
Job Parameter Sets
▪ Can use Impact Analysis to determine which Jobs are using a Parameter Set
▪ Works for DataStage Server and DataStage Enterprise Edition
▪ Easier to share job parameters across jobs
▪ Easier to deploy jobs across machines
▪ Easier to propagate a changed job parameter value
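
Why a shared Parameter Set makes a changed value easy to propagate can be shown with a minimal sketch: jobs hold a reference to the set by name rather than a private copy. The `Job` class, the `DbConn` set, and its values are hypothetical.

```python
# Hypothetical parameter set stored once and shared by several jobs.
parameter_sets = {"DbConn": {"host": "dev-db", "schema": "staging"}}


class Job:
    def __init__(self, name, parameter_set):
        self.name = name
        self.parameter_set = parameter_set  # reference by name, not a copy

    def resolve(self, sets):
        """Look up the current values at run time."""
        return dict(sets[self.parameter_set])


jobs = [Job("LoadCustomers", "DbConn"), Job("LoadOrders", "DbConn")]

# One change to the shared set is seen by every job that references it.
parameter_sets["DbConn"]["host"] = "prod-db"
resolved = {job.name: job.resolve(parameter_sets)["host"] for job in jobs}
```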


Collaboration: Multi-User Environment

▪ Locking to prevent concurrent update clashes
▪ Optional “read-only” view when items are already locked in the Repository
▪ Visible lock “owner” to aid identification
– By Name & Session ID
▪ Identified user for “last modified” or “created by” actions
– Searchable using Advanced Find
– E.g. “Find all items created by user x today”
Export Improvements
▪ The new GUI allows modification of the originally populated export list.
▪ Items can be added, removed, or filtered out.

Available from

Export based on a result of a search


Meta Data Sharing
DataStage, QualityStage & Information Analyzer

▪ Sharing meta data with WebSphere Information Analyzer
– Both tools store Table meta data in the common repository
– DataStage users can see the table meta data from Information Analyzer
• Allows sharing of meta data definitions
• Provides a single meta data import from the data source, for use in both tools
• Enables the DS user to see IA analysis data for shared tables
– Where is the IA analysis information available in DS/QS Designer?
• “Analytical Information” tab on the EditRow dialog when looking at the details of an individual column from a Table Definition or a stage editor
• “Analytical Information” tab on the Table Definition dialog
Lookup Stage – New Range Capabilities
▪ Range check box allows you to specify a range key for a 1 to 2 type range lookup
▪ Key Type drop down allows you to specify a range key for a 2 to 1 type range lookup
▪ Double clicking on the Key Expression field of a range key will bring up the Range Expression dialog
New Range Expression Dialog
▪ Column selection for the range key from the reference table
▪ Column selection for the bounding columns from the primary input
▪ Range expression operator drop down. Specifies whether the range bounds are inclusive or exclusive
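
A range lookup of the kind the dialog configures can be sketched as follows: a reference row matches when its range key falls between two bounding columns of the input row, with each bound either inclusive or exclusive. This is a generic illustration, not the Lookup stage's implementation; the function name, column names, and sample rows are all assumptions.

```python
import operator


def range_lookup(input_row, reference_rows, key, low, high,
                 low_inclusive=True, high_inclusive=True):
    """Return reference rows whose `key` lies between input_row[low] and input_row[high]."""
    lower_ok = operator.le if low_inclusive else operator.lt
    upper_ok = operator.le if high_inclusive else operator.lt
    return [
        ref for ref in reference_rows
        if lower_ok(input_row[low], ref[key]) and upper_ok(ref[key], input_row[high])
    ]


# Hypothetical reference table and primary input row.
reference = [
    {"rate_day": 10, "rate": 1.1},
    {"rate_day": 20, "rate": 1.2},
    {"rate_day": 30, "rate": 1.3},
]
row = {"order_id": 1, "start_day": 10, "end_day": 20}

inclusive = [r["rate"] for r in range_lookup(row, reference, "rate_day", "start_day", "end_day")]
exclusive_low = [
    r["rate"]
    for r in range_lookup(row, reference, "rate_day", "start_day", "end_day", low_inclusive=False)
]
```

Switching the lower bound from inclusive to exclusive drops the reference row that sits exactly on `start_day`.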
Surrogate Key Management
▪ New engine functionality
▪ Exposed in 2 new stages and 1 old one
– Surrogate Key Generator
– Slowly Changing Dimension
– Transformer – Initialize(), GetNextKey()
▪ How it works
– Uses built-in state files or DBMS sequences (DB2 & Oracle)
– Supports large integer (uint64) surrogate key values
– Can be used to discover surrogate key values which are already being used, so that use of duplicate key values will be avoided
– Customizable block size to manage key gaps vs. performance
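
The trade-off between key gaps and performance can be made concrete with a sketch of block-based key allocation. It is an illustration under assumptions, not the engine's code: `KeyState` stands in for a state file or DBMS sequence, and `KeyGenerator.get_next_key` is only loosely named after the `GetNextKey()` function listed above.

```python
class KeyState:
    """Hypothetical stand-in for a shared state file or DBMS sequence."""

    def __init__(self, existing_keys=()):
        # Start after the highest key already in use, so duplicates are avoided.
        self.next_free = max(existing_keys, default=0) + 1
        self.reservations = 0

    def reserve(self, block_size):
        start = self.next_free
        self.next_free += block_size
        self.reservations += 1
        return start, start + block_size  # half-open range [start, end)


class KeyGenerator:
    """Hands out keys from a locally reserved block."""

    def __init__(self, state, block_size):
        self.state, self.block_size = state, block_size
        self.current, self.end = 0, 0

    def get_next_key(self):
        if self.current >= self.end:
            # Only touch the shared state once per block.
            self.current, self.end = self.state.reserve(self.block_size)
        key = self.current
        self.current += 1
        return key


state = KeyState(existing_keys=[1, 2, 7])
gen_a = KeyGenerator(state, block_size=100)
gen_b = KeyGenerator(state, block_size=100)
keys_a = [gen_a.get_next_key() for _ in range(3)]
keys_b = [gen_b.get_next_key() for _ in range(2)]
```

A larger block means fewer trips to the shared state, but any keys left unused in a block become gaps (here, everything between the last key of `keys_a` and the first key of `keys_b`).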
New Functionality to Support SCD

▪ New engine capabilities
– Surrogate Key management
– Updatable in-memory lookups
▪ New & enhanced stages
– Surrogate Key Generator
– Slowly Changing Dimension
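
To show why surrogate keys and an updatable in-memory lookup matter for slowly changing dimensions, here is a generic Type 2 sketch. The slide does not describe the stage's logic, so the function, the `is_current` flag, and the sample rows are assumptions chosen for illustration.

```python
from itertools import count


def apply_scd2(dimension, changes, business_key, tracked, next_key):
    """Apply changes as Type 2 updates: expire the old row and add a new one."""
    # In-memory lookup of current rows, updated as changes are applied.
    current = {row[business_key]: row for row in dimension if row["is_current"]}
    for change in changes:
        existing = current.get(change[business_key])
        if existing and all(existing[col] == change[col] for col in tracked):
            continue  # nothing changed, keep the current row
        if existing:
            existing["is_current"] = False
        new_row = {**change, "sk": next(next_key), "is_current": True}
        dimension.append(new_row)
        current[change[business_key]] = new_row  # later rows see this update
    return dimension


dim = [{"sk": 1, "cust_id": "C1", "city": "Leeds", "is_current": True}]
changes = [
    {"cust_id": "C1", "city": "York"},  # changed attribute
    {"cust_id": "C2", "city": "Bath"},  # new customer
    {"cust_id": "C2", "city": "Bath"},  # repeat within the same run
]
result = apply_scd2(dim, changes, "cust_id", ["city"], count(2))
```

The repeated `C2` row is skipped only because the lookup was updated in memory when the first `C2` row was inserted.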
Connectivity Updates

▪ New functionality and more databases supported in SQL builders
– SQL Server, Teradata, ODBC
▪ New Stored Procedures functionality, and for more databases
– SQL Server, Teradata
▪ Latest/Greatest version support (not all listed)
– DB2 9.1
– Oracle 10gR2
– SQL Server 2005
– Teradata v2r6.1 (DB server) / 8.1 (TTU)
– Sybase ASE 15, Sybase IQ 12.7
– Informix 10 (IDS)
– SAS 9.1
– IBM WS MQ 6.1, WS MB 5.1
– Netezza v3.1
New Connectivity
– Stages for WebSphere Federation and Classic Federation
• Server and Enterprise stages
• DRS Support
• Native integration with Federation and Classic Federation
– Netezza Enterprise Stage
• Parallel Loader leveraging NZ_Load and External Tables
– SFTP Enterprise Stage
• Secure data transmission
– iWay Enterprise Stage
• Integration with over 250 disparate/legacy sources
Connection Objects
▪ New top-level repository object
▪ Allows saving of a re-usable connection path to a specific source or target
– Username, password, db name, etc.
▪ Supported on specific stage types
– New Rich Connectors
– Enterprise Stages: DB2, Informix, Oracle, Teradata
– For Plug-ins…
– For Server built-ins
• ODBC, UniVerse, UniData
Next Generation “Rich” Connectors
Combining the best of the plug-ins, operators, plus more...

▪ ODBC
– Embedded DataDirect v5.2 Connect for ODBC drivers
▪ DB2 – Q1 2007
– For DPF and non-DPF
▪ Teradata – Q1 2007
– New support for Teradata Parallel Transport (TPT)
▪ Oracle – Q1 2007
– New support for 10gR2
▪ WebSphere MQ – Q1 2007
– Adding support for “client only” configuration
Next Generation “Rich” Connectors

▪ Connection objects allow properties to be dropped onto the stage
▪ Test the connection instantly
▪ Diagram lets you select the link to edit as though you’re on the canvas
▪ Parameter button on every field
▪ Warning sign tells you which fields are mandatory
▪ Graphical SQL builder
Enterprise Packs Updates

– New Validations for enterprise application versions
• SAP ECC 6.0
• SAP BI 7.0
• Siebel 7.8
• JD Edwards EnterpriseOne 8.12
– New SAP Unicode Certifications
• BW-STA 3.5: Staging BAPI certification for BW Load
• BW-OHS 3.5: Open-Hub service certification for BW Extract
• CA-ALE 4.0: IDoc Load and Extract supports Web AS 6.40
• IA-BAPI: BAPI Load and Extract supports Web AS 6.40
– New Functionality
• Enhanced support for Siebel EIM and Business Components
• New Metadata browser and importer for Oracle Applications
• Greater support for large enterprise class deployments
CFF Stage – Multi-Format Record Support
▪ Complex Flat File stage now processes Multi Format Flat (MFF) files
▪ Constraints can be specified on the output links to filter data and/or define when a record should be sent down the link
▪ New Fast Path feature provides guided creation
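
The idea of constraints on output links can be sketched as predicates that decide which records travel down which link. The two-character record-type prefix, the link names, and the `route_records` function are hypothetical; the real stage works from record definitions rather than string slicing.

```python
def route_records(lines, links):
    """Send each record down every output link whose constraint it satisfies.

    `links` maps an output link name to a constraint (a predicate on the record).
    """
    outputs = {name: [] for name in links}
    for line in lines:
        # Assumed layout: a two-character record type followed by the body.
        record = {"type": line[:2], "body": line[2:].rstrip()}
        for name, constraint in links.items():
            if constraint(record):
                outputs[name].append(record["body"])
    return outputs


# Hypothetical multi-format file: header, detail, and trailer records.
lines = ["HDbatch-001", "DT100.00", "DT-5.00", "TR2"]
links = {
    "header": lambda r: r["type"] == "HD",
    # The constraint both selects detail records and filters out non-positive amounts.
    "positive_details": lambda r: r["type"] == "DT" and float(r["body"]) > 0,
    "trailer": lambda r: r["type"] == "TR",
}
routed = route_records(lines, links)
```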
Performance Improvements
▪ Improved Job Startup Time
– Allows efficient use of DS EE against smaller data sets
▪ Buffer Optimization
– Improved buffer placement algorithm
– E.g., removed unnecessary buffer before parallel sort in some instances
▪ Combinability Optimizations
– More combinable stages
– Intelligent combining
▪ Adaptive Job Monitoring
– The Adaptive Job Monitoring feature detects when CPU utilization by the conductor reaches 80% and throttles the volume of job monitoring data
– Note: only monitor messages will be throttled; metadata and summary messages are not affected
– Time-based monitoring is now supported
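
The throttling rule described for Adaptive Job Monitoring can be sketched as a filter over a message stream. Only the 80% threshold and the rule that metadata and summary messages pass through come from the text above; the sampling policy (`keep_every`), the message tuples, and the function name are assumptions.

```python
def filter_messages(messages, cpu_utilization, threshold=0.80, keep_every=10):
    """Thin out 'monitor' messages while conductor CPU is at or above the threshold."""
    throttling = cpu_utilization >= threshold
    kept, monitor_seen = [], 0
    for kind, payload in messages:
        if kind == "monitor" and throttling:
            monitor_seen += 1
            # Assumed policy: keep the first monitor message of every `keep_every`.
            if (monitor_seen - 1) % keep_every != 0:
                continue
        kept.append((kind, payload))
    return kept


messages = [("monitor", i) for i in range(20)]
messages += [("metadata", "schema"), ("summary", "done")]

busy = filter_messages(messages, cpu_utilization=0.85)
idle = filter_messages(messages, cpu_utilization=0.50)
```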
Job Performance Analysis
A new visualization tool which:
▪ Provides deeper insight into runtime job behavior.
▪ Offers several categories of visualizations, including:
– Record Throughput
– CPU Utilization
– Job Timing
– Job Memory Utilization
– Physical Machine Utilization
▪ Hides runtime complexity by emphasizing the stages on the designer canvas.
Resource Estimation
▪ Difficult to estimate resources required for job execution
– Scratch space, CPU, etc.
▪ What happens if data volume increases?
▪ How do I prevent a job aborting due to lack of system resources?
Resource Estimation Tool Layout Overview
New IBM Information Server Installation
Create Users, Assign Roles, and Map Credentials
1. On the Administration tab, click Users, then select Create New Users
2. Enter values for the different user attributes. Id, Password, First Name, and Last Name are required
3. Assign Suite and Product Roles as appropriate
4. Click Save
5. Map Credentials
Security Services
▪ Internal Directory
– Defines users, groups, roles
– Supports browsing/creation/deletion/update operations
▪ External Directories
– LDAP, Active Directory, Unix
– External directory passwords are not stored
– Supports browsing/partial update operations
▪ Roles
– Suite roles: Suite User, Suite Administrator
– Product roles: e.g. DataStage User
– Project roles: e.g. Information Analyzer User
▪ Standards-Based Authentication
– JAAS
– Works against the supported directories
Logging
▪ A new common logging facility
– Used by all the products of the Suite
– Logs go into the operational repository
▪ DataStage Client log viewer does not change
▪ Logging administration is done from the administration console
▪ Logging Views are “saved queries”
– Opening a view displays the log events corresponding to the “saved query”
– Example
• Severity level: Error
• Category: DataStage
• Timestamp: past 12 hours
– A user can now view logs in a Production environment via a browser and perform nothing else in that environment
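
A logging view as a "saved query" can be pictured as a stored set of filter criteria that is re-evaluated each time the view is opened. The sketch below uses the example criteria from the slide (severity Error, category DataStage, past 12 hours); the `LogEvent` and `LogView` classes and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    severity: str
    category: str
    timestamp: datetime
    message: str


@dataclass
class LogView:
    """A saved query: the criteria are stored, the results are not."""
    severity: str
    category: str
    max_age: timedelta

    def open(self, events, now):
        cutoff = now - self.max_age
        return [
            e for e in events
            if e.severity == self.severity
            and e.category == self.category
            and e.timestamp >= cutoff
        ]


now = datetime(2007, 1, 15, 12, 0)  # fixed time so the example is repeatable
events = [
    LogEvent("Error", "DataStage", now - timedelta(hours=2), "Job aborted"),
    LogEvent("Error", "DataStage", now - timedelta(hours=20), "Old failure"),
    LogEvent("Info", "DataStage", now - timedelta(hours=1), "Job started"),
]
view = LogView("Error", "DataStage", timedelta(hours=12))
recent_errors = [e.message for e in view.open(events, now)]
```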
Reporting Console

▪ Can publish reports from DataStage to the IBM Information Server Reporting Console
▪ Job Reports, Advanced Find, Impact Analysis, etc.
Source-to-Target and Target-to-Source
Upgrade
▪ All objects from DataStage v7 projects upgrade into DataStage v8.0
– Export projects and import them into DataStage v8.0
– All jobs (Server, Parallel, Mainframe, and Sequencer), along with all other objects, will migrate
▪ Unix users can install IBM Information Server and previous versions on the same server
▪ Note: DataStage Version Control is not in v8.0.


Platforms
▪ At GA
– DS & QS Client: Windows XP
– Windows Server 2003
– AIX 5.2, 5.3
– Red Hat Enterprise Linux AS 3.0
– Red Hat Enterprise Linux AS 4.0
– SuSE Enterprise Linux 9, 10
– HP-UX 11i v1 (11.11), 11i v2 (11.23) – PA-RISC
– Solaris 2.9, 2.10
▪ NLS Support, but not localized


The IBM Information Server Advantage
A Complete Information Infrastructure
▪ A comprehensive, unified foundation for enterprise information architectures, scalable to any volume and processing requirement
▪ Auditable data quality as a foundation for trusted information across the enterprise
▪ Metadata-driven integration, providing breakthrough productivity and flexibility for integrating and enriching information
▪ Consistent, reusable information services which, along with application services and process services, are an enterprise essential
▪ Accelerated time to value with proven, industry-aligned solutions and expertise
▪ Broadest and deepest connectivity to information across diverse sources: structured, unstructured, mainframe, and applications
Thank You!
