
DataStage Projects Life Cycle Stages

Agenda
• Introduction
• Requirements
• Design
• Build
• Testing
• Implementation
• Support

2002. Infosys Technologies Ltd.

Introduction
DataStage projects follow the same life cycle stages as other projects. A typical DataStage project passes through these phases: Requirements, Design, Build, Test, Implement, Support.


Requirements

• The warehouse needs to cater to a wide range of user analytics, so requirements should be well documented, detailed, and tightly scoped.
• Clearly identify the interface points and define the communication protocol.
• Model user views and align them closely with business needs.
• Identify the dependencies between all aspects of the project, such as ETL feeds and user views, to allow better control over project execution.
• Identify and document performance-related requirements.
• Analyze the source data to understand the type of data that needs to be processed.
• Plan a detailed analysis/high-level design phase to drill down into the requirements.


Steps to Effective Requirements Gathering



• Identify the source system tables required.
• Identify the data flow.
• Identify the data process.
• Identify the views to be created, the reports to be generated, etc.
• Create the Requirement Traceability and Test Matrix.
• State the assumptions clearly.
• Define implementation considerations.
• Document the design solution.
• Identify transformations and define the data mapping.
• Gather volumetrics.
• Start data analysis.
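
The traceability and test matrix step can be illustrated with a short Python sketch. It is not part of the original slides: the requirement IDs, test-case IDs, and the `untested_requirements` helper are hypothetical, and a real project would more likely keep this matrix in a spreadsheet or test-management tool.

```python
# Hypothetical requirement IDs mapped to the test cases that cover them.
traceability = {
    "REQ-001 load customer table": ["TC-01", "TC-02"],
    "REQ-002 reject malformed rows": ["TC-03"],
    "REQ-003 nightly volume under SLA": [],
}


def untested_requirements(matrix):
    """Return the requirements that have no test case mapped to them."""
    return sorted(req for req, tests in matrix.items() if not tests)


gaps = untested_requirements(traceability)
print(gaps)
```

Running a check like this whenever requirements change highlights any requirement that has lost or never had test coverage.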


Design

• A fluid data model results in a lot of rework. Each change might be small, but it may be needed in multiple places, which increases the volume of rework.
• A changing data model makes metadata management difficult, and metadata management is critical for an enterprise data warehouse. Metadata needs to be extracted and loaded into DataStage every time there is a change, and this process needs significant lead time.
• The design should be robust and accommodate process health features such as auditing, ACR balancing, error processing and reprocessing, restartability, and recovery.
• Perform a POC on critical requirements and identify performance bottlenecks up front.
• ACR checkpoints in the data flow help identify data problems early in the process, before data is loaded into the warehouse.
• Design patterns should be reusable across projects to reduce development time.
• Brainstorm and consider the various aspects of the framework, then finalize it and bring clarity.
• A flexible framework design that handles recovery after downtime is critical from an application support perspective.
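
A balancing checkpoint of the kind mentioned above might look like the following Python sketch. It assumes that a checkpoint compares the row count and a control total read from the source against what was loaded plus what was rejected. The `Counts` type, its fields, and `is_balanced` are hypothetical names for illustration, not DataStage features.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Counts:
    rows: int
    amount_total: Decimal


def is_balanced(source: Counts, loaded: Counts, rejected: Counts) -> bool:
    """True when every source row and amount is either loaded or rejected."""
    return (
        source.rows == loaded.rows + rejected.rows
        and source.amount_total == loaded.amount_total + rejected.amount_total
    )


# Illustrative figures only.
source = Counts(1000, Decimal("52340.10"))
loaded = Counts(990, Decimal("52000.00"))
rejected = Counts(10, Decimal("340.10"))
print(is_balanced(source, loaded, rejected))
```

`Decimal` is used for the control total so that the comparison is exact; floating-point sums can differ in the last digits even when the data balances.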


Steps to a Good Design



• Re-validate the data mapping.
• Define general programming specifications.
• Define development objects.
• Define miscellaneous processes such as error processing, reprocessing, ACR balancing, and auditing.
• Create a POC for all the critical or complicated points, and make it end to end so that there are no surprises during build.
• Identify common functionality, jobs, scripts, etc., keeping reusability in mind.
• Prepare test plans and map them to requirements.
• Define programming standards and the directory structure.
• Explore different options for data extraction.
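
Restartability, one of the processes listed above, can be sketched as a checkpoint file that records which steps have finished. This is a minimal Python illustration under stated assumptions, not how DataStage implements restart: the `run_with_restart` function, the step names, and the JSON checkpoint format are all hypothetical.

```python
import json
import os
import tempfile


def run_with_restart(steps, checkpoint_path):
    """Run (name, func) steps in order, skipping steps already completed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    executed = []
    for name, func in steps:
        if name in done:
            continue
        func()  # an exception here leaves the checkpoint at the last success
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
        executed.append(name)
    return executed


def fail():
    raise RuntimeError("simulated load failure")


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first_try = [("extract", lambda: None), ("load", fail)]
try:
    run_with_restart(first_try, checkpoint)
except RuntimeError:
    pass  # the extract step is already recorded as complete

second_try = [("extract", lambda: None), ("load", lambda: None)]
rerun = run_with_restart(second_try, checkpoint)
print(rerun)
```

On the second run only the failed step executes again, which is the behavior a restartable design is meant to provide.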


Build

• Several stages can often achieve the same or a similar function. Choosing the right stage and configuration is key to developing a quality solution.
• Encryption routines that use the OpenSSL library (AES encryption/decryption, SHA-1 hashing, etc.) should be addressed at the start of the phase.
• Metadata is a key aspect of a successful data warehouse implementation. Standards need to be clearly defined and followed.
• Accessing DataStage over a Citrix server has improved productivity to a large extent. It has also given the flexibility to try out multiple options and provide the best solution. Hence a Citrix server should be used for accessing DataStage.
• Knowledge management practices capture and disseminate information. Build a repository of knowledge articles, learnings, and checklists from experience.
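
The hashing part of the encryption routines mentioned above can be sketched as follows. This illustration uses Python's standard `hashlib` module rather than the OpenSSL C library named in the slide, and it omits AES because that needs a third-party package. The `hash_key` helper and the sample value are hypothetical.

```python
import hashlib


def hash_key(value: str) -> str:
    """Return the SHA-1 hex digest of a field value encoded as UTF-8."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


digest = hash_key("abc")
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Fixing the text encoding explicitly matters here: the same string hashed under a different encoding produces a different digest, which would break key matching between systems.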


Tips to Efficient Build



• Categorize similar jobs.
• Define a framework for each category.
• Define a framework for each process (such as error processing and record processing).
• Finalize job parameters.
• Build reusable components, frameworks, and custom stages.
• Prepare the necessary checklist for build.
• Get the metadata ready.
• Build the DataStage jobs.
• Perform usage analysis for metadata compliance.
• Finalize sequencing and scheduling (either Control-M or Sequencer).


Testing: System / Volume / Performance / Integration / Acceptance



• Experience in handling large volumes of data in multiple projects, including the huge CSPAM volumes from Target Stores.
• Broad understanding and solid experience gained from the many challenges overcome across projects and environments, old as well as new.
• Understanding of the roles of the various teams involved, and the ability to partner, coordinate, and collaborate with multiple teams.
• Testing DataStage jobs requires a considerable amount of time. Plan adequate testing time.
• Preparing a good test data bed is often complex and difficult. Plan well in advance.
• Plan for enough database capacity and test schemas to have a smooth testing phase.
• Learnings from the Target DataStage/UDB environment are critical to a successful testing phase.


Testing: System / Volume / Performance / Integration / Acceptance (continued)



• Obtain or prepare source data that matches all the scenarios.
• Try to obtain production source data for testing if possible.
• If possible, run more rounds of testing.
• Perform unit testing.
• Test negative cases too.
• If there are changes, do regression testing.
• Ensure that the test configuration is similar to the production environment.
• Identify system-related issues and include them in system testing. Configure and use schedulers.
• Perform volume testing with varied data. Use source data from production if available; otherwise generate data with tools.
• Identify all other external components and include them in integration testing.
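
When production data is not available, volume test data with deliberate negative cases can be generated with a small script. The following Python sketch is one possible approach, not a tool named in the slides. The `generate_rows` function, the column layout, and the choice of a blank amount as the negative case are all hypothetical.

```python
import random


def generate_rows(count, bad_every=10, seed=42):
    """Yield (customer_id, amount) rows; every bad_every-th row has a blank amount."""
    rng = random.Random(seed)  # a fixed seed keeps the test bed reproducible
    for i in range(1, count + 1):
        amount = "" if i % bad_every == 0 else f"{rng.uniform(1, 500):.2f}"
        yield (f"C{i:06d}", amount)


rows = list(generate_rows(100))
bad = [row for row in rows if row[1] == ""]
print(len(rows), len(bad))  # 100 10
```

Because the generator is seeded, a regression run after a change sees exactly the same input as the run before it.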


Implementation

• Plan in advance for the implementation phase. Collaborate with the different stakeholders to implement the various aspects of the application successfully, such as DataStage jobs, the Control-M schedule, Unix scripts, and the ACR application.
• For a new environment such as grmetlprod01, include a test implementation phase to iron out any environment-related surprises.
• Be aware of the new processes in place for DataStage implementation, such as deployment using WBSD. This helps resolve problems and reduce delays.
• Maintain a well-developed deployment checklist that can be reused across projects.


Support

• Experience supporting DataStage applications after implementation, including the successful turnover of a few applications to ESS. Plan well in advance for the involvement of TOC in the application turnover. A comprehensive knowledge article listing all the issues faced and their resolutions, from UAT through the support phase, is critical for the support team.
• Based on the criticality of the application, put a clear escalation procedure and support plan in place to address environment-related issues. Plan this in advance with the DataStage support team as well as the DB hosting team.
• The ability and experience to provide 24x7 support is critical for most high-volume data warehouse ETL applications. The Infosys Global Delivery Model suits round-the-clock support with onsite and offshore resources.
• Familiarity with all the support and turnover activities and systems, such as Remedy, is needed to manage post-implementation turnover effectively.

