
MASTER DATA EXTRACT GUIDE

Release 8.0

Revision A

Initiate, InitiateSM and Initiate Identity Hub are trademarks and/or service marks of Initiate Systems, Inc., which may be registered in some jurisdictions. All rights reserved. All other marks are owned by their respective owners. The information in this document is protected under the applicable federal law as an unpublished work, and is confidential and proprietary to Initiate Systems, Inc. Its use, disclosure, reproduction, or publication, in whole or in part, without the express prior written consent of Initiate Systems, Inc. is prohibited.

Table of Contents
About this manual
  Audience and purpose
  Organization
  Additional reference documentation
  How to get help
    ATSC
    Support Center Knowledge Base
  Acknowledgements
Chapter 1: Master Data Extract overview
  Clover.ETL basics
  The Master Data Extract Sample Graphs
Chapter 2: Using the Master Data Extract sample graphs
  Importing the sample graphs
  Configuring Readers
    Creating a database connection
    Specifying a database connection for each Reader
  Configuring the extract_full_all.grf sample graph
  Configuring the extract_incremental_db.grf sample graph
    Parameters for incremental extraction
  Configuring the extract_incremental_file.grf sample graph
    Parameters for incremental extraction
  Running a graph
  Troubleshooting graphs
    Debugging a graph Edge
    Viewing logs and error messages
  Automatic graph execution
    Using the madconfig utility to create a properties file for a scheduled job
    Using madconfig to launch a graph using a specified properties file
    Recording responses to the madconfig utility
  Using extract.ddl to create target database schema


About this manual


Audience and purpose
This guide is intended for solution architects and developers responsible for developing Extract, Transform, Load (ETL) graphs for data extraction. Through discussion of sample graphs, this guide describes how to extract data from Master Data Engine database tables, transform it for use with downstream applications such as reporting and analytic tools, and write the transformed data to database tables or extract files.

Organization
The information presented includes:
- Chapter 1: Overview of the Master Data Extract application
- Chapter 2: Detailed information about using the Master Data Extract sample graphs

Additional reference documentation


For additional information, refer to the following documents:
- Workbench User Guide
- Initiate Master Data Service Data Model Description
- Clover documentation:
  - The Clover.GUI Users Guide, which can be downloaded at http://www.cloveretl.org/documentation/clover-gui
  - The Clover.ETL Wiki at http://wiki.clovergui.net/doku.php

How to get help


ATSC
Each organization designates two (2) or more individuals to act as Authorized Technical Support Contacts (ATSC) for Initiate software issues. These individuals interact with users in your organization and, when necessary, work with the technical support staff at Initiate Systems, Inc. to resolve issues. When you have questions or concerns about the software, and if the information in this guide does not answer your questions, contact your ATSC. Your ATSC will try to determine if the problem is a hardware system issue or an operational issue before contacting Initiate Systems for assistance.


Support Center Knowledge Base


We realize that you might have questions that may not be addressed in the documentation, training, or within your standard workflow procedure. The Initiate Systems Customer Support Web site (http://www.initiatesystems.com/support) provides a knowledge base that offers additional information about Initiate products and their use. New items are frequently added, so please refer to the Web site when possible.

Acknowledgements
Third party software code files are shipped along with the Initiate 8.0 (the Third Party Code) software. Third Party Code files are the property of their respective owners and not Initiate Systems and Initiate Systems claims no rights in or to the Third Party Code. Your use and access to the Third Party Code is governed by the specific restrictions and limitations set forth in the applicable licenses provided by the Third Party Code owners. The Third Party Code is provided to you by Initiate solely for use with the Initiate software product and Initiate Systems does not authorize or promote any other use of the Third Party Code by you. The full text of the applicable Third Party Code licenses is provided in the Third Party License.zip file included along with the Initiate Release 8.0, located on the Initiate Systems product CD or downloaded CD image.

Chapter 1: Master Data Extract overview


Master Data Extract uses Clover.ETL, an open-source Extract, Transform, Load (ETL) utility, to extract data from the Master Data Engine to external files or database tables, for use with reporting and analytical systems. Extracts are designed and executed as graphs in Clover.ETL, and can be either full or incremental.

Master Data Extract provides several sample graphs which illustrate how the ETL process works. The sample graphs extract entity-level attribute data from the Master Data Engine database, and write it to a variety of output targets. The sample graphs are designed to be examples of how to use the Clover.ETL application; some configuration is necessary in order to use the graphs with your own data. In addition, graphs can be edited and customized according to your specific data extraction requirements.

Clover.ETL basics
For basic information on using the Clover.ETL application, refer to the Initiate Workbench User Guide and to the Clover documentation.


The Master Data Extract Sample Graphs


Master Data Extract provides three sample data extract graphs:

- extract_full_all.grf: This graph does a full extract of entity-level attribute data from the Master Data Engine database, removes duplicate entities, and writes the output to a selected target file or database.

- extract_incremental_db.grf: This graph does an incremental extract of entity-level attribute data from the Master Data Engine database, filters it on the audit record numbers that the user supplies as configuration parameters, removes duplicates, and writes the output to a specified database.

- extract_incremental_file.grf: Like extract_incremental_db.grf, this graph does an incremental extract of entity-level attribute data, filters it on user-supplied audit record numbers, and removes duplicates. The output of this graph is written to a series of delimited files.

Each graph consists of several subgraphs (series of connected Readers, Transformers, and Writers) which operate in parallel; each subgraph reads data from a specific database table in the Master Data Engine database.
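The filter-and-deduplicate behavior of the incremental graphs can be pictured with a small Python sketch. This is an illustration only, not the logic of the shipped Transformers: the row layout (entity record number, audit record number, value), the inclusive range, and the choice to keep the row with the highest audit record number per entity are all assumptions made for the example.

```python
from typing import Iterable

# Hypothetical row shape: (entity_recno, audit_recno, value). The real column
# names and types in the Master Data Engine tables are not given in this guide.
Row = tuple[int, int, str]


def incremental_extract(rows: Iterable[Row], min_audit: int, max_audit: int) -> list[Row]:
    """Keep rows whose audit record number lies in [min_audit, max_audit],
    then keep one row per entity (the one with the highest audit number)."""
    in_range = [r for r in rows if min_audit <= r[1] <= max_audit]
    # Order by entity record number, then by audit record number.
    in_range.sort(key=lambda r: (r[0], r[1]))
    latest: dict[int, Row] = {}
    for row in in_range:
        latest[row[0]] = row  # a later row for the same entity replaces the earlier one
    return list(latest.values())


sample = [
    (1, 100, "Smith"),
    (1, 105, "Smyth"),
    (2, 90, "Jones"),
    (3, 110, "Brown"),
]
result = incremental_extract(sample, 100, 110)
print(result)
```

With the sample rows above, entity 2 falls outside the range and is skipped, and entity 1 is reduced to its most recent row.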

Chapter 2: Using the Master Data Extract sample graphs


This chapter provides information about how to configure each of the sample graphs for use with your data. Each graph consists of several components:

- Readers read data from an external source such as a database or file. Before you can use a graph, the Readers must be configured with parameters for connecting to these external sources.
  Note: Readers for each of the sample graphs are configured in the same manner; therefore Reader configuration is described independently of the specific sample graphs.

- Transformers perform operations on data, such as sorting, filtering, merging, and deduplicating. The Transformers in the sample graphs have been configured to process data as needed for each type of extract; you may want to edit the Transformers to adjust how data is handled. Also, in some cases you will need to delete Transformers which copy data to Writers that you do not plan to use.

- Writers write processed data to specified targets, such as database tables or a designated flat file. Before you can use a graph, Writers must be configured with parameters that specify the target output.

Importing the sample graphs


You must import the sample graphs into Workbench in order to access them in Clover.ETL.

To import the sample graphs:
1. In the Navigator view, right-click on the Project folder you want to import the sample graphs into, and choose Import.
2. In the Import - Select dialog, navigate to and select Import graphs version conversion (in the Clover ETL node).
3. Click Next.
4. In the Import Clover ETL Graphs dialog, click the Browse button beside the From directory field.
5. Navigate to and select the <ROOTDIR>\Workbench x.x.x\samples\graphs directory (where <ROOTDIR> is your Initiate program files installation directory and x.x.x is your application version number).
6. Click OK.
7. The Into folder field lists the folder into which the graphs will be imported; the folder you right-clicked on in Step 1 is displayed here by default. If you wish to specify a different folder, click the Browse button beside the Into folder field to browse to and select another folder.
8. The Import Clover ETL Graphs window is now populated with all available sample graphs. Check the boxes for the graphs you want to import:
   - extract_full_all.grf
   - extract_incremental_db.grf
   - extract_incremental_file.grf
9. Click Finish.

Configuring Readers
The Readers in the sample graphs query database tables in parallel, ordering results by entity record number and modified audit record number. Before executing the graph, each of the Reader elements must be configured with the appropriate database connection information. Before you can specify a database connection for your Reader(s), you must create a database connection. Once the database connection is created, it can be used for all your Reader(s) in the sample graph.

Creating a database connection


To create a database connection:
1. In Outline view, right-click Connections and choose Connections > Create internal. This opens the database connections window.
   Note: You must have a graph open in the Graph editor to see the nodes, including the Connection node, in the Outline view.
2. Click to select a database driver from the available drivers window.
   Note: It is recommended that you use one of the supplied Initiate drivers.
3. Enter a Name for your connection.
4. Enter the User and Password for connecting to your database.
5. In the URL field, enter the appropriate values for the database parameters:
   - hostname
   - port
   - database (for MSSQL, DB2, and Informix databases)
   - SID (for Oracle databases)
6. Click the Validate Connection button to validate your database connection.
7. Click Finish.
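As a rough illustration of how the hostname, port, and database or SID values in step 5 combine into a connection URL, the following Python sketch fills in the common vendor JDBC URL shapes. These templates are assumptions and are not taken from this guide; the supplied Initiate drivers may use a different URL template, in which case the template shown in the connection dialog is the one to follow. Informix is left out because its URL needs further values that this guide does not list.

```python
from typing import Optional

# Generic vendor JDBC URL shapes (assumptions for illustration only).
URL_TEMPLATES = {
    "oracle": "jdbc:oracle:thin:@{hostname}:{port}:{sid}",
    "mssql": "jdbc:sqlserver://{hostname}:{port};databaseName={database}",
    "db2": "jdbc:db2://{hostname}:{port}/{database}",
}


def build_url(db_type: str, hostname: str, port: int,
              database: Optional[str] = None, sid: Optional[str] = None) -> str:
    """Fill in the URL template for db_type with the supplied values."""
    if db_type not in URL_TEMPLATES:
        raise ValueError(f"no template for database type: {db_type}")
    if db_type == "oracle":
        if sid is None:
            raise ValueError("Oracle connections need a SID")
        return URL_TEMPLATES[db_type].format(hostname=hostname, port=port, sid=sid)
    if database is None:
        raise ValueError(f"{db_type} connections need a database name")
    return URL_TEMPLATES[db_type].format(hostname=hostname, port=port, database=database)


# Hypothetical host, port, and database names.
oracle_url = build_url("oracle", "dbhost", 1521, sid="ORCL")
mssql_url = build_url("mssql", "dbhost", 1433, database="mdm")
print(oracle_url)
print(mssql_url)
```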


Specifying a database connection for each Reader


Once the database connection has been created, you must edit each of the Readers to reference this connection.

To specify a database connection in a Reader:
1. Double-click the Reader to open the Edit component dialog.
2. On the Properties tab, under Clover.ETL properties basic, click in the Value field for DB connection. A down arrow appears.
3. Click the down arrow and select the database connection you created for this database.
4. Click OK to save your changes and close the Edit component dialog.

Note: You must specify a database connection for each of the Readers in the sample graph.


Configuring the extract_full_all.grf sample graph


The extract_full_all.grf sample graph must be edited to add database connection properties to the Readers. In addition, this graph provides several types of Writer for writing the output; before running the graph you must select the Writers you want to use, and remove the ones you will not use.

Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.

To configure the extract_full_all.grf sample graph:
1. In the Navigator view, double-click the extract_full_all.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.

3. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:

Sample graph Writer types

Database or file type   Name format             Example
Oracle                  oracle_<data type>      oracle_name
DB2                     db2_<data type>         db2_ssn
MSSQL                   mssql_<data type>       mssql_phone
Delimited file          delimited_<data type>   delimited_addr

Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.
4. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.
5. Use the Select tool to drag the Edge linking the Dedup Transformer to the Copy Transformer so that it connects the Dedup Transformer to the input port of your remaining Writer instead.
6. Delete the Copy Transformer.
7. If you are using a database Writer, connect the Writer to a database:
   A. Double-click the Writer to open the Edit component dialog.
   B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.
Required Oracle properties

Property                 Value
Path to sqlldr utility   The path to Oracle's SQL*Loader (sqlldr) utility. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.
User name                The user name for connecting to the database
Password                 The password for connecting to the database
TNS name                 The transparent network substrate (TNS) name identifier

Required DB2 properties

Property                 Value
Database                 The database to which this data will be written
User name                The user name for connecting to the database
Password                 The password for connecting to the database
Database table           The name of the database table where this data will be written

Required MSSQL properties

Property                 Value
Path to bcp utility      Path to the utility that copies data between Microsoft SQL Server and a data file. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.
Database                 The database to which this data will be written

   C. Click OK to save your changes and close the Edit component dialog.
8. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.

Refer to the Running a graph section below for information on how to run your graph once it is configured.
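Because Writer names follow the type-prefix naming format shown in the Sample graph Writer types table, the Writers to delete in a subgraph can be worked out from the one target type you keep. The Python sketch below only illustrates that naming rule; the Writer names in the example list are hypothetical, and the deletion itself is still done in the Graph editor.

```python
def writers_to_delete(writer_names: list[str], keep_type: str) -> list[str]:
    """Return the Writers whose name does not start with '<keep_type>_'."""
    prefix = keep_type + "_"
    return [name for name in writer_names if not name.startswith(prefix)]


# Hypothetical Writers of one subgraph, named per the '<type>_<data type>' format.
subgraph_writers = ["oracle_name", "db2_name", "mssql_name", "delimited_name"]
to_delete = writers_to_delete(subgraph_writers, "mssql")
print(to_delete)
```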

Configuring the extract_incremental_db.grf sample graph


The extract_incremental_db.grf must be edited to add database connection properties to the Reader, and configure appropriate attribute and audit record number parameters. In addition, this graph provides several Writers for writing the output; before running the graph you must select the Writer you want to use, configure a database connection for it, and remove the ones you will not use.

Parameters for incremental extraction


The parameters for the Transformers in this graph determine which attributes are read from the database, and the range of audit record numbers that specifies which records to extract data from. In a typical use case, you configure the attributes one time (typically, within the graph itself), but update the audit record range each time you run the graph. As an alternative to manually setting the range of audit record numbers in the graph each time you run it, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.

Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.

To configure the extract_incremental_db.grf sample graph:
1. In the Navigator view, double-click the extract_incremental_db.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.

3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node.

Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.
Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.
4. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:

Sample graph Writer types

Database or file type   Name format          Example
Oracle                  oracle_<data type>   oracle_name
DB2                     db2_<data type>      db2_ssn
MSSQL                   mssql_<data type>    mssql_phone

Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.
5. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.
6. Use the Select tool to drag the Edge linking the Reformat Transformer to the Copy Transformer so that it connects the Reformat Transformer to the input port of your remaining Writer instead.
7. Delete the Copy Transformer.
8. Edit the Writer to connect to a database:
   A. Double-click the Writer to open the Edit component dialog.
   B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.
Required Oracle properties

Property                 Value
Path to sqlldr utility   The path to Oracle's SQL*Loader (sqlldr) utility.
User name                The user name for connecting to the database
Password                 The password for connecting to the database
TNS name                 The transparent network substrate (TNS) name identifier

Required DB2 properties

Property                 Value
Database                 The database to which this data will be written
User name                The user name for connecting to the database
Password                 The password for connecting to the database
Database table           The name of the database table where this data will be written

Required MSSQL properties

Property                 Value
Path to bcp utility      Path to the utility that copies data between Microsoft SQL Server and a data file. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.
Database                 The database to which this data will be written

   C. Click OK to save your changes and close the Edit component dialog.
9. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.

Configuring the extract_incremental_file.grf sample graph


The extract_incremental_file.grf sample graph must be edited to add database connection properties to the Reader, and to configure appropriate attribute and audit record number parameters. Writers have been configured to write output to specified files; you can edit the Writer properties to change the file name and location if you wish, but further configuration of the Writer is not required.

Parameters for incremental extraction


The parameters for the Transformers in this graph determine which attributes are read from the database, and the range of audit record numbers that specifies which records to extract data from. In a typical use case, you configure the attributes one time (typically, within the graph itself), but update the audit record range each time you run the graph. As an alternative to manually setting the range of audit record numbers in the graph each time you run it, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.

Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.

To configure the extract_incremental_file.grf sample graph:
1. In the Navigator view, double-click the extract_incremental_file.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.

3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node.

Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.
Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.
4. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.

Running a graph
To run a graph, click the Run icon in the toolbar, or choose Run > Run from the menu. When a graph is run, the number of records processed along each Edge is displayed. For detailed information about graph runtime options, refer to the Clover documentation.

Troubleshooting graphs
Use the following processes and tools to troubleshoot your graphs.

Debugging a graph Edge


To debug a graph Edge, right-click on the Edge and choose Debug > Enable Debug. A green bug icon is displayed on Edges with debugging enabled. Debug information is captured when the graph is run.

You can view debug data after the graph is run by right-clicking the Edge and choosing Debug > View Data.

Viewing logs and error messages


Warning and error messages, processing information, and graph status are captured in the Console, Problems, CloverGraph tracking, and Clover Log views. Refer to the Clover documentation for detailed information about the contents of these views.

Automatic graph execution


Clover.ETL graphs can be executed automatically as part of a scheduled job, using the madconfig utility. You may wish to create an external properties file which the scheduled madconfig utility can reference, if you want to update parameters such as audit record number range when you run the scheduled job.

Using the madconfig utility to create a properties file for a scheduled job
Incremental extracts typically select data based on a range of audit record numbers which changes each time the graph is run. Although you can set the range of record numbers manually in the graph, it may be more practical to generate a properties file automatically via a scheduled job. The properties file then supplies the graph with the appropriate values for the record number range. This section describes how to use the madconfig utility to launch a graph using a designated, external properties file. You can set up a scheduled job to launch the madconfig utility on a regular basis.

Note: It is outside the scope of this document to describe how to set up a scheduled job which generates the properties file. You can use a standard utility such as the Windows Task Scheduler or the Unix cron utility (or other methods) to set up a scheduled job.
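A scheduled job that generates the properties file could be as small as the following Python sketch. It is only an illustration: the parameter names MIN_AUDRECNO and MAX_AUDRECNO are hypothetical placeholders, so substitute the names listed in the Parameters node of your graph, and the way the job obtains the current range is left to your environment.

```python
from pathlib import Path


def render_properties(min_audit: int, max_audit: int) -> str:
    """Return properties-file text for one audit record number range."""
    if min_audit > max_audit:
        raise ValueError("min_audit must not exceed max_audit")
    # MIN_AUDRECNO and MAX_AUDRECNO are hypothetical names; use the
    # parameter names that your graph actually defines.
    return f"MIN_AUDRECNO={min_audit}\nMAX_AUDRECNO={max_audit}\n"


def write_properties(path: Path, min_audit: int, max_audit: int) -> None:
    """Write (or overwrite) the properties file at the given path."""
    path.write_text(render_properties(min_audit, max_audit), encoding="ascii")


text = render_properties(1001, 2000)
print(text, end="")
```

Calling write_properties from the scheduled job before each run keeps the file in step with the range you want to extract.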

Using madconfig to launch a graph using a specified properties file


This madconfig operation requires a properties file containing audit record number values.

To use madconfig to launch a Clover.ETL graph:
1. From a command prompt, run madconfig launch_etl
   Note: This utility is run from the <ROOTDIR>\Engine 8.0.0\scripts directory.
2. When prompted, enter the path to the graph (*.grf file) you want to run.
3. When prompted, enter the path to your configuration file (that is, the file containing the properties for your graph's audit record number parameters).
4. When prompted, enter a memory size setting. 256 is the default.

Note: Complete documentation of the madconfig utility is found in the Initiate Master Data Service Master Data Engine Installation Guide.


Recording responses to the madconfig utility


If you want to launch a graph via madconfig on a scheduled basis, you can record a set of responses to the madconfig utility's prompts.

To record a set of responses to the madconfig launch_etl function, run
madconfig recordfile myfile.properties launch_etl
where myfile.properties is the name of the file which will store your responses.

Note: In addition to recording your responses, this command also executes the graph.

To run madconfig using the recorded responses, run
madconfig propertyfile myfile.properties launch_etl
where myfile.properties is the name of the file where your responses are stored.
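If the scheduled job is itself a script, the replay command can be wrapped as in the Python sketch below. The argument order is taken from the command shown above; everything else is an assumption for illustration, including the executable name (it may need to be the platform-specific script name in your installation) and the choice to run from the scripts directory.

```python
import subprocess
from pathlib import Path


def build_launch_command(response_file: str, executable: str = "madconfig") -> list[str]:
    """Return the argument list that replays recorded responses to launch_etl."""
    return [executable, "propertyfile", response_file, "launch_etl"]


def run_scheduled_extract(scripts_dir: Path, response_file: str) -> int:
    """Run madconfig from scripts_dir and return the process exit code."""
    completed = subprocess.run(build_launch_command(response_file), cwd=scripts_dir)
    return completed.returncode


command = build_launch_command("myfile.properties")
print(" ".join(command))
```

Only build_launch_command is exercised here; run_scheduled_extract is meant to be called by the scheduler with the path to your scripts directory.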

Using extract.ddl to create target database schema


An extract.ddl file is provided as a convenience for creating target database schema with the maddbx utility. For detailed information on how to use maddbx with a *.ddl file to create database schema, refer to the Master Data Engine Installer Guide. Note that the provided extract.ddl file references the schema used by the sample graphs in their original format. If you edit the graphs in a way which alters the metadata layout for the Writers, you must also edit the extract.ddl file before using it to create your target database schema.
