8.0 Master Data Extract Guide - RevA

MASTER DATA EXTRACT GUIDE
Release 8.0
Revision A
Initiate, InitiateSM and Initiate Identity Hub are trademarks and/or service marks of Initiate Systems, Inc., which may be registered in some jurisdictions. All rights reserved. All other marks are owned by their respective owners. The information in this document is protected under the applicable federal law as an unpublished work, and is confidential and proprietary to Initiate Systems, Inc. Its use, disclosure, reproduction, or publication, in whole or in part, without the express prior written consent of Initiate Systems, Inc. is prohibited.
Table of Contents
About this manual
  Audience and purpose
  Organization
  Additional reference documentation
  How to get help
    ATSC
    Support Center Knowledge Base
  Acknowledgements
Chapter 1: Master Data Extract overview
  Clover.ETL basics
  The Master Data Extract Sample Graphs
Chapter 2: Using the Master Data Extract sample graphs
  Importing the sample graphs
  Configuring Readers
    Creating a database connection
    Specifying a database connection for each Reader
  Configuring the extract_full_all.grf sample graph
  Configuring the extract_incremental_db.grf sample graph
    Parameters for incremental extraction
  Configuring the extract_incremental_file.grf sample graph
    Parameters for incremental extraction
  Running a graph
  Troubleshooting graphs
    Debugging a graph Edge
    Viewing logs and error messages
  Automatic graph execution
    Using the madconfig utility to create a properties file for a scheduled job
    Using madconfig to launch a graph using a specified properties file
    Recording responses to the madconfig utility
  Using extract.ddl to create target database schema
About this manual

Audience and purpose
This guide is intended for solution architects and developers responsible for developing Extract, Transform, Load (ETL) graphs for data extraction. Through discussion of sample graphs, this guide describes how to extract data from Master Data Engine database tables, transform it for use with downstream applications such as reporting and analytic tools, and write the transformed data to database tables or extract files.
Organization
The information presented includes:

Contents of Manual
In Chapter    You will find
1             Overview of the Master Data Extract application
2             Detailed information about using the Master Data Extract sample graphs
Additional reference documentation

For additional information, refer to the following documents:
- Workbench User Guide
- Initiate Master Data Service Data Model Description
- Clover documentation:
  - The Clover.GUI Users Guide, which can be downloaded at http://www.cloveretl.org/documentation/clover-gui
  - The Clover.ETL Wiki at http://wiki.clovergui.net/doku.php
How to get help

ATSC
Each organization designates two (2) or more individuals to act as Authorized Technical Support Contacts (ATSC) for Initiate software issues. These individuals interact with users in your organization and, when necessary, work with the technical support staff at Initiate Systems, Inc. to resolve issues. When you have questions or concerns about the software, and if the information in this guide does not answer your questions, contact your ATSC. Your ATSC will try to determine if the problem is a hardware system issue or an operational issue before contacting Initiate Systems for assistance.
Support Center Knowledge Base

We realize that you might have questions that may not be addressed in the documentation, training, or within your standard workflow procedure. The Initiate Systems Customer Support Web site (http://www.initiatesystems.com/support) provides a knowledge base that offers additional information about Initiate products and their use. New items are frequently added, so please refer to the Web site when possible.
Acknowledgements
Third party software code files are shipped along with the Initiate 8.0 (the Third Party Code) software. Third Party Code files are the property of their respective owners and not Initiate Systems and Initiate Systems claims no rights in or to the Third Party Code. Your use and access to the Third Party Code is governed by the specific restrictions and limitations set forth in the applicable licenses provided by the Third Party Code owners. The Third Party Code is provided to you by Initiate solely for use with the Initiate software product and Initiate Systems does not authorize or promote any other use of the Third Party Code by you. The full text of the applicable Third Party Code licenses is provided in the Third Party License.zip file included along with the Initiate Release 8.0, located on the Initiate Systems product CD or downloaded CD image.
Chapter 1: Master Data Extract overview

Master Data Extract uses Clover.ETL, an open-source Extract, Transform, Load (ETL) utility, to extract data from the Master Data Engine to external files or database tables for use with reporting and analytical systems. Extracts are designed and executed as graphs in Clover.ETL, and can be either full or incremental. Master Data Extract provides several sample graphs that illustrate how the ETL process works. The sample graphs extract entity-level attribute data from the Master Data Engine database and write it to a variety of output targets. The sample graphs are designed to be examples of how to use the Clover.ETL application; some configuration is necessary before you can use the graphs with your own data. In addition, graphs can be edited and customized according to your specific data extraction requirements.
Clover.ETL basics
For basic information on using the Clover.ETL application, refer to the Initiate Workbench User Guide and to the Clover documentation.
The Master Data Extract Sample Graphs

Master Data Extract provides three sample data extract graphs:
- extract_full_all.grf: This graph does a full extract of entity-level attribute data from the Master Data Engine database, removes duplicate entities, and writes the output to a selected target file or database.
- extract_incremental_db.grf: This graph does an incremental extract of entity-level attribute data from the Master Data Engine database, filters it on a range of audit record numbers that the user supplies as configuration parameters, removes duplicates, and writes the output to a specified database.
- extract_incremental_file.grf: Like extract_incremental_db.grf, this graph does an incremental extract of entity-level attribute data, filters it on user-supplied audit record numbers, and removes duplicates. The output of this graph is written to a series of delimited files.
Each graph consists of several subgraphs (series of connected Readers, Transformers, and Writers) that operate in parallel; each subgraph reads data from a specific database table in the Master Data Engine database.
Chapter 2: Using the Master Data Extract sample graphs

This chapter provides information about how to configure each of the sample graphs for use with your data. Each graph consists of several components:
- Readers read data from an external source such as a database or file. Before you can use a graph, the Readers must be configured with parameters for connecting to these external sources.
  Note: Readers for each of the sample graphs are configured in the same manner; therefore Reader configuration is described independently of the specific sample graphs.
- Transformers perform operations on data, such as sorting, filtering, merging, and deduplicating. The Transformers in the sample graphs have been configured to process data as needed for each type of extract, but you may want to edit them to adjust how data is handled. In some cases you will also need to delete Transformers that copy data to Writers you do not plan to use.
- Writers write processed data to specified targets, such as database tables or a designated flat file. Before you can use a graph, Writers must be configured with parameters that specify the target output file(s) or database.
Importing the sample graphs

You must import the sample graphs into Workbench in order to access them in Clover.ETL.
To import the sample graphs:
1. In the Navigator view, right-click the Project folder into which you want to import the sample graphs, and choose Import.
2. In the Import - Select dialog, navigate to and select Import graphs version conversion (in the Clover ETL node).
3. Click Next.
4. In the Import Clover ETL Graphs dialog, click the Browse button beside the From directory field.
5. Navigate to and select the <ROOTDIR>\Workbench x.x.x\samples\graphs directory (where <ROOTDIR> is your Initiate program files installation directory and x.x.x is your application version number).
6. Click OK.
7. The Into folder field lists the folder into which the graphs will be imported; the folder you right-clicked in Step 1 is displayed here by default. To specify a different folder, click the Browse button beside the Into folder field and select another folder.
8. The Import Clover ETL Graphs window is now populated with all available sample graphs. Check the boxes for the graphs you want to import:
   - extract_full_all.grf
   - extract_incremental_db.grf
   - extract_incremental_file.grf
9. Click Finish.
Configuring Readers
The Readers in the sample graphs query database tables in parallel, ordering results by entity record number and modified audit record number. Before executing the graph, each of the Reader elements must be configured with the appropriate database connection information. Before you can specify a database connection for your Reader(s), you must create a database connection. Once the database connection is created, it can be used for all your Reader(s) in the sample graph.
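To illustrate why the ordering of Reader output matters, here is a minimal Python sketch of duplicate removal on input that is already sorted by entity record number and audit record number. It is not part of the product: the field names entity_recno and audit_recno are hypothetical, and keeping the record with the highest audit record number for each entity is an assumption. The Dedup Transformer in the sample graphs may be configured differently.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical field names; real column names depend on the table being read.
ENTITY_KEY = "entity_recno"


def dedup_sorted(records):
    """Keep one record per entity from input sorted by (entity, audit) number.

    Because the input is sorted, all records for an entity are adjacent, so a
    single pass is enough. Assumption: the last record in each group (the one
    with the highest audit record number) is the one to keep.
    """
    deduped = []
    for _, group in groupby(records, key=itemgetter(ENTITY_KEY)):
        deduped.append(list(group)[-1])
    return deduped


rows = [
    {"entity_recno": 1, "audit_recno": 10, "name": "A. Smith"},
    {"entity_recno": 1, "audit_recno": 42, "name": "Ann Smith"},
    {"entity_recno": 2, "audit_recno": 17, "name": "B. Jones"},
]
result = dedup_sorted(rows)
```

With unsorted input, records for the same entity would not be adjacent and a single-pass approach like this one would miss duplicates.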
Creating a database connection

To create a database connection:
1. In Outline view, right-click Connections and choose Connections > Create internal. This opens the database connections window.
   Note: You must have a graph open in the Graph editor to see the nodes, including the Connections node, in the Outline view.
2. Click to select a database driver from the available drivers window.
   Note: It is recommended that you use one of the supplied Initiate drivers.
3. Enter a Name for your connection.
4. Enter the User and Password for connecting to your database.
5. In the URL field, enter the appropriate values for the database parameters:
   - hostname
   - port
   - database (for MSSQL, DB2, and Informix databases)
   - SID (for Oracle databases)
6. Click the Validate Connection button to validate your database connection.
7. Click Finish.
Specifying a database connection for each Reader

Once the database connection has been created, you must edit each of the Readers to reference this connection.
To specify a database connection in a Reader:
1. Double-click the Reader to open the Edit component dialog.
2. On the Properties tab, under Clover.ETL properties basic, click in the Value field for DB connection. A down arrow appears.
3. Click the down arrow and select the database connection you created for this database.
4. Click OK to save your changes and close the Edit component dialog.
Note: You must specify a database connection for each of the Readers in the sample graph.
Configuring the extract_full_all.grf sample graph

The extract_full_all.grf sample graph must be edited to add database connection properties to the Readers. In addition, this graph provides several types of Writer for writing the output; before running the graph you must select the Writers you want to use, and remove the ones you will not use.
Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.
To configure the extract_full_all.grf sample graph:
1. In the Navigator view, double-click the extract_full_all.grf sample graph to open it in the Graph editor.
2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.
3. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:
Sample graph Writer types
Database or file type    Name format              Example
Oracle                   oracle_<data type>       oracle_name
DB2                      db2_<data type>          db2_ssn
MSSQL                    mssql_<data type>        mssql_phone
Delimited file           delimited_<data type>    delimited_addr
Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.
4. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.
5. Use the Select tool to drag the Edge linking the Dedup Transformer to the Copy Transformer so that it connects the Dedup Transformer to the input port of your remaining Writer instead.
6. Delete the Copy Transformer.
7. If you are using a database Writer, connect the Writer to a database:
   A. Double-click the Writer to open the Edit component dialog.
   B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.
Required Oracle properties
Property                  Value
Path to sqlldr utility    The path to Oracle's SQL*Loader (sqlldr) utility. Click in this field to display an ellipsis, then click the ellipsis to browse to the utility.
User name                 The user name for connecting to the database
Password                  The password for connecting to the database
TNS name                  The transparent network substrate (TNS) name identifier

Required DB2 properties
Property                  Value
Database                  The database to which this data will be written
User name                 The user name for connecting to the database
Password                  The password for connecting to the database
Database table            The name of the database table where this data will be written

Required MSSQL properties
Property                  Value
Path to bcp utility       Path to the utility that copies data between Microsoft SQL Server and a data file. Click in this field to display an ellipsis, then click the ellipsis to browse to the utility.
Database                  The database to which this data will be written

   C. Click OK to save your changes and close the Edit component dialog.
8. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.
Refer to the Running a graph section below for information on how to run your graph once it is configured.
Configuring the extract_incremental_db.grf sample graph

The extract_incremental_db.grf must be edited to add database connection properties to the Reader, and configure appropriate attribute and audit record number parameters. In addition, this graph provides several Writers for writing the output; before running the graph you must select the Writer you want to use, configure a database connection for it, and remove the ones you will not use.
Parameters for incremental extraction

The parameters for the Transformers in this graph determine which attributes are read from the database and the range of audit record numbers that selects the records to extract. In a typical use case, you configure the attributes once (typically within the graph itself), but update the audit record range each time you run the graph. As an alternative to manually setting the range of audit record numbers in the graph each time you run it, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.
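The effect of the audit record range can be pictured with a short Python sketch. This is only an illustration of the filtering idea, not how the graph is implemented: the field name audit_recno and the function names are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
def in_audit_range(record, min_recno, max_recno):
    """Return True if the record's audit record number falls in the range.

    Assumption: the range is inclusive at both ends.
    """
    return min_recno <= record["audit_recno"] <= max_recno


def incremental_extract(records, min_recno, max_recno):
    """Select only the records changed within the given audit record range."""
    return [r for r in records if in_audit_range(r, min_recno, max_recno)]


rows = [
    {"entity_recno": 1, "audit_recno": 95},
    {"entity_recno": 2, "audit_recno": 101},
    {"entity_recno": 3, "audit_recno": 150},
    {"entity_recno": 4, "audit_recno": 201},
]
# A run covering audit record numbers 100 through 200 picks up entities 2 and 3.
selected = incremental_extract(rows, 100, 200)
```

Under this assumption, the next run would start its range just after the previous upper bound (201 here) so that no record is extracted twice or skipped.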
Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.
To configure the extract_incremental_db.grf sample graph:
1. In the Navigator view, double-click the extract_incremental_db.grf sample graph to open it in the Graph editor.
2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.
3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node. Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.
   Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.
4. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:
Sample graph Writer types
Database or file type    Name format           Example
Oracle                   oracle_<data type>    oracle_name
DB2                      db2_<data type>       db2_ssn
MSSQL                    mssql_<data type>     mssql_phone
Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.
5. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.
6. Use the Select tool to drag the Edge linking the Reformat Transformer to the Copy Transformer so that it connects the Reformat Transformer to the input port of your remaining Writer instead.
7. Delete the Copy Transformer.
8. Edit the Writer to connect to a database:
   A. Double-click the Writer to open the Edit component dialog.
   B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.
Required Oracle properties
Property                  Value
Path to sqlldr utility    The path to Oracle's SQL*Loader (sqlldr) utility.
User name                 The user name for connecting to the database
Password                  The password for connecting to the database
TNS name                  The transparent network substrate (TNS) name identifier

Required DB2 properties
Property                  Value
Database                  The database to which this data will be written
User name                 The user name for connecting to the database
Password                  The password for connecting to the database
Database table            The name of the database table where this data will be written

Required MSSQL properties
Property                  Value
Path to bcp utility       Path to the utility that copies data between Microsoft SQL Server and a data file. Click in this field to display an ellipsis, then click the ellipsis to browse to the utility.
Database                  The database to which this data will be written

   C. Click OK to save your changes and close the Edit component dialog.
9. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.
Configuring the extract_incremental_file.grf sample graph

The extract_incremental_file.grf must be edited to add database connection properties to the Reader, and configure appropriate attribute and audit record number parameters. Writers have been configured to write output to specified files; you can edit the Writer properties to edit the name and location if you wish, but further configuration of the Writer is not required.
Parameters for incremental extraction

The parameters for the Transformers in this graph determine which attributes are read from the database and the range of audit record numbers that selects the records to extract. In a typical use case, you configure the attributes once (typically within the graph itself), but update the audit record range each time you run the graph. As an alternative to manually setting the range of audit record numbers in the graph each time you run it, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.
Note: Each sample graph consists of several subgraphs or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.
To configure the extract_incremental_file.grf sample graph:
1. In the Navigator view, double-click the extract_incremental_file.grf sample graph to open it in the Graph editor.
2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.
3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node. Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.
   Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.
4. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.
Running a graph
To run a graph, click the Run icon in the toolbar, or choose Run > Run from the menu. When a graph is run, the number of records processed along each Edge is displayed. For detailed information about graph runtime options, refer to the Clover documentation.
Troubleshooting graphs
Use the following processes and tools to troubleshoot your graphs.
Debugging a graph Edge

To debug a graph Edge, right-click the Edge and choose Debug > Enable Debug. A green bug icon is displayed on Edges with debugging enabled. Debug information is captured when the graph is run. You can view debug data after the graph is run by right-clicking the Edge and choosing Debug > View Data.
Viewing logs and error messages

Warning and error messages, processing information, and graph status are captured on the Console, Problems, CloverGraph tracking, and Clover Log views. Refer to the Clover documentation for detailed information about the contents of these tabs.
Automatic graph execution

Clover.ETL graphs can be executed automatically as part of a scheduled job, using the madconfig utility. You may wish to create an external properties file which the scheduled madconfig utility can reference, if you want to update parameters such as audit record number range when you run the scheduled job.
Using the madconfig utility to create a properties file for a scheduled job
Incremental extracts typically select data based on a range of audit record numbers, which changes each time the graph is run. Although you can set the range of record numbers manually in the graph, it may be more practical to generate a properties file automatically via a scheduled job. The properties file then supplies the graph with the appropriate values for the record number range. The following sections describe how to use the madconfig utility to launch a graph using a designated, external properties file. You can set up a scheduled job to launch the madconfig utility on a regular basis.
Note: It is outside the scope of this document to describe how to set up a scheduled job which generates the properties file. You can use a standard utility such as the Windows Task Scheduler or the Unix cron utility (or other methods) to set up a scheduled job.
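As one possible approach, a scheduled job could run a small script like the Python sketch below to regenerate the properties file before each extract. Everything specific here is an assumption for illustration: the parameter names MIN_AUDIT_RECNO and MAX_AUDIT_RECNO are hypothetical placeholders (use the names shown in your graph's Parameters node), and how you obtain the current range of audit record numbers is left to your environment.

```python
from pathlib import Path

# Hypothetical parameter names; replace them with the names listed in the
# Parameters node of your graph.
MIN_PARAM = "MIN_AUDIT_RECNO"
MAX_PARAM = "MAX_AUDIT_RECNO"


def render_properties(min_recno, max_recno):
    """Return key=value properties text for an audit record number range."""
    if min_recno > max_recno:
        raise ValueError("min_recno must not be greater than max_recno")
    return f"{MIN_PARAM}={min_recno}\n{MAX_PARAM}={max_recno}\n"


def write_properties(path, min_recno, max_recno):
    """Write the properties text to the given file, replacing old content."""
    Path(path).write_text(render_properties(min_recno, max_recno),
                          encoding="utf-8")


# Example: the text a job would write for audit record numbers 1001-2000.
example = render_properties(1001, 2000)
```

Keeping the rendering separate from the file write makes it easy to check the generated text before a scheduled job overwrites the file the graph reads.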
Using madconfig to launch a graph using a specified properties file

This madconfig operation requires a properties file containing audit record number values.
To use madconfig to launch a Clover.ETL graph:
1. From a command prompt, run madconfig launch_etl
   Note: This utility is run from the <ROOTDIR>\Engine 8.0.0\scripts directory.
2. When prompted, enter the path to the graph (*.grf file) you want to run.
3. When prompted, enter the path to your configuration file (that is, the file containing the properties for your graph's audit record number parameters).
4. When prompted, enter a memory size setting. 256 is the default.
Note: Complete documentation of the madconfig utility is found in the Initiate Master Data Service Master Data Engine Installation Guide.
Recording responses to the madconfig utility

If you want to launch a graph via madconfig on a scheduled basis, you can record a set of responses to the madconfig utility's prompts. To record a set of responses to the madconfig launch_etl function, run
   madconfig recordfile myfile.properties launch_etl
where myfile.properties is the name of the file which will store your responses.
Note: In addition to recording your responses, this command also executes the graph.
To run madconfig using the recorded responses, run
   madconfig propertyfile myfile.properties launch_etl
where myfile.properties is the name of the file where your responses are stored.
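For a scheduled job written as a script, the two command lines above can be assembled programmatically. The Python sketch below only builds and prints the commands; the helper names are hypothetical, and actually invoking madconfig (for example with subprocess.run from the scripts directory) depends on your platform and installation, so that step is left as a comment.

```python
import shlex


def record_command(response_file):
    """Command that records responses to the launch_etl prompts.

    Note: per the guide, recording also executes the graph.
    """
    return ["madconfig", "recordfile", response_file, "launch_etl"]


def replay_command(response_file):
    """Command that runs launch_etl using previously recorded responses."""
    return ["madconfig", "propertyfile", response_file, "launch_etl"]


command = replay_command("myfile.properties")
# A scheduled job could pass this list to subprocess.run(), with the working
# directory set to the madconfig scripts directory of your installation.
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems if the response file path contains spaces.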
Using extract.ddl to create target database schema

An extract.ddl file is provided as a convenience for creating target database schema with the maddbx utility. For detailed information on how to use maddbx with a *.ddl file to create database schema, refer to the Master Data Engine Installer Guide. Note that the provided extract.ddl file references the schema used by the sample graphs in their original format. If you edit the graphs in a way which alters the metadata layout for the Writers, you must also edit the extract.ddl file before using it to create your target database schema.