Oracle DBA

My Mother Thinks I’m a DBA!
Cross-Platform, Multi-Vendor, Distributed Relational

Data Replication with IBM DB2 DataPropagator and IBM DataJoiner
Made Easy!
Olivier Bonnet, Simon Harris, Christian Lenke, Li Yan Zhou,

Thomas Groh
International Technical Support Organization
http://www.redbooks.ibm.com
SG24-5463-00
SG24-5463-00
International Technical Support Organization
My Mother Thinks I’m a DBA! Cross-Platform,

Multi-Vendor, Distributed Relational Data Replication
with IBM DB2 DataPropagator and IBM DataJoiner
Made Easy!
June 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix G, “Special Notices” on page 393.
First Edition (June 1999)
This edition applies to Version 5.1 of IBM DB2 DataPropagator Relational Capture for MVS, 5655-A23,
Version 5.1 of IBM DB2 DataPropagator Relational Apply for MVS, 5655-A22, Version 5.1 of IBM DB2
DataPropagator Relational for AS/400, Version 5.2 of IBM DB2 Universal Database, and Version 2.1.1
of IBM DataJoiner, 5801-AAR
Comments may be addressed to:

IBM Corporation, International Technical Support Organization
Dept. QXXE Building 80-E2
650 Harry Road
San Jose, California 95120-6099
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the
information in any way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 1999. All rights reserved

Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is
subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xi
Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
The Team That Wrote This Redbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Comments Welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Part 1. Heterogeneous Data Replication—General Discussion . . . . . . . . . . . . . . . . . . . 1
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Why Replication? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Why Multi-Vendor? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 How to Use this Book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 The Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 The Practical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Technical Warm-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 IBM DataPropagator—Architectural Overview . . . . . . . . . . . . . . . 7
1.4.2 Extending IBM Replication to a Non-IBM RDBMS. . . . . . . . . . . . . 9
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 2. Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 Organizing Your Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Gathering the Detailed Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 The Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 List of Questions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Determining the Replication Sources and Replication Targets . . . . . . 18
2.4 Technical Planning Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Estimating the Data Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 About CPU, Memory, and Network Sizing. . . . . . . . . . . . . . . . . . 24
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 3. System and Replication Design—Architecture . . . . . . . . . 27

3.1 Principles of Heterogeneous Replication . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Extending DProp Replication to Multi-Vendor Replication . . . . . . 30
3.1.2 Overview of the Most Common Replication Architectures . . . . . . 33
3.2 System Design Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Apply Program Placement: Pull or Push . . . . . . . . . . . . . . . . . . . 39
3.2.2 DataJoiner Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Control Tables Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Replication Design Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
© Copyright IBM Corp. 1999 iii

3.3.1 Target Table Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.2 Replication Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.3 Using Blocking Factor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4 Performance Considerations for Capture Triggers . . . . . . . . . . . . . . . 55
3.4.1 Description of the Performance Test Setup. . . . . . . . . . . . . . . . . 56
3.4.2 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.3 Conclusion of the Performance Test . . . . . . . . . . . . . . . . . . . . . . 58
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 4. General Implementation Guidelines . . . . . . . . . . . . . . . . . . 61

4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 Components of a Heterogeneous Replication System . . . . . . . . . . . . 63
4.3 Setting up a Heterogeneous Replication System . . . . . . . . . . . . . . . . 63
4.3.1 The Implementation Checklist. . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3.2 How to Use the Implementation Checklist . . . . . . . . . . . . . . . . . . 65
4.4 Detailed Description of the Implementation Tasks . . . . . . . . . . . . . . . 65
4.4.1 Set Up the Database Middleware Server . . . . . . . . . . . . . . . . . . 65
4.4.2 Implement the Replication Subcomponents (Capture, Apply) . . . 71
4.4.3 Set Up the Replication Administration Workstation . . . . . . . . . . . 71
4.4.4 Create the Replication Control Tables . . . . . . . . . . . . . . . . . . . . 74
4.4.5 Bind DProp Capture and DProp Apply . . . . . . . . . . . . . . . . . . . . 74
4.4.6 Status After Implementing the System Design . . . . . . . . . . . . . . 76
4.5 Next Steps—Implementing the Replication Design . . . . . . . . . . . . . . . 77
4.5.1 Replication Design for Multi-Vendor Target Servers . . . . . . . . . . 77
4.5.2 Replication Design for Multi-Vendor Source Servers . . . . . . . . . 79
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Chapter 5. Replication Operation, Monitoring and Tuning . . . . . . . . . 83

5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Operating and Maintaining DProp Replication—Initial Tasks . . . . . . . 84
5.2.1 Initialization and Operation of the Capture Task . . . . . . . . . . . . . 85
5.2.2 Initialization of Replication Subscriptions . . . . . . . . . . . . . . . . . . 86
5.3 Operating and Maintaining DProp Replication—Repetitive Tasks . . . . 91
5.3.1 Database Related Housekeeping . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3.2 Pruning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3.3 Utility Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Monitoring a Distributed Heterogeneous Replication System . . . . . . . 98
5.4.1 Components That Need Monitoring . . . . . . . . . . . . . . . . . . . . . . 98
5.4.2 DProp’s Open Monitoring Interface . . . . . . . . . . . . . . . . . . . . . . . 99
5.4.3 Monitoring Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.4 Monitoring Apply. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4.5 Monitoring the Database Middleware Server . . . . . . . . . . . . . . 116
5.5 Tuning Replication Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
iv The IBM Data Replication Solution

5.5.1 Running Capture with the Appropriate Priority . . . . . . . . . . . . . 118
5.5.2 Adjusting Capture Tuning Parameters . . . . . . . . . . . . . . . . . . . 118
5.5.3 Using Separate Tablespaces . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.5.4 Choosing Appropriate Lock Rules. . . . . . . . . . . . . . . . . . . . . . . 121
5.5.5 Using the Proposed Change Data Indexes . . . . . . . . . . . . . . . . 121
5.5.6 Updating Database Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.5.7 Making Use of Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . 122
5.5.8 Using Pull Rather Than Push Replication . . . . . . . . . . . . . . . . . 125
5.5.9 Using Multiple Apply Processes in Parallel . . . . . . . . . . . . . . . . 125
5.5.10 Using High Performance Full Refresh Techniques . . . . . . . . . 125
5.5.11 Using Memory Rather Than Disk for the Spill File . . . . . . . . . . 126
5.5.12 Enabling Block Fetch for Apply . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5.13 Tuning Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.5.14 Optimizing Network Performance . . . . . . . . . . . . . . . . . . . . . . 128
5.5.15 DB2 for OS/390 Data Sharing Remarks . . . . . . . . . . . . . . . . . 128
5.6 Other Useful Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.6.1 Deactivating Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.6.2 Selectively Preventing Automatic Full Refreshes . . . . . . . . . . . 129
5.6.3 Full Refresh on Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.6.4 Dropping Unnecessary Capture Triggers for Non-IBM Sources 133
5.6.5 Modifying Triggers for Non-IBM Sources . . . . . . . . . . . . . . . . . 134
5.6.6 Changing Apply Qualifier or Set Name for a Subscription Set . . 134
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Part 2. Heterogeneous Data Replication—Case Studies . . . . . . . . . . . . . . . . . . . . . . 137
Chapter 6. Case Study 1—Point of Sale Data Consolidation, Retail . 139

6.1 The Business Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.2 Architecting the Replication Solution . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2.1 Data Consolidation—System Design . . . . . . . . . . . . . . . . . . . . 142
6.2.2 Data Consolidation—Replication Design. . . . . . . . . . . . . . . . . . 145
6.3 Setting Up the System Environment . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3.1 The System Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.3.2 Configuration Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.4 Nice Side Effect: Using SPUFI to Access Multi-Vendor Data . . . . . . 158
6.5 Implementing the Replication Design . . . . . . . . . . . . . . . . . . . . . . . . 159
6.5.1 Registering the Replication Sources . . . . . . . . . . . . . . . . . . . . . 159
6.5.2 Preparation of the Target Site Union. . . . . . . . . . . . . . . . . . . . . 160
6.5.3 Defining Replication Subscriptions . . . . . . . . . . . . . . . . . . . . . . 161
6.5.4 Starting Apply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6.5.5 Some Performance Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.6 Moving from Test to Production . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.7 Some Background on Replicating from Multi-Vendor Sources . . . . . 166
v
6.7.1 Using Triggers to Emulate Capture Functions. . . . . . . . . . . . . . 166
6.7.2 The Change Data Table for a Non-IBM Replication Source . . . 169
6.7.3 How Apply Replicates the Changes from Non-IBM Sources . . . 169
6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Chapter 7. Case Study 2—Product Data Distribution, Retail . . . . . . . 173

7.1.1 Source Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.1.2 Target Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.2.1 Data Distribution—System Design . . . . . . . . . . . . . . . . . . . . . . 177
7.2.2 Data Distribution—Replication Design . . . . . . . . . . . . . . . . . . . 181
7.4.1 Define DB2 for OS/390 as Replication Source . . . . . . . . . . . . . 192
7.4.2 Define Empty Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.3 Create a Password File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.4.4 Add Members to the Subscription Sets . . . . . . . . . . . . . . . . . . . 196
7.4.5 Add Statements or Stored Procedures to Subscription Sets . . . 199
7.4.6 Start DProp Capture and Apply on the Host . . . . . . . . . . . . . . . 200
7.4.7 Start DProp Apply on the DataJoiner Server . . . . . . . . . . . . . . . 200
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Chapter 8. Case Study 3—Feeding a Data Warehouse. . . . . . . . . . . . 203

8.1.1 Source Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.1.2 Target Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.2.1 Feeding a Data Warehouse—System Design . . . . . . . . . . . . . . 209
8.2.2 Feeding a Data Warehouse—Replication Design . . . . . . . . . . . 210
8.4.1 Defining the Subscription Set . . . . . . . . . . . . . . . . . . . . . . . . . . 219
8.4.2 Maintaining a Change History for Suppliers . . . . . . . . . . . . . . . 220
8.4.3 Using Target Site Views to Denormalize Outlet Information . . . 228
8.4.4 Using Source Site Joins to Denormalize Product Information . . 237
8.4.5 Using a CCD Target Table to Manage the Sales Facts . . . . . . . 245
8.4.6 Adding Temporal History Information to Target Tables . . . . . . . 250
8.4.7 Maintaining Aggregate Information . . . . . . . . . . . . . . . . . . . . . . 256
vi The IBM Data Replication Solution

8.4.8 Pushing Down the Replication Status to Oracle . . . . . . . . . . . . 259
8.4.9 Initial Load of Data into the Data Warehouse . . . . . . . . . . . . . . 261
8.5 A Star Join Example Against the Data Warehouse Target Tables. . . 270
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
Chapter 9. Case Study 4—Sales Force Automation, Insurance. . . . . 271

9.1.1 Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
9.1.2 Comments about the Table Structures . . . . . . . . . . . . . . . . . . . 273
9.2 Update-Anywhere Replication versus Multi-Site
Update Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
9.3.1 Ms Jet Update-Anywhere Scenario—System Design . . . . . . . . 276
9.3.2 MS Jet Update Anywhere—Replication Design. . . . . . . . . . . . . 277
9.5.1 Creating Source Views to Enable Subsetting . . . . . . . . . . . . . . 287
9.5.2 Registering the Replication Sources . . . . . . . . . . . . . . . . . . . . . 289
9.5.3 Defining the Replication Subscriptions . . . . . . . . . . . . . . . . . . . 292
9.5.4 Focus on Major Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
9.6 Replication Results for Sales Representative 1 . . . . . . . . . . . . . . . . 301
9.6.1 Contents of the Source Tables at the Beginning . . . . . . . . . . . . 302
9.6.2 Contents of the Main Control Tables at the Beginning . . . . . . . 303
9.6.3 Start ASNJET to Perform the Initial Full-Refresh . . . . . . . . . . . 306
9.6.4 Results of the Initial Full-Refresh . . . . . . . . . . . . . . . . . . . . . . . 306
9.6.5 Replicating Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
9.7 Operational Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
9.7.1 Operational Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
9.7.2 Monitoring and Problem Determination . . . . . . . . . . . . . . . . . . . 316
9.8 Benefits of this Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
9.8.1 Other Configuration Options . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
9.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Appendix A. Index to Data Replication Tips, Tricks, Techniques . . . 321
Appendix B. Non-IBM Database Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

B.1 Oracle Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
B.1.1 Configuring Oracle Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
B.1.2 Using Oracle’s SQL*Plus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
B.1.3 The Oracle Data Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
B.1.4 Oracle Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
B.1.5 Oracle Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
vii
B.1.6 Oracle Listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.1.7 Other Useful Oracle Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.1.8 More Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.2 Informix Stuff. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.1 Configuring Informix Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.2 Using Informix’s dbaccess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.3 Informix Error Messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.3 Microsoft SQL Server Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.3.1 Configuring Microsoft SQL Server Connectivity . . . . . . . . . . . . . . . 330
B.3.2 Using the Microsoft Client OSQL . . . . . . . . . . . . . . . . . . . . . . . . . . 331
B.3.3 Microsoft SQL Server Data Dictionary . . . . . . . . . . . . . . . . . . . . . . 331
B.3.4 Helpful SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . 332
B.3.5 Microsoft SQL Server Error Messages . . . . . . . . . . . . . . . . . . . . . . 332
B.3.6 Microsoft SQL Server Administration . . . . . . . . . . . . . . . . . . . . . . . 332
B.3.7 ODBCPing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
B.4 Sybase SQL Server Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
B.4.1 Configuring Sybase SQL Server Connectivity . . . . . . . . . . . . . . . . 333
B.4.2 Using the Sybase Client isql . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
B.4.3 Sybase SQL Server Data Dictionary . . . . . . . . . . . . . . . . . . . . . . . 334
B.4.4 Helpful SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . 335
B.4.5 Sybase SQL Server Error Messages . . . . . . . . . . . . . . . . . . . . . . . 335
Appendix C. General Implementation Checklist . . . . . . . . . . . . . . . . . . 337
Appendix D. DJRA Generated SQL for Case Study 2. . . . . . . . . . . . . . 339

D.1 Define Replication Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
D.2 Create Empty Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
D.3 Add a Member to Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
D.4 Add Stored Procedure to Subscription Sets . . . . . . . . . . . . . . . . . . . . . . 345
Appendix E. DJRA Generated SQL for Case Study 3 . . . . . . . . . . . . . . 347

E.1 Output from Define the SALES_SET Subscription Set. . . . . . . . . . . . . . 347
E.2 Output from Register the Supplier Table . . . . . . . . . . . . . . . . . . . . . . . . 348
E.3 Output from Subscribe to the Supplier Table . . . . . . . . . . . . . . . . . . . . . 349
E.4 Output from Register the Store and Region Tables . . . . . . . . . . . . . . . . 351
E.5 Output from Subscribe to the Region Table . . . . . . . . . . . . . . . . . . . . . . 353
E.6 Output from Subscribe to the Store Table . . . . . . . . . . . . . . . . . . . . . . . 355
E.7 Output from Register the Items, ProdLine, and Brand Tables . . . . . . . . 357
E.8 Output from Register the Products View. . . . . . . . . . . . . . . . . . . . . . . . . 361
E.9 Output from Subscribe to the Products View . . . . . . . . . . . . . . . . . . . . . 362
E.10 Output from Register the Sales Table. . . . . . . . . . . . . . . . . . . . . . . . . . 365
viii The IBM Data Replication Solution

E.11 Output from Subscribe to the Sales Table . . . . . . . . . . . . . . . . . . . . . . 366
E.12 SQL After to Support Temporal Histories for Supplier Table . . . . . . . . 369
E.13 Maintain Base Aggregate Table from Change Aggregate Subscription 370
Appendix F. DJRA Generated SQL for Case Study 4 . . . . . . . . . . . . . . 381

F.1 Structures of the Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
F.2 SQL Script to Define the CONTRACTS Table as a Replication Source . 383
F.3 SQL Script to Define the VCONTRACTS View as a Replication Source 385
F.4 SQL Script to Create the CUST0001 Empty Subscription Set . . . . . . . . 386
F.5 SQL Script to Add a Member to the CONT0001 Empty
Subscription Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Appendix G. Special Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Appendix H. Related Publications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

H.1 International Technical Support Organization Publications . . . . . . . . . . 397
H.2 Redbooks on CD-ROMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
H.3 Other Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
H.4 Hot Web Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
How to Get ITSO Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

How IBM Employees Can Get ITSO Redbooks . . . . . . . . . . . . . . . . . . . . . . . 401
How Customers Can Get ITSO Redbooks. . . . . . . . . . . . . . . . . . . . . . . . . . . 402
IBM Redbook Order Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
List of Abbreviations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
ITSO REDBOOK EVALUATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
ix
x The IBM Data Replication Solution
Figures
1. Part 1 - Structural Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2. Components of a Replication System Using IBM DProp . . . . . . . . . . . . . . . 9
3. Extending DProp Replication Through Nickname Technology . . . . . . . . . 10
4. Reposition Yourself: Chapter 2—Overview . . . . . . . . . . . . . . . . . . . . . . . . 13
6. Replication to a Non-IBM Target (Components and Placement). . . . . . . . 34
7. Replication from a Non-IBM Source (Components and Placement) . . . . . 35
8. Replication Both Ways Between DB2 and a Non-IBM Database . . . . . . . 37
9. DJRA and DataJoiner Database Connectivity . . . . . . . . . . . . . . . . . . . . . . 39
10. DataJoiner Placement—Data Distribution to Non-IBM Targets . . . . . . . . . 42
11. DataJoiner Placement—Data Consolidation from Non-IBM Sources . . . . 45
12. Why One DataJoiner Database for Each Non-IBM Source Server? . . . . . 46
13. Apply and Control Server Placement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
14. Performance Analysis for Trigger Based Change Capture . . . . . . . . . . . . 58
16. Define a Non-IBM Table as a Replication Target . . . . . . . . . . . . . . . . . . . 78
17. Define a Non-IBM Table as a Replication Source . . . . . . . . . . . . . . . . . . . 80
19. Initial Handshake between Capture and Apply . . . . . . . . . . . . . . . . . . . . . 88
20. Apply Cycle at a Glance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
21. Case Study 1—High Level System Architecture . . . . . . . . . . . . . . . . . . . 140
22. Major Information Flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
23. Target Site UNION Example—Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . 146
24. Add a Computed Column to a Subscription. . . . . . . . . . . . . . . . . . . . . . . 148
25. Case Study 1—Test System Topology . . . . . . . . . . . . . . . . . . . . . . . . . . 151
26. Replication Performance Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
27. Example of an Oracle Change Capture Trigger (Insert Trigger) . . . . . . . 168
28. Replication from a Multi-Vendor Source Table . . . . . . . . . . . . . . . . . . . . 170
29. Case Study 2—High Level System Architecture . . . . . . . . . . . . . . . . . . . 174
30. Partial Data Model for the Retail Company Headquarters. . . . . . . . . . . . 176
31. Partial Data Model for a Branch of the Retail Company . . . . . . . . . . . . . 177
32. Data Distribution with Read-Only Target Tables . . . . . . . . . . . . . . . . . . . 178
33. Replicating to Non-IBM Target Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . 179
34. One DataJoiner Connected to Multiple Store Servers . . . . . . . . . . . . . . . 180
35. One DataJoiner for Each Branch Office. . . . . . . . . . . . . . . . . . . . . . . . . . 180
36. Replication of the Product Information. . . . . . . . . . . . . . . . . . . . . . . . . . . 182
37. Three-Tier Replication Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
38. Case Study 2—System Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
39. Configure Microsoft SQL Server Client Connectivity . . . . . . . . . . . . . . . . 189
40. Creating Replication Control Tables with DJRA . . . . . . . . . . . . . . . . . . . 191
© Copyright IBM Corp. 1999 xi

41. Register Table ITEMS as a Replication Source. . . . . . . . . . . . . . . . . . . . 193
42. Register Views as Replication Sources . . . . . . . . . . . . . . . . . . . . . . . . . . 194
43. Use DJRA to Create Empty Subscription Set. . . . . . . . . . . . . . . . . . . . . . 195
44. Create Empty Subscription Sets for CCDs . . . . . . . . . . . . . . . . . . . . . . . 196
45. Add a Member to Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
46. Data Subsetting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
47. Add Stored Procedure to Subscription Sets . . . . . . . . . . . . . . . . . . . . . . 200
48. The Business Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
49. Data Model Diagram of Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
50. Data Model Diagram of Target Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
52. Create the SALES_SET Subscription Set . . . . . . . . . . . . . . . . . . . . . . . . 219
53. Transformation of Supplier Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
54. Define Supplier Table as a Replication Source . . . . . . . . . . . . . . . . . . . 222
55. Subscription Definition for Supplier Table . . . . . . . . . . . . . . . . . . . . . . . . 224
56. CCD Table Attributes for Supplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
57. Transformation of Store and Region into Outlets. . . . . . . . . . . . . . . . . . . 230
58. Registration of Store and Region Tables . . . . . . . . . . . . . . . . . . . . . . . . . 232
59. Defining the Region Subscription Member . . . . . . . . . . . . . . . . . . . . . . . 233
60. Subscription Definition for Store Table . . . . . . . . . . . . . . . . . . . . . . . . . . 234
61. Transformation of Items, ProdLine and Brand . . . . . . . . . . . . . . . . . . . . . 238
62. Defining Multiple Base Tables as Replication Sources . . . . . . . . . . . . . . 239
63. Defining a DB2 View as a Replication Source . . . . . . . . . . . . . . . . . . . . 241
64. Add Products View Subscription Definition . . . . . . . . . . . . . . . . . . . . . . . 243
65. Transformation of Sales. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
66. Registration Definition for Sales Table. . . . . . . . . . . . . . . . . . . . . . . . . . . 247
67. Subscription Definition for Sales Table . . . . . . . . . . . . . . . . . . . . . . . . . . 249
68. Adding the SQL After to Support Temporal Histories . . . . . . . . . . . . . . . 253
69. Maintain Base Aggregate Table from Change Aggregate Subscription . 258
70. Data Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
72. General Implementation Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
73. DB2 DataJoiner Replication Administration Main Panel . . . . . . . . . . . . . 285
74. Create Replication Control Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
75. Define One Table as a Replication Source . . . . . . . . . . . . . . . . . . . . . . . 289
76. Define the CONTRACTS Table as a Replication Source . . . . . . . . . . . . 290
77. Define DB2 Views as Replication Sources . . . . . . . . . . . . . . . . . . . . . . . 291
78. Define DB2 Views as Replication Sources - Continued... . . . . . . . . . . . . 292
79. Create Empty Subscription Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
80. Create Empty Subscription Sets - Continued ... . . . . . . . . . . . . . . . . . . . 295
81. Add a Member in Subscription Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
82. Add a Member in Subscription Sets - Continued... . . . . . . . . . . . . . . . . . 297
83. Microsoft Access Databases Created by ASNJET . . . . . . . . . . . . . . . . . 309
xii The IBM Data Replication Solution

84. Tables in the Target Database DBSR0001 . . . . . . . . . . . . . . . . . . . . . . . 310
85. Content of the CONTRACTS Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
86. Content of the Conflict Table for Contracts . . . . . . . . . . . . . . . . . . . . . . . 314
xiii
xiv The IBM Data Replication Solution
Tables
1. Available Replication Features in a Heterogeneous Environment. . . . . . . 31

2. Determining the Status of Subscription Sets . . . . . . . . . . . . . . . . . . . . . . 107
3. Timestamp Information Available from the Subscription Set Table . . . . . 108
4. Number of Connections Needed to Fulfill Replication Task. . . . . . . . . . . 124
5. Informix Instances used in this Case Study . . . . . . . . . . . . . . . . . . . . . . . 152
6. Subscription Set Characteristics for the Data Consolidation Approach . . 162
7. Source Data Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8. Target Data Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
9. Attributes of Supplier Target Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
10. ’Attributes of Store and Region Target Tables. . . . . . . . . . . . . . . . . . . . . 229
11. Replication Attributes of Items, ProdLine and Brand Tables . . . . . . . . . . 237
12. Replication Attributes of Sales Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
13. Index to Data Replication Tips, Tricks, and Techniques . . . . . . . . . . . . . 321
14. Useful Oracle Data Dictionary Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
15. Invocation Parameters for OSQL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
16. Useful SQL Server Data Dictionary Tables . . . . . . . . . . . . . . . . . . . . . . . 332
17. Microsoft SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . 332
18. Invocation Parameters for isql . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
19. Useful SQL Server Data Dictionary Tables . . . . . . . . . . . . . . . . . . . . . . . 334
20. Microsoft SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . 335
© Copyright IBM Corp. 1999 xv

xvi The IBM Data Replication Solution
Preface
DB2 DataPropagator (DProp) is IBM’s strategic data replication solution. As a

tightly integrated component of IBM’s DB2 Universal Database (UDB)
products, IBM DProp enables cross-platform data replication among all
members of the DB2 UDB family. DProp is a separately orderable feature for
DB2 for OS/390, DB2 for AS/400, DB2 for VM or VSE.
In combination with other IBM products, such as IBM DataPropagator

NonRelational (DPropNR), DataRefresher, or IBM DataJoiner, IBM DProp
easily integrates non-relational data as well as data stored in non-IBM
relational database systems into an enterprise-wide distributed multi-platform
replication scenario.
In this redbook we focus on how IBM’s data replication solution is extended to

non-IBM relational database systems. IBM DataPropagator and IBM
DataJoiner will be used to integrate non-IBM relational databases, such as
Informix, Oracle, Sybase Open Server or Microsoft SQL Server, into IBM’s
enterprise-wide replication solution. Additionally, we will demonstrate how
mobile clients using Microsoft Access or other Microsoft Jet databases as a
data store can be supplied with data maintained on central DB2 database
servers.
To make the redbook most useful to support both the design and the
implementation phase of a heterogeneous replication project, the book
covers general guidelines and specific case studies separately.
First, we discuss general implementation options that are available to exploit

the flexibility of IBM’s data replication solution. We present guidelines to help
you specify your business driven requirements as well as to adjust available
replication options and advanced techniques to those business requirements.
General setup and maintenance tasks that basically apply to all kinds of
heterogeneous data replication projects are covered.
Next, we introduce practical case studies, showing how the previously

discussed options closely relate to exemplary business requirements. The
case studies are meant to be a reference for technically oriented project
members whose duty it is to set up distributed, cross-platform, multi-vendor
relational database systems. Implementation checklists are provided as well
as recommendations for using some of the advanced replication options IBM
DProp offers. To show the great variety of available options, each case study
deals with a different business problem. Furthermore, in each case study we
© Copyright IBM Corp. 1999 xvii

focus on a different non-IBM database system, either used as source for data
replication or as replication target.
In the case studies, the DB2 databases are either DB2 for OS/390 databases
or DB2 UDB for Windows NT databases, but the provided guidelines are also
applicable as well for any other member of the DB2 family. These include:
DB2 for AS/400, DB2 for UDB on UNIX platforms, DB2 UDB for OS/2, and
DB2 for VM/VSE.
Replication between relational databases and non-relational databases

(Lotus Notes databases, IMS hierarchical databases, VSAM files) is not a
topic of this redbook. But solutions exist:
• Lotus Notes Pump (used in conjunction with DProp), for Lotus Notes
databases
• DPropNR (used in conjunction with DProp), for IMS hierarchical
databases
• DataRefresher, possibly in conjunction with Data Difference Utility (DDU)
for IMS databases and VSAM files
The Team That Wrote This Redbook

This redbook was produced by a team of specialists from around the world
working at the International Technical Support Organization San Jose Center.
Thomas Groh is a Data Management and Business Intelligence Specialist at

the International Technical Support Organization, San Jose Center. He writes
extensively and teaches IBM classes worldwide on all areas of Data
Management and Data Warehousing. Before joining the ITSO in 1998,
Thomas worked for IBM Professional Services in Vienna, Austria as an IT
Architect. His technical background includes end-to-end design for traditional
online transaction processing systems in mainframe and client/server
environments, as well as designing, building, and managing data warehouse
solutions ranging from small data marts to very large database
implementations using massive parallel processing platforms. Thomas Groh
managed the project that delivered this redbook.
Olivier Bonnet is a Data Replication specialist in France. He provides both

professional and support defect services for DProp and DataJoiner. He has
three years of experience in the data replication field and has been working at
IBM for thirteen years. He gained his DProp and DataJoiner experience from
numerous customer projects on all DB2 platforms. Before this data replication
experience, he was a project specialist in the AS/400 area for four years, and
xviii The IBM Data Replication Solution

an application developer in the OS/390 environment for six years. He holds a
high-school engineer diploma (Institut Industriel du Nord).
Simon Harris is a Data Management pre-sales technical support specialist in

Europe. He has nine years of experience in the Data Management field and
holds a degree in Computing Science from Aston University, UK. His areas of
expertise include DataJoiner, DB2 UDB, and now DProp. He is an IBM
Certified Advanced Technical Expert in DB2 UDB.
Christian Lenke is a Data Management Services Professional in Germany.

He started to work on cross-platform data replication four years ago, in the
early days of DProp V 1.2.1. He holds a Dipl. Wirtschaftsinformatiker from the
University of Essen, Germany. He gained his experience on multi-vendor
relational database applications in numerous customer projects, integrating
DB2, Oracle, Informix and Sybase databases. Christian has really enjoyed
being a member of this team, and writing on this hot topic.
Li Yan Zhou is a Data Management Specialist in Beijing, China. She has

three years of experience in supporting Data Management products in the
IBM Software Solution Division.
Thanks to the following people for their invaluable contributions to this

project:
Rob Goldring
IBM Santa Teresa Lab
Madhu Kochar
Micks Purnell
Kathy Kwong
Bob Haimovits
International Technical Support Organization, Poughkeepsie
Vasilis Karras
International Technical Support Organization, Poughkeepsie
xix
Comments Welcome
Your comments are important to us!
We want our redbooks to be as helpful as possible. Please send us your

comments about this or other redbooks in one of the following ways:
• Fax the evaluation form found in “ITSO REDBOOK EVALUATION” on
page 411 to the fax number shown on the form.
• Use the electronic evaluation form found on the Redbooks Web sites:
For Internet users http://www.redbooks.ibm.com
For IBM Intranet users http://w3.itso.ibm.com
• Send us a note at the following address:
redbook@us.ibm.com
xx The IBM Data Replication Solution

Part 1. Heterogeneous Data Replication—General Discussion
© Copyright IBM Corp. 1999 1

2 The IBM Data Replication Solution
Chapter 1. Introduction
Congratulations for choosing IBM’s replication solution for your

enterprise-wide relational data replication system. Good choice!
To make your first approach to enterprise-wide data replication a real

success, this redbook will help you leap right into the world of cross-platform,
multi-vendor, high-performance relational data replication, using IBM’s
extended database middleware IBM DB2 DataPropagator (DProp) and IBM
DataJoiner.
Remark: Throughout this book, the term "multi-vendor" is used as a synonym

for "non-IBM".
1.1 Why Replication?

Today’s businesses are dependent on information and applications to
manipulate this information. Providing the necessary data to the users is
therefore a constant and major requirement. Generally, the data that makes
up the information exists in the enterprises’ databases. These databases are
either centralized (central host-based databases) or decentralized (either
department-level databases or LAN-based databases). The major issue is to
give the customers access to the right data, at the right time, and at an
optimized cost.
Two different approaches to data access can be chosen:

• Remote access to data
• Moving the data close to the user, which means replicating the data
Each approach has advantages and drawbacks, and corresponds to different

uses:
• Remote access fits better in environments where no latency is allowed:
The users need the data to be absolutely current. This approach requires
a highly available and high-performance network.
• Replication enables the users to have local copies of the data, so that they
can use the data with local applications without needing a permanent
connection to the enterprise’s databases.
Whether you choose remote access or replication will be determined primarily

by the acceptable or desired latency, and the ability of the network to

accommodate the data transfer requirements. Sometimes, the best solution
could also be a combination of the two approaches.
Basically, the most common uses of data replication are the following:
• Data distribution from one source database towards many target
databases.
• Feeding a data warehouse from a production database, utilizing the data
manipulation functions provided by the replication product (DProp). The
replicated data can, for example, be enhanced, aggregated, and/or
histories can be built.
• Data consolidation from several source databases towards one target
database.
1.2 Why Multi-Vendor?

Today’s IT environments have become very complex and diverse. Legacy
applications exist along with client/server applications, packaged applications
that solve special business problems, and Web-based applications—each of
them managing or accessing data organized in different databases on
different platforms. Mergers and acquisitions of companies have also
contributed to the diversity of the typical computing environments we face
today.
So, databases and applications from multiple different vendors on different

hardware platforms have to co-exist and communicate. Their integration into
a consolidated enterprise-wide database architecture is one of the biggest
challenges the relational database technology has to face. With DataJoiner,
both remote access to heterogeneous data sources and replication (in
conjunction with DProp) between heterogeneous databases can be
implemented.
1.3 How to Use this Book?

Designing, implementing, and operating a cross-platform, multi-vendor data
replication system is not a trivial task. You need a great variety of skills:
• Replication skills (good knowledge of DProp and DataJoiner, as well as all
the replication issues and techniques)
• Heterogeneous connectivity skills
• Database skills of course, for your different types of database servers
• Systems skills on the various platforms

• Network skills
• Knowledge of the applications and how they manipulate the data
The DProp reference documentation provides a lot of useful information

about how to implement a replication system in a DB2-only environment. The
DataJoiner reference documentation explains how DB2 client applications
can access non-IBM databases such as Oracle, Informix, Microsoft SQL
Server, and Sybase.
But this redbook is the first publication that fully explains how you can
combine DProp and DataJoiner to implement a heterogeneous data
replication system.
Of course, this book does not cover all the areas listed above. It will provide
you with the guidelines and recommendations that you should follow during
your heterogeneous data replication project. All the steps are detailed, and
the book also gives you detailed examples for the setup of the most
frequently used replication configurations.
After you have read this book, you will be on your way to becoming a
replication specialist and ready for practical hands-on experience. You will
know how to handle your project, what you can expect from your replication
system, how you can implement a test system, and which steps you should
follow. Then you will need to get familiar with DProp and DataJoiner, and
probably re-read some parts of this book, before you move to production.
1.3.1 The Theory

The architecture of the first part of this book mirrors the phases of your
heterogeneous data replication project: replication planning; replication
design; general implementation guidelines; and operation, monitoring, and
tuning.
This is illustrated by Figure 1, which represents the structure of both the first
part of the book and your project. To keep the figure simple, iterations, which
are of course possible, are not displayed.
Introduction 5
Approaching
Data Replication
Replication General Operation,

Replication
Planning Implementation Monitoring &
Design
Guidelines Tuning
Figure 1. Part 1 - Structural Overview
Each phase is fully described in a separate chapter in the first part of the
book. Each chapter gives all the guidelines and recommendations that should
be followed to successfully achieve the objectives of the corresponding
phase.
To help you re-position yourself at the beginning of each chapter, the figure
above is reproduced and detailed.
Replication Planning (Chapter 2): The planning phase of a replication

project is intended to gather all the business requirements that the replication
system will have to fulfill, and precisely determine how the target tables will
be derived from the source tables.
Replication Design (Chapter 3): You will then define the technical design of
the replication system, choosing the placement of the middleware
components. In this chapter we will provide you with the necessary
background information to help you choose between the many
implementation options offered by the IBM replication solution, so that the
system will fulfill your business requirement.
General Implementation Guidelines (Chapter 4): To assist you with the

actual implementation of the replication system, we developed a generally
applicable implementation checklist by identifying, sequencing, and
describing all the setup activities that are necessary before the actual data
replication subsystems can be started.

Operation, Monitoring, and Tuning (Chapter 5): This chapter contains all
the detailed operational information that you should take into account before
moving the multi-platform replication solution to your production environment.
1.3.2 The Practical Examples

To deliver the biggest practical advantage, the guidelines and
recommendations that we elaborated in Part 1 of this book are detailed in
Part 2 in four separate case studies. To demonstrate the great variety of
available solutions, each case study deals with a different business scenario:
1. Consolidating sales data (Chapter 6)
2. Distributing product data (Chapter 7)
3. Feeding a data warehouse using advanced replication techniques
(Chapter 8)
4. Using update-anywhere in a mobile computing environment (Chapter 9)
Additionally, each case study integrates a different multi-vendor database

system into IBM’s replication solution: Informix, Oracle, Microsoft SQL
Server, Microsoft Access.
Before we jump into the phases of a replication project, let us have a look at
the technical warm-up below. It explains the basic DProp and DataJoiner
concepts that you will need to know to fully understand the contents of the
next chapters.
1.4 Technical Warm-Up

IBM DB2 DataPropagator (DProp) and IBM DataJoiner are the central
components of IBM’s cross-platform replication solution. We will use these
introductory sections to achieve a common understanding of broadly used
technical expressions and to name and understand DProp’s distributed
COMPONENTS.
1.4.1 IBM DataPropagator—Architectural Overview

From a technical point of view, the three main activities involved when
replicating database changes from a set of source tables to a set of target
tables are:
• Setting up the replication system
• Capturing changes at the source database and store them into staging
tables
Introduction 7
• Apply database changes from the staging tables to the target databases
IBM DProp provides components to implement these main activities:

• The Capture component asynchronously captures changes to database
tables by reading the database log or journal. It places the captured
changes into change data tables, also referred to as staging tables.
• The Apply component reads the staging tables and applies the changes to
the target tables.
• The Administration component generates Data Definition Language (DDL)
and Data Manipulation Language (DML) statements to configure both
Capture and Apply. The two main tasks are to define replication sources
(also referred to as registrations) and to create replication subscriptions.
Replication sources are defined to limit the change capture activity to only
those tables that are going to be replicated. Replication subscriptions
contain all the settings the Apply program uses when replicating the
change data to the target tables. To set up homogeneous replication
between DB2 database systems, either the DB2 Control Center or the
DataJoiner Replication Administration can be used. The setup of
multi-vendor replication configurations (replication configurations where
non-IBM relational databases are involved, either as sources or as
targets) is only supported by the DataJoiner Replication Administration
product (DJRA), which is included in DataJoiner.
Basically, the three components operate independently and asynchronously

to minimize the impact of replication on your applications and online
transaction processing (OLTP) systems. The only interface between the
different components of the replication system is a set of relational tables, the
DProp control tables. The administration component feeds these control
tables when you define the replication sources and the replication targets.
The runtime components (Capture and Apply) read the control tables to find
out what they have to do. They also update the control tables to report
progress and synchronize their activities.
The basic principles of DProp are illustrated by Figure 2 on page 9:

SOURCE TARGET
CONTROL
APPLY
BASE
TARGET
UNIT OF WORK
TARGET
CHANGE DATA
TARGET
LOG
CONTROL
CAPTURE ADMINISTRATION
DProp Capture captures change data from the DB2 log.

DProp Apply replicates change data to the target tables.
Control tables are used to store the replication configuration.
The Administration component is used to feed the control tables.
Figure 2. Components of a Replication System Using IBM DProp
1.4.2 Extending IBM Replication to a Non-IBM RDBMS

IBM’s strategic solution to enable the transparent access to non-IBM
relational data is IBM DataJoiner, IBM’s database middleware gateway. The
DataJoiner server enables all kinds of DB2 clients, either DRDA clients, such
as DB2 for OS/390 systems, or DB2 LAN clients, such as UNIX or Windows
applications, to transparently access back-end data sources (DB2 or
multi-vendor databases).
Using DataJoiner in conjunction with DProp, the IBM replication solution is

extended to multi-vendor relational database systems, such as Oracle,
Informix Dynamic Server, Sybase SQL Server or Microsoft SQL Server. Since
DB2 applications can transparently access multi-vendor databases through
DataJoiner, DProp can also access those database systems, to apply the
changes to multi-vendor platforms, or to use those non-IBM systems as
sources for replication.
To achieve the transparent access to non-IBM relational data, DataJoiner

uses a nickname technology to reference the physical database objects (such
as tables or stored procedures) that are stored in the remote data sources.
Introduction 9
The nicknames are created within the DataJoiner database. Once nicknames
are in place, every DB2 client application, such as DProp Apply, can
transparently access (read, write, or execute) the referenced database
objects by simply accessing the nicknames.
Client applications cannot distinguish if the accessed or invoked database

object is locally stored in the DataJoiner database or just referenced by a
nickname and physically stored within a back-end database.
Figure 3 on page 10 illustrates the nickname technology.
Nickname 1 Table 1
DB2 Client
Application
Nickname n Table n
Stored Procedure Stored Procedure

Nickname
DataJoiner Multi-Vendor
Database Database
Figure 3. Extending DProp Replication Through Nickname Technology
To connect to non-IBM database systems, DataJoiner uses the native client

software provided by the non-IBM databases. That means, an Oracle server
is accessed using a SQL*Net client or Net8, an Informix server is accessed
using ESQL/C, a Sybase server is accessed using dblib or ctlib and a
Microsoft SQL Server is accessed using ODBC.
It is when the connectivity to the back-end data source has been established,
that nicknames can be created to reference database objects, such as tables,
stored procedures, user defined types or user defined functions.
Change capture, which for DB2 systems is achieved by reading the DB2 log,
is achieved by using native triggers for all the supported non-IBM data
sources. When a non-IBM table is registered for change replication, all the
necessary triggers or stored procedures are automatically generated by the
replication administration component (DataJoiner Replication Administration).

1.5 Summary
The purpose of this introduction was to briefly explain how the book is
organized and how it should be used.
It also gave you a technical warm-up, introducing the basic concepts of

heterogeneous replication using DProp and DataJoiner.
Introduction 11
Chapter 2. Planning
The first thing you have to do when you begin studying your heterogeneous
replication system is .... Think !
You have to prepare yourself to find out about the details of the business
requirements that make you consider replication and details about your data.
Do not go into the technical details too soon. After you have organized your
project (just as any other project: staff the project, train the people, define the
project plan) you must first clearly determine what the business requirements
are, that is, what your users really need (which kind of data is needed, when
is it needed, and for which purposes?)
In this chapter, of course, we will not answer these questions. They depend
on your specific business. But we will help you determine which questions
you should ask the users to gather the business requirements, and focus on
the general topics you should study before you move to the implementation
phase of the project. This is illustrated by Figure 4:
Approaching
Data Replication
General Operation,
Replication Replication
Implementation Monitoring &
Planning Design
Guidelines Tuning
Organizing Gathering Determining Technical

Your Business Replication Planning
Project Requirements Sources & Targets Considerations
Figure 4. Reposition Yourself: Chapter 2—Overview

At the end of this phase, you will have drawn a high-level global picture of
your replication sources and replication targets, showing how the replication
system will provide answers to the business requirements. Technical
considerations have only minor importance during this phase. You will study
the technical replication considerations later.
At the end of this phase, you will also have written a document detailing the
list of all the target tables, with all the columns, and the correlation between
the target and the source data. Tables structures, column name mappings,
and data types will have to be described.
Remark: You certainly already have a general idea of what you want to build
and why (for example, a datawarehouse for decision support, or a data
distribution system). So perhaps you already started the business
requirements gathering and analysis before you started reading this book
(because you did not decide to implement a heterogeneous replication
system just for the pleasure of having one, and the need to have such a
replication system is probably not recent). If this is the case, use the present
chapter to verify that you did not forget important things.
2.1 Organizing Your Project

The implementation of a heterogeneous replication system is a project in
itself. As for any other project, the planning phase includes organizational
activities:
• Identify project sponsor
• Define project scope
• Develop work plan and procedures (do not forget to plan for reviews and
testing)
• Assign resources and assets to the project
• Evaluate and schedule the activities
• Train the project team
These activities are not detailed in this book, because they are not specific to
this kind of project, but we wanted to remind you not to underestimate their
importance and the associated workload.
Recommendations for the project staffing:
You will need at least the following skills:

• A data replication professional with the following set of skills:

• DProp and DataJoiner
• Relational database skills, especially DB2
• Basic knowledge of the multi-vendor databases (for example, Oracle,
Informix, Microsoft SQL Server)
• Knowledge of the corresponding client access software of the involved
databases
• Platform skills (UNIX, Windows NT, OS/390, AS/400, depending on
your IT environment)
• Network and connectivity skills (TCP/IP, SNA)
• Project management skills
• Database administrator skills for each database product involved, with all
the necessary access authorizations. The database administrator(s) must
have a good knowledge of:
• The table structures:
• Of the source tables
• Of the target tables
• The applications that use the tables:
• Applications that update the source tables
• Applications that use the target tables
• The SQL and utilities that the database servers provide:
• The database servers on the source side
• The database servers on the target side
• Application specialists for applications on both the source and target side
• System specialists for the different platforms involved
• Network specialists
Depending on the size of your company and the project scope, each role
described above can be covered by one or more persons, or several roles
can be covered by a single person.
You will also have to involve the users in the project, as early as possible (you
will need them during the business requirements definition, and then during
the test phase).
Planning 15
2.2 Gathering the Detailed Requirements
You must determine with the users, in detail, what their data needs are, and
how the data is going to be used. Be sure to get information on the uses and
business needs for the replicated data from all people who are important to
the success of the project (that is, the users of the replicated data, the
management of the department that needs the replicated data, other staff and
management who have any interest in the data being replicated).
2.2.1 The Approach

To gather the detailed requirements, you will of course have to organize
meetings with the users, and it is important that you get them involved in the
project as early as possible. Do not forget that you will need the users to
actively test the new system once you have built it, and that it is the users’
commitment to the project that will determine its success.
The list of questions below can help you prepare the for user interviews.
You must also have a deep knowledge of the current data and applications,
because you will need to determine how the future tables will relate to the
existing ones. Perhaps some new tables will have to be created, or existing
ones reorganized, or joined together.
So you will need to review the application documentation and/or interview the
programmers or software providers.
2.2.2 List of Questions

The first user interviews, and your own knowledge of the existing data and
applications, should help you answer the following questions:
• Will the users need current data, or will they be able to work with data that
is not current up to the second?
• This is a very important question, because DProp is an asynchronous
replication product. If the users need to always use the most recent
data, perhaps you should consider other approaches, such as remote
access or distributed transaction programming.
• In particular, if the overall purpose of your project is to build a hot site
recovery system, then asynchronous replication is probably not the
correct approach.
• What is an acceptable level of latency for the replicated data? One hour,
one day, one week? (This should be identified for each of the target tables
individually.)

• Is all the data already present in the existing tables?
• If not, are you able to create additional data, or to derive the new data
from existing columns?
• If yes, perhaps the data exists but it is present in several separate
tables, and these tables are perhaps even located in different
databases.
• Will the users need to be able to update the replicated data, or will they
only need to read the data?
• If the users must be able to update the replicated data (to create new
customer orders for example), is it a problem if someone else updates
another copy of the data (same row), elsewhere, at the same time? If it
is a problem, how can you prevent such a conflict from occurring, or
how can you deal with update conflicts if you cannot prevent them? In
case a conflict occurs, who should win the conflict?
• Do you intend to have referential integrity constraints between the
target tables?
• Will all the users need the same data, or will they need different subsets of
the data?
• If they need different subsets of rows, do you have a subsetting
criterion present in all the tables (department number for example)? Is
the subsetting criterion contained in a column that can be updated in
place by an existing application? Will some users need different
subsets of columns?
• Where will the users be located? How many different geographical
locations will there be?
• How will the users access the data?
• Will they use laptops with local applications?
• How will they connect to the corporate network (communications
links)? How often?
• How much data has to be replicated? What is the level of volatility (how
many updates, inserts, deletes per hour or per day)?
• Are there periods when insert/update/delete activity on source tables is
more frequent, such as during batch processing?
• Is the distributed database application homogeneous (that means, all your
database servers are of the same type), or are you dealing with a
multi-vendor environment, replicating data between databases of different
vendors and across different system platforms?
Planning 17
• Will the users need history information that is not present in the corporate
data?
• Are there special auditing needs?
• Is there a need of retaining the values of the columns before the record
was changed in the tables that the users will use (before-images of
columns)?
• Are there some complex data manipulations that must be done on the data
before the users use the data?
• Are the existing tables normalized, and do you always follow the relational
model recommendations (no update of primary keys in particular)?
• Will the headquarters need consolidated data from geographically
dispersed data?
• Are there special filtering needs such as: Propagate the inserts and
updates but not the deletes?
Remark: In this book you will find implementation examples for nearly all the
data replication requirements listed above. Table 1 on page 31 provides a list
of the most important replication features and tells you where you can find the
examples.
Once you have answers to these questions, you know the business
requirements and you have a more precise idea of what you will easily be
able to provide (when you have all the requested data already available) and
what will be more difficult to provide (when you do not have the requested
data available!)
You must, of course, consolidate and sort all the information that you have
collected, and probably solve some conflicts (some users have contradictory
requirements).
Then you can move on to the next step and begin drawing the global picture
that we were talking about in the introductory part of this chapter.
2.3 Determining the Replication Sources and Replication Targets

Draw a high-level picture of all your replication sources and replication
targets.
Reminder: At this point, the focus is on building the overall architecture of the
replication system. Try to avoid DProp or DataJoiner specific details, for
example:

• The middleware should not appear.
• The DProp control tables should not appear.
• The technical names of the application tables should not appear except if
they are explicit from a business point of view, or if they are known by
everybody in your organization.
But you should name the available communication links (with no technical
details) between the sources and the targets, and you should name the
source and target platforms.
The picture should, in fact, be considered as a communications media

between you and the users, to show users how the replication system will
provide them with new facilities and new functions. The picture should show
which are the replication source tables and the replication target tables,
where they are located, how and when the users will have access to the
replicated data, and what they will be able to do with the data.
But you must go farther, in your analysis, than just drawing this picture.
You also have to develop a document detailing the list of all the target tables,
with all the columns and their meaning, and explain how each column will be
derived from the source data (source table and column name, or calculation
formula).
The document has to provide answers for the following questions:

• How does the structure of the target tables relate to the structure of the
source tables?
• In which cases will data for a target table come from columns of multiple
source tables?
• In which cases will different columns of one source table be sent to
different target tables?
• In which cases is the column length or data type of the target column
different from the source column?
• In which cases does source data need to be translated?
• In which cases does data for a target column need to come from multiple
source columns?
You will probably need the users’ help again to complete this document. So,
in fact, the two steps (2.2, “Gathering the Detailed Requirements” on page 16,
and 2.3, “Determining the Replication Sources and Replication Targets” on
Planning 19
page 18) are iterative. You will need several iterations to stabilize the
requirements analysis documentation.
So far, you have taken the users’ requirements into consideration, but you
must also establish capacity planning requirements. The next section helps
you do this.
2.4 Technical Planning Considerations

In this section, we want to discuss the technical planning aspects that you
should start thinking about before you actually move on to the implementation
of the replication solution. You might want to revisit this section after you have
gained a more detailed understanding of DProp, because the content is using
DProp specifics.
2.4.1 Estimating the Data Volumes

The introduction of a replication system (heterogeneous or not) will have a
significant impact on the disk space utilization, because:
• The changes to the source tables are captured and stored into staging
tables, as long as they have not been replicated to the target (there is of
course a mechanism to avoid an unlimited growth of the staging tables).
• The change capture activity requires that additional information be logged
into the log files.
• The target tables will probably be new tables.
• The replication system will use additional work files called spill files.
The next sections help you estimate this additional disk space utilization.
Advice: The estimation of the future volume of the staging tables is often a
difficult task, because most database administrators do not know how many
updates, inserts and deletes are performed on the source tables. So, some
are tempted to just "forget" this essential task. But you will not do this. You
really will spend some time trying to estimate, even roughly, how often your
source tables are updated.
If it is really too difficult, you can choose the following approach: Install the
Capture component of DProp on your source production system (or the
capture emulation triggers if your source is a non-IBM database), a long time
before you are ready to actually move the whole replication system to
production, then simulate a full-refresh so that Capture really starts capturing
the updates (the way to do this is explained in 8.4.9, “Initial Load of Data into

the Data Warehouse” on page 261), and let Capture run during a few days.
Then you simply have to stop Capture (or drop the capture emulation
triggers) and count how many rows you have in each staging table.
2.4.1.1 Change Data Tables and Unit-Of-Work Table Sizing

The updates that are done to the source tables are captured and stored into
staging tables. Change data tables (CD) for DB2 sources, consistent change
data tables (CCD) for non-IBM sources.
When you try to estimate the disk space the staging tables will use, it is not
only important to know the size of the source tables. You must also know how
many insert/delete/update operations will be made to the source tables, not in
average but as a maximum.
For example, imagine you have a source table that contains 1 million rows,
and your daily applications only update one percent of the rows. You will have
10,000 new rows each day in the staging table. If the table is replicated
regularly (several times a day for example), the staging table pruning
mechanism will be able to remove rows regularly from the staging table, and
so the staging table will never contain more than 10,000 rows. But now,
imagine that for this table you have a new monthly application that updates all
the rows. When the changes are captured, the staging table will contain 1
million rows.
This illustrates the fact that you need to know both the size of your source
tables and the maximum percentage of rows that are updated during one
replication cycle.
For some small tables (1000 rows or less; of course it also depends on the
length of each row!) that are globally updated by batch programs each day
and also propagated once a day, you might even consider that capturing and
replicating the updates is not the best approach. You can configure DProp so
that it should replicate this table in ’full-refresh’ mode only. The updates will
not be captured nor stored in staging tables, and the Apply component of
DProp will simply copy the whole content of the source table towards the
target tables. This runmode should of course only be used in exceptional
cases, because the main advantage of DProp is to provide change
replication.
When the source is a DB2 table, the capture component of DProp also inserts
rows into a table called the unit-of-work table. A row is inserted in the
unit-of-work table each time an application transaction issues a COMMIT and
the transaction had executed an SQL insert/delete/update statement against
a registered replication source table.
Planning 21
Except if you are using the technique explained in the remark above,
precisely estimating the size of the staging tables could only be done after
you have chosen all the replication parameters. But for the moment you only
need to do a rough estimation, using some simplified formulas (see below).
To size the staging tables, use the following simplified formula (the result is in
bytes), then add a 50% security margin:
(21 bytes + sum(length(registered columns)) x estimated number of inserts,
updates, deletes to be captured during max(pruning interval,replication
interval), for a busy day such as a month-end for example
If the source is a non-IBM database (staging table is CCD):

Same formula as above, but replace 21 by 57.
To size the unit-of-work table, use the following simplified formula (the result
is in bytes), then add a 50% security margin:
79 bites x estimated number of commits to be captured during max(pruning
interval,replication interval), for a busy day such as a month-end for
example
Remark: The formulas above assume that all the Apply processes will
replicate with the same frequency. If you are in a configuration where one
Apply could run very infrequently (this is the case for mobile replication
environments for example), the effective pruning of the staging tables will be
done according to another parameter that is called retention limit, and the
size of the staging tables will probably be larger.
Remark: With IBM’s replication solution, the definition of new replication

targets does not cause any additional rows to be inserted into the staging
tables. Capture inserts a source table change into the staging table once for
all targets, and so additional disk space or log space is not needed. But
please keep in mind that Capture will not prune a row from a staging table
until it is replicated to all targets. If one of the targets is replicated less
frequently, or stopped replicating, the change data table will grow larger with
records that are retained for replication to this target.
2.4.1.2 Log Impact

On most platforms, only the log space of the source system will increase. If
the target platform is an AS/400, you will also need to consider the log space
(journal receivers space) of the target tables. Several AS/400 customers do
not usually journalize their AS/400 physical files; but DProp requires that both
the source and the target AS/400 tables be journalled (logged). (On other
platforms you do not have this choice; all DB2 tables are logged.)

Log Impact for DB2 Source Systems
DB2 source tables must have the DATA CAPTURE CHANGES attribute.
Without this attribute, only the before/after values of changed columns are
logged. With this attribute, the before/after values of all the columns are
logged. This is why the definition of a DB2 table as a replication source might
increase the log space.
The increase in log space needed for your replication source tables will
depend on the number of replication sources defined, the row length of the
replication sources, the number of changes to those tables, and the number
of columns updated by the application. As a rule-of-thumb, you can estimate
that the log space needed for the replication source tables, after you have set
the DATA CAPTURE CHANGES attribute, will be three times larger than the
original log space needed for these tables.
You also need to consider the increase in log space needed because the
staging tables and the unit-of-work table are DB2 tables and so they are also
logged.
Log Impact for Non-IBM Source Systems

For non-IBM source systems, there is no capture component and the capture
functions are emulated by triggers attached to the source tables. So there is
no additional logging for the source tables. There is also no unit-of-work
table. So the only log space increase is due to the logging of the staging
tables.
Log Impact for a Target AS/400

Since journal receivers for target AS/400 tables can be created with the
MNGRCV(*SYSTEM) and DLTRCV(*YES) parameters, and since you only
need to journal the after-image columns, use the following formula to
estimate the volume of the journal receivers for the AS/400 target tables:
Target table row length x Journal receiver threshold
2.4.1.3 Target Tables

The target tables will probably be newly created tables, so you must estimate
their volume. The estimation will mostly depend on:
• The target tables’ type (complete or non-complete, condensed or
non-condensed)
• The number of rows of the source tables, and the rows subsetting criteria
that you will define
• The length of each row
Planning 23
2.4.1.4 Spill Files
The Apply component of DProp uses work files called spill files when it
fetches the data from either the source tables or the staging tables. Spill files
can be large when there are many updates to replicate, or when the initial full
refresh of a target table is performed.
Refer to 5.5.11, “Using Memory Rather Than Disk for the Spill File” on page
126, and to the DPROPR Planning and Design Guide, SG24-4771 for more
complete details about the spill files sizing estimation.
2.4.2 About CPU, Memory, and Network Sizing

The CPU, memory, and network utilization will depend so much on the
architecture and characteristics of your heterogeneous replication system
(placement of components, number of sources and targets, speed of
communication links, frequency of replication, number of updates), that we
cannot give a rule-of-thumb sizing formula for these elements.
Generally, the Capture component of DProp is not a high CPU consuming

process, except during the pruning activity (that is why we recommend
deferring pruning to off-peak hours), and it can be run with a low priority.
Performance considerations about the capture emulation triggers, for a

non-IBM replication source, are given in 3.4, “Performance Considerations for
Capture Triggers” on page 55.
The Apply component of DProp is a more CPU-consuming process than the

Capture component. Apply is an SQL application that runs partly on the
source server to fetch the updates from the staging tables, and partly on the
target server to apply the updates to the target tables. The CPU consumption
can be high on both sides. The easiest way to reduce the CPU consumption
is to increase the replication interval, and you can also schedule the
replication to run at off-peak hours. Apart from that, since Apply is an SQL
application, tuning Apply is identical to tuning any other SQL application. You
should check in particular that you have the correct indexes defined on the
staging tables, on the unit-of-work table and on the target tables. See 5.5,
“Tuning Replication Performance” on page 117 for more details.
Most of all, the network capacity has a very significant impact on the overall
performance of any replication system. Do not neglect it! The network is often
the bottleneck of the whole system.
These recommendations will help you design and implement the most
efficient architecture with respect to your business and organizational
requirements. Then, the best thing you can do to optimize the CPU, memory

and network utilization, is to measure them with the appropriate tools, and
have specialists study the results and tune the different performance
parameters.
Remarks:
• You can also find useful CPU, memory and network sizing information in
the DPROPR Planning and Design Guide, SG24-4771. Although it was
written for DProp version 1, most of the guidelines it contains remain true
for DProp version 5.
• Testing the performance of your heterogeneous replication system on your
test system will not necessarily be meaningful unless you are able to
have:
• A real pre-production environment with similar characteristics as the
production environment
• An automation tool to reproduce the workload of the production
environment onto the pre-production environment.
2.5 Summary
In this chapter we focused on the planning topics that you should study
before you really start designing and implementing your heterogeneous
replication system:
• Organize your project as any other IT project. Do not forget to involve
users and application specialists as early as possible.
• Gather the detailed business requirements and determine the list of
targets and the corresponding sources. To help you achieve this task we
provided you with a checklist of the questions you should ask users, and
yourself.
After that you should be able to:
• Draw a business oriented picture of the future replication system
• Write a document describing the future target tables, and the origin of
the data for all the columns
• Estimate the impact of data replication on the IT environment
You are now ready to go to the next phase and design the architecture of your
heterogeneous replication system. See Chapter 3, “System and Replication
Design—Architecture” on page 27.
Planning 25
Chapter 3. System and Replication Design—Architecture
After you have been through the planning phase of your heterogeneous
replication project (see Chapter 2, “Planning” on page 13), and before you
really start implementing the components of the technical solution (see
Chapter 4, “General Implementation Guidelines” on page 61), you should
spend some time thinking about the architectural aspects of your
heterogeneous replication system.
Within this chapter we will provide you with enough information to help you
choose between the different options that are available when you build this
architecture. Figure 5 shows where we are in the sequence of planning,
designing, implementing, and operating heterogeneous replication.
Approaching
Data Replication
Replication
Replication Replication Implementation
Operation &
Planning Design Guidelines
Maintenance
Principles of
System Replication
Heterogeneous
Design Options Design Options
Replication
We will focus on the following topics:

• Principles of heterogeneous replication:
• What you can and cannot do, in regard to your business requirements
• Overview of common replication architectures

• System design options
The system design options section deals with the different possibilities for
the placement of the software components (DataJoiner, DProp Capture,
DProp Apply) and of the DProp control tables.
• Replication design options
The replication design options section deals with the replication options
that you still have to choose once you have positioned your software
components and chosen the control tables location. The most important
options are: What target tables types you will choose, and how often you
will propagate.
3.1 Principles of Heterogeneous Replication

In Chapter 1.4, “Technical Warm-Up” on page 7, we provided an overview of
the DProp and DataJoiner products:
• DProp is the central product of IBM’s replication solution. It enables you to
create and run powerful homogeneous replication systems. This means
that you can propagate data from any DB2 database located on any
platform (such as OS/390, VM, VSE, AS/400, AIX, OS/2, Windows NT,
Windows 95, or Windows 98) to any other DB2 database.
• DataJoiner is IBM’s strategic gateway to enable transparent access to
non-IBM relational databases. With DataJoiner, the DProp Apply
component can transparently access non-IBM relational databases as if
they were DB2 databases. That is why the combination of DProp and
DataJoiner can propagate data from any DB2 database towards non-IBM
relational databases. Basically, DataJoiner can be considered as a DB2
engine (so that you can create databases and tables in DataJoiner) with
an enhanced catalog that enables DataJoiner to play the role of a gateway
towards other database systems. In the catalog you define nicknames that
refer to objects (such as tables or procedures) that are located in the
non-IBM database. You then use the nicknames each time you want to
access the non-IBM objects.
• DataJoiner Replication Administration (DJRA), the administration
component that is included in DataJoiner, compensates for the absence of
the Capture component on the non-IBM databases, by generating triggers
that emulate the capture functions. That way, it is possible to propagate
from non-IBM databases to any DB2 database.
The IBM replication solution enables you to replicate data from (nearly) any
relational database to (nearly) any other relational database. Since DProp

and DataJoiner enable you to propagate from DB2 to non-IBM databases,
and also from non-IBM databases to DB2, you can of course also set up a
replication system from a non-IBM database towards another non-IBM
database. For example, you can propagate from Oracle to Informix, from
Informix to Sybase, from Microsoft SQL Server to Oracle, or from Informix to
Informix— just be creative!
In the case studies (in Part 2 of this book) we will only describe how to set up
data replication between DB2 and non-IBM databases, but you can simply
combine the various examples (non-IBM source, non-IBM target) to create
other scenarios.
There is only one case where you do not need the full functions of DataJoiner
to replicate between DB2 and a non-IBM database: Microsoft Access. See
Chapter 9, “Case Study 4—Sales Force Automation, Insurance” on page 271,
where such a replication scenario is explained in detail.
Data replication between a relational database and non-relational databases

is not within the scope of this book, but solutions exist. For example:
• You can use DProp and Lotus Notes Pump to propagate data between any
DB2 database and Lotus Notes databases. You just need to create
complete Consistent Change Data (CCD) tables in the DB2 database and
feed these CCDs using the Apply component of DProp. Lotus Notes Pump
will retrieve the updates from the CCDs and propagate them to the Lotus
Notes databases. Refer to the IBM Redbook Lotus Solutions for the
Enterprise, Volume 5 NotesPump: The Enterprise Data Mover, SG24-5255
for details.
• You can use DataPropagator NonRelational or the Classic Connect
feature of DataJoiner to access data from an IMS hierarchical database.
DataPropagator NonRelational can feed a CCD table in DB2 for OS/390,
that is then used by the DProp Apply component to propagate IMS
database changes to DB2, or via DataJoiner to non-IBM relational
databases. For instance, to replicate IMS changes to Oracle, use
DataPropagator NonRelational to replicate the changes into a CCD table
in DB2 for OS/390 and the Apply component of DataJoiner to replicate
from the CCD table to Oracle.
DataJoiner is not only used in heterogeneous replication systems. It can also

be used in environments where there is a need for heterogeneous distributed
data access (read and write). DataJoiner enables users and applications to
perform distributed SQL queries over tables that are located in separate,
multi-vendor databases. For example, you can execute a single SQL
statement that joins a table located in an Oracle database, with a table
System and Replication Design—Architecture 29

located in a Sybase database, with a third table located in a DB2 for AS/400
database, and all this with tremendous performance (thanks to DataJoiner’s
global optimizer)! Could you easily do that without DataJoiner?
Let us forget all the other marvellous capabilities of DataJoiner for a while
and go back to our heterogeneous replication topic.
3.1.1 Extending DProp Replication to Multi-Vendor Replication

When used in a DB2-only environment, DProp has many powerful features
that enable you to fulfill lots of business-driven replication requirements. They
include:
• Update-anywhere replication, with conflict detection and referential
integrity constraints between the tables
• Denormalization (including replication from join views)
• Data transformation (columns-subsetting, use of calculated columns)
• Data aggregation and temporal histories
• Data distribution (one source to many targets), with the following
possibilities:
• Subsetting of rows using where-clauses
• Use of updatable partitioning keys (updates are captured as
delete+insert pairs)
• Data consolidation (many sources to one target)
• Additional logic with invoking SQL statements or stored procedures during
replication
• Multi-tier replication, using Consistent Change Data (CCD) staging tables
• Support for occasionally connected, mobile environment
• Large answer sets regulation (use of blocking factor)
• Flexibility in the placement of the control tables
• Flexibility in the placement of the Apply program (pull or push replication)
Please refer to the DB2 Replication Guide and Reference,SR5H-0999 for

more details on all these capabilities.
Now you are probably wondering which of these DProp features can also be
used when you add DataJoiner into the picture to propagate between DB2
and a non-IBM database.

Table 1 shows which of the various features are available in a heterogeneous
replication environment, and which are not. As you can see, nearly all the
features are supported.
Update-anywhere is only available for Microsoft Access. Some other features

are available in all cases except Microsoft Access, because the replication
between DB2 and Microsoft Access must always be configured as an
update-anywhere replication, and these other features are not compatible
with update-anywhere processing.
The last column of this table indicates where you can find examples, in this
book, to implement these features.
Table 1. Available Replication Features in a Heterogeneous Environment
Replication Feature Available Available Example Reference

if Source if Target is
is non-IBM non-IBM
Update-anywhere + conflict Only with Only with Chapter 9, “Case Study 4—Sales
detection + referential integrity MS Access MS Access Force Automation, Insurance” on page
constraints support 271
Denormalization (use of Join N Y Chapter 7, “Case Study 2—Product

views as sources) Data Distribution, Retail” on page 173
Chapter 8, “Case Study 3—Feeding a

Data Warehouse” on page 203
Chapter 9, “Case Study 4—Sales

Force Automation, Insurance” on page
271
Data transformation (columns Y(*) Y(*) Chapter 8, “Case Study 3—Feeding a

subsetting, use of calculated Data Warehouse” on page 203
columns)
Data aggregation Y (*) Y (*) Chapter 8, “Case Study 3—Feeding a

Temporal histories Y Y Chapter 8, “Case Study 3—Feeding a

Replication from non-IBM Y Y Chapter 6, “Case Study 1—Point of

database Sale Data Consolidation, Retail” on
page 139

Replication Feature Available Available Example Reference
if Source if Target is
is non-IBM non-IBM
Rows subsetting Y Y Chapter 7, “Case Study 2—Product

Data Distribution, Retail” on page 173
Chapter 9, “Case Study 4—Sales

Force Automation, Insurance” on page
271
Capture of updates as delete + Y Y Chapter 9, “Case Study 4—Sales

insert pairs Force Automation, Insurance” on page
271
Data consolidation Y Y Chapter 6, “Case Study 1—Point of

Sale Data Consolidation, Retail” on
page 139
Run SQL statements or stored Y (*) Y (*) Chapter 7, “Case Study 2—Product
procedures Data Distribution, Retail” on page 173
Use of CCDs Y Y Chapter 7, “Case Study 2—Product

Data Distribution, Retail” on page 173
Chapter 8, “Case Study 3—Feeding a

Mobile environment MS Access MS Access Chapter 9, “Case Study 4—Sales

only only Force Automation, Insurance” on page
271
Large answer sets regulation Not recom- Y None

mended
Flexibility in the location of the Y (*) Y (*) None

Control tables
Flexibility in the location of the Y (*) Y (*) None

Apply program (pull, push)
Note: The (*) in the table above means ’except with MS Access’.
Now that we have seen what you can do (and what you cannot do) according
to your source database systems and your target database systems, let us
have a look at some of the most common replication environments.

3.1.2 Overview of the Most Common Replication Architectures
The purpose of this section is to show how DProp and DataJoiner
components can be used in different replication scenarios. The figures
described in this section detail the components of the most common
heterogeneous replication environments. It is important to understand that
many more configurations are possible.
3.1.2.1 Replication from DB2 to a Non-IBM Target

If you want to propagate from a DB2 database towards a non-IBM target
database, you will need to have DataJoiner installed as a middleware
component (except if the non-IBM database is Microsoft Access; see Chapter
9, “Case Study 4—Sales Force Automation, Insurance” on page 271).
The most common configuration in this case is:

• The Capture component of DProp is co-located with the DB2 Source.
• The Apply component of DProp is co-located with DataJoiner.
• The control tables used by Apply are stored in the DataJoiner database.
Figure 6 on page 34 shows an example with a non-IBM target database and a

DB2 source database.
Of course there are possible alternatives, but let us keep it simple for the
moment.

DB2 DataJoiner Multi-Vendor
Source Database Target
Database Database
DataJoiner
Global Catalog
Source Table Target Target Table

Nickname
CAPTURE APPLY
CD Table UOW Table
Apply Control Tables :

ASN.IBMSNAP_SUBS_SET
ASN.IBMSNAP_SUBS_MEMBR
ASN.IBMSNAP_SUBS_COLS
Register Table ASN.IBMSNAP_SUBS_STMTS
Pruncntl Table ASN.IBMSNAP_SUBS_EVENTS
Reg_Synch Table
Replication Direction
Figure 6. Replication to a Non-IBM Target (Components and Placement)
Apply will access the target table through a nickname that is defined in the
DataJoiner database.
3.1.2.2 Replication from a Non-IBM Source Towards DB2

If you want to propagate from a non-IBM source database towards a DB2
database, you will also need to have DataJoiner installed (except if the
non-IBM database is Microsoft Access; see Chapter 9, “Case Study 4—Sales
Force Automation, Insurance” on page 271).
The most common configuration in this case is:

• There is no Capture component. The Capture functions are emulated by
triggers at the source. All the necessary triggers will be automatically
generated by DataJoiner Replication Administration (DJRA).
• The Apply component of DProp is located at the DB2 target server.

• The control tables used by Apply are stored in the target database.
This configuration is shown in Figure 7.
Multi-Vendor DataJoiner DB2

Source Database Target
Database DataJoiner
Database
Global Catalog
Insert
Update
Delete Source Table Source Target Table
Nickname
APPLY
CCD Table CCD

Nickname
Prune
Control Tables :
Pruning Control ASN.IBMSNAP_SUBS_SET
Pruning Control ASN.IBMSNAP_SUBS_MEMBR
Nickname
ASN.IBMSNAP_SUBS_STMTS
ASN.IBMSNAP_SUBS_EVENTS
ASN.IBMSNAP_APPLYTRAIL
Register ASN.IBMSNAP_CCPPARMS
Register ASN.IBMSNAP_UOW
Nickname
Reg_Synch
Reg_Synch Reg_Synch
Nickname
Replication Direction
Figure 7. Replication from a Non-IBM Source (Components and Placement)
3.1.2.3 Two-Way Replication Between DB2 and a Non-IBM Database
As we have said before, it is not possible to set up an update-anywhere

configuration when either the source or the target is not a DB2 database
(except with Microsoft Access). This means that you cannot propagate both
ways to and from the same tables. But you can easily propagate some tables
one way, and other tables the other way.

For example, let us imagine you want to propagate data between an AS/400
and Microsoft SQL Server on Windows NT by replicating:
• Some tables from the AS/400 towards Microsoft SQL Server
• Some other tables from Microsoft SQL Server to the AS/400
To do this you just have to combine the environments discussed in the two
previous sections. You can simplify the setup since the Apply program is able
to run in both Pull and Push modes. Therefore, you only need to have a
single Apply instance running in the DataJoiner Server, pulling the data from
the AS/400 to Microsoft SQL Server, and pushing the data from Microsoft
SQL Server to the AS/400.
To avoid confusion when you define the replication sources and targets, it is
better to define two DataJoiner databases, one for replicating data in each
direction.
The global picture would look like Figure 8:

DB2 DataJoiner Multi-Vendor
Database Database
DJDB1
Source Table 1 Nickname Target 1 Target Table 1
CAPTURE
Apply Control Tables :

CD Table UOW Table ASN.IBMSNAP_SUBS_MEMBR
ASN.IBMSNAP_SUBS_EVENTS
Register Table APPLY

Pruncntl Table AQLY1
Reg_Synch Table Pull Mode
APPLY Insert
AQLY2
Push Mode Update
Target Table 2 Delete Source Table 2
DJDB2
CCD Table
Nickname Source 2 Prune

Nickname CCD
Nickname Register
Nickname Pruncntl
Nickname Reg_Synch
Pruning Control
Register
Apply Control Tables : Reg_Synch

ASN.IBMSNAP_SUBS_MEMBR
ASN.IBMSNAP_SUBS_EVENTS Reg_Synch
Figure 8. Replication Both Ways Between DB2 and a Non-IBM Database

3.1.2.4 The Administration Component
The previous pictures show the runtime components, but not the
administration component that is used to define the replication sources and
the replication subscriptions. In fact, the administration component,
DataJoiner Replication Administration (DJRA), is a Graphical User Interface
(GUI) that does not need to be present when the replication is running, and
that is why it does not appear in the previous examples. It can be installed on
a Windows 95 or Windows 98 workstation, or on a Windows NT system.
DJRA must be configured in such a way that it can access both the non-IBM
databases and the DB2 databases:
• To access the non-IBM databases, DJRA will connect to the DataJoiner
database and DataJoiner will act as a gateway towards the non-IBM
databases using the defined server mappings and user mappings (see
Figure 9, A) .
• To access the DB2 databases:
Depending on the type of DB2 database, DJRA will:
• Connect directly to the DB2 database: This is the case if the DB2
database is DB2 UDB or DB2 Common Server on Intel or RISC
platforms (see Figure 9, C).
• Connect to the DB2 database through DataJoiner: This is the case if
the DB2 database is DB2 for OS/390, DB2 for AS/400, or DB2 for VSE.
No server mapping is necessary, only the Distributed Database
Connection Services (DDCS) function of DataJoiner is used (see
Figure 9, B).
Remark: It is also possible to create server mappings and nicknames for DB2
databases and tables. This DataJoiner feature is used, for example, when a
DB2 table and an Oracle table must be joined together. DB2 access through
nicknames is not recommended when you use DB2 objects for replication
only.
See Figure 9 for an overview of how the administration component relates to

the DataJoiner server. For a more detailed discussion of the setup tasks
necessary for the administration component, refer to Chapter 4, “General
Implementation Guidelines” on page 61.

Non-IBM
DataJoiner Target Databases
DataJoiner - Informix
Database - Oracle
A - Microsoft SQL Server
Server Mappings - Sybase SQL Server
A User Mappings - SQL Anywhere
DJRA
DB2 CAE B DDCS IBM

Target Databases
Node Directory
DB Directory B - DB2 for OS/390
Replication - DB2 for AS/400
Administration DCS Directory
C - DB2 for VSE
Workstation
DataJoiner
Server
- DB2 UDB
- DB2 CS V2
Figure 9. DJRA and DataJoiner Database Connectivity
Remark: In a DB2-only environment, you have the choice of configuring your

replication system using either:
• DJRA
• The Control Center of DB2 UDB
In a multi-vendor database environment, you must use DJRA.
3.2 System Design Options

This section describes the following system design options:
• Apply program placement: pull or push
• DataJoiner placement: centralized or decentralized
• Control table placement: centralized or decentralized
3.2.1 Apply Program Placement: Pull or Push

When Apply is running on the same machine as the target database, it pulls
the data from the source. When Apply is running remotely from the target
database, it pushes the data to the target. A pull configuration is typically

between 10 and 30 times more efficient than a push configuration. The push
versus pull performance difference is due to the design of the underlying
database connectivity protocol (it is not a consequence of the DProp or
DataJoiner internal designs). It is a trade-off between Apply being able to use
block-fetch of an answer set over the network, and Apply performing single
insert/update/delete operations over the network (push mode). Single
insert/update/delete operations across the network are far more expensive
than fetching an answer set in blocks (that is, multiple records can be
retrieved by Apply in one network I/O operation). Therefore, the goal for good
performance is to perform the insert/update/delete operations locally. This
means co-locating both Apply and DataJoiner with the target.
In most cases we recommend the use of a pull replication scenario. It is only

when you have source tables that are infrequently updated that you can
consider using a push scenario. This means that you should place the Apply
program close to the target database.
Of course, it is also possible to have a scenario in which DataJoiner and

Apply are located on a separate machine from both source and target. In this
case, the performance benefit of block fetching the data from the source will
be negated by the performance degradation of applying the
insert/update/delete operations over the network.
3.2.2 DataJoiner Placement

A very important design consideration when implementing a heterogeneous
replication system is the placement of the DataJoiner middleware server. An
additional task is to evaluate how many DataJoiner instances and how many
DataJoiner databases are necessary to implement the design.
Let us spend a minute discussing what generally influences the decision of

where to place the DataJoiner middleware. Definitely, we want to achieve the
following goals:
• Good performance: If you want to propagate towards a non-IBM
replication target, for example, you will use the Apply code that is included
in DataJoiner, and so it will be better to place DataJoiner as close as
possible to the non-IBM replication target system, or even on the same
platform when it is possible. If you have many different target databases,
this means you will also have several DataJoiner instances and
databases.
Since Apply is a DB2 application, it needs to connect to non-IBM relational
databases through DataJoiner which, with its nicknames, makes the target
tables appear to Apply to be in a DB2 database. Apply can be no closer to

the target tables than DataJoiner can be. If DataJoiner is installed on the
same server as the non-IBM database that has the target table, then the
Apply insert/update/delete operations will be done locally. If DataJoiner is
installed on a different server than the non-IBM database, then the Apply
insert/update/delete operations will have to be executed over the network.
If DataJoiner needs to be on a different server than the target tables, you
should look for ways to tune the network hardware, the network software,
and the data source and its network client software (for example, Oracle
SQL*Net, or Sybase Open Client on the DataJoiner server) for best
performance.
• Ease of administration: The more database middleware instances
involved, the more administration overhead you will have to manage. That
means, that if your concern is to minimize the administration overhead,
you should minimize the number of DataJoiner instances and databases.
The two goals are conflicting, and you will have to find a good compromise.
Furthermore, the best solution is not necessarily the same whether you
intend to propagate from a non-IBM database or whether you intend to
propagate to a non-IBM database.
The following two sections provide you with the background information to
help you decide where to place the DataJoiner middleware server(s), and
how many DataJoiner databases you should use. We will consider data
distribution to non-IBM databases and data consolidation from non-IBM
database systems, separately.
3.2.2.1 Data Distribution to Non-IBM Targets

Considering a data distribution scenario with non-IBM target systems, the two
main rules for placing the database middleware seem to be contradictory:
• Good performance would be achieved by adding one DataJoiner instance
to every remote non-IBM database server. DProp Apply would be running
along with every DataJoiner instance. The DProp control tables could be
decentralized or centralized.
• The lowest administration and monitoring costs would be achieved by
setting up one dedicated middleware server, providing nicknames for all
the target tables and additionally containing the DProp control information
for all the replication targets.
Figure 10 visualizes the two different approaches:

Data Distribution
Source Source
IBM
DJ & APPLY
IBM IBM
DJ & APPLY DJ & APPLY
Non-IBM Non-IBM Non-IBM Non-IBM

DBMS DBMS DBMS DBMS
PERFORMANCE EASE OF ADMINISTRATION
Figure 10. DataJoiner Placement—Data Distribution to Non-IBM Targets
One DataJoiner Instance for Each Non-IBM Target

Placing the DataJoiner instances as close as possible to the non-IBM target
gives you the choice between two major options:
• Option 1: Install DataJoiner on the same workstation as the non-IBM
database.
• Option 2: Install DataJoiner onto a separate machine, most likely placed
within the same LAN as the non-IBM database server.
We recommend Option 1 for those system platforms that DataJoiner natively

supports (at the time this book was edited, DataJoiner Version 2 was
available for AIX and Windows NT). For example, if your target system is
Microsoft SQL Server on Windows NT, install DataJoiner for Windows NT on
the same machine. If your target system is Oracle on AIX, install DataJoiner
for AIX on the same machine.
Option 2 should only be used if the non-IBM target databases are located on
operating system platforms that DataJoiner does not yet natively support.
Example: To access Oracle on SUN Solaris, use a separate machine (either
AIX or Windows NT) and place this machine in the same LAN as the SUN
Solaris machine.

For mobile systems, having one DataJoiner instance per target database
would not be a good idea! (But who could be so smart as to put MS SQL
Server, for example, onto a mobile computer!)
Please refer to Chapter 9, “Case Study 4—Sales Force Automation,

Insurance” on page 271, to see what IBM currently recommends for
infrequently connected mobile replication target systems. Additionally, watch
out for solutions that focus on light DB2 systems to be introduced with DB2
UDB Version 6.
If you choose to have one DataJoiner instance per non-IBM target, you will
only need one DataJoiner database in each DataJoiner instance.
One Central DataJoiner Instance

One central database middleware server in a data distribution scenario has
the disadvantage of sub-optimal replication performance, because remote
SQL operations (inserts, updates, deletes) over a network are much slower
than remote fetches.
On the other hand, a single point of administration will reduce systems

management overhead.
If you choose to have one central DataJoiner instance, you can choose to
have either:
• Only one DataJoiner database, common for all the non-IBM target
databases.
• Several DataJoiner databases.
The only reason why you would want to create several DataJoiner databases
is if you want the nicknames to be stored separately for security reasons. As
a first approach, just consider that a single DataJoiner database for all the
non-IBM databases is a good solution.
Final Recommendations for Data Distribution

Which option to choose for a data distribution scenario will definitely depend
on the kind of replication system you are planning to introduce. Factors that
will influence your decision will be:
• The amount of data to be replicated per replication interval
• The length of one replication interval (for example, replication every
minute or once a day)
• The number of non-IBM target systems
• The batch-window available for replication

Generally, based on performance testing, we recommend one DataJoiner
instance close to every non-IBM target system when the volume of data to be
replicated is high. Automated systems management components can be used
to assist you in monitoring large numbers of database middleware servers.
Only in situations where the data flows are small and the replication cycles
are long, do we recommend the use of a central middleware server to reduce
complexity.
In both cases, we recommend using only one DataJoiner database per

instance.
Refer to Chapter 7, “Case Study 2—Product Data Distribution, Retail” on

page 173 for a deeper insight into a data distribution scenario with Microsoft
SQL Server replication targets.
3.2.2.2 Data Consolidation from Non-IBM Sources

Topology recommendations for data consolidation scenarios which replicate
data from non-IBM source databases are much easier. Again, we are looking
for a system setup that provides:
• Good performance
• Easy administration
Figure 11 shows that a central database middleware instance is sufficient to

deliver the best performance as well as ease of administration for data
consolidation scenarios.

Data Consolidation
Target / APPLY Target / APPLY
IBM
DataJoiner
IBM IBM
DataJoiner DataJoiner

DBMS DBMS DBMS DBMS
EASE OF ADMINISTRATION
& PERFORMANCE
Figure 11. DataJoiner Placement—Data Consolidation from Non-IBM Sources
Number of DataJoiner Instances

Placing one DataJoiner instance besides every non-IBM replication source
would have no performance advantages compared to a central database
middleware server. Therefore we will not discuss this setup in detail.
One central DataJoiner instance enables both best performance as well as

ease of administration. DProp Apply will most likely be running at the
replication target system, and connect to all the non-IBM replication source
systems through the DataJoiner middleware instance. So, Apply will run in
pull mode, which is the best approach for performance. Additionally, the
centralized middleware approach will minimize administration and monitoring
cost.
Refer to Chapter 6, “Case Study 1—Point of Sale Data Consolidation, Retail”

on page 139 for a deeper insight into a data consolidation scenario with
Informix Dynamic Server Version 7.3 replication source systems.

Number of DataJoiner Databases
The number of DataJoiner databases that you have to create in the
DataJoiner instance is also very easy to determine: You must create as many
DataJoiner databases as there are non-IBM source servers. For example, if
you want to propagate data from three Informix systems, you will have to
create three DataJoiner databases. This is because for non-IBM source
servers, DJRA creates the REGISTER, PRUNCNTL and REG_SYNCH tables
in the source servers, and creates nicknames called
ASN.IBMSNAP_REGISTER, ASN.IBMSNAP_PRUNCNTL and
ASN.IBMSNAP_REG_SYNCH in the DataJoiner databases for these tables.
This is illustrated in Figure 12:
Wrong : Multi-Vendor Source Right : Multi-Vendor Source

DataJoiner
Database 1 Database 1
DJDB1
DataJoiner Source Table Source Source Table

Nickname
DJDB1
DJRA Source 1 DJRA M

E
Nickname NA
CK Nicknames
ASN.IBMSNAP_PRUNCNTL NI
Create CREATE TABLE ASN.IBMSNAP_REGISTER Create AT
E ASN.IBMSNAP_PRUNCNTL ASN.IBMSNAP_PRUNCNTL
RE ASN.IBMSNAP_REGISTER ASN.IBMSNAP_REGISTER
Replication ASN.IBMSNAP_REG_SYNCH Replication C
ASN.IBMSNAP_REG_SYNCH ASN.IBMSNAP_REG_SYNCH
Control Control CREATE TABLE
Tables CREATE NICKN. Multi-Vendor Source Tables Multi-Vendor Source
Database 2 Database 2
Nicknames
ASN.IBMSNAP_PRUNCNTL
ASN.IBMSNAP_REGISTER
?! DJDB2
ASN.IBMSNAP_REG_SYNCH CR
EA
Source Table TE
NIC Source Source Table
KN Nickname
AM
E
Source 2
Nickname
ASN.IBMSNAP_PRUNCNTL Nicknames
ASN.IBMSNAP_REGISTER ASN.IBMSNAP_PRUNCNTL ASN.IBMSNAP_PRUNCNTL
ASN.IBMSNAP_REG_SYNCH ASN.IBMSNAP_REGISTER ASN.IBMSNAP_REGISTER
ASN.IBMSNAP_REG_SYNCH ASN.IBMSNAP_REG_SYNCH
CREATE TABLE
CREATE TABLE
Figure 12. Why One DataJoiner Database for Each Non-IBM Source Server?
3.2.3 Control Tables Placement

The purpose of this section is to study the different possible placement
options for the DProp control tables.
When a source database is a DB2 database, the Capture program needs to

have its control tables within the replication source database. This is also true
for the triggers that emulate the capture functions. The REGISTER,
PRUNCNTL, and REG_SYNCH tables must be physically stored in the
non-IBM source, even if nicknames are created for them in the DataJoiner
database.

So in fact when we are discussing the placement options for the control
tables, we are only interested in the control tables of the Apply program. The
placement of Capture’s control tables (or the capture-emulation triggers’
control tables) is predetermined.
The Apply program can have its control tables (for example, SUBS_SET,
SUBS_MEMBR) located locally or remotely. The location chosen to hold the
control tables is known as the Control Server. As we already explained, it is
also possible to run Apply in push mode or in pull mode.
This means that you have several possible configurations (see Figure 13):
Source Source Source Source
Control Server Control Server
APPLY APPLY
DataJoiner DataJoiner DataJoiner DataJoiner
Control Server Control Server
APPLY APPLY

Target Target Target Target
Configuration 1 Configuration 2 Configuration 3 Configuration 4

Pull Pull Push Push
Figure 13. Apply and Control Server Placement
And since you probably will have several Apply instances, you can even have
combinations of the above configurations. But remember that if you want to
keep your configuration manageable, you had better keep it simple!

Configuration 1: This is the solution we would recommend if you have either:
• Few target servers and one DataJoiner instance for each target server
• One central DataJoiner instance
Configuration 2: This is the solution we would recommend if you have either:

• Many target servers and one DataJoiner instance for each target
server; the performance penalty of having the control server remote
from the Apply program is largely compensated by the ease of
administration.
• A production environment (mostly limited by your staff skills and the
monitoring tools available) that makes it really preferable to have the
control server co-located with the source server.
Configuration 3: This configuration is possible, but:

• It would be a push configuration (with poor data transfer performance).
• You should only consider this kind of configuration if you have only few
updates to propagate and if you run the Apply program infrequently.
Configuration 4: This configuration should not be used, because:

• It would have poor performance and uneasy administration.
So far, we have been discussing the placement of the software components

and of the control tables in a heterogeneous replication environment, but we
have not discussed some options that you will have to study before you begin
the real implementation. The next section helps you think about the additional
design options to be considered.
3.3 Replication Design Options

We will discuss the following topics in this section:
• The target table types you can use
• The replication frequency (replication timing)
• The blocking factor
In fact, most of these replication design options are directly driven by your
business requirements (refer to Chapter 2, “Planning” on page 13, to see
which tasks you need to go through to assess these business requirements).
You will have to check that these business requirements can really be
achieved, considering the comments and restrictions that are explained here.

3.3.1 Target Table Types
In a DB2-only replication environment, the following target table types are
available:
• Read-only target tables: usercopy, point-in-time (PIT), consistent change
data (CCD), base aggregate, change aggregate
• Updatable target tables: replica, user tables (source tables in an
update-anywhere scenario)
In a heterogeneous data replication environment, not all of these target table

types are available.
Read-only target table types are supported, except for CCDs. DJRA does not
currently allow the creation of a CCD table in a non-IBM target database, but
this restriction will probably soon be removed, and there is a workaround
anyway (see Chapter 8, “Case Study 3—Feeding a Data Warehouse” on page
203, for more details about this workaround).
Update-anywhere is currently supported for DB2 target tables only (target

type Replica), or for Microsoft Access target tables (target type Row-Replica).
This means that you cannot have an update-anywhere scenario if the source
database is not DB2 and if the target databases are not DB2 or Microsoft
Access.
3.3.1.1 Referential Integrity Considerations

Even in a DB2-only replication environment, you can not define referential
integrity (RI) constraints between your target tables if they are read-only
target tables (usercopy, point-in-time, base aggregate, change aggregate, or
CCD).
In fact constraints are only needed when there are application updates. It is
useless to define constraints on read-only targets, because the updates made
against the source tables already satisfied the RI constraints defined at the
source, and Apply will logically maintain these original RI constraints on the
target.
For better performance, DProp Apply assumes some freedom with respect to
read-only copies. Updates to read-only copies within a subscription set cycle
will be performed one member at a time, with all updates related to one
subscription member issued before updating the target associated with the
next member. Still, all members within a subscription set are replicated within
one unit-of-work. By taking a global upper transaction boundary (global
SYNCHPOINT) for the complete set into account, all the target tables are

transactionally and referentially consistent at the end of the replication cycle.
This is what is called transaction consistent replication .
In a DB2-only replication environment, if you really want to define RI

constraints between your target tables, you must define the target tables as
replicas. In this case you can implement the same RI constraints on the target
tables as the ones that you defined at the source (master). Apply will no
longer replicate the captured changes on a table-by-table basis (as above),
because that may violate the RI constraints. Instead, Apply will issue the
updates to the target tables in their original interleave sequence, across all
the subscription members concurrently. If the subscription set is defined large
enough, that means if the set includes all the tables that are referentially
related, the RI constraints defined on the targets will be respected because
Apply will have applied the updates in their original execution order. This is
known as transaction based replication .
3.3.1.2 Read-only Copies with Referential Integrity

If your target tables are non-IBM (or if they are DB2 tables but you do not
want to define them as replicas) and you want to define RI constraints
between your target tables, use the following advanced technique:
1. Define your replication source tables at the source server.
2. At the source server, disable conflict detection for all the replication
sources:
UPDATE ASN.IBMSNAP_REGISTER
SET CONFLICT_LEVEL = 0
WHERE SOURCE_OWNER = ’<source_owner>’
AND SOURCE_TABLE = ’<source_table>’;
3. Define a subscription set and add members using the target table type
usercopy, just as you would do for any read-only replication target.
4. Then change the target table type under the covers, changing it from
usercopy (TARGET_STRUCTURE = 8) to replica (TARGET_STRUCTURE = 7). To do
this, change the target table type within the pruning control table at the
source server for all the members of the subscription set:
UPDATE ASN.IBMSNAP_PRUNCNTL
SET TARGET_STRUCTURE = 7
WHERE APPLY_QUAL = ’<apply_qual>’
AND SET_NAME = ’<set_name>’
AND TARGET_OWNER = ’<target_owner>’
AND TARGET_TABLE = ’<target_table>’;

5. At the control server, change the target table type within the subscription
target member table for all the members of the subscription set:
UPDATE ASN.IBMSNAP_SUBS_MEMBR
SET TARGET_STRUCTURE = 7
AND TARGET_OWNER = ’<target_owner>’
AND TARGET_TABLE = ’<target_table>’;
6. Start Capture and Apply, and implement the RI constraints between the
target tables after Apply has full-refreshed all the tables. This is because
no interleave sequence can be detected when performing the initial
full-refresh. If you want to be able to define the RI constraints before the
full-refresh, then you must:
• Disable full-refresh before starting Capture and Apply
• Then use an adapted technique to load the tables and preserve the RI
constraints (use the ASNLOAD exit with EXPORT/LOAD, followed by a
CHECK CONSTRAINTS command for example, if the target is DB2).
Refer to Chapter 8.4.9, “Initial Load of Data into the Data Warehouse”
on page 261 for examples of the use of such techniques.
3.3.2 Replication Timing

When you design your replication system, an important aspect is to determine
when the the subscription sets will actually be processed. You can choose to
have the replication sets processed at regular time intervals (this is called
relative timing processing) or according to external events that can be of any
nature (this is called event based scheduling) or both.
In fact the replication timing definition is not peculiar to a heterogeneous

replication system. You would have to study exactly the same question in the
same way, if you were in a DB2-only replication environment. This is because
the replication timing exclusively deals with the Apply program (the Capture
program or the change capture triggers are not involved here), and it is
always the same Apply program that you use, whether you propagate
towards a DB2 database or towards a non-IBM database through DataJoiner.
So you could simply refer to the DB2 Replication Guide and Reference book,
S95H-0999. Nevertheless we want to recall some important notions here. We
also want to point out that there is nothing particular in replication timing for a
heterogeneous replication system compared to a DB2-only replication
system.

3.3.2.1 Relative Timing
When you define a subscription set, you can indicate a replication interval, in
minutes, hours, days, or even weeks.
You can also indicate that you want the subscription set to be processed
continuously, meaning that just after Apply has finished processing the
subscription set, it will process it again. This does not mean, of course, that
you have transformed DProp into a synchronous replication system. There is
still a delay between the time the update is done at the source and the time
the update is applied to the target. But this delay will be short.
Remark: The timing information is defined at a subscription set level. So, all
the members in a set will be processed with the same frequency.
Now you must be aware that if you have many tables to propagate, and you
choose a very short interval (1 minute, for example), Apply will do its best to
meet your requirement, but the actual interval will probably be higher than
what you indicated. This depends mainly on the available system resources
(for example, CPU power, and network capacity).
3.3.2.2 Event Based Timing

Event based scheduling was introduced in DProp V5. It enables you to
schedule the processing of a subscription set according to external events.
An event can be, for example, the end of a batch job, or a precise date and
time.
In fact, when you want a subscription set to be scheduled by events, you

simply indicate an event name (any character string is fine; EVENTXX, for
example) when you define the subscription set. You have probably noticed
that there is a control cable that is called ASN.IBMSNAP_SUBS_EVENT. To
create an event that will trigger the processing of your subscription set, you
simply need to insert a row into the ASN.IBMSNAP_SUBS_EVENT table,
with the following column values:
• The name of the event (EVENTXX in our example).
• A timestamp that indicates when the subscription set must be processed.
There is an optional third column when you add a row into the events table:
Refer to the DProp documentation to find more details about its use.
Several subscription sets can share the same event name. This means that if
you wish to trigger several subscription sets together from the same event,
you only need to indicate the same event name in the subscription sets
definition. But if you intend to do this, perhaps you should consider grouping

all the members into a single subscription set, rather than having separate
subscription sets.
Remark: For a subscription set, you can indicate both a replication interval
and an event name, but in general we recommend not mixing these two
processing modes. Remember: Try to keep things simple!
Example: If you want to propagate only once a day (for example, in the
evening at 8 pm) you have several possibilities:
• Use relative timing, with a 1-day frequency.
• You can even indicate a smaller interval (15 minutes, for example) so that
you can start additional replications during the day if needed. You can
either stop Apply once all the subscriptions sets have been processed at
least once, or deactivate all subscription sets processed by this Apply
instance by updating the control tables with the following statement:
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=0
WHERE APPLY_QUAL=’<apply qualifier>’
• Use events: Insert as many events as you wish in the
ASN.IBMSNAP_SUBS_EVENT table, one for each day that you want the
subscription sets to be processed.
• Or use the advanced event based scheduling technique that is described
in the next section.
3.3.2.3 Advanced Event Based Scheduling

The technique that is explained here enables you to automatically schedule
the processing of a subscription set once a day.
First, create your subscription sets with the event name ’WEEKDAY’ (or any
other name you like).
You will need to drop the existing ASN.IBMSNAP_SUBS_EVENT table and

recreate it with another name.
Then create the following view:

CREATE VIEW ASN.IBMSNAP_SUBS_EVENT
(EVENT_NAME,EVENT_TIME,END_OF_PERIOD)
AS SELECT SUBSTR(’WEEKDAY ’ , 1),
TIMESTAMP(DATE(CURRENT TIMESTAMP) , ’00.00.00’),
NULLIF(1 , 1)
FROM SYSIBM.SYSDUMMY1
WHERE((DAYS(CURRENT TIMESTAMP)-((DAYS(CURRENT TIMESTAMP)/7)*7))+1)
BETWEEN 2 AND 6

SYSIBM.SYSDUMMY1 can be substituted with the name of any 1-row table.
WEEKDAY is the name of the event. If you created the subscription sets with
another event name, just replace ’WEEKDAY’ above in the view definition by
the actual event name you chose.
The view above will generate a transparent event (in fact, no event will
actually be inserted in any real table; the view will just generate a temporary
event for Apply, when Apply accesses what it thinks is the
ASN.IBMSNAP_SUBS_EVENT table). The event will be visible each day of
the week from Monday through Friday, at midnight. On Saturdays and
Sundays nothing will happen (of course, you can change this: you simply
need to change the BETWEEN 2 AND 6 clause). When Apply runs on
Monday, for example, it will access the view and believe that there is an event
for that day at midnight, and so it will process the subscription set. The next
time Apply evaluates the view, the view will generate the same transparent
event, but since the LASTSUCCESS column in the SUBS_SET table
indicates that the subscription set has already been processed that day, it will
of course not process the subscription set again.
The event generator (the ASN.IBMSNAP_EVENT view) is a good idea. But

you could go further. You could, for example, modify the view so that it
performs a union with the original (renamed) event table. This would allow
you to continue creating traditional events, and still benefit from the automatic
event generator.
3.3.3 Using Blocking Factor

In a DB2-only replication environment, you can indicate a blocking factor
when you define replication sets. This feature is there to regulate the large
answer sets. We will explain what this means below.
The blocking factor limits the number of rows that will be propagated by
Apply. It determines a maximum number of minutes of changes that Apply
can process when it reads the change data tables. If the rows that are present
in the change data tables (and that have not yet been processed) represent a
number of minutes of changes that is above the blocking factor value, Apply
will automatically split the fetched answer set into smaller pieces, and it will
process the subscription set as several mini-subscriptions, in several
mini-cycles.
This is an important feature. If you have defined a blocking factor value, then
when Apply encounters an environment problem (logs full in the target
database, for example), Apply will automatically try to split the answer set so
that the subscription is reprocessed in several mini-subscriptions. If you do

not define any blocking factor value (that is to say, if you left the 0 default
value), Apply will not be able to use this mini-subscriptions technique in case
of environment problems.
In a heterogeneous replication environment:

• If the target is a non-IBM database and the source is a DB2 database, the
blocking factor feature can be used. Use a blocking factor that is large
enough, so that in most cases Apply will be able to process all the rows in
the change data tables in only one replication cycle.
• If the source is a non-IBM database, choosing a blocking factor for a
subscription set to limit the amount of data to be replicated within one
subscription cycle (ASN.IBMSNAP_SUBS_SET.MAX_SYNCH_MINUTES <> NULL) does
not take transaction boundaries into account (as it does for DB2
replication sources). That means that target database consistency could
possibly be affected while Apply is replicating all change data using
mini-cycle processing. After Apply has finished processing, target
database consistency is, of course, reestablished.
Recommendation: We recommend not using a subscription set blocking
factor when replicating from a non-IBM replication source, if some
applications can query the replication target tables while Apply is running.
If you are replicating during off-hours only, then a blocking factor can be
used.
3.4 Performance Considerations for Capture Triggers

In contrast to the asynchronous, log-based change capture mechanism that
IBM DProp provides for all DB2 replication sources, trigger based change
capture is a synchronous solution. That means, all change data is committed
into the change data tables (CCD tables in fact) in the same transaction along
with the source application.
Planning ahead for trigger based change capture, therefore, has to take the
following considerations into account:
• Source application transactions will slow down, because the transaction’s
workload is actually doubled.
• Source applications will need more log space, because writing change
data into the change data tables will happen within the same commit
scope as changing the application tables.
• If a Capture trigger cannot insert a change record into the CCD table, such
as because there is no longer space for the new record, then the

transaction attempted by the user or application program cannot complete
successfully.
The impact on large batch-style applications will also need to be studied.
To better evaluate the impact of synchronous trigger based change capture,

we set up a performance examination for two non-IBM replication source
systems, namely Informix Dynamic Server 7.3 and Oracle 8.
3.4.1 Description of the Performance Test Setup

To make different test setups comparable, we installed both Oracle 8 and
Informix Dynamic Server 7.3 on the same RS/6000 machine, model J50,
running IBM AIX Version 4.3. In both systems, we natively created the same
table (same number of columns, same data types). Both tables contained a
column comparable to the DB2 data type TIMESTAMP (Informix data type
DATETIME, Oracle data type DATE). A unique index was created for both
tables.
We created two test jobs, both containing 27340 SQL INSERT statements,
grouped together into 100 INSERTs per transaction (1 COMMIT after 100
INSERTs). The TIMESTAMP column was populated in both cases using an
SQL expression comparable to DB2’s current timestamp.
The test jobs basically contained the same statements. The only differences
were the syntactical representation of the CURRENT TIMESTAMP
expression (Informix: CURRENT YEAR TO FRACTION (5), Oracle:
CURRENT DATE) and the method used to execute the SQL script:
• Informix: The Informix client program dbaccess was used to execute the
SQL script. The following syntax shows the invocation of the test script. All
output was redirected to /dev/null to prevent any slowdown by
unnecessary screen output.
dbaccess ifxdb1 insertsifx.sql > /dev/null
• Oracle: The Oracle client program SQL*Plus was used to execute the SQL
script. The following syntax shows the invocation of the test script. All
output was redirected to /dev/null to prevent any slowdown by
unnecessary screen output.
sqlplus user1/pwd1@oradb1 @insertora.sql > /dev/null
To measure the impact of the synchronous triggers, the insert job was
executed before the tables were registered as replication sources (that is,
before the capture triggers were created) and again after the tables were
registered as replication sources.

To eliminate unreasonable deviations, each separate test setup was executed
three times. Before each test was repeated, the test table was dropped and
recreated to guarantee identical test conditions.
First test setup: The test script was executed without any triggers defined.
Second test setup: The second test run only applied to Informix. The reason
is that the Informix capture triggers generated by DJRA require the Informix
system variable USEOSTIME to be set to 1. We wanted to find out whether
this setting had any negative impact on Informix performance.
Third test setup: The test script was executed with capture triggers defined.
3.4.2 Test Results

The purpose of this test was not to determine which database system was
faster. We only wanted to find out the impact that synchronous trigger-based
change capture has on both the examined database systems. Please do not
interpret the results in any other way.
When interpreting the test results, please take into account the fact that the
batch-style insert script we used represents an extreme workload of
INSERTS.
Figure 14 graphically represents the test results. The y-axis (vertical) displays
the INSERT performance ratio comparing the different test setups,
considering the performance without any triggers and without any replication
based changes to the system settings as 100%.
Please note that the ratio displayed in the graph does not compare the
absolute INSERT performance observed comparing Informix and Oracle.

120 Insert Ratio
100.0 96.6 60.6 100.0 69.7
100
80
60
40
20
1 2 3 4 5
0
1
Informix 2
Dynamic Server3 4 5 Oracle 8 6
1 Informix: USEOSTIME = 0, No Capture Triggers

2 Informix: USEOSTIME = 1, No Capture Triggers
3 Informix: USEOSTIME = 1, With Capture Triggers
4 Oracle: No Capture Triggers
5 Oracle: With Capture Triggers
Figure 14. Performance Analysis for Trigger Based Change Capture
3.4.3 Conclusion of the Performance Test

Synchronous trigger-based change capture will always have an impact on
transaction throughput rates, especially for transactions which heavily update
application source tables. That is the price for integrating non-IBM database
systems into a multi-vendor change replication solution.
What you gain is, of course, a change capture mechanism that enables
out-of-the-box change replication for non-IBM database systems without
having to change any application logic and without having to copy complete
database tables when synchronizing source and target (which other vendors
call snapshot replication).

3.5 Summary
In this chapter we first explained which components (DProp, DataJoiner,
DJRA, control tables) are required to build a heterogeneous replication
system, and how these components work together.
We also provided you with a list of the replication features or techniques that
you can use in a heterogeneous replication system, and some references to
examples in this book that show how you can implement these techniques.
Then we discussed the different options that are available when you build the
system’s architecture. We divided these options into two separate categories:
• The system design options, which essentially deal with the placement of
the DataJoiner middleware, the placement of the control tables and the
Apply program.
• The replication design options, which essentially deal with the types of
target tables and the replication timing.
We also discussed the impact that the new replication system will have on
your current production database(s) and illustrated this with some examples.
Now, you have to correlate the information provided in this chapter with the
preliminary information (that is, list of data sources, list of targets, volumes,
for example) that you gathered during the planning phase of your project, and
build a picture of your future heterogeneous replication system.
When you draw this picture, you must precisely indicate how many
DataJoiner servers you will use, how many DataJoiner databases you will
create in each DataJoiner server, where the Apply control tables will be
located, and where the Apply programs will run.
You must also indicate on the picture the types of target tables you will use,
and the timing options that will be used. You should also indicate the naming
conventions that you will use for:
• The databases names, including the DataJoiner databases
• The Apply qualifier names
• The subscription set names
• The userids that will be used to access the non-IBM databases
• The owner (high-level qualifier) of the target tables and of the nicknames

Once all this information appears in the picture that you have drawn, and the
picture has been approved, you can move to the next step: Implementing
your heterogeneous replication system. The next chapter will tell you how you
can efficiently achieve this implementation task.

Chapter 4. General Implementation Guidelines
This chapter introduces general implementation guidelines to set up a

heterogeneous replication system. It is assumed that the replication system
design is complete.
4.1 Overview
Building on recommendations given in Chapter 3, the following decisions are
made before implementing the solution:
• Replication source server platform(s), either DB2 or non-IBM
• Replication target server(s) platforms, either DB2 or non-IBM
• DataJoiner platform(s) and DataJoiner placement
• Placement of DProp Apply, either push or pull configuration
• Control table location, either centralized or decentralized
To assist you in implementing the replication system design, we developed a

general implementation checklist which names and sequences all activities
that are necessary before the first replication definition can be made.
We used the checklist presented in this chapter throughout the

implementation of all case studies that are detailed in the second part of the
book—showing its general usability for different scenarios.
Before we start, please reposition yourself by having a look at Figure 15:

Approaching
Data Replication
General Replication
Replication Replication Implementation Operation &
Planning Design Guidelines Maintenance
Setup Database Implementation Setup Replication

Middleware of Replication Administration
Server Subcomponents Workstation
Create Bind DProp

Replication Capture &
Control Tables DProp Apply
Following the work breakdown approach that guides us through the complete
book, we start to implement the replication solution after planning the project
and after deciding about the overall replication design. Going into the details,
the implementation of the replication solution has to deal with five major
activities:
1. Set up the Database Middleware Server
2. Implement the Replication Subcomponents
3. Set up the Replication Administration Workstation
4. Create the Replication Control Tables
5. Bind DProp Capture and DProp Apply

We will work along these main activities when naming and describing all
setup steps that have to be successfully executed before heterogeneous
replication sources can be registered and before subscriptions replicating
towards IBM and non-IBM target tables can be defined.
4.2 Components of a Heterogeneous Replication System

A distributed heterogeneous replication system consists of several
autonomous components:
• The replication source system, either DB2 or a non-IBM database
• The replication target system, either DB2 or a non-IBM database
• The database middleware gateway
• The replication administration workstation, which is used to set up and
monitor the replication environment
We call a distributed multi-platform replication scenario "heterogeneous"

when either the replication source or the replication target is a non-IBM
database system. No limits apply. Using IBM DataPropagator and IBM DB2
DataJoiner, it is even possible to replicate change data from one non-IBM
database system, such as Oracle, to another non-IBM database system, such
as Sybase.
4.3 Setting up a Heterogeneous Replication System

The main tasks when setting up a heterogeneous distributed replication
system are:
• Installation and configuration of all necessary software components
• Setting up connectivity between all components of the replication system
To guide you through these activities, the following sections provide a

checklist that includes the end-to-end view on how to set up the replication
and database middleware subcomponents.
4.3.1 The Implementation Checklist

All setup activities named in the following implementation checklist are
numbered. The numbers are used to refer to certain setup activities
throughout the book:
General Implementation Guidelines 63

Set Up the Database Middleware Server
Step 1—Install the non-IBM client code to access non-IBM data sources
Step 2—Prepare and check native access to the remote data sources
Step 3—Install the DataJoiner software
Step 4—Prepare DataJoiner to access the remote data sources
Step 5—Create a DataJoiner instance
Step 6—Create the DataJoiner databases
Step 7—Connect DataJoiner to other DB2 or DataJoiner databases
Step 8—Enable DB2 clients to connect to the DataJoiner databases
Step 9—Create Server Mappings for all non-IBM database systems
Step 10—Create the Server Options
Step 11—Create the User Mappings
Implement the Replication Subcomponents (Capture, Apply)

Step 12—Install and set up DProp Capture (if required)
Step 13—Install and set up DProp Apply (if required)
Set Up the Replication Administration Workstation

Step 14—Install DB2 Client Application Enabler (DB2 CAE)
Step 15—Establish DB2 connectivity
Step 16—Install DJRA, the DataJoiner Replication Administration software
Step 17—Set up DJRA to access the source and target databases
Step 18—Modify the DJRA user exits (optional)
Create the Replication Control Tables

Step 19—Set up the DProp control tables at the replication source servers
Step 20—Set up the DProp control tables at the replication control servers
Bind DProp Capture and DProp Apply

Step 21—Bind DProp Capture (if required)
Step 22—Bind DProp Apply

4.3.2 How to Use the Implementation Checklist
We recommend that you follow the given sequence of setup activities,
because most of the steps build on previously achieved results.
After all steps named in the general implementation checklist are successfully
completed, you are ready to define replication source tables and replication
subscriptions. Once these are defined, you can start the replication
subsystems DProp Capture and DProp Apply.
4.4 Detailed Description of the Implementation Tasks

The setup activities provided in the previous checklist will now be expanded
and explained in detail. Furthermore, we will refer to several supplementary
IBM publications that provide additional information.
4.4.1 Set Up the Database Middleware Server

The main task when setting up the database middleware server will be to
install and configure a DataJoiner instance.
Because DataJoiner’s role is to enable database access to a variety of

remote database systems, setting up the database connectivity will be key.
After the following steps are completed, you will be able to transparently
access heterogeneous databases through DataJoiner:
Step 1: Install Non-IBM Client to Access Non-IBM Data Sources

DataJoiner uses data access modules to access remote databases such as
Oracle, Sybase, Informix, Microsoft SQL Server, or other ODBC data
sources. These data access modules require native non-IBM client libraries
to be installed onto the DataJoiner server. A basic step when setting up a
DataJoiner middleware server is therefore to install the remote database’s
client code, such as SQL*Net, ESQL/C, Sybase Open Client, or an ODBC
driver onto the DataJoiner server.
Please refer to the DataJoiner Planning, Installation and Configuration Guide,

SC26-9150 (Chapter 3: Software Requirements) for more information about
software prerequisites regarding non-IBM client software. Details about
setting up client connectivity for all involved non-IBM database systems can
be obtained from documentation provided by the particular database vendors.
It is also always a good idea to contact a local DBA for assistance during this
phase.

Step 2: Prepare and Verify Native Access to Remote Data Sources
Check the connectivity between the database client and the remote database
server natively, before trying to access the database through DataJoiner. This
eliminates potential setup errors as early as possible.
Oracle: Set up an SQL*Net or Net8 client (for Oracle Version 8) to access an

Oracle server and check if native connections work fine.
Informix: Set up an ESQL/C client to access an Informix server and check if

native connections work fine.
Sybase: Set up a Sybase Open Client, using dblib or ctlib (ctlib is

recommended), to access a Sybase server and check if native connections
work fine.
Microsoft: Set up an ODBC client to access a Microsoft SQL Server instance

and check if native connections work fine.
Check Appendix B, “Non-IBM Database Stuff” on page 325 for many useful
details about non-IBM client software, including hints on how to set up
non-IBM database clients and how to natively test connectivity.
Step 3: Install the IBM DataJoiner Software

After you have successfully tested native connectivity between the non-IBM
clients and the non-IBM servers, you can start to set up the DataJoiner
server. Transfer the DataJoiner software from the installation media to the
destination devices and consult the DataJoiner Planning, Installation and
Configuration Guide, SC26-9150 for further details.
As discussed in 3.2.2, “DataJoiner Placement” on page 40, DataJoiner can

be installed on the same machine as the non-IBM database, or on a separate
server. Refer to Case Study 1, section 6.3.2, “Configuration Tasks” on page
151 for a practical example.
Step 4: Prepare DataJoiner to Access the Remote Data Sources

DataJoiner uses data access modules to access remote data sources. On
UNIX platforms, these data access modules have to be built by link-editing
DataJoiner libraries with the client libraries provided by the non-IBM client
software. DataJoiner for Windows NT already provides data access modules
for all natively supported non-IBM database systems. Therefore no specific
action is required on NT.

UNIX platforms: It is a good idea to build the data access modules before the
first DataJoiner instances are created. All subsequently created instances will
automatically contain those data access modules. However, it is possible to
create new data access modules and add those to existing DataJoiner
instances using the db2iupdt command (db2iupdt updates existing DataJoiner
instances).
The creation of the data access modules as well as the update of the
DataJoiner instances has to be executed as root user. To create the data
access modules for the remote databases, use the following guidelines:
1. Log on as root.
2. Set the remote client’s environment variables accordingly (for example,
set the SYBASE variable when link-editing a dblib or ctlib data access
module).
3. Change to the /usr/lpp/djx*/lib directory (the actual DataJoiner path
depends on the DataJoiner version you are using, for example djx_02_01).
4. Execute the shell script djxlink.sh, to automatically create the necessary
data access modules.
5. If the execution of djxlink.sh is not successful, build the data access
modules by:
1. Editing djxlink.makefile
2. Creating the access modules you need by executing
make -f djxlink.makefile <youraccessmodule>
Please refer to the DataJoiner Planning, Installation and Configuration Guide

for AIX, SC26-9145 (Chapter 6: Configuring Access to Data Sources) for
more details. There you will find guidelines to create data access modules for
all natively supported DataJoiner data sources.
Additionally, check the DataJoiner homepage on the World Wide Web,

especially http://www.software.ibm.com/data/datajoiner/faqv2.html, for hot
updates regarding the support of the latest non-IBM client releases (Appendix
H, “Related Publications” on page 397, has the details where to find Web
sites referenced in this book).
Step 5: Create the DataJoiner Instance

Basically, one DataJoiner instance will be sufficient regardless of how many
DataJoiner databases you intend to create. Create multiple instances if you
want to use different start-up parameters for different instances.

Windows NT: One DataJoiner instance is automatically created when the
DataJoiner code is installed. The instance, by default, is called DB2.
UNIX platforms: Some setup tasks have to be completed before the instance
can be successfully created:
1. Create a DataJoiner instance owner group.
2. Create a DataJoiner instance owner.
3. Change to the /usr/lpp/djx*/instance directory (the actual DataJoiner
path depends on the DataJoiner version you are using, for example
djx_02_01_01).
4. Create the instance using the db2icrt <instance owner> command.
Further details can be found in the DataJoiner Planning, Installation and

Configuration Guide for AIX, SC26-9145.
Step 6: Create a DataJoiner Database

One single DataJoiner database can be used to access multiple backend
database systems. Using DataJoiner as gateway within a replication scenario
implies some additional considerations.
Generally, one DataJoiner database is sufficient to support multiple

heterogeneous replication targets (multiple DProp target servers). All
replication control tables are stored within this DataJoiner database, and all
the remote target tables are referenced by nicknames. If replicating to
multiple non-IBM target databases, create one subscription set for each
target database, to avoid distributed transactions.
If multiple non-IBM databases are used as a replication source, one

DataJoiner database has to be created for each non-IBM replication source
database. Refer to 3.2.2, “DataJoiner Placement” on page 40 for background
information.
Step 7: Connect DataJoiner to Other DB2 or DataJoiner Databases

Set up the connectivity to other DB2 or DataJoiner databases (DB2 /
DataJoiner on UNIX/Intel as well as DRDA servers such as DB2 for OS/390,
DB2 for AS/400, DB2 for VM, or DB2 for VSE) by cataloging DB2 nodes and
DB2 databases.
Refer to the DB2 and DataJoiner documentation for more details.

Step 8: Enable DB2 Clients to Connect to the DataJoiner Databases
The activities necessary to enable remote DB2 clients to connect to the new
DataJoiner databases depend upon the communication protocols that the
DB2 clients will use. For each of the incoming communication protocols you
wish to use (DataJoiner supports TCP/IP, APPC and NetBIOS) you have to:
1. Set up the underlying communication subsystem (for example, SNA
Server, TCP/IP).
2. Update the DataJoiner instance with the appropriate communication
settings, for example use:
update database manager configuration using SVCENAME <tcp port>
when setting up the instance to support TCP/IP clients.
3. Set the DB2COMM environment variable to the appropriate value.
Windows NT: Set the environment variable DB2COMM to all the
communication protocols that the DataJoiner instance will support. For
example:
SET DB2COMM=TCPIP,APPC
UNIX platforms: Make use of the ./sqllib/db2profile file and set the
DB2COMM setting there. Execute the db2profile within the instance owner’s
.profile
Remark: These settings only take effect when you recycle ( db2stop -
db2start) the instance.
Step 9: Create Server Mappings for All the Non-IBM Databases

Each non-IBM database that you want to access as either replication source
or replication target has to be referenced through a DataJoiner server
mapping. A server mapping specifies how DataJoiner will subsequently
access a remote data source and is defined using the CREATE SERVER MAPPING
command.
Refer to the DataJoiner Planning, Installation and Configuration Guide,

SC26-9150 (Chapter 6: Configuring Access to Data Sources) for further
details about how to set up server mappings for remote data sources, or to
the DataJoiner Application Programming and SQL Reference Supplement for
details about the CREATE SERVER MAPPING DDL statement.
Use the following SQL statement to query all successfully defined server
mappings within a DataJoiner database:
SELECT SERVER, NODE, DBNAME, SERVER_TYPE,
SERVER_VERSION, SERVER_PROTOCOL
FROM SYSCAT.SERVERS;

Step 10: Create the Server Options
Server options help optimize performance and control certain interactions
between DataJoiner and the remote data sources. We recommend always
setting the server option TWO_PHASE_COMMIT to NO when using
DataJoiner as a gateway for DProp replication.
Example:
create server option TWO_PHASE_COMMIT
for server <hetero_server> setting ’N’;
Refer to the DataJoiner Application Programming and SQL Reference

Supplement, SC26-9148 for more details on the server options which might
be required for the specific release of the non-IBM system you are using.
Step 11: Create User Mappings

In cases where your DataJoiner userids and your non-IBM userids are
different, user mappings are used to map DataJoiner userids and passwords
to non-IBM database userids and passwords. If the userids and passwords
are identical, no user mappings are necessary to enable communication
between DataJoiner and the non-IBM data sources.
When using DataJoiner Replication Administration, DataJoiner user

mappings have an additional value as they are also used to determine the
table schema (table creator) for the tables that are created in the non-IBM
databases.
Recommendation (1): Create at least one user mapping before creating the
replication control tables for a non-IBM replication source. The DataJoiner
Replication Administration program determines the remote schema for
control tables that are created within the remote data source from the
REMOTE_AUTHID defined for the DataJoiner user that DJRA uses to connect to
the DataJoiner database.
Recommendation (2): Define user mappings for all the schemas that you
are planning to use as table qualifiers for non-IBM replication target tables.
Use the following SQL statement to query all successfully defined user
mappings within a DataJoiner database:
SELECT AUTHID, SERVER, REMOTE_AUTHID
FROM SYSCAT.REMOTEUSERS;

4.4.2 Implement the Replication Subcomponents (Capture, Apply)
This section refers to the installation of DProp Capture and DProp Apply. To
decide if either component is required for your replication setup, we have to
distinguish replication source and target systems.
Step 12: Install and Set Up DProp Capture, if Required

The installation of DProp Capture is only required for DB2 replication
sources. For non-IBM replication sources, DProp Capture is emulated by
triggers.
Non-IBM Source: The change capture activity will be achieved using OEM
triggers. There is no need to install additional software.
IBM Source: Install IBM DProp Capture (On UNIX and Intel platforms,
Capture is already pre-installed when setting up DB2 UDB). If the replication
source servers are DB2 UDB databases on Intel or UNIX platforms, change
the LOGRETAIN parameter of the source server’s database configuration to
YES.
Step 13: Install and Setup DProp Apply, if Required

DProp Apply is used to replicate to both DB2 as well as non-IBM replication
targets.
Non-IBM Target: IBM DProp Apply will be used to replicate data to the
non-IBM target databases. As an integrated component of DataJoiner for
Windows NT, Apply is already pre-installed when setting up DataJoiner. On
UNIX platforms, make sure you install DataJoiner’s Apply component when
installing the DataJoiner software.
IBM Target: Install IBM DProp Apply (On UNIX and Intel platforms, Apply is
already pre-installed when setting up DB2 UDB).
Refer to the DB2 Replication Guide and Reference, S95H-0999 for

platform-specific issues on how to install DProp Capture or DProp Apply.
4.4.3 Set Up the Replication Administration Workstation

The complete heterogeneous multi-platform replication system can be
configured from one single PC (if necessary, you can use a couple of
administration workstations).
All information on replication source tables and replication subscriptions is

stored within the replication control tables at either the replication source or
the replication target database. Therefore, the replication Administration

workstation can be added or removed without any effect on the runtime
components of your replication system.
DataJoiner Replication Administration (DJRA) is the GUI (Graphical User

Interface) you will use to define and maintain replication sources and
subscriptions. At the time this book was edited, DataJoiner Replication
Administration was available for 32-bit Windows systems (Windows NT,
’95/’98) only. Therefore, a Windows workstation will be used as replication
administration workstation.
Administering your replication setup from the administration workstation

consists of the following tasks:
• Create the replication control tables
• Define and maintain replication source tables (or views)
• Define and maintain replication subscriptions
• Promote replication sources, replication subscriptions and change data
tables from the test systems to the production systems
• Monitor the status of your replication subsystems
We will now name and describe all the setup tasks that are necessary to
configure the administration workstation. You can then start to define
replication source tables and replication subscriptions from this replication
administration workstation.
Step 14: Install DB2 Client Application Enabler (DB2 CAE)

Install the DB2 UDB Client Application Enabler (CAE) for Windows, on the PC
that you chose as the Administration workstation. Refer to the IBM DB2 UDB
manual Installing and Configuring DB2 Clients on the World Wide Web
(http://www.software.ibm.com/data/db2/performance/dprperf.htm) for further
details.
Step 15: Establish DB2 Connectivity

Basically, the administration PC has to be able to connect to all DB2 or
DataJoiner databases that are used as replication source server or replication
target server. For DRDA connections (for example to access a DB2 for
OS/390 replication source system), the DataJoiner instance (or any other
DB2 CONNECT server) can be used as a DRDA gateway.
Step 16: Install DataJoiner Replication Administration Software

Install DJRA onto the administration workstation. As more and more functions
are added to the replication administration software, it is a good idea to
download the DJRA package from time to time from the Web. The DJRA

package is available from both the IBM DataPropagator and the IBM
DataJoiner homepages on the World Wide Web. Both homepages are listed
in Appendix H, “Related Publications” on page 397.
Note: Some older documentations refer to the Replication Administration

software as DPRTools.
Step 17: Set Up DJRA Connectivity to DB2 / DataJoiner Databases

Set up DJRA to access all DB2 and DataJoiner databases contributing to
your replication scenario.
The main topic here is to create a password file that will contain userids and
passwords for all DB2/DataJoiner replication sources and targets. DJRA will
use the password file when connecting to replication source and target
databases. Use DJRA’s Preference menu (Option: Connectivity) to populate
the password file. The password file will be stored in the DJRA working
directory.
Please note that you have to restart DJRA after cataloging additional
databases into the administration workstation’s database directory. DJRA will
pick up all databases from the database directory at startup time.
Step 18: Modify the DJRA User Exits (Optional)

The DJRA user exits can be modified to customize DJRA. Separate user exits
are available to define default-logic for replication definitions:
• TBLSPACE.REX can be modified to influence tablespace definitions when
the replication control tables are defined.
• SRCESVR.REX: can be modified to change the default attributes and
placement of CD and CCD tables when defining tables as replication
source.
• TARGSVR.REX and CNTLSVR.REX: can be modified to change the
default attributes and placement of target tables when defining replication
subscriptions.
Customizing the user exits provided is a useful option, especially when the
database objects generated during replication setup have to fulfill strict
naming conventions, but it is not a requirement. The standard user exits
named above include several examples that explain how to modify the
defaults.

4.4.4 Create the Replication Control Tables
Basically, you use DataJoiner Replication Administration to generate and run
SQL scripts. These SQL scripts will generate and populate the DProp control
tables at either the replication sources or the replication target systems.
Therefore, the first action after installing DJRA is to create the replication
control tables at all the replication source servers and all the replication
control servers.
Step 19: Create the Control Tables at Replication Source Servers

Use the DJRA function "Create Replication Control Tables" to create all the
DProp control tables at the replication source servers. Select a DataJoiner
database in combination with a non-IBM data source name to create the
control tables for a non-IBM replication source (in fact, you select the
DataJoiner database, and then DJRA provides you with the list of non-IBM
data sources that are accessible from this DataJoiner database). DJRA
automatically generates CREATE NICKNAME statements for all control tables that
are created at a non-IBM source server.
Use the following SQL statement to query all successfully defined nicknames
within a DataJoiner database:
SELECT TABSCHEMA, TABNAME, REMOTE_TABSCHEMA, REMOTE_TABNAME,
REMOTE_SERVER
FROM SYSCAT.TABLES
WHERE REMOTE_SERVER IS NOT NULL;
Advice: Use this select-statement to create a view called NICKNAMES.
Step 20: Create the Control Tables at Replication Control Servers

Use the DJRA function "Create Replication Control Tables" to create all the
DProp control tables at the replication control servers.
4.4.5 Bind DProp Capture and DProp Apply

After the control tables are successfully created, you are able to bind the
DProp Capture and DProp Apply programs to source and target databases.
Step 21: Bind DProp Capture (if Required)

Generally you will want to bind DProp Capture against DB2 replication source
servers. If all your source servers are non-IBM databases, there is, of course,
no need to bind DProp Capture.

Step 22: Bind DProp Apply
Each DProp Apply instance has to be bound against all replication source
servers, all replication target servers, and all replication control servers that
will be accessed by that Apply instance.
On the OS/390 platform, for example, considering that source server, control
server and target server are not identical, all Apply packages have to be
bound against all locations that Apply will connect to during replication:
BIND PACKAGE(<location>.<collection-id>.<pakagename>)
Finally, the bind job has to include a BIND PLAN statement, including all
different locations Apply is bound against. (Note that the following example is
applicable to DB2 for OS/390 V5 only. Examples referring to other DB2
releases are included within the product documentation.)
BIND PLAN(ASNAP510) PKLIST (loc1.collection-id.*,loc2.collection-id.*, ...)
Later on, if you are adding a new location to the replication scenario, just bind
Apply’s packages to the new location and rebind the plan after adding the
new location to the PKLIST.
OS/390 Remark: A sample bind job is included within the DProp

documentation as well as on the DProp installation media.
Performance Advice:
• Do not change the recommended isolation levels provided in the DProp
documentation or in the sample bind jobs.
• Refer to the DB2 Replication Guide and Reference, S95H-0999 for the
syntax of the bind command appropriate to your platform.
• Make use of the BLOCKING ALL bind parameter on UNIX and Intel
platforms. This will enable Apply to use block fetch when fetching change
data from the source systems.
• Be aware that the default for the CURRENTDATA bind option valid for
DB2 for OS/390 has changed from DB2 version 4 to version 5. With DB2
for OS/390 version 5, CURRENTDATA(YES) was introduced as the
default bind option (until DB2 version 4, CURRENTDATA(NO) was the
default). To enable block fetch for DB2 for OS/390, add the
CURRENTDATA(NO) bind parameter to Apply’s bind job, if it is not already
present.
Refer to the DB2 Replication Guide and Reference, S95H-0999 for platform
specific issues.

4.4.6 Status After Implementing the System Design
After performing the setup steps above in the given sequence, your
replication system is ready for use! The next steps will be to:
1. Use DJRA to define your replication source tables (that is, register the
replication sources).
2. Use DJRA to define your replication subscriptions.
3. If you are operating Apply on UNIX or Intel platforms, do not forget to
create a password file for each of your Apply qualifiers before you start
Apply. No password file is required for Capture.
4. If your replication source is DB2, start DProp Capture. (Refer to the DB2
Replication Guide and Reference, S95H-0999 for operational issues on
how to start Capture on the system platform you are using) You will need
to start Capture with at least the DProp Source Server parameter (telling
Capture which DB2 database to monitor).
5. Start DProp Apply (Refer to the DB2 Replication Guide and Reference,
S95H-0999 for operational issues on how to start Apply on the system
platform you are using). You will need to start Apply with at least the
following parameters:
Apply Qualifier (telling Apply which subscriptions are to be serviced by this
Apply instance)
DProp Control Server (telling Apply in which DB2/DataJoiner database the
replication control tables are located)
Advice: Be aware that the Apply Qualifier is a case-sensitive parameter.
Pass it to Apply in upper case, if it is stored in upper case within the DProp
control tables.
Refer to the case studies detailed in the second part of the book to see how
the checklist can be practically used during the implementation phase of a
replication project. If you want to learn more about replicating from non-IBM
source databases, refer to “Case Study 1—Point of Sale Data Consolidation,
Retail” on page 139 (Informix replication sources). To get a deeper insight
into replication examples replicating to multi-vendor databases, have a look
at “Case Study 2—Product Data Distribution, Retail” on page 173 (Microsoft
SQL Server) or “Case Study 3—Feeding a Data Warehouse” on page 203
(Oracle).

4.5 Next Steps—Implementing the Replication Design
The replication design is implemented by defining replication source tables
and replication subscriptions. Following the general implementation checklist,
the replication administration workstations are already configured, so that the
DataJoiner Replication Administration can be used to set up the required
definitions.
Although we will not go into too much detail, we want to use the remaining
sections of this chapter to give you an overview of the next setup steps. We
will be dealing separately with:
• Implementing the Replication Design for Multi-Vendor Target Servers
• Implementing the Replication Design for Multi-Vendor Source Servers
For all further details about using DJRA, please refer to the DB2 Replication
Guide and Reference, S95H-0999 and to the DataJoiner Planning,
Installation and Configuration Guide, SC26-9150 (Starting and Using DJRA).
4.5.1 Replication Design for Multi-Vendor Target Servers

The following steps are necessary to implement a replication design towards
non-IBM target databases.
4.5.1.1 Register the Replication Sources

Define your DB2 replication sources first.
4.5.1.2 Define an Empty Subscription Set

When you choose to replicate to a non-IBM database, you define the
DataJoiner database as the DProp target server. This DataJoiner database
needs to have a server mapping and at least one user mapping to the
non-IBM database that will finally contain the replication target tables.
4.5.1.3 Add Subscription Members to the Set

If you want to replicate to a non-IBM database, you need to specify the server
mapping of the non-IBM target database when you add a member to a
subscription set (DJRA lists all the non-IBM databases that have a server
mapping defined within the DataJoiner database) .
Remark: To avoid distributed transactions all the non-IBM target tables

grouped together within one subscription set must be located in the same
non-IBM target database. All the changes to all the target tables of one
subscription set will be applied within the same unit-of-work.

Figure 16 shows the replication control information and database objects
which are created when adding a member to a subscription set, if the target
table is a non-IBM database table:
Database Database
Create Table
DJRA Create
Nickname
Target Target Table
Nickname
Add a
Member to Insert
Subscription
Sets SUBS_MEMBR
Insert
SUBS_COLS
Figure 16. Define a Non-IBM Table as a Replication Target
If there is no nickname defined within the DataJoiner database that

corresponds to the target table qualifier and target table name specified when
adding the member to the subscription set, DJRA will generate DDL to
natively create the target table within the non-IBM target database.
Additionally, one DataJoiner nickname is automatically created to enable
transparent access to the new target table.
It is also possible to replicate into an existing non-IBM target table. To

achieve this, all you have to do is manually create a nickname on the existing
table, before using DJRA to add the member to the subscription set.

Data Type Compatibility
DJRA creates all the non-IBM replication target tables using DataJoiner’s
PASSTHRU mode. DataJoiner therefore automatically uses forward type
mappings to map all heterogeneous data types into DB2 data types when
creating the nickname. Because DJRA has knowledge of all the data types
of all the table columns that are going to be replicated, DJRA automatically
creates data type fixups for the target table nickname when the source
server’s data types and the type mappings chosen by DataJoiner are not
completely compatible.
Type fixups can be necessary for DATE, TIME, and TIMESTAMP columns,
for example.
To create these data types fixups, DJRA generates ALTER NICKNAME

statements.
It is a good idea to let DJRA create a target table (including possible data
type fixups), even when the non-IBM target table already exists (use a
different table name, for example). Compare the created data types, including
any fixups, with the data types of the already existing table.
Advice: Alternatively, you can use the transparent DDL feature of DataJoiner
V2.1.1 to direct a CREATE TABLE statement to the non-IBM database. Using
this method, no datatype fixups are necessary. For more information refer to
the DataJoiner SQL and Application Programming Reference Supplement,
SC26-9148.
4.5.2 Replication Design for Multi-Vendor Source Servers

The following steps are necessary to implement a replication design from
non-IBM source databases.
As we already introduced, change capture for non-IBM databases, such as

Oracle, Informix, Microsoft SQL Server and Sybase SQL Server, is achieved
by creating capture triggers upon the multi-vendor replication source tables.
This section will help you understand how to set up non-IBM tables as
sources for replication.
4.5.2.1 Create a Nickname for the Non-IBM Source Table

Before you register a non-IBM database table, create a nickname for the
non-IBM source table.

4.5.2.2 Register the Nickname as a Replication Source
Once the nickname is created, you can register it by using the DJRA
functions Define a Table as Replication Source or Define Multiple Tables as
Replication Sources.
When you register a nickname that corresponds to a non-IBM source table,

DJRA automatically creates a script including native CREATE TRIGGER
statements for the following triggers:
• Capture changes triggers
• Pruning trigger
Figure 17 shows the replication control information and database objects that
are created when defining a non-IBM database table as a replication source:
Database Database
Insert
Update
Source Delete Source Table
Create Triggers Nickname
Create
Nickname
DJRA CCD Table
CCD
Nickname
Create Table
Define Drop and Re-create Trigger Prune
a Table as
Replication
Source Pruning Control Pruning Control
Nickname
Insert
Register Register
Nickname
reg_synch
Reg_Synch Reg_Synch
Nickname
Figure 17. Define a Non-IBM Table as a Replication Source

As you can see, DJRA generates DDL to create a Change Data table in the
non-IBM database for every replication source table—to be more precise, it is
a Consistent Change Data (CCD) table—and a nickname for the CCD in the
DataJoiner database. Additionally, DDL statements to create the capture
triggers are generated.
Remarks:
• Notice that the REGISTER, the PRUNCNTL, and the REG_SYNCH table
are already present in the non-IBM database, and that there is a nickname
for each of those tables in the DataJoiner database. These tables and the
corresponding nicknames are created when you create the Control Tables.
• Some database systems, such as Informix Dynamic Server Version 7 or
Microsoft SQL Server (without setting sp_dbcmptlevel to 70), support only
one trigger per SQL operation on a database table. This means that, for
one source table, you can create only:
• One trigger for INSERT
• One trigger for UPDATE
• One trigger for DELETE
Some of those systems do not even issue a warning (Informix, to their
honor, does) when you create a new trigger (say, for INSERT) on a table
that already has a trigger defined for this SQL operation. Therefore, the
database administrator must be careful not to overwrite existing triggers.
In this case all new trigger function required for change replication has to
be manually integrated into the existing triggers.
As you can imagine, DataJoiner Replication Administration will not
compensate missing database functionality for those non-IBM database
systems. But DJRA is smart enough to help the database administrator by
issuing a WARNING whenever non-IBM native triggers are created or
removed. DJRA does this for all supported non-IBM databases, regardless
of whether multiple triggers per SQL action are supported or not.
You will have to decide whether the Capture triggers can be created as
generated, or whether the generated trigger logic has to be integrated into
an existing trigger when a non-IBM table is defined as a replication
source. The same is also true when you remove the definition of a
replication source. Either remove the triggers or adapt the existing ones.

4.6 Summary
In this chapter, we focussed on all activities that are necessary to set up a
multi-vendor, cross-platform replication system. We provided an
implementation checklist that we recommend for use whenever a
heterogeneous replication system is to be implemented (a one-page copy of
the implementation checklist can be found in Appendix C, “General
Implementation Checklist” on page 337). If you are setting up a multi-vendor
replication system yourself, mark the checkboxes of all completed task until
you are finished.
After all setup tasks named in the general implementation checklist are
completed and after the replication design has been defined and tested, your
replication system is ready to use. The next steps would be to carry over all
system components and all tested replication definitions to your production
system environments.
Before doing so, we will use Chapter 5, “Replication Operation, Monitoring

and Tuning” on page 83 to discuss the major operation, monitoring and
maintenance topics of a distributed replication system using IBM DProp and
IBM DataJoiner.

Chapter 5. Replication Operation, Monitoring and Tuning
Before carrying the tested replication system over into your production
environment, we will spend some time discussing operational tasks of a
heterogeneous replication system.
5.1 Overview
But before we start, we want you to reposition yourself again by having a look
at the introductory diagram shown in Figure 18:
Approaching
Data Replication
General Operation,
Replication Replication Implementation Monitoring &
Planning Design Guidelines Tuning
DProp Replication Monitoring Tuning Other

Initial and a Distributed Replication Useful
Repetitive Tasks Replication System Performance Techniques
Following the work breakdown approach that guides us through the complete
book, we will now discuss operational issues. In the previous chapter we
already gained first experiences on a multi-vendor replication system while
designing and implementing a first solution. That means, we can already
assume some expertise on working with distributed replication systems.
This chapter contains a lot of detailed information. It is natural, that you will
not follow every thought while browsing through the different parts of this
chapter for the first time. But the more time you spend on tuning and

enhancing your replication solution, the more questions you will raise.
Therefore, take this chapter as a reference. If you do not get the point at
once, come back to those sections when appropriate.
Figure 18 on page 83 displays the major sections described throughout this

chapter:
• Initial tasks: The most important initial tasks will be to start the capture
process (considering a DB2 replication source system) and to initialize
your replication subscriptions. We will provide some advanced techniques,
especially on how to initialize the replication subscriptions outside of IBM
DProp.
• Repetitive tasks: Some of the housekeeping tasks we are going to
introduce are automatically performed by IBM DProp; others will result in
the extension of existing housekeeping tasks for your replication source
and target databases.
• Monitoring: Before carrying a tested data replication application over to a
production system, the replication system has to fulfill operational
requirements. The ideal case is, as always, an unattended, self-repairing
operational mode. Due to its importance, we have devoted a separate
section in this chapter to discuss monitoring aspects.
• Tuning: A multi-vendor, cross-platform relational data replication system
has many autonomous components. All components are connected using
some network link. All together, this is reason enough to admit a separate
section within this chapter for the discussion of tuning considerations.
• Other useful techniques: Looking at the billboards of our database
competition along the highways, smart techniques are becoming more and
more important these days. We can reassure you that we have lots of
those. Some are revealed in the latter portion of this chapter.
5.2 Operating and Maintaining DProp Replication—Initial Tasks

Before changes can be replicated from your source databases to your target
databases, a consistent starting point for the replication system has to be
established. The contents of the target databases have to be synchronized
with the source databases, and the starting conditions have to be established
in order for the Capture and Apply components to work properly. In this
section we describe the necessary tasks and give you some background
information on how the components work together.

5.2.1 Initialization and Operation of the Capture Task
The Capture process should be treated as a key system task. If DB2 is active,
Capture should also be active. If Capture is not active, there will be no new
change data available for Apply, regardless of how often Apply connects to
the source system.
OS/390 Remark: In most known installations, Capture for OS/390 is operated

as a started task.
AS/400 Remark: Capture can be started at each IPL. The best way to do this
is to include the start of the QZDNDPR sub-system and of Capture in the
QSTRUP program.
Nevertheless, it is possible to stop Capture, for example, to perform certain

housekeeping tasks that need exclusive access to database objects
maintained by Capture (Refer to 5.3.1, “Database Related Housekeeping” on
page 91 for examples).
When Capture stops, it writes the log sequence number of the last
successfully captured DB2 log record (or AS/400 journals) into one of the
DProp control tables (ASN.IBMSNAP_WARM_START), so that it can easily
determine a safe restart point. Even in those cases where it was impossible
for Capture to write the WARM start information when shutting down (for
example, after a DB2 shutdown, using MODE FORCE), Capture is able to
determine a safe restart by evaluating SYNCHPOINT values stored in other
replication control tables, such as the register table. (The only assumption is:
Capture has been successfully capturing changes before the hard shutdown.)
To avoid a Capture COLD start in a production environment, always start

Capture using start option WARMNS, which means that Capture either
resumes capturing at the position of the DB2 log (or AS/400 journal) where it
had stopped previously, or terminates if this is not possible. Without start
option WARMNS, Capture will automatically switch to a COLD start if a
WARM start, for what ever reason, is impossible.
Mobile or Satellite Remark: If IBM DProp is used on mobile systems, both

Capture and Apply can be started on demand. For mobile application
platforms, which synchronize with central servers at certain times only,
Capture can be invoked in mobile mode. In this mode, change capture is
started at the last successfully captured log record, and stopped
automatically if the end of the DB2 log is reached.
Replication Operation, Monitoring and Tuning 85

5.2.2 Initialization of Replication Subscriptions
When DProp Apply services the subscription sets that you previously defined,
it will always check for replication consistency first. When Apply detects that it
is servicing a subscription set for the first time ever, Apply will automatically
initialize the replication target tables by copying all required data directly from
the source tables to the target tables. Using DProp terminology, we call this a
full-refresh (we are assuming here that the target tables are complete).
Considering huge source tables, a full-refresh can be quite an expensive

activity. Also take into account that Apply has to access the replication source
tables (your application tables) directly during a full refresh.
Remark: Apply always inserts the data that it fetched from the replication
source server within a single transaction into the target tables, to guarantee
target site transaction consistency at subscription set level. This also applies
to the full refresh.
To let you control the replication initialization, DProp offers a lot of freedom
and flexibility in how the initial full refresh task is performed.
5.2.2.1 Initial Refresh Maintained by Apply

If you decide to let Apply perform the initial refresh automatically (this is the
default), you can skip over the following paragraphs regarding initial refresh.
Nonetheless, if you are interested, you will find a lot of useful information to
better understand how IBM DProp guarantees data consistency.
Before skipping over the following paragraphs, let us just recall that an
automatically maintained initial refresh consists of the following two steps:
1. Handshake between Capture and Apply
2. Moving data from the source tables to the target tables
OK, now it is safe to go ahead and jump to 5.2.2.3, “Manual Refresh / Off-line
Load” on page 89!
5.2.2.2 Initial Refresh - A Look Behind the Curtain

DProp maintains copy integrity by an information exchange protocol that lets
Capture and Apply communicate with each other. This protocol assures that
Apply can detect when copy integrity is compromised, and that Capture
knows when to start capturing, and when and to what extent it is safe to prune
rows that are no longer needed from the change data tables. The information
interchange is facilitated by independent, asynchronous updates to the
DProp control tables.

We will reveal some details about DProp’s full refresh protocol here, because
this background knowledge will be helpful when you decide to perform the
initialization of your target tables yourself.
How Apply Initiates the Handshake

Consider that Apply has already decided that a subscription set needs to be
full-refreshed.
As already mentioned, moving data is not the only activity during the full
refresh. Even more interesting is how Capture and Apply perform the initial
handshake, because this handshake has to be replayed when initializing the
replication target tables manually.
Before fetching data from the replication source tables, Apply lets Capture
know that it is starting to perform the initial refresh.
Apply actually initiates the handshake by setting the SYNCHPOINT value of

the corresponding row of the pruning control table ASN.IBMSNAP_PRUNCNTL to
hexadecimal zero (x’00000000000000000000’), which is used as a trigger value.
Figure 19 references this step as step (1). Apply updates the pruning control
table for every member belonging to the subscription set that is being
initialized.
DB2 logs these updates, as usual, which is visualized in step (2).
How Capture Detects a Handshake Request

Capture detects Apply’s handshake request while monitoring the DB2 log (or
AS/400 journal). The pruning control table is created with the DATA
CAPTURE CHANGES attribute, so that Capture can read changes to the
pruning control table from the DB2 log. Refer to step (3) in Figure 19.
Remark: On an AS/400, starting the journalization for a physical file is

equivalent to setting the Data Capture Changes attribute on the other
platforms.
Upon seeing a DB2 log record indicating that the SYNCHPOINT column has
been updated to hexadecimal zero for a subscription member, Capture
immediately translates the hex zero synchpoint into the actual log sequence
number of the log record read. The log sequence number value is retrieved
from the header of the log record that contains the update to
x’00000000000000000000’. See step (4) in Figure 19.

Pruning Control Table
SYNCHPOINT Column
LSN
x ’0000...0000’ (e.g.: x ’0000...1234’)
1 4
APPLY 2 CAPTURE
(or user)
DB2
Log
Record 3
Synchpoint set to
x ’0000...0000’
Figure 19. Initial Handshake between Capture and Apply
Apply will take the translated synchpoints into account when it performs the
next replication cycle (for that subscription set). The translated synchpoint
tells Apply exactly, when it initiates the initial refresh. Apply now knows that
all CD table records with a higher log sequence number (LSN) are awaiting
replication, and that all updates with a lower log sequence number have
already been included within the initial refresh.
If Capture is not already capturing changes to a replication source, it

immediately starts capturing all the changes to that source table that were
logged after Apply’s handshake request. That, of course, includes all the
updates to a source table that were made while Apply was fetching data
during the full refresh. (This is OK—Apply will make use of its rework
capability if it replicates data that has already been included in the initial
refresh.)
Capture acknowledges every successful handshake by issuing the following

message in the Capture trace table (ASN.IBMSNAP_TRACE) and to the log
file (or joblog):
ASN0104I: Change capture started for owner "<owner>"; table name is

"<table>" ...,
This message is also known as Capture’s GOCAPT message.

Remark: Apply’s automatic full refresh capability is available for all supported
replication source and replication target databases, and that implies, of
course, cross-platform, multi-vendor full refresh capability.
5.2.2.3 Manual Refresh / Off-line Load

After understanding how Apply initiates a full refresh for a target table, we will
now better understand what is necessary to perform the full refresh manually.
Basically, if you decide to perform the initial load of your replication targets
yourself, your responsibilities will be:
• To guarantee that replication source and replication target are
synchronized (by loading the target tables), and
• To let DProp know about it, by updating the replication control tables as
explained it in 5.2.2.2, “Initial Refresh - A Look Behind the Curtain” on
page 86.
The DataJoiner Replication Administration component will guide you through

this process and will generate all SQL statements necessary to update the
replication control tables for you. The only thing you have to do is to execute
the generated SQL statements and to unload and load the data. DJRA will
guide you through the correct sequence in which the above steps have to be
executed.
Loading the Target Tables Manually

The sequence of steps that DJRA will generate for you to manually load your
replication target tables is:
1. Disable automatic refreshes from the source table.
2. Deactivate the subscription set and update the replication control tables to
synchronize Capture and Apply (handshake request).
3. Unload the replication source tables.
4. Load the replication target tables.
5. Reactivate the subscription set.
Some Background Information: The Capture-Apply synchronization should

occur before the unload/export step. That way, any updates that occurred
while the unload/export was in progress will be captured and will propagate
with the next differential refresh, in the subscription cycle that follows the full
refresh.
If the Capture-Apply synchronization occurs after the unload/export, then

there is a chance that some updates will never propagate to the target.

Capture interprets Apply’s handshake request as a starting signal to begin
the change capture activity for a certain source table. If you want to update
the replication control tables to simulate Apply’s handshake request after
unloading the source data, you will need to quiesce the source applications
during the refresh (or exclusively lock all replication source tables) to prevent
any changes.
Summarizing, if you follow the guidelines provided by DJRA and you

synchronize Capture and Apply before you unload the source data, you can
perform the manual full refresh while your source applications are running.
Possibly, data that has been changed during the unload phase could be
included in the unload data set (or file) as well as in the change data table.
Apply will replicate those records again during the first replication cycle
following the initial refresh: Due to its rework capability for condensed target
tables, Apply will successfully re-replicate those changes. Inserts will
automatically be reworked into updates, updates will be reworked into
updates to itself, and deletes of rows that are not present just have no effect
at all.
What if You Consider Your Copies Already Being Initialized?

If you are sure that your replication source and your replication target tables
are already synchronized, there is no obvious need for loading your target
tables again. Nevertheless, DProp Capture and DProp Apply will not start
their job unless you tell them that the full-refresh has been done (by initiating
their initial handshake as explained in “Loading the Target Tables Manually”
on page 89).
Note: Without any manual interaction, Apply always tries to automatically

perform the full refresh itself. Disabling automatic refresh by setting
ASN.IBMSNAP_REGISTER.DISABLE_REFRESH = 1 as described in 5.6.2, “Selectively
Preventing Automatic Full Refreshes” on page 129 will not be sufficient.
Apply will issue the following error message instead:
ASN1016I: Refresh copying has been disabled ...
Use the DataJoiner Replication Administration feature Off-Line Load as

described in the previous sections, but omit the steps to unload and load the
data. Just execute the generated SQL scripts to initialize Capture and Apply.
Remarks for Heterogeneous Environments: The above mechanisms apply

both to DB2 replication sources as well as to multi-vendor sources. As there
is no Capture process at non-IBM replication sources, triggers will emulate
Capture’s role during the initialization of a replication subscription.

5.3 Operating and Maintaining DProp Replication—Repetitive Tasks
As repetitive tasks we identified housekeeping activities to manage space
allocated by change data tables and replication control tables. After
discussing standard database related housekeeping tasks for change data
tables, we will focus on how records that have already successfully been
replicated are periodically removed from the change data tables again.
5.3.1 Database Related Housekeeping

We generally recommend that you reorganize all volatile and reasonably
sized database tables on a regular basis in order to reclaim space. In the
DProp replication environment, this is especially applicable to the change
data tables, the unit-of-work table, and for some of the replication control
tables.
5.3.1.1 Reorganizing CD Tables and the Unit-of-Work Table

The change data tables and the unit-of-work table receive heavy INSERTS
during change capture and heavy DELETES during pruning. Those tables are
never updated.
The necessity for reorganizing the change data tables and the unit-of-work
table (ASN.IBMSNAP_UOW ), of course, depends on the update rates against the
replication source tables. As a rule of thumb, reorganize the change data
tables and the unit-of-work table about once a week.
On OS/390: If using DB2 for OS/390 Version 5 or higher, specify the

PREFORMAT REORG option. Preformatting the tablespace will speed up
Capture’s insert processing.
On AS/400: Run the RGZPFM command on all the change data tables and
on the unit-of-work table, once a week.
5.3.1.2 Reorganizing the DProp Control Tables

Basically the same rule also applies to some of the DProp control tables,
especially the Capture trace table ( ASN.IBMSNAP_TRACE) and the Apply trail table
(ASN.IBMSNAP_APPLYTRAIL).
5.3.2 Pruning
DProp terminology uses the term pruning for the process of removing records
from the change data tables that have already been replicated to all targets.
Capture performs the pruning for the change data tables, the unit-of-work
table, and the Capture trace table. Manual pruning has to be established for

certain types of consistent change data (CCD) tables and for the Apply trail
table.
5.3.2.1 Automatic Pruning of CCD Tables and Unit-of-Work Table

Change data tables (and the unit-of-work table for DB2 replication source
servers) have the potential of unlimited growth. Therefore, DProp has to
provide a mechanism to prevent change data tables from running out of
space. It is Capture’s duty to perform the pruning for change data tables and
the unit-of-work table.
For multi-vendor replication source systems, the pruning mechanism is

provided by a trigger, defined on the pruning control table (refer to 6.7.1,
“Using Triggers to Emulate Capture Functions” on page 166, for a deeper
insight into how triggers are used to achieve change capture).
Without going into the details, pruning can be considered to be a very

CPU-consuming process. Therefore, we recommend that you defer pruning
to off-hours by starting the Capture program using the NOPRUNE start
option, then launch pruning when appropriate by using Capture’s PRUNE
command.
The same result can be achieved for multi-vendor replication sources, by

temporarily disabling (or dropping) the pruning trigger.
AS/400 Remark: On AS/400, there is no NOPRUNE parameter, but you can

achieve the same result by specifying *NO for the Start Clean-Up parameter
of the STRDPRCAP command. Since there is no PRUNE command, you will
have to stop Capture and restart it with Start Clean-Up *IMMED when you
want to start the pruning.
5.3.2.2 Pruning of CCD Tables

Capture performs pruning operations only for those tables that it maintains
itself (which are, as explained above, the change data tables and the
unit-of-work-table).
CCD tables (consistent change data tables) are maintained by Apply. They
are not automatically pruned by Capture.
For some types of CCD tables, by replication design, pruning is not required:
• Complete condensed CCD tables are updated in place, so that they do not
grow without bound. The only records that could be removed from these
CCD tables are those with IBMSNAP_OPERATION equal to ’D’ (Delete)
that have already been propagated to the dependent targets.

• Non-condensed CCD tables that contain history, with the assumption that
you wish all the data to be retained.
On the other hand, CCD pruning is an issue for internal CCD tables. This type
of table will grow if there is large update activity, and it could reach the size of
a complete CCD table. Yet, there is no value in letting this table grow, as only
the most recent changes will be fetched from it.
To enable pruning for internal CCD tables, you might want to add an
SQL-After statement to the internal CCD table's subscription to prune change
data that has already been applied to all dependent targets. Instead of letting
Apply launch the pruning statement via SQL-After processing, you could also
add the pruning statement to every other automatic scheduling facility.
A crude, but effective statement for internal CCD table pruning would be:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
(SELECT MIN(SYNCHPOINT)
FROM ASN.IBMSNAP_PRUNCNTL);
This will prune behind the slowest of all the subscriptions, not just those
subscriptions which refer to the source table associated with the internal
CCD. You might want to improve the pruning precision by adding the
replication source table to the subselect:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
(SELECT MIN(SYNCHPOINT)
FROM ASN.IBMSNAP_PRUNCNTL
WHERE PHYS_CHG_OWNER = ’<phys_chg_owner>’
AND PHYS_CHG_TABLE = ’<phys_chg_table>’;
To find out all internal CCD tables together with their source and change data
tables that are defined within your replication system, run the following query
at the source server:
SELECT SOURCE_OWNER, SOURCE_TABLE,
PHYS_CHG_OWNER, PHYS_CHG_TABLE,
CCD_OWNER, CCD_TABLE
FROM ASN.IBMSNAP_REGISTER
WHERE CCD_OWNER IS NOT NULL;
5.3.2.3 Pruning of the APPLYTRAIL Table

At the end of each subscription cycle, DProp Apply reports subscription
statistics in the Apply trail table (ASN.IBMSNAP_APPLYTRAIL). The Apply

trail table is located at the replication control server. For each subscription set
and cycle, Apply will insert one single row into the Apply trail table.
To keep the table from growing too large, these rows need to be deleted from
time to time. When to delete these rows is entirely up to you. Apply writes to
the ASN.IBMSNAP_APPLYTRAIL table, but never reads from it again.
An easy way to manage the growth of this table is to add an SQL-After

statement to one of your subscription sets. Alternatively, you could also add
the pruning statement to every other automatic scheduling facility:
DELETE FROM ASN.IBMSNAP_APPLYTRAIL
WHERE LASTRUN < (CURRENT TIMESTAMP - 7 DAYS);
If you are one of those more sophisticated types of DBAs, your SQL
statement could look like the following example instead:
DELETE FROM ASN.IBMSNAP_APPLYTRAIL
WHERE
( STATUS = 0
AND EFFECTIVE_MEMBERS = 0
AND LASTRUN < (CURRENT TIMESTAMP - 1 DAYS))
OR
( STATUS = 0
AND EFFECTIVE_MEMBERS > 0
AND LASTRUN < (CURRENT TIMESTAMP - 7 DAYS))
OR
(
LASTRUN < (CURRENT TIMESTAMP - 14 DAYS))
;
The statement shown above will prune the Apply trail table in stages:
• All Apply status messages, reporting that nothing was replicated
(EFFECTIVE_MEMBERS = 0), and also that no error occurred during replication
(STATUS = 0), will be removed first (after 1 day).
• All Apply status messages that report some replication action will stay in
the table a little bit longer (for example, 7 days). We detect that data was
actually replicated within one subscription cycle by specifying
EFFECTIVE_MEMBERS > 0.
• All other messages, possibly those reporting replication problems, will stay
longer. We can prevent error messages from being pruned earlier by
restricting the first two predicates to STATUS = 0.
Feel free to adjust the time period that the statistics records stay in your
Apply trail table. For example, if you are replicating continuously, you will

want to prune the Apply trail table more frequently than if you are replicating
just once a week.
Remark: You can even occasionally delete everything from the Apply trail
table. However, you had better not do that for one of the other replication
control tables. So be careful when typing in the SQL statement!
5.3.2.4 Journals Management on AS/400

On the AS/400, the journal receivers used by the Capture program must be
regularly deleted to reduce the used disk space, but you must not remove
journal receivers that are still required by the Capture program.
If you are running OS/400 V420 or a later version, a system exit prevents you
from removing receivers that are still needed by the Capture program. We
recommend that you indicate MNGRCV(*SYSTEM) when you create the
journals, and that you indicate a threshold when you create the journal
receivers.
If you are running OS/400 V410, you must use the ANZDPRJRN command to
safely remove the receivers that are no longer needed, and we recommend
that you create the journals with MNGRCV(*USER).
5.3.3 Utility Operations

Some DB2 related utility operations bypass DB2 logging when they
manipulate data stored within replication source tables. Without logging,
Capture is not able to recognize those changes. That means, the changes to
replication source tables made by these utilities do not make their way into
the change data tables (and will never replicate).
In the following sections we will give an overview of utility operations that

need special care taken when executed against database tables that are
registered as replication sources.
Remark: On the AS/400, the following operations on a source table will be

detected by the Capture program, and will produce a full-refresh the next time
the Apply program is run : restoration (RSTOBJ, RSTLIB), dejournalization
(ENDJRNPF), clear file (CLRPFM).
5.3.3.1 Table Loads

Performing LOAD operations against your DB2 replication source tables
causes them to be out of synch from their copies. Depending on your
application, this may or may not be a concern to you.

DB2 LOAD utilities do not perform change data capture logging. For better
performance, LOAD does as little work as possible, leaving little indication of
its presence.
If you are periodically performing LOAD (REPLACE) or LOAD (RESUME)

operations against your source tables, outside of replication control, you
might want to continue the job and also perform these utility operations
against your copies. Or, you may want to drive the replication software to
re-initialize those subscription sets which refer to source tables that have
recently had a LOAD operation. In any case, be aware that your existing
operations may potentially cause data consistency errors with respect to the
copies of the tables you load, and you may need to modify or expand your
LOAD procedures.
To see how to issue a re-synch request for your replication target tables, refer
to 5.6.3, “Full Refresh on Demand” on page 132.
5.3.3.2 Recovery
As with load processing, your RECOVER procedures should consider the
effect on data consistency with copies derived from source tables that needed
a RECOVER. You may want to expand your procedures to perform a
coordinated recovery of a source table and all its copies, or you may want to
drive the replication software to re-initialize those subscription sets which
refer to source tables for which there was a recent RECOVER operation, or
you may decide to tolerate any data consistency errors resulting from your
RECOVER procedures.
5.3.3.3 Pseudo-ALTER Tools

Third-party "Pseudo-ALTER TABLE" tools appear to add a number of
functions to DB2's ALTER TABLE statement, among these are:
• Rename a column
• Change the length of a column
• Change the data type of a column
• Delete a column
These tools do not actually alter a table, but rather unload, drop, re-create
and load a new table in place of your existing table. Keep in mind that DB2
logs updates to a table based on internal identifiers, not by the names of the
tables. From Capture's perspective (and DB2's) it is merely coincidental that
this new table has a name matching the name of your old table. If you wish to
continue using such tools, you will need to carefully coordinate the
pseudo-alter processing:

1. Stop all updating applications
2. Let Capture process all the log records written for the old table
3. Stop Capture
4. Run the pseudo-alter utility
5. Start Capture and restart your updating applications
5.3.3.4 Tablespace Reorganizations

Customers using DB2 for OS/390 need to make only a slight, but important,
change in their REORG procedures.
Generally, DProp is not affected by tablespace reorganizations:

• Capture is not affected, as the internal object identifiers do not change.
• Apply is not affected, as key values, not internal row identifiers, are used
to associate source and target rows.
If you are using tablespace compression, it is very important that you specify
the KEEPDICTIONARY REORG utility option, which is not the default.
DB2 for OS/390 can keep at most one compression dictionary in memory per
tablespace. Once a new compression dictionary is generated, such as during
a REORG, it replaces the previous dictionary. The DB2 log interface cannot
return log records written using an old compression dictionary.
If Capture has already processed all log records written prior to the REORG,
there is no problem.
If Capture requests a range of log records written before the REORG, and at
least one of the log records within the requested range was written using a
compression dictionary that has changed as a result of a REORG, then DB2
will not return the requested range of log records through the log interface.
Advice: If you plan on rebuilding your compression dictionaries, coordinate

the REORG and Capture processing:
1. Stop all applications that usually update any table having the change data
capture attribute and being created within a compressed tablespace (not
just those tables registered as replication sources).
2. Let Capture process all the log records written using the ’old’ compression
dictionary.
3. Stop Capture.
4. REORG, without specifying KEEPDICTIONARY.

5. Start Capture and restart your updating applications.
Better yet, learn to live with the compression dictionaries you now have,
resisting the urge to rebuild them.
5.4 Monitoring a Distributed Heterogeneous Replication System

From an operator’s perspective, a new system application should run
unattended and fix all unforeseen error situations automatically. The effort to
deliver such a system (and we do not speak only of replication systems here),
will be quite high.
Before a replication application is carried over to the production environment,

the replication applications should at least be able to detect error situations
automatically. If errors cannot be solved autonomously, a message has to be
sent out to ask for support.
This section will give you an overview of replication monitoring issues and
techniques. Before we go into details and focus on the several separate
components of a distributed replication system, we will name all the
components that will be subject to monitoring.
5.4.1 Components That Need Monitoring

The following system components of a distributed replication system will have
to be monitored to guarantee non-disruptive operation and promised target
table latency:
Source Database System: We generally assume that your replication source

databases, either DB2 or multi-vendor, are already integrated into existing
monitoring and systems management environments.
Capture Process: If Capture is not running, no changes to the registered

source tables will be captured at the replication source system. The
monitoring system at least must be able to guarantee that Capture is running
continuously. Some error situations could even be automatically resolved by
the monitoring system itself. Section 5.4.3, “Monitoring Capture” on page 101
will demonstrate some advanced monitoring and troubleshooting techniques
for the DProp Capture program.
Tablespaces: Change data tables are stored in tablespaces. For some

database systems or on some system platforms, those tablespaces can be
created with a fixed size only. If the size limit is reached, Capture will not be
able to capture any additional changes.

Target Database System: We generally assume that your replication target
databases, either DB2 or multi-vendor, are already integrated into existing
monitoring and systems management environments.
Apply Process: Especially in large data distribution environments, all Apply

processes have to run mostly unattended. To make you aware of replication
problems, the Apply processes have to be monitored. Section 5.4.4,
“Monitoring Apply” on page 106 reveals common monitoring techniques for
Apply.
Middleware Server (IBM DataJoiner): A DataJoiner middleware server will

be introduced when replicating to or from non-IBM database systems. When
the DataJoiner instance is stopped, non-IBM sources or targets cannot be
accessed by Apply. Refer to 5.4.5, “Monitoring the Database Middleware
Server” on page 116 for appropriate monitoring techniques.
Database Gateways: Assuming that a DRDA server is involved in a

replication system (DB2 for OS/390, DB2 for VM/VSE or DB2 for AS/400),
then a DRDA gateway is required to enable DB2 connectivity to those
servers. DataJoiner’s built-in DRDA requester can be used, or a separate
DRDA gateway can be installed, which will be used by several database
applications on several servers.
If a separate DRDA gateway is used, this is of course also subject to

monitoring.
Use the following sections to gain knowledge about recommended

techniques on how to monitor the above system components.
5.4.2 DProp’s Open Monitoring Interface

To fit into all different kinds of systems management and systems monitoring
applications available on the marketplace, even considering that we are
dealing with a distributed (multi-platform), heterogeneous (multi-vendor)
environment, IBM DProp provides on open interface to all status information
and error messages it produces. You may want to consider customizing the
monitoring tools you are using for other applications to make use of DProp’s
open interface.
Most of the replication statistics are available within the replication control
tables. We will use the following sections to introduce examples of how to
make use of the information within the replication control tables to fulfill
replication monitoring tasks.

For a detailed description of all DataPropagator control tables, including DDL
for each of the different tables, refer to the DB2 Replication Guide and
Reference, S95H-0999 Chapter "Table Structures". Additionally, the
Replication Guide contains a listing of all the replication status and error
messages that DProp might issue.
5.4.2.1 What Does the Interface Look Like?

As already introduced, the DProp control tables contain replication status
information. Additionally, and this statement is true for all operating system
platforms, Capture and Apply issue some trace messages and all error
messages into a log file or into the job log.
DProp Control Tables

The easiest way to determine the current status of the replication system is to
select the information available in the replication control tables.
Some of the control tables can be used to determine the status of the change
capture process, others are available to get an overview of the subscription
status, the subscription latency, or to evaluate statistics about the data
volume replicated within the most recent subscription cycles.
Log Files and Console Messages

All error messages that are recorded in the replication control tables are also
written into log files or into the job log, which make the messages available to
those monitoring environments that are not capable of using database
queries. Using automated operators to analyze the job log is quite a common
technique on host-based operating system platforms, for example.
In the following sections, we will provide you with queries against the DProp
control tables, which extract useful monitoring information, and with
techniques to work around replication problems.
Trace
Finally, start Capture or Apply in trace mode, if the problem that is causing an
error is not obvious:
• Capture’s start option to enable trace mode is TRACE.
Remark:This option is not available on AS/400 because the Capture/400
program writes a lot of information into the ASN.IBMSNAP_TRACE table.
• Apply’s start option to enable trace mode is TRCFLOW.
The generated traces contain a large amount of text output, including dumps
and SQLCA information that can be used to determine the cause of the

problem if the messages inserted into the replication control tables did not
contain enough information.
Advice: Only start Capture and Apply in trace mode, if you are investigating
problems. The trace mode obviously slows the replication processes down
and also generates a lot of output.
5.4.3 Monitoring Capture

The Capture process is a very important task. If Capture is not running at all,
no changes to the registered source tables will make their way into the
change data tables.
Additionally, Capture reads through the DB2 log sequentially. A problem with
one replication source table can therefore delay change capture for the
complete replication system.
Considering this, the main monitoring tasks regarding the Capture program
will fall into the following categories:
• See if Capture is up and running
• Detect and solve Capture problems as soon as possible
5.4.3.1 Monitoring the Capture Process

Checking to see if Capture is up and running should be a task your monitoring
system performs on a regular basis. Depending on the execution platform you
are using, the facilities available to detect if a program is running will be
different. The following example shows how to determine if Capture is
running for UNIX operating systems:
#!/bin/ksh
ps -ef | grep ’asnccp <source_server>’ | grep -v grep | wc -l
The above command will return 1 if the Capture instance is active.
On AS/400, you can check whether Capture is running by issuing the

WRKSBS command, choose option 8 in front of the QZSNDPR sub-system,
and check that job QDPRCTL5 is running. After the initial full-refresh has
been done, you should also see several jobs having the same name as the
journals.
Remark: The QDPRCTL5 job is sometimes in a MSGATT status, but it does

not necessarily mean that it is waiting for an answer to an error message. If it
is waiting for an answer, there will also be an error message in the QSYSOPR
message queue.

5.4.3.2 Detecting Capture Errors
Capture errors can be detected by querying the Capture trace table
ASN.IBMSNAP_TRACE. All available Capture messages can be retrieved by
the following query (use descending order to see the most recent messages
first):
SELECT OPERATION, TRACE_TIME, DESCRIPTION
FROM ASN.IBMSNAP_TRACE
ORDER BY TRACE_TIME DESC;
Error messages only are retrieved by adding the following where-clause:

SELECT OPERATION, TRACE_TIME, DESCRIPTION
FROM ASN.IBMSNAP_TRACE
WHERE OPERATION = ’MESSAGE’
AND SUBSTR(DESCRIPTION , 8 , 1) = ’E’
ORDER BY TRACE_TIME DESC;
Possible Capture errors could be caused by incorrect replication source

definitions (for example, a replication source table is created without the data
capture changes attribute, a change data table has been accidentally
dropped, column types of change data table and source table do not match,
or Capture does not have sufficient privileges to write into the change data
table).
All error messages dealing with incorrect replication definitions clearly

identify the source table that is causing the problem. Correct the error or
remove the failing registration to resume change capture for your source
system.
Capture records error messages, including a specific DProp message

number, into the Capture trace table. A more detailed problem description,
including possible solutions for all replication related problems, can be
obtained from the DB2 Replication Guide and Reference, S95H-0999.
5.4.3.3 Capture Lag

Log based change capture is an asynchronous process. Additionally, the
Capture process can be scheduled to run in certain time windows only or to
use a minor system priority so that it does not interfere with any source
applications.
Therefore, it is possible that Capture temporarily does not keep up with

processing all log records as quickly as DB2 writes them. We call the time
difference between the current timestamp and the timestamp of the last log
record processed by Capture the Capture lag.

With DProp V5, it has become very easy to determine the Capture lag,
because Capture maintains a heartbeat within the register table. The register
table contains a so-called global record, which is updated by Capture every
time Capture commits its change capture activity. (Refer to 5.5.2, “Adjusting
Capture Tuning Parameters” on page 118 to see how to customize Capture’s
COMMIT interval.)
Every time Capture commits, it updates the global record with the log
sequence number (SYNCHPOINT) and the timestamp associated with the
last processed log sequence number (SYNCHTIME).
The Capture lag therefore can be calculated by comparing the CURRENT

TIMESTAMP with the SYNCHTIME of the last processed log record:
SELECT SECOND (CURRENT TIMESTAMP - SYNCHTIME) +
((MINUTE (CURRENT TIMESTAMP) - MINUTE(SYNCHTIME)) * 60) +
((HOUR (CURRENT TIMESTAMP) - HOUR (SYNCHTIME)) * 3600) +
((DAYS (CURRENT TIMESTAMP) - DAYS (SYNCHTIME)) * 86400)
AS CAPTURE_LAG
WHERE GLOBAL_RECORD=’Y’;
According to this query, the Capture lag is displayed in seconds. To see the
actual timestamp of the log record most recently processed by Capture, just
select the global record from the register table:
SELECT SYNCHPOINT, SYNCHTIME
WHERE GLOBAL_RECORD=’Y’;
Remark for Heterogeneous Environments: Capture triggers are

synchronous. They commit within the same transaction as the source
application. Therefore, there is no need to monitor a Capture lag for
multi-vendor replication sources.
5.4.3.4 Resolving a Gap

We can consider Capture to be a very robust process. Nevertheless, Capture
has a severe problem when DB2 cannot offer the log records that Capture is
requesting. Just to skip those log records would of course compromise the
replication consistency.
So what does Capture do if the DB2 log interface cannot deliver the log
records requested by Capture? Right, Capture stops and issues an error
message. In DProp terminology, we call this status a gap (that is, some piece
of the DB2 log is missing).

If the error persists after restarting Capture, there are two options available to
resume replication. Both options are a little bit tricky:
1. Capture COLD start: This option is tricky, because it will automatically
cause a full refresh for all replication target tables.
2. Manually help Capture over the gap: This option is tricky, because it needs
very sensitive manual interaction. But, if successfully performed, the
consequences are much smaller than in option 1.
In regard to unavailable log records, please consider that DB2 also might be
in trouble if certain log records are no longer available. Also consider that,
since Capture is able to process archived log records, a Capture gap
because of unavailable log records should never happen. But to be prepared,
we nevertheless want to go into more detail.
Resolving the Gap with a Capture COLD Start

A Capture COLD start is what you probably want to avoid in a productive and
highly-tuned distributed replication system. A COLD start will certainly
resolve the gap, but a COLD start will also force all replication target tables to
perform a new initial refresh (because the replication consistency is
compromised):
• Performing a COLD start, Capture ignores all information about previously
captured log records. Capture will resume the change capture with the
most current log record DB2 can provide.
• During a COLD start, Capture removes all previously captured log records
from all change data tables and the unit-of-work table. A COLD start can
be considered a big cleanup task.
Some production environments can tolerate a Capture COLD start with

subsequently automatically performed full-refreshes. Others, especially those
creating history tables by the means of data replication (Refer to Chapter 8,
“Case Study 3—Feeding a Data Warehouse” on page 203 for several
examples) cannot.
Remark: On AS/400 the way to start Capture in COLD mode is to indicate the
RESTART(*NO) parameter in the STRDPRCAP command.
Resolving the Gap Manually

If a Capture COLD start cannot be tolerated, Capture has to be provided with
a log sequence number from where Capture can successfully resume
capturing.
Warning: Using this advanced technique could cause some source

transactions not to replicate! That is what you have to be aware of. In certain

scenarios, this might be less of a pain than the refresh of all replication
targets.
If you started Capture using the WARMNS start option (which means WARM
start or no start), Capture will terminate when a requested log record cannot
be provided by the DB2 log interface. Capture will issue an error message
into the Capture Trace table ( ASN.IBMSNAP_TRACE) and Capture will write at
least one WARM start record into the WARM start table
(ASN.IBMSNAP_WARM_START). The WARM start table could look like the following
example:
SEQ AUTHTKN AUTHID CAPTURED UOWTIME
----------------------- ------------ ---------------- -------- -----------
x’00000000485E57D60000’ 0
x’00000000485E82F60000’ APPLY01 DB2RES5 N -1307724304
x’00000000485107C80000’ APPLY01 DB2RES5 N -1307736417
x’00000000480135D20000’ APPLY01 DB2RES5 N 0
If a restart attempt fails again with the same error message, you need to
provide Capture with a different restart point (a different WARM start log
sequence).
Do so by following the guidelines below:

1. Delete all records from the existing WARM start table (back up the table
before doing this).
2. Determine a valid log sequence number using utilities available for the
DB2 platform you are using.
3. Insert one record into the WARM start table, using the following values:
SEQ the log sequence number determined in step 2
AUTHTKN leave this column NULL
AUTHID leave this column NULL
CAPTURED leave this column NULL
UOWTIME set this column to 0 (zero)
4. Restart Capture
OS/390 Remark: A valid log sequence number can be derived by first using
the DSNJU004 utility to find an active log range, and then by running the
DSN1LOGP utility with this given log range (or a smaller subset) as an input.
The DSN1LOGP utility will show the actual log record numbers within the
given range. Choose a BEGIN UR or COMMIT log record, avoid UNDO or
REDO records.

Again, Capture errors while processing the DB2 log are not something you
will be dealing with every day. The procedure given above could be treated as
an emergency guideline, but it is best to be prepared.
5.4.4 Monitoring Apply

Basically, the same monitoring issues that apply to Capture are also
recommended for Apply. That means, the main monitoring tasks for Apply will
also fall into the following categories:
• Check if the Apply process is up and running.
• Detect and solve Apply problems as soon as possible.
5.4.4.1 Monitoring Apply Processes

If your Apply process is assumed to be running continuously, then checking if
Apply is up and running will be a task your monitoring system has to perform
on a regular basis. In contrast to Capture (which is assumed to be a
continuous process), it is possible to start Apply only when replication actions
are supposed to happen (for example, if Apply is scheduled to run once a
day). In those cases, of course, regular monitoring is obsolete.
Depending on the execution platform you are using, the facilities available to
check if a program is running will be different. The following example shows
how to determine if Apply is running for UNIX operating systems:
#!/bin/ksh
ps -ef | grep ’asnapply’ | grep -v grep | wc -l
The above command will return 1 if the Apply instance is active.
If you are using several Apply processes on the same machine, all using a
different Apply qualifier or a different control server, the command to check if
Apply is running could even distinguish between several processes:
#!/bin/ksh
ps -ef | grep ’asnapply <apply_qual> <cntl_server>’ | grep -v grep | wc -l
If you are running Apply on an AS/400, you can check whether it is running by
issuing the WRKSBS command, choose option 8 in front of the QZSNDPR
sub-system, and you should see a job having the name of the Apply Qualifier.
5.4.4.2 Monitoring the Subscription Status

Apply reports the status of each subscription cycle in the subscription set
table by updating the status column. As the final step of each subscription
cycle, Apply inserts one record into the Apply trail table, which includes
additional statistics about previous replication cycles.

A look at the subscription set table (ASN.IBMSNAP_SUBS_SET), located at
the replication control server, is sufficient to check whether the status of a
subscription set is as expected, or whether problems prevented the most
recent replication attempt.
Table 2 displays all possible states of a subscription set, taking the ACTIVATE
column and the STATUS column of the subscription set table into account.
Table 2. Determining the Status of Subscription Sets
Activate Status Description
0 0 The subscription set has never been run (initial setting after
defining an empty set), or the subscription set has been
manually deactivated.
1 0 The subscription set is active and the subscription status is

fine. Have a look at the timestamp columns provided within
the subscription set table (SUBS_SET) to see if the
subscription has been run.
1 -1 The execution of this subscription set ended in error the

last time the subscription was processed. Refer to the
Apply trail table (ASN.IBMSNAP_APPLYTRAIL) to
determine the reason for the failure. Note that Apply retries
failing subscription sets every 5 minutes.
0 -1 Apply reported an error when processing the set before the

subscription was deactivated.
1 1 This subscription set is currently being serviced by Apply.
1 2 This subscription set is currently being serviced by Apply.

Apply uses mini-cycles (see blocking factor) to service the
set.
5.4.4.3 Detecting Failing Subscription Sets

Considering the above explanations, subscription sets currently in error can
be determined by using the following query:
SELECT ACTIVATE, STATUS, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST
FROM ASN.IBMSNAP_SUBS_SET
WHERE STATUS = -1;
Keep in mind that no data is replicated to any of the replication target tables
of a set whenever a subscription set fails. If a subscription error occurs after
changes were already inserted into some of the target tables, those changes
will be rolled back immediately.

5.4.4.4 Monitoring Subscription Set Latency
The status of a subscription set is not the only means to detect if the
replication system is working properly. The next level of collecting
subscription status information is to compare the timestamp of when the
subscription set should have been run with the timestamp the subscription set
was actually processed.
The timestamp columns that we will use for this comparison are all available
from the subscription set table, ASN.IBMSNAP_SUBS_SET(see Table 3).
Table 3. Timestamp Information Available from the Subscription Set Table
Column Name Meaning
LASTRUN Control server timestamp, revealing the most recent time a

subscription set was processed. Apply advances LASTRUN for
successful and for unsuccessful subscription attempts.
LASTSUCCESS Control server timestamp, revealing the most recent time a

subscription set was successfully processed.
SYNCHTIME All replication target tables belonging to a subscription set contain

all changes, captured at the replication source server, that were
committed before SYNCHTIME. SYNCHTIME is the timestamp
associated with the log sequence number of a captured log record
(the log sequence number is stored within the SYNCHPOINT
column).
Apply obtains SYNCHPOINT and SYNCHTIME from the
REGISTER table (global record) for each subscription cycle.
Apply uses the global SYNCHPOINT as upper limit when fetching
data from the change data tables when it processes a subscription
cycle (SYNCHTIME therefore contains the source server
timestamp associated with this upper limit).
Use the following query, to compare the three subscription timestamp

columns with the current timestamp to calculate the subscription lag (or
subscription latency). Issue the query while connected to the replication
control server.
SELECT ACTIVATE, STATUS, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
SECOND(CURRENT TIMESTAMP - LASTRUN) +
((MINUTE(CURRENT TIMESTAMP) - MINUTE(LASTRUN)) * 60) +
((HOUR (CURRENT TIMESTAMP) - HOUR (LASTRUN)) * 3600) +
((DAYS (CURRENT TIMESTAMP) - DAYS (LASTRUN)) * 86400)
AS SET_RUN_LAG,
SECOND(CURRENT TIMESTAMP - LASTSUCCESS) +
((MINUTE(CURRENT TIMESTAMP) - MINUTE(LASTSUCCESS)) * 60) +
((HOUR (CURRENT TIMESTAMP) - HOUR (LASTSUCCESS)) * 3600) +
((DAYS (CURRENT TIMESTAMP) - DAYS (LASTSUCCESS)) * 86400)

AS SET_SUCCESS_LAG,
SECOND(CURRENT TIMESTAMP - SYNCHTIME) +
((MINUTE(CURRENT TIMESTAMP) - MINUTE(SYNCHTIME)) * 60) +
((HOUR (CURRENT TIMESTAMP) - HOUR (SYNCHTIME)) * 3600) +
((DAYS (CURRENT TIMESTAMP) - DAYS (SYNCHTIME)) * 86400)
AS SET_LATENCY
AND SET_NAME = ’<set_name>’;
Remark: Please notice that you are comparing a control server timestamp
(current timestamp) with a source server timestamp (SYNCHTIME). This
query could cause unexpected results, if the control server and the source
server are placed within different time zones, for example.
5.4.4.5 Identifying Members for a Given Subscription Set

Which members are defined for a subscription set can easily be determined
by joining the subscription set and the subscription members table:
SELECT A.STATUS, A.APPLY_QUAL, A.SET_NAME, A.WHOS_ON_FIRST,
B.SOURCE_OWNER, B.SOURCE_TABLE, B.SOURCE_VIEW_QUAL,
B.TARGET_OWNER, B.TARGET_TABLE,
B.TARGET_COMPLETE, B.TARGET_CONDENSED
FROM ASN.IBMSNAP_SUBS_SET A,
ASN.IBMSNAP_SUBS_MEMBR B
WHERE A.APPLY_QUAL = B.APPLY_QUAL
AND A.SET_NAME = B.SET_NAME
AND A.WHOS_ON_FIRST = B.WHOS_ON_FIRST
AND A.APPLY_QUAL = ’<apply_qual>’
AND A.SET_NAME = ’<set_name>’;
5.4.4.6 Looking for Details in the Apply Trail Table

Apply records the details of each subscription cycle in the DProp control table
ASN.IBMSNAP_APPLYTRAIL. The table content can be used:
• To determine the reason for replication errors (if the status of a
subscription set switched to -1)
• To collect replication statistics, for example to check how many records
are replicated during one day
Use the following query example to select data from the Apply trail table:
SELECT APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
STATUS, LASTRUN, LASTSUCCESS, SYNCHTIME,
MASS_DELETE, EFFECTIVE_MEMBERS,
SET_INSERTED, SET_DELETED, SET_UPDATED, SET_REWORKED, SET_REJECTED_TRXS,
SQLCODE, SUBSTR(APPERRM , 1 , 8) AS ASNMSG, APPERRM

FROM ASN.IBMSNAP_APPLYTRAIL
ORDER BY LASTRUN;
Modify the statement to determine the most recent Apply trail record for a
subscription set which was not successful:
SELECT SQLCODE, APPERRM FROM ASN.IBMSNAP_APPLYTRAIL
AND WHOS_ON_FIRST = ’<whos_on_first>’
AND STATUS = -1
AND LASTRUN = (SELECT LASTRUN
AND WHOS_ON_FIRST = ’<whos_on_first>’);
As an additional idea, you could easily define a trigger on the Apply trail table
that would always execute (and perhaps sends a message) if a failing
subscription is reported into the Apply trail table ( STATUS = -1).
Remark: For all known error situations, a more detailed description and
possible solutions can be obtained from the DB2 Replication Guide and
Reference, S95H-0999.
Remark for Heterogeneous Environments: DataJoiner attempts to map all

SQL codes or SQL states reported from remote non-IBM database systems
into known DB2 SQL code and SQL states. If such a mapping is not possible,
DataJoiner issues SQL code -1822. The SQLMSG (SQL message) contains
the error message as obtained from the remote data source.
DataJoiner SQLCODE -1822

SQL1822N Unexpected error code "<error code>" received from data source
"<data source name>". Associated text and tokens are "<tokens>".
Cause: While referencing a data source, DataJoiner received an

unexpected error code from the data source that does not map to a DB2
equivalent.
If the Apply trail table (or the Apply trace) reveals that a problem had
originally occurred at a non-IBM database system (by showing SQLCODE
-1822), you have to refer to column APPERRM of the Apply trail table for
more details. This column will contain the complete SQL message (at least as

much as DataJoiner could get) as issued from the remote data source. Then
use the techniques described in Appendix B, “Non-IBM Database Stuff” on
page 325 to obtain vendor specific error information.
5.4.4.7 Utilizing Apply’s ASNDONE User Exit

With IBM DProp V5, a lot of freedom and flexibility was introduced to add
customized logic during or after a subscription cycle.
Apply’s Interfaces for Customized Logic

The most important interfaces for customized logic that Apply offers are:
• SQL statements or stored procedures (statement type "G"), executed at
the replication source server, before Apply reads the register table to
determine which change data table corresponds to which replication
source table.
• SQL statements or stored procedures (statement type "S"), executed at
the replication source before Apply fetches the answer set from the
change data tables (for an SQL statement example, refer to 8.4.7.2,
“Maintaining a Base Aggregate Table from a Change Aggregate
Subscription” on page 257).
• SQL statements or stored procedures (statement type "B"), executed at
the replication target server, before Apply applies the answer set to the
target tables (for a stored procedure example, executed at Mircosoft SQL
Server, refer to 7.2.2.3, “Invoking Stored Procedures at the Target
Database” on page 185).
• SQL statements or stored procedures (statement type "A"), executed at
the replication target server after Apply has applied the answer set to the
target tables (for an SQL statement example, refer to 8.4.8, “Pushing
Down the Replication Status to Oracle” on page 259).
• Last but not least, a user exit, called ASNDONE, can be called after Apply
has completed a replication cycle. The C source code of a sample
program is included within the Apply package.
In this section, we will provide you with some guidelines on how to customize
and use the sample program ASNDONE.
The ASNDONE User Exit

Apply is capable of invoking a program, called ASNDONE, after it has
finished a replication cycle. Apply does so, if it is started with the NOTIFY
start option (if the NOTIFY start parameter is omitted, the ASNDONE program
will not be invoked).

Parameters passed to ASNDONE
Apply invokes the ASNDONE program with several parameters to let the
ASNDONE program know which subscription it was servicing before invoking
the user exit. The ASNDONE program can use the passed parameters, but
does not necessarily have to.
Parameters passed to the ASNDONE user exit:

•Set_Name
•Apply_Qualifier
•Whos_On_First
•Control Server
•Trace Option
•Status
Useful ASNDONE Logic

ASNDONE is invoked by the Apply program after the subscription set
processing has completed, regardless of success or failure. You can modify
ASNDONE to meet the requirements of your installation.
The following are some very useful examples where ASNDONE could be
used:
• If a subscription cycle has not completed successfully (which can be
determined from the STATUS value passed to the ASNDONE program),
an automated monitoring system could be notified.
• If a subscription cycle has not completed successfully, an e-mail could
automatically be sent to the replication operator.
• Depending on the reason causing a problem, ASNDONE could deactivate
the subscription set causing the problem.
• In Update-Anywhere scenarios (updates to the replication target tables are
replicated back to the replication source table), ASNDONE can be used to
react on compensated transactions. Capture marks every transaction that
was compensated at the replica site by adding a compensation code to
the unit-of-work record that was captured into the unit-of-work table (and
rejected transactions will only be pruned by retention limit pruning so that
they are available for additional processing). ASNDONE could make use
of the rejection code provided by Capture to notify users or to
automatically reinsert compensated transactions.
After Changing ASNDONE

After you have made changes to the sample ASNDONE C source code, you
will need to recompile, link and bind the program. If the modified source will

run on any platform but MVS and include SQL statements, then you must
install the DB2 UDB Software Developer’s Toolkit (DB2 UDB SDK) on the
system where the code is compiled.
Keep in mind, that ASNDONE (and this also applies to stored procedures) is
called from the Apply program. Therefore, if the user exit uses compiled SQL
statements, the user exit must fulfill the following requirements:
• Use CONNECT TYPE 1 only
• If executed on OS/390, link with DB2 CAF
• Static SQL packages must be bound against the databases/locations
where the SQL will execute
• If called from OS/390, the packages must be included in the Apply plan
PKLIST
Are Programming Languages Other Than C Supported ?

The compiled sample user exit, compiled from C source code, can be
substituted with any other compiled program named ASNDONE.
Even REXX user exits can be used to substitute the compiled versions. On
OS/2, substitute the ASNDONE program in %DB2PATH%\bin with your REXX
exec code. On Windows NT/95, a REXX exec is called by issuing " REXX
execname parameters".
To implement a Windows REXX ASNDONE exit, simply provide a NT batch

file within %DB2PATH/bin, named ASNDONE.BAT, that invokes the REXX
exec, using the above syntax:
1. Create a batch file named ASNDONE.BAT with one line:
@REXX ASNDONE.REX %1 %2 %3 %4 %5 %6
2. Create a REXX exec named ASNDONE.REX with your logic.
3. Rename the provided ASNDONE sample program.
4. Place ASNDONE.BAT and ASNDONE.REX in path %DB2PATH%\bin..
REXX Sample for the ASNDONE User Exit

We are going to show you a small ASNDONE program, coded in REXX, to
give you an idea of how ASNDONE could be used.
The easy example below just switches off (deactivates) failing subscriptions.
In a more sophisticated approach, some more logic could be added to
automatically fix certain problems or to notify an administrator or a monitoring
system.

Remark: The feature in which subscriptions can modify some of their own
attributes (like ACTIVATE in our example) was introduced within the first
quarter of 1999. Be sure to install an Apply release that supports this feature
if you want to use the following example. We tested this feature using Apply
for NT, build 0074 (you can check the Apply build by invoking Apply with
option TRCFLOW—the build number will be given in the trace output).
/**********************************************************************/
/* */
/* ASNDONE SAMPLE REXX EXEC */
/* */
/* This sample program just deactivates a subscription set after a */
/* failing subscription cycle (it is just an example, feel free to */
/* include more useful logic ...). */
/* */
/* The parameters passed to ASNDONE are as follows: */
/* ------------------------------------------------ */
/* - set name */
/* - apply qualifier */
/* - whos_on_first value */
/* - control server */
/* - trace option */
/* - status value */
/* */
/**********************************************************************/
/* get parameters passed */

PARSE ARG SET_NAME APPLY_QUAL WHOS_ON_FIRST CNTL_SERVER TRACEON STATUS;
/* init return code */

RC = 0;
/* print parameter info if trace is on */

if traceon = "yes" then
do
say ’ DONE: The following parameters were passed by APPLY ...’;
say ’ DONE: APPLY_QUAL = ’ apply_qual;
say ’ DONE: SET_NAME = ’ set_name;
say ’ DONE: WHOS_ON_FIRST = ’ whos_on_first;
say ’ DONE: CNTL_SERVER = ’ cntl_server;
say ’ DONE: STATUS = ’ status;
end;
if status = 0 then
SIGNAL GO_EXIT
/*******************************/
/* Load Rexx DB2 functions */
/*******************************/
STMT = ’LOADING DB2 REXX FUNCTION’

if Rxfuncquery(’sqlexec’) \= 0 then
do
rcy = RxFuncAdd(’SQLEXEC’, ’DB2AR’, ’SQLEXEC’);
if rcy \= 0 then
do
if TRACEON = ’yes’ then say ’ DONE:’ STMT ’ FAILED’;
RC = RCY;
SIGNAL GO_EXIT;
end;

if TRACEON=’yes’ then say ’ DONE:’ STMT ’ SUCCESSFUL’
end;
/*************************/
/* CONNECT TO CNTLSERVER */
/*************************/
STMT = ’CONNECT TO’ CNTL_SERVER;

call SQLEXEC ’CONNECT TO’ CNTL_SERVER;
if SQLCA.SQLCODE \= 0
then SIGNAL SQL_ERROR;
if TRACEON = ’yes’ then

say ’ DONE:’ STMT ’ SUCCESSFUL’;
/*---------------------------------------------------------------------*/
/* INVESTIGATE ON THE REASON FOR THE PROBLEM */
/*---------------------------------------------------------------------*/
/* Use the following query to determine the reason why the */

/* subscription was failing ... */
/* trail_stmt = "SELECT SQLCODE, APPERRM FROM ASN.IBMSNAP_APPLYTRAIL", */

/* "WHERE APPLY_QUAL = ’"apply_qual"’", */
/* "AND SET_NAME = ’"set_name"’", */
/* "AND WHOS_ON_FIRST = ’"whos_on_first"’", */
/* "AND STATUS = -1", */
/* "AND LASTRUN = (SELECT LASTRUN", */
/* "FROM ASN.IBMSNAP_SUBS_SET", */
/* "WHERE APPLY_QUAL = ’"apply_qual"’", */
/* "AND SET_NAME = ’"set_name"’", */
/* "AND WHOS_ON_FIRST = ’"whos_on_first"’)" */
/*---------------------------------------------------------------------*/
/* TRY TO AUTOMATICALLY FIX THE PROBLEM */
/*---------------------------------------------------------------------*/
/* ... */
/*---------------------------------------------------------------------*/
/* DEACTIVATE SUBSCRIPTION, IF PROBLEM CANNOT BE FIXED */
/*---------------------------------------------------------------------*/
if status = -1 then
do
sql_stmt = "UPDATE ASN.IBMSNAP_SUBS_SET",
" SET ACTIVATE = 0",
" WHERE SET_NAME = ’"set_name"’",
" AND APPLY_QUAL = ’"apply_qual"’",
" AND WHOS_ON_FIRST = ’"whos_on_first"’";
/*********************/
/* EXECUTE IMMEDIATE */
/*********************/
STMT = ’UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE = 0 for’,

apply_qual’/’set_name’/’whos_on_first;
call SQLEXEC ’EXECUTE IMMEDIATE :sql_stmt’;

if SQLCA.SQLCODE \= 0 then

SIGNAL SQL_ERROR;
call SQLEXEC ’COMMIT’;
if TRACEON = ’yes’ then

say ’ DONE:’ STMT ’ SUCCESSFUL’;
end;
/*---------------------------------------------------------------------*/
/* SEND AN EMAIL TO THE REPLICATION OPERATOR */
/*---------------------------------------------------------------------*/
/* ... */
SIGNAL GO_EXIT
/* END OF PROGRAM LOGIC */
/*********************/
/* SQL ERROR HANDLER */
/*********************/
SQL_ERROR:
if TRACEON = ’yes’ then do

SAY ’ DONE:’ STMT ’ FAILED’
SAY ’ DONE: WITH A SQLCODE OF ’ SQLCA.SQLCODE
SAY ’ DONE: SQLERRMC = ’ SQLCA.SQLERRMC
SAY ’ DONE: SQLSTATE = ’ SQLCA.SQLSTATE
end
RC = SQLCA.SQLCODE
go_exit:
return RC
Start the Apply instance with the TRCFLOW start option to find all trace
messages issued by ASNDONE in Apply’s trace output.
ASNDONE on the AS/400

If you are using Apply on an AS/400, you can have as many ASNDONE
programs as you like, and you can even change the name of the ASNDONE
program. In fact, when you start one Apply instance using the STRDPRAPY
command, you can indicate the name and the library of the user exit.
5.4.5 Monitoring the Database Middleware Server

Once started, the most important issue while monitoring DataJoiner is to
check if the main DataJoiner engine is running. Basically, all monitoring
options that apply to DB2 Version 2.1.2 are also available for DataJoiner.

5.4.5.1 Monitoring DataJoiner Processes
Depending upon whether you are running the DataJoiner gateway on
Windows NT or on a UNIX platform, the facilities available to check if the
DataJoiner engine is running will be different.
Considering DataJoiner for UNIX operating systems, the DataJoiner process

model consists of several autonomous subcomponents. The most important
process is the process of the DataJoiner engine, called db2sysc.
The following example shows how to determine if the DataJoiner engine is

running for UNIX operating systems:
#!/bin/ksh
ps -ef | grep <instancename> | grep db2sysc | grep -v grep | wc -l
The above command will return a value greater or equal to 1 if the DataJoiner
instance is active.
5.5 Tuning Replication Performance

Basically, apart from some techniques that are unique to data replication with
IBM DProp, we can consider Capture, Apply, and also the heterogeneous
change capture triggers, to be database applications or database tasks.
So, most of the performance techniques introduced within this chapter are
common database tuning techniques, applied to change data tables,
database logs, or static and dynamic SQL. The following dedicated DProp
tuning techniques will be introduced within this section:
• Running Capture with the appropriate priority
• Adjusting the Capture tuning parameters
• Using separate tablespaces
• Choosing appropriate lock rules
• Using the proposed change data indexes
• Updating database statistics
• Making use of subscription sets
• Using pull rather than push replication
• Using multiple Apply processes in parallel
• Using high performance full refresh techniques
• Using memory rather than disk for the spill file

• Enabling block fetch for Apply
• Tuning pruning
• Optimizing network performance
Other useful performance recommendations can be found in the

DataPropagator Relational Performance Measurement Series , available on
the World Wide Web
(http://www.software.ibm.com/data/db2/performance/dprperf.htm , subsection
"library").
5.5.1 Running Capture with the Appropriate Priority

Although the Capture process is the most critical task within a replication
system, it does not have to be run as a high priority task. Depending on the
latency you are planning to allow for the replication target tables, adjust the
priority of the Capture process accordingly.
For example, if application performance in the source system is more

important than replicating the changes as soon as possible, just lower the
priority of the Capture process (on those platforms where such scheduling is
possible). During periods of high system load (for example, during heavy
batch processing), Capture will simply lag behind. During periods with a lower
system load, Capture will catch up again.
Remark: The above scheduling recommendation is only applicable to DB2

replication source systems using DProp Capture, and not for non-IBM
replication sources, where change capture is established by synchronous
database triggers.
5.5.2 Adjusting Capture Tuning Parameters

Four tuning parameters are available to customize the change capture
process (for DB2 replication sources). The tuning parameters are stored
within the DProp control table ASN.IBMSNAP_CCPPARMS. This table
contains only one row. The tuning parameters are:
• Capture Commit Interval
• Capture Lag Limit
• Capture Pruning Interval
• Capture Retention Limit
Tuning parameter adjustments take effect after recycling (stopping/starting)

Capture or re-initializing Capture using the REINIT command.

5.5.2.1 Capture Commit Interval
Capture continuously monitors the DB2 log. When Capture reads a change
for one of the registered replication source tables, it inserts a row into a
change data table (but Capture does not commit every insert at once). The
Capture commit interval determines how often Capture commits while
capturing changes to the change data tables.
The default for the Capture commit interval is 30 seconds. A higher commit
interval reduces the cost of change capture, but also might increase the
latency of very frequently running subscriptions (for example, for continuously
running subscriptions). The commit interval is specified in seconds.
Recommendation:
On platforms where dynamic SQL caching is available, the low threshold is

10 seconds. Under 10 seconds, the commit overhead impacts throughput.
Where dynamic SQL caching is not available, the low threshold is 20

seconds.
5.5.2.2 Capture Lag Limit

The Capture lag limit is used to prevent Capture from acquiring very old DB2
archived log datasets (files). For production environments, the lag limit will
probably never be reached.
But consider test systems, that are from time to time used to check out new
replication techniques. Capture might have been stopped for some time (say
weeks). When it is re-started with the WARM start option (which is the
default), Capture would request all DB2 log datasets from the time is was
stopped. If those datasets are still available, they would be mounted.
You probably do not want this to happen. The general advice is to start
Capture in COLD mode in test environments, if Capture has not been running
for a while. If it is accidentally started in WARM mode, the lag limit will let
Capture switch to a COLD start (or stop, if WARMNS is used), if the log
records that Capture would require to perform a WARM start are older than
the lag limit. The lag limit parameter is specified in minutes.
5.5.2.3 Capture Pruning Interval

Pruning is an expensive process, and therefore control over that process
should be in your hands. The general advice is to start Capture specifying the
NOPRUNE start option and to launch pruning (by issuing the Capture PRUNE
command) when system resources are available. Refer to 5.3.2, “Pruning” on
page 91 for the full story about pruning.

If you choose not to follow this advice and you are starting Capture without
the NOPRUNE option, pruning will automatically interrupt the change capture
activity on a regular basis. To control how often this will happen, make use of
the pruning interval parameter. We strongly recommend a value higher than
the default of 300 seconds. The pruning interval parameter is specified in
seconds.
5.5.2.4 Capture Retention Limit

The most infrequently replicating subscription determines which records can
be pruned from the change data tables and which must stay there. The
retention limit prevents change data records from staying there forever, if one
of the replication targets no longer connects to the replication source system.
Remark: Retention limit pruning can destroy replication consistency for those
subscriptions that did not connect for a long time. Those subscriptions will
automatically do a full-refresh when starting the next subscription cycle.
The retention limit is used during pruning, to prune all transactions from the
change data tables that are older than CURRENT TIMESTAMP - RETENTION_LIMIT
minutes.
Replication environments with very infrequently connected mobile systems

will probably need a longer retention limit than systems with frequently
connecting Apply processes. The retention limit parameter is specified in
minutes.
5.5.3 Using Separate Tablespaces

The following recommendations apply to tablespaces containing DProp
control tables or change data tables for DB2 for OS/390 replication source
servers:
• Use one single tablespace for each of the change data tables. These
tables might become quite big and may need some special housekeeping
treatment.
• Use one single tablespace for the unit-of-work table. This table also might
become quite big and need some special housekeeping treatment.
Additionally, you will avoid locking contention problems on the UOW and
REGISTER table, if those tables are stored in separate tablespaces.
• It is OK to put all other replication control tables into one tablespace.
Remark (DB2 UDB for Intel and UNIX): Excellent performance can be
achieved by placing multiple change data tables into one single tablespace
(for example, using disk striping accross multiple disks for that tablespace).

Therefore, the above advice to use separate tablespaces does not
necessarily apply to DB2 on Intel and RISC platforms.
5.5.4 Choosing Appropriate Lock Rules

When using DB2 for OS/390 as a replication source server, we recommend
setting the LOCKSIZE for the UOW tablespace and every CD tablespace to
TABLE. A lower granularity of locks will create unnecessary resource
utilization and overhead. Since DProp V5 introduced isolation level
uncommitted read (UR), Capture holds an exclusive lock on the changed data
tables it is using, but Apply is nevertheless able to read previously committed
data using the new isolation level. (Capture tells Apply up to which log
sequence number the changes have been committed.)
The LOCKSIZE of the tablespace containing the replication control tables

(such as the register table) should be defined with a higher granularity (for
example LOCKSIZE ROW), if multiple Apply processes are accessing the
control tables in parallel.
5.5.5 Using the Proposed Change Data Indexes

Be sure to create the index generated for the change data table as a UNIQUE
INDEX (do not customize this index attribute).
To guarantee optimal performance, the one and only change data table index
should look like the following example (DB2 for OS/390 syntax):
CREATE TYPE 2 UNIQUE INDEX <indexname> ON <cd_owner>.<cd_table>
(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC)
USING STOGROUP <stogroup> PRIQTY <nnn> SECQTY <mm>
FREEPAGE 0 PCTFREE 10;
The unit-of-work table index should look like the following example (DB2 for
OS/390 syntax):
CREATE TYPE 2 UNIQUE INDEX <indexname> ON ASN.IBMSNAP_UOW
(IBMSNAP_COMMITSEQ ASC, IBMSNAP_UOWID ASC, IBMSNAP_LOGMARKER ASC)
USING STOGROUP <stogroup> PRIQTY <nnn> SECQTY <mm>
FREEPAGE 0 PCTFREE 0 ;
OS/390 Remark: Make sure that all indexes on change data tables, the
unit-of-work table and all other replication control tables are defined as TYPE
2 indexes. DB2 ignores TYPE 1 indexes when using isolation UR.
Note for all DPRTools V1 users and all DJRA early users: Make sure that
there is only 1 unique CD index and one unique UOW index after migrating to

DProp V5. Previously proposed indexes are still supported, but have a
negative performance impact.
AS/400 Remark: In DPropR/400 V1, the indexes were different from those of
the other platforms. With DPropR/400 V5, use the same indexes as those
described above (except the TYPE 2, FREEPAGE and PCTFREE parameters
that do not exist on AS/400).
5.5.6 Updating Database Statistics

The change data tables contents and the unit-of-work table contents will vary
in size from the initial zero rows to the maximum size right before Capture
prunes the change data tables. This means that the timing of RUNSTATS is
critical.
RUNSTATS must be run at a time when the change data tables contain
sufficient data so that the carefully chosen indexes on change data tables and
on the unit-of-work table will be used by Apply and Capture.
It is not necessary to update the statistics again, once the catalog tables
show that there is an advantage to using the indexes. The SQL against the
changed data tables is dynamic, using parameter marker values, and
therefore default filter factors will be used.
The cardinality of the tables will affect the default filter factor values, but the
fact that the high and low values are old will not have any effect.
Rebind the Capture and Apply packages after the RUNSTATS has been
performed, so that the static SQL contained in these packages can benefit
from the updated statistics.
Remark: There is no equivalent to the RUNSTATS command on the AS/400.
5.5.7 Making Use of Subscription Sets

With DProp V5, subscription sets were introduced in order to group multiple
subscription members together.
Subscription sets have several advantages compared to single subscriptions

(subscription sets containing only one table). For example, all target tables
refreshed within the same subscription set are transactionally consistent,
meaning, all the members of a subscription set contain all the changes
captured at the replication source database up to the same log sequence
number (up to the same source site transactions).

Additionally, subscription sets provide an immense performance advantage
compared to replicating every table separately. To show this, let us take a
closer look at what happens when Apply services a subscription set. To keep
things simple, we will consider read-only target tables (no update-anywhere).
Figure 20 shows that Apply performs several single tasks during a

subscription cycle. Some tasks are executed at the replication control server,
others are executed at the replication source server, and finally others at the
replication target server. Of course, Apply needs database connections in
order to perform the tasks at the different databases.
2. Pick up Recent Changes 3. Apply foreign

5. Report Subscription Source Target Changes to the
Progress for Pruning Server Server Target Tables
APPLY
By never acquiring locks in two

1. Look for Work databases at the same time,
APPLY avoids the possibility of
4. Update Subscription Control participating in a distributed deadlock;
Status Server a nasty possibility when working
with distributed databases.
Figure 20. Apply Cycle at a Glance
Looking at Figure 20, we can identify at least the following Apply tasks, which
execute in the following sequence:
1. Control Server: Look for work and determine subscription set details
2. Source Server: Fetch changes from change data tables into the spill file
3. Target Server: Apply changes from the spill file to target tables
4. Control Server: Update subscription statistics
5. Source Server: Advance pruning control synchpoint to enable pruning
All these tasks need database connections, and are executed at subscription
set level.
The number of database connections (established one at a time) that are

needed to replicate a given number of tables can be dramatically reduced by

grouping the target tables together in subscription sets. Table 4, showing
alternatives for replicating 100 tables, stresses this observation:
Table 4. Number of Connections Needed to Fulfill Replication Task
Number of Subscription Sets Number of Connections
100 Sets / 1 Member per Set 500 Connections
50 Sets / 2 Members per Set 250 Connections
1 Set / 100 Members in the Set 5 Connections
To see the performance boost that can be achieved by grouping subscription

members together, no further comment is required!
The only impact of having big subscription sets is that the transactions
needed to replicate data into the target tables can become quite large (all
changes within one subscription set are applied within one transaction). Be
sure to allocate enough log space and enough space for the spill file.
To prevent database log and spill file overflow, DProp offers another
technique to keep target transactions small. To use this technique, you have
to add a blocking factor (also referred to as the MAX_SYNCH_MINUTES feature) to
the subscription set. This also guarantees transaction consistency at set
level, but lets Apply replicate changes in multiple smaller mini-cycles rather
than in one big transaction. Refer to 3.3.3, “Using Blocking Factor” on page
54 for the details.
As you see, we are dealing with a classic trade-off situation here. We

generally recommend that multiple subscription members be grouped into
one subscription set. On the other hand, we also recommend use of the
blocking factor, to set some kind of upper transaction boundary.
Recommendation: Customize your system to be generally capable of

replicating one subscription cycle within one target site transaction. Choose a
blocking factor that takes effect in extraordinary situations, for example,
during days with extremely high change rates, or after Apply comes up for the
first time after a long maintenance window.

5.5.8 Using Pull Rather Than Push Replication
In Chapter 3, “System and Replication Design—Architecture”, 3.2.1, “Apply
Program Placement: Pull or Push” on page 39, we explained the differences
between setting up Apply in what is called push mode and pull mode.
As a reminder, pull means that DProp Apply is running at the target server,
fetching data from the replication source server, usually over a network, and
inserting all the fetched changes into the target tables locally.
In push mode, DProp Apply is running at a site other than the target server
(probably at the source server), inserting all the changes into the target tables
remotely over the network.
From a performance perspective, it is better to design and configure your

replication system so that it uses pull mode, because Apply will be able to
make use of DB2/DRDA block fetch in these cases. Selecting data from a
remote site over a network using block fetch capability is much (!) faster than
inserting data over a network (without the possibility of blocking multiple
inserts together).
5.5.9 Using Multiple Apply Processes in Parallel

With DProp V5, the terms Apply qualifier and subscription set were
introduced:
• One Apply Instance is started for each Apply qualifier.
• Multiple subscription sets can be defined for each Apply qualifier.
When Apply is started for one Apply qualifier, it immediately calculates, based
on the subscription timing that you defined, if subscription sets need to be
serviced. If several subscription sets are awaiting replication, Apply always
services the most overdue one first.
That means, a single Apply process always services subscription sets

sequentially. If you want to have subscription sets serviced in parallel, choose
to have multiple Apply qualifiers. For each Apply qualifier, a single Apply
process can be started subsequently.
Multiple Apply processes obviously can be advantageous from a performance

perspective, because the work is done in parallel.
5.5.10 Using High Performance Full Refresh Techniques

To speed up the initial full refresh process for huge source tables, use one of
the following advanced techniques:

1. Customize and use Apply’s ASNLOAD user exit to speed up the automatic
full refresh using EXPORT and LOAD.
2. Initialize your replication target tables manually.
• Disable full refresh from your source tables. Refer to 5.6.2, “Selectively
Preventing Automatic Full Refreshes” on page 129 for the details.
• Use the DBA unload and load utilities you are familiar with and that deliver
best performance. Refer to 5.2.2.3, “Manual Refresh / Off-line Load” on
page 89 for the full details about performing initial refreshes manually.
5.5.11 Using Memory Rather Than Disk for the Spill File
When using Apply for MVS, Apply provides an option to create the spill file in
memory rather than on disk. There is an obvious advantage in using memory
for the spill file rather than using disk storage (refer to 5.5.7, “Making Use of
Subscription Sets” on page 122 to see when and where the spill file is
created).
If your replication cycles are short, the amount of data to be replicated may
be appropriate for creating spill files in memory.
5.5.12 Enabling Block Fetch for Apply

In order to enable the performance boost that we want to achieve using a
replication pull design, we have to ensure that Apply really does use block
fetch when selecting data from a remote source server.
Whether Apply will actually use DB2/DRDA block fetch depends on the bind
options that were used when binding the Apply packages against the
replication source server, either DB2 or DataJoiner. For details about binding
Apply, refer to the general implementation checklist, “Step 22—Bind DProp
Apply” on page 64.
5.5.12.1 UNIX and Intel Platforms

Specify the bind option BLOCKING ALL when binding Apply for Intel or UNIX
platforms against the remote replication source server.
Refer to the DProp installation documentation (DB2 Replication Guide and

Reference, S95H-0999) for further details.
5.5.12.2 OS/390
Specify the bind option CURRENTDATA(NO) when binding the packages of
Apply for OS/390 against the remote replication source server.

Remark: Be aware that the default for the CURRENTDATA bind option
changed from DB2 Version 4 to Version 5. With DB2 for OS/390 Version 5,
CURRENTDATA(YES) was introduced as default bind option (until DB2
Version 4, CURRENTDATA(NO) was the default). To enable block fetch for
DB2 for OS/390, it is necessary to add the CURRENTDATA(NO) bind
parameter to Apply’s BIND job, if not already present.
5.5.12.3 AS/400
Nothing special needs to be done for the AS/400.
5.5.13 Tuning Pruning

Without going into the details, pruning can be assumed to be quite a
CPU-consuming process. On the other hand, change data tables and the
unit-of-work deferring pruning to off-peak hours.
For for details about pruning, please refer to 5.3.2, “Pruning” on page 91.
5.5.13.1 Start DProp Capture With the NOPRUNE Start Option

Starting Capture with the NOPRUNE start option will dramatically reduce
Capture’s CPU utilization, because no pruning will actually happen until
Capture is requested to initiate the pruning operation.
To request pruning on demand, DProp offers the possibility to send Capture a

pruning command. Refer to the DB2 Replication Guide and Reference,
S95H-0999 for the syntax of Capture commands depending on the operating
system platform you are using.
5.5.13.2 How to Defer Pruning for Multi-Vendor Sources

For all non-IBM replication source platforms, pruning is initiated by a trigger,
defined on the pruning control table. The trigger executes every time the
pruning control table is updated by Apply after successfully performing a
replication cycle. Especially for those systems using short Apply cycles, this
could be a costly activity.
As you can imagine, there is no PRUNE command available for the

trigger-based solution. Nonetheless, you may also want to defer the pruning
action to off-peak hours or at least have the pruning occur less frequently.
The solution here is to disable the pruning trigger during peak hours and to
enable it when appropriate. Some of the supported multi-vendor database
systems provide the option to simply deactivate triggers. We are showing two
examples here:

Oracle Syntax Example:
-- temporarily disable pruning control trigger

ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER DISABLE;
-- enable pruning control trigger
ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER ENABLE;
Informix Syntax Example:
-- temporarily disable pruning control trigger

SET TRIGGERS <schema>.PRUNCNTL_TRIGGER DISABLED;
-- enable pruning control trigger
SET TRIGGERS <schema>.PRUNCNTL_TRIGGER ENABLED;
For all other database systems, check the documentation for the database
system you are using as replication source to check if triggers can be
temporarily disabled. If disabling of triggers is not supported, use the DROP
TRIGGER and CREATE TRIGGER statements instead:
-- temporarily drop pruning control trigger
DROP TRIGGER <schema>.PRUNCNTL_TRIGGER;
-- recreate pruning control trigger
CREATE TRIGGER <schema>.PRUNCNTL_TRIGGER ...
Important: Copy the DDL to create the pruning control trigger from the SQL
script generated by DJRA. Be sure to copy the CREATE TRIGGER statement from
the SQL script of the source registration that you created last, because the
trigger body of the pruning control trigger changes with every registered
source table.
5.5.14 Optimizing Network Performance

Well, you might already be wondering why we are not focusing on network
performance in the distributed, cross-platform world of data replication. Right,
network setup and performance could easily fill another book. Please refer to
existing SNA, TCP/IP, DRDA, and multi-vendor literature.
5.5.15 DB2 for OS/390 Data Sharing Remarks

For remarks and very valuable guidelines on how to tune IBM DProp in a DB2
DataSharing environment, please refer to the DataPropagator Relational
Performance Measurement Series available on the World Wide Web
(http://www.software.ibm.com/data/db2/performance/dprperf.htm ).

5.6 Other Useful Techniques
During the rest of this chapter we will introduce some techniques that again
make use of the open interface that DProp provides by storing all control
information in DB2 or DataJoiner database tables.
5.6.1 Deactivating Subscription Sets

While testing different replication setup alternatives, but also in production
environments (for example, in case of certain error situations), it could be
advisable to temporarily deactivate subscription sets. To achieve this,
connect to the replication control server and issue the following statement:
UPDATE ASN.IBMSNAP_SUBS_SET
SET ACTIVATE = 0
AND WHOS_ON_FIRST = ’<whos_on_first>’;
To reactivate disabled subscription sets, just reset the ACTIVATE column to 1

again.
SET ACTIVATE = 1
5.6.2 Selectively Preventing Automatic Full Refreshes
As we have already discussed in 5.2.2, “Initialization of Replication

Subscriptions” on page 86, a full refresh for all tables of a subscription set
could be an expensive task. Additionally, consider non-condensed target
tables (histories, for example), which would be destroyed if the history were
to be replaced with a copy of the source table at a certain point in time.
To gain control, DProp allows you to disable any automatic full refresh for
certain source tables.
5.6.2.1 Disable Full Refresh for All Subscriptions

Full refresh is always disabled (or enabled) at the replication source server.
That means, you can allow full refresh for all replication subscriptions
replicating from a certain source table or for none.
Use the following SQL statement to disable any automatic full refresh for a
certain source table. Issue the statement while you are connected to the

replication source server. The statement sets the DISABLE_REFRESH column of
the register table to 1:
SET DISABLE_REFRESH = 1
Use the following statement to enable automatic full refreshes for a

replication source table:
SET DISABLE_REFRESH = 0
5.6.2.2 Allow Full Refresh for Certain Subscriptions

Imagine several subscriptions replicating from the same source table. Some
might copy the complete huge tabl; others might be restricted to copy small
segments only by using restrictive subscription predicates.
Using the technique described above, full refresh can only be disabled (or
enabled) for all the subscriptions that use the source table, because the
disable refresh attribute is set at the replication source table level.
Use the following technique to generally disable full refresh from a replication
source, but to open the door for certain subscriptions only. We will make use
of Apply’s capability to issue SQL statements while performing a replication
cycle.
Using SQL Before Statements

Two different types of SQL Before statements are available to execute at the
replication source server:
• Statement type G: The statement is executed before Apply reads the
register table (ASN.IBMSNAP_REGISTER).
• Statement type S: The statement is executed after Apply has read the
register table and before Apply prepares the cursors to read data from the
change data tables (or source tables when performing the initial refresh,
respectively).
The only thing that Apply does between executing SQL Before statements of
type G and type S is reading the register table. Therefore, this time window,
and the chance that other subscriptions (which should not perform the refresh
automatically) are reading the register table in parallel, is more than
acceptably small.

You are already anticipating what we are going to do? Right, we will use the
SQL Before statements (that execute on set level) to let Apply update the
DISABLE_REFRESH column of the register table to 0, which means
automatic refreshes are temporarily enabled, and we will let Apply reset the
same DISABLE_REFRESH value to 1 again right after it has read through the
register table.
Add a Statement to Your Set to Enable Full Refresh

Add an SQL Before statement similar to the following example to a
subscription set to temporarily enable full refresh from replication source
tables. Because SQL Before statements execute at set level, the statements
switching the DISABLE_REFRESH value must include all the source tables
that correspond to the subscription set members.
Note: The statement enabling full refresh has to be of statement type ’G’.
-- TEMPORARILTY ENABLE FULL REFRESH
-- Statement to temporarily enable full refresh for all members

-- of the subscription set - BEFORE_OR_AFTER = G
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES(’<apply_qual>’,’<set_name>’,’<whos_on_first>’,’G’, 1 ,’E’,
’UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=0 WHERE
(SOURCE_OWNER=’’<source_owner_1>’’ AND SOURCE_TABLE=’’<source_table_1>’’)
OR
(SOURCE_OWNER=’’<source_owner_n>’’ AND SOURCE_TABLE=’’<source_table_n>’’)’
,’0000002000’);
-- increment the AUX_STMTS counter in IBMSNAP_SUBS_SET

UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS = AUX_STMTS + 1
Add a Statement to Your Set to Disable Full Refresh Again

Add an SQL Before statement similar to the following example to the
subscription set to reset DISABLE_REFRESH to 1 again. Include the same
source tables in the where-clause of the statement as the ones you specified
for the first statement.
Note: The statement disabling full refresh again has to be of statement type
’S’.
-- DISABLE FULL REFRESH AGAIN
-- Statement to disable full refresh again for all members

-- of the subscription set - BEFORE_OR_AFTER = S
VALUES(’<apply_qual>’,’<set_name>’,’<whos_on_first>’,’S’, 2 ,’E’,
(SOURCE_OWNER=’’<source_owner_1>’’ AND SOURCE_TABLE=’’<source_table_1>’’)
OR
(SOURCE_OWNER=’’<source_owner_n>’’ AND SOURCE_TABLE=’’<source_table_n>’’)’
,’0000002000’);

UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS = AUX_STMTS + 1
5.6.3 Full Refresh on Demand

Apply will automatically perform an initial refresh for a subscription set, in the
following cases:
• If such a refresh has never occurred before
• If Apply has detected a gap for at least one of the members of the
subscription set.
If you, for whatever reason, want to persuade Apply to perform a full refresh
the next time it processes the set, the following three techniques are
available. Please notice the different scopes of each technique. Select the
technique that is most suitable for your needs.
5.6.3.1 Forcing a Full Refresh for a Certain Subscription Set

The following very simple statement can be used to force a full refresh for a
single subscription set.
Connect to the replication control server in order to execute the statement:

SET LASTSUCCESS = NULL,
SYNCHPOINT = NULL,
SYNCHTIME = NULL
The statement resets certain columns of the subscription set table to their
initial values.

5.6.3.2 Forcing a Refresh for All Sets Reading From a Source Table
The following statement can be used to force a full refresh for all subscription
sets reading from a certain replication source table.
Connect to the replication source server in order to execute the statement:

SET SYNCHPOINT = NULL ,
SYNCHTIME = NULL
The statement will reset the SYNCHPOINT and SYNCHTIME columns, for all
subscriptions replicating from the source table, to NULL. It has the same
effect as a Capture COLD start, but limited to only one replication source
table.
Advice: Doing so could cause a lot of network traffic. Also, replication targets
maintaining histories might lose data. Think twice!
5.6.3.3 Forcing a Refresh for All Sets Reading from a Source Server
Start Capture in COLD mode. This is the ’brute force’ method. We strongly
recommend never to COLD start Capture within a production environment.
Capture performs an overall cleanup when starting in COLD mode. For
example, Capture removes all the records from all the change data tables.
Refer to the DB2 Replication Guide and Reference, S95H-0999 for more
details about Capture COLD starts.
5.6.4 Dropping Unnecessary Capture Triggers for Non-IBM Sources

Change capture triggers are always automatically generated for the three
possible operations (insert, update, delete). Therefore, the definition of a
non-IBM table as a replication source will always result in the creation of
three change capture triggers:
• 1 trigger for INSERT
• 1 trigger for UPDATE
• 1 trigger for DELETE
There may be replication requirements—perhaps data warehouse

requirements—that make it necessary to exclude DELETES, for example,
from replication. For DB2 replication definitions, we do this by defining a
subscription predicate like IBMSNAP_OPERATION IN (’I’, ’U’), because DProp
Capture always captures all changes to a DB2 table into the change data
table.

For non-IBM replication sources, we could just drop (or disable) the DELETE
trigger to achieve the same result. As an additional advantage, the workload
of regularly executed delete jobs would be reduced, because the deletes
would not be unnecessarily captured by the triggers.
Refer to Chapter 8, “Case Study 3—Feeding a Data Warehouse”, especially

to 8.4.5.2, “Subscribe to the Sales Table” on page 248 for a useful business
example on how to limit the data to be replicated to INSERTS only.
5.6.5 Modifying Triggers for Non-IBM Sources

Modify triggers to include more logic, if required. For example, you can
modify the triggers to capture certain changes only. Make use of what the
non-IBM database delivers.
5.6.6 Changing Apply Qualifier or Set Name for a Subscription Set

If you want to add already existing subscription sets to another Apply
Qualifier, or if you simply want to rename a subscription, you can either:
• Drop and re-create the subscription set (and initialize it again with a full
refresh).
• Simply change the Apply Qualifier and the subscription set name without
dropping and redefining all subscription members, and without the need to
initialize existing target tables.
To change either the Apply Qualifier or the subscription set name, follow the
procedure below:
1. Stop the Apply process servicing the Apply Qualifier that you want to
change.
2. Update all tables at the control server to change the Apply Qualifier and
set name.
-- Change APPLY_QUAL / SET_NAME within the Subscription Set Table
SET APPLY_QUAL = ’<new_apply_qual>’,
SET_NAME = ’<new_set_name>’
-- Change APPLY_QUAL / SET_NAME within the Subscription Member Table

UPDATE ASN.IBMSNAP_SUBS_MEMBR

-- Change APPLY_QUAL / SET_NAME within the Subscription Columns Table

UPDATE ASN.IBMSNAP_SUBS_COLS
-- Change APPLY_QUAL / SET_NAME within the Subscription Statements Table

UPDATE ASN.IBMSNAP_SUBS_STMTS
3. Update the pruning control table at the replication source server to change
the Apply Qualifier and set name.
-- Change APPLY_QUAL / SET_NAME within the Pruning Control Table
AND CNTL_ALIAS = ’<cntl_alias>’
AND TARGET_SERVER = ’<target_server>’;
4. Restart Apply. Remember to provide a new password file if Apply is

running on Windows or UNIX platforms, because the Apply Qualifier is
part of the name of the password file.

5.7 Summary
Wow, that was a lot of stuff. You, as a replication administrator, do not have to
know everything mentioned in this chapter right from the beginning, but the
more you know, the more comfortable you will feel.
As a reminder, we will summarize all the activities that the replication

administrator has to deal with, in a cross-platform, multi-vendor,
high-performance distributed relational data replication system:
• Initialize the replication system
• Perform repetitive tasks, such as database and replication housekeeping
• Guarantee optimal distributed performance
• Monitor replication
• React on replication setup change requests
In Part 2 of this redbook, we will use several of the discussed design

alternatives and setup strategies in four case studies. All case studies deal
with a different business problem. All case studies integrate a different
non-IBM database system, either used as replication source or as replication
target.
And now: Happy replication!

Part 2. Heterogeneous Data Replication—Case Studies

Chapter 6. Case Study 1—Point of Sale Data Consolidation, Retail
The case study introduced in this chapter gives an implementation example

for projects that are using multi-vendor databases as a source for replication.
For the specific business application we are using in this example, we have
chosen Informix Dynamic Server (V7.3) as the replication source database,
but the techniques that we are going to use are applicable as well to other
non-IBM source databases, such as Oracle, Microsoft SQL Server, or Sybase
SQL Server, too.
As a business example we chose a retail application that aims to consolidate

sales data, which originates at several distributed branch offices, into a
central data store. The central replication target platform is DB2 for OS/390.
The overall objectives of this case study are to:

• Show an example for replication from multi-vendor source databases.
• Provide a general solution for all related business requirements that have
to consolidate distributed data to a central site.
• Use Apply for OS/390 to replicate Informix source data.
In accordance with the guidelines presented in Part 1 of this book, this

chapter is structured into the following phases:
Planning: The planning section contains the description of the business

requirements and an overview of the solution we want to demonstrate.
Design: This section is used to highlight the design options that are most
appropriate to implement this data consolidation application. We will give
additional recommendations on how to scale the application to a large
number of replication source servers.
Implementation: Working along the general implementation checklist that

was provided in Chapter 4, we will demonstate how we set up the test system
to prove the chosen design. Especially, those setup steps are described in
detail, that are unique for this case study. For example, we a providing a lot of
interesting details about how we connected DataJoiner to the Informix
instances that we used as replication source server, and how we set up the
replication subscriptions to consolidate the distributed data into one central
table.
Finally, we are going to reveal some details about how the capture triggers
are used to emulate all functions, that, for DB2 replication sources, are

provided by DProp Capture. This general section is applicable to all non-IBM
replication source databases.
6.1 The Business Problem

A retail company with a number of branches throughout the country uses a
central DB2 for OS/390 Data Sharing Group at the company’s head office and
Informix Servers on AIX in the remote branches. The major business
applications are maintained on OS/390, whereas electronic point-of-sales
(EPOS) systems are maintained on AIX servers at each of the branch offices.
Figure 21 displays the high level system architecture used by the "retail"
company. So far, no database connectivity exists between the Informix EPOS
systems and the mainframe DB2 data sharing group. Until today, data has
only been exchanged through FTP using the existing TCP/IP network that
connects all branch offices to the company’s headquarters.
Company Headquater
DB2 Data Sharing Group
DB2I
TCP/IP
Branch Branch Branch Branch

01 02 03 nn
Informix Informix Informix Informix
Figure 21. Case Study 1—High Level System Architecture
Considering a work flow approach, the business system we are architecting

has to deal with two major information flows:
• One information flow has to supply all branches with product and price
information. The product set and also the prices can vary from branch to
branch. From a data replication perspective we call this a data distribution
scenario, most likely using advanced subsetting techniques to deliver only
the information to a branch that belongs to the particular branch’s data
scope.

• The second information flow consolidates data, which is autonomously
created at the branches, back to the company’s head office. In this
example, we consider this data to be SALES data, generated by electronic
point-of-sales (EPOS) applications. From a data replication perspective
we call this a data consolidation scenario, most likely using advanced
aggregation techniques to condense the amount of data that has to be
replicated back to the head office each day.
Figure 22 visualizes both information flows. In this chapter, we will focus on

data consolidation techniques, with the additional consideration that the
branch offices are using Informix Dynamic Server as their database
management system.
To gain knowledge about data distribution techniques, please refer to Chapter

6, “Case Study 1—Point of Sale Data Consolidation, Retail” on page 139,
which explicitly focuses on a data distribution scenario.
DB2 for OS/390 V5

PRODUCT / PRICES
Business
Applications
EPOS
System
SALES DETAILS
Informix IDS V7.3
Figure 22. Major Information Flows
Related Business Problems

The replication techniques subsequently introduced in this case study
generally apply to all scenarios consolidating multiple identically structured
source tables into one single target table. In addition to the retail case study
introduced in this chapter, the following examples are candidates for data
consolidation through replication:
Case Study 1—Point of Sale Data Consolidation, Retail 141

• System Management applications, heterogeneous or not, that store
configuration data regarding all client stations at distributed LAN servers.
To enable central User Help Desk (UHD) applications, it is required to
consolidate the configuration data into a central data store.
• Banks use replication techniques to consolidate customer data, recorded
at branch offices, into the central customer information system.
• Manufacturing companies, whick operate several independent production
plants, consolidate process information recorded at plant level into central
production planning and control systems.
Remark: All data replication and consolidation techniques introduced in this

chapter are of course applicable to all DB2 replication source systems as well
as to all other non-IBM replication source platforms.
6.2 Architecting the Replication Solution

Following the task guidelines that we recommended in part one of the book,
we will design architectural options before actually starting to implement the
solution.
As explained in Chapter 3, “System and Replication Design—Architecture” ,

the discussion of basic architectural options and recommendations can be
divided into the following topics:
• Architecting the system design
• Architecting the replication design
We will follow this approach while designing the replication solution for this
case study.
6.2.1 Data Consolidation—System Design

In Chapter 3 we discussed all system design alternatives and design options
in detail. For replication applications used to consolidate data from several
distributed system components to a central location, we clearly elaborated
the following recommendations with regard to the placement of the different
system components.
6.2.1.1 Placement of the System Components

Design decisions have to be made fot the location of DProp Apply, the DProp
control tables, and the DataJoiner middleware server.

DProp Apply: In the data consolidation application that we are going to
implement, we are dealing with one central replication target server. The
number of source servers may be large. Both cost of administration as well as
best performance can be optimized by locating Apply centrally at the
replication target server.
Control Table Placement: The control tables that coordinate change capture
always have to be created at the replication source server. Apply’s control
tables, the control server tables , can be placed anywhere in the network. As
we decided to locate Apply centrally at the replication target server database,
we also will create Apply’s control tables within the replication target server
database. All subscription information can be retrieved using only one local
database connection. Performance and manageability could not be better.
DataJoiner Placement: One central DataJoiner instance enables best

performance as well as ease of administration for the data consolidation
approach that we want to implement.
Refer back to 3.2, “System Design Options” on page 39 for a deeper

discussion of the system design options available for replication systems
using IBM DProp and IBM DataJoiner.
6.2.1.2 DataJoiner Databases

As discussed in detail in Chapter 3, one DataJoiner database is required for
each heterogeneous replication source database. Each DataJoiner database
will contain one server mapping for one Informix replication source database
(other server mappings are optional, but they will not be used for replication).
One DataJoiner Database for Each Non-IBM Replication Source

To create one DataJoiner database for each non-IBM replication source
database is a technical requirement of the DataJoiner / DProp replication
solution.
Let us see how we can deal with this requirement. The most interesting
question here is, whether there will be any volatile data stored in these
DataJoiner databases that will make any housekeeping for the DataJoiner
databases necessary. And the answer is definitely NO!
In a replication environment, the DataJoiner databases are only used to store

nicknames. After the nicknames are defined, the data within these DataJoiner
databases will never change again. All control tables that are needed to
configure the capture triggers are directly placed within the Informix source
database. They are only referenced by nicknames. All other control tables
that are created by default within the DataJoiner databases will not contain

any data (they are just necessary to successfully bind Apply’s static SQL
packages against the DataJoiner databases).
In summary, no regular maintenance is required for the possibly large number

of DataJoiner databases.
Disk Space Requirements

The only concern you might still have is that a large number of databases will
occupy some disk space. Well, that is right. In the test setup described in this
chapter, each DataJoiner database occupied about 20 MB of disk space.
More than half of the space (12 MB) was used by log files. That means, the
occupied disk space could be slightly reduced by reducing the log space
offered for each database (knowing that we will not have any transactions
writing into DataJoiner objects anyway).
As a rule of thumb to estimate how much disk space will be finally required for
all DataJoiner databases, multiply the number of non-IBM source servers by
20 MB:
Number of Non-IBM Source Servers * 20 MB = Required DJ DB Disk Space
To consolidate data from 50 Informix servers for instance will require 1 GB of

disk space at the DataJoiner server (the data volumes to be replicated do not
influence the above formula).
6.2.1.3 How Many Apply Instances?

When designing the solution, we finally have to decide how many Apply jobs
will be collecting the data from the remote sources. Well, generally speaking,
one Apply job is a good first approach. This Apply job, identified by a single
Apply Qualifier, would be servicing all remote sources, one after the other.
Optionally, to collect data from several branches in parallel, it would be a

good idea to set up multiple Apply instances, one instance for each Apply
Qualifier. Each Apply job would be servicing a different subset of all available
subscriptions.
If you realize, when your replication system is growing, that it takes too much
time to collect data from all branches sequentially, you can always
re-distribute the subscriptions that are already running over all available
Apply qualifiers. To do so, follow the instructions given in 5.6.6, “Changing
Apply Qualifier or Set Name for a Subscription Set” on page 134.
A good example for distributing subscription sets over multiple Apply

Qualifiers would be:

• One Apply job for one or a small number of branches with a large
replication volume
• One Apply job for several branches with a smaller replication volume
OS/390 Remark: On OS/390, for example, the size of the spill file that Apply
allocates when it fetches data from a change data table is defined within the
Apply start job. If you specify a huge file size, because the biggest shop
requires it, a huge spill file is allocated for every set that this Apply job (this
Apply Qualifier) services. This remark does not apply to Apply for UNIX or
Intel platforms.
6.2.2 Data Consolidation—Replication Design

After deciding about the system design, and before implementing the
solution, we will be deciding the major replication design issues. Most
interesting, because we have not discussed this before, will be the
introduction of the target site UNION approach to achieve the automated
consolidation of several distributed tables into one single target table.
6.2.2.1 Are Before Images Required?

The central sales application the retail company is deploying needs the data
recorded at the branches without any structural change. Before Images are
not required.
6.2.2.2 Are Primary Keys or Partitioning Keys Ever Updated?

Sales data is always inserted into the sales tables at the branch offices. The
data is never updated. Therefore, we do not need to take special care on how
to capture updates. We will not change the DProp standard setting to capture
updates as updates.
6.2.2.3 Data Consolidation: Target Site Union

The data consolidation approach we are going to use in this case study could
be easily compared to a materialized UNION of all identically structured
distributed replication source tables.
Consider that the same identically structured table, in our example the
SALES table, is created at each of the distributed locations. Additionally,
consider that all the distributed tables will be defined as sources for data
replication.
To consolidate the content of all the SALES tables into one large
company-wide SALES table that contains the data of all distribution sites, the
technique we are introducing here basically requires creating multiple

subscriptions that all point to the same target table. What we have to be
concerned with, is that an initial refresh from one location does not replace
the data already available at the target site, but "appends" the data from the
new location to what is already available.
In DProp terminology, we call this approach a Target Site UNION approach.
Implementing a Target Site Union Design

In fact we will let Apply replicate all changes from all locations into the same
target table. But to prevent that Apply replaces the target table content with
the data from one location only in case of an automatic full refresh; views are
going to be defined over the target table. Each subscription, coming from a
different target, will be defined to replicate through a different view. Figure 23
graphically represents this consolidation technique:
UNION TABLE
VIEW 1
VIEW 2
VIEW n
TARGET
SOURCE 1 SOURCE 2 SOURCE n
Figure 23. Target Site UNION Example—Basic Idea
The following task list describes how to set up a Target Site Union replication
system:
1. Create the replication target table manually. Use the same DDL (same
structure) as used at the distributed locations.
2. Create as many views over the target table as there are distributed
locations (create as many views as the number of subscriptions that you
expect). Each view should be created as follows:

SELECT * FROM OWNER.REPLTARGET
WHERE CONSTANT_COLUMN = ’<constant_value>’;
Usually, if data is distributed over several locations, the data contains a
unique identifier (like a branch number, for example), that shows where
this data was originated. The unique identifier can be a single column or
the combination of several columns.
Use all the columns that uniquely identify the source data when creating
the different views. If the source data does not contain any unique
identifier, refer to “What if the Source Data Contains No Unique Identifier”
on page 147 to see how such a unique identifier can be created during
replication (with DProp!).
3. After defining all target views, define one subscription set for each
replication source server. Add a member to each set, and select the
appropriate view (containing the WHERE clause that identifies the source
location) as target table for the subscription member.
In case of a full refresh, this technique lets Apply automatically append the
content coming from one location, instead of deleting the complete table and
inserting the data from the one location that is currently being refreshed.
Apply does so by deleting everything from the view, letting the WHERE
clause of the view limit the effect of the delete. Apply has no knowledge of the
UNION table: Apply knows only the view.
Background on Refresh
When Apply performs an initial refresh, Apply deletes the complete target
table before inserting the content selected from the replication source table
(Apply replaces the target content with the source content to initialize the
replication subscription). Defining the view as replication target table,
Apply’s delete (we call that mass delete) is restricted to the values that
fulfill the where-clause of the view.
In the next section, we are going to describe what to do if the existing

replication source tables do not contain any attribute that uniquely identifies
where the data was originally created.
What if the Source Data Contains No Unique Identifier

Regardless of whether your replication source tables, located at several
distributed source databases, contain an attribute that is unique at each
source site, we have to create a single view over the target table for every
single source table. This technique is needed, as described above, to let

Apply perform the full refresh without always deleting the complete target
table before inserting the records coming from another source.
If your source tables do not have an attribute that is unique at every source
site that could be used in the where-clause of the target site views, we have
two options to generate such a uniqueness attribute:
1. Create a new column at every source site.
2. Create a uniqueness attribute automatically during replication without the
need to change the source data model.
Obviously we could create another column, but that is perhaps not what we
want. More easily, we could use one of DProp’s advanced features and
create the uniqueness attribute on the fly (while replicating the data up to the
consolidated target).
The technique for creating a uniqueness attribute during replication is to add

a computed column to every subscription member that replicated into a
consolidated target table. This computed column needs to contain a different
constant string for every source server (like a branch number). We
recommend using the SQL function SUBSTR (substring) to compute a
constant.
Use the DJRA feature List Members or Add a Column to Target Tables to add
a computed column to a subscription member as shown in Figure 24.
Figure 24. Add a Computed Column to a Subscription

You can either create the target table (and the target views) including the new
target column already, or let DJRA create the new column for you. If the new
computed target column is not yet present, DJRA generates an ALTER
TABLE statement automatically.
The following SQL excerpt shows the most interesting statements that were
automatically generated by DJRA:
--* The column name REPLFLAG1 is not present in the target table
--* CHRIS.REPLFLAG.
ALTER TABLE CHRIS.REPLFLAG ADD REPLFLAG CHAR(8) NOT NULL WITH DEFAULT;
...
-- create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS
(APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, TARGET_OWNER, TARGET_TABLE,
COL_TYPE, TARGET_NAME, IS_KEY, COLNO, EXPRESSION) VALUES
(’IFXUP02’, ’SET01’ , ’S’, ’CHRIS’, ’REPLFLAG’,
’C’, ’REPLFLAG’, ’N’, 3 , ’SUBSTR (’’BRANCH01’’ , 1 , 8)’);
Create the views at the target site referencing the new calculated column in
the where-clause, like:
WHERE REPLFLAG = ’BRANCH01’;
6.2.2.4 Aggregation
In common data consolidation examples, it not necessary to replicate all table
records created at the source sites. Instead, it would be sufficient to replicate
a summary only (for example, summaries grouped by products).
The IBM replication solution provides two methods for the replication of
summaries:
• Base Aggregates: Summaries are built over the replication source tables
• Change Aggregates: Summaries are built over the change data tables
Both techniques can, of course, also be used when replicating into a

consolidated target table. For an advanced practical example, have a look at
Chapter 8, especially 8.4.7.1, “DProp Support for Aggregate Tables” on page
256.
6.3 Setting Up the System Environment

This section will guide you through all installation and configuration tasks that
were necessary to set up the test system for this case study. Basically, we will
work along the implementation checklist that was provided in Chapter 4,
“General Implementation Guidelines” .

6.3.1 The System Topology
Figure 25 shows the overall test system setup for the data consolidation
scenario used in this case study. The test environment consists of three
database system layers:
• Layer 1: Informix Dynamic Server Version 7.3 for AIX, used as replication
source systems. The test scenario consists of three independent Informix
instances on three different AIX servers, namely sky, azov and star.
• Layer 2: IBM database middleware infrastructure. The main component of
this system layer is IBM DataJoiner, Version 2.1.1 for AIX. DataJoiner is
used as the database middleware gateway to enable transparent access
to the Informix replication source systems. Moreover, an IBM DB2
Universal Database (UDB) instance (without database) is used as a DRDA
application server to enable Apply to connect from OS/390 to DataJoiner
using TCP/IP (The DB2 instance would not be necessary if we had used
DRDA over SNA).
• Layer 3: IBM DB2 for OS/390, Version 5.1.1, is used as replication target
system. DProp Apply for OS/390, Version 5.1 is installed at the target
system.
Figure 25 shows that system layer 2 (the IBM database middleware layer) is
installed on one of the AIX servers (sky), that already contained one of the
Informix instances. The other two Informix instances are accessed remotely
using Informix ESQL/C client software.

DB2 for OS/390 Data Sharing Group DB2I
DProp
Apply
mvsip
DB2 UDB V5.2

DataJoiner V2.1
DJDB01
DJDB02
DJDB03
Ifx Client SDK
Informix V7.3
sj_branch03
Informix V7.3 Informix V7.3
sky
sj_branch01 sj_branch02
azov star
Figure 25. Case Study 1—Test System Topology
Smart Remark: All network connections between all system components use
TCP/IP as the network protocol.
6.3.2 Configuration Tasks

This section documents setup tasks, as we used them to prepare the systems
for case study 1. Refer to Chapter 4, “General Implementation Guidelines” on
page 61 for all general setup activities that are not specific to this case study.
Assumption: All three Informix server instances are installed and running.

Table 5 names all Informix instances on the dedicated AIX servers that were
used during this case study. The names will be referenced later during
several setup tasks.
Table 5. Informix Instances used in this Case Study
AIX server (AIX V 4.3) Informix Instance Name
sky sjsky_ifx01
azov sjazov_ifx01
star sjstar_ifx01
All Informix server instances were from "Informix Dynamic Server, Version
7.30UC7".
6.3.2.1 Install and Set Up the DataJoiner Middleware Server

One of the AIX servers belonging to the test setup is used as a branch server
as well as the IBM DataJoiner middleware server. The location of this AIX
server is assumed to be close to the company’s headquarters.
Set Up DataJoiner to Access the Informix Servers

Based on step 1 and step 2 of the implementation checklist, the first task was
to enable native Informix connectivity from AIX server sky (which, from an
Informix perspective, behaves as a client to all other Informix server
instances) to all other Informix instances. Use the Informix manuals
(especially the Administrator’s Guide for Informix Dynamic Server,
Part#000-4354) for further details. On sky, we completed the following three
configuration steps:
1. Install the Informix Client Code (We used Informix Client SDK for AIX,
Version 2.10UC1).
2. Set Informix environment variables:
INFORMIXDIR - pointing to the Informix ’home’ directory
INFORMIXSQLHOSTS - if sqlhosts file is located in a directory other
than $INFORMIXDIR/etc
3. Set up the Informix connectivity file sqlhosts
In order to connect to all Informix instances, the sqlhosts file used on sky was
configured with the following four entries. Please be aware that we will
reference these entries later when creating the DataJoiner server mappings.
#********************************************************************
#
# location: $INFORMIXDIR/etc/sqlhosts
#

# Title: sqlhosts
# Description: sqlhosts file to access Informix servers at IBM ITSO
#
#*********************************************************************
sjazov_ifx01 onsoctcp azov 2800

sjstar_ifx01 onsoctcp star 2801
sjsky_ifx01 onsoctcp sky 2810
sjsky_ifx01_shm onipcshm sky dummy
To check the success of this configuration step we used the Informix client
interface dbaccess to natively connect to all three Informix instances. Refer to
Appendix B, especially B.2.2, “Using Informix’s dbaccess” on page 329 for
useful instructions on how to set up and use Informix’s client interface
dbaccess.
Prepare DataJoiner to Access the Informix Servers

To set up this heterogeneous environment we used IBM DB2 DataJoiner for
AIX V 2.1.1, PTF U462216.
The first step after loading the DataJoiner code onto the middleware server
was to create an Informix data access module (“Step 4—Prepare DataJoiner
to access the remote data sources” of the implementation checklist).
DataJoiner will use this access module for all connections to Informix using
the currently installed version of the Informix client.
In compliance with the DataJoiner for AIX Planning, Installation and

Configuration Guide, SC26-9145, the Informix data access module was
created using the following command, logged on as user root:
make -f djxlink.makefile informix72
Edit the file djxlink.makefile before executing the make command to set the
Informix environment variables accordingly.
The result of executing the make command will be the Informix data access
module, named ’ informix72’.
Remark: The name of the DataJoiner data access module we created during
this step is ’informix72’. Nonetheless, we will access Informix servers running
Informix Dynamic Server Version 7.3. To clarify, the name of the data access
module is not related to the server version. It is just a label. If you like,
change the name when building the data access module.

Set Up a DataJoiner Instance
In accordance with “Step 5—Create a DataJoiner instance” of the general
implementation checklist, the DataJoiner instance was created. We created
the instance with the name ’djinst3’. The instance was set up to support
TCP/IP clients. Finally, three databases were created, one for each Informix
replication source server.
Why Another DB2 UDB Instance?!

Well yes, that’s an interesting point! If you go back to Figure 25 on page 151,
you can see that the DB2 UDB instance was installed on AIX server sky.
According to the information flow, note that it is placed between the
DataJoiner instance and the DB2 for OS/390 replication target system. The
following sections explain why and how we used the DB2 UDB instance in
this case study:
DB2 UDB for AIX, Version 5.2, was not only installed to demonstrate how
flexible the DB2 communication setup really is, but also to work around a
current limitation of DataJoiner: Although DataJoiner, V2.1.1, contains the
capability to access all DRDA servers using TCP/IP, DataJoiner V.2.1.1 is not
enabled to be a DRDA application server through TCP/IP itself. DB2 UDB, by
the way, is.
Therefore, we are using DB2 UDB (without databases) simply as a DRDA

gateway (DRDA application server). DB2 for OS/390 (APPLY, respectively)
has no knowledge about the DataJoiner Instance. It only knows the
DataJoiner databases and the DB2 UDB instance. DB2 for OS/390 will
always "hop over the DB2 UDB instance" to connect to the DataJoiner
databases.
How Does it Work:

1. Both the DB2 UDB instance and the DataJoiner instance are accepting
TCP/IP clients. DataJoiner and DB2 UDB are assigned to different TCP/IP
ports (this is done by updating the database manager configuration of
each the DB2 and DataJoiner instance, setting the SVCENAME
parameter).
To update the database manager configuration of the DB2 UDB instance,
we logged on as DB2 instance owner and issued the following command
(the port address you are using might be different):
update database manager configuration using SVCENAME 2820
Note that we will configure DB2 for OS/390 to connect to this DB2 UDB
instance by assigning the DB2 UDB instance’s port number to the target

locations that we cataloged into the communication database of the
DB2/390 system.
To update the database manager configuration of the DataJoiner instance,
we logged on as DataJoiner instance owner and issued the following
command:
update database manager configuration using SVCENAME 2822
2. Within the DB2 UDB instance, we subsequently cataloged the DataJoiner
instance as a TCP/IP node as well as all DataJoiner databases:
Catalog DataJoiner at the DB2 UDB instance:
catalog tcpip node DJSKY remote sky server 2822
Catalog the DataJoiner databases at the DB2 UDB instance:
catalog database DJDB01 at node DJSKY
3. Next, we configured the DB2 for OS/390 communication database (CDB).
The values were inserted, as if the DataJoiner databases were located at
the DB2 UDB instance:
-- SYSIBM.LOCATIONS ------------------------------------------------
--
-- LOCATION: DJ database, DJDB01 in this case
-- LINKNAME: Arbitrary pointer to SYSIBM.IPNAMES
-- PORT: The TCP/IP service port of the DataJoiner instance
-- TPN: This column is onlt used for APPC connections
insert into SYSIBM.LOCATIONS

(LOCATION, LINKNAME, PORT) values
(’DJDB01’, ’UDBSKY’, ’2820’)
;
(’DJDB02’, ’UDBSKY’, ’2820’)
;
(’DJDB03’, ’UDBSKY’, ’2820’)
;
-- SYSIBM.IPNAMES -------------------------------------------------
--
-- LINKNAME: Pointer to SYSIBM.LOCATIONS
-- IPADDR: HOSTNAME or IP Address
-- SECURITY_OUT: P (connect with userid and password)

-- USERNAMES: O (outbound)
insert into SYSIBM.IPNAMES

(LINKNAME, IPADDR, SECURITY_OUT, USERNAMES) values
(’UDBSKY’, ’sky.almaden.ibm.com’, ’P’, ’O’)
;
-- SYSIBM.USERNAMES -----------------------------------------------
--
-- TYPE: O (Type of translation, oh for outbound)
-- AUTHID: DB2RES5
-- LINKNAME: Pointer to SYSIBM.LOCATIONS
-- NEWAUTHID: djinst3 (DJ authid)
-- PASSWORD: djinst3 (dj’s password)
insert into SYSIBM.USERNAMES

(TYPE, AUTHID, LINKNAME, NEWAUTHID, PASSWORD) values
(’O’, ’DB2RES5’, ’UDBSKY’, ’djinst3’, ’pwd’)
;
Refer to the IBM Redbook Wow! DRDA supports TCP/IP, SG24-2212 for
further details on how to set up DRDA connectivity using TCP/IP between
DB2 for OS/390 and other DB2 database servers.
Remark: DB2 for OS/390 caches the tables of the communication database
(CDB). Therefore, if you update your CDB tables again after the first
connection attempt, you will need to recycle the DB2 Distributed Data Facility
(DDF) to make your changes effective.
To reposition ourselves: We just completed “Step 8—Enable DB2 clients to

connect to the DataJoiner databases” of the implementation checklist. DB2
for OS/390 is DataJoiner’s client in this case.
Set Up DataJoiner to Transparently Connect to Informix

The following paragraphs will provide you with the details that we used to
configure DataJoiner’s access to Informix databases. The SQL statements
refine “Step 9—Create Server Mappings for all non-IBM database systems”
and “Step 10—Create the Server Options” of the general heterogeneous
implementation checklist.
Because one DataJoiner database is needed for every non-IBM replication

source database, we created one single server mapping within each of the
DataJoiner databases. The SQL statements below show how we configured
the DataJoiner database DJDB01.
connect to djdb01 ;

create server mapping from SJ_BRANCH01 to
node "sjazov_ifx01"
database "sj_branch01"
type informix
version 7.3
protocol "informix72"
;
create server option TWO_PHASE_COMMIT
for server SJ_BRANCH01 setting ’N’
;
create user mapping from djinst3 to
server SJ_BRANCH01
authid "djinst3"
password "pwd"
;
Remark: Even if the DataJoiner user (djinst3 in this case) is authorized to

natively connect to Informix, a user mapping will be needed when the DProp
control tables are created. Note that the password specified when creating
the user mapping will be encrypted before it is stored within the DataJoiner
catalog tables.

A Windows NT Workstation Version 4.0 PC was used as replication
administration workstation. The software components that had to be installed
were:
• DB2 UDB Client Application Enabler V 5
• DataJoiner Replication Administration (V. 2.1.1.140)
From this client, we cataloged the DataJoiner instance as TCP/IP node and
all databases directly at the DataJoiner instance (for LAN connections, no
hopping over DB2 UDB is required). The DB2 for OS/390 replication target
server was also cataloged at the DataJoiner instance, using the DataJoiner
instance as DRDA Gateway.
Create Replication Control Tables

The replication control tables were created for DB2 for OS/390 (replication
target) and for all DataJoiner databases. Notice that, because we are using a
non-IBM database as a replication source, some replication control tables are
natively created within the Informix source databases, and others are created
within the DataJoiner databases. Refer to 3.2.3, “Control Tables Placement”
on page 46 for the background. Nicknames are created within the DataJoiner

databases to enable transparent access to those control tables created at the
remote data sources.
Bind Apply
After creating the replication control tables, Apply for OS/390 was bound
against the replication target server (DB2 for OS/390) and against all
DataJoiner databases. Refer to “Step 22—Bind DProp Apply” of the general
implementation guidelines for more details about the Bind task.
6.4 Nice Side Effect: Using SPUFI to Access Multi-Vendor Data

Wouldn’t you like to know (from a DB2-for-OS/390 perspective) what the
client-server folks are doing down there on their workstations? You, as a DBA
using DB2 for OS/390, can easily find this out, after the DRDA connectivity
between DB2 for OS/390 and DataJoiner and the connectivity between
DataJoiner and Informix has been established, by using SPUFI on OS/390 to
query Informix data (generally all databases that can be transparently
accessed through DataJoiner).
To enable SPUFI to work with DataJoiner databases, just bind SPUFI against
those databases.
SPUFI packages have to be bound against all new locations you want to
access (in our case, all three DataJoiner databases), and the SUPFI plan has
to be rebound, for all locations (including those you were accessing already
before). The following excerpt of the Bind job shows the procedure:
DSN SYSTEM(DB2I)
BIND PACKAGE (DJDB01.DSNESPCS) MEMBER(DSNESM68) -
ACT(REP) ISO(CS) SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PLAN(DSNESPCS) PKLIST(*.DSNESPCS.DSNESM68) -

ISOLATION(CS) ACTION(REPLACE)
To process any SQL against DataJoiner, and therefore any SQL against
Informix, possibly using nicknames, DataJoiner’s PASSTHRU mode or
transparent DDL, set the CONNECT LOCATION on the SPUFI main panel to
the location name of the DataJoiner database (as defined within the DB2 for
OS/390 communication database):
For remote SQL processing:

10 CONNECT LOCATION ===> DJDB01
Play around! Create a table in, say Informix, using SPUFI for OS/390, to
realize that there are no more limits (ask your DataJoiner administrator for the
necessary database privileges).
6.5 Implementing the Replication Design

In this section we will actually describe which replication sources and targets
we defined to consolidate Informix sales data on DB2 for OS/390.
Assumption:
All Informix databases contain an identically structured sales table. The table
name is SJCOMP.SALES.
The Informix databases are called sj_branch01 (server azov), sj_branch02

(server star), sj_branch03 (server sky).
6.5.1 Registering the Replication Sources

If a table located at a non-IBM database is going to be registered as a
replication source, a NICKNAME for the table has to be created upfront
(manually). This nickname will be registered as a replication source using the
DataJoiner replication administration.
Therefore, the first task when preparing the registration of the SALES tables,
located within the three Informix source databases, was to create a nickname
for every SALES table:
CONNECT TO DJDB01;
-- Create a NICKNAME for the SALES table

CREATE NICKNAME SJCOMP.SALES FOR "SJ_BRANCH01"."sjcomp"."sales";
CONNECT TO DJDB02;

CONNECT TO DJDB03;


All nicknames can have the same name. They are all created within separate
databases.
After creating the nicknames, the DJRA function Define One Table as a
Replication Source was used to register the nicknames as replication
sources. We chose the following replication source characteristics for this
case study:
• Capture all available columns
• Capture After-Images only
• Capture Updates as Updates (not as Delete/Insert pairs)
When generating the SQL script to actually register a non-IBM table as a

replication source, DJRA determines if the object to be registered is a DB2
table or a nickname. Given that it is a nickname, DJRA automatically
generates native DDL statements (in this case native Informix DDL) to create
all necessary capture triggers as well as the change data table remotely at
the non-IBM database.
If you want to understand how the created change capture triggers finally
work, see section 6.7, “Some Background on Replicating from Multi-Vendor
Sources” on page 166. It introduces an overall picture of all triggers defined
for a non-IBM replication source server and describes how the triggers
interact to emulate all functions that, for DB2 replication sources, are
provided by DProp Capture.
6.5.2 Preparation of the Target Site Union

To prepare for the target site UNION, we created the target table at the
replication target server before defining the first subscription. Additionally, we
created one view over the target table for each subscription replicating into
the consolidated SALES table.
6.5.2.1 Creating the Target Table

We created the replication target table on DB2 for OS/390, using SPUFI:
-- Create Replication Target Table in DB2 for OS/390
CREATE TABLE SJCOMP.SALES (
DATE TIMESTAMP NOT NULL,
LOCATION DECIMAL (4 , 0) NOT NULL,
COMPANY DECIMAL (3 , 0) NOT NULL,
SALESTXNO DECIMAL (15 , 0) NOT NULL,
ITEMNO DECIMAL (13 , 0) NOT NULL,
PIECES DECIMAL (7 , 0) NOT NULL,
OUT_PRC DECIMAL (11 , 2) NOT NULL,

TAX DECIMAL (11 , 2) NOT NULL,
WGRNO DECIMAL (5 , 0) NOT NULL,
SUPPLNO DECIMAL (7 , 0) NOT NULL)
IN SJ390DB1.TSSALESH;
-- Create Replication TARGET Table Index in DB2 for OS/390

CREATE UNIQUE INDEX SALESIX ON SJCOMP.SALES
(LOCATION, COMPANY, SALESTXNO);
COMMIT;
6.5.2.2 Creating Views Over the Target Table

We created three views over the target table, one for each replication source
table (branch sales table). The WHERE clause uniquely identifies each
branch:
-- Create Replication TARGET View for branch01 (company 63 / location 54)
CREATE VIEW DB2RES5.SALES6354 AS
SELECT * FROM SJCOMP.SALES
WHERE COMPANY = 63
AND LOCATION = 54;

WHERE COMPANY = 63
AND LOCATION = 55;

WHERE COMPANY = 63
AND LOCATION = 57;
COMMIT;
6.5.3 Defining Replication Subscriptions

Because each of the three source tables is located at a different source
server (different DataJoiner databases, connecting to the Informix source
tables), we will have to define three subscription sets (one subscription set
always includes one source server and one target server).
After defining the sets, one member was added to each set.
We have been using DataJoiner Replication Administration (DJRA) to set up

the subscription set and the subscription members.

6.5.3.1 Creating Three Empty Subscription Sets
We created three subscription sets, as shown in Table 6. All subscriptions
sets were defined with the same Apply Qualifier; that means, all subscription
sets will be serviced by the same Apply job.
Table 6. Subscription Set Characteristics for the Data Consolidation Approach
Source Server Target Server Apply Qualifier Set Name Event Name
DJDB01 DB2I SALES01 BRANCH01 BRANCH01
Note that we chose event-driven subscription timing, using a single event for
every set (to better control the replication activities for our test scenario).
6.5.3.2 Adding One Member to Each Subscription Set

After the subscription sets were generated, one member was added to every
set. As target table we specified one of the previously created views:
Source Table Target View
DJDB01.SJCOMP.SALES01 --> DB2RES5.SALES6354
Note: DJRA also supports the setup of subscription members for existing
target tables or target views. That means, no target table is created if the
target table (or view) already exists. However, a CREATE TABLESPACE
statement is always generated, regardless of whether the target table exists
or not. We simply removed the CREATE TABLESPACE statement from the
SQL output that DJRA generated.
6.5.4 Starting Apply

One Apply job was created and customized for Apply Qualifier ’SALES01’. As
all sets are defined for the same Apply Qualifier, this Apply job will service all
three subscriptions.
6.5.4.1 The Customized Apply Job

The following job excerpt shows the invocation parameters that we used to
start the Apply job:
000039 //ASNARUN EXEC PGM=ASNAPV25,
000040 // PARM=’SALES01 DB2I DISK’
000041 //* <== APPLY_QUALIFIER
000042 //* <== CONTROL_SERVER

000043 //* <== SPILL FILE OPTION ’DISK’
6.5.4.2 Triggering an Event

The actual replication action was triggered by inserting events into DProp’s
event table. The data from all three source servers was separately replicated
by issuing separate inserts into the event table.
The following insert into the event table, for example, will trigger the
subscription replicating from branch 03:
INSERT INTO ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME)
VALUES (’BRANCH03’, CURRENT TIMESTAMP);
Remark: Apply queries the event table after every subscription cycle to see if
there are new events that trigger another subscription. If there is nothing to
replicate, Apply will at least query the event table every 5 minutes.
6.5.5 Some Performance Remarks

Please do not think that we are seriously going to compare Informix on AIX
performance with DB2 for OS/390 performance. But to give you a feeling for
what you can expect, have a quick look at Figure 26. The figure shows two
bars.
Bar 1 visualizes the amount of time that it took to insert a day’s worth of sales
data (27,340 rows) into the sales table at Informix. (The value was taken from
the performance measurement experiment in Chapter 3: 3.4, “Performance
Considerations for Capture Triggers” on page 55. Even though we have set
up capture triggers for the Informix table during this case study, we want to
eliminate the impact of change capture triggers for this comparison.)
Bar 2, now, shows the time Apply for OS/390 needed to replicate the
captured changes (27,340 rows) to DB2 for OS/390. This time bar is divided
into two sections:
• Section 1: Apply’s fetch phase, fetching the data from Informix/AIX into the
Spill file on the host.
• Section 2: Apply’s insert phase, inserting the change data from the spill file
into the target table (through the target view).
Remark: The start and the end of the insert phase was exactly measured by
adding SQL statements (one of type B, one of type A) to the subscription set,
that inserted the current timestamp into a separately created table. See
Figure 26.

140
sec
Inserts into *)
Informix/AIX
45 55 INSERT Phase
Applying the sec sec
FETCH Phase
Change Data
to DB2 for *) Informix Insert Performance
OS/390 without change capture triggers
60 120
sec sec
Figure 26. Replication Performance Remarks
As expected, the Inserts on the host are quicker that the Inserts on AIX! Even
though you might consider this to be obvious, we would like to use this result
to encourage you to invest some time on performance considerations before
you decide about the platform of your central data store or data warehouse.
6.6 Moving from Test to Production

The next major step will be to move the tested replication solution to the
productive system. As a special consideration for a large data consolidation
scenario, the replication source server configuration has to be rolled out to a
considerable number of source databases.
As we have seen, no manual interaction is necessary to initialize the

replication environment once all replication definitions are created at the
replication source and target server. Additionally, we are only dealing with
one single and centrally located DataJoiner instance here, which will make
this step easier.
Summarizing, the following definitions have to be created or carried over from

the test site to configure the production system:
• One DataJoiner database for every Informix source server
• The replication control tables within every replication source server
database, both DataJoiner and Informix
• One source table registration for every source database (which includes
Informix triggers and inserts to replication control tables)

• One subscription set for every source server
• One subscription member for each subscription set
• The productive target table, which has to be created manually
• One target view for each subscription member
The main issue will therefore be to clone the available setup information and
all defined database objects (like change data tables or capture triggers) to
meet the productive requirements. Mainly, two different strategies can be
followed to achieve this cloning:
• Strategy 1: DJRA provides a feature to re-generate DProp control
information, by re-engineering inserts to the DProp control tables from
existing definitions. This feature is called the PROMOTE feature (also
referred to as the CLONE feature).
It is recommended to use the promote function when carrying replication
definitions over from a test to a production system, because all changes
made to the replication control tables after the initial setup (for example, to
tune the setup) will be caught by PROMOTE.
• Strategy 2: Save all DJRA-generated or customized SQL scripts that
were used to configure the test system. As an option, anonymize the
scripts and generate new scripts from the anonymized examples when
adding a new source server to the replication system. Objects that are
unique for each productive instance are:
• CONNECT statements (either to the source server or to the control
server)
• Non-IBM database names, which are referenced in SET PASSTHRU
commands or CREATE NICKNAME statements
• References to the replication source server, the replication target
server, and the replication control server, that are named within the
INSERT statements that configure the replication control tables.
If separate procedures exist to create database objects for Informix and
DB2/DataJoiner, divide the generated scripts into one DB2/DataJoiner part
and one Informix part.
Remark: You may notice that the pruning control trigger code changes
with every new replication source table that is added to a non-IBM
replication source server.

Remark: Please note, that when this book was edited, DJRA’s PROMOTE
feature did not yet support heterogeneous replication sources (non-IBM
change data tables, capture triggers cannot be promoted so far). Please
watch for future releases.
6.7 Some Background on Replicating from Multi-Vendor Sources

Change capture for non-IBM databases, such as Informix (as demonstrated
in this case study), Oracle, Microsoft SQL Server, and Sybase SQL Server, is
achieved by creating capture triggers upon the multi-vendor replication
source tables that queue all changes made against source tables in a change
data table. This section contains an overall picture of all different kinds of
triggers that are defined for non-IBM replication sources, and describes how
the triggers interact to emulate all functions, that, for DB2 replication sources,
are provided by DProp Capture.
6.7.1 Using Triggers to Emulate Capture Functions

Considering a non-IBM replication source, triggers are used to emulate all the
Capture functions. Capture triggers must be provided for the following three
main Capture tasks:
• Capture of changes (inserts, deletes, updates)
• Pruning
• Advancing the SYNCHPOINT column in the REGISTER table
The change capture triggers will feed the Change Data tables at the
multi-vendor replication source database. Providing compatibility with DB2
replication sources, capture triggers can be defined to capture both before
and after images or after images only. Additionally, the DProp Capture feature
to capture updates as delete-and-insert pairs can be emulated. Of course,
triggers can be set up to capture either only certain columns of a replication
source table or all the available columns.
The pruning trigger is used to delete records which are no longer needed
from the non-IBM replication source’s Change Data tables. Change Data
table rows are no longer needed when all the Apply processes have
replicated these records to the replication targets. The pruning trigger is
defined on the pruning control table (within the non-IBM replication source
database) and is invoked when Apply updates the pruning control table after
successfully replicating a Subscription Set. Refer to Chapter 5.5.13.2, “How
to Defer Pruning for Multi-Vendor Sources” on page 127 to see how to gain
performance benefits by temporarily disabling the pruning trigger for non-IBM

sources. Disabling the pruning trigger can be used to emulate Capture’s
NOPRUNE runmode.
The reg_synch trigger is used to advance the SYNCHPOINT value in the

ASN.IBMSNAP_REGISTER table, for all the registered replication sources,
before the Apply program accesses the REGISTER table to see if new
changes are awaiting replication. For DB2 replication sources, Capture
maintains the SYNCHPOINT column to provide Apply with performance hints.
Apply uses this column to see which Change Data tables did receive new
changes since the previous replication cycle. Apply can omit to open cursors
for those tables that did not receive any new changes.
We have seen when setting up the replication definitions for this case study
that all the triggers are created natively within the non-IBM replication source
database. The reg_synch trigger is defined when the control tables are
created; the capture triggers are generated when a non-IBM table is defined
as a replication source.
6.7.1.1 Change Capture Triggers

Although the three kinds of triggers (capture, pruning, reg_synch) have to be
in place to properly capture changes for non-IBM source tables, the most
interesting ones are the change capture triggers. The change capture triggers
feed the change data tables that Apply will use to replicate database changes
to the replication target tables.
Change capture triggers are always automatically generated for the three
possible DML operations. The definition of a non-IBM table as a replication
source, therefore, always results in the creation of three native change
capture triggers:
• One trigger for INSERT
• One trigger for UPDATE
• One trigger for DELETE
Change Capture Trigger Example

The DDL for all the necessary triggers is automatically created by DJRA.
Nevertheless, we want to give you a feeling of what the capture triggers look
like.
Although we discussed the setup of a replication system using Informix

source servers in this case study, we want to show the DDL to create a
change capture trigger for an Oracle replication source table this time, just to
remind you that the same basic setup and the same replication techniques
would be applicable to an Oracle environment as well. Figure 27 shows the

DDL to create the capture trigger (AFTER INSERT) for the Oracle replication
source table CHRIS.SALES.
It is possible to define a naming scheme for heterogeneous triggers by

customizing DJRA’s REXX user exit for replication sources, called
SRCESVR.REX. We chose to name the insert trigger by adding the constant ’CD’
to the name of the replication source table. Doing so, the name of the insert
trigger is CHRIS.ISALESCD.
-- create the insert trigger for CHRIS.SALES

CREATE TRIGGER CHRIS.ISALESCD
AFTER INSERT ON CHRIS.SALES FOR EACH ROW
BEGIN INSERT INTO CHRIS.SALESCD
( ORADATE, LOCATION, COMPANY, SALESTXNO, ITEMNO,
PIECES, OUT_PRC, TAX, SUPPLNO,
IBMSNAP_COMMITSEQ,
IBMSNAP_INTENTSEQ,
IBMSNAP_OPERATION,
IBMSNAP_LOGMARKER )
VALUES (
:NEW.ORADATE, :NEW.LOCATION, :NEW.COMPANY, :NEW.SALESTXNO,
:NEW.ITEMNO, :NEW.PIECES, :NEW.OUT_PRC, :NEW.TAX, :NEW.SUPPLNO,
LPAD (TO_CHAR(CHRIS.SGENERATOR001.NEXTVAL), 20 ,’0’),
LPAD (TO_CHAR(CHRIS.SGENERATOR001.NEXTVAL), 20 ,’0’),
’I’,
SYSDATE );
END;
Figure 27. Example of an Oracle Change Capture Trigger (Insert Trigger)
The trigger is defined to execute after each insert operation into the source
table, and it inserts a new row into the Change Data table, named
CHRIS.SALESCD. All the new column values, represented by :NEW.<columnname>,
are used when inserting a row into the Change Data table.
Note that an Oracle unique sequence generator is used to maintain the

sequence columns (IBMSNAP_COMMITSEQ and IBMSNAP_INTENTSEQ)
of the Change Data table. For other non-IBM databases, that do not provide a
unique sequence generator, the sequence columns are populated by values
derived from the current timestamp.
Remark (Informix): Because Informix triggers have a limited length, Informix

stored procedures, invoked by triggers, are used to perform the change
capture activity.

6.7.2 The Change Data Table for a Non-IBM Replication Source
You may have already noticed that there is no unit-of-work table for non-IBM
replication sources (there is always one for DB2 replication sources). The
reason for this is that synchronous triggers only commit (or execute) when
the source application commits. When the source application aborts or
performs a rollback, no data is inserted into the Change Data tables.
In the DProp terminology, we call such a Change Data table a Consistent

Change Data (CCD) table. For experts: DProp Apply handles heterogeneous
Change Data tables as non-complete, non-condensed internal CCD tables.
Remark: DProp Capture, DB2’s log based change capture mechanism, reads
the DB2 database log sequentially and as quickly as possible. Capture does
not wait for transactions to commit or rollback. To ensure that only committed
change data is replicated to the replication target tables, DProp Capture
maintains a global unit-of-work table (ASN.IBMSNAP_UOW) that contains
one record for every committed transaction. DProp Apply joins every Change
Data table with the global unit-of-work table when replicating from a DB2
replication source. Using this technique, change data that has not yet been
committed is hidden from the Apply process and therefore is not replicated.
6.7.3 How Apply Replicates the Changes from Non-IBM Sources

When Apply finally accesses the replication source server to replicate the
most recent changes, it transparently accesses all non-IBM database objects
through the DataJoiner nicknames. Actually, Apply has no knowledge that the
replication source tables and the change data tables are located at a non-IBM
source server.
Figure 28 graphically represents the sequence in which Apply accesses the

control tables and the change data tables to fulfill its task:

Business Applications
reg_synch
Insert
Source Table Update Pruning Control Reg_Synch Register
Delete
Prune
CCD Table
Multi-Vendor
Database
Source CCD Pruning Control Reg_Synch Register

Nickname Nickname Nickname Nickname Nickname
3 4 1 2
Other
Control Tables
DataJoiner
Database
APPLY
To Target
Figure 28. Replication from a Multi-Vendor Source Table
As the first action after Apply has connected to the DataJoiner database,
Apply executes an SQL Before statement which updates the REG_SYNCH
table (in Figure 28, this operation is marked as step 1). This only use of this
update is to invoke the reg_synch trigger, which immediately updates the
SYNCHPOINT column for all registered source tables in the register table
(as previously explained). The SQL Before statement that updates the
REG_SYNCH table is automatically added when creating a subscription set if
the source server is a non-IBM database.

Next, Apply queries the register table, as usual, to determine which change
data table belongs to which registered source table. This is shown as step 2
in Figure 28.
Still connected to the replication source server, Apply will subsequently fetch
the most recent changes to the target server, which is shown as step 3. As we
are dealing with multi-vendor sources here, the change data table is
previously fed by change data triggers (assuming that the source table was
changed since Apply accessed the source server before).
After all changes have been applied to the target server, Apply reconnects to
the source server to advance the status of the subscription with an update to
the pruning control table, which is shown as step 4. Updates to the pruning
control table will finally invoke the pruning control trigger (if it has not been
disabled as described in 5.5.13.2, “How to Defer Pruning for Multi-Vendor
Sources” on page 127) to prune all records from the change data table that
were already replicated.
6.8 Summary
We used case study 1 to give you a practical example for a data replication
application, using:
• Informix replication source servers
• A DB2 for OS/390 replication target server
• IBM DataJoiner as central database middleware
• DProp Apply to actually move the data
We satisfied the business requirements by demonstrating an

easy-to-implement advanced technique that IBM DProp provides to
consolidate distributed data into a single central table at the replication target
site. We have additionally demonstrated that the initialization of the target
table (in DProp terms: the full refresh) can be achieved without any manual
interaction. Additional source servers can be added to an existing solution at
any time.
After focusing on the implementation of the test environment that was used to
prove all techniques, we provided ideas on how to carry a tested replication
application over from a test environment to a production environment.
The final part of this chapter, showing change capture triggers at work, can be
used as a reference see how the IBM replication solution integrates
multi-vendor database systems into an enterprise-wide, cross-platform data
replication application. (It’s really that easy!)

Chapter 7. Case Study 2—Product Data Distribution, Retail
In this case study, we describe how centrally managed data can be

distributed to databases in the branch offices using IBM’s Data Replication
solution.
We will utilize the following major techniques in IBM Data Replication Solution
within this scenario to optimize the performance and the managability of the
solution:
• Replication from DB2 for OS/390 to Microsoft SQL Server
• Source-Site Join-Views
• Noncomplete, condensed internal CCDs
• Two-tier versus three-tier approach
• Pull configuration for enhanced replication performance
• Data subsetting to distribute only the data relevant to each branch
• Invoking stored procedures in the target database

A retail company that has a number of branches throughout the country wants
to implement a new decentralized inventory management system for their
branches.
A local inventory management application in each branch needs to access

information about the products, sold in the branch, including product line and
brand information. It also needs access to the supplier data for the products.
This information is managed centrally by applications in the company
headquarters, which are based on DB2 for OS/390. The new inventory
application in the branches, which will be implemented in Visual Basic, relies
on Microsoft SQL Server as the data store. A TCP/IP connection exists
between the OS/390 system at the company headquarters and the Windows
NT systems in the branch offices, but no database connections are yet in
place. Figure 29 shows the high-level system architecture:

OS/390 Company Headquarters
Central App’s.
DB2 Data Sharing Group (DB2I)
TCP/IP
WinNT Branch WinNT Branch WinNT Branch WinNT Branch

01 02 03 n
VB Inv.App. VB Inv.App. VB Inv.App. VB Inv.App.
MS SQL Server MS SQL Server MS SQL Server MS SQL Server
Figure 29. Case Study 2—High Level System Architecture
Two major approaches exist for the design of the new inventory application:
1. The inventory application accesses the required product and supplier
information directly from the DB2 for OS/390 database at the company
headquarters, using remote requests over the network link.
2. The application accesses a local copy of the required data held in the
Microsoft SQL Server database (where all the other relevant data for the
application is located as well).
The first design approach has some serious disadvantages in this scenario:
• Network outages between head offices and branches will directly affect
the availability of the new inventory application.
• The contention between the instances of the new inventory application in
the branches and the central applications will have an impact on the
performance of the central applications.
• The network traffic will increase, which will result in higher network costs.
• The performance of the local inventory application will be degraded due to
remote database requests.
These issues lead to the conclusion that the second design approach, where
local copies of the relevant data are distributed to each of the branches, is
more feasible.
The only issue that has to be resolved for the second approach is: The
distribution of copies of the data introduces redundancy into the system.
Because the required data is not static, the redundancy has to be managed to

keep all the copies up-to-date. The IBM Data Replication solution enables
this design approach by managing the distribution and currency of the copies
as well as the complexity of this process. The inventory application can
access the local database and rely on the correctness of the data.
As no history information is required at the branch offices, only the most

current change to a source table record has to be replicated. In DProp
terminology, this is called transaction-consistent replication.
Note: The IBM Data replication solution supports both transaction-based

replication and transaction-consistent replication. Transaction-based
replication will propagate every update issued by every transaction, and
transaction-consistent replication will only propagate the net results of the
recent activity.
7.1.1 Source Data Model

In Figure 30 we show the subset of the data model of the source database,
which is relevant to our case study.
Each branch will copy a subset of data from the headquarters database
corresponding to the products sold at that particular branch.
Case Study 2—Product Data Distribution, Retail 175

.
Supplier
Store_item Supp_no
Store_num Supp_Name
Prodline_no
Store
Store_Num
CompNo
Name Items
Street Item_Num
City Desc
Zip Prod_Line_No ProdLine
Region_Id Prod_Line_Num
Supp_No
Desc
Brand_Num
Sales
BasartNo
Date
StoreNo
Comapny
Out_Prc
Tax Brand
Location Brand_Num
Pieces
Transfer_Date Desc
Process_Date
Figure 30. Partial Data Model for the Retail Company Headquarters
You can refer to Table 7 on page 206 for description of the tables. Only the
STORE_ITEM table is not described there. This table holds information about
the product lines sold in each store.
7.1.2 Target Data Model

Figure 31 shows the partial data model in each branch of the retail company.
Although more information is required by the inventory application, only the
product domain of the target data model is shown, because this is sufficient
for the understanding of the case study.
The table S_PRODUCT holds the information about the products available at
a particular branch.
The table P_ITEMS holds the information about the number of ITEMS for
each product line.
For other table descriptions, please refer to Table 7 on page 206.

Supplier
S_Product Supp_no
Supp_Name
Item_Num
Desci
Prod_Line_Num
Supp_No
ProdLine
Prod_Line_Num
Desc
Brand_Num
P_Items
Prod_Line_Num
Item_Count Brand
Brand_Num
Desc
Figure 31. Partial Data Model for a Branch of the Retail Company

In coordination with the structured approach proposed in Chapter 3, we
discuss the architecture of the replication solution in terms of the system
design and the replication design.
7.2.1 Data Distribution—System Design

Data will be replicated from the company headquarters to each branch.
Changes are captured from DB2 for OS/390, then replicated to the Microsoft
SQL Server database.
This is called a data distribution configuration.
The target tables are read-only. Therefore, you do not need to set up conflict
detection. Applications can use the target tables, which are local copies, so
that they do not overload the network, and will make the load on the central
server more managable. Refer to Figure 32.

Source Server
read/write source table
Target Server Target Server
read-only target table read-only target table
Figure 32. Data Distribution with Read-Only Target Tables
Since the target is a non-IBM database, the Apply program cannot connect to
the Microsoft SQL Server directly. It will connect to a DataJoiner database
instead (with DB2 DataJoiner connected to the Microsoft SQL Server) and will
apply the changes to Microsoft SQL Server targets using DB2 DataJoiner
nicknames.
Apply issues INSERT, UPDATE or DELETE statements against the

nicknames (which, for Apply and any other DB2 client application, appear to
be DB2 objects), and DataJoiner passes the SQL statements transparently to
the remote data sources. Figure 33 summarizes the foregoing explanation.

APPLY Nickname 1 Target Table 1
Nickname n Target Table n
Apply requests Replication

Control Tables
DJ requests on
behalf of Apply DataJoiner Non-IBM
Database Target Database
Figure 33. Replicating to Non-IBM Target Tables
7.2.1.1 How Many DataJoiner Instances—One or Several?

According to the discussion in Chapter 3, we have two choices when we set
up the connection and replication between the source and the targets.
1. One DataJoiner connected to mutiple branches (see Figure 34).
Advantages of this implementation:
• Ease of administration
• Low cost (for example, license fees, roll out, administration)
But there is also a disadvantage:
• If the data volume is large, the replication performance will be poor,
because the DataJoiner instance will become a bottleneck, and Apply
will have to push all changes to the remote targets. (See Chapter 3 for
a detailed discussion of this topic.)

DB2 for OS/390
WinNT/AIX
DataJoiner
WinNT Branch WinNT Branch WinNT Branch WinNT Branch

01 02 03 n
Figure 34. One DataJoiner Connected to Multiple Store Servers
2. One DataJoiner instance at each branch office (see Figure 35).

Advantage for this implementation:
• Performance will be good, even when the data volume is large.
Disadvantages:
• Setup will be more complicated because several DataJoiner instances
must be installed.
• Higher cost due to the number of licenses.
DB2 for OS/390
WinNT WinNT WinNT WinNT

DataJoiner DataJoiner DataJoiner DataJoiner
Branch Branch Branch Branch

01 02 03 n
Figure 35. One DataJoiner for Each Branch Office
Since the data volume was acceptable in this case study, we chose the first
solution.

7.2.1.2 Apply Program Placement—Pull or Push?
Logically, the Apply program could run on any server that has connectivity to
source, target, and control server.
That is, we can choose to run the Apply program at the source server (on the
headquarters side), which is called a Push configuration, or at the target
server (on the DataJoiner side), which is called a Pull configuration.
1. In a Push configuration, the Apply program for OS/390 connects to the
headquarters source server (DB2 for OS/390) and retrieves the data. Then
it connects to the remote DataJoiner server and pushes the updates to the
target table in Microsoft SQL Server (through DataJoiner nicknames).
In a Push configuration, the Apply program pushes the updates row by
row, and cannot use DB2’s block fetch capability to improve network
efficiency.
The Push techniques are touted as reducing the overhead of having
clients continually poll the server, looking to see if there is any new
information to pull. This configuration will be sufficient when tables are
infrequently updated.
2. In a Pull configuration, the Apply program is located at the DataJoiner
server and connects to the remote DB2 for OS/390 to retrieve the data.
DB2 can use block fetch to retrieve the data across the network efficiently.
After all the data is retrieved, the Apply program connects to the
DataJoiner database and applies the changes to Microsoft SQL Server
through DataJoiner nicknames.
In a Pull configuration, the Apply program can take advantage of the
database protocol’s block fetch optimization.
We select a Pull configuration here to gain advantage of the optimized

performance compared to Push configurations.
7.2.2 Data Distribution—Replication Design

After discussing the decisions regarding the general system design, we
introduce the replication techniques used to fulfill the business requirements
of this case study.
7.2.2.1 Data Subsetting

The tables in the headquarter database are REGION, STORE, SALES,
ITEMS, SUPPLIER, PRODLINE, BRAND, STORE_ITEM.
The tables SUPPLIER, PRODLINE and BRAND will be propagated to each

store completely. Regarding the ITEMS table, each store only needs the

product-related information that is relevant to the store. So we will use a row
subsetting technique for the ITEMS table.
Basically, row subsetting can be achieved by:

• Defining a simple predicate (WHERE clause), if the replicated table
contains the subsetting columns
• Replicating a subset join, if the subsetting column is not part of the
replicated table
In our example, we had to choose row-subsetting using join views:

1. Create a join view based on STORE_ITEM table and ITEMS table. The
view ’s definition is:
CREATE VIEW DB2RES5.S_PRODUCT
AS SELECT S.STORE_NUM, I.ITEM_NUM, I.DESC, I.PROD_LINE_NUM, I.SUPP_NO
FROM LIYAN.STORE_ITEM S , LIYAN.ITEMS I
WHERE S.PRODLINE_NO=I.PROD_LINE_NUM;
This view was defined on the OS/390, which is the replication source
database. Refer to Figure 36.
Two-tier
STORE_ITEM ITEMS Replication
Item_num
Store_num
Desc
prodline_no
Prod_line_no
Supp_no
S_PRODUCT (view)
Store_num
Item_num
Desc
Prod_line_no
Supp_No
Source
Store_num=01 Store_num=nn Target
S_PRODUCT S_PRODUCT
Item_num Item_num
Desc Desc
Prod_line_no Prod_line_no
Supp_no Supp_no
Store01 Storenn
Figure 36. Replication of the Product Information

Remark: Always specify the view’s schema name. Specify all the column
names there, and also specify the correlation id after the table name, and
use the id in the where-clause. Otherwise you will get an error message
when you try to register the view as a replication source.
2. When you add a member to this subscription set, specify a where-clause:
store_num = ...;
7.2.2.2 Replication Strategy—Two-Tier or Three-Tier?

For the STORE_ITEM and ITEMS tables, since we will use a subsetting
technique (join view first, then subsetting according to store_num), we will
use the two-tier approach.
For the other tables, BRAND, PRODLINE, and SUPPLIER which are all
needed in each store with their whole contents, we will use internal CCDs to
net out hot-spots while updating the source tables. This will reduce the
number of rows that really need to be replicated, if the same record (same
primary key) is updated several times within one replication cycle.
So we will have a two-tier topology for tables STORE_ITEM and ITEMS, and
a three-tier topology for the other tables, as shown in Figure 37.

Headquarters DB2 for OS/390
STORE_ITEM ITEMS SUPPLIER
Tier 1
S_PRODUCT View
APPLY
Noncomplete
condensed
CCD Table Tier 2
Capture (internal)
APPLY
DJDB Nicknames
Middleware
DataJoiner
for NT
Tier 3
PRODUCT SUPPLIER PRODUCT SUPPLIER
MS SQL Server
MS SQL Server
Store 01 NT Store 02 NT
Figure 37. Three-Tier Replication Architecture

About the CCD Tables
We use condensed, noncomplete internal CCD tables in this case study,
because condensed CCD tables ensure that only the net change for a row
are replicated to the targets, and therefore it reduces the network load.
Noncomplete CCD tables contain only the modified rows from the source
table.
The CCD table that we created for the SUPPLIER table is called CCDSUPP.
We use internal CCDs to benefit from the following advantages:

1. The join between the CD tables and the UOW table will be performed only
once, regardless of the number of subscriptions (stores).
2. "Hot spot" updates to the same row will be eliminated; using condensed
CCD, only the last image of the row will be kept, and only the most current
row image will propagate to the targets, not each and every change.
7.2.2.3 Invoking Stored Procedures at the Target Database

After replicating the source data, the store offices also need some specific
information for their applications: this information is to be calculated from the
replicated data, and a good way to do this is to use a stored procedure. So
we illustrate here the DProp ability to call a stored procedure before or after
the processing of a Subscription Set. Together with DataJoiner’s capability of
creating nicknames for remote non-IBM stored procedures, it is even possible
to invoke a Microsoft SQL Server stored procedure with Dprop Apply.
In this case study, we used the following technique to fulfill this task:
1. In the Microsoft SQL Server database, we created the following stored
procedure in the target database:
CREATE PROCEDURE compute_item AS
delete from p_items
insert into p_items select prod_line_num, count(item_num) from s_product
group by prod_line_num
This stored procedure will aggregate the product numbers for each
product line sold in the store. The first statement is used to clear the
historic data. And the second part of the stored procedure computes the
current aggregate data, then inserts into the aggregation table.
Each time the Subscription Set is processed, this stored procedure is
called.

2. Next, we created the stored procedure nickname in the DataJoiner
database:
Create Stored Procedure Nickname c_item for infodb1.dbo.compute_item;
3. Finally, we added the stored procedure to the Subscription set definition,
using DJRA (see Figure 47 on page 200).

This section introduces the detailed steps and tips used to set up the test
environment for this case study . Basically, we will work along the
implementation checklist that is provided in Chapter 4.

Figure 38 shows the topology of the environment for this case study.

DB2 for OS/390 Data Sharing Group DB2I
DProp DProp
Capture Apply
OS/390
wtscpok
DProp
Apply
DataJoiner V2.1
DJDB
MS SQL Server
Client
Intel, WinNT Server
MS SQL Server V7 MS SQL Server V7

infodb 1 infodb n
Intel, WinNT Server Intel, WinNT Server

branch1 branch n
Figure 38. Case Study 2—System Topology
In Figure 38, you can identify the following major components:

1. The company headquarters—source site:
• DB2 for OS/390 V5.1.1
• DProp Capture and Apply for MVS
2. The DataJoiner server:
• DataJoiner for NT V2.1.1, including DJRA
• Microsoft SQL Server V7.0 client code (ODBC driver)

3. The branches - target sites:
• Microsoft SQL Server V7.0.
In this environment, we use TCP/IP to connect all the systems.

This section documents the setup tasks, as we used them to prepare the
systems for case study 2. This section refers to Chapter 4 for all general
setup activities that are specific to this case study.
7.3.2.1 Setup the Database Middleware Server

One of the NT servers used in this environment is used as IBM DataJoiner
middleware server.
We assume that all the Microsoft SQL Servers in the branches are already
installed and running.
Set up DataJoiner to access the Microsoft SQL Servers

Based on step 1 and step 2 of the implementation checklist that was
introduced in Chapter 4, the first task is to enable native SQL Server
connectivity between DataJoiner and all the Microsoft SQL Servers. The
following steps were necessary to connect the DataJoiner database to all the
SQL Server instances:
• Install the Microsoft SQL Server Client Code on the DataJoiner machine.
• Use SQL Server Client Network Utility on the DataJoiner server to set up
the connections to each of the stores SQL Server databases.
Use the General tab in this tool (see Figure 39), select TCP/IP as Default
network library, then click the Add... button to add a server. You can refer
to the Microsoft SQL Server manuals for more details.

Figure 39. Configure Microsoft SQL Server Client Connectivity
To check the success of this configuration step we used the SQL Server
Enterprise Manager to natively connect to all the SQL Server instances.
Prepare DataJoiner to Access the Microsoft SQL Server Systems

The DataJoiner release that we used to set up this environment is IBM DB2
DataJoiner for NT 2.1.1 with PTF6.
Register the SQL Server Databases as ODBC data sources.
Create your DataJoiner Database

One single DataJoiner database can generally be used to access multiple
backend database systems.
CREATE DATABASE DJDB;
Create Server Mappings for Microsoft SQL Server Systems

A server mapping specifies how DataJoiner will subsequently access a
remote data source.
connect to djdb;
create server mapping from infodb1 to node "branch1" database "infodb1"
type mssqlserver version 7.0 protocol "djxmssql";

Create Server Options
When using DataJoiner as a gateway for DProp replication, we recommend
that you always set the server option TWO_PHASE_COMMIT to NO.
create server option TWO_PHASE_COMMIT for server infodb1 ’N’;
Create User Mappings

DataJoiner user mappings are used to map DataJoiner userid’s and
passwords to non-IBM database userid’s and passwords.
create user mapping from grohres3 to server infodb1 authid "sa" Password
"password";
Connect to the Replication Source Server (DB2 for OS/390)

We used DataJoiner’s built-in DRDA requester to connect to the replication
source server. The DRDA connectivity was implemented using the TCP/IP
network protocol.
• Open the hosts file on the DataJoiner server, and insert the following line:
9.12.14.555 wtscpok
• Make sure you can ping the host (wtscpok), and then execute the
following steps from the DB2 command line processor:
catalog tcpip node hostdb remote wtscpok server 33320;
catalog database sj390db1 as sj390db1 at node hostdb authentication dcs;
catalog dcs database sj390db1 as sj390db1;
• Test the connection:
connect to sj390db1 user db2res5 using pwd;
• Bind the DB2 utilities to the data source:
bind @ddcsmvs.lst blocking all grant public;
7.3.2.2 Implement the Replication Subcomponents

This paragraph refers to the installation of DProp Capture and DProp Apply
corresponding to the system topology outlined in Figure 37 on page 184.
Install and set up DProp Capture

Install DProp Capture on the OS/390 source system. Refer to “Step 12: Install
and Set Up DProp Capture, if Required” on page 71 for more details.
Install and Setup DProp Apply

Install DProp Apply on the OS/390 system to support the three-tier approach
for the BRAND, PRODLINE, and SUPPLIER tables.
You do not need to install DProp Apply on the DataJoiner server because
Apply has already been installed with DataJoiner.

Refer to , “Step 13: Install and Setup DProp Apply, if Required” on page 71 for
additional information.
7.3.2.3 Set Up the Replication Administration Workstation

Set up DJRA preferences, so that it can access the headquarters database
(SJ390DB1) and the DataJoiner database (DJDB).
Open DJRA, select File => Preferences, then click the Connection tab, and
set the userid and password for the source and target.
7.3.2.4 Create the Replication Control Tables

You must create Control tables at the Source Server and also at the
DataJoiner server (which acts as the Control Server in this scenario).
Create the Control Tables at the Replication Source

Use the DJRA function Create Replication Control Tables to create all the
DProp control tables at the replication source server. Select SJ390DB1 in the
panel, then generate the SQL script.
Create the Control Tables at the DataJoiner Server

Use the DJRA function Create Replication Control Tables to create all the
DProp control tables in the DataJoiner database (see Figure 40).
Figure 40. Creating Replication Control Tables with DJRA
7.3.2.5 Bind DProp Capture and DProp Apply

After the control tables are successfully created, we are able to bind DProp
Capture and DProp Apply.
Bind DProp Capture

Use the job provided with the Capture for OS/390 installation media to bind
the Capture component against the source database.

Bind DProp Apply
Use the job provided with the Apply for OS/390 installation media to bind
Apply for OS/390 against the source database. You can refer to “Step 22:
Bind DProp Apply” on page 75 for details.
We must also bind the Apply component that is included in DataJoiner

against the replication source server (SJ390DB1), all replication target
servers (DJDB), and the replication control server (DJDB).
At the DataJoiner instance, change the directory to SQLLIB\BND and use the
following statements to bind Apply:
Connect to SJ390DB1 user db2res5 using pwd;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;
Connect to DJDB;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;

In this section we describe which replication sources and targets have to be
defined to distribute the required data to the branch offices.
7.4.1 Define DB2 for OS/390 as Replication Source

Use DJRA to generate and run the SQL script to define the OS/390 tables as
replication sources (see Figure 41).

Figure 41. Register Table ITEMS as a Replication Source
Remark: You must first register all the tables, before defining the join view as
replication source, then register the S_PRODUCT view (see Figure 42).
For the Column capture policy, select After-images only (Option: both
before-images and after-images would be used in an auditing scenario, for
example).
For Update capture policy, if the souce tables’ primary key or partition key
could be updated, then you would have to choose Updates as delete/insert
pairs. Here we simply need Updates captured as updates.
Since this case study is not an update-anywhere scenario, Conflict detection

level has to be set to None here.
Remark: If the Capture program is running while you are defining a new
replication source, you will have to reinitialize Capture so that it takes the new
registration into account.

Figure 42. Register Views as Replication Sources
7.4.2 Define Empty Subscription Sets

In this case study, we create several subscription sets, some for the internal
CCD tables (subscription set names: SETBRAND, SETSUPP, SETPROD)
and another for the User Copy tables (target tables in SQL Server;
subscription set name: SETLY). See Figure 43 on page 195 and Figure 44 on
page 196.
Logically you will create the Subscription Sets for the CCDs first, and then the
Subscription Set for the User Copy tables.
For the CCD Subscription Sets, the Apply Qualifier is AQCCD.
For the Copy Tables Subscription Set, the Apply Qualifier is AQLY (the Apply
Qualifier is used in the command to start Apply, and it is also used as part of
the password-file name).
Remember: The Apply Qualifier is case-sensitive!

Figure 43. Use DJRA to Create Empty Subscription Set.
We also specify the time interval for this Subscription set as 1440 minutes,
which means 24 hours.
We can see from Figure 43 that there is another parameter named Blocking
factor. The value you specify here will be the MAX_SYNCH_MINUTES value.
If a blocking factor is specified, Apply takes this factor into account when
selecting data from the change data tables (either CD or CCD). If the time
span of queued transactions is greater than the numbers of minutes specified
by MAX_SYNC_MINUTES, Apply will try to convert a single subscription
cycle into many mini-cycles, cutting the backlog down to manageable pieces.
But, in doing so, Apply will never cut transactions into pieces. A transaction is
always replicated completely, or not at all. This reduces the stress on the
network and DBMS resources and reduces the risk of failure.
We specify the MAX_SYNC_MINUTES here as 6 hours. That is, every 24

hours, Apply will run, and it will try to subset the change data into 4 subsets.
That Apply instance will slice the propagation backlog into several
mini-subscriptions, each mini-subscription taking a maximum of 6 hours of
changes.
For performance considerations, we select the target database as the control

database, because the Apply program runs in the same machine; this can
save Apply from going to the network to access the subscription definitions.

When creating the subscription sets for the CCD tables, we use SJ390DB1 as
the control server and the target server (see Figure 44).
Figure 44. Create Empty Subscription Sets for CCDs
7.4.3 Create a Password File

Only the UNIX platforms and Windows platforms need a password file. It
contains the following information:
SERVER=SJ390DB1 USER=db2res5 PWD=pwd
SERVER=DJDB USER=grohres3 PWD=pwd
Save the file as AQLYdb2DJDB.PWD in the directory where you will invoke the
Apply program. AQLY is the value of Apply qualifier we defined in the
previous step (see Figure 43 on page 195).
If you want to start Apply as a Windows NT service, please refer to DB2

Replication Guide and Reference, SR5H-0999.
7.4.4 Add Members to the Subscription Sets

First you add members to the CCDs subscription sets (see Figure 45).

Figure 45. Add a Member to Subscription Sets
Note: In this step, the CCD table is an internal CCD; we used DJRA’s target
table logic user exit to customize the create tablespace statements for the
CCD table’s tablespace.
The following is the DB2 for MVS part of the target table logic file:
SAY "-- in TARGSVR.REX";
SUBLOGIC_TIME_SUFFIX=SUBSTR(TIME(’L’),4,2)||,
SUBSTR(TIME(’L’),7,2)||,
SUBSTR(TIME(’L’),10,2);
SELECT
WHEN SUBSTR(IN_TARGET_PRDID,1,3)="DSN" THEN; /* DB2 FOR MVS */
DO; /* CREATE A TABLESPACE FOR THE TARGET TABLE */
SAY "-- About to create a target table tablespace";
SAY "CREATE TABLESPACE TS"||SUBLOGIC_TIME_SUFFIX;
SAY " IN SJ390DB1 SEGSIZE 4 LOCKSIZE PAGE CLOSE NO CCSID
EBCDIC;";
OUT_TARGET_TABLESPACE="SJ390DB1.TS"||SUBLOGIC_TIME_SUFFIX;
END

Once you have added the members to the CCDs subscription sets, you must
add members to the User Copy subscription sets.
Attention: The source tables you choose are always the real tables, not the
CCDs. This would be different if you had defined external CCDs instead of
internal CCDs, because in the case of external CCDs, it is the CCDs that are
indicated as sources for the dependent target tables.
So when you use internal CCDs, the CCDs are really transparent. You define
them as targets, but you never refer to them afterwards. Apply will, of course,
take the internal CCDs into account when servicing the subscriptions.
When we add a member to a subscription set, we can specify a where-clause

to indicate the subset of data we want to replicate to the target. Indicate a
where-clause for the PRODUCT table (see Figure 46).
Remark: You must not indicate the word where in the where-clause input
field.
The DJRA panel should look like this:
Figure 46. Data Subsetting

In this step, we specify the target table qualifier as grohres3; it is a DataJoiner
user, which is the SQL Server authentication id.
You should pay attention to the following items in the generated SQL:
The table it creates in the SQL Server database has the default schema
"dbo", but DJRA will fetch the REMOTE_AUTHID from the
SYSIBM.SYSREMOTEUSER table: "sa".
In the generated SQL, it will use "sa" as the table schema when creating
nicknames and indexes, so you should update the SQL script, and change
"sa" to "dbo". If you can create a user with the same login id and username in
Microsoft SQL Server, then there will be no need to update the SQL Script.
Note: Since DESC is a reserved word in SQL Server, you cannot create a
table with this column name in the SQL Server database (it will report an
ODBC error 37000), so you should update the generated SQL manually, and
update the target table column name. Refer to Appendix D.3, “Add a Member
to Subscription Sets” on page 342.
7.4.5 Add Statements or Stored Procedures to Subscription Sets

As you can see in Figure 47, we indicate that the stored procedure must be
called after the data has been replicated.
The native Microsoft SQL Server stored procedure developed in 7.2.2.3,

“Invoking Stored Procedures at the Target Database” on page 185, is
referenced by name (c_item).

Figure 47. Add Stored Procedure to Subscription Sets
7.4.6 Start DProp Capture and Apply on the Host

Refer to the DB2 Replication Guide and Reference, S95H-0999" Capture and
Apply for MVS", for a detailed description of how to operate these
components in an OS/390 environment.
7.4.7 Start DProp Apply on the DataJoiner Server

To do this, issue the following command from a Windows NT command
window:
asnapply AQLY DJDB
You also can specify a trace for the Apply program using the following
command:
asnapply AQLY DJDB trcflow;
This can help you when there is something wrong: You can get the error
messages and sqlcode from the trace information. You can also record the
trace information in a file by running the following command:
asnapply AQLY DJDB trcflow > filename;

7.5 Summary
In this chapter we have shown one of the most common replication scenarios,
a data distribution from a central site to a number of locations. We
demonstrated how to use IBM’s replication solution to distribute data to
Microsoft SQL Server databases. We showed the advantages of a pull
configuration for Apply, as well as a three-tier configuration with an internal
CCD table to fan out copies to the target systems without putting a burden on
the data source. We demonstrated the powerful data subsetting capabilities,
and we showed you how to invoke native stored procedures within the target
database.

Chapter 8. Case Study 3—Feeding a Data Warehouse
This case study concentrates on the design and implementation of populating

a data warehouse by using log based change capture from a central
production system running On Line Transaction Processing (OLTP) types of
applications. We intend to capture data changes from the production system
using DProp Capture and maintain historic information within the data
warehouse by using DProp Apply.
The specific objectives of the case study are to demonstrate how DProp can
be used in a data warehousing environment to:
• Populate and maintain a data warehouse in a non-IBM database.
• Show how join replication can be used to denormalize data.
• Describe how temporal histories can be automatically maintained by
DProp within the data warehouse.
• Demonstrate how DProp can automatically maintain aggregations of data
within the data warehouse.
In this chapter we will also describe a technique for pushing down the
replication status to a non-IBM database. This is not specifically a data
warehousing issue, but it is, nevertheless, a useful trick.

In this case study, a retail company has a number of retail outlets which
collect electronic point-of-sales (EPOS) information. The information is
consolidated from the outlets to the head office in order to perform stock and
order analysis. Stock is dispatched to the outlets from a central warehouse
when product levels at that outlet fall below a certain threshold. New product
items are also ordered from suppliers when stock within the warehouse is
low.
To remain competitive and streamline its operations the company wishes to

analyze the sales data collected by the EPOS terminals. The company
wishes to hold up to 2 years worth of sales and related data in the data
warehouse. The production system currently only maintains the most recent
7 days worth of sales data.
This new business intelligence (BI) application will enable the company to
control their inventory more closely and manage their supply chain more

efficiently. In order to analyze trends and forecast the demand based on the
sales data, the data warehouse must provide historic, time-consistent data.
The retail company has decided to utilize an existing Oracle server to act as
the data warehouse store. This server is located within the head office.
Figure 48 summarizes the business environment of this organization.
Head Office
Retail Outlet
EPOS
DB2 for OS/390
Data Sharing Group
Retail Outlet
Stock Ordering and
EPOS
Distribution
Application
.
..
Oracle
Data
Warehouse Retail Outlet
Server EPOS
Figure 48. The Business Environment
The data warehouse is to be populated directly from the OS/390 production

system. The production system also contains global store and product
information which is not available in the outlets.
This case study documents the process of establishing a replication

environment which feeds an Oracle data warehouse from a production
system running DB2 for OS/390. This data flow is shown with a bold arrow in
Figure 48.
Note: The replication techniques introduced in this case study will show
solutions for some of the most common issues in populating data warehouses
or data marts, and will be applicable for many other data warehousing
situations.

To understand the operations, transformations, and denormalization which
are described in this case study, we must first of all understand the source
data model along with the proposed data model of the data warehouse.
8.1.1 Source Data Model

The Entity-Relationship (E-R) diagram in Figure 49 shows the data model for
the source data which resides on the production DB2 for OS/390 database.
Supplier
Supp_no
Supp_Name
Store
Store_Num
CompNo
Name Items
Street Item_Num
City Desc
Zip
Prod_Line_No
Region_Id
Supp_No
Sales
BasartNo
Date
Location
Company ProdLine
Out_Prc Prod_Line_Num
Tax Desc
Pieces Brand_Num
Transfer_date
Process_Date
Region
Region_Id
Region_Name
Contains_stores
Brand
Brand_Num
Desc
Figure 49. Data Model Diagram of Source Data
Case Study 3—Feeding a Data Warehouse 205

To minimize redundant data on the production system, the data is stored in
a highly normalized form. The seven tables which compose the source data
model are described briefly in Table 7.
Table 7. Source Data Tables
Table Name Description Approx.

Rows
Sales The central table which records information relating 790,000

to a particular sales transaction. Each record is 1 (7 days
EPOS transaction. worth of
data)
Items Contains 1 row for each product which the company 38,000
sells.
Supplier Holds information about the suppliers of the 6,500

products.
ProdLine Holds information relating to the product lines into 2,300

which the products are grouped.
Brand Holds information about the brands which the 469

product lines are associated with.
Store Contains information on the stores through which 3,000

the company sells its products.
Region Contains geographical information on the location of 41

the stores.
8.1.2 Target Data Model

The E-R diagram in Figure 50 shows the design of the data model for the
target data warehouse within Oracle. The data model shows a typical data
warehouse or data mart approach, which is suitable for multi-dimensional
analysis.

Suppliers
Supp_Number
Supp_Name
Valid_From
Valid_To
Outlets
Store_Num
CompNo
Name Products
Street Item_Num
City Item_Description
Zip Prod_Line_Num
Region_Id Supp_Num
Region_Name Product_Line_Desc
Valid_From Brand_Num
Valid_To Sales Brand_Description
BasartNo IBMSNAP_LOGMARKER
Sale_Date
EXPIRED_TIMESTAMP
Pieces
Out_Prc
Tax
Location
Comapny
Transfer_Date
Process_Date
Time
Date
Day_of_Year
Day_of_Month
Day_of_Week
Week
Month
Quarter
Year
Holiday_Ind
Season_Ind
Figure 50. Data Model Diagram of Target Data
The Valid_From and Valid_To columns in Outlets and Suppliers and the
IBMSNAP_LOGMARKER and EXPIRED_TIMESTAMP columns in Products
enable those tables to maintain temporal histories and will be created
manually. For more detailed information refer to 8.4.6, “Adding Temporal
History Information to Target Tables” on page 250.
The data model shown in Figure 50 does not show all of the DProp control
columns. These columns are added to target tables during the subscription
process and are automatically maintained by DProp.
IBMSNAP_LOGMARKER is shown because it is used by the data warehouse
applications as the start of a record’s validity period.

Table 8 provides a brief description of the tables in the target data model:
Table 8. Target Data Tables
Table Name Description Approx.

Change
Volume
Sales The central fact table in a star schema which 120,000

maintains a history of sales records over a rows per
period of 2 years. day
Products The denormalization of the Item, ProdLine, 20 rows

and Brand source tables which maintains a per week
history of changes to products and their
attributes.
Suppliers A simple copy of the Supplier source table 2 rows per
which maintains a history of changes to week
suppliers.
Outlets Denormalization of Store and Region source 1 row per
tables describing stores and their locations. month
This is actually a view defined over the Sales
and Region tables which are replicated to the
target as individual tables.
Time Contains 1 row for each day in the analysis none

period. It is a standard dimension in nearly
every multi-dimensional data model and is
used to identify special date/time related
information such as holidays or season
indicators. It is also used to enable grouping of
the analysis results according to weeks,
months or quarters.
The data has been denormalized into a star schema with Sales as the central
fact table and three dimensions for Products, Outlets and Time.
Denormalization was performed in order to aid query performance. More
complex data warehouses are likely to have more than three dimensions, but
for the purpose of this study, three will suffice.
Since the Time dimension table does not require any DProp replication
definitions to be maintained, it will not be discussed further in this book.
Section 8.4, “Implementing the Replication Design” on page 217 of this

chapter, takes each of the target tables in turn and details the DProp steps
necessary to assemble the target from the source tables provided. Once all

these steps have been completed, the data model in the warehouse will
resemble the one described in Figure 50 on page 207. However, first of all,
we need to describe how the replication solution is architected.

In this section, we will follow our structured approach, introduced in Chapter
3, and discuss the system and replication design in the context of the data
warehouse scenario described above.
8.2.1 Feeding a Data Warehouse—System Design

The following recommendations regarding the placement of the different
system components apply to our data warehouse case study.
Both the type and location of the replication source and replication target are
fixed by business requirements. Since the source database is fixed, the
placement of DProp Capture is also fixed: Capture must be co-located with
the source database. The placement of all other components, such as
DataJoiner, Apply and the Replication Administration Workstation are
variable.
The configuration shown in Figure 51 on page 214 co-locates Apply and

DataJoiner on the same physical machine as Oracle. This configuration
provides the maximal performance option since it reduces the number of
network hops. In DProp terminology the topology of this environment is
known as a pull configuration because Apply is co-located with the target and
"pulls" the data from source to target. If Apply for DB2 on OS/390 was used in
this configuration, it would "push" the data to the Oracle target.
Pull configurations generally perform better than push configurations

because in a pull configuration, individual rows from an SQL result set can be
blocked and sent across the network in large chunks—this is generally more
efficient than sending one row at a time across the network. Since data
warehousing scenarios are likely to deal with large amounts of data, the
optimal performance configuration was adopted for this business problem.
The placement of the Replication Administration Workstation does not have

any bearing on replication performance and can be located on any Windows
32-bit workstation.

8.2.2 Feeding a Data Warehouse—Replication Design
This case study describes DProp procedures to implement four common data
warehousing techniques. The four techniques described are:
• Data transformations
• Denormalization
• Maintaining history information
• Maintaining summary information
These techniques are outlined below. Suggested methods of implementing

them using DProp are described in the remainder of this chapter.
8.2.2.1 Data Transformations

It is often necessary to transform and manipulate source data before placing
it into a warehouse. These transformations change the data into a form which
is more suitable for warehouse applications and business analysts.
Transformations are also often used to translate cryptic codes (such as
Region_Id) into more meaningful descriptive information (such as
Region_Name).
As a general rule of thumb, DProp can perform any data transformation which
can be expressed in SQL by using views over source tables, staging tables or
target tables. Alternatively, more complex transformations can be achieved by
executing SQL statements or stored procedures (either DB2 or multi-vendor)
at various stages during the subscription cycle. The SQL or stored procedure
can operate against:
• Any table at the replication source system, including replication source
and change data tables.
• Any table at the replication target system, including replication target
tables. The SQL statements or stored procedures can be executed before
or after the answer set is applied.
8.2.2.2 Denormalization
Database systems used for on-line business critical applications are tuned for
high volume transaction processing. Typically this requires the data to be in a
highly normalized form (as shown in Figure 49 on page 205). This form is
optimized for fast SQL insert and update transactions, but not for the selects
which will be used in the warehouse environment. A common technique in
data warehousing is therefore to hold the data in a denormalized form within
the warehouse—thus facilitating faster response to queries. The process of
introducing redundancy and structuring the data according to business needs
instead of application needs is known as denormalization.

Two techniques are described for denormalizing data within this chapter.
They are:
• Denormalization by using a target site join—see 8.4.3, “Using Target Site
Views to Denormalize Outlet Information” on page 228.
• Denormalization by using a source site join—see 8.4.4, “Using Source
Site Joins to Denormalize Product Information” on page 237.
Other techniques are available with DProp for denormalizing data. For
example, creating views over staging tables or simulating outer joins. These
techniques are not specifically covered in this chapter, but use the same type
of procedures as those which are described in detail.
8.2.2.3 Maintaining History Information

The ability to provide the foundation for historical data analysis is one of the
major differentiators of business intelligence or data warehouse solutions as
opposed to OLTP systems. Typically, the production database will only
contain the most recent value for a particular entity (for example, the
customer record, or supplier information for a product). It does not contain
information on the value of a certain attribute as of two years ago, or
information on how that entity has evolved over time (for example,
information about how the packaging of a product has changed over time).
In a data warehouse environment, keeping history information is essential in

order to analyze trends over time.
One approach to establish a record of history is to prevent deletes (and

updates) in the production data from being replicated to the data warehouse
environment. Instead, these events result in a new record associated with the
type of operation and a timestamp of the event, being inserted into the data
warehouse history table. This technique is especially well suited for audit
trails, and for analyzing churn patterns in the data.
DProp provides the so-called Consistent Change Data Tables (CCD Tables)
as a solution for history tables of this type. See the DB2 Replication Guide
and Reference, SR5H-0999 for a basic introduction on CCD tables.
A more complex requirement for time-series is very common in relational

snowflake or star-schema models for data warehouse or data mart solutions.
It deals with modeling slowly-changing dimensions of such a data model.
The structure of a star-schema or snowflake data model usually consists of a

central fact table, which contains the measures to be analyzed for a particular
business domain (for example, sales revenues, number of items ordered) and

a multipart key, that relates these measures to the dimension tables. The
dimension tables hold the various selection and grouping criteria along which
the analysis is performed (for example: product, supplier, or customer
information).
The fact table usually records events (such as a sale). An event is associated
with a single date or timestamp. Events are inserted periodically (for example,
daily or weekly) into the fact table, building a history of events over time.
The attribute values recorded in the dimension tables (for example the
supplier information for a product) are usually valid for a certain period of
time. For example, product X was supplied by supplier A from 1997-02-01 to
1997-12-31. After this time period the supplier for product A was switched to
supplier B.
If we just propagated the update of the supplier information from the

production system to the dimension table, we would implicitly associate the
new supplier with all events in the fact table, even those recorded prior to the
change (that is, before 1997-12-31).
In section 8.4.6, “Adding Temporal History Information to Target Tables” on

page 250 we show how DProp can be used to automatically maintain validity
time periods for attributes in dimension tables, when an update to this
attribute occurs in the production table. The validity time period is denoted by
two date/timestamp columns, that show the beginning and the end of the
validity of a specific value for that attribute.
This technique enables correct time-based analysis in a multidimensional

data model with facts and dimensions. It has many advantages over the
traditional full extract and load technique of populating data warehouses.
Some of these advantages are listed below:
• A more granular validity period can be maintained by DProp. The
granularity period for the extract and load technique is limited to the time
between the source data snapshots used to populate the warehouse. The
granularity of the DProp maintained period can be accurate to the
sub-second and can easily be changed using SQL date and timestamp
functions.
• The extract and load technique involves keeping several snapshots of the
data. These snapshots contain redundant data for all records which have
not changed since the previous snapshot (thus consuming needless
network bandwidth and disk space). With DProp, only the changes to the
data are replicated.

• The snapshot method does not reveal a complete history of the changes
made at the source. If a record changes multiple times between
snapshots, only the last change will be replicated. Therefore, a warehouse
maintained from this technique will only provide partial history information,
and does not tell the whole story. DProp will provide a complete history of
all changes made at the source, maintaining a complete history.
8.2.2.4 Maintaining Summary Information

Aggregate or summary tables contain summaries of information from the
base table. The summaries can be calculated at the data warehouse when a
query is issued, or they can be automatically maintained by the process
which feeds the warehouse. If they are automatically maintained by the
feeding process, then Business Intelligence applications which require the
summary information can perform a simple table lookup (rather than an
expensive query which may involve a table scan).
The technique discussed in 8.4.7, “Maintaining Aggregate Information” on

page 256 describes an advanced DProp technique for maintaining summary
information with no impact on source tables.

This section details the steps required to setup the heterogeneous replication
environment used in this case study. It uses the implementation checklist
described in 4.3, “Setting up a Heterogeneous Replication System” on page
63. Only those steps from the checklist which require further explanation are
described in detail below.

To summarize the discussion about the placement of the system components,
and to provide an overview of the decisions made so far, the system
architecture of the solution is shown in Figure 51.

OS/390
Capture
Windows NT
Replication DB2 for OS/390 V5.1.1
Administration
Workstation
DJRA
DB2 CAE
v2.1.1.140
TCP/IP
TCP/IP
RS/6000 J50
DDCS AIX v4.3.1
Apply
SQL*Plus
DataJoiner V2.1.1
Oracle V8.0.4
Net8
Data Warehouse

Let us now review the detailed steps necessary to implement the replication
infrastructure for this case study, according to the checklist described in
Chapter 4.
8.3.2.1 Set Up the Database Middleware Server

DataJoiner requires Oracle’s Net8 (for Oracle8) or SQL*Net (for Oracle
Version 7) client code to be installed on the same machine as DataJoiner
before it can access Oracle databases. Refer to the Oracle Net8
Administrator’s Guide, A58230-01 for more information on how to install and
configure Net8.
Advice: Net8 only provides the communication between Oracle client and
Oracle database server. It does not provide a command line interpreter where

Oracle commands and SQL can be executed. This feature is provided by
another Oracle product called SQL*Plus. It is a good idea to install SQL*Plus
at the same time as Net8 so that native Oracle connectivity between client
and server can be tested—see Appendix B.1.2, “Using Oracle’s SQL*Plus” on
page 325 for more details. Once native connectivity has been verified,
proceed to “Step 3—Install the DataJoiner software” of the implementation
checklist.
When installing the DataJoiner product on the AIX machine, remember to

select the DProp Apply sub-component. After installing the DataJoiner code,
create the Oracle data access module using the djxlink.sh shell script. For
more details, refer to “Step 4—Prepare DataJoiner to access the remote data
sources” of the Implementation Checklist.
Now create the DataJoiner instance using db2icrt, and create the DataJoiner
database that will be used to access the Oracle database. The following
syntax was used to create the DataJoiner database for this case study:
CREATE DATABASE djdb
COLLATE USING IDENTITY
WITH "DataJoiner database";
As recommended in the DataJoiner Planning, Installation and Configuration

Guide for AIX, SC26-9145, the COLLATE USING parameter is set to IDENTITY.
This is because replication will store binary data (CHAR(nn) FOR BIT data in
some DProp control columns) at the remote data source.
Once the DataJoiner database has been successfully created, configure DB2
database connectivity between DataJoiner and the DB2 for OS/390
subsystem which is to act as the replication source. Connectivity for this case
study is established using DRDA over TCP/IP using the following node and
database definitions:
CATALOG TCPIP NODE DB2INODE REMOTE MVSIP SERVER 33320;
CATALOG DCS DATABASE SJ390DB1 AS DB2I;
CATALOG DATABASE SJ390DB1 AT NODE DB2INODE AUTHENTICATION DCS;
At the same time, configure DataJoiner to accept TCP/IP clients by updating

the Database Manager Configuration. The Replication Administration
Workstation will need to be able to connect to the DataJoiner database and
will also use DataJoiner as a DRDA gateway to communicate with the DB2
for OS/390 database. Refer to “Step 8—Enable DB2 clients to connect to the
DataJoiner databases” of the Implementation Checklist for more details.
Now that all DB2 connectivity has been established and verified, configure
connectivity from DataJoiner to Oracle. This connectivity is configured by

populating the DataJoiner catalog tables with information about the remote
database. The steps below outline the procedure to configure this
connectivity:
1. Identify the remote database to DataJoiner by creating a Server Mapping.
The following Server Mapping was used to identify the Oracle target
database in this case study:
CREATE SERVER MAPPING FROM azovora8 TO NODE "azov"
TYPE ORACLE VERSION 8.0.4 PROTOCOL "net8"
CPU RATIO 1
IO RATIO 1
COMM RATE 16000000;
2. Tell DataJoiner what options to use when accessing the remote database.
The following Server Options where set for this case study:
CREATE SERVER OPTION fold_id FOR SERVER azovora8 SETTING ’n’;
CREATE SERVER OPTION fold_pw FOR SERVER azovora8 SETTING ’n’;
CREATE SERVER OPTION password FOR SERVER azovora8 SETTING ’y’;
The fold_id and fold_pw options tell DataJoiner not to fold the userid and
password specified in the CREATE USER MAPPING statement to either upper or
lower case. The password option informs DataJoiner that it must send a
password when connecting to Oracle.
3. Use the CREATE USER MAPPING statement to map the DataJoiner user to a
user who is valid at the remote database. The following user mapping is
defined for this case study:
CREATE USER MAPPING FROM USER TO SERVER azovora8
AUTHID simon PASSWORD pwd;
More details on configuring DataJoiner to access non-IBM databases can be

found in the DataJoiner Planning, Installation and Configuration Guide,
SC26-9150 for your platform and the DataJoiner Application Programming
and SQL Reference Supplement, SC26-9148 .
Now that all database connectivity has been configured and verified, we need
to start implementing the DProp Capture and Apply components.
8.3.2.2 Implement the Replication Subcomponents (Capture, Apply)

Install and configure Capture for MVS on the OS/390 source system. In
addition, install Apply if it has not already been installed on the DataJoiner
AIX machine. Refer to 4.4.2, “Implement the Replication Subcomponents
(Capture, Apply)” on page 71 for more detailed information.

8.3.2.3 Set Up the Replication Administration Workstation
Choose a 32-bit Windows platform to act as the Replication Administration
Workstation and configure DB2 connectivity from this workstation to the
source DB2 for OS/390 system and the DataJoiner database.
Install the DataJoiner Replication Administration software (DJRA) from the

DataJoiner CD, or download the latest version of DJRA from the Web. This
tool acts as the replication administration focal point for configuring and
monitoring DProp replication. For more details, refer to “Step 16—Install
DJRA, the DataJoiner Replication Administration software” of the
Implementation Checklist.
When DJRA has been installed, proceed to “Step 17—Set up DJRA to access
the source and target databases” of the Implementation Checklist to enable
DJRA to communicate with all databases within the replication scenario. In
this case study, this means DB2 for OS/390 and DataJoiner (because
DataJoiner is used to establish connectivity to the Oracle database).
8.3.2.4 Create the Replication Control Tables

Use the Create Replication Control Tables feature of DJRA to create the
replication control tables at the source DB2 for OS/390 sub-system. Use the
same feature to create the replication control tables at the DataJoiner
database which is to act as the replication control server (DJDB). See 4.4.4,
“Create the Replication Control Tables” on page 74 for more information.
8.3.2.5 Bind DProp Capture and DProp Apply

Use the instructions detailed in 4.4.5, “Bind DProp Capture and DProp Apply”
on page 74 to bind Capture against the source database (SJ390DB1) and
Apply against the source, target and control databases (SJ390DB1 and
DJDB).
Once the bind has completed, you are now ready to start defining replication
sources (called registrations) and their associated targets (called replication
subscriptions). A summary of the steps required to configure the replication is
detailed in 8.4, “Implementing the Replication Design” on page 217.

The following summarizes the implementation steps required to configure the
replication environment for this case study. The steps are then expanded and
explained in more detail in the remainder of the chapter.

Each step described below generates an SQL script from DJRA. Once the
script has been modified as described within the detailed sections of this
chapter, and saved, it should be executed using the Run menu option from
the DJRA output window. This step is not explicitly described within the
detailed sections—it is assumed.
1. Define the SALES_SET subscription set
2. Maintain a change history for suppliers
• Register the Supplier table
• Subscribe to the Supplier table
• Add temporal history support to the Supplier table
• Hide DProp control columns
3. Use target site views to denormalize outlet information
• Register the Store and Region tables
• Subscribe to the Store and Region tables
• Add temporal history support to the Store table
• Create the denormalization view
4. Use source site joins to denormalize product information
• Register the Items, ProdLine and Brand tables
• Subscribe to the Products view
• Add temporal history support to the Products table
5. Use a CCD target table to manage sales facts
• Register the Sales table
• Subscribe to the Sales table
6. Add temporal history information to target tables
7. Maintain aggregate information
8. Push down the replication status to Oracle
9. Perform initial load of data into the data warehouse
DProp Capture must be started after the replication definitions have been
created (steps 1 to 8), but before populating the data warehouse for the first
time (step 9). DProp Apply may be started after the data has been loaded into
the warehouse.

Several different approaches were used to replicate the tables from source to
target. The aim here was to show the flexible nature of DProp and to compare
and contrast the different techniques available.
8.4.1 Defining the Subscription Set

The first practical step to perform with DProp is to create the replication
subscription set which will own all the individual subscriptions. The
SALES_SET was created using the Create Empty Subscription Sets option
of DJRA - as shown in Figure 52.
Figure 52. Create the SALES_SET Subscription Set
The subscription set timing has been defined to execute every 1440 minutes
(that is, once every 24 hours) at midnight (presumably when there is little
activity on the servers or network).
Advice: Another option to control the timing of the replication is to use event
based timing. See 3.3.2.3, “Advanced Event Based Scheduling” on page 53
for an example of how to use event based timing to execute your
subscriptions once a day at midnight, on week days only.
The SQL generated by DJRA can be seen in Appendix E.1, “Output from
Define the SALES_SET Subscription Set” on page 347. The generated SQL
was saved and then executed using the Run menu option from the DJRA
output window.

8.4.2 Maintaining a Change History for Suppliers
This section describes the approach adopted to replicate the Supplier table
from the source DB2 for OS/390 system to the target Oracle database. The
target table is required to maintain historical information about suppliers and
provide the ability for time consistent queries.
These requirements can be satisfied with DProp by specifying the target table
to be a complete, non-condensed CCD with an additional column to record
expiry timestamps (for time consistent queries). These attributes are
summarized in Table 9:
Table 9. Attributes of Supplier Target Table
Warehouse Attribute DProp equivalent
Maintain historical information Complete, non-condensed CCD

target table.
Maintain temporal histories Additional TIMESTAMP column

to record the expiry date of a
record and SQL After to maintain
the validity period.
Figure 53 shows the relationship between the source and target Supplier
tables.

Supplier
Supp_no
Supp_Name
Source
Target
Supplier(CCD)
Supp_no
Supp_Name
IBMSNAP_INTENTSEQ
IBMSNAP_OPERATION
IBMSNAP_COMMITSEQ
IBMSNAP_LOGMARKER
EXPIRED_TIMESTAMP
Suppliers(view)
Supplier_Number
Supplier_Name
Valid_From
Valid_To
Figure 53. Transformation of Supplier Table
All columns pre-fixed with IBMSNAP are DProp control columns which are
required and are automatically maintained by Apply for CCD target tables. A
view named Suppliers will be created to hide these control columns from
warehouse users and also to rename the IBMSNAP_LOGMARKER and
EXPIRED_TIMESTAMP columns to more meaningful names (see 8.4.2.4,
“Hiding DProp Control Columns” on page 228 for more details).
8.4.2.1 Register the Supplier Table

The registration definition for the Supplier table was generated using the
DJRA Define One Table as a Replication Source function (see Figure 54).
Business intelligence (BI) applications generally only require After-images of

columns to be replicated because most target warehouse tables will maintain
histories of changes to the source table (and therefore already hold the
Before-image in a separate record). Throughout this case study only
After-images are replicated from source to target.

Figure 54. Define Supplier Table as a Replication Source
Disable Automatic Full Refresh

The SQL generated by DJRA was modified to disable the automatic full
refresh of the target table by setting
ASN.IBMSNAP_REGISTER.DISABLE_REFRESH=1. This can be seen in the
following code segment, which is taken from the DJRA generated SQL:
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’SUPPLIER’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDSUPPLIER’,’ITSOSJ’,’CDSUPPLIER’, 1 ,’0201’,NULL,’0’,’N’);
For a complete listing of the SQL used for registering Supplier, see Appendix
E.2, “Output from Register the Supplier Table” on page 348.
Advice: Alternatively, the DJRA generated SQL could be executed

unmodified and full refresh disabled at a later date using the SQL update
statement:
UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=1
WHERE SOURCE_OWNER=’ITSOSJ’
This approach has the advantage that multiple registrations can have refresh
disabled with a single SQL statement (in the example above, all those tables
registered and owned by ITSOSJ), but suffers from the drawback that the

change is not documented in the original SQL scripts. The preferred
approach is to modify the generated SQL scripts before execution.
After the modifications have been made and saved, the file was executed
from the DJRA output window using the Run menu option, thus defining the
Supplier table as a replication source.
Advice: Full refresh is a costly process which involves direct access to

source tables and not change capture from the logs. Ideally, it will happen
only once, when the target is populated for the first time. It is a common
procedure to disable this facility. Full refresh was also disabled in this
particular case because we are maintaining a history (non-condensed CCD)
table at the target. During an automatic full refresh, Apply issues a DELETE
FROM TARGET_TABLE to remove all the rows from the target before it starts to
copy the new data from the source table. This would destroy all the history
information which had been maintained at the target—something we
obviously want to avoid because we cannot regenerate this information.
Therefore, for non-condensed CCD tables it is essential that full refresh be
disabled.
With full refresh disabled, the administrator must synchronize Capture and
Apply before change capture replication can be enabled. This can be done
either manually or by using the Off-line load option of DJRA. Refer to 8.4.9,
“Initial Load of Data into the Data Warehouse” on page 261 for more details
on performing this synchronization.
If automatic full refresh was enabled, then Apply would automatically perform
the full refresh and synchronize itself with Capture when it is started.
8.4.2.2 Subscribe to the Supplier Table

Supplier is added to the SALES_SET Subscription Set created in 8.4.1,
“Defining the Subscription Set” on page 219 by using the Add a Member to
Subscription Set function of DJRA. Figure 55 shows this definition.
Although the current version of DJRA does not allow the direct creation of
CCD tables at non-IBM targets, it is possible to work around this by editing
the generated SQL prior to execution. Future versions of DJRA may well
support this function directly.

Figure 55. Subscription Definition for Supplier Table
Note the following from the subscription definition shown in Figure 55:
• In this case, the Target table qualifier field is SIMON. This specifies the
user and schema who will own the target CCD table in Oracle. This must
be an already existing Oracle user.
Advice: When creating a target table at a non-IBM database, the target
table qualifier field must be set to a DataJoiner user who has a user
mapping defined to the remote server where the target table is to be
created. DJRA uses the remote authid from this user mapping to
determine the schema and owner of the remote table. This is not the case
when creating CCD tables in non-IBM targets because we are fooling
DJRA into thinking the CCD table will be in the local DataJoiner database.
Essentially we have to perform the mapping ourselves by specifying an
existing Oracle user who will own the CCD table. If the mapping is not
correct, the CREATE TABLE statement will fail during the subscription
definition with the following error message:
SQL0204N "SQLNET object: Unknown " is an undefined name. SQLSTATE=42704
• Target structure should be a CCD table and the DataJoiner non-IBM target
server should be (None). DJRA will issue a message and will not generate

any SQL if CCD and a non-IBM target are specified. We will edit the
generated SQL in order to create the CCD in Oracle.
• At this point, no primary key should be defined. Non-condensed CCD
tables by definition cannot have the same primary key columns as the
source table because they contain multiple records with the same source
key values (the history). Indexes may be added to warehouse tables to
improve performance of applications and queries.
• Use the Setup button next to Target structure in order to set the properties
of the CCD. We would like the target history table to start off as a complete
copy of the source table and then to grow over time as it maintains
historical information. Therefore, the target should be a complete,
non-condensed CCD table - Figure 56 shows this definition. For more
information on CCD table attributes, refer to the DB2 Replication Guide
and Reference, SR5H-0999.
Note: The Setup button is only available on DJRA versions 2.1.1.140 and
later. If you are using an earlier version of DJRA, then you will have to edit the
generated SQL to ensure that the following condition has been set:
ASN.IBMSNAP_SUBS_MEMBR.TARGET_CONDENSED=’N’.
Figure 56. CCD Table Attributes for Supplier

Advice: It is also possible to use the CCD properties window to include
additional DProp control columns (from the ASN.IBMSNAP_UOW table) in
the target warehouse. For example, the IBMSNAP_AUTHID column in the
UOW table maintains information about the userid who performed the SQL
operation which changed the source table. This can be replicated to a target
table and can be useful for audit and tracking purposes.
Create CCD Table in Oracle

As previously mentioned, it is necessary to modify the SQL generated for this
subscription so that the CCD table is created at the Oracle target and not in
the DataJoiner database. To achieve this, modify the SQL as follows:
1. Comment out the CREATE TABLESPACE statement. Since the table will be
created in Oracle and not DataJoiner, there is no need to create a
DataJoiner tablespace to hold the target table.
2. Change the tablespace name in the CREATE TABLE statement to the
DataJoiner Server Mapping for the Oracle database where the data
warehouse will exist (in our example AZOVORA8). A DataJoiner nickname
will automatically be created as part of the create table process. This uses
a feature of DataJoiner called Data Definition Language (DDL)
transparency which was introduced in DataJoiner V2.1.1. For more
information about the DDL transparency feature of DataJoiner, refer to the
Create Table statement in the DataJoiner Application Programming and
SQL Reference Supplement, SC26-9148.
3. Modify the CREATE TABLE statement to add the EXPIRED_TIMESTAMP
column to support temporal histories - see 8.4.6, “Adding Temporal History
Information to Target Tables” on page 250 for more information.
The SQL script segment below shows these modifications in bold:

--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- About to create a target table tablespace
--CREATE TABLESPACE TSSUPPLIER MANAGED BY DATABASE USING (FILE
--’/data/djinst5/djinst5/SUPPLIER.F1' 2000);
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist

-- Not remote to DataJoiner target
-- Create the target table SIMON.SUPPLIER
CREATE TABLE SIMON.SUPPLIER(SUPP_NO DECIMAL(7 , 0) NOT NULL,SUPP_NAME
CHAR(25) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8;

For a complete listing of the SQL used to create the Supplier subscription,
see Appendix E.3, “Output from Subscribe to the Supplier Table” on page
349.
Advice: If using a version of DJRA earlier than 2.1.1.140, then the generated
SQL would also have to be modified to remove the auto-registration of the
CCD. This is the SQL insert into the ASN.IBMSNAP_REGISTER table at the
end of the generated SQL. Failure to remove this record would result in SQL
return code -30090, reason code 18 when Apply attempts to replicate the
data to the target. This is because Apply thinks the CCD table is in DataJoiner
and is attempting to update both the CCD table (in Oracle) and the Register
table (in DataJoiner) in the same Unit Of Work (UOW).
8.4.2.3 Add Temporal History Support to the Supplier Table

To maintain temporal history information for the Supplier target table, use the
technique described in 8.4.6, “Adding Temporal History Information to Target
Tables” on page 250.
The specific SQL After statements used to maintain temporal histories for the
Supplier table are:
UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP =
( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B
WHERE A.SUPP_NO = B.SUPP_NO AND
A.EXPIRED_TIMESTAMP IS NULL AND
B.EXPIRED_TIMESTAMP IS NULL AND
(B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ))
WHERE A.EXPIRED_TIMESTAMP IS NULL
AND A.IBMSNAP_OPERATION IN (’I’,’U’);
UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP =

( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.SUPPLIER B
WHERE A.SUPP_NO = B.SUPP_NO AND
B.IBMSNAP_OPERATION = ’D’ AND
B.IBMSNAP_INTENTSEQ = A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
AND A.IBMSNAP_OPERATION = ’D’;
Use these statements in conjunction with the information described in 8.4.6,

“Adding Temporal History Information to Target Tables” on page 250 to
support temporal histories within the Supplier target table.

8.4.2.4 Hiding DProp Control Columns
The following SQL was used to create a view in Oracle which hides the
DProp CCD control columns not required by data warehousing applications
from the end users:
CREATE VIEW SIMON.SUPPLIERS AS
SELECT S.SUPP_NO as SUPPLIER_NUMBER,
S.SUPP_NAME as SUPPLIER_NAME,
S.IBMSNAP_LOGMARKER as VALID_FROM,
S.EXPIRED_TIMESTAMP as VALID_TO
FROM SIMON.SUPPLIER S;
The view definition was stored in a file and executed directly from SQL*Plus.
For some useful hints on using SQL*Plus, see Appendix B.1.2, “Using
Oracle’s SQL*Plus” on page 325.
Advice: IBMSNAP_LOGMARKER was not hidden (just renamed) because it

is used as the starting timestamp of the records validity period.
8.4.3 Using Target Site Views to Denormalize Outlet Information

This section describes the approach taken to replicate and denormalize the
information held in the Store and Region tables into a single entity called
Outlets. Outlets describes the relationship between individual stores and the
regions in which they operate.
The technique used in this case is to copy the individual source tables to
target tables and perform denormalization through a view at the target site.
This approach is adopted in this case study to compare and contrast the
technique with the one discussed in 8.4.4, “Using Source Site Joins to
Denormalize Product Information” on page 237. Performing the join at the
target and not the source also alleviates the source system from having to
perform join operations against base and CD tables (as discussed in 8.4.4,
“Using Source Site Joins to Denormalize Product Information” on page 237).
To understand the replication techniques used for Store and Region, we first
have to understand the applications which work on the source data. For the
Region table:
• A record is inserted into the table when a new region is added.
• There are no deletes from the Region table. When a region no longer
contains any stores, the region information is maintained in the table and
the CONTAINS_STORES flag is updated with an ’N’.
• No other columns in the table are updated.

Over a period of time, one particular region will always be uniquely identified
by a certain Region_Id (although it may not contain any stores at that
particular point in time). We also know that an update to the
CONTAINS_STORES column would involve either the creation of the first
store in that region, or the removal of the last store within that region.
Therefore, Region is updated only in association with a change in the Store
table.
By analyzing the source application behavior, we can now define a replication

solution which will denormalize the data and allow for time-consistent queries
by replicating Region as a Point In Time (PIT) and Store as a CCD. The
resultant join view between the two tables at the target will provide a history
of changes to Store and Region. This view is called Outlets.
The data warehouse attributes and their DProp equivalents for the Store and
Region target tables are summarized in Table 10.
Table 10. ’Attributes of Store and Region Target Tables
Warehouse attribute DProp equivalent—Store DProp equivalent—Region
Maintain historical Complete, non-condensed Only replicating inserts to

information CCD target table. PIT will generate a history
table.
Maintain temporal Additional TIMESTAMP Since a change in Region is

histories column to record expiry always associated with a
date of a record and SQL change in Store,
After to maintain the validity maintaining expiry
period. information for Store will be
sufficient.
Denormalize data in Denormalize at the target by Denormalize at the target by

tables a join view with Region. a join view with Store.
Figure 57 summarizes the relationship between the source and target Store
and Region tables and their denormalization through a target site view.

Store
Store_Num
Region
CompNo Region_Id
Name Region_Name
Street
City Contains_Stores
Zip
Region_Id
Source
Target
Store(CCD) Region(PIT)
Store_Num Region_Id
CompNo
Name Region_Name
Street IBMSNAP_LOGMARKER
City
Zip
Region_Id
IBMSNAP_INTENTSEQ
IBMSNAP_OPERATION
IBMSNAP_COMMITSEQ
IBMSNAP_LOGMARKER
EXPIRED_TIMESTAMP
Outlets(view)
Store_Num
CompNo
Name
Street
City
Region_Id
Region_Name
Valid_From
Valid_To
Figure 57. Transformation of Store and Region into Outlets
Outlets is a view defined over the Store and Region target tables. For details
on the view definition, please refer to 8.4.3.4, “Create the Denormalization
View” on page 236.
Although in this case the target table type for Region is PIT, by analyzing the
application behavior and only replicating inserts we will actually create a
target table which maintains historic information (because records are only
appended to it).
Replicating Inserts Only

Replication of only the inserts for the Region table can be achieved by using
the following predicate in the subscription definition for that table:
IBMSNAP_OPERATION =’I’

IBMSNAP_OPERATION is a column within the CD table which is
automatically generated and maintained by Capture. It contains:
• I — if the operation against the source record was an SQL INSERT.
• U — if the operation against the source record was an SQL UPDATE.
• D — if the operation against the source record was an SQL DELETE.
Advice: There are three simple approaches for removing unwanted records
from the target history table:
1. The first approach, described here, is to place a predicate on the
subscription definition that prevents the unwanted records from
replicating. This is probably the simplest method, but also means that full
refresh for the source must be disabled.
2. The second approach would be to replicate the unwanted records, and
simply create a view at the target which does not include these records.
This has the disadvantage of replicating unwanted records, which would
consume network resources and CPU cycles. However, if at a later date
deletes are required in the target history, then this method simply requires
the target view to be redefined.
3. The third approach is the most flexible: Define a view over the source
table and register this view as a source for replication. However, before
executing the generated SQL for this registration, modify the CREATE
VIEW statement for the change data view in the generated SQL and add
the predicate IBMSNAP_OPERATION='I'. This way, the subscription does
not even know that the filtering is occurring and all the subscriptions will
be simpler once the source is set up this way. This CD-view technique also
works for both full refresh and differential refresh because the predicate is
defined on the CD table view and subsequently will not be applied to the
source during a full refresh.
8.4.3.1 Register the Store and Region Tables

Create the registration definitions for the source Store and Region tables
using the DJRA Define Multiple Tables as Replication Sources function
(see Figure 58).

Figure 58. Registration of Store and Region Tables
Modify the generated SQL to disable full refresh (by setting

DISABLE_REFRESH=1 in the Register table) for the Region and Store
tables. The changed segments of the generated SQL are shown below:

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’REGION’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDREGION’,’ITSOSJ’,’CDREGION’, 1 ,’0201’,NULL,’0’,’N’);

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’STORE’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDSTORE’,’ITSOSJ’,’CDSTORE’, 1 ,’0201’,NULL,’0’,’N’);
The complete listing of the SQL executed to define these registrations can be
found in Appendix E.4, “Output from Register the Store and Region Tables”
on page 351.
Automatic full refresh for Region is disabled because we are going to define a
predicate in the subscription definition for Region, which prevents SQL
updates from replicating. This predicate refers to the IBMSNAP_OPERATION
column, which only exists in the CD table for Region. During full refresh,

Apply would attempt to enforce this predicate directly against the source
Region table. Since the Region table does not contain this column, the full
refresh would fail with error message:
SQL0206N "<name>" is not a column in an inserted table....
Full refresh for the Store table is also disabled so that the historical
information held in this table does not get lost during a full refresh from the
source table.
For details on how the Store and Region tables where loaded into the target
database, refer to 8.4.9, “Initial Load of Data into the Data Warehouse” on
page 261.
8.4.3.2 Subscribe to the Store and Region Tables

Create and add both subscription definitions for Store and Region to the
SALES_SET using the DJRA Add a Member to Subscription Set feature.
The DJRA window used for defining the Region subscription member is
shown in Figure 59.
Figure 59. Defining the Region Subscription Member

• The Target table qualifier is DJINST5. This user must have a DataJoiner
User Mapping defined to map the DataJoiner user DJINST5 to a valid
Oracle user. In this case, a User Mapping exists which maps DataJoiner
user DJINST5 to Oracle user SIMON. Therefore, DJRA will use SIMON as
the schema name for the target table when it is created in Oracle.
• Target structure is PIT. This type of target table is directly supported by
DJRA to non-IBM targets.
• The Source primary key is selected (Region_Id). PIT target tables must
have a primary key (unlike CCD tables).
• The IBMSNAP_OPERATION=’I’ predicate is added to ensure that updates are
not replicated.
The generated SQL from the DJRA tool for the Region subscription can be
found in Appendix E.5, “Output from Subscribe to the Region Table” on page
353.
The DJRA window used to define the Store replication subscription can be
seen in Figure 60.
Figure 60. Subscription Definition for Store Table

• Again, we are creating the CCD table in Oracle, and therefore the Target
table qualifier field is SIMON. This is the user and schema who will own
the target CCD table in Oracle. It must be an already existing Oracle user.
Refer to advice on page 224 for more information on setting Target table
qualifier.
• Target structure is set to CCD and the DataJoiner non-IBM target server
must be (None). The Setup button was also used to specify the CCD
properties to be non-condensed, complete.
• No primary key should be defined for the target CCD table.

Modify the generated SQL in order to add temporal history support and create
the target CCD in the Oracle database. These modifications are shown below
and are similar to those described in “Create CCD Table in Oracle” on page
226:
--*
-- in TARGSVR.REX
-- CREATE TABLESPACE TSSTORE MANAGED BY DATABASE USING (FILE
--’/data/djinst5/djinst5/STORE.F1’ 2000);

-- Create the target table SIMON.STORE
CREATE TABLE SIMON.STORE(COMPNO DECIMAL(3 , 0) NOT NULL,STORE_NUM
DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR(25) NOT
NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL,REGION_ID
DECIMAL(3 , 0) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR
BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
The full listing of the SQL used to define the Store subscription can be found
in Appendix E.6, “Output from Subscribe to the Store Table” on page 355.
8.4.3.3 Add Temporal History Support to the Store Table

Temporal history information only needs to be maintained for the Store table
because a change in Region is always associated with a change in Store. To
maintain temporal history information for this table use the technique
described in 8.4.6, “Adding Temporal History Information to Target Tables” on
page 250.

The specific SQL After statements used to maintain temporal information for
the Store table are:
UPDATE SIMON.STORE A SET EXPIRED_TIMESTAMP=
( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.STORE B
WHERE A.STORE_NUM = B.STORE_NUM AND
A.COMPNO = B.COMPNO AND
WHEREA.EXPIRED_TIMESTAMP IS NULL AND
A.IBMSNAP_OPERATION IN (’I’,’U’);
UPDATE SIMON.STORE A SET EXPIRED_TIMESTAMP=

( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.STORE B
WHERE A.STORE_NUM = B.STORE_NUM AND
A.COMPNO = B.COMPNO AND
A.IBMSNAP_OPERATION = ’D’;
Use these statements in conjunction with the information described in 8.4.6,

“Adding Temporal History Information to Target Tables” on page 250 to
support temporal histories within the Store target table.
8.4.3.4 Create the Denormalization View

A view is created at the Oracle target database to make it easier for the
business analyst to access the store and region information. The view
performs a join between the Store and Region tables on the REGION_ID
column. Because the tables are relatively small, performance of the data
warehouse will not be affected.
The following view definition was saved to a file and then executed from
Oracle’s SQL*Plus:
CREATE VIEW simon.outlets AS
SELECT s.store_num, s.compno, s.name,
s.street, s.city, s.region_id, r.region_name,
s.ibmsnap_logmarker as valid_from,
s.expired_timestamp as valid_to
FROM simon.store s,
simon.region r
WHERE s.region_id = r.region_id;
To execute the file in SQL*Plus, start Oracle SQL*Plus and then type:

start file_name
8.4.4 Using Source Site Joins to Denormalize Product Information

This section describes the approach taken to replicate and denormalize the
information held in the Item, ProdLine and Brand tables into a single target
table called Products. Products contains information about individual
products, the line which the product belongs to, and the brand of that product.
The technique used is to create a view at the source site which performs the
denormalization. This view is then used as the source for replication, the
target table being the materialization of the source view.
Since we would like to maintain historic data at the target, the target table
type should be a non-condensed, complete CCD table. These attributes are
summarized in Table 11.
Table 11. Replication Attributes of Items, ProdLine and Brand Tables
Warehouse attribute DProp equivalent
Maintain historical information Complete, non-condensed CCD target table.
Maintain temporal histories Additional TIMESTAMP column to record expiry date

of a record and SQL After to maintain the temporal
information.
Denormalize data in Items, Create a source site view which performs the
ProdLine and Brand denormalization and register this as a source for
replication.
Figure 61 shows the relationship between the three source tables, the
Products source site view, and the target CCD table.

Items ProdLine Brand
Item_Num Prod_Line_Num Brand_Num
Desc
Prod_Line_Num Desc Desc
Supp_No Brand_Num
Products(view)
Item_Num
Item_Description
Prod_Line_Num
Product_Line_Desc
Supplier_Num
Brand_Num
Brand_Description
Source
Target
Products(CCD)
Item_Num
Item_Description
Prod_Line_Num
Product_Line_Desc
Supplier_Num
Brand_Num
Brand_Description
IBMSNAP_INTENTSEQ
IBMSNAP_OPERATION
IBMSNAP_COMMITSEQ
IBMSNAP_LOGMARKER
EXPIRED_TIMESTAMP
Figure 61. Transformation of Items, ProdLine and Brand
In order to be able to perform change replication from a source site view,

DProp requires that the base tables which compose the view and the view
itself be registered as sources for replication. The registration of the
component base tables are straightforward registrations. The registrations of
the Products view automatically performs the following:
1. Creates a view which joins the CD table of ProdLine with the base
tables of Items and Brand and registers this view as a source for
replication with SOURCE_VIEW_QUAL=1.
2. Creates a view which joins the CD table of Brand with the base tables
of Items and ProdLine and registers this view as a source for
replication with SOURCE_VIEW_QUAL=2.

3. Creates a view which joins the CD table of Items with the base tables of
ProdLine and Brand and registers this view as a source for replication
with SOURCE_VIEW_QUAL=3.
This can be seen in the generated SQL in Appendix E.8, “Output from
Register the Products View” on page 361.
DProp Apply will use these views to determine the change data to replicate to
the target. Each of these views joins one CD table with all other base tables
from the original view. Therefore, when Apply is serving this subscription
cycle, it will be accessing the source tables directly (and joining these with
CD tables). This is an important fact to consider when replicating from a
source site view because DProp is no longer working purley from log based
change capture, but is also accessing base tables directly. This may impact
the performance of the source applications.
8.4.4.1 Register the Items, ProdLine and Brand Tables

The first step when defining a view as a replication source is to define the
base tables within the view as replication sources. Therefore, we have to
register Items, Prodline and Brand as replication sources, using the Define
Multiple Tables as Replication Sources feature of DJRA (see Figure 62).
Figure 62. Defining Multiple Base Tables as Replication Sources
Alter the generated SQL to prevent full refresh of all the base tables by
setting ASN.IBMSNAP_REGISTER.DISABLE_REFRESH=1. The modified
sections of the code can be seen below:

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’BRAND’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDBRAND’,’ITSOSJ’,’CDBRAND’, 1,’0201’,NULL,’0’,’N’);
COMMIT;
-- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’ITEMS’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDITEMS’,’ITSOSJ’,’CDITEMS’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’PRODLINE’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDPRODLINE’,’ITSOSJ’,’CDPRODLINE’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;
The complete SQL used to register the Items, ProdLine and Brand tables is in
Appendix E.7, “Output from Register the Items, ProdLine, and Brand Tables”
on page 357.
Define and Register the Products View

The view which denormalizes the Items, Prodline, and Brand tables is called
Products and has the following definition:
CREATE VIEW DB2RES5.PRODUCTS AS
SELECT I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION,
I.PROD_LINE_NUM, P.DESC as PRODUCT_LINE_DESC,
I.SUPP_NO as SUPPLIER_NUM, P.BRAND_NUM, B.DESC as BRAND_DESCRIPTION
FROM ITSOSJ.ITEMS I,
ITSOSJ.PRODLINE P,
ITSOSJ.BRAND B
WHERE I.PROD_LINE_NUM=P.PROD_LINE_NUM
AND P.BRAND_NUM=B.BRAND_NUM;
Advice: Remember to use correlation ids when creating the view. When
registering the view as a replication source DJRA parses the view definition

and expects correlation ids to be present. If correlation ids are not used,
DJRA will not be able to parse the view definition and the registration will fail.
The view uses the SQL SUBSTR function to perform some data manipulation
on the DESC column on the source system. The view was created using
SPUFI on the OS/390 source system.
Once the view has been created, it can be registered as a replication source
using the Define DB2 Views as Replication Sources function in DJRA
(shown in Figure 63).
Figure 63. Defining a DB2 View as a Replication Source
As with the registration of the base tables, the generated SQL is modified to
disable full refresh for the ProductsA, ProductsB and ProductsC views
(shown below):
-- register the base and change data views for component
DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’PRODUCTS’, 1 ,
1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSA’,’ITSOSJ’,’CDPRODLINE’, 1 ,NULL,NULL,
NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’);
-- register the base and change data views for component 2


1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSB’,’ITSOSJ’,’CDBRAND’, 1 ,NULL,NULL,

1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSC’,’ITSOSJ’,’CDITEMS’, 1 ,NULL,NULL,
A complete listing of the SQL used to register the Products view as a

replication source can be found in Appendix E.8, “Output from Register the
Products View” on page 361.
8.4.4.2 Subscribe to the Products View

The subscription definition for the Products target table uses a similar
technique to that described in 8.4.2.2, “Subscribe to the Supplier Table” on
page 223.
Figure 64 shows the DJRA function used to add Products to the SALES_SET
subscription set.

Figure 64. Add Products View Subscription Definition
• Only the DB2RES5.PRODUCTS view needs to be selected as a source for
replication. DJRA hides all the complexity of the base tables and
generated views at this point.
• Once again, because the target CCD is to be created in Oracle, the Target
table qualifier field is set to SIMON. This is the user and schema who will
own the target CCD table in Oracle. It must be an existing Oracle user.
Refer to Advice on page 224 for more information on setting the Target
table qualifier.
• The DataJoiner non-IBM target is defined as (None). CCDs are not directly
supported by DJRA to non-IBM targets. We will modify the generated SQL
prior to execution in order to create the CCD table in Oracle.
• No primary key is defined initially.

The SQL generated from the above DJRA definition is modified in a similar
way to that described in “Create CCD Table in Oracle” on page 226. You can
find the complete version of the SQL script in Appendix E.9, “Output from

Subscribe to the Products View” on page 362. Only the modified components
are shown below:
--*
-- in TARGSVR.REX
-- CREATE TABLESPACE TSPRODUCTS MANAGED BY DATABASE USING (FILE
--’/data/djinst5/djinst5/PRODUCTS.F1’ 2000);

-- Create the target table SIMON.PRODUCTS
CREATE TABLE SIMON.PRODUCTS(ITEM_NUM DECIMAL(13 , 0) NOT NULL,
ITEM_DESCRIPTION CHAR(40) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT
NULL,PRODUCT_LINE_DESC CHAR(30) NOT NULL,SUPPLIER_NUM DECIMAL(13 , 0)
NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL,BRAND_DESCRIPTION CHAR(30)
NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
--* Commit work at target server DJDB

--*
COMMIT;
8.4.4.3 Add Temporal History Support to the Products Table

Use the SQL After statements below in conjunction with the information
described in 8.4.6, “Adding Temporal History Information to Target Tables” on
page 250 to support temporal histories within the Products target table.
UPDATE SIMON.PRODUCTS A SET EXPIRED_TIMESTAMP=
( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.PRODUCTS B
WHERE A.ITEM_NUM = B.ITEM_NUM AND
UPDATE SIMON.PRODUCTS A SET EXPIRED_TIMESTAMP=

( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.PRODUCTS B
WHERE A.ITEM_NUM = B.ITEM_NUM AND

8.4.5 Using a CCD Target Table to Manage the Sales Facts
This section describes the configuration used to replicate the Sales table from
DB2 for OS/390 to the Oracle warehouse. The Sales table on the OS/390
contains daily records of all the transactions made within each of the stores. It
holds these records for a period of 7 days. Each evening, a batch job is run to
remove all records which are older than 7 days.
By the nature of the source application, there will never be any updates made
to the Sales table. SQL inserts are performed to record each sale transaction,
and SQL deletes are performed in batch to remove records for housekeeping
purposes. The batch deletes should not be replicated because they are only
being performed for housekeeping purposes and have no significance within
the warehouse (this will also help to reduce by half the number of changes
made to the Sales table that are replicated).
Since only inserts are copied, the Sales table can be replicated to either a PIT
or CCD target and still maintain history information. A PIT target table would
save space and take less network bandwidth compared to a CCD table
because a CCD table has the overhead of maintaining three additional DProp
control columns. However, a PIT target table requires a primary key, and one
is not readily definable on the target table because the uniqueness of a row
cannot be guaranteed even using all target columns. Therefore we have
chosen to make the target table a CCD table.
Advice: An alternative approach would be to generate some kind of artificial

key on the target table and then replicate to a PIT target (the
IBMSNAP_INTENSEQ DProp control column could be used as the artificial
key in this case). This is a valid work-around in this example because there
are no updates or deletes replicated to the target. If updates and/or deletes
were replicated, Apply would attempt to use this artificial key to identify the
rows in the target to update/delete. This would fail because the artificial key
does not exist in the source table.
Table 12 summarizes the attributes of the Sales target table:

Table 12. Replication Attributes of Sales Table
Warehouse attribute DProp equivalent
Maintain historical information. Complete, non-condensed CCD target

table.
Do not replicate batch delete from source Apply predicate to Sales subscription to
to target. prevent deletes from replicating.

Notice that temporal histories do not need to be maintained in the Sales table.
Temporal histories are often required on the dimension tables of a star
schema within a warehouse, but not on the central fact table (which usually
records events). In this case, the event is a sale and the date of the sale is
recorded in the SALE_DATE column.
Figure 65 below shows the relationship between the source and target Sales
tables.
Sales
Date
BasArtNo
Location
Company
StoreNo
Pieces
Out_Prc
Tax
Transfer_Date
Process_Date
Source
Target
Sales(CCD)
Sale_Date
BasArtNo
Location
Company
Pieces
Out_Prc
Tax
Transfer_Date
Process_Date
IBMSNAP_INTENTSEQ
IBMSNAP_OPERATION
IBMSNAP_COMMITSEQ
IBMSNAP_LOGMARKER
Figure 65. Transformation of Sales
8.4.5.1 Register the Sales Table

Since the Sales table will be the most volatile table, it was decided to only
Capture and replicate the columns which are required in the warehouse.
Figure 66 shows the Define One Table as a Replication Source window of
DJRA used to register the Sales table.

Figure 66. Registration Definition for Sales Table
Edit the generated SQL to disable full refresh. Also modify the CREATE
TABLESPACE statement to create a large DB2 for OS/390 tablespace with
enough primary and secondary storage to hold the large amounts of change
data expected for the Sales table. The modified SQL can be seen below:
-- in SRCESVR.REX, about to create a change data tablespace
--CREATE TABLESPACE TSSALES
-- IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
CREATE TABLESPACE TSSALES IN sj390db1
SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC
USING STOGROUP SJDB1SG2 PRIQTY 180000 SECQTY 5000;
--* Source table DB2RES5.SALES already has CDC attribute, no need to

--* alter.
-- selecting ’X’ as the before-image prefix character
-- create the cd/ccd table for DB2RES5.SALES
CREATE TABLE DB2RES5.CDSALES(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,DATE DATE NOT NULL,BASARTNO DECIMAL(
13 , 0) NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 ,
0) NOT NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT
NULL,TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL,
PROCESS_DATE TIMESTAMP NOT NULL) IN SJ390DB1.TSSALES ;

PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’SALES’, 0 , 1 ,’Y’,’Y’,
’DB2RES5’,’CDSALES’,’DB2RES5’,’CDSALES’, 1 ,’0201’,NULL,’0’,’N’);

A full listing of the SQL used to register the Sales table can be found in
Appendix E.10, “Output from Register the Sales Table” on page 365. For
more information on the DB2 for OS/390 CREATE TABLESPACE statement, refer to
the DB2 for OS/390 V5 SQL Reference, SC26-8966.
Advice: When working with source tables of high volatility, it is often

necessary to modify the automatically generated CREATE TABLESPACE statement
to ensure that the tablespaces for CD and CCD tables have sufficient disk
space.
8.4.5.2 Subscribe to the Sales Table

The subscription definition for the Sales table is the final subscription to be
added to the SALES_SET subscription set. Special consideration needs to be
taken when handling the Sales table because of its size and expected change
volume. This is often the case when dealing with the central fact table in a
star schema.
The initial size of the Sales target table is 87Mb with an estimated change
volume of 14Mb per day. To manage these large amounts of change data and
the expected change volume, it is often necessary to define a tablespace at
the target capable of managing large amounts of data. In this case, the
following command was used to create a tablespace in Oracle capable of
holding sales information:
CREATE TABLESPACE BIGTS DATAFILE ’/oracle8/u01/oradata/ora8/bigts.dbf’
SIZE 90M
AUTOEXTEND ON
NEXT 15M ;
Define the tablespace directly from within SQL*Plus. It will have an initial size
of 90M and will be able to automatically extend in chunks of 15M. For more
information on managing Oracle tablespaces, see the Oracle8 Administrator’s
Guide, A58397-01.
Once the Oracle tablespace has been created, the Add a Member to
Subscription Set feature of DJRA is used to create the subscription for the
Sales table (see Figure 67).

Figure 67. Subscription Definition for Sales Table
The Target table attributes are similar to those described in detail in 8.4.2.2,
“Subscribe to the Supplier Table” on page 223.
The Where clause was added to the subscription definition to prevent the
batch deletes from replicating to the target table.
Modify the SQL generated from this subscription as follows:

1. Change the insert into ASN.IBMSNAP_SUBS_COLS for the
TARGET_NAME column from DATE to SALE_DATE. This is because
Oracle does not allow a column named DATE to be defined within a table.
2. Correspondingly, change the CREATE TABLE statement to create a column
called SALE_DATE and not DATE.
3. Comment out the CREATE TABLESPACE command because the CCD will be
created in Oracle and not in DataJoiner.
4. Change the CREATE TABLE statement to replace the DataJoiner tablespace
name with the Oracle Server Mapping so the CCD table will be created in
Oracle.

5. Add the REMOTE OPTION clause to the CREATE TABLE statement to ensure that
the target table is created in the specially defined Oracle tablespace
(BIGTS).
The modified SQL is shown below:

-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’SALE_DATE’,’N’, 1 ,
’DATE’);

-- CREATE TABLESPACE TSSALES MANAGED BY DATABASE USING (FILE
--’/data/djinst5/djinst5/SALES.F1’ 2000 );

-- Create the target table SIMON.SALES
CREATE TABLE SIMON.SALES(SALE_DATE DATE NOT NULL,BASARTNO DECIMAL(13 , 0)
NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 , 0) NOT
NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT NULL,
TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL,
PROCESS_DATE TIMESTAMP NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ
CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL)
IN AZOVORA8
REMOTE OPTION ’TABLESPACE BIGTS’;
A full listing of the SQL used to define the Sales subscription can be found in
Appendix E.11, “Output from Subscribe to the Sales Table” on page 366.
Now that all the replication registrations and subscriptions have been defined,
we need to look at more detailed information on how to use DProp to support
temporal histories, maintain aggregate information and finally load the data
into the warehouse.
8.4.6 Adding Temporal History Information to Target Tables

DProp is designed to automatically maintain history target tables by using
CCDs. By adding an additional column to the target and some SQL After to
the subscription set, DProp can also easily maintain temporal history
information within the target table. To maintain temporal histories using
DProp:
• Use the IBMSNAP_LOGMARKER DProp control column within the target
table to represent the start of the validity period of each record. This
column contains timestamp information of when the record was changed
on the source system. DProp maintains this information and automatically
replicates it to the target table.

• Add an additional column to each target table for which a validity period
(or temporal history) is required. This column will contain the end of the
validity period for each record. Like the IBMSNAP_LOGMARKER column,
this should be a TIMESTAMP format.
• Define SQL After statements which maintain the validity period for the
records.
The additional column is added to the target table(s) by editing the DJRA
generated SQL for the subscription. In this case study, the column is called
EXPIRY_TIMESTAMP.
Using SQL After Statements to Maintain Validity Periods

SQL After statements execute after the replication subscription set has
successfully completed. The two pseudo SQL statements shown below can
be used to maintain temporal history information for the <tablename> table:
UPDATE <owner>.<tablename> A SET EXPIRED_TIMESTAMP=

( SELECT MIN(IBMSNAP_LOGMARKER) FROM <owner>.<tablename> B
WHERE A.<source_key_column1> = B.<source_key_column1> AND
A.<source_key_column2> = B.<source_key_column2> AND
UPDATE <owner>.<tablename> A SET EXPIRED_TIMESTAMP=

( SELECT B.IBMSNAP_LOGMARKER FROM <owner>.<tablename> B
WHERE A.<source_key_column1> = B.<source_key_column1> AND
A.<source_key_column2> = B.<source_key_column2> AND
The first SQL works by scanning through the <tablename> table for records
with the same source key column value(s) and placing a timestamp in the
EXPIRED_TIMESTAMP column of the oldest of these records. The oldest
record is identified as the one with the lowest value in
IBMSNAP_INTENTSEQ. The IBMSNAP_LOGMARKER value of the new
record is used as the timestamp which is inserted into the
EXPIRED_TIMESTAMP column of the old record. In other words, the start of
the validity period of the new record becomes the end of the validity period of

the old record. This method provides a validity period for all records in the
table, so that a particular record is valid from the value in
IBMSNAP_LOGMARKER to the value in EXPIRED_TIMESTAMP. If
EXPIRED_TIMESTAMP is NULL, then the record is valid at the current point
in time.
The second SQL statement is used to provide additional handling for source
records, which are deleted. This statement looks for records that record a
delete operation against the source. It updates the EXPIRED_TIMESTAMP
column of such records with the IBMSNAP_LOGMARKER of the same
record. In effect, it closes the record’s validity period immediately. It is
included to respect one of the basic principles of life-span modeling, which
states that the start and end dates represent the time in which the object is
true in the modeled reality. Any query requesting information on the object
outside of its modeled validity period should result in false or an SQLCODE
100 being returned. If you leave the end date of a deleted record open,
temporal queries will return true, which is the wrong answer. The point is that
the object was logically deleted, and thus, the state history must reflect this.
For example, consider the following target CCD table:

KeyCol IBMSNAP_LOGMARKER IBMSNAP_OPERATION EXPIRED_TIMESTAMP
A 1999-03-26-11.37.30.000000 I 1999-03-26-13.40.30.000000
A 1999-03-26-13.40.30.000000 U <NULL>
B 1999-03-26-11.37.30.000000 I <NULL>
C 1999-03-26-15.22.21.000000 I <NULL>
The record with the source key column value ’A’ is updated at the source and
’B’ is deleted.These changes are replicated to the CCD table. After Apply has
replicated these changes to the target, but before the SQL After statements
which maintain the temporal histories are executed, the table will contain the
following data:
A 1999-03-26-11.37.30.000000 I 1999-03-26-13.40.30.000000
A 1999-03-26-13.40.30.000000 U <NULL>
A 1999-03-26-18.12.08.000000 U <NULL>
B 1999-03-26-11.37.30.000000 I <NULL>
B 1999-03-26-18.12.08.000000 D <NULL>
C 1999-03-26-15.22.21.000000 I <NULL>
The update and the delete have been recorded in the CCD table, but the
validity period has not been changed. Once the SQL After statements have
been executed, the target table will contain:
A 1999-03-26-11.37.30.000000 I 1999-03-26-13.40.30.000000
A 1999-03-26-13.40.30.000000 U 1999-03-26-18.12.08.000000
A 1999-03-26-18.12.08.000000 U <NULL>
B 1999-03-26-11.37.30.000000 I 1999-03-26-18.12.08.000000
B 1999-03-26-18.12.08.000000 D 1999-03-26-18.12.08.000000
C 1999-03-26-15.22.21.000000 I <NULL>

For both ’A’ and ’B’, the old record now has an expiry timestamp which is the
same as the new records initial timestamp. The new record for ’A’ contains
NULL in the EXPIRED_TIMESTAMP column which means it is valid at the
current point in time. The new record for ’B’ has an expiry timestamp, which is
the same as its initial timestamp, indicating that ’B’ does not exist at the
current point in time.
Advice: It is possible to combine the two SQL update statements used to

maintain temporal histories into a single update statement containing an
embedded CASE expression. Using this technique, temporal histories may
be maintained within a single pass of the table (instead of the two-pass
approach adopted here). However, Oracle does not support the expression,
and therefore DataJoiner is forced to compensate for this lack of functionality
by effectively adopting a two-pass approach. Therefore, in effect, no added
benefit would be achieved (but the SQL would be more complex). When the
SQL After or SQL Before becomes complex, it may be easier to write the SQL
in the native dialect of the target (because you are more familiar with its SQL
dialect), hold it in a stored procedure at the target database and use the
technique described in 7.4.5, “Add Statements or Stored Procedures to
Subscription Sets” on page 199 to call the stored procedure and execute the
SQL directly at the target database.
Figure 68 shows the Add Statements or Procedures to Subscription Sets

function of DJRA used to add temporal history capability to the Supplier table.
Figure 68. Adding the SQL After to Support Temporal Histories

It is important to add the SQL After statements in the order in which they are
presented at the start of this section.
Similar SQL After statements were used to add temporal history support to
the Store and Products target tables. For details of the specific SQL used,
refer to 8.4.2.3, “Add Temporal History Support to the Supplier Table” on page
227 for the Supplier table; 8.4.3.3, “Add Temporal History Support to the
Store Table” on page 235 for the Store table; and 8.4.4.3, “Add Temporal
History Support to the Products Table” on page 244 for the Products table.
The SQL generated to add SQL After statements to the Supplier table can be
seen in Appendix E.12, “SQL After to Support Temporal Histories for Supplier
Table” on page 369.
Tuning the SQL After

To optimize performance of the SQL After, which maintains temporal
histories, it is advisable to create a unique index on the
IBMSNAP_INTENTSEQ column and the column(s) used as the primary key
on the source table. For Supplier, the following unique index is created on the
target table using Oracle’s SQL*Plus:
CREATE UNIQUE INDEX tempidx ON simon.supplier
(SUPP_NO, IBMSNAP_INTENTSEQ);
ANALYZE TABLE simon.supplier COMPUTE STATISTICS;
The ANALYZE command is used to gather statistics for the table and index and
is similar to DB2’s RUNSTATS command. We recommended creating the index
and analyzing the data after the initial load of the data into the target. This
way, the statistics will be more accurate.
DataJoiner will not automatically recognize the new Oracle index. To make
DataJoiner aware of the index, connect to the DataJoiner database and
create an index on the Supplier nickname using the following syntax:
CREATE UNIQUE INDEX tempidx ON simon.supplier
(SUPP_NO, IBMSNAP_INTENTSEQ);
This does not actually create an index on the nickname; it just populates the
DataJoiner global catalog so that DataJoiner knows there is an index on the
Oracle table. It is also advisable to use DB2 RUNSTATS against the nickname in
order to ensure that the DataJoiner global statistics are up-to-date. For the
Supplier target table, the following SQL was issued from the DB2 Command
Line while connected to the DataJoiner database in order to update the global
statistics:

RUNSTATS ON TABLE simon.supplier WITH DISTRIBUTION AND INDEXES ALL
Advice: If the tables are large, performing RUNSTATS against nicknames

may take a long time to complete. In this case, use the getstats utility, which
can be downloaded from http://www.software.ibm.com/data/datajoiner (the
DataJoiner home page). Alternatively, the DataJoiner global statistics can be
updated manually by using the SYSSTAT views. See the DataJoiner SQL
Reference and Application Programming Supplement, SC26-9148 for more
information.
Finally, we need to tell DataJoiner that the collating sequence used within the
Oracle database is the same as the collating sequence used within the local
DataJoiner database. This allows DataJoiner to push down order-dependent
operations (such as ORDER BY, MIN, MAX, SELECT DISTINCT) to Oracle. If
we do not set this option, DataJoiner must retrieve the necessary data from
Oracle, and perform the ordering locally—this is usually far less efficient
because far more data is transferred from Oracle to DataJoiner. We use the
DataJoiner COLSEQ server option to do this. In this case study, the option is
created for the AZOVORA8 server mapping by issuing the following SQL
from the DB2 Command Line:
CREATE SERVER OPTION colseq FOR SERVER azovora8 SETTING ’y’
This server option only needs to be created once, as it applies to the whole
Oracle server. By creating the COLSEQ server option and setting it to "Y",
performance can improve dramatically. For example, consider the Products
target table which contains 37,000 rows. Without the server option, the SQL
After statement took several minutes to execute. After creating the option,
execution time for the SQL After was less than 5 seconds.
For more details on DataJoiner server options, please refer to the DataJoiner
Application Programming and SQL Reference Supplement, SC26-9148. For
more information about tuning DataJoiner in the heterogeneous environment,
please refer to the DataJoiner Administration Supplement, SC26-9146.
8.4.6.1 Defining a Time Consistent Query

Time consistent data can now be returned from the data warehouse by
adding the following predicates to queries:
(SALE_DATE >= IBMSNAP_LOGMARKER)
AND
(SALE_DATE < IBMSNAP_LOGMARKER OR EXPIRED_TIMESTAMP IS NULL)
However, it is important to establish proper business rules in order to interpret

timestamps correctly during analysis. For example, the above predicates

assume that a record is valid and will appear in the answer set if its validity
period starts at exactly the same time as the sale occurred. Is this a valid
assumption, or should the record be valid when its validity end period exactly
matches the sale date? This can only be determined when proper business
rules have been defined.
Advice: Instead of using a NULL as the default in the

EXPIRED_TIMESTAMP column, a date such as "9999-12-31" could be used
as the default instead. Therefore, a value of "9999-12-31" in the
EXPIRED_TIMESTAMP column would indicate that the record is currently
valid. The above predicate to return time consistent data could then be
simplified to:
(SALE_DATE >= IBMSNAP_LOGMARKER AND SALE_DATE < EXPIRED_TIMESTAMP)
8.4.7 Maintaining Aggregate Information

Pre-defined summary tables are very useful in data warehousing scenarios
where most of the common summary information requested by users can be
pre-determined. Therefore, a common warehousing technique is to
pre-calculate these summaries and store them in table(s) at the data
warehouse.
8.4.7.1 DProp Support for Aggregate Tables

DProp provides support for two types of aggregate target tables:
1. A base aggregate is the result of a query against the base source table
involving one or more SQL column functions and a GROUP BY clause.
2. A change aggregate is the result of a query against the change data table
involving one or more SQL column functions and a GROUP BY clause.
Change aggregates are useful for trend analysis—they summarize recent

activity, but they do not summarize the overall state of your data. That is, they
can help tell you where your business is going, but not where your business
is.
Apply does not maintain base aggregate tables from log based changed data
capture. It maintains base aggregates by querying the application base tables
directly. These tables may be large and contention may occur between Apply
and your OLTP transactions when Apply is accessing the source table(s).
Change aggregates are relatively inexpensive to maintain because Apply
queries the change data table, and not the base table. Not only does this
avoid contention with your OLTP applications, but change data tables are
usually much smaller than application tables. For more information on DProp

aggregate tables, refer to the DB2 Replication Guide and Reference,
S95H-0999.
8.4.7.2 Maintaining a Base Aggregate Table from a Change

Aggregate Subscription
The technique described below combines the benefits of using base

aggregate target tables to summarize your source data with the low cost
maintenance option of change aggregate tables. It is a practical
implementation of the technique described in the white paper D13 Using Data
Replication in Data Warehousing Scenarios. This paper can be found on the
Web at http://www.software.ibm.com/data/dpropr/library.html and contains
useful information on how to maintain data warehouses using DProp.
Figure 69 shows pictorially how the scheme works.

1. The base aggregate subscription (BASEAGG_SET) runs once to populate
the target aggregate table (SIMON.AGGREGATES).
2. Once the base aggregate subscription has finished, it is disabled and the
change aggregate subscription (CHGAGG_SET) takes over maintenance
of the target base aggregate table. The change aggregate set aggregates
the change data into the movement table (SIMON.MOVEMENT).
3. This information is then used to adjust the values in the base aggregate
table at the end of the subscription cycle (using SQL After).

The Target
Source Table Initia Aggregation
lize
Base
Aggregate
1
DB2RES5.SALES
SIMON.AGGREGATES
Maintain
3
CD Table
Change
Maintain Aggregate
2 SIMON.MOVEMENT
UOW Table The Movement

Table
Figure 69. Maintain Base Aggregate Table from Change Aggregate Subscription
Let us consider an example for this case study. A common query against the
warehouse is to find the total number of items sold and the total price of all
these items broken down by store. The following SQL statement can be used
to provide this analysis:
SELECT company, location, sum(pieces), sum(out_prc)
FROM sales GROUP BY company,location
We would like to have this information precalculated and stored within the
warehouse. By using the process described, it is possible to maintain such a
target aggregate table from a change aggregate subscription. The SQL script
detailed in Appendix E.13, “Maintain Base Aggregate Table from Change
Aggregate Subscription” on page 370 was used to maintain the aggregate
shown above within the data warehouse (and contains detailed comments on
how the scheme works).
Advice: Use the Replication Analyzer with the DEEPCHECK option to check
the validity of the SQL Before and SQL After statements before starting the
subscription.

In the example used in this case study, if we select the data from the base
aggregate table, then a summation of the total number of items sold and the
total price of those items grouped by store number and company number is
displayed. However, a simple join with the Outlets view will yield the store
name and city. For example, the following query may be issued from
SQL*Plus against the data warehouse:
SELECT o.NAME, o.CITY, a.SUM_PIECES, a.SUM_OUTPRC FROM
OUTLETS o, SIMON.AGGREGATES a WHERE
o.STORE_NUM=a.LOCATION AND o.COMPNO=a.COMPANY AND
o.Valid_To is NULL
ORDER BY a.SUM_OUTPRC DESC;
This will produce the following output:

NAME CITY SUM_PIECES SUM_OUTPRC
------------------------ -------------------- ---------- ----------
PARF. SIMON SUED GMBH NUERNBERG 40121 766878.23
PARF. SIMON NORD GMBH HAMBURG 19885 748414.80
PARF. SIMON SUED GMBH FRANKFURT 25932 597813.25
CHRIS’ PARFUEMERIE GMBH DORTMUND 12260 539272.99
PARF. SIMON SUED GMBH MANNHEIM 15828 514384.91
PARF. SIMON SUED GMBH MUENCHEN 13428 510742.99
PARF. SIMON SUED GMBH WIESBADEN 17003 431994.30
PARF. SIMON WEST GMBH BREMEN 9560 397343.00
CHRIS’ PARFUEMERIE GMBH ESSEN 8801 388603.62
PARF. SIMON WEST GMBH DUEREN 13668 371753.06
PARF. SIMON SUED DARMSTADT 17521 367486.95
The Valid_To is NULL predicate is added to ensure that only those records
from the Outlets table which are valid at the present time are used. The ORDER
BY clause will order the data so that the stores which take the most money will
appear first in the report.
This technique will work for SQL column functions AVG, COUNT and SUM. It
is not possible to use the technique with the MIN and MAX column functions
(these functions will still have to be maintained directly from the source tables
by using standard base aggregate subscriptions).
If Capture is cold-started, then you will probably need to reactivate the base
aggregate set and deactivate the change aggregate set to refresh the base
aggregate table. Once the refresh is complete, the base aggregate set will
automatically be deactivated, and the change aggregate set will be activated.
8.4.8 Pushing Down the Replication Status to Oracle

DProp maintains current replication status information in the Apply
Subscription Set table (ASN.IBMSNAP_SUBS_SET). DBAs can use the
information within this table to determine the success or failure of a
subscription cycle.

The Subscription Set table is located at the control server, which is always a
DB2 database (in this case study, the DB2 database within DataJoiner). The
DBA of the multi-vendor target database is likely to want the status
information in a format (and database) which is familiar to them. They are
unlikely to want to logon to a DB2 database just to find the status of the
replication subscriptions.
The technique described below is used to create a status table in the

heterogeneous target database and populate the contents of that table with
replication status information from the Subscription Set table:
1. Connect to the DataJoiner database that is acting as the control server
and create the status table in Oracle using syntax similar to the following:
CREATE TABLE SIMON.DPROP_STATUS(
APPLY_QUAL CHAR(18),
SET_NAME CHAR(18),
STATUS SMALLINT,
LASTRUN TIMESTAMP,
LASTSUCCESS_RUN TIMESTAMP,
CONSISTENT_TO TIMESTAMP) IN AZOVORA8;
By using DataJoiner’s DDL transparency feature, the DPROP_STATUS
table is created in the Oracle target server (AZOVORA8) and a nickname
is automatically generated for that table.
The DPROP_STATUS table will contain one row for each subscription set
recording information about the current status of the subscription, the last
time it was run, the last time it was successfully run, and a timestamp
indicating the point in time to which the target data is consistent with the
source data.
2. Insert an initial record into the DPROP_STATUS table for each
subscription set which requires monitoring:
INSERT INTO SIMON.DPROP_STATUS VALUES(
’WHQ1’,’SALES_SET’,NULL,NULL,NULL,NULL);
3. Now insert an SQL After statement into the
ASN.IBMSNAP_SUBS_STMTS table for each subscription set being
monitored. The SQL After statement updates the DPROP_STATUS Oracle
table (by using the nickname) with the current information for that
subscription from the Subscription Set table. The sample code below was
used for the WHQ1 subscription set:
-- insert the SQL AFTER into the table
VALUES(’WHQ1’,’SALES_SET’,’S’,’A’, 4,’E’,

’UPDATE SIMON.DPROP_STATUS SET (STATUS, LASTRUN, LASTSUCCESS_RUN,
CONSISTENT_TO) =
(SELECT STATUS, LASTRUN, LASTSUCCESS, SYNCHTIME
FROM ASN.IBMSNAP_SUBS_SET WHERE
APPLY_QUAL=’’WHQ1’’ AND SET_NAME=’’SALES_SET’’ AND
WHOS_ON_FIRST=’’S’’)’,
’0000002000’);
Be sure that this is the last SQL After statement for the subscription set.
This is done by setting the STMT_NUMBER column higher than any other
value for that set. The SQL After statement can be added to the
subscription set either by using the DJRA Add Statements or
Procedures to Subscription Sets function, or by inserting the
information directly into the Subscription Statements table (as shown
above).
4. We also need to update the subscription set table to tell Apply that we
have added an SQL After statement to the set:
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS+1 WHERE
APPLY_QUAL=’WHQ1’ AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’;
Now, when the WHQ1 subscription set has executed, Apply will automatically
update status information into the Subscription Set table. The SQL After
statement will then be executed, which will copy this status information into
Oracle by using DataJoiner. The multi-vendor DBA can now access DProp
status information using the tools and techniques which they are familiar with.
For example, the Oracle DBA could use the following SQL query from
SQL*Plus to obtain the status of the last replication cycle:
SELECT APPLY_QUAL,
SET_NAME,
STATUS,
TO_CHAR(LASTRUN,’IYYY-MM-DD-HH24:MI:SS’),
TO_CHAR(LASTSUCCESS_RUN,’IYYY-MM-DD-HH24:MI:SS’),
TO_CHAR(CONSISTENT_TO,’IYYY-MM-DD-HH24:MI:SS’)
FROM DPROP_STATUS;
8.4.9 Initial Load of Data into the Data Warehouse

Now that all the replication definitions are in place, we can proceed with
loading the data into the data warehouse and then synchronizing Capture and
Apply so that the target tables can be maintained by change capture from the
source tables.

Note: Before performing the steps detailed in this section, Capture must be
running at the source server. Refer to the DB2 Replication Guide and
Reference, S95H-0999 for details on how to start Capture for MVS.
Since the subscription definition has full refresh disabled, the initial full
refresh of the data and synchronization of Capture and Apply must be
performed manually.
The DJRA Off-line Load utility can be used to help load data into the
warehouse manually. The utility will only unload/load data one subscription
set at a time. Therefore, we will have to unload/load all the data from the
SALES_SET at once.
The four steps that off-line load utility guides you through are these:
1. Prepare the tables for the off-line load:
• Disable full refresh for the subscription set members.
• Disable the subscription set.
• Initiate change capture by performing synchpoint translation for
each source table.
2. Unload the data from the source tables.
3. Load the data into the target tables.
4. Reactivate the subscription set.
Steps 1 and 4 are performed by the Off-line Load utility. Steps 2 and 3, the
unloading and loading of the data, must be performed manually by the
replication administrator.
There are many ways in which the unload and load tasks can be performed.
The most suitable method is usually determined by the volume of data being
loaded into the target. Several of the most common alternatives are
described below.
8.4.9.1 Using SQL INSERT....SELECT.... from DataJoiner

A quick and easy way to transfer relatively small amounts of data is to use the
distributed request facility inherent in DataJoiner. To use this, a DataJoiner
Server Mapping and User Mapping need to be defined for the DB2 for
OS/390 system where the source data resides. We assume that a server
mapping already exists for the Oracle target database.
Once the DB2 for OS/390 Server Mapping has been defined, create a
nickname for the replication source tables.

Now for a PIT target type, the table can be populated using SQL similar to the
following (which was used for the Region table):
INSERT INTO DJINST5.REGION
(REGION_ID, REGION_NAME, IBMSNAP_LOGMARKER)
SELECT REGION_ID,
REGION_NAME,
’1997-12-01’ AS IBMSNAP_LOGMARKER
FROM DJINST5.REGION_SOURCE
Advice: We need to ensure that the timestamp we initially load into the
IBMSNAP_LOGMARKER column is either the same or earlier than the
minimum date from the central fact table (because we use this column to
denote the start of the validity period). If we do not do this, then the predicate
described in 8.4.6.1, “Defining a Time Consistent Query” on page 255 may
not return all the valid rows because the SALE_DATE may be after the initial
timestamp marking the start of that records validity period. In this case study,
the following SQL was issued against the source Sales table to find the
correct timestamp to use:
SELECT MIN(DATE) FROM DB2RES5.SALES
For a CCD table, we have to generate values for the additional DProp control
columns. The example below shows the SQL used to populate the Supplier
table:
INSERT INTO SIMON.SUPPLIER
(SUPP_NO,
SUPP_NAME,
IBMSNAP_INTENTSEQ,
IBMSNAP_OPERATION,
IBMSNAP_COMMITSEQ,
IBMSNAP_LOGMARKER)
SELECT SUPP_NO,
SUPP_NAME,
x’00000000000000000001’ as IBMSNAP_INTENTSEQ,
’I’ as IBMSNAP_OPERATION,
x’00000000000000000001’ as IBMSNAP_COMMITSEQ,
’1997-12-01’ as IBMSNAP_LOGMARKER FROM DJINST5.SUPPLIER_SOURCE
Default values must be generated for the DProp control columns because
they do not exist within the source table and the columns are defined as NOT
NULL on the target table.
8.4.9.2 Using DataJoiner’s EXPORT/IMPORT Utilities

The IMPORT and EXPORT utilities within DataJoiner can also be useful for
transferring data from source to target. The utilities can be used directly

against nicknames, or if a direct connection is made to a DB2 family
database, then the utilities can be used against the table directly. The
simplest format to use when transferring data using these utilities is IXF.
We used the following SQL script to export data from the Products view on
DB2 for OS/390 and import data into the Oracle Products table by using a
DataJoiner nickname:
-- Manual addition to export the data
CONNECT TO SJ390DB1 USER db2res5 using;
EXPORT TO products.ixf OF IXF

SELECT ITEM_NUM,
ITEM_DESCRIPTION,
PROD_LINE_NUM,
PRODUCT_LINE_DESC,
SUPPLIER_NUM,
BRAND_NUM,
BRAND_DESCRIPTION,
x’00000000000000000001’ as IBMSNAP_INTENTSEQ,
’I’ as IBMSNAP_OPERATION,
x’00000000000000000001’ as IBMSNAP_COMMITSEQ,
’1997-12-01’ as IBMSNAP_LOGMARKER
FROM db2res5.products;
CONNECT RESET;
-- Manual addition to import the data

CONNECT TO DJDB USER djinst5 using;
IMPORT FROM products.ixf OF IXF INSERT INTO simon.products;
CONNECT RESET;
Since the target Oracle table is a CCD, additional column values are
generated for the DProp control columns on export.
Advice: EXPORT will create the IXF file on the machine where the EXPORT
command is issued. If there is a significant amount of data in the file, then it
should be transferred to the machine where the Oracle target database
resides before using the IMPORT command. This will dramatically improve
the performance of the IMPORT because DataJoiner will be able to perform
the SQL inserts locally against the Oracle database (and not across the
network). Of course this is only possible if DataJoiner is on the same machine
as Oracle.
8.4.9.3 Using DSNTIAUL and Oracle’s SQL*Loader Utility

For large amounts of data—such as those that will be encountered in the
initial population of a data warehouse, or when populating Very Large
Databases (VLDBs)—an alternative unload and load mechanism must be
found. The method described below by-passes DataJoiner completely and
uses native unload/load mechanisms.

The most efficient method to transfer large amounts of data between DB2 for
OS/390 and Oracle is to use the DB2 for OS/390 DSNTIAUL sample program
to unload the data and Oracle’s SQL*Loader utility to import the data. This
section describes how this mechanism can be used to perform the initial load
of the Sales target table (which contains 787,000 records and is
approximately 87MB).
Use DSNTIAUL to Unload the Data

DSNTIAUL is a DB2 for OS/390 sample assembler program which is shipped
in the sample library. It is an assembler program whose source code is part of
the DB2 for OS/390 machine readable materials. SQL select statements can
be used as input to DSNTIAUL, the outputs being two types of Basic
Sequential Access Method (BSAM) fixed format block files—containing data
which is suitable for the DB2 for OS/390 LOAD utility. This data can also be
used as input for the Oracle SQL*Loader utility.
The JCL used for invoking the DSNTIAUL program in our environment is
shown below:
//DB2RES5$ JOB (999,POK),’DSNTIAUL’,
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=1440,
// NOTIFY=DB2RES5
//*
//DELETE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE DB2RES5.SYSREC00;
DELETE DB2RES5.SYSPUNCH;
SET MAXCC = 0;
/*
//*
//UNLOAD EXEC PGM=IKJEFT01,DYNAMNBR=20,COND=(4,LT)
//STEPLIB DD DISP=SHR,DSN=DB2V510.SDSNLOAD
//DBRMLIB DD DISP=SHR,DSN=DB2V510I.DBRMLIB.DATA
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSREC00 DD DSN=DB2RES5.SYSREC00,UNIT=SYSDA,
// VOL=SER=PSOFT6,SPACE=(CYL,(500,0),RLSE),DISP=(,CATLG)
//SYSPUNCH DD DSN=DB2RES5.SYSPUNCH,UNIT=SYSDA,
// VOL=SER=SAP007,SPACE=(1024,(15,15)),DISP=(,CATLG)
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN S(DB2I)
RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB51) PARMS(’SQL’) -
LIB(’DB2V510I.RUNLIB.LOAD’)
/*
//SYSIN DD *
SELECT
CHAR(DATE,ISO) AS DATE,
CHAR(BASARTNO) AS BASARTNO,
CHAR(LOCATION) AS LOCATION,
CHAR(COMPANY) AS COMPANY,
CHAR(PIECES) AS PIECES,
CHAR(OUT_PRC) AS OUT_PRC,
CHAR(TAX) AS TAX,
CHAR(DATE(TRANSFER_DATE),ISO) AS TRANSFER_DATE,

CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE,
CHAR(CURRENT DATE,ISO) AS IBMSNAP_LOGMARKER
FROM DB2RES5.SALES;
/*
//
This JCL will probably need modification to meet particular site requirements
and configurations. When using DSNTIAUL it is important to estimate the size
of the dataset which will be created and use the SPACE allocation of the
SYSREC00 DD statement to ensure there is sufficient disk space available. Use
the RLSE parameter to shorten the data set to the space occupied by the data
at the time the data set is closed.
In the above job, two files are created:

SYSREC00 contains the exported data from the Sales table.
SYSPUNCH contains the corresponding control statements for the
DB2 for OS/390 LOAD utility.
The SYSPUNCH file is similar to the control file required by Oracle’s

SQL*Loader and can be transferred to AIX along with the data file and then
edited to comply with the Oracle control format.
The SQL select statement used to extract the data is at the bottom of the JCL
file. In order to overcome the differences in the representations of various
data types (for example, decimal, integer) between OS/390 and AIX, the
externalized data must be converted to character prior to the unload. It is then
converted back to the corresponding data type for the target tables during the
load. The Sales table contained three data types which were converted to
CHARACTER using the techniques described below:
• DECIMAL columns are converted to CHARACTER format by using the
CHAR SQL function. For example: CHAR(TAX) AS TAX .
• DATE columns are converted to CHARACTER format using the CHAR
SQL function with an additional parameter indicating the format of the date
within the character field. For example: CHAR(DATE,ISO) AS DATE .
• TIMESTAMP columns are converted to CHARACTER by first of all using
the DATE function to convert the TIMESTAMP to a DATE type. The result
of this was subsequently converted to CHARACTER using the same
method as that described for the DATE type above. For example:
CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE. The time information in the
TIMESTAMP is lost when it is converted to a DATE. This is acceptable in
this situation because even though the TRANSFER_DATE and
PROCESS_DATE columns where of TIMESTAMP type, they only
contained DATE information.

For more information on CHAR, DATE and other DB2 functions, refer to the DB2
for OS/390 V5 SQL Reference, SC26-8966.
Although the target Sales table is a CCD, many of the DProp control columns
can be omitted from the SQL used by DSNTIAUL. This is because they can
be added as constant values from the SQL*Loader control file. This reduces
the amount of data which is held in the export file created by DSNTIAUL, and
consequently the amount of data which will be transferred across the
network.
Advice: The only DProp control column to be added to the export file is
IBMSNAP_LOGMARKER. This can be added as the CURRENT DATE DB2
special register and not the CURRENT TIMESTAMP special register. This is
because Oracle does not have the same precision for TIMESTAMPS as DB2.
In fact, Oracle stores all its time and date information in columns of type
DATE, which can only hold data accurate to the second (not 100,000 ths of a
second like DB2).
The SYSPUNCH output file generated by DSNTIAUL is shown below:

LOAD DATA LOG NO INDDN SYSREC00 INTO TABLE
TBLNAME
(
DATE POSITION( 1 )
CHAR( 10) ,
BASARTNO POSITION( 11 )
CHAR( 15) ,
LOCATION POSITION( 26 )
CHAR( 6) ,
COMPANY POSITION( 32 )
CHAR( 5) ,
PIECES POSITION( 37 )
CHAR( 9) ,
OUT_PRC POSITION( 46 )
CHAR( 17) ,
TAX POSITION( 63 )
CHAR( 17) ,
TRANSFER_DATE POSITION( 80 )
CHAR( 10) ,
PROCESS_DATE POSITION( 90 )
CHAR( 10) ,
IBMSNAP_LOGMARKER POSITION( 100 )
CHAR( 10)
)
The first few lines of the SYSREC00 output dataset follows:

1997-12-01 0000000012211. 0246. 063. 0000001. 000000033.00
000000004.301900-01-011997-12-041999-03-26
1997-12-01 0000000019471. 0083. 061. 0000001. 000000029.50
000000003.851900-01-011997-12-041999-03-26
1997-12-01 0000000019489. 0083. 061. 0000001. 000000029.50
000000003.851900-01-011997-12-041999-03-26
1997-12-01 0000000019513. 0054. 063. 0000001. 000000022.00
000000002.871900-01-011997-12-041999-03-26

1997-12-01 0000000019794. 0109. 063. 0000001. 000000011.95
000000001.561900-01-011997-12-041999-03-26
1997-12-01 0000000019935. 0041. 061. 0000001. 000000021.95
000000002.861900-01-011997-12-041999-03-26
1997-12-01 0000000019976. 0116. 061. 0000001. 000000019.95
000000002.601900-01-011997-12-041999-03-26
1997-12-01 0000000019984. 0086. 062. 0000001. 000000007.95
000000001.041900-01-011997-12-041999-03-26
1997-12-01 0000000022913. 0054. 063. 0000001. 000000009.95
000000001.301900-01-011997-12-041999-03-26
1997-12-01 0000000022913. 0063. 062. 0000001. 000000003.95
000000000.521900-01-011997-12-041999-03-21
As can be seen, all the data is in a readable character format.
For more information on supporting data movements between different

platforms with DSNTIAUL refer to Chapter 5 "Data Extractions" of the
redbook, Migrating and Managing Data on RS/6000 SP with DB2 Parallel
Edition, SG24-4658.
Transfer the Data and Control File to AIX

Transfer both the SYSPUNCH control file and the SYSREC00 data file to the
AIX machine hosting Oracle using ftp (in ASCII format). Rename the files to
sales.ctl and sales.dat respectively to conform to Oracle’s naming standards
(although this is not required).
Prepare the Control File for SQL*Loader

Edit the sales.ctl file to create the following Oracle SQL*Loader control file:
LOAD DATA
-- LOG NO
INDDN "sales.dat"
DISCARDFILE "sales.dis"
INSERT
-- REPLACE
INTO TABLE simon.sales
(
SALE_DATE POSITION(1:10) DATE ’YYYY-MM-DD’ ,
BASARTNO POSITION(11:25) DECIMAL EXTERNAL ,
LOCATION POSITION(26:31) DECIMAL EXTERNAL ,
COMPANY POSITION(32:36) DECIMAL EXTERNAL ,
PIECES POSITION(37:45) DECIMAL EXTERNAL ,
OUT_PRC POSITION(46:62) DECIMAL EXTERNAL ,
TAX POSITION(63:79) DECIMAL EXTERNAL ,
TRANSFER_DATE POSITION(80:89) DATE ’YYYY-MM-DD’ ,
PROCESS_DATE POSITION(90:99) DATE ’YYYY-MM-DD’ ,
IBMSNAP_INTENTSEQ CONSTANT ’00000000000000000001’,
IBMSNAP_OPERATION CONSTANT ’I’ ,
IBMSNAP_COMMITSEQ CONSTANT ’00000000000000000001’,
IBMSNAP_LOGMARKER POSITION(100:109) DATE ’YYYY-MM-DD’
)
As you can see, this file is somewhat similar in format to the SYSPUNCH file
generated by DSNTIAUL. A brief summary follows:

1. The header information identifies the following items to the SQL*Loader:
• Name of the input file ( INDDN)
• Name of the file used to hold records that failed the WHEN clause
(DISCARDFILE)
• Table load method (INSERT)
• Name of the table to be loaded ( INTO TABLE)
2. Column 1 of the main body of the control file is the name of the target
column into which the data will be loaded. The name of the DATE column
was changed to SALE_DATE because Oracle does not allow columns
named DATE.
3. Column 2 contains positional information describing the starting and
ending positions of the data within the data file. If this column contains the
keyword CONSTANT, then the value following the keyword will be inserted
into all rows.
4. Column 3 contains the data type specification:
• DECIMAL EXTERNAL means that the data is a decimal number
represented in character format.
• DATE ’YYYY-MM-DD’ means that the data is a date and describes the
format of that date. The format can be any valid Oracle data mask used
with the TO_DATE function.
Full details of the control file format and SQL*Loader parameters can be
found in the Oracle8 Utilities Guide, A58244-01.
Load the Data into Oracle using SQL*Loader

The final step is to use Oracle’s SQL*Loader utility to load the data.
SQL*Loader is invoked by using the following command from the AIX
command line:
sqlload USERID=simon/simon CONTROL=sales.ctl DATA=sales.dat DIRECT=TRUE
The DIRECT=TRUE parameter tells the loader to use the direct path option. This
option creates preformatted data blocks and inserts these blocks directly into
the table. This avoids the overhead of issuing multiple SQL inserts and the
associated database logging which will occur. This is similar to the DB2 UDB
LOAD utility.
Beside the discard file, SQL*Loader also creates a .log file which contains a
log of the work done and a .bad file which contains all the records which could
not be loaded into the target.

8.5 A Star Join Example Against the Data Warehouse Target Tables
A typical query which may be used against the data warehouse described in
this case study is shown below:
SELECT T.YEAR, T.MONTH,
O.REGION_NAME, P.PRODUCT_LINE_DESC, SUM(S.OUT_PRC)
FROM SIMON.SALES S,
SIMON.OUTLETS O,
SIMON.PRODUCTS P,
SIMON.TIME T
WHERE S.BASARTNO = P.ITEM_NUM
AND S.LOCATION = O.STORE_NUM
AND S.COMPANY = O.COMPNO
AND T.YEAR IN (1997,1998)
AND S.SALE_DATE >= O.VALID_FROM
AND (S.SALE_DATE < O.VALID_TO OR O.VALID_TO IS NULL)
AND S.SALE_DATE >= P.IBMSNAP_LOGMARKER
AND (S.SALE_DATE < P.EXPIRED_TIMESTAMP OR P.EXPIRED_TIMESTAMP IS NULL)
GROUP BY T.YEAR, T.MONTH,
O.REGION_NAME,
P.PRODUCT_LINE_DESC
ORDER BY T.YEAR, T.MONTH,
O.REGION_NAME,
5 DESC;
The query produces a summary of the total sales recorded in the Sales table
during 1997 and 1998 grouped by region and product line. Essentially, it tells
us the best (and worst) selling product lines by region over a 2-year period.
8.6 Summary
During this chapter we have seen how to maintain a data warehouse within
Oracle from changes captured from a DB2 for OS/390 system. Specific
techniques have been discussed showing how to use DProp to maintain
historic information, denormalize data, and maintain temporal histories within
the target data warehouse. Advanced techniques showing how to push down
the replication status to the warehouse and how to maintain base aggregate
tables from change aggregate subscriptions have also been demonstrated.
Many of the techniques discussed within this chapter apply not only to data
warehousing situations, but to replication situations where DProp is used as
the product of choice.

Chapter 9. Case Study 4—Sales Force Automation, Insurance
This scenario will illustrate the update-anywhere capability of DProp. The

changes performed into the source tables will be replicated towards the target
tables, and the changes performed into the target tables will be replicated
back towards the source tables. We will also illustrate the conflict detection
mechanism involved with update-anywhere replication.
The scenario also illustrates the use of data replication in an occasionally

connected, mobile environment.
In such an environment, not all the data will be replicated from the source
server towards all the target servers. Each target database will receive only a
subset of rows that are of interest for that particular target database. The
subsetting will be done according to a geographical criterion (agency code for
example). Since the partitioning data is not present in every source table, this
scenario will also illustrate the use of view registrations to implement the
subsetting technique.
The objectives of the scenario may be summarized as follows:

• Illustrate update-anywhere replication between DB2 UDB and Microsoft
Access.
• Explain the row-subsetting technique, based on the use of source views.
• Illustrate the conflict detection mechanism between DB2 UDB updates
and Microsoft Access updates.
Remark
At the time this book was written, the DB2 DataPropagator for Microsoft Jet
product was still in test phase, so the results described below should be
considered with some degree of caution.

The scenario describes a sales force automation application, for an insurance
company.
The insurance company’s head office owns the corporate data and runs the
reference applications. The insurance company has several agencies spread
all over the country, and sales representatives in each agency.

Each sales representative in the company is equipped with a laptop running
Windows 95 and Microsoft Access. On his laptop, the sales representative
has all the information he needs to prepare his customers’ interviews and
manage his business. That way, sales representatives can visit their
customers more often and more efficiently.
Each sales representative is attached to only one agency, and each customer
is usually managed by only one sales representative. The sales
representative’s Microsoft Access tables contain all the data pertaining to all
the customers that are attached to the sales representative’s agency. If a
sales representative is not available, he can ask one of his colleagues from
the same agency to replace him for a specific customer case. Sales
representatives do not have access to the data that belong to other agencies.
Periodically, once a day for example, the sales representatives connect to

the head office. They start the replication to transmit their updates to the head
office, and refresh their own data with the updates from the head office. The
updates from the head office are the ones that were originated directly by the
head office people, or that were previously originated by other sales
representatives, from other laptops.
A customer is allowed to move from one agency to another, if his address

changes, for example. We will explain how to deal with this specific issue.
9.1.1 Data Model

There are four source DB2 UDB tables at the head office (see Figure 70):
• CUSTOMERS
• CONTRACTS
• VEHICLES
• ACCIDENTS
Also, there are the equivalent four target tables in Microsoft Access. The
target tables have the same structure as the source tables.

CUSTOMERS CONTRACTS
CUSTNO CONTRACT
....... .......
AGENCY CUSTNO
VEHICLES ACCIDENTS
PLATENUM CUSTNO
....... ACCNUM
CUSTNO
Figure 70. Data Model
The SQL statements that we used to create the tables are shown in Appendix
F.1, “Structures of the Tables” on page 381.
In our scenario the source database is called SJNTDWH1, and the schema of
the source tables is called IWH.
9.1.2 Comments about the Table Structures

When you look at the fields that are present in the tables, you notice that the
customer number is present in all the tables, but the agency number is
present only in the CUSTOMERS table. Since we want to be able to transmit,
to a specific sales representative, only the rows that relate to the customers
of his agency, we will need to create join views between the CUSTOMERS
table and the three other tables. These views, which will be used as
replication sources, are detailed in a later section.
Since we will be replicating join views, we must take care of the issue
involving "double-delete" (or "simultaneous-delete"). What happens if a row
that was deleted from the CUSTOMERS table and the corresponding row,
Case Study 4—Sales Force Automation, Insurance 273

from the join point of view, is also deleted from the CONTRACTS table during
the same replication cycle?
The problem is that, since the row was deleted from the two components of
the join, it does not appear in the views (base views and CD-views) and so
the double-delete is not replicated.
There are ways to deal with this issue. A well-known technique is to define a
side CCD table for one of the components of the join. This CCD table should
be condensed and non-complete (you can define it as complete, but this is
not necessary) and located on the target server. The IBMSNAP_OPERATION
column of this CCD table is used to detect the deletes. The most common
way to do this is to add an SQL after statement in the definition of the
subscription set. The SQL statement will remove, from the target table, all the
rows for which the IBMSNAP_OPERATION is equal to "D" in the CCD table.
But in this scenario we are replicating between DB2 and Microsoft Access,
and we cannot create a CCD table into a Microsoft Access database.
Furthermore, DB2 DataPropagator for Microsoft Jet does not allow the use of
SQL After statements. So if we wanted to really deal with the double-delete
issue in this scenario we would need to:
• Create a CCD table on the source server. This means that we would have
an extra Apply program running on the source server to feed the CCD
table.
• Insert some code in the ASNJDONE user exit so that, after each
replication, it would connect to the source server, read the content of the
CCD table, and delete the target rows if IBMSNAP_OPERATION is equal
to "D".
In fact, in most real production environments, the double-delete issue does

not exist, because one of the components of the join never has deletes, for
example. In most cases the CUSTOMERS table will only have inserts and
updates, no deletes. If a customer cancels all his contracts, an indicator will
be updated in the CUSTOMERS table to show that this customer is no longer
active. In our scenario, we will assume that the CUSTOMERS table has no
deletes, and so the double-delete issue disappears.
9.2 Update-Anywhere Replication versus Multi-Site

Update Programming
In a traditional multi-site update application, the program simultaneously

accesses all the databases, and must be written in such a way that if one

database update fails, all the other databases are also rolled back (this kind
of programming uses a database feature called two-phase commit). If the
number of sites increases, and specifically when those sites are laptops, such
a synchronous application is not viable because you can never have all the
connections up and running at the same time. That is why an asynchronous
solution, such as Dprop replication, is so helpful. Each site receives the
updates when it needs or wishes to receive them.
But each solution has its own constraints. With an asynchronous

update-anywhere replication scenario, you must be aware that conflicts can
occur if several sites update the same table row during the same replication
cycle. In the sales force automation application described above, the risk of
conflict is limited by the fact that each sales representative only receives the
data that pertain to customers attached to his own agency. Each sales
representative is asked to update only his own customers’ data, and any
exception needs the approval of the sales representative who is responsible
for the customer, so that two sales representatives do not update the same
data at the same time.
In a DB2-towards-Microsoft-Access replication scenario, the Microsoft

Access tables are called row-replicas. Row-replica is the only target table
type allowed for such a scenario. The conflict detection mechanism that is
used with row-replicas differs from the one that is used with other types of
update-anywhere scenarios (DB2-to-DB2 scenarios, for example), so we will
illustrate how the conflict detection mechanism operates here and show how
the conflict is reported.
In this scenario, the conflicts would normally be limited to cases where a

sales representative updates a table row, and a staff member at the head
office also updates the same row during the same day, for example. If a
conflict occurs, the head office (source server) will always win.
You should always try to find a way to prevent conflicts from occurring. For
example, in our scenario, a convention should be established between sales
representatives and the head office, so that they do not update the same
tables the same day.

This chapter details the software components that are necessary to
implement the scenario.
We will be replicating data between a DB2 UDB for Windows NT server and
several Microsoft Access databases located on other Windows NT servers or

on laptops running Windows 95, using DB2 DataPropagator for Microsoft Jet
(also referred to as ASNJET in this chapter).
9.3.1 Ms Jet Update-Anywhere Scenario—System Design

This scenario does not require the full functions of the DataJoiner product.
Only the administration component (DJRA) and the ASNJET component
(equivalent of Dprop Apply, but for Microsoft Access) are required from the
DataJoiner installation CD-ROM.
The complete solution involves the following components:

• Source site:
• DB2 UDB Enterprise Edition for Windows NT, including the Capture
component of Dprop.
• Target workstations (either Windows 95 or Windows NT):
• Microsoft Access.
• CAE (DB2 Client Application Enabler).
• And the ASNJET component of Dprop (IBM DB2 DataPropagator for
Microsoft Jet). There is no separate Capture component on the target
side. ASNJET provides both the Capture and the Apply functions.
• Administration workstation.
• DataJoiner Replication Administration (DJRA).
The administration workstation will be used to create and fill the Dprop
control tables. It can be a separate Windows NT or Windows 95 box, or
it can be any one of the target workstations. Probably the best solution
is to install DJRA on a target workstation that will also be used for tests.
The administration workstation is only required during the set-up phase
and then to maintain, if necessary, the replication environment. It is not
required by the run-time components.
In the implementation described here, the connections between the different

components of the solution use TCP/IP.
If the source server were a DB2 for OS/390 host or an AS/400, the only
difference would be the need to install DDCS or DB2 Connect (or DB2 UDB
Enterprise Edition, since it includes DB2 Connect, or DataJoiner since it
includes DDCS), either on a separate Windows NT server that would operate
as a gateway (DB2 Connect Enterprise Edition), or on each target workstation
(DB2 Connect Personal Edition).

9.3.2 MS Jet Update Anywhere—Replication Design
ASNJET will use a pull/push replication mode. Pull the updates from DB2
UDB towards Microsoft Access; push the updates from Microsoft Access
towards DB2 UDB.
In this replication scenario, the Dprop control tables must be located in a DB2
database. So we will create them in the source server. This is important
because it means that the administration workstation will only need to have
access to the source server. You can define all the replication sources and all
the Subscriptions Sets even before you configure the target workstations.
You can even let ASNJET create the target database and the target tables for
you. If you do not create them yourself, ASNJET will create them
automatically the first time it is run.
The Dprop control tables are the same as for any other Dprop replication
scenario, except that there are two additional tables:
• ASN.IBMSNAP_SCHEMA_CHG: Used to signal modifications to a
subscription.
• ASN.IBMSNAP_SUBS_TGTS: Used by ASNJET to maintain the list of the
row-replica table names. It enables ASNJET to automatically delete a
row-replica table if the corresponding subscription definition was removed
since the last synchronization.

In this section, we discuss the system topology needed to support this case
study, and explain which steps are necessary to install and configure the
corresponding components.

In Figure 71 we show the overall picture for this scenario.

Win NT
HEAD-OFFICE A
P
P
Admin. Workstation L
I
Win 95 C
CAE DB2 UDB Tables
A
T
CAPTURE I
DJ RA O
N
DPropR S
Control
Tables
Win 95 SALES REP 1 Win 95 SALES REP 2 Win 95 SALES REP 3

Laptop Laptop Laptop
CAE CAE CAE
ASNJET ASNJET ASNJET
MS ACCESS Tables MS ACCESS Tables MS ACCESS Tables
APPLICATION APPLICATION APPLICATION
From the system topology diagram shown above, you can see that ASNJET
replaces the function of the Apply component. No additional functionality of
DataJoiner is needed in this scenario. The control tables are located at the
central DB2 UDB server, which acts as the master-copy for the mobile clients.
To establish the database connectivity, DB2 CAE is implemented on the
mobile clients. The sales representatives access their local copy of the

customer information in MS Access through locally installed applications (for
example, for claims handling, or updates of customer data). In the insurance
company’s head-office, applications typically do batch-type processing of
claims and policies, accessing the DB2 UDB database, which contains the
information gathered from all the sales representatives. The replication
administration workstation is also located in the head-office and basically
contains the DJRA component.

This chapter details the implementation steps that you have to follow. Just
use it as a cookbook. The first section gives you a summary implementation
checklist, to be used as a reminder, and the second section explains the
scenario’s specific steps in more detail. The steps that are not specific to a
heterogeneous replication environment, such as the installation of DB2 UDB
or CAE (DB2 Client Application Enabler), for example, are not detailed here,
because they are already explained in the product documentations.
Remark: You will notice that, unlike other Apply components, ASNJET does
not require any bind operation. The only necessary binds are:
• The bind for Capture on the source server.
• The binds for CAE (from the administration workstation and target
workstations towards the source server). Note that if CAE is at the same
level of maintenance on all the workstations, the binds for CAE need only
be done once.
9.4.2.1 Summary Implementation Checklist

Some activities described in the General Implementation Guidelines (see
Chapter 4, “General Implementation Guidelines” on page 61) are not
necessary for this scenario. In particular, all the activities related to the
DataJoiner product itself are not necessary. This means that the section
called "Set Up the Database Middleware Server" is not needed. It is replaced
by a section called "Set up the Database Connectivity between the Source
Server and the Target Workstations". You can see the general implementation
diagram in Figure 72.

Implementation
Guidelines
Setup Database Implement Setup Replication

Connectivity the Replication Administration
Source/Targets Subcomponents Workstation
Create the Bind DProp

Replication CAPTURE
Control Tables (Not APPLY)
Figure 72. General Implementation Steps
We also assume that DB2 UDB is already installed on the source server, that
you have already created the source database and the source tables, and
that Microsoft Access is already installed on the target workstations.
The summary implementation checklist is as follows:
Set up database connectivity between the source and targets:

• On the source server:
• Set up the underlying communication subsystem (TCP/IP, ...)
• Update the database manager configuration with the appropriate
configuration settings
• Set the DB2COMM environment variable
• On each target workstation:
• Set up the underlying communication subsystem (TCP/IP, ...)
• Install CAE (DB2 Client Application Enabler)
• Use the Client Configuration Assistant to configure the connectivity and
to register the source database as an ODBC data source.
• Connect to the source database and bind the db2cli.lst and
db2ubind.lst files. If you install the same level of CAE on all the

workstations, these binds need only be done once, from the first
configured workstation.
Implement the replication subcomponents (see Chapter 4.4.2, “Implement the

Replication Subcomponents (Capture, Apply)” on page 71):
• On the source server, since the Capture component of DProp is included
in DB2 UDB for Windows NT, it is already installed. But you need to:
• Modify the source database configuration to use LOGRETAIN ON
• Increase the APPLHEAPSZ parameter
• Perform a backup of the source database
• On each target workstation, install the following software:
• Microsoft DAO (Data Access Objects)
Note: DAO is the programming interface to Microsoft Jet (the database
engine that Microsoft Access uses). It provides a framework for directly
accessing and manipulating database objects.
• IBM DB2 DataPropagator for Microsoft Jet
Set-up your replication administration workstation (see Chapter 4.4.3, “Set

Up the Replication Administration Workstation” on page 71):
• Install CAE.
• Set-up the database connectivity between the administration workstation
and the source server. No database connection is necessary between the
administration workstation and the target workstations, because all the
Dprop control tables will be located at the source server.
• Install DJRA (DataJoiner Replication Administration).
• Perform the DJRA set-up.
• Create the directories where you will store the SQL scripts that will be
generated by DJRA.
Create the replication control tables (see Chapter 4.4.4, “Create the
Replication Control Tables” on page 74):
• On the administration workstation, use DJRA to generate and run an SQL
script to create the Dprop control tables in the source server database.
Bind DProp Capture (see Chapter 4.4.5, “Bind DProp Capture and DProp
Apply” on page 74):
• On the source server, bind the Capture component on the source
database.

Final status of your replication setup (see Chapter 4.4.6, “Status After
Implementing the System Design” on page 76):
• On the source server, create the join views over the source tables, to
implement the subsetting technique.
• On the administration workstation, use DJRA to generate and run SQL
scripts to define the replication sources.
• On the administration workstation, use DJRA to generate and run SQL
scripts to define the replication targets, for only one target workstation.
Then duplicate the SQL scripts, adapt the scripts for the other target
workstations, and run the scripts.
• On the source server, start Capture.
• On each target workstation, create the password file that will be used by
ASNJET to connect to the source server.
• On each target workstation, start ASNJET.
9.4.2.2 Detailed Implementation Steps

Follow the detailed steps indicated below.
Set-up Database Connectivity Source / Targets

• On the source server:
• Set up the underlying communication subsystem (TCP/IP, ...). Not
detailed here.
• Update the database manager configuration with the appropriate
configuration settings:
db2 update dbm cfg using SVCENAME=xxxx
• Set the DB2COMM environment variable:
DB2COMM=TCPIP
• On each target workstation:
• Set up the underlying communication subsystem (TCP/IP, ...). Not
detailed here.
• Install CAE (DB2 Client Application Enabler). Not detailed here.
• Use the Client Configuration Assistant to configure the connectivity and
to register the source database as an ODBC data source. Not detailed
here.
• Connect to the source database and bind the db2cli.lst and
db2ubind.lst files. If you install the same level of CAE on all the

workstations, these binds need only be done once, from the first
configured workstation.
To do this, select the DB2 Command Window, then go to the
SQLLIB\BND directory, and use the following commands:
db2 connect to SJNTDWH1 user USERID using PASSWORD
db2 bind @db2cli.lst blocking all grant public
db2 bind @db2ubind.lst blocking all grant public
db2 terminate
Implement the Replication Subcomponents

(See Chapter 4.4.2, “Implement the Replication Subcomponents (Capture,
Apply)” on page 71)
• On the source server, since the Capture component of DProp is included
in DB2 UDB for Windows NT, it is already installed. But you need to:
• Modify the source database configuration to use LOGRETAIN ON
• Increase the APPLHEAPSZ parameter
• Perform a backup of the source database
To do this, use the following commands:
db2 update database configuration for SJNTDWH1 using LOGRETAIN ON
db2 update database configuration for SJNTDWH1 using APPLHEAPSZ 2048
db2 backup database SJNTDWH1
• On each target workstation, install the following software:
• Microsoft DAO (Data Access Objects). (you can find it at the following
address: http://www.nesbitt.com/bctech.html ). Not detailed here.
Note: DAO is the programming interface to Microsoft Jet (the database
engine that Microsoft Access uses). It provides a framework for directly
accessing and manipulating database objects.
• IBM DB2 DataPropagator for Microsoft Jet:
During the installation of DB2 DataPropagator for Microsoft Jet, you will
be prompted to enter a value for the ASNJETPATH environment
variable. This variable indicates in which directory DB2 DataPropagator
for Microsoft Jet will find the files it needs (the password file for
example, see details below), and where it will create its own files (log,
trace, ...). It is also in this directory that DB2 DataPropagator for
Microsoft Jet will create the target Microsoft Access database (file
DBSR0001.MDB for sales representative 1, for example), if the target
database does not exist.

For example, indicate the value C:\SQLLIB\BIN for the ASNJETPATH
variable.
If the Target workstation is running Windows NT, you can also create
this variable as a Windows NT system variable.
Remark: Be careful that the variable had perhaps already been defined
as a user variable. If this is the case, remove the user variable
definition.
If the target workstation is running Windows 95, you can set the value
for ASNJETPATH just before starting ASNJET, using the following
command:
set ASNJETPATH=C:\SQLLIB\BIN

(See Chapter 4.4.3, “Set Up the Replication Administration Workstation” on
page 71.)
• Install DB2 CAE.
• Set-up the database connectivity between the administration workstation
and the source server. No database connection is necessary between the
administration workstation and the target workstations, because all the
Dprop control tables will be located at the source server. Do exactly as for
the target workstations. Make sure the source database is registered as
an ODBC data source. If it is not, add it using either the Client
Configuration Assistant or the ODBC utility from the Windows NT’s Control
Panel (the source database must be registered as a System DSN). Not
detailed here.
• Install DJRA (DataJoiner Replication Administration).
• Perform the DJRA set-up.
To do this, follow the following path: Start => Programs => DataJoiner
for Windows NT => Replication => Replication Administration. The
DB2 DataJoiner Replication Administration main panel is then displayed
(see Figure 73):

Figure 73. DB2 DataJoiner Replication Administration Main Panel
Select File => Preference, then select the Connection tab.

Select the source database, then choose Modify, and enter the userid and
password that will be used to connect to the source database, then select
OK.
Note: The userid and password must have been defined on the source
server.
Select OK again to return to DJRA’s main panel.
• Create the directories where you will store the SQL scripts that will be
generated by DJRA. For example:
• ASNJET\SCRIPTS\CONTROL: Will contain the SQL script used to
create the Dprop control tables (script generated by DJRA).
• ASNJET\SCRIPTS\SOURCES: Will contain the SQL scripts used to
define the source tables and the source views as replication sources
(scripts generated by DJRA).
• ASNJET\SCRIPTS\TARGET1: Will contain the SQL scripts used to
create the subscription sets and the subscription members for sales
representative 1 (scripts generated by DJRA).
• ASNJET\SCRIPTS\TARGET2: Will contain the SQL scripts used to
create the subscription sets and the subscription members for sales
representative 2 (scripts copied from ASNJET\SCRIPTS\TARGET1
and then adapted).

• ASNJET\SCRIPTS\TARGETx: .......

(See Chapter 4.4.4, “Create the Replication Control Tables” on page 74.)
• On the Administration Workstation, use DJRA to generate and run an SQL
script to create the Dprop control tables in the source server database.
To do this, select the Create Replication Control Tables option from
DJRA’s main panel. The following panel is then displayed (see Figure 74):
Figure 74. Create Replication Control Tables
Select Generate SQL.

DJRA opens an editor window and generates the SQL script to create the
control tables. Check that you have the Satisfactory completion message
at the end of the generated script. You can directly update the generated
script, if you wish to change the names of the tablespaces, for example.
Then select File, Save As... , and give a name to the generated SQL
script.
You can then run the SQL script by selecting the Run option that is present
in the toolbar of the editor’s window.
Then select the Cancel option from the Create Replication Control Tables
panel.
Another way to run an SQL script is to use the Run or Edit an SQL file
option from DJRA’s main panel.
Bind DProp Capture

(See Chapter 4.4.5, “Bind DProp Capture and DProp Apply” on page 74.)
• On the source server, bind the Capture component on the source
database. This activity must be done only once, after the control tables
have been created.

To do this, select the DB2 command window, then go to the SQLLIB\BND
directory, and use the following commands:
db2 connect to SJNTDWH1 user USERID using PASSWORD
db2 bind @capture.lst isolation ur blocking all grant public
db2 terminate

After establishing the infrastructure for the replication solution, we will now
discuss how to set up the source registrations, and define the subscriptions to
meet the requirements for mobile update anywhere replication.
9.5.1 Creating Source Views to Enable Subsetting

On the source server, create the join views over the source tables, to
implement the subsetting technique.
We want to be able to transmit, to a specific Microsoft Access database, only

the rows that relate to the customers of a specific agency. The customer
number is present in all the tables, but the AGENCY column is present only in
the CUSTOMERS table. So we must create join views, joining the
CUSTOMERS table with all the source tables that do not have the AGENCY
code, and it is these views that will be used as replication sources:
-- View VCONTRACTS (Contracts + Agency code):
CREATE VIEW IWH.VCONTRACTS
(CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE, TAXES, CREDATE,
AGENCY)
AS SELECTA.CONTRACT,
A.CONTYPE,
A.CUSTNO,
A.LIMITED,
A.BASEFARE,
A.TAXES,
A.CREDATE,
B.AGENCY
FROM IWH.CONTRACTS A, IWH.CUSTOMERS B
WHERE A.CUSTNO = B.CUSTNO ;
-- View VVEHICLES (Vehicles + Agency code):

CREATE VIEW IWH.VVEHICLES
(PLATENUM, CONTRACT, CUSTNO, BRAND, MODEL, COACHWORK, ENERGY, POWER,
ENGINEID, VALUE, FACTORDATE, ALARM, ANTITHEFT, AGENCY)
AS SELECTA.PLATENUM,
A.CONTRACT,

A.CUSTNO,
A.BRAND,
A.MODEL,
A.COACHWORK,
A.ENERGY,
A.POWER,
A.ENGINEID,
A.VALUE,
A.FACTORDATE,
A.ALARM,
A.ANTITHEFT,
B.AGENCY
FROM IWH.VEHICLES A, IWH.CUSTOMERS B
-- View VACCIDENTS (Accidents + Agency code):

CREATE VIEW IWH.VACCIDENTS
(CUSTNO, ACCNUM, TOWN, REPAIRCOST, STATUS, ACCDATE, AGENCY)
AS SELECTA.CUSTNO,
A.ACCNUM,
A.TOWN,
A.REPAIRCOST,
A.STATUS,
A.ACCDATE,
B.AGENCY
FROM IWH.ACCIDENTS A, IWH.CUSTOMERS B
Using these join views as replication sources, we will be able to define

subscriptions with rows-selection predicates such as:
(WHERE) AGENCY = 25
Remark: In join views used as replication sources (VCONTRACTS for

example), all the copied columns must come from only one of the tables
referenced in the join view. For example, the columns we will define in the
target CONTRACTS row-replica table will all come from the source
CONTRACTS table. The other columns in the join view (the AGENCY
column) can only be referenced in the subscription predicate. This means we
are not allowed to include the AGENCY column in the target CONTRACTS
table. And it also means that the replication from Microsoft Access to DB2
UDB is not able to update more than one component of the join view.

9.5.2 Registering the Replication Sources
On the administration workstation, use DJRA to generate and run SQL scripts
to define the replication sources (see the details below).
You must first define the physical tables as replication sources before you can
define the join views as replication sources.
For our scenario, we performed the following tasks:

• Define the CUSTOMERS, CONTRACTS, VEHICLES and ACCIDENTS
tables as replication sources.
• Define the VCONTRACTS, VVEHICLES and VACCIDENTS views as
replication sources.
When you generate an SQL script, always choose a meaningful script name
so that you will be able to remember the purpose of the script. For example,
we generated the following scripts: regcust.sql, regcont.sql, regvehi.sql,
regacci.sql, regvcont.sql, regvvehi.sql and regvacci.sql (reg stands for
"registration", which is a synonym for "define a replication source").
9.5.2.1 Registering the Contracts Table

Select the Define One Table as a Replication Source option from DJRA’s
main panel. The following panel is then displayed (see Figure 75):
Figure 75. Define One Table as a Replication Source

Indicate the source table qualifier (IWH), and press the Build List Using Filter
button.
Then choose the CONTRACTS table from the list of source tables, specify
that you will need all the source columns, that you want to capture both
before-images and after-images, that you want to capture the updates as
updates, and choose a standard conflict detection level.
The panel should now look like this (see Figure 76):
Figure 76. Define the CONTRACTS Table as a Replication Source
Select Generate SQL to generate the regcont.sql script. See the generated
SQL script in Appendix F.2, “SQL Script to Define the CONTRACTS Table as
a Replication Source” on page 383.
Save and run the generated SQL script, then select Cancel to come back to
DJRA’s main panel.
Remarks:
• If you want DJRA to generate an SQL script that uses your own naming
conventions (names of the CD tables for example), you can press the Edit
Logic button before you generate the SQL script.
• For the CUSTOMERS table, we chose exactly the same parameters,
except for the update capture policy. We decided that since a customer

can move from one agency to another and since the AGENCY field will be
used in the filtering predicate, any update of the AGENCY field should be
captured as a delete-and-insert pair. That way, if a customer moves from
one agency to another, the "old" agency will receive the delete row, and
the "new" agency will receive the insert row. If this option is not chosen,
the "new" agency will receive the new record, but the old record will
remain in the "old" agency. Of course, depending on your real business
needs, you will choose one logic or the other, because perhaps you would
not want to remove the history of "old" customers from the "old" agency.
9.5.2.2 Registering the VContracts View

Select the Define DB2 Views as a Replication Sources option from DJRA’s
main panel. The following panel is then displayed (see Figure 77):
Figure 77. Define DB2 Views as Replication Sources
Indicate the source view qualifier (IWH), and press the Build List Using Filter
button.
The following panel is then displayed (see Figure 78):

Figure 78. Define DB2 Views as Replication Sources - Continued...
Select the IWH.VCONTRACTS view, then select Generate SQL. See the
generated SQL script in Appendix F.3, “SQL Script to Define the
VCONTRACTS View as a Replication Source” on page 385.
Save and run the generated SQL script, then select Cancel to come back to
DJRA’s main panel.
9.5.3 Defining the Replication Subscriptions

So far we have defined the replication sources (tables and views). We will
now use DJRA to define the replication targets:
On the administration workstation, use DJRA to generate and run SQL scripts
to define the replication targets for the first target workstation (see the details
below). After you have done this, you will duplicate the SQL scripts, adapt the
scripts for the other target workstations, and run the scripts.
The definition of a subscription (that is, a replication target) using DJRA is a

two-step process. You must first define an empty subscription set, and then
you must add members to this empty subscription set.
For this scenario, we will create one subscription set for each table, so we will
have only one member per subscription set. An alternative would have been,
for example, to create one subscription set including the four members. The
performance of the replication would have been a little bit better, but you must

be aware that all the tables in the same set "share the same fate". For
example, if one needs a full-refresh, they all need a full-refresh; and if one is
in error, they are all considered in error.
Remark: If you intend to define referential integrity constraints between the

Microsoft Access tables, which is very likely since it is an update-anywhere
environment, then you must have the same subscription set for all the tables
that are linked by referential integrity constraints. In general, you will define
the same referential integrity constraints between the Microsoft Access tables
as the ones you have defined between the DB2 UDB tables.
For example, if we had defined referential integrity constraints between our

four tables: CUSTOMERS, CONTRACTS, VEHICLES and ACCIDENTS
—using the CUSTNO field as a foreign key—we would create only one
subscription set, CUST0001.
Warning: Neither DJRA nor ASNJET will automatically create the referential
integrity constraints between the Microsoft Access tables. You will have to
define these constraints yourselves. This is important because we will see
later that if you do not create the Microsoft Access tables yourself, ASNJET
will create them for you, the first time it is run, but it will not create the
constraints. In that case you will have to add the referential integrity
constraints after ASNJET has created the tables.
9.5.3.1 Defining Subscriptions for the First Sales Representative

Select the Create Empty Subscription Sets option from DJRA’s main panel.
The following panel is then displayed (see Figure 79):

Figure 79. Create Empty Subscription Sets
Select the Microsoft Jet check box for Target servers , and enter the name of
the Microsoft Access database, for example, DBSR0001 (for DataBase for
Sales Representative 0001 ). Each sales representative will have only one
Microsoft Access database.
The control server must be the same as the source server.
Choose the Apply Qualifier. It must be unique in the replication network.

Choose for example AQSR0001 (for Apply Qualifier for Sales Representative
0001). Each sales representative will use only one Apply Qualifier.
Set name: We decided to create one set per target table, so you can choose
a set name such as CUST0001 (for Set for the CUSTOMERS table for sales
representative 0001 ).
Subscription set timing: Choose a small value for testing purposes (2

minutes, for example). In production, the ASNJET program will most probably
be run with the MOBILE option, and so this frequency information will not be
used.
Your DJRA panel should now look like this (see Figure 80):

Figure 80. Create Empty Subscription Sets - Continued ...
Select Generate SQL. See the generated SQL script in Appendix F.4, “SQL
Script to Create the CUST0001 Empty Subscription Set” on page 386.
Save and run the generated SQL script. Always remember to give a
meaningful script name (such as SETCUST.SQL, for example).
Then select Cancel to come back to DJRA’s main panel.
Repeat the same operations for the three other subscription sets: CONT0001
(for CONTRACTS), VEHI0001 (for VEHICLES) and ACCI0001 (for
ACCIDENTS).
From DJRA’s main panel, select the Add a member to Subscription Sets
option. The following panel is then displayed (see Figure 81):

Figure 81. Add a Member in Subscription Sets
Select SJNTDWH1 as control server (same as source server), then click on

the top Build List button. The four subscription sets are displayed. Select
CONT0001.
You will receive a message saying Target structure must be row replica for
server DBSR0001. Simply answer OK.
Then select the second Build List button. This will display the list of defined
replication sources.
Select IWH.VCONTRACTS (VCONTRACTS appears twice in the list. Simply

select the first one). Do not select IWH.CONTRACTS.
Specify that you want all columns, and indicate the target table
characteristics:
• Qualifier: IWH
• Target table name: CONTRACTS (it does not need to be VCONTRACTS)
• Target structure: Row-replica

• And since the source is a view, you must indicate the name(s) of the
primary key column(s): CONTRACT+
Then enter the filtering predicate in the where clause field:

(AGENCY = 25)
The screen should now look like this (see Figure 82):
Figure 82. Add a Member in Subscription Sets - Continued...
Select Generate SQL. See the generated SQL script in Appendix F.5, “SQL
Script to Add a Member to the CONT0001 Empty Subscription Set” on
page 387.
Save and run the generated SQL script. Select a meaningful script name
(such as MBRCONT.SQL, for example).
Then select Cancel to come back to DJRA’s main panel.
Repeat the same operations to add members for VEHICLES, ACCIDENTS

and CUSTOMERS.

Warning: For the CUSTOMERS table, there is a difference: The source is not
a view, it is the base table itself. So, in the list of source tables, select the
IWH.CUSTOMERS table. The SQL script generated for the CUSTOMERS
table will, of course, have some differences compared to the other ones.
The SQL scripts generation using DJRA is now finished.
9.5.3.2 Defining Subscriptions for the Other Sales Representatives

We have seen the definition of the replication targets for the first target
workstation. We will now explain how we can easily copy the definitions we
have done, for the other target workstations.
So far, we have generated and run the SQL scripts to define the subscription
sets and subscription members for the first sales representative. We stored
these SQL scripts in directory ASNJET\SCRIPTS\TARGET1:
• SETCUST.SQL: Subscription set for the target CUSTOMERS table
• SETCONT.SQL: Subscription set for the target CONTRACTS table
• SETVEHI.SQL: Subscription set for the target VEHICLES table
• SETACCI.SQL: Subscription set for the target ACCIDENTS table
• MBRCUST.SQL: Subscription member for the target CUSTOMERS table
• MBRCONT.SQL: Subscription member for the target CONTRACTS table
• MBRVEHI.SQL: Subscription member for the target VEHICLES table
• MBRACCI.SQL: Subscription member for the target ACCIDENTS table
Now, we will create the equivalent SQL scripts for sales representative 2.
To do this, use the following steps:
• Copy the content of ASNJET\SCRIPTS\TARGET1 towards
ASNJET\SCRIPTS\TARGET2.
• Update SETCUST.SQL: Replace the string ’0001’ by ’0002’ everywhere.
• Update SETCONT.SQL: Replace the string ’0001’ by ’0002’ everywhere.
• Update SETVEHI.SQL: Replace the string ’0001’ by ’0002’ everywhere.
• Update SETACCI.SQL: Replace the string ’0001’ by ’0002’ everywhere.
• Update MBRCUST.SQL:
• Replace the string ’0001’ by ’0002’ everywhere.
• Find the filtering predicate ’AGENCY = 25’ (there should be only one
occurrence) and replace the 25 by the appropriate value for sales
representative 2.

• Update MBRCONT.SQL:
• Replace the string ’0001’ by ’0002’ everywhere.
• Find the filtering predicate ’AGENCY = 25’ (Important: There are
several occurrences) and replace the 25 by the appropriate value for
sales representative 2.
• Update MBRVEHI.SQL and MBRACCI.SQL exactly like you did for
MBRCONT.SQL.
• Then, run all these new SQL scripts: SETxxxx.SQL first and
MBRxxxx.SQL second. To run the scripts you can use either the Run or
Edit an SQL File option from DJRA’s main panel, or the following
commands from a DB2 command window:
db2 -tvf C:\ASNJET\SCRIPTS\TARGET2\SETCUST.SQL |more
db2 -tvf C:\ASNJET\SCRIPTS\TARGET2\SETCONT.SQL |more
and so on...
9.5.3.3 Finalizing the Replication Setup

So far we have defined all the content of the Dprop control tables. We will
now see the remaining activities that are necessary to complete the setup.
• On the source server, start Capture. Go to the DB2 command window, and
type the following command:
asnccp SJNTDWH1 cold trace
Or you can configure the Capture program to run as an NT service. Not
detailed here—see the Replication Guide and Reference, S95H-0999
book for more information.
• On each target workstation, create the password file that will be used by
ASNJET to connect to the source server.
The password file must be created in the directory from where ASNJET
will be started. It must also be the directory indicated by the ASNJETPATH
variable. The simplest way is to create it in C:\SQLLIB\BIN, but be careful
if you uninstall CAE and reinstall it later, because you will have to check
that the password file is still present in C:\SQLLIB\BIN.
The name of the password file must be Apply_Qualifier.PWD
So, in our scenario, it will be AQSR0001.PWD for sales representative 1,
AQSR0002.PWD for sales representative 2, and so on.

The content will be identical for all the sales representatives:
SERVER=SJNTDWH1 USER=USERID PWD=PASSWORD
In this expression, USERID and PASSWORD are the userid and password
that ASNJET will use to connect to the source server (SJNTDWH1).
• On each target workstation, start ASNJET.
To do this, go to the DB2 command window, and type the following
command (for sales representative 1):
ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW
For sales representative 2, the command would be the same except for
the Apply Qualifier. Indicate AQSR0002 instead of AQSR0001.
The MOBILE parameter tells ASNJET that it must process all the eligible
subscription sets only once and then stop.
If you do not use the MOBILE parameter (NOMOBILE is the default value),
you will have to stop ASNJET yourself. Use one of the two possible
methods:
• Ctrl-Break
• Or the following command (for sales representative 1):
ASNJSTOP AQSR0001
9.5.4 Focus on Major Pitfalls

• Increase the value of the APPLHEAPSZ parameter in the Source
database configuration, to avoid an SQLCODE -954, SQLSTATE 57011
error: Not enough space is available in the application heap to process
the statement.
• Before starting ASNJET, you should close any row-replica table that you
previously opened and updated through Microsoft Access, so that the
updates you made can be replicated by ASNJET.
• When you are replicating from a join view (VCONTRACTS for example),
all the copied columns must come from only one of the tables referenced
in the join view. For example, the columns we defined in the target
CONTRACTS row-replica table all come from the source CONTRACTS
table. The other columns in the join view (the AGENCY column) can only
be referenced in the subscription predicate. This means we are not
allowed to include the AGENCY column in the target CONTRACTS table.
• Run ASNJET with the NOTIFY parameter so that detected conflicts are
reported in the conflict table in the Microsoft Access target database.

• Remember that the level of conflict detection is equivalent to standard,
and the CONFLICT_LEVEL associated with the source table registration
is ignored. But there are differences compared to a homogeneous (DB2 to
DB2) update-anywhere scenario:
• The conflict detection is performed at row level only (row by row, not
transaction by transaction).
• If a conflict is detected, the source always wins.
• If an insert or update collides with a delete, the delete wins regardless
of source or target. This is not considered as a conflict.
• Check that you have followed these restrictions:
• A row-replica table can only have one source, either a user table or a
registered view.
• A row-replica table cannot be a multi-site union.
• A row-replica table must be referenced by one and only one
subscription set.
• The SQL statements or CALL statements features are not supported.
• The source table primary key columns must be one of the following
data types: INTEGER, SMALLINT, DECIMAL, CHAR, VARCHAR,
DATE, TIME.
• Choose different Apply Qualifier names, subscription sets names, and
target database names, for all the targets. In this scenario we chose:
• AQSRxxxx for the Apply Qualifiers
• DBSRxxxx for the target databases
• CONTxxxx, CUSTxxxx, VEHIxxxx, and ACCIxxxx for the subscription
sets
In these expressions, xxxx is a number that uniquely identifies the target.
9.6 Replication Results for Sales Representative 1

For sales representative 1, we want to replicate only the data that is related to
AGENCY 25. Let’s have a look at the source tables and the Dprop control
tables just before we start the ASNJET program for the first time.
In fact, two laptops (for sales representatives 1 and 2) have been configured
at that time, both having the same subsetting predicate (AGENCY 25).
Capture is running on the source DB2 UDB database.

9.6.1 Contents of the Source Tables at the Beginning
The source tables (and views) contain the following data for AGENCY 25:
CUSTOMERS table:
db2 select CUSTNO, LNAME, FNAME, AGENCY, SALESREP from IWH.CUSTOMERS where
AGENCY = 25
CUSTNO LNAME FNAME AGENCY SALESREP

-------- -------------------- --------------- ----------- --------
00000001 SMITH John 25 250001.
00000003 HARRIS Simon 25 250002.
00000004 LENKE Christian 25 250001.
00000007 MUSSET Cecile 25 250001.
00000008 GOLDRING Rob 25 250001.
00000009 PURNELL Micks 25 250002.
00000014 LOSA Veronique 25 250001.
00000015 BARRON Elsa 25 250002.
00000017 YU Percy 25 250001.
00000018 LI Shu 25 250002.
VCONTRACTS view:
db2 select CONTRACT, CUSTNO, BASEFARE, CREDATE, AGENCY from IWH.VCONTRACTS
where AGENCY = 25
CONTRACT CUSTNO BASEFARE CREDATE AGENCY

----------- -------- --------- ---------- -----------
1 00000001 1250.00 05/25/1998 25
3 00000003 2500.00 09/13/1998 25
4 00000004 1250.00 10/26/1998 25
7 00000007 1250.00 12/08/1998 25
8 00000008 1000.00 12/12/1998 25
9 00000009 1250.00 12/25/1998 25
14 00000014 1000.00 02/05/1999 25
15 00000015 1250.00 02/08/1999 25
17 00000017 1250.00 02/14/1999 25
18 00000018 2500.00 02/15/1999 25
VVEHICLES view:
db2 select PLATENUM, CONTRACT, CUSTNO, BRAND, MODEL, AGENCY from

IWH.VVEHICLES where AGENCY = 25

PLATENUM CONTRACT CUSTNO BRAND MODEL AGENCY
------------ ---------- -------- ---------- -------- --------
CA-000000001 1 00000001 VOLVO 440 25
CA-000000003 3 00000003 RENAULT LAGUNA 25
CA-000000004 4 00000004 TOYOTA V2 25
CA-000000007 7 00000007 GM 1000 25
CA-000000008 8 00000008 MAZDA COROLA 25
CA-000000009 9 00000009 MERCEDES XTRA 25
CA-000000014 14 00000014 CHRYSLER VOYAGER 25
CA-000000015 15 00000015 JAGUAR XXS 25
CA-000000017 17 00000017 RENAULT SAFRANE 25
CA-000000018 18 00000018 MERCEDES 300 25
VACCIDENTS view:
db2 select * from IWH.VACCIDENTS where AGENCY = 25
CUSTNO ACCNUM TOWN REPAIRCOST STATUS ACCDATE AGENCY

-------- ------- --------------- ------------ ------ ---------- -----------
00000009 1. SAN JOSE 0.00 E 01/02/1999 25
00000018 1. SANTA CRUZ 7500.00 R 03/05/1999 25
9.6.2 Contents of the Main Control Tables at the Beginning

(Only the most interesting columns are shown below.)
ASN.IBMSNAP_REGISTER table:
SOURCE_ SOURCE_ CD_TABLE PHYS_CHANGE BEFORE_IMG CONFLICT PARTIT.

TABLE VIEW_QUAL _TABLE _PREFIX _LEVEL KEYS_CHG
---------- --------- ----------- ----------- ---------- -------- --------
CUSTOMERS 0 CDCUSTOMERS CDCUSTOMERS X 1 Y
CONTRACTS 0 CDCONTRACTS CDCONTRACTS X 1 N
VEHICLES 0 CDVEHICLES CDVEHICLES X 1 N
ACCIDENTS 0 CDACCIDENTS CDACCIDENTS X 1 N
VACCIDENTS 1 VACCIDENTSA CDCUSTOMERS - 0 N
VACCIDENTS 2 VACCIDENTSB CDACCIDENTS - 0 N
VVEHICLES 1 VVEHICLESA CDCUSTOMERS - 0 N
VVEHICLES 2 VVEHICLESB CDVEHICLES - 0 N
VCONTRACTS 1 VCONTRACTSA CDCUSTOMERS - 0 N
VCONTRACTS 2 VCONTRACTSB CDCONTRACTS - 0 N

Notice that there is 1 row in the REGISTER table for each source physical
table, and 2 rows for each view defined as a Replication Source. The
SOURCE_VIEW_QUAL column indicates whether the row is for a table or for
a view.
ASN.IBMSNAP_SUBS_SET table:
APPLY_QUAL SET_NAME WHOS_ON SOURCE_ SOURCE_ TARGET_ TARGET_
_FIRST SERVER ALIAS SERVER ALIAS
---------- --------- ------- -------- -------- -------- --------
AQSR0001 CUST0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 CUST0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 CONT0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 CONT0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 VEHI0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 VEHI0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 ACCI0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 ACCI0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0002 CUST0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 CUST0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 CONT0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 CONT0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 VEHI0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 VEHI0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 ACCI0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 ACCI0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
In the SUBS_SET table, notice that there are 2 rows for each subscription
set. The row with a WHOS_ON_FIRST value of "F" represents the replication
from Microsoft Access towards DB2 UDB, and the row with a
WHOS_ON_FIRST value of "S" represents the replication from DB2 UDB
towards Microsoft Access. You can also notice that the MSJET string is used
as a generic database name for Microsoft Access databases, and the real
name of the Microsoft Access database is indicated in the SOURCE_ALIAS
and TARGET_ALIAS columns.
ASN.IBMSNAP_SUBS_MEMBR table:
APPLY_QUAL SET_NAME WHOS SOURCE_ SOURCE TARGET_ TARG._ PREDICATES
_ON_ TABLE _VIEW TABLE STRUCT
FIRST _QUAL
---------- --------- ----- ---------- ------ --------- ------ -------------
AQSR0001 CUST0001 S CUSTOMERS 0 CUSTOMERS 9 (AGENCY = 25)
AQSR0001 CUST0001 F CUSTOMERS 0 CUSTOMERS 1 -
AQSR0001 CONT0001 S VCONTRACTS 1 CONTRACTS 9 (AGENCY = 25)

AQSR0001 CONT0001 F CONTRACTS 0 CONTRACTS 1 -
AQSR0001 VEHI0001 S VVEHICLES 1 VEHICLES 9 (AGENCY = 25)
AQSR0001 VEHI0001 F VEHICLES 0 VEHICLES 1 -
AQSR0001 ACCI0001 S VACCIDENTS 1 ACCIDENTS 9 (AGENCY = 25)
AQSR0001 ACCI0001 F ACCIDENTS 0 ACCIDENTS 1 -
AQSR0002 CUST0002 S CUSTOMERS 0 CUSTOMERS 9 (AGENCY = 25)
AQSR0002 CUST0002 F CUSTOMERS 0 CUSTOMERS 1 -
AQSR0002 CONT0002 F CONTRACTS 0 CONTRACTS 1 -
AQSR0002 VEHI0002 F VEHICLES 0 VEHICLES 1 -
AQSR0002 ACCI0002 F ACCIDENTS 0 ACCIDENTS 1 -
In the SUBS_SET table, notice that the TARGET_STRUCTURE column has a

new value of 9 to indicate Microsoft Access row-replica tables. If you look at
the rows that deal with CONTRACTS, for example, you can also notice that
DJRA was clever enough to understand that the replication from DB2 UDB
towards Microsoft Access should use the view as source and the real table as
target, whereas the other way (Access towards DB2) should use the real
table as source and also the real table as target. The target is not a view. And
finally, notice that the subsetting predicate is only present for the DB2
towards Access replication.
ASN.IBMSNAP_PRUNCNTL table:
TARGET TARGET_ SYNCH SOURCE_ SOURCE APPLY_ SET_NAME CNTL_ TARG.

_SERVER TABLE POINT TABLE _VIEW QUAL SERVER STRUC
_QUAL
------- --------- ----- ---------- ------ -------- -------- -------- -----
MSJET CUSTOMERS - CUSTOMERS 0 AQSR0001 CUST0001 SJNTDWH1 9
MSJET CONTRACTS - VCONTRACTS 1 AQSR0001 CONT0001 SJNTDWH1 9
MSJET VEHICLES - VVEHICLES 1 AQSR0001 VEHI0001 SJNTDWH1 9
MSJET ACCIDENTS - VACCIDENTS 1 AQSR0001 ACCI0001 SJNTDWH1 9
MSJET CUSTOMERS - CUSTOMERS 0 AQSR0002 CUST0002 SJNTDWH1 9

ASN.IBMSNAP_SCHEMA_CHG table:
APPLY_QUAL SET_NAME LAST_CHANGED

---------- -------- --------------------------
AQSR0001 CONT0001 1999-03-17-11.26.24.906000
AQSR0001 ACCI0001 1999-03-17-11.28.50.343000
AQSR0001 VEHI0001 1999-03-17-11.30.40.250000
AQSR0001 CUST0001 1999-03-17-11.32.10.359001
AQSR0002 CUST0002 1999-03-17-15.00.29.953000
AQSR0002 CONT0002 1999-03-17-15.00.46.843001
AQSR0002 VEHI0002 1999-03-17-15.00.59.296001
AQSR0002 ACCI0002 1999-03-17-15.01.09.953000
ASN.IBMSNAP_SUBS_TGTS is empty.
ASN.IBMSNAP_TRACE table:
OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT ASN0100I: The Capture program initialization is successful
PARM ASN0103I: The Capture program started with SERVER_NAME
SJNTDWH1; the START_TYPE is COLD00000000000084..
9.6.3 Start ASNJET to Perform the Initial Full-Refresh

Start ASNET to perform the initial full-refresh for sales representative 1 by
entering the following command from the laptop:
9.6.4 Results of the Initial Full-Refresh

Since ASNJET was started with the MOBILE option, it processed all the
subscriptions only once, and then it stopped.
It first created the target Microsoft Access database (DBSR0001), then it

created the target tables inside this target database, and performed the initial
full-refresh.

A first look at the content of the ASN.IBMSNAP_APPLYTRAIL table shows
that everything worked fine (STATUS = 0 and SET_INSERTED is correct):
(Not all the columns are shown below.)

APPLY_ SET_NAME WHOS MASS EFF SET_ STAT SYNCHPOINT SOURCE_ TARGET_
QUAL _ON_ DEL. MBR INS. SERVER SERVER
FIRST
-------- -------- ----- ----- --- ---- ---- ----------- -------- --------
AQSR0001 CUST0001 S Y 1 10 0 - SJNTDWH1 MSJET
AQSR0001 CUST0001 F N 0 0 0 x’30..3035’ MSJET SJNTDWH1
AQSR0001 CONT0001 S Y 1 10 0 - SJNTDWH1 MSJET
AQSR0001 CONT0001 F N 0 0 0 x’30..3039’ MSJET SJNTDWH1
AQSR0001 ACCI0001 S Y 1 2 0 - SJNTDWH1 MSJET
AQSR0001 ACCI0001 F N 0 0 0 x’30..3133’ MSJET SJNTDWH1
AQSR0001 VEHI0001 S Y 1 10 0 - SJNTDWH1 MSJET
AQSR0001 VEHI0001 F N 0 0 0 x’30..3137’ MSJET SJNTDWH1
Notice that the EFFECTIVE_MEMBERS column is equal to 0 for the

replication from Microsoft Access towards DB2 UDB.
When you look at the ASN.IBMSNAP_TRACE table, you can see that
Capture has been triggered by ASNJET to start capturing the updates for the
source tables (see the GOCAPT messages):
OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT ASN0100I: The Capture program initialization is successful
PARM The Capture program started with SERVER_NAME SJNTDWH1; ...
GOCAPT Change Capture started for ... table name is CUSTOMERS ...
GOCAPT Change Capture started for ... table name is CONTRACTS ...
GOCAPT Change Capture started for ... table name is ACCIDENTS ...
GOCAPT Change Capture started for ... table name is VEHICLES ...
There are several GOCAPT messages for table CUSTOMERS because it is

used in several views registrations.
The ASN.IBMSNAP_PRUNCNTL table also shows that Capture has started

capturing the updates (the fields SYNCHTIME and SYNCHPOINT are no
longer NULL):

TARGET_ TARGET_ SYNCHTIME SYNCHPOINT SOURCE_ APPLY_ SET_NAME
SERVER TABLE TABLE QUAL
------- --------- ----------- --------------- ---------- -------- --------
MSJET CUSTOMERS 1999-03-.. x’0..08497B115’ CUSTOMERS AQSR0001 CUST0001
MSJET CONTRACTS 1999-03-.. x’0..08497C33B’ VCONTRACTS AQSR0001 CONT0001
MSJET CONTRACTS 1999-03-.. x’0..08497C4E9’ VCONTRACTS AQSR0001 CONT0001
MSJET VEHICLES 1999-03-.. x’0..08497EF7F’ VVEHICLES AQSR0001 VEHI0001
MSJET VEHICLES 1999-03-.. x’0..08497F13D’ VVEHICLES AQSR0001 VEHI0001
MSJET ACCIDENTS 1999-03-.. x’0..08497D95D’ VACCIDENTS AQSR0001 ACCI0001
MSJET ACCIDENTS 1999-03-.. x’0..08497DB0B’ VACCIDENTS AQSR0001 ACCI0001
MSJET CUSTOMERS - x’0..08497B115’ CUSTOMERS AQSR0002 CUST0002
MSJET CONTRACTS - x’0..08497C33B’ VCONTRACTS AQSR0002 CONT0002
MSJET CONTRACTS - x’0..08497C4E9’ VCONTRACTS AQSR0002 CONT0002
MSJET VEHICLES - x’0..08497EF7F’ VVEHICLES AQSR0002 VEHI0002
MSJET VEHICLES - x’0..08497F13D’ VVEHICLES AQSR0002 VEHI0002
MSJET ACCIDENTS - x’0..08497D95D’ VACCIDENTS AQSR0002 ACCI0002
MSJET ACCIDENTS - x’0..08497DB0B’ VACCIDENTS AQSR0002 ACCI0002
The ASN.IBMSNAP_SCHEMA_CHG table has also been updated. Only the

rows for sales representative 2 remain, all the others have been removed:
APPLY_QUAL SET_NAME LAST_CHANGED
------------------ ------------------ --------------------------
AQSR0002 CUST0002 1999-03-17-15.00.29.953000
AQSR0002 CONT0002 1999-03-17-15.00.46.843001
AQSR0002 VEHI0002 1999-03-17-15.00.59.296001
AQSR0002 ACCI0002 1999-03-17-15.01.09.953000
And the ASN.IBMSNAP_SUBS_TGTS, that was empty before the full-refresh,

now contains 4 rows:
APPLY_ SET_NAME WHOS TARGET_TABLE LAST_POSTED
QUAL _ON_
FIRST
-------- -------- ----- ------------ --------------------------
AQSR0001 CUST0001 S CUSTOMERS 1999-03-18-13.18.44.187000
AQSR0001 CONT0001 S CONTRACTS 1999-03-18-13.18.50.562001
AQSR0001 ACCI0001 S ACCIDENTS 1999-03-18-13.18.55.890001
AQSR0001 VEHI0001 S VEHICLES 1999-03-18-13.19.01.031001
When you look at the target side, you can see that in fact two Microsoft
Access databases were created by ASNJET (see Figure 83):

Figure 83. Microsoft Access Databases Created by ASNJET
DBSR0001.mdb is the real target database.
IBMSNAP_DUMMY_DBSR0001.mdb is a dummy database. It needs to be

there, but you do not need to be concerned about it!
The target database (DBSR0001) contains the four target tables, plus some
complementary control tables (see Figure 84):

Figure 84. Tables in the Target Database DBSR0001
The complementary control tables are the following:

• IBMSNAP_ERROR_INFO: If an error had occurred, this table would
contain additional error information to identify the row-replica table and the
row that caused the error. This table is empty now.
• IBMSNAP_ERROR_MESSAGE: If an error had occurred, this table would
contain the error codes and error messages. This table is empty now.
• IBMSNAP_GUID_KEY: Maps Microsoft Jet table identifiers and row
identifiers to primary key values. Now it contains exactly 32 rows:
10 corresponding to CUSTOMERS, 10 corresponding to CONTRACTS,
10 corresponding to VEHICLES, plus 2 that correspond to the 2 rows in
ACCIDENTS.
• IBMSNAP_S_GENERATION: Synchronization generations table used to
prevent cyclic updates from replicating back to the DB2 UDB database. It
contains only one row.
• IBMSNAP_SIDE_INFORMATION: If conflicts are detected, this table
contains the names of the conflict tables. This table is empty now.

• And one IBMSNAP_target_table_CONFLICT table associated to each
target table. If conflicts are detected, these tables contain the rejected
updates, that is, the updates that were done on the Microsoft Access side,
but that were rejected because they were in conflict with updates from the
DB2 UDB database. These tables are all empty now.
Now, open each target table to check that the content is equivalent to that of
the corresponding source table, according to the subsetting predicate
(Agency = 25). For example, the content of the target CONTRACTS table is
the following (see Figure 85):
Figure 85. Content of the CONTRACTS Table
9.6.5 Replicating Updates

Since we have successfully performed the initial full-refresh, we should now
check that updates (including deletes, inserts, updates) are correctly
replicated both ways. We will do this in a 3-step process:
1. Update a source DB2 UDB table, start ASNJET, and check that the update
is replicated towards Microsoft Access.
2. Update a row-replica table, start ASNJET, and check that the update is
replicated towards DB2 UDB.
3. Update a row in a source DB2 UDB table, update the same row in the
corresponding row-replica table, start ASNJET, and check how ASNJET
processes the conflict.

9.6.5.1 Replication from DB2 UDB towards Microsoft Access
Use the following command, for example, to update the CONTRACTS table in
DB2 UDB:
db2 update IWH.CONTRACTS set taxes = 500 where contract = 14
Before starting ASNJET, check that Capture has had the time to capture the
update. For example, you can perform a SELECT over the Change Data table
(IWH.CDCONTRACTS) that is associated with the CONTRACTS table.
Then start ASNJET, using the same command as before:

After ASNJET has stopped, enter Microsoft Access and open the
CONTRACTS table. The TAXES column now contains a value of 500 for
contract number 14. If you query the ASN.IBMSNAP_APPLYTRAIL table
(on the DB2 UDB side), you can also see that ASNJET has added the
following rows:
APPLY SET_NAME WHOS MASS EFF SET SET SET SOURCE TARGET
QUAL _ON_ DEL. MBR INS DEL UPD SERVER SERVER
FIRST
-------- -------- ----- ---- --- --- --- --- -------- --------
AQSR0001 CUST0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 CONT0001 S N 1 0 0 1 SJNTDWH1 MSJET
AQSR0001 CONT0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 ACCI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 ACCI0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 VEHI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 VEHI0001 F N 0 0 0 0 MSJET SJNTDWH1
9.6.5.2 Replication from Microsoft Access towards DB2 UDB

Enter Microsoft Access and open the CONTRACTS table. Update the TAXES
value for contract number 8. Set the TAXES value to 800. Then close the
Microsoft Access CONTRACTS table.
Then start ASNJET, using the same command as before:

After ASNJET has stopped, query the CONTRACTS table in DB2 UDB. The
TAXES column now contains a value of 800 for Contract Number 8. If you
query the ASN.IBMSNAP_APPLYTRAIL, you can also see that ASNJET has
added the following rows:

APPLY SET_NAME WHOS MASS EFF SET SET SET SOURCE TARGET
QUAL _ON_ DEL. MBR INS DEL UPD SERVER SERVER
FIRST
-------- -------- ----- ---- --- --- --- --- -------- --------
AQSR0001 CUST0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 CUST0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 CONT0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 CONT0001 F N 1 0 0 1 MSJET SJNTDWH1
AQSR0001 ACCI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 ACCI0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 VEHI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 VEHI0001 F N 0 0 0 0 MSJET SJNTDWH1
9.6.5.3 Conflict Detection

Now we will update the same row on both sides and see how ASNJET
behaves and how it reports the conflict.
Update the CONTRACTS table in DB2 UDB:

db2 update IWH.CONTRACTS set taxes = 2000 where contract = 17
(We put 2000 instead of 100)
Update the CONTRACTS table in Microsoft Access. For contract 17, change
the BASEFARE column from 1250 to 5000. Then close the Microsoft Access
table.
We now have:
• 17 - 1250 - 2000 in DB2 UDB
• 17 - 5000 - 100 in Microsoft Access
Check that Capture has captured the update on the DB2 UDB side (query the
Change Data table: IWH.CDCONTRACTS).
Then restart ASNJET.
When ASNJET has ended, check the content of the CONTRACTS tables. We
now have:
• 17 - 1250 - 2000 in DB2 UDB (unchanged)
• 17 - 1250 - 2000 In Microsoft Access
So we can see that the DB2 update has won the conflict and both databases
are left in a consistent state. In the Microsoft Access database, look at the
IBMSNAP_IWH_CONFLICT_CONTRACTS table. It contains one row that

shows that one update was rejected because of the conflict detection (see
Figure 86).
Figure 86. Content of the Conflict Table for Contracts
9.6.5.4 Data Integrity Considerations

Within a network of DB2 databases, DB2 DataPropagator supports an
update-anywhere model that is able to detect transaction conflicts. ASNJET
supports an update-anywhere model, but with weaker row-conflict detection
(similar to the standard Microsoft Jet replication model). As we have seen in
the preceding section, ASNJET reports synchronization conflicts in conflict
tables in a very similar way to the built-in Microsoft Jet replication feature.
This process can result in a loss of updates. Synchronization conflicts are
handled on a row-by-row basis, so some updates might be flagged as
conflicting while other updates replicate to the DB2 database. If this situation
is not acceptable, you need to program your own resolutions for all potential
update conflicts.
9.7 Operational Aspects

In this section we will study only the operational aspects that are specific to
the DB2 / Microsoft Access replication scenario. The operational aspects that
are not specific to this scenario are described in the first part of this book.
9.7.1 Operational Implications

Following is a discussion of operational and administrative topics that are
important during the production phase of the new environment.
9.7.1.1 Network
Any kinds of networks (for example, WAN, LAN, and phone lines) are suitable
for DataPropagator for Microsoft Jet. But DataPropagator for Microsoft Jet will
be most often used to replicate data towards occasionally connected
workstations, using phone lines. This means that special attention must be

paid to the subsetting predicates, to limit the number of rows that each
workstation will receive. This is particularly important for the initial full-refresh,
that might last several hours for large tables if no subsetting is defined. The
same is true after the initial full-refresh, if some jobs (monthly jobs for
example) update most rows of a large source table.
Important: ASNJET does not automatically establish the communication link

between the target and the source. It does not dial any phone number itself,
for example. You must establish the communication before ASNJET is
started, and you must end the communication after ASNJET has ended. This
can be achieved using any communications software.
9.7.1.2 Security
The general database security considerations apply. In addition, a password
file must be defined on each target workstation, named the following way:
Apply_Qualifier.PWD
This file contains the userids and passwords that are used by ASNJET when
it connects to the source server and to the control server. If the control server
is co-located with the source server, the password file contains only one row.
9.7.1.3 Scheduling
ASNJET can be started with either the MOBILE parameter or with the
NOMOBILE (which is the default) parameter:
• With the NOMOBILE parameter, the general scheduling considerations of
any replication scenario apply. This means that the subscription sets can
be processed either according to a timing frequency or according to the
arrival of specific events, or both. In this mode, ASNJET does not stop
automatically, and so the user must stop it when he so wishes.
• With the MOBILE parameter, ASNJET does not really take care about the
timing frequency that is defined in the control tables. It processes all the
eligible subscription sets only once, and then it stops automatically . This
is probably the option that will be chosen most often, especially if the
Target workstations are occasionally-connected laptops.
9.7.1.4 Locking
The general locking considerations of any replication scenario apply here.
Additionally, the user should close any Microsoft Access table he has been
updating, before starting the ASNJET program.

9.7.1.5 Recovery
In case of severe problems on the laptop, such as the accidental loss of one
of the target tables, it is very easy to recover. Simply delete the target
database file (DBSR0001.mdb in our scenario) and the dummy database file
(IBMSNAP_DUMMY_DBSR0001.mdb in our scenario). ASNJET will
automatically recreate everything the next time it is run.
9.7.1.6 Administration and Maintenance

Locate all the DProp control tables at the source server so that you can easily
check, from a single point, whether everything is working fine, whether each
workstation is getting its data and when, and so on.
Furthermore, since the process to define the source and target tables does
not require any connection to the target Microsoft Access databases, all the
setup can be prepared even before the target workstations are configured.
ASNJET contains a part of the administration functions. This particularity

enables ASNJET to:
• Automatically create the target Microsoft Access databases and tables on
the laptop.
• Automatically maintain the structure of the target tables according to the
definitions contained in the control tables. For example, if a column must
be added to the structure of a source table, you use DJRA to add the new
column’s definition into the replication control tables, and ASNJET will
automatically add the new column to the target the next time it is run. The
same mechanism also applies if you remove a subscription set from the
DProp control tables. ASNJET will automatically drop the corresponding
target Microsoft Access tables.
9.7.2 Monitoring and Problem Determination

The first place to look to check if everything is running fine, and if all the
target workstations have received (and transmitted) their data, is the
ASN.IBMSNAP_SUBS_SET table, in the DB2 UDB database.
In this table there are four columns that you should check:
• ACTIVATE: Indicates whether a subscription set is active (value 1) or not
(value 0). If it is not active (value 0), it is probably because you decided
that this subscription set should not be processed. So, in fact, you only
need to check the values of the three other columns listed below, for the
rows that have the ACTIVATE column equal to 1.

• STATUS: This should be 0 for all the active subscription sets. If the value
is -1 for some rows, it means that these subscription sets had errors.
Please read the following section to know what to do next.
• LASTRUN: This indicates the time when the subscription set was
processed for the last time. Check that it is recent for all the active
subscription sets. If some appear to be old, it means that some target
workstations have not started their ASNJET program recently. Perhaps
you should contact the corresponding users to know why they did not start
the replication.
• LASTSUCCESS: This indicates the time when the subscription set was
successfully processed for the last time. Check that it is recent and equal
to the LASTRUN value. If it is not equal to LASTRUN, this means that at
least the last processing was not successful.
Now that you have looked at the ASN.IBMSNAP_SUBS_SET table, you know
which subscription sets are OK, and which ones are not.
For those that have a problem, you must now determine what went wrong. To
do this, first have a look at the ASN.IBMSNAP_APPLYTRAIL table. In most
cases you will find helpful information there. The most interesting columns to
look at are:
• SQLCODE: This gives the SQL error code. Look at the DB2 reference
documentation to retrieve the description of SQLCODEs.
• SQLSTATE: This gives the SQLSTATE code. Look at the DB2 reference
documentation to retrieve the description of SQLSTATEs.
• APPERRM: This gives the error message.
Please make sure that you are only checking the rows that correspond to the
LASTRUN time, and not older rows.
Sometimes the information from the ASN.IBMSNAP_APPLYTRAIL table will

not be enough to directly understand the cause of the error. There are also
some cases when errors are not reported at all in the
ASN.IBMSNAP_APPLYTRAIL table. For example, if ASNJET encountered a
severe error before it had the time to write its information to the
ASN.IBMSNAP_APPLYTRAIL table. That is why it is so important to check
the content of the ASN.IBMSNAP_SUBS_SET table first.
In these cases where the APPLYTRAIL information is not enough to

determine the cause of errors, you should restart ASNJET with the
TRCFLOW parameter, like this:
ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW > TRACE.TRC

This will create a file called TRACE.TRC, containing all the details about what
ASNJET has been doing. In some cases you will find, for example, that the
error is an ODBC error, and you should then:
• Refer to the ODBC documentation to retrieve the meaning of the ODBC
error message.
• Check whether you have the latest level of ODBC driver.
On the Capture side, there are not many things to check:
Just have a look at the ASN.IBMSNAP_TRACE table to see if there are error
messages. If you have the feeling that updates were not captured, you should
check that the IBMSNAP_TRACE table contains a GOCAPT message for the
source table.
And you can also of course start Capture with a trace (be careful, the
parameter is TRACE, not TRCFLOW).
On the Target side you can also find useful error information in the following
Microsoft Access tables:
• IBMSNAP_ERROR_MESSAGE: This contains the error codes and error
messages.
• IBMSNAP_ERROR_INFO: This contains error information that helps
identify the row-replica table and the row that caused the error.
If you think that some updates should have been replicated from the
Microsoft Access tables towards the DB2 tables, and were not, it is probably
because a conflict was detected, and you must have a look at the two
following tables in the Microsoft Access database:
• IBMSNAP_SIDE_INFORMATION: This contains the names of the conflict
tables.
• IBMSNAP_target_table_CONFLICT: This contains the rejected updates.
9.8 Benefits of this Solution

There are several advantages in using DataPropagator for Microsoft Jet:
• Mobile professionals such as sales representatives can connect to their
corporate network occasionally, and start ASNJET to automatically receive
a subset of the corporate data, and to transmit their own updates to the
head-office. After the replication process has ended, they disconnect from
the network and use their laptop applications with the data that is stored
locally.

• The process to define the source and target tables does not require any
connection to the target Microsoft Access databases. This enables an
easy centralized management of the whole setup.
• The target Microsoft Access databases and tables can be created
automatically on the laptops when ASNJET is run for the first time.
• If the structure of the source tables must be changed, to add a new
column to a table, for example, you only need to update the replication
control tables, using DJRA. ASNJET will automatically add the new
column the next time it is run. The same mechanism also applies if you
remove a subscription set from the DProp control tables. ASNJET will
automatically drop the corresponding target tables.
• In case of severe problems on the laptop, such as the accidental loss of
one of the target tables, it is very easy to recover. Simply delete the target
database file on the laptop (DBSR0001.mdb in our scenario), delete the
dummy database file (IBMSNAP_DUMMY_DBSR0001.mdb in our
scenario) and ASNJET will automatically recreate everything the next time
it is run.
9.8.1 Other Configuration Options

There are not many configuration options for this scenario:
• The control server must reside in a DB2 database, so it is convenient to
co-locate it with the source server. Theoretically, any other DB2 server
could be used, provided that it is accessible from the target workstations.
But if the target workstations are laptops that use phone lines to access
the source DB2 database, you must locate the control server in the source
server.
• You can define the target database and the target tables yourself, but
since ASNJET is able to create them for you, why not use this facility?
9.9 Summary
In this scenario we have illustrated the following capabilities of the DB2
DataPropagator for Microsoft Jet component (ASNJET):
• Update-anywhere replication between DB2 and Microsoft Access, in an
occasionally-connected, mobile environment. We have seen that to
achieve this goal, the ASNJET program uses both the push and the pull
modes of replication.

• Data subsetting, meaning that each target user (sales representative)
receives only the rows that are of interest for him. We have also discussed
the following related topics:
• Replication from join views defined as replication sources. The purpose
of the views was to add the column involved in the rows-subsetting
criterion (AGENCY) to the replication sources definition.
• The use of a partitioning column (Agency) led us to ask Capture to
capture the updates to the CUSTOMERS table as delete-and-insert
pairs, to enable a customer to move from one agency to another.
• And we also discussed the double-delete issue, which must be studied
each time join views are used as replication sources.
• Conflict detection, which is the process that enables ASNJET to deal with
the cases when the same rows are updated in DB2 tables and the
corresponding Microsoft Access tables, during the same replication cycle.
In the implementation checklist, we have also seen the following interesting

points:
• The implementation does not require the installation of the complete
DataJoiner product. Only the ASNJET and DJRA components are required
from the DataJoiner installation CD-ROM.
• The process to define the source and target tables does not require any
connection to the target Microsoft Access databases, so that it is possible
to define the whole content of the control tables even if the target
workstations have not yet been configured.
• The target Microsoft Access databases and tables do not need to be
created before starting the replication, because ASNJET is able to create
them automatically when it is started for the first time.
Among the operational aspects, we have also seen a very important aspect:
• You can very easily recover from any important loss of data in the target
Microsoft Access tables. Simply delete the target database files and
ASNJET will automatically recreate the tables the next time it is run.
So this replication solution perfectly fits the needs of people who want to
exchange data between geographically dispersed micro-computers,
equipped with Microsoft Access, and a central DB2 server.

Appendix A. Index to Data Replication Tips, Tricks, Techniques
This Appendix contains a table (Table 13) that points you to all the tips, tricks,
and smart techniques described within this redbook. It provides a quick and
easy way to find a certain technique in the book.
Table 13. Index to Data Replication Tips, Tricks, and Techniques
Tip, Trick, Technique Where to look
Defining read-only copies with Referential 3.3.1.2, “Read-only Copies with

Integrity constraints Referential Integrity” on page 50
Creating an event generator 3.3.2.3, “Advanced Event Based

Scheduling” on page 53
Listing server mappings “Step 9: Create Server Mappings for All

the Non-IBM Databases” on page 69
Listing user mappings “Step 11: Create User Mappings” on

page 70
Listing nicknames “Step 19: Create the Control Tables at

Replication Source Servers” on page 74
Understanding how an automatic full 5.2.2.1, “Initial Refresh Maintained by

refresh is performed Apply” on page 86
How to perform a manual refresh or 5.2.2.3, “Manual Refresh / Off-line Load”

off-line load on page 89
How to prune CCD tables (including 5.3.2.2, “Pruning of CCD Tables” on page
internal CCD’s) 92
How to automatically prune the Apply Trail 5.3.2.3, “Pruning of the APPLYTRAIL
table Table” on page 93
Deleting AS/400 Journal receivers 5.3.2.4, “Journals Management on

AS/400” on page 95
Checking to see if the Capture process is 5.4.3.1, “Monitoring the Capture Process”
running on page 101
How to check for Capture errors 5.4.3.2, “Detecting Capture Errors” on

page 102
How to determine the current Capture lag 5.4.3.3, “Capture Lag” on page 102
How to resolve a gap with a Capture cold , “Resolving the Gap with a Capture COLD
start Start” on page 104

How to resolve a gap without a Capture , “Resolving the Gap Manually” on page
cold start 104
Checking to see if the Apply process is 5.4.4.1, “Monitoring Apply Processes” on

running page 106
Monitoring the status of a subscription set 5.4.4.2, “Monitoring the Subscription

Status” on page 106
Monitoring subscription set latency 5.4.4.4, “Monitoring Subscription Set

Latency” on page 108
Adding customized logic to a subscription 5.4.4.7, “Utilizing Apply’s ASNDONE User

(ASNDONE) Exit” on page 111
Monitoring DataJoiner 5.4.5, “Monitoring the Database

Middleware Server” on page 116
Tuning replication performance 5.5, “Tuning Replication Performance” on

page 117
How to defer pruning for multi-vendor 5.5.13.2, “How to Defer Pruning for
replication sources Multi-Vendor Sources” on page 127
How to deactivate subscription sets 5.6.1, “Deactivating Subscription Sets” on

page 129
How to disable full refresh for all 5.6.2.1, “Disable Full Refresh for All
subscriptions Subscriptions” on page 129
How to disable full refresh for certain 5.6.2.2, “Allow Full Refresh for Certain
subscriptions Subscriptions” on page 130
Forcing a full refresh 5.6.3, “Full Refresh on Demand” on page

132
Dropping Capture triggers 5.6.4, “Dropping Unnecessary Capture

Triggers for Non-IBM Sources” on page
133
Changing the Apply Qualifier or set name 5.6.6, “Changing Apply Qualifier or Set
for a subscription set Name for a Subscription Set” on page 134
Using SPUFI on OS/390 to access 6.4, “Nice Side Effect: Using SPUFI to
non-IBM databases Access Multi-Vendor Data” on page 158
Invoking stored procedures at a remote, 7.2.2.3, “Invoking Stored Procedures at

non-IBM target server the Target Database” on page 185
How to maintain a change history (CCD) 8.4.2, “Maintaining a Change History for
table in a non-IBM target Suppliers” on page 220

How to denormalize data using target-site 8.4.3, “Using Target Site Views to
views Denormalize Outlet Information” on page
228
How to denormalize data using 8.4.4, “Using Source Site Joins to

source-site views Denormalize Product Information” on
page 237
How to only replicate certain SQL 8.4.3, “Using Target Site Views to
operations Denormalize Outlet Information” on page
228
Using Point-in-Time target tables to 8.4.3, “Using Target Site Views to

maintain historic information Denormalize Outlet Information” on page
228
How to maintain temporal histories using 8.4.6, “Adding Temporal History

DProp Information to Target Tables” on page 250
How to maintain base aggregate 8.4.7, “Maintaining Aggregate

information from a change aggregate Information” on page 256
subscription
How to push down the replication status to 8.4.8, “Pushing Down the Replication
non-IBM targets Status to Oracle” on page 259
How to load data from a DB2 for OS/390 8.4.9.1, “Using SQL INSERT....SELECT....
source to an Oracle target by using from DataJoiner” on page 262
DataJoiner’s INSERT...SELECT...
How to load data from a DB2 for OS/390 8.4.9.2, “Using DataJoiner’s
source to an Oracle target by using EXPORT/IMPORT Utilities” on page 263
DataJoiner’s EXPORT/IMPORT utilities
How to load data from a DB2 for OS/390 8.4.9.3, “Using DSNTIAUL and Oracle’s
source to an Oracle target by using SQL*Loader Utility” on page 264
DSNTIAUL and Oracle’s SQL*Loader.
Dealing with the double delete issue when 9.1.2, “Comments about the Table
replicating join views Structures” on page 273
Index to Data Replication Tips, Tricks, Techniques 323

Appendix B. Non-IBM Database Stuff
In this appendix we document some useful hints and tips on multi-vendor

databases which were discovered during the writing of this book. It can be
used as a quick reference for performing some simple tasks.
For full information about configuring the non-IBM clients and databases,
always refer to the documentation for that particular database software.
B.1 Oracle Stuff
B.1.1 Configuring Oracle Connectivity

The Oracle client really comes in two components—SQL*Net or Net8 and
SQL*Plus. The basic network connectivity is provided by SQL*Net for Oracle
version 7 and net8 for Oracle8. SQL*Plus sits ontop of the basic network
connectivity component and provides a command line interpreter (similar to
the DB2 Client Application Enabler). We advise you to install both of these
components when working with Oracle.
Connectivity from Oracle server to client is established by creating the

tnsnames.ora file. This file contains the necessary information for the client to
access the Oracle server and is equivalent to the DB2 node and database
directories. It is a simple text file which is usually created using the Oracle
utilities (for Oracle8, use the Net8 Assistant to configure connectivity and
create the tnsnames.ora file).
Advice: Many Oracle DBA’s have copies of the tnsnames.ora files used within
their organizations. Ask the DBA for permission to copy this pre-configured
file to your workstation.
For more information about configuring Oracle clients, see the Oracle Net8
Administrator’s Guide, A58230-01.
B.1.2 Using Oracle’s SQL*Plus

Once the client has been configured, the connection from client to Oracle
server can be tested using SQL*Plus. To connect to the Oracle server, type
the following on the operating system command line:
sqlplus username/password@servicename
In this command, servicename is the entry in the tnsnames.ora file which

identifies the Oracle server.

For example:
sqlplus scott/tiger@HQ
Scott is the sample userid provided with Oracle, and tiger is Scott’s
password. If this userid has been revoked, then contact the Oracle DBA for
valid userid and password.
When logged on, the SQL*Plus prompt will be SQLPLUS>.
Here are a few useful tips once you have logged onto the Oracle server:
• End all SQL*Plus commands with a semicolon ( ; )
• To find the structure of an Oracle table use this command:
DESCRIBE <tablename>;
• To find out who you are logged onto Oracle as, issue the command:
SELECT * FROM USER_USERS
• To invoke SQL*Plus and use a file as input, use the command:
sqlplus user/pwd@orainst @<input_file>
Put a quit; at the end of the input_file and SQL*Plus stops when finished.
• Use spool <out_filename>; to dump output to an output file, and spool off;
to stop dumping the output to a file.
• Use COMMIT; to commit the changes. There is no auto-commit.
B.1.3 The Oracle Data Dictionary

The Oracle data dictionary is divided into three sets of views. Those views
which start with:
USER_ contain information about objects owned by the current user.
ALL_ contain all information from USER_ views, plus objects to which the
user has been granted privileges on.
DBA_ contain information on all objects within the database.
Table 14 provides details on some of the more useful Oracle data dictionary
views.

Table 14. Useful Oracle Data Dictionary Tables
Table Name Columns Description Synonym
USER_TABLES Table_Name, All tables owned by Tabs

Tablespace_Name, the user
Num_Rows,...
USER_OBJECTS Object_Name, All database Obj

Object_Type, objects owned by
Status,...... user.
USER_TAB_COLUMNS Table_Name, All columns in Cols

Column_Name, tables owned by
Data_Type, user.
Data_Length,
Data_Precision,
Data_Scale,...
B.1.4 Oracle Error Messages

To get details for Oracle error codes, use the oerr facility. From the operating
system command line, type:
oerr yyy xxx
In this command:
yyy is the three letter error code prefix
xxx is the sql return code.
For example, if the error was ORA12154, then type:

oerr ora 12154
Advice: If your path does not include $ORACLE_HOME/bin, change to directory

$ORACLE_HOME/bin before issuing the oerr command.
B.1.5 Oracle Server

The Oracle server manager can be used to start and stop the Oracle server
and to perform other administration functions. The server manager can be
started in either line or menu mode. Use:
svrmgrl - to start the server manager in line mode
svrmgrm - to start the server manager in menu mode
To start Oracle, issue the startup command from the server manager, to stop
Oracle use the shutdown command. Before issuing these commands you
usually have to issue the connect internal command. For more information
Non-IBM Database Stuff 327

about using the Oracle server manager, see the Oracle8 Administrator’s
Guide, A58397-01 .
B.1.6 Oracle Listener

The Oracle listener is the process which allows remote client to connect to
the server. The listener must be running on the server before clients can
connect to the Oracle database. If the Oracle database is running, but you
are still unable to connect, ensure that the listener service has been started
on the server. The listener can be started using the following command:
lsnrctl start
To obtain a list of other possible listener command parameters, type:

lsnrctl help
B.1.7 Other Useful Oracle Tools

TNSPING and TRCROUTE are two very useful Oracle tools to aid in debugging
connectivity problems.
TNSPING is similar to the TCP/IP ping command, except that it pings the Oracle
database to see if basic database connectivity is working. For example, if
your Oracle server is named AZOV, then type the following from the operating
system command line:
tnsping azov
The Trace Route Utility ( TRCROUTE) allows you to discover what path or route a
connection is taking from a client to a server. If a problem is encountered,
TRCROUTE returns an error stack to the client, which makes troubleshooting
easier. For information on how to use the TRCROUTE utility, see the Oracle8
Administrator’s Guide, A58397-01 .
B.1.8 More Information

The Oracle homepage on the Web is at:
http://www.oracle.com

B.2 Informix Stuff
B.2.1 Configuring Informix Connectivity

The Informix client is configured by updating the sqlhosts file (similar to the
DB2 node and database directories). On UNIX, the sqlhosts file resides, by
default, in the $INFORMIXDIR/etc directory. As an alternative, you can set the
INFORMIXSQLHOSTS environment variable to the full pathname and filename of a
file that contains the sqlhosts file information.
You can enter information in the sqlhosts file by using a standard text editor (copy
a sample from $INFORMIXDIR/etc/sqlhosts.std). The table-like structure of the file
is shown in the example below:
dbservername nettype hostname port options
sjazov_ifx01 onsoctcp azov 2800
sjstar_ifx01 onsoctcp azov 2801
sjsky_ifx01 onsoctcp sky 2810
Advice: Like the Oracle tnsnames.ora file, many Informix DBA’s will have a
copy of this file customized for use within their organization. If you ask them
nicely, I am sure they will allow you to copy the file to your Informix client.
B.2.2 Using Informix’s dbaccess

Once the sqlhosts file has been configured, use Informix’s client interface
dbaccess to connect from the client to the Informix server. The Informix client
program dbaccess can operate in a number of different modes depending on
how it is invoked. Use:
dbaccess to start dbaccess in interactive (menu) mode.
dbaccess <database_name> to start dbaccess in command line mode and
connect to the database specified. The dash ( - ) at the end is required.
Once logged onto dbaccess, the prompt will be >. End all dbaccess
commands and SQL with a semi-colon ( ; ).
dbaccess <database_name> <file_name>.sql to execute commands and SQL
statements from an input file. Be aware that dbaccess only accepts input
from files with the .sql extension.
dbaccess -e <database_name> <file_name>.sql > <out_file.out> 2>&1 to
execute commands and SQL statements from an input file, redirecting the
output to an output file. The -e option echoes the output to standard
output.

Advice: There is no auto-commit on/off switch in dbaccess. All statements in
an SQL file are automatically committed, unless you explicitly open a
transaction, using the BEGIN WORK; statement. To end the transaction, use
either COMMIT; or ROLLBACK;.
For example:
BEGIN WORK;
INSERT INTO.... VALUES (...);
COMMIT;
B.2.3 Informix Error Messages

To find out the meaning of Informix error codes, use the finderr facility. From
the operating system command line, type:
finderr <msgnum>
Advice: If your path does not include $INFORMIXDIR/bin, change to directory

$INFORMIXDIR/bin before issuing the finderr command.

The Informix home page is at:
http://www.informix.com
Informix on-line manuals can be downloaded from the Web.
B.3 Microsoft SQL Server Stuff
B.3.1 Configuring Microsoft SQL Server Connectivity

Microsoft SQL Server uses ODBC as its native database connectivity
protocol. ODBC drivers are automatically installed on your Windows
workstation when you install SQL Server client utilities. However, you will still
need to create specific Data Source Names (DSN’s) after the ODBC driver
has been installed. Use the ODBC Data Source Administrator tool, which can
be found in the Windows NT Control Panel, to configure a DSN. When
configuring the DSN, be sure to configure it as a System DSN.

B.3.2 Using the Microsoft Client OSQL
Microsoft supplies a command line utility called OSQL with SQL Server 7.
This provides the same functionality as ISQL (which was supplied with SQL
Server 6.5 and used DB-Library), but uses ODBC as the underlying database
communication protocol. You can use OSQL to run Transact-SQL statements,
system procedures and script files.
The syntax for invoking OSQL is summarized in the following statement:

osql -Y -S<server> -D<database> -U<user> -P<pwd> -c<termchar>
-i<inputfile> -p<outputfile>
The invocation parameters for OSQL are detailed in Table 15.

Table 15. Invocation Parameters for OSQL
Parameter Meaning
-Y Changed transaction - autocommit off
-S Microsoft server to connect to
-D Database at server <server>
-U SQL Server logon
-P SQL Server password
-c Overwrite the SQL Server default command termination character (the

default termination ’character’ is ’go’); example: -c";"
To change databases in SQL Server, specify:

use <database_name>
Smart Advice: To execute commands in SQL Server, type go after entering

the command in OSQL.
Microsoft also provides a graphical user interface called the SQL Server
Query Analyzer.
B.3.3 Microsoft SQL Server Data Dictionary

Table 16 summarizes some of the more useful Microsoft SQL Server data
dictionary tables. These data dictionary tables are owned by user dbo.

Table 16. Useful SQL Server Data Dictionary Tables
Table Name Columns Description
sysobjects name, type Information on all objects in

the database.
syscolumns name, type, length All columns in tables, views

and arguments in stored
procedures.
sysusers name Specifies who can use the

database.
B.3.4 Helpful SQL Server Stored Procedures

Microsoft SQL Server makes extensive use of stored procedures. Table 17
below lists some of the more useful stored procedures.
Table 17. Microsoft SQL Server Stored Procedures
Stored Procedure name Description
sp_help Lists the objects in the database.
sp_help <table_name> Gives the table structure for

<table_name>.
sp_helpdb Provides information about the databases

on the server.
B.3.5 Microsoft SQL Server Error Messages

To display a description for an error code, use OSQL to logon to a Microsoft
SQL Server, and then type:
use master
go
select error, description from sysmessages where error=<errorno>
go
B.3.6 Microsoft SQL Server Administration

Microsoft SQL Server can be started from a command prompt by typing one
of the following:
net start mssqlserver
sqlservr
net start SQLServerAgent

The server can be shutdown using the SHUTDOWN Transact-SQL statement
which can be issued from any query tool capable of issuing Transact-SQL.
Use SHUTDOWN NOWAIT to immediately stop SQL Server without waiting for
transactions to complete (but recovery time will be increased).
B.3.7 ODBCPing
This utility checks database connectivity from client to Microsoft SQL Server
databases accessed via ODBC. The syntax of the command is:
ODBCPING [-S Server | -D DSN] [-U login id] [-P Password]

The Microsoft home page is at:
http://www.microsoft.com
The homepage for SQL Server is at:

http://www.microsoft.com/backoffice/sql/default.htm
B.4 Sybase SQL Server Stuff

Even though we did not provide a Sybase case study, we want to share some
information on how to use the Sybase client with you.
B.4.1 Configuring Sybase SQL Server Connectivity

The Sybase client is configured by updating the interfaces file (similar to the
DB2 node and database directories). On UNIX, the interfaces file resides, by
default, in the $SYBASE directory.
The interfaces file can be edited manually, or configured using Sybase’s

sybinit utility. The following sample interfaces file provides connectivity
information for two Sybase servers, namely SYBSVR1 and SYBSVR2.
SYBSVR1
master tcp ether 137.12.111.33 4000
query tcp ether 137.12.111.33 4000
SYBSVR2
master tcp ether 137.12.111.42 3048
query tcp ether 137.12.111.42 3048

B.4.2 Using the Sybase Client isql
Sybase supplies a command line utility called isql with Sybase SQL Server.
The syntax for invoking isql is summarized in the following statement:
isql -Y -S<server> -D<database> -U<user> -P<pwd> -c<termchar>
-i<inputfile> -p<outputfile>
The invocation parameters for isql are detailed in Table 18.

Table 18. Invocation Parameters for isql
Parameter Meaning
-Y Changed transaction - autocommit off
-S Sybase server to connect to
-D Database at server <server>
-U SQL Server logon (user)
-P SQL Server password
-c Overwrite the SQL Server default command termination character (the

default termination ’character’ is ’go’); example: -c";"
To change databases in SQL Server, specify:

use <database_name>
Advice: To execute commands in SQL Server, type go after entering the

command in isql.
B.4.3 Sybase SQL Server Data Dictionary

Table 19 summarizes some of the more useful Sybase SQL Server data
dictionary tables. These data dictionary tables are owned by user dbo.
Table 19. Useful SQL Server Data Dictionary Tables
Table Name Columns Description
sysobjects name, type Information on all objects in

the database.
syscolumns name, type, length All columns in tables, views

and arguments in stored
procedures.
sysusers name Specifies who can use the

database.

B.4.4 Helpful SQL Server Stored Procedures
Sybase SQL Server makes extensive use of stored procedures. Table 20
below lists some of the more useful stored procedures.
Table 20. Microsoft SQL Server Stored Procedures
Stored Procedure Name Description
sp_help Lists the objects in the database.
sp_help <table_name> Gives the table structure for

<table_name>.
sp_helpdb Provides information about the databases
on the server.
B.4.5 Sybase SQL Server Error Messages

To display a description for an error code, use isql to logon to a Sybase SQL
Server, and then type:
use master
go
select error, description from sysmessages where error=<errorno>
go

The Sybase home page is at:
http://www.sybase.com

Appendix C. General Implementation Checklist
Set Up the Database Middleware Server

Step 1 - Install the Non-IBM Client Code to Access Non-IBM Data Sources
Step 2 - Prepare and Check Native Access to the Remote Data Sources
Step 3 - Install the DataJoiner Software
Step 4 - Prepare DataJoiner to Access the Remote Data Sources
Step 5 - Create a DataJoiner Instance
Step 6 - Create the DataJoiner Databases
Step 7 - Connect DataJoiner to Other DB2 or DataJoiner Databases
Step 8 - Enable DB2 Clients to Connect to the DataJoiner Databases
Step 9 - Create Server Mappings for All non-IBM Database Systems
Step 10 - Create the Server Options
Step 11 - Create the User Mappings
Implement the Replication Subcomponents (Capture, Apply)

Step 12 - Install and Set Up DProp Capture (If Required)
Step 13 - Install and Set Up DProp Apply (If Required)

Step 14 - Install DB2 Client Application Enabler (DB2 CAE)
Step 15 - Establish DB2 Connectivity
Step 16 - Install the DataJoiner Replication Administration Software (DJRA)
Step 17 - Set up DJRA to Access the Source and Target Databases
Step 18 - Modify the DJRA User Exits (Optional)

Step 19 - Set Up the DProp Control Tables at the Replication Source Servers
Step 20 - Set Up the DProp Control Tables at the Replication Control Servers
Bind DProp Capture and DProp Apply

Step 21 - Bind DProp Capture (If Required)
Step 22 - Bind DProp Apply

Appendix D. DJRA Generated SQL for Case Study 2
This Appendix contains the SQL generated from DJRA for the various
replication definitions which were configured in case study 2.
D.1 Define Replication Sources

1. Define table (ITEMS) as replication source:
--* echo input: TABLEREG SJ390DB1 LIYAN ITEMS AFTER NONEEXCLUDED
--* DELETEINSERT NONE N
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’
-- then check your REXX code.
-- now done interpreting REXX password file PASSWORD.REX
-- connect to the source-server
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
-- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1
-- PRDID=DSN0501 23 Mar 1999 9:42am
-- 1 candidate registrations, 16 already known to be registered
-- The following tables are candidates for registration:
-- 1 table LIYAN.ITEMS
-- registration candidate #1 LIYAN.ITEMS
-- LIYAN.ITEMS is assumed a USER table
-- reading REXX logic file SRCESVR.REX
CREATE TABLESPACE TS420598
IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
-- now done interpreting REXX logic file SRCESVR.REX
-- enable change data capture
ALTER TABLE LIYAN.ITEMS DATA CAPTURE CHANGES;
-- create the cd/ccd table for LIYAN.ITEMS
CREATE TABLE LIYAN.LIYANCD_ITEMS(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1)
NOT NULL,ITEM_NUM DECIMAL(13 , 0) NOT NULL, DESC VARCHAR(150) NOT
NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL, SUPP_NO DECIMAL(13 , 0) NOT NULL) IN
SJ390DB1.TS420598;
-- create the index for the change data table for LIYAN.ITEMS
CREATE TYPE 2 UNIQUE INDEX LIYAN.CDI00LIYANCD_ITEMS ON LIYAN.LIYANCD_ITEMS
(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’LIYAN’,’ITEMS’, 0 , 1 ,’Y’,’Y’,’LIYAN’,
’LIYANCD_ITEMS’,’LIYAN’,’LIYANCD_ITEMS’, 0 ,’0201’,NULL,’0’,’Y’);
COMMIT;
-- Satisfactory completion at 9:42am
2. Define view (S_PRODUCT base on STORE_ITEM & ITEMS) as

replication source:
--* Calling VIEWREG for source table DB2RES5.S_PRODUCT
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- input view OWNER=DB2RES5 input view NAME=S_PRODUCT
-- connect to the source server
-- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1 23 Mar 1999 10:28am
--* The view definition to be registered=’CREATE VIEW
--* DB2RES5.S_PRODUCT AS SELECT S.STORE_NUM,I.ITEM_NUM,I.DESC,
--* I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.STORE_ITEM S, LIYAN.ITEMS I
--* WHERE S.PRODLINE_NO=I.PROD_LINE_NUM’
-- create the change data view for component 1
CREATE VIEW DB2RES5.S_PRODUCTA AS SELECT S.IBMSNAP_UOWID,
S.IBMSNAP_INTENTSEQ,S.IBMSNAP_OPERATION,S.STORE_NUM,I.ITEM_NUM,I.DESC,
I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.LIYANCD_STORE_ITEM S, LIYAN.ITEMS I WHERE
S.PRODLINE_NO=I.PROD_LINE_NUM;
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’S_PRODUCT’, 1 , 1
,’Y’,’Y’,’DB2RES5’,’S_PRODUCTA’,’LIYAN’,’LIYANCD_STORE_ITEM’, 0 , NULL,NULL,NULL,NULL,NULL
,NULL,NULL,’0201’,NULL,’0’,’N’);
CREATE VIEW DB2RES5.S_PRODUCTB AS SELECT I.IBMSNAP_UOWID,
I.IBMSNAP_INTENTSEQ,I.IBMSNAP_OPERATION,S.STORE_NUM,I.ITEM_NUM,I.DESC,
I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.STORE_ITEM S, LIYAN.LIYANCD_ITEMS I WHERE
S.PRODLINE_NO=I.PROD_LINE_NUM;
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’S_PRODUCT’, 2 , 1
,’Y’,’Y’,’DB2RES5’,’S_PRODUCTB’,’LIYAN’,’LIYANCD_ITEMS’, 0 ,NULL, NULL,NULL,NULL,NULL
,NULL,NULL,’0201’,NULL,’0’,’N’);
COMMIT;

D.2 Create Empty Subscription Sets
1. Create subscription sets for CCD tables (use SUPPLIER as example):
--* Calling ADDSET for set 1 : AQCCD/setsupp
--* echo input: ADDSET SJ390DB1 AQCCD SETSUPP SJ390DB1 SJ390DB1
--* 19990318111900 R 2 NULL 30
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’
--* connect to the CNTL_ALIAS
-- current USERID=DB2RES5 CNTL_ALIAS alias=SJ390DB1 18 Mar 1999 11:21am
-- create a new row in IBMSNAP_SUBS_SET
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,
REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LE
VEL) VALUES (0 , ’AQCCD’ , ’SETSUPP’ , ’S’ , ’DB2I ’ , ’SJ390DB1’ , ’DB2I ’ , ’SJ390DB1’ ,
0 , ’1999-03-18-11.19.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
SOURCE_SERVER, SOURCE_ALIAS, TARGET_SERVER, TARGET_ALIAS, STATUS, LASTRUN,
REFRESH_TIMING, SLEEP_MINUTES, EVENT_NAME, MAX_SYNCH_MINUTES, AUX_STMTS,
ARCH_LEVEL) VALUES (0 , ’AQCCD’ , ’SETSUPP’ , ’F’ , ’DB2I ’ , ’SJ390DB1’ , ’DB2I ’,
’SJ390DB1’ , 0 , ’1999-03-18-11.19.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
--* commit work at SJ390DB1
COMMIT;
-- Satisfactory completion of ADDSET at 11:21am
2. Create empty subscription sets for user copy tables:

--* Calling ADDSET for set 1 : AQLY/SETLY
--* echo input: ADDSET DJDB AQLY SETLY SJ390DB1 DJDB 19990323103300 R
--* 2 NULL 30
--* CONNECTing TO DJDB;
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
CONNECT TO DJDB;
-- current USERID=GROHRES3 CNTL_ALIAS alias=DJDB 23 Mar 1999 10:35am
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME,
WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS,
STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,
DJRA Generated SQL for Case Study 2 341

MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQLY’ , ’SETLY’ , ’S’ , ’DB2I ’
, ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 , ’1999-03-23-10.33.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQLY’ , ’SETLY’ , ’F’ , ’DJDB’ ,
’DJDB’ , ’DB2I ’ , ’SJ390DB1’ , 0 , ’1999-03-23-10.33.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
--* commit work at DJDB
COMMIT;
D.3 Add a Member to Subscription Sets

1. Add a member to subscription sets to create a CCD table
(use BRAND as example):
--* Calling d:\Program Files\DPRTools\addmembr.rex for AQCCD/SETBRAND pair # 1
--* Echo input: ADDMEMBR SJ390DB1 AQCCD SETBRAND ITSOSJ BRAND
--* NONEEXECLUDED CCD=YNNYN BRAND_NUM+ DB2RES5 CCDBRAND NODATAJOINER U
--* Connect to the CNTL_ALIAS
--* Current USERID=DB2RES5 CNTL_ALIAS alias= 18 Mar 1999 12:04pm
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at SJ390DB1
--* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1
-- using REXX logic file CNTLSVR.REX
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES (
’AQCCD’ , ’SETBRAND’ , ’S’ , ’ITSOSJ’ , ’BRAND’ , 0 ,’DB2RES5’,’CCDBRAND’ ,’Y’ ,’N’ , 3 ,NULL);
VALUES(’AQCCD’,’SETBRAND’ , ’S’,’DB2RES5’,’CCDBRAND’ ,’A’,’BRAND_NUM’,’Y’, 1 ,’BRAND_NUM’);
VALUES(’AQCCD’,’SETBRAND’ , ’S’,’DB2RES5’,’CCDBRAND’ ,’A’,’DESC’,’N’, 2 ,’DESC’);
--* I noticed the set subscription is inactive

UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’AQCCD’ AND
SET_NAME=’SETBRAND’ AND WHOS_ON_FIRST=’S’;--* Commit work at cntl_ALIAS SJ390DB1
COMMIT;
--* Connect to the SOURCE_ALIAS
--* record the subscription in the pruning control table at the
--* source server
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DB2I’,
’DB2RES5’,’CCDBRAND’,’ITSOSJ’,’BRAND’, 0 ,’AQCCD’,’SETBRAND’,’DB2I ’, 3 ,’SJ390DB1’);
--* Commit work at source_ALIAS SJ390DB1
COMMIT;
--* Connect to the TARGET_ALIAS
-- using REXX logic file TARGSVR.REX
-- in TARGSVR.REX
CREATE TABLESPACE TS045974
IN DSNDB04 SEGSIZE 4 LOCKSIZE PAGE CLOSE NO;
-- Create the target table DB2RES5.CCDBRAND
CREATE TABLE DB2RES5.CCDBRAND(BRAND_NUM DECIMAL(7 , 0) NOT NULL, DESC CHAR(30)
NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION
CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_LOGMARKER TIMESTAMP NOT NULL) IN DSNDB04.TS045974;
CREATE INDEX DB2RES5.CCDBRANDX ON DB2RES5.CCDBRAND(IBMSNAP_COMMITSEQ ASC,
IBMSNAP_INTENTSEQ ASC);
-- Create an index for the TARGET DB2RES5.CCDBRAND
CREATE TYPE 2 UNIQUE INDEX DB2RES5.CCDBRAND ON DB2RES5.CCDBRAND(
BRAND_NUM ASC);
-- Auto-register the TARGET as an ’internal CCD’
UPDATE ASN.IBMSNAP_REGISTER SET CCD_OWNER=’DB2RES5’,
CCD_TABLE=’CCDBRAND’,CCD_COMPLETE=’N’,CCD_CONDENSED=’Y’ WHERE
SOURCE_OWNER=’ITSOSJ’ AND SOURCE_TABLE=’BRAND’;
--* Commit work at target server SJ390DB1
COMMIT;
--* Satisfactory completion of ADDMEMBR at 12:05pm
2. Add a member to subcription sets to create a user copy table

(use BRAND as example):
--* Calling d:\Program Files\DPRTools\addmembr.rex for LY610/SET610 pair # 3
--* Echo input: ADDMEMBR DJDB LY610 SET610 ITSOSJ BRAND NONEEXECLUDED
--* UCOPY DESC+ GROHRES3 BRAND INFODB1 U

CONNECT TO DJDB;
--* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’
--* Current USERID=GROHRES3 CNTL_ALIAS alias= 18 Mar 1999 11:58am
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB
--* CNTLSVR.REX
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
’LY610’ , ’SET610’ , ’S’ , ’ITSOSJ’ , ’BRAND’ , 0 ,’GROHRES3’,’BRAND’,’Y’,’Y’, 8 ,NULL);
VALUES(’LY610’,’SET610’ , ’S’,’GROHRES3’,’BRAND’ ,’A’,’BRAND_NUM’,’N’,1 ,’BRAND_NUM’);
VALUES(’LY610’,’SET610’ , ’S’,’GROHRES3’,’BRAND’ ,’A’,’DESCI’,’Y’, 2 ,’DESC’);*1
--* Commit work at cntl_ALIAS DJDB
COMMIT;
--* source server
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’GROHRES3’,’BRAN
D’,’ITSOSJ’,’BRAND’, 0 ,’LY610’,’SET610’,’DJDB’, 8 ,’DJDB’);
COMMIT;
CONNECT TO DJDB ;
--* Set DJ two_phase commit off
SET SERVER OPTION TWO_PHASE_COMMIT TO ’N’ FOR SERVER INFODB1;
-- in TARGSVR.REX
1 Here we change the TARGET_NAME from DESC to DESCI manually, corresponding to the change in next page.

--* The target server is DataJoiner and no ’nickname’ yet exists for
--* the target table, making it necessary to passthru DataJoiner to
--* create the target table, then create a nickname for the new target
--* in DataJoiner
SET PASSTHRU INFODB1;
-- Assuming MS SqlServer data types for the target table
-- Create the target table GROHRES3.BRAND
CREATE TABLE BRAND(BRAND_NUM DECIMAL(7 , 0) NOT NULL,DESCI CHAR(30) NOT NULL);*2
-- Create an index for the TARGET GROHRES3.BRAND
CREATE UNIQUE INDEX BRANDX ON dbo.BRAND(DESCI);*
--* Returning now to a local DataJoiner context
COMMIT;
SET PASSTHRU RESET;
--* Create a DataJoiner nickname for the new target table in INFODB1
CREATE NICKNAME GROHRES3.BRAND FOR "INFODB1"."dbo"."BRAND";
--* Please resist the temptation to edit the CREATE NICKNAME
--* definition above. DPRTOOLS relies on both the name and the
--* qualifier of the nickname matching the name and qualifier of the
--* target table.
COMMIT;
--* Satisfactory completion of ADDMEMBR at 11:58am
D.4 Add Stored Procedure to Subscription Sets

Add the stored procedure to the subscription set in order to invoke it after the
apply program has applied the changed data to the target.
--* Calling ADDSTMT for AQLY/SETLY pair # 4 --*
--* echo input: ADDSTMT DJDB AQLY SETLY S A 1 C 0000002000 C_ITEM
CONNECT TO DJDB USER grohres3 USING pwd;
-- current USERID=GROHRES3 CNTL_ALIAS alias= 25 Mar 1999 8:28pm
---* Fetching from the ASN.IBMSNAP_SUBS_STMTS table at DJDB
--* There is no conflicting entry in the ASN.IBMSNAP_SUBS_STMTS table,
--* so Ok to add.
--* Unable to verify stored procedure names in advance, so will
--* continue, assuming that the stored procedure name is Ok.
-- create a new row in IBMSNAP_SUBS_STMTS
2 *the BRAND table has a column called ’DESC’, but MS SQL Server use ’DESC’ as a keep word, so it doesn’t support a
table with such a column name, so we update the script here manaually, also update the statement for inserting rows to
asn.ibmsnap_subs_col

VALUES(’AQLY’,’SETLY’,’S’,’A’, 1 ,’C’,’C_ITEM’,’0000002000’);
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 1 WHERE
APPLY_QUAL=’AQLY’ AND SET_NAME=’SETLY’ AND WHOS_ON_FIRST=’S’;
--* commit work at cntl_ALIAS DJDB
COMMIT;
-- Satisfactory completion of ADDSTMT at 8:28pm

Appendix E. DJRA Generated SQL for Case Study 3
replication definitions which were configured in case study 3. The
modifications to the generated SQL are shown in bold typeface.
E.1 Output from Define the SALES_SET Subscription Set

--* File Name: create_sales_set.sql
--*
--* Calling ADDSET for set 1 : WHQ3/SALES_SET
--*
--* echo input: ADDSET DJDB WHQ3 SALES_SET SJ390DB1 DJDB
--* 19990314000000 R 1440 NULL 30
--*
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ;
--*
--*
--* CONNECTing TO DJDB USER djinst5 USING pwd;
--*
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
--*
--*
CONNECT TO DJDB USER djinst5 USING pwd;
-- current USERID=DJINST5 CNTL_ALIAS alias=DJDB 15 Mar 1999 11:36am

MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQ1’ ,
’SALES_SET’ , ’S’ , ’DB2I ’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ ,
0 , ’1999-03-14-00.00.00’ , ’R’ , 1440 ,NULL , 30 , 0 ,’0201’);

MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQ1’ ,
’SALES_SET’ , ’F’ , ’DJDB’ , ’DJDB’ , ’DB2I ’ , ’SJ390DB1’ ,
0 , ’1999-03-14-00.00.00’ , ’R’ , 1440 ,NULL , 30 , 0 ,’0201’);
--* commit work at DJDB

--*
COMMIT;

E.2 Output from Register the Supplier Table
--* File Name: register_supplier.sql
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ SUPPLIER AFTER NONEEXCLUDED
--* DELETEINSERTUPDATE NONE
--*


CONNECT TO SJ390DB1 USER db2res5 USING pwd ;

-- PRDID=DSN0501 11 Mar 1999 3:13pm

-- 1 table ITSOSJ.SUPPLIER
-- registration candidate #1 ITSOSJ.SUPPLIER

-- ITSOSJ.SUPPLIER is assumed a USER table


CREATE TABLESPACE TSSUPPLI

ALTER TABLE ITSOSJ.SUPPLIER DATA CAPTURE CHANGES;

-- create the cd/ccd table for ITSOSJ.SUPPLIER
CREATE TABLE ITSOSJ.CDSUPPLIER(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,SUPP_NO DECIMAL(7 , 0) NOT NULL,
SUPP_NAME CHAR(25) NOT NULL) IN SJ390DB1.TSSUPPLI;
-- create the index for the change data table for ITSOSJ.SUPPLIER
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDSUPPLIER ON
ITSOSJ.CDSUPPLIER(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’SUPPLIER’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDSUPPLIER’,’ITSOSJ’,’CDSUPPLIER’, 1 ,’0201’,NULL,’0’,’N’);

COMMIT;
-- Satisfactory completion at 3:13pm
E.3 Output from Subscribe to the Supplier Table

--* File Name: subs_supplier1.sql
--*
-- *****************************************************
-- File was edited as follows :
-- 1.CREATE TABLE statement changed :
--a) CREATE TABLESPACE statement commented out.
-- b) TABLESPACE changed to Oracle server mapping
-- AZOVORA8 (nickname will automatically be
-- created).
--c)EXPIRED_TIMESTAMP column added for temporal histories
-- *****************************************************
--*
--* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 3
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ SUPPLIER
--* NONEEXECLUDED CCD=NYNNN NONE SIMON SUPPLIER NODATAJOINER U
--*
--*

--*
--* Current USERID=DJINST5 CNTL_ALIAS alias= 15 Mar 1999 1:55pm
--*
--*
--*
--*
--*
--* CNTLSVR.REX

’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’SUPPLIER’ , 0 ,’SIMON’,
’SUPPLIER’ ,’N’ ,’Y’ , 3 ,NULL);


VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SUPPLIER’ ,’A’,’SUPP_NO’,’N’,
1 ,’SUPP_NO’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SUPPLIER’ ,’A’,’SUPP_NAME’,
’N’, 2 ,’SUPP_NAME’);

--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’WHQ1’
AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’;

--*
COMMIT;

--*

--*
--* source server
--*
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’,
’SUPPLIER’,’ITSOSJ’,’SUPPLIER’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,
’DJDB’);

--*
COMMIT;


--*
--*
-- in TARGSVR.REX
--CREATE TABLESPACE TSSUPPLIER MANAGED BY DATABASE USING (FILE
’/data/djinst5/djinst5/SUPPLIER.F1’ 2000 );

-- Create the target table SIMON.SUPPLIER
CREATE TABLE SIMON.SUPPLIER(SUPP_NO DECIMAL(7 , 0) NOT NULL,SUPP_NAME
CHAR(25) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,

--*
COMMIT;
E.4 Output from Register the Store and Region Tables

--* File Name: register_store+region.sql
--*
--* Calling TABLEREG for source table ITSOSJ.REGION
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ REGION AFTER NONEXCLUDED
--* DELETEINSERTUPDATE NONE N
--*



-- PRDID=DSN0501 17 Mar 1999 9:27am

-- 1 table ITSOSJ.REGION
-- registration candidate #1 ITSOSJ.REGION

-- ITSOSJ.REGION is assumed a USER table


CREATE TABLESPACE TSREGION
--* Source table ITSOSJ.REGION already has CDC attribute, no need to

--* alter.
-- create the cd/ccd table for ITSOSJ.REGION
CREATE TABLE ITSOSJ.CDREGION(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,REGION_ID DECIMAL(3 , 0) NOT NULL,
REGION_NAME CHAR(30)) IN SJ390DB1.TSREGION;
-- create the index for the change data table for ITSOSJ.REGION
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI0000000CDREGION ON
ITSOSJ.CDREGION(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’REGION’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDREGION’,’ITSOSJ’,’CDREGION’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;
--*
--* Calling TABLEREG for source table ITSOSJ.STORE
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ STORE AFTER NONEXCLUDED
--* DELETEINSERTUPDATE NONE N
--*



-- PRDID=DSN0501 17 Mar 1999 9:27am

-- 1 table ITSOSJ.STORE
-- registration candidate #1 ITSOSJ.STORE

-- ITSOSJ.STORE is assumed a USER table


CREATE TABLESPACE TSSTORE
--* Source table ITSOSJ.STORE already has CDC attribute, no need to

--* alter.
-- create the cd/ccd table for ITSOSJ.STORE
CREATE TABLE ITSOSJ.CDSTORE(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,COMPNO DECIMAL(3 , 0) NOT NULL,
STORE_NUM DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR(
25) NOT NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL,
REGION_ID DECIMAL(3 , 0) NOT NULL) IN SJ390DB1.TSSTORE ;
-- create the index for the change data table for ITSOSJ.STORE
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDSTORE ON ITSOSJ.CDSTORE(
IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’STORE’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDSTORE’,’ITSOSJ’,’CDSTORE’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;
E.5 Output from Subscribe to the Region Table

--* File Name: subs_region.sql
--*
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ REGION
--* NONEEXECLUDED PIT REGION_ID+ DJINST5 REGION AZOVORA8 U
--* ’IBMSNAP_OPERATION = ’I’’
--*
--*

--*
--* Current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 9:42am
--*
--*
--*
--*
--*
--* CNTLSVR.REX

’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’REGION’ , 0 ,’DJINST5’,
’REGION’,’Y’,’Y’, 4 , ’IBMSNAP_OPERATION = ’’I’’’);


VALUES(’WHQ1’,’SALES_SET’ , ’S’,’DJINST5’,’REGION’ ,’A’,’REGION_ID’,
’Y’, 1 ,’REGION_ID’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’DJINST5’,’REGION’ ,’A’,’REGION_NAME’,
’N’, 2 ,’REGION_NAME’);

--*
COMMIT;

--*

--*
--* source server
--*
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,
’DJINST5’,’REGION’,’ITSOSJ’,’REGION’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 4 ,
’DJDB’);

--*
COMMIT;

--* Set DJ two_phase commit off

--*
SET SERVER OPTION TWO_PHASE_COMMIT TO ’N’ FOR SERVER AZOVORA8;
--*
--*
-- in TARGSVR.REX

--* The target server is DataJoiner and no ’nickname’ yet exists for
--* the target table, making it necessary to passthru DataJoiner to
--* create the target table, then create a nickname for the new target
--* in DataJoiner
--*
SET PASSTHRU AZOVORA8;
-- Assuming Oracle data types for the target table
-- Create the target table DJINST5.REGION
CREATE TABLE SIMON.REGION(REGION_ID DECIMAL(3 , 0) NOT NULL,
REGION_NAME CHAR(30),IBMSNAP_LOGMARKER DATE);
-- Create an index for the TARGET DJINST5.REGION

CREATE UNIQUE INDEX SIMON.REGION ON SIMON.REGION(REGION_ID ASC);

--* Returning now to a local DataJoiner context
--*
COMMIT;
SET PASSTHRU RESET;
--* Create a DataJoiner nickname for the new target table in AZOVORA8
--*
CREATE NICKNAME DJINST5.REGION FOR AZOVORA8.SIMON.REGION;
--* Please resist the temptation to edit the CREATE NICKNAME

--* definition above. DPRTOOLS relies on both the name and the
--* qualifier of the nickname matching the name and qualifier of the
--* target table.
--*
--* Fixup DataJoiner data type mapping
--*
ALTER NICKNAME DJINST5.REGION SET COLUMN REGION_ID LOCAL TYPE DECIMAL(
3 , 0);

--*
COMMIT;
E.6 Output from Subscribe to the Store Table

--* File Name: subs_store.sql
--*
-- *****************************************************
-- br created).
-- *****************************************************
--*
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ STORE
--* NONEEXECLUDED CCD=NYNNN NONE SIMON STORE NODATAJOINER U
--* ’IBMSNAP_OPERATION IN (’I’,’U’)’
--*
--*

--*
--* Current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 9:56am
--*
--*

--*
--*
--*
--* CNTLSVR.REX

’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’STORE’ , 0 ,’SIMON’,’STORE’
,’N’ ,’Y’ , 3 ,NULL);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’COMPNO’,’N’, 1 ,
’COMPNO’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’STORE_NUM’,’N’,
2 ,’STORE_NUM’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’NAME’,’N’, 3 ,
’NAME’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’STREET’,’N’, 4 ,
’STREET’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’ZIP’,’N’, 5 ,
’ZIP’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’CITY’,’N’, 6 ,
’CITY’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’REGION_ID’,’N’,
7 ,’REGION_ID’);

--*

COMMIT;

--*

--*
--* source server
--*
’STORE’,’ITSOSJ’,’STORE’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,’DJDB’);

--*
COMMIT;


--*
--*
-- in TARGSVR.REX
-- CREATE TABLESPACE TSSTORE MANAGED BY DATABASE USING (FILE
’/data/djinst5/djinst5/STORE.F1’ 2000 );

-- Create the target table SIMON.STORE
CREATE TABLE SIMON.STORE(COMPNO DECIMAL(3 , 0) NOT NULL,STORE_NUM
DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR(25) NOT
NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL,REGION_ID
DECIMAL(3 , 0) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR
BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,

--*
COMMIT;
E.7 Output from Register the Items, ProdLine, and Brand Tables
--* File Name: register_items+prodline+brand.sql
--*
--* Calling TABLEREG for source table ITSOSJ.BRAND
--*

--* echo input: TABLEREG SJ390DB1 ITSOSJ BRAND AFTER NONEXCLUDED
--*



-- PRDID=DSN0501 9 Mar 1999 3:27pm

-- 1 table ITSOSJ.BRAND
-- registration candidate #1 ITSOSJ.BRAND

-- ITSOSJ.BRAND is assumed a USER table


CREATE TABLESPACE TSBRAND

ALTER TABLE ITSOSJ.BRAND DATA CAPTURE CHANGES;

-- create the cd/ccd table for ITSOSJ.BRAND
CREATE TABLE ITSOSJ.CDBRAND(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL,
DESC CHAR(30) NOT NULL) IN SJ390DB1.TSBRAND ;
-- create the index for the change data table for ITSOSJ.BRAND
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDBRAND ON ITSOSJ.CDBRAND(

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’BRAND’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDBRAND’,’ITSOSJ’,’CDBRAND’, 1,’0201’,NULL,’0’,’N’);
COMMIT;

--*
--* Calling TABLEREG for source table ITSOSJ.ITEMS
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ ITEMS AFTER NONEXCLUDED
--*



-- PRDID=DSN0501 9 Mar 1999 3:27pm

-- 1 table ITSOSJ.ITEMS
-- registration candidate #1 ITSOSJ.ITEMS

-- ITSOSJ.ITEMS is assumed a USER table


CREATE TABLESPACE TSITEMS

ALTER TABLE ITSOSJ.ITEMS DATA CAPTURE CHANGES;

-- create the cd/ccd table for ITSOSJ.ITEMS
CREATE TABLE ITSOSJ.CDITEMS(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,ITEM_NUM DECIMAL(13 , 0) NOT NULL,
DESC VARCHAR(150) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL,
SUPP_NO DECIMAL(13 , 0) NOT NULL) IN SJ390DB1.TSITEMS ;
-- create the index for the change data table for ITSOSJ.ITEMS
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDITEMS ON ITSOSJ.CDITEMS(

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’ITEMS’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDITEMS’,’ITSOSJ’,’CDITEMS’, 1 ,’0201’,NULL,’0’,’N’);

COMMIT;
--*
--* Calling TABLEREG for source table ITSOSJ.PRODLINE
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ PRODLINE AFTER NONEXCLUDED
--*



-- PRDID=DSN0501 9 Mar 1999 3:27pm

-- 1 table ITSOSJ.PRODLINE
-- registration candidate #1 ITSOSJ.PRODLINE

-- ITSOSJ.PRODLINE is assumed a USER table


CREATE TABLESPACE TSPRODLI

ALTER TABLE ITSOSJ.PRODLINE DATA CAPTURE CHANGES;

-- create the cd/ccd table for ITSOSJ.PRODLINE
CREATE TABLE ITSOSJ.CDPRODLINE(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT
NULL,DESC CHAR(30) NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL) IN
SJ390DB1.TSPRODLI;
-- create the index for the change data table for ITSOSJ.PRODLINE
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDPRODLINE ON
ITSOSJ.CDPRODLINE(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’PRODLINE’, 0 , 1 ,’Y’,’Y’,
’ITSOSJ’,’CDPRODLINE’,’ITSOSJ’,’CDPRODLINE’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;
E.8 Output from Register the Products View

--* File Name: register_products_view.sql
--*
--* Calling VIEWREG for source table DB2RES5.PRODUCTS
--*
-- input view OWNER=DB2RES5 input view NAME=PRODUCTS
-- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1 9 Mar 1999 3:46pm

--* The view definition to be registered=’CREATE VIEW PRODUCTS AS
--* SELECT I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION,
--* I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC, I.SUPP_NO AS
--* SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION FROM
--* ITSOSJ.ITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.BRAND B WHERE
--* I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM’
--*
CREATE VIEW DB2RES5.PRODUCTSA AS SELECT P.IBMSNAP_UOWID,
P.IBMSNAP_INTENTSEQ,P.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40)
AS ITEM_DESCRIPTION, I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC,
I.SUPP_NO AS SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION
FROM ITSOSJ.ITEMS I, ITSOSJ.CDPRODLINE P, ITSOSJ.BRAND B WHERE
I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM;

1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSA’,’ITSOSJ’,’CDPRODLINE’, 1 ,NULL,NULL,

CREATE VIEW DB2RES5.PRODUCTSB AS SELECT B.IBMSNAP_UOWID,
B.IBMSNAP_INTENTSEQ,B.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40)
FROM ITSOSJ.ITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.CDBRAND B WHERE


1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSB’,’ITSOSJ’,’CDBRAND’, 1 ,NULL,NULL,

CREATE VIEW DB2RES5.PRODUCTSC AS SELECT I.IBMSNAP_UOWID,
I.IBMSNAP_INTENTSEQ,I.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40)
FROM ITSOSJ.CDITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.BRAND B WHERE

1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSC’,’ITSOSJ’,’CDITEMS’, 1 ,NULL,NULL,
COMMIT;
E.9 Output from Subscribe to the Products View

--* File Name: subs_products.sql
--*
-- *****************************************************
-- created).
-- *****************************************************
--*
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET DB2RES5 PRODUCTS
--* NONEEXECLUDED CCD NONE SIMON PRODUCTS NODATAJOINER U
--*
--*

--*
--*
--*
--*
--*
--*
--* CNTLSVR.REX

’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’PRODUCTS’ , 1 ,’SIMON’,
’PRODUCTS’,’N’,’Y’, 3 , NULL);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’,’ITEM_NUM’,
’N’, 1 ,’ITEM_NUM’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’,
’ITEM_DESCRIPTION’,’N’, 2 ,’ITEM_DESCRIPTION’);

’PROD_LINE_NUM’,’N’, 3 ,’PROD_LINE_NUM’);

’PRODUCT_LINE_DESC’,’N’, 4 ,’PRODUCT_LINE_DESC’);

’SUPPLIER_NUM’,’N’, 5 ,’SUPPLIER_NUM’);

’BRAND_NUM’,’N’, 6 ,’BRAND_NUM’);

’BRAND_DESCRIPTION’,’N’, 7 ,’BRAND_DESCRIPTION’);



--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’WHQ1’
AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’;

--*
COMMIT;

--*

--*
--* source server
--*
’PRODUCTS’,’DB2RES5’,’PRODUCTS’, 1 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,
’DJDB’);

--* source server
--*
’DJDB’);

--* source server
--*
’DJDB’);

--*
COMMIT;


--*
--*
-- in TARGSVR.REX
-- CREATE TABLESPACE TSPRODUCTS MANAGED BY DATABASE USING (FILE
’/data/djinst5/djinst5/PRODUCTS.F1’ 2000 );

-- Create the target table SIMON.PRODUCTS
CREATE TABLE SIMON.PRODUCTS(ITEM_NUM DECIMAL(13 , 0) NOT NULL,
ITEM_DESCRIPTION CHAR(40) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT
NULL,PRODUCT_LINE_DESC CHAR(30) NOT NULL,SUPPLIER_NUM DECIMAL(13 , 0)
NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL,BRAND_DESCRIPTION CHAR(30)
NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,

--*
COMMIT;
E.10 Output from Register the Sales Table

--* File Name: register_sales.sql
--*
--* echo input: TABLEREG SJ390DB1 DB2RES5 SALES AFTER DEPTNO,IN_PRC,
--* NO_CUST,SUPPLNO,WGRNO DELETEINSERTUPDATE NONE N
--*



-- PRDID=DSN0501 17 Mar 1999 3:16pm


-- 1 table DB2RES5.SALES
-- registration candidate #1 DB2RES5.SALES

-- DB2RES5.SALES is assumed a USER table


--CREATE TABLESPACE TSSALES
-- IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
create tablespace TSSALES in sj390db1
SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC
USING STOGROUP SJDB1SG2 PRIQTY 180000 SECQTY 5000;
--* Source table DB2RES5.SALES already has CDC attribute, no need to

--* alter.
-- create the cd/ccd table for DB2RES5.SALES
CREATE TABLE DB2RES5.CDSALES(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
IBMSNAP_OPERATION CHAR(1) NOT NULL,DATE DATE NOT NULL,BASARTNO DECIMAL(
13 , 0) NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 ,
0) NOT NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT
NULL,TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL,
PROCESS_DATE TIMESTAMP NOT NULL) IN SJ390DB1.TSSALES ;
-- create the index for the change data table for DB2RES5.SALES
CREATE TYPE 2 UNIQUE INDEX DB2RES5.CDI00000000CDSALES ON
DB2RES5.CDSALES(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);

PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’SALES’, 0 , 1 ,’Y’,’Y’,
’DB2RES5’,’CDSALES’,’DB2RES5’,’CDSALES’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT;
E.11 Output from Subscribe to the Sales Table

--* File Name: subs_sales.sql
--*
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET DB2RES5 SALES
--* NONEEXECLUDED CCD=NYNNN NONE SIMON SALES NODATAJOINER U
--* ’IBMSNAP_OPERATION = ’I’’
--*

--*

--*
--*
--*
--*
--*
--*
--* CNTLSVR.REX

’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’SALES’ , 0 ,’SIMON’,
’SALES’ ,’N’ ,’Y’ , 3 , ’IBMSNAP_OPERATION = ’’I’’’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’SALE_DATE’,’N’, 1 ,
’DATE’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’BASARTNO’,’N’,
2 ,’BASARTNO’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’LOCATION’,’N’,
3 ,’LOCATION’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’COMPANY’,’N’, 4
,’COMPANY’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’PIECES’,’N’, 5 ,
’PIECES’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’OUT_PRC’,’N’, 6
,’OUT_PRC’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’TAX’,’N’, 7 ,
’TAX’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’TRANSFER_DATE’,
’N’, 8 ,’TRANSFER_DATE’);

VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’PROCESS_DATE’,
’N’, 9 ,’PROCESS_DATE’);

--*
COMMIT;

--*

--*
--* source server
--*
’SALES’,’DB2RES5’,’SALES’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,’DJDB’);

--*
COMMIT;


--*
--*
-- in TARGSVR.REX
-- CREATE TABLESPACE TSSALES MANAGED BY DATABASE USING (FILE
’/data/djinst5/djinst5/SALES.F1’ 2000 );

-- Create the target table SIMON.SALES
CREATE TABLE SIMON.SALES(SALE_DATE DATE NOT NULL,BASARTNO DECIMAL(13 , 0)
NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 , 0) NOT
NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT NULL,
TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL,
PROCESS_DATE TIMESTAMP NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ
CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL)
IN AZOVORA8
REMOTE OPTION ’TABLESPACE BIGTS’;

--*
COMMIT;
E.12 SQL After to Support Temporal Histories for Supplier Table

--* File Name: sqlafter_supplier.sql
--*
--*
--* Calling ADDSTMT for WHQ1/SALES_SET pair # 1 --*
--* echo input: ADDSTMT DJDB WHQ1 SALES_SET S A 1 E 0000002000 UPDATE
--* SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP= (SELECT MIN(
--* IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B WHERE
--* A.SUPP_NO=B.SUPP_NO AND A.EXPIRED_TIMESTAMP IS NULL AND
--* B.EXPIRED_TIMESTAMP IS NULL AND (B.IBMSNAP_INTENTSEQ >
--* A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP IS NULL
--*
--*

--*
-- current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 2:24pm
--*
--* Fetching from the ASN.IBMSNAP_SUBS_STMTS table at DJDB
--*
--* There is no conflicting entry in the ASN.IBMSNAP_SUBS_STMTS table,
--* so Ok to add.
--*
--* CONNECTing TO DJDB USER djinst5 USING pwd;
--*
--* The SQL_STMT: UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP= (
--* SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B WHERE
--* A.SUPP_NO=B.SUPP_NO AND A.EXPIRED_TIMESTAMP IS NULL AND
--* B.EXPIRED_TIMESTAMP IS NULL AND (B.IBMSNAP_INTENTSEQ >
--* A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP IS NULL; passed
--* validation at the server with alias name DJDB .
--*
-- create a new row in IBMSNAP_SUBS_STMTS

VALUES(’WHQ1’,’SALES_SET’,’S’,’A’, 1 ,’E’,’UPDATE SIMON.SUPPLIER A SET
EXPIRED_TIMESTAMP= (SELECT MIN(IBMSNAP_LOGMARKER) FROM
SIMON.SUPPLIER B WHERE A.SUPP_NO=B.SUPP_NO AND
A.EXPIRED_TIMESTAMP IS NULL AND B.EXPIRED_TIMESTAMP IS NULL AND (
B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP
IS NULL’,’0000002000’);

APPLY_QUAL=’WHQ1’ AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’;
--* commit work at cntl_ALIAS DJDB

--*
COMMIT;
-- Satisfactory completion of ADDSTMT at 2:24pm
E.13 Maintain Base Aggregate Table from Change Aggregate Subscription

----* File Name: aggregat.sql
--
--
-- The following SQL will be used to create a summary table from
-- the SALES table :
-- SELECT company, location, sum(pieces), sum(out_prc)
-- FROM sales GROUP BY company,location
--
-- This will be maintained initially from a base aggregate
-- (for full refresh only), and subsequently from a change
-- aggregate subscription.
--
-- Be sure to register the SALES table with XPIECES and XOUT_PRC
-- as a before-image columns. If the table has already been
-- registered, you can re-register the table using DJRA.
-- Alternatively, you can uncomment the following SQL which adds
-- the after-image columns to the SALES CD table, and updates
-- the register table with information on the before image
-- prefix.
--
-- ALTER TABLE DB2RES5.CDSALES ADD XPIECES DECIMAL(7,0);
-- ALTER TABLE DB2RES5.CDSALES ADD XOUT_PRC DECIMAL(11,2);
-- Change register record for Sales by updating BEFORE_IMG_PREFIX

-- UPDATE ASN.IBMSNAP_REGISTER SET BEFORE_IMG_PREFIX=’X’ WHERE
-- SOURCE_OWNER=’DB2RES5’ AND SOURCE_TABLE=’SALES’;
--
-- COMMIT
--
--
-- Connect to the DataJoiner database and create the
-- MOVEMENT and AGGREGATE tables.
--
CONNECT TO DJDB USER djinst5 ;
--
-- Create the Movement table. This needs to be similar to the
-- AGGREGATES table, but has an additional IBMSNAP_OPERATION
-- column. The table uses DataJoiner’s DDL transparency feature
-- to create the table in Oracle and also create a nickname

-- for it.
--
CREATE TABLE SIMON.MOVEMENT(
COMPANY INTEGER NOT NULL,
LOCATION INTEGER NOT NULL,
DIFFERENCE_PIECES INTEGER NOT NULL,
DIFFERENCE_OUTPRC DECIMAL(20,2) NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,
IBMSNAP_HLOGMARKER TIMESTAMP NOT NULL,
IBMSNAP_LLOGMARKER TIMESTAMP NOT NULL) IN AZOVORA8;
--
-- Note: The IBMSNAP_OPERATION column above is not referenced
-- when updating the SIMON.AGGREGATES table. Rather, having
-- the value in the SIMON.MOVEMENT table more clearly shows
-- the intermediate aggregations.
--
-- Create an index for the SIMON.MOVEMENT on the source
-- columns which are used for GROUPING.
--
CREATE INDEX MOVEMENTX ON SIMON.MOVEMENT(COMPANY,LOCATION ASC);
SET PASSTHRU RESET;
--
-- Now update the DJ global catalog so it is aware of the
-- index just created.
--
CREATE INDEX MOVEMENTX ON SIMON.MOVEMENT(COMPANY,LOCATION ASC);
--
-- Create the AGGREGATES table. This is the table which will
-- hold the final summary information. Again, the table is created
-- in Oracle using DataJoiner’s DDL transparency.
--
CREATE TABLE SIMON.AGGREGATES(
COMPANY INTEGER NOT NULL,
LOCATION INTEGER NOT NULL,
SUM_PIECES INTEGER,
SUM_OUTPRC DECIMAL(20,2),
IBMSNAP_HLOGMARKER TIMESTAMP NOT NULL,
IBMSNAP_LLOGMARKER TIMESTAMP NOT NULL) IN AZOVORA8;
--
-- Create an index for the target SIMON.AGGREGATES table
-- on the columns used for GROUPING.
--
CREATE UNIQUE INDEX AGGREGATESX ON SIMON.AGGREGATES(COMPANY,LOCATION ASC);
SET PASSTHRU RESET;
--
-- Now update the DJ global catalog with information that
-- this index exists.
--
CREATE UNIQUE INDEX AGGREGATESX ON SIMON.AGGREGATES(COMPANY,LOCATION ASC);
--
--
-- As a ’union’ must be simulated, and DB2 for OS/390 disallows
-- unions in views, multiple subscription members are be used,
-- requiring target views to differentiate the separate members
-- copying insert, update and delete operations.
-- The views are created in DataJoiner, over the nickname
-- for MOVEMENT table.
--
-- For INSERTs....
CREATE VIEW SIMON.MOVEMENTI AS SELECT * FROM SIMON.MOVEMENT;
--

-- For UPDATEs....
CREATE VIEW SIMON.MOVEMENTU AS SELECT * FROM SIMON.MOVEMENT;
--
-- For DELETEs....
CREATE VIEW SIMON.MOVEMENTD AS SELECT * FROM SIMON.MOVEMENT;
--
--
--
-- Create a new row in IBMSNAP_SUBS_SET for the base aggregate
-- "AGGREGATES" subscription - BASEAGG_SET.
--
MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (1 , ’WHQAGG’ ,
’BASEAGG_SET’ , ’S’ , ’SJ390DB1’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 ,
’1999-01-05-19.19.00’ , ’R’ , 1 ,NULL , 15 , 0 ,’0201’);
--
-- Create a new row in IBMSNAP_SUBS_SET for the change aggregate
-- "MOVEMENT" subscription - CHGAGG_SET.
--
MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQAGG’ ,
’CHGAGG_SET’ , ’S’ , ’SJ390DB1’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 ,
’1999-01-05-19.19.00’ , ’R’ , 1 ,NULL , 15 , 0 ,’0201’);
--
--
-- Create a new row in IBMSNAP_SUBS_MEMBR for the BASE aggregate.
--
’WHQAGG’ , ’BASEAGG_SET’ , ’S’ , ’DB2RES5’ , ’SALES’ , 0 ,
’SIMON’,’AGGREGATES’,’A’,’N’, 5 ,’1=1 GROUP BY COMPANY,LOCATION’);
--
-- The dummy predicate above, 1=1, can be substituted with a real
-- filtering predicate. The aggregate subscription requires a
-- predicate of some kind preceding the GROUP BY clause.
--
--
-- Create a new row in IBMSNAP_SUBS_COLS for the BASEAGG_SET aggregate
VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’A’,
’COMPANY’,’N’, 1 ,’COMPANY’);
--
VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’A’,
’LOCATION’,’N’, 2 ,’LOCATION’);
--
VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’F’,
’SUM_PIECES’,’N’, 3 ,’SUM(PIECES)’);
--

VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’F’,
’SUM_OUTPRC’,’N’, 4 ,’SUM(OUT_PRC)’);
--
--
-- First of all, enable Full Refresh for the base table so
-- that the initial load of the base_aggregate can take place.
-- Full refresh will be disabled after the initial full refresh
-- has completed.
-- This is necessary, because the SALES table has Full Refresh
-- disabled. If Full Refresh is enabled for the source table,
-- the next two INSERTs into SUBS_STMTS can be commented out.
--
-- ’G’ in BEFORE_OR_AFTER means execute this SQL before reading
-- the REGISTER table.
-- ’S’ in BEFORE_OR_AFTER means execute this SQL before reading
-- the CD table.
--
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’G’, 1 ,’E’,
SOURCE_OWNER=’’DB2RES5’’ AND SOURCE_TABLE=’’SALES’’ ’,’0000002000’);
--
-- Now disable full refresh
--
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’S’, 2 ,’E’,
SOURCE_OWNER=’’DB2RES5’’ AND SOURCE_TABLE=’’SALES’’ ’,’0000002000’);
--
--
-- Add an SQL-before statement to remove all rows from the AGGREGATES
-- table, just in case this is a re-run and the table is not empty.
--
-- COMMIT SQL statements are added to avoid a DUOW error. These are
-- probably only required for non-IBM targets.
--
-- COMMIT
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’B’, 3,’E’,
’COMMIT’,’0000002000’);
--
-- Clear out the AGGREGATES table
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’B’, 4 ,’E’,
’DELETE FROM SIMON.AGGREGATES’,’0000002000’);
--
-- and another COMMIT
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’A’, 5,’E’,
’COMMIT’,’0000002000’);
--
--
-- Now activate the change aggregate set CHGAGG_SET which will
-- take over the subscription once the initial full refresh
-- has taken place.
--
BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT)

VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’A’, 6 ,’E’,’UPDATE
ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’’WHQAGG’’
AND SET_NAME=’’CHGAGG_SET’’ AND WHOS_ON_FIRST=’’S’’’);
--
-- COMMIT.....
VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’B’, 7,’E’,
’COMMIT’,’0000002000’);
--
--
--
-- Add an SQL-after statement to turn off the AGGREGATES subscription
-- once it has completed successfully.
-- Attach this to the MOVEMENT subscription, so that the AGGREGATES
-- subscription will not be self-modifying.
-- Older levels of Apply code did not allow a subscription to modify
-- its own ACTIVATE value.
--
-- COMMIT first of all to avoid DUOW problems....
VALUES(’WHQAGG’,’CHGAGG_SET’,’S’,’A’, 1 ,’E’,
’COMMIT’);
--
VALUES(’WHQAGG’,’CHGAGG_SET’,’S’,’A’, 2 ,’E’,’UPDATE
ASN.IBMSNAP_SUBS_SET SET ACTIVATE=0 WHERE APPLY_QUAL=’’WHQAGG’’
AND SET_NAME=’’BASEAGG_SET’’ AND WHOS_ON_FIRST=’’S’’’);
--
-- Increment the AUX_STMTS counter in IBMSNAP_SUBS_SET
-- for both the BASEAGG_SET and the CHGAGG_SET.
--
APPLY_QUAL=’WHQAGG’ AND SET_NAME=’BASEAGG_SET’ AND
WHOS_ON_FIRST=’S’;
--
APPLY_QUAL=’WHQAGG’ AND SET_NAME=’CHGAGG_SET’ AND
--
--
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch insert
-- operations from the source CD table into view SIMON.MOVEMENTI.
-- This member just fetches the aggregated INSERTs.
--
’WHQAGG’ , ’CHGAGG_SET’ , ’S’ , ’DB2RES5’ , ’SALES’ , 0 ,
’SIMON’,’MOVEMENTI’,’A’,’N’, 6 ,
’IBMSNAP_OPERATION=’’I’’ GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION’);
--
-- Now add the columns for this subscription member to the
-- SUBS_COLS table.
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTI’ ,’A’,

--
--
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTI’ ,’F’,
’DIFFERENCE_PIECES’,’N’, 3 ,’SUM(PIECES)’);
--
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTI’ ,’F’,
’DIFFERENCE_OUTPRC’,’N’, 4 ,’SUM(OUT_PRC)’);
--
’IBMSNAP_OPERATION’,’N’, 5 ,’IBMSNAP_OPERATION’);
--
--
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch UPDATE
-- operations from the source CD table into view SIMON.MOVEMENTU.
-- This member just fetches the aggregated UPDATEs.
--
’SIMON’,’MOVEMENTU’,’A’,’N’, 6 ,
’IBMSNAP_OPERATION=’’U’’ GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION’);
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTU’ ,’A’,
--
--
-- Because this is an update, DIFFERENCE_PIECES is calculated by
-- substracting the before-image sum from the after-image sum.
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTU’ ,’C’,
’DIFFERENCE_PIECES’,’N’, 3 ,’SUM(PIECES)-SUM(XPIECES)’);
--
-- Because this is an update, DIFFERENCE_OUTPRC is calculated by
-- substracting the before-image sum from the after-image sum.

VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTU’ ,’F’,
’DIFFERENCE_OUTPRC’,’N’, 4 ,’SUM(OUT_PRC)-SUM(XOUT_PRC)’);
--
--
--
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch DELETE
-- operations from the source CD table into view SIMON.MOVEMENTD.
-- This member just fetches the aggregated DELETEs.
--
’SIMON’,’MOVEMENTD’,’A’,’N’, 6 ,
’IBMSNAP_OPERATION=’’D’’ GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION’);
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTD’ ,’A’,
--
--
-- The PIECES value is negated before going into the
-- MOVEMENT table (since the value has been deleted) and
-- must be subtracted from the sum.
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTD’ ,’F’,
’DIFFERENCE_PIECES’,’N’, 3 ,’-SUM(PIECES)’);
--
-- The OUT_PRC value is negated before going into the
-- MOVEMENT table (since the value has been deleted) and
-- must be subtracted from the sum.
VALUES(’WHQAGG’,’CHGAGG_SET’ , ’S’,’SIMON’,’MOVEMENTD’ ,’F’,
’DIFFERENCE_OUTPRC’,’N’, 4 ,’-SUM(OUT_PRC)’);
--
--
--
--
-- The IBMSNAP_LLOGMARKER and IBMSNAP_HLOGMARKER columns will be
-- automatically maintained by the Apply process, representing
-- the interval of the change aggregation.

--
--
--
-- Remove old rows from the MOVEMENT table before the SQL-after
-- statements run again. Otherwise, they will be double-counted
-- in the AGGREGATES table.
--
-- COMMIT......
VALUES(’WHQAGG’,’CHGAGG_SET’,’S’,’B’, 2 ,’E’,’COMMIT’,’0000002000’);
--
VALUES(’WHQAGG’,’CHGAGG_SET’,’S’,’B’, 3 ,’E’,’DELETE FROM
SIMON.MOVEMENT’,’0000002000’);
--
-- COMMIT......
VALUES(’WHQAGG’,’CHGAGG_SET’,’S’,’A’, 4 ,’E’,’COMMIT’,’0000002000’);
--
--
--
-- Add an SQL-after statement to compute adjust AGGREGATES for
-- INSERT/UPDATE/DELETE to SALES.
-- This is the main guts of the logic.
--
--
’UPDATE SIMON.AGGREGATES A SET
SUM_PIECES=
(SELECT CASE
WHEN SUM(DIFFERENCE_PIECES) IS NULL THEN A.SUM_PIECES
ELSE SUM(DIFFERENCE_PIECES) + A.SUM_PIECES
END
FROM SIMON.MOVEMENT M
WHERE A.COMPANY=M.COMPANY AND A.LOCATION=M.LOCATION),
SUM_OUTPRC=
(SELECT CASE
WHEN SUM(DIFFERENCE_OUTPRC) IS NULL THEN A.SUM_OUTPRC
ELSE SUM(DIFFERENCE_OUTPRC) + A.SUM_OUTPRC
END
WHERE A.COMPANY=M.COMPANY AND A.LOCATION=M.LOCATION),
IBMSNAP_HLOGMARKER=
(SELECT CASE
WHEN MAX(M.IBMSNAP_HLOGMARKER) IS NULL THEN A.IBMSNAP_HLOGMARKER
ELSE MAX(M.IBMSNAP_HLOGMARKER)
END
FROM SIMON.MOVEMENT M)’,’0000002000’);
--
--
-- Add some more SQL-after to adds rows when new COMPANY’s, LOCATION’s
-- are created.
--
--

--
’INSERT INTO SIMON.AGGREGATES
(COMPANY,LOCATION,SUM_PIECES,SUM_OUTPRC,IBMSNAP_LLOGMARKER,IBMSNAP_HLOGMARKER)
SELECT COMPANY,LOCATION,DIFFERENCE_PIECES,DIFFERENCE_OUTPRC,
IBMSNAP_LLOGMARKER,IBMSNAP_HLOGMARKER
WHERE NOT EXISTS
(SELECT * FROM SIMON.AGGREGATES E WHERE
E.COMPANY=M.COMPANY AND E.LOCATION=M.LOCATION)’,
’0000002000’);
--
--
--
--
-- Increment the AUX_STMTS counter in IBMSNAP_SUBS_SET for
-- the CHGAGG_SET.
--
APPLY_QUAL=’WHQAGG’ AND SET_NAME=’CHGAGG_SET’ AND
--
--
COMMIT;
CONNECT RESET;
--
--
--
-- Connect to the source system and record the BASEAGG_SET and
-- CHGAGG_SET subscriptions in the pruning control table.
--
CONNECT TO SJ390DB1 USER DB2RES5 ;
--
’SIMON’,’AGGREGATES’,’DB2RES5’,’SALES’, 0 ,’WHQAGG’,
’BASEAGG_SET’,’DJDB’, 5 ,’DJDB’);
--
’SIMON’,’MOVEMENTI’,’DB2RES5’,’SALES’, 0 ,’WHQAGG’,
’CHGAGG_SET’,’DJDB’, 6 ,’DJDB’);
--
’SIMON’,’MOVEMENTU’,’DB2RES5’,’SALES’, 0 ,’WHQAGG’,
--
’SIMON’,’MOVEMENTD’,’DB2RES5’,’SALES’, 0 ,’WHQAGG’,
--

COMMIT;
--
CONNECT RESET;
--
-- Finshed.
-- ************************************************************
-- Now check that all the SQL_Before and SQL-After are valid
-- by using the Replication Analyzer with the DEEPCHECK option.
-- ************************************************************

Appendix F. DJRA Generated SQL for Case Study 4
replication definitions which were configured in case study 4. The
modifications to the generated SQL are shown in bold typeface.
F.1 Structures of the Tables

The following SQL statements were used to create the tables:
-- Customers table
CREATE TABLE IWH.CUSTOMERS (
CUSTNO CHAR(8) NOT NULL,
LNAME CHAR(20),
FNAME CHAR(15),
SEX CHAR(1),
BIRTHDATE DATE,
AGENCY INTEGER NOT NULL,
SALESREP DECIMAL(6, 0) NOT NULL,
ADDRESS CHAR(50),
LICNB CHAR(12),
LICCAT CHAR(1),
LICDATE DATE,
PRIMARY KEY (CUSTNO))
DATA CAPTURE CHANGES ;
COMMENT ON IWH.CUSTOMERS (
CUSTNO IS ’Customer number’,
LNAME IS ’First name’,
FNAME IS ’Last name’,
SEX IS ’Sex’,
BIRTHDATE IS ’Birth date’,
AGENCY IS ’Agency code’,
SALESREP IS ’Sales rep in charge of the customer’,
ADDRESS IS ’Customer Address’,
LICNB IS ’Driving licence number’,
LICCAT IS ’Driving licence category’,
LICDATE IS ’Driving licence date’) ;
-- Contracts table
CREATE TABLE IWH.CONTRACTS (
CONTRACT INTEGER NOT NULL,
CONTYPE CHAR(2) NOT NULL,
LIMITED CHAR(1),
BASEFARE DECIMAL(7, 2),

TAXES DECIMAL(7, 2),
CREDATE DATE,
PRIMARY KEY (CONTRACT))
COMMENT ON IWH.CONTRACTS (
CONTRACT IS ’Contract number’,
CONTYPE IS ’Contract type’,
LIMITED IS ’Warranty excludes fire/glass break’,
BASEFARE IS ’Annual base fare’,
TAXES IS ’Taxes’,
CREDATE IS ’Creation date’) ;
-- Vehicles table
CREATE TABLE IWH.VEHICLES (
PLATENUM CHAR(12) NOT NULL,
CONTRACT INTEGER NOT NULL,
BRAND CHAR(10),
MODEL CHAR(10),
COACHWORK CHAR(1),
ENERGY CHAR(2),
POWER DECIMAL(4, 0),
ENGINEID CHAR(10),
VALUE DECIMAL(10, 0),
FACTORDATE DATE,
ALARM CHAR(1),
ANTITHEFT HAR(1),
PRIMARY KEY(PLATENUM))
COMMENT ON IWH.VEHICLES (
PLATENUM IS ’Plate-number’,
CONTRACT IS ’Contract number’,
BRAND IS ’Brand’,
MODEL IS ’Model’,
COACHWORK IS ’Coachwork type code’,
ENERGY IS ’Energy type’,
POWER IS ’Power’,
ENGINEID IS ’Engine identification number’,
VALUE IS ’Initial purchase value’,
FACTORDATE IS ’Date of exit from factory’,
ALARM IS ’Alarm feature code’,
ANTITHEFT IS ’Anti-theft feature code’) ;

-- Accidents table
CREATE TABLE IWH.ACCIDENTS (
ACCNUM DECIMAL(5, 0) NOT NULL,
TOWN CHAR(15),
REPAIRCOST DECIMAL(10,2),
STATUS CHAR(1),
ACCDATE DATE,
PRIMARY KEY(CUSTNO, ACCNUM))
COMMENT ON IWH.ACCIDENTS (
ACCNUM IS ’Accident record number’,
TOWN IS ’Town where accident happened’,
REPAIRCOST IS ’Repair cost’,
STATUS IS ’Status’,
ACCDATE IS ’Accident Date’) ;
F.2 SQL Script to Define the CONTRACTS Table as a Replication Source
Notice : We adapted the generated SQL script before running it, to change
the name of the Change Data table.
--* echo input: TABLEREG SJNTDWH1 IWH CONTRACTS BOTH NONEEXCLUDED
--* DELETEINSERTUPDATE STANDARD N
--*


CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
-- USERID=DBADMIN SOURCE_ALIAS alias=SJNTDWH1

-- PRDID=SQL0502 15 Mar 1999 6:04pm
-- source server SJNTDWH1 is not a DataJoiner server,


-- 1 table IWH.CONTRACTS
-- registration candidate #1 IWH.CONTRACTS

-- IWH.CONTRACTS is assumed a USER table


-- CREATE TABLESPACE TS041488 MANAGED BY DATABASE USING (FILE
--’C:\TS041488.F1’ 500 );
--* Source table IWH.CONTRACTS already has CDC attribute, no need to

--* alter.
-- create the cd/ccd table for IWH.CONTRACTS
CREATE TABLE IWH.CDCONTRACTS(IBMSNAP_UOWID CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,CONTRACT INTEGER NOT NULL,XCONTRACT
INTEGER,CONTYPE CHAR(2) NOT NULL,XCONTYPE CHAR(2),CUSTNO CHAR(8) NOT
NULL,XCUSTNO CHAR(8),LIMITED CHAR(1),XLIMITED CHAR(1),BASEFARE DECIMAL(
7 , 2),XBASEFARE DECIMAL(7 , 2),TAXES DECIMAL(7 , 2),XTAXES DECIMAL(7 ,
2),CREDATE DATE,XCREDATE DATE) ;
-- create the index for the change data table for IWH.CONTRACTS
CREATE UNIQUE INDEX IWH.CDICONTRACTS ON IWH.CDCONTRACTS(

PARTITION_KEYS_CHG) VALUES(’N’,’IWH’,’CONTRACTS’, 0 , 1 ,’Y’,’Y’,’IWH’,
’CDCONTRACTS’,’IWH’,’CDCONTRACTS’, 0 ,’0201’,’X’,’1’,’N’)
;
COMMIT;

F.3 SQL Script to Define the VCONTRACTS View as a Replication Source
--*
--* Calling VIEWREG for source table IWH.VCONTRACTS
--*
-- If you don’t see: ’-- now done interpreting...’ then check your REXX
code
-- input view OWNER=IWH input view NAME=VCONTRACTS
-- USERID=DBADMIN SOURCE_ALIAS alias=SJNTDWH1 15 Mar 1999 6:40pm

--* The view definition to be registered=’CREATE VIEW IWH.VCONTRACTS (
--* CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE, TAXES, CREDATE,
--* AGENCY) AS SELECT A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED,
--* A.BASEFARE, A.TAXES, A.CREDATE, B.AGENCY FROM IWH.CONTRACTS A,
--* IWH.CUSTOMERS B WHERE A.CUSTNO = B.CUSTNO ’
--*
CREATE VIEW IWH.VCONTRACTSA (IBMSNAP_UOWID,IBMSNAP_INTENTSEQ,
IBMSNAP_OPERATION,CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE, TAXES,
CREDATE, AGENCY) AS SELECT B.IBMSNAP_UOWID,B.IBMSNAP_INTENTSEQ,
B.IBMSNAP_OPERATION,A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED,
A.BASEFARE, A.TAXES, A.CREDATE, B.AGENCY FROM IWH.CONTRACTS A,
IWH.CDCUSTOMERS B WHERE A.CUSTNO = B.CUSTNO ;

CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’IWH’,’VCONTRACTS’, 1 ,
1 ,’Y’,’Y’,’IWH’,’VCONTRACTSA’,’IWH’,’CDCUSTOMERS’, 0 ,NULL,NULL,NULL,
NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’);

CREATE VIEW IWH.VCONTRACTSB (IBMSNAP_UOWID,IBMSNAP_INTENTSEQ,
IBMSNAP_OPERATION,CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE, TAXES,
CREDATE, AGENCY) AS SELECT A.IBMSNAP_UOWID,A.IBMSNAP_INTENTSEQ,
A.IBMSNAP_OPERATION,A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED,
A.BASEFARE, A.TAXES, A.CREDATE, B.AGENCY FROM IWH.CDCONTRACTS A,
IWH.CUSTOMERS B WHERE A.CUSTNO = B.CUSTNO ;

CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’IWH’,’VCONTRACTS’, 2 ,
1 ,’Y’,’Y’,’IWH’,’VCONTRACTSB’,’IWH’,’CDCONTRACTS’, 0 ,NULL,NULL,NULL,
NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’);
COMMIT;
F.4 SQL Script to Create the CUST0001 Empty Subscription Set

--*
--* Calling ADDSET for set 1 : AQSR0001/CONT0001
--*
--* echo input: ADDSET SJNTDWH1 AQSR0001 CONT0001 SJNTDWH1 DBSR0001
--* 19990316095500 R 2 NULL 30 TARGET=MSJET
--*
code
--* CONNECTing TO SJNTDWH1 USER DBADMIN USING pwd;
--*
--* The ALIAS name ’SJNTDWH1’ matches the RDBNAM ’SJNTDWH1’
--*
--*
-- current USERID=DBADMIN CNTL_ALIAS alias=SJNTDWH1 16 Mar 1999 9:57am

MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQSR0001’ ,
’CONT0001’ , ’S’ , ’SJNTDWH1’ , ’SJNTDWH1’ , ’MSJET’ , ’DBSR0001’ , 0 ,
’1999-03-16-09.55.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);

MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQSR0001’ ,
’CONT0001’ , ’F’ , ’MSJET’ , ’DBSR0001’ , ’SJNTDWH1’ , ’SJNTDWH1’ , 0 ,
’1999-03-16-09.55.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
--* commit work at SJNTDWH1

--*
COMMIT;
F.5 SQL Script to Add a Member to the CONT0001 Empty

Subscription Set
--*
--* Calling C:\SQLLIB\DPRTools\addmembr.rex for AQSR0001/CONT0001 pair # 2
--*
--* Echo input: ADDMEMBR SJNTDWH1 AQSR0001 CONT0001 IWH VCONTRACTS
--* NONEEXECLUDED ROWREPLICA CONTRACT+ IWH CONTRACTS NODATAJOINER U
--* MSJET ’(AGENCY = 25)’
--*
code
--*

--*
--* Current USERID=DBADMIN CNTL_ALIAS alias= 16 Mar 1999 11:24am
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at SJNTDWH1
--*
--* CONNECTing TO SJNTDWH1 USER DBADMIN USING pwd;
--*
--*
--* Fetching from the ASN.IBMSNAP_REGISTER table at SJNTDWH1
--*
--* Because you are defining a replica subscription from a source view,

--* and not a source table, some unusual steps will be taken. The
--* subscription down the heirarchy will name the view as the source
--* and the replica as the target. The subscription up the heirarchy
--* will name the replica as the source and the dominant table within
--* the view as the target. The dominant table being the component of
--* the source view which contributes the most columns to the source
--* view. Only columns from the dominant component in the source view
--* will be included in the replica.
--*
--* The source view IWH.VCONTRACTS includes 7 columns from the
--* dominant view component IWH.CONTRACTS.
--*
--* The source view IWH.VCONTRACTS also includes 2 columns from view
--* component IWH.CUSTOMERS, which will not be selected but can be
--* referenced in your subscription predicates.
--*
--*
--* CNTLSVR.REX

’AQSR0001’ , ’CONT0001’ , ’S’ , ’IWH’ , ’VCONTRACTS’ , 1 ,’IWH’,
’CONTRACTS’,’Y’,’Y’, 9 , ’(AGENCY = 25)’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’CONTRACT’,
’Y’, 1 ,’CONTRACT’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’CONTYPE’,
’N’, 2 ,’CONTYPE’);


VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’CUSTNO’,
’N’, 3 ,’CUSTNO’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’LIMITED’,
’N’, 4 ,’LIMITED’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’BASEFARE’,
’N’, 5 ,’BASEFARE’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’TAXES’,’N’,
6 ,’TAXES’);

VALUES(’AQSR0001’,’CONT0001’ , ’S’,’IWH’,’CONTRACTS’ ,’A’,’CREDATE’,
’N’, 7 ,’CREDATE’);

’AQSR0001’ , ’CONT0001’ , ’S’ , ’IWH’ , ’VCONTRACTS’ , 2 ,’IWH’,
’CONTRACTS’,’Y’,’Y’, 9 , ’(AGENCY = 25)’);

--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE
APPLY_QUAL=’AQSR0001’ AND SET_NAME=’CONT0001’ AND WHOS_ON_FIRST=’S’;
-- Create a new row in IBMSNAP_SUBS_MEMBR

’AQSR0001’ , ’CONT0001’ , ’F’ , ’IWH’ , ’CONTRACTS’ , 0 ,’IWH’,
’CONTRACTS’,’Y’,’Y’, 1 ,NULL);
--* Assuming the set subscription is inactive, I’ll activate it.

--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE
APPLY_QUAL=’AQSR0001’ AND SET_NAME=’CONT0001’ AND WHOS_ON_FIRST=’F’;

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’CONTRACT’,
’Y’, 1 ,’CONTRACT’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’CONTYPE’,
’N’, 2 ,’CONTYPE’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’CUSTNO’,
’N’, 3 ,’CUSTNO’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’LIMITED’,
’N’, 4 ,’LIMITED’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’BASEFARE’,
’N’, 5 ,’BASEFARE’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’TAXES’,’N’,
6 ,’TAXES’);

VALUES(’AQSR0001’,’CONT0001’ , ’F’,’IWH’,’CONTRACTS’ ,’A’,’CREDATE’,
’N’, 7 ,’CREDATE’);
-- CREATE A NEW ROW IN IBMSNAP_SCHEMA_CHG

INSERT INTO ASN.IBMSNAP_SCHEMA_CHG( APPLY_QUAL,SET_NAME,LAST_CHANGED)
VALUES ( ’AQSR0001’ , ’CONT0001’ , CURRENT TIMESTAMP);
--* Commit work at cntl_ALIAS SJNTDWH1

--*
COMMIT;

--*

--*
--* source server
--*
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’MSJET’,’IWH’,
’CONTRACTS’,’IWH’,’VCONTRACTS’, 1 ,’AQSR0001’,’CONT0001’,’SJNTDWH1’, 9
,’SJNTDWH1’);

--* source server
--*
SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’MSJET’,’IWH’,
’CONTRACTS’,’IWH’,’VCONTRACTS’, 2 ,’AQSR0001’,’CONT0001’,’SJNTDWH1’, 9
,’SJNTDWH1’);
--* Commit work at source_ALIAS SJNTDWH1

--*
COMMIT;

Appendix G. Special Notices
This publication is intended to help database administrators and replication

specialists to learn more about the features of IBM’s replication solution
available in multi-vendor database environments. The information in this
publication is not intended as the specification of any programming interfaces
that are provided by IBM DB2 DataPropagator, IBM DB2 Universal Database,
and IBM DataJoiner. See the PUBLICATIONS section of the IBM
Programming Announcement for IBM DB2 DataPropagator, IBM DB2
Universal Database, and IBM DataJoiner for more information about what
publications are considered to be product documentation.
References in this publication to IBM products, programs or services do not

imply that IBM intends to make these available in all countries in which IBM
operates. Any reference to an IBM product, program, or service is not
intended to state or imply that only IBM's product, program, or service may be
used. Any functionally equivalent program that does not infringe any of IBM's
intellectual property rights may be used instead of the IBM product, program
or service.
Information in this book was developed in conjunction with use of the

equipment specified, and is limited in application to those specific hardware
and software products and levels.
IBM may have patents or pending patent applications covering subject matter
in this document. The furnishing of this document does not give you any
license to these patents. You can send license inquiries, in writing, to the IBM
Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood,
NY 10594 USA.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact IBM
Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.
Such information may be available, subject to appropriate terms and

conditions, including in some cases, payment of a fee.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer

responsibility and depends on the customer’s ability to evaluate and integrate
them into the customer’s operational environment. While each item may have
been reviewed by IBM for accuracy in a specific situation, there is no
guarantee that the same or similar results will be obtained elsewhere.
Customers attempting to adapt these techniques to their own environments
do so at their own risk.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of
these Web sites.
Any performance data contained in this document was determined in a

controlled environment, and therefore, the results that may be obtained in
other operating environments may vary significantly. Users of this document
should verify the applicable data for their specific environment.
The following document contains examples of data and reports used in daily
business operations. To illustrate them as completely as possible, the
examples contain the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
Reference to PTF numbers that have not been released through the normal
distribution process does not imply general availability. The purpose of
including these reference numbers is to alert IBM customers to specific
information relative to the implementation of the PTF when it becomes
available to each customer according to the normal IBM PTF distribution
process.
The following terms are trademarks of the International Business Machines

Corporation in the United States and/or other countries:
AIX MVS/ESA
AS/400 OS/400
DATABASE 2 OS/390
DataJoiner OS/2
DataPropagator PowerPC
DB2 QMF
DRDA RISC System/6000
IBM  VM/ESA
IMS VSE/ESA

The following terms are trademarks of other companies:
Informix, Informix Dynamic Server, Informix ESQL/C, and Informix Client SDK
are trademarks of Informix Corporation.
Microsoft, Windows, Windows NT, the Windows logo, and Access are
trademarks of Microsoft Corporation in the United States and/or other
countries.
MMX, and Pentium are trademarks of Intel Corporation in the United States
and/or other countries. (For a complete list of Intel trademarks see
www.intel.com/dradmarx.htm)
Oracle, Oracle8, SQL*Net, Net8, SQL*Loader, and PL/SQL are trademarks of

Oracle Corporation.
Sybase, Open Server, Open Client, SQL Server, System11, and

Transact-SQL are trademarks of Sybase Inc.
UNIX is a registered trademark in the United States and/or other countries

licensed exclusively through X/Open Company Limited.
SET and the SET logo are trademarks owned by SET Secure Electronic
Transaction LLC.
Other company, product, and service names may be trademarks or service

marks of others.
Special Notices 395

Appendix H. Related Publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
H.1 International Technical Support Organization Publications

For information on ordering these ITSO publications see “How to Get ITSO
Redbooks” on page 401.
• DPROPR Planning and Design Guide , SG24-4771
• WOW! DRDA Supports TCP/IP: DB2 Server for OS/390 and DB2
Universal Database, SG24-2212
• Lotus Solutions for the Enterprise, Volume 5 NotesPump: The Enterprise
Data Mover, SG24-5255
• Migrating and Managing Data on RS/6000 SP with DB2 Parallel Edition,
SG24-4658
H.2 Redbooks on CD-ROMs

Redbooks are also available on CD-ROMs. Order a subscription and
receive updates 2-4 times a year at significant savings.
CD-ROM Title Subscription Collection Kit

Number Number
System/390 Redbooks Collection SBOF-7201 SK2T-2177
Networking and Systems Management Redbooks Collection SBOF-7370 SK2T-6022
Transaction Processing and Data Management Redbook SBOF-7240 SK2T-8038
Lotus Redbooks Collection SBOF-6899 SK2T-8039
Tivoli Redbooks Collection SBOF-6898 SK2T-8044
AS/400 Redbooks Collection SBOF-7270 SK2T-2849
RS/6000 Redbooks Collection (HTML, BkMgr) SBOF-7230 SK2T-8040
RS/6000 Redbooks Collection (PostScript) SBOF-7205 SK2T-8041
RS/6000 Redbooks Collection (PDF Format) SBOF-8700 SK2T-8043
Application Development Redbooks Collection SBOF-7290 SK2T-8037
H.3 Other Publications

These publications are also relevant as further information sources:

• DataJoiner for AIX Systems Planning, Installation, and Configuration
Guide, SC26-9145
• DataJoiner for Windows NT Systems Planning, Installation, and
Configuration Guide, SC26-9150
• DataJoiner Application Programming and SQL Reference Supplement,
SC26-9148
• DataJoiner Administration Supplement, SC26-9146
• DB2 Replication Guide and Reference , S95H-0999
• DB2 for OS/390 V5 SQL Reference, SC26-8966
• Oracle Net8 Administrator’s Guide, A58230-01
• Oracle8 Administrator’s Guide , A58397-01
• Administrator’s Guide for Informix Dynamic Server, Part No. 000-4354
• Oracle8 Utilities, A58244-01
H.4 Hot Web Sites

IBM Software Homepage
http://www.software.ibm.com
IBM Database Management Homepage
http://www.software.ibm.com/data
IBM DProp Homepage:
http://www.software.ibm.com/data/dpropr
IBM DataJoiner Homepage:
http://www.software.ibm.com/data/datajoiner
IBM Data Management Performance Reports
http://www.software.ibm.com/data/db2/performance
DataPropagator Relational Performance Measurement Series
http://www.software.ibm.com/data/db2/performance/dprperf.htm

Related Publications 399
How to Get ITSO Redbooks
This section explains how both customers and IBM employees can find out about ITSO redbooks,
CD-ROMs, workshops, and residencies. A form for ordering books and CD-ROMs is also provided.
This information was current at the time of publication, but is continually subject to change. The latest
information may be found at http://www.redbooks.ibm.com/.
How IBM Employees Can Get ITSO Redbooks

Employees may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and
information about redbooks, workshops, and residencies in the following ways:
• Redbooks Web Site on the World Wide Web
http://w3.itso.ibm.com/
• PUBORDER – to order hardcopies in the United States
• Tools Disks
To get LIST3820s of redbooks, type one of the following commands:
TOOLCAT REDPRINT
TOOLS SENDTO EHONE4 TOOLS2 REDPRINT GET SG24xxxx PACKAGE
TOOLS SENDTO CANVM2 TOOLS REDPRINT GET SG24xxxx PACKAGE (Canadian users only)
To get BookManager BOOKs of redbooks, type the following command:
TOOLCAT REDBOOKS
To get lists of redbooks, type the following command:
TOOLS SENDTO USDIST MKTTOOLS MKTTOOLS GET ITSOCAT TXT
To register for information on workshops, residencies, and redbooks, type the following command:
TOOLS SENDTO WTSCPOK TOOLS ZDISK GET ITSOREGI 1998
• REDBOOKS Category on INEWS
• Online – send orders to: USIB6FPL at IBMMAIL or DKIBMBSH at IBMMAIL
Redpieces
For information so current it is still in the process of being written, look at "Redpieces" on the
Redbooks Web Site (http://www.redbooks.ibm.com/redpieces.html). Redpieces are redbooks in
progress; not all redbooks become redpieces, and sometimes just a few chapters will be published
this way. The intent is to get the information out much quicker than the formal publishing process
allows.

How Customers Can Get ITSO Redbooks
Customers may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and
information about redbooks, workshops, and residencies in the following ways:
• Online Orders – send orders to:
IBMMAIL Internet
In United States usib6fpl at ibmmail usib6fpl@ibmmail.com
In Canada caibmbkz at ibmmail lmannix@vnet.ibm.com
Outside North America dkibmbsh at ibmmail bookshop@dk.ibm.com
• Telephone Orders
United States (toll free) 1-800-879-2755
Canada (toll free) 1-800-IBM-4YOU
Outside North America (long distance charges apply)

(+45) 4810-1320 - Danish (+45) 4810-1020 - German
(+45) 4810-1420 - Dutch (+45) 4810-1620 - Italian
(+45) 4810-1540 - English (+45) 4810-1270 - Norwegian
(+45) 4810-1670 - Finnish (+45) 4810-1120 - Spanish
(+45) 4810-1220 - French (+45) 4810-1170 - Swedish
• Mail Orders – send orders to:
IBM Publications IBM Publications IBM Direct Services
Publications Customer Support 144-4th Avenue, S.W. Sortemosevej 21
P.O. Box 29570 Calgary, Alberta T2P 3N5 DK-3450 Allerød
Raleigh, NC 27626-0570 Canada Denmark
USA
• Fax – send orders to:
United States (toll free) 1-800-445-9269
Canada 1-800-267-4455
Outside North America (+45) 48 14 2207 (long distance charge)
• 1-800-IBM-4FAX (United States) or (+1) 408 256 5422 (Outside USA) – ask for:
Index # 4421 Abstracts of new redbooks
Index # 4422 IBM redbooks
Index # 4420 Redbooks for last six months
• On the World Wide Web
Redbooks Web Site http://www.redbooks.ibm.com
IBM Direct Publications Catalog http://www.elink.ibmlink.ibm.com/pbl/pbl
Redpieces
For information so current it is still in the process of being written, look at "Redpieces" on the
Redbooks Web Site (http://www.redbooks.ibm.com/redpieces.html). Redpieces are redbooks in
progress; not all redbooks become redpieces, and sometimes just a few chapters will be published
this way. The intent is to get the information out much quicker than the formal publishing process
allows.

IBM Redbook Order Form
Please send me the following:

Title Order Number Quantity
First name Last name
Company
Address
City Postal code Country
Telephone number Telefax number VAT number
Invoice to customer number
Credit card number
Credit card expiration date Card issued to Signature
We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not
available in all countries. Signature mandatory for credit card payment.
403
List of Abbreviations
A ODBC Open Database

Connectivity
CAE DB2 Client Application
Enabler OLTP Online Transaction
Processing
CCD Consistent change data
table PIT Point in Time table
CD Change data table RDBMS Relational database

management system
CPU Central Processing Unit
REXX Restructured Extended
DAO Data Access Objects Executor
DB2 UDB DB2 Universal RI Referential Integrity
Database
SNA Systems Network
DCS Database Connection Architecture
Services
SQL Structured Query
DDCS Distributed Database Language
Connection Services
SQLCA SQL Communication
DDF Distributed Data Area
Facility (component of
DB2 for OS/390) TCP/IP Transmission Control
Protocol/Internet
DDL Data Definiton Protocol
Language (part of SQL)
UOW unit-of-work table
DJRA DataJoiner Replication
Administration
DML Data Manipulation
Language (part of SQL)
DProp DB2 DataPropagator
DRDA Distributed Relational
Database Architecture
EPOS Electronic Point of
Sales
FTP File Transfer Program
GUI Graphical User
Interface
IBM International Business
Machines Corporation
ITSO International Technical
Support Organization
JCL Job Control Language

Index
lag limit 119
monitoring 101
A NOPRUNE start option 92, 127
abbreviations 405
PRUNE command 92
acronyms 405
pruning interval 119
Apply
resolving a gap 103
ASNDONE user exit 111
retention limit 120
automatic full refresh 86
setup 71, 190
bind 75, 192
starting 85
changes from non-IBM sources 169
TRACE option 100
CPU utilization 24
triggers 166
enabling block fetch 126
triggers, impact 55
general introduction 8
tuning parameters 118
monitoring 106
change data tables
placement 39
indexes 121
pull mode 39, 125
pruning 92
push mode 39
tablespace usage 120
replication timing 49, 51
control tables
setup 71, 190
create at control servers 74, 191
starting on OS/390 162
create at source servers 74, 191
starting on Windows NT 200
TRCFLOW option 100
placement 46
using multiple parallel processes 125
reorganizing 91
APPLYTRAIL table
looking for details 109
pruning 93 D
ASNDONE user exit 111 data consolidation scenario
ASNJET 276, 277 configuration tasks 151
DataJoiner placement 44
defining replication subscriptions 161
B moving from test to production 164
block fetch 126
placement of the system components 142
blocking factor 54
replication design 145, 159
business requirements
system design 142
gathering 16
system topology 150
questions 16
target site union 145
data distribution scenario
C configuration tasks 188
Capture DataJoiner placement 41
bind 74 replication design 181, 192
commit interval 119 system design 177
CPU utilization 24 system topology 186
detecting errors 102 data volumes
dropping unnecessary triggers for non-IBM estimation 20
sources 133 log space 22
execution priority 118 spill file 24
general introduction 8 staging tables 22
lag 102 unit-of-work table 22

data warehouse scenario forceing for a certain subscription set 132
configuration tasks 214 forcing for all sets reading from a source server
initial load 261 133
maintaining a base aggregate table from a forcing for all sets reading from a source table
change aggregate subscription 257 133
maintaining history information 211, 220, 235, initial automatic 86, 306
244, 250 manual 89
pushing down the replication status 259 preventing selectively 129
replication design 210, 217 techniques 125
star join example 270 using DataJoiner’s EXPORT/IMPORT utilities
system design 209 263
system topology 213 using DSNTIAUL and Oracle’s SQL*Loader util-
using a CCD target table for the facts 245 ity 264
using source site joins for data denormalization using SQL INSERT/SELECT from DataJoiner
237 262
using target site views for data denormalization
228
DataJoiner
H
handshake between Capture and Apply 87
data access modules 66
heterogeneous replication
database 68
architectures 33
features 31
instance 67
monitoring 116
nicknames 9, 34, 79, 159, 186 L
placement 40 lock rules 121
server mappings 69 log space
server options 70 volume estimation 22
setup 65
user mappings 70
DB2 UDB Control Center 39
P
password file
denormalization create 196, 299
using source site joins 237 performance tuning 117
using target site views 228 project
DJRA implementation checklist 63
connectivity 38 implementation tasks 65
general introduction 8, 28 planning 13
setup 72, 191, 217 staffing 14
user exits 73 pruning 91
DProp APPLYTRAIL table 93
components 8 change data tables 92
features 30 consistent change data tables 92
general introduction 28 defering pruning for non-IBM sources 127
open monitoring interface 99 tuning considerations 127
unit-of-work table 92
F
full refresh
allow for certain subscriptions 130
R
reorganization
disable for all subscriptions 129 of change data tables 91

of the unit-of-work table 91 detecting errors 107
replication sources how to make use of 122
defering pruning for non-IBM 127 identifying members 109
defining non-IBM tables as 79 latency 108
dropping unnecessary capture triggers for monitoring the status 106
non-IBM 133
high-level picture 18
non-IBM source 34, 166
T
target site views 228
register 77, 159
replication status 259
replication targets U
defining non-IBM tables as 77 unit-of-work table
high-level picture 18 pruning 92
invoking stored procedures 185 tablespace usage 120
non-IBM target 33 volume estimation 21, 22
referential integrity considerations 49 update anywhere scenario
target table types 49 administration and maintenance 316
RUNSTATS 122 configuration tasks 279
conflict detection 313
creating source views to enable subsetting 287
S major pitfalls 300
scheduling
monitoring and problem determination 316
event based 52
operational implications 314
relative timing 52
replicating updates 311
server mappings 69
replication design 277, 287
server options 70
replication results 301
skills
system design 276
application specialists 15
system topology 277
data replication professional 14
user mappings 70
database administrator 15
utility operations
network specialists 15
LOAD 95
system specialists 15
pseudo-ALTER 96
source site joins 237
RECOVER 96
source-to-target mapping 19
REORG 97
spill file
using memory rather than disk 126
volume estimation 24
SPUFI 158
staffing 14
staging tables
volume estimation 20, 21, 22
stored procedure
add to subscription set 199
stored procedures
invoking at target database 185
subscription set
add stored procedure to 199
changing Apply qualifier or set name 134
deactivating 129
409
ITSO REDBOOK EVALUATION
My Mother Thinks I’m a DBA! Cross-Platform, Multi-Vendor, Distributed Relational Data Replication with
IBM DB2 DataPropagator and IBM DataJoiner Made Easy!
SG24-5463-00
Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete
this questionnaire and return it using one of the following methods:
• Use the online evaluation form found at http://www.redbooks.ibm.com
• Fax this form to: USA International Access Code + 1 914 432 8264
• Send your comments in an Internet note to redbook@us.ibm.com
Which of the following best describes you?

_ Customer _ Business Partner _ Solution Developer _ IBM employee
_ None of the above
Please rate your overall satisfaction with this book using the scale:
(1 = very good, 2 = good, 3 = average, 4 = poor, 5 = very poor)
Overall Satisfaction __________
Please answer the following questions:
Was this redbook published in time for your needs? Yes___ No___
If no, please explain:
What other redbooks would you like to see published?
Comments/Suggestions: (THANK YOU FOR YOUR FEEDBACK!)

My Mother Thinks I’m a DBA! Cross-Platform, Multi-Vendor, Distributed Relational Data Replication SG24-5463-00
with IBM DB2 DataPropagator and IBM DataJoiner Made Easy!
Printed in the U.S.A.
SG24-5463-00

Oracle DBA

Încărcat de

Informații document

Descriere originală:

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Oracle DBA

Încărcat de

Drepturi de autor:

Formate disponibile

My Mother Thinks I’m a DBA!

Cross-Platform, Multi-Vendor, Distributed Relational

Olivier Bonnet, Simon Harris, Christian Lenke, Li Yan Zhou,

International Technical Support Organization

International Technical Support Organization

My Mother Thinks I’m a DBA! Cross-Platform,

First Edition (June 1999)

Comments may be addressed to:

© Copyright International Business Machines Corporation 1999. All rights reserved

Part 1. Heterogeneous Data Replication—General Discussion . . . . . . . . . . . . . . . . . . . 1

Chapter 3. System and Replication Design—Architecture . . . . . . . . . 27

© Copyright IBM Corp. 1999 iii

Chapter 4. General Implementation Guidelines . . . . . . . . . . . . . . . . . . 61

Chapter 5. Replication Operation, Monitoring and Tuning . . . . . . . . . 83

iv The IBM Data Replication Solution

Part 2. Heterogeneous Data Replication—Case Studies . . . . . . . . . . . . . . . . . . . . . . 137

Chapter 6. Case Study 1—Point of Sale Data Consolidation, Retail . 139

Chapter 7. Case Study 2—Product Data Distribution, Retail . . . . . . . 173

Chapter 8. Case Study 3—Feeding a Data Warehouse. . . . . . . . . . . . 203

vi The IBM Data Replication Solution

Chapter 9. Case Study 4—Sales Force Automation, Insurance. . . . . 271

Appendix A. Index to Data Replication Tips, Tricks, Techniques . . . 321

Appendix B. Non-IBM Database Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

Appendix C. General Implementation Checklist . . . . . . . . . . . . . . . . . . 337

Appendix D. DJRA Generated SQL for Case Study 2. . . . . . . . . . . . . . 339

Appendix E. DJRA Generated SQL for Case Study 3 . . . . . . . . . . . . . . 347

viii The IBM Data Replication Solution

Appendix F. DJRA Generated SQL for Case Study 4 . . . . . . . . . . . . . . 381

Appendix G. Special Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393

Appendix H. Related Publications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397

How to Get ITSO Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401

List of Abbreviations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405

ITSO REDBOOK EVALUATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411

1. Part 1 - Structural Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

© Copyright IBM Corp. 1999 xi

xii The IBM Data Replication Solution

1. Available Replication Features in a Heterogeneous Environment. . . . . . . 31

© Copyright IBM Corp. 1999 xv

DB2 DataPropagator (DProp) is IBM’s strategic data replication solution. As a

In combination with other IBM products, such as IBM DataPropagator

In this redbook we focus on how IBM’s data replication solution is extended to

First, we discuss general implementation options that are available to exploit

Next, we introduce practical case studies, showing how the previously

© Copyright IBM Corp. 1999 xvii

Replication between relational databases and non-relational databases

The Team That Wrote This Redbook

Thomas Groh is a Data Management and Business Intelligence Specialist at

Olivier Bonnet is a Data Replication specialist in France. He provides both

xviii The IBM Data Replication Solution

Simon Harris is a Data Management pre-sales technical support specialist in

Christian Lenke is a Data Management Services Professional in Germany.

Li Yan Zhou is a Data Management Specialist in Beijing, China. She has

Thanks to the following people for their invaluable contributions to this

We want our redbooks to be as helpful as possible. Please send us your

xx The IBM Data Replication Solution

© Copyright IBM Corp. 1999 1

Congratulations for choosing IBM’s replication solution for your

To make your first approach to enterprise-wide data replication a real

Remark: Throughout this book, the term "multi-vendor" is used as a synonym

1.1 Why Replication?

Two different approaches to data access can be chosen:

Each approach has advantages and drawbacks, and corresponds to different

Whether you choose remote access or replication will be determined primarily