http://www.redbooks.ibm.com
SG24-5463-00
June 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix G, “Special Notices” on page 393.
This edition applies to Version 5.1 of IBM DB2 DataPropagator Relational Capture for MVS, 5655-A23,
Version 5.1 of IBM DB2 DataPropagator Relational Apply for MVS, 5655-A22, Version 5.1 of IBM DB2
DataPropagator Relational for AS/400, Version 5.2 of IBM DB2 Universal Database, and Version 2.1.1
of IBM DataJoiner, 5801-AAR.
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the
information in any way it believes appropriate without incurring any obligation to you.
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xi
Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
The Team That Wrote This Redbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Comments Welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Why Replication? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Why Multi-Vendor? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 How to Use this Book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 The Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 The Practical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Technical Warm-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 IBM DataPropagator—Architectural Overview . . . . . . . . . . . . . . . 7
1.4.2 Extending IBM Replication to a Non-IBM RDBMS. . . . . . . . . . . . . 9
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 2. Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 Organizing Your Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Gathering the Detailed Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 The Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 List of Questions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Determining the Replication Sources and Replication Targets . . . . . . 18
2.4 Technical Planning Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Estimating the Data Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 About CPU, Memory, and Network Sizing. . . . . . . . . . . . . . . . . . 24
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.7.1 Using Triggers to Emulate Capture Functions. . . . . . . . . . . . . . 166
6.7.2 The Change Data Table for a Non-IBM Replication Source . . . 169
6.7.3 How Apply Replicates the Changes from Non-IBM Sources . . . 169
6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
B.1.6 Oracle Listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.1.7 Other Useful Oracle Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.1.8 More Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
B.2 Informix Stuff. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.1 Configuring Informix Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.2 Using Informix’s dbaccess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
B.2.3 Informix Error Messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.2.4 More Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.3 Microsoft SQL Server Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
B.3.1 Configuring Microsoft SQL Server Connectivity . . . . . . . . . . . . . . . 330
B.3.2 Using the Microsoft Client OSQL . . . . . . . . . . . . . . . . . . . . . . . . . . 331
B.3.3 Microsoft SQL Server Data Dictionary . . . . . . . . . . . . . . . . . . . . . . 331
B.3.4 Helpful SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . 332
B.3.5 Microsoft SQL Server Error Messages . . . . . . . . . . . . . . . . . . . . . . 332
B.3.6 Microsoft SQL Server Administration . . . . . . . . . . . . . . . . . . . . . . . 332
B.3.7 ODBCPing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
B.3.8 More Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
B.4 Sybase SQL Server Stuff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
B.4.1 Configuring Sybase SQL Server Connectivity . . . . . . . . . . . . . . . . 333
B.4.2 Using the Sybase Client isql . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
B.4.3 Sybase SQL Server Data Dictionary . . . . . . . . . . . . . . . . . . . . . . . 334
B.4.4 Helpful SQL Server Stored Procedures . . . . . . . . . . . . . . . . . . . . . 335
B.4.5 Sybase SQL Server Error Messages . . . . . . . . . . . . . . . . . . . . . . . 335
B.4.6 More Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
To make the redbook most useful for supporting both the design and the
implementation phases of a heterogeneous replication project, the book
covers general guidelines and specific case studies separately.
In the case studies, the DB2 databases are either DB2 for OS/390 databases
or DB2 UDB for Windows NT databases, but the guidelines provided are also
applicable to any other member of the DB2 family. These include:
DB2 for AS/400, DB2 UDB on UNIX platforms, DB2 UDB for OS/2, and
DB2 for VM/VSE.
Rob Goldring
IBM Santa Teresa Lab
Madhu Kochar
IBM Santa Teresa Lab
Micks Purnell
IBM Santa Teresa Lab
Kathy Kwong
IBM Santa Teresa Lab
Bob Haimovits
International Technical Support Organization, Poughkeepsie
Vasilis Karras
International Technical Support Organization, Poughkeepsie
Comments Welcome
Your comments are important to us!
Basically, the most common uses of data replication are the following:
• Data distribution from one source database towards many target
databases.
• Feeding a data warehouse from a production database, utilizing the data
manipulation functions provided by the replication product (DProp). The
replicated data can, for example, be enhanced or aggregated, or
histories can be built.
• Data consolidation from several source databases towards one target
database.
But this redbook is the first publication that fully explains how you can
combine DProp and DataJoiner to implement a heterogeneous data
replication system.
Of course, this book does not cover all the areas listed above. It will provide
you with the guidelines and recommendations that you should follow during
your heterogeneous data replication project. All the steps are detailed, and
the book also gives you detailed examples for the setup of the most
frequently used replication configurations.
After you have read this book, you will be on your way to becoming a
replication specialist and ready for practical hands-on experience. You will
know how to handle your project, what you can expect from your replication
system, how you can implement a test system, and which steps you should
follow. Then you will need to get familiar with DProp and DataJoiner, and
probably re-read some parts of this book, before you move to production.
This is illustrated by Figure 1, which represents the structure of both the first
part of the book and your project. To keep the figure simple, iterations, which
are of course possible, are not displayed.
(Figure 1: Approaching Data Replication — the structure of the first part of the book, and of your project.)
Each phase is fully described in a separate chapter in the first part of the
book. Each chapter gives all the guidelines and recommendations that should
be followed to successfully achieve the objectives of the corresponding
phase.
To help you re-position yourself at the beginning of each chapter, the figure
above is reproduced and detailed.
Replication Design (Chapter 3): You will then define the technical design of
the replication system, choosing the placement of the middleware
components. In this chapter we will provide you with the necessary
background information to help you choose between the many
implementation options offered by the IBM replication solution, so that the
system will fulfill your business requirements.
Before we jump into the phases of a replication project, let us have a look at
the technical warm-up below. It explains the basic DProp and DataJoiner
concepts that you will need to know to fully understand the contents of the
next chapters.
• Apply database changes from the staging tables to the target databases
(Figure: IBM DataPropagator architectural overview — the Capture component reads the DB2 log and writes captured changes to the change data and unit-of-work staging tables; the Apply component moves data from the base and staging tables to the target tables; the administration component maintains the control tables.)
The nicknames are created within the DataJoiner database. Once nicknames
are in place, every DB2 client application, such as DProp Apply, can
transparently access (read, write, or execute) the referenced database
objects by simply accessing the nicknames.
(Figure: a DB2 client application transparently accesses tables 1 through n of a multi-vendor database through nicknames 1 through n defined in the DataJoiner database.)
Once connectivity to the back-end data source has been established,
nicknames can be created to reference database objects, such as tables,
stored procedures, user-defined types, or user-defined functions.
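For illustration, once the server mapping for the back-end source exists, a
nickname could be created with a statement like the following sketch (all
names are hypothetical; DataJoiner uses three-part remote object names):

create nickname djadmin.orders
for oraserv.sales.orders;

From then on, a DB2 client application can simply refer to djadmin.orders as
if it were a local table.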
Change capture, which for DB2 systems is achieved by reading the DB2 log,
is achieved by using native triggers for all the supported non-IBM data
sources. When a non-IBM table is registered for change replication, all the
necessary triggers or stored procedures are automatically generated by the
replication administration component (DataJoiner Replication Administration).
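To give you a flavor of this mechanism, the following greatly simplified,
hypothetical Oracle-style trigger sketches what such generated logic does for
INSERT operations (the real triggers generated by DJRA differ per database
system and also maintain sequencing columns such as IBMSNAP_COMMITSEQ and
IBMSNAP_INTENTSEQ):

create or replace trigger orders_ccd_i
after insert on orders
for each row
begin
  -- record the after-image of the inserted row in the change data (CCD) table
  insert into orders_ccd (ibmsnap_operation, ibmsnap_logmarker,
                          order_no, status)
  values ('I', sysdate, :new.order_no, :new.status);
end;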
Chapter 2. Planning
The first thing you have to do when you begin studying your heterogeneous
replication system is ... think!
You have to prepare yourself to find out about the details of the business
requirements that make you consider replication and details about your data.
Do not go into the technical details too soon. After you have organized your
project (just as for any other project: staff the project, train the people,
define the project plan), you must first clearly determine what the business
requirements are, that is, what your users really need (which kind of data is
needed, when is it needed, and for which purposes?).
In this chapter, of course, we will not answer these questions. They depend
on your specific business. But we will help you determine which questions
you should ask the users to gather the business requirements, and focus on
the general topics you should study before you move to the implementation
phase of the project. This is illustrated by Figure 4:
(Figure 4: Approaching Data Replication — phases: Replication Planning, Replication Design, General Implementation Guidelines, and Operation, Monitoring & Tuning.)
At the end of this phase, you will also have written a document detailing the
list of all the target tables, with all the columns, and the correlation between
the target and the source data. Table structures, column name mappings,
and data types will have to be described.
Remark: You certainly already have a general idea of what you want to build
and why (for example, a data warehouse for decision support, or a data
distribution system). So perhaps you already started the business
requirements gathering and analysis before you started reading this book
(because you did not decide to implement a heterogeneous replication
system just for the pleasure of having one, and the need to have such a
replication system is probably not recent). If this is the case, use the present
chapter to verify that you did not forget important things.
These activities are not detailed in this book, because they are not specific to
this kind of project, but we wanted to remind you not to underestimate their
importance and the associated workload.
Depending on the size of your company and the project scope, each role
described above can be covered by one or more persons, or several roles
can be covered by a single person.
You will also have to involve the users in the project, as early as possible (you
will need them during the business requirements definition, and then during
the test phase).
2.2 Gathering the Detailed Requirements
You must determine with the users, in detail, what their data needs are, and
how the data is going to be used. Be sure to get information on the uses and
business needs for the replicated data from all people who are important to
the success of the project (that is, the users of the replicated data, the
management of the department that needs the replicated data, other staff and
management who have any interest in the data being replicated).
The list of questions below can help you prepare for the user interviews.
You must also have a deep knowledge of the current data and applications,
because you will need to determine how the future tables will relate to the
existing ones. Perhaps some new tables will have to be created, or existing
ones reorganized, or joined together.
So you will need to review the application documentation and/or interview the
programmers or software providers.
• Will the users need history information that is not present in the corporate
data?
• Are there special auditing needs?
• Is there a need to retain the values of the columns before the record
was changed in the tables that the users will use (before-images of
columns)?
• Are there complex data manipulations that must be performed on the data
before the users can use it?
• Are the existing tables normalized, and do you always follow the relational
model recommendations (no update of primary keys in particular)?
• Will the headquarters need consolidated data from geographically
dispersed data?
• Are there special filtering needs such as: Propagate the inserts and
updates but not the deletes?
Remark: In this book you will find implementation examples for nearly all the
data replication requirements listed above. Table 1 on page 31 provides a list
of the most important replication features and tells you where you can find the
examples.
Once you have answers to these questions, you know the business
requirements and you have a more precise idea of what you will easily be
able to provide (when you have all the requested data already available) and
what will be more difficult to provide (when you do not have the requested
data available!).
You must, of course, consolidate and sort all the information that you have
collected, and probably solve some conflicts (some users have contradictory
requirements).
Then you can move on to the next step and begin drawing the global picture
that we were talking about in the introductory part of this chapter.
Reminder: At this point, the focus is on building the overall architecture of the
replication system. Try to avoid DProp- or DataJoiner-specific details.
But you should name the available communication links (with no technical
details) between the sources and the targets, and you should name the
source and target platforms.
But you must go further in your analysis than just drawing this picture.
You also have to develop a document detailing the list of all the target tables,
with all the columns and their meaning, and explain how each column will be
derived from the source data (source table and column name, or calculation
formula).
You will probably need the users’ help again to complete this document. So,
in fact, the two steps (2.2, “Gathering the Detailed Requirements” on page 16,
and 2.3, “Determining the Replication Sources and Replication Targets” on
page 18) are iterative. You will need several iterations to stabilize the
requirements analysis documentation.
So far, you have taken the users’ requirements into consideration, but you
must also establish capacity planning requirements. The next section helps
you do this.
The next sections help you estimate this additional disk space utilization.
Advice: The estimation of the future volume of the staging tables is often a
difficult task, because most database administrators do not know how many
updates, inserts and deletes are performed on the source tables. So, some
are tempted to just "forget" this essential task. But you will not do this. You
really will spend some time trying to estimate, even roughly, how often your
source tables are updated.
If it is really too difficult, you can choose the following approach: Install the
Capture component of DProp on your source production system (or the
capture emulation triggers if your source is a non-IBM database), a long time
before you are ready to actually move the whole replication system to
production, then simulate a full-refresh so that Capture really starts capturing
the updates (the way to do this is explained in 8.4.9, “Initial Load of Data into
When you try to estimate the disk space the staging tables will use, it is not
only important to know the size of the source tables. You must also know how
many insert/delete/update operations will be made to the source tables, not
on average but at the maximum.
For example, imagine you have a source table that contains 1 million rows,
and your daily applications only update one percent of the rows. You will have
10,000 new rows each day in the staging table. If the table is replicated
regularly (several times a day for example), the staging table pruning
mechanism will be able to remove rows regularly from the staging table, and
so the staging table will never contain more than 10,000 rows. But now,
imagine that for this table you have a new monthly application that updates all
the rows. When the changes are captured, the staging table will contain 1
million rows.
This illustrates the fact that you need to know both the size of your source
tables and the maximum percentage of rows that are updated during one
replication cycle.
For some small tables (1000 rows or less; of course, it also depends on the
length of each row!) that are globally updated by batch programs each day
and also propagated once a day, you might even consider that capturing and
replicating the updates is not the best approach. You can configure DProp so
that it replicates this table in ’full-refresh’ mode only. The updates will be
neither captured nor stored in staging tables, and the Apply component of
DProp will simply copy the whole content of the source table to the
target tables. This run mode should of course only be used in exceptional
cases, because the main advantage of DProp is to provide change
replication.
When the source is a DB2 table, the capture component of DProp also inserts
rows into a table called the unit-of-work table. A row is inserted in the
unit-of-work table each time an application transaction issues a COMMIT and
the transaction had executed an SQL insert/delete/update statement against
a registered replication source table.
Unless you are using the technique explained in the remark above, a precise
estimate of the size of the staging tables can only be made after you have
chosen all the replication parameters. But for the moment you only need a
rough estimate, using some simplified formulas (see below).
To size the staging tables, use the following simplified formula (the result is
in bytes), then add a 50% safety margin:
(21 bytes + sum(length(registered columns))) x estimated number of inserts,
updates, and deletes to be captured during max(pruning interval, replication
interval), for a busy day such as a month-end
To size the unit-of-work table, use the following simplified formula (the result
is in bytes), then add a 50% safety margin:
79 bytes x estimated number of commits to be captured during max(pruning
interval, replication interval), for a busy day such as a month-end
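Example: Assume a registered source table whose registered columns add up
to 200 bytes per row, and assume that on a busy day at most 100,000 changes
and 2,000 commits are captured during max(pruning interval, replication
interval). The staging table would then need roughly (21 + 200) x 100,000 =
about 22 MB (33 MB with the 50% margin), and the unit-of-work table roughly
79 x 2,000 = about 158 KB (237 KB with the margin). These figures are, of
course, purely illustrative.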
Remark: The formulas above assume that all the Apply processes will
replicate with the same frequency. If you are in a configuration where one
Apply could run very infrequently (this is the case for mobile replication
environments for example), the effective pruning of the staging tables will be
done according to another parameter that is called retention limit, and the
size of the staging tables will probably be larger.
The increase in log space needed for your replication source tables will
depend on the number of replication sources defined, the row length of the
replication sources, the number of changes to those tables, and the number
of columns updated by the application. As a rule-of-thumb, you can estimate
that the log space needed for the replication source tables, after you have set
the DATA CAPTURE CHANGES attribute, will be three times larger than the
original log space needed for these tables.
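Example: If the changes made to your registered source tables generate about
100 MB of log per day today, plan for roughly 300 MB per day once the DATA
CAPTURE CHANGES attribute has been set (illustrative figures only).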
You also need to consider the increase in log space needed because the
staging tables and the unit-of-work table are DB2 tables and so they are also
logged.
2.4.1.4 Spill Files
The Apply component of DProp uses work files called spill files when it
fetches the data from either the source tables or the staging tables. Spill files
can be large when there are many updates to replicate, or when the initial full
refresh of a target table is performed.
Refer to 5.5.11, “Using Memory Rather Than Disk for the Spill File” on page
126, and to the DPROPR Planning and Design Guide, SG24-4771 for more
complete details about the spill files sizing estimation.
Most of all, the network capacity has a very significant impact on the overall
performance of any replication system. Do not neglect it! The network is often
the bottleneck of the whole system.
These recommendations will help you design and implement the most
efficient architecture with respect to your business and organizational
requirements. Then, the best thing you can do to optimize the CPU, memory
Remarks:
• You can also find useful CPU, memory and network sizing information in
the DPROPR Planning and Design Guide, SG24-4771. Although it was
written for DProp version 1, most of the guidelines it contains remain true
for DProp version 5.
• Testing the performance of your heterogeneous replication system on your
test system will not necessarily be meaningful unless you have:
• A real pre-production environment with characteristics similar to the
production environment
• An automation tool to reproduce the workload of the production
environment on the pre-production environment.
2.5 Summary
In this chapter we focused on the planning topics that you should study
before you really start designing and implementing your heterogeneous
replication system:
• Organize your project as any other IT project. Do not forget to involve
users and application specialists as early as possible.
• Gather the detailed business requirements and determine the list of
targets and the corresponding sources. To help you achieve this task we
provided you with a checklist of the questions you should ask users, and
yourself.
After that you should be able to:
• Draw a business oriented picture of the future replication system
• Write a document describing the future target tables, and the origin of
the data for all the columns
• Estimate the impact of data replication on the IT environment
You are now ready to go to the next phase and design the architecture of your
heterogeneous replication system. See Chapter 3, “System and Replication
Design—Architecture” on page 27.
Chapter 3. System and Replication Design—Architecture
After you have been through the planning phase of your heterogeneous
replication project (see Chapter 2, “Planning” on page 13), and before you
really start implementing the components of the technical solution (see
Chapter 4, “General Implementation Guidelines” on page 61), you should
spend some time thinking about the architectural aspects of your
heterogeneous replication system.
Within this chapter we will provide you with enough information to help you
choose between the different options that are available when you build this
architecture. Figure 5 shows where we are in the sequence of planning,
designing, implementing, and operating heterogeneous replication.
(Figure 5: Approaching Data Replication — current phase: Replication Design, covering the principles of heterogeneous replication, the system design options, and the replication design options; the remaining phases are Replication Planning, General Implementation Guidelines, and Replication Operation & Maintenance.)
The IBM replication solution enables you to replicate data from (nearly) any
relational database to (nearly) any other relational database.
In the case studies (in Part 2 of this book) we will only describe how to set up
data replication between DB2 and non-IBM databases, but you can simply
combine the various examples (non-IBM source, non-IBM target) to create
other scenarios.
There is only one case where you do not need the full functions of DataJoiner
to replicate between DB2 and a non-IBM database: Microsoft Access. See
Chapter 9, “Case Study 4—Sales Force Automation, Insurance” on page 271,
where such a replication scenario is explained in detail.
Let us forget all the other marvellous capabilities of DataJoiner for a while
and go back to our heterogeneous replication topic.
Now you are probably wondering which of these DProp features can also be
used when you add DataJoiner into the picture to propagate between DB2
and a non-IBM database.
The last column of this table indicates where you can find examples, in this
book, to implement these features.
Table 1. Available Replication Features in a Heterogeneous Environment
Update-anywhere + conflict detection + referential integrity constraints support | Only with MS Access | Only with MS Access | Chapter 9, “Case Study 4—Sales Force Automation, Insurance” on page 271
Run SQL statements or stored procedures | Y (*) | Y (*) | Chapter 7, “Case Study 2—Product Data Distribution, Retail” on page 173
Note: The (*) in the table above means ’except with MS Access’.
Now that we have seen what you can do (and what you cannot do) according
to your source database systems and your target database systems, let us
have a look at some of the most common replication environments.
Of course there are possible alternatives, but let us keep it simple for the
moment.
(Figure: replicating from DB2 to a non-IBM target — Capture at the DB2 source, Apply and the DataJoiner global catalog at the middleware server; the arrow indicates the replication direction from source to target.)
Apply will access the target table through a nickname that is defined in the
DataJoiner database.
(Figure: replication from a non-IBM source — insert, update, and delete capture triggers on the source table populate the change data table, Apply reads the changes through nicknames, and a prune trigger maintains the pruning control table. The control tables at the control server include ASN.IBMSNAP_SUBS_SET, ASN.IBMSNAP_SUBS_MEMBR, ASN.IBMSNAP_SUBS_COLS, ASN.IBMSNAP_SUBS_STMTS, ASN.IBMSNAP_SUBS_EVENTS, ASN.IBMSNAP_APPLYTRAIL, ASN.IBMSNAP_CCPPARMS, and ASN.IBMSNAP_UOW; the pruning control, register, and REG_SYNCH tables are accessed through nicknames. The replication direction runs from the non-IBM source to the target table.)
To do this you just have to combine the environments discussed in the two
previous sections. You can simplify the setup since the Apply program is able
to run in both Pull and Push modes. Therefore, you only need to have a
single Apply instance running in the DataJoiner Server, pulling the data from
the AS/400 to Microsoft SQL Server, and pushing the data from Microsoft
SQL Server to the AS/400.
To avoid confusion when you define the replication sources and targets, it is
better to define two DataJoiner databases, one for replicating data in each
direction.
(Figure: the combined configuration — Capture runs at the AS/400 source; Apply pulls data to the SQL Server target, and a second Apply qualifier running in push mode replicates Source Table 2, captured through the CCD table and register in the DJDB2 DataJoiner database, back to Target Table 2 on the AS/400.)
DJRA must be configured in such a way that it can access both the non-IBM
databases and the DB2 databases:
• To access the non-IBM databases, DJRA will connect to the DataJoiner
database and DataJoiner will act as a gateway towards the non-IBM
databases using the defined server mappings and user mappings (see
Figure 9, A).
• To access the DB2 databases:
Depending on the type of DB2 database, DJRA will:
• Connect directly to the DB2 database: This is the case if the DB2
database is DB2 UDB or DB2 Common Server on Intel or RISC
platforms (see Figure 9, C).
• Connect to the DB2 database through DataJoiner: This is the case if
the DB2 database is DB2 for OS/390, DB2 for AS/400, or DB2 for VSE.
No server mapping is necessary, only the Distributed Database
Connection Services (DDCS) function of DataJoiner is used (see
Figure 9, B).
Remark: It is also possible to create server mappings and nicknames for DB2
databases and tables. This DataJoiner feature is used, for example, when a
DB2 table and an Oracle table must be joined together. DB2 access through
nicknames is not recommended when you use DB2 objects for replication
only.
(Figure 9: DJRA connectivity — A: to non-IBM databases through the DataJoiner server; B: to host DB2 databases through DataJoiner’s DDCS function; C: directly to DB2 UDB and DB2 CS V2 databases.)
The two goals are conflicting, and you will have to find a good compromise.
Furthermore, the best solution is not necessarily the same whether you
intend to propagate from a non-IBM database or whether you intend to
propagate to a non-IBM database.
The following two sections provide you with the background information to
help you decide where to place the DataJoiner middleware server(s), and
how many DataJoiner databases you should use. We will consider data
distribution to non-IBM databases and data consolidation from non-IBM
database systems, separately.
(Figure: placement options for the DataJoiner middleware and Apply when distributing data from an IBM replication source to non-IBM targets — Option 1: DataJoiner and Apply at a central server; Option 2: a separate DataJoiner and Apply server close to each non-IBM target.)
Option 2 should only be used if the non-IBM target databases are located on
operating system platforms that DataJoiner does not yet natively support.
Example: To access Oracle on SUN Solaris, use a separate machine (either
AIX or Windows NT) and place this machine in the same LAN as the SUN
Solaris machine.
If you choose to have one DataJoiner instance per non-IBM target, you will
only need one DataJoiner database in each DataJoiner instance.
If you choose to have one central DataJoiner instance, you can choose to
have either:
• Only one DataJoiner database, common for all the non-IBM target
databases.
• Several DataJoiner databases.
The only reason why you would want to create several DataJoiner databases
is if you want the nicknames to be stored separately for security reasons. As
a first approach, just consider that a single DataJoiner database for all the
non-IBM databases is a good solution.
Only in situations where the data flows are small and the replication cycles
are long, do we recommend the use of a central middleware server to reduce
complexity.
(Figure: placement options for the DataJoiner middleware when consolidating data from non-IBM sources into an IBM replication target running Apply — a trade-off between ease of administration and performance.)
Figure 12. Why One DataJoiner Database for Each Non-IBM Source Server? (Each DataJoiner database holds one set of nicknames for the control tables — ASN.IBMSNAP_PRUNCNTL, ASN.IBMSNAP_REGISTER, ASN.IBMSNAP_REG_SYNCH — created at one non-IBM source server, plus the nicknames created for that server’s source tables.)
The Apply program can have its control tables (for example, SUBS_SET,
SUBS_MEMBR) located locally or remotely. The location chosen to hold the
control tables is known as the Control Server. As we already explained, it is
also possible to run Apply in push mode or in pull mode.
This means that you have several possible configurations (see Figure 13):
(Figure 13: the possible Apply configurations — push or pull mode, with the control server either local or remote.)
And since you probably will have several Apply instances, you can even have
combinations of the above configurations. But remember that if you want to
keep your configuration manageable, you had better keep it simple!
In fact, most of these replication design options are directly driven by your
business requirements (refer to Chapter 2, “Planning” on page 13, to see
which tasks you need to go through to assess these business requirements).
You will have to check that these business requirements can really be
achieved, considering the comments and restrictions that are explained here.
Read-only target table types are supported, except for CCDs. DJRA does not
currently allow the creation of a CCD table in a non-IBM target database, but
this restriction will probably soon be removed, and there is a workaround
anyway (see Chapter 8, “Case Study 3—Feeding a Data Warehouse” on page
203, for more details about this workaround).
In fact, constraints are only needed when there are application updates. It is
useless to define constraints on read-only targets, because the updates made
against the source tables already satisfied the RI constraints defined at the
source, and Apply will logically maintain these original RI constraints on the
target.
For better performance, DProp Apply assumes some freedom with respect to
read-only copies. Updates to read-only copies within a subscription set cycle
will be performed one member at a time, with all updates related to one
subscription member issued before updating the target associated with the
next member. Still, all members within a subscription set are replicated within
one unit-of-work. By taking a global upper transaction boundary (global
SYNCHPOINT) for the complete set into account, all the target tables are
brought to the same consistent point in time.
You can also indicate that you want the subscription set to be processed
continuously, meaning that just after Apply has finished processing the
subscription set, it will process it again. This does not mean, of course, that
you have transformed DProp into a synchronous replication system. There is
still a delay between the time the update is done at the source and the time
the update is applied to the target. But this delay will be short.
Remark: The timing information is defined at a subscription set level. So, all
the members in a set will be processed with the same frequency.
Now you must be aware that if you have many tables to propagate, and you
choose a very short interval (1 minute, for example), Apply will do its best to
meet your requirement, but the actual interval will probably be longer than
what you indicated. This depends mainly on the available system resources
(for example, CPU power, and network capacity).
There is an optional third column that you can supply when you add a row into
the events table; refer to the DProp documentation for more details about its use.
Several subscription sets can share the same event name. This means that if
you wish to trigger several subscription sets together from the same event,
you only need to indicate the same event name in the subscription sets
definition. But if you intend to do this, perhaps you should consider grouping
the members of these subscription sets into a single set instead.
Remark: For a subscription set, you can indicate both a replication interval
and an event name, but in general we recommend not mixing these two
processing modes. Remember: Try to keep things simple!
Example: If you want to propagate only once a day (for example, in the
evening at 8 pm) you have several possibilities:
• Use relative timing, with a 1-day frequency.
• You can even indicate a smaller interval (15 minutes, for example) so that
you can start additional replications during the day if needed. You can
either stop Apply once all the subscriptions sets have been processed at
least once, or deactivate all subscription sets processed by this Apply
instance by updating the control tables with the following statement:
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=0
WHERE APPLY_QUAL='<apply qualifier>'
• Use events: Insert as many events as you wish in the
ASN.IBMSNAP_SUBS_EVENT table, one for each day that you want the
subscription sets to be processed (see the example after this list).
• Or use the advanced event based scheduling technique that is described
in the next section.
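For illustration, posting the 8 p.m. event for one day might look like this (the
event name is hypothetical; EVENT_NAME and EVENT_TIME are the standard
columns of the events table):

INSERT INTO ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME)
VALUES ('ENDOFDAY', '1999-06-30-20.00.00')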
First, create your subscription sets with the event name ’WEEKDAY’ (or any
other name you like).
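Then, instead of using a real events table, define a view with the same name
that computes the event on the fly. A minimal sketch of such a view, assuming
the standard EVENT_NAME and EVENT_TIME columns (the view takes the place of
the real events table at this control server), could look like this:

CREATE VIEW ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME) AS
SELECT 'WEEKDAY', TIMESTAMP(CURRENT DATE, '00:00:00')
FROM SYSIBM.SYSDUMMY1
WHERE DAYOFWEEK(CURRENT DATE) BETWEEN 2 AND 6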
The view above will generate a transparent event (in fact, no event will
actually be inserted in any real table; the view will just generate a temporary
event for Apply, when Apply accesses what it thinks is the
ASN.IBMSNAP_SUBS_EVENT table). The event will be visible each day of
the week from Monday through Friday, at midnight. On Saturdays and
Sundays nothing will happen (of course, you can change this: you simply
need to change the BETWEEN 2 AND 6 clause). When Apply runs on
Monday, for example, it will access the view and believe that there is an event
for that day at midnight, and so it will process the subscription set. The next
time Apply evaluates the view, the view will generate the same transparent
event, but since the LASTSUCCESS column in the SUBS_SET table
indicates that the subscription set has already been processed that day, it will
of course not process the subscription set again.
The blocking factor limits the number of rows that will be propagated by
Apply. It determines a maximum number of minutes of changes that Apply
can process when it reads the change data tables. If the rows that are present
in the change data tables (and that have not yet been processed) represent a
number of minutes of changes that is above the blocking factor value, Apply
will automatically split the fetched answer set into smaller pieces, and it will
process the subscription set as several mini-subscriptions, in several
mini-cycles.
This is an important feature. If you have defined a blocking factor value, then
when Apply encounters an environment problem (logs full in the target
database, for example), Apply will automatically try to split the answer set so
that the subscription is reprocessed in several mini-subscriptions.
Planning ahead for trigger based change capture, therefore, has to take the
following considerations into account:
• Source application transactions will slow down, because the transaction’s
workload is actually doubled.
• Source applications will need more log space, because writing change
data into the change data tables will happen within the same commit
scope as changing the application tables.
• If a Capture trigger cannot insert a change record into the CCD table, for
example because there is no more space for the new record, then the
application’s transaction will fail as well.
We created two test jobs, both containing 27,340 SQL INSERT statements,
grouped into 100 INSERTs per transaction (one COMMIT after every 100
INSERTs). The TIMESTAMP column was populated in both cases using an
SQL expression comparable to DB2’s current timestamp.
The test jobs basically contained the same statements. The only differences
were the syntactical representation of the CURRENT TIMESTAMP
expression (Informix: CURRENT YEAR TO FRACTION (5), Oracle:
CURRENT DATE) and the method used to execute the SQL script:
• Informix: The Informix client program dbaccess was used to execute the
SQL script. The following syntax shows the invocation of the test script. All
output was redirected to /dev/null to prevent any slowdown by
unnecessary screen output.
dbaccess ifxdb1 insertsifx.sql > /dev/null
• Oracle: The Oracle client program SQL*Plus was used to execute the SQL
script. The following syntax shows the invocation of the test script. All
output was redirected to /dev/null to prevent any slowdown by
unnecessary screen output.
sqlplus user1/pwd1@oradb1 @insertora.sql > /dev/null
To measure the impact of the synchronous triggers, the insert job was
executed before the tables were registered as replication sources (that is,
before the capture triggers were created) and again after the tables were
registered as replication sources.
First test setup: The test script was executed without any triggers defined.
Second test setup: The second test run only applied to Informix. The reason
is that the Informix capture triggers generated by DJRA require the Informix
system variable USEOSTIME to be set to 1. We wanted to find out whether
this setting had any negative impact on Informix performance.
Third test setup: The test script was executed with capture triggers defined.
When interpreting the test results, please take into account that the
batch-style insert script we used represents an extreme workload of
INSERTs.
Figure 14 graphically represents the test results. The y-axis (vertical) displays
the INSERT performance ratio comparing the different test setups,
considering the performance without any triggers and without any replication
based changes to the system settings as 100%.
Please note that the ratio displayed in the graph does not compare the
absolute INSERT performance observed comparing Informix and Oracle.
(Figure 14: INSERT performance ratio for Informix Dynamic Server and Oracle 8 across the different test setups; y-axis: performance ratio in percent.)
What you gain is, of course, a change capture mechanism that enables
out-of-the-box change replication for non-IBM database systems without
having to change any application logic and without having to copy complete
database tables when synchronizing source and target (which other vendors
call snapshot replication).
We also provided you with a list of the replication features or techniques that
you can use in a heterogeneous replication system, and some references to
examples in this book that show how you can implement these techniques.
Then we discussed the different options that are available when you build the
system’s architecture. We divided these options into two separate categories:
• The system design options, which essentially deal with the placement of
the DataJoiner middleware, the placement of the control tables and the
Apply program.
• The replication design options, which essentially deal with the types of
target tables and the replication timing.
We also discussed the impact that the new replication system will have on
your current production database(s) and illustrated this with some examples.
Now, you have to correlate the information provided in this chapter with the
preliminary information (that is, list of data sources, list of targets, volumes,
for example) that you gathered during the planning phase of your project, and
build a picture of your future heterogeneous replication system.
When you draw this picture, you must precisely indicate how many
DataJoiner servers you will use, how many DataJoiner databases you will
create in each DataJoiner server, where the Apply control tables will be
located, and where the Apply programs will run.
You must also indicate on the picture the types of target tables you will use,
and the timing options that will be used. You should also indicate the naming
conventions that you will use for:
• The database names, including the DataJoiner databases
• The Apply qualifier names
• The subscription set names
• The userids that will be used to access the non-IBM databases
• The owner (high-level qualifier) of the target tables and of the nicknames
4.1 Overview
Building on recommendations given in Chapter 3, the following decisions are
made before implementing the solution:
• Replication source server platform(s), either DB2 or non-IBM
• Replication target server platform(s), either DB2 or non-IBM
• DataJoiner platform(s) and DataJoiner placement
• Placement of DProp Apply, either push or pull configuration
• Control table location, either centralized or decentralized
(Figure: Approaching Data Replication — current phase: General Implementation Guidelines; the other phases are Replication Planning, Replication Design, and Replication Operation & Maintenance.)
Following the work breakdown approach that guides us through the complete
book, we start to implement the replication solution after planning the project
and after deciding about the overall replication design. Going into the details,
the implementation of the replication solution has to deal with five major
activities:
1. Set up the Database Middleware Server
2. Implement the Replication Subcomponents
3. Set up the Replication Administration Workstation
4. Create the Replication Control Tables
5. Bind DProp Capture and DProp Apply
After all steps named in the general implementation checklist are successfully
completed, you are ready to define replication source tables and replication
subscriptions. Once these are defined, you can start the replication
subsystems DProp Capture and DProp Apply.
Check Appendix B, “Non-IBM Database Stuff” on page 325 for many useful
details about non-IBM client software, including hints on how to set up
non-IBM database clients and how to natively test connectivity.
The creation of the data access modules as well as the update of the
DataJoiner instances has to be executed as root user. To create the data
access modules for the remote databases, use the following guidelines:
1. Log on as root.
2. Set the remote client’s environment variables accordingly (for example,
set the SYBASE variable when link-editing a dblib or ctlib data access
module).
3. Change to the /usr/lpp/djx*/lib directory (the actual DataJoiner path
depends on the DataJoiner version you are using, for example djx_02_01).
4. Execute the shell script djxlink.sh, to automatically create the necessary
data access modules.
5. If the execution of djxlink.sh is not successful, build the data access
modules by:
1. Editing djxlink.makefile
2. Creating the access modules you need by executing
make -f djxlink.makefile <youraccessmodule>
UNIX platforms: Some setup tasks have to be completed before the instance
can be successfully created:
1. Create a DataJoiner instance owner group.
2. Create a DataJoiner instance owner.
3. Change to the /usr/lpp/djx*/instance directory (the actual DataJoiner
path depends on the DataJoiner version you are using, for example
djx_02_01_01).
4. Create the instance using the db2icrt <instance owner> command.
Use the following SQL statement to query all successfully defined server
mappings within a DataJoiner database:
SELECT SERVER, NODE, DBNAME, SERVER_TYPE,
SERVER_VERSION, SERVER_PROTOCOL
FROM SYSCAT.SERVERS;
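For reference, creating a server mapping for, say, an Oracle source could look
roughly like the following sketch (all names and option values — server, node,
type, version, protocol — are hypothetical; check the DataJoiner Planning,
Installation and Configuration Guide for the values appropriate to your data
source and client):

create server mapping from oraserv
to node "oranode" type oracle
version 8.0 protocol "net8";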
Example:
create server option TWO_PHASE_COMMIT
for server <hetero_server> setting 'N';
Recommendation (1): Create at least one user mapping before creating the
replication control tables for a non-IBM replication source. The DataJoiner
Replication Administration program determines the remote schema for
control tables that are created within the remote data source from the
REMOTE_AUTHID defined for the DataJoiner user that DJRA uses to connect to
the DataJoiner database.
Recommendation (2): Define user mappings for all the schemas that you
are planning to use as table qualifiers for non-IBM replication target tables.
Use the following SQL statement to query all successfully defined user
mappings within a DataJoiner database:
SELECT AUTHID, SERVER, REMOTE_AUTHID
FROM SYSCAT.REMOTEUSERS;
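Likewise, a user mapping could be created with a statement like this sketch
(all identifiers are hypothetical):

create user mapping from djuser
to server oraserv
authid "scott" password "tiger";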
Non-IBM Source: The change capture activity will be achieved using OEM
triggers. There is no need to install additional software.
IBM Source: Install IBM DProp Capture (On UNIX and Intel platforms,
Capture is already pre-installed when setting up DB2 UDB). If the replication
source servers are DB2 UDB databases on Intel or UNIX platforms, change
the LOGRETAIN parameter of the source server’s database configuration to
YES.
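For example, the parameter can be changed from the DB2 command line like
this (a sketch; the accepted values vary slightly between DB2 versions, some
of which use RECOVERY instead of ON):

db2 update database configuration for <source_db> using LOGRETAIN ON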
Non-IBM Target: IBM DProp Apply will be used to replicate data to the
non-IBM target databases. As an integrated component of DataJoiner for
Windows NT, Apply is already pre-installed when setting up DataJoiner. On
UNIX platforms, make sure you install DataJoiner’s Apply component when
installing the DataJoiner software.
IBM Target: Install IBM DProp Apply (On UNIX and Intel platforms, Apply is
already pre-installed when setting up DB2 UDB).
We will now name and describe all the setup tasks that are necessary to
configure the administration workstation. You can then start to define
replication source tables and replication subscriptions from this replication
administration workstation.
The main task here is to create a password file that will contain userids and
passwords for all DB2/DataJoiner replication sources and targets. DJRA will
use the password file when connecting to replication source and target
databases. Use DJRA’s Preference menu (Option: Connectivity) to populate
the password file. The password file will be stored in the DJRA working
directory.
Please note that you have to restart DJRA after cataloging additional
databases into the administration workstation’s database directory. DJRA will
pick up all databases from the database directory at startup time.
Customizing the user exits provided is a useful option, especially when the
database objects generated during replication setup have to fulfill strict
naming conventions, but it is not a requirement. The standard user exits
named above include several examples that explain how to modify the
defaults.
Therefore, the first action after installing DJRA is to create the replication
control tables at all the replication source servers and all the replication
control servers.
Use the following SQL statement to query all successfully defined nicknames
within a DataJoiner database:
SELECT TABSCHEMA, TABNAME, REMOTE_TABSCHEMA, REMOTE_TABNAME,
REMOTE_SERVER
FROM SYSCAT.TABLES
WHERE REMOTE_SERVER IS NOT NULL;
On the OS/390 platform, for example, considering that source server, control
server and target server are not identical, all Apply packages have to be
bound against all locations that Apply will connect to during replication:
BIND PACKAGE(<location>.<collection-id>.<packagename>)
Finally, the bind job has to include a BIND PLAN statement, including all
different locations Apply is bound against. (Note that the following example is
applicable to DB2 for OS/390 V5 only. Examples referring to other DB2
releases are included within the product documentation.)
BIND PLAN(ASNAP510) PKLIST (loc1.collection-id.*,loc2.collection-id.*, ...)
Later on, if you are adding a new location to the replication scenario, just bind
Apply’s packages to the new location and rebind the plan after adding the
new location to the PKLIST.
Performance Advice:
• Do not change the recommended isolation levels provided in the DProp
documentation or in the sample bind jobs.
• Refer to the DB2 Replication Guide and Reference, S95H-0999 for the
syntax of the bind command appropriate to your platform.
• Make use of the BLOCKING ALL bind parameter on UNIX and Intel
platforms. This will enable Apply to use block fetch when fetching change
data from the source systems (see the sketch after this list).
• Be aware that the default for the CURRENTDATA bind option valid for
DB2 for OS/390 has changed from DB2 version 4 to version 5. With DB2
for OS/390 version 5, CURRENTDATA(YES) was introduced as the
default bind option (until DB2 version 4, CURRENTDATA(NO) was the
default). To enable block fetch for DB2 for OS/390, add the
CURRENTDATA(NO) bind parameter to Apply’s bind job, if it is not already
present.
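As an illustration for the UNIX and Intel platforms, the Apply packages might
be bound from the DB2 command line roughly as follows (a sketch; the bind
list file names and the appropriate isolation levels are documented in the DB2
Replication Guide and Reference):

db2 connect to <source_or_target_db>
db2 bind @applyur.lst isolation ur blocking all
db2 bind @applycs.lst isolation cs blocking all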
Refer to the DB2 Replication Guide and Reference, S95H-0999 for platform
specific issues.
Refer to the case studies detailed in the second part of the book to see how
the checklist can be practically used during the implementation phase of a
replication project. If you want to learn more about replicating from non-IBM
source databases, refer to “Case Study 1—Point of Sale Data Consolidation,
Retail” on page 139 (Informix replication sources). To get a deeper insight
into replication examples replicating to multi-vendor databases, have a look
at “Case Study 2—Product Data Distribution, Retail” on page 173 (Microsoft
SQL Server) or “Case Study 3—Feeding a Data Warehouse” on page 203
(Oracle).
Although we will not go into too much detail, we want to use the remaining
sections of this chapter to give you an overview of the next setup steps. We
will be dealing separately with:
• Implementing the Replication Design for Multi-Vendor Target Servers
• Implementing the Replication Design for Multi-Vendor Source Servers
For all further details about using DJRA, please refer to the DB2 Replication
Guide and Reference, S95H-0999 and to the DataJoiner Planning,
Installation and Configuration Guide, SC26-9150 (Starting and Using DJRA).
(Figure: defining a non-IBM target with DJRA — DJRA creates the target table in the multi-vendor database and a target nickname in the DataJoiner database; adding a member to a subscription set inserts rows into SUBS_MEMBR and SUBS_COLS.)
Type fixups can be necessary for DATE, TIME, and TIMESTAMP columns,
for example.
It is a good idea to let DJRA create a target table (including possible data
type fixups), even when the non-IBM target table already exists (use a
different table name, for example). Compare the created data types, including
any fixups, with the data types of the already existing table.
Advice: Alternatively, you can use the transparent DDL feature of DataJoiner
V2.1.1 to direct a CREATE TABLE statement to the non-IBM database. Using
this method, no datatype fixups are necessary. For more information refer to
the DataJoiner SQL and Application Programming Reference Supplement,
SC26-9148.
Figure 17 shows the replication control information and database objects that
are created when defining a non-IBM database table as a replication source:
(Figure 17: defining a non-IBM table as a replication source — DJRA creates the insert/update/delete capture triggers on the source table, creates the CCD table and its nickname, drops and re-creates the prune trigger on the pruning control table, and inserts rows into the register table; the source, CCD, pruning control, register, and REG_SYNCH tables are all accessed through nicknames in the DataJoiner database.)
Remarks:
• Notice that the REGISTER, the PRUNCNTL, and the REG_SYNCH table
are already present in the non-IBM database, and that there is a nickname
for each of those tables in the DataJoiner database. These tables and the
corresponding nicknames are created when you create the Control Tables.
• Some database systems, such as Informix Dynamic Server Version 7 or
Microsoft SQL Server (without setting sp_dbcmptlevel to 70), support only
one trigger per SQL operation on a database table. This means that, for
one source table, you can create only:
• One trigger for INSERT
• One trigger for UPDATE
• One trigger for DELETE
Some of those systems do not even issue a warning (Informix, to its
credit, does) when you create a new trigger (say, for INSERT) on a table
that already has a trigger defined for this SQL operation. Therefore, the
database administrator must be careful not to overwrite existing triggers.
In this case, all new trigger logic required for change replication has to
be integrated manually into the existing triggers.
As you can imagine, DataJoiner Replication Administration will not
compensate for missing database functionality in those non-IBM database
systems. But DJRA is smart enough to help the database administrator by
issuing a WARNING whenever non-IBM native triggers are created or
removed. DJRA does this for all supported non-IBM databases, regardless
of whether multiple triggers per SQL action are supported or not.
You will have to decide whether the Capture triggers can be created as
generated, or whether the generated trigger logic has to be integrated into
an existing trigger when a non-IBM table is defined as a replication
source. The same is also true when you remove the definition of a
replication source. Either remove the triggers or adapt the existing ones.
After all setup tasks named in the general implementation checklist are
completed and after the replication design has been defined and tested, your
replication system is ready to use. The next steps would be to carry over all
system components and all tested replication definitions to your production
system environments.
Before carrying the tested replication system over into your production
environment, we will spend some time discussing operational tasks of a
heterogeneous replication system.
5.1 Overview
But before we start, we want you to reposition yourself again by having a look
at the introductory diagram shown in Figure 18:
(Figure 18: Approaching Data Replication — current phase: Operation, Monitoring & Tuning; the other phases are Replication Planning, Replication Design, and General Implementation Guidelines.)
Following the work breakdown approach that guides us through the complete
book, we will now discuss operational issues. In the previous chapter we
already gained first experiences on a multi-vendor replication system while
designing and implementing a first solution. That means, we can already
assume some expertise on working with distributed replication systems.
This chapter contains a lot of detailed information. It is natural that you will
not follow every thought while browsing through the different parts of this
chapter for the first time. But the more time you spend on operating and
tuning your replication system, the more valuable this detail will become.
AS/400 Remark: Capture can be started at each IPL. The best way to do this
is to include the start of the QZDNDPR sub-system and of Capture in the
QSTRUP program.
When Capture stops, it writes the log sequence number of the last
successfully captured DB2 log record (or AS/400 journals) into one of the
DProp control tables (ASN.IBMSNAP_WARM_START), so that it can easily
determine a safe restart point. Even in those cases where it was impossible
for Capture to write the WARM start information when shutting down (for
example, after a DB2 shutdown, using MODE FORCE), Capture is able to
determine a safe restart by evaluating SYNCHPOINT values stored in other
replication control tables, such as the register table. (The only assumption is:
Capture has been successfully capturing changes before the hard shutdown.)
Remark: Apply always inserts the data that it fetched from the replication
source server within a single transaction into the target tables, to guarantee
target site transaction consistency at subscription set level. This also applies
to the full refresh.
To let you control the replication initialization, DProp offers a lot of freedom
and flexibility in how the initial full refresh task is performed.
Before skipping over the following paragraphs, let us just recall that an
automatically maintained initial refresh consists of the following two steps:
1. Handshake between Capture and Apply
2. Moving data from the source tables to the target tables
OK, now it is safe to go ahead and jump to 5.2.2.3, “Manual Refresh / Off-line
Load” on page 89!
As already mentioned, moving data is not the only activity during the full
refresh. Even more interesting is how Capture and Apply perform the initial
handshake, because this handshake has to be replayed when initializing the
replication target tables manually.
Before fetching data from the replication source tables, Apply lets Capture
know that it is starting to perform the initial refresh: Apply sets the
SYNCHPOINT for the affected subscription members to hexadecimal zero.
Upon seeing a DB2 log record indicating that the SYNCHPOINT column has
been updated to hexadecimal zero for a subscription member, Capture
immediately translates the hex zero synchpoint into the actual log sequence
number of the log record read. The log sequence number value is retrieved
from the header of the log record that contains the update to
x’00000000000000000000’. See step (4) in Figure 19.
[Figure 19. The initial handshake: Apply (or user) updates the synchpoint to x'0000...0000' (1); the update is written to the DB2 log (2) and read by Capture (3); Capture translates the zero synchpoint into the actual log sequence number (4).]
Apply will take the translated synchpoints into account when it performs the
next replication cycle for that subscription set. The translated synchpoint
tells Apply exactly when it initiated the initial refresh. Apply now knows that
all CD table records with a higher log sequence number (LSN) are awaiting
replication, and that all updates with a lower log sequence number have
already been included within the initial refresh.
Basically, if you decide to perform the initial load of your replication targets
yourself, your responsibilities will be:
• To guarantee that replication source and replication target are
synchronized (by loading the target tables), and
• To let DProp know about it, by updating the replication control tables as
explained in 5.2.2.2, “Initial Refresh - A Look Behind the Curtain” on
page 86.
The necessity for reorganizing the change data tables and the unit-of-work
table (ASN.IBMSNAP_UOW), of course, depends on the update rates against the
replication source tables. As a rule of thumb, reorganize the change data
tables and the unit-of-work table about once a week.
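On DB2 UDB platforms, a minimal sketch using the command line processor (the server name and the change data table name are placeholders):

db2 connect to <source_server>
db2 reorg table ASN.IBMSNAP_UOW
db2 reorg table <cd_owner>.<cd_table>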
On AS/400: Run the RGZPFM command on all the change data tables and
on the unit-of-work table, once a week.
5.3.2 Pruning
DProp terminology uses the term pruning for the process of removing records
from the change data tables that have already been replicated to all targets.
Capture performs the pruning for the change data tables, the unit-of-work
table, and the Capture trace table. Manual pruning has to be established for
CCD tables: CCD tables (consistent change data tables) are maintained by
Apply and are not automatically pruned by Capture.
For some types of CCD tables, by replication design, pruning is not required:
• Complete condensed CCD tables are updated in place, so that they do not
grow without bound. The only records that could be removed from these
CCD tables are those with IBMSNAP_OPERATION equal to ’D’ (Delete)
that have already been propagated to the dependent targets.
On the other hand, CCD pruning is an issue for internal CCD tables. This type
of table will grow if there is large update activity, and it could reach the size of
a complete CCD table. Yet, there is no value in letting this table grow, as only
the most recent changes will be fetched from it.
To enable pruning for internal CCD tables, you might want to add an
SQL-After statement to the internal CCD table's subscription to prune change
data that has already been applied to all dependent targets. Instead of letting
Apply launch the pruning statement via SQL-After processing, you could also
run the pruning statement from any other automatic scheduling facility.
A crude, but effective statement for internal CCD table pruning would be:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
(SELECT MIN(SYNCHPOINT)
FROM ASN.IBMSNAP_PRUNCNTL);
This will prune behind the slowest of all the subscriptions, not just those
subscriptions which refer to the source table associated with the internal
CCD. You might want to improve the pruning precision by adding the
replication source table to the subselect:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
(SELECT MIN(SYNCHPOINT)
FROM ASN.IBMSNAP_PRUNCNTL
WHERE PHYS_CHG_OWNER = '<phys_chg_owner>'
AND PHYS_CHG_TABLE = '<phys_chg_table>');
To find out all internal CCD tables together with their source and change data
tables that are defined within your replication system, run the following query
at the source server:
SELECT SOURCE_OWNER, SOURCE_TABLE,
PHYS_CHG_OWNER, PHYS_CHG_TABLE,
CCD_OWNER, CCD_TABLE
FROM ASN.IBMSNAP_REGISTER
WHERE CCD_OWNER IS NOT NULL;
Apply writes to the ASN.IBMSNAP_APPLYTRAIL table, but never reads from it
again. To keep this table from growing too large, its rows need to be deleted
from time to time; when to delete them is entirely up to you.
If you are one of those more sophisticated types of DBAs, your SQL
statement could look like the following example instead:
DELETE FROM ASN.IBMSNAP_APPLYTRAIL
WHERE
( STATUS = 0
AND EFFECTIVE_MEMBERS = 0
AND LASTRUN < (CURRENT TIMESTAMP - 1 DAYS))
OR
( STATUS = 0
AND EFFECTIVE_MEMBERS > 0
AND LASTRUN < (CURRENT TIMESTAMP - 7 DAYS))
OR
(
LASTRUN < (CURRENT TIMESTAMP - 14 DAYS))
;
The statement shown above will prune the Apply trail table in stages:
• All Apply status messages, reporting that nothing was replicated
(EFFECTIVE_MEMBERS = 0), and also that no error occurred during replication
(STATUS = 0), will be removed first (after 1 day).
• All Apply status messages that report some replication action will stay in
the table a little bit longer (for example, 7 days). We detect that data was
actually replicated within one subscription cycle by specifying
EFFECTIVE_MEMBERS > 0.
• All other messages, possibly those reporting replication problems, will stay
longer. We can prevent error messages from being pruned earlier by
restricting the first two predicates to STATUS = 0.
Feel free to adjust the time periods that the statistics records stay in your
Apply trail table. For example, if you are replicating continuously, you will
probably want to shorten these retention periods.
Remark: You can even occasionally delete everything from the Apply trail
table. However, you had better not do that for one of the other replication
control tables. So be careful when typing in the SQL statement!
If you are running OS/400 V4R2 or a later version, a system exit prevents you
from removing receivers that are still needed by the Capture program. We
recommend that you specify MNGRCV(*SYSTEM) when you create the
journals, and that you specify a threshold when you create the journal
receivers.
If you are running OS/400 V4R1, you must use the ANZDPRJRN command to
safely remove the receivers that are no longer needed, and we recommend
that you create the journals with MNGRCV(*USER).
To see how to issue a re-synch request for your replication target tables, refer
to 5.6.3, “Full Refresh on Demand” on page 132.
5.3.3.2 Recovery
As with load processing, your RECOVER procedures should consider the
effect on the consistency of copies derived from source tables that needed a
RECOVER. You may want to expand your procedures to perform a
coordinated recovery of a source table and all its copies; you may want to
drive the replication software to re-initialize those subscription sets which
refer to source tables for which there was a recent RECOVER operation; or
you may decide to tolerate any data consistency errors resulting from your
RECOVER procedures.
These tools do not actually alter a table, but rather unload, drop, re-create,
and load a new table in place of your existing table. Keep in mind that DB2
logs updates to a table based on internal identifiers, not on the names of the
tables. From Capture's perspective (and DB2's), it is merely coincidental that
this new table has a name matching the name of your old table. If you wish to
continue using such tools, you will need to carefully coordinate the
pseudo-alter processing with your replication definitions.
If you are using tablespace compression, it is very important that you specify
the KEEPDICTIONARY REORG utility option, which is not the default.
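For example, a REORG utility control statement keeping the dictionary might look like this (database and tablespace names are placeholders):

REORG TABLESPACE <database>.<tablespace> KEEPDICTIONARY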
DB2 for OS/390 can keep at most one compression dictionary in memory per
tablespace. Once a new compression dictionary is generated, such as during
a REORG, it replaces the previous dictionary. The DB2 log interface cannot
return log records written using an old compression dictionary.
If Capture has already processed all log records written prior to the REORG,
there is no problem.
If Capture requests a range of log records written before the REORG, and at
least one of the log records within the requested range was written using a
compression dictionary that has changed as a result of a REORG, then DB2
will not return the requested range of log records through the log interface.
Better yet, learn to live with the compression dictionaries you now have,
resisting the urge to rebuild them.
This section will give you an overview of replication monitoring issues and
techniques. Before we go into details and focus on the several separate
components of a distributed replication system, we will name all the
components that will be subject to monitoring.
Most of the replication statistics are available within the replication control
tables. We will use the following sections to introduce examples of how to
make use of the information within the replication control tables to fulfill
replication monitoring tasks.
Some of the control tables can be used to determine the status of the change
capture process, others are available to get an overview of the subscription
status, the subscription latency, or to evaluate statistics about the data
volume replicated within the most recent subscription cycles.
In the following sections, we will provide you with queries against the DProp
control tables, which extract useful monitoring information, and with
techniques to work around replication problems.
Trace
Finally, start Capture or Apply in trace mode, if the problem that is causing an
error is not obvious:
• Capture’s start option to enable trace mode is TRACE.
Remark: This option is not available on AS/400, because the Capture/400
program already writes a lot of information into the ASN.IBMSNAP_TRACE table.
• Apply’s start option to enable trace mode is TRCFLOW.
The generated traces contain a large amount of text output, including dumps
and SQLCA information, that can be used to determine the cause of the
problem.
Advice: Only start Capture and Apply in trace mode if you are investigating
problems. The trace mode obviously slows the replication processes down
and also generates a lot of output.
Additionally, Capture reads through the DB2 log sequentially. A problem with
one replication source table can therefore delay change capture for the
complete replication system.
Considering this, the main monitoring tasks regarding the Capture program
will fall into the following categories:
• See if Capture is up and running
• Detect and solve Capture problems as soon as possible
Every time Capture commits, it updates the global record with the log
sequence number (SYNCHPOINT) and the timestamp associated with the
last processed log sequence number (SYNCHTIME).
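A query such as the following computes the Capture lag in seconds (a sketch, assuming the DAYS and MIDNIGHT_SECONDS scalar functions are available at your source server):

SELECT (DAYS(CURRENT TIMESTAMP) - DAYS(SYNCHTIME)) * 86400
     + (MIDNIGHT_SECONDS(CURRENT TIMESTAMP) - MIDNIGHT_SECONDS(SYNCHTIME))
       AS CAPTURE_LAG_SECONDS
FROM ASN.IBMSNAP_REGISTER
WHERE GLOBAL_RECORD = 'Y';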
According to this query, the Capture lag is displayed in seconds. To see the
actual timestamp of the log record most recently processed by Capture, just
select the global record from the register table:
SELECT SYNCHPOINT, SYNCHTIME
FROM ASN.IBMSNAP_REGISTER
WHERE GLOBAL_RECORD=’Y’;
So what does Capture do if the DB2 log interface cannot deliver the log
records requested by Capture? Right, Capture stops and issues an error
message. In DProp terminology, we call this status a gap (that is, some piece
of the DB2 log is missing).
In regard to unavailable log records, consider that DB2 itself might be in
trouble if certain log records are no longer available. Also consider that,
because Capture is able to process archived log records, a Capture gap
caused by unavailable log records should never happen. But to be prepared,
we nevertheless want to go into more detail.
Remark: On AS/400 the way to start Capture in COLD mode is to indicate the
RESTART(*NO) parameter in the STRDPRCAP command.
If you started Capture using the WARMNS start option (which means WARM
start or no start), Capture will terminate when a requested log record cannot
be provided by the DB2 log interface. Capture will issue an error message
into the Capture trace table (ASN.IBMSNAP_TRACE), and Capture will write at
least one WARM start record into the WARM start table
(ASN.IBMSNAP_WARM_START). The WARM start table could look like the following
example:
SEQ                       AUTHTKN   AUTHID    CAPTURED   UOWTIME
-----------------------   -------   -------   --------   -----------
x'00000000485E57D60000'                                  0
x'00000000485E82F60000'   APPLY01   DB2RES5   N          -1307724304
x'00000000485107C80000'   APPLY01   DB2RES5   N          -1307736417
x'00000000480135D20000'   APPLY01   DB2RES5   N          0
If a restart attempt fails again with the same error message, you need to
provide Capture with a different restart point (a different WARM start log
sequence).
OS/390 Remark: A valid log sequence number can be derived by first using
the DSNJU004 utility to find an active log range, and then by running the
DSN1LOGP utility with this given log range (or a smaller subset) as an input.
The DSN1LOGP utility will show the actual log record numbers within the
given range. Choose a BEGIN UR or COMMIT log record; avoid UNDO or
REDO records.
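A minimal sketch of such a DSN1LOGP job (data set names and RBA values are placeholders):

//LOGPRINT EXEC PGM=DSN1LOGP
//STEPLIB  DD DSN=<db2.sdsnload>,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSSUMRY DD SYSOUT=*
//BSDS     DD DSN=<bsds.dataset>,DISP=SHR
//SYSIN    DD *
  RBASTART(<start_rba>)  RBAEND(<end_rba>)  SUMMARY(YES)
/*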
Depending on the execution platform you are using, the facilities available to
check if a program is running will be different. The following example shows
how to determine if Apply is running for UNIX operating systems:
#!/bin/ksh
ps -ef | grep 'asnapply' | grep -v grep | wc -l
If you are using several Apply processes on the same machine, all using a
different Apply qualifier or a different control server, the command to check if
Apply is running could even distinguish between several processes:
#!/bin/ksh
ps -ef | grep 'asnapply <apply_qual> <cntl_server>' | grep -v grep | wc -l
If you are running Apply on an AS/400, you can check whether it is running by
issuing the WRKSBS command and choosing option 8 in front of the QZSNDPR
sub-system; you should see a job having the name of the Apply Qualifier.
Table 2 displays all possible states of a subscription set, taking the ACTIVATE
column and the STATUS column of the subscription set table into account.
Table 2. Determining the Status of Subscription Sets
ACTIVATE  STATUS  Meaning
0         0       The subscription set has never been run (initial setting
                  after defining an empty set), or the subscription set has
                  been manually deactivated.
Keep in mind that no data is replicated to any of the replication target tables
of a set whenever a subscription set fails. If a subscription error occurs after
changes were already inserted into some of the target tables, those changes
will be rolled back immediately.
The timestamp columns that we will use for this comparison are all available
from the subscription set table, ASN.IBMSNAP_SUBS_SET(see Table 3).
Table 3. Timestamp Information Available from the Subscription Set Table
Remark: Please notice that you are comparing a control server timestamp
(current timestamp) with a source server timestamp (SYNCHTIME). This
query could cause unexpected results, if the control server and the source
server are placed within different time zones, for example.
Use the following query example to select data from the Apply trail table:
SELECT APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
STATUS, LASTRUN, LASTSUCCESS, SYNCHTIME,
MASS_DELETE, EFFECTIVE_MEMBERS,
SET_INSERTED, SET_DELETED, SET_UPDATED, SET_REWORKED, SET_REJECTED_TRXS,
SQLCODE, SUBSTR(APPERRM, 1, 8) AS ASNMSG, APPERRM
FROM ASN.IBMSNAP_APPLYTRAIL;
Modify the statement to determine the most recent Apply trail record for a
subscription set which was not successful:
SELECT SQLCODE, APPERRM FROM ASN.IBMSNAP_APPLYTRAIL
WHERE APPLY_QUAL = ’<apply_qual>’
AND SET_NAME = ’<set_name>’
AND WHOS_ON_FIRST = ’<whos_on_first>’
AND STATUS = -1
AND LASTRUN = (SELECT LASTRUN
FROM ASN.IBMSNAP_SUBS_SET
WHERE APPLY_QUAL = ’<apply_qual>’
AND SET_NAME = ’<set_name>’
AND WHOS_ON_FIRST = ’<whos_on_first>’);
As an additional idea, you could easily define a trigger on the Apply trail table
that would always execute (and perhaps send a message) whenever a failing
subscription is reported into the Apply trail table (STATUS = -1).
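A minimal sketch of such a trigger (DB2 UDB syntax; the alert table ADMIN.REPL_ALERTS is hypothetical and would have to be created first):

CREATE TRIGGER ASN.TRAILFAIL
  AFTER INSERT ON ASN.IBMSNAP_APPLYTRAIL
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN (N.STATUS = -1)
    INSERT INTO ADMIN.REPL_ALERTS (APPLY_QUAL, SET_NAME, ALERT_TIME)
    VALUES (N.APPLY_QUAL, N.SET_NAME, CURRENT TIMESTAMP);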
Remark: For all known error situations, a more detailed description and
possible solutions can be obtained from the DB2 Replication Guide and
Reference, S95H-0999.
If the Apply trail table (or the Apply trace) reveals that a problem had
originally occurred at a non-IBM database system (by showing SQLCODE
-1822), you have to refer to column APPERRM of the Apply trail table for
more details. This column will contain the complete SQL message (at least
as much of the message text as fits into the column).
In this section, we will provide you with some guidelines on how to customize
and use the sample program ASNDONE.
The following are some very useful examples where ASNDONE could be
used:
• If a subscription cycle has not completed successfully (which can be
determined from the STATUS value passed to the ASNDONE program),
an automated monitoring system could be notified.
• If a subscription cycle has not completed successfully, an e-mail could
automatically be sent to the replication operator.
• Depending on the reason causing a problem, ASNDONE could deactivate
the subscription set causing the problem.
• In Update-Anywhere scenarios (updates to the replication target tables are
replicated back to the replication source table), ASNDONE can be used to
react to compensated transactions. Capture marks every transaction that
was compensated at the replica site by adding a compensation code to
the unit-of-work record that was captured into the unit-of-work table (and
rejected transactions will only be pruned by retention limit pruning so that
they are available for additional processing). ASNDONE could make use
of the rejection code provided by Capture to notify users or to
automatically reinsert compensated transactions.
Keep in mind that ASNDONE (and this also applies to stored procedures) is
called from the Apply program. Therefore, if the user exit uses compiled SQL
statements, the user exit must fulfill the following requirements:
• Use CONNECT TYPE 1 only
• If executed on OS/390, link with DB2 CAF
• Static SQL packages must be bound against the databases/locations
where the SQL will execute
• If called from OS/390, the packages must be included in the Apply plan
PKLIST
REXX user exits can even be used in place of the compiled versions. On
OS/2, substitute the ASNDONE program in %DB2PATH%\bin with your REXX
exec code. On Windows NT/95, a REXX exec is called by issuing "REXX
execname parameters".
The easy example below just switches off (deactivates) failing subscriptions.
In a more sophisticated approach, some more logic could be added to
automatically fix certain problems or to notify an administrator or a monitoring
system.
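The skeleton assumes that the parameters passed by Apply are available as REXX variables; a minimal sketch of that prologue follows (the parameter order is an assumption and should be verified against the Apply documentation):

/* parse the arguments Apply passes to ASNDONE (assumed order!) */
parse arg target_server set_name apply_qual whos_on_first cntl_server trace_opt status .
RC = 0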
if status = 0 then
SIGNAL GO_EXIT
/*******************************/
/* Load Rexx DB2 functions     */
/*******************************/
/* a sketch, assuming the DB2 UDB REXX interface (DB2AR) */
if RxFuncQuery('SQLEXEC') \= 0 then
  call RxFuncAdd 'SQLEXEC', 'DB2AR', 'SQLEXEC'
/*************************/
/* CONNECT TO CNTLSERVER */
/*************************/
/* cntl_server is assumed to hold the control server name */
call SQLEXEC "CONNECT TO "cntl_server
if SQLCA.SQLCODE < 0 then SIGNAL SQL_ERROR
/*---------------------------------------------------------------------*/
/* INVESTIGATE THE REASON FOR THE PROBLEM                              */
/*---------------------------------------------------------------------*/
/*---------------------------------------------------------------------*/
/* TRY TO AUTOMATICALLY FIX THE PROBLEM */
/*---------------------------------------------------------------------*/
/* ... */
/*---------------------------------------------------------------------*/
/* DEACTIVATE SUBSCRIPTION, IF PROBLEM CANNOT BE FIXED */
/*---------------------------------------------------------------------*/
if status = -1 then
do
sql_stmt = "UPDATE ASN.IBMSNAP_SUBS_SET",
           " SET ACTIVATE = 0",
           " WHERE SET_NAME = '"set_name"'",
           " AND APPLY_QUAL = '"apply_qual"'",
           " AND WHOS_ON_FIRST = '"whos_on_first"'";
/*********************/
/* EXECUTE IMMEDIATE */
/*********************/
call SQLEXEC "EXECUTE IMMEDIATE :sql_stmt"
if SQLCA.SQLCODE < 0 then SIGNAL SQL_ERROR
call SQLEXEC "COMMIT"
/*---------------------------------------------------------------------*/
/* SEND AN EMAIL TO THE REPLICATION OPERATOR */
/*---------------------------------------------------------------------*/
/* ... */
SIGNAL GO_EXIT
/*********************/
/* SQL ERROR HANDLER */
/*********************/
SQL_ERROR:
RC = SQLCA.SQLCODE
go_exit:
return RC
Start the Apply instance with the TRCFLOW start option to find all trace
messages issued by ASNDONE in Apply’s trace output.
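A ksh one-liner in the same style as the Apply check can be used to see whether the DataJoiner instance is active (a sketch, assuming that DataJoiner, like DB2, runs a db2sysc engine process per instance):

#!/bin/ksh
ps -ef | grep 'db2sysc' | grep -v grep | wc -l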
The above command will return a value greater than or equal to 1 if the
DataJoiner instance is active.
So, most of the performance techniques introduced within this chapter are
common database tuning techniques, applied to change data tables,
database logs, or static and dynamic SQL. The following dedicated DProp
tuning techniques will be introduced within this section:
• Running Capture with the appropriate priority
• Adjusting the Capture tuning parameters
• Using separate tablespaces
• Choosing appropriate lock rules
• Using the proposed change data indexes
• Updating database statistics
• Making use of subscription sets
• Using pull rather than push replication
• Using multiple Apply processes in parallel
• Using high performance full refresh techniques
• Using memory rather than disk for the spill file
The default for the Capture commit interval is 30 seconds. A higher commit
interval reduces the cost of change capture, but also might increase the
latency of very frequently running subscriptions (for example, for continuously
running subscriptions). The commit interval is specified in seconds.
Recommendation:
But consider test systems that are used from time to time to check out new
replication techniques. Capture might have been stopped for some time (say,
weeks). When it is re-started with the WARM start option (which is the
default), Capture would request all DB2 log datasets from the time it was
stopped. If those datasets are still available, they would be mounted.
You probably do not want this to happen. The general advice is to start
Capture in COLD mode in test environments, if Capture has not been running
for a while. If it is accidentally started in WARM mode, the lag limit will let
Capture switch to a COLD start (or stop, if WARMNS is used), if the log
records that Capture would require to perform a WARM start are older than
the lag limit. The lag limit parameter is specified in minutes.
Remark: Retention limit pruning can destroy replication consistency for those
subscriptions that did not connect for a long time. Those subscriptions will
automatically do a full-refresh when starting the next subscription cycle.
The retention limit is used during pruning, to prune all transactions from the
change data tables that are older than CURRENT TIMESTAMP - RETENTION_LIMIT
minutes.
Remark (DB2 UDB for Intel and UNIX): Excellent performance can be
achieved by placing multiple change data tables into one single tablespace
(for example, using disk striping across multiple disks for that tablespace).
To guarantee optimal performance, the one and only change data table index
should look like the following example (DB2 for OS/390 syntax):
CREATE TYPE 2 UNIQUE INDEX <indexname> ON <cd_owner>.<cd_table>
(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC)
USING STOGROUP <stogroup> PRIQTY <nnn> SECQTY <mm>
FREEPAGE 0 PCTFREE 10;
The unit-of-work table index should look like the following example (DB2 for
OS/390 syntax):
CREATE TYPE 2 UNIQUE INDEX <indexname> ON ASN.IBMSNAP_UOW
(IBMSNAP_COMMITSEQ ASC, IBMSNAP_UOWID ASC, IBMSNAP_LOGMARKER ASC)
USING STOGROUP <stogroup> PRIQTY <nnn> SECQTY <mm>
FREEPAGE 0 PCTFREE 0 ;
OS/390 Remark: Make sure that all indexes on change data tables, the
unit-of-work table and all other replication control tables are defined as TYPE
2 indexes. DB2 ignores TYPE 1 indexes when using isolation UR.
Note for all DPRTools V1 users and all DJRA early users: Make sure that
there is only one unique CD index and one unique UOW index after migrating
to Version 5.
AS/400 Remark: In DPropR/400 V1, the indexes were different from those of
the other platforms. With DPropR/400 V5, use the same indexes as those
described above (except the TYPE 2, FREEPAGE and PCTFREE parameters
that do not exist on AS/400).
RUNSTATS must be run at a time when the change data tables contain
sufficient data so that the carefully chosen indexes on change data tables and
on the unit-of-work table will be used by Apply and Capture.
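A minimal sketch (DB2 UDB command line processor; names are placeholders):

db2 runstats on table <cd_owner>.<cd_table> and indexes all
db2 runstats on table ASN.IBMSNAP_UOW and indexes all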
It is not necessary to update the statistics again, once the catalog tables
show that there is an advantage to using the indexes. The SQL against the
changed data tables is dynamic, using parameter marker values, and
therefore default filter factors will be used.
The cardinality of the tables will affect the default filter factor values, but the
fact that the high and low values are old will not have any effect.
Rebind the Capture and Apply packages after the RUNSTATS has been
performed, so that the static SQL contained in these packages can benefit
from the updated statistics.
Looking at Figure 20, we can identify at least the following Apply tasks, which
execute in the following sequence:
1. Control Server: Look for work and determine subscription set details
2. Source Server: Fetch changes from change data tables into the spill file
3. Target Server: Apply changes from the spill file to target tables
4. Control Server: Update subscription statistics
5. Source Server: Advance pruning control synchpoint to enable pruning
All these tasks need database connections, and are executed at subscription
set level.
The only impact of having big subscription sets is that the transactions
needed to replicate data into the target tables can become quite large (all
changes within one subscription set are applied within one transaction). Be
sure to allocate enough log space and enough space for the spill file.
To prevent database log and spill file overflow, DProp offers another
technique to keep target transactions small. To use this technique, you have
to add a blocking factor (also referred to as the MAX_SYNCH_MINUTES feature) to
the subscription set. This also guarantees transaction consistency at set
level, but lets Apply replicate changes in multiple smaller mini-cycles rather
than in one big transaction. Refer to 3.3.3, “Using Blocking Factor” on page
54 for the details.
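The blocking factor is held in the subscription set table; as a sketch, a 5-minute blocking factor could be set like this (placeholder values as elsewhere in this book):

UPDATE ASN.IBMSNAP_SUBS_SET
SET MAX_SYNCH_MINUTES = 5
WHERE APPLY_QUAL = '<apply_qual>'
AND SET_NAME = '<set_name>'
AND WHOS_ON_FIRST = '<whos_on_first>';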
As a reminder, pull means that DProp Apply is running at the target server,
fetching data from the replication source server, usually over a network, and
inserting all the fetched changes into the target tables locally.
In push mode, DProp Apply is running at a site other than the target server
(probably at the source server), inserting all the changes into the target tables
remotely over the network.
When Apply is started for one Apply qualifier, it immediately calculates, based
on the subscription timing that you defined, if subscription sets need to be
serviced. If several subscription sets are awaiting replication, Apply always
services the most overdue one first.
5.5.11 Using Memory Rather Than Disk for the Spill File
When using Apply for MVS, Apply provides an option to create the spill file in
memory rather than on disk. There is an obvious advantage in using memory
for the spill file rather than using disk storage (refer to 5.5.7, “Making Use of
Subscription Sets” on page 122 to see when and where the spill file is
created).
If your replication cycles are short, the amount of data to be replicated may
be appropriate for creating spill files in memory.
Whether Apply will actually use DB2/DRDA block fetch depends on the bind
options that were used when binding the Apply packages against the
replication source server, either DB2 or DataJoiner. For details about binding
Apply, refer to the general implementation checklist, “Step 22—Bind DProp
Apply” on page 64.
5.5.12.2 OS/390
Specify the bind option CURRENTDATA(NO) when binding the packages of
Apply for OS/390 against the remote replication source server.
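As an illustration only (the location, collection, and member names are placeholders, not the actual Apply package names):

DSN SYSTEM(DB2I)
BIND PACKAGE (<dj_location>.<collection>) MEMBER(<apply_member>) -
  ACT(REP) CURRENTDATA(NO) VALIDATE(BIND)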
5.5.12.3 AS/400
Nothing special needs to be done for the AS/400.
For details about pruning, please refer to 5.3.2, “Pruning” on page 91.
The solution here is to disable the pruning trigger during peak hours and to
enable it when appropriate. Some of the supported multi-vendor database
systems provide the option to simply deactivate triggers. We are showing two
examples here:
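Two sketches follow (the trigger name follows DJRA's generated naming; verify the exact syntax for your server version):

-- Oracle: disable / re-enable the pruning trigger in place
ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER DISABLE;
ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER ENABLE;

-- Informix Dynamic Server: switch the trigger's object mode
SET TRIGGERS <schema>.PRUNCNTL_TRIGGER DISABLED;
SET TRIGGERS <schema>.PRUNCNTL_TRIGGER ENABLED;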
For all other database systems, check the documentation of the database
system you are using as replication source to see whether triggers can be
temporarily disabled. If disabling of triggers is not supported, use the DROP
TRIGGER and CREATE TRIGGER statements instead:
-- temporarily drop pruning control trigger
DROP TRIGGER <schema>.PRUNCNTL_TRIGGER;
-- recreate pruning control trigger
CREATE TRIGGER <schema>.PRUNCNTL_TRIGGER ...
Important: Copy the DDL to create the pruning control trigger from the SQL
script generated by DJRA. Be sure to copy the CREATE TRIGGER statement from
the SQL script of the source registration that you created last, because the
trigger body of the pruning control trigger changes with every registered
source table.
To gain control, DProp allows you to disable any automatic full refresh for
certain source tables.
Use the following SQL statement to disable any automatic full refresh for a
certain source table. Issue the statement while you are connected to the
replication source server.
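A plausible form of the statement, assuming the DISABLE_REFRESH column of the register table is the switch that controls this attribute:

UPDATE ASN.IBMSNAP_REGISTER
SET DISABLE_REFRESH = 1
WHERE SOURCE_OWNER = '<source_owner>'
AND SOURCE_TABLE = '<source_table>';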
Using the technique described above, full refresh can only be disabled (or
enabled) for all the subscriptions that use the source table, because the
disable refresh attribute is set at the replication source table level.
Use the following technique to generally disable full refresh from a replication
source, but to open the door for certain subscriptions only. We will make use
of Apply’s capability to issue SQL statements while performing a replication
cycle.
The only thing that Apply does between executing SQL Before statements of
type G and type S is reading the register table. Therefore, this time window,
and the chance that other subscriptions (which should not perform the refresh
automatically) are reading the register table in parallel, is more than
acceptably small.
Note: The statement enabling full refresh has to be of statement type ’G’; the
statement disabling full refresh again has to be of statement type ’S’.
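Continuing the DISABLE_REFRESH assumption from above, the pair of SQL Before statements could look like this:

-- TEMPORARILY ENABLE FULL REFRESH (SQL Before statement, type 'G')
UPDATE ASN.IBMSNAP_REGISTER
SET DISABLE_REFRESH = 0
WHERE SOURCE_OWNER = '<source_owner>'
AND SOURCE_TABLE = '<source_table>';

-- DISABLE FULL REFRESH AGAIN (SQL Before statement, type 'S')
UPDATE ASN.IBMSNAP_REGISTER
SET DISABLE_REFRESH = 1
WHERE SOURCE_OWNER = '<source_owner>'
AND SOURCE_TABLE = '<source_table>';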
If you, for whatever reason, want to persuade Apply to perform a full refresh
the next time it processes the set, the following three techniques are
available. Please notice the different scopes of each technique. Select the
technique that is most suitable for your needs.
The statement resets certain columns of the subscription set table to their
initial values.
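A plausible reconstruction of this set-level statement (placeholder values as elsewhere in this book):

UPDATE ASN.IBMSNAP_SUBS_SET
SET LASTSUCCESS = NULL, SYNCHPOINT = NULL, SYNCHTIME = NULL
WHERE APPLY_QUAL = '<apply_qual>'
AND SET_NAME = '<set_name>'
AND WHOS_ON_FIRST = '<whos_on_first>';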
The next statement will reset the SYNCHPOINT and SYNCHTIME columns,
for all subscriptions replicating from a given source table, to NULL. It has the
same effect as a Capture COLD start, but limited to only one replication
source table.
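A plausible reconstruction, issued at the replication source server against the pruning control table:

UPDATE ASN.IBMSNAP_PRUNCNTL
SET SYNCHPOINT = NULL, SYNCHTIME = NULL
WHERE SOURCE_OWNER = '<source_owner>'
AND SOURCE_TABLE = '<source_table>';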
Advice: Doing so could cause a lot of network traffic. Also, replication targets
maintaining histories might lose data. Think twice!
5.6.3.3 Forcing a Refresh for All Sets Reading from a Source Server
Start Capture in COLD mode. This is the ’brute force’ method. We strongly
recommend never to COLD start Capture within a production environment.
Capture performs an overall cleanup when starting in COLD mode. For
example, Capture removes all the records from all the change data tables.
Refer to the DB2 Replication Guide and Reference, S95H-0999 for more
details about Capture COLD starts.
To change either the Apply Qualifier or the subscription set name, follow the
procedure below:
1. Stop the Apply process servicing the Apply Qualifier that you want to
change.
2. Update all tables at the control server to change the Apply Qualifier and
set name.
-- Change APPLY_QUAL / SET_NAME within the Subscription Set Table
UPDATE ASN.IBMSNAP_SUBS_SET
SET APPLY_QUAL = ’<new_apply_qual>’,
SET_NAME = ’<new_set_name>’
WHERE APPLY_QUAL = ’<apply_qual>’
AND SET_NAME = ’<set_name>’
AND WHOS_ON_FIRST = ’<whos_on_first>’;
3. Update the pruning control table at the replication source server to change
the Apply Qualifier and set name.
-- Change APPLY_QUAL / SET_NAME within the Pruning Control Table
UPDATE ASN.IBMSNAP_PRUNCNTL
SET APPLY_QUAL = ’<new_apply_qual>’,
SET_NAME = ’<new_set_name>’
WHERE APPLY_QUAL = ’<apply_qual>’
AND SET_NAME = ’<set_name>’
AND CNTL_ALIAS = ’<cntl_alias>’
AND TARGET_SERVER = ’<target_server>’;
For the specific business application we are using in this example, we have
chosen Informix Dynamic Server (V7.3) as the replication source database,
but the techniques that we are going to use are applicable as well to other
non-IBM source databases, such as Oracle, Microsoft SQL Server, or Sybase
SQL Server.
Design: This section is used to highlight the design options that are most
appropriate to implement this data consolidation application. We will give
additional recommendations on how to scale the application to a large
number of replication source servers.
Finally, we are going to reveal some details about how the capture triggers
are used to emulate all functions that, for DB2 replication sources, are
provided by DProp Capture.
Figure 21 displays the high level system architecture used by the "retail"
company. So far, no database connectivity exists between the Informix EPOS
systems and the mainframe DB2 data sharing group. Until now, data has
only been exchanged through FTP using the existing TCP/IP network that
connects all branch offices to the company’s headquarters.
[Figure 21. High-level system architecture of the retail company: branch EPOS systems (Informix Dynamic Server V7.3, holding the sales details) connected through TCP/IP to the company headquarters, where a DB2 data sharing group (DB2I) serves the central business applications.]
We will follow this approach while designing the replication solution for this
case study.
Control Table Placement: The control tables that coordinate change capture
always have to be created at the replication source server. Apply’s control
tables, the control server tables, can be placed anywhere in the network. As
we decided to locate Apply centrally at the replication target server database,
we also will create Apply’s control tables within the replication target server
database. All subscription information can be retrieved using only one local
database connection. Performance and manageability could not be better.
Let us see how we can deal with this requirement. The most interesting
question here is, whether there will be any volatile data stored in these
DataJoiner databases that will make any housekeeping for the DataJoiner
databases necessary. And the answer is definitely NO!
As a rule of thumb to estimate how much disk space will be finally required for
all DataJoiner databases, multiply the number of non-IBM source servers by
20 MB:
Number of Non-IBM Source Servers * 20 MB = Required DJ DB Disk Space
If you realize, when your replication system is growing, that it takes too much
time to collect data from all branches sequentially, you can always
re-distribute the subscriptions that are already running over all available
Apply qualifiers. To do so, follow the instructions given in 5.6.6, “Changing
Apply Qualifier or Set Name for a Subscription Set” on page 134.
OS/390 Remark: On OS/390, for example, the size of the spill file that Apply
allocates when it fetches data from a change data table is defined within the
Apply start job. If you specify a huge file size, because the biggest shop
requires it, a huge spill file is allocated for every set that this Apply job (this
Apply Qualifier) services. This remark does not apply to Apply for UNIX or
Intel platforms.
Consider that the same identically structured table, in our example the
SALES table, is created at each of the distributed locations. Additionally,
consider that all the distributed tables will be defined as sources for data
replication.
To consolidate the content of all the SALES tables into one large
company-wide SALES table that contains the data of all distribution sites, the
technique we are introducing here basically requires creating multiple views
over a single consolidated target table:
[Figure: Target site union. The views VIEW 1, VIEW 2, ..., VIEW n are all defined over the one consolidated TARGET table; each subscription replicates into its own view.]
The following task list describes how to set up a Target Site Union replication
system:
1. Create the replication target table manually. Use the same DDL (same
structure) as used at the distributed locations.
2. Create as many views over the target table as there are distributed
locations (create as many views as the number of subscriptions that you
expect). Each view should be created as follows:
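Each view restricts the consolidated table by the attribute that identifies the originating location. A sketch, with hypothetical names that match the REPLFLAG example introduced below:

CREATE VIEW <target_owner>.SALES_BRANCH01 AS
  SELECT * FROM <target_owner>.SALES_ALL
  WHERE REPLFLAG = 'BRANCH01';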
In case of a full refresh, this technique lets Apply automatically append the
content coming from one location, instead of deleting the complete table and
inserting the data from the one location that is currently being refreshed.
Apply does so by deleting everything from the view, letting the WHERE
clause of the view limit the effect of the delete. Apply has no knowledge of the
UNION table: Apply knows only the view.
Background on Refresh
When Apply performs an initial refresh, Apply deletes the complete target
table before inserting the content selected from the replication source table
(Apply replaces the target content with the source content to initialize the
replication subscription). When the view is defined as the replication target
table, Apply’s delete (we call that a mass delete) is restricted to the rows
that fulfill the where-clause of the view.
If your source tables do not have an attribute that is unique at every source
site that could be used in the where-clause of the target site views, we have
two options to generate such a uniqueness attribute:
1. Create a new column at every source site.
2. Create a uniqueness attribute automatically during replication without the
need to change the source data model.
Obviously we could create another column, but that is perhaps not what we
want. More easily, we could use one of DProp’s advanced features and
create the uniqueness attribute on the fly (while replicating the data up to the
consolidated target).
Use the DJRA feature List Members or Add a Column to Target Tables to add
a computed column to a subscription member as shown in Figure 24.
The following SQL excerpt shows the most interesting statements that were
automatically generated by DJRA:
--* The column name REPLFLAG1 is not present in the target table
--* CHRIS.REPLFLAG.
ALTER TABLE CHRIS.REPLFLAG ADD REPLFLAG CHAR(8) NOT NULL WITH DEFAULT;
...
-- create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS
(APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, TARGET_OWNER, TARGET_TABLE,
COL_TYPE, TARGET_NAME, IS_KEY, COLNO, EXPRESSION) VALUES
(’IFXUP02’, ’SET01’ , ’S’, ’CHRIS’, ’REPLFLAG’,
’C’, ’REPLFLAG’, ’N’, 3 , ’SUBSTR (’’BRANCH01’’ , 1 , 8)’);
Create the views at the target site referencing the new calculated column in
the where-clause, like:
WHERE REPLFLAG = ’BRANCH01’;
6.2.2.4 Aggregation
In common data consolidation examples, it is not necessary to replicate all
table records created at the source sites. Instead, it can be sufficient to
replicate a summary only (for example, summaries grouped by products).
The IBM replication solution provides two methods for the replication of
summaries:
• Base Aggregates: Summaries are built over the replication source tables
• Change Aggregates: Summaries are built over the change data tables
Figure 25 shows that system layer 2 (the IBM database middleware layer) is
installed on one of the AIX servers (sky), that already contained one of the
Informix instances. The other two Informix instances are accessed remotely
using Informix ESQL/C client software.
[Figure 25. Case study system topology: DProp Apply runs on the OS/390 host (mvsip); the AIX server sky hosts the IBM database middleware layer alongside one Informix V7.3 instance, while two further Informix V7.3 instances run on the AIX servers azov and star (branch databases sj_branch01, sj_branch02, and sj_branch03).]
Smart Remark: All network connections between all system components use
TCP/IP as the network protocol.
Assumption: All three Informix server instances are installed and running.
Host    Informix Server Instance
----    ------------------------
sky     sjsky_ifx01
azov    sjazov_ifx01
star    sjstar_ifx01

All Informix server instances were running "Informix Dynamic Server,
Version 7.30UC7".
In order to connect to all Informix instances, the sqlhosts file used on sky was
configured with the following four entries. Please be aware that we will
reference these entries later when creating the DataJoiner server mappings.
#********************************************************************
#
# location: $INFORMIXDIR/etc/sqlhosts
#
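# (hypothetical reconstruction of the entries; format:
#  dbservername    nettype     hostname    servicename)
#********************************************************************
sjsky_ifx01      onsoctcp    sky         sjsky_svc
sjazov_ifx01     onsoctcp    azov        sjazov_svc
sjstar_ifx01     onsoctcp    star        sjstar_svc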
To check the success of this configuration step we used the Informix client
interface dbaccess to natively connect to all three Informix instances. Refer to
Appendix B, especially B.2.2, “Using Informix’s dbaccess” on page 329 for
useful instructions on how to set up and use Informix’s client interface
dbaccess.
The first step after loading the DataJoiner code onto the middleware server
was to create an Informix data access module (“Step 4—Prepare DataJoiner
to access the remote data sources” of the implementation checklist).
DataJoiner will use this access module for all connections to Informix using
the currently installed version of the Informix client.
Edit the file djxlink.makefile before executing the make command to set the
Informix environment variables accordingly.
The result of executing the make command will be the Informix data access
module, named ’informix72’.
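As a sketch only (the variable value and make invocation are assumptions; your djxlink.makefile documents the exact variables to set):

# in djxlink.makefile, point to the installed Informix client software:
INFORMIXDIR = /usr/informix

# then, from the shell, build the data access module:
make -f djxlink.makefile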
Remark: The name of the DataJoiner data access module we created during
this step is ’informix72’. Nonetheless, we will access Informix servers running
Informix Dynamic Server Version 7.3. To clarify, the name of the data access
module is not related to the server version. It is just a label. If you like,
change the name when building the data access module.
DB2 UDB for AIX, Version 5.2, was not only installed to demonstrate how
flexible the DB2 communication setup really is, but also to work around a
current limitation of DataJoiner: Although DataJoiner, V2.1.1, contains the
capability to access all DRDA servers using TCP/IP, DataJoiner V2.1.1 is not
enabled to be a DRDA application server through TCP/IP itself. DB2 UDB, by
the way, is.
-- SYSIBM.IPNAMES -------------------------------------------------
--
-- LINKNAME: Pointer to SYSIBM.LOCATIONS
-- IPADDR: HOSTNAME or IP Address
-- SECURITY_OUT: P (connect with userid and password)
-- SYSIBM.USERNAMES -----------------------------------------------
--
-- TYPE: O (type of translation, the letter O for outbound)
-- AUTHID: DB2RES5
-- LINKNAME: Pointer to SYSIBM.LOCATIONS
-- NEWAUTHID: djinst3 (DJ authid)
-- PASSWORD: djinst3 (dj’s password)
Refer to the IBM Redbook Wow! DRDA supports TCP/IP, SG24-2212 for
further details on how to set up DRDA connectivity using TCP/IP between
DB2 for OS/390 and other DB2 database servers.
Remark: DB2 for OS/390 caches the tables of the communication database
(CDB). Therefore, if you update your CDB tables again after the first
connection attempt, you will need to recycle the DB2 Distributed Data Facility
(DDF) to make your changes effective.
From this client, we cataloged the DataJoiner instance as TCP/IP node and
all databases directly at the DataJoiner instance (for LAN connections, no
hopping over DB2 UDB is required). The DB2 for OS/390 replication target
server was also cataloged at the DataJoiner instance, using the DataJoiner
instance as DRDA Gateway.
Bind Apply
After creating the replication control tables, Apply for OS/390 was bound
against the replication target server (DB2 for OS/390) and against all
DataJoiner databases. Refer to “Step 22—Bind DProp Apply” of the general
implementation guidelines for more details about the Bind task.
To enable SPUFI to work with DataJoiner databases, just bind SPUFI against
those databases.
SPUFI packages have to be bound against all new locations you want to
access (in our case, all three DataJoiner databases), and the SPUFI plan has
to be rebound for all locations (including those you were accessing already
before). The following excerpt of the Bind job shows the procedure:
DSN SYSTEM(DB2I)
BIND PACKAGE (DJDB01.DSNESPCS) MEMBER(DSNESM68) -
ACT(REP) ISO(CS) SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PACKAGE (DJDB02.DSNESPCS) MEMBER(DSNESM68) -
ACT(REP) ISO(CS) SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PACKAGE (DJDB03.DSNESPCS) MEMBER(DSNESM68) -
ACT(REP) ISO(CS) SQLERROR(NOPACKAGE) VALIDATE(BIND)
To process any SQL against DataJoiner, and therefore any SQL against
Informix, possibly using nicknames, DataJoiner’s PASSTHRU mode or
transparent DDL, set the CONNECT LOCATION on the SPUFI main panel to
the location name of the DataJoiner database (as defined within the DB2 for
OS/390 communication database):
For remote SQL processing, for example:
CONNECT LOCATION ===> DJDB01
Play around! Create a table in, say, Informix, using SPUFI for OS/390, to
realize that there are no more limits (ask your DataJoiner administrator for the
necessary database privileges).
Assumption:
All Informix databases contain an identically structured sales table. The table
name is SJCOMP.SALES.
Therefore, the first task when preparing the registration of the SALES tables,
located within the three Informix source databases, was to create a nickname
for every SALES table:
CONNECT TO DJDB01;
-- the server mapping names used below are illustrative
CREATE NICKNAME SJCOMP.SALES FOR <server01>.sjcomp.sales;

CONNECT TO DJDB02;
CREATE NICKNAME SJCOMP.SALES FOR <server02>.sjcomp.sales;

CONNECT TO DJDB03;
CREATE NICKNAME SJCOMP.SALES FOR <server03>.sjcomp.sales;
After creating the nicknames, the DJRA function Define One Table as a
Replication Source was used to register the nicknames as replication
sources. We chose the following replication source characteristics for this
case study:
• Capture all available columns
• Capture After-Images only
• Capture Updates as Updates (not as Delete/Insert pairs)
If you want to understand how the created change capture triggers finally
work, see section 6.7, “Some Background on Replicating from Multi-Vendor
Sources” on page 166. It introduces an overall picture of all triggers defined
for a non-IBM replication source server and describes how the triggers
interact to emulate all functions that, for DB2 replication sources, are
provided by DProp Capture.
After defining the sets, one member was added to each set. For each set, a
source server, a target server, an Apply Qualifier, a set name, and an event
name were defined.
Note that we chose event-driven subscription timing, using a single event for
every set (to better control the replication activities for our test scenario).
Note: DJRA also supports the setup of subscription members for existing
target tables or target views. That means, no target table is created if the
target table (or view) already exists. However, a CREATE TABLESPACE
statement is always generated, regardless of whether the target table exists
or not. We simply removed the CREATE TABLESPACE statement from the
SQL output that DJRA generated.
The following insert into the event table, for example, will trigger the
subscription replicating from branch 03:
INSERT INTO ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME)
VALUES (’BRANCH03’, CURRENT TIMESTAMP);
Remark: Apply queries the event table after every subscription cycle to see if
there are new events that trigger another subscription. If there is nothing to
replicate, Apply will at least query the event table every 5 minutes.
Bar 1 visualizes the amount of time that it took to insert a day’s worth of sales
data (27,340 rows) into the sales table at Informix. (The value was taken from
the performance measurement experiment in Chapter 3: 3.4, “Performance
Considerations for Capture Triggers” on page 55. Even though we have set
up capture triggers for the Informix table during this case study, we want to
eliminate the impact of change capture triggers for this comparison.)
Bar 2, now, shows the time Apply for OS/390 needed to replicate the
captured changes (27,340 rows) to DB2 for OS/390. This time bar is divided
into two sections:
• Section 1: Apply’s fetch phase, fetching the data from Informix/AIX into the
Spill file on the host.
• Section 2: Apply’s insert phase, inserting the change data from the spill file
into the target table (through the target view).
Remark: The start and the end of the insert phase were measured exactly by
adding SQL statements (one of type B, one of type A) to the subscription set,
that inserted the current timestamp into a separately created table. See
Figure 26.
[Chart: applying the change data to DB2 for OS/390, split into a FETCH phase and an INSERT phase, compared with the Informix insert performance without change capture triggers. The timings shown in the original chart are 45, 55, 60, and 120 seconds.]
As expected, the Inserts on the host are quicker than the Inserts on AIX! Even
though you might consider this to be obvious, we would like to use this result
to encourage you to invest some time on performance considerations before
you decide about the platform of your central data store or data warehouse.
The main issue will therefore be to clone the available setup information and
all defined database objects (like change data tables or capture triggers) to
meet the productive requirements. Mainly, two different strategies can be
followed to achieve this cloning:
• Strategy 1: DJRA provides a feature to re-generate DProp control
information, by re-engineering inserts to the DProp control tables from
existing definitions. This feature is called the PROMOTE feature (also
referred to as the CLONE feature).
It is recommended to use the promote function when carrying replication
definitions over from a test to a production system, because all changes
made to the replication control tables after the initial setup (for example, to
tune the setup) will be caught by PROMOTE.
• Strategy 2: Save all DJRA-generated or customized SQL scripts that
were used to configure the test system. As an option, anonymize the
scripts and generate new scripts from the anonymized examples when
adding a new source server to the replication system. Objects that are
unique for each productive instance are:
• CONNECT statements (either to the source server or to the control
server)
• Non-IBM database names, which are referenced in SET PASSTHRU
commands or CREATE NICKNAME statements
• References to the replication source server, the replication target
server, and the replication control server, that are named within the
INSERT statements that configure the replication control tables.
If separate procedures exist to create database objects for Informix and
DB2/DataJoiner, divide the generated scripts into one DB2/DataJoiner part
and one Informix part.
Remark: You may notice that the pruning control trigger code changes
with every new replication source table that is added to a non-IBM
replication source server.
The change capture triggers will feed the Change Data tables at the
multi-vendor replication source database. Providing compatibility with DB2
replication sources, capture triggers can be defined to capture both before
and after images or after images only. Additionally, the DProp Capture feature
to capture updates as delete-and-insert pairs can be emulated. Of course,
triggers can be set up to capture either only certain columns of a replication
source table or all the available columns.
The pruning trigger is used to delete records which are no longer needed
from the non-IBM replication source’s Change Data tables. Change Data
table rows are no longer needed when all the Apply processes have
replicated these records to the replication targets. The pruning trigger is
defined on the pruning control table (within the non-IBM replication source
database) and is invoked when Apply updates the pruning control table after
successfully replicating a subscription set. Refer to 5.5.13.2, “How to Defer
Pruning for Multi-Vendor Sources” on page 127 to see how to gain
performance benefits by temporarily disabling the pruning trigger for non-IBM
sources.
We have seen when setting up the replication definitions for this case study
that all the triggers are created natively within the non-IBM replication source
database. The reg_synch trigger is defined when the control tables are
created; the capture triggers are generated when a non-IBM table is defined
as a replication source.
Change capture triggers are always automatically generated for the three
possible DML operations. The definition of a non-IBM table as a replication
source, therefore, always results in the creation of three native change
capture triggers:
• One trigger for INSERT
• One trigger for UPDATE
• One trigger for DELETE
The trigger is defined to execute after each insert operation into the source
table, and it inserts a new row into the Change Data table, named
CHRIS.SALESCD. All the new column values, represented by :NEW.<columnname>,
are used when inserting a row into the Change Data table.
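A simplified sketch of such a generated insert trigger (Informix syntax; the column names are hypothetical, and the real generated trigger also fills the IBMSNAP bookkeeping columns such as IBMSNAP_UOWID and IBMSNAP_INTENTSEQ):

CREATE TRIGGER salesinsert INSERT ON sales
  REFERENCING NEW AS post
  FOR EACH ROW
  (INSERT INTO chris.salescd
     (ibmsnap_operation, item_num, store_num, pieces)
   VALUES ('I', post.item_num, post.store_num, post.pieces));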
Remark: DProp Capture, DB2’s log based change capture mechanism, reads
the DB2 database log sequentially and as quickly as possible. Capture does
not wait for transactions to commit or rollback. To ensure that only committed
change data is replicated to the replication target tables, DProp Capture
maintains a global unit-of-work table (ASN.IBMSNAP_UOW) that contains
one record for every committed transaction. DProp Apply joins every Change
Data table with the global unit-of-work table when replicating from a DB2
replication source. Using this technique, change data that has not yet been
committed is hidden from the Apply process and therefore is not replicated.
[Figure 28. How Apply replicates changes from a non-IBM source: (1) Apply updates the REG_SYNCH table through its nickname, firing the reg_synch trigger; (2) the trigger updates the register table; (3) Apply fetches the changes that the capture triggers recorded in the Change Data (CCD) table; (4) after applying the changes to the target, Apply updates the pruning control table, firing the pruning trigger.]
As the first action after Apply has connected to the DataJoiner database,
Apply executes an SQL Before statement which updates the REG_SYNCH
table (in Figure 28, this operation is marked as step 1). The only use of this
update is to invoke the reg_synch trigger, which immediately updates the
SYNCHPOINT column for all registered source tables in the register table
(as previously explained). The SQL Before statement that updates the
REG_SYNCH table is automatically added when creating a subscription set if
the source server is a non-IBM database.
Still connected to the replication source server, Apply will subsequently fetch
the most recent changes and apply them to the target server, which is shown
as step 3. As we are dealing with multi-vendor sources here, the change data
table has previously been fed by the change capture triggers (assuming that
the source table was changed since the last time Apply accessed the source
server).
After all changes have been applied to the target server, Apply reconnects to
the source server to advance the status of the subscription with an update to
the pruning control table, which is shown as step 4. Updates to the pruning
control table will finally invoke the pruning control trigger (if it has not been
disabled as described in 5.5.13.2, “How to Defer Pruning for Multi-Vendor
Sources” on page 127) to prune all records from the change data table that
were already replicated.
6.8 Summary
We used case study 1 to give you a practical example for a data replication
application, using:
• Informix replication source servers
• A DB2 for OS/390 replication target server
• IBM DataJoiner as central database middleware
• DProp Apply to actually move the data
After focusing on the implementation of the test environment that was used to
prove all techniques, we provided ideas on how to carry a tested replication
application over from a test environment to a production environment.
The final part of this chapter, showing change capture triggers at work, can be
used as a reference to see how the IBM replication solution integrates
multi-vendor database systems into an enterprise-wide, cross-platform data
replication application. (It’s really that easy!)
We will utilize the following major techniques of the IBM data replication solution within this scenario to optimize the performance and the manageability of the solution:
• Replication from DB2 for OS/390 to Microsoft SQL Server
• Source-Site Join-Views
• Noncomplete, condensed internal CCDs
• Two-tier versus three-tier approach
• Pull configuration for enhanced replication performance
• Data subsetting to distribute only the data relevant to each branch
• Invoking stored procedures in the target database
Two major approaches exist for the design of the new inventory application:
1. The inventory application accesses the required product and supplier
information directly from the DB2 for OS/390 database at the company
headquarters, using remote requests over the network link.
2. The application accesses a local copy of the required data held in the
Microsoft SQL Server database (where all the other relevant data for the
application is located as well).
The first design approach has some serious disadvantages in this scenario:
• Network outages between head offices and branches will directly affect
the availability of the new inventory application.
• The contention between the instances of the new inventory application in
the branches and the central applications will have an impact on the
performance of the central applications.
• The network traffic will increase, which will result in higher network costs.
• The performance of the local inventory application will be degraded due to
remote database requests.
These issues lead to the conclusion that the second design approach, where
local copies of the relevant data are distributed to each of the branches, is
more feasible.
The only issue that has to be resolved for the second approach is that the distribution of copies of the data introduces redundancy into the system. Because the required data is not static, the redundancy has to be managed to keep the local copies consistent with the headquarters data.
Each branch will copy a subset of data from the headquarters database
corresponding to the products sold at that particular branch.
(Data model diagram showing the headquarters tables and their key columns: STORE (Store_Num, CompNo, Name, Street, City, Zip, Region_Id), STORE_ITEM (Store_num, Prodline_no), ITEMS (Item_Num, Desc, Prod_Line_No, Supp_No), PRODLINE (Prod_Line_Num, Desc, Brand_Num), BRAND (Brand_Num, Desc), SUPPLIER (Supp_no, Supp_Name), and SALES (BasartNo, Date, StoreNo, Company, Out_Prc, Tax, Location, Pieces, Transfer_Date, Process_Date).)
Figure 30. Partial Data Model for the Retail Company Headquarters
You can refer to Table 7 on page 206 for a description of the tables. Only the
STORE_ITEM table is not described there. This table holds information about
the product lines sold in each store.
The table S_PRODUCT holds the information about the products available at
a particular branch.
The table P_ITEMS holds the information about the number of ITEMS for
each product line.
(Diagram: the branch tables P_ITEMS (Prod_Line_Num, Item_Count) and BRAND (Brand_Num, Desc).)
Figure 31. Partial Data Model for a Branch of the Retail Company
The target tables are read-only; therefore, you do not need to set up conflict detection. Applications can use the target tables, which are local copies, so they do not overload the network, and the load on the central server becomes more manageable. Refer to Figure 32.
Since the target is a non-IBM database, the Apply program cannot connect to
the Microsoft SQL Server directly. It will connect to a DataJoiner database
instead (with DB2 DataJoiner connected to the Microsoft SQL Server) and will
apply the changes to Microsoft SQL Server targets using DB2 DataJoiner
nicknames.
Since the data volume was acceptable in this case study, we chose the first
solution.
Next, we had to decide where to run the Apply program: at the source server (on the headquarters side), which is called a Push configuration, or at the target server (on the DataJoiner side), which is called a Pull configuration.
1. In a Push configuration, the Apply program for OS/390 connects to the
headquarters source server (DB2 for OS/390) and retrieves the data. Then
it connects to the remote DataJoiner server and pushes the updates to the
target table in Microsoft SQL Server (through DataJoiner nicknames).
In a Push configuration, the Apply program pushes the updates row by
row, and cannot use DB2’s block fetch capability to improve network
efficiency.
The Push techniques are touted as reducing the overhead of having
clients continually poll the server, looking to see if there is any new
information to pull. This configuration will be sufficient when tables are
infrequently updated.
2. In a Pull configuration, the Apply program is located at the DataJoiner
server and connects to the remote DB2 for OS/390 to retrieve the data.
DB2 can use block fetch to retrieve the data across the network efficiently.
After all the data is retrieved, the Apply program connects to the
DataJoiner database and applies the changes to Microsoft SQL Server
through DataJoiner nicknames.
In a Pull configuration, the Apply program can take advantage of the
database protocol’s block fetch optimization.
(Diagram: two-tier replication of the S_PRODUCT join view. At the source, the S_PRODUCT view joins STORE_ITEM (Store_num, Prodline_no) and ITEMS (Item_num, Desc, Prod_line_no, Supp_no); each store's target database, Store01 through Storenn, receives an S_PRODUCT copy (Item_num, Desc, Prod_line_no, Supp_no) subset by Store_num.)
For the other tables (BRAND, PRODLINE, and SUPPLIER), which are all needed in each store in their entirety, we will use internal CCDs to net out hot spots when the source tables are updated. This reduces the number of rows that actually need to be replicated if the same record (same primary key) is updated several times within one replication cycle.
So we will have a two-tier topology for tables STORE_ITEM and ITEMS, and
a three-tier topology for the other tables, as shown in Figure 37.
(Figure 37: Tier 1 is the DB2 for OS/390 source with the S_PRODUCT view and Capture; Tier 2 holds the noncomplete, condensed internal CCD tables; Tier 3 consists of the Microsoft SQL Server databases in the stores (PRODUCT and SUPPLIER tables in Store 01 and Store 02, both on NT), reached by Apply through DataJoiner for NT nicknames in the DJDB database.)
Noncomplete CCD tables contain only the modified rows from the source
table.
The CCD table that we created for the SUPPLIER table is called CCDSUPP.
In this case study, we used the following technique to fulfill this task:
1. In the Microsoft SQL Server database, we created the following stored
procedure in the target database:
CREATE PROCEDURE compute_item AS
delete from p_items
insert into p_items select prod_line_num, count(item_num) from s_product
group by prod_line_num
This stored procedure counts the items for each product line sold in the store. The first statement clears the historic data; the second computes the current aggregate data and inserts it into the aggregation table.
Each time the Subscription Set is processed, this stored procedure is
called.
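A hedged sketch of how such a call can be attached to the subscription set, expressed as an entry in the Apply subscription statements control table (the set name SET001, statement number, and column values are illustrative; DJRA's function for adding statements or procedures to a subscription set generates an insert of this kind):

INSERT INTO ASN.IBMSNAP_SUBS_STMTS
  (APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, BEFORE_OR_AFTER,
   STMT_NUMBER, EI, SQL_STMT, ACCEPT_SQLSTATES)
VALUES
  ('AQLY', 'SET001', 'S', 'A', 1, 'C', 'compute_item', NULL);
-- EI = 'C' marks the statement as a stored procedure call;
-- BEFORE_OR_AFTER = 'A' runs it after the answer set is applied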
(Diagram: DProp Capture and DProp Apply run with DB2 on the OS/390 system wtscpok; another DProp Apply runs with DataJoiner V2.1 (database DJDB), which reaches the Microsoft SQL Server targets through the SQL Server client.)
We assume that all the Microsoft SQL Servers in the branches are already
installed and running.
To check the success of this configuration step we used the SQL Server
Enterprise Manager to natively connect to all the SQL Server instances.
You do not need to install DProp Apply on the DataJoiner server because
Apply has already been installed with DataJoiner.
Open DJRA, select File => Preferences, then click the Connection tab, and
set the userid and password for the source and target.
At the DataJoiner instance, change the directory to SQLLIB\BND and use the
following statements to bind Apply:
Connect to SJ390DB1 user db2res5 using pwd;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;
Connect to DJDB;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;
Remark: You must register all the underlying tables first, before defining the join view as a replication source; then register the S_PRODUCT view (see Figure 42).
For the Column capture policy, select After-images only (Option: both
before-images and after-images would be used in an auditing scenario, for
example).
For Update capture policy, if the source tables' primary key or partition key could be updated, then you would have to choose Updates as delete/insert pairs. Here we simply need Updates captured as updates.
Remark: If the Capture program is running while you are defining a new
replication source, you will have to reinitialize Capture so that it takes the new
registration into account.
Logically you will create the Subscription Sets for the CCDs first, and then the
Subscription Set for the User Copy tables.
For the Copy Tables Subscription Set, the Apply Qualifier is AQLY (the Apply
Qualifier is used in the command to start Apply, and it is also used as part of
the password-file name).
We also specify the time interval for this Subscription set as 1440 minutes,
which means 24 hours.
We can see from Figure 43 that there is another parameter named Blocking factor. The value you specify here becomes the MAX_SYNCH_MINUTES value. If a blocking factor is specified, Apply takes this factor into account when selecting data from the change data tables (either CD or CCD). If the time span of queued transactions is greater than the number of minutes specified by MAX_SYNCH_MINUTES, Apply will try to convert a single subscription cycle into many mini-cycles, cutting the backlog down to manageable pieces. For example, with a blocking factor of 60 minutes, a backlog of ten hours of changes would be processed in roughly ten mini-cycles. In doing so, Apply will never cut transactions into pieces: a transaction is always replicated completely, or not at all. This reduces the stress on the network and DBMS resources and reduces the risk of failure.
Save the file as AQLYdb2DJDB.PWD in the directory where you will invoke the
Apply program. AQLY is the value of Apply qualifier we defined in the
previous step (see Figure 43 on page 195).
Note: In this step, the CCD table is an internal CCD; we used DJRA’s target
table logic user exit to customize the create tablespace statements for the
CCD table’s tablespace.
The following is the DB2 for MVS part of the target table logic file:
SAY "-- in TARGSVR.REX";
SUBLOGIC_TIME_SUFFIX=SUBSTR(TIME(’L’),4,2)||,
SUBSTR(TIME(’L’),7,2)||,
SUBSTR(TIME(’L’),10,2);
SELECT
WHEN SUBSTR(IN_TARGET_PRDID,1,3)="DSN" THEN; /* DB2 FOR MVS */
DO; /* CREATE A TABLESPACE FOR THE TARGET TABLE */
SAY "-- About to create a target table tablespace";
SAY "CREATE TABLESPACE TS"||SUBLOGIC_TIME_SUFFIX;
SAY " IN SJ390DB1 SEGSIZE 4 LOCKSIZE PAGE CLOSE NO CCSID
EBCDIC;";
OUT_TARGET_TABLESPACE="SJ390DB1.TS"||SUBLOGIC_TIME_SUFFIX;
END
Attention: The source tables you choose are always the real tables, not the
CCDs. This would be different if you had defined external CCDs instead of
internal CCDs, because in the case of external CCDs, it is the CCDs that are
indicated as sources for the dependent target tables.
So when you use internal CCDs, the CCDs are really transparent. You define
them as targets, but you never refer to them afterwards. Apply will, of course,
take the internal CCDs into account when servicing the subscriptions.
Remark: Do not type the keyword WHERE itself in the where-clause input field; enter only the predicate text.
You should pay attention to the following items in the generated SQL:
The table created in the SQL Server database has the default schema "dbo", but DJRA fetches the REMOTE_AUTHID from the SYSIBM.SYSREMOTEUSER table, which is "sa".
The generated SQL therefore uses "sa" as the table schema when creating nicknames and indexes, so you should update the SQL script and change "sa" to "dbo". If you can create a user with the same login ID and user name in Microsoft SQL Server, then there is no need to update the SQL script.
Note: Since DESC is a reserved word in SQL Server, you cannot create a
table with this column name in the SQL Server database (it will report an
ODBC error 37000), so you should update the generated SQL manually, and
update the target table column name. Refer to Appendix D.3, “Add a Member
to Subscription Sets” on page 342.
You also can specify a trace for the Apply program using the following
command:
asnapply AQLY DJDB trcflow;
This can help you when something goes wrong: you can get the error messages and SQLCODEs from the trace information. You can also record the
trace information in a file by running the following command:
asnapply AQLY DJDB trcflow > filename;
The specific objectives of the case study are to demonstrate how DProp can
be used in a data warehousing environment to:
• Populate and maintain a data warehouse in a non-IBM database.
• Show how join replication can be used to denormalize data.
• Describe how temporal histories can be automatically maintained by
DProp within the data warehouse.
• Demonstrate how DProp can automatically maintain aggregations of data
within the data warehouse.
In this chapter we will also describe a technique for pushing down the
replication status to a non-IBM database. This is not specifically a data
warehousing issue, but it is, nevertheless, a useful trick.
This new business intelligence (BI) application will enable the company to
control their inventory more closely and manage their supply chain more efficiently.
The retail company has decided to utilize an existing Oracle server to act as
the data warehouse store. This server is located within the head office.
(Diagram: at the head office, a DB2 for OS/390 data sharing group runs the stock ordering and distribution application and feeds the Oracle data warehouse server; the retail outlets run EPOS systems connected to the head office.)
Note: The replication techniques introduced in this case study will show
solutions for some of the most common issues in populating data warehouses
or data marts, and will be applicable for many other data warehousing
situations.
(Figure 49: partial data model of the source database, showing SUPPLIER (Supp_no, Supp_Name), STORE (Store_Num, CompNo, Name, Street, City, Zip, Region_Id), ITEMS (Item_Num, Desc, Prod_Line_No, Supp_No), SALES (BasartNo, Date, Location, Company, Out_Prc, Tax, Pieces, Transfer_date, Process_Date), PRODLINE (Prod_Line_Num, Desc, Brand_Num), REGION (Region_Id, Region_Name, Contains_stores), and BRAND (Brand_Num, Desc).)
The Items table contains one row for each product that the company sells (38,000 rows).
The Valid_From and Valid_To columns in Outlets and Suppliers and the
IBMSNAP_LOGMARKER and EXPIRED_TIMESTAMP columns in Products
enable those tables to maintain temporal histories and will be created
manually. For more detailed information refer to 8.4.6, “Adding Temporal
History Information to Target Tables” on page 250.
The data model shown in Figure 50 does not show all of the DProp control
columns. These columns are added to target tables during the subscription
process and are automatically maintained by DProp.
IBMSNAP_LOGMARKER is shown because it is used by the data warehouse
applications as the start of a record’s validity period.
The data has been denormalized into a star schema with Sales as the central
fact table and three dimensions for Products, Outlets and Time.
Denormalization was performed in order to aid query performance. More
complex data warehouses are likely to have more than three dimensions, but
for the purpose of this study, three will suffice.
Since the Time dimension table does not require any DProp replication
definitions to be maintained, it will not be discussed further in this book.
Both the type and location of the replication source and replication target are
fixed by business requirements. Since the source database is fixed, the
placement of DProp Capture is also fixed: Capture must be co-located with
the source database. The placement of all other components, such as
DataJoiner, Apply and the Replication Administration Workstation are
variable.
As a general rule of thumb, DProp can perform any data transformation which
can be expressed in SQL by using views over source tables, staging tables or
target tables. Alternatively, more complex transformations can be achieved by
executing SQL statements or stored procedures (either DB2 or multi-vendor)
at various stages during the subscription cycle. The SQL or stored procedure
can operate against:
• Any table at the replication source system, including replication source
and change data tables.
• Any table at the replication target system, including replication target
tables. The SQL statements or stored procedures can be executed before
or after the answer set is applied.
8.2.2.2 Denormalization
Database systems used for on-line business critical applications are tuned for
high volume transaction processing. Typically this requires the data to be in a
highly normalized form (as shown in Figure 49 on page 205). This form is
optimized for fast SQL insert and update transactions, but not for the selects
which will be used in the warehouse environment. A common technique in
data warehousing is therefore to hold the data in a denormalized form within
the warehouse—thus facilitating faster response to queries. The process of
introducing redundancy and structuring the data according to business needs
instead of application needs is known as denormalization.
Other techniques are available with DProp for denormalizing data. For
example, creating views over staging tables or simulating outer joins. These
techniques are not specifically covered in this chapter, but use the same type
of procedures as those which are described in detail.
DProp provides the so-called Consistent Change Data Tables (CCD Tables)
as a solution for history tables of this type. See the DB2 Replication Guide
and Reference, SR5H-0999 for a basic introduction on CCD tables.
The fact table usually records events (such as a sale). An event is associated
with a single date or timestamp. Events are inserted periodically (for example,
daily or weekly) into the fact table, building a history of events over time.
The attribute values recorded in the dimension tables (for example the
supplier information for a product) are usually valid for a certain period of
time. For example, product X was supplied by supplier A from 1997-02-01 to 1997-12-31. After this time period, the supplier for product X was switched to supplier B.
(Diagram: the data warehouse environment. An RS/6000 J50 running AIX 4.3.1 hosts Apply, DDCS, SQL*Plus, DataJoiner V2.1.1, and the Oracle V8.0.4 data warehouse; DataJoiner reaches DB2 for OS/390 over TCP/IP and reaches Oracle through Net8.)
Advice: Net8 only provides the communication between the Oracle client and the Oracle database server. It does not provide a command line interpreter where you can enter SQL statements interactively; SQL*Plus provides that function.
Now create the DataJoiner instance using db2icrt, and create the DataJoiner
database that will be used to access the Oracle database. The following
syntax was used to create the DataJoiner database for this case study:
CREATE DATABASE djdb
COLLATE USING IDENTITY
WITH "DataJoiner database";
Once the DataJoiner database has been successfully created, configure DB2
database connectivity between DataJoiner and the DB2 for OS/390
subsystem which is to act as the replication source. Connectivity for this case
study is established using DRDA over TCP/IP using the following node and
database definitions:
CATALOG TCPIP NODE DB2INODE REMOTE MVSIP SERVER 33320;
CATALOG DCS DATABASE SJ390DB1 AS DB2I;
CATALOG DATABASE SJ390DB1 AT NODE DB2INODE AUTHENTICATION DCS;
Now that all DB2 connectivity has been established and verified, configure connectivity from DataJoiner to Oracle. This connectivity is configured by defining a server mapping (and the necessary user mappings) for the Oracle server in the DataJoiner database, with Net8 providing the underlying client connection.
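A hedged sketch of the definitions involved, using the server name AZOVORA8 that appears later in this chapter, a hypothetical Net8 alias ora8, and hypothetical user IDs (the exact DDL is described in the DataJoiner Application Programming and SQL Reference Supplement, SC26-9148):

-- map a DataJoiner server name to the Oracle instance reached through Net8
CREATE SERVER MAPPING FROM azovora8
  TO NODE "ora8" TYPE oracle VERSION 8.0 PROTOCOL "net8";
-- map the local DataJoiner authid to an Oracle user ID and password
CREATE USER MAPPING FROM djinst5 TO SERVER azovora8
  AUTHID "simon" PASSWORD "secret";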
Now that all database connectivity has been configured and verified, we need
to start implementing the DProp Capture and Apply components.
When DJRA has been installed, proceed to “Step 17—Set up DJRA to access
the source and target databases” of the Implementation Checklist to enable
DJRA to communicate with all databases within the replication scenario. In
this case study, this means DB2 for OS/390 and DataJoiner (because
DataJoiner is used to establish connectivity to the Oracle database).
Once the bind has completed, you are now ready to start defining replication
sources (called registrations) and their associated targets (called replication
subscriptions). A summary of the steps required to configure the replication is
detailed in 8.4, “Implementing the Replication Design” on page 217.
DProp Capture must be started after the replication definitions have been
created (steps 1 to 8), but before populating the data warehouse for the first
time (step 9). DProp Apply may be started after the data has been loaded into
the warehouse.
The subscription set timing has been defined to execute every 1440 minutes
(that is, once every 24 hours) at midnight (presumably when there is little
activity on the servers or network).
Advice: Another option to control the timing of the replication is to use event
based timing. See 3.3.2.3, “Advanced Event Based Scheduling” on page 53
for an example of how to use event based timing to execute your
subscriptions once a day at midnight, on week days only.
The SQL generated by DJRA can be seen in Appendix E.1, “Output from
Define the SALES_SET Subscription Set” on page 347. The generated SQL
was saved and then executed using the Run menu option from the DJRA
output window.
These requirements can be satisfied with DProp by specifying the target table
to be a complete, non-condensed CCD with an additional column to record
expiry timestamps (for time consistent queries). These attributes are
summarized in Table 9:
Table 9. Attributes of Supplier Target Table
Figure 53 shows the relationship between the source and target Supplier
tables.
(Figure 53: the source Supplier table (Supp_no, Supp_Name) maps to the target Supplier CCD table, which adds the IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, and IBMSNAP_LOGMARKER control columns plus EXPIRED_TIMESTAMP; a Suppliers view over the CCD exposes Supplier_Number, Supplier_Name, Valid_From, and Valid_To.)
All columns prefixed with IBMSNAP are DProp control columns which are
required and are automatically maintained by Apply for CCD target tables. A
view named Suppliers will be created to hide these control columns from
warehouse users and also to rename the IBMSNAP_LOGMARKER and
EXPIRED_TIMESTAMP columns to more meaningful names (see 8.4.2.4,
“Hiding DProp Control Columns” on page 228 for more details).
For a complete listing of the SQL used for registering Supplier, see Appendix
E.2, “Output from Register the Supplier Table” on page 348.
This approach has the advantage that multiple registrations can have refresh disabled with a single SQL statement (in the example above, all those tables registered and owned by ITSOSJ), but suffers from the drawback that the update also affects registrations for which automatic full refresh should remain enabled, so it must be applied with care.
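A sketch of the kind of blanket statement being described, assuming the registrations in question carry the source owner ITSOSJ (DISABLE_REFRESH is the same register column that is set during the view registrations in 8.4.4 below):

-- disable automatic full refresh for every table registered under ITSOSJ
UPDATE ASN.IBMSNAP_REGISTER
  SET DISABLE_REFRESH = 1
  WHERE SOURCE_OWNER = 'ITSOSJ';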
After the modifications have been made and saved, the file was executed
from the DJRA output window using the Run menu option, thus defining the
Supplier table as a replication source.
With full refresh disabled, the administrator must synchronize Capture and
Apply before change capture replication can be enabled. This can be done
either manually or by using the Off-line load option of DJRA. Refer to 8.4.9,
“Initial Load of Data into the Data Warehouse” on page 261 for more details
on performing this synchronization.
If automatic full refresh was enabled, then Apply would automatically perform
the full refresh and synchronize itself with Capture when it is started.
Although the current version of DJRA does not allow the direct creation of
CCD tables at non-IBM targets, it is possible to work around this by editing
the generated SQL prior to execution. Future versions of DJRA may well
support this function directly.
Note the following from the subscription definition shown in Figure 55:
• In this case, the Target table qualifier field is SIMON. This specifies the
user and schema who will own the target CCD table in Oracle. This must
be an already existing Oracle user.
Advice: When creating a target table at a non-IBM database, the target
table qualifier field must be set to a DataJoiner user who has a user
mapping defined to the remote server where the target table is to be
created. DJRA uses the remote authid from this user mapping to
determine the schema and owner of the remote table. This is not the case
when creating CCD tables in non-IBM targets because we are fooling
DJRA into thinking the CCD table will be in the local DataJoiner database.
Essentially we have to perform the mapping ourselves by specifying an
existing Oracle user who will own the CCD table. If the mapping is not
correct, the CREATE TABLE statement will fail during the subscription
definition with the following error message:
SQL0204N "SQLNET object: Unknown " is an undefined name. SQLSTATE=42704
• Target structure should be a CCD table, and the DataJoiner non-IBM target server should be (None). If a non-IBM target server were selected for a CCD target, DJRA would issue a message and would not generate the definition; this is why (None) is specified and the generated SQL is edited afterwards to create the CCD table in Oracle.
Note: The Setup button is only available on DJRA versions 2.1.1.140 and
later. If you are using an earlier version of DJRA, then you will have to edit the
generated SQL to ensure that the following condition has been set:
ASN.IBMSNAP_SUBS_MEMBR.TARGET_CONDENSED='N'.
Advice: If using a version of DJRA earlier than 2.1.1.140, then the generated
SQL would also have to be modified to remove the auto-registration of the
CCD. This is the SQL insert into the ASN.IBMSNAP_REGISTER table at the
end of the generated SQL. Failure to remove this record would result in SQL
return code -30090, reason code 18 when Apply attempts to replicate the
data to the target. This is because Apply thinks the CCD table is in DataJoiner
and is attempting to update both the CCD table (in Oracle) and the Register
table (in DataJoiner) in the same Unit Of Work (UOW).
The specific SQL After statements used to maintain temporal histories for the
Supplier table are:
UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP =
  (SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B
     WHERE A.SUPP_NO = B.SUPP_NO AND
           A.EXPIRED_TIMESTAMP IS NULL AND
           B.EXPIRED_TIMESTAMP IS NULL AND
           B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');
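The second SQL After statement, which closes the validity period of records deleted at the source, is described in 8.4.6, "Adding Temporal History Information to Target Tables" on page 250; it is reconstructed here from that description:

-- a delete record expires immediately: its own log marker ends its validity
UPDATE SIMON.SUPPLIER
  SET EXPIRED_TIMESTAMP = IBMSNAP_LOGMARKER
  WHERE EXPIRED_TIMESTAMP IS NULL
    AND IBMSNAP_OPERATION = 'D';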
The view definition was stored in a file and executed directly from SQL*Plus.
For some useful hints on using SQL*Plus, see Appendix B.1.2, “Using
Oracle’s SQL*Plus” on page 325.
The technique used in this case is to copy the individual source tables to
target tables and perform denormalization through a view at the target site.
This approach is adopted in this case study to compare and contrast the
technique with the one discussed in 8.4.4, “Using Source Site Joins to
Denormalize Product Information” on page 237. Performing the join at the
target and not the source also alleviates the source system from having to
perform join operations against base and CD tables (as discussed in 8.4.4,
“Using Source Site Joins to Denormalize Product Information” on page 237).
To understand the replication techniques used for Store and Region, we first
have to understand the applications which work on the source data. For the
Region table:
• A record is inserted into the table when a new region is added.
• There are no deletes from the Region table. When a region no longer
contains any stores, the region information is maintained in the table and
the CONTAINS_STORES flag is updated with an ’N’.
• No other columns in the table are updated.
The data warehouse attributes and their DProp equivalents for the Store and
Region target tables are summarized in Table 10.
Table 10. Attributes of Store and Region Target Tables
Figure 57 summarizes the relationship between the source and target Store
and Region tables and their denormalization through a target site view.
(Figure 57: the source Store and Region tables map to a Store CCD target (Store_Num, CompNo, Name, Street, City, Zip, Region_Id plus the IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER control columns and EXPIRED_TIMESTAMP) and a Region PIT target (Region_Id, Region_Name, IBMSNAP_LOGMARKER); the Outlets view joins them and exposes Store_Num, CompNo, Name, Street, City, Region_Id, Region_Name, Valid_From, and Valid_To.)
Outlets is a view defined over the Store and Region target tables. For details
on the view definition, please refer to 8.4.3.4, “Create the Denormalization
View” on page 236.
Although in this case the target table type for Region is PIT, by analyzing the
application behavior and only replicating inserts we will actually create a
target table which maintains historic information (because records are only
appended to it).
Advice: There are three simple approaches for removing unwanted records
from the target history table:
1. The first approach, described here, is to place a predicate on the
subscription definition that prevents the unwanted records from
replicating. This is probably the simplest method, but also means that full
refresh for the source must be disabled.
2. The second approach would be to replicate the unwanted records, and
simply create a view at the target which does not include these records.
This has the disadvantage of replicating unwanted records, which would
consume network resources and CPU cycles. However, if at a later date
deletes are required in the target history, then this method simply requires
the target view to be redefined.
3. The third approach is the most flexible: Define a view over the source
table and register this view as a source for replication. However, before
executing the generated SQL for this registration, modify the CREATE
VIEW statement for the change data view in the generated SQL and add
the predicate IBMSNAP_OPERATION='I' (see the sketch after this list). This way, the subscription does
not even know that the filtering is occurring and all the subscriptions will
be simpler once the source is set up this way. This CD-view technique also
works for both full refresh and differential refresh because the predicate is
defined on the CD table view and subsequently will not be applied to the
source during a full refresh.
The complete listing of the SQL executed to define these registrations can be
found in Appendix E.4, “Output from Register the Store and Region Tables”
on page 351.
Automatic full refresh for Region is disabled because we are going to define a predicate in the subscription definition for Region, which prevents SQL updates from replicating. This predicate refers to the IBMSNAP_OPERATION column, which only exists in the CD table for Region. During full refresh, the predicate would be evaluated against the source table, where this column does not exist, and the refresh would fail.
Full refresh for the Store table is also disabled so that the historical
information held in this table does not get lost during a full refresh from the
source table.
For details on how the Store and Region tables were loaded into the target
database, refer to 8.4.9, “Initial Load of Data into the Data Warehouse” on
page 261.
The DJRA window used for defining the Region subscription member is shown in Figure 59.
The generated SQL from the DJRA tool for the Region subscription can be
found in Appendix E.5, “Output from Subscribe to the Region Table” on page
353.
The DJRA window used to define the Store replication subscription can be
seen in Figure 60.
The full listing of the SQL used to define the Store subscription can be found
in Appendix E.6, “Output from Subscribe to the Store Table” on page 355.
The following view definition was saved to a file and then executed from
Oracle’s SQL*Plus:
CREATE VIEW simon.outlets AS
SELECT s.store_num, s.compno, s.name,
s.street, s.city, s.region_id, r.region_name,
s.ibmsnap_logmarker as valid_from,
s.expired_timestamp as valid_to
FROM simon.store s,
simon.region r
WHERE s.region_id = r.region_id;
To execute the file in SQL*Plus, start Oracle SQL*Plus and then type @filename, where filename is the name of the saved file (for example, @outlets.sql).
The technique used is to create a view at the source site which performs the
denormalization. This view is then used as the source for replication, the
target table being the materialization of the source view.
Since we would like to maintain historic data at the target, the target table
type should be a non-condensed, complete CCD table. These attributes are
summarized in Table 11.
Table 11. Replication Attributes of Items, ProdLine and Brand Tables
Requirement: denormalize the data in Items, ProdLine, and Brand. DProp technique: create a source site view which performs the denormalization and register it as a source for replication.
Figure 61 shows the relationship between the three source tables, the
Products source site view, and the target CCD table.
(Figure 61: the Products source site view, joining Items, ProdLine, and Brand, exposes Item_Num, Item_Description, Prod_Line_Num, Product_Line_Desc, Supplier_Num, Brand_Num, and Brand_Description; the target Products CCD table carries the same data columns plus IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER, and EXPIRED_TIMESTAMP.)
This can be seen in the generated SQL in Appendix E.8, “Output from
Register the Products View” on page 361.
DProp Apply will use these views to determine the change data to replicate to
the target. Each of these views joins one CD table with all other base tables
from the original view. Therefore, when Apply is serving this subscription
cycle, it will be accessing the source tables directly (and joining these with
CD tables). This is an important fact to consider when replicating from a
source site view because DProp is no longer working purely from log based
change capture, but is also accessing base tables directly. This may impact
the performance of the source applications.
Alter the generated SQL to prevent full refresh of all the base tables by setting ASN.IBMSNAP_REGISTER.DISABLE_REFRESH=1 for each of the three registrations, committing after each update.
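A sketch of the modified sections, assuming the three base tables are registered under the DB2RES5 schema (the actual generated SQL names each registration explicitly):

-- disable automatic full refresh for each registered base table
UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH = 1
  WHERE SOURCE_OWNER = 'DB2RES5' AND SOURCE_TABLE = 'ITEMS';
COMMIT;
UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH = 1
  WHERE SOURCE_OWNER = 'DB2RES5' AND SOURCE_TABLE = 'PRODLINE';
COMMIT;
UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH = 1
  WHERE SOURCE_OWNER = 'DB2RES5' AND SOURCE_TABLE = 'BRAND';
COMMIT;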
The complete SQL used to register the Items, ProdLine and Brand tables is in
Appendix E.7, “Output from Register the Items, ProdLine, and Brand Tables”
on page 357.
Advice: Remember to use correlation IDs when creating the view. When registering the view as a replication source, DJRA parses the view definition and relies on those correlation IDs to generate the corresponding change data views.
The view uses the SQL SUBSTR function to perform some data manipulation
on the DESC column on the source system. The view was created using
SPUFI on the OS/390 source system.
Once the view has been created, it can be registered as a replication source
using the Define DB2 Views as Replication Sources function in DJRA
(shown in Figure 63).
As with the registration of the base tables, the generated SQL is modified to
disable full refresh for the ProductsA, ProductsB and ProductsC views
(shown below):
-- register the base and change data views for component
INSERT INTO ASN.IBMSNAP_REGISTER (GLOBAL_RECORD, SOURCE_OWNER,
  SOURCE_TABLE, SOURCE_VIEW_QUAL, SOURCE_STRUCTURE, SOURCE_CONDENSED,
  SOURCE_COMPLETE, CD_OWNER, CD_TABLE, PHYS_CHANGE_OWNER, PHYS_CHANGE_TABLE,
  DISABLE_REFRESH, CCD_OWNER, CCD_TABLE, CCD_OLD_SYNCHPOINT, SYNCHPOINT,
  SYNCHTIME, CCD_CONDENSED, CCD_COMPLETE, ARCH_LEVEL, BEFORE_IMG_PREFIX,
  CONFLICT_LEVEL, PARTITION_KEYS_CHG)
VALUES ('N', 'DB2RES5', 'PRODUCTS', 1, 1, 'Y', 'Y', 'DB2RES5', 'PRODUCTSA',
  'ITSOSJ', 'CDPRODLINE', 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
  '0201', NULL, '0', 'N');
Figure 64 shows the DJRA function used to add Products to the SALES_SET
subscription set.
Note the following from the subscription definition shown in Figure 64:
• Only the DB2RES5.PRODUCTS view needs to be selected as a source for
replication. DJRA hides all the complexity of the base tables and
generated views at this point.
• Once again, because the target CCD is to be created in Oracle, the Target
table qualifier field is set to SIMON. This is the user and schema who will
own the target CCD table in Oracle. It must be an existing Oracle user.
Refer to Advice on page 224 for more information on setting the Target
table qualifier.
• The DataJoiner non-IBM target is defined as (None). CCDs are not directly
supported by DJRA to non-IBM targets. We will modify the generated SQL
prior to execution in order to create the CCD table in Oracle.
• No primary key is defined initially.
By the nature of the source application, there will never be any updates made
to the Sales table. SQL inserts are performed to record each sale transaction,
and SQL deletes are performed in batch to remove records for housekeeping
purposes. The batch deletes should not be replicated because they are only
being performed for housekeeping purposes and have no significance within
the warehouse (this will also help to reduce by half the number of changes
made to the Sales table that are replicated).
Since only inserts are copied, the Sales table can be replicated to either a PIT
or CCD target and still maintain history information. A PIT target table would
save space and take less network bandwidth compared to a CCD table
because a CCD table has the overhead of maintaining three additional DProp
control columns. However, a PIT target table requires a primary key, and one
is not readily definable on the target table because the uniqueness of a row
cannot be guaranteed even using all target columns. Therefore we have
chosen to make the target table a CCD table.
Requirement: do not replicate the batch deletes from the source to the target. DProp technique: apply a predicate to the Sales subscription to prevent deletes from replicating.
Figure 65 below shows the relationship between the source and target Sales
tables.
(Figure 65: the source Sales table (Date, BasArtNo, Location, Company, StoreNo, Pieces, Out_Prc, Tax, Transfer_Date, Process_Date) maps to the target Sales CCD table (Sale_Date, BasArtNo, Location, Company, Pieces, Out_Prc, Tax, Transfer_Date, Process_Date plus the IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, and IBMSNAP_LOGMARKER control columns).)
Edit the generated SQL to disable full refresh. Also modify the CREATE
TABLESPACE statement to create a large DB2 for OS/390 tablespace with
enough primary and secondary storage to hold the large amounts of change
data expected for the Sales table. The modified SQL can be seen below:
-- in SRCESVR.REX, about to create a change data tablespace
--CREATE TABLESPACE TSSALES
-- IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
CREATE TABLESPACE TSSALES IN sj390db1
SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC
USING STOGROUP SJDB1SG2 PRIQTY 180000 SECQTY 5000;
The initial size of the Sales target table is 87 MB, with an estimated change volume of 14 MB per day. To manage these large amounts of change data and
the expected change volume, it is often necessary to define a tablespace at
the target capable of managing large amounts of data. In this case, the
following command was used to create a tablespace in Oracle capable of
holding sales information:
CREATE TABLESPACE BIGTS DATAFILE ’/oracle8/u01/oradata/ora8/bigts.dbf’
SIZE 90M
AUTOEXTEND ON
NEXT 15M ;
Define the tablespace directly from within SQL*Plus. It will have an initial size
of 90M and will be able to automatically extend in chunks of 15M. For more
information on managing Oracle tablespaces, see the Oracle8 Administrator’s
Guide, A58397-01.
Once the Oracle tablespace has been created, the Add a Member to
Subscription Set feature of DJRA is used to create the subscription for the
Sales table (see Figure 67).
The Target table attributes are similar to those described in detail in 8.4.2.2,
“Subscribe to the Supplier Table” on page 223.
The Where clause was added to the subscription definition to prevent the
batch deletes from replicating to the target table.
A full listing of the SQL used to define the Sales subscription can be found in
Appendix E.11, “Output from Subscribe to the Sales Table” on page 366.
Now that all the replication registrations and subscriptions have been defined,
we need to look at more detailed information on how to use DProp to support
temporal histories, maintain aggregate information and finally load the data
into the warehouse.
The additional column is added to the target table(s) by editing the DJRA
generated SQL for the subscription. In this case study, the column is called
EXPIRED_TIMESTAMP.
The first SQL works by scanning through the <tablename> table for records
with the same source key column value(s) and placing a timestamp in the
EXPIRED_TIMESTAMP column of the oldest of these records. The oldest
record is identified as the one with the lowest value in
IBMSNAP_INTENTSEQ. The IBMSNAP_LOGMARKER value of the new
record is used as the timestamp which is inserted into the
EXPIRED_TIMESTAMP column of the old record. In other words, the start of
the validity period of the new record becomes the end of the validity period of the old record.
The second SQL statement is used to provide additional handling for source
records, which are deleted. This statement looks for records that record a
delete operation against the source. It updates the EXPIRED_TIMESTAMP
column of such records with the IBMSNAP_LOGMARKER of the same
record. In effect, it closes the record’s validity period immediately. It is
included to respect one of the basic principles of life-span modeling, which
states that the start and end dates represent the time in which the object is
true in the modeled reality. Any query requesting information on the object
outside of its modeled validity period should result in false or an SQLCODE
100 being returned. If you leave the end date of a deleted record open,
temporal queries will return true, which is the wrong answer. The point is that
the object was logically deleted, and thus, the state history must reflect this.
The record with the source key column value 'A' is updated at the source, and 'B' is deleted. These changes are replicated to the CCD table. After Apply has
replicated these changes to the target, but before the SQL After statements
which maintain the temporal histories are executed, the table will contain the
following data:
KeyCol IBMSNAP_LOGMARKER IBMSNAP_OPERATION EXPIRED_TIMESTAMP
A 1999-03-26-11.37.30.000000 I 1999-03-26-13.40.30.000000
A 1999-03-26-13.40.30.000000 U <NULL>
A 1999-03-26-18.12.08.000000 U <NULL>
B 1999-03-26-11.37.30.000000 I <NULL>
B 1999-03-26-18.12.08.000000 D <NULL>
C 1999-03-26-15.22.21.000000 I <NULL>
The update and the delete have been recorded in the CCD table, but the
validity period has not been changed. Once the SQL After statements have
been executed, the target table will contain:
KeyCol IBMSNAP_LOGMARKER IBMSNAP_OPERATION EXPIRED_TIMESTAMP
A 1999-03-26-11.37.30.000000 I 1999-03-26-13.40.30.000000
A 1999-03-26-13.40.30.000000 U 1999-03-26-18.12.08.000000
A 1999-03-26-18.12.08.000000 U <NULL>
B 1999-03-26-11.37.30.000000 I 1999-03-26-18.12.08.000000
B 1999-03-26-18.12.08.000000 D 1999-03-26-18.12.08.000000
C 1999-03-26-15.22.21.000000 I <NULL>
Similar SQL After statements were used to add temporal history support to
the Store and Products target tables. For details of the specific SQL used,
refer to 8.4.2.3, “Add Temporal History Support to the Supplier Table” on page
227 for the Supplier table; 8.4.3.3, “Add Temporal History Support to the
Store Table” on page 235 for the Store table; and 8.4.4.3, “Add Temporal
History Support to the Products Table” on page 244 for the Products table.
The SQL generated to add SQL After statements to the Supplier table can be
seen in Appendix E.12, “SQL After to Support Temporal Histories for Supplier
Table” on page 369.
The ANALYZE command is used to gather statistics for the table and index and
is similar to DB2's RUNSTATS command. We recommend creating the index
and analyzing the data after the initial load of the data into the target. This
way, the statistics will be more accurate.
DataJoiner will not automatically recognize the new Oracle index. To make
DataJoiner aware of the index, connect to the DataJoiner database and
create an index on the Supplier nickname using the following syntax:
CREATE UNIQUE INDEX tempidx ON simon.supplier
(SUPP_NO, IBMSNAP_INTENTSEQ);
This does not actually create an index on the nickname; it just populates the
DataJoiner global catalog so that DataJoiner knows there is an index on the
Oracle table. It is also advisable to use DB2 RUNSTATS against the nickname in
order to ensure that the DataJoiner global statistics are up-to-date. For the
Supplier target table, a RUNSTATS command similar to the following was issued from the DB2 Command Line while connected to the DataJoiner database in order to update the global statistics:
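-- typical invocation; SIMON.SUPPLIER is the nickname defined above
RUNSTATS ON TABLE SIMON.SUPPLIER AND INDEXES ALL;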
Finally, we need to tell DataJoiner that the collating sequence used within the
Oracle database is the same as the collating sequence used within the local
DataJoiner database. This allows DataJoiner to push down order-dependent
operations (such as ORDER BY, MIN, MAX, SELECT DISTINCT) to Oracle. If
we do not set this option, DataJoiner must retrieve the necessary data from
Oracle, and perform the ordering locally—this is usually far less efficient
because far more data is transferred from Oracle to DataJoiner. We use the
DataJoiner COLSEQ server option to do this. In this case study, the option is
created for the AZOVORA8 server mapping by issuing the following SQL
from the DB2 Command Line:
CREATE SERVER OPTION colseq FOR SERVER azovora8 SETTING 'y'
This server option only needs to be created once, as it applies to the whole
Oracle server. By creating the COLSEQ server option and setting it to "Y",
performance can improve dramatically. For example, consider the Products
target table which contains 37,000 rows. Without the server option, the SQL
After statement took several minutes to execute. After creating the option,
execution time for the SQL After was less than 5 seconds.
For more details on DataJoiner server options, please refer to the DataJoiner
Application Programming and SQL Reference Supplement, SC26-9148. For
more information about tuning DataJoiner in the heterogeneous environment,
please refer to the DataJoiner Administration Supplement, SC26-9146.
Apply does not maintain base aggregate tables from log based change data
capture. It maintains base aggregates by querying the application base tables
directly. These tables may be large and contention may occur between Apply
and your OLTP transactions when Apply is accessing the source table(s).
Change aggregates are relatively inexpensive to maintain because Apply
queries the change data table, and not the base table. Not only does this
avoid contention with your OLTP applications, but change data tables are
usually much smaller than application tables.
(Diagram: changes captured in the CD table feed a change aggregate table, which is in turn used to maintain the base aggregate table SIMON.MOVEMENT.)
Figure 69. Maintain Base Aggregate Table from Change Aggregate Subscription
Let us consider an example for this case study. A common query against the
warehouse is to find the total number of items sold and the total price of all
these items broken down by store. The following SQL statement can be used
to provide this analysis:
SELECT company, location, sum(pieces), sum(out_prc)
FROM sales GROUP BY company,location
We would like to have this information precalculated and stored within the
warehouse. By using the process described, it is possible to maintain such a
target aggregate table from a change aggregate subscription. The SQL script
detailed in Appendix E.13, “Maintain Base Aggregate Table from Change
Aggregate Subscription” on page 370 was used to maintain the aggregate
shown above within the data warehouse (and contains detailed comments on
how the scheme works).
Advice: Use the Replication Analyzer with the DEEPCHECK option to check
the validity of the SQL Before and SQL After statements before starting the
subscription.
The Valid_To is NULL predicate is added to ensure that only those records
from the Outlets table which are valid at the present time are used. The ORDER
BY clause will order the data so that the stores which take the most money will
appear first in the report.
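A sketch of the general shape of such a report query, with a hypothetical join between the Sales table and the Outlets view (the actual join columns depend on the warehouse design):

SELECT o.name, SUM(s.pieces) AS total_pieces, SUM(s.out_prc) AS total_sales
  FROM simon.sales s, simon.outlets o
  WHERE s.company = o.compno
    AND o.valid_to IS NULL        -- only outlets valid at the present time
  GROUP BY o.name
  ORDER BY total_sales DESC;      -- stores taking the most money first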
This technique will work for SQL column functions AVG, COUNT and SUM. It
is not possible to use the technique with the MIN and MAX column functions
(these functions will still have to be maintained directly from the source tables
by using standard base aggregate subscriptions).
If Capture is cold-started, then you will probably need to reactivate the base
aggregate set and deactivate the change aggregate set to refresh the base
aggregate table. Once the refresh is complete, the base aggregate set will
automatically be deactivated, and the change aggregate set will be activated.
Now, when the WHQ1 subscription set has executed, Apply will automatically
update status information into the Subscription Set table. The SQL After
statement will then be executed, which will copy this status information into
Oracle by using DataJoiner. The multi-vendor DBA can now access DProp
status information using the tools and techniques which they are familiar with.
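A hedged sketch of the kind of SQL After statements that perform this push-down, assuming a DataJoiner nickname DJINST5.DPROP_STATUS over the Oracle table and taking the status columns from the Apply subscription set control table (the column list is abbreviated and illustrative):

-- refresh the Oracle status table through the DataJoiner nickname
DELETE FROM DJINST5.DPROP_STATUS;
INSERT INTO DJINST5.DPROP_STATUS
  SELECT APPLY_QUAL, SET_NAME, STATUS, LASTRUN, LASTSUCCESS, SYNCHTIME
  FROM ASN.IBMSNAP_SUBS_SET
  WHERE SET_NAME = 'WHQ1';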
For example, the Oracle DBA could use the following SQL query from
SQL*Plus to obtain the status of the last replication cycle:
SELECT APPLY_QUAL,
       SET_NAME,
       STATUS,
       TO_CHAR(LASTRUN,'IYYY-MM-DD-HH24:MI:SS'),
       TO_CHAR(LASTSUCCESS_RUN,'IYYY-MM-DD-HH24:MI:SS'),
       TO_CHAR(CONSISTENT_TO,'IYYY-MM-DD-HH24:MI:SS')
FROM DPROP_STATUS;
Since the subscription definition has full refresh disabled, the initial full
refresh of the data and synchronization of Capture and Apply must be
performed manually.
The DJRA Off-line Load utility can be used to help load data into the
warehouse manually. The utility will only unload/load data one subscription
set at a time. Therefore, we will have to unload/load all the data from the
SALES_SET at once.
The four steps that the Off-line Load utility guides you through are these:
1. Prepare the tables for the off-line load:
• Disable full refresh for the subscription set members.
• Disable the subscription set.
• Initiate change capture by performing synchpoint translation for
each source table.
2. Unload the data from the source tables.
3. Load the data into the target tables.
4. Reactivate the subscription set.
Steps 1 and 4 are performed by the Off-line Load utility. Steps 2 and 3, the
unloading and loading of the data, must be performed manually by the
replication administrator.
There are many ways in which the unload and load tasks can be performed.
The most suitable method is usually determined by the volume of data being
loaded into the target. Several of the most common alternatives are
described below.
Once the DB2 for OS/390 Server Mapping has been defined, create a
nickname for the replication source tables.
Advice: We need to ensure that the timestamp we initially load into the
IBMSNAP_LOGMARKER column is either the same or earlier than the
minimum date from the central fact table (because we use this column to
denote the start of the validity period). If we do not do this, then the predicate
described in 8.4.6.1, “Defining a Time Consistent Query” on page 255 may
not return all the valid rows because the SALE_DATE may be after the initial
timestamp marking the start of that record's validity period. In this case study,
the following SQL was issued against the source Sales table to find the
correct timestamp to use:
SELECT MIN(DATE) FROM DB2RES5.SALES
For a CCD table, we have to generate values for the additional DProp control
columns. The example below shows the SQL used to populate the Supplier
table:
INSERT INTO SIMON.SUPPLIER
  (SUPP_NO,
   SUPP_NAME,
   IBMSNAP_INTENTSEQ,
   IBMSNAP_OPERATION,
   IBMSNAP_COMMITSEQ,
   IBMSNAP_LOGMARKER)
SELECT SUPP_NO,
       SUPP_NAME,
       x'00000000000000000001' AS IBMSNAP_INTENTSEQ,
       'I' AS IBMSNAP_OPERATION,
       x'00000000000000000001' AS IBMSNAP_COMMITSEQ,
       '1997-12-01' AS IBMSNAP_LOGMARKER
FROM DJINST5.SUPPLIER_SOURCE;
Default values must be generated for the DProp control columns because
they do not exist within the source table and the columns are defined as NOT
NULL on the target table.
We used the following SQL script to export data from the Products view on
DB2 for OS/390 and import data into the Oracle Products table by using a
DataJoiner nickname:
-- Manual addition to export the data
CONNECT TO SJ390DB1 USER db2res5 using;
CONNECT RESET;
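A sketch of the export and import steps that complete the script, using a hypothetical IXF file name (the EXPORT runs while connected to SJ390DB1, before the CONNECT RESET; the IMPORT runs after connecting to DJDB, inserting through the SIMON.PRODUCTS nickname; the control-column constants mirror those used for the Supplier table above):

-- while connected to SJ390DB1
EXPORT TO products.ixf OF IXF
  SELECT ITEM_NUM, ITEM_DESCRIPTION, PROD_LINE_NUM, PRODUCT_LINE_DESC,
         SUPPLIER_NUM, BRAND_NUM, BRAND_DESCRIPTION,
         x'00000000000000000001' AS IBMSNAP_INTENTSEQ,
         'I' AS IBMSNAP_OPERATION,
         x'00000000000000000001' AS IBMSNAP_COMMITSEQ,
         '1997-12-01' AS IBMSNAP_LOGMARKER
  FROM DB2RES5.PRODUCTS;
-- while connected to DJDB
CONNECT TO DJDB;
IMPORT FROM products.ixf OF IXF INSERT INTO SIMON.PRODUCTS;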
Since the target Oracle table is a CCD, additional column values are
generated for the DProp control columns on export.
Advice: EXPORT will create the IXF file on the machine where the EXPORT
command is issued. If there is a significant amount of data in the file, then it
should be transferred to the machine where the Oracle target database
resides before using the IMPORT command. This will dramatically improve
the performance of the IMPORT because DataJoiner will be able to perform
the SQL inserts locally against the Oracle database (and not across the
network). Of course this is only possible if DataJoiner is on the same machine
as Oracle.
The JCL used for invoking the DSNTIAUL program in our environment is
shown below:
//DB2RES5$ JOB (999,POK),'DSNTIAUL',
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=1440,
// NOTIFY=DB2RES5
//*
//DELETE EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE DB2RES5.SYSREC00;
DELETE DB2RES5.SYSPUNCH;
SET MAXCC = 0;
/*
//*
//UNLOAD EXEC PGM=IKJEFT01,DYNAMNBR=20,COND=(4,LT)
//STEPLIB DD DISP=SHR,DSN=DB2V510.SDSNLOAD
//DBRMLIB DD DISP=SHR,DSN=DB2V510I.DBRMLIB.DATA
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSREC00 DD DSN=DB2RES5.SYSREC00,UNIT=SYSDA,
// VOL=SER=PSOFT6,SPACE=(CYL,(500,0),RLSE),DISP=(,CATLG)
//SYSPUNCH DD DSN=DB2RES5.SYSPUNCH,UNIT=SYSDA,
// VOL=SER=SAP007,SPACE=(1024,(15,15)),DISP=(,CATLG)
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
DSN S(DB2I)
RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB51) PARMS('SQL') -
LIB('DB2V510I.RUNLIB.LOAD')
/*
//SYSIN DD *
SELECT
CHAR(DATE,ISO) AS DATE,
CHAR(BASARTNO) AS BASARTNO,
CHAR(LOCATION) AS LOCATION,
CHAR(COMPANY) AS COMPANY,
CHAR(PIECES) AS PIECES,
CHAR(OUT_PRC) AS OUT_PRC,
CHAR(TAX) AS TAX,
CHAR(DATE(TRANSFER_DATE),ISO) AS TRANSFER_DATE,
CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE,
CHAR(CURRENT DATE,ISO) AS IBMSNAP_LOGMARKER
FROM DB2RES5.SALES;
/*
This JCL will probably need modification to meet particular site requirements
and configurations. When using DSNTIAUL it is important to estimate the size
of the dataset which will be created and use the SPACE allocation of the
SYSREC00 DD statement to ensure there is sufficient disk space available. Use
the RLSE parameter to shorten the data set to the space occupied by the data
at the time the data set is closed.
The SQL select statement used to extract the data is at the bottom of the JCL
file. In order to overcome the differences in the representations of various
data types (for example, decimal, integer) between OS/390 and AIX, the
externalized data must be converted to character prior to the unload. It is then
converted back to the corresponding data type for the target tables during the
load. The Sales table contained three data types which were converted to
CHARACTER using the techniques described below:
• DECIMAL columns are converted to CHARACTER format by using the
CHAR SQL function. For example: CHAR(TAX) AS TAX .
• DATE columns are converted to CHARACTER format using the CHAR
SQL function with an additional parameter indicating the format of the date
within the character field. For example: CHAR(DATE,ISO) AS DATE .
• TIMESTAMP columns are converted to CHARACTER by first of all using
the DATE function to convert the TIMESTAMP to a DATE type. The result
of this was subsequently converted to CHARACTER using the same
method as that described for the DATE type above. For example:
CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE. The time information in the
TIMESTAMP is lost when it is converted to a DATE. This is acceptable in
this situation because even though the TRANSFER_DATE and
PROCESS_DATE columns were of TIMESTAMP type, they only
contained DATE information.
Although the target Sales table is a CCD, many of the DProp control columns
can be omitted from the SQL used by DSNTIAUL. This is because they can
be added as constant values from the SQL*Loader control file. This reduces
the amount of data which is held in the export file created by DSNTIAUL, and
consequently the amount of data which will be transferred across the
network.
Advice: The only DProp control column to be added to the export file is
IBMSNAP_LOGMARKER. This can be added as the CURRENT DATE DB2
special register and not the CURRENT TIMESTAMP special register. This is
because Oracle does not have the same precision for TIMESTAMPS as DB2.
In fact, Oracle stores all its time and date information in columns of type DATE, which can only hold values accurate to the second (not to the microsecond, as DB2 TIMESTAMP columns can).
(
SALE_DATE POSITION(1:10) DATE 'YYYY-MM-DD' ,
BASARTNO POSITION(11:25) DECIMAL EXTERNAL ,
LOCATION POSITION(26:31) DECIMAL EXTERNAL ,
COMPANY POSITION(32:36) DECIMAL EXTERNAL ,
PIECES POSITION(37:45) DECIMAL EXTERNAL ,
OUT_PRC POSITION(46:62) DECIMAL EXTERNAL ,
TAX POSITION(63:79) DECIMAL EXTERNAL ,
TRANSFER_DATE POSITION(80:89) DATE 'YYYY-MM-DD' ,
PROCESS_DATE POSITION(90:99) DATE 'YYYY-MM-DD' ,
IBMSNAP_INTENTSEQ CONSTANT '00000000000000000001',
IBMSNAP_OPERATION CONSTANT 'I' ,
IBMSNAP_COMMITSEQ CONSTANT '00000000000000000001',
IBMSNAP_LOGMARKER POSITION(100:109) DATE 'YYYY-MM-DD'
)
As you can see, this file is somewhat similar in format to the SYSPUNCH file
generated by DSNTIAUL. A brief summary follows:
Full details of the control file format and SQL*Loader parameters can be
found in the Oracle8 Utilities Guide, A58244-01.
The DIRECT=TRUE parameter tells the loader to use the direct path option. This
option creates preformatted data blocks and inserts these blocks directly into
the table. This avoids the overhead of issuing multiple SQL inserts and the
associated database logging which will occur. This is similar to the DB2 UDB
LOAD utility.
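For reference, a typical invocation of SQL*Loader with the direct path option
might look like this (the userid, password, and file names are illustrative; on
Windows NT the executable may be named sqlldr80):

sqlldr userid=scott/tiger control=sales.ctl log=sales.log direct=true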
Besides the discard file, SQL*Loader also creates a .log file, which contains a
log of the work done, and a .bad file, which contains all the records that could
not be loaded into the target.
The query produces a summary of the total sales recorded in the Sales table
during 1997 and 1998 grouped by region and product line. Essentially, it tells
us the best (and worst) selling product lines by region over a 2-year period.
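The query is essentially a join and aggregation of the following form (a
sketch only; all table and column names in it are illustrative):

SELECT O.REGION, P.PRODLINE, SUM(S.OUT_PRC) AS TOTAL_SALES
FROM   SIMON.SALES S, SIMON.OUTLETS O, SIMON.PRODUCTS P
WHERE  S.LOCATION = O.LOCATION
AND    S.BASARTNO = P.BASARTNO
AND    S.SALE_DATE BETWEEN '1997-01-01' AND '1998-12-31'
GROUP BY O.REGION, P.PRODLINE
ORDER BY O.REGION, TOTAL_SALES DESC;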
8.6 Summary
In this chapter we have seen how to maintain a data warehouse in Oracle
from changes captured on a DB2 for OS/390 system. Specific techniques
have been discussed showing how to use DProp to maintain historic
information, denormalize data, and maintain temporal histories within the
target data warehouse. Advanced techniques showing how to push down the
replication status to the warehouse and how to maintain base aggregate
tables from change aggregate subscriptions have also been demonstrated.
Many of the techniques discussed within this chapter apply not only to data
warehousing situations, but also to any replication situation where DProp is
the product of choice.
In such an environment, not all the data will be replicated from the source
server towards all the target servers. Each target database will receive only a
subset of rows that are of interest for that particular target database. The
subsetting will be done according to a geographical criterion (agency code for
example). Since the partitioning data is not present in every source table, this
scenario will also illustrate the use of view registrations to implement the
subsetting technique.
Remark
At the time this book was written, the DB2 DataPropagator for Microsoft Jet
product was still in test phase, so the results described below should be
considered with some degree of caution.
The insurance company’s head office owns the corporate data and runs the
reference applications. The insurance company has several agencies spread
all over the country, and sales representatives in each agency.
Each sales representative is attached to only one agency, and each customer
is usually managed by only one sales representative. The sales
representative’s Microsoft Access tables contain all the data pertaining to all
the customers that are attached to the sales representative’s agency. If a
sales representative is not available, he can ask one of his colleagues from
the same agency to replace him for a specific customer case. Sales
representatives do not have access to the data that belong to other agencies.
There are also four equivalent target tables in Microsoft Access. The target
tables have the same structure as the source tables.
The source tables and their main columns are:
• CUSTOMERS: CUSTNO, ..., AGENCY
• CONTRACTS: CONTRACT, ..., CUSTNO
• VEHICLES: PLATENUM, ..., CUSTNO
• ACCIDENTS: CUSTNO, ACCNUM
The SQL statements that we used to create the tables are shown in Appendix
F.1, “Structures of the Tables” on page 381.
In our scenario the source database is called SJNTDWH1, and the schema of
the source tables is called IWH.
Since we will be replicating join views, we must take care of the
"double-delete" (or "simultaneous-delete") issue. What happens if a row is
deleted from the CUSTOMERS table and the corresponding row is deleted
from the other component of the join at the same time?
The problem is that, since the row was deleted from the two components of
the join, it does not appear in the views (base views and CD-views) and so
the double-delete is not replicated.
There are ways to deal with this issue. A well-known technique is to define a
side CCD table for one of the components of the join. This CCD table should
be condensed and non-complete (you can define it as complete, but this is
not necessary) and located on the target server. The IBMSNAP_OPERATION
column of this CCD table is used to detect the deletes. The most common
way to do this is to add an SQL after statement in the definition of the
subscription set. The SQL statement will remove, from the target table, all the
rows for which the IBMSNAP_OPERATION is equal to "D" in the CCD table.
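A minimal sketch of such an SQL After statement, assuming a condensed
CCD table TGT.CCDCUSTOMERS on the target server and a target table
TGT.CONTRACTS joined on CUSTNO (all names are illustrative):

DELETE FROM TGT.CONTRACTS T
WHERE EXISTS
  (SELECT 1
   FROM TGT.CCDCUSTOMERS C
   WHERE C.IBMSNAP_OPERATION = 'D'
     AND C.CUSTNO = T.CUSTNO);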
But in this scenario we are replicating between DB2 and Microsoft Access,
and we cannot create a CCD table in a Microsoft Access database.
Furthermore, DB2 DataPropagator for Microsoft Jet does not allow the use of
SQL After statements. So, if we really wanted to deal with the double-delete
issue in this scenario, we would need to:
• Create a CCD table on the source server. This means that we would have
an extra Apply program running on the source server to feed the CCD
table.
• Insert some code in the ASNJDONE user exit so that, after each
replication, it would connect to the source server, read the content of the
CCD table, and delete the target rows if IBMSNAP_OPERATION is equal
to "D".
You should always try to find a way to prevent conflicts from occurring. For
example, in our scenario, a convention should be established between the
sales representatives and the head office, so that they do not update the
same tables on the same day.
We will be replicating data between a DB2 UDB for Windows NT server and
several Microsoft Access databases located on other Windows NT servers or
workstations.
If the source server were a DB2 for OS/390 host or an AS/400, the only
difference would be the need to install DDCS or DB2 Connect (or DB2 UDB
Enterprise Edition, since it includes DB2 Connect, or DataJoiner since it
includes DDCS), either on a separate Windows NT server that would operate
as a gateway (DB2 Connect Enterprise Edition), or on each target workstation
(DB2 Connect Personal Edition).
In this replication scenario, the Dprop control tables must be located in a DB2
database. So we will create them in the source server. This is important
because it means that the administration workstation will only need to have
access to the source server. You can define all the replication sources and all
the Subscriptions Sets even before you configure the target workstations.
You can even let ASNJET create the target database and the target tables for
you. If you do not create them yourself, ASNJET will create them
automatically the first time it is run.
The Dprop control tables are the same as for any other Dprop replication
scenario, except that there are two additional tables:
• ASN.IBMSNAP_SCHEMA_CHG: Used to signal modifications to a
subscription.
• ASN.IBMSNAP_SUBS_TGTS: Used by ASNJET to maintain the list of the
row-replica table names. It enables ASNJET to automatically delete a
row-replica table if the corresponding subscription definition was removed
since the last synchronization.
From the system topology diagram shown above, you can see that ASNJET
replaces the function of the Apply component. No additional functionality of
DataJoiner is needed in this scenario. The control tables are located at the
central DB2 UDB server, which acts as the master-copy for the mobile clients.
To establish the database connectivity, DB2 CAE is implemented on the
mobile clients. The sales representatives access their local copy of the data
in the Microsoft Access database.
Remark: You will notice that, unlike other Apply components, ASNJET does
not require any bind operation. The only necessary binds are:
• The bind for Capture on the source server.
• The binds for CAE (from the administration workstation and target
workstations towards the source server). Note that if CAE is at the same
level of maintenance on all the workstations, the binds for CAE need only
be done once.
We also assume that DB2 UDB is already installed on the source server, that
you have already created the source database and the source tables, and
that Microsoft Access is already installed on the target workstations.
Create the replication control tables (see Chapter 4.4.4, “Create the
Replication Control Tables” on page 74):
• On the administration workstation, use DJRA to generate and run an SQL
script to create the Dprop control tables in the source server database.
Bind DProp Capture (see Chapter 4.4.5, “Bind DProp Capture and DProp
Apply” on page 74):
• On the source server, bind the Capture component on the source
database.
You must first define the physical tables as replication sources before you can
define the join views as replication sources.
When you generate an SQL script, always choose a meaningful script name
so that you will be able to remember the purpose of the script. For example,
we generated the following scripts: regcust.sql, regcont.sql, regvehi.sql,
regacci.sql, regvcont.sql, regvvehi.sql and regvacci.sql (reg stands for
"registration", which is a synonym for "define a replication source").
Then choose the CONTRACTS table from the list of source tables, specify
that you will need all the source columns, that you want to capture both
before-images and after-images, that you want to capture the updates as
updates, and choose a standard conflict detection level.
The panel should now look like this (see Figure 76):
Select Generate SQL to generate the regcont.sql script. See the generated
SQL script in Appendix F.2, “SQL Script to Define the CONTRACTS Table as
a Replication Source” on page 383.
Save and run the generated SQL script, then select Cancel to come back to
DJRA’s main panel.
Remarks:
• If you want DJRA to generate an SQL script that uses your own naming
conventions (names of the CD tables for example), you can press the Edit
Logic button before you generate the SQL script.
• For the CUSTOMERS table, we chose exactly the same parameters,
except for the update capture policy. We decided that, since a customer
could move from one agency to another, updates should be captured as
DELETE and INSERT pairs, so that a customer row leaving one agency's
subset is removed from that target and inserted into the new one.
Indicate the source view qualifier (IWH), and press the Build List Using Filter
button.
Select the IWH.VCONTRACTS view, then select Generate SQL. See the
generated SQL script in Appendix F.3, “SQL Script to Define the
VCONTRACTS View as a Replication Source” on page 385.
Save and run the generated SQL script, then select Cancel to come back to
DJRA’s main panel.
On the administration workstation, use DJRA to generate and run SQL scripts
to define the replication targets for the first target workstation (see the details
below). After you have done this, you will duplicate the SQL scripts, adapt the
scripts for the other target workstations, and run the scripts.
For this scenario, we will create one subscription set for each table, so we will
have only one member per subscription set. An alternative would have been,
for example, to create one subscription set including the four members. The
performance of the replication would have been a little bit better, but you
must then remember that a subscription set is processed as a single unit of
work, so the four tables could no longer be replicated independently of one
another.
Warning: Neither DJRA nor ASNJET will automatically create the referential
integrity constraints between the Microsoft Access tables. You will have to
define these constraints yourself. This is important because, as we will see
later, if you do not create the Microsoft Access tables yourself, ASNJET
will create them for you the first time it is run, but it will not create the
constraints. In that case you will have to add the referential integrity
constraints after ASNJET has created the tables.
Select the Microsoft Jet check box for Target servers, and enter the name of
the Microsoft Access database, for example, DBSR0001 (for DataBase for
Sales Representative 0001). Each sales representative will have only one
Microsoft Access database.
Set name: We decided to create one set per target table, so you can choose
a set name such as CUST0001 (for Set for the CUSTOMERS table for sales
representative 0001).
Your DJRA panel should now look like this (see Figure 80):
Select Generate SQL. See the generated SQL script in Appendix F.4, “SQL
Script to Create the CUST0001 Empty Subscription Set” on page 386.
Save and run the generated SQL script. Always remember to give a
meaningful script name (such as SETCUST.SQL, for example).
Repeat the same operations for the three other subscription sets: CONT0001
(for CONTRACTS), VEHI0001 (for VEHICLES) and ACCI0001 (for
ACCIDENTS).
From DJRA’s main panel, select the Add a member to Subscription Sets
option. The following panel is then displayed (see Figure 81):
You will receive a message saying Target structure must be row replica for
server DBSR0001. Simply answer OK.
Then select the second Build List button. This will display the list of defined
replication sources.
Specify that you want all columns, and indicate the target table
characteristics:
• Qualifier: IWH
• Target table name: CONTRACTS (it does not need to be VCONTRACTS)
• Target structure: Row-replica
The screen should now look like this (see Figure 82):
Select Generate SQL. See the generated SQL script in Appendix F.5, “SQL
Script to Add a Member to the CONT0001 Empty Subscription Set” on
page 387.
Save and run the generated SQL script. Select a meaningful script name
(such as MBRCONT.SQL, for example).
So far, we have generated and run the SQL scripts to define the subscription
sets and subscription members for the first sales representative. We stored
these SQL scripts in directory ASNJET\SCRIPTS\TARGET1:
• SETCUST.SQL: Subscription set for the target CUSTOMERS table
• SETCONT.SQL: Subscription set for the target CONTRACTS table
• SETVEHI.SQL: Subscription set for the target VEHICLES table
• SETACCI.SQL: Subscription set for the target ACCIDENTS table
• MBRCUST.SQL: Subscription member for the target CUSTOMERS table
• MBRCONT.SQL: Subscription member for the target CONTRACTS table
• MBRVEHI.SQL: Subscription member for the target VEHICLES table
• MBRACCI.SQL: Subscription member for the target ACCIDENTS table
Now, we will create the equivalent SQL scripts for sales representative 2.
To do this, use the following steps:
• Copy the content of ASNJET\SCRIPTS\TARGET1 to
ASNJET\SCRIPTS\TARGET2.
• Update SETCUST.SQL: Replace the string ’0001’ with ’0002’ everywhere.
• Update SETCONT.SQL: Replace the string ’0001’ with ’0002’ everywhere.
• Update SETVEHI.SQL: Replace the string ’0001’ with ’0002’ everywhere.
• Update SETACCI.SQL: Replace the string ’0001’ with ’0002’ everywhere.
• Update MBRCUST.SQL:
• Replace the string ’0001’ with ’0002’ everywhere.
• Find the filtering predicate ’AGENCY = 25’ (there should be only one
occurrence) and replace the 25 with the appropriate value for sales
representative 2.
In fact, two laptops (for sales representatives 1 and 2) had been configured
at that time, both having the same subsetting predicate (AGENCY = 25).
CUSTOMERS table:
db2 select CUSTNO, LNAME, FNAME, AGENCY, SALESREP from IWH.CUSTOMERS where
AGENCY = 25
VCONTRACTS view:
db2 select CONTRACT, CUSTNO, BASEFARE, CREDATE, AGENCY from IWH.VCONTRACTS
where AGENCY = 25
Similar SELECT statements, filtered on AGENCY = 25, were used to check
the contents of the VVEHICLES and VACCIDENTS views.
ASN.IBMSNAP_REGISTER table:
ASN.IBMSNAP_SUBS_SET table:
APPLY_QUAL SET_NAME WHOS_ON SOURCE_ SOURCE_ TARGET_ TARGET_
_FIRST SERVER ALIAS SERVER ALIAS
---------- --------- ------- -------- -------- -------- --------
AQSR0001 CUST0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 CUST0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 CONT0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 CONT0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 VEHI0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 VEHI0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001 ACCI0001 S SJNTDWH1 SJNTDWH1 MSJET DBSR0001
AQSR0001 ACCI0001 F MSJET DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0002 CUST0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 CUST0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 CONT0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 CONT0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 VEHI0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 VEHI0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002 ACCI0002 S SJNTDWH1 SJNTDWH1 MSJET DBSR0002
AQSR0002 ACCI0002 F MSJET DBSR0002 SJNTDWH1 SJNTDWH1
In the SUBS_SET table, notice that there are 2 rows for each subscription
set. The row with a WHOS_ON_FIRST value of "F" represents the replication
from Microsoft Access towards DB2 UDB, and the row with a
WHOS_ON_FIRST value of "S" represents the replication from DB2 UDB
towards Microsoft Access. You can also notice that the MSJET string is used
as a generic database name for Microsoft Access databases, and the real
name of the Microsoft Access database is indicated in the SOURCE_ALIAS
and TARGET_ALIAS columns.
ASN.IBMSNAP_SUBS_MEMBR table:
APPLY_QUAL SET_NAME WHOS SOURCE_ SOURCE TARGET_ TARG._ PREDICATES
_ON_ TABLE _VIEW TABLE STRUCT
FIRST _QUAL
---------- --------- ----- ---------- ------ --------- ------ -------------
AQSR0001 CUST0001 S CUSTOMERS 0 CUSTOMERS 9 (AGENCY = 25)
AQSR0001 CUST0001 F CUSTOMERS 0 CUSTOMERS 1 -
AQSR0001 CONT0001 S VCONTRACTS 1 CONTRACTS 9 (AGENCY = 25)
AQSR0001 CONT0001 S VCONTRACTS 2 CONTRACTS 9 (AGENCY = 25)
ASN.IBMSNAP_PRUNCNTL table:
ASN.IBMSNAP_SCHEMA_CHG table:
ASN.IBMSNAP_SUBS_TGTS is empty.
ASN.IBMSNAP_TRACE table:
OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT ASN0100I: The Capture program initialization is successful
PARM      ASN0103I: The Capture program started with SERVER_NAME
          SJNTDWH1; the START_TYPE is COLD ...
When you look at the ASN.IBMSNAP_TRACE table, you can see that
Capture has been triggered by ASNJET to start capturing the updates for the
source tables (see the GOCAPT messages):
OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT ASN0100I: The Capture program initialization is successful
PARM The Capture program started with SERVER_NAME SJNTDWH1; ...
GOCAPT Change Capture started for ... table name is CUSTOMERS ...
GOCAPT Change Capture started for ... table name is CUSTOMERS ...
GOCAPT Change Capture started for ... table name is CONTRACTS ...
GOCAPT Change Capture started for ... table name is CUSTOMERS ...
GOCAPT Change Capture started for ... table name is ACCIDENTS ...
GOCAPT Change Capture started for ... table name is CUSTOMERS ...
GOCAPT Change Capture started for ... table name is VEHICLES ...
When you look at the target side, you can see that in fact two Microsoft
Access databases were created by ASNJET (see Figure 83):
The target database (DBSR0001) contains the four target tables, plus some
complementary control tables (see Figure 84):
Now, open each target table to check that the content is equivalent to that of
the corresponding source table, according to the subsetting predicate
(Agency = 25). For example, the content of the target CONTRACTS table is
the following (see Figure 85):
Before starting ASNJET, check that Capture has had the time to capture the
update. For example, you can perform a SELECT over the Change Data table
(IWH.CDCONTRACTS) that is associated with the CONTRACTS table.
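For example (the column list is illustrative; the CD table also contains the
other IBMSNAP control columns):

db2 select IBMSNAP_OPERATION, CONTRACT, TAXES from IWH.CDCONTRACTS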
After ASNJET has stopped, enter Microsoft Access and open the
CONTRACTS table. The TAXES column now contains a value of 500 for
contract number 14. If you query the ASN.IBMSNAP_APPLYTRAIL table
(on the DB2 UDB side), you can also see that ASNJET has added the
following rows:
APPLY SET_NAME WHOS MASS EFF SET SET SET SOURCE TARGET
QUAL _ON_ DEL. MBR INS DEL UPD SERVER SERVER
FIRST
-------- -------- ----- ---- --- --- --- --- -------- --------
AQSR0001 CUST0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 CONT0001 S N 1 0 0 1 SJNTDWH1 MSJET
AQSR0001 CONT0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 ACCI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 ACCI0001 F N 0 0 0 0 MSJET SJNTDWH1
AQSR0001 VEHI0001 S N 0 0 0 0 SJNTDWH1 MSJET
AQSR0001 VEHI0001 F N 0 0 0 0 MSJET SJNTDWH1
After ASNJET has stopped, query the CONTRACTS table in DB2 UDB. The
TAXES column now contains a value of 800 for Contract Number 8. If you
query the ASN.IBMSNAP_APPLYTRAIL, you can also see that ASNJET has
added the following rows:
Update the CONTRACTS table in Microsoft Access. For contract 17, change
the BASEFARE column from 1250 to 5000. Then close the Microsoft Access
table.
We now have:
• 17 - 1250 - 2000 in DB2 UDB
• 17 - 5000 - 100 in Microsoft Access
Check that Capture has captured the update on the DB2 UDB side (query the
Change Data table: IWH.CDCONTRACTS).
When ASNJET has ended, check the content of the CONTRACTS tables. We
now have:
• 17 - 1250 - 2000 in DB2 UDB (unchanged)
• 17 - 1250 - 2000 In Microsoft Access
So we can see that the DB2 update has won the conflict and both databases
are left in a consistent state. In the Microsoft Access database, look at the
IBMSNAP_IWH_CONFLICT_CONTRACTS table. It contains one row that
holds the rejected Microsoft Access update.
9.7.1.1 Network
Any kind of network (for example, WAN, LAN, or phone lines) is suitable for
DataPropagator for Microsoft Jet. But DataPropagator for Microsoft Jet will
most often be used to replicate data towards occasionally connected
workstations, using phone lines. This means that special attention must be
paid to connection times and transmission costs.
9.7.1.2 Security
The general database security considerations apply. In addition, a password
file must be defined on each target workstation, named the following way:
Apply_Qualifier.PWD
This file contains the userids and passwords that are used by ASNJET when
it connects to the source server and to the control server. If the control server
is co-located with the source server, the password file contains only one row.
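For example, for the Apply qualifier AQSR0001, the file would be named
AQSR0001.PWD. Its content might look like the following sketch (the exact
keyword syntax is described in the product documentation; the server name,
userid, and password shown are illustrative):

SERVER=SJNTDWH1 USER=dbadmin PWD=secret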
9.7.1.3 Scheduling
ASNJET can be started with either the MOBILE parameter or the NOMOBILE
parameter (the default):
• With the NOMOBILE parameter, the general scheduling considerations of
any replication scenario apply. This means that the subscription sets can
be processed either according to a timing frequency or according to the
arrival of specific events, or both. In this mode, ASNJET does not stop
automatically, and so the user must stop it when he so wishes.
• With the MOBILE parameter, ASNJET essentially ignores the timing
frequency that is defined in the control tables. It processes all the
eligible subscription sets only once, and then it stops automatically (see
the example below). This is probably the option that will be chosen most
often, especially if the target workstations are occasionally-connected
laptops.
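A hypothetical invocation for the first sales representative might therefore
look like this (the exact parameter order and syntax should be checked
against the product documentation):

ASNJET AQSR0001 SJNTDWH1 MOBILE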
9.7.1.4 Locking
The general locking considerations of any replication scenario apply here.
Additionally, the user should close any Microsoft Access table he has been
updating, before starting the ASNJET program.
Furthermore, since the process to define the source and target tables does
not require any connection to the target Microsoft Access databases, all the
setup can be prepared even before the target workstations are configured.
In the ASN.IBMSNAP_SUBS_SET table there are four columns that you should check:
• ACTIVATE: Indicates whether a subscription set is active (value 1) or not
(value 0). If it is not active (value 0), it is probably because you decided
that this subscription set should not be processed. So, in fact, you only
need to check the values of the three other columns listed below, for the
rows that have the ACTIVATE column equal to 1.
Now that you have looked at the ASN.IBMSNAP_SUBS_SET table, you know
which subscription sets are OK, and which ones are not.
For those that have a problem, you must now determine what went wrong. To
do this, first have a look at the ASN.IBMSNAP_APPLYTRAIL table. In most
cases you will find helpful information there. The most interesting columns to
look at are:
• SQLCODE: This gives the SQL error code. Look at the DB2 reference
documentation to retrieve the description of SQLCODEs.
• SQLSTATE: This gives the SQLSTATE code. Look at the DB2 reference
documentation to retrieve the description of SQLSTATEs.
• APPERRM: This gives the error message.
Please make sure that you are only checking the rows that correspond to the
LASTRUN time, and not older rows.
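For example, a query such as the following shows the most recent entries
first (the column list is abbreviated):

db2 select LASTRUN, SET_NAME, SQLCODE, SQLSTATE, APPERRM from
ASN.IBMSNAP_APPLYTRAIL order by LASTRUN desc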
Just have a look at the ASN.IBMSNAP_TRACE table to see if there are error
messages. If you have the feeling that updates were not captured, you should
check that the IBMSNAP_TRACE table contains a GOCAPT message for the
source table.
And you can also of course start Capture with a trace (be careful, the
parameter is TRACE, not TRCFLOW).
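On DB2 UDB for Windows NT, for example, a warm start of Capture with
trace output might be requested as follows (the program name and the
parameter syntax vary by platform; check the product documentation):

asnccp SJNTDWH1 WARM TRACE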
On the Target side you can also find useful error information in the following
Microsoft Access tables:
• IBMSNAP_ERROR_MESSAGE: This contains the error codes and error
messages.
• IBMSNAP_ERROR_INFO: This contains error information that helps
identify the row-replica table and the row that caused the error.
If you think that some updates should have been replicated from the
Microsoft Access tables towards the DB2 tables, and were not, it is probably
because a conflict was detected, and you must have a look at the two
following tables in the Microsoft Access database:
• IBMSNAP_SIDE_INFORMATION: This contains the names of the conflict
tables.
• IBMSNAP_target_table_CONFLICT: This contains the rejected updates.
9.9 Summary
In this scenario we have illustrated the following capabilities of the DB2
DataPropagator for Microsoft Jet component (ASNJET):
• Update-anywhere replication between DB2 and Microsoft Access, in an
occasionally-connected, mobile environment. We have seen that to
achieve this goal, the ASNJET program uses both the push and the pull
modes of replication.
Among the operational aspects, we have also seen one particularly important point:
• You can very easily recover from any important loss of data in the target
Microsoft Access tables. Simply delete the target database files and
ASNJET will automatically recreate the tables the next time it is run.
So this replication solution perfectly fits the needs of people who want to
exchange data between geographically dispersed micro-computers,
equipped with Microsoft Access, and a central DB2 server.
This Appendix contains a table (Table 13) that points you to all the tips, tricks,
and smart techniques described within this redbook. It provides a quick and
easy way to find a certain technique in the book.
Table 13. Index to Data Replication Tips, Tricks, and Techniques
• How to prune CCD tables (including internal CCDs): 5.3.2.2, “Pruning of
CCD Tables” on page 92
• How to automatically prune the Apply Trail table: 5.3.2.3, “Pruning of the
APPLYTRAIL Table” on page 93
• Checking to see if the Capture process is running: 5.4.3.1, “Monitoring the
Capture Process” on page 101
• How to determine the current Capture lag: 5.4.3.3, “Capture Lag” on page 102
• How to resolve a gap with a Capture cold start: “Resolving the Gap with a
Capture COLD Start” on page 104
• How to resolve a gap without a Capture cold start: “Resolving the Gap
Manually” on page 104
• How to defer pruning for multi-vendor replication sources: 5.5.13.2, “How
to Defer Pruning for Multi-Vendor Sources” on page 127
• How to disable full refresh for all subscriptions: 5.6.2.1, “Disable Full
Refresh for All Subscriptions” on page 129
• How to disable full refresh for certain subscriptions: 5.6.2.2, “Allow Full
Refresh for Certain Subscriptions” on page 130
• Changing the Apply Qualifier or set name for a subscription set: 5.6.6,
“Changing Apply Qualifier or Set Name for a Subscription Set” on page 134
• Using SPUFI on OS/390 to access non-IBM databases: 6.4, “Nice Side
Effect: Using SPUFI to Access Multi-Vendor Data” on page 158
• How to maintain a change history (CCD) table in a non-IBM target: 8.4.2,
“Maintaining a Change History for Suppliers” on page 220
• How to denormalize data using target-site views: 8.4.3, “Using Target Site
Views to Denormalize Outlet Information” on page 228
• How to replicate only certain SQL operations: 8.4.3, “Using Target Site
Views to Denormalize Outlet Information” on page 228
• How to push down the replication status to non-IBM targets: 8.4.8,
“Pushing Down the Replication Status to Oracle” on page 259
• How to load data from a DB2 for OS/390 source to an Oracle target by
using DataJoiner’s INSERT...SELECT...: 8.4.9.1, “Using SQL
INSERT....SELECT.... from DataJoiner” on page 262
• How to load data from a DB2 for OS/390 source to an Oracle target by
using DataJoiner’s EXPORT/IMPORT utilities: 8.4.9.2, “Using DataJoiner’s
EXPORT/IMPORT Utilities” on page 263
• How to load data from a DB2 for OS/390 source to an Oracle target by
using DSNTIAUL and Oracle’s SQL*Loader: 8.4.9.3, “Using DSNTIAUL
and Oracle’s SQL*Loader Utility” on page 264
• Dealing with the double-delete issue when replicating join views: 9.1.2,
“Comments about the Table Structures” on page 273
For full information about configuring the non-IBM clients and databases,
always refer to the documentation for that particular database software.
Advice: Many Oracle DBAs keep copies of the tnsnames.ora files used within
their organizations. Ask the DBA for permission to copy this pre-configured
file to your workstation.
For more information about configuring Oracle clients, see the Oracle Net8
Administrator’s Guide, A58230-01.
Scott is the sample userid provided with Oracle, and tiger is Scott’s
password. If this userid has been revoked, then contact the Oracle DBA for a
valid userid and password.
Here are a few useful tips once you have logged onto the Oracle server:
• End all SQL*Plus commands with a semicolon ( ; )
• To find the structure of an Oracle table use this command:
DESCRIBE <tablename>;
• To find out who you are logged onto Oracle as, issue the command:
SELECT * FROM USER_USERS;
• To invoke SQL*Plus and use a file as input, use the command:
sqlplus user/pwd@orainst @<input_file>
Put a quit; at the end of the input_file and SQL*Plus stops when finished.
• Use spool <out_filename>; to dump output to an output file, and spool off;
to stop dumping the output to a file.
• Use COMMIT; to commit the changes. There is no auto-commit.
Table 14 provides details on some of the more useful Oracle data dictionary
views.
To start Oracle, issue the startup command from the server manager; to stop
Oracle, use the shutdown command. Before issuing these commands you
usually have to issue the connect internal command. For more information,
refer to the Oracle server documentation.
TNSPING is similar to the TCP/IP ping command, except that it pings the Oracle
database to see if basic database connectivity is working. For example, if
your Oracle server is named AZOV, then type the following from the operating
system command line:
tnsping azov
The Trace Route Utility ( TRCROUTE) allows you to discover what path or route a
connection is taking from a client to a server. If a problem is encountered,
TRCROUTE returns an error stack to the client, which makes troubleshooting
easier. For information on how to use the TRCROUTE utility, see the Oracle8
Administrator’s Guide, A58397-01 .
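By analogy with TNSPING, checking the route to the AZOV server would look
something like this (verify the exact syntax in the Oracle documentation):

trcroute azov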
You can enter information in the sqlhosts file by using a standard text editor (copy
a sample from $INFORMIXDIR/etc/sqlhosts.std). The table-like structure of the file
is shown in the example below:
dbservername nettype hostname port options
sjazov_ifx01 onsoctcp azov 2800
sjstar_ifx01 onsoctcp azov 2801
sjsky_ifx01 onsoctcp sky 2810
Advice: Like the Oracle tnsnames.ora file, many Informix DBAs will have a
copy of this file customized for use within their organization. If you ask
nicely, they will usually let you copy the file to your Informix client.
For example, to group several statements into a single transaction:
BEGIN WORK;
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
COMMIT;
Microsoft also provides a graphical user interface called the SQL Server
Query Analyzer.
B.3.7 ODBCPing
This utility checks database connectivity from client to Microsoft SQL Server
databases accessed via ODBC. The syntax of the command is:
ODBCPING [-S Server | -D DSN] [-U login id] [-P Password]
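For example, assuming a server named MSSQL1 (the server name and
credentials are illustrative):

ODBCPING -S MSSQL1 -U sa -P secret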
For Sybase, an entry in the interfaces file looks like this:
SYBSVR2
    master tcp ether 137.12.111.42 3048
    query tcp ether 137.12.111.42 3048
This Appendix contains the SQL generated from DJRA for the various
replication definitions which were configured in case study 2.
-- create the index for the change data table for LIYAN.ITEMS
CREATE TYPE 2 UNIQUE INDEX LIYAN.CDI00LIYANCD_ITEMS ON LIYAN.LIYANCD_ITEMS
(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
This Appendix contains the SQL generated from DJRA for the various
replication definitions which were configured in case study 3. The
modifications to the generated SQL are shown in bold typeface.
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ;
--*
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’
--*
--* CONNECTing TO DJDB USER djinst5 USING pwd;
--*
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
--*
--* connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
-- create the index for the change data table for ITSOSJ.SUPPLIER
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDSUPPLIER ON
ITSOSJ.CDSUPPLIER(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
--*
--* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 3
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ SUPPLIER
--* NONEEXECLUDED CCD=NYNNN NONE SIMON SUPPLIER NODATAJOINER U
--*
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSSUPPLIER MANAGED BY DATABASE USING (FILE
--   ’/data/djinst5/djinst5/SUPPLIER.F1’ 2000 );
-- create the index for the change data table for ITSOSJ.REGION
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI0000000CDREGION ON
ITSOSJ.CDREGION(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
--*
--* Calling TABLEREG for source table ITSOSJ.STORE
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ STORE AFTER NONEXCLUDED
--* DELETEINSERTUPDATE NONE N
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- create the index for the change data table for ITSOSJ.STORE
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDSTORE ON ITSOSJ.CDSTORE(
IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- now done interpreting REXX logic file TARGSVR.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSSTORE MANAGED BY DATABASE USING (FILE
--   ’/data/djinst5/djinst5/STORE.F1’ 2000 );
E.7 Output from Register the Items, ProdLine, and Brand Tables
--* File Name: register_items+prodline+brand.sql
--*
--* Calling TABLEREG for source table ITSOSJ.BRAND
--*
-- create the index for the change data table for ITSOSJ.BRAND
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDBRAND ON ITSOSJ.CDBRAND(
IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
--*
--* Calling TABLEREG for source table ITSOSJ.ITEMS
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ ITEMS AFTER NONEXCLUDED
--* DELETEINSERTUPDATE NONE
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- create the index for the change data table for ITSOSJ.ITEMS
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDITEMS ON ITSOSJ.CDITEMS(
IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
--*
--* Calling TABLEREG for source table ITSOSJ.PRODLINE
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ PRODLINE AFTER NONEXCLUDED
--* DELETEINSERTUPDATE NONE
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- create the index for the change data table for ITSOSJ.PRODLINE
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDPRODLINE ON
ITSOSJ.CDPRODLINE(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
-- input view OWNER=DB2RES5 input view NAME=PRODUCTS
-- connect to the source server
CONNECT TO SJ390DB1 USER db2res5 USING pwd ;
COMMIT;
--*
--* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 2
--*
--* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET DB2RES5 PRODUCTS
--* NONEEXECLUDED CCD NONE SIMON PRODUCTS NODATAJOINER U
--*
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSPRODUCTS MANAGED BY DATABASE USING (FILE
--   ’/data/djinst5/djinst5/PRODUCTS.F1’ 2000 );
-- create the index for the change data table for DB2RES5.SALES
CREATE TYPE 2 UNIQUE INDEX DB2RES5.CDI00000000CDSALES ON
DB2RES5.CDSALES(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--*
-- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSSALES MANAGED BY DATABASE USING (FILE
--   ’/data/djinst5/djinst5/SALES.F1’ 2000 );
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
SUM_OUTPRC=
(SELECT CASE
WHEN SUM(DIFFERENCE_OUTPRC) IS NULL THEN A.SUM_OUTPRC
ELSE SUM(DIFFERENCE_OUTPRC) + A.SUM_OUTPRC
END
FROM SIMON.MOVEMENT M
WHERE A.COMPANY=M.COMPANY AND A.LOCATION=M.LOCATION),
IBMSNAP_HLOGMARKER=
(SELECT CASE
WHEN MAX(M.IBMSNAP_HLOGMARKER) IS NULL THEN A.IBMSNAP_HLOGMARKER
ELSE MAX(M.IBMSNAP_HLOGMARKER)
END
FROM SIMON.MOVEMENT M)’,’0000002000’);
--
--
-- Add some more SQL-after statements to add rows when new COMPANYs and
-- LOCATIONs are created.
--
--
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
This Appendix contains the SQL generated from DJRA for the various
replication definitions which were configured in case study 4. The
modifications to the generated SQL are shown in bold typeface.
COMMENT ON IWH.CUSTOMERS (
CUSTNO IS ’Customer number’,
LNAME IS ’Last name’,
FNAME IS ’First name’,
SEX IS ’Sex’,
BIRTHDATE IS ’Birth date’,
AGENCY IS ’Agency code’,
SALESREP IS ’Sales rep in charge of the customer’,
ADDRESS IS ’Customer Address’,
LICNB IS ’Driving licence number’,
LICCAT IS ’Driving licence category’,
LICDATE IS ’Driving licence date’) ;
-- Contracts table
CREATE TABLE IWH.CONTRACTS (
CONTRACT INTEGER NOT NULL,
CONTYPE CHAR(2) NOT NULL,
CUSTNO CHAR(8) NOT NULL,
LIMITED CHAR(1),
BASEFARE DECIMAL(7, 2),
COMMENT ON IWH.CONTRACTS (
CONTRACT IS ’Contract number’,
CONTYPE IS ’Contract type’,
CUSTNO IS ’Customer number’,
LIMITED IS ’Warranty excludes fire/glass break’,
BASEFARE IS ’Annual base fare’,
TAXES IS ’Taxes’,
CREDATE IS ’Creation date’) ;
-- Vehicles table
CREATE TABLE IWH.VEHICLES (
PLATENUM CHAR(12) NOT NULL,
CONTRACT INTEGER NOT NULL,
CUSTNO CHAR(8) NOT NULL,
BRAND CHAR(10),
MODEL CHAR(10),
COACHWORK CHAR(1),
ENERGY CHAR(2),
POWER DECIMAL(4, 0),
ENGINEID CHAR(10),
VALUE DECIMAL(10, 0),
FACTORDATE DATE,
ALARM CHAR(1),
ANTITHEFT CHAR(1),
PRIMARY KEY(PLATENUM))
DATA CAPTURE CHANGES ;
COMMENT ON IWH.VEHICLES (
PLATENUM IS ’Plate-number’,
CONTRACT IS ’Contract number’,
CUSTNO IS ’Customer number’,
BRAND IS ’Brand’,
MODEL IS ’Model’,
COACHWORK IS ’Coachwork type code’,
ENERGY IS ’Energy type’,
POWER IS ’Power’,
ENGINEID IS ’Engine identification number’,
VALUE IS ’Initial purchase value’,
FACTORDATE IS ’Date of exit from factory’,
ALARM IS ’Alarm feature code’,
ANTITHEFT IS ’Anti-theft feature code’) ;
COMMENT ON IWH.ACCIDENTS (
CUSTNO IS ’Customer number’,
ACCNUM IS ’Accident record number’,
TOWN IS ’Town where accident happened’,
REPAIRCOST IS ’Repair cost’,
STATUS IS ’Status’,
ACCDATE IS ’Accident Date’) ;
Notice: We adapted the generated SQL script before running it, to change
the name of the Change Data table.
--* echo input: TABLEREG SJNTDWH1 IWH CONTRACTS BOTH NONEEXCLUDED
--* DELETEINSERTUPDATE STANDARD N
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- create the index for the change data table for IWH.CONTRACTS
CREATE UNIQUE INDEX IWH.CDICONTRACTS ON IWH.CDCONTRACTS(
IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
COMMIT;
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
-- input view OWNER=IWH input view NAME=VCONTRACTS
-- connect to the source server
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
COMMIT;
-- Satisfactory completion at 6:40pm
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJNTDWH1 USER DBADMIN USING pwd;
--*
--* The ALIAS name ’SJNTDWH1’ matches the RDBNAM ’SJNTDWH1’
--*
--* connect to the CNTL_ALIAS
--*
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--*
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
IBM may have patents or pending patent applications covering subject matter
in this document. The furnishing of this document does not give you any
license to these patents. You can send license inquiries, in writing, to the IBM
Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood,
NY 10594 USA.
Licensees of this program who wish to have information about it for the
purpose of enabling: (i) the exchange of information between independently
created programs and other programs (including this one) and (ii) the mutual
use of the information which has been exchanged, should contact IBM
Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The information about non-IBM
("vendor") products in this manual has been supplied by the vendor and IBM
assumes no responsibility for its accuracy or completeness. The use of this
information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and
integrate them into the customer’s operational environment.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of
these Web sites.
The following document contains examples of data and reports used in daily
business operations. To illustrate them as completely as possible, the
examples contain the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
Reference to PTF numbers that have not been released through the normal
distribution process does not imply general availability. The purpose of
including these reference numbers is to alert IBM customers to specific
information relative to the implementation of the PTF when it becomes
available to each customer according to the normal IBM PTF distribution
process.
Informix, Informix Dynamic Server, Informix ESQL/C, and Informix Client SDK
are trademarks of Informix Corporation.
Microsoft, Windows, Windows NT, the Windows logo, and Access are
trademarks of Microsoft Corporation in the United States and/or other
countries.
MMX, and Pentium are trademarks of Intel Corporation in the United States
and/or other countries. (For a complete list of Intel trademarks see
www.intel.com/dradmarx.htm)
SET and the SET logo are trademarks owned by SET Secure Electronic
Transaction LLC.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
http://www.software.ibm.com
http://www.software.ibm.com/data
http://www.software.ibm.com/data/dpropr
http://www.software.ibm.com/data/datajoiner
http://www.software.ibm.com/data/db2/performance
http://www.software.ibm.com/data/db2/performance/dprperf.htm
This section explains how both customers and IBM employees can find out about ITSO redbooks,
CD-ROMs, workshops, and residencies. A form for ordering books and CD-ROMs is also provided.
This information was current at the time of publication, but is continually subject to change. The latest
information may be found at http://www.redbooks.ibm.com/.
Redpieces
For information so current it is still in the process of being written, look at "Redpieces" on the
Redbooks Web Site (http://www.redbooks.ibm.com/redpieces.html). Redpieces are redbooks in
progress; not all redbooks become redpieces, and sometimes just a few chapters will be published
this way. The intent is to get the information out much quicker than the formal publishing process
allows.