
WHITE PAPER

BMC APPLICATION RESTART CONTROL
A return on investment analysis

Driving business value through collaborative intelligence

WWW.OVUM.COM

Written by: Martin Gandar

Published September 2010, Ovum

Contents
Executive summary
Introduction
Methodology
The business problem
The BMC solution: APPLICATION RESTART CONTROL
BMC APPLICATION RESTART CONTROL
AR/CTL for DB2
AR/CTL for IMS
AR/CTL for VSAM
Customer experiences
Customers' general and site-specific assessment of BMC's tools and support
BMC APPLICATION RESTART CONTROL return on investment
Performance savings
Problem resolution savings
Savings in staff utilization
Increased systems availability
Migration savings
Risk mitigation/business value
Analysis and conclusion
Appendix
Supporting customer evidence on the value and capabilities of BMC APPLICATION RESTART CONTROL

Ovum. This White Paper is a licensed product and is not to be photocopied


EXECUTIVE SUMMARY
The batch environment is still of major importance in delivering mainframe-based, mission-critical business
applications. As transaction rates and volumes have increased, its capacity to cope with the updates often
required to support important online environments has been squeezed. Being able to run batch applications in
parallel and concurrently with online applications has become vital for many organizations. In order to do that,
the batch activities need to avoid contention and locking out other processes. When problems arise they need
to be resolved effectively and automatically. That is why BMC developed APPLICATION RESTART
CONTROL (AR/CTL) for IMS, DB2, and VSAM: to coordinate the restart and recovery of batch processes
that have abended or are causing other problems. At the core of BMC AR/CTL is the way it manages checkpoint
strategies, helping to automate the insertion of checkpoints and pacing their use so that, no matter how many
checkpoints are issued, only a sensible number of them are actually processed. This saves massive amounts
of CPU time whilst ensuring that there are secure points from which to restart programs rather than restarting
from the beginning.
In this paper we approached a number of major BMC customers to see how highly they valued BMC
AR/CTL and to monetize their experiences to determine a potential return on investment (ROI) for those that
have yet to adopt the technology.
We found that the different sites often used the wide-ranging capabilities of AR/CTL in different ways, some
not using features that were of critical value to others, such as the Virtual Storage Access Method (VSAM)
support. However, the universal response was that BMC AR/CTL was a critical element in
enabling their online systems to maintain the service levels required, and that without it the batch stream
would often be unable to process the background updates required to support it.
The ROI calculations were based on anecdotal and measured evidence from the customers and from our
own judgment on what would be realistic to expect for a significant site taking on BMC AR/CTL if they had
some or all of the problems that the sites in question had before its adoption.
We looked at the following areas and gave our assessment of what we felt was a realistic possible return
based on what these companies had told us.

Performance: reducing CPU usage through pacing

Problem resolution: automating the recovery from errors

$250,000

Staff savings: reducing the analysis and manual recovery effort

$150,000

Increased systems availability: through extending online availability

$1,000,000

Avoiding migration: for VSAM sites, if applicable

$2,000,000

Risk mitigation: the value of reduced downtime

$1,500,000

$1,000,000

We can't say that all sites would get close to a $4 million payback (assuming they don't require the VSAM
element) through the adoption of BMC AR/CTL, and clearly many will have resolved or avoided the problems
in other ways. We would say, however, that BMC's solution was regarded as indispensable by these sites, that they
found it of immense value, and that the support given by BMC was found to be exceptionally good.


INTRODUCTION
METHODOLOGY
To better understand how BMC AR/CTL provides business value to those that make it a key part of their
mainframe systems management strategy, Ovum undertook interviews with a number of users of these
products. These interviews covered companies in a wide range of industries including a medical lab, a
government agency, logistics, and financial and insurance services.

THE BUSINESS PROBLEM


The batch stream is critical to most mainframe environments, undertaking fundamental data transformations
that enable the customer-facing online applications to deliver the organization's mission-critical applications.
In a very high percentage of installations, however, the batch programs are now extremely mature, many having
been written decades ago. This presents a number of challenges:

The code may be poorly understood or undocumented, making it dangerous to amend.

The code may have been tuned for a generation of mainframes that had significantly less processing
power than is currently available, such that the checks and controls built into them are no longer
processing efficiently.

They may have been built for earlier generations of operating environment from which, in an ideal world, one
would wish to migrate, but the cost of such migration and redevelopment might be prohibitive.

The scale of work undertaken by these batch programs may have grown exponentially over the years
such that each batch run is now performing a massive number of important updates and takes a
considerable time to run. Restarting them from the beginning in the event of a problem may simply be
impossible given the available batch capacity.

The need for 24/7 online applications that access or share information that is updated by batch programs
dictates that there will be a conflict of interest in managing the locks and the access to that information
which needs to be very carefully managed.

The temptation may be to leave the environment alone and run these programs as they have always run;
maybe overnight in a batch window or at least in an environment that isolates problems from your main online
systems. In the past you may have had on-call supervisors/analysts who would notice an abended batch
process and have the appropriate knowledge regarding the best way to restart it, or you may have waited
until the morning to re-run failed programs. Either way this risked holding up important updates and was a
waste of time and resources.
Figure 1: Delayed restart (Source: BMC)


Figure 1 makes the point that the time delay involved in manually resolving an abend and restarting from the
beginning is wasteful. For example, if the think and recover time was seven hours then a five-hour job that
abended after four hours has just become a sixteen-hour job once completed.
This time delay may well be acceptable in some sites. But if the batch processes perform critical updates
necessary for the smooth running of your business it may be totally unacceptable: for example, if you need
these to have completed before you can run your critical online systems, or if you are running them
concurrently with the online environment and so increasing the risk of further lock contention and other
clashes.
So what mechanisms exist to help resolve these issues or avoid them?
When a batch job fails, it must be restarted at the point of failure or at the beginning of the job (after recovery
of affected databases and files). It is not practical to back out everything and start from the beginning because
backing out updates can take twice as long as running the application. Nor is it practical to restart from the
wrong point, then back out errors and start all over again. The only way to restart at the point of failure is to
ensure that the application takes periodic checkpoints of the contents of application working storage areas
and knows the position of flat files so that they can be repositioned correctly at restart.
Taking too many checkpoints wastes CPU resources, but taking too few lengthens the time required to
recover the failed job. So there is an optimal level to the number of checkpoints that you would want to take,
and indeed this might vary during the course of the day and night.
You also have to ensure that you have checkpoints that cover all the applications and subsystems that might
cause contentions or problems including IMS and DB2.
If you get the balance right and have checkpoints that cover all the different potentially conflicting areas then
you will be able to back out only a minimum of updates or put a hold on a process whilst the conflicts are
removed and then restart, or reattach and restart, without wasting time or effort. Most importantly, as this can
be done automatically, you are not then waiting on manual supervisor intervention.
Figure 2: Automated restart (Source: BMC)

Figure 2 shows the much more comfortable situation where an abend was correctly checkpointed and the
five-hour batch application was able to be automatically restarted from checkpoint 15. This job, which failed after
four and a half hours, may well have completed in a little over six hours.
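
The difference between the two figures can be sketched with a little arithmetic. The following is a minimal illustration in Python, not a BMC tool: the function names are hypothetical, and for the Figure 2 case the position of the last checkpoint (4.25 hours into the job) and the automated restart delay (one hour) are assumed values chosen only to land near the "little over six hours" mentioned in the text.

```python
def elapsed_full_rerun(job_hours, failed_after_hours, think_and_recover_hours):
    """Elapsed hours when the failed job is rerun from the beginning (Figure 1)."""
    # Time already spent + manual analysis and recovery + a complete rerun.
    return failed_after_hours + think_and_recover_hours + job_hours


def elapsed_checkpoint_restart(job_hours, failed_after_hours,
                               last_checkpoint_hours, restart_delay_hours):
    """Elapsed hours when the job resumes from its last checkpoint (Figure 2)."""
    # Only the work after the last checkpoint has to be repeated.
    remaining_work = job_hours - last_checkpoint_hours
    return failed_after_hours + restart_delay_hours + remaining_work


# Figure 1 scenario from the text: five-hour job, abend after four hours,
# seven hours of think and recover time.
delayed = elapsed_full_rerun(job_hours=5, failed_after_hours=4,
                             think_and_recover_hours=7)

# Figure 2 scenario: abend after 4.5 hours. The checkpoint position (4.25 h)
# and the restart delay (1 h) are illustrative assumptions, not source data.
automated = elapsed_checkpoint_restart(job_hours=5, failed_after_hours=4.5,
                                       last_checkpoint_hours=4.25,
                                       restart_delay_hours=1.0)

print(f"Full rerun: {delayed} hours; checkpoint restart: {automated} hours")
```

With these inputs the full rerun takes 16 hours, matching the text, while the checkpoint restart takes 6.25 hours under the assumed values.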


THE BMC SOLUTION: APPLICATION RESTART CONTROL
BMC's solution to the automated handling of batch restarts is BMC APPLICATION RESTART CONTROL
(AR/CTL). AR/CTL products are available for DB2, IMS, and VSAM.
The tools provide the ability to automatically resume the processing of failed or interrupted batch applications
from the most recent checkpoint, rather than from the beginning of the job step. AR/CTL also offers the
opportunity to balance performance, restart time, and checkpoint overhead by controlling the checkpoint
frequency independently of the checkpoints actually written into the application's internal code.
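
The idea of controlling checkpoint frequency outside the application can be sketched as a simple filter that sits between the program's checkpoint requests and the checkpoints actually taken. This is only an illustration of the concept: the paper does not describe AR/CTL's pacing algorithm, and the `CheckpointPacer` class, its count-based policy, and the 1-in-100 ratio used below are all assumptions made for the example (a real pacing mechanism might equally be time-based).

```python
class CheckpointPacer:
    """Honor only every Nth checkpoint request (hypothetical count-based policy)."""

    def __init__(self, every_nth):
        if every_nth < 1:
            raise ValueError("every_nth must be at least 1")
        self.every_nth = every_nth
        self.requested = 0  # checkpoints the application asked for
        self.taken = 0      # checkpoints actually processed

    def request(self):
        """Record a checkpoint request; return True if it should be processed."""
        self.requested += 1
        if self.requested % self.every_nth == 0:
            self.taken += 1
            return True
        return False


# An application that requests 5,000 checkpoints, paced at 1 in 100 (assumed ratio).
pacer = CheckpointPacer(every_nth=100)
for _ in range(5_000):
    pacer.request()

print(f"{pacer.requested} requested, {pacer.taken} actually taken")
```

The application code is unchanged in this model; only the pacing ratio is tuned, which is the trade-off between restart time and checkpoint overhead described above.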

BMC APPLICATION RESTART CONTROL


AR/CTL products provide the following features:

Automatic restart checkpoint selection: ensures integrity and shortens the restart time by determining
which checkpoint to restart from.

Application working storage checkpointing: can capture and restore an application program's working
storage areas in main memory. This allows the program to resume processing at the last checkpoint. It
can also capture and restore saved areas of virtual storage for subprograms executing under the main
program.

Application reattach: improves the operational stability of many application environments by providing
automation to react to certain types of abend conditions. Abends often result from lock contention. Often
the resolution to these conditions is well understood and can be applied automatically. This makes it
possible to schedule update processes to run in parallel rather than serially because when locks occur
they do not cause significant difficulties.

Checkpoint and restart coordination: coordinates DB2, IMS, and CICS/VSAM restarts so that there is a
synchronized restart of batches that touch multiple environments.

Automatic checkpoint: simplifies and speeds the process of implementing checkpoint/restart logic in
application programs.

Program exception handling: automatically redirects bad input data that causes S0C7 abends into a
reject file and lets the application continue. Redirected records can be cleaned up later and resubmitted.

Flat files: automatically manages flat files and ensures that the contents of the files are synchronized
with database activity when checkpoints are issued. During restart processing, the files are automatically
repositioned to their state as of the latest checkpoint.

Suspend and resume processing: obtains a point of consistency required for reorganization or
recovery by forcing a well-organized abend and then restarting the batch processes when contention is
no longer a problem. It works with the following products:
BMC Backup and Recovery Solution for IMS

BMC MAXM Reorg/Online for IMS

BMC IMAGE COPY PLUS

BMC RECOVERY MANAGER for IMS

BMC REORG PLUS for DB2 Online Feature

BMC Fast Path Online Restructure/EP

AR/CTL FOR DB2


AR/CTL provides DB2-oriented features:

SQL return code handling: can intercept a defined SQL return code received during application program
processing and issue a user-defined abend code and reason code. This can be used to standardize
-911 (deadlock or timeout) processing throughout an entire application environment.

Cursor repositioning: most checkpoint restart solutions can effectively save working storage, but
AR/CTL for DB2 can also return the application to the proper position within the cursor. This removes the need
to add logic to your DB2 applications to track and store the cursor position for use in a checkpoint restart.

Batch attachment facility: performs the attachment to DB2 on behalf of the application and can run in an
attach-only mode to provide the DB2 attach facility for programs not using checkpoint/restart services.

AR/CTL FOR IMS


AR/CTL provides IMS-oriented features:

Restart with no code changes: fully supports and enhances the IMS Extended Restart facility,
requires no application code or JCL changes, and eliminates the need to change application code to call
a third-party restart program.

Flat file management: supports and manages IMS generalized sequential access method (GSAM)
files and native file techniques; there is no need to convert flat files to GSAM.

Checkpoint management: externally filters excessive checkpoint activity to provide significant savings
in elapsed time and CPU consumption. Many legacy applications were developed to run on slower
processors and the checkpoint intervals were never recalibrated for hardware upgrades.

Database recovery control (DBRC) conversion aid: can automatically provide a logging environment
to avoid having to retrofit Data Language/I job control language (DL/I JCL) when
converting an application to run under DBRC.

AR/CTL FOR VSAM


AR/CTL provides VSAM-oriented features:

Local VSAM access services: for VSAM data sets that are accessed exclusively by a batch VSAM
application program, provides checkpoint support and automatic backout support for VSAM files.

Database management system (DBMS) synchronization: automatically synchronizes VSAM
checkpoint/restart activity with DB2 or IMS checkpoint processing.

VSAM file sharing: supports remote VSAM file sharing between batch applications and CICS regions
executing on the same or different z/OS images. This allows batch application programs to update
VSAM files while they are online to CICS and in full update mode, and makes it possible to avoid
converting a VSAM file to DB2 or IMS to provide 24x7 access to the file.


CUSTOMER EXPERIENCES
CUSTOMERS' GENERAL AND SITE-SPECIFIC ASSESSMENT OF BMC'S TOOLS
AND SUPPORT
Ovum has a high regard for BMC's mainframe management solutions, and this is clearly supported by its
client base.
Figure 3: BMC customers' assessment of the quality of BMC tools and support (Source: Ovum survey of BMC customer base)

Figure 3 shows the customers' average assessment of the overall quality of the BMC mainframe
management tools and of BMC's support. They were asked to score these on a scale where 1 represents
poor quality or support and 5 represents excellence.

Figure 4: Customers' valuation of functionality specific to their own particular site (Source: BMC customer base)

To create Figure 4 we asked the customers to review the specific usage of the features of BMC AR/CTL at
their own sites and to assess the relative value to them based on the following guidelines: if
they didn't use a feature we scored it as 1; if they did use it, they were to score it 2 if it was of little use
and 5 if it was regarded as extremely important and useful to the site. A
maximum of 25 would then be possible for a site that used and found valuable every feature of AR/CTL that
we identified in the list.
You can see from the diagram that only one site made extensive use of the VSAM support features and that
it rated this highly. The main strengths of the solution were what we might have expected, in that it
reduces the elapsed time needed to run batch jobs and increases the availability of the online services.
What also became clear from these scores is that AR/CTL reduced the effort and problems caused by
the sort of issues that it was developed to fix, reducing the burden on staff and making the response to
problems much faster and easier.

Figure 5: The average value provided to each specific site of key functionality (Source: Ovum survey of BMC customer base)

Taking these figures and averaging them provides the summarized view of site-specific value given in Figure
5. We stress again that this is a chart that shows the relative value of the features to these sites rather than
an analyst's technical evaluation of their particular capabilities. It is clear from the results that, although the
VSAM support is not particularly useful to most sites (while being invaluable to those that need it because it
removed the need for an expensive migration), the general set of features is highly regarded and seen to
offer significant value. This is something we will explore in the next section of this paper.
Perhaps we should finish this section by quoting the software support manager at the government agency,
who said: "I have loved this product ever since we have had it. It's really easy to maintain and update and
because I was one of the people who brought it in I feel that I know it inside out."

BMC APPLICATION RESTART CONTROL RETURN ON INVESTMENT
When we interviewed BMC's clients we were given anecdotal evidence, sometimes supported by solid
metrics. We found common themes across these sites and strong evidence of value. The busy systems
management staff weren't, however, able to provide detailed metrics for every capability supported by BMC
AR/CTL. Nevertheless we gathered enough evidence, which is presented in more detail as an appendix to
this paper, to monetize their experiences and develop a strong ROI case that we present here.


PERFORMANCE SAVINGS
Michael Pope at Safeco has some 6,000 batch jobs run daily that are registered with BMC AR/CTL. For
every thousand checkpoints requested the pacing mechanism executes perhaps only ten. The average
number of checkpoints requested is roughly 5,000 per batch job. Without AR/CTL they would obviously have
worked hard to reduce the checkpoints some other way, but we think it fair to suggest that they might have got
the number down to 50 rather than ten, so that AR/CTL is saving them at least 40 checkpoints per thousand
requested per batch run. Michael said that checkpoints were sub-second and we will take 0.1 of a second for
this estimate. We will also take a conservative estimate of cost for the CPU usage.
Cost of CPU = $0.15 per second
The value of CPU time saving per day through the use of the pacing function is thus:
Number of jobs run * number of thousands of checkpoints pre-pacing * saving per thousand * time
to checkpoint * cost per second of CPU
6,000 * 5 * 40 * 0.1 * 0.15 = $18,000 per day or $5,400,000 per annum (at 300 days a year)
The medical laboratory had measured their actual CPU savings and said that the cost savings were:
Number of processes run per day * savings through pacing per process in minutes * 60 * CPU cost
per second
29 * 12 * 60 * 0.15 = $3,132 per day or $939,600 per annum (at 300 days a year)
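
The two performance calculations above can be reproduced with a short script, which also makes it easy to substitute a site's own figures. This is a sketch only: the function and constant names are hypothetical, and the 300 processing days per year is an assumption inferred from the ratio between the daily and annual totals quoted in the text.

```python
PROCESSING_DAYS_PER_YEAR = 300  # assumption: implied by the annual totals in the text
CPU_COST_PER_SECOND = 0.15      # dollars; the paper's conservative estimate


def pacing_saving_per_day(jobs, thousands_of_checkpoints_per_job,
                          checkpoints_saved_per_thousand, seconds_per_checkpoint):
    """Daily dollar saving from checkpoints that pacing avoids processing."""
    cpu_seconds_saved = (jobs * thousands_of_checkpoints_per_job
                         * checkpoints_saved_per_thousand * seconds_per_checkpoint)
    return round(cpu_seconds_saved * CPU_COST_PER_SECOND, 2)


def measured_saving_per_day(processes, minutes_saved_per_process):
    """Daily dollar saving from a measured CPU-time reduction per process."""
    return round(processes * minutes_saved_per_process * 60 * CPU_COST_PER_SECOND, 2)


safeco_daily = pacing_saving_per_day(6_000, 5, 40, 0.1)
lab_daily = measured_saving_per_day(29, 12)

for name, daily in (("Safeco", safeco_daily), ("Medical laboratory", lab_daily)):
    annual = daily * PROCESSING_DAYS_PER_YEAR
    print(f"{name}: ${daily:,.2f} per day, ${annual:,.2f} per annum")
```

The result is most sensitive to the assumed time per checkpoint and the CPU cost per second, so those are the first inputs a site would want to replace with measured values.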

PROBLEM RESOLUTION SAVINGS


The government agency's software manager said that they had about 15 critical abends a week on their
300 most important batch jobs. She suggested that it might take up to six hours to manually recover from
such a problem and that the analysis and recovery might involve a number of people. If we think merely in
terms of the cost of the people involved, forgetting the CPU cost, and say that the average resolution took two
people three hours for each problem then:
Cost of a trained support staff member = $100 per hour
Cost per week = number of critical abends * number of staff involved * elapsed time to resolve *
cost per hour
= 15 * 2 * 3 * $100 = $9,000 per week or $468,000 per annum
Petra Kopp at Hermes had similar savings and suggested that just the elegant mechanism for forcing a
managed abend and restart saves them five hours of processing effort per week.
Taking the cost of elapsed processor time as:
Cost of elapsed time = $2 per minute
Then the saving per week would be:
Number of hours per week saved * 60 * cost per minute
= 5 * 60 * 2 = $600 per week or $31,200 per annum
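
As a cross-check, the two weekly calculations above can be written out as follows. The function names are illustrative only; the inputs are the figures quoted in the text, annualized over 52 weeks.

```python
WEEKS_PER_YEAR = 52


def weekly_staff_cost(abends_per_week, staff_per_abend, hours_per_abend, cost_per_hour):
    """Weekly cost of the people needed to recover manually from critical abends."""
    return abends_per_week * staff_per_abend * hours_per_abend * cost_per_hour


def weekly_elapsed_time_cost(hours_saved_per_week, cost_per_minute):
    """Weekly value of elapsed processing time saved by managed abend and restart."""
    return hours_saved_per_week * 60 * cost_per_minute


agency_weekly = weekly_staff_cost(15, 2, 3, 100)  # government agency
hermes_weekly = weekly_elapsed_time_cost(5, 2)    # Hermes

print(f"Government agency: ${agency_weekly:,} per week, "
      f"${agency_weekly * WEEKS_PER_YEAR:,} per annum")
print(f"Hermes: ${hermes_weekly:,} per week, "
      f"${hermes_weekly * WEEKS_PER_YEAR:,} per annum")
```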


Michael Pope's major site at Safeco gets 200 problems a day that are resolved automatically by
reattaches. Michael said: "I can only say that if each of these was an incident, i.e., a job failure with on-call
support and action, then reattach is providing significant savings in time and effort." We suggest that in his
case it may be many times the cost experienced at Hermes.

SAVINGS IN STAFF UTILIZATION


The medical lab company's support manager said that it would take 20 minutes to manually resolve a
contention issue and that on any given day he might get 12 or 13 of these. Just as interesting, it often
took two or more staff members to resolve such issues. He said that the automated resolution saved CPU
time but also allowed staff to get on with other activities. Considering just the improvements in staff utilization
we get a saving of:
Number of staff involved * cost of staff per hour * time to resolve incident * number of incidents
2 * $100 * (20/60) * 12 = $800 per day or roughly $240,000 per annum
He said in summary that he felt that the use of AR/CTL had saved him at least one fully paid-up staff member,
which we normally cost at $150,000 and which seems to support our calculated figure.
Petra Kopp at Hermes said: "We'd probably need to employ an additional person at least, particularly to
cover jobs in the night that would otherwise impact the start of the online system in the morning. There is a lot
of pressure to make sure the online system is available."
The central systems database leader at the global courier saw programmer productivity improvements,
particularly in testing: "Adding a couple of hours' work each time they tested, and with version control, etc., it
can all be quite a lot of work."
If the programming staff are making five such changes in a day then the costs would be:
Cost per hour of programmer * time saved * number of changes and tests
$100 * 2 * 5 = $1,000 per day or $365,000 per annum

INCREASED SYSTEMS AVAILABILITY


Among the clients we talked to, it was clear that BMC AR/CTL was of major value in enabling batch work to run
concurrently with the online systems and avoid contentions.
The government agency said: "We estimate that using BMC AR/CTL to enable parallel processing is going
to give us two hours extra per day availability of our online systems as well as resolving contentions."
So what is two hours per day of extra online systems availability worth? It's certainly many thousands of dollars.
Petra Kopp of Hermes was able to offer the online clients completing contracts around the world an extra 11
hours per day due to the elimination of most of the need for a significant batch window. She estimated that
the cost of delaying availability of the online system at the start of the day would be in the order of €12,000
($16,000) per hour. If we say that, due to the ability to run in parallel, they are getting maybe three more such
productive hours throughout the extended availability, then the value would be:
Value of online system per hour * number of additional hours available
$16,000 * 3 = $48,000 per day or $14.4 million per annum


MIGRATION SAVINGS
Only one of the sites we questioned had used the VSAM/DB2 capabilities of BMC AR/CTL to enable them to
run VSAM-based applications against DB2 without the need to migrate those environments and re-write the
programs.
The medical laboratory in question suggested that "It would be a huge effort to convert to DB2 and would
need outside help. There are 3 million lines of code which would cost well over $1 million and the database
conversion would be additional; certainly the full cost would be over $2 million." A typical migration saving for
a site with 3 million lines of code might be in the order of $2 million. So although this is an issue that is rather
specialized, it has enormous value to the sites where such savings are applicable.

RISK MITIGATION/BUSINESS VALUE


In many cases, the increased volume of transactions that their systems were now running in comparison to
the time they were first introduced had stretched the capacity of our clients to process the batches in the time
available.
The medical laboratory said: "Before we had AR/CTL we were close to missing our service level agreement
(SLA) as we were within minutes of missing our batch window. After an abend, even with all hands on deck,
we wouldn't guarantee to complete in time." We asked the systems manager what the effect of missing that
window was. He said: "If we miss the batch SLA we go into the next day. We may miss weekend processing
or month end. As we have 54 days on average outstanding invoices (24 days to process and 30-day payment
terms) totaling $3-4 million a day, if we miss it there is downtime for the users doing billing. If we miss the
month end it could lengthen the time to receipt of payment to 84 days. It's happened and it hurts, but now it's
most unlikely to ever happen again."
If we say that the cost of the cashflow crisis was only the value of lost interest on the delayed payments at an
annual rate of interest of 5% then the cost of such a missed monthly SLA would be:
Number of days missing payment * amount per day * interest rate * (percentage of year delayed)
= 30 * $3 million * 5% * 30/365 = $369,863
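
The lost-interest estimate can be sketched as below. The function name is hypothetical and the inputs are the figures from the text. Evaluated without intermediate rounding the result is about $369,863; rounding the 30/365 fraction before multiplying produces a figure a few dollars lower.

```python
def delayed_payment_interest(days_of_invoices, invoice_value_per_day,
                             annual_interest_rate, days_delayed):
    """Interest lost when a block of invoices is paid `days_delayed` days late."""
    delayed_amount = days_of_invoices * invoice_value_per_day
    fraction_of_year = days_delayed / 365
    return delayed_amount * annual_interest_rate * fraction_of_year


# 30 days of invoices at $3 million a day, delayed by 30 days at 5% per annum.
cost = delayed_payment_interest(30, 3_000_000, 0.05, 30)
print(f"Cost of one missed month end: ${cost:,.0f}")
```

Because the text quotes invoices of $3-4 million a day, running the same function with 4,000,000 gives the upper end of the range.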
Hermes needs BMC AR/CTL because they have a three-day window at the end of the month to run 2,000
batch jobs and that window is short. They also have to launch it during the day. If they get clashes or
problems they have to restart these batches; without checkpointing it would be impossible, as starting from the
beginning would cost too much time. "If we missed the batch window people would have to wait, especially
the data warehouse (DW) application, until the batches had finished before they could run their reports. At the
beginning of the month they need a lot of reports. If they have to wait, management cannot make decisions.
One or two or three days' delay whilst we catch up." She said that reports provide the basis for decision-making: "We will lose thousands of euros."
The global courier would have similar financial repercussions should they fail to process the tracking data
prior to delivery of their consignments to one of their global depots. The global courier tracks some 11.8
million consignments per day at a rate growing annually by 20%. Their main concern was that customers
would not be able to check their consignments' locations until they caught up with the processing, and that they
would risk the loss of customers.


In all the calculations discussed here it's often difficult to put a value on the opportunity cost, or on the
business value lost through the reduced availability or complete loss of a company's critical applications.
According to several analyst houses, including Ovum Enterprise IT, businesses lose on average between
$84,000 and $108,000 for every hour of system downtime, and according to Dun & Bradstreet, 59% of
Fortune 500 companies experience a minimum of 1.6 hours of downtime per week.
A more detailed recent study gives typical hourly cost of downtime by industry for those areas that suffer the
most critical losses. Clearly these larger figures are for a total collapse in service availability.
Industry              Hourly cost of downtime
Brokerage service     $6.48 million
Energy                $2.80 million
Telecom               $2.00 million
Manufacturing         $1.60 million
Retail                $1.10 million
Healthcare            $0.64 million
Media                 $0.90 million
Average value         $2.22 million

Sources: Network Computing, the Meta Group, and Contingency Planning Research. All figures in US dollars.
Many of the customers clearly recognized that failure to process the batch work would have severe
consequences. This might involve damage to their capacity to make critical business decisions through a
delay in the production of reports, disruption of business-critical online services, or delays in the
collection of payments. There was no doubt among those we questioned that the introduction of BMC AR/CTL
has reduced that likelihood considerably. If we use the Dun & Bradstreet figure for actual downtime and
suggest that without BMC AR/CTL that downtime might be increased by as little as 1%, then taking the
average for the industries from the table above we get a value for the reduced risk per year (when using BMC
AR/CTL) of:
Average cost of downtime per hour * number of hours reduced per week * number of weeks in a year
= $2.22 million * (1.6 * 1/100) * 52 = $1.847 million
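As a cross-check, the calculation can be reproduced in a few lines. This is a sketch only: the function and variable names are illustrative, and the inputs are the industry figures and the 1.6 hours per week quoted in the text.

```python
def annual_downtime_risk(cost_per_hour, hours_down_per_week, extra_fraction, weeks_per_year=52):
    """Yearly value of avoiding a fractional increase in weekly downtime."""
    extra_hours_per_week = hours_down_per_week * extra_fraction
    return cost_per_hour * extra_hours_per_week * weeks_per_year


# Hourly cost of downtime by industry, in millions of US dollars (from the table in the text).
hourly_cost_musd = {
    "Brokerage service": 6.48,
    "Energy": 2.80,
    "Telecom": 2.00,
    "Manufacturing": 1.60,
    "Retail": 1.10,
    "Healthcare": 0.64,
    "Media": 0.90,
}

average_musd = round(sum(hourly_cost_musd.values()) / len(hourly_cost_musd), 2)
risk_musd = annual_downtime_risk(average_musd, hours_down_per_week=1.6, extra_fraction=0.01)
print(f"Average hourly cost: ${average_musd:.2f} million")
print(f"Reduced risk per year: ${risk_musd:.3f} million")
```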


ANALYSIS AND CONCLUSION


Table 1: Potential impact of BMC AR/CTL on savings

Area                             Specific example                                   Potential value/saving per annum
Performance savings              Safeco                                             $5,400,000
                                 Medical laboratory                                 $939,600
Problem resolution savings       Government agency                                  $468,000
                                 Hermes                                             $31,200
Savings in staff utilization     Medical laboratory                                 $239,000
                                 Hermes                                             $150,000
                                 The global courier                                 $365,000
Increased systems availability   Hermes                                             $14,400,000
Migration savings                Medical laboratory                                 $2,000,000
Risk mitigation/business value   Medical laboratory delayed payments                $369,855
                                 Estimated value of reduced risk of major downtime  $1,847,000

Source: BMC

The table above summarizes the calculations that we have made based on the anecdotal evidence provided by
the clients and our analysis of the impact that BMC AR/CTL is likely to make on each possible area of savings.
Performance savings are going to vary enormously between sites. But all those we questioned said that they
made significant savings and it would not be unreasonable to expect a saving of $1 million at a major
installation based on the evidence we were given.
We calculated the value of problem resolution in terms of staff effort rather than business impact, so as not
to double count BMC AR/CTL's contribution to alleviating downtime or enabling higher systems availability,
but it still showed a value of something like $250,000 on average.
Those we questioned were almost unanimous that they were saving at least one, if not two, full-time
equivalents (FTEs) in headcount, so it is easy to justify a value of $150,000 for savings in staff
utilization.
Increased systems availability was harder to determine as most systems managers did not have a firm grasp
of the value per hour of their systems to the business, although they all suggested that BMC AR/CTL was a
valuable contributor to improving their availability. However, one customer had experienced measurable
savings that resulted in a figure of $14.4 million per annum because BMC AR/CTL was the major enabler of
their ability to extend their online availability. We think it would be safer to suggest that on average
organizations may gain at least $1 million in additional value from higher online availability.
The avoidance of migration issues was worth over $2 million to one site. We think that this is tremendously
valuable to those for whom it is relevant but we will not assume that most would gain any benefit from this.
Risk mitigation was, however, clearly a valuable factor in the use of BMC AR/CTL particularly for sites where
the volume of transactions continued to grow and the pressure and risk of failure had been mounting on them.
We think that to say that BMC AR/CTL had a value of $1.5 million in reducing this risk would be a fair
assessment.


In conclusion then, a typical major installation might see a return on investment approaching $4 million from
the use of BMC AR/CTL if they had issues similar to those found on the sites of the clients that we
interviewed.
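The source does not show how the $4 million figure is built up. One reading, assumed here, is that it is the sum of the typical per-area values suggested in the preceding paragraphs, with migration savings left at zero because the text declines to assume that most sites would benefit. The dictionary keys are illustrative labels.

```python
# Typical annual values per area, in US dollars, as suggested in the analysis text.
typical_savings = {
    "performance": 1_000_000,
    "problem_resolution": 250_000,
    "staff_utilization": 150_000,
    "systems_availability": 1_000_000,
    "migration": 0,               # assumption: not counted for a typical site
    "risk_mitigation": 1_500_000,
}

total = sum(typical_savings.values())
print(f"Typical annual value: ${total:,}")
```

On this reading the total is $3.9 million, consistent with "approaching $4 million".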

APPENDIX
SUPPORTING CUSTOMER EVIDENCE ON THE VALUE AND CAPABILITIES OF
BMC APPLICATION RESTART CONTROL
BMC's clients proved to be quite voluble on both the general and specific value they saw from the use of
BMC AR/CTL. They also talked about using those tools in conjunction with other BMC products. We will
concentrate on the main product set under review here but will also mention other products where they seem
relevant.
Performance savings
At the Safeco arm of Liberty Mutual there are two mainframes with IMS and DB2 using BMC AR/CTL.
Hundreds of job steps are registered with AR/CTL, and its usage is standard with IMS BMPs and DB2
processing.
Performance metrics are not kept, but Michael Pope of the database services team said "We do run a daily
audit report on pacing statistics and on many days have over 6000 entries in both test and production, with
most showing pacing saves between 90 and 100% of checkpoints requested."
For example, the number on the right is actual completed checkpoints for the applications described.

BMC150165I    938,862 checkpoints issued      188
BMC150165I    181,097 checkpoints issued    1,426
BMC150165I        512 checkpoints issued        2
BMC150165I     88,531 checkpoints issued       88

Michael said that Safeco didn't use checkpoint processing before AR/CTL "so in some ways we are using
more CPU cycles." Whenever there is a unit of work the programmers are instructed to request a checkpoint.
If there are 900,000 units of work there are likely to be 900,000 checkpoint requests. Multiple pacing
parameters are used to protect the online environments by ensuring that there is not a large number of locks
held at any given time. Pacing is based on CPU time, number of calls, number of updates, etc., and typically
only a small percentage of checkpoints is therefore applied (maybe as little as 1%).
Michael said that "We would have difficulty running jobs with all these checkpoints. The slower an application
runs the more checkpoints get requested, extending the runtime, and conversely the faster it runs the
overhead is reduced as less checkpoints actually get applied. We are saving a dramatic number of possible
checkpoints by using the pacing capabilities available with this product."
Michael estimated that a checkpoint was a sub-second activity "but we have not seriously measured what the
real CPU savings are." Without pacing, developers would use an alternative method to reduce the number of
checkpoints; the benefits of AR/CTL are simplicity and a common programming interface.


An insurance company running a huge IMS site has recently turned pacing on for a fairly significant
amount of their IMS BMP workload. A BMP (batch message processing program) is a mix: it can process files,
but the databases, buffering and log belong to the control region, which enables batch activity to run
concurrently with online systems. They measured CPU usage before and after they turned on pacing and have
been able to identify a 25-30% CPU reduction over all the jobs that are running within that workload. Their
programs were issuing an average of four to five checkpoints each second. By moving that interval out to one
checkpoint per second they were able to achieve this significant saving. BMC projects that they will get
some more incremental benefit by eventually moving out to a three-second interval, but they are already in
the area of diminishing returns just going to a one-second interval because that removed 75% of the
checkpoints and the associated overhead.
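The effect of time-based pacing described above can be illustrated with a simplified model. This is not AR/CTL's algorithm (the text notes that pacing can also be based on CPU time, number of calls and number of updates); it is a sketch that honors a checkpoint request only when a minimum interval has elapsed since the last checkpoint actually taken. The function name and the 60-second run are assumptions made for illustration.

```python
def paced_checkpoints(request_times_ms, min_interval_ms):
    """Return the request times at which a checkpoint is actually taken."""
    taken = []
    last_taken = None
    for t in request_times_ms:
        # Honor the request only if enough time has passed since the last real checkpoint.
        if last_taken is None or t - last_taken >= min_interval_ms:
            taken.append(t)
            last_taken = t
    return taken


# A program requesting four checkpoints per second (every 250 ms) for 60 seconds.
requests = list(range(0, 60_000, 250))

one_second = paced_checkpoints(requests, min_interval_ms=1_000)
three_second = paced_checkpoints(requests, min_interval_ms=3_000)

suppressed = 1 - len(one_second) / len(requests)
print(f"{len(requests)} requested, {len(one_second)} taken at a one-second interval "
      f"({suppressed:.0%} suppressed), {len(three_second)} taken at a three-second interval")
```

In this model a one-second interval removes 75% of the requested checkpoints, which matches the figure quoted for the insurance company, while moving to three seconds removes comparatively few additional ones.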
The systems manager at the medical laboratory said that checkpoint pacing removes 94.35% of the issued
checkpoints on his site. "We don't have to hit the catalog all those times; the checkpoints are automated
not explicit. Pacing is ten-second time (for example in one run we issued 194 actual checkpoints through
AR/CTL). Although it would commit over 1,000 times, total time would have been 19 minutes and with AR/CTL
it took seven minutes. We have 29 similar processes on a daily basis (each saving 12 minutes) and a total of
43 registered under AR/CTL that also run at weekends." Daily saving is then 348 minutes (5.8 hours!).
He summarized by saying "It's a wonderful tool for the optimization of our batch system."
Problem resolution savings
Michael Pope of Safeco knows that the automated restart facilities are saving him considerable effort after
abends. There are as many as 200 reattaches in a given day. Typically a developer uses reattach to reset
conditions where a delay of a few seconds can clear the issue. Without reattach developers would design
differently or else they would be called out in the middle of the night. It clearly adds value to detect and
delay and, in almost all cases, resolve the issue with this capability.
In the IMS world an abend without checkpoints requires a failed job step to back out all the way to the
beginning of the job step, rather than reprocessing from the last completed checkpoint. There are
fundamental time savings in preserving the work already processed. And if each of these became an
incident (job failure with on-call support and action) then reattach is providing significant savings in
time and effort.
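The difference between backing out to the start of a job step and resuming from the last completed checkpoint can be sketched as a simple count of the work executed. The model below is an illustration under stated assumptions, not a description of IMS or AR/CTL internals: a job abends once, and the failure point and checkpoint frequency in the example are hypothetical.

```python
def units_executed(total_units, fail_at, checkpoint_every=None):
    """Units of work executed in total when a job abends once at `fail_at` and is rerun.

    With no checkpoints the rerun starts again at unit 0. With checkpoints it
    resumes from the last checkpoint completed before the failure.
    """
    if checkpoint_every is None:
        restart_point = 0
    else:
        restart_point = (fail_at // checkpoint_every) * checkpoint_every
    first_attempt = fail_at                      # work done before the abend
    second_attempt = total_units - restart_point  # work done after the restart
    return first_attempt + second_attempt


# Hypothetical job: 900,000 units of work, abend after 850,500 units.
without_checkpoints = units_executed(900_000, fail_at=850_500)
with_checkpoints = units_executed(900_000, fail_at=850_500, checkpoint_every=1_000)
print(f"Without checkpoints: {without_checkpoints:,} units executed")
print(f"With checkpoints:    {with_checkpoints:,} units executed")
```

In this example the job without checkpoints executes nearly twice the work, while the checkpointed job repeats only the 500 units processed since its last checkpoint.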
The software support manager at a government agency uses BMC AR/CTL with manual checkpointing for
their COBOL and Natural applications against ADABAS databases. 300 of their critical batch programs out of
a total of 1,500 are registered with BMC AR/CTL.
The software manager says "We often update two different databases in one program so restarts need to
realign the databases from the last commit, otherwise we'd have to back out everything to the beginning. We
get about 15 abends a week due to different things such as messing with JCL, or programs not being
registered, or even if there is an empty data set where they might want an orderly abend because that tells
them what to run next. Without BMC AR/CTL the critical abends would require a lot of human intervention and
analysis to resolve. Typically, we'd need a lot of analysis to see where we were in the processing because
it's very complicated to work out where we are. We have to go back to where we were and look at locks from
that point. It would take a good six hours to recover from one critical program that abends."
She said "Tons of senior and expensive people are involved and it wastes a lot of CPU time and also ruins
the batch window because we then have to work out what can't run if we are to run our online systems."


Petra Kopp at Euler Hermes Kreditversicherungs-AG said "AR/CTL can suspend a batch program if it is
doing something crazy like looping. We are able to suspend it in an elegant way through an interrupt facility,
back it out, and then resolve and restart the program. Most times we do this is when a batch job is running
over hours. This often happens due to DB2 database growth because the indexes weren't set up for that
size. We may need to rebind them, or program changes make the access paths different. This happens
mostly with the Data Warehouse applications that are growing. It happens quite a lot there as they often need
to pull in other databases; this implies changes to access paths. It's happening about five times a week and
the time saved by being able to make the change and restart from an elegant checkpointed restart is an
average of 1 hour each time."
Petra's final comment here was "AR/CTL has definitely had a visible impact on our performance and the
degree of satisfaction with our department in the eyes of the business."
The central systems database leader of the global courier said "Automated restart for 777 abends is
used extensively; the operators would have to have dealt with restarts manually before we put this in. These
are either cross-system between DB2 and IMS or within the IMS system. Errors like this used to delay
execution of several of our most critical jobs, as it would require manual intervention. This is not so much
about staff productivity, but the fact is that once it has failed it does nothing until someone has the time to go
in and deal with it. If it restarts automatically we don't have to worry about it unless it fails repeatedly (we
have this set to three, which is rare)."
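The behavior the database leader describes, automatic restart with escalation only after repeated failure, follows a common bounded-retry pattern. The sketch below is generic Python, not AR/CTL configuration or its API; `TransientAbend`, `run_with_auto_restart` and `make_flaky_job` are hypothetical names, and the limit of three restarts is taken from the quote.

```python
class TransientAbend(Exception):
    """Stand-in for a recoverable failure such as a contention-related abend."""


def run_with_auto_restart(job, max_restarts=3):
    """Run `job`, restarting it automatically after a transient failure.

    Returns (result, restarts_used). Re-raises once the restart limit is
    reached, which is the point at which a person would need to step in.
    """
    restarts = 0
    while True:
        try:
            return job(), restarts
        except TransientAbend:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_job(failures_before_success):
    """Build a job that fails a fixed number of times and then completes."""
    remaining = [failures_before_success]

    def job():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientAbend("resource contention")
        return "completed"

    return job


result, restarts_used = run_with_auto_restart(make_flaky_job(2))
print(result, "after", restarts_used, "automatic restarts")
```

A job that keeps failing exhausts the three restarts and the exception propagates, mirroring the rare case in which someone has to intervene.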
Staff utilization savings
Michael Pope of Safeco said that "The other main benefit of BMC AR/CTL is file repositioning, which is not a
characteristic of basic checkpoint processing. But AR/CTL does this for you and that's a significant value."
Managing these definitions in the Safeco environment does take some resource, but is centralized so any
developer can take advantage of the product's capabilities.
The systems manager at the medical laboratory says that "AR/CTL has automated the elimination of
contention problems that give 911 abends. Our batch runs very much in serial and our schedule was reduced
without doing anything. It would take 20 minutes to resolve a negative 911 manually and on any given day we
might get 12 or 13 of these. We saw a red flag as DB2 tables were clobbering each other and it went on for a
year. The tables were growing and the batch window was processing more data. Other jobs scheduled to run
at certain times were beginning to overlap and thus there was even more contention."
"Often a number of people were involved in resolution. Although technically one could handle it, we often had
four or five looking at it. So we often let other jobs finish and then we'd restart them as we didn't have
AR/CTL to automate restarts from checkpoints. Now life is so good! No one even sees any problems as the
reattaches work and we get return codes of zero. This saves CPU time as there are no hiccups and lots of
people time so they could work on other things."
The systems manager also said that "Maintenance is very manageable and easy to use. We only need one or
two people and the danger is that you really ought to have more than one who knows how to use it, as if there
is an AR/CTL problem you need to know what to do. We've written down procedures that helped this; on
balance I'd say it has reduced staff (probably by one unit)."
Petra Kopp at Euler Hermes said "We have about five restarts in a day. Every program change can cause a
mistake to be made. Most of the time we can use the restart facility and not often would we need more
specialist knowledge. The automated restart facility is simple to use. If done manually it's not the time but
the complexity of the task that is the problem, particularly if the person is not familiar with IMS. If done
by a scheduler with limited knowledge they would need to know which checkpoint to use and whether it was
possible and necessary to make backups etc., in which case it would be much more difficult."


"We also use the Batch Backout Facility implemented in AR/CTL and that makes it much safer because the
backup of IMS database is very critical. If batch jobs abend during the night, the database might not be
available for other jobs and for the online systems so it is vital that any problems concerning backout are
solved as soon as possible. Again, if these staff who only schedule jobs should manually need to solve this
problem, there might well be mistakes. Now it's automatic and we don't have to worry."
"We'd probably need to employ an additional person at least, particularly to cover jobs in the night that
would otherwise impact the start of the online system in the morning. There is a lot of pressure to make sure
the online system is available."
The central systems database leader of the global courier saw programmer productivity savings. He said
"There are programmer savings on doing the testing: i.e., all that's required is a JCL parameter change to
force an abend to happen rather than either altering the data to force a known error or valid validation
condition that could trigger an event, or change the code to test it and then change it back again which
wouldn't really be a valid test! Adding a couple of hours work each time they tested, and with version
control, etc., it can all be quite a lot of work."
Increased systems availability
Safeco uses BMPs with AR/CTL to run transactions and batch processing simultaneously. "This provides a
higher level of availability which is another true advantage. We knew that we had to add checkpoint logic if
we were to run BMPs to ensure that there is no runaway batch activity and protect the integrity of the online
environment. AR/CTL was a good match because whether using IMS or DB2 the calls are very similar, better
and more consistent for developers, particularly as a lot of our programs access IMS and DB2
simultaneously."
The government agency currently has a daily batch window although they will soon use BMC AR/CTL to
enable online updates to run in parallel to the batch. The software support manager said "We still get
contention between batch programs about three times a week when they change the schedule and don't
realize that they have a utility running that needs exclusive access to data." We estimate that using BMC
AR/CTL to enable parallel processing is going to give them two hours extra per day availability of their
online systems as well as resolving these contentions.
Petra Kopp at Euler Hermes said "AR/CTL is very important for our main daily business because we aim to
have a 7 day 24 hour online system and we currently provide a 23 hour online service both in our offices and
globally. The online customers work during the night to enter their contracts. Overnight batch runs must not
lock databases so that they can do this."
"We started with AR/CTL so that we could have online transactions in parallel with batch during the night.
Without AR/CTL we couldn't have offered the online opportunity. Without checkpoint restart there was no
possibility of parallel working as databases could be locked for too long. There are 10,000 online users
(registered) but maybe only 2,000 using it in any one day. There are around 1,200 in-house users with maybe
500 using it on a daily basis."
"Online customers are very important customers as they do a lot of the work for themselves and we must take
great care of them. In-house users work online until 19:00, remote users till 06:30: i.e., we enabled a
further 11 hours access for them. The remaining one hour is used for image copies for backup etc. We have
400,000 transactions in a day."
I asked her what she thought the cost of delaying the start of the online system in the morning might be per
hour and she estimated roughly 12,000.


Migration avoidance
The systems manager at the medical laboratory said that his site had used AR/CTL to avoid migration
from VSAM to DB2 and had undertaken no conversions for four years.
He said "BMC regards us as an exceptional site because we have 30 million records per VSAM file, some with
nine indices, and we had to get BMC to raise the 4MB limit to 8MB for AR/CTL."
"It would be a huge effort to convert to DB2 and would need outside help. There are 3 million lines of code
which would cost well over $1 million, and the database conversion would be additional: certainly the full
cost would be over $2 million."
Risk mitigation and increased business value
The systems manager at the medical laboratory said "We have a tight and restricted batch window. If we
fail to complete then the problems piggyback. Before we had AR/CTL we were close to missing our SLA as
we were within minutes of missing our batch window. After an abend, even with all hands on deck, we
wouldn't guarantee to complete in time." We asked the systems manager what the effect of missing that
window was. He said "If we miss the batch SLA we go into the next day. We may miss weekend processing
or month end as we have 54 days on average outstanding invoices (24 days to process and 30 day payment
terms) totaling $3-4 million a day. If we miss it there is down time for the users doing billing. If we miss
the month end it could lengthen the time to receipt of payment to 84 days. It's happened and it hurts but now
it's most unlikely to ever happen again."
Petra Kopp at Euler Hermes has a site that runs 1,000 batch jobs overnight to aggregate data, mainly in
support of a data warehouse application's tools. At the month end they run 2,000 batch jobs. Euler Hermes
needs BMC AR/CTL because they have a three-day window at the end of the month and that window is short.
They also have to launch some batch jobs during the day. If they get clashes or problems they have to restart
these batches, and without checkpointing it would be impossible as starting from the beginning would cost too
much time.
Petra said "Some batch jobs are short, some are long, running from 1 to many hours; it depends. If a
three-hour job had to be restarted from the beginning, we wouldn't have the time required to complete all our
work in the three-day window. This is very stressful. If we missed the batch window people would have to
wait, especially the DW application, until the batches had finished before they could run their reports. At
the beginning of the month they need a lot of reports. If they have to wait, their managers cannot make
decisions based on the processed information. There might be as much as one, two or even three days delay
whilst we catch up. These reports are the basis of decisions and without them we will lose thousands of
euros."
The central systems database leader at the global courier, where they use BMC AR/CTL to resolve
problems that could affect their tracking systems, said "We get dozens and dozens of deadlocks in a day
although many of these are CICS transactions in CICS DBCTL (we don't use data sharing). They are mostly
background CICS tasks although some involve foreground CICS tasks, but not very many."
"The ones that affect the batch jobs, some of which are scheduled automatically as messages, come in from
outside users, so they are not under operator control, and we just have to react to the failures they may
cause. These external messages are run ad hoc at random times during the day. If these updates fail, then
the stream of messages creates a backlog. In business terms some of these jobs are critical and a backlog of
15 minutes is bad news. Users get frustrated as they expect a response to these messages in a timely
fashion (within an hour) and these users could be anywhere in the world."


"We track 1.8 million consignments per day with a 20% increase at Christmas, but as rates are generally
increasing we expect that by March next year that may be the norm. If we send data through as a plane takes
off, it needs to all be there by the time the plane lands. If it isn't, then there's difficulty. They can
carry on operating because the most fundamental info is on the labels of the goods to enable delivery but it
wouldn't feed into the tracking system and so there would be uncertainty with regards to each package's
location. They would be scanning packages locally without the supporting info and would have to match these
up when the info eventually came through. In the meantime the database wouldn't show a proper status for that
consignment, which would make it difficult to interpret online. The mainframe isn't the only link in the
chain, and other problems may compound and extend the delay, so it's important that the mainframe part is
fast. It would be damaging to our reputation. We could lose customers."
Table 2: Contact Details

Corporate Headquarters
BMC Software
2101 CityWest Blvd.
Houston
Texas 77042
USA
Tel: +1 (800) 841 2031
Email: databaseadministrationsolution@bmc.com
www.bmc.com

London Office
Assurance House
Vicarage Road
Egham, Surrey
TW20 9UY
UK
Tel: +44 (0)1784 478 000
www.bmc.com/uk

Source: BMC

Ovum's Knowledge Centers are new premium services offering the entire suite of Ovum information in fully interactive formats.
To find out more about Knowledge Centers and our research, contact us:
Ovum Europe
119 Farringdon Road
London, EC1R 3DA
United Kingdom
t: +44 (0)20 7551 9000
f: +44 (0)20 7551 9090/1
e: info@ovum.com

Ovum Australia
Level 5, 459 Little Collins Street
Melbourne 3000
Australia
t: +61 (0)3 9601 6700
f: +61 (0)3 9670 8300
e: info@ovum.com

Ovum New York
245 Fifth Avenue, 4th Floor
New York, NY 10016
United States
t: +1 212 652 5302
f: +1 212 202 4684
e: info@ovum.com

All Rights Reserved


No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or
otherwise, without the prior permission of the publisher, Ovum Europe Limited. Whilst every care is taken to ensure the accuracy of the information contained in this
material, the facts, estimates and opinions stated are based on information and sources which, while we believe them to be reliable, are not guaranteed. In particular, it
should not be relied upon as the sole source of reference in relation to the subject matter. No liability can be accepted by Ovum Europe Limited, its directors or
employees for any loss occasioned to any person or entity acting or failing to act as a result of anything contained in or omitted from the content of this material, or our
conclusions as stated. The findings are Ovum's current opinions; they are subject to change without notice. Ovum has no obligation to update or amend the research or
to let anyone know if our opinions change materially.
Ovum. Unauthorised reproduction prohibited
This report is a licensed product and is not to be reproduced without prior permission.
