
Modeling and Managing Software Process Efficiency and Effectiveness

By Gary Gack, Process-Fusion.net

This article is the third in a series. The first two articles describe the history and
definition of two important but generally under-utilized software metrics: the cost of
(poor) quality (CoQ) framework (a measure of efficiency), and defect containment (a
measure of effectiveness). Industry benchmark data for these metrics and for alternative
appraisal methods were reviewed. In this article I describe an approach to modeling and
managing efficiency and effectiveness that integrates these two metrics to provide both
prospective (leading) and retrospective (lagging) indicators of software project
outcomes. This model facilitates “simulation” of alternative strategies and provides
quantitative indications of the impact of alternatives under consideration.

Modeling and Managing Both Efficiency and Effectiveness

In this section I introduce models intended to help software organizations determine an
optimal combination of defect containment methods and resource allocation. These
models provide a mechanism to forecast the consequences of alternative assumptions
and strategies in terms of both efficiency (measured by non-value added [NVA] vs.
value-added [VA] percent of total effort) and delivered quality (measured by total
containment effectiveness [TCE], that is, the estimated percentage of defects removed
before software release to customers).

As George Box said, “All models are wrong--some are useful” (Box 1979). This model
uses parameters taken from a variety of public sources, but makes no claim that these
parameter values are valid or correct in any particular situation. It is hoped that the
reader will take away a thought process and perhaps make use of this or similar models
using parameter values appropriate and realistic in the intended context. It is widely
recognized that all benchmark values are subject to wide (and typically unstated)
variation. Many parameter values will change significantly as a function of project size,
application domain, and other factors.

The objectives of these models include the following:


• Predict: 1) delivered quality (effectiveness); and 2) total NVA effort
(efficiency) consequences of alternative appraisal strategies
• Predict defect insertion
– Focus attention on defects, which account for the largest share of total
development cost.

© 2011 Gary A. Gack


– Enable early monitoring of the relationship between defects likely to be
present and those actually found; provide early awareness using leading
indicators.
• Estimate effort needed to execute the volume of appraisal necessary to find
the number of defects one forecasts to remove.
– A “sanity check” on the planned level of appraisal effort, that is, is it
actually plausible to remove an acceptable volume of defects with the
level of appraisal effort planned?
• Forecast both pre-release (before delivery) and post-release (after delivery)
NVA effort.
– When delivered quality is poor (TCE approximately 85 percent), the
example discussed next indicates post-release defect repair costs can be
50 percent of the original project budget. Lower quality will lead to even
higher post-release costs.

The complete model includes five tables; the first four include user-supplied
parameters and calculate certain fields based on those parameters. The fifth table
summarizes the results of the other four. This article will look at the summary first,
and then will provide an overview of the details upon which the summary is based.

I have defined a number of different scenarios (four of which we will examine in this
article), all based on an assumed size of 1000 function points, to illustrate use of the
model. Many other scenarios might be constructed. The first two scenarios assume
defects are “inserted” at U.S. average rates (Jones 2009, p.69) of a total of 5.0
defects per function point, including bad fixes and documentation errors. Scenarios 3
and 4 reflect results reported by high maturity groups in which defects inserted are
reduced to 2.7 per function point. These reductions are generally consistent with
best-in-class results reported in Jones (2008).
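The insertion assumption behind the scenarios is simple arithmetic, and a minimal sketch may help make it concrete. The size and the two insertion rates below come from the text; the function and variable names are illustrative only and are not taken from the Excel model.

```python
# Defect insertion forecast: size (function points) x defects per function point.
# The size and rates are quoted in the article; all names are illustrative.
SIZE_FP = 1000

INSERTION_RATE_PER_FP = {
    "scenarios 1-2 (U.S. average)": 5.0,
    "scenarios 3-4 (high maturity)": 2.7,
}


def defects_inserted(size_fp: float, rate_per_fp: float) -> float:
    """Forecast the total number of defects inserted for a given project size."""
    return size_fp * rate_per_fp


forecast = {
    name: defects_inserted(SIZE_FP, rate)
    for name, rate in INSERTION_RATE_PER_FP.items()
}

for name, total in forecast.items():
    print(f"{name}: about {total:.0f} defects inserted")
```

Under these assumptions, scenarios 1 and 2 start from about 5,000 defects and scenarios 3 and 4 from about 2,700, before any appraisal step removes anything.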

The table below summarizes key parameters associated with each of these
scenarios. Again, these reflect a reasonable set of assumptions, but there is an
almost infinite set of alternative assumptions that might be reasonable in a given
situation. This model is available to anyone interested on request; you are most
welcome to try it out with parameter values appropriate to your situation.



Scenario 1, “Test Only”, represents a situation in which pre-test appraisals, such as
formal inspections, are not used. This is typical of level 1 and level 2 organizations.
Scenario 2, “Level 3”, introduces certain pre-test appraisals, including inspections
and static analysis, to achieve results that are generally consistent with those
reported by organizations rated at level 3 on the CMMI scale. Scenario 3, “Level 5”,
achieves results typical of those reported by CMMI level 5 organizations: fewer
defects are “inserted” due to effective prevention activities and, because fewer defects
therefore enter test, some test activities (unit and function tests) are discontinued.
Inspection effectiveness (percent of defects found) is assumed to be 60 percent in
scenario 2 and increases to 80 percent in scenarios 3 and 4. Scenario 4, “High Reliability”,
reflects a situation where safety or other concerns require maximum reliability with
cost as a secondary consideration. This scenario increases the percentage of work
products inspected to 100 percent, retains all of the test activities that scenario 3
eliminates, and adds a further test step (field or “beta” test). Other model parameters,
discussed later, remain constant across all scenarios. Inspection percentages
indicate the portion of the work product inspected. Note that static analysis can only
be used for certain procedural languages such as C and Java.



Two summaries are developed. The first shows the impact of alternative appraisal
strategies on delivered quality as measured by TCE, that is, the percentage of defects
removed prior to delivery of the software. In this illustration, a “level 3” mix of
appraisal activities (in which 95.4 percent of defects are removed before the software is
released) reduces delivered defects by more than 70 percent compared to the test-only
approach typically used by average groups (in scenario 1 only 83.4 percent of defects are
found before release). Comparison of scenarios 1 and 2 isolates the impact of an improved
mix of containment methods, independent of the reductions in defect potentials reflected in
scenarios 3 and 4.
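The “more than 70 percent” figure follows directly from the two TCE values quoted above, because scenarios 1 and 2 assume the same number of defects inserted. The sketch below shows the arithmetic; the function names are illustrative and are not part of the Excel model.

```python
def delivered_fraction(tce_percent: float) -> float:
    """Share of inserted defects that escape to the customer for a given TCE."""
    return 1.0 - tce_percent / 100.0


def delivered_defect_reduction(tce_before: float, tce_after: float) -> float:
    """Relative reduction in delivered defects, assuming equal defects inserted."""
    before = delivered_fraction(tce_before)
    after = delivered_fraction(tce_after)
    return (before - after) / before


# TCE values quoted in the text: scenario 1 (test only) vs. scenario 2 ("level 3").
reduction = delivered_defect_reduction(83.4, 95.4)
print(f"Delivered defects reduced by {reduction:.1%}")
```

With 16.6 percent of defects delivered in scenario 1 and 4.6 percent in scenario 2, the reduction works out to roughly 72 percent.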

The second summary covers scenarios 1 through 3 only; we’ll come back to scenario 4
later. This summary shows the impact of alternative appraisal strategies on NVA effort as
defined in the CoQ framework, in which all appraisal and rework effort is by definition
NVA. Although these costs are certainly a necessary evil, the goal will always be to
minimize them. In the interest of simplicity, I assume ancillary costs such as defect
tracking, change management, and other defect-related activities are included in either
appraisal or rework, as they are in any case a small fraction of total costs.

In this illustration a scenario 2 mix of appraisal activities reduces total NVA effort
(including both pre- and post-release effort) by 40 percent compared to the scenario
1 test-only approach typically used by average groups (68.6 person months in
scenario 2 vs. 113.5 in scenario 1). More mature organizations, as a result of lower
defect insertion and improved inspection effectiveness, can reduce NVA by an
additional 38 percent (to 42.7 in scenario 3 vs. 68.6 in scenario 2).
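For readers who want to substitute their own figures, the step-by-step and cumulative NVA reductions can be reproduced as follows. The person-month values are those quoted above; the dictionary and function names are illustrative only.

```python
# Total NVA effort (pre- plus post-release) in person months, as quoted in the text.
NVA_PERSON_MONTHS = {1: 113.5, 2: 68.6, 3: 42.7}


def nva_reduction(before_pm: float, after_pm: float) -> float:
    """Relative reduction in NVA effort when moving from one scenario to another."""
    return (before_pm - after_pm) / before_pm


step_1_to_2 = nva_reduction(NVA_PERSON_MONTHS[1], NVA_PERSON_MONTHS[2])
step_2_to_3 = nva_reduction(NVA_PERSON_MONTHS[2], NVA_PERSON_MONTHS[3])
cumulative = nva_reduction(NVA_PERSON_MONTHS[1], NVA_PERSON_MONTHS[3])

print(f"Scenario 1 -> 2: {step_1_to_2:.0%}")
print(f"Scenario 2 -> 3: {step_2_to_3:.0%}")
print(f"Scenario 1 -> 3: {cumulative:.0%}")
```

Note that the two step reductions do not add: the second is measured against the already reduced scenario 2 baseline, so the cumulative reduction from scenario 1 to scenario 3 is about 62 percent.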



The largest gains in efficiency are realized when formal inspections are used, and
those gains are far larger post-release than they are pre-release. Perhaps this
dynamic explains why so many organizations try inspections but do not sustain them:
the largest savings are not evident at the time of release, and may in any case be
in someone else’s budget. So far we see that higher delivered quality and lower cost
coincide: better is cheaper. What happens in the high reliability scenario? Clearly,
costs go up substantially when a TCE greater than about 98 percent is required.

As mentioned in the second article in this series (“Measuring Software Process
Efficiency”), the traditional manufacturing-centric view of the cost of quality curve is a
bit misleading when applied to software. While it is true that very high levels of
reliability (e.g., 98 percent or more) will increase total NVA and therefore total cost, it
is also true that for most generally acceptable levels of reliability total NVA actually
declines as delivered quality improves. In software, as illustrated below, when and how
appraisals are done is far more important than the total amount spent.



The “Cost of Quality Management” curve for software looks more like a “hockey
stick”, declining at first and increasing only for high reliability scenarios. This is
rather different from the “traditional” manufacturing picture.
(Figure: “Traditional” Cost of Quality)

Model Parameters

Four sets of parameter values, each in a separate Excel table, are required to
generate the summary conclusions described previously. One set of tables is used
for each scenario and is contained in a single Excel sheet (tab) dedicated to each
scenario. These tables and the parameters in each are as follows:
1. Defect insertion and removal forecast. This table contains a row for each
distinct appraisal step, for example, requirements inspection, design
inspection, code inspection, static analysis, and unit, function, integration,
system, and acceptance tests. Any desired set may be identified. Defects inserted
are forecast on a “per size” basis, for example, using Jones benchmark values
per function point. The percent of defects to be removed by each
appraisal step is also forecast, again using Jones benchmark values or
other locally determined values. Given these user-supplied parameters, the
model calculates defects found and remaining at each appraisal step and the
final TCE percent.
2. Inspection effort forecast. This table contains a row for three types of
inspections--requirements, design, and code. Other rows may be added. For
each row the user specifies the percentage of the work product to be
inspected--0 percent if the inspection is not performed as in the test-only
scenario, up to 100 percent. The user also specifies the expected average
number of defects to be found per inspection and the number of rework hours
forecast per defect. The model calculates the number of inspections required
to remove the number of defects forecast by Table 1, inspection person
months, rework person months, and total person months required.



3. Static analysis effort forecast. User parameter values include hours per
size to perform static analysis and rework hours per defect. The model
calculates analysis person months, rework person months, and total person
months.
4. Test effort forecast. User parameter values include test effort hours per size
for each test type to be included, rework hours per defect for each, and a
“pre-test impact factor.” This last parameter is used to estimate the impact
that will result if pre-test appraisals are in fact used. When pre-test methods
are used, there are several important consequences that impact test effort:
• The number of defects coming into any given test step will necessarily
be significantly fewer.
• Hence, fewer defects will be found, and less rework will be needed to
correct those defects. Variable costs will go down.
• Fewer defects incoming means fewer tests will fail. When a test fails it
will often prevent other tests from being executed--they are said to be
“blocked.”
• Fewer defects incoming also means fewer tests will need to be
re-executed to confirm the fix resolved the problem and did not cause
unintended secondary effects.
• In total, the length of the overall test cycle may be significantly shorter,
resulting in a reduction in total labor cost required; fixed costs may also
be less.
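The calculation described for the first table (defects found and remaining at each appraisal step, and the final TCE percent) can be sketched as a simple cascade. This is a minimal sketch under stated assumptions rather than the Excel model itself: it treats all defects as present before the first step, the class and function names are illustrative, and the removal percentages are hypothetical placeholders, not Jones benchmark values or the values used in the scenarios.

```python
from dataclasses import dataclass


@dataclass
class AppraisalStep:
    name: str
    removal_efficiency: float  # fraction of incoming defects this step removes


def containment_forecast(defects_inserted: float, steps: list) -> tuple:
    """Return per-step (name, found, remaining) rows and the final TCE percent.

    Simplifying assumption: every defect is already present before the first
    step, and each step removes a fixed fraction of whatever reaches it.
    """
    remaining = defects_inserted
    rows = []
    for step in steps:
        found = remaining * step.removal_efficiency
        remaining -= found
        rows.append((step.name, found, remaining))
    tce_percent = 100.0 * (defects_inserted - remaining) / defects_inserted
    return rows, tce_percent


# Hypothetical placeholder efficiencies, chosen only to illustrate the arithmetic.
example_steps = [
    AppraisalStep("unit test", 0.30),
    AppraisalStep("function test", 0.35),
    AppraisalStep("system test", 0.40),
]

rows, tce = containment_forecast(5000, example_steps)
for name, found, remaining in rows:
    print(f"{name}: found {found:.0f}, remaining {remaining:.0f}")
print(f"TCE: {tce:.1f} percent")
```

Adding a row for an inspection or static analysis step ahead of the tests is just another `AppraisalStep` at the front of the list, which is one way to explore alternative appraisal mixes before committing effort to them.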

The pre-test impact factor is used to quantify the overall impact of these consequences;
in effect, this value indicates the percent reduction expected for a given test step due to
pre-test appraisals. The value may in some instances be 100 percent (1.0) if incoming
quality is believed to be sufficiently good to simply not do certain types of tests (for
example, unit tests).
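One plausible way to apply the pre-test impact factor to a single test step is sketched below. The text does not spell out the exact formula, so the following are assumptions: the factor scales test execution effort only, rework scales with the number of defects actually found, and a person month is taken as 130 hours. The function name and all sample values are hypothetical.

```python
HOURS_PER_PERSON_MONTH = 130.0  # assumption; not a value taken from the article


def forecast_test_effort_pm(size_fp, test_hours_per_fp, defects_found,
                            rework_hours_per_defect, pre_test_impact_factor=0.0):
    """Forecast person months (execution plus rework) for one test step.

    pre_test_impact_factor is the fractional reduction in test execution
    effort attributed to pre-test appraisals; 1.0 means the step is skipped.
    """
    if not 0.0 <= pre_test_impact_factor <= 1.0:
        raise ValueError("pre_test_impact_factor must be between 0.0 and 1.0")
    if pre_test_impact_factor == 1.0:
        return 0.0  # step not performed, so no execution and no rework
    execution_hours = size_fp * test_hours_per_fp * (1.0 - pre_test_impact_factor)
    rework_hours = defects_found * rework_hours_per_defect
    return (execution_hours + rework_hours) / HOURS_PER_PERSON_MONTH


# Hypothetical inputs: the same test step with and without pre-test appraisals.
baseline = forecast_test_effort_pm(1000, 1.3, 100, 13.0)
with_pretest = forecast_test_effort_pm(1000, 1.3, 40, 13.0,
                                       pre_test_impact_factor=0.5)
skipped = forecast_test_effort_pm(1000, 1.3, 0, 13.0,
                                  pre_test_impact_factor=1.0)

print(f"baseline: {baseline:.1f} PM, with pre-test appraisals: "
      f"{with_pretest:.1f} PM, skipped: {skipped:.1f} PM")
```

Keeping the factor separate from the defect count makes the two effects visible independently: fewer incoming defects lower rework, while the factor captures the shorter test cycle.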

Sanity Check

Does all of this really make sense? How does this “simulation” compare to what we
know about the real world? One way to evaluate the results of these models is to
examine their conclusions in relation to total effort likely to be devoted to an actual
project. Sanity checking any model is always a good idea. Experimenting with these
models has shown that some published parameter values lead to totally implausible
conclusions--for example, pre-release NVA effort can exceed total project effort when
some of these values are used. Obviously such a conclusion cannot be valid; at least
one parameter value must be incorrect when the results do not make sense.



According to Jones (2008, p.295) an average 1000 function point project will require
about 97.5 person months of effort. The table below summarizes the results of the four-
scenario simulation, using the parameter values described previously, to illustrate how
those results relate to 97.5 person months of total effort.

In general, the sanity check seems consistent with published results (for example,
Jones 2009; 2010; Humphrey 2008) if one assumes the scenarios considered are
roughly representative of CMMI levels 1 to 5, respectively.
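The plausibility test described in this section can be automated so that implausible parameter sets are flagged as soon as they are entered. The sketch below uses the 97.5 person-month figure quoted above; the function name and the two pre-release NVA inputs are hypothetical and are not results from the model.

```python
TOTAL_EFFORT_PM = 97.5  # average 1000 function point project, as quoted in the text


def sanity_check(pre_release_nva_pm: float,
                 total_effort_pm: float = TOTAL_EFFORT_PM) -> tuple:
    """Return (NVA share of total effort, whether that share is plausible).

    Pre-release NVA (appraisal plus rework) cannot exceed total project
    effort, so a share of 1.0 or more means at least one parameter is wrong.
    """
    if total_effort_pm <= 0:
        raise ValueError("total_effort_pm must be positive")
    share = pre_release_nva_pm / total_effort_pm
    return share, share < 1.0


# Hypothetical pre-release NVA forecasts, used only to exercise the check.
for nva_pm in (40.0, 120.0):
    share, plausible = sanity_check(nva_pm)
    verdict = "plausible" if plausible else "implausible: revisit parameter values"
    print(f"{nva_pm:.1f} PM is {share:.0%} of total effort ({verdict})")
```

A share below 100 percent is only a necessary condition; whether the share is realistic for a given organization still calls for judgment and local data.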

CONCLUSIONS AND NEXT STEPS

For many software organizations, judicious use of an optimal combination of appraisal
and prevention will lead to significant improvements in both effectiveness (percentage of
defects removed before delivery) and efficiency (total cost of delivered software,
considering both pre- and post-release costs). The model described can be used to
predict the consequences of alternatives under consideration using values appropriate
and realistic in any given situation. Give it a try!

The Excel model described here is available from the author at no cost – to obtain a
copy, including additional detail about the model and the parameters used, email
ggack@process-fusion.net



REFERENCES

Box, G. 1979. Robustness in the strategy of scientific model building.
Humphrey. 2008 World Conference on Software Quality keynote, “Faster-Cheaper-Worse.”
Jones, C. 2007. Estimating software costs, 2nd edition. New York: McGraw Hill.
Jones, C. 2008. Applied software measurement, 3rd edition. New York: McGraw Hill.
Jones, C. 2010. Software engineering best practices. New York: McGraw Hill.
Kan, S. 2003. Metrics and models in software quality engineering, 2nd edition. Boston:
Addison-Wesley.

BIOGRAPHY
Gary Gack is the founder and president of Process-Fusion.net, a provider of
assessments, strategy advice, training, and coaching relating to integration and
deployment of software and IT best practices. Gack holds an MBA from the Wharton
School, is a Lean Six Sigma Black Belt, and is an ASQ Certified Software Quality
Engineer. He has more than 40 years of diverse experience, including more than 20
years focused on process improvement. He is the author of many articles and a book
entitled Managing the Black Hole: The Executive’s Guide to Software Project Risk.
LinkedIn profile: http://www.linkedin.com/in/garygack.
