
Software Quality

The Software Quality Challenge
As the software in today's systems grows larger, it also contains more defects that adversely
affect safety, security, and reliability of the systems. This article explains why the common
test-and-fix software quality strategy is no longer adequate, and offers some suggestions for
improvement.
by Watts S. Humphrey

Measuring Defect Potentials and Defect Removal Efficiency
This article discusses two measures that have strong influences on the outcomes of software
projects: defect potentials and defect removal efficiency, and relates the positive effects that
can be achieved by increasing the defect removal efficiency to 95 percent and beyond.
by Capers Jones

Quality Processes Yield Quality Products
Asking, "Would your company like to save $100,000 per day?" this article lists steps that can be
taken to achieve that goal and more. The author draws on 15 years of experience in process
improvement to sift through helpful tips including: not looking for a quick fix, keeping it
short, and not reinventing the wheel.
by Thomas D. Neff

Software Engineering Technology

The Use and Limitations of Static-Analysis Tools to Improve Software Quality
Advanced static-analysis tools have been found to be effective at finding defects that jeopardize
system safety and security. In this article, the author describes how these work and outlines
their limitations.
by Dr. Paul Anderson

Automated Combinatorial Test Methods Beyond Pairwise Testing
This article introduces new tools for automating the production of complete test cases covering
up to 6-way combinations, going beyond the popular Pairwise testing. Pairwise (2-way) is low in
cost, but is not sufficient for assurance of mission-critical software.
by D. Richard Kuhn, Dr. Yu Lei, and Dr. Raghu Kacker

Open Forum

Software Quality Unpeeled
The term "software quality" has many interpretations and meanings. The author helps readers
understand the underlying considerations that underscore software quality. Software quality is a
lot more than standards, metrics, models, and testing, and the mystique behind this elusive area
is explored.
by Dr. Jeffrey Voas

Departments

From the Publisher
Coming Events
Web Sites
Call for Articles
Reader Results Request
BackTalk

ON THE COVER
Cover Design by Kent Bingham
Additional art services provided by Janna Jensen

CrossTalk

CO-SPONSORS:
DOD-CIO                The Honorable John Grimes
OSD (AT&L)             Kristen Baldwin
NAVAIR                 Jeff Schwalb
76 SMXG                Phil Perkins
309 SMXG               Karl Rogers
DHS                    Joe Jarzombek

STAFF:
MANAGING DIRECTOR      Brent Baxter
PUBLISHER              Kasey Thompson
MANAGING EDITOR        Ken Davies
ASSOCIATE EDITOR       Chelene Fortier-Lozancich
ARTICLE COORDINATOR    Nicole Kentta
PHONE                  (801) 775-5555
E-MAIL                 stsc.customerservice@hill.af.mil
CROSSTALK ONLINE       www.stsc.hill.af.mil/crosstalk

CrossTalk, The Journal of Defense Software Engineering is co-sponsored by the Department of
Defense Chief Information Office (DoD-CIO); the Office of the Secretary of Defense (OSD)
Acquisition, Technology and Logistics (AT&L); U.S. Navy (USN); U.S. Air Force (USAF); and the
U.S. Department of Homeland Security (DHS). DoD-CIO co-sponsor: Assistant Secretary of Defense
(Networks and Information Integration). OSD (AT&L) co-sponsor: Software Engineering and System
Assurance. USN co-sponsor: Naval Air Systems Command. USAF co-sponsors: Oklahoma City-Air
Logistics Center (ALC) 76 Software Maintenance Group (SMXG); and Ogden-ALC 309 SMXG. DHS
co-sponsor: National Cyber Security Division of the Office of Infrastructure Protection.

The USAF Software Technology Support Center (STSC) is the publisher of CrossTalk, providing both
editorial oversight and technical review of the journal. CrossTalk's mission is to encourage the
engineering development of software to improve the reliability, sustainability, and
responsiveness of our warfighting capability.

Subscriptions: Send correspondence concerning subscriptions and changes of address to the
following address. You may e-mail us or use the form on p. 30.
517 SMXS/MXDEA
6022 Fir AVE
BLDG 1238
Hill AFB, UT 84056-5820

Article Submissions: We welcome articles of interest to the defense software community. Articles
must be approved by the CROSSTALK editorial board prior to publication. Please follow the Author
Guidelines, available at <www.stsc.hill.af.mil/crosstalk/xtlkguid.pdf>. CROSSTALK does not pay
for submissions. Articles published in CROSSTALK remain the property of the authors and may be
submitted to other publications.

Reprints: Permission to reprint or post articles must be requested from the author or the
copyright holder and coordinated with CROSSTALK.

Trademarks and Endorsements: This Department of Defense (DoD) journal is an authorized
publication for members of the DoD. Contents of CROSSTALK are not necessarily the official views
of, or endorsed by, the U.S. government, the DoD, the co-sponsors, or the STSC. All product names
referenced in this issue are trademarks of their companies.

CrossTalk Online Services: See <www.stsc.hill.af.mil/crosstalk>, call (801) 777-0857 or e-mail
<stsc.webmaster@hill.af.mil>.

Back Issues Available: Please phone or e-mail us to see if back issues are available free of
charge.
2 CROSSTALK The Journal of Defense Software Engineering June 2008

From the Publisher
Quality Programming Begets Software Quality
Joe Jarzombek, Director for Software Assurance in the National Cyber Security Division (NCSD) of
the Department of Homeland Security (DHS, CrossTalk's co-sponsor), has given many keynote
presentations at conferences in which he advocates the need for security-enhanced processes and
practices. His message at the Software Engineering Process Group Conference in March was a
snapshot of the session he facilitated on February 8, 2008, on "Security-Enhanced Quality
Assurance and Project Management: Mitigating Risks to the Enterprise" at the Defense Acquisition
University's (DAU) Advanced Software Acquisition Management course. His opening message set the
theme for his DAU presentation and this issue of CrossTalk. Jarzombek said, "With today's global
supply chain for information technology and software, the processes associated with software
engineering, quality assurance (QA), and project management must explicitly address security
risks posed by exploitable software. However, traditional processes do not explicitly address
software security risks that can be passed from projects to using organizations.
"Mitigating supply chain risks requires an understanding and management of suppliers' process
capabilities, products and services. Enterprise risks stemming from the supply chain are
influenced by suppliers and acquisition projects (including procurement, QA, and testing).
Software assurance processes and practices span development and acquisition.
"Derived (non-explicit) security requirements should be elicited and considered. QA and testing
can integrate security considerations in their practices to enhance value in mitigating risks to
the enterprise."
He then asked the audience, "What legacy do you intend to leave from the programs in which you
have project responsibilities? Is it one that contributes to a more resilient system and
enterprise or one that was simply good enough to get the customer to accept without an
understanding of the residual risk passed to the end user?"
Along those lines, this month's issue of CrossTalk deals with software quality. In his article
"The Software Quality Challenge," Watts S. Humphrey discusses how today's more complex software
offers greater challenges in the areas of safety, security and reliability. Capers Jones talks
about two measures that have a strong influence on the outcomes of software projects in
"Measuring Defect Potentials and Defect Removal Efficiency." Asking the tantalizing question,
"Would your company like to save $100,000 per day?" Thomas D. Neff offers insights into how that
is possible in "Quality Processes Yield Quality Products"; while Dr. Paul Anderson discusses the
advantages and limitations of using static-analysis tools in "The Use and Limitations of
Static-Analysis Tools to Improve Software Quality." Next, D. Richard Kuhn, Dr. Yu Lei, and
Dr. Raghu Kacker offer new tools for automating the production of complete test cases covering up
to 6-way combinations in "Automated Combinatorial Test Methods Beyond Pairwise Testing," and
Dr. Jeffrey Voas gives us a look at the misunderstood term "software quality" in "Software
Quality Unpeeled."
For additional information, I want to remind everybody that DHS NCSD offers free
resources related to security-enhanced quality assurance, project management, and software engineering via
their software assurance and BuildSecurityIn Web sites at: <www.us-cert.gov/swa> and
<https://buildsecurityin.us-cert.gov>. More can also be learned at the World Congress for
Software Quality, a major international gathering of software quality professionals that will take
place September 15-18, 2008 in Bethesda, Maryland. For more information on the conference,
visit <www.asq.org/conferences/wcsq>.
Finally, I would like to thank Beth Starrett for the exemplary work she has done over the
past eight years as publisher of CrossTalk. During her tenure at the helm the publication
has thrived, and has broadened both its scope and depth in the field of defense software engi-
neering. With her departure, we welcome Kasey Thompson as our new publisher. Kasey has
long been associated with CrossTalk, and we look forward to a new era of cutting-edge
articles, innovative features, and ever-evolving quality information on the subjects that you,
our readers, demand.
Brent D. Baxter
Managing Director

Software Quality
The Software Quality Challenge

Watts S. Humphrey
The Software Engineering Institute
Many aspects of our lives are governed by large, complex systems with increasingly complex
software, and the safety, security, and reliability of these systems have become a major concern.
As the software in today's systems grows larger, it has more defects, and these defects adversely
affect the safety, security, and reliability of the systems. This article explains why the common
test-and-fix software quality strategy is no longer adequate, and characterizes the properties of
the quality strategy we must pursue to solve the software quality problem in the future.

Today, many of the systems on which our lives and livelihoods depend are run by software. Whether
we fly in airplanes, file taxes, or wear pacemakers, our safety and well-being depend on
software. With each system enhancement, the size and complexity of these systems increase, as
does the likelihood of serious problems. Defects in video games, reservations systems, or
accounting programs may be inconvenient, but software defects in aircraft, automobiles, air
traffic control systems, nuclear power plants, and weapons systems can be dangerous.

Everyone depends on transportation networks, hospitals, medical devices, public utilities, and
the international financial infrastructure. These systems are all run by increasingly complex and
potentially defective software systems. Regardless of whether these large life-critical systems
are newly developed or composed from modified legacy systems, to be safe or secure, they must
have quality levels of very few defects per million parts.

Modern, large-scale systems typically have enormous requirements documents, large and complex
designs, and millions of lines of software code. Uncorrected errors in any aspect of the design
and development process generally result in defects in the operational systems. The defect levels
of such operational systems are typically measured in defects per thousand lines of code. A one
million line-of-code system with the typical quality level of one defect per 1,000 lines would
have 1,000 undiscovered defects, while any reasonably safe system of this scale must have only a
very few defects, certainly fewer than 10.

The Need for Quality Software

Before condemning programmers for doing sloppy work, it is appropriate to consider the quality
levels of other types of printed media. A quick scan of most books, magazines, and newspapers
will reveal at least one and generally more defects per page while even poor-quality software has
much less than one defect per listing page. This means that the quality level of even
poor-quality software is higher than that obtained for other kinds of human-written text.
Programming is an exacting business, and these professionals are doing extraordinarily
high-quality work. The only problem is that, based on historical trends, future systems will be
much larger and more complex than today's, meaning that just to maintain today's defect levels,
we must do much higher quality work in the future.

To appreciate the challenge of achieving 10 or fewer defects per million lines of code, consider
what the source listing for such a program would look like. The listing for a 1,000-line program
would fill 40 text pages; a million-line program would take 40,000 pages. Clearly, finding all
but 10 defects in 40,000 pages of material is humanly impossible. However, we now have complex
life-critical systems of this scale and will have much larger ones in the relatively near future.
So we must do something, but what? That is the question addressed in this article.

Why Defective Systems Work

To understand the software quality problem, the first question we must answer is, "If today's
software is so defective, why aren't there more software quality disasters?" The answer is that
software is an amazing technology. Once you test it and fix all of the problems found, that
software will always work under the conditions for which it was tested. It will not wear out,
rust, rot, or get tired. The reason there are not more software disasters is that testers have
been able to exercise these systems in just about all of the ways they are typically used. So, to
solve the software quality problem, all we must do is keep testing these systems in all of the
ways they will be used. So what is the problem?

The problem is complexity. The more complex these systems become, the more different ways they
can be used, and the more ways users can use them, the harder it is to test all of these
conditions in advance. This was the logic behind the beta-testing strategy started at IBM with
the OS/360 system more than 40 years ago. Early copies of the new system releases were sent to a
small number of trusted users and IBM then fixed the problems they found before releasing the
public version. This strategy was so successful that it has become widely used by almost all
vendors of commercial software.

Unfortunately, however, the beta-testing strategy is not suitable for life-critical systems. The
V-22 Osprey helicopter, for example, uses a tilting wing and rotor system in order to fly like an
airplane and land like a helicopter. In one test flight, the hydraulic system failed just as the
pilot was tilting the wing to land. While the aircraft had a built-in back-up system to handle
such failures, the aircraft had not been tested under those precise conditions, and the defect in
the back-up system's software had not been found. The defect caused the V-22 to become unstable
and crash, killing all aboard.

The problem is that as systems become more complex, the number of possible ways to use these
systems grows exponentially. The testing problem is further complicated by the fact that the way
such systems are configured and the environments in which they are used also affect the way the
software is executed. Table 1 lists some of the variations that must be considered in testing
complex systems. An examination of the number of possibilities for even relatively simple systems
shows why it is impractical to test all possibilities for any complex system. So why is complex
software so defective?

Some Facts

Software is and must remain a human-produced product. While tools and techniques have been
devised to automate the production of code once the requirements
and design are known, the requirements and design must be produced by people. Further, as systems
become increasingly complex, their requirements and design grow increasingly complex. This
complexity then leads to errors, and these errors result in defects in the requirements, design,
and the operational code itself. Thus, even if the code could be automatically generated from the
defective requirements and design, that code would reflect these requirements and design defects
and, thus, still be defective.

Table 1: Some of the Possible Testing Variations
 1. Data rates
 2. Data values
 3. Data errors
 4. Configuration variations
 5. Number, type, and timing of simultaneous processes
 6. Hardware failures
 7. Network failures
 8. Operator errors
 9. Version changes
10. Power variations

When people design things, they make mistakes. The larger and more complex their designs, the
more mistakes they are likely to make. From course data on thousands of experienced engineers
learning the Personal Software Process(SM) (PSP(SM)), it has been found that developers typically
inject about 100 defects into every 1,000 lines of the code they write [1]. The distribution for
the total defects injected by 810 experienced developers at the beginning of PSP training is
shown by the total bars in Figure 1. While there is considerable variation and some engineers do
higher-quality work, just about everybody injects defects.

SM Personal Software Process and PSP are service marks of Carnegie Mellon University.

Developers use various kinds of tools to generate program code from their designs, and they
typically find and fix about half of their defects during this process. This means that about 50
defects per 1,000 lines of code remain at the start of initial testing. Again, the distribution
of the defects found in initial testing is also shown by the test bars in Figure 1.

[Figure 1: Total and Test Defect Rates of 810 Experienced Engineers. Bar chart of total and test
defects per thousand lines of code, by quartile (1st through 4th).]

Developers generally test their programs until they run without obvious failures. Then they
submit these programs to systems integration and testing where they are combined with other
similar programs into larger and larger sub-systems and systems for progressively larger-scale
testing. The defect content of programs entering systems testing typically ranges between 10 and
20 defects per 1,000 lines.

The most disquieting fact is that testing can only find a fraction of the defects in a program.
That is, the more defects a program contains at test entry, the more it is likely to have at test
completion. The reason for this is the point previously made about extensive testing. Clearly, if
defects are randomly sprinkled throughout a large and complex software system, some of them will
be in the most rarely used parts of the system and others will be in those parts that are only
exercised under failure conditions. Unfortunately, these rarely used parts are the ones most
likely to be exercised when such systems are subjected to the stresses of high transaction
volume, accidents, failures, or military combat.

The Defect Removal Problem

A defect is an incorrect or faulty construction in a product. For software, defects generally
result from mistakes that the designers or developers make as they produce their products.
Examples are oversights, misunderstandings, and typos. Furthermore, since defects result from
mistakes, they are not logical. As a consequence, there is no logical or deductive process that
could possibly find all of the defects in a system. They could be anywhere, and the only way to
find all of the defects with testing is to exhaustively test every path, function, or system
condition.

This leads to the next question, which concerns the testing objective: Must we find all of the
defects, or couldn't we just find and fix those few that would be dangerous? Obviously, we only
need to fix the defects that would cause trouble, but there is no way to determine which defects
these are without examining all of the defects. For example, a complex design defect that
produced a confusing operator message could pose no danger while a trivial typographical mistake
that changed a "no" to a "yes" could be very dangerous. Since there is no way to tell in advance
which defects would be damaging, we must try to find them all. Then, after finding them, we must
fix at least all of the ones that would be damaging.

The Testing Problem

Since defects could be anywhere in a large software system, the only way testing could find them
all would be to completely test every segment of code in the entire program. To understand this
issue, consider the program structure in Figure 2. This code fragment has one branch instruction
at point B; three segments: A to B, B to C, and B to D; and two possible paths or routes through
the fragment: A-B-C and A-B-D. So, for a program fragment like this, there could be defects on
any of the code segments as well as in the branch instruction itself.

[Figure 2: A Three-Segment Code Fragment. Segment A to B, with a branch at B leading to C or D.]

For a large program, the number of possible paths or routes through a program can vary by program
type, but programs generally have about one branch instruction for every 10 or so lines of code.
This means that a million-line program would typically have about 100,000 branch instructions. To
determine the magnitude of the testing problem for such a system, we must determine the number of
test paths through a network of 100,000 switches. Here, from Figure 3, we can calculate that, for
a simple network of 16 switches, there are 70 possible paths from A to B. As shown in Table 2,
the number of possible paths through larger networks grows rapidly, with 100 switches having
184,756 possible paths and 400 switches having 1.38E+11 possible paths. Clearly, the number of
possible paths through a system with 100,000 switches could, for practical purposes, be
considered infinite. Furthermore, even if comprehensive path testing were possible, more than
path testing would be required to uncover the defects that involved timing, synchronization, or
unusual operational conditions.

Conclusions on Testing

At this point, several conclusions can be drawn. First, today's large-scale systems typically
have many defects. Second, these defects generally do not cause problems as long as the systems
are used in ways that have been tested. Third, because of the growing complexity of modern
systems, it is impossible to test all of the ways in which such systems could be used. Fourth,
when systems are stressed in unusual ways, their software is most likely to encounter
undiscovered defects. Fifth, under these stressful conditions, these systems are least likely to
operate correctly or reliably.

Therefore, with the current commonly used test-based software quality strategy, large-scale
life-critical systems will be least reliable in emergencies, and that is when reliability is most
important.

Successful Quality Strategies

Organizations have reached quality levels of a few defects per million parts, but these have been
with manufacturing and not design or development processes. In the manufacturing context, the
repetitive work is performed by machines, and the quality challenge is to consistently and
properly follow all of the following steps:
• Establish quality policies, goals, and plans.
• Properly set up the machines.
• Keep the machines supplied with high-quality parts and materials.
• Maintain the entire process under continuous statistical control.
• Evaluate the machine outputs.
• Properly handle all deviations and problems.
• Suitably package, distribute, or otherwise handle the machine outputs.
• Consistently strive to improve all aspects of the production and evaluation processes.

Table 2: Possible Paths Through a Network
Switches    Paths
       1    2
       4    6
       9    20
      16    70
      36    924
      49    3,432
      64    12,870
      81    48,620
     100    184,756
     400    1.38E+11
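
The path counts in Table 2 can be checked with a short calculation. The sketch below assumes, as the table's values suggest but the article does not state, that a square network of n x n switches has C(2n, n) possible paths (the central binomial coefficient). The function name `grid_paths` is illustrative only.

```python
from math import comb, isqrt


def grid_paths(switches: int) -> int:
    """Paths through a square network, under the assumed C(2n, n) model.

    Assumption (not stated in the article): the network is an n x n grid
    of switches, so `switches` must be a perfect square.
    """
    n = isqrt(switches)
    if n * n != switches:
        raise ValueError("switch count must be a perfect square")
    return comb(2 * n, n)


if __name__ == "__main__":
    # Switch counts listed in Table 2.
    for switches in (1, 4, 9, 16, 36, 49, 64, 81, 100, 400):
        print(f"{switches:>4} switches: {grid_paths(switches):,} paths")

    # 100,000 is not a perfect square; 316 x 316 = 99,856 is the nearest
    # square below it, used here only to indicate the order of magnitude.
    digits = len(str(grid_paths(316 * 316)))
    print(f"99,856 switches: a path count with {digits} digits")
```

Under this assumption the function reproduces every row of Table 2, and the count for a network of nearly 100,000 switches runs to well over a hundred digits, which is the sense in which it can be treated as infinite for practical purposes.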

[Figure 3: Possible Paths Through a 16-Switch Network. Grid of switches between points A and B,
with one possible path marked.]

While these eight steps suggest some approaches to consider for software development, they are
not directly applicable for human-intensive work such as design and development. However, by
considering an analogous approach with people instead of machines, we begin to see how to
proceed.

The Eight Elements of Software Quality Management

The eight steps required to consistently produce quality software are based on the five basic
principles of software quality shown in the Software Quality Principles sidebar. With these
principles in mind, we can now define the eight steps required for an effective software quality
initiative.
1. Establish quality policies, goals, and plans.
2. Properly train, coach, and support the developers and their teams.
3. Establish and maintain a requirements quality-management process.
4. Establish and maintain statistical control of the software engineering process.
5. Review, inspect, and evaluate all product artifacts.
6. Evaluate all defects for correction and
   to identify, fix, and prevent other similar problems.
7. Establish and maintain a configuration management and change control system.
8. Continually improve the development process.

The following sections discuss each of these eight steps and relate them to the software quality
principles as shown in the sidebar.

Software Quality Principles*

1. Properly managed quality programs reduce total program cost, increase business value and
   quality of delivered products, and shorten development times.
   1.1. If cost or development times increase, the quality program is not being properly
        implemented.
   1.2. The size of a product, including periodic reevaluation of size as changes occur, must be
        estimated and tracked.
   1.3. Schedules, budgets, and quality commitments must be mutually consistent and based on
        sound historical data and estimating methods.
   1.4. The development approach must be consistent with the rate of change in requirements.
2. To get quality work, the customer must demand it.
   2.1. Attributes that define quality for a software product must be stated in measurable terms
        and formally agreed to between developers and customers as part of the contract. Any
        instance of deviation from a specified attribute is a defect.
   2.2. The contract shall specify the agreed upon quality level, stated in terms of the
        acceptable quantity or ratio of deviations (defects) in the delivered product.
3. The developers must feel personally responsible for the quality of the products they produce.
   3.1. The development teams must plan their own work and negotiate their commitments with
        management and the customer.
   3.2. Software managers must provide appropriate training for developers.
   3.3. A developer is anyone who produces a part of the product, be it a designer, documenter,
        coder, or systems designer.
4. For the proper management of software development projects, the development teams themselves
   must plan, measure, and control the work.
   4.1. Project teams must have knowledge and experience in the relevant technologies and
        applications domains commensurate with project size and other risk factors.
   4.2. Removal yield at each step and in total pre-delivery must be measured.
   4.3. Effort associated with each activity must be recorded.
   4.4. Defects discovered by each appraisal method must be recorded.
   4.5. Measurements must be recorded by those performing the activity and be analyzed by both
        developers and managers.
5. Software management must recognize and reward quality work.
   5.1. Projects must utilize a combination of appraisal methods sufficient to verify the agreed
        defect levels.
   5.2. Managers must use measures to ensure high quality and improve processes.
   5.3. Managers must use measurements with due respect for individuals.

* These principles were defined by a group of 13 software quality experts convened by Taz
Daughtrey. The experts are: Carol Dekkers, Gary Gack, Tom Gilb, Watts Humphrey, Joe Jarzombek,
Capers Jones, Stephen Kan, Herb Krasner, Gary McGraw, Patricia McQuaid, Mark Paulk, Colin Tully,
and Jerry Weinberg.

Step 1: Quality Policies, Goals, and Plans

Policies, goals, and plans go together and form the essential foundation for all effective
quality programs. The fundamental policy that forms the foundation for the quality program is
that quality is and must be the first priority. Many software developers, managers, and customers
would argue that product function is critical and that project schedule and program cost are
every bit as important as quality. In fact, they will argue that cost, schedule, and quality must
be traded off.

The reason this is a policy issue is given in the first principle of software quality stated in
the sidebar: "Properly managed quality programs reduce total program cost, increase business
value and quality of delivered products, and shorten development times." Customers must demand
quality work from their suppliers, and management must believe that if the quality program
increases program costs or schedules, that quality program is not properly managed. There is, in
fact, no cost/schedule/quality trade-off: manage quality properly, and cost and schedule
improvements will follow. Everyone in the organization must understand and accept this point: it
is always faster and cheaper to do the job right the first time than it is to waste time fixing
defective products after they have been developed.

Once the basic quality policy is in place, customers, managers, and developers must then
establish and agree on the quality goals for each project. The principal goal must be to find and
remove all defects in the program at the earliest possible time, with the overall objective of
removing all defects before the start of integration and system test. With the goals established,
the development teams must make measurable quality plans that can be tracked and assessed to
ensure that the project is producing quality work. This in turn requires that the quality of the
work be measured at every step, and that the quality data be reviewed and assessed by the
developers, their teams, management, and the customer. When defective work is found, it must be
promptly fixed. The principle is that defects cost money. The longer they are left in the
product, the more work will be built on this defective foundation, and the more it will cost to
find and fix them [2].

Step 2: Train and Coach Developers and Teams

Quality work is not done by accident; it takes dedicated effort and properly skilled and
motivated professionals. The third principle of software quality is absolutely essential: "The
developers must feel personally responsible for the quality of the products they produce." If
they do not, they will not strive to produce quality results, and later trying to find and fix
their defects will be costly, time consuming, and ineffective. Convincing developers that quality
is their personal responsibility, and teaching them the skills required to measure and manage the
quality of their work, require training. While it would be most desirable for them to get this
skill and the required knowledge before they graduate from college, practicing software
developers must generally learn them by using methods such as the PSP.

With properly trained developers, the development teams then need proper management, leadership,
and coaching. Again, the Team Software Process(SM) (TSP(SM)) can provide this guidance and
support [3, 4, 5].

SM Team Software Process and TSP are service marks of Carnegie Mellon University.

Step 3: Manage Requirements Quality

One fundamental truth of all quality programs is that you must start with a quality foundation to
have any hope of producing a quality result. In software, requirements are the foundation for
everything we do, so the quality of requirements is paramount. However, the requirements quality
problem is complicated by two facts. First, the quality measures must not be abstract
characteristics of a requirements document; they should be precise and measurable items such as
defect counts from requirements inspections or counts of requirements defects found in system
test or customer use. However, to be most helpful, these quality measures must also address the
precise understanding the developers themselves have of the requirements regardless of what the
requirements origi-

[...] and agreement reached on how to incorporate this new understanding into the development
work. This means that the requirements must be recognized as evolving through a sequence of
versions while the development estimates, plans, and commitments are progressing through a
similar but delayed sequence of versions. And finally, the product itself will ultimately be
produced in a further delayed sequence of versions. The quality management problem concerns
managing the quality and maintaining the synchroniza-

[...] and manage the quality of the process used to produce the program's parts. If, for example,
we could devise a process that would consistently produce 1,000-line modules that each had less
than a one percent chance of having a single defect, a system of 1,000 of these modules would
likely have fewer than 10 defects per million lines. One obvious problem with this strategy
concerns our ability to devise and properly use such a process.

There has been considerable progress in producing and using such a process.
nators believe or how good a requirements tion of this sequence of parallel require- This is accomplished by measuring each
document has been produced. The develop- ments, plan, design, and product versions. developers process and producing a
ers will build what they believe the require- Process Quality Index (PQI). The TSP
ments say and not what the requirements quality profile, which forms the basis for
developers intended to say. This means that While statistical process control is a large the PQI measure, is shown in Figure 5 [6].
Step 4: Statistical Process Control
the quality-management problem the subject, we need only discuss two aspects: Then, the developers and their teams use
requirements process must address is the process management and continuous standard statistical process management
transfer of understanding from the require- process improvement. The first aspect, techniques to manage the quality of all
ments experts to the software developers. process management, is discussed here, dimensions of the development work [7].
The second key requirements fact is and process improvement is addressed in Data on early TSP teams show that by fol-
that the requirements are dynamic. As peo- Step 8. lowing this practice, quality is substantially
ple learn more about what the users need The first step in statistical process improved [8].
and what the developers can build, their management is to redefine the quality
A
views of what is needed will change. This
fact enormously complicates the require-
ments-management problem. The reason
management strategy. To achieve high lev-
els of software quality, it is necessary to
switch from looking for defects to manag-
Quality evaluation has two elements: eval-
Step 5: Quality Evaluation
uating the quality of the process used to

is that peoples understanding of their ing the process. As noted earlier, to achieve produce each product element, and evalu-
needs Switches
evolves gradually and Paths
often without a quality level of 10 defects per million ating the quality of the products produced
any conscious appreciation of how much lines with current software quality manage- by that process. The reason to measure
their views have changed. There is also a
1 2
ment methods, the developers would have and evaluate process quality, of course, is
time lag: Even when the users know that
4 6
to find and fix all but 10 of the 10,000 to to guide the process-improvement activi-
their needs16have changed, it takes
9 20
time for 20,000 defects in a program with a 40,000 ties discussed in Step 8. The Capability
them to truly understand their
70
new ideas page listing. Unless someone devises a Maturity Model Integration (CMMI)
and to communicate
36
them to3,432
the develop-
924
magic machine that could flawlessly identi- model and appraisal methods were devel-
ers. Even 64 after the developers
49
understand fy every software defect, it would be clear- oped to guide process-quality assessments,
the changes, they cannot just ly impossible to improve human search and the TSP process was developed to
48,620 every-
drop
12,870
thing and100 switch to the new184,756

81
version. and analysis skills to this degree. guide organizations in defining, using, and
To implement a change,1.38E+11
the design and Therefore, achieving these quality levels improving high-quality processes as well
implementation implications of every
400
through better testing, reviews, or inspec- as in measuring, managing, and evaluating
requirements change must be appraised; tions is not feasible. product quality.
plans, costs, and commitments adjusted; A more practical strategy is to measure To evaluate process quality, the devel-
opers and their teams must gather data on
Figure 5: TSP Software Quality Profile [6] their work, and then evaluate these data
against the goals they established in their
quality plan. If any process or process
Design/Code Time
step falls below the team-defined quality
threshold, the resulting products must be
evaluated for repair, redevelopment, or
replacement, and the process must be
brought into conformance. These actions
Design Review Time Compile D/KLOC must be taken for every process step and
especially before releasing any products
from development into testing. In product
evaluation, the system integration and
testing activities are also measured and
evaluated to determine if the final product
has reached a suitable quality level or if
some remedial action is required.
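The Step 5 check of process data against team-defined thresholds can be sketched in a few lines. This is only an illustration: the step names, the 0-to-1 quality scale, and the threshold values are hypothetical and are not taken from the TSP or from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    step: str         # process step name (hypothetical label)
    measured: float   # measured quality value for the step, higher is better
    threshold: float  # team-defined minimum taken from the quality plan


def steps_needing_action(results):
    """Return the names of steps whose measured quality is below threshold.

    Products from these steps would be evaluated for repair, redevelopment,
    or replacement before release from development into testing.
    """
    return [r.step for r in results if r.measured < r.threshold]


# Hypothetical data for one component.
results = [
    StepResult("design review", 0.9, 0.7),
    StepResult("code review", 0.5, 0.7),
    StepResult("unit test", 0.8, 0.7),
]
flagged = steps_needing_action(results)
print(flagged)  # ['code review']
```

Keeping the threshold next to each measurement reflects the point that thresholds are set by the team in its quality plan rather than fixed globally.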

Capability Maturity Model and CMMI are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
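The two Step 4 figures can be checked with simple arithmetic. The sketch below assumes, as the text's wording suggests, that a defective module carries a single defect, and it treats the one percent figure as an upper bound; the variable and function names are illustrative only.

```python
MODULES = 1_000
LINES_PER_MODULE = 1_000
P_MODULE_DEFECTIVE = 0.01  # upper bound: less than a one percent chance per module

total_lines = MODULES * LINES_PER_MODULE

# Expected number of defective modules; with at most one defect per
# defective module (an assumption), this is also the expected defect count.
expected_defects = MODULES * P_MODULE_DEFECTIVE
defects_per_million_lines = expected_defects * 1_000_000 / total_lines


def required_removal_yield(injected: int, allowed_remaining: int) -> float:
    """Fraction of injected defects that must be found and fixed."""
    return 1.0 - allowed_remaining / injected


print(defects_per_million_lines)                    # 10.0 (upper bound)
for injected in (10_000, 20_000):
    # Test-and-fix route: leave only 10 of the injected defects.
    print(f"{required_removal_yield(injected, 10):.2%}")
```

Under these assumptions the process-based route stays at or below 10 defects per million lines, while the search-based route would need removal yields of roughly 99.9 percent or better, which is the contrast Step 4 draws.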

Step 6: Defect Analysis
Perhaps the most important single step in any quality management and improvement system concerns defect data. Every defect found after development, whether by final testing, the users, or any other means, must be carefully evaluated and the evaluation results used to improve both the process and the product. The reason that these data are so important is that they concern the process failings. Every defect found after development represents a failure of the development process, and each such failure must be analyzed and the results used to make two kinds of improvements.

The first improvement, and the one that requires the most rapid turnaround time, is determining where in the product similar defects could have been made and taking immediate action to find and fix all of those defects. The second improvement activity is to analyze these defects to determine how to prevent similar defects from being injected in the future, and to devise a means to more promptly find and fix all such defects before final testing or release to the user.

Step 7: Configuration Management
For any large-scale development effort, configuration management (CM) is critical. This CM process must cover the product artifacts, the requirements, the design, and the development process. It is also essential to measure and manage the quality of the CM process itself. Since CM processes are relatively standard, however, they need not be discussed further.

Step 8: Process Improvement
The fundamental change required by this software quality-management strategy is to use the well-proven methods of statistical process control to guide continuous process improvement [7]. Here, however, we are not talking about improving the tolerances of machines or the purity of materials; we are talking about managing the quality levels of what people do, as well as the quality levels of their work products. While people will always make mistakes, they tend to make the same mistakes over and over. As a consequence, when developers have data on the defects they personally inject during their work and know how to use these data, they and their teammates can learn how to find just about all of the mistakes that they make. Then, in defining and improving the quality-management process, every developer must use these data to optimally utilize the full range of available defect detection and prevention methods.

Regardless of the quality management methods used (e.g., International Organization for Standardization, correctness-by-construction, or AS9100), continuous improvement strategies such as those defined by CMMI and TSP should be applied to the improvement process itself. This means that the process quality measures, the evaluation methods, and the decision thresholds must also be considered as important aspects of continuous process improvement. Furthermore, since every developer, team, project, and organization is different, it means that this continuous improvement process must involve every person on every development team and on every project in the organization.

Conclusion
While we face a major challenge in improving software quality, we also have substantial and growing quality needs. It should now be clear to just about everyone in the software business that the current testing-based quality strategy has reached a dead end. Software development groups have struggled for years to get quality improvements of 10 to 20 percent by trying different testing strategies and methods, by experimenting with improved testing tools, and by working harder.

The quality improvements required are vast, and such improvements cannot be achieved by merely bulling ahead with the test-based methods of the past. While the methods described in this article have not yet been fully proven for software, we now have a growing body of evidence that they will work, at least better than what we have been doing. What is more, this quality strategy uses the kinds of data-based methods that can guide long-term continuous improvement. In addition to improving quality, this strategy has also been shown to save time and money.

Finally, and most importantly, software quality is an issue that should concern everyone. Poor quality software now costs each of us time and money. In the immediate future, it is also likely to threaten our lives and livelihoods. Every one of us, whether a developer, a manager, or a user, must insist on quality work; it is the only way we will get the kind of software we all need.

Acknowledgements
My thanks to Bob Cannon, David Carrington, Tim Chick, Taz Daughtrey, Harry Levinson, Julia Mullaney, Bill Nichols, Bill Peterson, Alan Willett, and Carol Woody for reviewing this article and offering their helpful suggestions. I also much appreciate the constructive suggestions of the CrossTalk editorial board.

References
1. Humphrey, W.S. PSP: A Self-Improvement Process for Software Engineers. Reading, MA: Addison-Wesley, 2005.
2. Jones, C. Software Quality: Analysis and Guidelines for Success. New York: International Thomson Computer Press, 1997.
3. Humphrey, W.S. Winning With Software: An Executive Strategy. Reading, MA: Addison-Wesley, 2002.
4. Humphrey, W.S. TSP: Leading a Development Team. Reading, MA: Addison-Wesley, 2006.
5. Humphrey, W.S. TSP: Coaching Development Teams. Reading, MA: Addison-Wesley, 2006.
6. Humphrey, W.S. "Three Dimensions of Process Improvement, Part III: The Team Process." CrossTalk Apr. 1998.
7. Florac, S., and A.D. Carleton. Measuring the Software Process: Statistical Process Control for Software Process Improvement. Reading, MA: Addison-Wesley, 1999.
8. Davis, N., and J. Mullaney. Team Software Process in Practice. SEI Technical Report CMU/SEI-2003-TR-014, Sept. 2003.

About the Author
Watts S. Humphrey joined the Software Engineering Institute (SEI) after his retirement from IBM. He established the SEI's Process Program and led development of the CMM for Software, the PSP, and the TSP. He managed IBM's commercial software development and was vice president of technical development. He is an SEI Fellow, an Association of Computing Machinery member, an Institute of Electrical and Electronics Engineers Fellow, and a past member of the Malcolm Baldrige National Quality Award Board of Examiners. In a recent White House ceremony, the President awarded him the National Medal of Technology. He holds graduate degrees in physics and business administration.

SEI
4500 Fifth AVE
Pittsburgh, PA 15213-2612
Phone: (412) 268-6379
Fax: (412) 268-5758
E-mail: watts@sei.cmu.edu

COMING EVENTS

June 30-July 2
The 17th International Conference on Software Engineering and Data Engineering
Los Angeles, CA
http://sce.cl.uh.edu/sede08/

July 14-17
2008 World Congress in Computer Science, Computer Engineering and Applied Computing Conference
Las Vegas, NV
www.world-academy-of-science.org/worldcomp08/ws/conferences

July 16-17
National Security Space Policy and Architecture Symposium
Chantilly, VA
www.ndia.org

July 20-24
International Symposium on Software Testing and Analysis
Seattle, WA
http://issta08.rutgers.edu

July 27-28
The 44th Annual Aerospace and Defense Contract Management Conference
Garden Grove, CA
www.ncmahq.org/meetings/ADC06

July 28-30
Night Vision Systems
Alexandria, VA
www.iqpc.com/ShowEvent.aspx?id=97070

2009
2009 Systems and Software Technology Conference
Salt Lake City, UT
www.sstc-online.org

COMING EVENTS: Please submit coming events that are of interest to our readers at least 90 days before registration. E-mail announcements to: nicole.kentta@hill.af.mil.

WEB SITES

Software Quality Profile
www.sei.cmu.edu/publications/articles/quality-profile/index.html
The software community has been slow to use data to measure software quality. This article discusses the reasons for this problem, and describes a way to use process measurements to assess product quality. When the correct data are gathered for every engineer and for every step of the development process, a host of quality measures can be derived to evaluate software quality.

The Software Quality Page
www.swquality.com/users/pustaver/index.shtml
Here is your connection to the world of software quality, standards, and process improvement. The Software Quality Web site contains links to areas including software quality and testing, software inspections and reviews, quality and process metrics, software quality assurance, and other standards, as well as provides helpful links to other software and quality organizations.

American Society for Quality
www.asq.org/pub/sqp/
This site offers articles and discussion on basic concepts, quality tools, organization-wide approaches, people creating quality, using data, specific applications, and other software quality-related issues.

Software Q&A and Testing Resource Center
www.softwareqatest.com
There are many categories of questions and answers when it comes to software quality and assurance testing. This Web site breaks them down into categories that include frequently asked questions, not-so-frequently asked questions, testing resources, test tools, site management tools, jobs, news, and more.

Why Software Quality Matters
www.baselinemag.com/c/a/projects-processes/why-software-quality-matters
As software spreads from computers into auto engines, factory robots, hospital X-ray machines and elsewhere, defects are no longer a problem to be managed. They must be predicted and excised or else unanticipated uses will lead to unintended consequences. This intriguing article proposes innovative solutions to today's emerging software quality issues.

Society for Software Quality
www.ssq.org
The Society for Software Quality (SSQ) is a membership organization for those interested in promoting quality as a universal goal for software. The SSQ promotes increased knowledge and interest in the technology associated with the development and maintenance of quality software.

Better Software Magazine
www.stickyminds.com/bettersoftware/magazine.asp
Better Software is the magazine for software professionals who care about quality. Each issue addresses relevant, timely information to help with building better software. Better Software delivers in-depth articles on testing, tools, defect tracking, metrics, and management, and is the only commercial magazine exclusively dedicated to software professionals.

Software Test and Performance Magazine
www.stpmag.com
Software Test & Performance is written for software and application development managers, project managers, team leaders, and test and quality assurance managers. Articles in the magazine provide useful information to help those in the field understand trends and emerging technologies, come to grips with new and timeless challenges, adopt new best practices concepts, and ultimately make better decisions to improve software quality.

Handbook of Software Quality Assurance
www.amazon.com/handbook-software-quality-assurance-3rd/dp/0130104701
The software industry has witnessed a dramatic rise in the impact and effectiveness of software quality assurance. From its infancy when a handful of software pioneers explored the first applications of quality assurance to the development of software, software quality assurance has become integrated into all phases of software development. This handbook capitalizes on the talents and skills of the experts who deal with the implementation of software quality assurance on a daily basis.

Measuring Defect Potentials and
Defect Removal Efficiency
Capers Jones
Software Productivity Research, LLC
There are two measures that have a strong influence on the outcomes of software projects: 1) defect potentials and 2) defect
removal efficiency. The term defect potentials refers to the total quantity of bugs or defects that will be found in five software arti-
facts: requirements, design, code, documents, and bad fixes, or secondary defects. The term defect removal efficiency refers to the
percentage of total defects found and removed before software applications are delivered to customers. As of 2007, the average
for defect potentials in the United States was about five defects per function point. The average for defect removal efficiency in the
United States was only about 85 percent. The average for delivered defects was about 0.75 defects per function point.
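The figures in this summary are related by simple arithmetic, which the following sketch makes explicit. The function names are illustrative, and the counts of 900 and 100 defects are sample inputs chosen only to show the calculation.

```python
def removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """Share of all known defects that were removed before delivery."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded, efficiency is undefined")
    return found_before_release / total


def delivered_defects_per_fp(potential_per_fp: float, efficiency: float) -> float:
    """Defects per function point that remain at delivery."""
    return potential_per_fp * (1.0 - efficiency)


# U.S. averages quoted above: 5 defects per function point, 85 percent removal.
print(round(delivered_defects_per_fp(5.0, 0.85), 2))  # 0.75

# Sample counts: 900 defects found internally, 100 reported by customers.
print(removal_efficiency(900, 100))  # 0.9
```

The second function shows why the two measures have to be read together: the same 85 percent efficiency applied to a defect potential of 2.5 per function point would deliver half as many defects.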
© 2008 Capers Jones. All rights reserved.
Capability Maturity Model and CMM are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.

There are two very important measurements of software quality that are critical to the industry:
1. Defect potentials
2. Defect removal efficiency

All software managers and quality assurance personnel should be familiar with these measurements because they have the largest impact on software quality, cost, and schedule of any known measures.

The phrase defect potentials refers to the probable numbers of defects that will be found during the development of software applications. As of 2008, the approximate averages in the United States for defects in five categories, measured in terms of defects per function point and rounded slightly so that the cumulative results are an integer value for consistency with other publications by the author, follow.

Note that defect potentials should be measured with function points and not with lines of code. This is because most of the serious defects are not found in the code itself, but rather in requirements and design. Table 1 shows the averages for defect potentials in the U.S. circa 2008.

Table 1: Averages for Defect Potential

Defect Origin             Defect Potential
Requirements defects      1.00
Design defects            1.25
Coding defects            1.75
Documentation defects     0.60
Bad fixes                 0.40
Total                     5.00

The measured range of defect potentials is from just below two defects per function point to about 10 defects per function point. Defect potentials correlate with application size. As application sizes increase, defect potentials also rise.

A useful approximation of the relationship between defect potentials and application size is a simple rule of thumb: application function points raised to the 1.25 power will yield the approximate defect potential for software applications. Actually, this rule applies primarily to applications developed by organizations at Capability Maturity Model (CMM) Level 1. For the higher CMM levels, lower powers would occur. Reference [1] shows additional factors that affect the rule of thumb (see Note 1).

The phrase defect removal efficiency refers to the percentage of the defect potentials that will be removed before the software application is delivered to its users or customers. As of 2007, the average for defect removal efficiency in the U.S. was about 85 percent.

If the average defect potential is five bugs or defects per function point and removal efficiency is 85 percent, then the total number of delivered defects will be about 0.75 per function point. However, some forms of defects are harder to find and remove than others. For example, requirements defects and bad fixes are much more difficult to find and eliminate than coding defects.

At a more granular level, the defect removal efficiency against each of the five defect categories is approximated in Table 2.

Table 2: Defect Removal Efficiency

Defect Origin             Defect Potential    Removal Efficiency    Defects Remaining
Requirements defects      1.00                77%                   0.23
Design defects            1.25                85%                   0.19
Coding defects            1.75                95%                   0.09
Documentation defects     0.60                80%                   0.12
Bad fixes                 0.40                70%                   0.12
Total                     5.00                85%                   0.75

Note that the defects discussed in this section include all severity levels, ranging from severity 1: show stoppers, down to severity 4: cosmetic errors. Obviously, it is important to measure defect severity levels as well as recording numbers of defects (see Note 2).

There are large ranges in terms of both defect potentials and defect removal efficiency levels. The best in class organizations have defect potentials that are below 2.50 defects per function point coupled with defect removal efficiencies that top 95 percent across the board.

Defect removal efficiency levels peak at about 99.5 percent. In examining data from about 13,000 software projects over a period of 40 years, only two projects had zero defect reports in the first year after release. This is not to say that achieving a defect removal efficiency level of 100 percent is impossible, but it is certainly very rare.

Organizations with defect potentials higher than seven per function point coupled with defect removal efficiency levels of 75 percent or less can be viewed as exhibiting professional malpractice. In other words, their defect prevention and defect removal methods are below acceptable levels for professional software organizations.

Most forms of testing average only about 30 to 35 percent in defect removal efficiency levels and seldom top 50 percent. Formal design and code inspections, on the other hand, often top 85 percent in defect removal efficiency and average about 65 percent.

As can be seen from the short discussions here, measuring defect potentials and defect removal efficiency provides the most effective known ways of evaluating various aspects of software quality control. In general, improving software quality requires two important kinds of process improvement: 1) defect prevention and 2) defect removal.

The phrase defect prevention refers to

technologies and methodologies that can lower defect potentials or reduce the numbers of bugs that must be eliminated. Examples of defect prevention methods include joint application design, structured design, and also participation in formal inspections (see Note 3).

The phrase defect removal refers to methods that can either raise the efficiency levels of specific forms of testing or raise the overall cumulative removal efficiency by adding additional kinds of review or test activity. Of course, both approaches are possible at the same time.

In order to achieve a cumulative defect removal efficiency of 95 percent, it is necessary to use approximately the following sequence of at least eight defect removal activities:
• Design inspections.
• Code inspections.
• Unit tests.
• New function tests.
• Regression tests.
• Performance tests.
• System tests.
• External beta tests.

To go above 95 percent, additional removal stages are needed. For example, requirements inspections, test case inspections, and specialized forms of testing, such as human factors testing, add to defect removal efficiency levels.

Since each testing stage will only be about 30 percent efficient, it is not feasible to achieve a defect removal efficiency level of 95 percent by means of testing alone. Formal inspections not only remove most of the defects before testing begins, they also raise the efficiency level of each test stage. Inspections benefit testing because design inspections provide a more complete and accurate set of specifications from which to construct test cases.

From an economic standpoint, combining formal inspections and formal testing will be cheaper than testing by itself. Inspections and testing in concert will also yield shorter development schedules than testing alone. This is because when testing starts after inspections, almost 85 percent of the defects will already be gone. Therefore, testing schedules will be shortened by more than 45 percent.

When IBM applied formal inspections to a large database project, delivered defects were reduced by more than 50 percent from previous releases, and the overall schedule was shortened by about 15 percent. Testing itself was reduced from two shifts over a 60-day period to one shift over a 40-day period. More importantly, customer satisfaction improved to good from prior releases where customer satisfaction previously had been very poor.

Measurement of Defect Potentials and Defect Removal Efficiency
Measuring defect potentials and defect removal efficiency levels is among the easiest forms of software measurement, and is also the most important. To measure defect potentials it is necessary to keep accurate records of all defects found during the development cycle, which is something that should be done as a matter of course. The only difficulty is that private forms of defect removal such as unit testing will need to be done on a volunteer basis.

Measuring the numbers of defects found during reviews, inspections, and testing is also straightforward. To complete the calculations for defect removal efficiency, customer-reported defect reports submitted during a fixed time period are compared against the internal defects found by the development team. The normal time period for calculating defect removal efficiency is 90 days after release.

As an example, if the development and testing teams found 900 defects before release, and customers reported 100 defects in the first three months of usage, it is apparent that the defect removal efficiency would be 90 percent.

Unfortunately, although measurements of defect potentials and defect removal efficiency levels should be carried out by 100 percent of software organizations, the frequency of these measurements circa 2008 is only about five percent of U.S. companies. In fact, more than half of U.S. companies do not have any useful quality metrics at all. More than 80 percent of U.S. companies, including the great majority of commercial software vendors, have only marginal quality control and are much lower than the optimal 95 percent defect removal efficiency level. This fact is one of the reasons why so many software projects fail completely or experience massive cost and schedule overruns. Usually failing projects seem to be ahead of schedule until testing starts, at which point huge volumes of unanticipated defects stop progress almost completely.

As it happens, projects that average about 95 percent in cumulative defect removal efficiency tend to be optimal in several respects. They have the shortest development schedules, the lowest development costs, the highest levels of customer satisfaction, and the highest levels of team morale. This is why measures of defect potentials and defect removal efficiency levels are important to the industry as a whole; these measures have the greatest impact on software performance of any known metrics.

Additionally, as an organization progresses from the U.S. average of 85 percent in defect removal efficiency up to 95 percent, the saved money and shortened development schedules result because most schedule delays and cost overruns are due to excessive defect volumes during testing. However, to climb above 95 percent defect removal efficiency up to 99 percent does require additional costs. It will be necessary to perform 100 percent inspections of every deliverable, and testing will require about 20 percent more test cases than normal.

It is an interesting sociological observation that measurements tend to change human behavior. Therefore, it is important to select measurements that will cause behavioral changes in positive and beneficial directions. Measuring defect potentials and defect removal efficiency levels has been noted to make very beneficial improvements in software development practices.

When these measures were introduced into large corporations such as IBM and ITT, in less than four years the volumes of delivered defects had declined by more than 50 percent, maintenance costs were reduced by more than 40 percent, and development schedules were shortened by more than 15 percent. There are no other measurements that can yield such positive benefits in such a short time span. Both customer satisfaction and employee morale improved, too, as a direct result of the reduction in defect potentials and the increase in defect removal efficiency levels.

Reference
1. Jones, Capers. Estimating Software Costs. 2nd edition. McGraw-Hill, New York: 2007.

Notes
1. The averages for defect potentials are derived from studies of about 600 companies and 13,000 projects. Non-disclosure agreements prevent the identification of most companies. However, some companies such as IBM and ITT have provided data on defect potentials and removal efficiency levels.
2. The normal period for measuring defect removal efficiency starts with requirements inspections and ends 90 days after delivery of the software to its users or customers. Of course, there

are still latent defects in the software that will not be found in 90 days, but having a 90-day interval provides a standard benchmark for defect removal efficiency. It might be thought that extending the period from 90 days to six months or 12 months would provide more accurate results; however, updates and new releases usually come out after 90 days, so these would dilute the original defect counts. Latent defects found after the 90-day period can exist for years, but on average about 50 percent of residual latent defects are found each year. The results vary with number of users of the applications. The more users, the faster residual latent defects are discovered.
3. Formal design and code inspections are the most effective defect removal activity in the history of software, and are also very good in terms of defect prevention. Once participants in inspections observe various kinds of defects in the materials being inspected, they tend to avoid those defects in their own work. All software projects larger than 1,000 function points should use formal design and code inspections.

Additional Reading
1. Boehm, Barry W. Software Engineering Economics. Prentice Hall, Englewood Cliffs, NJ: 1981.
2. Crosby, Philip B. Quality Is Free. New American Library, Mentor Books. New York: 1979.
3. Garmus, David, and David Herron. Function Point Analysis. Addison Wesley Longman, Boston: 2001.
4. Garmus, David, and David Herron. Measuring the Software Process: A Practical Guide to Functional Measurement. Prentice Hall, Englewood Cliffs, NJ: 1995.
5. Grady, Robert B., and Deborah L. Caswell. Software Metrics: Establishing a Company-Wide Program. Prentice-Hall: 1987.
6. International Function Point Users Group. IT Measurement. Addison Wesley Longman, Boston: 2002.
7. Jones, Capers. Applied Software Measurement. 3rd edition; McGraw-Hill, New York: 2008.
8. Jones, Capers. Sizing Up Software. Scientific American New York: Dec. 1998.
9. Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Addison Wesley Longman. Boston: 2000.
10. Jones, Capers. Conflict and Litigation Between Software Clients and Developers. Software Productivity Research, Burlington, MA: 2003.
11. Kan, Stephen H. Metrics and Models in Software Quality Engineering. 2nd edition. Addison Wesley Longman, Boston: 2003.

About the Author
Capers Jones is currently the president of Capers Jones and Associates, LLC. He is also the founder and former chairman of Software Productivity Research (SPR) where he holds the title of Chief Scientist Emeritus. He is a well-known author and international public speaker, and has authored the books Patterns of Software Systems Failure and Success, Applied Software Measurement, Software Quality: Analysis and Guidelines for Success, Estimating Software Costs, and Software Assessments, Benchmarks, and Best Practices. Jones and his colleagues from SPR have collected historical data from more than 600 corporations and more than 30 government organizations. This historical data is a key resource for judging the effectiveness of software process improvement methods.

Software Productivity Research, LLC
Phone: (877) 570-5459
Fax: (877) 570-5459
E-mail: cjonesiii@cs.com
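Note 2 of the article sets the measurement window for defect removal efficiency at 90 days after delivery and estimates that about 50 percent of the residual latent defects are found in each later year. The arithmetic can be sketched in Python as follows. This is only an illustration: it assumes the common definition of removal efficiency (defects removed before delivery divided by those plus the defects reported in the first 90 days), and the function names and sample counts are hypothetical rather than data from the article.

```python
def defect_removal_efficiency(found_before_delivery, found_in_first_90_days):
    """Percentage of recorded defects that were removed before delivery.

    Assumed definition: pre-delivery defects divided by pre-delivery defects
    plus defects reported by users in the 90-day window.
    """
    total = found_before_delivery + found_in_first_90_days
    if total == 0:
        raise ValueError("no defects recorded, efficiency is undefined")
    return 100.0 * found_before_delivery / total


def latent_defects_remaining(latent_at_day_90, years, yearly_find_rate=0.5):
    """Latent defects still undiscovered after `years`.

    Assumes a fixed fraction of the remainder is found each year
    (about 50 percent, per the note); real rates vary with the number of users.
    """
    return latent_at_day_90 * (1.0 - yearly_find_rate) ** years


# Hypothetical project: 950 defects removed before delivery, 50 reported in 90 days.
print(defect_removal_efficiency(950, 50))
# Hypothetical 40 latent defects at day 90, tracked over the following three years.
print([latent_defects_remaining(40, year) for year in range(4)])
```

With these made-up counts the project sits at the 95 percent level the article recommends, and the latent defect count halves each year under the stated assumption.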
CALL FOR ARTICLES
If your experience or research has produced information that could be useful to others, CrossTalk can get the word out. We are specifically looking for articles on software-related topics to supplement upcoming theme issues. Below is the submittal schedule for three areas of emphasis we are looking for:

Data and Data Management
December 2008
Submission Deadline: July 18, 2008

Engineering for Production
January 2009
Submission Deadline: August 15, 2008

Software Measurement
February 2009
Submission Deadline: September 12, 2008

Please follow the Author Guidelines for CrossTalk, available on the Internet at <www.stsc.hill.af.mil/crosstalk>. We accept article submissions on software-related topics at any time, along with Letters to the Editor and BackTalk. We also provide a link to each monthly theme, giving greater detail on the types of articles we're looking for at <www.stsc.hill.af.mil/crosstalk/theme.html>.

Quality Processes Yield Quality Products
Thomas D. Neff
MTC Technologies
Would your company like to save $100,000 per day? Would you like to speed up an urgent project's delivery by 50 percent and deliver zero errors? Software organizations have done just that. In this article, I list small steps you can take that will lead your company toward similar results, based on my 15 years of process improvement experience.
If your company developed software that ran tools capable of propelling big objects long distances, measured accuracy in miles, and increased its accuracy to inches, you might save your customer millions of dollars. This actually happened [1]. If your company refined its software development processes so that your unit testing department found zero errors in a three-year period, you might eliminate unit testing and move those testers into other types of testing, saving many dollars with every release. This also happened [2]. If your customer asked you to speed up your next software delivery by 50 percent and guarantee no flaws in the delivered product, could you do it without incurring any extra costs or sacrificing other projects? One company did [3].

You may figure those goals are impossible for your organization to achieve or you do not have enough money to make it happen. If so, you are wrong. Right now open a Web browser, type <www.sei.cmu.edu>, and hit enter. If your organization does software development, search for Capability Maturity Model Integration for Development (CMMI-DEV). If you are an acquisition organization, search for CMMI-Acquisition (ACQ). All organizations should check out People Capability Maturity Model (P-CMM).

These models are all instantiations of Total Quality Management (TQM), the method that turned low-quality Japanese trinkets into high-quality automobiles, electronics, cameras, and many other products [4]. Because these models are different views of the same paradigm, you can also use them in other areas. One company used a predecessor of the CMMI-DEV and achieved the model's highest level of process maturity in software development. That company's hardware people realized the software folks really had their act together. They got jealous and sought the software secret. When they were shown the CMM, they said, "We could use that if we just change a few of the terms. Instead of talking about managing software requirements, we would manage hardware requirements." Before long, that entire group was achieving record low manufacturing defects, record high profits, record high customer and employee satisfaction, record low employee turnover, and many more positive effects [3].

If the rewards from doing this are so great, why do so few companies achieve CMMI Level 5, the highest level of process maturity? I contend it is because they do not execute their continuous process improvement (CPI) effort properly. There are many ways to do it right, but even more ways to do it wrong. If you would like to help ensure success in your CPI effort, read this article and get started. Before long you could very well be producing (or acquiring) software of exceptional quality, precisely meeting customer requirements, and incurring minimal maintenance costs.

Based on 15 years of CPI experience, here are some items you might consider when starting or reinvigorating your CPI effort. While they are no guarantee that you will reach the CMMI pinnacle, they can help you avoid pitfalls that snag many such efforts. (Throughout this list, "we" and "our" refer to the Nuclear Weapons Effects Division Process Improvement Team [PIT]):

Do not try to inspect in quality. All too often, people believe they can have ad-hoc development processes, then use an inspection process at the end and effectively remove all defects, yielding a quality product. It just will not happen. My experience shows that only a small portion of defects are actually removed if the attempt is only at the end. Inspections in every phase of the process are good, just do not wait to the end and then do a lone inspection! Industry statistics indicate that for every four errors pulled out, one new error is injected. Hence, you must iterate many times to approach zero. Large expense, little return: not a good business decision.

Do not look for a quick fix. I have learned to fear when a senior manager goes to a conference where process improvement is discussed. Inevitably they return with the latest fad and want to implement it by week's end. It takes between two and three years to get CPI institutionalized. Your processes did not get screwed up overnight; they will not get fixed overnight, either.

Hold people accountable. This is the biggest key to any CPI effort. If you create a meager CPI plan complete with a feedback loop for improvements, then hold people accountable to following it, you will make great progress in relatively little time. I have experienced both sides of this and can vouch that not holding them accountable will guarantee failure, and always holding them accountable is more likely to guarantee success. However, you cannot hold them accountable for six months and then give up because it is not working. Refer to the second point above.

Do not aim for a certain level of improvement. Never state, "We want to achieve CMMI Level 3 by ___ date." What matters are the qualities exhibited, not the score obtained. Your primary emphasis must be to institutionalize CPI. Once that is accomplished, the rest will fall into place. If your aim is Level 3, once you reach it, you will not have any objective left and you will begin backsliding. However, if you emphasize CPI, once you reach Level 5, you will be thinking about what Level 6 (if there was one) would look like or you will seek other company areas that could benefit from your CPI attitude. Levels are just indicators of your progress.

Do not follow the CMMIs in the order they are written. They are written so that one size fits all. As you and I know, even though one size fits all, it rarely looks good. You are much better off finding those areas of the CMMI currently giving you the most headaches and work on those first. If that does not work for you or you have many headaches, take a new project

and use it to pilot the CMMI. As you work through that project, write the necessary standard operating instructions (SOI)/standard operating procedures (SOPs), as identified by the CMMI, and test them with that project. Once they are acceptable, publish them as an example of how your organization does business. Of course, these are living documents and as you mature, your processes must evolve with you.

Perfect is the enemy of good enough. If you are looking to produce perfect processes, you will never get there. Aim for the 80 percent solution. While that might seem pretty low, remember that each process has a feedback loop whereby improvements can easily and frequently be made. I know of no one who has ever gone to work thinking, "I want to do worse today than yesterday." Most employees want to do a better job. The problem is that they do not always know how, but processes can give them a framework. Their experience and intuition will help fill in the details on how to improve.

Jealousy is a great thing. Do not let the lack of senior management support stop you. All you need is any manager to support CPI to get it going. Once you are making progress, others will see something is different with your manager: projects are being produced on time, on budget, and/or with greater quality. The other managers will become jealous and want to achieve the same success.

Do not try to end world hunger. Aim low and reach your target. If you try to fix your whole company, you will likely spend most of your time negotiating, selling, and/or compromising. It is better to fix your little niche and make others jealous (see above). Allow them to modify your processes to fit their needs. They will already have incentive to ensure they are successful (jealousy still reigns). If they fail, they will come back since they will become jealous over something else you are doing better than them (and reaping tangible rewards such as a bigger budget or additional people). They might even stumble onto a better process than yours: great! Ask to use it and make it work for you. Now you have a strong ally.

Two heads are better than one. Create a PIT encompassing each work unit in your area. As a PIT leader, you do not have the corner on good ideas. The more people you include (up to seven, plus or minus two), the better. However, you do not want just anyone. You want people who share your enthusiasm for process improvement and who see the big picture. If you have the wrong team members, it can be detrimental because you will spend 80 percent of your time educating 20 percent of them.

Every team needs cheerleaders. If you document/improve all processes but no one knows about them, you have accomplished nothing. Find the opinion leaders in each work area and get them involved. If they are not on the PIT, try to include them on the occasional work group or have them write an article for the PIT newsletter. That newsletter can be another good cheerleader. It is your first opportunity to provide training snippets on new processes as well as keeping everyone informed of your current CPI status.

Fail and get over it. As humans we are imperfect. Do not worry about failure. The only failure is one where you learn nothing. If you are unsuccessful and learn from it then it was a great learning experience, not a failure because you now know at least one way not to do it.

Start with the obvious. The CMMI-ACQ has a lot of information about what your acquisition program should contain. It does not tell you how to do it but there is plenty of what. Do not wait until you have the how to get started. Take the what (i.e., CMMI) and turn it into a policy statement (SOI). Then, when it is time to create the how (SOP), people will know which how to develop. You will have already added some structure to your process improvement effort.

Keep focused. When writing an SOI, do not delve into how people should do something. You want to focus on what they are to do and, on occasion, why. You can even describe a little of who or when. Once an SOP is created then it is time to describe how to do the job. These SOIs/SOPs should not be written for a three-year-old, but they also should not be written for a brain surgeon (unless it is an SOP on brain surgery). You should rarely include any why material in an SOP. If the worker does not know why they are doing their job, they have bigger issues. SOPs should be written by those already doing the job.

We don't need no stinkin' tools. Just as everyone wants instant gratification, we also want a super tool to make our jobs easier, thus solving all of our problems. Come back to reality. That tool does not exist. I have found that if you buy a tool to solve your problems, you are more likely to get a failed CPI effort and be poorer to boot. Because you do not have a repeatable process, the tool only lets you make mistakes faster, easier, and with greater impact. Of course, this frustrates people and they will quit using the tool. They will not realize it was the lack of process that caused the problems, not the tool. First, create a process and then introduce a tool to help people perform the process faster, cheaper, and better.

Sometimes status quo is good. Remember, people want to do their jobs better; they just do not want to change to do that. The mere act of documenting your current (probably flawed) processes is a huge improvement over undocumented processes. At least now you could repeat the process twice in a row. It is better to get the early buy-in than try to perfect the process too quickly. There will be plenty of opportunity to improve the process as people use it.

Sometimes status quo is bad. Hopefully you will never hear "we do it this way because it is how we have always done it." However, someone is thinking

it. My experience is that if people do not know why they are doing something, they are also ripe for the suggestion that there might be a better way, especially if it means less work. Many of the "always done it this way" processes can be reduced in effort by 50 percent or more. Often, some work products are used by no one. If a product has no customer (user), eliminate it. You will earn many new friends. If there was a hidden customer, they will eventually figure out something changed and come to you to explain why they need what you eliminated. Then you will know why it is needed.

Keep it short. We keep SOIs to no more than three pages. Most are one to one-and-a-half pages, with the shortest being two paragraphs. SOPs are longer, but we still try to keep them to about four pages. If attachments are added, we do not count those against the four-page goal. A short document will get read, but a long one will not. Our plan is to write 100 short documents instead of one all-encompassing volume.

Do not get hung up on training. Some people feel they need training on everything. At some level, I agree. However, it is just as bad to do too much training as not enough. No one needs training on our SOIs. Even most SOPs are written so that anyone sufficiently educated could pick up an SOP and determine how to perform its task. Use screen captures, pictures, and flowcharts; some people like words, some need pictures. Cater to both but keep it short, and provide training as needed or requested.

A hyperlink is your friend. Ample hyperlinking avoids redundancy and inaccuracy. For instance, we have an SOI describing acronyms and definitions. All acronyms and definitions used in our SOIs/SOPs are included here. We then name the definition as a bookmark and hyperlink upon its first use in each document. That way we ensure the proper definition is used and we do not have to spell it out, which keeps our documents shorter.

Procrastination is your enemy. There is no bad place to start a CPI effort except to not start at all. I do not know how many times I have been asked how to start a CPI effort. I answer, "It doesn't matter. Start where you feel most comfortable, with what causes the most headaches, with what will give the best return on investment, or you can use any other criteria." As Nike said, just do it.

I think I can, I think I can. The Little Engine That Could ran uphill for a long time. It was about out of steam when it crested the hill and things became easier. So it is with CPI. You will face an uphill battle for at least six months and probably more. However, at some point (that point will be different for each organization) you will crest the hill and gain momentum. At that point, no one can stop your CPI effort. It will be institutionalized and no longer dependent on individuals, becoming an integral part of your organization's business practices. As long as you have steam, you must keep chugging uphill. Set your sights just over the crest and you will get there.

This is not three-card monte. Pick a model, any model. There are many process models from which to choose (CMM, CMMI, International Organization for Standardization [ISO] 9000, TQM, Lean, Six Sigma, Lean Six Sigma, etc.). Which should you use? When you are getting started, it does not matter. Just pick one and go. Any improvement is better than none. You may even choose bits and pieces of several models. Having said that, I believe the CMM and CMMI models are the most comprehensive and take you farther than the others. For instance, ISO 9000 takes you to about a CMMI Level 2. The Lean and Six Sigma models require you to document your process first, so you can determine just how much it has improved. Since most organizations just starting CPI do not have documented processes, it seems the CMM/CMMI might be best for starting because they provide guidance on what should be in your key processes. As your processes mature, you will likely incorporate other models into your CPI effort to speed your progress or improve the quality. Use whatever works for you.

Do not reinvent the wheel. Reuse is your friend. Build on others' successes. Learn from them. Never embrace "not invented here" syndrome. The Software Engineering Institute already developed all the tools you need to start making significant leaps in the quality of your processes. Their CMM and CMMI models describe every characteristic your organization should exhibit at various levels of process maturity. Use the models; they work. They will give you quality processes leading to quality products.

From what I have seen, most failed CPI efforts could not figure out where to begin, lost steam before starting, could not get any management support (usually tried at too high a level), focused too much on tools versus processes, could not find a quick fix and quit, or tried to solve world hunger and gave up.

Based on my 15 years in process improvement, I suspect that if you follow these suggestions, sticking with it at least two years, you will be successful in your CPI effort.

If you have CPI lessons learned, I would enjoy hearing them.

References
1. Yamamura, George, and Gary B. Wigle. "SEI CMM Level 5: For the Right Reasons," CrossTalk Aug. 1997 <www.stsc.hill.af.mil/crosstalk/frames.asp?uri=1997/08/seicmm5.asp>.
2. Billings, C., J. Clifton, B. Kolkhorst, E. Lee, and W.B. Wingert. "Journey to a Mature Software Process." IBM Systems Journal. Vol. 33, No. 1. 1994: pp. 46-61.
3. Vu, John D. Presentation to CIO's Office. National Reconnaissance Office. Chantilly, VA., Mar 2001.
4. Deming, W. Edwards. Out of the Crisis. MIT Press. 1986.

About the Author
Thomas D. Neff, Lt Col, U.S. Air Force (Ret.) spent most of his Air Force career in software development and project management. Currently, he works for MTC Technologies supporting the Defense Threat Reduction Agency's Nuclear Weapons Effects Division as a process manager, and uses the CMMI-ACQ and P-CMM as guides for that effort. Neff is a frequent speaker on process improvement at information technology conferences. He has a Master of Computer Science from Texas A&M, which helped steer him toward process improvement.

DTRA/RD-NTE
8725 JJ Kingman RD
Ft Belvoir, VA 22060-6201
Phone: (703) 767-4106
Fax: (703) 767-9844
E-mail: thomas.neff_contractor@dtra.mil
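The "Do not try to inspect in quality" item in this article cites an industry statistic that one new error is injected for every four pulled out, so many iterations are needed to approach zero. The short Python sketch below works through that arithmetic. It rests on a simplifying assumption that is not in the article: each pass removes every known error, and the only errors left are the ones injected by those fixes. The function name and the starting count of 1,000 are hypothetical.

```python
def passes_until_clean(initial_errors, injected_per_removed=0.25, threshold=1.0):
    """Count fix passes until fewer than `threshold` errors are expected to remain.

    Assumes each pass removes all known errors and injects
    `injected_per_removed` new errors per error removed (one per four here).
    """
    if not 0.0 <= injected_per_removed < 1.0:
        raise ValueError("injection rate must be in [0, 1) for the count to shrink")
    remaining = float(initial_errors)
    passes = 0
    while remaining >= threshold:
        remaining *= injected_per_removed  # errors injected while fixing this batch
        passes += 1
    return passes, remaining


# Hypothetical end-of-project inspection that starts with 1,000 errors.
print(passes_until_clean(1000))
```

Even under this optimistic model, a lone end-of-project inspection would need several full fix-and-reinspect cycles, which is the expense the article warns about.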

Departments
Urgent Reader Request!

Have we helped? How?
CrossTalk has been there for you for almost twenty years, and now we are asking that you be there for CrossTalk. As a free publication, your comments are the lifeblood of our existence. Has the information provided in our publication ever helped you save time or money? Have you benefitted in other ways? If so, we want to hear about it.

Our goal has always been to inform and educate you, our readers, on software engineering best practices, processes, policies, and other technologies. If we have succeeded in this goal, let us know how, where, when, and why. Your comments will help CrossTalk continue to bring you the news and information you've come to expect.

Send your stories of success to crosstalk.publisher@hill.af.mil, or go to www.stsc.hill.af.mil/crosstalk. We will feature your comments in our 20th anniversary issue this August.

Thank You!

Articles ... and metrics helped me save $33.5 million on government programs.

Quoting it all the time to substantiate project plans and estimates.

... Between CrossTalk and our tech advisor, we shipped 580K + SLOC with more functionality than originally planned.

Share Your Results!

Software Engineering Technology
The Use and Limitations of Static-Analysis Tools to Improve Software Quality
Dr. Paul Anderson
GrammaTech, Inc.
Advanced static-analysis tools have been found to be effective at finding defects that jeopardize system safety and security. This
article describes how these work and outlines their limitations. They are best used in combination with traditional dynamic
testing techniques, and can even reduce the cost to create and manage test cases for stringent run-time coverage.
Static analysis has commonly been known as a technique for finding violations of superficial stylistic programming rules, and for alerting programmers to typing discrepancies in type-unsafe languages. The latest static-analysis tools go far beyond this, and are capable of finding serious errors in programs such as null-pointer de-references, buffer overruns, race conditions, resource leaks, and other errors. They can do so without requiring additional input from the users, and without requiring changes to development processes or practices. Actionable results are produced quickly with a low level of false positives. These static-analysis tools are not a silver bullet, however, because they can never prove that a program is completely free of flaws. The following is a description of how static-analysis tools work, followed by a discussion of how they can be used to complement traditional testing.

How Static Analysis Finds Flaws
The first thing a static analysis tool must do is identify the code to be analyzed. The source files that must be compiled to create a program may be scattered across many directories, and may be mixed in with other source code that is not used for that program. Static analysis tools operate much like compilers so they must be able to identify exactly which source files contribute and should ignore those that do not. The scripts or build system that builds the executable obviously know which files to use, so the best static analysis tools can extract this information by reading those scripts directly or by observing the build system in action. This way the tool gets to see not only the source files but also which compiler is being used and any command-line flags that were passed in. The parser that the static analysis tool uses must interpret the source code in the same way that the real compiler does. It does this by modeling how the real compiler works as closely as possible. The command-line flags are an essential input to that.

As the build system progresses, each invocation of the compiler is used to create a whole program model of the program. This model consists of a set of abstract representations of the source, and is similar to what a compiler might generate as an intermediate representation. It includes the control-flow graph, the call graph, and information about symbols such as variables and type names.

Once the model has been created, the analysis performs a symbolic execution on it. This can be thought of as a simulation of a real execution. Whereas a real execution would use concrete values in variables, the symbolic execution uses abstract values instead. This execution explores paths and, as it proceeds, if any anomalies are observed, they are reported as warnings. This approach is based on abstract interpretation [1] and model checking [2].

The analysis is path-sensitive, which means that it can compute properties of individual paths through the program. This is important because it means that when a warning is reported, the tool can tell the user the path along which execution must proceed in order for the flaw to be manifest. Tools also usually indicate the points along that path where relevant transformations occur and conditions on the data values that must hold. These help users understand the result and how to correct the problem should it be confirmed.

Once a set of warnings have been issued, these tools offer features to help the user manage the results, including allowing the user to manually label individual warnings. Warnings that correspond to real flaws can be labeled as true positives. Warnings that are false alarms can be labeled as false positives. Warnings that are technically true positives but which are benign can be labeled as "don't care." Most tools offer features that allow the user to suppress reporting of such warnings in subsequent analyses.

Limitations of Static Analysis
In order to understand the limitations of the techniques that these tools use, it is important to understand the metrics used to assess their performance. The first metric, recall, is a measure of the ability of the tool to find real problems. Recall is measured as the number of flaws found divided by all flaws present. The second metric is precision, which measures the ability of the tool to exclude false positives. It is the ratio of true positives to all warnings reported. The third metric is performance. Although not formally defined, this is a measure of the computing resources needed to generate the results.

These three metrics usually operate in opposition to each other. It is easy to create a tool that has perfect precision and excellent performance: one that

reports no lines contain flaws will satisfy because it reports no false positives. Similarly, it is easy to create a tool with perfect recall and excellent performance: one that reports that all lines have errors will answer because it reports no false negatives. Clearly, however, neither tool is of any use whatsoever.

Finally, it is at least theoretically possible to write an analyzer that would have excellent precision and excellent recall given enough time and access to enough processing power. Whether such a tool would be as useless as the previous two example tools is debatable and would depend on just how much time it would take. What is clear is that no such tools currently exist and to create them would be very difficult.

As a result, all tools occupy a middle ground around a sweet spot that developers find most useful. Developers expect analyses to complete in time roughly proportional to the size of their code base and within hours rather than days. Tools that take longer simply do not get used because they take too long. Low precision means more false positives, which has an insidious effect on users. As precision goes down, even true positive warnings are more likely to be erroneously judged as false positives because the users lose trust in the tool.

For most classes of flaws, precision less than 80 percent is unacceptable. For more serious flaws, however, precision as low as five percent may be acceptable if the code is to be deployed in very risky environments. It is difficult to quantify acceptable values for recall as it is impossible to measure accurately in practice, but clearly users would not bother using these tools at all if they did not find serious flaws that escape detection by other means.

practice, this is fewer because some branches are correlated, but the asymptotic behavior remains. If procedure calls and returns are taken into account, the number of paths is doubly exponential, and if loops are taken into account then the number of paths is unbounded. Clearly it is not possible for a tool to explore all of these paths. The tools restrict their exploration in two ways. First, loops are handled by exploring a small fixed number of iterations: often, the first time around the loop is singled out as special, and all other iterations are considered en masse and represented by an approximation. Second, not all paths are explored. It is typical for an analysis to place an upper bound on the number of paths explored in a particular procedure or on the amount of time available, and a selection of those remaining paths are explored.

the program. The appeal of the symbolic execution is that each abstract state represents potentially many possible concrete states. For example, given an 8-bit variable x, there are 2^8 possible concrete values: 0, 1, ..., 255. The symbolic execution, however, might represent the value as two abstract states: x=0, and x>0. So where a concrete execution has 256 states to explore, the symbolic execution has only two.

As such, the expressivity of this abstract domain is an important factor that determines the effectiveness of the analysis. Again, there is a trade-off here: better precision and recall can be achieved by more sophisticated abstract domains, but more resources will then be required to complete an analysis. Values in the abstract domain are equations that represent constraints on values, i.e., x=0, or y>10. As the analysis progresses, a constraint solver is used to combine and simplify these equations. A key characteristic of these abstract domains is that there is a special value, usually named bottom, which indicates that the analysis knows no useful information about the actual value. Bottom is the abstract value that corresponds to all possible concrete values. Reaching bottom is impossible to avoid for any non-trivial abstraction in general as this would require solving the halting problem. Once bottom is reached, the analysis has a choice of treating it as a potentially dangerous value, which would increase recall, or as a probably safe value, which would increase precision. Most tools opt for the latter as the former also has the effect of decreasing precision enormously.

If there are program constructs that step outside the bounds of what can be expressed in the abstract domain, this
Each of these constraints intro- If asynchronous paths can occur causes the analysis to lose track of vari-
duces its own set of limitations, howev- (such as those caused by interrupts or ables and their relationships. For exam-
er they are all interrelated. The reasons exceptions) or if the program uses con- ple, an abstract domain that allows the
that lead to low recall are explained in currency, then the number of possible expression of affine relationships
more detail in the following sections. paths to consider increases further. between no more than two variables
Many tools simply ignore these possi- admits expressions such as x=2y.
bilities. Finally, most tools also ignore However, something such as x=y+z is
As mentioned earlier, these analyses are recursive function calls, and function out of bounds because it involves three
Path Limitations
path sensitive. This improves both calls that are made through function variables and the analysis would be
recall and precision and is probably the pointers (or make very coarse approxi- forced to conclude x=bottom instead.
key aspect of these products that makes mations) as considering these also con- The consequence of this is the
them most useful. A full exploration of tributes to poor performance and poor abstract domain that a tool uses deter-
all paths through the program would be precision. mines a great deal about the kind of
very expensive. If there are n branch flaws that it is capable of detecting. For
points in a procedure, and there are no example, if the tool uses an abstract
loops in that procedure, then the num- As previously mentioned, these tools domain of affine relations between two
Abstract Domain
ber of intraprocedural paths through work by exploring paths and looking variables, then it may fail to find flaws
that procedure can be as many as 2 n. In for anomalies in the abstract state of that depend on three variables.
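The idea of an abstract domain can be illustrated with a minimal sketch. It is not how any particular product works: the names Abs, abstract, join, and warns_on_division are hypothetical, and the domain is deliberately tiny. It assumes an unsigned 8-bit variable x with the two abstract states x=0 and x>0, plus the special value that the text calls bottom.

```python
from enum import Enum


class Abs(Enum):
    """Abstract states for an unsigned 8-bit variable x (illustrative only)."""
    ZERO = "x=0"
    POSITIVE = "x>0"
    UNKNOWN = "bottom"  # the article's name for "no useful information"


def abstract(value: int) -> Abs:
    """Map one concrete value in 0..255 to its abstract state."""
    if not 0 <= value <= 255:
        raise ValueError("expected an unsigned 8-bit value")
    return Abs.ZERO if value == 0 else Abs.POSITIVE


def join(a: Abs, b: Abs) -> Abs:
    """Merge the states that reach one program point along two paths.

    This toy domain cannot express "x=0 or x>0", so differing states
    collapse to UNKNOWN, which stands for every concrete value.
    """
    return a if a is b else Abs.UNKNOWN


def warns_on_division(divisor: Abs, unknown_is_dangerous: bool = False) -> bool:
    """Decide whether to report a possible divide-by-zero.

    Treating UNKNOWN as dangerous favors recall; treating it as
    probably safe (the default here) favors precision.
    """
    if divisor is Abs.ZERO:
        return True
    if divisor is Abs.POSITIVE:
        return False
    return unknown_is_dangerous


abstract_states = {abstract(v) for v in range(256)}
print(len(range(256)), len(abstract_states))  # 256 2
```

With the default setting, a divisor whose state has collapsed to the unknown value produces no warning, which mirrors the choice the text attributes to most tools.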

Similarly, most tools choose a domain that allows them to reason about the values of integers and addresses but not floating-point values, so they will fail to find flaws in floating-point arithmetic (such as divide by zero).

Missing Source Code
If the source code to a part of a program is not available, as is almost always the case because of operating system and third-party libraries, or if the code is written in a language not recognized by the analysis tool, then the analysis must make some assumptions about how that missing code operates. Take, for example, a call to a function in a third-party library that takes a single pointer-typed parameter and returns an integer. In the absence of any other information, most analyses will assume that the function does nothing and returns an unknown value. This clearly is not realistic, but it is not practical to do better in general. The function may de-reference its pointer parameter, it may read or write any global variable that is in scope, it may return an integer from a particular range, or it may even abort execution. If the analysis knew this, it would have better precision and recall but it is forced to make the simple assumption unless told otherwise.

There are two approaches around this. First, if source is not available but object code is, then the analysis could be extended into the object code. This is a highly attractive solution but no products are available yet. The technological basis for such a tool exists, however [3], and it is expected that products capable of analyzing object code as well as C/C++ will appear.

A second approach to the problem is to specify stubs, or models, that summarize key aspects of the missing source code. The popular analysis tools provide models for commonly used libraries such as the C library. These models only have to approximate the behavior of the code. Users can, of course, write these themselves for their own libraries but it can be a tricky and time-consuming effort.

Out of Scope
There are, of course, entire classes of flaws that static analysis is unlikely ever to be able to detect. Static analysis excels at finding places where the fundamental rules of the language are being violated such as buffer overruns, or where commonly used libraries are being used incorrectly, or where there are inconsistencies in the code that indicate misunderstanding. If the code does the wrong thing for some other reason, but does not then terminate abnormally, then static analysis is unlikely to be able to help because it is unable to divine the intent of the author. For example, if a function is intended to sort in ascending order, but perfectly sorts in descending order instead, then static analysis will not help much. This kind of functionality testing is what traditional dynamic testing is good for.

Static Analysis and Testing
Static analysis should never be seriously considered as a replacement for traditional dynamic testing activities. Rather, it should be thought of as a way of amplifying the software assurance effort. The cheapest bug to find is the one that gets found earliest, and as static analysis can be used very early in the development cycle, its use can reduce the cost of development and liberate resources for use elsewhere. This is the traditional view of how static analysis can reduce testing costs. However, there is a second way in which the use of static analysis can reduce the cost of testing: it makes it easier to achieve full coverage.

One measure of the effectiveness of a test suite is how well it exercises or covers the code being tested. There are many different kinds of coverage. Statement coverage is the most common, but for riskier code more stringent forms are often required. Decision coverage is a superset of statement coverage, and requires that all branches in the control flow of the program are taken. In DO-178B, a development standard for flight software [4], the riskiest code is required to be tested with 100 percent modified condition/decision coverage (MCDC). This means that a test suite must be chosen such that all sub-expressions in all conditionals are evaluated to both true and false. Table 1 illustrates how many different test cases are needed for each to achieve coverage. For the code sample on the left, the values required of the boolean variables a, b, and c to achieve each form of coverage are shown on the right.

    Code sample          Coverage     a  b  c
    if (a || b || c)     Statement    T  -  -
        x = 0;           Decision     T  -  -
                                      F  F  F
                         MCDC         T  -  -
                                      F  T  -
                                      F  F  T
                                      F  F  F

Table 1: Test Cases Needed for Statement, Decision, and MCDC Coverage

Figure 1: A Redundant Condition Warning

    c:\CodeSonar\ex2.c
    Enter foo
                   5   void foo (int rest, int length)
                   6   {
                   7     if (rest <= 1)
                   8       buf[pos-1] = '>';
                   9     else if (rest == 2)
                  10       buf[pos++] = '>';
                  11     else if (length > rest)
    Always True:  12       if (--rest > 1) {   /* Redundant Condition (ID: 1) */
      rest > 1
                  13         if (rest >= 2)
                  14           rest --;
                  15       }

Figure 2: A Second Redundant Condition Warning

    Never True:             8   if (!flags & MASK)   /* Redundant Condition */
      ($temp2 & 16) != 0
                            9   {
                           10     error("Cannot sign packet");
                           11     return;
                           12   }

Achieving full coverage, even for statement coverage, can be very time consuming. The engineer creating the test case must figure out what inputs must be given to drive the program to each statement. What can make it very frustrating is if it is fundamentally impossible to do so, but this may not be
apparent simply by looking at the code. If the program contains unreachable code, then statement coverage is impossible. If it contains redundant conditions (those that are either always true or always false), then MCDC is impossible. Developers can spend hours trying to refine a test case before it is evident that their efforts are pointless.

If the unreachable code or redundant conditions can be brought to the attention of the tester early, then they do not need to waste time in a futile attempt to achieve the impossible. This is what static analysis can do easily and efficiently. Figure 1 shows an example of a report from CodeSonar (see Note 1) illustrating a redundant condition in a sample of code taken from an open-source application. The variable rest, an unaliased integer, must be at least three by line 12. The decrement on that line means it is at least two, so the condition will always be true. The following line is also redundant and shown in a different report.

In this example, all the components of the code relevant to the redundancy are in close proximity so it is likely that a reviewer would have spotted this during a manual review. It would not have been so easy to spot if the code were more complex. If the code had spanned several pages, or if relevant parts had been embedded in function calls or macro invocations, then it would have been difficult to spot. Static analysis is not sensitive to superficial aspects of the code such as its layout, so it would not have been confused.

These kinds of redundancies correlate well with genuine flaws as well; for example, consider the example in Figure 2. This was distilled from a genuine flaw found in a widely used open-source program, and is a redundant condition warning where the tool has deduced that the true branch of the conditional will never be taken. The reason why it concluded so is shown to the left. The first operand to the bitwise AND (the & symbol) is either zero or one as this is the range of the negation operator (the ! symbol). This is what is represented by $temp2. The constant MASK has the value 16. The results of the AND expressions 1&16 and 0&16 are both zero, so the conditional expression is guaranteed to be zero.

The programmer who wrote this code probably misunderstood the precedence of the operators in the conditional expression and assumed that the innermost operator had higher precedence. If so, then a correction would be to place parentheses around the inner expression. This is a potentially dangerous flaw as it means that the error condition would not be detected, which could result in unpredictable behavior.

When to Use Static Analysis Tools
The best time to use advanced static analysis tools is early in the development cycle. In Holzmann's 10 rules for safety-critical development [5], the most far-reaching rule states that these tools should be used throughout the development process. As well as reducing the cost of development by finding flaws earlier and reducing testing effort, early adoption exerts a force on programmers to write code that is more amenable to analysis, thereby increasing the probability that the tool will find errors. Care should be taken, however, to avoid a risk compensation phenomenon, where programmers use less care because they assume that the static analysis tool will find their mistakes.

If adopted late in the development cycle, static analysis may issue a large number of warnings. The best value is gained if these are all dealt with, either by fixing the code, marking them as false positives, or labeling them as "don't care" if they are believed to be benign. However, if scheduling time to sift through these is not feasible, then an alternative strategy is to operate in a differential mode, where programmers are only told about new warnings. This way they are alerted to flaws in code that they are working with while it remains fresh in their minds.

Conclusion
Advanced static analysis tools offer much to help improve the quality of software. The best tools are easy to integrate into the development cycle, and can yield high-quality results quickly without requiring additional engineering effort. They can be used not just for finding flaws, but also to guide testing activities. They use sophisticated symbolic execution techniques for which engineering trade-offs have been made so that they can generate useful results in a reasonable time. As such, they inevitably have both false positives and false negatives, and so should never be considered a replacement for traditional testing techniques.

References
1. Cousot, P., and R. Cousot. "Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints." ACM Symposium on Principles of Programming Languages. Los Angeles, CA, 1977.
2. Clarke, E.M., O. Grumberg, and D.A. Peled. Model Checking. MIT Press: Cambridge, MA, 1999.
3. Balakrishnan, G., R. Gruian, T. Reps, and T. Teitelbaum. "CodeSurfer/x86: A Platform for Analyzing x86 Executables." International Conference on Compiler Construction. 2005.
4. RTCA/DO-178B. "Software Considerations in Airborne Systems and Equipment Certification." 1992.
5. Holzmann, G.J. "The Power of 10: Rules for Developing Safety-Critical Code." IEEE Computer, 2006.

Note
1. GrammaTech's static analysis tool.

About the Author
Paul Anderson, Ph.D., is vice president of engineering at GrammaTech, a spin-off of Cornell University that specializes in static analysis, where he manages GrammaTech's engineering team and is the architect of the company's static analysis tools. He has worked in the software industry for 16 years, with most of his experience focused on developing static analysis, automated testing, and program transformation tools. A significant portion of Anderson's work has involved applying program analysis to improve security. His research on static analysis tools and techniques has been reported in numerous articles, journal publications, book chapters, and international conferences. Anderson has a B.Sc. from King's College, University of London, and his doctorate in computer science from City University, London.

GrammaTech, Inc.
317 N Aurora ST
Ithaca, NY 14850
Phone: (607) 273-7340
Fax: (607) 273-8752
E-mail: paul@grammatech.com
Automated Combinatorial Test Methods
Beyond Pairwise Testing

D. Richard Kuhn and Dr. Raghu Kacker
National Institute of Standards and Technology

Dr. Yu Lei
University of Texas, Arlington

Pairwise testing has become a popular approach to software quality assurance because it often provides effective error detection at low cost. However, pairwise (2-way) coverage is not sufficient for assurance of mission-critical software. Combinatorial testing beyond pairwise is rarely used because good algorithms have not been available for complex combinations such as 3-way, 4-way, or more. In addition, significantly more tests are required for combinations beyond pairwise testing, and testers must determine expected results for each set of inputs. This article introduces new tools for automating the production of complete test cases covering up to 6-way combinations.

Many testers are familiar with the most basic form of combinatorial testing, all pairs or pairwise testing, in which all possible pairs of parameter values are covered by at least one test [1, 2]. Pairwise testing uses specially constructed test sets that guarantee testing every parameter value interacting with every other parameter value at least once. For example, suppose we had an application that is intended to run on a variety of platforms comprised of five components: an operating system (Windows XP, Apple OS X, Red Hat Linux), a browser (Internet Explorer, Firefox), protocol stack (IPv4, IPv6), a processor (Intel, AMD), and a database (MySQL, Sybase, Oracle), a total of 3 x 2 x 2 x 2 x 3 = 72 possible platforms. With only 10 tests, as shown in Figure 1, it is possible to test every component interacting with every other component at least once, i.e., all possible pairs of platform components. The effectiveness of pairwise testing is based on the observation that software faults often involve interactions between parameters. While some bugs can be detected with a single parameter value, such as a divide-by-zero error, the toughest bugs often can only be detected when multiple conditions are true simultaneously. For example, a router may be observed to fail only for the User Datagram Protocol (UDP) when packet rate exceeds 1.3 million packets per second, a 2-way interaction between protocol type and packet rate. An even more difficult bug might be one which is detected only for UDP when packet volume exceeds 1.3 million packets per second and packet chaining is used, a 3-way interaction between protocol type, packet rate, and chaining option.

Unfortunately, only a handful of tools can generate more complex combinations, such as 3-way, 4-way, or more (we refer to the number of variables in combinations as the combinatorial interaction strength, or simply, interaction strength, e.g., a 4-way combination has 4 variables and thus its interaction strength is 4). The few tools that do generate tests with interaction strengths higher than 2-way may require several days to generate tests [3] because the generation process is mathematically complex. Pairwise testing, i.e., testing 2-way combinations, has come to be accepted as the standard approach to combinatorial testing because it is computationally tractable and can effectively detect many faults. For example, pairwise testing could detect 70 percent to more than 90 percent of software faults for the applications studied in [4].

But if pairwise testing can detect 90 percent of bugs, what interaction strength is needed to detect 100 percent? Surprisingly, we found no evidence that this question had been studied when the National Institute of Standards and Technology (NIST) began investigating software faults in 1996. Results showed that across a variety of domains, all failures could be triggered by a maximum of 4-way to 6-way interactions [5]. As shown in Figure 2, the detection rate increases rapidly with interaction strength. With the NASA application, for example, 67 percent of the failures were triggered by only a single parameter value, 93 percent by 2-way combinations, and 98 percent by 3-way combinations. The detection rate curves for the other applications are similar, reaching 100 percent detection with 4-way to 6-way interactions. That is, six or fewer variables were involved in all failures for the applications studied, so 6-way testing could, in theory, detect all of the failures. While not conclusive, these results suggest that combinatorial testing that exercises high strength interaction combinations can be an effective approach to high-integrity software assurance.

Applying combinatorial testing to real-world software presents a number of challenges. For one of the best algorithms, the number of tests needed for combinatorial coverage of n parameters with v values each is proportional to v^t log n, where t is the interaction strength [3]. Unit testing of a small module with 12 parameters required only a few dozen tests for 2-way combinations, but approximately 12,000 for 6-way combinations [6]. But a large number of test cases will not be a barrier if they can be produced with little human intervention, thus reducing cost. To apply combinatorial testing, it is necessary to find a set of test inputs that covers all t-way combinations of parameter values, and to match up each set of inputs with the expected output for these input values.

Figure 1: Pairwise Test Configurations

    Test   OS     Browser   Protocol   CPU     DBMS
    1      XP     IE        IPv4       Intel   MySQL
    2      XP     Firefox   IPv6       AMD     Sybase
    3      XP     IE        IPv6       Intel   Oracle
    4      OS X   Firefox   IPv4       AMD     MySQL
    5      OS X   IE        IPv4       Intel   Sybase
    6      OS X   Firefox   IPv4       Intel   Oracle
    7      RHL    IE        IPv6       AMD     MySQL
    8      RHL    Firefox   IPv4       Intel   Sybase
    9      RHL    Firefox   IPv4       AMD     Oracle
    10     OS X   Firefox   IPv6       AMD     Oracle
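The pairwise property claimed for the Figure 1 configurations can be checked mechanically. The following is a minimal sketch, not part of any tool described in this article: the function name uncovered_pairs and the data layout are assumptions made for illustration, and the parameter values are copied from the platform example and Figure 1.

```python
from itertools import combinations, product

# Parameter values from the platform example (abbreviated as in Figure 1).
PARAMETERS = {
    "OS": ["XP", "OS X", "RHL"],
    "Browser": ["IE", "Firefox"],
    "Protocol": ["IPv4", "IPv6"],
    "CPU": ["Intel", "AMD"],
    "DBMS": ["MySQL", "Sybase", "Oracle"],
}

# The 10 test configurations of Figure 1, in column order.
TESTS = [
    ("XP", "IE", "IPv4", "Intel", "MySQL"),
    ("XP", "Firefox", "IPv6", "AMD", "Sybase"),
    ("XP", "IE", "IPv6", "Intel", "Oracle"),
    ("OS X", "Firefox", "IPv4", "AMD", "MySQL"),
    ("OS X", "IE", "IPv4", "Intel", "Sybase"),
    ("OS X", "Firefox", "IPv4", "Intel", "Oracle"),
    ("RHL", "IE", "IPv6", "AMD", "MySQL"),
    ("RHL", "Firefox", "IPv4", "Intel", "Sybase"),
    ("RHL", "Firefox", "IPv4", "AMD", "Oracle"),
    ("OS X", "Firefox", "IPv6", "AMD", "Oracle"),
]


def uncovered_pairs(parameters, tests):
    """Return every pair of parameter values that no test exercises together."""
    names = list(parameters)
    missing = []
    for i, j in combinations(range(len(names)), 2):
        seen = {(test[i], test[j]) for test in tests}
        for pair in product(parameters[names[i]], parameters[names[j]]):
            if pair not in seen:
                missing.append((names[i], names[j], pair))
    return missing


# Exhaustive testing needs one test per element of the full cross product.
exhaustive = 1
for values in PARAMETERS.values():
    exhaustive *= len(values)

print(exhaustive, len(TESTS), uncovered_pairs(PARAMETERS, TESTS))  # 72 10 []
```

An empty list means every pair of component values appears in at least one of the 10 configurations, while exhaustive testing would require one test per platform.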
These are both difficult problems, but they can now be solved with new algorithms on currently available hardware. We explain these two steps followed by a small but complete illustrative example.

[Figure 2: Error Detection Rates for Interaction Strengths 1 to 6. Chart of cumulative percent (0 to 100) against interactions (1 to 6) for four applications: medical devices, browser, server, and NASA distributed database.]

Computing T-Way Combinations of Input Values Using FireEye
The first step in combinatorial testing is to find a set of tests that will cover all t-way combinations of parameter values for the desired combinatorial interaction strength t. This collection of tests is known as a covering array. The covering array specifies test data where each row of the array can be regarded as a set of parameter values for an individual test. Collectively, the rows of the array cover all t-way combinations of parameter values. An example is given in Figure 3, which shows a 3-way covering array for 10 variables with two values each. The interesting property of this array is that any three columns contain all eight possible values for three binary variables. For example, taking columns F, G, and H, we can see that all eight possible 3-way combinations (000, 001, 010, 011, 100, 101, 110, 111) occur somewhere in the rows of the three columns. In fact, this is true for any three columns. Collectively, therefore, this set of tests will exercise all 3-way combinations of input values in only 13 tests, as compared with 1,024 for exhaustive coverage. Similar arrays can be generated to cover up to all 6-way combinations. A non-commercial research tool called FireEye [3], developed by NIST and the University of Texas at Arlington (see Note 1), makes this possible with much greater efficiency than previous tools. For example, a commercial tool required 5,400 seconds to produce a less-optimal test set than FireEye generated in 4.2 seconds.

Matching Combinatorial Inputs With Expected Outputs Using Nu Symbolic Model Verifier (SMV)
The second step in combinatorial test development is to determine what output should be produced by the system under test for each set of input parameter values, often referred to as the oracle problem in testing. The conventional approach to this problem is human intervention to design tests and assign expected results or, in some cases, to use a reference implementation that is known to be correct (for example, in checking conformance of various vendor products to a protocol standard). Because combinatorial testing can require a large number of tests, an automated method is needed for determining the expected results for each set of input data. To solve this problem, we use the open-source NuSMV model checker [7] (an enhanced version of the well-known SMV model checker [7]). Conceptually, the model checker can be viewed as exploring all states of a system model to determine if a property claimed in a specification statement is true. What makes a model checker particularly valuable is that if the claim is false, the model checker not only reports this, but also provides a counterexample showing how the claim can be shown false. As will be seen in the illustrative example, this gives us the ability to match every set of input test data with the result that the system should produce for that input data. Figure 4 outlines the process.

The model checker thus automates the work that normally must be done by a human tester determining what the correct output should be for each set of input data. Other approaches to determining the correct output for each test can also be used. For example, in some cases we can run a model checker in simulation mode, producing expected results directly rather than through a counterexample, but the approach illustrated in this article is more general, and can be applied to non-deterministic systems or used with mutation-based methods in addition to combinatorial testing [8]. The method chosen for resolving the oracle problem depends on the problem at hand, but model checking can be effective in testing protocols, access control, or other applications where there is a state machine, unified modeling language state chart, or other formal model available.

Illustrative Example
Here we present a small example of an access control system. The rules of the system are a simplified multi-level security system, followed by a step-by-step construction of tests using an automated process. Each subject (user) has a clearance level u_l, and each file has a classification level f_l.

Figure 3: 3-way Covering Array for 10 Parameters With Two Values Each

    A B C D E F G H I J
    0 0 0 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1
    1 1 1 0 1 0 0 0 0 1
    1 0 1 1 0 1 0 1 0 0
    1 0 0 0 1 1 1 0 0 0
    0 1 1 0 0 1 0 0 1 0
    0 0 1 0 1 0 1 1 1 0
    1 1 0 1 0 0 1 0 1 0
    0 0 0 1 1 1 0 0 1 1
    0 0 1 1 0 0 1 0 0 1
    0 1 0 1 1 0 0 1 0 0
    1 0 0 0 0 0 0 1 1 1
    0 1 0 0 0 1 1 1 0 1

Levels are given as 0, 1, or 2, which could represent levels such as Confidential, Secret, and Top Secret. A user u can read a file f if u_l >= f_l (the "no read up" rule), or write to a file if f_l >= u_l (the "no write down" rule). Thus, a pseudo-code representation of the access control policy is:

    if u_l >= f_l & act = rd then GRANT;
    else if f_l >= u_l & act = wr then GRANT;
    else DENY;

Tests produced will check that these rules are correctly implemented in a system.

[Figure 4: Automated Combinatorial Test Construction. Diagram with boxes labeled input values, covering array generator, covering array, system model, model checker, counterexamples, post-processor, test cases, and system under test.]

System Model
This system is easily modeled in the language of the NuSMV model checker as a simple two-state finite state machine. Other tools could be used, but we illustrate the test production procedure using NuSMV because it is among the most widely used model checkers and is freely available. Our approach is to model the system as a simple state machine, then use NuSMV to evaluate the model and post-process the results into complete test cases.

Figure 5 shows the system model defined in SMV. The START state initializes the system (line 8), with the rule noted previously used to evaluate access as either GRANT or DENY (lines 9-13). For example, line 10 represents the first line of the pseudo-code example: in the current state (always START for this simple model), if u_l >= f_l then the next state is GRANT. Each line of the case statement is examined sequentially, as in a conventional programming language. Line 12 implements the "else DENY" rule, since the predicate 1 is always true. SPEC clauses given at the end of the model define statements that are to be proven or disproven by the model checker. The SPEC statements in Figure 5 duplicate the access control rules as temporal logic statements and are, thus, provable. In the following sections, we illustrate how to combine them with input data values to generate complete tests with expected results.

Figure 5: SMV Model of Access Control Rules

    1.  MODULE main
    2.  VAR
        --Input parameters
    3.  u_l: 0..2;       -- user level
    4.  f_l: 0..2;       -- file level
    5.  act: {rd, wr};   -- action
        --output parameter
    6.  access: {START_, GRANT,DENY};
    7.  ASSIGN
    8.  init(access) := START_;
        --if access is allowed under rules, then next state is GRANT
        --else next state is DENY
    9.  next(access) := case
    10.   u_l >= f_l & act = rd : GRANT;
    11.   f_l >= u_l & act = wr : GRANT;
    12.   1 : DENY;
    13. esac;
    14. next(u_l) := u_l;
    15. next(f_l) := f_l;
    16. next(act) := act;

    -- reflection of the assigns for access
    -- if user level is at or above file level then read is OK
    SPEC AG ((u_l >= f_l & act = rd ) -> AX (access = GRANT));
    -- if user level is at or below file level, then write is OK
    SPEC AG ((f_l >= u_l & act = wr ) -> AX (access = GRANT));
    -- if neither condition above is true, then DENY any action
    SPEC AG (!( (u_l >= f_l & act = rd ) | (f_l >= u_l & act = wr ))
      -> AX (access = DENY));

In SMV, specifications of the form AG (predicate 1) -> AX (predicate 2) indicate essentially that for all paths (the A in AG) for all states globally (the G), if predicate 1 holds then (->) for all paths, in the next state (the X in AX) predicate 2 will hold. SMV checks the properties in the SPEC statements and shows that they match the access control rules as implemented in the finite state machine, as expected. Once the model is correct and SPEC claims have been shown valid for the model, counterexamples can be produced that will be turned into test cases.

Generating Covering Array
We will compute covering arrays that give all t-way combinations, with degree of interaction coverage two for this example. If we had a larger number of parameters, we would produce test configurations that cover all 3-way, 4-way, etc., combinations. (With only three parameters, 3-way interaction would be equivalent to exhaustive testing, so we use 2-way combinations for illustration purposes.) The first step is to define the parameters (using the graphical
user interface if desired) and their values in a system definition file that will be used as input to the covering array generator FireEye with the following format:

u_l: 0,1,2
f_l: 0,1,2
act: rd, wr
Figure 6: Model Parameters and Values

After the system definition file is saved, we run FireEye, in this case specifying 2-way interactions. FireEye produces the output shown in Figure 7. Each test configuration defines a set of values for the input parameters u_l, f_l, and act. The complete test set ensures that all 2-way combinations of parameter values have been covered.

Test  u_l  f_l  act
1     0    0    rd
2     0    1    wr
3     0    2    rd
4     1    0    wr
5     1    1    rd
6     1    2    wr
7     2    0    rd
8     2    1    wr
9     2    2    wr
Figure 7: FireEye Output Test Values

Model Claims With Covering Array Values Inserted
The next step is to assign values from the covering array to parameters used in the model. For each test, we write a claim that the expected result will not occur. The model checker determines combinations that would disprove these claims, outputting these as counterexamples. Each counterexample can then be converted to a test with known expected result. For example, for Test 1 the parameter values are:

u_l = 0 & f_l = 0 & act = rd

For each of the nine configurations in the covering array (Figure 7), we create a SPEC claim of the form: SPEC AG( covering array values ) -> AX !(access = result). This process is repeated for each possible result, in this case either GRANT or DENY, so we have nine claims for each of the two results. The model checker is able to determine, using the model defined previously, which result is the correct one for each set of input values, producing a total of nine tests.

Excerpt:

SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = GRANT));
etc.

SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = DENY));
etc.

Generating Counterexamples With Model Checker
NuSMV produces counterexamples where the input values would disprove the claims specified in the previous section. Each of these counterexamples is, thus, a set of test data that would have the expected result of GRANT or DENY. For each SPEC claim, if this set of values cannot in fact lead to the particular result, the model checker indicates that this is true. For example, for the configuration below, the claim that access will not be granted is true, because the user's clearance level (u_l = 0) is below the file's level (f_l = 2):

-- specification AG (((u_l = 0 & f_l = 2) & act = rd)
-> AX !(access = GRANT)) is true

If the claim is false, the model checker indicates this and provides a trace of parameter input values and states that will prove it is false. In effect, this is a complete test case, i.e., a set of parameter values and an expected result. It is then simple to map these values into complete test cases in the syntax needed for the system under test. An excerpt from NuSMV output is shown in Figure 8.

-- specification AG (((u_l = 0 & f_l = 0) & act = rd)
-> AX !(access = GRANT)) is false
-- as demonstrated by the following execution sequence
Trace Description: CTL Counterexample
Trace Type: Counterexample
-> State: 1.1 <-
  u_l = 0
  f_l = 0
  act = rd
  access = START_
-> Input: 1.2 <-
-> State: 1.2 <-
  access = GRANT
etc.
Figure 8: Counterexamples (excerpt)

The model checker finds that six of the input parameter configurations produce a result of GRANT and three produce a DENY result, so at the completion of this step we have successfully matched up each input parameter configuration with the result that should be produced by the system under test.

At first, the method previously described may seem backward. Instead of negating each possible result, why not simply produce tests from model checker output such as specification AG (((u_l = 0 & f_l = 2) & act = rd) -> AX (access = DENY)) is true? Such a procedure would work fine for this simple example, but more sophisticated testing may require more information. Note that if the claim is true, the model checker simply reports the fact, while if it is false, a trace of inputs and internal states is produced to show how the claim fails. Some testing may require information on internal states or variable values, and the previous procedure provides this information.

Shell Script Post-Processing to Produce Complete Tests
The last step is to use a post-processing tool that reads the output of the model checker and generates a set of test inputs with expected results. The post-processor strips out the parameter names and values, giving tests that can be applied to the system under test. Simple scripts are then used to convert the test cases into input for a suitable test harness. The tests produced are shown in Figure 9.

Conclusion
While tests for this trivial example could easily have been constructed manually, the procedures introduced in this tutorial can be, and have been, used to produce tens of thousands of complete test cases in a few minutes once the SMV model
has been defined for the system under test. The methods in this article still require human intervention and engineering judgment to define a formal model of the system under test and for determining appropriate abstractions and equivalence classes for input parameters. But by automating test generation we can provide much more thorough testing than is possible with most conventional methods. In addition, the testing has a sound empirical basis in the observation that software failures have been shown to be caused by the interaction of relatively few variables. By testing all variable interactions to an appropriate strength, we can provide stronger assurance for critical software.

u_l = 0 & f_l = 0 & act = rd -> access = GRANT
u_l = 0 & f_l = 1 & act = wr -> access = GRANT
u_l = 1 & f_l = 1 & act = rd -> access = GRANT
u_l = 1 & f_l = 2 & act = wr -> access = GRANT
u_l = 2 & f_l = 0 & act = rd -> access = GRANT
u_l = 2 & f_l = 2 & act = rd -> access = GRANT
u_l = 0 & f_l = 2 & act = rd -> access = DENY
u_l = 1 & f_l = 0 & act = wr -> access = DENY
u_l = 2 & f_l = 1 & act = wr -> access = DENY
Figure 9: Test Cases

References
1. Daich, G.T. "New Spreadsheet Tool Helps Determine Minimal Set of Test Parameter Combinations." CrossTalk Aug. 2003.
2. Phadke, M.S. "Planning Efficient Software Tests." CrossTalk Oct. 1997.
3. Lei, Y., R. Kacker, D.R. Kuhn, V. Okun, and J. Lawrence. "IPOG/IPOG-D: Efficient Test Generation for Multi-Way Combinatorial Testing." Software Testing, Verification, and Reliability (to appear 2008).
4. Kuhn, D.R., D. Wallace, and A. Gallo. "Software Fault Interactions and Implications for Software Testing." IEEE Transactions on Software Engineering 30(6):418-421, 2004.
5. Wallace, D.R., and D.R. Kuhn. "Failure Modes in Medical Device Software: An Analysis of 15 Years of Recall Data." International Journal of Reliability, Quality and Safety Engineering 8(4):351-371, 2001.
6. Kuhn, D.R., and V. Okun. "Pseudo-Exhaustive Testing for Software." Proc. of 30th NASA/IEEE Software Engineering Workshop. Apr. 2006.
7. Cimatti, A., E.M. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. "NuSMV 2: An OpenSource Tool for Symbolic Model Checking." Proc. of International Conference on Computer-Aided Verification. Copenhagen, Denmark.
8. Ammann, P., and P.E. Black. "Abstracting Formal Specifications to Generate Software Tests via Model Checking." Proc. of 18th Digital Avionics Systems Conference. St. Louis, MO. Oct. 1999.

Notes
1. Available on <http://csrc.nist.gov/acts>.
2. The tool can be downloaded at <http://nusmv.irst.itc.it/>. More information on SMV can be found at <www.cs.cmu.edu/~modelcheck/>.
About the Authors

D. Richard Kuhn is a computer scientist in the computer security division of the National Institute of Standards and Technology (NIST). His primary technical interests are in information security, software assurance, and empirical studies of software failure. He co-developed the role based access control model (RBAC) used throughout industry, and led the effort to establish RBAC as an American National Standards Institute standard. Kuhn has a master's degree in computer science from the University of Maryland, College Park, and a bachelor's and master of business administration from William & Mary.

NIST
MS 8930
Gaithersburg, MD 20899-8930
Phone: (301) 975-3337
Fax: (301) 975-8387
E-mail: kuhn@nist.gov

Yu Lei, Ph.D., is an assistant professor of computer science at the University of Texas, Arlington. He was a member of the Fujitsu Network Communications, Inc., technical staff from 1998 to 2001. Lei's research is in the area of automated software analysis, testing, and verification. His current research is supported by NIST. Lei has a bachelor's degree from Wuhan University, a master's degree from Chinese Academy of Sciences, and a doctorate from North Carolina State University.

The University of Texas at Arlington
Department of Computer Science and Engineering
P.O. Box 19015
Arlington, TX 76019-0015
Phone: (817) 272-2341
Fax: (817) 272-3784
E-mail: ylei@cse.uta.edu

Raghu Kacker, Ph.D., is a mathematical statistician in the mathematical and computational sciences division of the NIST. His current interests include software testing, uncertainty in physical and virtual measurements, interlaboratory evaluations, and Bayesian uncertainty in measurement. Kacker received his doctorate in statistics from Iowa State University.

NIST
100 Bureau DR
MS 8910
Gaithersburg, MD 20899-8910
Phone: (301) 975-2109
Fax: (301) 975-3553
E-mail: raghu.kacker@nist.gov

Open Forum
Software Quality Unpeeled

Dr. Jeffrey Voas
SAIC
The expression software quality has many interpretations and meanings. In this article, I do not attempt to select any one
in particular, but instead help the reader see the underlying considerations that underscore software quality. Software quality
is a lot more than standards, metrics, models, testing, etc. This article digs into the mystique behind this elusive area.
The term software quality has been one of the most overused, misused, and overloaded terms in software engineering. It is a generic term that suggests quality software but lacks general consensus on meaning. Attempts have been made to define it. The Institute of Electrical and Electronics Engineers (IEEE) Standard 729 defines it as:

    totality of features of a software product that bears on its ability to satisfy given needs and composite characteristics of software that determine the degree to which the software in use will meet the expectations of the customer. [1]

However, this attempt and others are few, and not precise. In fact the second edition of the Encyclopedia of Software Engineering [2] does not have it listed as an entry; the encyclopedia skips straight from "Software Productivity Consortium" to "software reading." And worse, books with software quality in the title never define it in their pages [3, 4].

If you review the past 20 years or so, you will find an abundance of other terms that have been employed as pseudo-synonyms for software quality. Examples include process improvement, software testing, quality management, the International Organization for Standardization 9001, software metrics, software reliability, quality modeling, configuration management, Capability Maturity Model Integration, benchmarking, etc. In doing so, the term software quality has wound up representing a family of processes and ideas more than it represents good enough software. In short, software quality has become a culture and community more than a technical goal [5].

In this article, I will avoid the quicksand associated with trying to come up with a one-size-fits-all definition. Instead, I will expose how software quality is composed of various layers and how peeling off different layers allows us to have a rational discussion between a typical software supplier and end user such that an agreement can be reached as to whether or not the software is good enough.

Certification
We will begin dissecting software quality by first looking at the multiple viewpoints behind the term certification. This will provide us with a look into our first layer.

The term is often used to refer to certifying people skills. For example, the American Society for Quality (ASQ) has a host of certifications that individuals can attain in order to demonstrate competence in certain fields, e.g., they can become an ASQ Certified Software Quality Engineer. An individual can also become certified in specific commercial software packages, e.g., a Microsoft Certified Software Engineer.

For the purposes here, I employ a different perspective that comes from three schools of thought. The first school deals with certifying that a certain set of development, testing, or other processes applied during the pre-release phases of the life-cycle were satisfied. In doing so, you certify that the processes were followed and completed. (Demonstrating that they were applied correctly is a trickier issue.) In the second school, you certify that the developed software meets the functional requirements; this can be accomplished via various types of testing or other analyses. For the third school, you can certify that the software itself is fit for purpose. This third school will be the most useful, and throughout this article, software will be considered good enough if it is fit for purpose.

In this article, the term purpose suggests that two things are present: (1) executable software; and (2) an operating environment. An environment is a complex entity: It involves the set of inputs that the software will receive during execution, along with the probability that the events will occur [6]. This is referred to as the operational profile [6]. But it also involves the hardware that the software operates on, the operating system, available memory, disk space, drivers, and other background processes that are potentially competing for hardware resources, etc. These other factors are as much a part of the environment as are the traditional inputs; they have been termed invisible or phantom users.

In some instances phantom users more heavily determine whether the software is fit for purpose than the traditional inputs. In short, it is environment that gives fit for purpose context. By more completely defining and thus bounding the environment to include phantom users, we gain an advantage in that we can reduce the set of assumptions needed to predict whether the software is good enough. Understanding the distinction between traditional inputs and phantom users is one ingredient needed to argue that fit for purpose has been achieved.

Further, note that rarely will there be only one environment that software, and in particular general purpose software, will encounter during operation. That offers a key insight as to why general purpose software is not certified by independent laboratories; such laboratories could not be

omnipotent and could not know all of the potential target environments [7].

By revisiting the three schools of thought on certification, we discover eight ways to visualize software quality (see Table 1). Let us look at a couple.

Table 1: Views on Certification
Scenario  Meets Requirements  Satisfies Development Processes  Fit for Purpose
1         No                  No                               No
2         No                  No                               Yes
3         No                  Yes                              No
4         No                  Yes                              Yes
5         Yes                 No                               No
6         Yes                 No                               Yes
7         Yes                 Yes                              No
8         Yes                 Yes                              Yes

In Table 1, scenario 2 represents a system that did not meet the requirements and was not developed according to the specified development procedures, but miraculously, the end result was software that was usable in the field. While this seems implausible, it is possible. Scenario 7 is the opposite: a system that met the requirements and was developed according to specified development procedures but resulted in unusable software. Scenario 7 may seem like heresy to many in the community of software quality practitioners. It is not; it simply reflects the fact that requirements elicitation is far from a perfect science, and it dispels the myth that simply following common sense dos and don'ts (as spelled out in a development process plan) guarantees good enough software [8].

Note that only four of these scenarios yield good enough software: 2, 4, 6, and 8. The other four provide a product that is not usable for its target environment, and that brings us back to the discussion of why scoping the target environment as precisely as possible is an important piece of what software quality means.

In summary, fit for purpose is the nearest of the three certification schools of thought to the IEEE definition of software quality. However, we cannot rely only on knowing the environment and expect to be justified in proclaiming we have achieved software quality. Let us explore other considerations.

Three High-Level Attributes of Fit for Purpose
Most readers would probably be comfortable with labeling software as being of good quality if the software could ensure that (1) it produces accurate and reliable output, (2) it produces the needed output in a timely manner, and (3) it produces the output in a secure and private manner. These three criteria simply state that you get the right results at the right time at the correct level of security. These are the next three considerations that cannot be ignored and must be incorporated into what software quality means.

While each of these is intuitive, none is precise enough. The family of attributes referred to as the ilities is a good starting point to help increase that precision [9]. This family includes behavioral characteristics such as reliability, performance, safety, security, availability, fault-tolerance, etc. (These attributes are also sometimes termed non-functional requirements.) Other family members such as dependability, survivability, sustainability, testability, interoperability, and scalability each require some degree of one or more of the first six attributes. So, for example, to have a dependable system, some level of reliable and fault-tolerant behavior is necessary. To have a survivable system, some amount of fault tolerance and availability is required, and so on. However, to simplify this discussion towards our goal of understanding the term software quality, we will focus only on the first six: reliability, performance, safety, security, availability, and fault-tolerance.

Ility Oxymorons
Table 2 illustrates combinations of these six ilities. If we were to flesh this table out as we did in Table 1 we would have 64 rows; however, here we only show 14 combinations for brevity. (For the cells left empty, we are not considering the degree to which that attribute contributes to the quality of the software's behavior.)

Table 2: Ility Combinations
Category Reliability Performance Safety Security Availability Fault-tolerance
1 Yes Yes Yes Yes Yes Yes
2 Yes No
3 No Yes
4 No Yes
5 No Yes
6 No Yes
7 Yes No
8 Yes No
9 No Yes
10 Yes No
11 Yes No Yes
12 Yes No Yes
13 Yes Yes
14 No No No No No No

Let us look at a few of these categories and determine what they represent:

Category 1. Suggests that the software is reliable, has good performance, does not trigger unsafe events to occur (e.g., in a transportation control system), has appropriate levels of security built in, has good availability and thus does not suffer from frequent failures resulting in downtime, and is resilient to internal failures (i.e., fault tolerant). Is all of this possible in a single software system?

Category 2. Suggests that the software offers reliable behavior, but is likely to produce outputs that send the system it feeds into an unsafe mode. This would represent a safety-critical system where hazardous failures are unacceptable; hazardous failures are categorized differently for such systems than failures that do not facilitate possible disastrous loss-of-life or loss-of-property consequences. (Note that software by itself is never unsafe; however, software is often referred to as unsafe if it produces outputs to a system that put the system into an unsafe mode. Safety is a system property, not a software property.) A classic example of a reliable product that is unsafe is placing a functioning toaster into a bathtub of water with the cord still connected; the toaster is reliable, but it is not safe to go near.

Category 3. Suggests that the software behaves so unreliably when executed that it cannot put the system into an unsafe mode. An example here would be that the software gets hung up in a loop and the safety functionality is never invoked.

Category 8. Suggests safe but not secure software behavior. This is quite realistic for a safety-critical system with no security concerns. Note that the interesting aspect of this category is how safety and security are defined. Many people use these terms interchangeably, which is incorrect.

Category 11. Suggests that the software behaves reliably and has good availability, but lacks adequate security precautions. Many systems suffer from this problem.

Category 12. Suggests that the software behaves reliably, is extremely slow, but has adequate security. It makes one wonder if the system is so slow that it is effectively unusable, and thus secure, since it would take too long to break in.

Category 13. Suggests high levels of security and high levels of performance. In certain situations that is plausible; however, typically security kills performance and vice versa.

Category 14. This is the easiest combination to achieve. Anyone can build a useless system.

The main point here is that the aforementioned high-level attributes (1) produce accurate and reliable output, (2) produce the needed output in a timely manner, and (3) produce the output in a secure and private manner are actually composed of the lower-level ilities. Another important point not to overlook is the fact that some of the ilities are not compatible with one another. An example of this can easily be found using fault tolerance and testability. A final important point is that some combinations of the ilities are simply counterintuitive, such as a system that is safe but unreliable.

One last thing to note: It is vital to get solid definitions for the ilities and to know which ones are quantifiable. For example, reliability and performance are quantifiable; security and safety are not. This makes it far easier to make statements such as "we have very high reliability but an unknown level of security."

The Shall Nots
There is yet another layer in the notion of fit for use that deals with negative functional requirements. Think of a negative requirement as "the software shall not do X," as opposed to a functional requirement stating that "the software shall do X."

Negative requirements are far more difficult to elicit than regular requirements. Why? Because humans are not programmed to anticipate and enumerate all of the bad circumstances that can pop up and that we need protection against; we are instead programmed to think about the good things we want the software to do.

For certain types of systems, particularly safety-critical systems, enumerating negative requirements is a necessity. And for software requiring security capabilities, security rules and policies are its equivalent to negative requirements. For example, a negative security requirement could be that the software shall never open access to a particular channel unless it can be guaranteed that the information passing through the channel is moving between trusted entities. The difficulty in defining shall nots for security is that we cannot imagine all of the different forms of malicious attacks that are being invented on-the-fly, and if we cannot imagine those attacks, we likely will not prevent them.

Before leaving the topic of negative functional requirements, it is worth mentioning an interesting relationship between them and the environment. So far, we have only mentioned traditional inputs and phantom users as players in the environment. Traditional inputs are those that the software expects to receive during operation. But there are two other types of inputs worth mentioning: malicious illegal and non-malicious illegal. A malicious illegal input is one that someone deliberately feeds into the software to attack a system, and a non-malicious illegal input is simply an input that the system designers do not want the software to accept but that has no malicious intent. In both cases, filtering on either type of input can be useful to ensure that certain inputs do not become a part of the environment and in doing so ensure that negative functional requirements are enforced.

Time
The next layer in our quest for software quality is time. Software has fixed longevity; it can be expanded, as we learned from Y2K, but not indefinitely.

One of the easiest ways to explain why time fits here is to look at the situation where a software package operates correctly on Monday but does not operate correctly on Tuesday. Further, the software package was not modified between these days. (This is the classic problem that quickly carves down the number of freshman computer science majors.) Why has this problem occurred?

It all goes back to the importance of environment in the understanding of software quality. Earlier we defined the environment as inputs with probabilities of selection, hardware configurations, access to memory, operating systems, attached databases, and whether other background processes were over-indulging in resources, etc.

But what was not mentioned was calendar time. Environment is also a function of time. As time moves forward, other pieces of the environment change. And so while all effort and expense can be levied toward what we perceive is evidence supporting the claim that we have good enough software, we need to recognize that even if we do have good enough software, it may be only for a short window of time. Thus, software quality is time-dependent, a bitter pill to swallow.

Cost
We cannot end this article without mentioning cost. The costs associated with software quality are exacerbated by the un-family-like behaviors of various ilities. Not only are there technical trade-offs discovered when trying to increase the degree to which one ility exists only to find that another is automatically decreased, but there is the financial trade-off quagmire concerning how to allocate financial resources between distinct ilities. If you overspend on one, there may not be enough funds for another. And, as if the technical considerations are not hard enough when trying to define software quality, the financial considerations come aboard, making the problem worse.

Conclusion
In this article, a set of layers for what software quality means has been unpeeled. I have argued that a more useful perspective for what software quality represents starts from the notion of the software being fit for purpose, which requires:
1. Understanding the relationship between the functional requirements and the environment.
2. Understanding the three high-level attributes of software quality: the software (a) produces accurate and reliable output, (b) produces the needed output in a timely manner, and (c) produces the output in a secure and private manner.
3. Understanding that the ilities afford the potential to have varying degrees of the high-level attributes.
4. Understanding that the shall-not functional requirements are often of equal importance to the functional requirements.
5. Understanding that there is a temporal component to software quality; software quality is not static or stagnant.
6. Understanding that the ilities offer technical incompatibilities as well as financial incompatibilities.
7. Understanding that the environment contains many more parameters, such as the phantom users, than is typically considered.

Thus, software quality, when viewed with these different considerations, becomes a far more interesting topic, and one that will continue to perplex us for decades to come.

References
1. IEEE. Standard Glossary of Software Engineering Terminology IEEE. American National Standards Institute/IEEE Standard 729-1983: 1983.
2. Marciniak, J., ed. Encyclopedia of Software Engineering. Second Edition. Wiley Inter-Science: 2002.
3. Wieczorek, Martin, and Dirk Meyerhoff, editors. Software Quality: State of the Art in Management, Testing, and Tools. Springer. New York: 2001.
4. Gao, J.Z., H.S. Jacob Tsao, and Y. Wu. Testing and Quality Assurance for Component-Based Software. Artech House. Norwood, MA: 2003.
5. Whittaker, J., and J. Voas. "50 Years of Software: Key Principles for Quality." IEEE IT Professional. 4(6): 28-35, Nov. 2002.
6. Musa, John D., Anthony Iannino, and Kazuhira Okumoto. Software Reliability: Measurement, Prediction, Application. McGraw-Hill, NY, 1987.
7. Voas, J. "Software Certification Laboratories?" CrossTalk Apr. 1998.
8. Voas, J. "Can Clean Pipes Produce Dirty Water?" IEEE Software July 1997.
9. Voas, J. "Software's Secret Sauce: The Ilities." IEEE Software Nov. 2004.

About the Author

Jeffrey Voas, Ph.D., is currently director of systems assurance at Science Applications International Corporation (SAIC). He was the president of the IEEE Reliability Society from 2003-2005, and currently serves on the Board of Governors of the IEEE Computer Society. Voas has published numerous articles over the past 20 years, and is best known for his work in software testing, reliability, and metrics. He has a doctorate in computer science from the College of William & Mary.

SAIC
200 12th ST South
STE 1500
Arlington, VA 22202
Phone: (703) 414-3842
Fax: (703) 414-8250
E-mail: j.voas@ieee.org

BACKTALK
Forecasting the Future
Welcome to the June issue of CrossTalk, and I hope you didn't miss the Systems and Software Technology Conference (SSTC) in Las Vegas about a month ago. It was great! First of all, the location was superb: conference facilities, hotel, location. Come on, Barry Manilow and the Star Trek Experience? Geek heaven. The weather was fantastic (although a bit warm during the day), and having the monorail to travel to/from the Strip was convenient.

The exhibits were also good (as usual). There seemed to be quite a few of the process-oriented vendors this year, a good thing, if you ask me. And, let's face it, the giveaways and gadgets were spectacular. The food was also great. Personally, I think that the speakers were better than ever this year. It was good seeing a lot of old friends, and making new ones. The only drawback was that with all of the distractions, far too many of us stayed up late, and made the 2008 SSTC conference a conference to remember. What more could you ask for?

Except that I am writing this column in March, and the SSTC conference is still a month in the future. However, I am totally convinced that almost everything I wrote above will be true, and that after the conference ends, I will be able to argue that I was very successful in predicting the future. If only software were so easy.

The theme of this issue is Software Quality. I have a very unique definition of quality. In my mind, quality was defined by Simon and Garfunkel back in 1970 on their Bridge Over Troubled Water album (arguably the best piece of music ever released). There was a song entitled "Keep the Customer Satisfied." Awesome lyrics. And that is the key: keeping the customer satisfied.

So, how do I achieve this dubious thing called quality, which is hard to measure directly? Well, I am sure that this issue contains great articles about quality (since I'm writing in the future, I don't even know the article lineup yet!).

In my mind, quality needs to include customer satisfaction. Is the software going to be used in a passenger aircraft? Well then, as a potential customer, I am pretty darn hard to satisfy. The other day, I was flying across the little pond (returning from London across the Atlantic) and the in-seat entertainment system I was using crashed. I actually got a core dump error message and saw Linux rebooting. The lady sitting next to me had it happen to her, and her comment was, "I sure hope that the software that runs the aircraft works better." Well, having worked with several aircraft software developers, I can assure you that it does run much better. Is the software going to be used to simply rip a few of my old CDs to MP3s so that I can load them onto my latest gadget? Then I am willing to have it occasionally fail. I have a feeling that the latest SuperX MP3 Ripper program I downloaded free off the Web cost a lot less per line of code to develop than the software that will power the Joint Strike Fighter.

Which brings me back to forecasting the future.

Forecasting the future is not an exact science. A friend of mine who is a meteorologist says that an 85 percent to 90 percent success rate in intermediate range predictions (one to three days out) is great. Two weeks out? More general predictions (i.e., warming trend) give the forecaster some leeway. One day out? A high of 73 with afternoon showers, ending by 9 p.m. tonight.

The secret to quality is the same: if you really think you can set a schedule (such as 28 lines of code per programmer per day) that will allow your developers to achieve a quality target ("No more than two errors found per 500 function points during integration testing") that is also reasonably accurate one year out, well ... how did you enjoy SSTC 2009?

Even with lots of historic data for similar projects, each development effort is different. Weather forecasters have access to about a hundred years of hurricane data, but still cannot tell me one month out when and where (or even if) a hurricane is going to hit the United States.

Quality is fragile. It is hard to achieve and, once lost, it seems to be gone forever. You can't test quality back in; those who have tried know better. You have to plan aggressively for quality and you have to have a good process for it (I wasn't kidding earlier: the more process-oriented vendors at SSTC, the better it is for Department of Defense software in general).

Forecast the future as best you can. Revise your forecasts (and timelines) as you get closer to your goals. A good weather person has no problem saying, "Well, last week we said generally clear, but prepare for a heavy rain tomorrow." A good program manager might just have to say, "We had hoped to be in integration testing this week, but we need another month to complete inspections and peer reviews." Nobody wants to hear that their vacation and beach plans are going to be washed away, but it happens. Nobody wants to hear that we are having problems with code quality, but if you have a good forecast and update it as needed, maybe you won't.

Hope you enjoyed SSTC 2008, and see you at SSTC 2009. Trust me, it was great, also!

David A. Cook
The AEgis Technologies Group, Inc.
dcook@aegistg.com

Can You BackTalk?

Here is your chance to make your point, even if it is a bit tongue-in-cheek, without your boss censoring your writing. In addition to accepting articles that relate to software engineering for publication in CrossTalk, we also accept articles for the BackTalk column. BackTalk articles should provide a concise, clever, humorous, and insightful perspective on the software engineering profession or industry or a portion of it. Your BackTalk article should be entertaining and clever or original in concept, design, or delivery. The length should not exceed 750 words.

For a complete author's packet detailing how to submit your BackTalk article, visit our Web site at <www.stsc.hill.af.mil>.

CrossTalk / 517 SMXS/MXDEA
6022 Fir AVE
BLDG 1238
Hill AFB, UT 84056-5820

PRSRT STD
U.S. POSTAGE PAID
Albuquerque, NM
Permit 737

T here are two very important measure-