The Software Quality Challenge
As the software in today's systems grows larger, it also contains more defects that adversely affect safety, security, and reliability of the systems. This article explains why the common test-and-fix software quality strategy is no longer adequate, and offers some suggestions for improvement.
by Watts S. Humphrey
Measuring Defect Potentials and Defect Removal Efficiency
This article discusses two measures that have strong influences on the outcomes of software projects: defect potentials and defect removal efficiency, and relates the positive effects that can be achieved by increasing
by Capers Jones
CO-SPONSORS:
DOD-CIO The Honorable John Grimes
NAVAIR Jeff Schwalb
76 SMXG Phil Perkins
309 SMXG Karl Rogers
DHS Joe Jarzombek
STAFF:
PUBLISHER Kasey Thompson
MANAGING EDITOR Ken Davies
ASSOCIATE EDITOR Chelene Fortier-Lozancich
ARTICLE COORDINATOR Nicole Kentta
PHONE (801) 775-5555
E-MAIL stsc.customerservice@hill.af.mil
CROSSTALK ONLINE www.stsc.hill.af.mil/crosstalk
Quality Processes Yield Quality Products
Asking, "Would your company like to save $100,000 per day?" this article lists steps that can be taken to achieve that goal and more. The author draws on 15 years of experience in process improvement to sift through helpful tips including: not looking for a quick fix, keeping it short, and not reinventing the wheel.
by Thomas D. Neff
The Use and Limitations of Static-Analysis Tools to Improve
defects that jeopardize system safety and security. In this article, the author
complete test cases covering up to 6-way combinations, going beyond the popular Pairwise testing. Pairwise (2-way) is low in cost, but is not sufficient for assurance of mission-critical software.
by D. Richard Kuhn, Dr. Yu Lei, and Dr. Raghu Kacker
Technology and Logistics (AT&L); U.S. Navy (USN); U.S. Air Force (USAF); and the U.S. Department of
Assistant Secretary of Defense (Networks and
The USAF Software Technology Support Center (STSC) is the publisher of CrossTalk, providing both editorial oversight and technical review of the journal. CrossTalk's mission is to encourage the engineering development of software to improve the reliability, sustainability, and responsiveness of our warfighting capability.
Open Forum
Software Quality Unpeeled
The term software quality has many interpretations and meanings. The author helps readers understand the underlying considerations that underscore
models, and testing, and the mystique behind this elusive area is explored.
ON THE COVER
Cover Design by Kent Bingham
Additional art services provided by Janna Jensen
Call for Articles
Reader Results Request
BackTalk
Subscriptions: Send correspondence concerning subscriptions and changes of address to the following
517 SMXS/MXDEA
are not necessarily the official views of, or endorsed by, the U.S. government, the DoD, the co-sponsors, or the STSC. All product names referenced in this issue are trademarks of their companies.
CrossTalk Online Services: See <www.stsc.hill.af.mil/crosstalk>, call (801) 777-0857 or e-mail <stsc.webmaster@hill.af.mil>.
Back Issues Available: Please phone or e-mail us to see if back issues are available free of charge.
Brent D. Baxter
Managing Director
Many aspects of our lives are governed by large, complex systems with increasingly complex software, and the safety, security, and reliability of these systems have become a major concern. As the software in today's systems grows larger, it has more defects, and these defects adversely affect the safety, security, and reliability of the systems. This article explains why the common test-and-fix software quality strategy is no longer adequate, and characterizes the properties of the quality strategy we must pursue to solve the software quality problem in the future.

Today, many of the systems on which our lives and livelihoods depend are run by software. Whether we fly in airplanes, file taxes, or wear pacemakers, our safety and well-being depend on software. With each system enhancement, the size and complexity of these systems increase, as does the likelihood of serious problems. Defects in video games, reservations systems, or accounting programs may be inconvenient, but software defects in aircraft, automobiles, air traffic control systems, nuclear power plants, and weapons systems can be dangerous.

Everyone depends on transportation networks, hospitals, medical devices, public utilities, and the international financial infrastructure. These systems are all run by increasingly complex and potentially defective software systems. Regardless of whether these large life-critical systems are newly developed or composed from modified legacy systems, to be safe or secure, they must have quality levels of very few defects per million parts.

Modern, large-scale systems typically have enormous requirements documents, large and complex designs, and millions of lines of software code. Uncorrected errors in any aspect of the design and development process generally result in defects in the operational systems. The defect levels of such operational systems are typically measured in defects per thousand lines of code. A one million line-of-code system with the typical quality level of one defect per 1,000 lines would have 1,000 undiscovered defects, while any reasonably safe system of this scale must have only a very few defects, certainly fewer than 10.

The Need for Quality Software
Before condemning programmers for doing sloppy work, it is appropriate to consider the quality levels of other types of printed media. A quick scan of most books, magazines, and newspapers will reveal at least one and generally more defects per page while even poor-quality software has much less than one defect per listing page. This means that the quality level of even poor-quality software is higher than that obtained for other kinds of human written text. Programming is an exacting business, and these professionals are doing extraordinarily high quality work. The only problem is that based on historical trends, future systems will be much larger and more complex than today, meaning that just to maintain today's defect levels, we must do much higher quality work in the future.

To appreciate the challenge of achieving 10 or fewer defects per million lines of code, consider what the source listing for such a program would look like. The listing for a 1,000-line program would fill 40 text pages; a million-line program would take 40,000 pages. Clearly, finding all but 10 defects in 40,000 pages of material is humanly impossible. However, we now have complex life-critical systems of this scale and will have much larger ones in the relatively near future. So we must do something, but what? That is the question addressed in this article.

Why Defective Systems Work
To understand the software quality problem, the first question we must answer is "If today's software is so defective, why aren't there more software quality disasters?" The answer is that software is an amazing technology. Once you test it and fix all of the problems found, that software will always work under the conditions for which it was tested. It will not wear out, rust, rot, or get tired. The reason there are not more software disasters is that testers have been able to exercise these systems in just about all of the ways they are typically used. So, to solve the software quality problem, all we must do is keep testing these systems in all of the ways they will be used. So what is the problem?

The problem is complexity. The more complex these systems become, the more different ways they can be used, and the more ways users can use them, the harder it is to test all of these conditions in advance. This was the logic behind the beta-testing strategy started at IBM with the OS/360 system more than 40 years ago. Early copies of the new system releases were sent to a small number of trusted users and IBM then fixed the problems they found before releasing the public version. This strategy was so successful that it has become widely used by almost all vendors of commercial software.

Unfortunately, however, the beta-testing strategy is not suitable for life-critical systems. The V-22 Osprey helicopter, for example, uses a tilting wing and rotor system in order to fly like an airplane and land like a helicopter. In one test flight, the hydraulic system failed just as the pilot was tilting the wing to land. While the aircraft had a built-in back-up system to handle such failures, the aircraft had not been tested under those precise conditions, and the defect in the back-up system's software had not been found. The defect caused the V-22 to become unstable and crash, killing all aboard.

The problem is that as systems become more complex, the number of possible ways to use these systems grows exponentially. The testing problem is further complicated by the fact that the way such systems are configured and the environments in which they are used also affect the way the software is executed. Table 1 lists some of the variations that must be considered in testing complex systems. An examination of the number of possibilities for even relatively simple systems shows why it is impractical to test all possibilities for any complex system. So why is complex software so defective?

Some Facts
Software is and must remain a human-produced product. While tools and techniques have been devised to automate the production of code once the requirements
and, thus, still be defective.

When people design things, they make mistakes. The larger and more complex their designs, the more mistakes they are likely to make. From course data on thousands of experienced engineers learning the Personal Software Process℠ (PSP℠), it has been found that developers typically
grams until they run without obvious failures. Then they submit these programs to systems integration and testing where they are combined with other similar programs into larger and larger sub-systems and systems for progressively larger-scale testing. The defect content of programs entering systems testing typically ranges between 10 and 20 defects per 1,000 lines.

The most disquieting fact is that testing can only find a fraction of the defects in a program. That is, the more defects a program contains at test entry, the more it

Table 1: Some of the Possible Testing Variations
7. Network failures
8. Operator errors
9. Version changes
10. Power variations

Figure 1: Total and Test Defect Rates of 810 Experienced Engineers

likely to be exercised when such systems are subjected to the stresses of high transaction volume, accidents, failures, or military combat.

Since there is no way to tell in advance which defects would be damaging, we
find and fix those few that would be dangerous? Obviously, we only need to fix the defects that would cause trouble, but there is no way to determine which defects these are without examining all of the

Table 2: Possible Paths Through a Network

defects. For example, a complex design defect that produced a confusing operator message could pose no danger while a trivial typographical mistake that changed a "no" to a "yes" could be very dangerous.
code segments as well as in the branch instruction itself.

For a large program, the numbers of possible paths or routes through a program can vary by program type, but pro-
uct artifacts.
6. Evaluate all defects for correction and
sidebar.

based on sound historical data and estimating methods.
1.4. The development approach must be consistent with the rate of change in requirements.
2. To get quality work, the customer must demand it.
2.1. Attributes that define quality for a software product must be stated in measur-
* These principles were defined by a group of 13 software quality experts convened by Taz Daughtrey. The

Step 1: Quality Policies, Goals, and
cost/schedule/quality trade-off: manage
organization must understand and accept this point: it is always faster and cheaper to do the job right the first time than it is to waste time fixing defective products after they have been developed.

Once the basic quality policy is in place, customers, managers, and developers must then establish and agree on the quality goals for each project. The principal goal must be to find and remove all defects in the program at the earliest possible time, with the overall objective of removing all defects before the start of integration and system test. With the goals established, the development teams must make measurable quality plans that can be tracked and assessed to ensure that the project is producing quality work. This in turn requires that the quality of the work be measured at every step, and that the quality data be reviewed and assessed by the developers, their teams, management, and the customer. When defective work is found, it must be promptly fixed. The principle is that defects cost money. The longer they are left in the product, the more work will be built on this defective foundation, and the more it will cost to find and fix them [2].

Step 2: Train and Coach Developers and Teams
Quality work is not done by accident; it takes dedicated effort and properly skilled and motivated professionals. The third principle of software quality is absolutely essential: The developers must feel personally responsible for the quality of the products they produce. If they do not, they will not strive to produce quality results, and later trying to find and fix their defects will be costly, time consuming, and ineffective.

Convincing developers that quality is their
the skills required to measure and manage the quality of their work, requires training. While it would be most desirable for them to get this skill and the required knowledge before they graduate from college, practicing software developers must generally learn them from using methods such as the PSP.

With properly trained developers, the development teams then need proper management, leadership, and coaching. Again, the Team Software Process℠ (TSP℠) can provide this guidance and support [3, 4, 5].

Team Software Process and TSP are service marks of

Step 3: Manage Requirements Quality
One fundamental truth of all quality programs is that you must start with a quality foundation to have any hope of producing a quality result. In software, requirements are the foundation for everything we do, so the quality of requirements is paramount. However, the requirements quality problem is complicated by two facts.

First, the quality measures must not be abstract characteristics of a requirements document; they should be precise and measurable items such as defect counts from requirements inspections or counts of requirements defects found in system test or customer use. However, to be most helpful, these quality measures must also address the precise understanding the developers themselves have of the requirements regardless of what the requirements originators believe or how good a requirements document has been produced. The developers will build what they believe the requirements say and not what the requirements developers intended to say. This means that the quality-management problem the requirements process must address is the transfer of understanding from the requirements experts to the software developers.

The second key requirements fact is that the requirements are dynamic. As people learn more about what the users need and what the developers can build, their views of what is needed will change. This fact enormously complicates the requirements-management problem. The reason
and agreement reached on how to incorporate this new understanding into the development work. This means that the requirements must be recognized as evolving through a sequence of versions while the development estimates, plans, and commitments are progressing through a similar but delayed sequence of versions. And finally, the product itself will ultimately be produced in a further delayed sequence of versions. The quality management problem concerns managing the quality and maintaining the synchronization of this sequence of parallel requirements, plan, design, and product versions.

Step 4: Statistical Process Control
While statistical process control is a large subject, we need only discuss two aspects: process management and continuous process improvement. The first aspect, process management, is discussed here, and process improvement is addressed in Step 8.

The first step in statistical process management is to redefine the quality management strategy. To achieve high levels of software quality, it is necessary to switch from looking for defects to manag-
and manage the quality of the process used to produce the program's parts. If, for example, we could devise a process that would consistently produce 1,000-line modules that each had less than a one percent chance of having a single defect, a system of 1,000 of these modules would likely have less than 10 defects per million lines. One obvious problem with this strategy concerns our ability to devise and properly use such a process.

There has been considerable progress in producing and using such a process. This is accomplished by measuring each developer's process and producing a Process Quality Index (PQI). The TSP quality profile, which forms the basis for the PQI measure, is shown in Figure 5 [6]. Then, the developers and their teams use standard statistical process management techniques to manage the quality of all dimensions of the development work [7]. Data on early TSP teams show that by following this practice, quality is substantially improved [8].

Step 5: Quality Evaluation
Quality evaluation has two elements: eval-
any quality management and improvement system concerns defect data. Every defect found after development, whether by final testing, the users, or any other means must be carefully evaluated and the evaluation results used to improve both the process and the product. The reason that these data are so important is that they concern the process failings. Every defect found after development represents a failure of the development process, and each such failure must be analyzed and the results used to make two kinds of improvements.

The first improvement and the one that requires the most rapid turnaround time is determining where in the product similar defects could have been made and taking immediate action to find and fix all of those defects. The second improvement activity is to analyze these defects to determine how to prevent similar defects from being injected in the future, and to devise a means to more promptly find and fix all such defects before final testing or release to the user.

Step 7: Configuration Management
For any large-scale development effort, configuration management (CM) is critical. This CM process must cover the product artifacts, the requirements, the design, and the development process. It is also essential to measure and manage the quality of the CM process itself. Since CM processes are relatively standard, however, they need not be discussed further.

Step 8: Process Improvement
The fundamental change required by this software quality-management strategy is to use the well-proven methods of statistical process control to guide continuous process improvement [7]. Here, however, we are not talking about improving the tolerances of machines or the purity of materials; we are talking about managing the quality levels of what people do, as well as the quality levels of their work products. While people will always make mistakes, they tend to make the same mistakes over and over. As a consequence, when developers have data on the defects they personally inject during their work and know how to use these data, they and their teammates can learn how to find just about all of the mistakes that they make. Then, in defining and improving the quality-management process, every developer must use these data to optimally utilize the full range of available defect detection and prevention methods.

ization for Standardization, correctness-by-construction, or AS9100) continuous improvement strategies such as those defined by CMMI and TSP should be applied to the improvement process itself. This means that the process quality measures, the evaluation methods, and the decision thresholds must also be considered as important aspects of continuous process improvement. Furthermore, since every developer, team, project, and organization is different, it means that this continuous improvement process must involve every person on every development team and on every project in the organization.

Conclusion
While we face a major challenge in improving software quality, we also have substantial and growing quality needs. It should now be clear to just about everyone in the software business that the current testing-based quality strategy has reached a dead end. Software development groups have struggled for years to get quality improvements of 10 to 20 percent by trying different testing strategies and methods, by experimenting with improved testing tools, and by working harder.

The quality improvements required are vast, and such improvements cannot be achieved by merely bulling ahead with the test-based methods of the past. While the methods described in this article have not yet been fully proven for software, we now have a growing body of evidence that they will work at least better than what we have been doing. What is more, this quality strategy uses the kinds of data-based methods that can guide long-term continuous improvement. In addition to improving quality, this strategy has also been shown to save time and money.

Finally, and most importantly, software quality is an issue that should concern everyone. Poor quality software now costs each of us time and money. In the immediate future, it is also likely to threaten our lives and livelihoods. Every one of us, whether a developer, a manager, or a user, must insist on quality work; it is the only way we will get the kind of software we all need.

Acknowledgements
My thanks to Bob Cannon, David Carrington, Tim Chick, Taz Daughtrey, Harry Levinson, Julia Mullaney, Bill Nichols, Bill Peterson, Alan Willett, and Carol Woody for reviewing this article and offering their helpful suggestions. I also much appreciate the constructive suggestions of the CrossTalk editorial board.

References
provement Process for Software Engineers. Reading, MA: Addison-Wesley, 2005.
2. Jones, C. Software Quality: Analysis and Guidelines for Success. New York: International Thomson Computer Press, 1997.
3. Humphrey, W.S. Winning With Software: An Executive Strategy. Reading, MA: Addison-Wesley, 2002.
4. Humphrey, W.S. TSP: Leading a Development Team. Reading, MA: Addison-Wesley, 2006.
5. Humphrey, W.S. TSP: Coaching Development Teams. Reading, MA: Addison-Wesley, 2006.
6. Humphrey, W.S. "Three Dimensions of Process Improvement, Part III: The Team Process." CrossTalk Apr. 1998.
7. Florac, W.A., and A.D. Carleton. Measuring the Software Process: Statistical Process Control for Software Process Improvement. Reading, MA: Addison-Wesley, 1999.
8. Davis, N., and J. Mullaney. Team Software Process in Practice. SEI Technical Report CMU/SEI-2003-TR-014, Sept. 2003.

About the Author
Watts S. Humphrey joined the Software Engineering Institute (SEI) after his retirement from IBM. He established the SEI's Process Program and led development of the CMM for Software, the PSP, and the TSP. He managed IBM's commercial software development and was vice president of technical development. He is an SEI Fellow, an Association for Computing Machinery member, an Institute of Electrical and Electronics Engineers Fellow, and a past member of the Malcolm Baldrige National Quality Award Board of Examiners. In a recent White House ceremony, the President awarded him the National Medal of Technology. He holds graduate degrees in physics and business administration.

SEI
4500 Fifth AVE
Pittsburgh, PA 15213-2612
Phone: (412) 268-6379
Fax: (412) 268-5758
E-mail: watts@sei.cmu.edu
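
The size and density arithmetic that runs through the article (defects per thousand lines of code, pages of source listing, and the 1,000-module example in Step 4) can be checked with a few lines of Python. This is only a sketch: the function names are illustrative and do not come from the article, and the module estimate assumes that modules are independent and that a defective module holds exactly one defect. The input figures are the ones quoted in the text.

```python
"""Back-of-the-envelope size and defect arithmetic (illustrative sketch)."""


def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Return the defect count implied by a density in defects per 1,000 lines."""
    return lines_of_code / 1000 * defects_per_kloc


def listing_pages(lines_of_code: int, pages_per_kloc: int = 40) -> int:
    """Return listing length in pages, at the article's 40 pages per 1,000 lines."""
    return lines_of_code * pages_per_kloc // 1000


def expected_defective_modules(module_count: int, defect_probability: float) -> float:
    """Return the expected number of defective modules (n * p).

    Assumption (not from the article): modules are independent and a
    defective module contains exactly one defect.
    """
    return module_count * defect_probability


if __name__ == "__main__":
    system_size = 1_000_000  # one million lines of code

    # Typical quality level cited in the article: one defect per 1,000 lines.
    print("defects at 1 per KLOC:", expected_defects(system_size, 1.0))

    # Defect content at system-test entry: 10 to 20 defects per 1,000 lines.
    print("defects at test entry:",
          expected_defects(system_size, 10.0), "to",
          expected_defects(system_size, 20.0))

    # Size of the printed listing that would have to be searched.
    print("listing pages:", listing_pages(system_size))

    # 1,000 modules of 1,000 lines each; one percent is the upper bound
    # on the per-module defect chance given in Step 4.
    print("upper bound on expected defects:",
          expected_defective_modules(1000, 0.01))
```

Because the article specifies a chance of less than one percent per module, the last figure is an upper bound, which matches the claim of fewer than 10 defects per million lines.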
for software applications. Actually, this rule
Total 5.00

Table 2: Defect Removal Efficiency

technologies and methodologies that can lower defect potentials or reduce the numbers of bugs that must be eliminated. Examples of defect prevention methods include joint application design, structured design, and also participation in formal inspections (see Note 3).

The phrase defect removal refers to methods that can either raise the efficiency levels of specific forms of testing or raise the overall cumulative removal efficiency by adding additional kinds of review or test activity. Of course, both approaches are possible at the same time.

In order to achieve a cumulative defect removal efficiency of 95 percent, it is necessary to use approximately the following sequence of at least eight defect removal activities:
1. Design inspections.
2. Code inspections.
3. Unit tests.
4. New function tests.
5. Regression tests.
6. Performance tests.
7. System tests.
8. External beta tests.

To go above 95 percent, additional removal stages are needed. For example, requirements inspections, test case inspections, and specialized forms of testing, such as human factors testing, add to defect removal efficiency levels.

Since each testing stage will only be about 30 percent efficient, it is not feasible to achieve a defect removal efficiency level of 95 percent by means of testing alone. Formal inspections will not only remove most of the defects before testing begins, they also raise the efficiency level of each test stage. Inspections benefit testing because design inspections provide a more complete and accurate set of specifications from which to construct test cases.

From an economic standpoint, combining formal inspections and formal testing will be cheaper than testing by itself. Inspections and testing in concert will also yield shorter development schedules than testing alone. This is because when testing starts after inspections, almost 85 percent of the defects will already be gone. Therefore, testing schedules will be shortened by more than 45 percent.

When IBM applied formal inspections to a large database project, delivered defects were reduced by more than 50 percent from previous releases, and the overall schedule was shortened by about 15 percent. Testing itself was reduced from two shifts over a 60-day period to one shift over a 40-day period. More importantly, customer satisfaction improved to good from prior releases where customer satisfaction previously had been very poor.

Measurement of Defect Potentials and Defect Removal Efficiency
Measuring defect potentials and defect removal efficiency levels are among the easiest forms of software measurement, and are also the most important. To measure defect potentials, it is necessary to keep accurate records of all defects found during the development cycle, which is something that should be done as a matter of course. The only difficulty is that private forms of defect removal such as unit testing will need to be done on a volunteer basis.

Measuring the numbers of defects found during reviews, inspections, and testing is also straightforward. To complete the calculations for defect removal efficiency, customer-reported defect reports submitted during a fixed time period are compared against the internal defects found by the development team. The normal time period for calculating defect removal efficiency is 90 days after release.

As an example, if the development and testing teams found 900 defects before release, and customers reported 100 defects in the first three months of usage, it is apparent that the defect removal efficiency would be 90 percent.

Unfortunately, although measurements of defect potentials and defect removal efficiency levels should be carried out by 100 percent of software organizations, the frequency of these measurements circa 2008 is only about five percent of U.S. companies. In fact, more than half of U.S. companies do not have any useful quality metrics at all. More than 80 percent of U.S. companies, including the great majority of commercial software vendors, have only marginal quality control and are much lower than the optimal 95 percent defect removal efficiency level. This fact is one of the reasons why so many software projects fail completely or experience massive cost and schedule overruns. Usually failing projects seem to be ahead of schedule until testing starts, at which point huge volumes of unanticipated defects stop progress almost completely.

As it happens, projects that average about 95 percent in cumulative defect removal efficiency tend to be optimal in several respects. They have the shortest development schedules, the lowest development costs, the highest levels of customer satisfaction, and the highest levels of team morale. This is why measures of defect potentials and defect removal efficiency levels are important to the industry as a whole; these measures have the greatest impact on software performance of any known metrics.

Additionally, as an organization progresses from the U.S. average of 85 percent in defect removal efficiency up to 95 percent, the saved money and shortened development schedules result because most schedule delays and cost overruns are due to excessive defect volumes during testing. However, to climb above 95 percent defect removal efficiency up to 99 percent does require additional costs. It will be necessary to perform 100 percent inspections of every deliverable, and testing will require about 20 percent more test cases than normal.

It is an interesting sociological observation that measurements tend to change human behavior. Therefore, it is important to select measurements that will cause behavioral changes in positive and beneficial directions. Measuring defect potentials and defect removal efficiency levels has been noted to make very beneficial improvements in software development practices.

When these measures were introduced into large corporations such as IBM and ITT, in less than four years the volumes of delivered defects had declined by more than 50 percent, maintenance costs were reduced by more than 40 percent, and development schedules were shortened by more than 15 percent. There are no other measurements that can yield such positive benefits in such a short time span. Both customer satisfaction and employee morale improved, too, as a direct result of the reduction in defect potentials and the increase in defect removal efficiency levels.

Reference
1. Jones, Capers. Estimating Software Costs. 2nd edition. McGraw-Hill, New York: 2007.

Notes
1. The averages for defect potentials are derived from studies of about 600 companies and 13,000 projects. Non-disclosure agreements prevent the identification of most companies. However, some companies such as IBM and ITT have provided data on defect potentials and removal efficiency levels.
2. The normal period for measuring defect removal efficiency starts with requirements inspections and ends 90 days after delivery of the software to its users or customers. Of course, there
are still latent defects in the software that will not be found in 90 days, but having a 90-day interval provides a standard benchmark for defect removal efficiency. It might be thought that extending the period from 90 days to six months or 12 months would provide more accurate results; however, updates and new releases usually come out after 90 days, so these would dilute the original defect counts. Latent defects found after the 90-day period can exist for years, but on average about 50 percent of residual latent defects are found each year. The results vary with number of users of the applications. The more users, the faster residual latent defects are discovered.
3. Formal design and code inspections are the most effective defect removal activity in the history of software, and are also very good in terms of defect prevention. Once participants in inspections observe various kinds of defects in the materials being inspected, they tend to avoid those defects in their own work. All software projects larger than 1,000 function points should use formal design and code inspections.

Additional Reading
1. Boehm, Barry W. Software Engineering Economics. Prentice Hall, Englewood Cliffs, NJ: 1981.
2. Crosby, Philip B. Quality Is Free. New American Library, Mentor Books. New York: 1979.
3. Garmus, David, and David Herron. Function Point Analysis. Addison Wesley Longman, Boston: 2001.
4. Garmus, David, and David Herron. Measuring the Software Process: A Practical Guide to Functional Measurement. Prentice Hall, Englewood Cliffs, NJ: 1995.
5. Grady, Robert B., and Deborah L. Caswell. Software Metrics: Establishing a Company-Wide Program. Prentice-Hall: 1987.
6. International Function Point Users Group. IT Measurement. Addison Wesley Longman, Boston: 2002.
7. Jones, Capers. Applied Software Measurement. 3rd edition; McGraw-Hill, New York: 2008.
8. Jones, Capers. Sizing Up Software. Scientific American. New York: Dec. 1998.
9. Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Addison Wesley Longman. Boston: 2000.
10. Jones, Capers. Conflict and Litigation Between Software Clients and Developers. Software Productivity Research, Burlington, MA: 2003.
11. Kan, Stephen H. Metrics and Models in Software Quality Engineering. 2nd edition. Addison Wesley Longman, Boston: 2003.

About the Author
Capers Jones is currently the president of Capers Jones and Associates, LLC. He is also the founder and former chairman of Software Productivity Research (SPR) where he holds the title of Chief Scientist Emeritus. He is a well-known author and international public speaker, and has authored the books "Patterns of Software Systems Failure and Success," "Applied Software Measurement," "Software Quality: Analysis and Guidelines for Success," "Estimating Software Costs," and "Software Assessments, Benchmarks, and Best Practices." Jones and his colleagues from SPR have collected historical data from more than 600 corporations and more than 30 government organizations. This historical data is a key resource for judging the effectiveness of software process improvement methods.

Software Productivity Research, LLC
Phone: (877) 570-5459
Fax: (877) 570-5459
E-mail: cjonesiii@cs.com
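As a rough illustration of the measures described in the notes above, the following Python sketch computes cumulative defect removal efficiency over the 90-day window and projects how residual latent defects shrink if about 50 percent of them are found each year. The function names and the sample counts (950 defects removed before release, 50 reported in the first 90 days, 40 latent defects) are hypothetical and are not data from the article.

```python
def defect_removal_efficiency(found_before_release, found_in_first_90_days):
    """Percentage of all known defects that were removed before delivery.

    "Known" here means defects found before release plus those reported
    during the first 90 days of use (the benchmark window in the notes).
    """
    total = found_before_release + found_in_first_90_days
    if total == 0:
        raise ValueError("no defects recorded")
    return 100.0 * found_before_release / total


def residual_latent_defects(initial, years, yearly_discovery_rate=0.5):
    """Latent defects still undiscovered at the end of each year.

    Assumes a constant fraction of the remaining latent defects is found
    every year; the 0.5 default mirrors the "about 50 percent" figure.
    """
    remaining = float(initial)
    history = []
    for _ in range(years):
        remaining *= 1.0 - yearly_discovery_rate
        history.append(remaining)
    return history


if __name__ == "__main__":
    # Hypothetical project: 950 defects removed in development, 50 found by users.
    print(defect_removal_efficiency(950, 50))
    # Hypothetical 40 latent defects left after the 90-day window.
    print(residual_latent_defects(40, 3))
```

With these made-up inputs the project sits at the 95 percent level the article associates with the best outcomes, and the latent defect count halves each year.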
Would your company like to save $100,000 per day? Would you like to surge an urgent project's delivery time by 50 percent and deliver zero errors? Software organizations have done just that. In this article, I list small steps you can take that will lead your company toward similar results based on my 15 years of process improvement experience.
and use it to pilot the CMMI. As you work through that project, write the necessary standard operating instructions (SOI)/standard operating procedures (SOPs), as identified by the CMMI, and test them with that project. Once they are acceptable, publish them as an example of how your organization does business. Of course, these are living documents and as you mature, your processes must evolve with you.

Perfect is the enemy of good enough. If you are looking to produce perfect processes, you will never get there. Aim for the 80 percent solution. While that might seem pretty low, remember that each process has a feedback loop whereby improvements can easily and frequently be made. I know of no one who has ever gone to work thinking, "I want to do worse today than yesterday." Most employees want to do a better job. The problem is that they do not always know how, but processes can give them a framework. Their experience and intuition will help fill in the details on how to improve.

Jealousy is a great thing. Do not let the lack of senior management support stop you. All you need is any manager to support CPI to get it going. Once you are making progress, others will see something is different with your manager: projects are being produced on time, on budget, and/or with greater quality. The other managers will become jealous and want to achieve the same success.

Do not try to end world hunger. Aim low and reach your target. If you try to fix your whole company, you will likely spend most of your time negotiating, selling, and/or compromising. It is better to fix your little niche and make others jealous (see above). Allow them to modify your processes to fit their needs. They will already have incentive to ensure they are successful (jealousy still reigns). If they fail, they will come back since they will become jealous over something else you are doing better than them (and reaping tangible rewards such as a bigger budget or additional people). They might even stumble onto a better process than yours. Great! Ask to use it and make it work for you. Now you have a strong ally.

Two heads are better than one. Create a PIT encompassing each work unit in your area. As a PIT leader, you do not have the corner on good ideas. The more people you include (up to seven, plus or minus two), the better. However, you do not want just anyone. You want people who share your enthusiasm for process improvement and who see the big picture. If you have the wrong team members, it can be detrimental because you will spend 80 percent of your time educating 20 percent of them.

Every team needs cheerleaders. If you document/improve all processes but no one knows about them, you have accomplished nothing. Find the opinion leaders in each work area and get them involved. If they are not on the PIT, try to include them on the occasional work group or have them write an article for the PIT newsletter. That newsletter can be another good cheerleader. It is your first opportunity to provide training snippets on new processes as well as keeping everyone informed of your current CPI status.

[Pull quote] One company used a predecessor of the CMMI-DEV and achieved the model's highest level of process maturity in software development. That company's hardware people realized the software folks had their act together ... When they were shown the CMM, they said, "We could use that ...."

Fail and get over it. As humans we are imperfect. Do not worry about failure. The only failure is one where you learn nothing. If you are unsuccessful and learn from it then it was a great learning experience, not a failure because you now know at least one way not to do it.

Start with the obvious. The CMMI-ACQ has a lot of information about what your acquisition program should contain. It does not tell you how to do it but there is plenty of what. Do not wait until you have the how to get started. Take the what (i.e., CMMI) and turn it into a policy statement (SOI). Then, when it is time to create the how (SOP), people will know which how to develop. You will have already added some structure to your process improvement effort.

Keep focused. When writing an SOI, do not delve into how people should do something. You want to focus on what they are to do and, on occasion, why. You can even describe a little of who or when. Once an SOP is created then it is time to describe how to do the job. These SOIs/SOPs should not be written for a three-year-old, but they also should not be written for a brain surgeon (unless it is an SOP on brain surgery). You should rarely include any why material in an SOP. If the worker does not know why they are doing their job, they have bigger issues. SOPs should be written by those already doing the job.

We don't need no stinkin' tools. Just as everyone wants instant gratification, we also want a super tool to make our jobs easier, thus solving all of our problems. Come back to reality. That tool does not exist. I have found that if you buy a tool to solve your problems, you are more likely to get a failed CPI effort and be poorer to boot. Because you do not have a repeatable process, the tool only lets you make mistakes faster, easier, and with greater impact. Of course, this frustrates people and they will quit using the tool. They will not realize it was the lack of process that caused the problems, not the tool. First, create a process and then introduce a tool to help people perform the process faster, cheaper, and better.

Sometimes status quo is good. Remember, people want to do their jobs better; they just do not want to change to do that. The mere act of documenting your current (probably flawed) processes is a huge improvement over undocumented processes. At least now you could repeat the process twice in a row. It is better to get the early buy-in than try to perfect the process too quickly. There will be plenty of opportunity to improve the process as people use it.

Sometimes status quo is bad. Hopefully you will never hear "we do it this way because it is how we have always done it." However, someone is thinking
it. My experience is that if people do not know why they are doing something, they are also ripe for the suggestion that there might be a better way, especially if it means less work. Many of the "always done it this way" processes can be reduced in effort by 50 percent or more. Often, some work products are used by no one. If a product has no customer (user), eliminate it. You will earn many new friends. If there was a hidden customer, they will eventually figure out something changed and come to you to explain why they need what you eliminated. Then you will know why it is needed.

Keep it short. We keep SOIs to no more than three pages. Most are one to one-and-a-half pages, with the shortest being two paragraphs. SOPs are longer, but we still try to keep them to about four pages. If attachments are added, we do not count those against the four-page goal. A short document will get read, but a long one will not. Our plan is to write 100 short documents instead of one all-encompassing volume.

Do not get hung up on training. Some people feel they need training on everything. At some level, I agree. However, it is just as bad to do too much training as not enough. No one needs training on our SOIs. Even most SOPs are written so that anyone sufficiently educated could pick up an SOP and determine how to perform its task. Use screen captures, pictures, and flowcharts: some people like words, some need pictures. Cater to both but keep it short, and provide training as needed or requested.

A hyperlink is your friend. Ample hyperlinking avoids redundancy and inaccuracy. For instance, we have an SOI describing acronyms and definitions. All acronyms and definitions used in our SOIs/SOPs are included here. We then name the definition as a bookmark and hyperlink upon its first use in each document. That way we ensure the proper definition is used and we do not have to spell it out, which keeps our documents shorter.

Procrastination is your enemy. There is no bad place to start a CPI effort except to not start at all. I do not know how many times I have been asked how to start a CPI effort. I [...] where you feel most comfortable, with what causes the most headaches, with what will give the best return on investment, or you can use any other criteria. As Nike said, just do it.

I think I can, I think I can. The Little Engine That Could ran uphill for a long time. It was about out of steam when it crested the hill and things became easier. So it is with CPI. You will face an uphill battle for at least six months and probably more. However, at some point (that point will be different for each organization) you will crest the hill and gain momentum. At that point, no one can stop your CPI effort. It will be institutionalized and no longer dependent on individuals, becoming an integral part of your organization's business practices. As long as you have steam, you must keep chugging uphill. Set your sights just over the crest and you will get there.

This is not three-card monte. Pick a model, any model. There are many process models from which to choose (CMM, CMMI, International Organization for Standardization [ISO] 9000, TQM, Lean, Six Sigma, Lean Six Sigma, etc.). Which should you use? When you are getting started, it does not matter. Just pick one and go. Any improvement is better than none. You may even choose bits and pieces of several models. Having said that, I believe the CMM and CMMI models are the most comprehensive and take you farther than the others. For instance, ISO 9000 takes you to about a CMMI Level 2. The Lean and Six Sigma models require you to document your process first, so you can determine just how much it has improved. Since most organizations just starting CPI do not have documented processes, it seems the CMM/CMMI might be best for starting because they provide guidance on what should be in your key processes. As your processes mature, you will likely incorporate other models into your CPI effort to speed your progress or improve the quality. Use whatever works for you.

Do not reinvent the wheel. Reuse is your friend. Build on others' successes. Learn from them. Never embrace "not invented here" syndrome. The Software Engineering Institute already developed all the tools you need to start making significant leaps in the quality of your processes. Their CMM and CMMI models describe every charac- [...] exhibit at various levels of process maturity. Use the models; they work. They will give you quality processes leading to quality products.

From what I have seen, most failed CPI efforts could not figure out where to begin, lost steam before starting, could not get any management support (usually tried at too high a level), focused too much on tools versus processes, could not find a quick fix and quit, or tried to solve world hunger and gave up.

Based on my 15 years in process improvement, I suspect that if you follow these suggestions, sticking with it at least two years, you will be successful in your CPI effort.

If you have CPI lessons learned, I would enjoy hearing them.

References
1. Yamamura, George, and Gary B. Wigle. "SEI CMM Level 5: For the Right Reasons," CrossTalk Aug. 1997 <www.stsc.hill.af.mil/crosstalk/frames.asp?uri=1997/08/seicmm5.asp>.
2. Billings, C., J. Clifton, B. Kolkhorst, E. Lee, and W.B. Wingert. "Journey to a Mature Software Process." IBM Systems Journal. Vol. 33, No. 1. 1994: pp. 46-61.
3. Vu, John D. Presentation to CIO's Office. National Reconnaissance Office. Chantilly, VA., Mar 2001.
4. Deming, W. Edwards. Out of the Crisis. MIT Press. 1986.

About the Author
Thomas D. Neff, Lt Col, U.S. Air Force (Ret.) spent most of his Air Force career in software development and project management. Currently, he works for MTC Technologies supporting the Defense Threat Reduction Agency's Nuclear Weapons Effects Division as a process manager, and uses the CMMI-ACQ and P-CMM as guides for that effort. Neff is a frequent speaker on process improvement at information technology conferences. He has a Master of Computer Science from Texas A&M, which helped steer him toward process improvement.

DTRA/RD-NTE
Ft Belvoir, VA 22060-6201
Phone: (703) 767-4106
Fax: (703) 767-9844
E-mail: thomas.neff_contractor@dtra.mil
reports no lines contain flaws will satisfy because it reports no false positives. Similarly, it is easy to create a tool with perfect recall and excellent performance: one that reports that all lines have errors will answer because it reports no false negatives. Clearly, however, neither tool is of any use whatsoever.

Finally, it is at least theoretically possible to write an analyzer that would have excellent precision and excellent recall given enough time and access to enough processing power. Whether such a tool would be as useless as the previous two example tools is debatable and would depend on just how much time it would take. What is clear is that no such tools currently exist and to create them would be very difficult.

As a result, all tools occupy a middle ground around a sweet spot that developers find most useful. Developers expect analyses to complete in time roughly proportional to the size of their code base and within hours rather than days. Tools that take longer simply do not get used because they take too long. Low precision means more false positives, which has an insidious effect on users. As precision goes down, even true positive warnings are more likely to be erroneously judged as false positives because the users lose trust in the tool. For most classes of flaws, precision less than 80 percent is unacceptable. For more serious flaws, however, precision as low as five percent may be acceptable if the code is to be deployed in very risky environments. It is difficult to quantify acceptable values for recall as it is impossible to measure accurately in practice, but clearly users would not bother using these tools at all if they did not find serious flaws that escape detection by other means.

Each of these constraints introduces its own set of limitations; however, they are all interrelated. The reasons that lead to low recall are explained in more detail in the following sections.

Path Limitations
As mentioned earlier, these analyses are path sensitive. This improves both recall and precision and is probably the key aspect of these products that makes them most useful. A full exploration of all paths through the program would be very expensive. If there are n branch points in a procedure, and there are no loops in that procedure, then the number of intraprocedural paths through that procedure can be as many as 2^n. In practice, this is fewer because some branches are correlated, but the asymptotic behavior remains. If procedure calls and returns are taken into account, the number of paths is doubly exponential, and if loops are taken into account then the number of paths is unbounded. Clearly it is not possible for a tool to explore all of these paths. The tools restrict their exploration in two ways. First, loops are handled by exploring a small fixed number of iterations: often, the first time around the loop is singled out as special, and all other iterations are considered en masse and represented by an approximation. Second, not all paths are explored. It is typical for an analysis to place an upper bound on the number of paths explored in a particular procedure or on the amount of time available, and a selection of those remaining paths are explored.

If asynchronous paths can occur (such as those caused by interrupts or exceptions) or if the program uses concurrency, then the number of possible paths to consider increases further. Many tools simply ignore these possibilities. Finally, most tools also ignore recursive function calls, and function calls that are made through function pointers (or make very coarse approximations) as considering these also contributes to poor performance and poor precision.

Abstract Domain
As previously mentioned, these tools work by exploring paths and looking for anomalies in the abstract state of the program. The appeal of the symbolic execution is that each abstract state represents potentially many possible concrete states. For example, given an 8-bit variable x, there are 2^8 possible concrete values: 0, 1, ..., 255. The symbolic execution, however, might represent the value as two abstract states: x=0, and x>0. So where a concrete execution has 256 states to explore, the symbolic execution has only two.

As such, the expressivity of this abstract domain is an important factor that determines the effectiveness of the analysis. Again, there is a trade-off here: better precision and recall can be achieved by more sophisticated abstract domains, but more resources will then be required to complete an analysis. Values in the abstract domain are equations that represent constraints on values, i.e., x=0, or y>10. As the analysis progresses, a constraint solver is used to combine and simplify these equations. A key characteristic of these abstract domains is that there is a special value, usually named bottom, which indicates that the analysis knows no useful information about the actual value. Bottom is the abstract value that corresponds to all possible concrete values. Reaching bottom is impossible to avoid for any non-trivial abstraction in general as this would require solving the halting problem. Once bottom is reached, the analysis has a choice of treating it as a potentially dangerous value, which would increase recall, or as a probably safe value, which would increase precision. Most tools opt for the latter as the former also has the effect of decreasing precision enormously.

If there are program constructs that step outside the bounds of what can be expressed in the abstract domain, this causes the analysis to lose track of variables and their relationships. For example, an abstract domain that allows the expression of affine relationships between no more than two variables admits expressions such as x=2y. However, something such as x=y+z is out of bounds because it involves three variables and the analysis would be forced to conclude x=bottom instead.

The consequence of this is the abstract domain that a tool uses determines a great deal about the kind of flaws that it is capable of detecting. For example, if the tool uses an abstract domain of affine relations between two variables, then it may fail to find flaws that depend on three variables.
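The two counting arguments in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `max_paths` and `abstract_state` are hypothetical names, and the two-state abstraction (x=0 versus x>0) is the one used as an example in the text, not the domain of any particular tool.

```python
def max_paths(branch_points):
    """Upper bound on intraprocedural paths in a loop-free procedure.

    Each of the n branch points can go one of two ways, so there can be
    as many as 2**n distinct paths (fewer when branches are correlated).
    """
    return 2 ** branch_points


def abstract_state(x):
    """Map a concrete 8-bit value to one of two abstract states."""
    if not 0 <= x <= 255:
        raise ValueError("x must fit in 8 bits")
    return "x=0" if x == 0 else "x>0"


if __name__ == "__main__":
    concrete_values = range(256)
    states = {abstract_state(x) for x in concrete_values}
    # 256 concrete values collapse into the two abstract states.
    print(len(concrete_values), "concrete values,", len(states), "abstract states")
    # The path bound grows quickly with the number of branch points.
    print(max_paths(10), max_paths(20))
```

The contrast is the point: the abstract domain shrinks the state space from 256 to 2, while the path count still doubles with every additional branch, which is why tools cap the number of paths they explore.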
gram is not available, as is almost always the case because of operating system and third-party libraries, or if the code is written in a language not recognized by the analysis tool, then the analysis must make some assumptions about how that missing code operates. Take, for example, a call to a function in a third-party library that takes a single pointer-typed parameter and returns an integer. In the absence of any other information, most analyses will assume that the function does nothing and returns an unknown value. This clearly is not realistic, but it is not practical to do better in general. The function may de-reference its pointer parameter, it may read or write any global variable that is in scope, it may return an integer from a particular range, or it may even abort execution. If the analysis knew this, it would have better precision and

ever [3], and it is expected that products capable of analyzing object code as well as C/C++ will appear.

A second approach to the problem is to specify stubs, or models, that summarize key aspects of the missing source code. The popular analysis tools provide models for commonly used libraries such as the C library. These models only have to approximate the behavior of the code. Users can, of course, write these themselves for their own libraries but it can be a tricky and time-consuming effort.

Out of Scope
There are, of course, entire classes of flaws that static analysis is unlikely ever to be able to detect. Static analysis excels at finding places where the fundamental rules of the language are being violated such as buffer overruns, or where com-

it should be thought of as a way of amplifying the software assurance effort. The cheapest bug to find is the one that gets found earliest, and as static analysis can be used very early in the development cycle, its use can reduce the cost of development and liberate resources for use elsewhere. This is the traditional view of how static analysis can reduce testing costs. However, there is a second way in which the use of static analysis can reduce the cost of testing: it makes it easier to achieve full coverage.

One measure of the effectiveness of a test suite is how well it exercises or covers the code being tested. There are many different kinds of coverage. Statement coverage is the most common, but for riskier code more stringent forms are often required. Decision coverage is a superset of statement coverage, and requires that all branches in the control flow of the program are taken. In DO-178B, a development standard for flight software [4], the riskiest code is required to be tested with 100 percent modified condition/decision

Figure 1: A Redundant Condition Warning
[Report excerpt from c:\CodeSonar\ex2.c; surviving fragment: line 14, "rest --;"]

Figure 2: A Second Redundant Condition Warning
[Report excerpt; surviving fragment, marked "Never True" and "Redundant Condition":]
    8   if (!flags & MASK)
    10      error("Cannot sign packet");
    11      return;
    12  }

Achieving full coverage, even for statement coverage, can be very time consuming. The engineer creating the test case must figure out what inputs [...] frustrating is if it is fundamentally impossible to do so, but this may not be apparent simply by looking at the code. If the program contains unreachable code, then statement coverage is impossible. If it contains redundant conditions (those that are either always true or always false), then MCDC is impossible. Developers can spend hours trying to refine a test case before it is evident that their efforts are pointless.

If the unreachable code or redundant conditions can be brought to the attention of the tester early, then they do not need to waste time in a futile attempt to achieve the impossible. This is what static analysis can do easily and efficiently. Figure 1 shows an example of a report from CodeSonar (see Note 1) illustrating a redundant condition in a sample of code taken from an open-source application. The variable rest, an unaliased integer, must be at least three by line 12. The decrement on that line means it is at least two, so the condition will always be true. The following line is also redundant and shown in a different report.

In this example, all the components of the code relevant to the redundancy are in close proximity so it is likely that a reviewer would have spotted this during a manual review. It would not have been so easy to spot if the code were more complex. If the code had spanned several pages, or if relevant parts had been embedded in function calls or macro invocations, then it would have been difficult to spot. Static analysis is not sensitive to superficial aspects of the code such as its layout, so it would not have been confused.

These kinds of redundancies correlate well with genuine flaws as well; for example, consider the example in Figure 2. This was distilled from a genuine flaw found in a widely used open-source program, and is a redundant condition warning where the tool has deduced that the true branch of the conditional will never be taken. The reason why it concluded so is shown to the left. The first operand to the bitwise AND (the & symbol) is either zero or one as this is the range of the negation operator (the ! symbol). This is what is represented by $temp2. The constant MASK has the value 16. The result of the AND expressions 1&16 and 0&16 are both zero, so the conditional expression is guaranteed to be zero.

The programmer who wrote this [...] to place parentheses around the inner expression. This is a potentially dangerous flaw as it means that the error condition would not be detected, which could result in unpredictable behavior.

When to Use Static Analysis Tools
The best time to use advanced static analysis tools is early in the development cycle. In Holzmann's 10 rules for safety-critical development [5], the most far-reaching rule states that these tools should be used throughout the development process. As well as reducing the cost of development by finding flaws earlier and reducing testing effort, early adoption exerts a force on programmers to write code that is more amenable to analysis, thereby increasing the probability that the tool will find errors. Care should be taken, however, to avoid a risk compensation phenomenon, where programmers use less care because they assume that the static analysis tool will find their mistakes.

If adopted late in the development cycle, static analysis may issue a large number of warnings. The best value is gained if these are all dealt with, either by fixing the code, marking them as false positives, or labeling them as "don't care" if they are believed to be benign. However, if scheduling time to sift through these is not feasible, then an alternative strategy is to operate in a differential mode, where programmers are only told about new warnings. This way they are alerted to flaws in code that they are working with while it remains fresh in their minds.

Conclusion
Advanced static analysis tools offer much to help improve the quality of software. The best tools are easy to integrate into the development cycle, and can yield high-quality results quickly without requiring additional engineering effort. They can be used not just for finding flaws, but also to guide testing activities. They use sophisticated symbolic execution techniques for which engineering trade-offs have been made so that they can generate useful results in a reasonable time. As such, they inevitably have both false positives and false negatives, and so should never be [...]

References
1. [...] Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. ACM Symposium on Principles of Programming Languages. Los Angeles, CA., 1977.
2. Clarke, E.M., O. Grumberg, and D.A. Peled. Model Checking. MIT Press: Cambridge, MA: 1999.
3. Balakrishnan, G., R. Gruian, T. Reps, and T. Teitelbaum. CodeSurfer/x86: A Platform for Analyzing x86 Executables. International Conference on Compiler Construction. 2005.
4. RTCA/DO-178B. Software Considerations in Airborne Systems and Equipment Certification. 1992.
5. Holzmann, G.J. The Power of 10: Rules for Developing Safety-Critical Code. IEEE Computer 2006.

Note
1. GrammaTech's static analysis tool.

About the Author
Paul Anderson, Ph.D., is vice president of engineering at GrammaTech, a spin-off of Cornell University that specializes in static analysis, where he manages GrammaTech's engineering team and is the architect of the company's static analysis tools. He has worked in the software industry for 16 years, with most of his experience focused on developing static analysis, automated testing, and program transformation tools. A significant portion of Anderson's work has involved applying program analysis to improve security. His research on static analysis tools and techniques has been reported in numerous articles, journal publications, book chapters, and international conferences. Anderson has a B.Sc. from King's College, University of London, and his doctorate in computer science from City University, London.

GrammaTech, Inc.
plete test cases covering up to 6-way combinations.

Many testers are familiar with the most basic form of combinatorial testing, all pairs or pairwise testing, in which all possible pairs of parameter values are covered by at least one test [1, 2]. Pairwise testing uses specially constructed test sets that guarantee testing every parameter value interacting with every other parameter value at least once. For example, suppose we had an application that is intended to run on a variety of platforms comprised of five components: an operating system (Windows XP, Apple OS X, Red Hat Linux), a browser (Internet Explorer, Firefox), protocol stack (IPv4, IPv6), a processor (Intel, AMD), and a database (MySQL, Sybase, Oracle), a total of 3 x 2 x 2 x 2 x 3 = 72 possible platforms. With only 10 tests, as shown in Figure 1, it is possible to test every component interacting with every other component at least once, i.e., all possible pairs of platform components. The effectiveness of pairwise testing is based on the observation that software faults often involve interactions between parameters. While some bugs can be detected with a single parameter value, such as a divide-by-zero error, the toughest bugs often can only be detected when multiple conditions are true simultaneously. For example, a router may be observed to fail only for the User Datagram Protocol (UDP) when packet rate exceeds 1.3 million packets per second, a 2-way interaction between protocol type and packet rate. An even more difficult bug might be one which is detected only for UDP when packet volume exceeds 1.3 million packets per second and packet chaining is used, a 3-way interaction between protocol type, packet rate, and chaining option.

Unfortunately, only a handful of tools can generate more complex combinations, such as 3-way, 4-way, or more (we refer to the number of variables in combinations as the combinatorial interaction strength, or simply, interaction strength, e.g., a 4-way combination has 4 variables and thus its interaction strength is 4). The few tools that do generate tests with interaction strengths higher than 2-way may require several days to generate tests [3] because the generation process is mathematically complex. Pairwise testing, i.e., testing 2-way combinations, has come to be accepted as the standard approach to combinatorial testing because it is computationally tractable and can effectively detect many faults. For example, pairwise testing could detect 70 percent to more than 90 percent of software faults for the applications studied in [4].

But if pairwise testing can detect 90 percent of bugs, what interaction strength is needed to detect 100 percent? Surprisingly, we found no evidence that this question had been studied when the National Institute of Standards and Technology (NIST) began investigating software faults in 1996. Results showed that across a variety of domains, all failures could be triggered by a maximum of 4-way to 6-way interactions [5]. As shown in Figure 2, the detection rate increases rapidly with interaction strength. With the NASA application, for example, 67 percent of the failures were triggered by only a single parameter value, 93 percent by 2-way combinations, and 98 percent by 3-way combinations. The detection rate curves for the other applications are similar, reaching 100 percent detection with 4-way to 6-way interactions. That is, six or fewer variables were involved in all failures for the applications studied, so 6-way testing could, in theory, detect all of the failures. While not conclusive, these results suggest that combinatorial testing that exercises high strength interaction combinations can be an effective approach to high-integrity software assurance.

Applying combinatorial testing to real-world software presents a number of challenges. For one of the best algorithms, the number of tests needed for combinatorial coverage of n parameters with v values each is proportional to v^t log n, where t is the interaction strength [3]. Unit test- [...]

Figure 2 (plot): cumulative percent of failures detected versus interaction strength; series include Medical devices, Browser, and Server.

Computing T-Way Combinations of Input Values Using FireEye
The first step in combinatorial testing is to find a set of tests that will cover all t-way combinations of parameter values for the desired combinatorial interaction strength t. This set of tests is known as a covering array. The covering array specifies test data where each row of the array can be regarded as a set of parameter values for an individual test. Collectively, the rows of the [...]

Figure 1: Pairwise Test Configurations
Test  OS    Browser  Protocol  CPU    DBMS
1     XP    IE       IPv4      Intel  MySQL
...
3     XP    IE       IPv6      Intel  Oracle
4     OS X  Firefox  IPv4      AMD    MySQL
5     OS X  IE       IPv4      Intel  Sybase
...
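The platform example can be checked with a short sketch. The Python below builds a pairwise test set for the five components named in the text (3 x 2 x 2 x 2 x 3 = 72 exhaustive platforms) using a simple greedy strategy. The greedy strategy is an illustrative stand-in, not the algorithm used by FireEye, and the function names (`required_pairs`, `pairs_in`, `greedy_pairwise`) are hypothetical. The resulting set may differ in size and content from the 10 tests of Figure 1.

```python
from itertools import combinations, product

# Parameter values from the platform example in the text.
PARAMETERS = {
    "OS": ["XP", "OS X", "RHL"],
    "Browser": ["IE", "Firefox"],
    "Protocol": ["IPv4", "IPv6"],
    "CPU": ["Intel", "AMD"],
    "DBMS": ["MySQL", "Sybase", "Oracle"],
}


def required_pairs(parameters):
    """Every 2-way combination of parameter values that some test must contain."""
    required = set()
    for (n1, values1), (n2, values2) in combinations(sorted(parameters.items()), 2):
        for v1, v2 in product(values1, values2):
            required.add(((n1, v1), (n2, v2)))
    return required


def pairs_in(test):
    """The 2-way combinations contained in one test (a dict of name -> value)."""
    return set(combinations(sorted(test.items()), 2))


def greedy_pairwise(parameters):
    """Repeatedly pick the candidate test that covers the most uncovered pairs."""
    names = list(parameters)
    candidates = [dict(zip(names, combo)) for combo in product(*parameters.values())]
    uncovered = required_pairs(parameters)
    tests = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_in(t) & uncovered))
        tests.append(best)
        uncovered -= pairs_in(best)
    return tests


tests = greedy_pairwise(PARAMETERS)
exhaustive = len(list(product(*PARAMETERS.values())))
print(f"exhaustive: {exhaustive} platforms, pairwise: {len(tests)} tests")
for number, test in enumerate(tests, start=1):
    print(number, *test.values())
```

Because OS and DBMS each have three values, at least nine tests are needed to cover their pairs; the greedy search only guarantees complete coverage, not the smallest possible set.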
with 1,024 for exhaustive coverage. Similar arrays can be generated to cover up to all 6-way combinations. A non-commercial research tool called FireEye [3], developed by NIST and the University of Texas at Arlington¹, makes this possible with much greater efficiency than previous tools. For example, a commercial tool required 5,400 seconds to produce a less-optimal test set than FireEye generated in 4.2 seconds.

Figure 3: 3-way Covering Array for 10 Parameters With Two Values Each
A B C D E F G H I J
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 0 0 0 0 1
...
0 0 1 0 1 0 1 1 1 0
1 1 0 1 0 0 1 0 1 0
0 0 0 1 1 1 0 0 1 1
0 0 1 1 0 0 1 0 0 1
0 1 0 1 1 0 0 1 0 0
1 0 0 0 0 0 0 1 1 1
0 1 0 0 0 1 1 1 0 1

[...] With Expected Outputs Using Nu Symbolic Model Verifier (SMV)
The second step in combinatorial test development is to determine what output should be produced by the system under test for each set of input parameter values, often referred to as the oracle problem in testing. The [...] model checker not only reports this, but also provides a counterexample showing how the claim can be shown false. As will be seen in the illustrative example, this gives us the ability to match every set of input test data with the result that the system should produce for that input data. Figure 4 outlines the process.
The model checker thus automates the work that normally must be done by a human tester determining what the cor- [...] correct output for each test can also be used. [...] at hand, but model checking can be effective in testing protocols, access control, or other applications where there is a state machine, unified modeling language state chart, or other formal model available.

Figure 4 (diagram): input values, covering array generator, covering array [...]

Illustrative Example
Here we present a small example of an access control system. The rules of the system are a simplified multi-level security system, followed by a step-by-step construc- [...] and each file has a classification level f_l.

System Model
This system is easily modeled in the language of the NuSMV model checker as a simple two-state finite state machine. Other tools could be used, but we illustrate the test [...]
user interface if desired) and their values in a system definition file that will be used as input to the covering array generator FireEye with the following format:

Figure 6: Model Parameters and Values
u_l: 0,1,2
f_l: 0,1,2
act: rd, wr

After the system definition file is saved, we run FireEye, in this case specifying 2-way interactions. FireEye produces the output shown in Figure 7. Each test configuration defines a set of values for u_l, f_l, and act. The complete test set ensures that all 2-way combinations of parameter values have been covered.

Figure 7: FireEye Output Test Values
Test  u_l  f_l  act
1     0    0    rd
2     0    1    wr
3     0    2    rd
4     1    0    wr
5     1    1    rd
6     1    2    wr
7     2    0    rd
8     2    1    wr
9     2    2    wr

Model Claims With Covering Array Values Inserted
The next step is to assign values from the covering array to parameters used in the model. For each test, we write a claim that the expected result will not occur. The model checker determines combinations that would disprove these claims, outputting these as counterexamples. Each counterexample can then be converted to a test with known expected result. For example, for Test 1 the parameter values are:

u_l = 0 & f_l = 0 & act = rd

For each of the nine configurations in the covering array (Figure 7), we create a SPEC claim of the form: SPEC AG((covering array values) -> AX !(access = result)); This process is repeated for each possible result, in this case either GRANT or DENY, so we have nine claims for each of the two results. The model checker is able to determine, using the model defined previously, which result is the correct one for each set of input values, producing a total of nine tests.

Excerpt:
SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = GRANT));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = GRANT));
etc.
SPEC AG((u_l = 0 & f_l = 0 & act = rd) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 1 & act = wr) -> AX !(access = DENY));
SPEC AG((u_l = 0 & f_l = 2 & act = rd) -> AX !(access = DENY));
etc.

Generating Counterexamples With Model Checker
NuSMV produces counterexamples where the input values would disprove the claims specified in the previous section. Each of these counterexamples is, thus, a set of test data that would have the expected result of GRANT or DENY. For each SPEC claim, if this set of values cannot in fact lead to the particular result, the model checker indicates that this is true. For example, for the configuration below, the claim that access will not be granted is true, because the user's clearance level (u_l = 0) is below the file's level (f_l = 2):

-- specification AG (((u_l = 0 & f_l = 2) & act = rd) -> AX !(access = GRANT)) is true

If the claim is false, the model checker indicates this and provides a trace of parameter input values and states that will prove it is false. In effect, this is a complete test case, i.e., a set of parameter values and an expected result. It is then simple to map these values into complete test cases in the syntax needed for the system under test. An excerpt from NuSMV output is shown in Figure 8.
The model checker finds that six of the input parameter configurations produce a result of GRANT and three produce a DENY result, so at the completion of this step we have successfully matched up each input parameter configuration with the result that should be produced by the system under test.
At first, the method previously described may seem backward. Instead of negating each possible result, why not simply produce tests from model checker output such as "specification AG (((u_l = 0 & f_l = 2) & act = rd) -> AX (access = DENY)) is true"? Such a procedure would work fine for this simple example, but more sophisticated testing may require more information. Note that if the claim is true, the model checker simply reports the fact, while if it is false, a trace of inputs and internal states is produced to show how the claim fails. Some testing may require information on internal states or variable values, and the previous procedure provides this information.

Shell Script Post-Processing to Produce Complete Tests
The last step is to use a post-processing tool that reads the output of the model checker and generates a set of test inputs with expected results. The post-processor strips out the parameter names and values, giving tests that can be applied to the system under test. Simple scripts are then used to convert the test cases into input for a suitable test harness. The tests produced are shown in Figure 9 (see next page).

Conclusion
While tests for this trivial example could easily have been constructed manually, the procedures introduced in this tutorial can and have been used to produce tens of thousands of complete test cases in a few minutes once the SMV model [...]

Figure 8: Counterexamples (excerpt)
-- specification AG (((u_l = 0 & f_l = 0) & act = rd) -> AX !(access = GRANT)) is false
-- as demonstrated by the following execution sequence
Trace Description: CTL Counterexample
Trace Type: Counterexample
-> State: 1.1 <-
u_l = 0
f_l = 0
act = rd
access = START_
-> Input: 1.2 <-
-> State: 1.2 <-
access = GRANT
etc.
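The claim-writing and post-processing steps described in this section can be sketched in a few lines. The Python below is an illustration only: it stands in for the shell scripts the article mentions, and the names `spec_claim`, `all_claims`, `parse_counterexample`, and `assumed_oracle` are hypothetical. The access rules in `assumed_oracle` are an assumption (read allowed when u_l >= f_l, write allowed when f_l >= u_l), since the rules themselves are not spelled out in this excerpt; they were chosen because they agree with the two results quoted in the text and with the stated split of six GRANT and three DENY results. In the real procedure the model checker, not this function, decides the expected result.

```python
import re

# The nine 2-way test configurations (u_l, f_l, act) from the covering array.
CONFIGURATIONS = [
    (0, 0, "rd"), (0, 1, "wr"), (0, 2, "rd"),
    (1, 0, "wr"), (1, 1, "rd"), (1, 2, "wr"),
    (2, 0, "rd"), (2, 1, "wr"), (2, 2, "wr"),
]
RESULTS = ("GRANT", "DENY")


def spec_claim(u_l, f_l, act, result):
    """Claim that `result` will NOT occur for this configuration."""
    return (f"SPEC AG((u_l = {u_l} & f_l = {f_l} & act = {act}) "
            f"-> AX !(access = {result}));")


def all_claims(configurations):
    """One negated claim per possible result and per configuration (2 x 9 = 18)."""
    return [spec_claim(*c, r) for r in RESULTS for c in configurations]


# Matches lines of the form "name = value" in a counterexample trace.
ASSIGNMENT = re.compile(r"^[ \t]*(\w+) = (\w+)[ \t]*$", re.MULTILINE)


def parse_counterexample(trace):
    """Return (u_l, f_l, act, expected access) from one counterexample trace."""
    state = dict(ASSIGNMENT.findall(trace))  # a later assignment overwrites an earlier one
    return int(state["u_l"]), int(state["f_l"]), state["act"], state["access"]


def assumed_oracle(u_l, f_l, act):
    """ASSUMED rules (not in the text): read at or below own level, write at or above it."""
    if act == "rd":
        return "GRANT" if u_l >= f_l else "DENY"
    return "GRANT" if f_l >= u_l else "DENY"


# Trace text in the layout of the Figure 8 excerpt.
TRACE = """\
-- specification AG (((u_l = 0 & f_l = 0) & act = rd)
-> AX !(access = GRANT)) is false
-- as demonstrated by the following execution sequence
-> State: 1.1 <-
u_l = 0
f_l = 0
act = rd
access = START_
-> Input: 1.2 <-
-> State: 1.2 <-
access = GRANT
"""

for claim in all_claims(CONFIGURATIONS)[:3]:
    print(claim)
print(parse_counterexample(TRACE))
expected = [assumed_oracle(*c) for c in CONFIGURATIONS]
print(expected.count("GRANT"), "GRANT,", expected.count("DENY"), "DENY")
```

Each parsed counterexample is already a complete test case: three input values plus the access result the system under test should produce.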
[...] requirements and was developed according to specified development procedures but resulted in unusable software. Scenario 7 may seem like heresy to many in the community of software quality practitioners. It is not; it simply recognizes that requirements elicitation is far from a perfect science and dispels the myth that simply following common sense dos and don'ts (as spelled out in a development process plan) guarantees good enough software [8].
Note that only four of these scenarios yield good enough software: 2, 4, 6, and 8. The other four provide a product that is not usable for its target environment, and that brings us back to the discussion of why scoping the target environment as precisely as possible is an important piece of what software quality means.

Scenario  Meets Requirements  Satisfies Development Processes  Fit for Purpose
1         No                  No                               No
2         No                  No                               Yes
3         No                  Yes                              No
4         No                  Yes                              Yes
5         Yes                 No                               No
6         Yes                 No                               Yes
7         Yes                 Yes                              No
8         Yes                 Yes                              Yes

[...] ensure that (1) it produces accurate and reliable output, (2) it produces the needed output in a timely manner, and (3) it produces the output in a secure and private manner. These three criteria simply state that you get the right results at the right time at the correct level of security. These are the next three considerations that cannot be ignored and must be incorporated into what software quality means.
While each of these is intuitive, none is precise enough. The family of attributes referred to as the ilities is a good starting point to help increase that precision [9]. This family includes behavioral characteristics such as reliability, performance, safety, security, availability, fault-tolerance, etc. (These attributes are also sometimes termed non-functional [...]

Table 2: Ility Combinations
Category Reliability Performance Safety Security Availability Fault-tolerance
1 Yes Yes Yes Yes Yes Yes
2 Yes No
3 No Yes
4 No Yes
5 No Yes
6 No Yes
7 Yes No
8 Yes No
9 No Yes
10 Yes No
11 Yes No Yes
12 Yes No Yes
13 Yes Yes
14 No No No No No No

[...] and determine what they represent:
Category 1. Suggests that the software is reliable, has good performance, does not trigger unsafe events (e.g., in a transportation control system), has appropriate levels of security built in, has good availability and thus does not suffer from frequent failures resulting in downtime, and is resilient to internal failures (i.e., fault tolerant). Is all of this possible in a single software system?
Category 2. Suggests that the software offers reliable behavior, but suffers from the likelihood of producing outputs that send the system receiving them into an unsafe mode. This would represent a safety-critical system where hazardous failures are unacceptable; hazardous failures are categorized differently for such systems than failures that do not facilitate possible disastrous loss-of-life or loss-of-property consequences. (Note that software by itself is never unsafe; however, software is often referred to as unsafe if it produces outputs to a system that put the system into an unsafe mode. Safety is a system property, not a software property.) A classic example of a reliable product that is unsafe is placing a functioning toaster into a bathtub of water with the cord still connected; the toaster is reliable, but it is not safe to go near.
Category 3. Suggests that the software behaves so unreliably when executed that it cannot put the system into an unsafe mode. An example here would be that the software gets hung up in a loop and the safety functionality is never invoked.
Category 8. Suggests safe but not secure software behavior. This is quite realistic for a safety-critical system with no security concerns. Note that the interesting aspect of this category is how safety and security are defined. Many people use these terms interchangeably, which is incorrect.
Category 11. Suggests that the software behaves reliably and has good availability, but lacks adequate security precautions. Many systems suffer from this problem.
Category 12. Suggests that the software behaves reliably, is extremely slow, but has adequate security. It makes one wonder if the system is so slow that it is effectively unusable, and thus secure, since it would take too long to break in.
Category 13. Suggests high levels of security and high levels of performance. In certain situations that is plausible; however, security typically kills performance and vice versa.
Category 14. This is the easiest combination to achieve. Anyone can build a useless system.
The main point here is that the aforementioned high-level attributes, (1) produce accurate and reliable output, (2) produce the needed output in a timely manner, and (3) produce the output in a secure and private manner, are actually composed of the lower-level ilities. Another important point not to overlook is the fact that some of the ilities are not compatible with one another. An example of this can easily be found using fault tolerance and testability. A final important point is that some combinations of the ilities are simply counterintuitive, such as a system that is safe but unreliable.
One last thing to note: It is vital to get solid definitions for the ilities and to know which ones are quantifiable. For example, reliability and performance are quantifiable; security and safety are not. This makes it far easier to make statements such as "we have very high reliability but an unknown level of security."

The Shall Nots
There is yet another layer in the notion of fit for use that deals with negative functional requirements. Think of a negative requirement as "the software shall not do X," as opposed to a functional requirement stating that "the software shall do X."
Negative requirements are far more difficult to elicit than regular requirements. Why? Because humans are not programmed to anticipate and enumerate all of the bad circumstances that can pop up and that we need protection against; we are instead programmed to think about the good things we want the software to do.
For certain types of systems, particularly safety-critical systems, enumerating negative requirements is a necessity. And for software requiring security capabilities, security rules and policies are the equivalent of negative requirements. For example, a negative security requirement could be that the software shall never open access to a particular channel unless it can be guaranteed that the information passing through the channel is moving between trusted entities. The difficulty in defining shall nots for security is that we cannot imagine all of the different forms of malicious attacks that are being invented on-the-fly, and if we cannot imagine those attacks, we likely will not prevent them.
Before leaving the topic of negative functional requirements, it is worth mentioning an interesting relationship between them and the environment. So far, we have only mentioned traditional inputs and phantom users as players in the environment. Traditional inputs are those that the software expects to receive during operation. But there are two other types of inputs worth mentioning: malicious illegal and non-malicious illegal. A malicious illegal input is one that someone deliberately feeds into the software to attack a system, and a non-malicious illegal input is simply an input that the system designers do not want the software to accept but that has no malicious intent. In both cases, filtering on either type of input can be useful to ensure that certain inputs do not become a part of the environment and in doing so ensure that negative functional requirements are enforced.

Time
The next layer in our quest for software quality is time. Software has fixed longevity; it can be extended, as we learned from Y2K, but not indefinitely.
One of the easiest ways to explain why time fits here is to look at the situation where a software package operates correctly on Monday but does not operate correctly on Tuesday. Further, the software package was not modified between these days. (This is the classic problem that quickly carves down the number of freshman computer science majors.) Why has this problem occurred?
It all goes back to the importance of environment in the understanding of software quality. Earlier we defined the environment as inputs with probabilities of selection, hardware configurations, access to memory, operating systems, attached databases, and whether other background processes were over-indulging in resources, etc.
But what was not mentioned was calendar time. Environment is also a function of time. As time moves forward, other pieces of the environment change. And so while all effort and expense can be levied toward what we perceive is evidence supporting the claim that we have good enough software, we need to recognize [...]
with these different considerations, becomes a far more interesting topic, and one that will continue to perplex us for [...]

Phone: (703) 414-3842
Fax: (703) 414-8250
E-mail: j.voas@ieee.org

To request back issues on topics not listed above, please contact <stsc.customerservice@hill.af.mil>.