Functional testing
From Wikipedia, the free encyclopedia
Functional testing is a type of black box testing that bases its test cases on the specifications of the software
component under test. Functions are tested by feeding them input and examining the output; internal
program structure is rarely considered (unlike in white-box testing).[1]
Functional testing differs from system testing in that functional testing "verif[ies] a program by checking it
against ... design document(s) or specification(s)", while system testing "validate[s] a program by checking it
against the published user or system requirements" (Kaner, Falk, Nguyen 1999, p. 52).
Functional testing typically involves five steps[citation needed]:
1. The identification of functions that the software is expected to perform
2. The creation of input data based on the function's specifications
3. The determination of output based on the function's specifications
4. The execution of the test case
5. The comparison of actual and expected outputs
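These steps can be sketched in code. The following Python unittest example is illustrative only; the function under test and its discount rule are hypothetical stand-ins for a specified component:

    import unittest

    def discounted_price(total):
        """Hypothetical function under test: its specification says
        orders of 100 or more get 10% off."""
        if total >= 100:
            return total * 0.9
        return total

    class DiscountSpecTest(unittest.TestCase):
        # Inputs and expected outputs come from the specification,
        # not from reading the implementation (black box).
        def test_below_threshold(self):
            self.assertEqual(discounted_price(99), 99)

        def test_at_threshold(self):
            self.assertAlmostEqual(discounted_price(100), 90.0)

    if __name__ == "__main__":
        unittest.main()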
See also
Non-functional testing
Acceptance testing
Regression testing
System testing
Software testing
Integration testing
Unit testing
Database testing
References
1. ^ Kaner, Falk, Nguyen. Testing Computer Software. Wiley Computer Publishing, 1999, p. 42. ISBN 0-471-35846-0.
External links
JTAG for Functional Test without Boundary-scan (http://www.corelis.com/blog/index.php/blog/2011/01/10/jtag-for-functional-test-without-boundaryscan)
Functionality Testing
What is Functionality Testing?
Functionality testing is employed to verify whether your product meets the intended specifications and functional requirements laid out in your development documentation.
What is the purpose of Functionality Testing?
As competition in the software and hardware development arena intensifies, it becomes critical to deliver products that are virtually bug-free. Functionality testing helps your
company deliver products with a minimum amount of issues to an increasingly sophisticated pool of end users. Potential purchasers of your products may find honest and often
brutal product reviews online from consumers and professionals, which might deter them from buying your software. nResult will help ensure that your product functions as
intended, keeping your service and support calls to a minimum. Let our trained professionals find functional issues and bugs before your end users do!
How can nResult help you deliver high quality products that are functionally superior to products offered by your competition?
We offer several types of functional testing techniques:
Ad Hoc: Takes advantage of individual testing talents based upon product goals, level of user capabilities, and possible areas and features that may create confusion. The tester generates test cases quickly, on the spur of the moment.
Exploratory: The tester designs and executes tests while learning the product. Test design is organized by a set of concise patterns designed to assure that testers don't miss anything of importance.
Combination: The tester performs a sequence of events using different paths to complete tasks. This can uncover bugs related to order of events that are difficult to find using other methods.
Scripted: The tester uses a test script that lays out the specific functions to be tested. A test script can be provided by the customer/developer or constructed by nResult, depending on the needs of your organization.
Let nResult ensure that your hardware or software will function as intended. Our team will check for any anomalies or bugs in your product, through any or all stages of
development, to help increase your confidence level in the product you are delivering to market. nResult offers detailed, reasonably priced solutions to meet your testing needs.
Introduction to
Performance Testing
First Presented for:
PSQT/PSTT Conference
Washington, DC May, 2003
Scott Barber
Chief Technology Officer
PerfTestPlus, Inc.
www.PerfTestPlus.com
Agenda
Why Performance Test?
What is Performance related testing?
Intro to Performance Engineering Methodology
Where to go for more info
Summary / Q&A
Speed
User Expectations
Experience
Psychology
Usage
System Constraints
Hardware
Network
Software
Costs
Speed can be expensive!
Scalability
How many users
Database capacity
File Server capacity
Back-up Server capacity
Data growth rates
Stability
What happens if
Confidence
If you know what the performance is
[Diagram: the detect/diagnose/resolve cycle: Detect (What?), Diagnose (Why?), Resolve; repeat until the problem is resolved]
Performance Validation
Performance validation is the process by which software is
tested with the intent of determining if the software meets
pre-existing performance requirements. This process aims
to evaluate compliance.
Performance Testing
Performance testing is the process by which software is
tested to determine the current system performance. This
process aims to gather information about current
performance, but places no value judgments on the
findings.
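As an illustrative sketch of this definition, the following Python snippet gathers raw response-time data for an operation and reports summary statistics without judging them against any requirement; the workload shown (a lambda summing a range) is a placeholder for the real operation under test:

    import statistics
    import time

    def measure(operation, repetitions=1000):
        """Collect raw response times for an operation; report, don't judge."""
        samples = []
        for _ in range(repetitions):
            start = time.perf_counter()
            operation()
            samples.append(time.perf_counter() - start)
        samples.sort()
        return {
            "median_s": statistics.median(samples),
            "p95_s": samples[int(0.95 * len(samples)) - 1],
            "max_s": samples[-1],
        }

    # Example: measure a stand-in workload (replace with the real operation).
    print(measure(lambda: sum(range(10_000))))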
Performance Engineering
Performance engineering is the process by which software is
tested and tuned with the intent of realizing the required
performance. This process aims to optimize the most
important application performance trait, user experience.
Performance engineering:
Is iterative.
Has clear goals, but fuzzy end points.
Includes the effort of tuning the application.
Focuses on multiple scenarios with pre-determined
standards.
Heavily involves the development team.
Occurs concurrently with software development.
Intro to PE Methodology
Evaluate System
Develop Test Assets
Baselines and Benchmarks
Analyze Results
Tune
Identify Exploratory Tests
Execute Scheduled Tests
Complete Engagement
Evaluate System
Determine performance requirements.
Identify expected and unexpected user activity.
Determine test and/or production architecture.
Identify non-user-initiated (batch) processes.
Identify potential user environments.
Define expected behavior during unexpected circumstances.
Analyze Results
Most important.
Most difficult.
Focuses on:
Have the performance criteria been met?
What are the bottlenecks?
Who is responsible to fix those bottlenecks?
Decisions.
Tune
Engineering only.
Highly collaborative with development team.
Highly iterative.
Usually, performance engineer supports and validates while
developers/admins tune.
Complete Engagement
Document:
Actual Results
Tuning Summary
Known bottlenecks not tuned
Other supporting information
Recommendation
Summary
We test performance to:
Evaluate Risk.
Determine system capabilities.
Determine compliance.
E-mail: sbarber@perftestplus.com
Web Site: www.PerfTestPlus.com
Helsinki 26.09.2006
Seminar paper
University of Helsinki
Department of Computer Science
Faculty of Science
Xiang Gan
Performance is one of the most important aspects of software quality. It
indicates how well a software system or component meets its requirements for timeliness. To
date, however, little significant progress has been made on software performance testing. This
paper introduces two software performance testing approaches: workload
characterization and early performance testing of distributed applications.
ACM Computing Classification System (CCS):
A.1 [Introductory and Survey],
D.2.5 [Testing and Debugging]
Contents
1 Introduction
4 Conclusion
References
1 Introduction
Although the functionality supported by a software system is obviously important, it
is usually not the only concern. Individuals and society as a whole may face significant
breakdowns and incur high costs if the system cannot meet the quality-of-service
requirements of the non-functional aspects expected from it, for instance performance,
availability, security and maintainability.
Performance is an indicator of how well a software system or component meets its
requirements for timeliness. There are two important dimensions to software
performance timeliness: responsiveness and scalability [SmW02]. Responsiveness is
the ability of a system to meet its objectives for response time or throughput. The
response time is the time required to respond to stimuli (events). The throughput of a
system is the number of events processed in some interval of time [BCK03].
Scalability is the ability of a system to continue to meet its response time or
throughput objectives as the demand for the software function increases [SmW02].
As Weyuker and Vokolos argued [WeV00], the primary problems that projects report
after field release are usually not system crashes or incorrect system responses, but
rather system performance degradation or problems handling the required system
throughput. If queried, it often turns out that although the software system has gone
through extensive functionality testing, it was never really tested to assess its
expected performance. They also found that performance failures can be roughly
classified into three categories.
This seminar paper concentrates on introducing two software performance testing
approaches. Section 2 introduces a workload characterization approach, which
requires a careful collection of data for significant periods of time in the production
environment. In addition, the importance of clear performance requirements written
in requirement and specification documents is emphasized, since they are the
fundamental basis for carrying out performance testing. Section 3 focuses on an
approach to test the performance of distributed software applications as early as
possible in the software engineering process, since it is obviously a large
overhead for the development team to fix performance problems at the end of the
whole process. Even worse, it may be impossible to fix some performance problems
without sweeping redesign and re-implementation, which can eat up a lot of time and
money. Section 4 concludes the paper.
The workload characterization approach described by Alberto Avritzer and Joe
Kondek [AKL02] consists of two steps, described below.
The first step is to model the software system. Most industrial software systems
are too complex for all of their possible characteristics to be handled, so modeling
is necessary. The goal of this step is thus to establish a simplified version
of the system in which the key parameters have been identified. It is essential that
the model be as close to the real system as possible, so that the data collected
from it will realistically reflect the true system's behavior. At the same time, it must be
simple enough that collecting the necessary data remains feasible.
The second step, once the system has been modeled and the key parameters identified,
is to collect data while the system is in operation. According to the paper [AKL02],
this activity should usually be done for periods of two to twelve months. Following
that, the data must be analyzed and a probability distribution determined.
Although the input space is, in theory, enormous, the frequency distribution is highly
non-uniform, and experience has shown that a relatively small number of inputs
actually occur during the period of data collection. The paper [AKL02] showed that
it is quite common for only several thousand inputs to correspond to more than 99%
of the probability mass associated with the input space. This means that a very
accurate picture of the performance that users of the system tend to see in the field
can be drawn by testing only this relatively small number of inputs.
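A minimal sketch of that idea in Python, with a hypothetical operation log: sort the observed inputs by frequency and keep the smallest set that covers 99% of the probability mass.

    from collections import Counter

    def core_inputs(observed, mass=0.99):
        """Return the smallest set of inputs covering `mass` of the probability."""
        counts = Counter(observed)
        total = sum(counts.values())
        covered, core = 0, []
        for inp, n in counts.most_common():
            core.append(inp)
            covered += n
            if covered / total >= mass:
                break
        return core

    # Hypothetical operation log: a few inputs dominate the traffic.
    log = ["login"] * 700 + ["search"] * 250 + ["export"] * 40 + ["admin"] * 10
    print(core_inputs(log))   # ['login', 'search', 'export'] covers >= 99%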
2.3
After performing the workload characterization and determining the paramount
system characteristics that require data collection, that information is used to design
performance test cases that reflect field production usage of the system. The
following prescriptions were defined by Weyuker and Vokolos [WeV00]. One of the
most interesting points in this list of prescriptions is that they also defined how to
design performance test cases when detailed historical data is unavailable. Their
situation at the time was that a new platform had been purchased but was not yet
available, while the software had already been designed and written explicitly for the
new hardware platform. The goal of such work is to determine whether there are
likely to be performance problems once the hardware is delivered and the software is
installed and running with the real customer base.
Typical steps to form performance test cases are as follows:
- identify the software processes that directly influence the overall performance of the system;
- for each process, determine the input parameters that will most significantly influence the performance of the system; it is important to limit the parameters to the essential ones so that the set of test cases selected will be of manageable size.
It is, however, important to recognize that this list cannot be treated as a precise
prescription for test cases, since every system is different.
3 Early performance testing with distributed applications
Testing techniques are usually applied towards the end of a project. However, most
researchers and practitioners agree that the most critical performance problems, as a
quality of interest, depend upon decisions made in the very early stages of the
development life cycle, such as architectural choices. Although iterative and
incremental development has been widely promoted, the situation with respect to
testing techniques has not changed much.
With the rapid advance of distributed component technologies, such as J2EE
and CORBA, distributed systems are no longer built from scratch [DPE04]. Modern
distributed systems are often built on top of middleware. As a result, when the
architecture is defined, a certain part of the implementation of a class of distributed
applications is already available. It was therefore argued that this enables performance
testing to be successfully applied at such early stages.
The method proposed by Denaro, Polini and Emmerich [DPE04] is based upon the
observation that the middleware used to build a distributed application often
determines the overall performance of the application. However, they also noted that
it is the coupling between the middleware and the application architecture that
determines the actual performance; the same middleware may perform quite
differently in the context of different applications. Based on this observation,
architecture designs were proposed as a tool to derive application-specific
performance test cases, which can be executed on the early available middleware
platform on which a distributed application is built. This allows measurements of
performance to be made at a very early stage of the development process.
3.1
The detailed contents in each phase are discussed in the following sub-sections.
3.1.1
First of all, the design of functional test cases is entirely different from that for
performance testing, as already indicated in the previous section. Moreover, for
performance testing of distributed applications, the main parameters involved are
much more complicated than those described before. Table 1 is excerpted from the
paper [DPE04] to illustrate this point.
the relative interactions in distributed settings according to the place where they
occur. This taxonomy is far from complete; however, it was believed that such a
taxonomy of distributed interactions is key to using this approach. The next step is
the definition of appropriate metrics to evaluate the performance relevance of the
available use-cases according to the interactions that they trigger.
3.1.3 Generating stubs
To actually implement the test cases, one must address the problem that not all of the
application components which participate in the use-cases are available in the early
stages of development. Stubs should be used in place of the missing components.
Stubs are fake versions of components that can be used instead of the corresponding
components for instantiating the abstract use-cases. Stubs only ensure that the
distributed interactions happen as specified and that the other components are
coherently exercised.
The main hypothesis of this approach is that performance measurements in the
presence of the stubs are decent approximations of the actual performance of the
final application [DPE04]. This results from the observation that the available
components, for instance middleware and databases, embed the software that mainly
impacts performance. The coupling between such implementation support and the
application-specific behavior can be extracted from the use-cases, while the
implementation details of the business components remain negligible.
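A toy sketch of the idea in Python (the component and method names are invented): the stub honors the interface of a missing business component and returns canned data, so the surrounding interaction path can be timed before the real component exists.

    import time

    class BillingStub:
        """Stub for an unfinished component: it honors the interface,
        returns canned data, and adds no business logic of its own."""
        def invoice(self, order_id):
            return {"order": order_id, "amount": 0.0}  # canned result

    def place_order(component, order_id):
        # The calls into the already-available middleware/database layer
        # would sit here; the stub only replaces the missing component.
        return component.invoice(order_id)

    stub = BillingStub()
    start = time.perf_counter()
    for i in range(10_000):
        place_order(stub, i)
    print("elapsed:", time.perf_counter() - start, "s")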
3.1.4
Building the support for test execution involves mostly technical problems, provided
the scientific problems raised in the previous three sub-sections have been solved. In
addition, several aspects, for example the deployment and implementation of
workload generators and the execution of measurements, can be automated.
Conclusion
All in all, two software performance testing approaches were described in this paper.
The workload characterization approach can be treated as a traditional performance
testing approach: it requires the careful collection of a series of data in the production
field, and it can only be applied at the end of the project. In contrast, the early
performance testing approach for distributed software applications is more novel,
since it encourages performing performance testing early in the development
process, namely when the architecture is defined. Although it is not yet a very mature
approach and more research needs to be conducted on it according to its advocates
[DPE04], its future looks promising, since it allows performance problems to be
fixed as early as possible, which is quite attractive.
Several other aspects also need to be discussed. First of all, there has been very little
research published in the area of software performance testing. For example, when
one entered "software performance testing" in the search field of IEEE Xplore, only
3 results were returned at the time this paper was written. Such a situation indicates
that the field of software performance testing as a whole is only in its initial stage and
needs much more emphasis in the future. Secondly, the importance of requirements
and specifications is discussed in this paper. The fact, however, is that usually no
performance requirements are provided, which means that there is no precise way of
determining whether or not the software performance is acceptable. Thirdly, a
positive trend is that software performance, as an important quality, is increasingly
emphasized during the development process. Smith and Williams [SmW02] proposed
Software Performance Engineering (SPE), a systematic, quantitative approach to
constructing software systems that meet performance objectives. It aids in tracking
performance throughout the development process and prevents performance
problems from emerging late in the life cycle.
References
AKL02  Avritzer, A., Kondek, J., Liu, D., Weyuker, E.J., Software performance testing based on workload characterization. Proc. of the 3rd international workshop on software and performance, Jul. 2002, pp. 17-24.
AvW04  Avritzer, A., Weyuker, E.J., The role of modeling in the performance testing of e-commerce applications. IEEE Transactions on Software Engineering, 30, 12, Dec. 2004, pp. 1072-1083.
BCK03  Bass, L., Clements, P., Kazman, R., Software Architecture in Practice, 2nd edition. Addison-Wesley, 2003.
DPE04  Denaro, G., Polini, A., Emmerich, W., Early performance testing of distributed software applications. Proc. of the 4th international workshop on software and performance, Jan. 2004, pp. 94-103.
MMP00
Mus93
SmW02  Smith, C.U., Williams, L.G., Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Addison-Wesley, 2002.
VoW98
WeV00  Weyuker, E.J., Vokolos, F.I., Experience with performance testing of software systems: issues, an approach, and case study. IEEE Transactions on Software Engineering, 26, 12, Dec. 2000, pp. 1147-1156.
Abstract
Software reliability engineering is focused on engineering techniques for developing and maintaining software systems whose reliability can be quantitatively evaluated. In order to estimate as well as to predict the reliability of software systems, failure data need to be properly measured by various means during software development and operational phases. Moreover, credible software reliability models are required to track underlying software failure processes for accurate reliability analysis and forecasting. Although software reliability has remained an active research subject over the past 35 years, challenges and open questions still exist. In particular, vital future goals include the development of new software reliability engineering paradigms that take software architectures, testing techniques, and software failure manifestation mechanisms into consideration. In this paper, we review the history of software reliability engineering, the current trends and existing problems, and specific difficulties. Possible future directions and promising research subjects in software reliability engineering are also addressed.
1. Introduction
Software permeates our daily life. There is probably
no other human-made material which is more
omnipresent than software in our modern society. It
has become a crucial part of many aspects of society:
home appliances, telecommunications, automobiles,
airplanes, shopping, auditing, web teaching, personal
entertainment, and so on. In particular, science and
technology demand high-quality software for making
improvements and breakthroughs.
The size and complexity of software systems have
grown dramatically during the past few decades, and
the trend will certainly continue in the future. The data
from industry show that the size of the software for
2. Historical software reliability engineering techniques
2.2. Software reliability measurement and models
[Figure: software reliability engineering process: develop an operational profile, continue testing until the reliability objective is met, then start to deploy, validate reliability in the field, and feed the results back to the next release]
Findings relating code coverage to reliability:
Positive (Horgan (1994) [17]; Frankl (1988) [16]; Rapps (1988) [38]; Chen (1992) [13]; Wong (1994); Frate (1995)):
- High code coverage brings high software reliability and low fault rate.
- A correlation between code coverage and software reliability was observed.
- The correlation between test effectiveness and block coverage is higher than that between test effectiveness and the size of the test set.
- An increase in reliability comes with an increase in at least one code coverage measure, and a decrease in reliability is accompanied by a decrease in at least one code coverage measure.
- Code coverage contributes to a noticeable amount of fault coverage.
Negative:
- The testing result for published data did not support a causal dependency between code coverage and fault coverage.
4.5. Reliability for emerging software applications
5. Conclusions
As the cost of software application failures grows and
as these failures increasingly impact business
performance, software reliability will become
progressively more important. Employing effective
software reliability engineering techniques to improve
product and process reliability would be in the industry's
best interests, as well as a major challenge. In this paper,
we have reviewed the history of software reliability
engineering, the current trends and existing problems,
and specific difficulties. Possible future directions and
promising research problems in software reliability
engineering have also been addressed. We have laid
out the current and possible future trends for software
reliability engineering in terms of meeting industry and
customer needs. In particular, we have identified new
software reliability engineering paradigms by taking
software architectures, testing techniques, and software
failure manifestation mechanisms into consideration.
Some thoughts on emerging software applications have
also been provided.
References
[1] S. Amasaki, O. Mizuno, T. Kikuno, and Y. Takagi, "A Bayesian Belief Network for Predicting Residual Faults in Software Products," Proceedings of the 14th International Symposium on Software Reliability Engineering (ISSRE 2003), November 2003, pp. 215-226.
[2] ANSI/IEEE, Standard Glossary of Software Engineering Terminology, STD-729-1991, ANSI/IEEE, 1991.
[3] L. Baresi, E. Nitto, and C. Ghezzi, Toward Open-World
Software: Issues and Challenges, IEEE Computer, October
2006, pp. 36-43.
[4] A. Bertolino, Software Testing Research: Achievements,
Challenges, Dreams, Future of Software Engineering 2007,
L. Briand and A. Wolf (eds.), IEEE-CS Press, 2007.
[5] J. Bishop and N. Horspool, Cross-Platform
Development: Software That Lasts, IEEE Computer,
October 2006, pp. 26-35.
[6] L. Briand and D. Pfahl, Using Simulation for Assessing
the Real Impact of Test Coverage on Defect Coverage,
[19] C.Y. Huang, M.R. Lyu, and S.Y. Kuo, "A Unified
Scheme of Some Non-Homogeneous Poisson Process
Models for Software Reliability Estimation," IEEE
Transactions on Software Engineering, vol. 29, no. 3, March
2003, pp. 261-269.
Software reliability testing is a field of testing that deals with checking the ability of software to
function under given environmental conditions for a particular amount of time, taking into account the precision
of the software. In software reliability testing, problems are discovered regarding the software design and
functionality, and assurance is given that the system meets all requirements. Software reliability is the
probability that software will work properly in a specified environment and for a given time:
Probability = (number of cases in which we find a failure) / (total number of cases under consideration)
Using this formula, the failure probability is calculated by testing a sample of all available input states. The set of all
possible input states is called the input space. To find the reliability of software, we need to find the output space from the
given input space and software.[1]
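As an illustration of this formula, the following Python sketch estimates the failure probability over a random sample of the input space; the program under test, its oracle, and the seeded defect are all hypothetical:

    import random

    def estimate_failure_probability(software, input_sample, oracle):
        """Run the software on a sample of inputs; the oracle gives the
        expected output for each input."""
        failures = sum(1 for x in input_sample if software(x) != oracle(x))
        return failures / len(input_sample)

    # Hypothetical program under test, with a seeded defect below -5.
    program = lambda x: abs(x) if x >= -5 else -x - 1   # wrong for x < -5
    expected = lambda x: abs(x)
    sample = [random.randint(-10, 10) for _ in range(10_000)]
    print("estimated failure probability:",
          estimate_failure_probability(program, sample, expected))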
Contents
1 Overview
2 Objective of reliability testing
2.1 Secondary objectives
2.2 Points for defining objectives
3 Need of reliability testing
4 Types of reliability testing
4.1 Feature test
4.2 Load test
4.3 Regression test
5 Tests planning
5.1 Steps for planning
5.2 Problems in designing test cases
6 Reliability enhancement through testing
6.1 Reliability growth testing
6.2 Designing test cases for current release
7 Reliability evaluation based on operational testing
7.1 Reliability growth assessment and prediction
7.2 Reliability estimation based on failure-free working
8 See also
9 References
10 External links
Overview
To perform software reliability testing, test cases and test procedures must be designed for each software
module. Data is gathered from various stages of development, such as the design and operating stages. The
tests are limited by restrictions such as the cost of performing the tests and time constraints. Statistical samples
are obtained from the software products to test the reliability of the software. When sufficient data or
information is gathered, statistical studies are performed. Time constraints are handled by applying fixed
dates or deadlines to the tests to be performed; after this phase, the design of the software is frozen and actual
implementation starts. Because of the restrictions on cost and time, the data is gathered carefully so that each data
point has a purpose and achieves the expected precision.[2] To achieve satisfactory results from reliability testing,
one must take care of certain reliability characteristics. For example, Mean Time To Failure (MTTF)[3] is measured
in terms of three factors:
1. operating time,
2. number of on-off cycles, and
3. calendar time.
If the restrictions are on operating time, or if the focus is on the first factor for improvement, then
compressed-time accelerations can be applied to reduce the test time. If the focus is on calendar time (that is, there are
predefined deadlines), then intensified stress testing is used.[2]
Software reliability is measured in terms of Mean Time Between Failures (MTBF).[4]
MTBF is the sum of the mean time to failure (MTTF) and the mean time to repair (MTTR): MTTF is the difference in
time between two consecutive failures, and MTTR is the time required to fix a failure.[5] The reliability of good software
should always lie between 0 and 1. Reliability increases as errors or bugs are removed from the program.[6]
For example, if MTBF = 1000 hours for average software, then the software should work for 1000 hours of continuous
operation.
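The following small Python sketch computes these quantities from a hypothetical failure log, using MTBF = MTTF + MTTR as defined above:

    def mtbf_from_log(failure_times_h, repair_times_h):
        """MTBF = MTTF + MTTR, estimated from observed failures."""
        gaps = [b - a for a, b in zip(failure_times_h, failure_times_h[1:])]
        mttf = sum(gaps) / len(gaps)    # mean time between consecutive failures
        mttr = sum(repair_times_h) / len(repair_times_h)
        return mttf, mttr, mttf + mttr

    # Hypothetical failure log (hours since start) and repair durations (hours).
    mttf, mttr, mtbf = mtbf_from_log([120, 370, 655, 900], [2, 3, 1])
    print(f"MTTF={mttf:.1f} h, MTTR={mttr:.1f} h, MTBF={mtbf:.1f} h")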
Secondary objectives
1.
2.
3.
4.
5.
any software.[8]
To improve the performance of a software product and the software development process, a thorough assessment of
reliability is required. Reliability testing is of great use to software managers and practitioners, so testing the
reliability of software is ultimately important.[9]
Feature test
A feature test for software is conducted in the following steps:
Each operation in the software is executed once.
Interaction between two operations is reduced.
Each operation is checked for its proper execution.
The feature test is followed by the load test.[10]
Load test
This test is conducted to check the performance of the software under maximum work load. Any software
performs well up to some amount of load, after which its response time starts degrading.
For example, a web site can be tested to see how many simultaneous users it can serve without
performance degradation. This testing mainly helps with databases and application servers. Load testing also
requires software performance testing, which checks how well the software performs under
workload.[10]
Regression test
Regression testing is used to check whether fixing a bug in the software has introduced new bugs. It also
determines whether one part of the software affects another. Regression testing is conducted after every change to the
software's features. This testing is periodic; the period depends on the length and features of the software.[10]
Tests planning
Reliability testing costs more than other types of testing, so proper management and planning are required
when doing it. The test plan includes the testing process to be implemented, data about the test
environment, the test schedule, test points, etc.
6.
7.
8.
9.
to check new prototypes of the software, which are initially expected to fail frequently. Failure causes are
detected, and actions are taken to reduce defects. Suppose T is the total accumulated test time for the prototype
and n(T) is the number of failures from the start up to time T. On a log-log scale, the graph of n(T)/T against T
is a straight line. This graph is called a Duane plot. From it, one can estimate how much reliability can be gained
after the remaining cycles of test and fix. The fitted line satisfies ln(n(T)/T) = b - alpha*ln(T), that is,
n(T)/T = K*T^(-alpha), where K is e^b. If the value of alpha in the equation is zero, reliability cannot be improved
as expected for the given number of failures: the number of failures then grows in direct proportion to the test
length. For alpha greater than zero, the failure rate n(T)/T decreases as the cumulative time T increases, so
reliability grows with continued testing.
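As a sketch of how the Duane plot parameters could be estimated in practice, the following Python code fits ln(n(T)/T) = b - alpha*ln(T) by least squares to a hypothetical failure log:

    import math

    def duane_fit(failure_times):
        """Fit ln(n(T)/T) = b - alpha*ln(T) over the cumulative failure
        counts n(T) observed at each failure time T."""
        xs = [math.log(t) for t in failure_times]
        ys = [math.log((i + 1) / t) for i, t in enumerate(failure_times)]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        alpha, b = -slope, my - slope * mx
        return alpha, math.exp(b)   # K = e^b

    # Hypothetical accumulated test times (hours) of successive failures.
    alpha, K = duane_fit([10, 35, 80, 160, 300, 520])
    print(f"alpha={alpha:.2f}, K={K:.2f}")  # alpha > 0 means reliability growth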
Finally, combine all test cases from the current version and the previous one, and record all the results.[10]
There is a predefined rule to calculate the count of new test cases for the software: if N is the probability of
occurrence of new operations in the new release of the software, R is the probability of occurrence of used
operations in the current release, and T is the number of all previously used test cases, then:
See also
Software testing
Load testing
Regression testing
Reliability engineering
References
1. ^ Hoang Pham. Software Reliability.
2. ^ a b E.E. Lewis. Introduction to Reliability Engineering.
3. ^ "MTTF" (http://www.weibull.com/hotwire/issue94/relbasics94.htm).
4. ^ Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill.
5. ^ "Approaches to Reliability Testing & Setting of Reliability Test Objectives" (http://www.softwaretestinggenius.com/articalDetails.php?qry=963).
6. ^ Aditya P. Mathur. Foundations of Software Testing. Pearson Publications.
7. ^ a b Dimitri Kececioglu. Reliability and Life Testing Handbook.
8. ^ M. Xie. A Statistical Basis for Software Reliability Assessment.
9. ^ M. Xie. Software Reliability Modelling.
10. ^ a b c d e f John D. Musa. Software Reliability Engineering: More Reliable Software, Faster and Cheaper. McGraw-Hill. ISBN 0-07-060319-7.
11. ^ a b E.E. Lewis. Introduction to Reliability Engineering. ISBN 0-471-01833-3.
12. ^ a b "Problem of Assessing Reliability". CiteSeerX: 10.1.1.104.9831 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.9831).
External links
Mean Time Between Failure (http://www.weibull.com/hotwire/issue94/relbasics94.htm/)
Software Life Testing (http://www.weibull.com/basics/accelerated.htm/)
In software engineering, performance testing is in general testing performed to determine how a system
performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate,
measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource
usage.
Performance testing is a subset of performance engineering, an emerging computer science practice which
strives to build performance into the implementation, design and architecture of a system.
Contents
1 Performance testing types
1.1 Load testing
1.2 Stress testing
1.3 Endurance testing (soak testing)
1.4 Spike testing
1.5 Configuration testing
1.6 Isolation testing
2 Setting performance goals
2.1 Concurrency/throughput
2.2 Server response time
2.3 Render response time
2.4 Performance specifications
2.5 Questions to ask
3 Pre-requisites for Performance Testing
3.1 Test conditions
3.2 Timing
4 Tools
5 Technology
6 Tasks to undertake
7 Methodology
7.1 Performance testing web applications
8 See also
9 External links
are also monitored, then this simple test can itself point towards any bottlenecks in the application software.
Stress testing
Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is
done to determine the system's robustness in terms of extreme load and helps application administrators to
determine if the system will perform sufficiently if the current load goes well above the expected maximum.
Spike testing
Spike testing is done by suddenly increasing the number of users, or the load generated by them, by a very large
amount and observing the behaviour of the system. The goal is to determine whether performance will suffer, the
system will fail, or it will be able to handle dramatic changes in load.
Configuration testing
Rather than testing for performance from the perspective of load, tests are created to determine the effects of
configuration changes to the system's components on the system's performance and behaviour. A common
example would be experimenting with different methods of load-balancing.
Isolation testing
Isolation testing is not unique to performance testing but involves repeating a test execution that resulted in a
system problem. Often used to isolate and confirm the fault domain.
Concurrency/throughput
If a system identifies end-users by some form of log-in procedure then a concurrency goal is highly desirable. By
definition this is the largest number of concurrent system users that the system is expected to support at any
given moment. The work-flow of your scripted transaction may impact true concurrency especially if the
iterative part contains the log-in and log-out activity.
If the system has no concept of end-users, then the performance goal is likely to be based on a maximum throughput
or transaction rate. A common example would be casual browsing of a web site such as Wikipedia.
Performance specifications
It is critical to detail performance specifications (requirements) and document them in any performance test plan.
Ideally, this is done during the requirements development phase of any system development project, prior to any
design effort. See Performance Engineering for more details.
However, performance testing is frequently not performed against a specification; i.e., no one will have expressed
what the maximum acceptable response time for a given population of users should be. Performance testing is
frequently used as part of the process of performance profile tuning. The idea is to identify the weakest link:
there is inevitably a part of the system which, if it is made to respond faster, will result in the overall system
running faster. It is sometimes a difficult task to identify which part of the system represents this critical path, and
some test tools include (or can have add-ons that provide) instrumentation that runs on the server (agents) and
reports transaction times, database access times, network overhead, and other server monitors, which can be
analyzed together with the raw performance statistics. Without such instrumentation one might have to have
someone crouched over Windows Task Manager at the server to see how much CPU load the performance
tests are generating (assuming a Windows system is under test).
Performance testing can be performed across the web, and even done in different parts of the country, since it is
known that the response times of the internet itself vary regionally. It can also be done in-house, although routers
would then need to be configured to introduce the lag that would typically occur on public networks. Loads
should be introduced to the system from realistic points. For example, if 50% of a system's user base will be
accessing the system via a 56K modem connection and the other half over a T1, then the load injectors
(computers that simulate real users) should either inject load over the same mix of connections (ideal) or simulate
the network latency of such connections, following the same user profile.
It is always helpful to have a statement of the likely peak number of users that might be expected to use the
system at peak times. If there can also be a statement of what constitutes the maximum allowable 95th-percentile
response time, then an injector configuration could be used to test whether the proposed system meets that
specification.
Questions to ask
Performance specifications should ask the following questions, at a minimum:
In detail, what is the performance test scope? What subsystems, interfaces, components, etc. are in and
out of scope for this test?
For the user interfaces (UIs) involved, how many concurrent users are expected for each (specify peak
vs. nominal)?
What does the target system (hardware) look like (specify all server and network appliance
configurations)?
What is the Application Workload Mix of each system component? (for example: 20% log-in, 40%
search, 30% item select, 10% checkout).
What is the System Workload Mix? [Multiple workloads may be simulated in a single performance test]
(for example: 30% Workload A, 20% Workload B, 50% Workload C).
What are the time requirements for any/all back-end batch processes (specify peak vs. nominal)?
Test conditions
In performance testing, it is often crucial (and often difficult to arrange) for the test conditions to be similar to the
expected actual use. This is, however, not entirely possible in actual practice. The reason is that the workloads
of production systems have a random nature, and while the test workloads do their best to mimic what may
happen in the production environment, it is impossible to exactly replicate this workload variability - except in
the most simple system.
Loosely-coupled architectural implementations (e.g.: SOA) have created additional complexities with
performance testing. Enterprise services or assets (that share a common infrastructure or platform) require
coordinated performance testing (with all consumers creating production-like transaction volumes and load on
shared infrastructures or platforms) to truly replicate production-like states. Due to the complexity and the financial
and time requirements of this activity, some organizations now employ tools that can monitor and create
production-like conditions (also referred to as "noise") in their performance testing environments (PTE) to
understand capacity and resource requirements and to verify / validate quality attributes.
Timing
It is critical to the cost performance of a new system, that performance test efforts begin at the inception of the
development project and extend through to deployment. The later a performance defect is detected, the higher
the cost of remediation. This is true in the case of functional testing, but even more so with performance testing,
due to the end-to-end nature of its scope. It is crucial for the performance test team to be involved as early
as possible, because acquiring and preparing key performance prerequisites, e.g. the performance test environment,
is often a lengthy and time-consuming process.
Tools
In the diagnostic case, software engineers use tools such as profilers to measure which parts of a device or
piece of software contribute most to the poor performance, or to establish throughput levels (and thresholds) for
maintaining acceptable response time.
Technology
Performance testing technology employs one or more PCs or Unix servers to act as injectors, each emulating
the presence of a number of users and each running an automated sequence of interactions (recorded as a script,
or as a series of scripts to emulate different types of user interaction) with the host whose performance is being
tested. Usually, a separate PC acts as a test conductor, coordinating and gathering metrics from each of the
injectors and collating performance data for reporting purposes. The usual sequence is to ramp up the load,
starting with a small number of virtual users and increasing the number over a period to some maximum. The test
result shows how the performance varies with the load, given as number of users vs. response time. Various
tools are available to perform such tests. Tools in this category usually execute a suite of tests which
emulate real users against the system. Sometimes the results can reveal oddities, e.g. that while the average
response time might be acceptable, there are outliers of a few key transactions that take considerably longer to
complete, something that might be caused by inefficient database queries, large pictures, etc.
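A heavily simplified sketch of such an injector in Python is shown below; a real tool adds coordinated ramp-up, pacing, and per-step reporting, and the URL is a placeholder that should only ever point at a test instance you own:

    import threading, time, urllib.request

    def virtual_user(url, duration_s, samples):
        """One scripted user: request the page repeatedly, record response times."""
        stop = time.monotonic() + duration_s
        while time.monotonic() < stop:
            start = time.perf_counter()
            urllib.request.urlopen(url, timeout=10).read()
            samples.append(time.perf_counter() - start)

    def ramp_up(url, max_users=20, step=5, hold_s=10):
        samples, threads = [], []
        for users in range(step, max_users + 1, step):
            while len(threads) < users:          # add more virtual users
                t = threading.Thread(target=virtual_user,
                                     args=(url, hold_s * (max_users // step), samples))
                t.start()
                threads.append(t)
            time.sleep(hold_s)
            print(f"{users} users: {len(samples)} requests so far")
        for t in threads:
            t.join()
        return samples

    # Point this at a test instance you own, never a production site:
    # ramp_up("http://localhost:8000/")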
Performance testing can be combined with stress testing, in order to see what happens when an acceptable load
is exceeded: does the system crash? How long does it take to recover if a large load is reduced? Does it fail in
a way that causes collateral damage?
Analytical performance modeling is a method to model the behaviour of a system in a spreadsheet. The model
is fed with measurements of transaction resource demands (CPU, disk I/O, LAN, WAN), weighted by the
transaction mix (business transactions per hour). The weighted transaction resource demands are added up to
obtain the hourly resource demands and divided by the hourly resource capacity to obtain the resource loads.
Using the response-time formula R = S / (1 - U), where R is the response time, S is the service time, and U is the
load, response times can be calculated and calibrated against the results of the performance tests. Analytical
performance modelling allows evaluation of design options and system sizing based on actual or anticipated
business usage. It is therefore much faster and cheaper than performance testing, though it requires a thorough
understanding of the hardware platforms.
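A small Python sketch of such a spreadsheet-style model, using the formula above; the transaction mix, resource demands, and capacities are invented numbers:

    def analytical_model(tx_per_hour, demands_s, capacity_s_per_hour):
        """Spreadsheet-style model: weighted resource demands give the load U;
        the response time then follows R = S / (1 - U)."""
        results = {}
        for resource, capacity in capacity_s_per_hour.items():
            demand = sum(rate * demands_s[tx][resource]
                         for tx, rate in tx_per_hour.items())  # busy s/hour
            u = demand / capacity                              # resource load
            s = demand / sum(tx_per_hour.values())             # mean service time/tx
            results[resource] = {"U": u, "R_s": s / (1 - u)}
        return results

    # Hypothetical transaction mix and per-transaction demands (seconds).
    mix = {"search": 3000, "checkout": 600}
    demands = {"search":   {"cpu": 0.20, "disk": 0.10},
               "checkout": {"cpu": 0.50, "disk": 0.40}}
    capacity = {"cpu": 3600.0, "disk": 3600.0}  # one unit each, 3600 busy s/hour
    print(analytical_model(mix, demands, capacity))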
Tasks to undertake
Tasks to perform such a test would include:
Decide whether to use internal or external resources to perform the tests, depending on inhouse expertise
(or lack thereof)
Gather or elicit performance requirements (specifications) from users and/or business analysts
Develop a high-level plan (or project charter), including requirements, resources, timelines and milestones
Develop a detailed performance test plan (including detailed scenarios and test cases, workloads,
environment info, etc.)
Choose test tool(s)
Specify test data needed and charter effort (often overlooked, but often the death of a valid performance
test)
Develop proof-of-concept scripts for each application/component under test, using chosen test tools and
strategies
Develop detailed performance test project plan, including all dependencies and associated time-lines
Install and configure injectors/controller
Configure the test environment (ideally identical hardware to the production platform), router
configuration, quiet network (we don't want results upset by other users), deployment of server
instrumentation, database test sets developed, etc.
Execute tests, probably repeatedly (iteratively), in order to see whether any unaccounted-for factor might
affect the results
Analyze the results - either pass/fail, or investigation of critical path and recommendation of corrective
action
Methodology
Performance testing web applications
According to the Microsoft Developer Network the Performance Testing Methodology
(http://msdn2.microsoft.com/en-us/library/bb924376.aspx) consists of the following activities:
Activity 1. Identify the Test Environment. Identify the physical test environment and the production
environment as well as the tools and resources available to the test team. The physical environment
includes hardware, software, and network configurations. Having a thorough understanding of the entire
test environment at the outset enables more efficient test design and planning and helps you identify testing
challenges early in the project. In some situations, this process must be revisited periodically throughout
the project's life cycle.
Activity 2. Identify Performance Acceptance Criteria. Identify the response time, throughput, and
resource utilization goals and constraints. In general, response time is a user concern, throughput is a
business concern, and resource utilization is a system concern. Additionally, identify project success
criteria that may not be captured by those goals and constraints; for example, using performance tests to
evaluate what combination of configuration settings will result in the most desirable performance
characteristics.
Activity 3. Plan and Design Tests. Identify key scenarios, determine variability among representative
users and how to simulate that variability, define test data, and establish metrics to be collected.
Consolidate this information into one or more models of system usage to be implemented, executed, and
analyzed.
Activity 4. Configure the Test Environment. Prepare the test environment, tools, and resources
necessary to execute each strategy as features and components become available for test. Ensure that the
test environment is instrumented for resource monitoring as necessary.
Activity 5. Implement the Test Design. Develop the performance tests in accordance with the test
design.
Activity 6. Execute the Test. Run and monitor your tests. Validate the tests, test data, and results
collection. Execute validated tests for analysis while monitoring the test and the test environment.
Activity 7. Analyze Results, Tune, and Retest. Analyse, consolidate, and share results data. Make a
tuning change and retest to see whether there is improvement or degradation. Each improvement made will
tend to return a smaller gain than the previous one. When do you stop? When you reach a CPU bottleneck,
the choices then are either to improve the code or to add more CPU.
See also
Stress testing (software)
Benchmark (computing)
Web server benchmarking
Application Response Measurement
External links
Web Load Testing for Dummies (http://www.gomez.com/ebook-web-load-testing-for-dummiesgeneric/) (Book, PDF Version)
The Art of Application Performance Testing - O'Reilly ISBN 978-0-596-52066-3
(http://oreilly.com/catalog/9780596520670) (Book)
Performance Testing Guidance for Web Applications (http://msdn2.microsoft.com/enus/library/bb924375.aspx) (MSDN)
Performance Testing Guidance for Web Applications (http://www.amazon.com/dp/0735625700) (Book)
Performance Testing Guidance for Web Applications
(http://www.codeplex.com/PerfTestingGuide/Release/ProjectReleases.aspx?ReleaseId=6690) (PDF)
Performance Testing Guidance (http://www.codeplex.com/PerfTesting) (Online KB)
Enterprise IT Performance Testing (http://www.perftesting.co.uk) (Online KB)
Performance Testing Videos (http://msdn2.microsoft.com/en-us/library/bb671346.aspx) (MSDN)
Open Source Performance Testing tools (http://www.opensourcetesting.org/performance.php)
"User Experience, not Metrics" and "Beyond Performance Testing"
(http://www.perftestplus.com/pubs.htm)
"Performance Testing Traps / Pitfalls" (http://www.mercury-consultingltd.com/wp/Performance_Testing_Traps.html)
Contents
1 Why software testing?
2 White-box testing
3 Black-box testing
5 Testing in perspective
6 Exercises
Programs often contain errors (so-called bugs), even though the compiler accepts the program
as well-formed: the compiler can detect only errors of form, not of meaning. Many errors
and inconveniences in programs are discovered only by accident when the program is being
used. However, errors can be found in more systematic and effective ways than by random
experimentation. This is the goal of software testing.
You may think: why don't we just fix errors when they are discovered? After all, what
harm can a program do? Consider some effects of software errors:
In the 1991 Gulf war, some Patriot missiles failed to hit incoming Iraqi Scud missiles,
which therefore killed people on the ground. Accumulated rounding errors in the control
software's clocks caused large navigation errors.
Errors in the software controlling the baggage handling system of Denver International
Airport delayed the entire airport's opening by a year (1994-1995), causing losses of
around 360 million dollars. Since September 2005 the computer-controlled baggage
system has not been used; manual baggage handling saves one million dollars a month.
The first launch of the European Ariane 5 rocket failed (1996), causing losses of hundreds
of millions of dollars. The problem was a buffer overflow in control software taken over from
Ariane 4. The software had not been re-tested, to save money.
1. Original 1998 version written for the Royal Veterinary and Agricultural University, Denmark.
Errors in a new train control system deployed in Berlin (1998) caused train cancellations
and delays for weeks.
Errors in poorly designed control software in the Therac-25 radio-therapy equipment
(1987) exposed several cancer patients to heavy doses of radiation, killing some.
A large number of other software-related problems and risks have been recorded by the RISKS
digest since 1985, see the archive at http://catless.ncl.ac.uk/risks.
1.1
A program in Java, or C# or any other language, may contain several kinds of errors:
syntax errors: the program may be syntactically ill-formed (e.g. contain while x {},
where there are no parentheses around x), so that strictly speaking it is not a Java
program at all;
semantic errors: the program may be syntactically well-formed, but attempt to access
non-existing local variables or non-existing fields of an object, or apply operators to the
wrong type of arguments (as in true * 2, which attempts to multiply a logical value
by a number);
logical errors: the program may be syntactically well-formed and type-correct, but
compute the wrong answer anyway.
Errors of the two former kinds are relatively trivial: the Java compiler javac will automatically discover them and
tell us about them. Logical errors (the third kind) are harder to deal
with: they cannot be found automatically, and it is our own responsibility to find them, or
even better, to convince ourselves that there are none.
In these notes we shall assume that all errors discovered by the compiler have been fixed.
We present simple systematic techniques for finding logical errors and thereby making it
plausible that the program works as intended (when we can find no more errors).
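A tiny illustration of a logical error, in Python rather than Java (the function is hypothetical): the code is syntactically well-formed and type-correct, yet computes the wrong answer, so only testing will reveal the bug:

    def average(numbers):
        """Intended: the arithmetic mean. Logical error: the divisor is
        off by one, yet nothing here is a syntax or semantic error."""
        return sum(numbers) / (len(numbers) + 1)   # bug: should be len(numbers)

    print(average([2, 4, 6]))   # prints 3.0, but the specified mean is 4.0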
1.2
Testing fits into the more general context of software quality assurance; but what is software
quality? ISO Standard 9126 (2001) distinguishes six quality characteristics of software:
functionality: does this software do what it is supposed to do; does it work as intended?
usability: is this software easy to learn and convenient to use?
efficiency: how much time, memory, and network bandwidth does this software consume?
reliability: how well does this software deal with wrong inputs, external problems such
as network failures, and so on?
maintainability: how easy is it to find and fix errors in this software?
portability: how easy is it to adapt this software to changes in its operating environment,
and how easy is it to add new functionality?
The present note is concerned only with functionality testing, but note that usability testing
and performance testing address quality characteristics number two and three. Reliability can
be addressed by so-called stress testing, whereas maintainability and portability are rarely
systematically tested.
1.3
The purpose of testing is very different from that of debugging. It is tempting to confuse the
two, especially if one mistakenly believes that the purpose of debugging is to remove the last
bug from the program. In reality, debugging rarely achieves this.
The real purpose of debugging is diagnosis. After we have observed that the program does
not work as intended, we debug it to answer the question: why doesn't this program work?
When we have found out, we modify the program to (hopefully) work as intended.
By contrast, the purpose of functionality testing is to strengthen our belief that the
program works as intended. To do this, we systematically try to show that it does not work.
If our best efforts fail to show that the program does not work, then we have strengthened
our belief that it does work.
Using systematic functionality testing we might find some cases where the program does
not work. Then we use debugging to find out why. Then we fix the problem. And then we
test again to make sure we fixed the problem without introducing new ones.
1.4
The distinction between functionality testing and debugging has a parallel in the distinction
between performance testing and profiling. Namely, the purpose of profiling is diagnosis. After
we have observed that the program is too slow or uses too much memory, we use profiling to
answer the question: why is this program so slow, why does it use so much memory? When
we have found out, we modify the program to (hopefully) use less time and memory.
By contrast, the purpose of performance testing is to strengthen our belief that the program is efficient enough. To do this, we systematically measure how much time and memory
it uses on different kinds and sizes of inputs. If the measurements show that it is efficient
enough for those inputs, then we have strengthened our belief that the program is efficient
enough for all relevant inputs.
Using systematic performance testing we might find some cases where the program is too
slow. Then we use profiling to find out why. Then we fix the problem. And then we test
again to make sure we fixed the problem without introducing new ones.
Schematically, we have:
Purpose \ Quality      Functionality           Efficiency
Diagnosis              Debugging               Profiling
Quality assurance      Functionality testing   Performance testing

1.5 White-box and black-box testing
Two important techniques for functionality testing are white-box testing and black-box testing.
White-box testing, sometimes called structural testing or internal testing, focuses on the text of the program. The tester constructs a test suite (a collection of inputs and corresponding expected outputs) that demonstrates that all branches of the program's choice and loop constructs (if, while, switch, try-catch-finally, and so on) can be executed. The test suite is said to cover the statements of the program.
Black-box testing, sometimes called external testing, focuses on the problem that the program is supposed to solve; or more precisely, the problem statement or specification for the
program. The tester constructs a test data set (inputs and corresponding expected outputs)
that includes typical as well as extreme input data. In particular, one must include inputs
that are described as exceptional or erroneous in the problem description.
White-box testing and black-box testing are complementary approaches to test case generation. White-box testing does not focus on the problem area, and therefore may not discover
that some subproblem is left unsolved by the program, whereas black-box testing should.
Black-box testing does not focus on the program text, and therefore may not discover that
some parts of the program are completely useless or have an illogical structure, whereas
white-box testing should.
Software testing can never prove that a program contains no errors, but it can strengthen one's faith in the program. Systematic software testing is necessary if the program will be
used by others, if the welfare of humans or animals depends on it (so-called safety-critical
software), or if one wants to base scientific conclusions on the programs results.
1.6 Test coverage
Given that we cannot make a perfect test suite, how do we know when we have a reasonably good one? A standard measure of a test suite's comprehensiveness is coverage. Here are some
notions of coverage, in increasing order of strictness:
method coverage: does the test suite make sure that every method (including function,
procedure, constructor, property, indexer, action listener) gets executed at least once?
statement coverage: does the test suite make sure that every statement of every method
gets executed at least once?
branch coverage: does the test suite make sure that every transfer of control gets executed at least once?
path coverage: does the test suite make sure that every execution path through the
program gets executed at least once?
Method coverage is the minimum one should expect from a test suite; in principle we know
nothing at all about a method that has not been executed by the test suite.
Statement coverage is achieved by the white-box technique described in Section 2, and is
often the best coverage one can achieve in practice.
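To make the difference between these coverage notions concrete, consider the following small sketch (it is not one of the examples in these notes). A single input achieves statement coverage of the method, but branch coverage needs a second input for which the condition is false:

static int abs(int x)
{
  int r = x;
  if (x < 0)        /* 1 */
    r = -x;
  return r;
}

The call abs(-5) executes every statement, so one input suffices for statement coverage. Branch coverage additionally requires an input such as abs(5), for which choice 1 is false and the body of the if is skipped.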
Branch coverage is more demanding, especially in relation to virtual method calls (so-called virtual dispatch) and exception throwing. Namely, consider a single method call statement a.m() where expression a has type A, and class A has many subclasses A1, A2, and so on, that override method m(). Then to achieve branch coverage, the test suite must make sure that a.m() gets executed for a being an object of class A1, an object of class A2, and so on.
Similarly, there is a transfer of control from an exception-throwing statement throw exn to
the corresponding exception handler, if any, so to achieve branch coverage, the test suite must
make sure that each such statement gets executed in the context of every relevant exception
handler.
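As a small sketch of the virtual dispatch case (using the class names A, A1, A2 from the discussion above):

abstract class A { abstract void m(); }
class A1 extends A { void m() { System.out.println("A1.m"); } }
class A2 extends A { void m() { System.out.println("A2.m"); } }

If the program contains a call a.m() where expression a has type A, branch coverage requires test inputs that make a refer to an A1 object in one execution and to an A2 object in another.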
Path coverage is usually impossible to achieve in practice, because any program that
contains a loop will usually have an infinite number of possible execution paths.
2 White-box testing
The goal of white-box testing is to make sure that all parts of the program have been executed,
for some notion of part, as described in Section 1.6 on test coverage. The approach described
in this section gives statement coverage. The resulting test suite includes enough input data
sets to make sure that all methods have been called, that both the true and false branches
have been executed in if statements, that every loop has been executed zero, one, and more
times, that all branches of every switch statement have been executed, and so on. For every
input data set, the expected output must be specified also. Then, the program is run with
all the input data sets, and the actual outputs are compared to the expected outputs.
White-box testing cannot demonstrate that the program works in all cases, but it is a
surprisingly efficient (fast), effective (thorough), and systematic way to discover errors in the
program. In particular, it is a good way to find errors in programs with a complicated logic,
and to find variables that are initialized with the wrong values.
2.1
The program below receives some integers as argument, and is expected to print out the
smallest and the greatest of these numbers. We shall see how one performs a white-box test
of the program. (Be forewarned that the program is actually erroneous; is this obvious?)
public static void main ( String[] args )
{
  int mi, ma;
  if (args.length == 0)                           /* 1 */
    System.out.println("No numbers");
  else
  {
    mi = ma = Integer.parseInt(args[0]);
    for (int i = 1; i < args.length; i++)         /* 2 */
    {
      int obs = Integer.parseInt(args[i]);
      if (obs > ma) ma = obs;                     /* 3 */
      else if (mi < obs) mi = obs;                /* 4 */
    }
    System.out.println("Minimum = " + mi + "; maximum = " + ma);
  }
}
The choice statements are numbered 1 to 4 in the margin. Number 2 is the for statement.
First we construct a table that shows, for every choice statement and every possible outcome,
which input data set covers that choice and outcome:
Choice             Input property
1 true             No numbers
1 false            At least one number
2 zero times       Exactly one number
2 once             Exactly two numbers
2 more than once   At least three numbers
3 true             Number > current maximum
3 false            Number ≤ current maximum
4 true             Number ≤ current maximum and > current minimum
4 false            Number ≤ current maximum and ≤ current minimum
While constructing the above table, we construct also a table of the input data sets:

Input data set   Input contents   Expected output   Actual output
A                (no numbers)     No numbers        No numbers
B                17               17 17             17 17
C                27 29            27 29             27 29
D                39 37            37 39             39 39
E                49 47 48         47 49             49 49
When running the above program on the input data sets, one sees that the outputs are wrong: they disagree with the expected outputs for input data sets D and E. Now one may run the program manually on e.g. input data set D, which will lead one to discover that the condition in the program's choice 4 is wrong. When we receive a number which is less than the current minimum, the variable mi is not updated correctly. The statement should be:

      else if (obs < mi) mi = obs;                /* 4a */
After correcting the program, it may be necessary to reconstruct the white-box test. It may
be very time consuming to go through several rounds of modification and re-testing, so it
pays off to make the program correct from the outset! In the present case it suffices to change
the comments in the last two lines of the table of choices and outcomes, because all we did
was to invert the condition in choice 4:
Choice             Input property
1 true             No numbers
1 false            At least one number
2 zero times       Exactly one number
2 once             Exactly two numbers
2 more than once   At least three numbers
3 true             Number > current maximum
3 false            Number ≤ current maximum
4a true            Number ≤ current maximum and < current minimum
4a false           Number ≤ current maximum and ≥ current minimum
The input data sets remain the same. The corrected program produced the expected output for all input data sets A to E.
2.2
The program below receives some non-negative numbers as input, and is expected to print out the two smallest of these numbers, or the smallest, in case there is only one. (Is this problem statement unambiguous?) This program, too, is erroneous; can you find the problem?
public static void main ( String[] args )
{
  int mi1 = 0, mi2 = 0;
  if (args.length == 0)                           /* 1 */
    System.out.println("No numbers");
  else
  {
    mi1 = Integer.parseInt(args[0]);
    if (args.length == 1)                         /* 2 */
      System.out.println("Smallest = " + mi1);
    else
    {
      int obs = Integer.parseInt(args[1]);
      if (obs < mi1)                              /* 3 */
      { mi2 = mi1; mi1 = obs; }
      for (int i = 2; i < args.length; i++)       /* 4 */
      {
        obs = Integer.parseInt(args[i]);
        if (obs < mi1)                            /* 5 */
        { mi2 = mi1; mi1 = obs; }
        else if (obs < mi2)                       /* 6 */
          mi2 = obs;
      }
      System.out.println("The two smallest are " + mi1 + " and " + mi2);
    }
  }
}
Choice             Input property
1 true             No numbers
1 false            At least one number
2 true             Exactly one number
2 false            At least two numbers
3 false            Second number ≥ first number
3 true             Second number < first number
4 zero times       Exactly two numbers
4 once             Exactly three numbers
4 more than once   At least four numbers
5 true             Third number < current minimum
5 false            Third number ≥ current minimum
6 true             Third number ≥ current minimum and < second least
6 false            Third number ≥ current minimum and ≥ second least
Input data set   Contents       Expected output   Actual output
A                (no numbers)   No numbers        No numbers
B                17             17                17
C                27 29          27 29             27 0
D                39 37          37 39             37 39
E                49 48 47       47 48             47 48
F                59 57 58       57 58             57 58
G                67 68 69       67 68             67 0
H                77 78 79 76    76 77             76 77
Running the program with these test data, it turns out that data set C produces wrong results: 27 and 0. Looking at the program text, we see that this is because variable mi2 retains its initial value, namely, 0. The program must be fixed by inserting an assignment mi2 = obs just before the line labelled 3. We do not need to change the white-box test, because no choice statements were added or changed. The corrected program produces the expected output for all input data sets A to H.
Note that if the variable declaration had not been initialized with mi2 = 0, the Java compiler would have complained that mi2 might be used before its first assignment. In that case, the error would have been detected even without testing.
This is not the case in many other current programming languages (e.g. C, C++, Fortran), where one may well use an uninitialized variable: its value is just whatever happens to be at that location in the computer's memory. The error may even go undetected by testing,
when the value of mi2 equals the expected answer by accident. This is more likely than it may
sound, if one runs the same (C, C++, Fortran) program on several input data sets, and the
same data values are used in several data sets. Therefore it is a good idea to choose different
data values in the data sets, as done above.
2.3

For each kind of choice statement, the following checklist shows the cases that a white-box test suite must cover:

Statement            Cases to test
if                   Condition false and true
while                Zero, one, and more than one iterations
for                  Zero, one, and more than one iterations
do-while             One, and more than one, iterations
switch               Every case and default branch must be executed
try-catch-finally    The try clause, every catch clause, and the finally clause must be executed
A conditional expression such as (x != 0 ? 1000/x : 1) must be tested for the condition (x != 0) being true and being false, so that both alternatives have been evaluated.
Short-cut logical operators such as (x != 0) && (1000/x > y) must be tested for all possible combinations of the truth values of the operands. That is:

(x != 0)   (1000/x > y)
false      (not evaluated)
true       false
true       true
Note that the second operand in a short-cut (lazy) conjunction will be computed only if the
first operand is true (in Java, C#, C, and C++). This is important, for instance, when the
condition is (x != 0) && (1000/x > y), where the second operand cannot be computed if
the first one is false, that is, if x == 0. Therefore it makes no sense to require that the
combinations (false, false) and (false, true) be tested.
In a short-cut disjunction (x == 0) || (1000/x > y) it holds, dually, that the second operand is computed only if the first one is false. Therefore, in this case too there are only three possible combinations:

(x == 0)   (1000/x > y)
true       (not evaluated)
false      false
false      true
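For instance, the three feasible combinations of the conjunction (x != 0) && (1000/x > y) could be exercised by the following inputs (the particular numbers are arbitrary and chosen here only for illustration):

x == 0               first operand false; (1000/x > y) is not evaluated
x == 10, y == 200    first operand true; 1000/x is 100, and 100 > 200 is false
x == 10, y == 50     first operand true; 1000/x is 100, and 100 > 50 is true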
Methods The test suite must make sure that all methods have been executed. For recursive
methods one should test also the case where the method calls itself.
The test data sets are presented conveniently by two tables, as demonstrated in this
section. One table presents, for each statement, what data sets are used, and which property
of the input is demonstrated by the test. The other table presents the actual contents of the
data sets, and the corresponding expected output.
3 Black-box testing
The goal of black-box testing is to make sure that the program solves the problem it is
supposed to solve; to make sure that it works. Thus one must have a fairly precise idea of
the problem that the program must solve, but in principle one does not need the program
text when designing a black-box test. Test data sets (with corresponding expected outputs)
must be created to cover typical as well as extreme input values, and also inputs that are
described as exceptional cases or illegal cases in the problem statement. Examples:
In a program to compute the sum of a sequence of numbers, the empty sequence will
be an extreme, but legal, input (with sum 0).
In a program to compute the average of a sequence of numbers, the empty sequence
will be an extreme, and illegal, input. The program should give an error message for
this input, as one cannot compute the average of no numbers.
One should avoid creating a large collection of input data sets, just to be on the safe side.
Instead, one must carefully consider what inputs might reveal problems in the program, and
use exactly those. When preparing a black-box test, the task is to find errors in the program;
thus destructive thinking is required. As we shall see below, this is just as demanding as
programming, that is, as constructive thinking.
3.1
Problem: Given a (possibly empty) sequence of numbers, find the smallest and the greatest
of these numbers.
This is the same problem as in Section 2.1, but now the point of departure is the above
problem statement, not any particular program which claims to solve the problem.
First we consider the problem statement. We note that an empty sequence does not
contain a smallest or greatest number. Presumably, the program must give an error message
if presented with an empty sequence of numbers.
The black-box test might consist of the following input data sets: An empty sequence (A).
A non-empty sequence can have one element (B), or two or more elements. In a sequence with
two elements, the elements can be equal (C1), or different, the smallest one first (C2) or the
greatest one first (C3). If there are more than two elements, they may appear in increasing
order (D1), decreasing order (D2), with the greatest element in the middle (D3), or with the
smallest element in the middle (D4). All in all we have these cases:
Input data set   Input property
A                No numbers
B                One number
C1               Two numbers, equal
C2               Two numbers, increasing
C3               Two numbers, decreasing
D1               Three numbers, increasing
D2               Three numbers, decreasing
D3               Three numbers, greatest in the middle
D4               Three numbers, smallest in the middle
The choice of these input data sets is not arbitrary. It is influenced by our own ideas about
how the problem might be solved by a program, and in particular how it might be solved the
wrong way. For instance, the programmer might have forgotten that the sequence could be
empty, or that the smallest number equals the greatest number if there is only one number,
etc.
The choice of input data sets may be criticized. For instance, it is not obvious that data
set C1 is needed. Could the problem really be solved (wrongly) in a way that would be
discovered by C1, but not by any of the other input data sets?
The data sets C2 and C3 check that the program does not just answer by returning the first (or last) number from the input sequence; this is a relevant check. The data sets D3 and D4 check that the program does not just compare the first and the last number; it is less clear that this is relevant.
Input data set   Contents       Expected output   Actual output
A                (no numbers)   Error message
B                17             17 17
C1               27 27          27 27
C2               35 36          35 36
C3               46 45          45 46
D1               53 55 57       53 57
D2               67 65 63       63 67
D3               73 77 75       73 77
D4               89 83 85       83 89

3.2
Problem: Given a (possibly empty) sequence of numbers, find the greatest difference between
two consecutive numbers.
We shall design a black-box test for this problem. First we note that if there is only
zero or one number, then there are no two consecutive numbers, and the greatest difference
cannot be computed. Presumably, an error message must be given in this case. Furthermore,
it is unclear whether the difference is signed (possibly negative) or absolute (always non-negative). Here we assume that only the absolute difference should be taken into account, so
that the difference between 23 and 29 is the same as that between 29 and 23.
This gives rise to at least the following input data sets: no numbers (A), exactly one
number (B), exactly two numbers. Two numbers may be equal (C1), or different, in increasing
order (C2) or decreasing order (C3). When there are three numbers, the difference may be
increasing (D1) or decreasing (D2). That is:
Input data set   Input property                          Contents       Expected output   Actual output
A                No numbers                              (no numbers)   Error message
B                One number                              17             Error message
C1               Two numbers, equal                      27 27          0
C2               Two numbers, increasing                 36 37          1
C3               Two numbers, decreasing                 48 46          2
D1               Three numbers, increasing difference    57 56 59       3
D2               Three numbers, decreasing difference    69 65 67       4
One might consider whether there should be more variants of each of D1 and D2, in which the three numbers would appear in increasing order (56, 57, 59), or decreasing order (59, 58, 56), or increasing and then decreasing (56, 57, 55), or decreasing and then increasing (58, 56, 59). Although these data sets might reveal errors that the above data sets would not, they do appear more contrived. However, this shows that black-box testing may be carried on indefinitely: you will never be sure that all possible errors have been detected.
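For concreteness, a program under test might look like the following sketch; it is not given by the problem statement, and the method name maxAbsDiff and the use of an exception for too-short inputs are assumptions made here:

static int maxAbsDiff(int[] xs)
{
  if (xs.length < 2)                        // zero or one number: no consecutive pair
    throw new IllegalArgumentException("Error message");
  int max = Math.abs(xs[1] - xs[0]);
  for (int i = 2; i < xs.length; i++)       // compare each consecutive pair
    max = Math.max(max, Math.abs(xs[i] - xs[i-1]));
  return max;
}

Running the black-box test above against such an implementation simply means comparing its outputs on the seven data sets to the expected outputs in the table.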
3.3
Problem: Given a day of the month day and a month mth, decide whether they determine a
legal date in a non-leap year. For instance, 31/12 (the 31st day of the 12th month) and 31/8
are both legal, whereas 29/2 and 1/13 are not. The day and month are given as integers, and
the program must respond with Legal or Illegal.
To simplify the test suite, one may assume that if the program classifies e.g. 1/4 and 30/4 as legal dates, then it will consider 17/4 and 29/4 legal, too. Correspondingly, one may assume that if the program classifies 31/4 as illegal, then it will also classify 32/4, 33/4, and so on as illegal. There is no guarantee that these assumptions actually hold; the program may be written in a contorted and silly way. Assumptions such as these should be written down along with the test suite.
Under those assumptions one may test only extreme cases, such as 0/4, 1/4, 30/4, and
31/4, for which the expected outputs are Illegal, Legal, Legal, and Illegal.
Contents   Expected output   Actual output
0 1        Illegal
1 0        Illegal
1 1        Legal
31 1       Legal
32 1       Illegal
28 2       Legal
29 2       Illegal
31 3       Legal
32 3       Illegal
30 4       Legal
31 4       Illegal
31 5       Legal
32 5       Illegal
30 6       Legal
31 6       Illegal
31 7       Legal
32 7       Illegal
31 8       Legal
32 8       Illegal
30 9       Legal
31 9       Illegal
31 10      Legal
32 10      Illegal
30 11      Legal
31 11      Illegal
31 12      Legal
32 12      Illegal
1 13       Illegal
It is clear that the black-box test becomes rather large and cumbersome. In fact it is just as
long as a program that solves the problem! To reduce the number of data sets, one might
consider just some extreme values, such as 0/1, 1/0, 1/1, 31/12 and 32/12; some exceptional
values around February, such as 28/2, 29/2 and 1/3, and a few typical cases, such as 30/4,
31/4, 31/8 and 32/8. But that would weaken the test a little: it would not discover whether
the program mistakenly believes that June (not July) has 31 days.
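For concreteness, one correct program that the black-box test above could be run against might look like this sketch (the method name isLegalDate is an assumption, not part of the problem statement):

static boolean isLegalDate(int day, int mth)
{
  // Month lengths in a non-leap year, January to December.
  int[] length = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
  return 1 <= mth && mth <= 12 && 1 <= day && day <= length[mth-1];
}

Note that the full table above would catch a variant of this sketch in which, say, the June entry was mistakenly 31, whereas the reduced test would not.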
4 Testing in perspective
Testing can never prove that a program has no errors, but it can considerably improve
the confidence one has in its results.
Often it is easier to design a white-box test suite than a black-box one, because one
can proceed systematically on the basis of the program text. Black-box testing requires
more guesswork about the possible workings of the program, but can make sure that
the program does what is required by the problem statement.
It is a good idea to design a black-box test at the same time you write the program. This reveals unclear and subtle points in the problem statement, so that you can take them into account while writing the program, instead of having to fix the program later.
Writing the test cases and the documentation at the same time is also valuable. When
attempting to write a test case, one often realizes what information users of a method
or class will be looking for in the documentation. Conversely, when one makes a claim
(when n+i>arr.length, then FooException is thrown) about the behaviour of a class
or method in the documentation, that should lead to one or more test cases that check
this claim.
If you further use unit test tools to automate the test, you can actually implement
the tests before you implement the corresponding functionality. Then you can more
confidently implement the functionality and measure your implementation progress by
the number of test cases that succeed. This is called test-driven development.
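As a minimal sketch of this style, assume JUnit 4 and a hypothetical method IntFunctions.findMin that should return the smallest element of a non-empty array and throw IllegalArgumentException on an empty one; neither name is from these notes. The tests can be written before the method exists:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FindMinTest
{
  @Test public void typicalInput()                    // a typical case
  { assertEquals(37, IntFunctions.findMin(new int[] { 39, 37, 48 })); }

  @Test public void singleElement()                   // an extreme, legal case
  { assertEquals(17, IntFunctions.findMin(new int[] { 17 })); }

  @Test(expected = IllegalArgumentException.class)    // an exceptional case
  public void emptyInput()
  { IntFunctions.findMin(new int[] { }); }
}

All three tests fail until findMin is implemented; progress can then be measured by the number of tests that pass.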
From the tester's point of view, testing is successful if it does find errors in the program; in this case it was clearly not a waste of time to do the test. From the programmer's point of view the opposite holds: hopefully the test will not find errors in the program. When the tester and the programmer are one and the same person, there is a psychological conflict: one does not want to admit to making mistakes, neither when programming nor when designing test suites.
It is a useful exercise to design a test suite for a program written by someone else. This
is a kind of game: the goal of the programmer is to write a program that contains no
errors; the goal of the tester is to find the errors in the program anyway.
It takes much time to design a test suite. One learns to avoid needless choice statements when programming, because this reduces the number of test cases in the white-box test. It also leads to simpler programs that usually are more general and easier to understand.²
It is not unusual for a test suite to be as large as the software it tests. The C5 Generic
Collection Library for C#/.NET (http://www.itu.dk/research/c5) implementation has
27,000 lines of code, and its unit test has 28,000 lines.
How much testing is needed? The effort spent on testing should be correlated with the consequences of possible program errors. A program used just once for computing one's taxes needs no testing. However, a program must be tested if errors could affect the
safety of people or animals, or could cause considerable economic losses. If scientific
conclusions will be drawn from the outputs of a program, then it must be tested too.
² A program may be hard to understand even when it has no choice statements; see Exercises 10 and 11.
Exercises
Test this method with the black-box test suite you made above.
11. Use white-box techniques to construct a test suite for the method shown in Exercise 10.
This appears trivial and useless, since there are no choice statements in the program
at all. Instead one may consider jumps (discontinuities) in the processing of data.
In particular, integer division (/) and remainder (%) produce jumps of this sort. For
mth < 3 we have m = (mth + 9) mod 12 = mth + 9, and for mth ≥ 3 we have m = (mth + 9) mod 12 = mth − 3. Thus there is a kind of hidden choice when going
from mth = 2 to mth = 3. Correspondingly for m / 5 and (m % 5 + 1) / 2. This can
be used for choosing test cases for white-box test. Do that.
12. Consider a method String toRoman(int n) that is supposed to convert a positive integer to the Roman numeral representing that integer, using the symbols I = 1, V = 5, X = 10, L = 50, C = 100, D = 500 and M = 1000. The following rules determine
the Roman numeral corresponding to a positive number:
In general, the symbols of a Roman numeral are added together from left to right,
so II = 2, XX = 20, XXXI = 31, and MMVIII = 2008.
The symbols I, X and C may appear up to three times in a row; the symbol M may
appear any number of times; and the symbols V, L and D cannot be repeated.
When a lesser symbol appears before a greater one, the lesser symbol is subtracted,
not added. So IV = 4, IX = 9, XL = 40 and CM = 900.
The symbol I may appear once before V and X; the symbol X may appear once
before L and C; the symbol C may appear once before D and M; and the symbols V,
L and D cannot appear before a greater symbol.
So 45 is written XLV, not VL; and 49 is written XLIX, not IL; and 1998 is written
MCMXCVIII, not IIMM.
Exercise: use black-box techniques to construct a test suite for the method toRoman.
This can be done in two ways. The simplest way is to call toRoman(n) for suitably chosen numbers n and check that it returns the expected string. The more ambitious way is to implement (and test!) the method fromRoman described in Exercise 13 below, and use that to check toRoman.
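A sketch of the simpler way: call toRoman with inputs chosen, black-box fashion, to exercise each symbol, the repetition rules, and the subtraction rules. The helper method check is an assumption made here, not part of the exercise:

static void check(int n, String expected)
{
  String actual = toRoman(n);
  if (!expected.equals(actual))
    System.out.println("toRoman(" + n + ") = " + actual + "; expected " + expected);
}

static void testToRoman()
{
  check(1, "I");     check(4, "IV");    check(9, "IX");
  check(14, "XIV");  check(40, "XL");   check(49, "XLIX");
  check(90, "XC");   check(400, "CD");  check(900, "CM");
  check(2008, "MMVIII"); check(1998, "MCMXCVIII"); check(3999, "MMMCMXCIX");
}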
13. Consider a method int fromRoman(String s) with this specification: The method
checks that string s is a well-formed Roman numeral according to the rules in Exercise 12, and if so, returns the corresponding number; otherwise throws an exception.
Use black-box techniques to construct a test suite for this method. Remember to include
also some ill-formed Roman numerals.
Usability testing
From Wikipedia, the free encyclopedia
Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on
users. This can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the
system.[1] This is in contrast with usability inspection methods where experts use different methods to evaluate a
user interface without involving users.
Usability testing focuses on measuring a human-made product's capacity to meet its intended purpose. Examples
of products that commonly benefit from usability testing are foods, consumer products, web sites or web
applications, computer interfaces, documents, and devices. Usability testing measures the usability, or ease of
use, of a specific object or set of objects, whereas general human-computer interaction studies attempt to
formulate universal principles.
Contents
1 History of usability testing
2 Goals of usability testing
3 What usability testing is not
4 Methods
4.1 Hallway testing
4.2 Remote Usability Testing
4.3 Expert review
4.4 Automated expert review
5 How many users to test?
6 See also
7 References
8 External links
between doing it and having it be the core of what engineers focus on.[5]
Methods
Setting up a usability test involves carefully creating a scenario, or realistic situation, wherein the person performs
a list of tasks using the product being tested while observers watch and take notes. Several other test
instruments such as scripted instructions, paper prototypes, and pre- and post-test questionnaires are also used
to gather feedback on the product being tested. For example, to test the attachment function of an e-mail
program, a scenario would describe a situation where a person needs to send an e-mail attachment, and ask him
or her to undertake this task. The aim is to observe how people function in a realistic manner, so that developers
can see problem areas, and what people like. Techniques popularly used to gather data during a usability test
include think aloud protocol, Co-discovery Learning and eye tracking.
Hallway testing
Hallway testing (or Hall Intercept Testing) is a general methodology of usability testing. Rather than using an
in-house, trained group of testers, just five to six random people are brought in to test the product, or service.
The name of the technique refers to the fact that the testers should be random people who pass by in the
hallway.[10]
Hallway testing is particularly effective in the early stages of a new design when the designers are looking for
"brick walls," problems so serious that users simply cannot advance. Anyone of normal intelligence other than
designers and engineers can be used at this point. (Both designers and engineers immediately turn from being
test subjects into being "expert reviewers." They are often too close to the project, so they already know how to
accomplish the task, thereby missing ambiguities and false paths.)
Expert review
Expert review is another general method of usability testing. As the name suggests, this method relies on
bringing in experts with experience in the field (possibly from companies that specialize in usability testing) to
evaluate the usability of a product.
Automated expert review
Similar to expert reviews, automated expert reviews provide usability testing through the use of programs that are given rules for good design and heuristics. Though an automated review might not provide as much detail and insight as reviews from people, it can be finished more quickly and consistently. The idea of creating surrogate users for usability testing is an ambitious direction for the Artificial Intelligence community.
How many users to test?
In the model of Nielsen and Landauer, the number of usability problems found with n test subjects is N(1 − (1 − p)^n), where p is the probability of one subject identifying a specific problem, n the number of subjects (or test sessions), and N the total number of problems. For example, with Nielsen's estimate of p = 0.31, five subjects would find about 1 − 0.69^5 ≈ 84% of the problems. The model yields an asymptotic curve that approaches the number of real existing problems.
In later research Nielsen's claim has eagerly been questioned with both empirical evidence[17] and more
advanced mathematical models.[18] Two key challenges to this assertion are:
1. Since usability is related to the specific set of users, such a small sample size is unlikely to be
representative of the total population so the data from such a small sample is more likely to reflect the
See also
ISO 9241
Software testing
Educational technology
Universal usability
Commercial eye tracking
Don't Make Me Think
Software performance testing
System Usability Scale (SUS)
Test method
Tree testing
RITE Method
Component-Based Usability Testing
Crowdsource testing
Usability goals
References
1. ^ Nielsen, J. (1994). Usability Engineering, Academic Press Inc, p 165
2. ^ NN/G Usability Week 2011 Conference "Interaction Design" Manual, Bruce Tognazzini, Nielsen Norman
Group, 2011
3. ^ http://interactions.acm.org/content/XV/baecker.pdf
4. ^ http://books.google.com/books?id=lRs_4U43UcEC&printsec=frontcover&sig=ACfU3U1xvA7-f80TP9Zqt9wkB9adVAqZ4g#PPA22,M1
5. ^ http://news.zdnet.co.uk/itmanagement/0,1000000308,2065537,00.htm
6. ^ a b International Standardization Organization. Ergonomics of human system interaction - Part 210: Human
centred design for interactive systems (Rep N9241-210). 2010, International Standardization Organization
7. ^ Nielsen, Usability Engineering, 1994
8. ^ Mayhew. The usability engineering lifecycle: a practitioner's handbook for user interface design. London,
Academic press; 1999
9. ^ http://jerz.setonhill.edu/design/usability/intro.htm
10. ^ a b "Usability Testing with 5 Users (Jakob Nielsen's Alertbox)"
(http://www.useit.com/alertbox/20000319.html) . useit.com. 13.03.2000.
http://www.useit.com/alertbox/20000319.html.; references Jakob Nielsen, Thomas K. Landauer (April 1993).
"A mathematical model of the finding of usability problems" (http://dl.acm.org/citation.cfm?
id=169166&CFID=159890676&CFTOKEN=16006386) . Proceedings of ACM INTERCHI'93 Conference
(Amsterdam, The Netherlands, 24-29 April 1993). http://dl.acm.org/citation.cfm?
id=169166&CFID=159890676&CFTOKEN=16006386.
11. ^ Andreasen, Morten Sieker; Nielsen, Henrik Villemann; Schrøder, Simon Ormholt; Stage, Jan (2007). "What
happened to remote usability testing?". Proceedings of the SIGCHI conference on Human factors in computing
systems - CHI '07. p. 1405. doi:10.1145/1240624.1240838 (http://dx.doi.org/10.1145%2F1240624.1240838) .
ISBN 9781595935939.
12. ^ Dray, Susan; Siegel, David (2004). "Remote possibilities?". Interactions 11 (2): 10.
doi:10.1145/971258.971264 (http://dx.doi.org/10.1145%2F971258.971264) .
13. ^ http://www.boxesandarrows.com/view/remote_online_usability_testing_why_how_and_when_to_use_it
14. ^ Dray, Susan; Siegel, David (March 2004). "Remote possibilities?: international usability testing at a distance".
Interactions 11 (2): 1017. doi:10.1145/971258.971264 (http://dx.doi.org/10.1145%2F971258.971264) .
15. ^ Chalil Madathil, Kapil; Joel S. Greenstein (May 2011). "Synchronous remote usability testing: a new approach
facilitated by virtual worlds". Proceedings of the 2011 annual conference on Human factors in computing
systems. CHI '11: 2225-2234. doi:10.1145/1978942.1979267 (http://dx.doi.org/10.1145%2F1978942.1979267)
. ISBN 9781450302289.
16. ^ Virzi, R.A., Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough? Human Factors,
1992. 34(4): p. 457-468.
17. ^ http://citeseer.ist.psu.edu/spool01testing.html
18. ^ Caulton, D.A., Relaxing the homogeneity assumption in usability testing. Behaviour & Information
Technology, 2001. 20(1): p. 1-7
19. ^ Schmettow, Heterogeneity in the Usability Evaluation Process. In: M. England, D. & Beale, R. (ed.),
Proceedings of the HCI 2008, British Computing Society, 2008, 1, 89-98
20. ^ Bruce Tognazzini. "Maximizing Windows" (http://www.asktog.com/columns/000maxscrns.html) .
http://www.asktog.com/columns/000maxscrns.html.
External links
Usability.gov (http://www.usability.gov/)
A Brief History of the Magic Number 5 in Usability Testing
(http://www.measuringusability.com/blog/five-history.php)
Usability Testing
When to Test
You should test early and test often. Usability testing lets the design
and development teams identify problems before they get coded (i.e.,
"set in concrete). The earlier those problems are found and fixed, the
less expensive the fixes are.
No Lab Needed
You DO NOT need a formal usability lab to do testing. You can do
effective usability testing in any of these settings:
a fixed laboratory having two or three connected rooms outfitted
with audio-visual equipment
a conference room, or the user's home or work space, with portable
recording equipment
a conference room, or the user's home or work space, with no
recording equipment, as long as someone is observing the user and
taking notes
remotely, with the user in a different location
Cost
Cost depends on the size of the site, how much you need to test, how
many different types of participants you anticipate having, and how
formal you want the testing to be. Remember to budget for more than
one usability test. Building usability into a Web site (or any product) is an iterative process.
Time
You will need time to plan the usability test. It will take the usability specialist and the team time to familiarize themselves with the site and do dry runs with scenarios. Budget for the time it takes to test users and for analyzing the data, writing the report, and discussing the findings.
Recruiting Costs
Recruiting costs include the time of an in-house person or payment to a recruiting firm. Developing a user database makes recruiting, whether done in-house or by a firm, less time-consuming and cheaper. Also allow for the cost of paying or providing gifts for the participants.
Rental Costs
If you do not have equipment, you will have to budget for rental costs
for the lab or other equipment.
Usability Testing
There are two major considerations when
conducting usability testing. The first is to ensure that the best possible
method for testing is used. Generally, the best method is to conduct
a test where representative participants interact with representative
scenarios. The tester collects data on the participants' success, speed of performance, and satisfaction. The findings, including both quantitative data and qualitative observations, are provided to designers in a test report. Using inspection evaluations in place of well-controlled usability tests must be done with caution. Inspection methods, such as
heuristic evaluations or expert reviews, tend to generate large numbers
of potential usability problems that never turn out to be actual usability
problems.
The second major consideration is to ensure that an iterative approach
is used. After the first test results are provided to designers, they should
make changes and then have the Web site tested again. Generally, the
more iterations, the better the Web site.
18:1 Use an Iterative Design Approach
Guideline: Use an iterative design approach to create the most useful and usable Web site.
Sources: Badre, 2002; Bailey, 1993; Bailey and Wolfson, 2005; Bradley and Johnk, 1995; Egan, et al., 1989; Hong, et al., 2001; Jeffries, et al., 1991; Karat, Campbell, and Fiegel, 1992; LeDoux, Connor and Tullis, 2005; Norman and Murphy, 2004; Redish and Dumas, 1993; Tan, et al., 2001.
18:3 Prioritize Tasks
When deciding which usability problems to fix first, address the tasks that users believe to be easy but are actually difficult. The Usability Magnitude Estimation (UME) is a measure that can be used to assess user expectations of the difficulty of each task. Participants judge how difficult or easy a task will be before trying to do it, and then make a second judgment after trying to complete the task. Each task is eventually put into one of four categories based on these expected versus actual ratings, one of which is tasks that were expected to be easy and were actually easy.
18:6 Select the Right Number of Participants
Guideline: Select the right number of participants.
Sources: Bailey, 1996; Bailey, 2000c; Bailey, 2000d; Brinck and Hofer, 2002; Chin, 2001; Dumas, 2001; Gray and Salzman, 1998; Lewis, 1993; Lewis, 1994; Nielsen and Landauer, 1993; Perfetti and Landesman, 2001; Virzi, 1990; Virzi, 1992.
18:7 Use the Appropriate Prototyping Technology
Guideline: Create prototypes using technology appropriate for the phase of the design, the required fidelity of the prototype, and the skill of the person creating the prototype.
Sources: Sefelin, Tscheligi and Giller, 2003; Silvers, Voorheis and Anders, 2004; Walker, Takayama and Landay, 2002.
It is best to perform iterative cycles of usability testing over the course of the Web site's development. This enables usability specialists and designers to observe and listen to many users.
18:8 Use Inspection Evaluation Results Cautiously
Guideline: Use inspection evaluation results with caution.
Inspection evaluations include heuristic evaluations, expert reviews, and cognitive walkthroughs. It is a common practice to conduct an inspection evaluation to try to detect and resolve obvious problems before conducting usability tests. Inspection evaluations should be used cautiously because several studies have shown that they appear to detect far more potential problems than actually exist, and they also tend to miss some real problems. On average, for every hit there will be about 1.3 false positives and 0.5 misses.
Another recent study concluded that the low effectiveness of heuristic
evaluations as a whole was worrisome because of the low problem detection
rate (p=.09), and the large number of evaluators required (16) to uncover
seventy-five percent of the potential usability issues.
Another difficulty when conducting heuristic evaluations is that evaluators frequently apply the wrong heuristic, which can mislead designers who are trying to fix the problem. One study reported that only thirty-nine percent of the heuristics were appropriately applied.
Evaluators seem to have the most success identifying usability issues that can be
seen by merely looking at the display, and the least success finding issues that
require users to take several steps (clicks) to a target.
Heuristic evaluations and expert reviews may best be used to identify potential
usability issues to evaluate during usability testing. To improve somewhat
on the performance of heuristic evaluations, evaluators can use the usability
problem inspector (UPI) method or the Discovery and Analysis Resource
(DARe) method.
Sources: Andre, Hartson and Williges, 2003; Bailey, Allen and Raiello, 1992;
Catani and Biers, 1998; Cockton and Woolrych 2001; Cockton and Woolrych,
2002; Cockton, et al., 2003; Fu, Salvendy and Turley, 1998; Fu, Salvendy and
Turley, 2002; Law and Hvannberg, 2002; Law and Hvannberg, 2004; Nielsen
and Landauer, 1993; Nielsen and Mack, 1994; Rooden, Green and Kanis,
1999; Stanton and Stevenage, 1998; Virzi, Sorce and Herbert, 1993; Wang and
Caldwell, 2002.
18:9 Recognize the Evaluator Effect
Guideline: Recognize that different evaluators tend to identify different usability problems when conducting inspection evaluations.
Sources: Hertzum and Jacobsen, 2001; Jacobsen, Hertzum and John, 1998; Molich, et al., 1998; Molich, et al., 1999; Nielsen and Molich, 1990; Nielsen, 1992; Nielsen, 1993; Redish and Dumas, 1993; Selvidge, 2000.

18:10 Apply Automatic Evaluation Methods
An automatic evaluation method is
one where software is used to evaluate a Web
site. An automatic evaluation tool can help find
certain types of design difficulties, such as pages
that will load slowly, missing links, use of jargon, potential accessibility
problems, etc. While automatic evaluation methods are useful, they should
not be used as a substitute for evaluations or usability testing with typical
users. There are many commercially available automatic evaluation methods
available for checking on a variety of Web site parameters.
Sources: Brajnik, 2000; Campbell and Stanley, 1963; Gray and Salzman, 1998;
Holleran, 1991; Ivory and Hearst, 2002; Ramey, 2000; Scholtz, 1998; World
Wide Web Consortium, 2001.
18:11 Use Cognitive Walkthroughs Cautiously
Guideline: Use cognitive walkthroughs with caution.
Sources: Hassenzahl, 2000; Jacobsen and John, 2000; Jeffries and Desurvire, 1992; John and Mashyna, 1997; Karat, 1994b; Karat, Campbell and Fiegel, 1992; Spencer, 2000.
18:13 Use Severity Ratings Cautiously
Guideline: Use severity ratings with caution.
Sources: Bailey, 2005; Catani and Biers, 1998; Cockton and Woolrych, 2001; Dumas, Molich and Jeffries, 2004; Hertzum and Jacobsen, 2001; Jacobsen, Hertzum and John, 1998; Law and Hvannberg, 2004; Molich, 2005.
Usability Testing Basics
An Overview
Contents
Usability Testing Defined
Decide What to Test
Determine When to Test What
Decide How Many to Test
Design the Test
  Consider the Where, When, and How
  Scenarios and Tasks
  Prepare to Measure the Experience
  Select Data to Capture
Recruit Participants
  Recruitment Ideas
  Compensation
Prepare for Test Sessions
  Setting
  Schedule Participants
  Stakeholders
  Observers
  Script
  Questionnaires and Surveys
Conduct Test Sessions
  Begin with a Run-through
  At the Test Session
  Facilitation
  After the Session
Analyze Your Study
  Step 1: Identify exactly what you observed
  Step 2: Identify the causes of any problems
  Step 3: Determine Solutions
Deliverables
Appendix A: Participant Recruitment Screener
  Recruitment Script
Participants:
Usability Goals:
Key Points:
Timeline: The timeline for testing: when the product or prototype will be ready for testing, when the team would like to discuss the results, or any other constraints
Additional Information:
Be sure to identify user goals and needs as well. With this information you can then develop scenarios and
tasks for participants to perform that will help identify where the team can make improvements.
For example:
Who uses (or would use) the product?
What are their goals for using the product?
What tasks would those people want to or have to accomplish to meet those goals?
Are there design elements that cause problems and create a lot of support calls?
Are you interested in finding out if a new product feature makes sense to current users?
BASELINE TESTING: 8-24 users; conducted once, before a design project begins or early in development; used to establish baseline metrics.
DIAGNOSTIC (FORMATIVE) EVALUATION: 4-6 users; conducted iteratively, during design; metrics and measures are less formal, with an increased focus on qualitative data.
SUMMATIVE TESTING: 6-12+ users; conducted once, at the end of the process; more formal, with metrics based on usability goals.
WHAT'S MEASURED: Task Success, Time on Task, Errors, Learnability, Satisfaction, Mouse Clicks, Mouse Movement, Problem/Issue Counts, Optimal Path, or make your own (unlimited).
Recruit Participants
Recruiting is one of the most important components of a usability test. Your participants should adequately reflect your true base of users and the user types you have decided to test, and represent a range of new and experienced users who would actually use your product.

Recruitment Ideas
Use your own customer databases or contacts
Hire an outside agency: look for market research firms if there are none specializing in usability recruiting; good screeners are vital, and there is a cost per candidate
Post on Craigslist: don't identify your company, just the qualifications
Post something on your web site: start a usability testing page where site visitors can sign up to participate
Place an ad in the paper: good for local audiences

When doing your own recruiting you should identify criteria that will help you select qualified candidates. Experience with the product or the field, computer experience, age, and other demographics may be important to consider. See Appendix A: Participant Recruitment Screener.
When recruiting using an outside recruiting firm, a screener helps you get the right participants. A recruiting screener is used to determine if a potential participant matches the user characteristics defined in the usability test protocol. Include questions about demographics, frequency of use, experience level, etc.
Ask questions that will help you filter out participants that don't match your target users, and indicate when to thank people for their time and let them know that they do not qualify. Ask enough questions so that you know you have the right people. For example, qualified participants for a test of an online shopping site should have access to a computer at home or at work and meet the other required demographics (age range, etc.).
Compensation
You will need to think about what kind of compensation you will offer participants. Typically participants get cash or an equivalent, such as a gift certificate or merchandise from your company.
Setting
The most important factors are that the environment be comfortable for participants, similar to their real-world
environment, and reasonably similar between participants.
Schedule Participants
Schedule your participants to have adequate time to work through the test at their own pace. Allow enough
time between sessions to reset, debrief and regroup.
Stakeholders
When working with your stakeholders, help them understand how you will conduct your testing. Stakeholders need
to understand how you will be interacting with your participant during test sessions. They need to understand
that you are there to facilitate the test and observe behavior, not help the participant complete tasks. There
are two basic models:
Facilitator interacts with the participant: you often get more qualitative information, especially when the facilitator is good at asking neutral questions and encouraging participants to find their own answers in the product.
Facilitator does not interact with the participant: you can get more natural behavior, but participants are left to struggle or quit on their own. You often will not get as much qualitative data, as participants may not talk out loud as much. You may get more accurate measures of time on task and failure, however.
Observers
At least one person can be enlisted to help you log all of your recordings for the data points you've
set out. By having someone else log the sessions, the facilitator can concentrate on the test. At the
same time, the recording will capture a rich set of data for later analysis.
In addition to a designated person to help you observe and log data, there may be a long list of
stakeholders who will benefit from observing a test. They commonly include the developers,
managers, product managers, quality testing analysts, sales and marketing staff, technical support
and documentation.
Watching users actually struggle with a product is a powerful experience. Observing test sessions helps make
the team open to making changes.
Remember to warn your observers not to drive changes until you and the team have had an opportunity to
analyze all testing and decide upon changes that will address the root causes.
Script
Create a facilitator script to help you and your facilitators present a consistent set of information to your
participants. The script will also serve as a reminder to you to say certain things to your participants and
provide them appropriate paper work at the right times.
In your script, remind participants that the usability test is an evaluation of the product and not of their
performance, that all problems they find are helpful and that their feedback is valuable, and let participants
know their data will be aggregated with the rest of the participant data and they will not be identified.
Use the script to help you remember what you need to do and say
Ask participants to fill out the consent form (include a non-disclosure agreement if your company
requires one)
Remember your facilitation skills and start the test
Allow enough time between test sessions to set up equipment and prepare for your next participant
See Usability Testing and Morae for details on using Morae when conducting your test sessions.
Facilitation
Technique matters: An impartial facilitator conducts the test without influencing the participant. The facilitator
keeps the test flowing, provides simple directions, and keeps the participant focused. The facilitator may be
located near the participant or in another room with an intercom system. Often, participants are asked to keep
a running narration (called the think-aloud protocol) and the facilitator must keep the participant talking.
Deliverables
Deliverables (reports, presentations, highlight videos, and so on) document what was done for future
reference. They often detail the usability problems found during the test plus any other data such as time on
task, error rate, satisfaction, etc. The Usability Test Report on the Morae Resource CD is one template you
might use to report results. Generally speaking, reports or presentations will include:
Summary
Description of the product and the test objectives
Method
Participants
Context of the test
Tasks
Testing environment and equipment
Experiment design
What was measured (Metrics)
Results and findings
How participants fared (Graphs and tables)
Why they might not have done well (or why they did do well)
Resources
A Practical Guide to Usability Testing, Revised Edition, by Joe Dumas and Ginny Redish, Intellect, 1999
References
Usability and Accessibility STEC Workshop 2008, Whitney Quesenbery
Recommendations on Recommendations, Rolf Molich, Kasper Hornbaek, Steve Krug, Jeff Johnson,
Josephine Scott, 2008, accepted for publication in User Experience Magazine, issue 7.4, October 2008
NUMBER    CHARACTERISTICS
          Experienced product users
          New product users
Participation: All participants will spend about 60 minutes in the usability session. The incentive will be $50 in cash.
Schedule: The usability tests will be conducted from May 5-7, 2008. Use the schedule of available testing time slots to schedule individual participants once they have passed the recruitment screener.
AVAILABLE TIME SLOTS    TUES. MAY 5    WED. MAY 6    THURS. MAY 7
9-10 am
10:30-11:30 am
1-2 pm
2:30-3:30 pm
4-5 pm
Recruitment Script
Introduction
Hello, may I speak with ________? We are looking for participants to take part in a research study evaluating the usability of the X Product. There will be $50 cash in compensation for the hour-long session, which will take place at the X Building, located downtown. The session would involve a one-on-one meeting with a researcher where you would sit down in front of a computer and try to use a product while being observed and answering questions about the product.
Would you be interested in participating?
If not: Thank you for taking the time to speak with me. If you know of anyone else who might be
interested in participating please have them call me, [Name], at 555-1234.
Screening
I need to ask you a couple of questions to determine whether you meet the eligibility criteria. Do you have a couple of minutes?
If not: When is a good time to call back?
Keep in mind that your answers to these questions do not automatically allow or disallow you to take part in the study; we just need accurate information about your background, so please answer as well as you can.
Have you ever used X product?
If yes:
How long have you used it for? [criteria: at least 1 yr.]
And how often do you use it? [criteria: at least 3 times a month]
If no:
Have you ever used any data processing products, such as [list competitor or similar products]?
[criteria: Yes]
If yes: How long have you used it for? [criteria: at least 1 yr.]
And how often do you use it? [criteria: at least 3 times a month]
Identify participant gender via voice, name, and other cues.
Scheduling
If participant meets criteria: Will you be able to come to the X Building located downtown for one hour between May 5 and 7? Free parking is available next to the building.
How is [name available times and dates]?
You will be participating in a one-on-one usability test session on [date and time]. Do you require any special
accommodations?
I need to have an e-mail address to send specific directions and confirmation information to. Thanks again!
If participant does not meet criteria: Unfortunately, you do not fit the criteria for this particular evaluation and
will not be able to participate. Thank you for taking the time to speak with me.
The screener questions in this script can be used in an e-mail for written recruitment.
Chapter 13
Functional Testing
A functional specification is a description of intended program1 behavior, distinct from the program itself. Whatever form the functional specification takes, whether formal or informal, it is the most important source of information for designing tests. The set of activities for deriving test case specifications from program specifications is called functional testing.
Functional testing, or more precisely, functional test case design, attempts
to answer the question "What test cases shall I use to exercise my program?"
considering only the specification of a program and not its design or implementation structure. Being based on program specifications and not on the
internals of the code, functional testing is also called specification-based or
black-box testing.
Functional testing is typically the base-line technique for designing test
cases, for a number of reasons. Functional test case design can (and should)
begin as part of the requirements specification process, and continue through
each level of design and interface specification; it is the only test design technique with such wide and early applicability. Moreover, functional testing is
effective in finding some classes of fault that typically elude so-called white-box or glass-box techniques of structural or fault-based testing. Functional testing techniques can be applied to any description of program behavior, from an informal partial description to a formal specification, and at
any level of granularity, from module to system testing. Finally, functional
test cases are typically less expensive to design and execute than white-box
tests.
1 In this chapter we use the term program generically for the artifact under test, whether
that artifact is a complete application or an individual unit together with a test harness. This is
consistent with usage in the testing research literature.
13.1 Overview
In testing and analysis aimed at verification2, that is, at finding any discrepancies between what a program does and what it is intended to do, one must obviously refer to requirements as expressed by users and specified by software engineers. A functional specification, i.e., a description of the expected behavior of the program, is the primary source of information for test case specification.
Functional testing, also known as black-box or specification-based testing, denotes techniques that derive test cases from functional specifications.
Usually functional testing techniques produce test case specifications that identify classes of test cases and can be instantiated to produce individual test cases.
A particular functional testing technique may be effective only for some
kinds of software or may require a given specification style. For example,
a combinatorial approach may work well for functional units characterized
by a large number of relatively independent inputs, but may be less effective for functional units characterized by complex interrelations among inputs. Functional testing techniques designed for a given specification notation, e.g., finite state machines or grammars, are not easily applicable to other
specification styles.
The core of functional test case design is partitioning the possible behaviors of the program into a finite number of classes that can reasonably be expected to be consistently correct or incorrect. In practice, the test case designer often must also complete the job of formalizing the specification far
enough to serve as the basis for identifying classes of behaviors. An important side effect of test design is highlighting weaknesses and incompleteness
of program specifications.
Deriving functional test cases is an analytical process which decomposes
specifications into test cases. The myriad of aspects that must be taken into
2 Here we focus on software verification as opposed to validation (see Chapter 2). The problems of validating the software and its specifications, i.e., checking the program behavior and its specifications with respect to the users' expectations, are treated in Chapter 12.
Test cases and test suites can be derived from several sources of information, including specifications (functional testing), detailed design and source code (structural testing), and hypothesized defects (fault-based testing). Functional test case design is an indispensable base of a good test suite, complemented but never replaced by structural and fault-based testing, because there are classes of faults that only functional testing effectively detects. Omission of a feature, for example, is unlikely to be revealed by techniques that refer only to the code structure.
Consider a program that is supposed to accept files in either plain ASCII text, or
HTML, or PDF formats and generate standard PostScript. Suppose the programmer overlooks the PDF functionality, so the program accepts only plain text and HTML files. Intuitively, a functional testing criterion would require at least one test case for each item in
the specification, regardless of the implementation, i.e., it would require the program to
be exercised with at least one ASCII, one HTML, and one PDF file, thus easily revealing
the failure due to the missing code. In contrast, a criterion based solely on the code would not require the program to be exercised with a PDF file, since all of the code can be exercised without attempting to use that feature. Similarly, fault-based techniques, based on potential faults in design or coding, would not have any reason to indicate a PDF file as a potential input even if the missing case were included in the catalog of potential faults.
A functional specification often addresses semantically rich domains, and we can use
domain information in addition to the cases explicitly enumerated in the program specification. For example, while a program may manipulate a string of up to nine alphanumeric characters, the program specification may reveal that these characters represent a
postal code, which immediately suggests test cases based on postal codes of various localities. Suppose the program logic distinguishes only two cases, depending on whether the code is found in a table of U.S. zip codes. A structural testing criterion would require
testing of valid and invalid U.S. zip codes, but only consideration of the specification and
richer knowledge of the domain would suggest test cases that reveal missing logic for
distinguishing between U.S.-bound mail with invalid U.S. zip codes and mail bound to
other countries.
Functional testing can be applied at any level of granularity where some form of specification is available, from overall system testing to individual units, although the level of
granularity and the type of software influence the choice of the specification styles and
notations, and consequently the functional testing techniques that can be used.
In contrast, structural and fault-based testing techniques are invariably tied to program structures at some particular level of granularity, and do not scale much beyond
that level. The most common structural testing techniques are tied to fine-grain program structures (statements, classes, etc.) and are applicable only at the level of modules
or small collections of modules (small subsystems, components, or libraries).
account during functional test case specification makes the process error-prone.
Even expert test designers can miss important test cases. A methodology for
functional test design systematically helps by decomposing the functional
test design activity into elementary steps that cope with a single aspect of the
process. In this way, it is possible to master the complexity of the process and
separate human intensive activities from activities that can be automated.
Systematic processes amplify but do not substitute for the skills and experience of the test designers.
In a few cases, functional testing can be fully automated. This is possible
for example when specifications are given in terms of some formal model,
e.g., a grammar or an extended state machine specification. In these (exceptional) cases, the creative work is performed during specification and design
of the software. The test designer's job is then limited to the choice of the test selection criteria, which define the strategy for generating test case specifications. In most cases, however, functional test design is a human-intensive
activity. For example, when test designers must work from informal specifications written in natural language, much of the work is in structuring the
specification adequately for identifying test cases.
13.2 Random versus Partition Testing Strategies
With few exceptions, the number of potential test cases for a given program is unimaginably huge, so large that for all practical purposes it can be considered infinite. For example, even a simple function whose input arguments are two 32-bit integers has 2^64 (more than 10^19) legal inputs. In contrast to input spaces, budgets and schedules are finite, so any practical method for testing must select an infinitesimally small portion of the complete input space.
Some test cases are better than others, in the sense that some reveal faults
and others do not.3 Of course, we cannot know in advance which test cases
reveal faults. At a minimum, though, we can observe that running the same
test case again is less likely to reveal a fault than running a different test case,
and we may reasonably hypothesize that a test case that is very different from
the test cases that precede it is more valuable than a test case that is very
similar (in some sense yet to be defined) to others.
As an extreme example, suppose we are allowed to select only three test
cases for a program that breaks a text buffer into lines of 60 characters each.
Suppose the first test case is a buffer containing 40 characters, and the second
is a buffer containing 30 characters. As a final test case, we can choose a buffer
containing 16 characters or a buffer containing 100 characters. Although we
cannot prove that the 100 character buffer is the better test case (and it might
not be; the fact that 16 is a power of 2 might have some unforeseen significance), we are naturally suspicious of a set of tests which is strongly biased
toward lengths less than 60.
3 Note that the relative value of different test cases would be quite different if our goal were to
measure dependability, rather than finding faults so that they can be repaired.
While the informal meanings of words like test may be adequate for everyday conversation, in this context we must try to use terms in a more precise and consistent manner. Unfortunately, the terms we will need are not always used consistently in the literature, despite the existence of an IEEE standard that defines several of them. The terms
we will use are defined below.
Independently testable feature (ITF): An ITF is a functionality that can be tested independently of other functionalities of the software under test. It need not correspond
to a unit or subsystem of the software. For example, a file sorting utility may be capable of merging two sorted files, and it may be possible to test the sorting and
merging functionalities separately, even though both features are implemented by
much of the same source code. (The nearest IEEE standard term is test item.)
As functional testing can be applied at many different granularities, from unit testing through integration and system testing, so ITFs may range from the functionality of an individual Java class or C function up to features of an integrated system
composed of many complete programs. The granularity of an ITF depends on the
exposed interface at whichever granularity is being tested. For example, individual
methods of a class are part of the interface of the class, and a set of related methods
(or even a single method) might be an ITF for unit testing, but for system testing the
ITFs would be features visible through a user interface or application programming
interface.
Test case: A test case is a set of inputs, execution conditions, and expected results. The
term input is used in a very broad sense, which may include all kinds of stimuli
that contribute to determining program behavior. For example, an interrupt is as
much an input as is a file. (This usage follows the IEEE standard.)
Test case specification: The distinction between a test case specification and a test case
is similar to the distinction between a program and a program specification. Many
different test cases may satisfy a single test case specification. A simple test specification for a sorting method might require an input sequence that is already in
sorted order. A test case satisfying that specification might be sorting the particular
vector (alpha, beta, delta). (This usage follows the IEEE standard.)
Test suite: A test suite is a set of test cases. Typically, a method for functional testing
is concerned with creating a test suite. A test suite for a program, a system, or an
individual unit may be made up of several test suites for individual ITFs. (This usage
follows the IEEE standard.)
Test: We use the term test to refer to the activity of executing test cases and evaluating
their result. When we refer to a test, we mean execution of a single test case, except where context makes it clear that the reference is to execution of a whole test
suite. (The IEEE standard allows this and other definitions.)
Accidental bias may be avoided by choosing test cases from a random distribution. Random sampling is often an inexpensive way to produce a large
number of test cases. If we assume absolutely no knowledge on which to
place a higher value on one test case than another, then random sampling
maximizes value by maximizing the number of test cases that can be created
(without bias) for a given budget. Even if we do possess some knowledge suggesting that some cases are more valuable than others, the efficiency of random sampling may in some cases outweigh its inability to use any knowledge
we may have.
Consider again the line-break program, and suppose that our budget is
one day of testing effort rather than some arbitrary number of test cases. If the
cost of random selection and actual execution of test cases is small enough,
then we may prefer to run a large number of random test cases rather than
expending more effort on each of a smaller number of test cases. We may in
a few hours construct programs that generate buffers with various contents
and lengths up to a few thousand characters, as well as an automated procedure for checking the program output. Letting it run unattended overnight,
we may execute a few million test cases. If the program does not correctly
handle a buffer containing a sequence of more than 60 non-blank characters
(a single word that does not fit on a line), we are likely to encounter this
case by sheer luck if we execute enough random tests, even without having
explicitly considered this case.
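To make the idea concrete, here is a minimal Java sketch of such an overnight harness. Everything in it is an assumption for illustration: breakLines is a naive stand-in for the program under test (it never splits a single word longer than the limit), and check is a simple automated oracle. Run long enough, the random generator produces a word of more than 60 characters and the oracle reports the failure, just as the text predicts.

    import java.util.Random;

    public class RandomLineBreakTest {
        static final int LIMIT = 60;

        // Naive greedy stand-in for the program under test: it wraps between
        // words but never splits a single word longer than LIMIT.
        static String breakLines(String buffer) {
            StringBuilder out = new StringBuilder();
            int lineLen = 0;
            for (String word : buffer.split(" ")) {
                if (lineLen > 0 && lineLen + 1 + word.length() > LIMIT) {
                    out.append('\n');
                    lineLen = 0;
                } else if (lineLen > 0) {
                    out.append(' ');
                    lineLen++;
                }
                out.append(word);
                lineLen += word.length();
            }
            return out.toString();
        }

        // Automated oracle: every output line fits the limit, and no
        // non-blank characters were lost or reordered.
        static boolean check(String input, String output) {
            for (String line : output.split("\n", -1))
                if (line.length() > LIMIT) return false;
            return input.replace(" ", "")
                        .equals(output.replace(" ", "").replace("\n", ""));
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int i = 0; i < 1_000_000; i++) {
                StringBuilder buf = new StringBuilder();
                int words = rnd.nextInt(50);
                for (int w = 0; w < words; w++) {
                    if (w > 0) buf.append(' ');
                    int len = 1 + rnd.nextInt(80);   // occasionally longer than LIMIT
                    for (int k = 0; k < len; k++)
                        buf.append((char) ('a' + rnd.nextInt(26)));
                }
                String in = buf.toString();
                if (!check(in, breakLines(in))) {
                    System.out.println("Failure on input of length " + in.length());
                    return;
                }
            }
            System.out.println("No failures found");
        }
    }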
Even a few million test cases is an infinitesimal fraction of the complete
input space of most programs. Large numbers of random tests are unlikely
to find failures at single points (singularities) in the input space. Consider,
for example, a simple procedure for returning the two roots of a quadratic
equation ax^2 + bx + c = 0 (Figure 13.1).
Figure 13.1: The Java class roots, which finds roots of a quadratic equation. The case analysis in the implementation is incomplete: it does not properly handle the case in which b^2 - 4ac = 0 and a = 0. We cannot anticipate all such faults, but experience teaches that boundary values identifiable in a specification are disproportionately valuable. Uniform random generation of even large numbers of test cases is ineffective at finding the fault in this program, but selection of a few special values based on the specification quickly uncovers it.
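Since the listing itself is not reproduced above, the following Java sketch is a reconstruction, invented to illustrate the same fault class rather than to reproduce the book's figure. The case analysis looks plausible but silently assumes a != 0:

    class Roots {
        int numRoots;           // number of real roots found
        double root1, root2;    // the roots, when they exist

        // Find the real roots of a*x^2 + b*x + c = 0.
        // Faulty case analysis: when a == 0 the equation is linear, but every
        // branch below divides by 2*a. With a == 0 and b != 0, disc = b*b > 0,
        // so the result is +/-Infinity instead of the single root -c/b.
        void roots(double a, double b, double c) {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                numRoots = 0;                         // no real roots
            } else if (disc == 0) {
                numRoots = 1;
                root1 = root2 = -b / (2 * a);         // fails when a == 0
            } else {
                numRoots = 2;
                root1 = (-b + Math.sqrt(disc)) / (2 * a);
                root2 = (-b - Math.sqrt(disc)) / (2 * a);
            }
        }
    }

Test inputs drawn uniformly from pairs of random doubles will almost never set a to exactly 0, while a designer reading the specification tries the boundary a = 0 immediately.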
13.3 A Systematic Approach

[. . . ]
different steps. Although most techniques are presented and applied as stand
alone methods, it is also possible to mix and match steps from different techniques, or to apply different methods for different parts of the system to be
tested.
Identify Independently Testable Features Functional specifications can be
large and complex. Usually, complex specifications describe systems that can
be decomposed into distinct features. For example, the specification of a web
site may include features for searching the site database, registering users' profiles, getting and storing information provided by the users in different
forms, etc. The specification of each of these features may comprise several
functionalities. For example, the search feature may include functionalities
for editing a search pattern, searching the database with a given pattern,
and so on. Although it is possible to design test cases that exercise several
functionalities at once, the design of different tests for different functionalities can simplify the test generation problem, allowing each functionality to
be examined separately. Moreover, it eases locating faults that cause the revealed failures. It is thus recommended to devise separate test cases for each
functionality of the system, whenever possible.
The preliminary step of functional testing consists of partitioning the specifications into features that can be tested separately. This can be an easy step
for well designed, modular specifications, but informal specifications of large
systems may be difficult to decompose into independently testable features.
Some degree of formality, at least to the point of careful definition and use of
terms, is usually required.
Identification of functional features that can be tested separately is different from module decomposition. In both cases we apply the divide and
conquer principle, but in the former case, we partition specifications according to the functional behavior as perceived by the users of the software under
test,5 while in the latter, we identify logical units that can be implemented
separately. For example, a web site may require a sort function, as a service
routine, that does not correspond to an external functionality. The sort function may be a functional feature at module testing, when the program under
test is the sort function itself, but is not a functional feature at system test,
while deriving test cases from the specifications of the whole web site. On
the other hand, the registration of a new user profile can be identified as one
of the functional features at system level testing, even if such functionality is
implemented with several modules (unit at the design level) of the system.
Thus, identifying functional features does not correspond to identifying single modules at the design level, but rather to suitably slicing the specifications
to be able to attack their complexity incrementally, aiming at deriving useful
test cases for the whole system under test.
5 Here the word user indicates who uses the specified service. It can be the user of the system,
when dealing with specification at system level; but it can be another module of the system,
when dealing with specifications at unit level.
[Figure: the main steps of a systematic approach to functional program testing. From the functional specifications, identify independently testable features; for each feature, either identify representative values directly or derive a model (with brute force testing as the degenerate alternative); generate test-case specifications by combinatorial selection, exhaustive enumeration, or random selection, subject to semantic constraints; generate test cases by manual mapping, symbolic execution, or a-posteriori satisfaction; finally, instantiate tests with scaffolding.]
Independently testable features are described by identifying all the inputs
that form their execution environments. Inputs may be given in different
forms depending on the notation used to express the specifications. In some
cases they may be easily identifiable. For example, they can be the input alphabet of a finite state machine specifying the behavior of the system. In
other cases, they may be hidden in the specification. This is often the case with informal specifications, where some inputs may be given explicitly as parameters of the functional unit, but other inputs may be left implicit in the description. For example, a description of how a new user registers at a web site
may explicitly indicate the data that constitutes the user profile to be inserted
as parameters of the functional unit, but may leave implicit the collection of
elements (e.g., database) in which the new profile must be inserted.
Trying to identify inputs may help in distinguishing different functions.
For example, trying to identify the inputs of a graphical tool may lead to a
clearer distinction between the graphical interface per se and the associated
callbacks to the application. With respect to the web-based user registration
function, the data to be inserted in the database are part of the execution
environment of the functional unit that performs the insertion of the user
profile, while the combination of fields that can be used to construct such data
is part of the execution environment of the functional unit that takes care of
the management of the specific graphical interface.
Identify Representative Classes of Values or Derive a Model The execution
environment of the feature under test determines the form of the final test
cases, which are given as combinations of values for the inputs to the unit.
The next step of a testing process consists of identifying which values of each
input can be chosen to form test cases. Representative values can be identified directly from informal specifications expressed in natural language. Alternativey, representative values may be selected indirectly through a model,
which can either be produced only for the sake of testing or be available as
part of the specification. In both cases, the aim of this step is to identify the
values for each input in isolation, either explicitly through enumeration, or
implicitly through a suitable model, but not to select suitable combinations of
such values, i.e., test case specifications. In this way, we separate the problem of identifying the representative values for each input, from the problem
of combining them to obtain meaningful test cases, thus splitting a complex
step into two simpler steps.
Most methods that can be applied to informal specifications rely on explicit enumeration of representative values by the test designer. In this case,
it is very important to consider all possible cases and take advantage of the information provided by the specification. We may identify different categories
of expected values, as well as boundary and exceptional or erroneous values.
For example, when considering operations on non-empty lists of elements,
we may distinguish the cases of the empty list (an error value) and a singleton element (a boundary value) as special cases. Usually this step determines
characteristics of values (e.g., any list with a single element) rather than actual
values.
Implicit enumeration requires the construction of a (partial) model of the
specifications. Such a model may be already available as part of a specification or design model, but more often it must be constructed by the test
designer, in consultation with other designers. For example, a specification
given as a finite state machine implicitly identifies different values for the inputs by means of the transitions triggered by the different values. In some
cases, we can construct a partial model as a means of identifying different
values for the inputs. For example, we may derive a grammar from a specification and thus identify different values according to the legal sequences of
productions of the given grammar.
Directly enumerating representative values may appear simpler and less
expensive than producing a suitable model from which values may be derived. However, a formal model may also be valuable in subsequent steps of
test case design, including selection of combinations of values. Also, a formal model may make it easier to select a larger or smaller number of test
cases, balancing cost and thoroughness, and may be less costly to modify and
reuse as the system under test evolves. Whether to invest effort in producing a
model is ultimately a management decision that depends on the application
domain, the skills of test designers, and the availability of suitable tools.
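As a small illustration of the model-based alternative, the following Java sketch generates candidate input values as bounded derivations of a toy grammar for search patterns. The grammar and all names here are invented for the example; the point is only that each legal sequence of productions identifies a different value:

    import java.util.Random;

    public class GrammarValuesSketch {
        static final Random rnd = new Random(7);

        // Toy grammar (invented):
        //   pattern ::= term | term pattern
        //   term    ::= LITERAL | SPECIAL
        static String pattern(int depth) {
            if (depth <= 0 || rnd.nextBoolean()) return term();
            return term() + pattern(depth - 1);
        }

        static String term() {
            // A literal character or the special character '*'.
            return rnd.nextBoolean()
                ? String.valueOf((char) ('a' + rnd.nextInt(26)))
                : "*";
        }

        public static void main(String[] args) {
            // Each bounded derivation yields a candidate value; different
            // derivations vary the number of literals and special characters.
            for (int i = 0; i < 5; i++)
                System.out.println(pattern(4));
        }
    }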
Generate Test Case Specifications Test specifications are obtained by suitably combining values for all inputs of the functional unit under test. If representative values were explicitly enumerated in the previous step, then test case specifications will be elements of the Cartesian product of values selected for each input. If a formal model was produced, then test case specifications will be specific behaviors or combinations of parameters of the model, and a single test case specification could be satisfied by many different concrete inputs. Either way, brute force enumeration of all combinations is unlikely to be satisfactory.

The number of combinations in the Cartesian product of independently selected values grows as the product of the sizes of the individual sets. For a simple functional unit with 5 inputs, each characterized by 6 values, the size of the Cartesian product is 6^5 = 7776 test case specifications, which may be an impractical number of test cases for a simple functional unit. Moreover, if (as is usual) the characteristics are not completely orthogonal, many of these combinations may not even be feasible.
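A few lines of Java make the growth concrete. The value classes are placeholder strings, and the recursive enumeration is simply the generic Cartesian-product construction, not a method prescribed by the text:

    import java.util.*;

    public class CartesianSketch {
        // Enumerate the Cartesian product of value classes, one list per input.
        static void enumerate(List<List<String>> classes, int i,
                              String[] current, List<String[]> out) {
            if (i == classes.size()) { out.add(current.clone()); return; }
            for (String v : classes.get(i)) {
                current[i] = v;
                enumerate(classes, i + 1, current, out);
            }
        }

        public static void main(String[] args) {
            // Five inputs with six value classes each: 6^5 = 7776 combinations.
            List<List<String>> classes = new ArrayList<>();
            for (int i = 0; i < 5; i++)
                classes.add(Arrays.asList("v1", "v2", "v3", "v4", "v5", "v6"));
            List<String[]> all = new ArrayList<>();
            enumerate(classes, 0, new String[classes.size()], all);
            System.out.println(all.size() + " test case specifications");
        }
    }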
Consider the input of a function that searches for occurrences of a complex pattern in a web database. Its input may be characterized by the length
of the pattern and the presence of special characters in the pattern, among
other aspects. Interesting values for the length of the pattern may be zero,
one, or many. Interesting values for the presence of special characters may be
zero, one, or many. However, the combination of value zero for the length of the pattern and value many for the number of special characters in the pattern is clearly infeasible, since a zero-length pattern cannot contain any special characters.
13.4 Category-Partition Testing

Category-partition testing is a method for generating functional tests from informal specifications. The main steps covered by the core part of the category-partition method are:
A. Decompose the specification into independently testable features: Test designers identify features to be tested separately, and identify parameters and any other elements of the execution environment the unit depends on. Environment dependencies are treated identically to explicit
parameters. For each parameter and environment element, test designers identify the elementary parameter characteristics, which in the
category-partition method are usually called categories.
B. Identify Relevant Values: Test designers select a set of representative classes
of values for each parameter characteristic. Values are selected in isolation, independent of other parameter characteristics. In the category-partition method, classes of values are called choices, and this activity is
called partitioning the categories into choices.
C. Generate Test Case Specifications: Test designers indicate invalid combinations of values and restrict valid combinations of values by imposing
semantic constraints on the identified values. Semantic constraints restrict the values that can be combined and identify values that need not
be tested in different combinations, e.g., exceptional or invalid values.
Categories, choices, and constraints can be provided to a tool to automatically generate a set of test case specifications. Automating trivial and
repetitive activities such as these makes better use of human resources and
reduces errors due to distraction. Just as important, it is possible to determine the number of test cases that will be generated (by calculation, or by actually generating them) before investing any human effort in test execution. If
the number of derivable test cases exceeds the budget for test execution and
evaluation, test designers can reduce the number of test cases by imposing
additional semantic constraints. Controlling the number of test cases before
test execution begins is preferable to ad hoc approaches in which one may at
first create very thorough test suites and then test less and less thoroughly as
deadlines approach.
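The following Java sketch suggests how such a tool might combine choices; the representation and the handling of once-only ([error] or [single]) choices are assumptions patterned on the description above, not the interface of any actual tool:

    import java.util.*;

    public class CategoryPartitionSketch {
        // A choice is a value class for one category; once-only choices
        // ([error] or [single]) are tried once, with the other categories
        // left unconstrained.
        static class Choice {
            final String value;
            final boolean once;
            Choice(String value, boolean once) { this.value = value; this.once = once; }
        }

        static List<String[]> frames(List<List<Choice>> categories) {
            List<String[]> out = new ArrayList<>();
            // One frame per once-only choice; "-" means no constraint.
            for (int i = 0; i < categories.size(); i++)
                for (Choice c : categories.get(i))
                    if (c.once) {
                        String[] f = new String[categories.size()];
                        Arrays.fill(f, "-");
                        f[i] = c.value;
                        out.add(f);
                    }
            // Full cross-product of the ordinary choices.
            cross(categories, 0, new String[categories.size()], out);
            return out;
        }

        static void cross(List<List<Choice>> cats, int i, String[] cur,
                          List<String[]> out) {
            if (i == cats.size()) { out.add(cur.clone()); return; }
            for (Choice c : cats.get(i))
                if (!c.once) { cur[i] = c.value; cross(cats, i + 1, cur, out); }
        }

        public static void main(String[] args) {
            List<List<Choice>> cats = List.of(
                List.of(new Choice("0", true),      // erroneous, tried once
                        new Choice("1", false),
                        new Choice("many", false)),
                List.of(new Choice("empty", true),  // erroneous, tried once
                        new Choice("valid", false)));
            // 2 once-only frames + 2 x 1 ordinary combinations = 4 frames.
            System.out.println(frames(cats).size() + " test case specifications");
        }
    }

Counting the frames before executing anything is exactly the budget check described above.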
We illustrate the category-partition method using a specification of a feature from the direct sales web site of Chipmunk Electronic Ventures. Customers are allowed to select and price custom configurations of Chipmunk
computers. A configuration is a set of selected options for a particular model
of computer. Some combinations of model and options are not valid (e.g.,
digital LCD monitor with analog video card), so configurations are tested for
validity before they are priced. The check-configuration function (Table 13.3)
is given a model number and a set of components, and returns the boolean
value True if the configuration is valid or False otherwise. This function has
been selected by the test designers as an independently testable feature.
A. Identify Independently Testable Features and Parameter Characteristics
We assume that step A starts by selecting the Check-configuration feature to
be tested independently of other features. This entails choosing to separate
testing of the configuration check per se from its presentation through a user
interface (e.g., a web form), and depends on the architectural design of the
software system.
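A minimal sketch of that independently testable interface, with assumed types (the names here are invented; the specification gives only the parameter names Model and Set of Components):

    import java.util.Set;

    // Hypothetical interface for the independently testable feature: a test
    // case supplies a model number and a set of selected components directly,
    // bypassing the web-form presentation layer, and checks the boolean result.
    interface ConfigurationChecker {
        boolean checkConfiguration(String modelNumber, Set<String> components);
    }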
[Specification excerpt: the Check-Configuration function and its parameters, Model and Set of Components; details not reproduced.]
[. . . ] The statement in Table 13.3 makes no distinction between configurations of models
with several required slots and models with none, but the experienced test
designer has seen enough failures on degenerate inputs to test empty collections wherever a collection is allowed.
The number of options that can (or must) be configured for a particular
model of computer may vary from model to model. However, the category-partition method makes no direct provision for structured data, such as sets
of pairs. A typical approach is to flatten collections and describe characteristics of the whole collection as parameter characteristics.
Typically the size of the collection (the length of a string, for example, or in
this case the number of required or optional slots) is one characteristic, and
descriptions of possible combinations of elements (occurrence of special characters in a string, for example, or in this case the selection of required
and optional components) are separate parameter characteristics.
Suppose the only significant variation among (slot, selection) pairs was between pairs that are compatible and pairs that are incompatible. If we treated each pair as a separate characteristic, and assumed n slots, the category-partition method would generate all 2^n combinations of
compatible and incompatible slots. Thus we might have a test case in which
the first selected option is compatible, the second is compatible, and the third
incompatible, and a different test case in which the first is compatible but the
second and third are incompatible, and so on, and each of these combinations could be combined in several ways with other parameter characteristics. The number of combinations quickly explodes, and moreover since the
number of slots is not actually fixed, we cannot even place an upper bound
on the number of combinations that must be considered. We will therefore
choose the flattening approach and select possible patterns for the collection
as a whole.
Should the representative values of the flattened collection of pairs be one
compatible selection, one incompatible selection, all compatible selections, all
incompatible selections, or should we also include a mix of 2 or more compatible
and 2 or more incompatible selections? Certainly the latter is more thorough,
but whether there is sufficient value to justify the cost of this thoroughness is
a matter of judgment by the test designer.
We have oversimplified by considering only whether a selection is compatible with a slot. It might also happen that the selection does not appear in
the database. Moreover, the selection might be incompatible with the model,
or with a selected component of another slot, in addition to the possibility
that it is incompatible with the slot for which it has been selected. If we treat
each such possibility as a separate parameter characteristic, we will generate many combinations, and we will need semantic constraints to rule out
combinations like "there are three options, at least two of which are compatible with the model and two of which are not, and none of which appears in the database." On the other hand, if we simply enumerate the combinations
that do make sense and are worth testing, then it becomes more difficult to
be sure that no important combinations have been omitted. Like all design
decisions, the way in which collections and complex data are broken into parameter characteristics requires judgment based on a combination of analysis and experience.
B. Identify Relevant Values This step consists of identifying a list of relevant values (more precisely, a list of classes of relevant values) for each of the
parameter characteristics identified during step A. Relevant values should
be identified for each category independently, ignoring possible interactions
among values for different categories, which are considered in the next step.
Relevant values may be identified by manually applying a set of rules known
as boundary value testing or erroneous condition testing. The boundary value
testing rule suggests selection of extreme values within a class (e.g., maximum and minimum values of the legal range), values outside but as close as
possible to the class, and interior (non-extreme) values of the class. Values
near the boundary of a class are often useful in detecting off-by-one errors
in programs. The erroneous condition rule suggests selecting values that are
outside the normal domain of the program, since experience suggests that
proper handling of error cases is often overlooked.
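For instance, for a numeric parameter whose legal range is assumed to be 1..100 (an invented example), the two rules yield value classes like these:

    public class BoundaryValuesSketch {
        // Boundary value rule: extremes of the legal range 1..100, an interior
        // value, and values just outside the range.
        static final int[] EXTREMES_AND_INTERIOR = { 1, 2, 50, 99, 100 };
        static final int[] JUST_OUTSIDE          = { 0, 101 };
        // Erroneous condition rule: values outside the normal domain entirely.
        static final int[] ERRONEOUS             = { -1, Integer.MIN_VALUE };
    }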
Table 13.1 summarizes the parameter characteristics and the corresponding relevant values identified for feature Check-configuration.6 For numeric characteristics whose legal values have a lower bound of 1, i.e., number of models in database and number of components in database, we identify 0, the erroneous value, 1, the boundary value, and many, the class of values greater than 1, as the relevant value classes. For numeric characteristics whose lower bound is zero, i.e., number of required slots for selected model and number of optional slots for selected model, we identify 0 as a boundary value, and 1 and many as other relevant classes of values. Negative values are impossible here, so we do not add a negative error choice. For numeric characteristics whose legal values have definite lower and upper bounds, i.e., number of required components with selection empty and number of optional components with selection empty, we identify boundary and (when possible) erroneous conditions corresponding to both lower and upper bounds.
Identifying relevant values is an important but tedious task. Test designers may improve manual selection of relevant values by using the catalog approach described in Section 13.8, which captures the informal approaches
used in this section with a systematic application of catalog entries.
[Table 13.1 excerpt: parameter Model, with characteristic Model number; parameter Components, with characteristics Correspondence of selection with model slots and Number of components in database (#DBC); the value classes are not reproduced.]
Table 13.1: An example category-partition test specification for the configuration checking feature of the web site of a computer vendor.
For example, in Table 13.1 we find 7 categories with 3 value classes, 2 categories with 6 value classes, and one with 4 value classes, potentially resulting in 3^7 × 6^2 × 4 = 314,928 test cases, which would be acceptable only if the cost of executing and checking each individual test case were very small.
However, not all combinations of value classes correspond to reasonable test
case specifications. For example, it is not possible to create a test case from
a test case specification requiring a valid model (a model appearing in the
database) where the database contains zero models.
The category-partition method allows one to omit some combinations by indicating value classes that need not be combined with all other values. The label [error] indicates a value class that need be tried only once, in combination with non-error values of other parameters. When constraints are considered in the category-partition specification of Table 13.1, the number of combinations to be considered is reduced to
[. . . ]
components with selection empty with value 0 for Number of Required Slots for Selected Model (#SMRS). Similarly, the if-OSNE constraint limits the combinations of values of the parameter characteristics Number of optional components with selection empty and Number of Optional Slots for Selected Model (#SMOS).
The property and if-property constraints introduced in Table 13.1 further reduce the number of combinations to be considered to [. . . ]. (Exercise Ex13.4 discusses derivation of this number.)
The number of combinations can be further reduced by iteratively adding
property and if-property constraints and by introducing the new single constraint, which is indicated with label single and acts like the error constraint,
i.e., it limits the number of occurrences of a given value in the selected combinations to 1.
Introducing additional property, if-property, and single constraints does not rule out erroneous combinations; rather, it reflects the judgment of the test designer, who decides how to restrict the number of combinations to be considered by identifying single values (single constraint) or combinations (property and if-property constraints) that are less likely to need thorough testing.
The single constraints introduced in Table 13.1 reduce the number of combinations to be considered to [. . . ], which may be a reasonable tradeoff between cost and quality for the considered functionality. The number of combinations can also be reduced by applying combinatorial techniques, as explained in the next section.
The set of combinations of values for the parameter characteristics can
be turned into test case specifications by simply instantiating the identified
combinations. Table 13.2 shows an excerpt of test case specifications. The
error tag in the last column indicates test case specifications corresponding
to the error constraint. Corresponding test cases should produce an error
indication. A dash indicates no constraints on the choice of values for the
parameter or environment element.
Choosing meaningful names for parameter characteristics and value classes
allows (semi)automatic generation of test case specifications.
13.5 Pairwise Combination Testing
However one obtains sets of value classes for each parameter characteristic,
the next step in producing test case specifications is selecting combinations
of classes for testing. A simple approach is to exhaustively enumerate all
possible combinations of classes, but the number of possible combinations
rapidly explodes.
Some methods, such as the category-partition method described in the
previous section, take exhaustive enumeration as a base approach to generating combinations, but allow the test designer to add constraints that limit
Table 13.2: An excerpt of test case specifications derived from the value
classes given in Table 13.1
growth in the number of combinations. This can be a reasonable approach
when the constraints on test case generation reflect real constraints in the
application domain, and eliminate many redundant combinations (for example, the error entries in category-partition testing). It is less satisfactory
when, lacking real constraints from the application domain, the test designer
is forced to add arbitrary constraints (e.g., single entries in the category-partition method) whose sole purpose is to reduce the number of combinations.
Consider the parameters that control the Chipmunk web-site display, shown
in Table 13.3. Exhaustive enumeration produces 432 combinations, which
is too many if the test results (e.g., judging readability) involve human judgment. While the test designer might hypothesize some constraints, such as
observing that monochrome displays are limited mostly to hand-held devices, radical reductions require adding several single and property constraints without any particular rationale.
Exhaustive enumeration of all n-way combinations of value classes for n parameters, on the one hand, and coverage of individual classes, on the other, are only the extreme ends of a spectrum of strategies for generating combinations of classes. Between them lie strategies that generate all pairs of classes for different parameters, all triples, and so on. When it is reasonable to expect some potential interaction between parameters (so coverage of individual value classes is deemed insufficient), but covering all combinations is impractical, an attractive alternative is to generate k-way combinations for k < n, typically pairs or triples.

How much does generating all possible pairs of classes save, compared to exhaustively enumerating all combinations?
[Table 13.3: parameters controlling the Chipmunk web-site display: Display Mode, Color, Language, Fonts, and Screen size; the value classes are not reproduced.]
Table 13.4: Covering all pairs of value classes for three parameters by extending the cross-product of two parameters
[. . . ] Fortunately, efficient heuristic algorithms exist for this task, and they are simple enough to incorporate in tools.7
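One simple greedy heuristic of this kind is sketched below in Java. It is an illustration of the idea, not a particular published algorithm: seed each new tuple with a still-uncovered pair, then fill the remaining positions with the values that cover the most new pairs. The domain sizes in main are assumptions chosen so that the full product is 432, as in the display example.

    import java.util.*;

    public class PairwiseSketch {
        static String pairKey(int p1, int v1, int p2, int v2) {
            return p1 + ":" + v1 + "|" + p2 + ":" + v2;
        }

        static String orderedKey(int p1, int v1, int p2, int v2) {
            return p1 < p2 ? pairKey(p1, v1, p2, v2) : pairKey(p2, v2, p1, v1);
        }

        // sizes[i] is the number of value classes of parameter i; classes are
        // represented by indexes 0..sizes[i]-1.
        static List<int[]> allPairs(int[] sizes) {
            int n = sizes.length;
            Set<String> uncovered = new HashSet<>();
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    for (int a = 0; a < sizes[i]; a++)
                        for (int b = 0; b < sizes[j]; b++)
                            uncovered.add(pairKey(i, a, j, b));

            List<int[]> suite = new ArrayList<>();
            while (!uncovered.isEmpty()) {
                // Seed a tuple with one still-uncovered pair ...
                String[] h = uncovered.iterator().next().split("[|:]");
                int[] t = new int[n];
                Arrays.fill(t, -1);
                t[Integer.parseInt(h[0])] = Integer.parseInt(h[1]);
                t[Integer.parseInt(h[2])] = Integer.parseInt(h[3]);
                // ... then fill each open position with the value covering
                // the most still-uncovered pairs.
                for (int i = 0; i < n; i++) {
                    if (t[i] >= 0) continue;
                    int bestV = 0, bestGain = -1;
                    for (int v = 0; v < sizes[i]; v++) {
                        int gain = 0;
                        for (int j = 0; j < n; j++)
                            if (t[j] >= 0 && uncovered.contains(orderedKey(i, v, j, t[j])))
                                gain++;
                        if (gain > bestGain) { bestGain = gain; bestV = v; }
                    }
                    t[i] = bestV;
                }
                for (int i = 0; i < n; i++)
                    for (int j = i + 1; j < n; j++)
                        uncovered.remove(pairKey(i, t[i], j, t[j]));
                suite.add(t);
            }
            return suite;
        }

        public static void main(String[] args) {
            List<int[]> suite = allPairs(new int[] { 3, 4, 3, 4, 3 });
            System.out.println(suite.size() + " tuples, versus "
                + (3 * 4 * 3 * 4 * 3) + " in the full product");
        }
    }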
The tuples in Table 13.5 cover all pairwise combinations of value choices for the five parameters. In many cases not all choices may be allowed. For example, the specification of the Chipmunk web-site display may indicate that
monochrome displays are limited to hand-held devices. In this case, the tuples covering the pairs (Monochrome, Laptop) and (Monochrome, Full-size), i.e., the fifth and ninth tuples of Table 13.5, would not correspond to legal inputs. We can restrict the set of legal combinations of value classes by adding suitable constraints. Constraints can be expressed as tuples with wild-card characters to indicate any possible value class. For example, the constraints

(-, -, -, Monochrome, Laptop)
(-, -, -, Monochrome, Full-size)

indicate that tuples that contain the pairs (Monochrome, Laptop) or (Monochrome, Full-size) as values for the fourth and fifth parameters are not allowed in the relation of Table 13.3. Tuples that cover all pairwise combinations of value classes without
violating the constraints can be generated by simply removing the illegal tuples and adding legal tuples that cover the removed pairwise combinations.
Open choices must be bound consistently in the remaining tuples; e.g., the tuple (Text-only, Portuguese, -, Monochrome, -) must become (Text-only, Portuguese, -, Monochrome, Hand-held).
Constraints can also be expressed with sets of tables to indicate only the
legal combinations, as illustrated in Table 13.6, where the first table indicates
7 Exercise Ex13.12 discusses the problem of computing suitable combinations to cover all
pairs.
Table 13.5: Covering all pairs of value classes for the five parameters
that the value class Hand-held for parameter Screen can be combined with
any value class of parameter Color, including Monochrome, while the second table indicates that the value classes Laptop and Full-size for parameter
Screen size can be combined with all value classes but Monochrome for parameter Color.
If constraints are expressed as a set of tables that give only legal combinations, tuples can be generated without changing the heuristic. Although
the two approaches express the same constraints, the number of generated
tuples can be different, since different tables may indicate overlapping pairs
and thus result in a larger set of tuples. Other ways of expressing constraints
may be chosen according to the characteristics of the specifications and the
preferences of the test designer.
So far we have illustrated the combinatorial approach with pairwise coverage. As previously mentioned, the same approach can be applied for triples
or larger combinations. Pairwise combinations may be sufficient for some
subset of the parameters, but not enough to uncover potential interactions
among other parameters. For example, in the Chipmunk display example,
the fit of text fields to screen areas depends on the combination of language,
fonts, and screen size. Thus, we may prefer exhaustive coverage of combinations of these three parameters, but be satisfied with pairwise coverage of
other parameters. In this case, we first generate tuples of classes from the
parameters to be most thoroughly covered, and then extend these with the
[Table 13.6: two tables, the first headed Hand-held devices; the column headings are Display Mode, Language, Fonts, Color, and Screen size; the contents are not reproduced.]
Table 13.6: Pairs of tables that indicate valid value classes for the Chipmunk
web-site display controller
parameters which require less coverage.8
13.6 Testing Decision Structures
The combinatorial approaches described above primarily select combinations of orthogonal choices. They can accommodate constraints among choices,
but their strength is in generating combinations of (purportedly) independent choices. Some specifications, formal and informal, have a structure that
emphasizes the way particular combinations of parameters or their properties determine which of several potential outcomes is chosen. Results of the
computation may be determined by boolean predicates on the inputs. In
some cases, choices are specified explicitly as boolean expressions. More often, choices are described either informally or with tables or graphs that can
assume various forms. When such a decision structure is present, it can play
a part in choosing combinations of values for testing.
For example, the informal specification of Figure 13.4 describes outputs
that depend on type of account (either educational, or business, or individual), amount of current and yearly purchases, and availability of special prices.
These can be considered as boolean conditions, e.g., the condition educational account is either true or false (even if the type of account is actually
represented in some other manner). Outputs can be described as boolean
expressions over the inputs, e.g., the output no discount can be associated
with the boolean expression
individual account
current purchase
special offer price
business account
current purchase
current purchase
special offer price
Pricing: The pricing function determines the adjusted price of a configuration for a
particular customer. The scheduled price of a configuration is the sum of the
scheduled price of the model and the scheduled price of each component in the
configuration. The adjusted price is either the scheduled price, if no discounts
are applicable, or the scheduled price less any applicable discounts.
There are three price schedules and three corresponding discount schedules,
Business, Educational, and Individual. The Business price and discount schedules apply only if the order is to be charged to a business account in good standing. The Educational price and discount schedules apply to educational institutions. The Individual price and discount schedules apply to all other customers.
Account classes and rules for establishing business and educational accounts
are described further in [. . . ].
A discount schedule includes up to three discount levels, in addition to the possibility of no discount. Each discount level is characterized by two threshold
values, a value for the current purchase (configuration schedule price) and a
cumulative value for purchases over the preceding 12 months (sum of adjusted
price).
Educational prices The adjusted price for a purchase charged to an educational account in good standing is the scheduled price from the educational price schedule. No further discounts apply.
Business account discounts Business discounts depend on the size of the current
purchase as well as business in the preceding 12 months. A tier 1 discount is
applicable if the scheduled price of the current order exceeds the tier 1 current
order threshold, or if total paid invoices to the account over the preceding 12
months exceeds the tier 1 year cumulative value threshold. A tier 2 discount
is applicable if the current order exceeds the tier 2 current order threshold, or
if total paid invoices to the account over the preceding 12 months exceeds the
tier 2 cumulative value threshold. A tier 2 discount is also applicable if both the
current order and 12 month cumulative payments exceed the tier 1 thresholds.
Individual discounts Purchases by individuals and by others without an established account in good standing are based on current value alone (not on cumulative purchases). A tier 1 individual discount is applicable if the scheduled price
of the configuration in the current order exceeds the tier 1 current order
threshold. A tier 2 individual discount is applicable if the scheduled price of the
configuration exceeds the tier 2 current order threshold.
Special-price non-discountable offers Sometimes a complete configuration is offered at a special, non-discountable price. When a special, non-discountable
price is available for a configuration, the adjusted price is the non-discountable
price or the regular price after any applicable discounts, whichever is less.

Figure 13.4: The functional specification of feature pricing of the Chipmunk web site.
A predicate is a function with a boolean (True or False) value. When the input argument of the predicate is clear, particularly when it describes some property of the input of
a program, we often leave it implicit. For example, the actual representation of account
types in an information system might be as three-letter codes, but in a specification we
may not be concerned with that representation; we know only that there is some predicate educational-account which is either True or False.
An elementary condition is a single predicate that cannot be decomposed further. A
complex condition is made up of elementary conditions, combined with boolean connectives.
The boolean connectives include and (∧), or (∨), not (¬), and several less common derived connectives such as implies and exclusive or.
Decision tables are completed with a set of constraints that limit the possible combinations of elementary conditions. A constraint language can be
based on boolean logic. Often it is useful to add some shorthand notations
for common conditions that are tedious to express with the standard connectives, such as at-most-one(C1, . . . , Cn) and exactly-one(C1, . . . , Cn).
Figure 13.5 shows the decision table for the functional specification of feature pricing of the Chipmunk web site presented in Figure 13.4.
The informal specification of Figure 13.4 identifies three customer profiles: educational, business, and individual. The decision table in Figure 13.5 has only rows educational and business. The choice individual corresponds to the combination false, false for choices educational and business, and is thus redundant.
The informal specification of Figure 13.4 indicates different discount policies depending on the relation between the current purchase and two progressive thresholds for the current purchase and the yearly cumulative purchase. These cases correspond to rows 3 through 6 of the table. Conditions
on thresholds that do not correspond to individual rows in the table can be
defined by suitable combinations of values for these rows. Finally, the informal specification of Figure 13.4 distinguishes the cases in which special offer
prices do not exceed either the scheduled or the tier 1 or tier 2 prices. Rows 7
through 9 of the table, suitably combined, capture all possible cases of special
prices without redundancy.
Constraints formalize the compatibility relations among the different elementary conditions listed in the table: Educational and Business accounts are exclusive; a cumulative purchase exceeding threshold tier 2 also exceeds threshold tier 1; a yearly purchase exceeding threshold tier 2 also exceeds threshold tier 1; a cumulative purchase not exceeding threshold tier 1 does not exceed threshold tier 2; a yearly purchase not exceeding threshold tier 1 does not exceed threshold tier 2; a special offer price not exceeding threshold tier 1 does not exceed threshold tier 2; and finally, a special offer price exceeding threshold tier 2 exceeds threshold tier 1.
STEP 2: derive test case specifications from a model of the decision structure. Different criteria can be used to generate test suites of differing complexity from decision tables.
The basic condition adequacy criterion requires generation of a test case specification for each column in the table, and corresponds to the intuitive principle of generating a test case to produce each possible result. Don't care entries of the table can be filled out arbitrarily, so long as constraints are not violated.
The compound condition adequacy criterion requires a test case specification for each combination of truth values of elementary conditions. The
compound condition adequacy criterion generates a number of cases exponential in the number of elementary conditions, and can thus be applied only
to small sets of elementary conditions.
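A small sketch in Python (with an illustrative subset of the pricing conditions) makes the exponential growth of the compound condition criterion concrete, and shows how constraints prune the combinations:

    from itertools import product

    conditions = ["Edu", "Bus", "CP>CT1", "CP>CT2"]

    def feasible(a):
        # Exclusive account types; exceeding tier 2 implies exceeding tier 1.
        return (a["Edu"] + a["Bus"] <= 1) and (a["CP>CT1"] or not a["CP>CT2"])

    combos = [dict(zip(conditions, values))
              for values in product([True, False], repeat=len(conditions))]
    print(len(combos))                              # 16 = 2^4 raw combinations
    print(len([a for a in combos if feasible(a)]))  # feasible combinations only

With n elementary conditions the raw count is 2^n, which is why the compound condition criterion is practical only for small condition sets.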
[Figure 13.5: The decision table for the functional specification of feature pricing of the Chipmunk web site of Figure 13.4. Rows are the elementary conditions listed under Abbreviations, columns are combinations of truth values, and row Out gives the output for each column.]

Constraints:
at-most-one(Edu, Bus)
(YP > YT2) → (YP > YT1)
(CP > CT2) → (CP > CT1)
(SP < T2) → (SP < T1)
at-most-one(YP < YT1, YP > YT2)
at-most-one(CP < CT1, CP > CT2)
at-most-one(SP > T1, SP < T2)

Abbreviations:
Edu.       Educational account
Bus.       Business account
CP > CT1   Current purchase greater than threshold 1
YP > YT1   Year cumulative purchase greater than threshold 1
CP > CT2   Current purchase greater than threshold 2
YP > YT2   Year cumulative purchase greater than threshold 2
SP < Sc    Special Price better than scheduled price
SP < T1    Special Price better than tier 1
SP < T2    Special Price better than tier 2
Edu        Educational price
ND         No discount
T1         Tier 1
T2         Tier 2
SP         Special Price
For the modified condition/decision adequacy criterion (MC/DC), each column in the table represents a test case specification. In addition, for each of
the original columns, MC/DC generates new columns by modifying each of
the cells containing True or False. If modifying a truth value in one column
results in a test case specification consistent with an existing column (agreeing in all places where neither is don't care), the two test cases are represented by one merged column, provided they can be merged without violating constraints.
The MC/DC criterion formalizes the intuitive idea that a thorough test
suite would not only test positive combinations of values, i.e., combinations
that lead to specified outputs, but also negative combinations of values, i.e.,
combinations that differ from the specified ones and thus should produce
different outputs, in some cases among the specified ones, in some other
cases leading to error conditions.
Applying MC/DC to column 1 of Table 13.5 generates two additional columns:
one for Educational Account = false and Special Price better than scheduled
price = false, and the other for Educational Account = true and Special Price
better than scheduled price = true. Both columns are already in the table
(columns 3 and 2, respectively) and thus need not be added.
Similarly, from column 2, we generate two additional columns corresponding to Educational Account = false and Special Price better than scheduled
price = true, and Educational Account = true and Special Price better than
scheduled price = false, also already in the table.
The generation of a new column for each possible variation of the boolean
values in the columns, varying exactly one value for each new column, produces 78 new columns, 21 of which can be merged with columns already in
the table. Figure 13.6 shows a table obtained by suitably joining the generated
columns with the existing ones. Many don't care cells from the original table are assigned either true or false values, to allow merging of different columns or to obey constraints. The few don't care entries left can be set randomly to obtain a complete test case specification.
There are many ways of merging columns that generate different tables.
The table in Figure 13.6 may not be the optimal one, i.e., the one with the
fewest columns. The objective in test design is not to find an optimal test
suite, but rather to produce a cost effective test suite with an acceptable tradeoff between the cost of generating and executing test cases and the effectiveness of the tests.
The table in Figure 13.6 fixes the entries as required by the constraints,
while the initial table in Figure 13.5 does not. Keeping constraints separate
from the table corresponding to the initial specification increases the number of don't care entries in the original table, which in turn increases the opportunity for merging columns when generating new cases with the MC/DC criterion. For example, if business account = false, the constraint at-most-one(Edu, Bus) can be satisfied by assigning either true or false to entry educational account. Fixing either choice prematurely may later make merging with a newly generated column impossible.
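The mechanical core of this process, generating the modified columns and testing whether two columns can be merged, can be sketched in Python as follows (column and condition names are illustrative):

    def variants(column):
        # MC/DC: derive a new column by flipping each True/False cell;
        # None marks a don't care entry, which is never flipped.
        for condition, value in column.items():
            if value is not None:
                yield {**column, condition: not value}

    def mergeable(col_a, col_b):
        # Columns merge if they agree wherever neither is don't care.
        return all(col_a[c] is None or col_b[c] is None or col_a[c] == col_b[c]
                   for c in col_a)

    base = {"Edu": True, "SP<Sc": False, "CP>CT1": None}
    for v in variants(base):
        print(v, "merges with base:", mergeable(v, base))

In a full implementation each generated column would also be checked against the constraints before being merged or added to the table.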
[Figure 13.6: The set of test cases generated for feature pricing of the Chipmunk web site applying the modified condition/decision adequacy criterion, with abbreviations as in Figure 13.5.]
T-node
Case  Too small  Ship where  Ship method  Cust type  Pay method  Same addr  CC valid
TC-1  No         Int         Air          Bus        CC          No         Yes
TC-2  No         Dom         Air          Ind        CC          -          No (abort)

Abbreviations:
Too small    CostOfGoods < MinOrder?
Ship where   Shipping address, Int = international, Dom = domestic
Ship method  Air = air freight, Land = land freight
Cust type    Bus = business, Edu = educational, Ind = individual
Pay method   CC = credit card, Inv = invoice
Same addr    Billing address = shipping address?
CC valid     Credit card information passes validity check?
The branch testing adequacy criterion requires each branch to be exercised at least once, i.e., each edge of the graph to be traversed for at least one
test case. Test T-branch covers all branches of the control flow graph of Figure 13.8 and thus satisfies the branch adequacy criterion.
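The check itself is mechanical; here is a minimal sketch in Python (node names are illustrative, not those of Figure 13.8) that reports which branches of a control flow graph a test suite has left uncovered:

    edges = {("cost_check", "invalid"), ("cost_check", "addr_check"),
             ("addr_check", "intl_ship"), ("addr_check", "dom_ship")}

    def uncovered_branches(edges, executions):
        # Each execution is the node sequence traversed by one test case.
        traversed = set()
        for path in executions:
            traversed.update(zip(path, path[1:]))
        return edges - traversed

    suite = [["cost_check", "addr_check", "intl_ship"],
             ["cost_check", "invalid"]]
    print(uncovered_branches(edges, suite))  # the domestic branch is missed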
Process shipping order: The Process shipping order function checks the validity of orders and prepares the receipt.
A valid order contains the following data:
cost of goods If the cost of goods is less than the minimum processable order
(MinOrder) then the order is invalid.
shipping address The address includes name, address, city, postal code, and
country.
preferred shipping method If the address is domestic, the shipping method
must be either land freight, or expedited land freight, or overnight air.
If the address is international, the shipping method must be either air
freight or expedited air freight; a shipping cost is computed based on
address and shipping method.
type of customer which can be individual, business or educational
preferred method of payment Individual customers can use only credit cards,
while business and educational customers can choose between credit
card and invoice.
card information if the method of payment is credit card, fields credit card
number, name on card, expiration date, and billing address, if different
than shipping address, must be provided. If credit card information is not
valid the user can either provide new data or abort the order.
The outputs of Process shipping order are
validity Validity is a boolean output which indicates whether the order can be
processed.
total charge The total charge is the sum of the value of goods and the computed shipping costs (only if validity = true).
payment status if all data are processed correctly and the credit card information is valid or the payment is invoice, payment status is set to valid, the
order is entered and a receipt is prepared; otherwise validity = false.
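The decision structure implied by this specification can be rendered as straight-line code. The following Python sketch is only a plausible reading of the specification (the constant MIN_ORDER, the argument names, and the shipping-method encodings are hypothetical), of the kind from which a control flow graph like the one in Figure 13.8 is derived:

    MIN_ORDER = 100.0  # hypothetical minimum processable order

    def process_shipping_order(cost, domestic, method, customer,
                               payment, cc_valid, shipping_cost):
        if cost < MIN_ORDER:
            return False, 0.0            # order too small: invalid
        if domestic and method not in ("land", "expedited land",
                                       "overnight air"):
            return False, 0.0
        if not domestic and method not in ("air", "expedited air"):
            return False, 0.0
        if customer == "individual" and payment != "credit card":
            return False, 0.0            # individuals must pay by card
        if payment == "credit card" and not cc_valid:
            return False, 0.0            # the user may retry or abort
        return True, cost + shipping_cost  # validity and total charge

    print(process_shipping_order(250.0, True, "land", "business",
                                 "credit card", True, 15.0))  # (True, 265.0)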
[Figure 13.8: The control flow graph of the Process shipping order function, with decisions on the shipping address (international or domestic), the method of payment (credit card or invoice), credit card validity, and whether to abort the order.]
T-branch
Case  Too small  Ship where  Ship method  Cust type  Pay method  Same addr  CC valid
TC-1  No         Int         Air          Bus        CC          No         Yes
TC-2  No         Dom         Land         -          -           -          -
TC-3  Yes        -           -            -          -           -          -
TC-4  No         Dom         Air          -          -           -          -
TC-5  No         Int         Land         -          -           -          -
TC-6  No         -           -            Edu        Inv         -          -
TC-7  No         -           -            -          CC          Yes        -
TC-8  No         -           -            -          CC          -          No (abort)
TC-9  No         -           -            -          CC          -          No (no abort)

Abbreviations:
(as above)
In principle, other test adequacy criteria described in Chapter 14 can be applied to control structures derived from specifications, but in practice a good specification should rarely result in a complex control structure, since a specification should abstract away details of processing.
13.8 Catalog-Based Testing
The test design techniques described above require judgment in deriving value
classes. Over time, an organization can build experience in making these
judgments well. Gathering this experience in a systematic collection can speed
up the process and routinize many decisions, reducing human error. Catalogs
capture the experience of test designers by listing all cases to be considered
for each possible type of variable that represents logical inputs, outputs, and
status of the computation. For example, if the computation uses a variable
whose value must belong to a range of integer values, a catalog might indicate the following cases, each corresponding to a relevant test case:
1. The element immediately preceding the lower bound of the interval
2. The lower bound of the interval
3. A non-boundary element within the interval
4. The upper bound of the interval
5. The element immediately following the upper bound
The catalog would in this way cover the intuitive cases of erroneous conditions (cases 1 and 5), boundary conditions (cases 2 and 4), and normal conditions (case 3).
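The entry can be applied mechanically; here is a minimal Python sketch for the integer-range case above (the function name is illustrative):

    def range_catalog_values(lower, upper):
        # The five catalog cases for a value constrained to [lower, upper]:
        # just below the range, both bounds, an interior element,
        # and just above the range.
        interior = (lower + upper) // 2
        return [lower - 1, lower, interior, upper, upper + 1]

    # For a variable constrained to 1..12 (an illustrative month field):
    print(range_catalog_values(1, 12))  # [0, 1, 6, 12, 13]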
The catalog-based approach consists in unfolding the specification, i.e., decomposing the specification into elementary items, deriving an initial set of test case specifications from pre-conditions, post-conditions, and definitions, and completing the set of test case specifications using a suitable test catalog.
STEP 1: identify elementary items of the specification. The initial specification is transformed into a set of elementary items that have to be tested.
Elementary items belong to a small set of basic types:
Pre-conditions represent the conditions on the inputs that must be satisfied
before invocation of the unit under test. Preconditions may be checked
either by the unit under test (validated preconditions) or by the caller
(assumed preconditions).
Post-conditions describe the result of executing the unit under test.
Variables indicate the elements on which the unit under test operates. They
can be input, output, or intermediate values.
Operations indicate the main operations performed on input or intermediate variables by the unit under test.
Definitions are shorthand used in the specification.
As in other approaches that begin with an informal description, it is not
possible to give a precise recipe for extracting the significant elements. The
result will depend on the capability and experience of the test designer.
Consider the informal specification of a function for converting URL-encoded form data into the original data entered through an html form. An
informal specification is given in Figure 13.7.[10]
The informal description of cgi decode uses the concept of hexadecimal
digit, hexadecimal escape sequence, and element of a cgi encoded sequence.
This leads to the identification of the following three definitions:
DEF 1 hexadecimal digits are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, a, b, c, d, e, f
DEF 2 a CGI-hexadecimal is a sequence of three characters: %xy, where x and y are hexadecimal digits
DEF 3 a CGI item is either an alphanumeric character, or character +, or a CGI-hexadecimal
In general, every concept introduced in the description as a support for
defining the problem can be represented as a definition.
The description of cgi decode mentions some elements that are inputs
and outputs of the computation. These are identified as the following variables:
[10] The informal specification is ambiguous and inconsistent, i.e., it is the kind of spec one is most likely to encounter in practice.
INPUT: encoded
OUTPUT: decoded
OUTPUT: return value

[Figure 13.7: The informal specification of function cgi decode.]
[Table 13.8: The elementary items identified in the cgi decode specification: preconditions PRE 1 and PRE 2, postconditions POST 1 through POST 6, variables VAR 1 through VAR 3, definitions DEF 1 through DEF 3, and operation OP 1.]
The expected results could have been expressed as a single complex postcondition; here we have chosen a set of simple contingent postconditions, each of which captures one case.
Although the description of cgi decode does not mention explicitly how
the results are obtained, we can easily deduce that it will be necessary to scan
the input sequence. This is made explicit in the following operation:
OP 1 Scan the input string Encoded.
In general, a description may refer either explicitly or implicitly to elementary operations which help to clearly describe the overall behavior, just as definitions help to clearly describe variables. As with variables, they are not strictly necessary for describing the relation between pre- and postconditions, but they serve as additional information for deriving test cases.
The result of step 1 for cgi decode is summarized in Table 13.8.
STEP 2: Derive a first set of test case specifications from preconditions, postconditions, and definitions. The aim of this step is to explicitly describe the partition of the input domain:
Validated Preconditions: A simple precondition, i.e., a precondition that is expressed as a simple boolean expression without and or or, identifies two classes of input: values that satisfy the precondition and values that do not. We thus derive two test case specifications.
A compound precondition, given as a boolean expression with and or or, identifies several classes of inputs. Although in general one could derive a different test case specification for each possible combination of truth values of the elementary conditions, usually we derive only a subset of test case specifications using the modified condition/decision coverage (MC/DC) approach, which is illustrated in Section 13.6 and in Chapter ??. In short, we derive a set of combinations of elementary conditions such that each elementary condition can be shown to independently affect the outcome of each decision. For each elementary condition C, there are two test case specifications in which the truth values of all conditions except C are the same, and the compound condition as a whole evaluates to True for one of those test cases and False for the other.
Assumed Preconditions: We do not derive test case specifications for cases
that violate assumed preconditions, since there is no defined behavior
and thus no way to judge the success of such a test case. We also do not
derive test cases when the whole input domain satisfies the condition,
since test cases for these would be redundant. We generate test cases
from assumed preconditions only when the MC/DC criterion generates
more than one class of valid combinations (i.e., when the condition is a
logical disjunction of more elementary conditions).
Postconditions: In all cases in which postconditions are given in a conditional form, the condition is treated like a validated precondition, i.e.,
we generate a test case specification for cases that satisfy and cases that
do not satisfy the condition.
Definition: Definitions that refer to input or output variables are treated like
postconditions, i.e., we generate a set of test cases for each definition
given in conditional form with the same criteria used for validated preconditions. The test cases are generated for each variable that refers to
the definition.
The elementary items of the specification identified in step 1 are scanned
sequentially and a set of test cases is derived applying these rules. While
scanning the specifications, we generate test case specifications incrementally. When new test case specifications introduce a finer partition than an
existing case, or vice versa, the test case specification that creates the coarser
partition becomes redundant and can be eliminated. For example, if an existing test case specification requires a non-empty set, and we have to add
two test case specifications that require a size that is a power of two and one
which is not, the existing test case specification can be deleted.
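A sketch in Python of this incremental bookkeeping, assuming a user-supplied relation refines(a, b) that holds when specification a partitions the input more finely than specification b (the example strings mirror the power-of-two case above):

    def add_spec(specs, new_spec, refines):
        # Drop any existing specification made redundant by the finer
        # new one, and add the new one unless it is itself redundant.
        specs = [s for s in specs if not refines(new_spec, s)]
        if not any(refines(s, new_spec) for s in specs):
            specs.append(new_spec)
        return specs

    finer = {("power-of-two size", "non-empty"),
             ("non-power-of-two size", "non-empty")}
    refines = lambda a, b: (a, b) in finer

    specs = ["non-empty"]
    for s in ["power-of-two size", "non-power-of-two size"]:
        specs = add_spec(specs, s, refines)
    print(specs)  # the coarser 'non-empty' specification is gone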
Scanning the elementary items of the cgi decode specification given in Table 13.8, we proceed as follows:
PRE 1: The first precondition is a simple assumed precondition; thus, according to the rules, we do not generate any test case specification. The only condition would be that the input is a string of ASCII characters, but this matches every test case and thus it does not identify a useful partition.
PRE 2: The second precondition is a simple validated precondition, thus we generate two test case specifications, one that satisfies the condition and one that does not:
TC-PRE2-1: Encoded is a sequence of CGI items
TC-PRE2-2: Encoded is not a sequence of CGI items
POST 2:
TC-POST2-1: Encoded contains character +
TC-POST2-2: Encoded does not contain character +
POST 3:
TC-POST3-1: Encoded contains CGI-hexadecimals
TC-POST3-2: Encoded does not contain a CGI-hexadecimal
POST 4: we do not generate any new useful test case specifications, because the two specifications are already covered by the specifications generated from POST 2.
POST 5: we generate only the test case specification that satisfies the
condition; the test case specification that violates the specification
is redundant with respect to the test case specifications generated
from POST 3.
TC-POST5-1: Encoded contains malformed CGI-hexadecimals
POST 6: as for POST 5, we generate only the test case specification that
satisfies the condition; the test case specification that violates the
specification is redundant with respect to most of the test case specifications generated so far.
TC-POST6-1: Encoded contains illegal characters
[Table 13.9: Summary of the test case specifications for cgi decode derived from preconditions, postconditions, and definitions: Encoded contains alphanumeric characters; Encoded does not contain alphanumeric characters; Encoded contains character +; Encoded does not contain character +; Encoded contains CGI-hexadecimals; Encoded does not contain a CGI-hexadecimal; Encoded contains illegal characters; Encoded contains malformed CGI-hexadecimals.]
Boolean:
- True
- False

Constant C:
- C (the constant value)
- The element immediately preceding the constant value
- The element immediately following the constant value
- Any other constant compatible with C

Sequence:
- Empty
- A single element
- More than one element
- Maximum length (if bounded) or very long
- Longer than maximum length (if bounded)
- Incorrectly terminated

Scan with action on elements P:
- P occurs at beginning of sequence
- P occurs in interior of sequence
- P occurs at end of sequence
- PP occurs contiguously
- P does not occur in sequence
- pP, where p is a proper prefix of P
- Proper prefix p occurs at end of sequence

Table 13.10: Part of a simple test catalog.
However, when the value of an output variable belongs to a finite set of values,
we should derive a test case for each possible outcome, but we cannot derive
a test case for an impossible outcome, so entry ENUMERATION of the catalog specifies that the choice of values outside the enumerated set is limited
to input variables. Intermediate variables, if present, are treated like output
variables.
Entry Boolean of the catalog applies to Return value (VAR 3). The catalog requires a test case that produces the value True and one that produces the value False. Both cases are already covered by test cases TC-PRE2-1 and TC-PRE2-2 generated for precondition PRE 2, so no test case specification is actually added.
Entry Enumeration of the catalog applies to any variable whose values are chosen from an explicitly enumerated set of values. In the example, the values of CGI item (DEF 3) and of malformed CGI-hexadecimals in POST 5 are defined by enumeration. Thus, we can derive new test case specifications by applying entry Enumeration to POST 5 and to any variable that can contain CGI items.
The catalog requires creation of a test case specification for each enumerated value and for some excluded values. For Encoded, which uses DEF 3, we generate a test case specification where a CGI item is an alphanumeric character, one where it is the character +, one where it is a CGI-hexadecimal, and some where it is an illegal value. We can easily ascertain that all the required cases are already covered by test case specifications TC-POST1-1, TC-POST1-2, TC-POST2-1, TC-POST2-2, TC-POST3-1, and TC-POST3-2, so any additional test case specifications would be redundant.
From the enumeration of malformed CGI-hexadecimals in POST 5, we derive the following test cases: %y, %x, %ky, %xk, %xy (where x and y are hexadecimal digits and k is not). Note that the first two cases, %x (the second
hexadecimal digit is missing) and %y (the first hexadecimal digit is missing)
are identical, and %x is distinct from %xk only if %x are the last two characters
in the string. A test case specification requiring a correct pair of hexadecimal
digits (%xy) is a value out of the range of the enumerated set, as required by
the catalog.
The added test case specifications are:
TC-POST5-2: Encoded terminates with %x
TC-POST5-3: Encoded contains %ky
TC-POST5-4: Encoded contains %xk
The test case specification corresponding to the correct pair of hexadecimal digits is redundant, having already been covered by TC-POST3-1. The
test case TC-POST5-1 can now be eliminated because it is more general than
the combination of TC-POST5-2, TC-POST5-3, and TC-POST5-4.
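A small Python sketch that classifies a %-escape according to this enumeration (the result labels are illustrative):

    HEX_DIGITS = set("0123456789abcdefABCDEF")

    def classify_escape(s):
        # %xy is well formed; %ky, %xk, and a terminating %x are the
        # malformed cases enumerated in POST 5.
        assert s.startswith("%")
        body = s[1:]
        if len(body) == 0:
            return "malformed: lone %"
        if len(body) == 1:
            return "malformed: terminates with %x"
        first, second = body[0], body[1]
        if first in HEX_DIGITS and second in HEX_DIGITS:
            return "well formed: %xy"
        return "malformed: %ky" if first not in HEX_DIGITS else "malformed: %xk"

    for case in ["%4A", "%k7", "%4k", "%4", "%"]:
        print(case, "->", classify_escape(case))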
Entry Range applies to any variable whose values are chosen from a finite
range. In the example, ranges appear three times in the definition of hexadecimal digit. Ranges also appear implicitly in the reference to alphanumeric
characters (the alphabetic and numeric ranges from the ASCII character set)
in DEF 3. For hexadecimal digits we will try the special values / and : (the characters that appear before 0 and after 9 in the ASCII encoding), the values 0 and 9 (lower and upper bounds of the first interval), and some value between 0 and 9; similarly @, G, A, F, and some value between A and F for the second interval; and `, g, a, f, and some value between a and f for the third interval.
These values will be instantiated for variable Encoded, and result in 30 additional test case specifications (5 values for each subrange, giving 15 values for each hexadecimal digit and thus 30 for the two digits of a CGI-hexadecimal).
The full set of test case specifications is shown in Table ??. These test case
specifications are more specific than (and therefore replace) test case specifications TC-POST3-1, TC-POST5-3, and TC-POST5-4.
For alphanumeric characters we will similarly derive boundary, interior, and excluded values, which result in 15 additional test case specifications, also given in Table ??. These test cases are more specific than (and therefore replace) TC-POST1-1, TC-POST1-2, and TC-POST6-1.
Entry Numeric Constant does not apply to any element of this specification.
Entry Non-Numeric Constant applies to + and %, occurring in DEF 3 and
DEF 2 respectively. Six test case specifications result, but all are redundant.
Entry Sequence applies to Encoded (VAR 1), Decoded (VAR 2), and CGI-hexadecimal (DEF 2). Six test case specifications result for each, of which only five are mutually non-redundant and not already in the list. From VAR 1 (Encoded) we
generate test case specifications requiring an empty sequence, a sequence
containing a single element, and a very long sequence. The catalog entry requiring more than one element generates a redundant test case specification,
which is discarded. We cannot produce reasonable test cases for incorrectly
terminated strings (the behavior would vary depending on the contents of
memory outside the string), so we omit that test case specification.
All test case specifications that would be derived for Decoded (VAR 2) would be redundant with respect to test case specifications derived for Encoded (VAR 1).
From CGI-hexadecimal (DEF 2) we generate two additional test case specifications for variable Encoded: a sequence that terminates with % (the only way to produce a one-character subsequence beginning with %) and a sequence containing %xyz, where x, y, and z are hexadecimal digits.
Entry Scan applies to Scan Encoded (OP 1) and generates 17 test case specifications. Three test case specifications (alphanumeric, +, and CGI-hexadecimal) are generated for each of the first 5 items of the catalog entry. One test case specification is generated for each of the last two items of the catalog entry when Scan is applied to CGI item. The last two items of the catalog entry do not apply to alphanumeric characters and +, since they have no non-trivial prefixes. Seven of the 17 are redundant. The ten generated test case specifications are summarized in Table 13.11.
Test catalogs, like other check-lists used in test and analysis (e.g., inspection check-lists), are an organizational asset that can be maintained and enhanced over time. A good test catalog will be written precisely and suitably
annotated to resolve ambiguity (unlike the sample catalog used in this chapter). Catalogs should also be specialized to an organization and application
domain, typically using a process such as defect causal analysis or root cause
analysis. Entries are added to detect particular classes of faults that have been
encountered frequently or have been particularly costly to remedy in previous projects. Refining check-lists is a typical activity carried out as part of
process improvement. When a test reveals a program fault, it is useful to
make a note of which catalog entries the test case originated from, as an aid
to measuring the effectiveness of catalog entries. Catalog entries that are not
effective should be removed.
13.9 Deriving Test Cases from Finite State Machines
Finite state machines are often used to specify sequences of interactions between a system and its environment. State machine specifications in one
form or another are common for control and interactive systems, such as embedded systems, communication protocols, menu driven applications, threads
of control in a system with multiple threads or processes.
In several application domains, specifications may be expressed directly
as some form of finite-state machine. For example, embedded control systems are frequently specified with Statecharts, communication protocols are
commonly described with SDL diagrams, and menu driven applications are
sometimes modeled with simple diagrams representing states and transitions.
In other domains, the finite state essence of systems is left implicit in informal specifications. For instance, the informal specification of feature Maintenance of the Chipmunk web site given in Figure 13.9 describes a set of interactions between the maintenance system and its environment that can be modeled as transitions through a finite set of process states. The finite state nature of the interaction is made explicit by the finite state machine shown in Figure 13.10. Note that some transitions appear to be labeled by conditions rather than events, but they can be interpreted as shorthand for an event in which the condition becomes true or is discovered (e.g., lack component is shorthand for discovering that a required component is not in stock).
Many control or interactive systems are characterized by an infinite set of
states. Fortunately, the non-finite-state parts of the specification are often
simple enough that finite state machines remain a useful model for testing as
well as specification. For example, communication protocols are frequently
specified using finite state machines, often with some extensions that make them not truly finite-state.
TC-POST2-1   Encoded contains character +
TC-POST2-2   Encoded does not contain character +
TC-POST3-2   Encoded does not contain a CGI-hexadecimal
TC-POST5-2   Encoded terminates with %x
TC-VAR1-1    Encoded is the empty sequence
TC-VAR1-2    Encoded is a sequence containing a single character
TC-VAR1-3    Encoded is a very long sequence
TC-DEF2-1    Encoded contains %/y
TC-DEF2-2    Encoded contains %0y
TC-DEF2-3    Encoded contains %xy, with x in [1..8]
TC-DEF2-4    Encoded contains %9y
TC-DEF2-5    Encoded contains %:y
TC-DEF2-6    Encoded contains %@y
TC-DEF2-7    Encoded contains %Ay
TC-DEF2-8    Encoded contains %xy, with x in [B..E]
TC-DEF2-9    Encoded contains %Fy
TC-DEF2-10   Encoded contains %Gy
TC-DEF2-11   Encoded contains %`y
TC-DEF2-12   Encoded contains %ay
TC-DEF2-13   Encoded contains %xy, with x in [b..e]
TC-DEF2-14   Encoded contains %fy
TC-DEF2-15   Encoded contains %gy
TC-DEF2-16   Encoded contains %x/
TC-DEF2-17   Encoded contains %x0
TC-DEF2-18   Encoded contains %xy, with y in [1..8]
TC-DEF2-19   Encoded contains %x9
TC-DEF2-20   Encoded contains %x:
TC-DEF2-21   Encoded contains %x@
TC-DEF2-22   Encoded contains %xA
TC-DEF2-23   Encoded contains %xy, with y in [B..E]
TC-DEF2-24   Encoded contains %xF
TC-DEF2-25   Encoded contains %xG
TC-DEF2-26   Encoded contains %x`
TC-DEF2-27   Encoded contains %xa
TC-DEF2-28   Encoded contains %xy, with y in [b..e]
TC-DEF2-29   Encoded contains %xf
TC-DEF2-30   Encoded contains %xg
TC-DEF2-31   Encoded contains %$
TC-DEF2-32   Encoded contains %xyz
TC-DEF3-1    Encoded contains /
TC-DEF3-2    Encoded contains 0
TC-DEF3-3    Encoded contains c, with c in [1..8]
TC-DEF3-4    Encoded contains 9
TC-DEF3-5    Encoded contains :
TC-DEF3-6    Encoded contains @
TC-DEF3-7    Encoded contains A
TC-DEF3-8    Encoded contains c, with c in [B..Y]
TC-DEF3-9    Encoded contains Z
TC-DEF3-10   Encoded contains [
TC-DEF3-11   Encoded contains `
TC-DEF3-12   Encoded contains a
TC-DEF3-13   Encoded contains c, with c in [b..y]
TC-DEF3-14   Encoded contains z
TC-DEF3-15   Encoded contains {
TC-OP1-1     Encoded contains ^α
TC-OP1-2     Encoded contains ^+
TC-OP1-3     Encoded contains ^%xy
TC-OP1-4     Encoded contains α$
TC-OP1-5     Encoded contains +$
TC-OP1-6     Encoded contains %xy$
TC-OP1-7     Encoded contains αα
TC-OP1-8     Encoded contains ++
TC-OP1-9     Encoded contains %xy%zw
TC-OP1-10    Encoded contains %x%yz
where x, y, z, and w are hexadecimal digits, α is an alphanumeric character, ^ represents the beginning of the string, and $ represents the end of the string.
Table 13.11: Summary table: Test case specifications for cgi decode generated with a catalog.
[Figure 13.9: The functional specification of feature Maintenance of the Chipmunk web site.]
[Figure 13.10: The finite state machine corresponding to functionality Maintenance specified in Figure 13.9. States include NO Maintenance, Wait for returning, Maintenance (no warranty), Wait for acceptance, Repair (maintenance station), Wait for component, Repair (regional headquarters), Repair (main headquarters), Repaired, and Wait for pick up; transitions are labeled with events such as request at maintenance station or by express courier (contract number), estimate costs, accept estimate, reject estimate, lack component, component arrives, repair completed, successful repair, unable to repair, invalid contract number, return, and pick up.]
T-Cover
TC-1  0 2 4 1 0
TC-2  0 5 2 4 5 6 0
TC-3  0 3 5 9 6 0
TC-4  0 3 5 7 5 8 7 8 9 7 9 6 0
Table 13.12: Test suite T-Cover for the finite state machine of Figure 13.10.
A state machine that simply receives a message
on one port and then sends the same message on another port is not really
finite-state unless the set of possible messages is finite, but is often rendered
as a finite state machine, ignoring the contents of the exchanged messages.
State-machine specifications can be used both to guide test selection and
in construction of an oracle that judges whether each observed behavior is
correct. There are many approaches for generating test cases from finite state
machines, but most are variations on a basic strategy of checking each state
transition. One way to understand this basic strategy is to consider that each
transition is essentially a specification of a precondition and postcondition; e.g., a transition from state S to state T on stimulus i means that if the system is in state S and receives stimulus i, then after reacting it will be in state T. For instance, the transition labeled accept estimate from state Wait for acceptance to state Repair (maintenance station) of Figure 13.10 indicates that if an item is on hold waiting for the customer to accept an estimate of repair costs, and the customer accepts the estimate, then the maintenance station begins repairing the item.
A faulty system could violate any of these precondition-postcondition pairs, so each should be tested. For instance, the state Repair (maintenance station) can be reached through three different transitions, and each should be checked.
Details of the approach taken depend on several factors, including whether
system states are directly observable or must be inferred from stimulus/response
sequences, whether the state machine specification is complete as given or
includes additional, implicit transitions, and whether the size of the (possibly
augmented) state machine is modest or very large.
A basic criterion for generating test cases from finite state machines is
transition coverage, which requires each transition to be traversed at least
once. Test case specifications for transition coverage are often given as sets of
state sequences or transition sequences. For example, T-Cover in Table 13.12
is a set of four paths, each beginning at the initial state, which together cover
all transitions of the finite state machine of Figure 13.10. T-Cover thus satisfies
the transition coverage criterion.
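Checking the criterion is straightforward once paths are expressed as state sequences; here is a Python sketch over a toy machine (the states and transitions are illustrative, not those of Figure 13.10):

    fsm = {(0, "request"): 1, (1, "estimate"): 2, (1, "return"): 0,
           (2, "accept"): 3, (2, "reject"): 0, (3, "pick up"): 0}

    def untraversed_transitions(fsm, paths):
        # Each path is a state sequence beginning at the initial state.
        traversed = {edge for p in paths for edge in zip(p, p[1:])}
        required = {(s, t) for (s, _event), t in fsm.items()}
        return required - traversed

    suite = [[0, 1, 2, 3, 0], [0, 1, 0], [0, 1, 2, 0]]
    print(untraversed_transitions(fsm, suite))  # set(): coverage achieved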
The transition coverage criterion depends on the assumption that the finite-state machine model is an adequate representation of the specified behavior.
[11] The boundary interior path coverage was originally proposed for structural coverage of program control flow, and is described in Chapter 14.
Advanced search: The Advanced search function allows for searching elements in the website database.
The key for searching can be:
a simple string, i.e., a simple sequence of characters;
a compound string, i.e., a string that may include the wild character * and comma-separated alternatives;
a combination of strings, i.e., a set of strings combined with the boolean operators NOT, AND, OR, and grouped within parentheses to change the priority of operators.
Examples:
laptop: The routine searches for string laptop.
DVD*,CD*: The routine searches for strings that start with substring DVD or CD followed by any number of characters.
NOT (C2021*) AND C20*: The routine searches for strings that start with substring C20 followed by any number of characters, except those that start with substring C2021.
13.10 Deriving Test Cases from Grammars
[Figure 13.11: The BNF description of functionality Advanced search, with nonterminals <search>, <binop>, <term>, <regexp>, and <choices>.]
A second example is given in Figure 13.13, which specifies a product configuration of the Chipmunk web site. In this case, the syntactic structure of product configuration is described by an XML schema, which defines an element Model of type ProductConfigurationType. XML schemata are essentially a variant of BNF, so it is not difficult to render the schema in the same BNF notation, as shown in Figure 13.14.
In general, grammars are well suited to represent inputs of varying and unbounded size, boundary conditions, and recursive structures, none of which can be easily captured with fixed lists of parameters, as required by most methods presented in this chapter.
Generating test cases from grammar specifications is straightforward and
can easily be automated. To produce a string, we start from a non-terminal
symbol and we progressively substitute non-terminals occurring in the current string with substrings, as indicated by the applied productions, until we
obtain a string composed only of terminal symbols. In general at each step,
several rules can be applied. A minimal set of test cases can be generated
by requiring each production to be exercised at least once. Test cases can
be generated by starting from the start symbol and applying all productions.
The number and complexity of the generated test cases depend on the order of application of the productions. If we first apply productions with nonterminals on the right-hand side, we generate a smaller set of test cases, each one tending to be large. Conversely, first applying productions with only terminals on the right-hand side, we generate larger sets of smaller test cases. An algorithm that favors nonterminals, applied to the BNF for Advanced search of Figure 13.11, generates the test case
not Char *, Char and (Char or Char)
which exercises all productions. The derivation tree for this test case is given
in Figure 13.15. It shows that all productions of the BNF are exercised at least
once.
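A minimal Python sketch of production-coverage generation over a toy grammar in the style of the Advanced search BNF (the grammar here is illustrative, not a faithful copy of Figure 13.11):

    import random

    grammar = {
        "search": (("search", "binop", "term"), ("not", "search"), ("term",)),
        "binop": (("and",), ("or",)),
        "term": (("regexp",), ("(", "search", ")")),
        "regexp": (("Char", "regexp"), ("Char",), ("*",)),
    }

    def derive(symbol, used, depth=0):
        # Randomly expand a nonterminal, recording applied productions;
        # beyond the depth bound, fall back to a shortest production so
        # the derivation terminates.
        if symbol not in grammar:
            return [symbol]
        prods = grammar[symbol]
        prod = random.choice(prods) if depth < 8 else min(prods, key=len)
        used.add((symbol, prod))
        return [t for s in prod for t in derive(s, used, depth + 1)]

    all_prods = {(nt, p) for nt, ps in grammar.items() for p in ps}
    used, tests = set(), []
    while not all_prods <= used:
        tests.append(" ".join(derive("search", used)))
    print(len(tests), "derivations exercise all", len(all_prods), "productions")

Favoring productions with nonterminals (as the algorithm in the text does) would instead yield fewer but larger test cases.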
The minimal set of test cases can be enriched by considering boundary conditions. Boundary conditions apply to recursive productions. To generate test cases for boundary conditions we need to identify the minimum and maximum number of recursive applications of a production and then generate a test case for the minimum, the maximum, one greater than the minimum, and one smaller than the maximum number of applications.
[Figure 13.13: The XML schema that describes a Product configuration of the Chipmunk web site.]
[Figure 13.14: The BNF description of Product configuration:
Model ::= modelNumber compSequence optCompSequence
compSequence ::= Component compSequence | empty
optCompSequence ::= OptionalComponent optCompSequence | empty
Component ::= ComponentType ComponentValue
OptionalComponent ::= ComponentType]
[Figure 13.15: The derivation tree of a test case for functionality Advanced search derived from the BNF specification of Figure 13.12.]
[Figure 13.16: The BNF description of Product configuration annotated with production names (Model, compSeq1, compSeq2, optCompSeq1, optCompSeq2, Comp, OptComp, modNum, CompTyp, CompVal) and limits (e.g., limit = 16 on optional component sequences), together with the number of applications of each production for a sample boundary test case.]
Depending on the context, some techniques may be hard or even impossible to apply, or may lead to unsatisfactory results. Some techniques can be interchanged, i.e., they can be applied to the same specification and lead to similar results. Other techniques are complementary, i.e., they apply to different aspects of the same specification or at different stages of test case generation. In some cases, approaches apply directly to the form in which the specification is given; in other cases, the specification must be transformed into a suitable form.
The choice of approach for deriving functional test cases depends on several factors: the nature of the specification, the form of the specification, the expertise and experience of test designers, the structure of the organization, the availability of tools, the budget and quality constraints, and the costs of designing and implementing the scaffolding.
Nature and form of the specification Different approaches exploit different characteristics of the specification. For example, the presence of several constraints on the input domain may suggest the category partition method, while lack of constraints may indicate a combinatorial approach. The presence of a finite set of states could suggest a finite state machine approach, while inputs of varying and unbounded size may be tackled with grammar-based approaches. Specifications given in a specific format, e.g., as finite state machines or decision structures, suggest the corresponding techniques. For example, functional test cases for SDL specifications of protocols are often derived with finite state machine based criteria.
Experience of test designers and organization Experience of testers and
company procedures may drive the choice of the testing technique. For example, test designers expert in category partition may prefer this technique
over a catalog based approach when both are applicable, while a company
that works in a specific domain may require the use of catalogs suitably produced for the domain of interest.
Tools Some techniques may require the use of tools, whose availability and
cost should be taken into account when choosing a specific testing technique.
For example, several tools are available for deriving test cases from SDL specifications. The availability of one of these tools may suggest the use of SDL for
capturing a subset of the requirements expressed in the specification.
Budget and quality constraints Different quality and budget constraints may lead to different choices. For example, the need to quickly check a software product without stringent reliability requirements may lead to choosing a random test generation approach, while a thorough check of a safety-critical application may require the use of sophisticated methods for functional test case generation. When choosing a specific approach, it is important to evaluate all cost-related aspects. For example, the generation of a large number of random tests may require the design of sophisticated oracles, which may raise the costs of testing over an acceptable threshold; the cost of a specific tool and the related training may go beyond the advantages of adopting a specific approach, even if the nature and the form of the specification may suggest the suitability of that approach.
Many engineering activities require carefully trading off different aspects. Functional testing is not an exception: successfully balancing the many aspects is a difficult and often underestimated problem that requires highly skilled designers. Functional testing is not an exercise in choosing the optimal approach, but a complex set of activities for finding a suitable combination of models and techniques that can lead to a set of test cases that satisfy cost and quality constraints. This balancing extends beyond test design to software design for test. Appropriate design not only improves the software development process, but can greatly facilitate the job of test designers, and thus lead to substantial savings.
Too often test designers make the same mistake as non-expert programmers, that is, starting to generate code (in the one case) or test cases (in the other) without prior analysis of the problem domain. Expert test designers carefully examine the available specifications, their form, and domain and company constraints to identify a suitable framework for designing test case specifications before even starting to consider the problem of test case generation.
definition of techniques for automatically deriving test cases from particular formal methods. Formal methods present new challenges and
opportunities for deriving test cases. We can both adapt existing techniques borrowed from other disciplines or research areas and define
new techniques for test case generation. The formal nature can support fully automatic generation of test cases, thus opening additional
problems and research challenges.
adaptation of formal methods to be more suitable for test case generation. As illustrated in this chapter, test cases can be derived in two broad ways, either by identifying representative values or by deriving a model of the unit under test. The possibility of automatically generating test cases from different formal methods offers the opportunity of a large set of models to be used in testing. The research challenge lies in identifying a tradeoff between the costs of generating formal models and the savings in automatically generating test cases. The possibility of deriving simple formal models capturing only the aspects of interest for testing has already been studied in some specific areas, like concurrency, where test cases can be derived from models of the concurrency structure ignoring other details of the system under test, but the topic presents many new challenges if applied to wider classes of systems and models.
identification of a general framework for deriving test cases from any
particular formal specification. Currently research is moving towards
the study of techniques for generating test cases for specific formal methods. The unification of methods into a general framework will constitute an additional important result that will allow the interchange of
formal methods and testing techniques.
Another hot research area is fed by the increasing interest in different specification and design paradigms. New software development paradigms, such as the object-oriented paradigm, as well as techniques for addressing increasingly important topics, such as software architectures and design patterns, are often based on new notations. Semi-formal and diagrammatic notations offer several opportunities for systematically generating test cases. Research is active in investigating different possibilities of (semi-)automatically deriving test cases from these new forms of specifications and studying the effectiveness of existing test case generation techniques.[12]
Most functional testing techniques do not satisfactorily address the problem of testing increasingly large artifacts. Existing functional testing techniques do not take advantage of test cases available for parts of the artifact under test. Compositional approaches that derive test cases for a given system taking advantage of test cases available for its subsystems are an important open research problem.
[12] Problems and state-of-the-art techniques for testing object-oriented software and software architectures are discussed in Chapters ?? and ??.
Further Reading
Functional testing techniques, sometimes called black-box testing or specification-based testing, are presented and discussed by several authors. Ntafos [DN81] makes the case for random, rather than systematic, testing; Frankl, Hamlet, Littlewood, and Strigini [FHLS98] is a good starting point to the more recent literature considering the relative merits of systematic and statistical approaches.
Category partition testing is described by Ostrand and Balcer [OB88]. The combinatorial approach described in this chapter is due to Cohen, Dalal, Fredman, and Patton [CDFP97]; the algorithm described by Cohen et al. is patented by Bellcore. Myers' classic text [Mye79] describes a number of techniques for testing decision structures. Richardson, O'Malley, and Tittle [ROT89] and Stocks and Carrington [SC96] are among more recent attempts to generate test cases based on the structure of (formal) specifications. Beizer's Black Box Testing [Bei95] is a popular presentation of techniques for testing based on control and data flow structure of (informal) specifications.
Catalog-based testing of subsystems is described in depth by Marick's The Craft of Software Testing [Mar97].
Test design based on finite state machines has been important in the domain of communication protocol development and conformance testing; Fujiwara, von Bochmann, Khendek, Amalou, and Ghedamsi [FvBK+91] is a good introduction. Gargantini and Heitmeyer [GH99] describe a related approach applicable to software systems in which the finite-state machine is not explicit but can be derived from a requirements specification.
Test generation from context-free grammars is described by Celentano et al. [CCD+80] and apparently goes back at least to Hanford's test generator for an IBM PL/I compiler [Han70]. The probabilistic approach to grammar-based testing is described by Sirer and Bershad [SB99], who use annotated grammars to systematically generate tests for Java virtual machine implementations.
Related topics
Readers interested in the complementarities between functional and structural testing, as well as readers interested in testing decision structures and control and data flow graphs, may continue with the next chapters, which describe structural and data flow testing. Readers interested in finite state machine based testing may go to Chapters 17 and ??, which discuss testing of object-oriented and distributed systems, respectively. Readers interested in the quality of specifications may go to Chapters 25 and ??, which describe inspection techniques and methods for testing and analysis of specifications, respectively. Readers interested in other aspects of functional testing may move to Chapters 16 and ??, which discuss techniques for testing complex data structures and GUIs, respectively.
Exercises
Ex13.1. In the Extreme Programming (XP) methodology [?], a written description of a desired feature may be a single sentence, and the first step to designing the implementation of that feature is designing and implementing a set
of test cases. Does this aspect of the XP methodology contradict our assertion
that test cases are a formalization of specifications?
Ex13.2. Compute the probability of selecting a test case that reveals the fault inserted in line 25 of program Root of Figure 13.1 by randomly sampling the input domain, assuming that type double has range [...]. Compute the probability of selecting a test case that reveals a fault, assuming that both lines 18 and 25 of program Root contain the same fault, i.e., the missing condition [...]. Compare the two probabilities.
[Ex13.3/Ex13.4: ... the 10 digits ..., the operations ..., ... to display the result of a sequence of operations, ... to clear the display ...]
Ex13.5. Given a set of parameter characteristics (categories) and value classes (choices)
obtained by applying the category partition method to an informal specification, explain either with a deduction or with examples why unrestricted
use of constraints property and if-property makes it difficult to compute the
number of derivable combinations of value classes.
Write heuristics to compute a reasonable upper bound for the number of
derivable combinations of value classes when constraints can be used without limits.
Ex13.6. Consider the following specification, which extends the specification of the feature Check-configuration of the Chipmunk web site given in Figure 13.3. Derive a test case specification using the category partition method and compare the test specification you obtain with the specification of Table 13.1. Try to identify a procedure for deriving the test specifications of the new version of the functional specification from the former version. Discuss the suitability of category partition test design for incremental development with evolving specifications.
Check-Configuration: the Check-configuration function checks the validity of a computer configuration. The parameters of check-configuration
are:
Product line: A product line identifies a set of products sharing several
components and accessories. Different product lines have distinct
components and accessories.
Example: Product lines include desktops, servers, notebooks, digital cameras, printers.
Model: A model identifies a specific product and determines a set of
constraints on available components. Models are characterized by
logical slots for components, which may or may not be implemented
by physical slots on a bus. Slots may be required or optional. Required slots must be assigned a suitable component to obtain a legal configuration, while optional slots may be left empty or filled
depending on the customer's needs.
Example: The required slots of the Chipmunk C20 laptop computer include a screen, a processor, a hard disk, memory, and an
operating system. (Of these, only the hard disk and memory are
implemented using actual hardware slots on a bus.) The optional
slots include external storage devices such as a CD/DVD writer.
Set of Components: A set of (slot, component) pairs, which must correspond to the required and optional slots associated with the model.
A component is a choice that can be varied within a model, and which is not designed to be replaced by the end user. Available components and a default for each slot are determined by the model. The special value empty is allowed (and may be the default selection) for optional slots.
In addition to being compatible or incompatible with a particular
model and slot, individual components may be compatible or incompatible with each other.
Example: The default configuration of the Chipmunk C20 includes
20 gigabytes of hard disk; 30 and 40 gigabyte disks are also available. (Since the hard disk is a required slot, empty is not an allowed choice.) The default operating system is RodentOS 3.2, personal edition, but RodentOS 3.2 mobile server edition may also be
selected. The mobile server edition requires at least 30 gigabytes of
hard disk.
Set of Accessories: An accessory is a choice that can be varied within a
model, and which is designed to be replaced by the end user. Available choices are determined by a model and its line. Unlike components, an unlimited number of accessories may be ordered, and the
default value for accessories is always empty. The compatibility of
some accessories may be determined by the set of components, but
accessories are always considered compatible with each other.
Example: Models of the notebook family may allow accessories including removable drives (zip, cd, etc.), PC card devices (modem,
lan, etc.), additional batteries, port replicators, carrying case, etc.
Ex13.7. Update the specification of feature Check-configuration of the Chipmunk web site given in Figure 13.3 by using information from the test specification provided in Table 13.1.
Ex13.8. Derive test specifications using the category partition method for the following Airport connection check function:
Airport connection check: The airport connection check is part of an
(imaginary) travel reservation system. It is intended to check the validity of a single connection between two flights in an itinerary. It is
described here at a fairly abstract level, as it might be described in a
preliminary design before concrete interfaces have been worked out.
Specification Signature: Valid Connection (Arriving Flight: flight, Departing Flight: flight) returns Validity Code
Validity Code 0 (OK) is returned if Arriving Flight and Departing Flight
make a valid connection (the arriving airport of the first is the departing airport of the second) and there is sufficient time between
arrival and departure according to the information in the airport
database described below.
Otherwise, a validity code other than 0 is returned, indicating why
the connection is not valid.
Data types
Flight: A flight is a structure consisting of
A unique identifying flight code, three alphabetic characters followed by up to four digits. (The flight code is not used by the
valid connection function.)
The originating airport code (3 characters, alphabetic)
The scheduled departure time of the flight (in universal time)
The destination airport code (3 characters, alphabetic)
The scheduled arrival time at the destination airport.
Validity Code: The validity code is one of a set of integer values with
the following interpretations
0: The connection is valid.
10: Invalid airport code (airport code not found in database)
15: Invalid connection, too short: There is insufficient time between
arrival of first flight and departure of second flight.
16: Invalid connection, flights do not connect. The destination airport of Arriving Flight is not the same as the originating airport
of Departing Flight.
20: Another error has been recognized (e.g., the input arguments
may be invalid, or an unanticipated error was encountered).
Airport Database
The Valid Connection function uses an internal, in-memory table
of airports which is read from a configuration file at system initialization. Each record in the table contains the following information:
Three-letter airport code. This is the key of the table and can be
used for lookups.
Airport zone. In most cases the airport zone is a two-letter country code, e.g., us for the United States. However, where passage
from one country to another is possible without a passport, the
airport zone represents the complete zone in which passport-free
travel is allowed. For example, the code eu represents the
European countries which are treated as if they were a single
country for purposes of travel.
Domestic connect time. This is an integer representing the minimum number of minutes that must be allowed for a domestic
connection at the airport. A connection is domestic if the originating and destination airports of both flights are in the same
airport zone.
International connect time. This is an integer representing the
minimum number of minutes that must be allowed for an international connection at the airport. The number -1 indicates
that international connections are not permitted at the airport.
A connection is international if any of the originating or destination airports are in different zones.
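Before identifying categories and value classes, it can help to restate the specification as executable pseudocode. The Python sketch below is one possible reading of it, not part of the specification: the record layouts, the sample airport records, the representation of times as minutes, and the choice of code 20 when international connections are forbidden at the hub are all our own assumptions.

    from collections import namedtuple

    Flight = namedtuple("Flight", "code orig dep_time dest arr_time")
    Airport = namedtuple("Airport", "zone dom_connect int_connect")

    # in-memory airport table keyed by three-letter code; an international
    # connect time of -1 forbids international connections at that airport
    airport_db = {
        "PIT": Airport("us", 30, 60),
        "JFK": Airport("us", 45, 90),
        "CDG": Airport("eu", 40, -1),
    }

    def valid_connection(arriving, departing):
        if not isinstance(arriving, Flight) or not isinstance(departing, Flight):
            return 20                                # other error: bad arguments
        codes = (arriving.orig, arriving.dest, departing.orig, departing.dest)
        if any(c not in airport_db for c in codes):
            return 10                                # invalid airport code
        if arriving.dest != departing.orig:
            return 16                                # flights do not connect
        hub = airport_db[arriving.dest]
        if len({airport_db[c].zone for c in codes}) == 1:
            needed = hub.dom_connect                 # domestic: all four airports
                                                     # in the same zone
        elif hub.int_connect == -1:
            return 20                                # international connections not
                                                     # permitted here (code assumed)
        else:
            needed = hub.int_connect                 # international connection
        if departing.dep_time - arriving.arr_time < needed:
            return 15                                # connection too short
        return 0                                     # valid connection

    # times as minutes since midnight, universal time (simplifying assumption)
    first  = Flight("AAA1234", "PIT", 60, "JFK", 120)
    second = Flight("BBB7", "JFK", 300, "CDG", 800)
    assert valid_connection(first, second) == 0

Each branch of the sketch suggests a candidate category (airport codes present or absent from the database, matching or mismatched connecting airports, same or different zones, connect time above, at, or below the threshold).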
Ex13.9. Derive test specifications using the category partition method for the function SUM of Excel from the following description taken from the Excel
manual:
SUM: Adds all the numbers in a range of cells.
Syntax
SUM(number1,number2, ...)
Number1, number2, ... are 1 to 30 arguments for which you want
the total value or sum.
Numbers, logical values, and text representations of numbers
that you type directly into the list of arguments are counted. See
the first and second examples following.
If an argument is an array or reference, only numbers in that array or reference are counted. Empty cells, logical values, text, or
error values in the array or reference are ignored. See the third
example following.
Arguments that are error values or text that cannot be translated into numbers cause errors.
Examples
SUM(3, 2) equals 5
SUM(3, 2, TRUE) equals 6 because the text values are translated
into numbers, and the logical value TRUE is translated into the
number 1.
Unlike the previous example, if A1 contains "3" as text and B1 contains
TRUE, then:
SUM(A1, B1, 2) equals 2 because nonnumeric values
in references are not translated.
If cells A2:E2 contain 5, 15, 30, 40, and 50:
SUM(A2:C2) equals 50
SUM(B2:E2, 15) equals 150
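The heart of this specification is the different treatment of directly typed arguments and values reached through arrays or references. The Python sketch below approximates those coercion rules under our own modeling assumptions (a reference is represented simply as a list of cell values; the function name excel_sum is ours):

    def excel_sum(*args):
        total = 0.0
        for a in args:
            if isinstance(a, list):            # array or reference argument
                for v in a:
                    if isinstance(v, bool):    # logical values in references: ignored
                        continue
                    if isinstance(v, (int, float)):
                        total += v             # only numbers are counted
                    # empty cells, text, and error values are ignored
            elif isinstance(a, bool):          # logical typed directly: TRUE -> 1
                total += 1 if a else 0
            elif isinstance(a, (int, float)):
                total += a
            elif isinstance(a, str):
                total += float(a)              # text form of a number is coerced;
                                               # non-numeric text raises an error
            else:
                raise ValueError("cannot translate argument to a number")
        return total

    assert excel_sum(3, 2) == 5
    assert excel_sum(3, 2, True) == 6              # TRUE typed directly counts as 1
    assert excel_sum(["3", True], 2) == 2          # SUM(A1, B1, 2) with A1 text "3"
    assert excel_sum([5, 15, 30]) == 50            # SUM(A2:C2)
    assert excel_sum([15, 30, 40, 50], 15) == 150  # SUM(B2:E2, 15)

The asserts restate the manual's examples; the direct/reference distinction is a natural parameter characteristic for the category partition.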
Ex13.10. Eliminate from the test specifications of the feature Check-configuration
given in Table 13.1 all constraints that do not correspond to infeasible tuples
but have been added for the sake of reducing the number of test cases.
Compute the number of test cases corresponding to the new specifications.
Apply the combinatorial approach to derive test cases covering all pairwise
combinations.
Compute the number of derived test cases.
Ex13.11. Consider the value classes obtained by applying the category partition
approach to the Airport Connection Check example of Exercise Ex13.8. Eliminate from the test specifications all constraints that do not correspond to
infeasible tuples and compute the number of derivable test cases. Apply the
combinatorial approach to derive test cases covering all pairwise combinations, and compare the number of derived test cases.
Ex13.12. Given a set of parameter characteristics and value classes, write a heuristic algorithm that selects a small set of tuples that cover all possible pairs of
the value classes using the combinatorial approach. Assume that parameter
characteristics and value classes are given without constraints.
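One reasonable starting point (a sketch, not necessarily the algorithm the authors have in mind) is a greedy heuristic: seed each new tuple with a still-uncovered pair, then fill the remaining positions with the value classes that cover the most uncovered pairs. The Python below assumes at least two unconstrained parameter characteristics, given as lists of value classes:

    from itertools import combinations

    def pairwise_suite(parameters):
        # all value-class pairs to cover, keyed as ((i, a), (j, b)) with i < j
        uncovered = {((i, a), (j, b))
                     for i, j in combinations(range(len(parameters)), 2)
                     for a in parameters[i] for b in parameters[j]}
        suite = []
        while uncovered:
            # seed the next tuple with one still-uncovered pair, which
            # guarantees progress on every iteration
            (i, a), (j, b) = next(iter(uncovered))
            choice = {i: a, j: b}
            for k in range(len(parameters)):
                if k in choice:
                    continue
                def gain(v, k=k):
                    # pairs this value would newly cover, given choices so far
                    return sum(1 for m, w in choice.items()
                               if (min((m, w), (k, v)),
                                   max((m, w), (k, v))) in uncovered)
                choice[k] = max(parameters[k], key=gain)
            case = tuple(choice[k] for k in range(len(parameters)))
            suite.append(case)
            uncovered -= {p for p in uncovered
                          if case[p[0][0]] == p[0][1] and case[p[1][0]] == p[1][1]}
        return suite

    # a small suite covering all 16 value-class pairs of three characteristics
    suite = pairwise_suite([["C", "C++"], ["Linux", "Mac"], [8, 16, 32]])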
Ex13.13. Given a set of parameter characteristics and value classes, compute a
lower bound on the number of tuples required for covering all pairs of values
according to the combinatorial approach.
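One such bound comes from the two parameter characteristics with the most value classes: every tuple covers exactly one pair drawn from those two characteristics, so no suite can be smaller than the product of their sizes. A one-function sketch under the same unconstrained assumption as above:

    def pairwise_lower_bound(parameters):
        # each tuple covers exactly one pair from the two largest
        # characteristics, so their product bounds the suite size from below
        sizes = sorted((len(p) for p in parameters), reverse=True)
        return sizes[0] * sizes[1]

    # characteristics with 3, 2, and 2 value classes need at least 6 tuples
    assert pairwise_lower_bound([[1, 2, 3], ["a", "b"], [True, False]]) == 6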
Ex13.14. Generate a set of tuples that cover all triples of language, screen-size, and
font and all pairs of other parameters for the specification given in Table
13.3.
Ex13.15. Consider the following columns that correspond to educational and individual accounts of feature pricing of Figure 13.4:
             Education    Individual
Edu.          T    T      F    F    F    F    F    F
CP > CT1      -    -      F    F    T    T    -    -
CP > CT2      -    -      -    -    F    F    T    T
SP > Sc       F    T      -    -    -    -    -    -
SP > T1       -    -      F    T    F    T    -    -
SP > T2       -    -      -    -    -    -    F    T
Out           Edu  SP     ND   SP   T1   SP   T2   SP
("-" marks a don't-care entry)
Write a set of Boolean expressions for the outputs and apply the modified
condition/decision adequacy criterion (MC/DC) presented in Chapter 14
to derive a set of test cases for the derived Boolean expressions. Compare the
result with the test case specifications given in Figure 13.6.
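As a starting point, each output of a decision table is the disjunction of the columns that produce it, and each column is the conjunction of its non-don't-care entries. A minimal Python sketch for the two Education columns only, under our reading of the table above (the variable names are ours; extending this to the Individual columns is the substance of the exercise):

    # encoding of the two Education columns as Boolean conditions;
    # variable names are illustrative, not from the text
    def education_output(education, sp_gt_sc):
        if education and not sp_gt_sc:   # first Education column
            return "Edu"
        if education and sp_gt_sc:       # second Education column
            return "SP"
        return None                      # handled by the Individual columns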
Ex13.16. Derive a set of test cases for the Airport Connection Check example of
Exercise Ex13.8 using the catalog-based approach.
Extend the catalog of Table 13.10 as needed to deal with specification constructs.
Ex13.17. Derive sets of test cases for functionality Maintenance applying Transition Coverage, Single State Path Coverage, Single Transition Path Coverage,
and Boundary Interior Loop Coverage to the FSM specification of Figure
13.9.
Ex13.18. Derive test cases for functionality Maintenance applying Transition Coverage to the FSM specification of Figure 13.9, assuming that implicit transitions are (1) error conditions or (2) self-transitions.
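Figure 13.9 is not reproduced here, so the sketch below works on a small, hypothetical maintenance-like FSM purely to illustrate how a transition-coverage suite can be derived mechanically; the states, events, and the breadth-first strategy are all our own assumptions. Interpretation (2) of Ex13.18, implicit transitions as self-transitions, is handled by completing the table first.

    from collections import deque

    # hypothetical stand-in for the Maintenance FSM of Figure 13.9
    fsm = {                                   # state -> {event: next state}
        "idle":     {"request": "diagnose"},
        "diagnose": {"ok": "idle", "fault": "repair"},
        "repair":   {"done": "idle"},
    }
    events = {"request", "ok", "fault", "done"}

    def complete_with_self_transitions(fsm, events):
        # interpretation (2): an implicit transition leaves the state unchanged
        return {s: {e: t.get(e, s) for e in events} for s, t in fsm.items()}

    def transition_coverage(fsm, start):
        # shortest event prefix reaching each state, by breadth-first search
        prefixes, queue = {start: []}, deque([start])
        while queue:
            s = queue.popleft()
            for e, t in fsm[s].items():
                if t not in prefixes:
                    prefixes[t] = prefixes[s] + [e]
                    queue.append(t)
        # one test case per transition: reach its source state, then fire it
        # (touches every transition at least once; not a minimal suite)
        return [prefixes[s] + [e] for s in fsm for e in fsm[s]]

    suite = transition_coverage(complete_with_self_transitions(fsm, events), "idle")
    assert len(suite) == 12               # 3 states x 4 events after completion

Under interpretation (1), the implicit transitions would instead lead to an added error state, and the completed table would be built accordingly.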
Ex13.19. We have stated that the transitions in a state-machine specification can
be considered as precondition/postcondition pairs. Often the finite-state
machine is an abstraction of a more complex system that is not truly finite-state. Additional state information is associated with each of the states, including fields and variables that may be changed by an action attached to a
state transition, and a predicate that should always be true in that state. The
same system can often be described by a machine with a few states and complicated predicates, or by a machine with more states and simpler predicates.
Given this observation, how would you combine test selection methods for
finite-state machine specifications with decision structure testing methods?
Can you devise a method that selects the same test cases regardless of the
specification style (more or fewer states)? Is it wise to do so?