
SOFTWARE TESTING, VERIFICATION AND RELIABILITY

Softw. Test. Verif. Reliab. 2001; 11:207–225 (DOI: 10.1002/stvr.238)

Investigating the effectiveness of object-oriented testing strategies using the mutation method
Sun-Woo Kim, John A. Clark, and John A. McDermid
Department of Computer Science, The University of York, Heslington, York YO10 5DD, U.K.

SUMMARY
The mutation method assesses test quality by examining the ability of a test set to distinguish syntactic
deviations representing specific types of faults from the program under test. This paper describes an
empirical study performed to evaluate the effectiveness of object-oriented (OO) test strategies using the
mutation method. The test sets for the experimental system are generated according to three selected OO
test strategies, and their effectiveness is compared by determining how well the developed test sets kill
injected mutants derived from an established mutation system, Mothra, and from the authors' own OO-specific
mutation technique, termed Class Mutation. Copyright © 2001 John Wiley & Sons, Ltd.
KEY WORDS: mutation testing; object-oriented testing; Java; Class Mutation

1. INTRODUCTION
The program mutation method [1] is a fault-based adequacy criterion that examines a set of syntactic
deviations of the program under test, known as mutants. A mutant is a marginally different, and
presumably incorrect, alternate program which represents a simple syntactic error such as using a
wrong arithmetic operator or a wrong variable name in an expression, or defining a wrong label
in GOTO statements, etc. The mutation method asks testers to demonstrate that the program under
test does not contain the pre-specified set of simple faults that mutants represent. Testers satisfy this
requirement by finding test cases that cause faulty versions of the program (mutants) to fail while the
original program works as intended on the test cases. The failure to differentiate the original program
from a specific mutant points directly to a portion of the program which is not receiving adequate
testing and hence to a weakness in the test data (generated by a certain test strategy or adequacy
criterion).

Correspondence to: John Clark, Computer Science, University of York, Heslington, York YO10 5DD, U.K.
E-mail: jac@cs.york.ac.uk
A version of this paper was originally presented at Mutation 2000, a Symposium on Mutation Testing, held in San Jose,
California, 6–7 October 2000. It is reproduced here in modified form with the permission of the Symposium organizers.

Received 1 June 2000
Accepted 10 May 2001

Although several test methods have been proposed specifically for object-oriented (OO) software, detailed
information concerning their performance and effectiveness is not available in the literature. This work
assesses some test strategies for OO software using the mutation technique. Three OO test strategies are
chosen for the case study, and test data sets for the experimental system are generated according to the
chosen strategies. The developed test sets are then applied to kill the mutants of the system.
The system used in the case study is written in an OO language (Java). As far as the authors
are aware, there is no mutation system for Java. In order to apply the mutation method, one of the
traditional mutation systems, Mothra's Fortran mutation system, was translated into the Java language.
In addition, the authors' own mutation technique, designed to handle the features unique to OO (Java)
programming, was also applied. Several sources report faults that appear uniquely in OO programs and
relate to OO-specific features such as inheritance and polymorphism [2–5]. Since the Mothra Fortran
system was developed for a non-OO programming environment, it may not be effective in handling
plausible flaws related to these OO-specific features. The technique, termed Class Mutation (CM), is a
form of OO-directed selective mutation testing which targets plausible faults that are likely to occur
due to the OO-unique features of an OO language (Java).
Applying both the Java version of the Mothra mutation system and the CM system allows
measurement of the effectiveness of the selected OO strategies to uncover traditionally plausible
errors (represented by the Mothra mutants) and OO-feature related errors (represented by the mutants
created by CM). The effectiveness of the OO test strategies is compared by analysing the mutation
adequacy score (i.e. the fault detection rate of a testing strategy). The OO strategies are also evaluated
by examining the mutants which they can kill and/or fail to kill. Here the types of mutants matter more
than the number of killed/live mutants; that is, the study examines whether a certain OO strategy is more
effective at handling a certain type of error.
Section 2 describes the OO strategies studied in the case study and presents the number of test
cases in the test sets generated according to the strategies. Section 3 discusses the traditional program
mutation and CM techniques used to generate the mutants of the Java programs under test. The section
explains how Mothra's Fortran mutation operators are interpreted as operators for the Java language
and briefly introduces the OO-feature directed CM technique. Section 4 evaluates the OO test strategies
according to the mutation execution results of the developed test sets. Finally, conclusions and ideas
for future work are presented in Section 5.

2. TEST DATA GENERATION


Several strategies have been proposed for test case selection for OO software. All these OO test
strategies aim to provide a way of systematically selecting sequences of methods of varying
orders and lengths and then verifying that the resulting states and outputs of the objects
manipulated by those method sequences are correct. The approaches differ from one another primarily in
the kind of information required to generate test cases. For example, the formal specification-based test
approach places emphasis on the fact that classes are implementations of abstract data types (ADTs)
and constructs test data sets from the formal specifications of ADTs, whereas the state-based approach
is based on state transition diagrams (derived from programs, specifications or both) that
define the behaviour of systems or objects.
Three OO test strategies were chosen to generate test cases for the experimental Java system: Doong
and Frankl's OO testing method [6], Kirani and Tsai's method sequence specification [7], and Kung et
al.'s object state testing [8]. The system is a beta version of an IBM product called Product Starter
Kit (PSK) for Java 1.0, which provides classes to facilitate the development of production-quality Java
applets and applications. The product consists of various facilities such as NLS (National Language
Support), Tracing, Logging and Asserts. The cluster (related classes) of the Logging facility, which
provides classes for a full logging service for applications, is used for the case study. Tables I and II
show the classes of the Logging cluster used in the experimentation, with brief descriptions and scale
information.

Table I. Classes of Logging cluster.

Class name                      Description
LogServiceProvider              An abstract class which is extended by classes providing logging services
PrintWriterLogServiceProvider   Used for writing textual log messages to a print stream (for example, to the console)
Logger                          Provides the central control for the PSK logging service, such as registering multiple log service providers to be operative concurrently
LogMessage                      A message format to be logged by the logging service
LogException                    A base exception class for exceptions thrown by the logger and log service providers

Table II. Scale information for the Logging classes.

Class name                      No. of field variables  No. of methods  No. of constructors  Lines of code
LogServiceProvider              9                       9               1                    230
PrintWriterLogServiceProvider   10                      9               1                    84
Logger                          4                       6               Default constructor  170
LogMessage                      9                       4               5                    151
LogException                    3                       1               1                    55

2.1. Doong and Frankl's OO testing method


Doong and Frankl's approach [6] focuses on the question of whether a sequence of messages puts an
object of the class under test into the correct abstract state. It defines the concept of observational
equivalence to determine whether objects are in the same abstract state. An object O1 of class C is
observationally equivalent to an object O2 of class C if and only if it is impossible to distinguish O1
from O2 using the operations of class C and related classes. Thus two observationally equivalent objects
are in the same abstract state, even if the details of their (internal) representations differ. The
concept of observational equivalence reflects the encapsulation and information hiding features of the
OO paradigm.
In this approach, each test case consists of a pair of message sequences, along with a tag
indicating whether these sequences should put objects of the class under test into equivalent states
and/or return objects which are in equivalent states. That is, a test set T consists of 3-tuples
(S1, S2, tag), where S1 and S2 are sequences of messages, and tag is "equivalent" if S1 is equivalent
to S2 according to the specification and "not equivalent" otherwise. Tests are executed by sending
sequences S1 and S2 to objects O1 and O2 of class C, respectively. A user-supplied equivalence-checking
mechanism is then invoked to check whether the objects are in the same abstract state, and
finally the result of this check is compared with the pre-defined tag. If all the observational equivalence
checks agree with the tags, the implementation is deemed correct; otherwise it is incorrect. The
test cases in this scheme are called self-checking test cases, as each test case includes information,
in the form of the tag, describing the expected result of test execution. For example, consider a priority
queue of integers, whose functions are described informally as follows:

• create: creates an empty priority queue;
• add: adds an integer to the priority queue;
• delete: removes the largest element of the priority queue;
• largest: returns the value of the largest element of the priority queue, without modifying the
  contents of the priority queue.

Test case examples are:
1. (create.add(5).add(3).delete, create.add(3), equivalent)
2. (create.add(5).add(3).delete.largest, create.add(3).largest, equivalent)
Test case 1 says that creating an empty priority queue, adding 5 then 3, then applying delete
should be the same as creating an empty priority queue and adding 3 to it. Test case 2 says that the
objects returned by applying largest to those two priority queues should be equivalent.
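The self-checking scheme can be sketched in Java as follows. This is a minimal illustration, not Doong and Frankl's tool: IntPriorityQueue, SelfCheckingTest and the isEmpty observer are hypothetical names introduced here, message sequences are modelled as functions applied to a freshly created object, and the user-supplied equivalence check is assumed to compare two queues by repeatedly observing largest and applying delete until both are empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical priority queue following the informal specification above.
class IntPriorityQueue {
    private final List<Integer> items = new ArrayList<>();

    static IntPriorityQueue create() { return new IntPriorityQueue(); }

    IntPriorityQueue add(int x) { items.add(x); return this; }

    // Removes the largest element; a no-op on an empty queue (an assumption).
    IntPriorityQueue delete() {
        if (!items.isEmpty()) {
            Integer max = Collections.max(items);
            items.remove(max); // remove(Object), not remove(int index)
        }
        return this;
    }

    int largest() { return Collections.max(items); }

    // Extra observer assumed for the equivalence check; not in the specification.
    boolean isEmpty() { return items.isEmpty(); }
}

// A self-checking test case (S1, S2, tag): each sequence is applied to a fresh
// object returned by create, and the tag states whether the results should be equivalent.
final class SelfCheckingTest {
    private final UnaryOperator<IntPriorityQueue> s1;
    private final UnaryOperator<IntPriorityQueue> s2;
    private final boolean tagEquivalent;

    SelfCheckingTest(UnaryOperator<IntPriorityQueue> s1,
                     UnaryOperator<IntPriorityQueue> s2,
                     boolean tagEquivalent) {
        this.s1 = s1;
        this.s2 = s2;
        this.tagEquivalent = tagEquivalent;
    }

    // The test passes when the equivalence check agrees with the tag.
    boolean passes() {
        IntPriorityQueue o1 = s1.apply(IntPriorityQueue.create());
        IntPriorityQueue o2 = s2.apply(IntPriorityQueue.create());
        return observationallyEquivalent(o1, o2) == tagEquivalent;
    }

    // Hypothetical user-supplied check: compares the queues only through their
    // public operations by draining both. Note that it consumes both objects.
    static boolean observationallyEquivalent(IntPriorityQueue a, IntPriorityQueue b) {
        while (!a.isEmpty() && !b.isEmpty()) {
            if (a.largest() != b.largest()) return false;
            a.delete();
            b.delete();
        }
        return a.isEmpty() && b.isEmpty();
    }
}
```

With this sketch, test case 1 becomes new SelfCheckingTest(q -> q.add(5).add(3).delete(), q -> q.add(3), true). If delete were faulty and removed the smallest element instead, the first queue would still hold 5, the check would report the objects as not equivalent, and the test would fail against its tag. Test case 2 compares the int values returned by largest rather than two queues, so it would be checked with plain equality.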
Test cases can be automatically generated by term rewriting if an algebraic specification of the
abstract data type being tested is available. (An algebraic specification defines the intended semantics
(behaviour) of an ADT by giving axioms describing the interaction of operations [6].) Otherwise, a person
can develop test cases by reasoning about an informal specification. As a formal specification of the
example system is not available, test cases are generated from an informal specification of each class.
One problem in this application is that the method gives no guidance as to how many test cases should
be generated. When a formal algebraic specification is available, it is possible to translate each axiom
into a pair of ADT trees and apply some kind of coverage criterion over the specification to obtain a
finite test data set. In this case study, the number of test cases to be generated had to be based on
personal judgement: at least as many test cases were created as for the other two strategies under
investigation.

Table III. Test cases generated by Doong and Frankl's method.

Class name                      No. of test cases
LogServiceProvider              20
PrintWriterLogServiceProvider   30
Logger                          not generated
LogMessage                      11
LogException                    4

Table III shows the number of test cases generated according to Doong and Frankl's test method.
Note that test data for class Logger (which is a static class) were not generated by the method, due
to a technical restriction in the current Java mutation system. A Java class whose member data are static
has a single copy of that data, associated with the class and shared by all instances of the class.
Comparing the states of two objects after performing different operations on them therefore cannot be
done directly for a static class, because it cannot keep separate states for the objects being compared.
It might be possible to store the results of the first sequence of operations externally and then reload
the class to perform the second sequence of operations; however, the implemented mutation system does
not currently support this facility.
2.2. Kirani and Tsai's method sequence specification
Kirani and Tsai [7] proposed a new specification technique called method sequence specification
that aids correct object behaviour definition by explicitly specifying the causal relationship between
methods of a class. The causal relationships between methods specify the sequence in which the
methods can be executed (i.e. the correct order in which the methods of a class can be invoked by
the methods in client classes); for example, method m1 must be invoked before method m2. The
strict sequence rules between methods of a class depend on the functionality of the class. For example,
the objects of a Stack class should receive a pop message only after receiving a push message. Some
of the ordering rules can be required due to implementation issues. For example, objects must receive
a constructor message before receiving any other messages, and a delete message must be the last
message.
Note: An ADT tree is a tree in which nodes represent operations along with their arguments. Each path from
the root to a leaf of an ADT tree represents a possible state of the ADT. For an ADT tree with n paths, n test
cases that have equivalent tags and n(n − 1) test cases that have not-equivalent tags can be generated [6].
Note: A client is an object that uses the resources of another by calling its member functions.
The regular expression formalism (or regular definition) is used to model the causal relationship
between methods. Some definitions used are as follows.
Definition 1. For a class C, Methods(C) is the set of all the methods defined in C that are publicly
available. For example, for a simple bank account class the method set might be:
Methods(Account) = {create, deposit, open, withdraw, close, delete}
Definition 2. A method sequence S of a class C is a finite sequence over a method set M of C,
$(m_0 \bullet m_1 \bullet \cdots \bullet m_n)$, where each $m_i \in M$. A method sequence need not contain all the methods in
the method set.
To represent the method sequence specification of a class C, a regular definition over the alphabet
(usually referred to as $\Sigma$) consisting of methods from Methods(C) is used. The regular definition is a
set of definitions of the form:
$$
\begin{aligned}
l_1 &\rightarrow r_1\\
l_2 &\rightarrow r_2\\
&\;\vdots\\
l_n &\rightarrow r_n
\end{aligned}
$$
where each $l_i$ is a distinct label, and each $r_i$ is a regular expression over the methods in
$\mathrm{Methods}(C) \cup \{l_1, l_2, \ldots, l_{i-1}\}$. The regular expression notation uses $\bullet$ to denote concatenation of methods or
subexpressions (e.g. $m_1 \bullet m_2$ denotes $m_1$ followed by $m_2$), and $\mid$ to denote choice of methods or
subexpressions (e.g. $m_1 \mid m_2$ indicates that either $m_1$ or $m_2$ is chosen). The Kleene star $^{*}$
denotes zero or more instances of a method (or zero or more repetitions of a subexpression), while $^{+}$
denotes one or more instances of a method or subexpression. For example, $(m_1 \mid m_2)^{+}$ denotes
a non-empty but otherwise arbitrary sequence of invocations of methods from the set $\{m_1, m_2\}$.
Definition 3. A SeqSpec(C) is the specification of a class C that defines a sequence relationship
between all the methods of C. A regular expression is used for specifying SeqSpec(C).
The regular expression associated with a class defines the set of valid method sequences for all the
objects of the class. For example, the following is a method sequence specification for class Account. In
the sequence specification, the regular expression labels such as AccountMethods are shown in italic:
SeqSpec(Account):
$$
\begin{aligned}
\Sigma &= \{\mathrm{create}, \mathrm{deposit}, \mathrm{open}, \mathrm{withdraw}, \mathrm{close}, \mathrm{delete}\}\\
\mathit{Methods} &\rightarrow \mathrm{create} \bullet \mathit{AccountMethods} \bullet \mathrm{delete}\\
\mathit{AccountMethods} &\rightarrow \mathrm{open} \bullet \mathit{TransactionMethods} \bullet \mathrm{close}\\
\mathit{TransactionMethods} &\rightarrow (\mathrm{deposit} \bullet (\mathrm{deposit} \mid \mathrm{withdraw})^{*})
\end{aligned}
$$


[Figure 1 not reproduced: a state transition diagram with states Start, Empty Account, Operation Account and Null Account, and transitions labelled open, deposit, withdraw and close.]
Figure 1. STD corresponding to the Account class.

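Definition 4. SafeSeq(C) is the set of all sequences Si that can be derived from SeqSpec(C) of a
class C. Any sequence in SafeSeq(C) is a valid sequence of messages accepted by any instance of the
class C. For example, a safe sequence set of SeqSpec(Account) is:
SafeSeq(Account):
$$
\{(\mathrm{create} \bullet \mathrm{open} \bullet \mathrm{deposit} \bullet \mathrm{withdraw} \bullet \mathrm{close} \bullet \mathrm{delete}),\;
(\mathrm{create} \bullet \mathrm{open} \bullet \mathrm{deposit} \bullet \mathrm{deposit} \bullet \mathrm{withdraw} \bullet \mathrm{close} \bullet \mathrm{delete}),\; \ldots\}
$$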
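One way to see how a regular definition fixes the set SafeSeq(C) is to compile it into a java.util.regex pattern and test candidate sequences against it. The sketch below is an illustration under our own encoding, not Kirani and Tsai's tooling: each method name is followed by a single space so that names cannot run together, and AccountSeqSpec is a hypothetical class name.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: SeqSpec(Account) compiled to a regex over space-terminated method
// names. Concatenation becomes juxtaposition; choice (|) and the Kleene star (*)
// map directly. Each label is expanded inline, which is possible because r_i
// may only mention the earlier labels l_1 ... l_{i-1}.
final class AccountSeqSpec {
    private static String m(String method) { return method + " "; }

    // TransactionMethods -> (deposit • (deposit | withdraw)*)
    static final String TRANSACTION_METHODS = m("deposit") + "(?:(?:deposit|withdraw) )*";
    // AccountMethods -> open • TransactionMethods • close
    static final String ACCOUNT_METHODS =
        m("open") + "(?:" + TRANSACTION_METHODS + ")" + m("close");
    // Methods -> create • AccountMethods • delete
    static final String METHODS =
        m("create") + "(?:" + ACCOUNT_METHODS + ")" + m("delete");

    private static final Pattern SEQ_SPEC = Pattern.compile(METHODS);

    // True if the message sequence belongs to SafeSeq(Account).
    static boolean isSafe(List<String> sequence) {
        return SEQ_SPEC.matcher(String.join(" ", sequence) + " ").matches();
    }
}
```

For example, isSafe(List.of("create", "open", "deposit", "withdraw", "close", "delete")) holds, whereas a sequence that withdraws before any deposit, or omits create, is rejected.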
The regular expression of a class is used to construct a state transition diagram (STD), and test cases
can be generated from the state transition diagram corresponding to the method sequence specification
by applying coverage criteria to the diagram. For example, Figure 1 is a state transition diagram
corresponding to the Account class [7]. The method sequence (test case) open • deposit • close can
be constructed to satisfy the all-node coverage criterion, whereas the all-edge coverage criterion would
be achieved by the longer test method sequence open • deposit • withdraw • deposit • close,
or perhaps by the two shorter sequences open • deposit • withdraw • close and open • deposit •
deposit • close.
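The coverage criteria can be checked mechanically once the STD is written as a transition table. In this sketch the transitions are our assumption, derived from SeqSpec(Account) because the drawing of Figure 1 does not survive here: open leads from Start to Empty Account, deposit from Empty Account to Operation Account, deposit and withdraw loop on Operation Account, and close leads to Null Account. The create and delete messages are left out, as in the test sequences above, and AccountStd is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Assumed transition table for the Account STD (state -> method -> next state).
final class AccountStd {
    private static final Map<String, Map<String, String>> DELTA = new LinkedHashMap<>();

    static {
        edge("Start", "open", "EmptyAccount");
        edge("EmptyAccount", "deposit", "OperationAccount");
        edge("OperationAccount", "deposit", "OperationAccount");
        edge("OperationAccount", "withdraw", "OperationAccount");
        edge("OperationAccount", "close", "NullAccount");
    }

    private static void edge(String from, String method, String to) {
        DELTA.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(method, to);
    }

    // Follows one transition, rejecting messages the STD does not accept.
    private static String step(String state, String method) {
        String next = DELTA.getOrDefault(state, Map.of()).get(method);
        if (next == null) {
            throw new IllegalArgumentException(method + " not allowed in state " + state);
        }
        return next;
    }

    // All-node coverage: every state is visited by at least one test sequence.
    static boolean allNodes(List<List<String>> tests) {
        Set<String> nodes = new HashSet<>(DELTA.keySet());
        DELTA.values().forEach(out -> nodes.addAll(out.values()));
        Set<String> visited = new HashSet<>();
        for (List<String> test : tests) {
            String state = "Start";
            visited.add(state);
            for (String method : test) {
                state = step(state, method);
                visited.add(state);
            }
        }
        return visited.containsAll(nodes);
    }

    // All-edge coverage: every transition is taken by at least one test sequence.
    static boolean allEdges(List<List<String>> tests) {
        Set<String> covered = new HashSet<>();
        for (List<String> test : tests) {
            String state = "Start";
            for (String method : test) {
                String next = step(state, method);
                covered.add(state + " -" + method + "-> " + next);
                state = next;
            }
        }
        int edgeCount = DELTA.values().stream().mapToInt(Map::size).sum();
        return covered.size() == edgeCount;
    }
}
```

Under these assumptions, open • deposit • close satisfies allNodes but not allEdges (it misses the withdraw edge and the second deposit edge), while either the single longer sequence or the pair of shorter sequences quoted above satisfies allEdges.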
A shortcoming of Kirani and Tsai's method is that it might neglect errors caused by
unexpected uses of operations. Kirani and Tsai's strategy specifies method sequences (i.e.
the correct order in which the methods can be invoked) and derives test cases from these specifications.
Naturally, it focuses on the expected usage of operations and is less concerned with unexpected
situations.
A method sequence specification for each class in the experimental Java system was written, and a
state transition diagram for each regular expression of the class was constructed. The all-edge coverage
criterion (which subsumes the all-node coverage criterion) is applied for test case generation. Table IV
shows the number of test cases in the test set for each class generated according to Kirani and Tsai's
method.

Table IV. Test cases generated by Kirani and Tsai's method.

Class name                      No. of test cases
LogServiceProvider              13
PrintWriterLogServiceProvider   10
Logger                          7
LogMessage                      4
LogException                    3

2.3. Kung's object state testing


State-based testing focuses on an object's state-dependent behaviour rather than on control structures
and individual data. Kung et al. proposed program-based object state testing, which constructs an object
state test model from programs [8]. In this approach, the notions of state and transition are associated
with specific programming concepts rather than high-level application domain concepts. A state of an
object is defined by the range of values of its member data, and state transitions are defined by the
execution of a member function.
This strategy defines object states by examining only the member data that are checked in conditional
statements (the state-defining data members). Kung et al. [8] state that a value of a data member
affects the class behaviour when it takes part in a decision (condition) whose evaluation at
run time controls the execution path. Otherwise the data member does not participate in the definition
of the object's state, and state changes of the object are independent of that data member. Thus data
members not checked in conditional statements are not considered in Kung's test method.
An object state test model which captures the state-dependent behaviour of objects is called an
object state diagram (OSD). An atomic OSD, denoted AOSD, for a class C is a deterministic finite-state
machine consisting of:
• a finite set of states, each of which is either a simple condition or an interval of data values of a
  (state-defining) data member defined in class C;
• a finite set of triggers, each of which consists of a (possibly empty) guard condition and a member
  function, followed by a (possibly empty) sequence of member functions called responses;
• the set of initial states;
• the set of final states.
For example, Figure 2 is an AOSD of a member variable curQtrs that keeps track of the quarter coins
received in a coin box class of a vending machine. OSDs can be automatically constructed from
source code based on symbolic execution. Kung et al. provide an algorithm that extracts the states of
the data members and identifies the effects of the member functions on the states of the data members
by symbolically executing each member function of the class.
Note: The guard condition is the condition on the object's state under which the member function can be activated.
Note: The machine accepts only quarters and allows vending when two quarters are received. It keeps track of the
total number of quarters received, the current quarters (denoted curQtrs) received and whether vending is enabled
or not (denoted allowVend). Its functions include adding a quarter, returning the current quarters, resetting the
coin box to its initial state and vending [8].

[Figure 2 not reproduced: AOSD of curQtrs (unsigned) in class CCoinBox, with states S0 [0,0] and S1 [1,M] and transitions reset, returnQtrs, addQtr and [allowVend!=0]vend.]
Figure 2. AOSD of the curQtrs variable.

Once an AOSD has been drawn for each member variable that changes object states, a composite OSD
(COSD, i.e. an OSD of all state-defining member variables) can be constructed recursively from the
AOSDs. For example, Figure 3 is a COSD of the allowVend and curQtrs variables in the coin
box example.

[Figure 3 not reproduced: COSD combining the AOSD of allowVend (unsigned; states S0 [0,0] and S1 [1,M]; transitions reset, vend and [curQtrs>0]addQtr) with the AOSD of curQtrs from Figure 2.]
Figure 3. COSD of the allowVend and curQtrs member variables.

[Figure 4 not reproduced: a tree rooted at (s0,s0) whose nodes are the composite states (s0,s0), (s0,s1), (s1,s1) and (s1,s0), with edges labelled addQtr, reset, returnQtr and vend.]
Figure 4. Spanning tree of the COSD of the allowVend and curQtrs variables.

The next step is constructing a spanning tree of the COSD. The nodes of a test tree represent the
composite states of the COSD and the edges of the tree represent transitions between the states. If the
COSD contains k AOSDs, then each state is represented by a k-tuple, where the ith component denotes
the state of the ith AOSD. For example, Figure 4 is a spanning tree of the COSD in Figure 3.
The tree is then used to construct the test cases, each of which is a sequence of transitions starting
from the root (representing the initial state) and ending at any node. For example, the following test
cases can be constructed from the tree in Figure 4:
(addQtr, addQtr, reset)
(addQtr, addQtr, returnQtr, vend).
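The tree-to-test-case step can be sketched as a breadth-first traversal. Two assumptions are made here and are not taken from Kung et al.: a composite state is expanded only at its first occurrence (later occurrences become leaves), and the coin box transitions in coinBox() are a hypothetical COSD, with states written as (allowVend, curQtrs) pairs, chosen so that the two test cases quoted above come out of the traversal. They are not a transcription of Figure 4, and TestTree is a hypothetical class name.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Builds a test tree from a COSD and returns the trigger sequence along every
// root-to-leaf path; each node lies on such a path, so prefixes give shorter tests.
final class TestTree {
    private final Map<String, Map<String, String>> cosd = new LinkedHashMap<>();

    TestTree on(String from, String trigger, String to) {
        cosd.computeIfAbsent(from, k -> new LinkedHashMap<>()).put(trigger, to);
        return this;
    }

    List<List<String>> testCases(String initial) {
        List<List<String>> cases = new ArrayList<>();
        Set<String> expanded = new HashSet<>();
        Deque<Map.Entry<String, List<String>>> queue = new ArrayDeque<>();
        queue.add(new AbstractMap.SimpleEntry<String, List<String>>(initial, new ArrayList<>()));
        while (!queue.isEmpty()) {
            Map.Entry<String, List<String>> node = queue.poll();
            Map<String, String> out = cosd.getOrDefault(node.getKey(), Map.of());
            // Leaf: the state was already expanded, or it has no outgoing transitions.
            if (!expanded.add(node.getKey()) || out.isEmpty()) {
                cases.add(node.getValue());
                continue;
            }
            for (Map.Entry<String, String> t : out.entrySet()) {
                List<String> path = new ArrayList<>(node.getValue());
                path.add(t.getKey());
                queue.add(new AbstractMap.SimpleEntry<>(t.getValue(), path));
            }
        }
        return cases;
    }

    // Hypothetical coin box COSD; states are (allowVend, curQtrs) pairs.
    static TestTree coinBox() {
        return new TestTree()
            .on("(s0,s0)", "reset", "(s0,s0)")
            .on("(s0,s0)", "returnQtr", "(s0,s0)")
            .on("(s0,s0)", "addQtr", "(s0,s1)")
            .on("(s0,s1)", "reset", "(s0,s0)")
            .on("(s0,s1)", "returnQtr", "(s0,s0)")
            .on("(s0,s1)", "addQtr", "(s1,s1)")
            .on("(s1,s1)", "reset", "(s0,s0)")
            .on("(s1,s1)", "vend", "(s0,s0)")
            .on("(s1,s1)", "returnQtr", "(s1,s0)")
            .on("(s1,s1)", "addQtr", "(s1,s1)")
            .on("(s1,s0)", "reset", "(s0,s0)")
            .on("(s1,s0)", "vend", "(s0,s0)")
            .on("(s1,s0)", "addQtr", "(s1,s1)");
    }
}
```

On this hypothetical COSD, TestTree.coinBox().testCases("(s0,s0)") yields ten root-to-leaf sequences, among them (addQtr, addQtr, reset) and (addQtr, addQtr, returnQtr, vend).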
A drawback of Kung's testing method is that it cannot reliably discover missing functionality or
conditions (a common shortcoming of white-box test approaches). During the case study, a missing-condition
error was discovered: the value of a member variable should have been checked to determine
whether certain actions needed to be performed, but the conditional statement was missing entirely. It is
unlikely that Kung's object state testing method would discover this error. It would simply treat the
variable as non-state-defining member data (because the conditional statement that references the
variable is missing), and the states of the variable would consequently not be included in the state testing.
Table V shows the number of test cases for each class generated according to Kung's object state
testing. The method was not applicable to classes LogMessage and LogException because they do
not have any data attributes that are used in conditional statements (i.e. no state-defining member data).
Thus no test data were generated for LogMessage and LogException.

Table V. Test cases generated by Kung's object state testing.

Class name                      No. of test cases
LogServiceProvider              4
PrintWriterLogServiceProvider   5
Logger                          16
LogMessage                      not applicable
LogException                    not applicable

3. MUTANT GENERATION
This section discusses both the traditional program mutation technique and the authors' own CM
technique that were used to generate the mutants of the Java program under test.
3.1. Traditional program mutation
Mutants are simple syntactic changes of the program under test constructed by the application of
mutant operators (predefined program modification rules). The goal of the mutant operator is to
introduce simple errors into the program under test and/or to enforce some kind of coverage of the
program. Mutant operators are determined by the language of the program being tested, and there are
several mutation operators, each corresponding to a different class of simple errors that may occur
in a language. For example, Mothra (a software testing environment that supports mutation-based
testing for multiple source languages) supplies 22 mutation operators for Fortran programs [9] and
77 operators for C programs [10].
As far as the authors are aware, mutation systems have been proposed for Fortran [9], C [10,11]
and Ada [12], but not for Java. In order to apply traditional program mutation to the Java system
in the case study, Mothra's Fortran mutation system was converted into a mutation system for Java;
that is, Mothra's Fortran mutation operators were translated into Java. The authors followed the specification
of the Fortran mutation operators in [9] as strictly as possible, but some adjustments had to be
made because of syntactic differences between Fortran and Java. For instance, arithmetic operator
replacement (AOR) replaces each occurrence of one of the operators +, -, *, / and ** with each
of the other operators. The exponentiation operator ** was not included, as it is not available in
Java, and a restriction against replacing the + operator when the variable type is String was added,
because in Java the + operator also performs string concatenation. As another example, the GOTO
label replacement (GLR) operator replaces the labels in unconditional GOTO, computed GOTO and
arithmetic IF statements with every label in the same program unit. goto is a reserved word in Java
but is not part of the Java language, so the GLR operator was adjusted to replace the labels in Java
break and continue statements instead.
Table VI shows the mutants of the experimental Java classes generated by the Java version of
Mothra's Fortran operators.

Table VI. Mutants created by the traditional program mutation.

Class name                      Mutants   Equivalent mutants
LogServiceProvider              221       24
PrintWriterLogServiceProvider   49        7
Logger                          147       8
LogMessage                      219       15
LogException                    36        5
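The adjusted AOR operator can be sketched for a single binary expression as follows. This is a simplification made for illustration: a real mutation system works on a parse tree and derives operand types itself, whereas here the expression is given as text and the hypothetical flag stringConcat stands in for the type information that tells whether + is string concatenation. Aor is likewise a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

final class Aor {
    // Java AOR operator set: Fortran's + - * / **, without ** (absent from Java).
    private static final char[] OPERATORS = {'+', '-', '*', '/'};

    // Returns the AOR mutants of "lhs op rhs". A + that performs String
    // concatenation is left unmutated, as the adjusted operator requires.
    static List<String> mutants(String lhs, char op, String rhs, boolean stringConcat) {
        List<String> result = new ArrayList<>();
        if (op == '+' && stringConcat) {
            return result;
        }
        for (char replacement : OPERATORS) {
            if (replacement != op) {
                result.add(lhs + " " + replacement + " " + rhs);
            }
        }
        return result;
    }
}
```

For example, mutants("a", '+', "b", false) yields a - b, a * b and a / b, while a + whose operands make it a String concatenation produces no AOR mutants.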
3.2. Class Mutation
Traditional program mutation may overlook some types of errors in OO programs because the mutation
operators were derived from studies of errors commonly made in non-OO programming languages
such as Fortran and C. In the opinion of the authors, it is mainly the flaws related to OO-specific
features such as inheritance, polymorphism and so on that the existing mutation systems might fail to
handle adequately. Thus the authors have developed a mutation technique called CM which is designed
specifically to deal with the OO features in OO (Java) programming.
The main difference between CM and traditional program mutation concerns the types of flaws
that CM is intended to represent. CM targets the plausible faults that are likely to occur due
to the OO-specific features that Java provides: class declarations and references, single inheritance,
information hiding and polymorphism. Since faults are actually introduced into the program by
mutation operators, CM consists of a set of new mutation operators that could aid in the detection of
OO feature-related errors. Table VII gives a brief description of each CM operator. The details of the
CM technique can be found in [13], and the approach to deriving the plausible flaws related to the OO
features (using the safety analysis technique HAZOP) is presented in [14].
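As an illustration of one CM operator, the sketch below shows an OMR (Overriding Method Removal) mutant, whose effect is described in Table VII. The classes are hypothetical and not taken from the PSK Logging cluster; because the original and the mutant must coexist in one Java file, the mutant cluster is written out by hand under different class names.

```java
// Original cluster (hypothetical classes, not from PSK).
class LineFormatter {
    String format(String msg) { return msg; }

    // format is dynamically dispatched, so a subclass override changes render.
    String render(String msg) { return "<" + format(msg) + ">"; }
}

class TimestampLineFormatter extends LineFormatter {
    @Override
    String format(String msg) { return "[t] " + msg; }
}

// OMR mutant of the same cluster: the overriding declaration of format has
// been deleted from the subclass, so calls fall through to the superclass.
class LineFormatterMutant {
    String format(String msg) { return msg; }

    String render(String msg) { return "<" + format(msg) + ">"; }
}

class TimestampLineFormatterMutant extends LineFormatterMutant {
    // format(String) removed by OMR
}

final class OmrDemo {
    // An input kills the mutant when original and mutant outputs differ.
    static boolean killedBySubclassTest(String input) {
        String original = new TimestampLineFormatter().render(input);
        String mutant = new TimestampLineFormatterMutant().render(input);
        return !original.equals(mutant);
    }

    // A test that only exercises the superclass cannot tell the versions apart.
    static boolean killedBySuperclassTest(String input) {
        String original = new LineFormatter().render(input);
        String mutant = new LineFormatterMutant().render(input);
        return !original.equals(mutant);
    }
}
```

A test that calls render on a subclass instance kills this mutant ("<[t] x>" versus "<x>"), whereas a test set that only exercises the superclass leaves it alive.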
Although the basic unit of CM is a class, the CM operators take cluster information into account
in generating mutants. That is, the program entities, expression patterns and parameter values to be
replaced are collected over the related classes, and the relationships between the classes are dynamically
reflected in the generated mutants. This is unavoidable because CM operators aim to deal with OO
features such as inheritance and polymorphism, which involve interactions between related classes
(components).
The authors do not claim that the operators in Table VII are a complete set for Java. There could be
other kinds of mutation operators that would address the suggested OO features and there are several
Copyright 2001 John Wiley & Sons, Ltd.

Softw. Test. Verif. Reliab. 2001; 11:207225

OBJECT-ORIENTED TESTING STRATEGIES USING THE MUTATION METHOD

219

Table VII. CM operators.


CM operators
AEC (Argument Expression
Change)
AMC (Access Modifier Change)
AND (Argument Number Decrease
by 1)
AOC (Argument Order Change)
CRT (Compatible Reference Type
replacement)
EHC (Exception Handling Change)
EHR (Exception Handler Removal)
FAR (Field Access expression
Replacement)
FLS (Field and Local variable
Swap)
HFA (Hiding Field variable
Addition)
HFE (Hiding Field access
Expression change)
HFR (Hiding Field variable
Removal)
HLR (Hiding Local variable
Removal)
ICE (Instance Creation Expression
changes)
MIR (Method Invocation expression
Replacement)
OME (Overriding/hiding Method
invocation Expression change)
OMR (Overriding Method
Removal)
POC (Parameter Order Change)
SMC (Static Modifier Change)
VMR (oVerloading Method
Removal)

Copyright 2001 John Wiley & Sons, Ltd.

Descriptions
Replace argument expressions of overloading methods with other
argument expressions of the methods found in a cluster
Replace an access modifier with other access modifiers
Delete each argument in method invocation expressions in turn
Change a method argument order in method invocation expressions
Replace a reference (class) type with compatible types
Change an exception handling statement (try-catch clause) to an
exception propagation statement (throws statement), and vice versa
Remove exception handlers one by one
Replace a field access expression with other field expressions of the
same field name
When there are field and local variables of the same name, their
declarations are swapped
Add a field variable of the same name as the field variable in a
superclass
Field access expressions to a hiding field are replaced by other field
access expressions that appear in the same class unit or in a cluster
Remove a field variable declaration when it hides the variable in a
superclass
Remove a local variable declaration when it hides the same name of a
field variable
Change an instance creation expression with other instance creation
expressions of the same type and compatible class types. The replaced
expressions are collected from the classes in a cluster
Replace a method invocation expression with other expressions of the
same method name
Method invocation expressions of overriding methods are replaced
with other invocation expressions appearing in the same class or in
the given cluster classes
The declaration of an overriding method is removed
Change a method parameter order in method declarations
Add or remove a static modifier
Remove the declaration of an overloading method

Softw. Test. Verif. Reliab. 2001; 11:207225

220

S.-W. KIM, J. A. CLARK AND J. A. McDERMID

Table VIII. CM mutants for each class.

Class name
LogServiceProvider
PrintWriterLogServiceProvider
Logger
LogMessage
LogException

CM
mutants

Equivalent
mutants

125
67
136
102
36

10
5
3
3
2

features (such as threads and the features added in higher versions than Java 1.0) not considered.
Further operators that would aid in the detection of the overlooked errors should be sought.
Table VIII shows the numbers of mutants of the Java classes created by the CM operators.
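The following Java sketch makes one of the operators in Table VII concrete by showing a hypothetical HFA (Hiding Field variable Addition) mutant. The classes Base, Original and HfaMutant are invented for illustration and are not taken from the case-study system. Adding a field that hides a superclass field redirects the subclass's writes, and a test can observe this through a superclass method.

```java
// Illustrative sketch only: hypothetical classes, not from the case-study system.
public class HfaDemo {

    // Superclass whose method reads its own field.
    static class Base {
        protected int level = 1;

        int reportedLevel() {
            return level; // always reads Base.level
        }
    }

    // Original subclass: raise() assigns the inherited field Base.level.
    static class Original extends Base {
        void raise() {
            level = 5;
        }
    }

    // HFA mutant: a field with the same name is added, hiding Base.level.
    static class HfaMutant extends Base {
        protected int level = 1; // field inserted by the HFA operator

        void raise() {
            level = 5; // now writes HfaMutant.level, leaving Base.level unchanged
        }
    }

    public static void main(String[] args) {
        Original original = new Original();
        original.raise();
        HfaMutant mutant = new HfaMutant();
        mutant.raise();
        // Prints "5 vs 1": a test that calls raise() and then checks reportedLevel() kills the mutant.
        System.out.println(original.reportedLevel() + " vs " + mutant.reportedLevel());
    }
}
```

A test set that never follows a call to raise() with an observation through reportedLevel() would leave this mutant live.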
4. EVALUATION OF THE OO METHODS
This section evaluates the selected OO test strategies according to the mutation testing results for the
program under test.
4.1. Analysis of mutation execution results
Mutation analysis indicates how well a program has been tested through its adequacy score, which is the ratio of killed mutants to non-equivalent mutants. A high score indicates that the test data are close to adequate for the program relative to its set of mutants, whereas a low score indicates a weakness in the test data. Table IX shows how well the test sets generated according to the three OO strategies kill mutants created by the traditional mutation operators. In general, none of the strategies achieves a high mutation score. Only one test set has a mutation adequacy score greater than 80%: the test set for class LogMessage generated by Doong and Frankl's method (84.8%).
Table X shows the mutation execution results for CM. The mutation scores in the table indicate how effective the OO strategies are at detecting faults directed at OO features. As with the traditional mutation results, the mutation adequacy scores are generally low. Only one test set achieves a mutation score greater than 70%: the test set for class Logger generated by Kirani and Tsai's method (70.68%).
In both the traditional program mutation and CM, the OO test methods do not show large differences
in their ability to kill the mutants. As no method has a significantly higher score than the others, the
relative strength of the OO strategies (i.e. which method is more effective than the others) could not be
determined from the mutation adequacy score alone.
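The adequacy score used in Tables IX and X is simple arithmetic: killed mutants divided by non-equivalent mutants. The Java sketch below uses hypothetical class and method names. It assumes that the live column counts only non-equivalent survivors, so the denominator is killed plus live. The CM figures support this reading: 60 killed plus 55 live equals the 115 non-equivalent LogServiceProvider mutants. The sketch reproduces the two scores quoted above.

```java
import java.util.Locale;

// Minimal sketch of the mutation adequacy score: killed / non-equivalent, as a percentage.
public final class MutationScore {

    private MutationScore() {}

    /** Score in percent; assumes `live` excludes equivalent mutants, so killed + live = non-equivalent. */
    static double scorePercent(int killed, int live) {
        int nonEquivalent = killed + live;
        if (nonEquivalent == 0) {
            throw new IllegalArgumentException("no non-equivalent mutants");
        }
        return 100.0 * killed / nonEquivalent;
    }

    public static void main(String[] args) {
        // LogMessage, Doong and Frankl, traditional mutation: 173 killed, 31 live.
        System.out.println(String.format(Locale.ROOT, "%.2f", scorePercent(173, 31))); // 84.80
        // Logger, Kirani and Tsai, CM: 94 killed, 39 live.
        System.out.println(String.format(Locale.ROOT, "%.2f", scorePercent(94, 39)));  // 70.68
    }
}
```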
Each operator in CM was designed to deal with the characteristics of a particular OO feature (and the plausible errors caused by those characteristics). Table XI shows the CM results of the OO strategies by CM operator type, combining the results of the three test methods. Examining the mutation results by CM operator gives quite a different view from the analysis of Table X, which indicates that the overall effectiveness of the OO methods against the CM mutants is low. For most operator types the CM mutants are in fact killed, but a few types (mainly AMC) have very low killing rates that degrade the overall CM adequacy score. This implies that the OO test methods might not be effective at handling certain OO-specific features (such as the information hiding that the AMC operator represents).

Table IX. Results of the traditional program mutation.

Class name                     OO test method     Killed mutants  Live mutants  Adequacy score (%)
LogServiceProvider             Doong and Frankl   127             70            64.47
LogServiceProvider             Kirani and Tsai    132             65            67.01
LogServiceProvider             Kung et al.        104             93            52.79
PrintWriterLogServiceProvider  Doong and Frankl   25              17            59.52
PrintWriterLogServiceProvider  Kirani and Tsai    24              18            57.14
PrintWriterLogServiceProvider  Kung et al.        19              23            45.24
Logger                         Doong and Frankl   -               -             -
Logger                         Kirani and Tsai    79              60            56.84
Logger                         Kung et al.        66              73            47.48
LogMessage                     Doong and Frankl   173             31            84.80
LogMessage                     Kirani and Tsai    124             80            60.78
LogMessage                     Kung et al.        -               -             -
LogException                   Doong and Frankl   20              11            64.52
LogException                   Kirani and Tsai    20              11            64.52
LogException                   Kung et al.        -               -             -

Table X. Results of CM.

Class name                     OO test method     Killed mutants  Live mutants  Adequacy score (%)
LogServiceProvider             Doong and Frankl   60              55            52.17
LogServiceProvider             Kirani and Tsai    66              49            57.39
LogServiceProvider             Kung et al.        54              61            46.96
PrintWriterLogServiceProvider  Doong and Frankl   41              21            66.13
PrintWriterLogServiceProvider  Kirani and Tsai    43              19            69.36
PrintWriterLogServiceProvider  Kung et al.        42              20            67.74
Logger                         Doong and Frankl   -               -             -
Logger                         Kirani and Tsai    94              39            70.68
Logger                         Kung et al.        91              42            68.42
LogMessage                     Doong and Frankl   50              49            50.51
LogMessage                     Kirani and Tsai    35              64            35.35
LogMessage                     Kung et al.        -               -             -
LogException                   Doong and Frankl   19              15            55.88
LogException                   Kirani and Tsai    18              16            52.94
LogException                   Kung et al.        -               -             -

Table XI. CM results of the OO methods by operators.

Class name (non-equivalent mutants) CM operator  No. of mutants  No. killed  No. live  Killing rate (%)
LogServiceProvider (115)            AMC          55              12          43        21.82
LogServiceProvider (115)            AND          26              22          4         84.62
LogServiceProvider (115)            AOC          12              11          1         91.67
LogServiceProvider (115)            CRT          4               4           0         100.00
LogServiceProvider (115)            EHC          4               4           0         100.00
LogServiceProvider (115)            SMC          14              14          0         100.00
PrintWriterLogServiceProvider (62)  AMC          22              10          12        45.46
PrintWriterLogServiceProvider (62)  AND          16              15          1         93.75
PrintWriterLogServiceProvider (62)  HFA          6               0           6         0.00
PrintWriterLogServiceProvider (62)  CRT          2               2           0         100.00
PrintWriterLogServiceProvider (62)  HFR          1               1           0         100.00
PrintWriterLogServiceProvider (62)  HFE          1               1           0         100.00
PrintWriterLogServiceProvider (62)  SMC          3               3           0         100.00
PrintWriterLogServiceProvider (62)  EHC          1               1           0         100.00
PrintWriterLogServiceProvider (62)  POC          1               1           0         100.00
PrintWriterLogServiceProvider (62)  AOC          9               9           0         100.00
Logger (133)                        AMC          31              6           25        19.36
Logger (133)                        AOC          24              14          10        58.33
Logger (133)                        EHC          2               1           1         50.00
Logger (133)                        MIR          10              9           1         90.00
Logger (133)                        SMC          8               8           0         100.00
Logger (133)                        CRT          19              19          0         100.00
Logger (133)                        AND          33              33          0         100.00
LogMessage (99)                     AMC          55              10          45        18.18
LogMessage (99)                     SMC          10              6           4         60.00
LogMessage (99)                     POC          34              34          0         100.00
LogException (34)                   AMC          16              2           14        12.50
LogException (34)                   AND          8               7           1         87.50
LogException (34)                   SMC          2               2           0         100.00
LogException (34)                   AOC          7               7           0         100.00
LogException (34)                   POC          1               1           0         100.00

See Table VII for the description of each CM operator in the table.
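The LogMessage rows of Table XI show how much a single weak operator can pull down the overall CM score. Those rows are AMC (55 mutants, 10 killed), SMC (10, 6) and POC (34, 34). The Java sketch below pools these counts; its class and method names are hypothetical. Pooling all three operators gives 50.51%, and excluding AMC gives 90.91%.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: pool per-operator CM counts into one class-level killing rate.
// Counts are the LogMessage rows of Table XI; names are illustrative.
public final class OperatorBreakdown {

    private OperatorBreakdown() {}

    /** Operator name -> {non-equivalent mutants, killed mutants}, in table order. */
    static Map<String, int[]> logMessage() {
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("AMC", new int[] {55, 10});
        counts.put("SMC", new int[] {10, 6});
        counts.put("POC", new int[] {34, 34});
        return counts;
    }

    /** Pooled killing rate (%) over all operators except {@code excluded} (null keeps all). */
    static double pooledRate(Map<String, int[]> counts, String excluded) {
        int mutants = 0;
        int killed = 0;
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            if (e.getKey().equals(excluded)) {
                continue;
            }
            mutants += e.getValue()[0];
            killed += e.getValue()[1];
        }
        if (mutants == 0) {
            throw new IllegalArgumentException("no mutants left to pool");
        }
        return 100.0 * killed / mutants;
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = logMessage();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int[] v = e.getValue();
            System.out.printf(Locale.ROOT, "%s %.2f%n", e.getKey(), 100.0 * v[1] / v[0]);
        }
        System.out.printf(Locale.ROOT, "all operators %.2f%n", pooledRate(counts, null));
        System.out.printf(Locale.ROOT, "without AMC %.2f%n", pooledRate(counts, "AMC"));
    }
}
```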
4.2. A refined analysis
For each test method, two sets of mutants are extracted from the mutation execution results. One is the set of mutants killed by this method but not by the other methods, intended to show the effectiveness of this method relative to the others. The other is the set of mutants that are live in this method only (i.e. not killed by this method but killed by the others). These two mutant sets are analysed to expose the strengths and weaknesses of each method.
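Both per-method sets can be computed with ordinary set operations over kill results. The Java sketch below uses hypothetical mutant IDs and the hypothetical labels DF, KT and Kung for the three methods. It assumes that "killed by the others" means killed by every other method, which is one reading of the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the two per-method mutant sets; mutant IDs and method labels are hypothetical.
public final class UniqueKills {

    private UniqueKills() {}

    static final Set<String> ALL_MUTANTS = Set.of("m1", "m2", "m3", "m4", "m5");

    /** Hypothetical kill results: method label -> mutants its test set kills. */
    static Map<String, Set<String>> example() {
        Map<String, Set<String>> killedBy = new LinkedHashMap<>();
        killedBy.put("DF", Set.of("m1", "m2", "m3"));
        killedBy.put("KT", Set.of("m2", "m3", "m4"));
        killedBy.put("Kung", Set.of("m2", "m4"));
        return killedBy;
    }

    /** Mutants killed by {@code method} and by no other method. */
    static Set<String> uniquelyKilled(Map<String, Set<String>> killedBy, String method) {
        Set<String> result = new TreeSet<>(killedBy.get(method));
        for (Map.Entry<String, Set<String>> e : killedBy.entrySet()) {
            if (!e.getKey().equals(method)) {
                result.removeAll(e.getValue());
            }
        }
        return result;
    }

    /** Mutants left live by {@code method} but killed by every other method (assumed reading). */
    static Set<String> liveOnlyIn(Map<String, Set<String>> killedBy, Set<String> all, String method) {
        Set<String> result = new TreeSet<>(all);
        result.removeAll(killedBy.get(method));
        for (Map.Entry<String, Set<String>> e : killedBy.entrySet()) {
            if (!e.getKey().equals(method)) {
                result.retainAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> killedBy = example();
        for (String method : killedBy.keySet()) {
            System.out.println(method + " uniquely kills " + uniquelyKilled(killedBy, method)
                    + "; live only in " + liveOnlyIn(killedBy, ALL_MUTANTS, method));
        }
    }
}
```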
The test set generated according to Doong and Frankl's method could uniquely kill some mutants created by the operators that distort data values. This implies that the method is effective at detecting data-related errors, such as a wrong value being assigned to a variable or a wrong array index being used. Doong and Frankl's method is also particularly effective at killing the mutants of the SMC operator in CM. The SMC operator changes a static variable to an instance variable and an instance variable to a static variable, in order to examine possible errors in static and dynamic object states. The main idea of Doong and Frankl's method is to compare the states of two objects after executing different sequences of methods on them. When a data variable is changed from static to dynamic (or from dynamic to static), the resulting object states are directly affected, so the method is effective at discovering any difference in a variable's static/dynamic status. The test set, however, fails to kill some coverage-related mutants, such as those created by Mothra's SDL operator (which deletes each statement in turn to check statement coverage).
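The SMC reasoning above can be illustrated with a small Java sketch. The classes are hypothetical. The check is a simplified two-object state comparison in the spirit of the description above, not an implementation of ASTOOT itself. Adding a static modifier to an instance field makes all objects share one value, so two objects that received different method sequences end up in the same state.

```java
import java.util.function.Supplier;

// Illustrative sketch: hypothetical classes, not from the case-study system.
public class SmcDemo {

    interface IntCounter {
        void inc();
        int get();
    }

    // Original: each object has its own count (instance variable).
    static class Counter implements IntCounter {
        private int count;
        public void inc() { count++; }
        public int get() { return count; }
    }

    // SMC mutant: the same class with a static modifier added to the field,
    // so every CounterSmc object shares one count.
    static class CounterSmc implements IntCounter {
        private static int count;
        public void inc() { count++; }
        public int get() { return count; }
    }

    /**
     * Two-object comparison: x receives inc(); inc(), y receives inc().
     * A correct implementation leaves the two objects in different states.
     */
    static boolean statesDiffer(Supplier<IntCounter> factory) {
        IntCounter x = factory.get();
        IntCounter y = factory.get();
        x.inc();
        x.inc();
        y.inc();
        return x.get() != y.get();
    }

    public static void main(String[] args) {
        System.out.println("original: states differ = " + statesDiffer(Counter::new));      // true
        System.out.println("SMC mutant: states differ = " + statesDiffer(CounterSmc::new)); // false, so the mutant is killed
    }
}
```

The outcome for the mutant does not depend on earlier calls: both objects read the same shared counter, so their observed states are always equal.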
The mutants that are not killed by the test set of Kirani and Tsai's method are mainly related to program coverage. Like Doong and Frankl's method, Kirani and Tsai's method does not explicitly require statement coverage (i.e. that all program code is executed at least once), so some statements of the program are not reached by the test data. For example, a mutant created by the SAN operator (Mothra's statement analysis operator, which checks that every block is reached) is not killed by the test set because the test set never executes the block corresponding to the mutant.
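The coverage point can be illustrated with a hypothetical Java fragment. The mutant below imitates the effect of a statement-deletion operator such as SDL by removing the assignment inside a branch. It is a hand-written sketch, not output of the Mothra tool, and all names are invented. A test set that never enters the branch cannot tell the mutant from the original.

```java
// Illustrative sketch: hypothetical code showing why an unexecuted statement
// leaves a statement-deletion mutant live.
public class CoverageDemo {

    interface Formatter {
        String format(int level, String msg);
    }

    // Original program fragment.
    static final Formatter ORIGINAL = (level, msg) -> {
        String prefix = "INFO ";
        if (level > 3) {
            prefix = "ERROR ";
        }
        return prefix + msg;
    };

    // Mutant: the assignment inside the branch has been deleted.
    static final Formatter DELETION_MUTANT = (level, msg) -> {
        String prefix = "INFO ";
        if (level > 3) {
            // statement deleted
        }
        return prefix + msg;
    };

    /** The mutant is killed if any test input makes its output differ from the original. */
    static boolean kills(int[] testLevels) {
        for (int level : testLevels) {
            if (!ORIGINAL.format(level, "x").equals(DELETION_MUTANT.format(level, "x"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(kills(new int[] {1, 2})); // false: the branch is never executed
        System.out.println(kills(new int[] {1, 5})); // true: level 5 executes the branch
    }
}
```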
Kung's object state testing method does not enforce statement coverage either, so the test set generated according to it also fails to kill several coverage-related mutants. Kung's method takes account of state-defining data members only (i.e. member variables that are checked in a conditional statement). As a result, data attributes that do not participate in a decision (condition), and member methods that do not modify the state-defining member variables, are neglected.
The relationship between the test methods was also studied by comparing the execution result of each mutant under each method. If a mutant killed by method A is also killed by method B, the two methods are similar in effectiveness for that mutant. If the mutant is killed by method B but not by method A, the two methods are in a complementary relationship. In both the traditional Mothra mutation and CM, all three strategies produce similar results: they kill what the other methods can kill and fail to kill what the others cannot kill. This indicates that the methods are similar to each other in handling the traditional errors that Mothra mutation represents, and are equally ineffective at handling a few OO-specific features represented in CM.
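The mutant-by-mutant comparison described above amounts to a contingency count for each pair of methods. The Java sketch below uses hypothetical kill vectors. It does not decide between "similar" and "complementary", because the text gives no threshold for that judgement.

```java
// Sketch: compare two methods' kill results mutant by mutant.
// killedA[i] / killedB[i] say whether method A / B kills mutant i (hypothetical data).
public final class PairwiseRelation {

    private PairwiseRelation() {}

    /** Returns {killed by both, killed by neither, killed by A only, killed by B only}. */
    static int[] compare(boolean[] killedA, boolean[] killedB) {
        if (killedA.length != killedB.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int both = 0, neither = 0, aOnly = 0, bOnly = 0;
        for (int i = 0; i < killedA.length; i++) {
            if (killedA[i] && killedB[i]) {
                both++;
            } else if (!killedA[i] && !killedB[i]) {
                neither++;
            } else if (killedA[i]) {
                aOnly++;
            } else {
                bOnly++;
            }
        }
        return new int[] {both, neither, aOnly, bOnly};
    }

    public static void main(String[] args) {
        boolean[] a = {true, true, false, false, true};
        boolean[] b = {true, true, false, true, true};
        int[] r = compare(a, b);
        // Mostly "both" and "neither" counts suggest similar methods;
        // many A-only or B-only mutants suggest a complementary relationship.
        System.out.println("both=" + r[0] + " neither=" + r[1] + " Aonly=" + r[2] + " Bonly=" + r[3]);
    }
}
```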
5. CONCLUSION
OO programming languages are extensively used in modern day software development. The testing of
programs written in such languages is clearly of great importance, but is less well-researched and
understood than the testing of programs written in procedural languages. There seems to be little
agreement on what a reasonable test set for OO programs should be. As a consequence, the relative
merits of different strategies for test set selection are unclear.
This paper has assessed the three OO test strategies using the mutation method, which provides one means of comparing the effectiveness of different test creation strategies in terms of fault-finding ability. The investigated OO methods do not differ greatly from each other in handling mutants (i.e. in detecting potential faults in the program), so it was difficult to conclude that one is more effective than another. Yet the experimental results suggest how testing approaches might be combined into an effective testing strategy for OO programs. The majority of the mutants that the methods fail to kill, indicating inadequacies in their test sets, are related to program coverage. None of the OO methods explicitly requires statement coverage, so it appears that these methods do not execute every statement of the program. This shortcoming can easily be remedied by applying traditional coverage methods such as control-flow testing. To be effective, the OO methods should be used together with some kind of coverage method. The CM results also show that the OO methods, although claimed to be adequate for OO systems, are not particularly effective at dealing with a few OO features, especially the information hiding represented by the AMC operator. Special attention is required for those features in the testing of OO software.
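One possible reason for the low AMC scores is that widening access often leaves run-time behaviour unchanged; this is offered here as an assumption, not as a finding of the study. The hypothetical Java sketch below shows an AMC mutant that only a structural check, here via java.lang.reflect, can distinguish from the original. This is not a general rule: access changes can alter behaviour in other cases, for example where a method's changed access affects overriding.

```java
import java.lang.reflect.Modifier;

// Illustrative sketch: hypothetical classes showing why an AMC mutant that
// widens access can be invisible to behavioural tests.
public class AmcDemo {

    // Original: the field is private.
    static class Original {
        private int limit = 10;
        int limit() { return limit; }
    }

    // AMC mutant: the access modifier has been changed from private to public.
    static class AmcMutant {
        public int limit = 10;
        int limit() { return limit; }
    }

    /** Structural check: is the named field of {@code c} declared private? */
    static boolean isPrivateField(Class<?> c, String name) throws NoSuchFieldException {
        return Modifier.isPrivate(c.getDeclaredField(name).getModifiers());
    }

    public static void main(String[] args) throws NoSuchFieldException {
        // Behaviour through the class's own methods is identical...
        System.out.println(new Original().limit() == new AmcMutant().limit()); // true
        // ...but the declared access differs, which only a structural check reveals.
        System.out.println(isPrivateField(Original.class, "limit"));  // true
        System.out.println(isPrivateField(AmcMutant.class, "limit")); // false
    }
}
```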
In this paper CM has been used as an independent yardstick by which to measure the effectiveness
of various techniques. The work reported here has not assessed the effectiveness of CM adequacy as a
criterion in its own right. Other experimental work [15] has shown that traditional mutation-adequate test sets can fail to be CM-adequate (so, as a criterion, CM clearly addresses something that traditional mutation does not). Equally, since CM is very much an OO-directed technique, there is no expectation that CM-adequate test sets should reliably find more traditional errors. The authors wish to stress
that they regard CM scores as a source of information for making engineering decisions. A software
developer might well tolerate 0% adequacy for a particular mutant type provided he or she believes that
the risks of that mutant type arising in practice as a fault are sufficiently small (e.g. because it is the
target of a specific rigorous inspection). For large class hierarchies and complex systems, creating fully CM-adequate test sets might be prohibitively expensive, and the question of what should be done with inadequate test sets becomes even more relevant. Making them 100% adequate may be the purest solution, but not necessarily the best one as judged by engineering risk-reduction needs.
The main limitation of the case study presented in this paper is that it is not extensive. Further
experimental work is needed to see whether these results apply to a wider selection of programs. In
addition, there is a pressing need to exercise the full range of mutation operators. The manual labour
involved in creating the test data sets is very high and only a single test set for each technique has been
produced. Statistically convincing and robust results would require multiple test sets and the authors
are currently considering means of generating test sets automatically. However, even this small case
study allows some interesting observations to be made. Plausible reasons have been outlined to explain
specific features of the results and why the techniques differed in effectiveness for particular operators.
Assessing the adequacy of OO testing strategies seems to be a crucial and largely neglected area. The
faults modelled by the mutation operators could form the starting point for a theoretical assessment of
the fault detection ability of OO techniques. The authors believe that in the near-term further empirical
work is needed so that theoretical investigation can proceed with confidence. CM is offered as a
potential vehicle for assessing the adequacy of OO strategies and the authors aim to develop it further
for this purpose.

ACKNOWLEDGEMENT

The authors would like to thank Adrian Colyer at IBM Hursley U.K. for providing the product for the case study.


REFERENCES
1. DeMillo RA, Lipton RJ, Sayward FG. Hints on test data selection: Help for the practicing programmer. IEEE Computer 1978; 11(4):34–41.
2. Marick B. The Craft of Software Testing. Prentice-Hall: Englewood Cliffs, NJ, 1995.
3. Meyers S. Effective C++: 50 Specific Ways to Improve Your Programs and Designs. Addison-Wesley: Boston, MA, 1992.
4. Meyers S. More Effective C++: 35 New Ways to Improve Your Programs and Designs. Addison-Wesley: Boston, MA, 1996.
5. Kim S, Clark J, McDermid J. Assessing test set adequacy for object-oriented programs using class mutation. Proceedings of 28th JAIIO: Symposium on Software Technology (SoST'99), Buenos Aires, September 1999. Also available at www.cs.york.ac.uk/jac
6. Doong R-K, Frankl PG. The ASTOOT approach to testing object-oriented programs. ACM Transactions on Software Engineering and Methodology 1994; 3(2):101–130.
7. Kirani S, Tsai WT. Method sequence specification and verification of classes. Journal of Object-Oriented Programming 1994; 7(6):28–38.
8. Kung D, Suchak N, Gao J, Hsia P, Toyoshima Y, Chen C. On object state testing. Proceedings of IEEE COMPSAC'94. IEEE Computer Society Press, 1994; 222–227.
9. King KN, Offutt A. A Fortran language system for mutation-based software testing. Software: Practice and Experience 1991; 21(7):686–718.
10. Agrawal H, DeMillo R, Hathaway R, Hsu Wm, Hsu W, Krauser E, Martin RJ, Mathur A, Spafford E. Design of mutant operators for the C programming language. Technical Report SERC-TR-41-P, Software Engineering Research Center, Purdue University, 1989.
11. Delamaro M, Maldonado JC, Mathur AP. Integration testing using interface mutations. Proceedings of the VII International Symposium on Software Reliability Engineering (ISSRE). IEEE Computer Society Press, 1996; 112–121.
12. Offutt AJ, Voas J, Payne J. Mutation operators for Ada. Technical Report ISSE-TR-96-09, Information and Software Systems Engineering, George Mason University, 1996.
13. Kim S, Clark J, McDermid J. Class mutation: Mutation testing for object-oriented programs. Object-Oriented Software Systems, Net.ObjectDays 2000, Netobjectdays Forum, Erfurt, Germany, October 2000.
14. Kim S, Clark J, McDermid J. The rigorous generation of Java mutation operators using HAZOP. Proceedings of the 12th International Conference on Software and Systems Engineering and their Applications (ICSSEA'99), CNAM, Paris, France, December 1999.
15. Kim S, Clark JA, McDermid JA. Investigating the applicability of traditional test adequacy criteria for object-oriented programs. Proceedings of FESMA-AEMES 2000: The European Software Measurement Conference, European Federation of Software Metrics Associations, Polytechnic University of Madrid, Spain, October 2000.
