Black-box and white-box are test design methods. Black-box test design treats the system as a black box, so it
doesn't explicitly use knowledge of the internal structure. Black-box test design is usually described as focusing on
testing functional requirements. Synonyms for black-box include: behavioral, functional, opaque-box, and closed-box. White-box test design allows one to peek inside the box, and it focuses specifically on using internal
knowledge of the software to guide the selection of test data. Synonyms for white-box include: structural, glass-box
and clear-box.
While black-box and white-box are terms that are still in popular use, many people prefer the terms "behavioral" and
"structural". Behavioral test design is slightly different from black-box test design because the use of internal
knowledge isn't strictly forbidden, but it's still discouraged. In practice, it hasn't proven useful to use a single test
design method. One has to use a mixture of different methods so that they aren't hindered by the limitations of a
particular one. Some call this "gray-box" or "translucent-box" test design, but others wish we'd stop talking about
boxes altogether.
It is important to understand that these methods are used during the test design phase, and their influence is hard to
see in the tests once they're implemented. Note that any level of testing (unit testing, system testing, etc.) can use
any test design method. Unit testing is usually associated with structural test design, but this is because testers
usually don't have well-defined requirements at the unit level to validate.
2. What are unit, component and integration testing?
Note that the definitions of unit, component, integration, and integration testing are recursive:
Unit: the smallest compilable component. A unit typically is the work of one programmer (at least in principle). As
defined, it does not include any called sub-components (for procedural languages) or communicating components in
general.
Unit testing: in unit testing, called components (or communicating components) are replaced with stubs,
simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The
unit is tested in isolation.
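As a minimal illustrative sketch (the invoice/tax names are hypothetical, not taken from any particular project), a called component can be replaced by a stub with a canned answer while the test code itself plays the role of the driver, so the unit runs in isolation:

    import unittest
    from unittest import mock

    # Hypothetical unit under test; it calls a separate tax component.
    def compute_invoice_total(amount, tax_service):
        """Return the amount plus tax; tax_service is the called component."""
        return round(amount + tax_service.tax_for(amount), 2)

    class ComputeInvoiceTotalTest(unittest.TestCase):
        # The test class acts as the driver; the mock below is the stub
        # standing in for the real tax component.
        def test_adds_tax_from_called_component(self):
            tax_stub = mock.Mock()
            tax_stub.tax_for.return_value = 1.50   # canned answer from the stub
            self.assertEqual(compute_invoice_total(10.00, tax_stub), 11.50)

    if __name__ == "__main__":
        unittest.main()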
Component: a unit is a component. The integration of one or more components is a component.
Note: the reason for "one or more" as contrasted to "two or more" is to allow for components that call themselves
recursively.
Component testing: same as unit testing except that all stubs and simulators are replaced with the real thing.
Two components (actually one or more) are said to be integrated when:
a. They have been compiled, linked, and loaded together.
b. They have successfully passed the integration tests at the interface between them.
Thus, components A and B are integrated to create a new, larger component (A,B). Note that this does not conflict
with the idea of incremental integration; it just means that A is a big component and B, the component added, is a
small one.
Integration testing: carrying out integration tests.
Integration tests (after Leung and White) for procedural languages. This is easily generalized for OO languages by
using the equivalent constructs for message passing. In the following, the word "call" is to be understood in the
most general sense of a data flow and is not restricted to just formal subroutine calls and returns; it includes, for
example, passage of data through global data structures and/or the use of pointers.
Let A and B be two components in which A calls B.
Let Ta be the component-level tests of A.
Let Tb be the component-level tests of B.
Tab = the tests in A's suite that cause A to call B.
Tbsa = the tests in B's suite for which it is possible to sensitize A -- the inputs are to A, not B.
Tbsa + Tab = the integration test suite (+ = union).
Note: Sensitize is a technical term. It means inputs that will cause a routine to go down a specified path. The
inputs are to A. Not every input to A will cause A to traverse a path in which B is called. Tbsa is the set of tests
which do cause A to follow a path in which B is called. The outcome of the test of B may or may not be affected.
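A toy sketch of these sets (the components and test names are hypothetical): A calls B only on some paths, so only some of A's component-level tests belong to Tab, and Tbsa is built from inputs to A chosen so that B's behaviour is exercised and checked.

    # Hypothetical components: A calls B only for negative inputs.
    def B(x):
        return -x

    def A(x):
        return B(x) if x < 0 else x

    # Component-level tests of A (members of Ta); inputs always go to A.
    def test_a_positive():        # does not cause A to call B
        assert A(3) == 3

    def test_a_negative():        # causes A to call B -> member of Tab
        assert A(-3) == 3

    # A test of B sensitized through A (member of Tbsa): the input is to A,
    # chosen so the path through B is taken and B's behaviour is checked.
    def test_b_negates_via_a():
        assert A(-7) == 7

    # Integration test suite = Tbsa union Tab
    INTEGRATION_SUITE = [test_a_negative, test_b_negates_via_a]

    if __name__ == "__main__":
        for test in INTEGRATION_SUITE:
            test()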
There have been variations on these definitions, but the key point is that it is pretty darn formal and there's a goodly
hunk of testing theory, especially as concerns integration testing, OO testing, and regression testing, based on
them.
As to the difference between integration testing and system testing: system testing specifically goes after behaviors
and bugs that are properties of the entire system, as distinct from properties attributable to components (unless, of
course, the component in question is the entire system). Examples of system testing issues: resource loss bugs,
throughput bugs, performance, security, recovery, and transaction synchronization bugs (often misnamed "timing bugs").
3. What's the difference between load and stress testing?
One of the most common but unfortunate misuses of terminology is treating load testing and stress testing as
synonymous. The consequence of this ignorant semantic abuse is usually that the system is neither properly load
tested nor subjected to a meaningful stress test.
Stress testing is subjecting a system to an unreasonable load while denying it the resources (e.g., RAM, disc,
mips, interrupts, etc.) needed to process that load. The idea is to stress a system to the breaking point in order to
find bugs that will make that break potentially harmful. The system is not expected to process the overload without
adequate resources, but to behave (e.g., fail) in a decent manner (e.g., not corrupting or losing data). Bugs and
failure modes discovered under stress testing may or may not be repaired depending on the application, the failure
mode, consequences, etc. The load (incoming transaction stream) in stress testing is often deliberately distorted so
as to force the system into resource depletion.
Load testing is subjecting a system to a statistically representative (usually) load. The two main reasons for using
such loads are in support of software reliability testing and in performance testing. The term "load testing" by itself is
too vague and imprecise to warrant use. For example, do you mean "representative load," "overload," "high load,"
etc.? In performance testing, load is varied from a minimum (zero) to the maximum level the system can sustain
without running out of resources or having transactions suffer (application-specific) excessive delay.
A third use of the term is as a test whose objective is to determine the maximum sustainable load the system can
handle. In this usage, "load testing" is merely testing at the highest transaction arrival rate in performance testing.
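As a minimal sketch of that third usage (all names and the simulated transaction are hypothetical), a load test can step the transaction arrival rate up from a minimum and record response times at each level until delays become excessive:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def transaction():
        """Stand-in for one request to the system under test."""
        time.sleep(0.01)   # simulated service time

    def run_load_step(concurrent_users, requests_per_user=20):
        """Drive the system at one load level and return the average latency."""
        latencies = []
        def user():
            for _ in range(requests_per_user):
                start = time.perf_counter()
                transaction()
                latencies.append(time.perf_counter() - start)
        with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
            for _ in range(concurrent_users):
                pool.submit(user)
        return sum(latencies) / len(latencies)

    if __name__ == "__main__":
        # Vary the load from a minimum upward and watch the response time.
        for users in (1, 5, 10, 25, 50):
            print(f"{users:3d} users -> avg latency {run_load_step(users) * 1000:.1f} ms")

A stress test, by contrast, would deliberately distort this load and deny the system resources in order to force it to a breaking point, as described above.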
4. What's the difference between QA and testing?
QA is more a preventive activity, ensuring quality in the company and therefore in the product, rather than just testing the
product for software bugs.
TESTING means "quality control".
QUALITY CONTROL measures the quality of a product.
QUALITY ASSURANCE measures the quality of the processes used to create a quality product.
5. What is the best tester to developer ratio?
Reported tester:developer ratios range from 10:1 to 1:10.
There's no simple answer. It depends on many things: amount of reused code, number and type of interfaces,
platform, quality goals, etc.
It also can depend on the development model. The more specs, the fewer testers. The roles can play a big part
also. Does QA own beta? Do you include process auditors or planning activities?
These figures can all vary very widely depending on how you define "tester" and "developer". In some
organizations, a "tester" is anyone who happens to be testing software at the time -- such as their own. In other
organizations, a "tester" is only a member of an independent test group.
It is better to ask about the test labor content than it is to ask about the tester/developer ratio. The test labor
content, across most applications is generally accepted as 50%, when people do honest accounting. For life-critical
software, this can go up to 80%.
6. What is Software Quality Assurance?
Software QA involves the entire software development PROCESS - monitoring and improving the process, making
sure that any agreed-upon standards and procedures are followed, and ensuring that problems are found and dealt
with. It is oriented to 'prevention'.
7. What is Software Testing?
Testing involves operation of a system or application under controlled conditions and evaluating the results (e.g., 'if
the user is in interface A of the application while using hardware B, and does C, then D should happen'). The
controlled conditions should include both normal and abnormal conditions. Testing should intentionally attempt to
make things go wrong to determine if things happen when they shouldn't or things don't happen when they should.
It is oriented to 'detection'.
Organizations vary considerably in how they assign responsibility for QA and testing. Sometimes they're the
combined responsibility of one group or individual. Also common are project teams that include a mix of testers and
developers who work closely together, with overall QA processes monitored by project managers. It will depend on
what best fits an organization's size and business structure.
8. What are some recent major computer system failures caused by Software bugs?
In March of 2002 it was reported that software bugs in Britain's national tax system resulted in more
than 100,000 erroneous tax overcharges. The problem was partly attributed to the difficulty of testing the
integration of multiple systems.
A newspaper columnist reported in July 2001 that a serious flaw was found in off-the-shelf software that
had long been used in systems for tracking certain U.S. nuclear materials. The same software had been
recently donated to another country to be used in tracking their own nuclear materials, and it was not
until scientists in that country discovered the problem, and shared the information, that U.S. officials
became aware of the problems.
According to newspaper stories in mid-2001, a major systems development contractor was fired and
sued over problems with a large retirement plan management system. According to the reports, the
client claimed that system deliveries were late, the software had excessive defects, and it caused other
systems to crash.
In January of 2001 newspapers reported that a major European railroad was hit by the aftereffects of
the Y2K bug. The company found that many of their newer trains would not run due to their inability to
recognize the date '31/12/2000'; the trains were started by altering the control system's date settings.
News reports in September of 2000 told of a software vendor settling a lawsuit with a large mortgage
lender; the vendor had reportedly delivered an online mortgage processing system that did not meet
specifications, was delivered late, and didn't work.
In early 2000, major problems were reported with a new computer system in a large suburban U.S.
public school district with 100,000+ students; problems included 10,000 erroneous report cards and
students left stranded by failed class registration systems; the district's CIO was fired. The school
district decided to reinstate its original 25-year-old system for at least a year until the bugs were worked
out of the new system by the software vendors.
In October of 1999 the $125 million NASA Mars Climate Orbiter spacecraft was believed to be lost in
space due to a simple data conversion error. It was determined that spacecraft software used certain
data in English units that should have been in metric units. Among other tasks, the orbiter was to serve
as a communications relay for the Mars Polar Lander mission, which failed for unknown reasons in
December 1999. Several investigating panels were convened to determine the process failures that
allowed the error to go undetected.
Bugs in software supporting a large commercial high-speed data network affected 70,000 business
customers over a period of 8 days in August of 1999. Among those affected was the electronic trading
system of the largest U.S. futures exchange, which was shut down for most of a week as a result of the
outages.
In April of 1999 a software bug caused the failure of a $1.2 billion military satellite launch, the costliest
unmanned accident in the history of Cape Canaveral launches. The failure was the latest in a string of
launch failures, triggering a complete military and industry review of U.S. space launch programs,
including software integration and testing processes. Congressional oversight hearings were requested.
A small town in Illinois received an unusually large monthly electric bill of $7 million in March of 1999.
This was about 700 times larger than its normal bill. It turned out to be due to bugs in new software that
had been purchased by the local power company to deal with Y2K software issues.
In early 1999 a major computer game company recalled all copies of a popular new product due to
software problems. The company made a public apology for releasing a product before it was ready.
The computer system of a major online U.S. stock trading service failed during trading hours several
times over a period of days in February of 1999 according to nationwide news reports. The problem
was reportedly due to bugs in a software upgrade intended to speed online trade confirmations.
In April of 1998 a major U.S. data communications network failed for 24 hours, crippling a large part of
some U.S. credit card transaction authorization systems as well as other large U.S. bank, retail, and
government data systems. The cause was eventually traced to a software bug.
January 1998 news reports told of software problems at a major U.S. telecommunications company that
resulted in no charges for long distance calls for a month for 400,000 customers. The problem went
undetected until customers called up with questions about their bills.
In November of 1997 the stock of a major health industry company dropped 60% due to reports of
failures in computer billing systems, problems with a large database conversion, and inadequate
software testing. It was reported that more than $100,000,000 in receivables had to be written off and
that multi-million dollar fines were levied on the company by government agencies.
A retail store chain filed suit in August of 1997 against a transaction processing system vendor (not a
credit card company) due to the software's inability to handle credit cards with year 2000 expiration
dates.
In August of 1997 one of the leading consumer credit reporting companies reportedly shut down their
new public web site after less than two days of operation due to software problems. The new site
allowed web site visitors instant access, for a small fee, to their personal credit reports. However, a
number of initial users ended up viewing each others' reports instead of their own, resulting in irate
customers and nationwide publicity. The problem was attributed to "...unexpectedly high demand from
consumers and faulty software that routed the files to the wrong computers."
In November of 1996, newspapers reported that software bugs caused the 411 telephone information
system of one of the U.S. RBOC's to fail for most of a day. Most of the 2000 operators had to search
through phone books instead of using their 13,000,000-listing database. The bugs were introduced by
new software modifications and the problem software had been installed on both the production and
backup systems. A spokesman for the software vendor reportedly stated that 'It had nothing to do with
the integrity of the software. It was human error.'
On June 4 1996 the first flight of the European Space Agency's new Ariane 5 rocket failed shortly after
launching, resulting in an estimated uninsured loss of a half billion dollars. It was reportedly due to the
lack of exception handling for a floating-point error in a conversion of a 64-bit floating-point value to a 16-bit
signed integer.
Software bugs caused the bank accounts of 823 customers of a major U.S. bank to be credited with
$924,844,208.32 each in May of 1996, according to newspaper reports. The American Bankers
Association claimed it was the largest such error in banking history. A bank spokesman said the
programming errors were corrected and all funds were recovered.
Software bugs in a Soviet early-warning monitoring system nearly brought on nuclear war in 1983,
according to news reports in early 1999. The software was supposed to filter out false missile
detections caused by Soviet satellites picking up sunlight reflections off cloud-tops, but failed to do so.
Disaster was averted when a Soviet commander, based on what he said was a '...funny feeling in my
gut', decided the apparent missile attack was a false alarm. The filtering software code was rewritten.
9. Why is it often hard for management to get serious about quality assurance?
Solving problems is a high-visibility process; preventing problems is low-visibility. This is illustrated by an old
parable:
In ancient China there was a family of healers, one of whom was known throughout the land and employed as a
physician to a great lord. The physician was asked which of his family was the most skillful healer. He replied,
"I tend to the sick and dying with drastic and dramatic treatments, and on occasion someone is cured and my name
gets
out
among
the
lords."
"My elder brother cures sickness when it just begins to take root, and his skills are known among the local peasants
and
neighbors."
"My eldest brother is able to sense the spirit of sickness and eradicate it before it takes form. His name is unknown
outside our home."
10. Why does Software have bugs?
Software complexity - the complexity of current software applications can be difficult to comprehend for
anyone without experience in modern-day software development. Windows-type interfaces, client-server
and distributed applications, data communications, enormous relational databases, and the sheer
size of applications have all contributed to the exponential growth in software/system complexity. And
the use of object-oriented techniques can complicate instead of simplify a project unless it is well-engineered.
changing requirements - the customer may not understand the effects of changes, or may understand
and request them anyway - redesign, rescheduling of engineers, effects on other projects, work already
completed that may have to be redone or thrown out, hardware requirements that may be affected, etc.
If there are many minor changes or any major changes, known and unknown dependencies among
parts of the project are likely to interact and cause problems, and the complexity of keeping track of
changes may result in errors. Enthusiasm of engineering staff may be affected. In some fast-changing
business environments, continuously modified requirements may be a fact of life. In this case,
management must understand the resulting risks, and QA and test engineers must adapt and plan for
continuous extensive testing to keep the inevitable bugs from running out of control.
time pressures - scheduling of software projects is difficult at best, often requiring a lot of guesswork.
When deadlines loom and the crunch comes, mistakes will be made.
poorly documented code - it's tough to maintain and modify code that is badly written or poorly documented;
the result is bugs. In many organizations management provides no incentive for programmers to document
their code or write clear, understandable code. In fact, it's usually the opposite: they get points mostly for
quickly turning out code, and there's job security if nobody else can understand it ('if it was hard to write, it
should be hard to read').
software development tools - visual tools, class libraries, compilers, scripting tools, etc. often introduce their
own bugs or are poorly documented, resulting in added bugs.
A lot depends on the size of the organization and the risks involved. For large organizations with high-risk
(in terms of lives or property) projects, serious management buy-in is required and a formalized QA process
is necessary.
Where the risk is lower, management and organizational buy-in and QA implementation may be a slower,
step-at-a-time process. QA processes should be balanced with productivity so as to keep bureaucracy from
getting out of hand.
For small groups or projects, a more ad-hoc process may be appropriate, depending on the type of
customers and projects. A lot will depend on team leads or managers, feedback to developers, and
ensuring adequate communications among customers, managers, developers, and testers.
In all cases the most value for effort will be in requirements management processes, with a goal of clear,
complete, testable requirement specifications or expectations.
Black box testing - not based on any knowledge of internal design or code. Tests are based on
requirements and functionality.
White box testing - based on knowledge of the internal logic of an application's code. Tests are based on
coverage of code statements, branches, paths, conditions.
unit testing - the most 'micro' scale of testing; to test particular functions or code modules. Typically done by
the programmer and not by testers, as it requires detailed knowledge of the internal program design and
code. Not always easily done unless the application has a well-designed architecture with tight code; may
require developing test driver modules or test harnesses.
incremental integration testing - continuous testing of an application as new functionality is added; requires
that various aspects of an application's functionality be independent enough to work separately before all
parts of the program are completed, or that test drivers be developed as needed; done by programmers or
by testers.
integration testing - testing of combined parts of an application to determine if they function together
correctly. The 'parts' can be code modules, individual applications, client and server applications on a
network, etc. This type of testing is especially relevant to client/server and distributed systems.
functional testing - black-box type testing geared to functional requirements of an application; this type of
testing should be done by testers. This doesn't mean that the programmers shouldn't check that their code
works before releasing it (which of course applies to any stage of testing.)
system testing - black-box type testing that is based on overall requirements specifications; covers all
combined parts of a system.
end-to-end testing - similar to system testing; the 'macro' end of the test scale; involves testing of a
complete application environment in a situation that mimics real-world use, such as interacting with a
database, using network communications, or interacting with other hardware, applications, or systems if
appropriate.
sanity testing - typically an initial testing effort to determine if a new software version is performing well
enough to accept it for a major testing effort. For example, if the new software is crashing systems every 5
minutes, bogging down systems to a crawl, or destroying databases, the software may not be in a 'sane'
enough condition to warrant further testing in its current state.
regression testing - re-testing after fixes or modifications of the software or its environment. It can be difficult
to determine how much re-testing is needed, especially near the end of the development cycle. Automated
testing tools can be especially useful for this type of testing.
acceptance testing - final testing based on specifications of the end-user or customer, or based on use by
end-users/customers over some limited period of time.
load testing - testing an application under heavy loads, such as testing of a web site under a range of loads
to determine at what point the system's response time degrades or fails.
stress testing - term often used interchangeably with 'load' and 'performance' testing. Also used to describe
such tests as system functional testing while under unusually heavy loads, heavy repetition of certain
actions or inputs, input of large numerical values, large complex queries to a database system, etc.
performance testing - term often used interchangeably with 'stress' and 'load' testing. Ideally 'performance'
testing (and any other 'type' of testing) is defined in requirements documentation or QA or Test Plans.
usability testing - testing for 'user-friendliness'. Clearly this is subjective, and will depend on the targeted
end-user or customer. User interviews, surveys, video recording of user sessions, and other techniques can
be used. Programmers and testers are usually not appropriate as usability testers.
recovery testing - testing how well a system recovers from crashes, hardware failures, or other catastrophic
problems.
security testing - testing how well the system protects against unauthorized internal or external access,
willful damage, etc; may require sophisticated testing techniques.
exploratory testing - often taken to mean a creative, informal software test that is not based on formal test
plans or test cases; testers may be learning the software as they test it.
ad-hoc testing - similar to exploratory testing, but often taken to mean that the testers have significant
understanding of the software before testing it.
alpha testing - testing of an application when development is nearing completion; minor design changes
may still be made as a result of such testing. Typically done by end-users or others, not by programmers or
testers.
beta testing - testing when development and testing are essentially completed and final bugs and problems
need to be found before final release. Typically done by end-users or others, not by programmers or testers.
mutation testing - a method for determining if a set of test data or test cases is useful, by deliberately
introducing various code changes ('bugs') and retesting with the original test data/cases to determine if the
'bugs' are detected. Proper implementation requires large computational resources.
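A tiny sketch of the mutation testing idea (the age-check function and the test data are hypothetical): introduce a deliberate code change and check whether the existing test data detects it.

    # Original function and a hand-made "mutant" with a deliberately injected bug.
    def is_adult(age):
        return age >= 18

    def is_adult_mutant(age):   # mutation: >= changed to >
        return age > 18

    # Existing test data as (input, expected result) pairs.
    TEST_DATA = [(17, False), (18, True), (30, True)]

    def kills_mutant(mutant):
        """Return True if at least one test case detects the injected bug."""
        return any(mutant(value) != expected for value, expected in TEST_DATA)

    if __name__ == "__main__":
        # The (18, True) case distinguishes the mutant, so this test data is useful.
        print("mutant detected:", kills_mutant(is_adult_mutant))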
poor requirements - if requirements are unclear, incomplete, too general, or not testable, there will be
problems.
unrealistic schedule - if too much work is crammed in too little time, problems are inevitable.
inadequate testing - no one will know whether or not the program is any good until the customer
complains or systems crash.
featuritis - requests to pile on new features after development is underway; extremely common.
solid requirements - clear, complete, detailed, cohesive, attainable, testable requirements that are
agreed to by all players. Use prototypes to help nail down requirements.
realistic schedules - allow adequate time for planning, design, testing, bug fixing, re-testing, changes,
and documentation; personnel should be able to complete the project without burning out.
adequate testing - start testing early on, re-test after fixes or changes, plan for adequate time for testing
and bug-fixing.
stick to initial requirements as much as possible - be prepared to defend against changes and additions
once development has begun, and be prepared to explain consequences. If changes are necessary,
they should be adequately reflected in related schedule changes. If possible, use rapid prototyping
during the design phase so that customers can see what to expect. This will provide them a higher
comfort level with their requirements decisions and minimize changes later on.
communication - require walkthroughs and inspections when appropriate; make extensive use of group
communication tools - e-mail, groupware, networked bug-tracking tools and change management tools,
intranet capabilities, etc.; ensure that documentation is available and up-to-date - preferably electronic,
not paper; promote teamwork and cooperation; use prototypes early on so that customers' expectations
are clarified.
use descriptive function and method names - use both upper and lower case, avoid abbreviations, use as
many characters as necessary to be adequately descriptive (use of more than 20 characters is not out of
line); be consistent in naming conventions.
use descriptive variable names - use both upper and lower case, avoid abbreviations, use as many
characters as necessary to be adequately descriptive (use of more than 20 characters is not out of line); be
consistent in naming conventions.
function and method sizes should be minimized; less than 100 lines of code is good, less than 50 lines is
preferable.
function descriptions should be clearly spelled out in comments preceding a function's code.
coding style should be consistent throughout a program (e.g., use of brackets, indentations, naming
conventions, etc.)
in adding comments, err on the side of too many rather than too few comments; a common rule of thumb is
that there should be at least as many lines of comments (including header blocks) as lines of code.
no matter how small, an application should include documentation of the overall program function and flow
(even a few paragraphs is better than nothing); or if possible a separate flow chart and detailed program
documentation.
make extensive use of error handling procedures and status and error logging (see the sketch after this list).
for C++, to minimize complexity and increase maintainability, avoid too many levels of inheritance in class
hierarchies (relative to the size and complexity of the application). Minimize use of multiple inheritance, and
minimize use of operator overloading (note that the Java programming language eliminates multiple
inheritance and operator overloading.)
for C++, keep class methods small, less than 50 lines of code per method is preferable.
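A short sketch, in Python rather than C++, pulling several of the guidelines above together (the billing function is hypothetical): descriptive names, a small commented function, explicit error handling, and status/error logging.

    import logging

    logger = logging.getLogger("billing")

    def calculate_monthly_invoice_total(line_item_amounts, tax_rate):
        """Return the invoice total for one month, including tax.

        line_item_amounts: list of non-negative charges for the month.
        tax_rate: fractional rate, e.g. 0.07 for 7 percent.
        """
        if tax_rate < 0:
            # Explicit error handling with a logged, descriptive message.
            logger.error("Rejected negative tax rate: %s", tax_rate)
            raise ValueError("tax_rate must be non-negative")
        subtotal = sum(line_item_amounts)
        total = round(subtotal * (1 + tax_rate), 2)
        logger.info("Invoice computed: subtotal=%s total=%s", subtotal, total)
        return total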
the program should act in a way that least surprises the user
it should always be evident to the user what can be done next and how to exit
the program shouldn't let the users do something stupid without warning them.
SEI = 'Software Engineering Institute' at Carnegie-Mellon University; initiated by the U.S. Defense
Department to help improve software development processes.
CMM = 'Capability Maturity Model', developed by the SEI. It's a model of 5 levels of organizational
'maturity' that determine effectiveness in delivering quality software. It is geared to large organizations
such as large U.S. Defense Department contractors. However, many of the QA processes involved are
appropriate to any organization, and if reasonably applied can be helpful. Organizations can receive
CMM ratings by undergoing assessments by qualified auditors.
ISO = 'International Organisation for Standardization' - The ISO 9001:2000 standard (which replaces the
previous standard of 1994) concerns quality systems that are assessed by outside auditors, and it applies
to many kinds of production and manufacturing organizations, not just software. It covers documentation,
design, development, production, testing, installation, servicing, and other processes. The full set of
standards consists of: (a) Q9001-2000 - Quality Management Systems: Requirements; (b) Q9000-2000 - Quality
Management Systems: Fundamentals and Vocabulary; (c) Q9004-2000 - Quality Management
Systems: Guidelines for Performance Improvements. To be ISO 9001 certified, a third-party auditor
assesses an organization, and certification is typically good for about 3 years, after which a complete
reassessment is required. Note that ISO certification does not necessarily indicate quality products - it
indicates only that documented processes are followed.
IEEE = 'Institute of Electrical and Electronics Engineers' - among other things, creates standards such as
'IEEE Standard for Software Test Documentation' (IEEE/ANSI Standard 829), 'IEEE Standard for Software
Unit Testing' (IEEE/ANSI Standard 1008), 'IEEE Standard for Software Quality Assurance Plans' (IEEE/ANSI
Standard 730), and others.
ANSI = 'American National Standards Institute', the primary industrial standards body in the U.S.; publishes
some software-related standards in conjunction with the IEEE and ASQ (American Society for Quality).
Other software development process assessment methods besides CMM and ISO 9000 include SPICE,
Trillium, TickIT, and Bootstrap.
Possibly. For small projects, the time needed to learn and implement them may not be worth it. For
larger projects, or on-going long-term projects they can be valuable.
A common type of automated tool is the 'record/playback' type. For example, a tester could click
through all combinations of menu choices, dialog box choices, buttons, etc. in an application GUI and
have them 'recorded' and the results logged by a tool. The 'recording' is typically in the form of text
based on a scripting language that is interpretable by the testing tool. If new buttons are added, or
some underlying code in the application is changed, etc. the application can then be retested by just
'playing back' the 'recorded' actions, and comparing the logging results to check effects of the changes.
The problem with such tools is that if there are continual changes to the system being tested, the
'recordings' may have to be changed so much that it becomes very time-consuming to continuously
update the scripts. Additionally, interpretation of results (screens, data, logs, etc.) can be a difficult task.
Note that there are record/playback tools for text-based interfaces also, and for all types of platforms.
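As a purely illustrative sketch (not any particular vendor's tool; the script format and application object are made up), a 'recording' can be a small text script that a playback routine interprets, performing each action and comparing expectations against the application's state or logs:

    # A "recording": one action per line, in a tiny made-up script format.
    RECORDED_SCRIPT = """\
    click menu=File item=Open
    type field=Filename text=report.txt
    click button=OK
    expect window=Report status=loaded
    """

    class FakeApplication:
        """Minimal stand-in for the GUI under test, used only to show the flow."""
        def __init__(self):
            self.log = []
        def perform(self, command, args):
            self.log.append(f"{command} {args}")
        def state_matches(self, expectation):
            return True   # a real tool would compare screens, data, or logs here

    def play_back(script, application):
        """Replay recorded actions and check each 'expect' line."""
        results = []
        for line in script.splitlines():
            if not line.strip():
                continue
            command, _, args = line.strip().partition(" ")
            if command == "expect":
                results.append(("check", args, application.state_matches(args)))
            else:
                application.perform(command, args)
                results.append(("action", line.strip(), True))
        return results

    if __name__ == "__main__":
        for kind, detail, ok in play_back(RECORDED_SCRIPT, FakeApplication()):
            print(kind, "|", detail, "|", "OK" if ok else "FAIL")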
be able to maintain enthusiasm of their team and promote a positive atmosphere, despite what is a
somewhat 'negative' process (e.g., looking for or preventing problems)
have the ability to withstand pressures and say 'no' to other managers when quality is insufficient or QA
processes are not being adhered to
have people judgement skills for hiring and keeping skilled personnel
be able to communicate with technical and non-technical people, engineers, managers, and customers.
Obtain requirements, functional design, and internal design specifications and other necessary documents
Determine project-related personnel and their responsibilities, reporting requirements, required standards
and processes (such as release processes, change processes, etc.)
Identify application's higher-risk aspects, set priorities, and determine scope and limitations of tests
Determine test approaches and methods - unit, integration, functional, system, load, usability tests, etc.
Determine testware requirements (record/playback tools, coverage analyzers, test tracking, problem/bug
tracking, etc.)
Prepare test environment and testware, obtain needed user manuals/reference documents/configuration
guides/installation guides, set up test tracking processes, set up logging and archiving processes, set up or
obtain test input data
Perform tests
Retest as needed
Maintain and update test plans, test cases, test environment, and testware through life cycle
Title
Table of Contents
Relevant related document list, such as requirements, design documents, other test plans, etc.
Traceability requirements
Test outline - a decomposition of the test approach by test type, feature, functionality, process, system,
module, etc. as applicable
Outline of data input equivalence classes, boundary value analysis, error classes
Test environment - hardware, operating systems, other required software, data configurations, interfaces to
other systems
Test environment validity analysis - differences between the test and production systems and their impact
on test validity.
Software CM processes
Outline of system-logging/error-logging/other capabilities, and tools such as screen capture software, that
will be used to help describe and report bugs
Discussion of any specialized software or hardware tools that will be used by testers to help track the cause
or source of bugs
Personnel allocation
Test site/location
Outside test organizations to be utilized and their purpose, responsibilities, deliverables, contact persons,
and coordination issues
Open issues
A test case is a document that describes an input, action, or event and an expected response, to determine
if a feature of an application is working correctly. A test case should contain particulars such as test case
identifier, test case name, objective, test conditions/setup, input data requirements, steps, and expected
results.
Note that the process of developing test cases can help find problems in the requirements or design of an
application, since it requires completely thinking through the operation of the application. For this reason,
it's useful to prepare test cases early in the development cycle if possible.
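A minimal sketch of those particulars captured in executable form (the login scenario and field names are hypothetical):

    from dataclasses import dataclass

    @dataclass
    class TestCase:
        """Captures the particulars a written test case should record."""
        identifier: str
        name: str
        objective: str
        setup: str
        input_data: dict
        steps: list
        expected_result: str

    login_lockout = TestCase(
        identifier="TC-LOGIN-007",
        name="Account locks after repeated bad passwords",
        objective="Verify the lockout rule of the login feature",
        setup="User 'demo' exists and is not locked",
        input_data={"username": "demo", "password": "wrong-password"},
        steps=["Submit the login form three times with the bad password",
               "Attempt a fourth login with the correct password"],
        expected_result="Fourth attempt is rejected with an 'account locked' message",
    )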
A variety of commercial problem-tracking/management software tools are available. The following are items to
consider in the tracking process:
Complete information such that developers can understand the bug, get an idea of its severity, and
reproduce it if necessary.
The function, module, feature, object, screen, etc. where the bug occurred
Description of steps needed to reproduce the bug if not covered by a test case or if the developer
doesn't have easy access to the test case/test script/test tool
File excerpts/error messages/log file excerpts/screen shots/test tool logs that would be helpful in finding
the cause of the problem
Tester name
Test date
Description of fix
Date of fix
Retest date
Retest results
A reporting or tracking process should enable notification of appropriate personnel at various stages. For instance,
testers need to know when retesting is needed, developers need to know when bugs are found and how to get the
needed information, and reporting/summary capabilities are needed for managers.
33. What is 'configuration management'?
Configuration management covers the processes used to control, coordinate, and track: code, requirements,
documentation, problems, change requests, designs, tools/compilers/libraries/patches, changes made to them, and
who makes the changes.
Which aspects of the application can be tested early in the development cycle?
Which parts of the code are most complex, and thus most subject to errors?
Which parts of the requirements and design are unclear or poorly thought out?
What do the developers think are the highest-risk aspects of the application?
What kinds of problems would cause the most customer service complaints?
Work with the project's stakeholders early on to understand how requirements might change so that
alternate test plans and strategies can be worked out in advance, if possible.
It's helpful if the application's initial design allows for some adaptability so that later changes do not require
redoing the application from scratch.
If the code is well-commented and well-documented this makes changes easier for the developers.
Use rapid prototyping whenever possible to help customers feel sure of their requirements and minimize
changes.
The project's initial schedule should allow for some extra time commensurate with the possibility of
changes.
Try to move new requirements to a 'Phase 2' version of an application, while using the original requirements
for the 'Phase 1' version.
Negotiate to allow only easily-implemented new requirements into the project, while moving more difficult
new requirements into future versions of the application.
Be sure that customers and management understand the scheduling impacts, inherent risks, and costs of
significant requirements changes. Then let management or the customers (not the developers or testers)
decide if the changes are warranted - after all, that's their job.
Balance the effort put into setting up automated testing with the expected effort required to re-do them to
deal with changes.
Focus initial automated testing on application aspects that are most likely to remain unchanged.
Devote appropriate effort to risk analysis of changes to minimize regression testing needs.
Design some flexibility into test cases (this is not easily done; the best bet might be to minimize the detail in
the test cases, or set up only higher-level generic-type test plans)
Focus less on detailed test plans and test cases and more on ad hoc testing (with an understanding of the
added risk that this entails).
38. What if the project isn't big enough to justify extensive testing?
Consider the impact of project errors, not the size of the project. However, if extensive testing is still not justified, risk
analysis is again needed and the same considerations as described previously in 'What if there isn't enough time for
thorough testing?' apply. The tester might then do ad hoc testing, or write up a limited test plan based on the risk
analysis.
39. What if the application has functionality that wasn't in the requirements?
It may take serious effort to determine if an application has significant unexpected or hidden functionality, and it
would indicate deeper problems in the software development process. If the functionality isn't necessary to the
purpose of the application, it should be removed, as it may have unknown impacts or dependencies that were not
taken into account by the designer or the customer. If not removed, design information will be needed to determine
added testing needs or regression testing needs. Management should be made aware of any significant added risks
as a result of the unexpected functionality. If the functionality only affects areas such as minor improvements in the
user interface, for example, it may not be a significant risk.
40. How can Software QA processes be implemented without stifling productivity?
By implementing QA processes slowly over time, using consensus to reach agreement on processes, and adjusting
and experimenting as an organization grows and matures, productivity will be improved instead of stifled. Problem
prevention will lessen the need for problem detection, panics and burn-out will decrease, and there will be improved
focus and less wasted effort. At the same time, attempts should be made to keep processes simple and efficient,
minimize paperwork, promote computer-based processes and automated tracking and reporting, minimize time
required in meetings, and promote training as part of the QA process. However, no one - especially talented
technical types - likes rules or bureaucracy, and in the short run things may slow down a bit. A typical scenario would
be that more days of planning and development will be needed, but less time will be required for late-night bug-fixing and calming of irate customers.
41. What if an organization is growing so fast that fixed QA processes are impossible?
This is a common problem in the software industry, especially in new technology areas. There is no easy solution in
this situation, other than:
Management should 'ruthlessly prioritize' quality issues and maintain focus on the customer
Everyone in the organization should be clear on what 'quality' means to the customer
What are the expected loads on the server (e.g., number of hits per unit time?), and what kind of
performance is required under such loads (such as web server response time, database query response
times). What kinds of tools will be needed for performance testing (such as web load testing tools, other
tools already in house that can be adapted, web robot downloading tools, etc.)?
Who is the target audience? What kind of browsers will they be using? What kind of connection speeds will
they be using? Are they intra-organization (thus with likely high connection speeds and similar browsers) or
Internet-wide (thus with a wide variety of connection speeds and browser types)?
What kind of performance is expected on the client side (e.g., how fast should pages appear, how fast
should animations, applets, etc. load and run)?
Will down time for server and content maintenance/upgrades be allowed? How much?
What kinds of security (firewalls, encryptions, passwords, etc.) will be required and what is it expected to
do? How can it be tested?
How reliable are the site's Internet connections required to be? And how does that affect backup system or
redundant connection requirements and testing?
What processes will be required to manage updates to the web site's content, and what are the
requirements for maintaining, tracking, and controlling page content, graphics, links, etc.?
Which HTML specification will be adhered to? How strictly? What variations will be allowed for targeted
browsers?
Will there be any standards or requirements for page appearance and/or graphics throughout a site or parts
of a site?
How will internal and external links be validated and updated? How often?
Can testing be done on the production system, or will a separate test system be required? How are browser
caching, variations in browser option settings, dial-up connection variabilities, and real-world internet 'traffic
congestion' problems to be accounted for in testing?
How extensive or customized are the server logging and reporting requirements; are they considered an
integral part of the system and do they require testing?
How are CGI programs, applets, JavaScript, ActiveX components, etc. to be maintained, tracked, controlled,
and tested?
Pages should be 3-5 screens max unless content is tightly focused on a single topic. If larger, provide
internal links within the page.
The page layouts and design elements should be consistent throughout a site, so that it's clear to the user
that they're still within a site.
All pages should have links external to the page; there should be no dead-end pages.
The page owner, revision date, and a link to a contact person or organization should be included on each
page.
Error Handling
Calculation errors
Race Conditions
Load Conditions
Hardware
Testing Errors
Functionality
Possible Error Conditions
Excessive Functionality
Inflated impression of functionality
Inadequacy for the task at hand
Missing function
Wrong function
Functionality must be created by user
Doesn't do what the user expects
Communication
Missing Information
Possible Error Conditions:
1. No on-screen instructions
2. Assuming printed documentation is already available
3. Undocumented features
4. States that appear impossible to exit
5. No cursor
6. Failure to acknowledge input
7. Failure to show activity during long delays
8. Failure to advise when a change will take effect
9. Failure to check for the same document being opened twice
Wrong, misleading, confusing information
10. Simple factual errors
11. Spelling errors
12. Inaccurate simplifications
13. Invalid metaphors
14. Confusing feature names
15. More than one name for the same feature
16. Information overload
17. When are data saved
18. Wrong function
19. Functionality must be created by user
20. Poor external modularity
Help text and error messages
21. Inappropriate reading levels
22. Verbosity
23. Inappropriate emotional tone
24. Factual errors
25. Context errors
26. Failure to identify the source of error
27. Forbidding a resource without saying why
28. Reporting non-errors
29. Failure to highlight the part of the screen
30. Failure to clear highlighting
31. Wrong/partial string displayed
32. Message displayed for too long or not long enough
Display Layout
33. Poor aesthetics in screen layout
34. Menu layout errors
35. Dialog box layout errors
36. Obscured instructions
37. Misuse of flash
38. Misuse of color
39. Heavy reliance on color
40. Inconsistent with the style of the environment
41. Cannot get rid of on-screen information
Output
42. Can't output certain data
43. Can't redirect output
44. Format incompatible with a follow-up process
45. Must output too little or too much
46. Can't control output layout
47. Absurd printout level of precision
48. Can't control labeling of tables or figures
49. Can't control scaling of graphs
Performance
50. Program speed
51. User throughput
52. Can't redirect output
53. Perceived performance
54. Slow program
55. Slow echoing
56. How to reduce user throughput
57. Poor responsiveness
58. No type-ahead
59. No warning that the operation takes a long time
60. No progress reports
61. Problems with time-outs
62. Program pesters you
Program Rigidity
User tailorability
Possible Error Conditions:
1. Can't turn off case sensitivity
2. Can't tailor to hardware at hand
3. Can't change device initialization
4. Can't turn off automatic changes
5. Can't slow down/speed up scrolling
6. Can't do what you did last time
7. Failure to execute a customization command
8. Failure to save customization commands
9. Side effects of feature changes
10. Can't turn off the noise
11. Infinite tailorability
Who is in control?
12. Unnecessary imposition of a conceptual style
13. Novice friendly, experienced hostile
14. Surplus or redundant information required
15. Unnecessary repetition of steps
16. Unnecessary limits
Command Structure and Rigidity
Inconsistencies
Possible Error Conditions:
1. Optimizations
2. Inconsistent syntax
3. Inconsistent command entry style
4. Inconsistent abbreviations
5. Inconsistent termination rule
6. Inconsistent command options
7. Similarly named commands
8. Inconsistent capitalization
9. Inconsistent menu position
10. Inconsistent function key usage
11. Inconsistent error handling rules
12. Inconsistent editing rules
13. Inconsistent data saving rules
Time Wasters
14. Garden paths
15. Choice can't be taken
16. Are you really, really sure
17. Obscurely or idiosyncratically named commands
Menus
18. Excessively complex menu hierarchy
19. Inadequate menu navigation options
20. Too many paths to the same place
21. You can't get there from here
22. Related commands relegated to unrelated menus
23. Unrelated commands tossed under the same menu
Command Lines
24. Forced distinction between uppercase and lowercase
25. Reversed parameters
26. Full command names are not allowed
27. Abbreviations are not allowed
28. Demands complex input on one line
29. No batch input
30. Can't edit commands
Inappropriate use of keyboard
31. Failure to use cursor, edit, or function keys
32. Non-standard use of cursor and edit keys
33. Non-standard use of function keys
34. Failure to filter invalid keys
35. Failure to indicate keyboard state changes
Missing Commands
State transitions
Possible Error Conditions:
1. Can't do nothing and leave
2. Can't quit mid-program
3. Can't stop mid-command
4. Can't pause
Disaster prevention
5. No backup facility
6. No undo
7. No "are you sure"
8. No incremental saves
Error handling by the user
9. No user-specifiable filters
10. Awkward error correction
11. Can't include comments
12. Can't display relationships between variables
Miscellaneous
13. Inadequate privacy or security
14. Obsession with security
15. Can't hide menus
16. Doesn't support standard OS features
17. Doesn't allow long names
Error Handling
Error prevention
Possible Error Conditions:
1. Inadequate initial state validation
2. Inadequate tests of user input
3. Inadequate protection against corrupted data
4. Inadequate tests of passed parameters
5. Inadequate protection against operating system bugs
6. Inadequate protection against malicious use
7. Inadequate version control
Error Detection
Possible Error Conditions:
1. Ignores overflow
2. Ignores impossible values
3. Ignores implausible values
4. Ignores error flag
5. Ignores hardware fault or error conditions
6. Data comparison
Error Recovery
Possible Error Conditions:
1. Automatic error detection
2. Failure to report error
3. Failure to set an error flag
4. Where does the program go back to
5. Aborting errors
6. Recovery from hardware problems
7. No escape from missing disks
Calculation Errors
Race Conditions
2. Re-entrance
3. Variables contain embedded command names
4. Wrong returning state assumed
5. Exception-handling based exits
Program Stops
Possible Error Conditions:
1. Dead crash
2. Syntax error reported at run time
3. Waiting for impossible condition or combinations of conditions
4. Wrong user or process priority
Loops
Possible Error Conditions:
1. Infinite loop
2. Wrong starting value for the loop control variables
3. Accidental change of loop control variables
4. Commands that do or don't belong inside the loop
5. Improper loop nesting
Multiple Cases
Data boundaries
Possible Error Conditions:
1. Un-terminated null strings
2. Early end of string
3. Read/write past the end of a data structure or an element in it
Messaging Problems
Possible Error Conditions:
1. Messages sent to wrong process or port
2. Failure to validate an incoming message
3. Lost or out-of-synch messages
4. Message sent to only N of N+1 processes
Load Conditions
Hardware
Possible Error Conditions:
1. Wrong device
2. Wrong device address
3. Device unavailable
4. Device returned to wrong type of pool
5. Device use forbidden to caller
6. Specifies wrong privilege level for the device
7. Noisy channel
8. Channel goes down
9. Time-out problems
10. Wrong storage device
11. Doesn't check the directory of current disk
12. Doesn't close the file
13. Unexpected end of file
14. Disk sector bug and other length-dependent errors
15. Wrong operation or instruction codes
16. Misunderstood status or return code
17. Underutilizing device intelligence
18. Paging mechanism ignored or misunderstood
19. Ignores channel throughput limits
20. Assuming device is or isn't or should be or shouldn't be initialized
21. Assumes programmable function keys are programmed correctly
Source, Version, ID Control
Testing Errors
Poor reporting
Possible Error Conditions:
1. Illegible reports
2. Failure to make it clear how to reproduce the problem
3. Failure to say you can't reproduce the problem
4. Failure to check your report
5. Failure to report timing dependencies
6. Failure to simplify conditions
7. Concentration on trivia
8. Abusive language
Test strategy;
Test planning;
Test specification;
Test procedure.
These four stages of test design apply to all levels of testing, from unit testing through to system testing. This paper
concentrates on the specification of unit tests; i.e. the design of individual unit test cases within unit test
specifications. A more detailed description of the four stages of test design can be found in the IPL paper "An
Introduction to Software Testing".
The design of tests has to be driven by the specification of the software. For unit testing, tests are designed to verify
that an individual unit implements all design decisions made in the unit's design specification. A thorough unit test
specification should include positive testing, that the unit does what it is supposed to do, and also negative testing,
that the unit does not do anything that it is not supposed to do.
Producing a test specification, including the design of test cases, is the level of test design which has the highest
degree of creative input. Furthermore, unit test specifications will usually be produced by a large number of staff
with a wide range of experience, not just a few experts.
This paper provides a general process for developing unit test specifications, and then describes some specific
design techniques for designing unit test cases. It serves as a tutorial for developers who are new to formal testing
of software, and as a reminder of some finer points for experienced software testers.
B. Developing Unit Test Specifications
Once a unit has been designed, the next development step is to design the unit tests. An important point here is that
it is more rigorous to design the tests before the code is written. If the code was written first, it would be too tempting
to test the software against what it is observed to do (which is not really testing at all), rather than against what it is
specified to do.
A unit test specification comprises a sequence of unit test cases. Each unit test case should include four essential
elements:
A statement of the initial state of the unit, the starting point of the test case (this is only applicable where a
unit maintains state between calls);
The inputs to the unit, including the value of any external data read by the unit;
What the test case actually tests, in terms of the functionality of the unit and the analysis used in the design
of the test case (for example, which decisions within the unit are tested);
The expected outcome of the test case (the expected outcome of a test case should always be defined in
the test specification, prior to test execution).
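A minimal sketch of one such unit test case in code form (the bounded counter is a hypothetical unit), with the four elements marked in comments:

    import unittest

    class BoundedCounter:
        """Hypothetical unit under test: counts up to a fixed limit."""
        def __init__(self, limit):
            self.limit = limit
            self.value = 0
        def increment(self):
            if self.value < self.limit:
                self.value += 1
            return self.value

    class BoundedCounterTest(unittest.TestCase):
        def test_increment_stops_at_limit(self):
            # 1. Initial state of the unit: a counter already at its limit.
            counter = BoundedCounter(limit=2)
            counter.increment()
            counter.increment()
            # 2. Input to the unit: one further increment call.
            result = counter.increment()
            # 3. What is tested: the "value < limit" decision in increment().
            # 4. Expected outcome, defined before execution: the value stays at the limit.
            self.assertEqual(result, 2)

    if __name__ == "__main__":
        unittest.main()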
The following subsections of this paper provide a six step general process for developing a unit test specification as
a set of individual unit test cases. For each step of the process, suitable test case design techniques are suggested.
(Note that these are only suggestions. Individual circumstances may be better served by other test case design
techniques). Section 3 of this paper then describes in detail a selection of techniques which can be used within this
process to help design test cases.
B.1 Step 1 - Make it Run
The purpose of the first test case in any unit test specification should be to execute the unit under test in the
simplest way possible. When the tests are actually executed, knowing that at least the first unit test will execute is a
good confidence boost. If it will not execute, then it is preferable to have something as simple as possible as a
starting point for debugging.
Suitable techniques:
- Specification derived tests
- Equivalence partitioning
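A "make it run" test can be as small as the sketch below; the process function is a trivial hypothetical stand-in for whatever unit is under test, and Python's unittest framework is assumed purely for illustration.

    import unittest

    def process(items):
        # Hypothetical unit under test: returns its input list in sorted order.
        return sorted(items)

    class TestMakeItRun(unittest.TestCase):
        def test_simplest_possible_call(self):
            # The simplest way to execute the unit: an empty input.
            # If even this fails, it is a very simple starting point for debugging.
            self.assertEqual(process([]), [])

    if __name__ == "__main__":
        unittest.main()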
B.2 Step 2 - Positive Testing
Test cases should be designed to show that the unit under test does what it is supposed to do. The test designer
should walk through the relevant specifications; each test case should test one or more statements of specification.
Where more than one specification is involved, it is best to make the sequence of test cases correspond to the
sequence of statements in the primary specification for the unit.
Suitable techniques:
- Specification derived tests
- Equivalence partitioning
- State-transition testing
B.3. Step 3 - Negative Testing
Existing test cases should be enhanced and further test cases should be designed to show that the software does
not do anything that it is not specified to do. This step depends primarily upon error guessing, relying upon the
experience of the test designer to anticipate problem areas.
Suitable techniques:
- Error guessing
- Boundary value analysis
- Internal boundary value testing
- State-transition testing
B.4. Step 4 - Special Considerations
Where appropriate, test cases should be designed to address issues such as performance, safety requirements and
security requirements. Particularly in the cases of safety and security, it can be convenient to give test cases special
emphasis to facilitate security analysis or safety analysis and certification. Test cases already designed which
address security issues or safety hazards should be identified in the unit test specification. Further test cases should
then be added to the unit test specification to ensure that all security issues and safety hazards applicable to the
unit will be fully addressed.
Suitable techniques:
- Specification derived tests
Depending upon an organization's standards for the specification of a unit, there may be no structural specification
of processing within a unit other than the code itself. There are also likely to have been human errors made in the
development of a test specification. Consequently, there may be complex decision conditions, loops and branches
within the code for which coverage targets may not have been met when tests were executed. Where coverage
objectives are not achieved, analysis must be conducted to determine why. Failure to achieve a coverage objective
may be due to:
Infeasible paths or conditions - the corrective action should be to annotate the test specification to provide a
detailed justification of why the path or condition is not tested. AdaTEST provides some facilities to help
exclude infeasible conditions from Boolean coverage metrics.
Unreachable or redundant code - the corrective action will probably be to delete the offending code. It is
easy to make mistakes in this analysis, particularly where defensive programming techniques have been
used. If there is any doubt, defensive programming should not be deleted.
Insufficient test cases - test cases should be refined and further test cases added to a test specification to
fill the gaps in test coverage.
Ideally, the coverage completion step should be conducted without looking at the actual code. However, in
practice some sight of the code may be necessary in order to achieve coverage targets. It is vital that all test
designers should recognize that use of the coverage completion step should be minimized. The most effective
testing will come from analysis and specification, not from experimentation and over dependence upon the
coverage completion step to cover for sloppy test design.
Suitable techniques:
- Branch testing
- Condition testing
- Data definition-use testing
- State-transition testing
B.8. General Guidance
Note that the first five steps in producing a test specification can be achieved from the design documentation alone, without reference to the actual code; only the coverage completion step may require sight of the code.
It is usually a good idea to avoid long sequences of test cases which depend upon the outcome of preceding test
cases. An error identified by a test case early in the sequence could cause secondary errors and reduce the amount
of real testing achieved when the tests are executed.
The process of designing test cases, including executing them as "thought experiments", often identifies bugs
before the software has even been built. It is not uncommon to find more bugs when designing tests than when
executing tests.
Throughout unit test design, the primary input should be the specification documents for the unit under test. While
use of actual code as an input to the test design process may be necessary in some circumstances, test designers
must take care that they are not testing the code against itself. A test specification developed from the code will only
prove that the code does what the code does, not that it does what it is supposed to do.
C. Test Case Design Techniques
The preceding section of this paper has provided a "recipe" for developing a unit test specification as a set of
individual test cases. In this section a range of techniques which can be used to help define test cases is described.
Test case design techniques can be broadly split into two main categories. Black box techniques use the interface to
a unit and a description of functionality, but do not need to know how the inside of a unit is built. White box
techniques make use of information about how the inside of a unit works. There are also some other techniques
which do not fit into either of the above categories. Error guessing falls into this category.
The most important ingredients of any test design are experience and common sense. Test designers should not let
any of the given techniques obstruct the application of experience and common sense.
The selection of test case design techniques described in the following subsections is by no means exhaustive.
Further information on techniques for test case design can be found in "Software Testing Techniques", 2nd Edition, B.
Beizer, Van Nostrand Reinhold, New York, 1990.
C.1. Specification Derived Tests
As the name suggests, test cases are designed by walking through the relevant specifications. Each test case
should test one or more statements of specification. It is often practical to make the sequence of test cases
correspond to the sequence of statements in the specification for the unit under test. For example, consider the
specification for a function to calculate the square root of a real number, shown in figure 3.1.
There are three statements in this specification, which can be addressed by two test cases. Note that the use of
Print_Line conveys structural information in the specification.
Test Case 1: Input 4, Return 2
- Exercises the first statement in the specification
("When given an input of 0 or greater, the positive square
root of the input shall be returned.").
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using Print_Line.
- Exercises the remaining two statements in the specification (the return of 0 and the output of the error message for an illegal negative input).
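Expressed as executable tests, the two specification derived test cases could look like the following Python sketch. The square_root implementation and its print_line parameter are assumptions standing in for the specified unit and Print_Line; the paper's own examples use Ada and PDL.

    import math
    import unittest

    def square_root(x, print_line=print):
        # Plausible implementation of the specification in figure 3.1.
        if x < 0:
            print_line("Square root error - illegal negative input")
            return 0.0
        return math.sqrt(x)

    class TestSquareRootSpecification(unittest.TestCase):
        def test_case_1_valid_input(self):
            # Exercises the first statement of the specification.
            self.assertEqual(square_root(4), 2.0)

        def test_case_2_negative_input(self):
            # Exercises the error handling statements of the specification.
            messages = []
            self.assertEqual(square_root(-10, print_line=messages.append), 0.0)
            self.assertEqual(messages, ["Square root error - illegal negative input"])

    if __name__ == "__main__":
        unittest.main()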
Specification derived test cases can provide an excellent correspondence to the sequence of statements in the
specification for the unit under test, enhancing the readability and maintainability of the test specification. However,
specification derived testing is a positive test case design technique. Consequently, specification derived test cases
have to be supplemented by negative test cases in order to provide a thorough unit test specification.
A variation of specification derived testing is to apply a similar technique to a security analysis, safety analysis,
software hazard analysis, or other document which provides supplementary information to the unit's specification.
C.2. Equivalence Partitioning
Equivalence partitioning is a much more formalised method of test case design. It is based upon splitting the inputs
and outputs of the software under test into a number of partitions, where the behaviour of the software is equivalent
for any value within a particular partition. Data which forms partitions is not just routine parameters. Partitions can
also be present in data accessed by the software, in time, in input and output sequence, and in state.
Equivalence partitioning assumes that all values within any individual partition are equivalent for test purposes. Test
cases should therefore be designed to test one value in each partition. Consider again the square root function used
in the previous example. The square root function has two input partitions and two output partitions, as shown in
table 3.2.
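A sketch of one test value per input partition (which in passing also covers both output partitions) follows; the square_root stand-in is the same hypothetical Python implementation used in the earlier sketch.

    import math
    import unittest

    def square_root(x, print_line=print):
        # Hypothetical stand-in, as in the earlier sketch.
        if x < 0:
            print_line("Square root error - illegal negative input")
            return 0.0
        return math.sqrt(x)

    class TestSquareRootPartitions(unittest.TestCase):
        def test_less_than_zero_partition(self):
            # One representative value from the "less than zero" input partition.
            self.assertEqual(square_root(-7, print_line=lambda m: None), 0.0)

        def test_zero_or_greater_partition(self):
            # One representative value from the "zero or greater" input partition.
            self.assertEqual(square_root(9), 3.0)

    if __name__ == "__main__":
        unittest.main()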
C.3. Boundary Value Analysis
The zero or greater partition has a boundary at 0 and a boundary at the most positive real number. The less than
zero partition shares the boundary at 0 and has another boundary at the most negative real number. The output has
a boundary at 0, below which it cannot go.
Test Case 1: Input {the most negative real number}, Return 0, Output "Square root error - illegal negative input"
using Print_Line
- Exercises the lower boundary of partition (i).
Test Case 2: Input {just less than 0}, Return 0, Output "Square root error - illegal
negative input" using Print_Line
- Exercises the upper boundary of partition (i).
Test Case 3: Input 0, Return 0
- Exercises just outside the upper boundary of partition (i),
the lower boundary of partition (ii) and the lower boundary
of partition (a).
Test Case 4: Input {just greater than 0}, Return {the positive square root of the input}
- Exercises just inside the lower boundary of partition (ii).
Test Case 5: Input {the most positive real number}, Return {the positive square root of the input}
- Exercises the upper boundary of partition (ii) and the upper boundary of
partition (a).
As for equivalence partitioning, it can become impractical to use boundary value analysis thoroughly for more
complex software. Boundary value analysis can also be meaningless for non-scalar data, such as enumeration
values. In the example, partition (b) does not really have boundaries. For purists, boundary value analysis requires
knowledge of the underlying representation of the numbers. A more pragmatic approach is to use any small values
above and below each boundary and suitably big positive and negative numbers.
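Following that pragmatic advice, the five boundary value test cases above can be approximated with small offsets around zero and suitably large numbers, as in this Python sketch (the square_root stand-in is again hypothetical, and the specific offsets are arbitrary choices):

    import math
    import unittest

    def square_root(x, print_line=print):
        # Hypothetical stand-in, as in the earlier sketches.
        if x < 0:
            print_line("Square root error - illegal negative input")
            return 0.0
        return math.sqrt(x)

    class TestSquareRootBoundaries(unittest.TestCase):
        def test_very_negative_input(self):
            self.assertEqual(square_root(-1.0e300, print_line=lambda m: None), 0.0)

        def test_just_less_than_zero(self):
            self.assertEqual(square_root(-1.0e-12, print_line=lambda m: None), 0.0)

        def test_zero(self):
            self.assertEqual(square_root(0.0), 0.0)

        def test_just_greater_than_zero(self):
            # Expected outcome: a positive square root is returned.
            self.assertGreater(square_root(1.0e-12), 0.0)

        def test_very_large_input(self):
            result = square_root(1.0e300)
            self.assertTrue(math.isfinite(result) and result > 0.0)

    if __name__ == "__main__":
        unittest.main()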
C.4. State-Transition Testing
State transition testing is particularly useful where either the software has been designed as a state machine or the
software implements a requirement that has been modelled as a state machine. Test cases are designed to test the
transitions between states by creating the events which lead to transitions.
When used with illegal combinations of states and events, test cases for negative testing can be designed using this
approach. Testing state machines is addressed in detail by the IPL paper "Testing State Machines with AdaTEST
and Cantata".
C.5. Branch Testing
In branch testing, test cases are designed to exercise control flow branches or decision points in a unit. This is
usually aimed at achieving a target level of Decision Coverage. Given a functional specification for a unit, a "black
box" form of branch testing is to "guess" where branches may be coded and to design test cases to follow the
branches. However, branch testing is really a "white box" or structural test case design technique. Given a structural
specification for a unit, specifying the control flow within the unit, test cases can be designed to exercise branches.
Such a structural unit specification will typically include a flowchart or PDL.
Returning to the square root example, a test designer could assume that there would be a branch between the
processing of valid and invalid inputs, leading to the following test cases:
Test Case 1: Input 4, Return 2
- Exercises the valid input processing branch
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using Print_Line.
- Exercises the invalid input processing branch
However, there could be many different structural implementations of the square root function. The following
structural specifications are all valid implementations of the square root function, but the above test cases would
only achieve decision coverage of the first and third versions of the specification.
It can be seen that branch testing works best with a structural specification for the unit. A structural unit specification
will enable branch test cases to be designed to achieve decision coverage, but a purely functional unit specification
could lead to coverage gaps.
One thing to beware of is that by concentrating upon branches, a test designer could lose sight of the overall
functionality of a unit. It is important to always remember that it is the overall functionality of a unit that is important,
and that branch testing is a means to an end, not an end in itself. Another consideration is that branch testing is
based solely on the outcome of decisions. It makes no allowances for the complexity of the logic which leads to a
decision.
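The two branch test cases can be written against a stand-in implementation containing a single valid/invalid decision, as in the Python sketch below. The implementation is assumed, and the closing comment about measuring decision coverage with coverage.py in branch mode describes one possible tool choice rather than anything prescribed by the paper.

    import math
    import unittest

    def square_root(x, print_line=print):
        if x < 0:                        # the decision point under test
            print_line("Square root error - illegal negative input")
            return 0.0
        return math.sqrt(x)

    class TestSquareRootBranches(unittest.TestCase):
        def test_valid_input_branch(self):
            # Exercises the "false" outcome of the x < 0 decision.
            self.assertEqual(square_root(4), 2.0)

        def test_invalid_input_branch(self):
            # Exercises the "true" outcome of the x < 0 decision.
            self.assertEqual(square_root(-10, print_line=lambda m: None), 0.0)

    # Decision coverage of the module could then be measured, for example, with
    # "coverage run --branch -m unittest" followed by "coverage report".

    if __name__ == "__main__":
        unittest.main()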
C.6. Condition Testing
A range of test case design techniques fall under the general title of condition testing, all of which try to
overcome the weaknesses of branch testing when complex logical conditions are encountered. The object of
condition testing is to design test cases to show that the individual components of logical conditions and
combinations of the individual components are correct.
Test cases are designed to test the individual elements of logical expressions, both within branch conditions and
within other expressions in a unit. As for branch testing, condition testing could be used as a "black box" technique,
where the test designer makes intelligent guesses about the implementation of a functional specification for a unit.
However, condition testing is more suited to "white box" test design from a structural specification for a unit.
The test cases should be targeted at achieving a condition coverage metric, such as Modified Condition Decision
Coverage (available as Boolean Operand Effectiveness in AdaTEST). The IPL paper entitled "Structural Coverage
Metrics" provides more detail of condition coverage metrics.
To illustrate condition testing, consider the example specification for the square root function which uses successive
approximation (figure 3.3(d) - Specification 4). Suppose that the designer for the unit made a decision to limit the
algorithm to a maximum of 10 iterations, on the grounds that after 10 iterations the answer would be as close as it
would ever get. The PDL specification for the unit could specify an exit condition like that given in figure 3.4.
If the coverage objective is Modified Condition Decision Coverage, test cases have to prove that both error<desired
accuracy and iterations=10 can independently affect the outcome of the decision.
Test Case 1: 10 iterations, error>desired accuracy for all iterations.
- Both parts of the condition are false for the first 9
iterations. On the tenth iteration, the first part of the
condition is false and the second part becomes true,
showing that the iterations=10 part of the condition can
independently affect its outcome.
Test Case 2: 2 iterations, error>=desired accuracy for the first iteration, and
error<desired accuracy for the second iteration.
- Both parts of the condition are false for the first iteration.
On the second iteration, the first part of the condition
becomes true and the second part remains false, showing
that the error<desired accuracy part of the condition can
independently affect its outcome.
Condition testing works best when a structural specification for the unit is available. It provides a thorough test of
complex conditions, an area of frequent programming and design error and an area which is not addressed by
branch testing. As for branch testing, it is important for test designers to beware that concentrating on conditions
could distract them from the overall functionality of a unit.
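The exit condition of figure 3.4 and the two test cases above can be made concrete with a successive approximation stand-in, as in the following Python sketch. The shape of the condition and the 10-iteration limit follow the discussion above; the particular inputs (1.0e6 with a very tight accuracy, and 2.25 with an accuracy of 0.01) are assumptions chosen so that the loop terminates on the iteration limit and on the accuracy test respectively.

    import unittest

    def square_root_approx(x, desired_accuracy):
        # Successive approximation (Newton's method), limited to 10 iterations.
        guess = x / 2.0
        iterations = 0
        error = abs(guess * guess - x)
        while not (error < desired_accuracy or iterations == 10):
            guess = (guess + x / guess) / 2.0
            iterations += 1
            error = abs(guess * guess - x)
        return guess, iterations

    class TestExitCondition(unittest.TestCase):
        def test_case_1_exit_on_iteration_limit(self):
            # The error stays above the desired accuracy throughout, so only
            # iterations == 10 can terminate the loop.
            _, iterations = square_root_approx(1.0e6, desired_accuracy=1e-9)
            self.assertEqual(iterations, 10)

        def test_case_2_exit_on_accuracy(self):
            # The error drops below the desired accuracy on the second
            # iteration, terminating the loop well before the limit.
            guess, iterations = square_root_approx(2.25, desired_accuracy=0.01)
            self.assertEqual(iterations, 2)
            self.assertAlmostEqual(guess, 1.5, places=2)

    if __name__ == "__main__":
        unittest.main()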
C.7. Data Definition-Use Testing
Data definition-use testing designs test cases to test pairs of data definitions and uses. A data definition is anywhere
that the value of a data item is set, and a data use is anywhere that a data item is read or used. The objective is to
create test cases which will drive execution through paths between specific definitions and uses.
Like decision testing and condition testing, data definition-use testing can be used in combination with a functional
specification for a unit, but is better suited to use with a structural specification for a unit.
Consider one of the earlier PDL specifications for the square root function which sent every input to the maths co-processor and used the co-processor status to determine the validity of the result (Figure 3.3(c) - Specification 3).
The first step is to list the pairs of definitions and uses. In this specification there are a number of definition-use
pairs, as shown in table 3.3.
These pairs of definitions and uses can then be used to design test cases. Two test cases are required to test all six
of these definition-use pairs:
Test Case 1: Input 4, Return 2
- Tests definition-use pairs 1, 2, 5, 6
Test Case 2: Input -10, Return 0, Output "Square root error - illegal negative input" using Print_Line.
- Tests the remaining definition-use pairs (3 and 4).
The analysis needed to develop test cases using this design technique can also be useful for identifying problems
before the tests are even executed; for example, identification of situations where data is used without having been
defined. This is the sort of data flow analysis that some static analysis tools can help with. The analysis of data
definition-use pairs can become very complex, even for relatively simple units. Consider what the definition-use
pairs would be for the successive approximation version of square root!
It is possible to split data definition-use tests into two categories: uses which affect control flow (predicate uses) and
uses which are purely computational. Refer to "Software Testing Techniques", 2nd Edition, B. Beizer, Van Nostrand
Reinhold, New York, 1990, for a more detailed description of predicate and computational uses.
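To make the pairs concrete, the Python sketch below mimics the co-processor version of the unit with a hypothetical coprocessor_sqrt helper; the comments mark where status and result are defined and used, and the two test cases drive the paths between those definitions and uses. The helper, the variable names and the grouping of pairs are illustrative assumptions rather than a reproduction of table 3.3.

    import math
    import unittest

    def coprocessor_sqrt(x):
        # Hypothetical stand-in for the maths co-processor: returns (status, value).
        if x < 0:
            return "error", 0.0
        return "ok", math.sqrt(x)

    def square_root(x, print_line=print):
        status, result = coprocessor_sqrt(x)     # definitions of status and result
        if status != "ok":                       # predicate use of status
            print_line("Square root error - illegal negative input")
            result = 0.0                         # redefinition of result
        return result                            # computational use of result

    class TestDefinitionUsePairs(unittest.TestCase):
        def test_valid_path(self):
            # Drives the path from the co-processor definition of result to its
            # use at the return statement.
            self.assertEqual(square_root(4), 2.0)

        def test_error_path(self):
            # Drives the path through the error-handling redefinition of result.
            messages = []
            self.assertEqual(square_root(-10, print_line=messages.append), 0.0)
            self.assertEqual(messages, ["Square root error - illegal negative input"])

    if __name__ == "__main__":
        unittest.main()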
C.8. Internal Boundary Value Testing
In many cases, partitions and their boundaries can be identified from a functional
specification for a unit, as described under equivalence partitioning and boundary value analysis above. However, a
unit may also have internal boundary values which can only be identified from a structural specification. Consider a
fragment of the successive approximation version of the square root unit specification, as shown in figure 3.5
( derived from figure 3.3(d) - Specification 4).
The calculated error can be in one of two partitions about the desired accuracy, a feature of the structural design for
the unit which is not apparent from a purely functional specification. An analysis of internal boundary values yields
three conditions for which test cases need to be designed.
Test Case 1: Error just greater than the desired accuracy
Test Case 2: Error equal to the desired accuracy
Test Case 3: Error just less than the desired accuracy
Internal boundary value testing can help to bring out some elusive bugs. For example, suppose "<=" had been
coded instead of the specified "<". Nevertheless, internal boundary value testing is a luxury to be applied only as a
final supplement to other test case design techniques.
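The three internal boundary test cases can be expressed directly against the exit test, as in this Python sketch; the accuracy_reached helper and the chosen accuracy value are hypothetical stand-ins for the specified comparison.

    import unittest

    def accuracy_reached(error, desired_accuracy):
        # The specified test uses "<"; an accidental "<=" here is exactly the
        # kind of slip that internal boundary value testing catches.
        return error < desired_accuracy

    class TestInternalBoundary(unittest.TestCase):
        DESIRED = 0.001

        def test_error_just_greater_than_accuracy(self):
            self.assertFalse(accuracy_reached(self.DESIRED + 1e-9, self.DESIRED))

        def test_error_equal_to_accuracy(self):
            # The case that distinguishes "<" from "<=".
            self.assertFalse(accuracy_reached(self.DESIRED, self.DESIRED))

        def test_error_just_less_than_accuracy(self):
            self.assertTrue(accuracy_reached(self.DESIRED - 1e-9, self.DESIRED))

    if __name__ == "__main__":
        unittest.main()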
C.9. Error Guessing
Error guessing is based mostly upon experience, with some assistance from other techniques such as boundary
value analysis. Based on experience, the test designer guesses the types of errors that could occur in a particular
type of software and designs test cases to uncover them. For example, if any type of resource is allocated
dynamically, a good place to look for errors is in the deallocation of resources. Are all resources correctly
deallocated, or are some lost as the software executes?
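For instance, a guess about resource deallocation might be encoded as the following tests; the ResourcePool and process_batch names are hypothetical, invented purely to illustrate the guess.

    import unittest

    class ResourcePool:
        # Hypothetical allocator that tracks outstanding handles.
        def __init__(self):
            self.open_handles = set()
            self._next = 0

        def acquire(self):
            self._next += 1
            self.open_handles.add(self._next)
            return self._next

        def release(self, handle):
            self.open_handles.discard(handle)

    def process_batch(pool, items):
        # Hypothetical unit under test: acquires one handle per item.
        for item in items:
            handle = pool.acquire()
            try:
                _ = item * 2                 # stand-in for the real work
            finally:
                pool.release(handle)

    class TestResourceRelease(unittest.TestCase):
        def test_all_resources_released(self):
            pool = ResourcePool()
            process_batch(pool, [1, 2, 3])
            self.assertEqual(pool.open_handles, set())

        def test_resources_released_when_work_fails(self):
            # Error guess: resources may leak when the work raises an exception.
            pool = ResourcePool()
            with self.assertRaises(TypeError):
                process_batch(pool, [1, None, 3])
            self.assertEqual(pool.open_handles, set())

    if __name__ == "__main__":
        unittest.main()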
Error guessing by an experienced engineer is probably the single most effective method of designing tests which
uncover bugs. A well placed error guess can show a bug which could easily be missed by many of the other test
case design techniques presented in this paper. Conversely, in the wrong hands error guessing can be a waste of
time.
To make the maximum use of available experience and to add some structure to this test case design technique, it
is a good idea to build a check list of types of errors. This check list can then be used to help "guess" where errors
may occur within a unit. The check list should be maintained with the benefit of experience gained in earlier unit
tests, helping to improve the overall effectiveness of error guessing.
D. Conclusion
Experience has shown that a conscientious approach to unit testing will detect many bugs at a stage of the software
development where they can be corrected economically. A rigorous approach to unit testing requires:
That the expected outcomes of unit test cases are specified in the unit test
specification.
The process for developing unit test specifications presented in this paper is generic, in that it can be applied to any
level of testing. Nevertheless, there will be circumstances where it has to be tailored to specific situations. Tailoring
of the process and the use of test case design techniques should be documented in the overall test strategy.
48. LITERATURE REVIEW
2.1 Introduction
The purpose of this dissertation is to increase understanding of how experienced practitioners as individuals
evaluate diagrammatic models in Formal Technical Review (FTR). In this research, those aspects of FTR relating to
evaluation of an artifact by practitioners as individuals are referred to as Practitioner Evaluation (PE). The relevant
FTR literature is reviewed for theory and research applicable to PE. However, FTR developed pragmatically without
relation to underlying cognitive theory, and the literature consists primarily of case studies with a very limited
number of controlled experiments.
Other work on the evaluation of diagrams and graphs is also reviewed for possible theoretical models that could be
used in the current research. Human-Computer Interaction (HCI) is an Information Systems area that has drawn
extensively on cognitive science to develop and evaluate Graphical User Interfaces (GUIs). A brief overview of
cognitive-based approaches utilized in HCI is presented. One of these approaches, the Human Information
Processing System model, in which the human mind is treated as an information-processing system, provides the
cognitive theoretical model for this research and is discussed separately because of its importance. Work on
attention and the comprehension of graphics is also briefly reviewed.
Two further areas are identified as necessary for the development of the research task and tools: (1) types of
diagrammatic models and (2) types of software defects. Relevant work in each of these areas is briefly reviewed
and, since typologies appropriate to this research were not located, appropriate typologies are developed.
2.2 Formal Technical Review
Software review as a technique to detect software defects is not new -- it has been used since the earliest days of
programming. For example, Babbage and von Neumann regularly asked colleagues to examine their programs
[Freedman and Weinberg 1990], and in the 1950s and 1960s, large software projects often included some type of
software review [Knight and Myers 1993]. However, the first significant formalization of software review practice is
generally considered to be the development by Michael Fagan [1976] of a species of FTR that he called
"inspection."
Following Tjahjono [1996, 2], Formal Technical Review may be defined as any "evaluation technique that involves
the bringing together of a group of technical [and sometimes non-technical] personnel to analyze a software artifact,
typically with the goal of discovering errors or other anomalies." As such, FTR has the following distinguishing
characteristics:
1. Formal process.
2. Use of groups or teams. Most FTR techniques involve real groups, but nominal groups are used as well.
3. Review by knowledgeable individuals or practitioners.
4. Focus on detection of defects.
1. Desk Checking. Because desk checking is an individual process not involving group dynamics, research in this
area would be relevant, but none applicable to the current research was found.
It should be noted that Humphrey [1995] has developed a review method, called Personal Review (PR), which is
similar to desk checking. In PR, each programmer examines his own products to find as many defects as possible
utilizing a disciplined process in conjunction with Humphrey's Personal Software Process (PSP) to improve his own
work. The review strategy includes the use of checklists to guide the review process, review metrics to improve the
process, and defect causal analysis to prevent the same defects from recurring in the future. The approach taken in
developing the Personal Review process is an engineering one; no reference is made in Humphrey [1995] to
cognitive theory.
2. Peer Rating is a technique in which anonymous programs are evaluated in terms of their overall quality,
maintainability, extensibility, usability and clarity by selected programmers who have similar backgrounds [Myers
1979]. Shneiderman [1980] suggests that peer ratings of programs are productive, enjoyable, and nonthreatening experiences. The technique is often referred to as Peer Reviews [Shneiderman 1980], but some
authors use the term peer reviews for generic review methods involving peers [Paulk et al 1993; Humphrey
1989].
3. Walkthroughs are presentation reviews in which a review participant, usually the software author, narrates a
description of the software and the other members of the review group provide feedback throughout the
presentation [Freedman and Weinberg 1990; Gilb and Graham 1993]. It should be noted that the term
"walkthrough" has been used in the literature variously. Some authors unite it with "structured" and treat it as a
disciplined, formal review process [Myers 1979; Yourdon 1989; Adrion et al. 1982]. However, the literature
generally describes walkthrough as an undisciplined process without advance preparation on the part of
reviewers and with the meeting focus on education of participants [Fagan 1976].
4. Round-robin Review is an evaluation process in which a copy of the review materials is made available and
routed to each participant; the reviewers write their comments and questions concerning the materials and
pass the materials on to another reviewer, eventually returning them to the moderator or author [Hart 1982].
5. Inspection was developed by Fagan [1976, 1986] as a well-planned and well-defined group review process to
detect software defects; defect repair occurs outside the scope of the process. The original Fagan Inspection
(FI) is the most cited review method in the literature and is the source for a variety of similar inspection
techniques [Tjahjono 1996]. Among the FI-derived techniques are Active Design Review [Parnas and Weiss
1987], Phased Inspection [Knight and Myers 1993], N-Fold Inspection [Schneider et al. 1992], and FTArm
[Tjahjono 1996]. Unlike the review techniques previously discussed, inspection is often used to control the
quality and productivity of the development process.
A Fagan Inspection consists of six well-defined phases:
i. Planning. Participants are selected and the materials to be reviewed are prepared and checked for review
suitability.
ii. Overview. The author educates the participants about the review materials through a presentation.
iii. Preparation. The participants learn the materials individually.
iv. Meeting. The reader (a participant other than the author) narrates or paraphrases the review materials
statement by statement, and the other participants raise issues and questions. Questions continue on a
point only until an error is recognized or the item is deemed correct.
v. Rework. The author fixes the defects identified in the meeting.
vi. Follow-up. The "corrected" products are reinspected.
A. Small vs. Large Group Reviews. A larger group brings more
individuals inspecting the artifact, generally involves greater scheduling problems [Ballman and Votta 1994],
and may make it more difficult for all participants to participate fully.
B. No vs. Single vs. Multiple Session Reviews. The traditional Fagan Inspection provided for one session to
inspect the software artifact, with the possibility of a follow-up session to inspect corrections. However,
variants have been suggested.
Humphrey [1989] comments that three-quarters of the errors found in well-run inspections are found during
preparation. Based on an economic analysis of a series of inspections at AT&T, Votta [1993] argues that
inspection meetings are generally not economic and should be replaced with depositions, where the author
and (optionally) the moderator meet separately with inspectors to collect their results.
On the other hand, some authors [Knight and Myers 1993; Schneider et al. 1992] have argued for multiple
sessions, conducted either in series or parallel. Gilb and Graham [1993] do not use multiple inspection
sessions but add a root cause analysis session immediately after the inspection meeting.
C. Nonsystematic vs. Systematic Defect-Detection Technique Reviews. The most frequently used
detection methods (ad hoc and checklist) rely on nonsystematic techniques, and reviewer responsibilities are
general and not differentiated for single session reviews [Siy 1996]. However, some methods employ more
prescriptive techniques, such as questionnaires [Parnas and Weiss 1987] and correctness proofs [Britcher
1988].
D. Single Site vs. Multiple Site Reviews. The traditional FTR techniques have assumed that the group-meeting
component would occur face-to-face at a single site. However, with improved telecommunications, and
especially with computer support (see item F below), it has become increasingly feasible to conduct even the
group meeting from multiple sites.
E. Synchronous vs. Asynchronous Reviews. The traditional FTR techniques have also assumed that the
group meeting component would occur in real-time; i.e., synchronously. However, some newer techniques
that eliminate the group meeting or are based on computer support utilize asynchronous reviews.
F. Manual vs. Computer-supported Reviews. In recent years, several computer supported review systems
have been developed [Brothers et al. 1990; Johnson and Tjahjono 1993; Gintell et al. 1993; Mashayekhi et
al 1994]. The type of support varies from simple augmentation of the manual practices [Brothers et al. 1990;
Gintell et al. 1993] to totally new review methods [Johnson and Tjahjono 1993].
2.2.2 Economic Analyses of Formal Technical Review
Wheeler et al. [1996], after reviewing a number of studies that support the economic benefit of FTR, conclude that
inspections reduce the number of defects throughout development, cause defects to be found earlier in the
development process where they are less expensive to correct, and uncover defects that would be difficult or
impossible to discover by testing. They also note "these benefits are not without their costs, however. Inspections
require an investment of approximately 15 percent of the total development cost early in the process [p. 11]."
In discussing overall economic effects, Wheeler et al. cite Fagan [1986] to the effect that investment in inspections
has been reported to yield a 25-to-35 percent overall increase in productivity. They also reproduce a graphical
analysis from Boehm [1987] that indicates inspections reduce total development cost by approximately 30%.
The Wheeler et al. [1996] analysis does not specify the relative value of Practitioner Evaluation to FTR, but two
recent economic analyses provide indications.
Votta [1993]. After analyzing data collected from 13 traditional inspections conducted at AT&T, Votta reports that
the approximately 4% increase in faults found at collection meetings (synergy) does not economically justify the
development delays caused by the need to schedule meetings and the additional developer time associated
with the actual meetings. He also argues that it is not cost-effective to use the collection meeting to reduce the
number of items incorrectly identified as defective prior to the meeting ("false positives"). Based on these
findings, he concludes that almost all inspection meetings requiring all reviewers to be present should be
replaced with Depositions, which are three-person meetings with only the author, moderator, and one reviewer
present.
Siy [1996]. In his analysis of the factors driving inspection costs and benefits, Siy reports that changes in FTR
structural elements, such as group size, number of sessions, and coordination of multiple sessions, were largely
ineffective in improving the effectiveness of inspections. Instead, inputs into the process (reviewers and code
units) accounted for more outcome variation than structural factors. He concludes by stating "better techniques
by which reviewers detect defects, not better process structures, are the key to improving inspection
effectiveness [Abstract, p. 2]." (emphasis added)
Votta's analysis effectively attributes most of the economic benefit of FTR to PE, and Siy's explicitly states that
better PE techniques "are the key to improving inspection effectiveness." These findings, if supported by additional
research, would further support the contention that a better understanding of Practitioner Evaluation is necessary.
2.2.3 Psychological Aspects of FTR
Work on the psychological aspects of FTR can be categorized into four groups.
1. Egoless Programming. Gerald Weinberg [1971] began the examination of psychological issues associated with
software review in his work on egoless programming. According to Weinberg, programmers are often reluctant
to allow their programs to be read by other programmers because the programs are often considered to be an
extension of the self and errors discovered in the programs to be a challenge to one's self-image. Two
implications of this theory are as follows:
i. The ability of a programmer to find errors in his own work tends to be impaired since he tends to justify his
own actions, and it is therefore more effective to have other people check his work.
ii. Each programmer should detach himself from his own work. The work should be considered a public
property where other people can freely criticize, and thus, improve its quality; otherwise, one tends to
become defensive, and reluctant to expose one's own failures.
These two concepts have led to the justification of FTR groups, as well as the establishment of independent
quality assurance groups that specialize in finding software defects in many software organizations [Humphrey
1989].
2. Role of Management. Another psychological aspect of FTR that has been examined is the recording of data
and its dissemination to management. According to Dobbins [1987], this must be done in such a way that
individual programmers will not feel intimidated or threatened.
3. Positive Psychological Impacts. Hart [1982] observes that reviews can make one more careful in writing
programs (e.g., double checking code) in anticipation of having to present or share the programs with other
participants. Thus, errors are often eliminated even before the actual review sessions.
4. Group Process. Most FTR methods are implemented using small groups. Therefore, several key issues from
small group theory apply to FTR, such as groupthink (tendency to suppress dissent in the interests of group
harmony), group deviants (influence by minority), and domination of the group by a single member. Other key
issues include social facilitation (presence of others boosts one's performance) and social loafing (one member free
rides on the group's effort) [Myers 1990]. The issue of moderator domination in inspections is also documented in
the literature [Tjahjono 1996].
Perhaps the most interesting research from the perspective of the current study is that of Sauer et al. [2000].
This research is unusual in that it has an explicit theoretical basis and outlines a behaviorally motivated program
of research into the effectiveness of software development technical reviews. The finding that most of the
variation in effectiveness of software development technical reviews is the result of variations in expertise
among the participants provides additional motivation for developing a solid understanding of Formal Technical
Review at the individual level.
It should be noted that all of this work, while based on psychological theory, does not address the issue of how
practitioners actually evaluate software artifacts.
The language, concepts, and purposes of HCI are very similar to those of information systems, and it is
arguable that HCI is a part of information systems. (See, for example, the Huber [1983] and Robey [1983]
debate on cognitive style and DSS design.)
HCI is solidly rooted in psychology, a traditional information systems reference discipline.
Computer user-interfaces almost always have a visual component and are increasingly diagrammatic in
design.
User-interfaces can be and are evaluated in terms of the semantic error criteria described above; i.e.,
defects in functionality, performance, efficiency, etc.
Based on these facts, a decision was made to attempt to identify an HCI evaluation technique that could be
adapted for evaluation of software diagrammatic models.
2.3.2 Human-Computer Interaction
Human-computer interaction (HCI) has been defined as "the processes, dialogues . . . and actions that a user
employs to interact with a computer environment [Baecker and Buxton 1987, 40]."
2.3.2.1 HCI Evaluation Techniques
Mack and Nielsen [1994] identify eight usability inspection techniques:
1. Heuristic Evaluation. Heuristic evaluation is an informal method that involves having usability specialists judge
whether each dialogue element conforms to established usability principles or heuristics. Nielsen, the author of
the technique, recommends that evaluators go through the interface twice and notes that "[t]his two-pass
approach is similar in nature to the phased inspection method for code inspection (Knight and Myers 1993)
[Nielsen 1994, 29]."
2. Guideline Reviews. Guideline reviews are inspections where an interface is checked for conformance with a
comprehensive list of guidelines. Nielsen and Mack note that "since guideline documents contain on the order
of 1,000 guidelines, guideline reviews require a high degree of expertise and are fairly rare in practice [Nielsen
and Mack 1994, 5]."
3. Pluralistic Walkthroughs. A pluralistic walkthrough is a meeting in which users, developers, and human factors
experts step through a scenario, discussing usability issues associated with dialogue elements involved in the
scenario steps.
4. Consistency Inspections. Consistency inspections have designers representing multiple projects inspect an
interface to see whether it is consistent with other interfaces in the "family" of products.
5. Standards Inspections. In a standards inspection, an expert on some interface standard checks the interface
for compliance with that standard.
6. Cognitive Walkthroughs. Cognitive walkthroughs use an explicitly detailed procedure to simulate a user's
problem-solving process at each step in the human-computer dialog, checking to see if the simulated user's
goals and memory for actions can be assumed to lead to the next correct action.
7. Formal Usability Inspections. Formal usability inspections are designed to be very similar to the Fagan
Inspection used in code reviews.
8. Feature Inspections. In feature inspections the focus is on the functionality provided by the software system
being inspected; i.e., whether the function as designed meets the needs of the intended end users.
These HCI evaluation techniques are clearly similar to FTR in that they involve the use of knowledgeable individuals
to detect defects in a software artifact; most also involve a formal process and a group.
2.3.2.2 Cognitive Psychology and HCI
To assist in the design of better dialogues, HCI researchers have attempted to apply the findings of cognitive
psychology since, all other factors being equal, an interface that requires less short-term memory resources or can
be manipulated more quickly because fewer cognitive steps are required should be superior. The following is a brief
overview of cognitive-based approaches utilized in HCI.
Human Information Processing System (HIPS). During the 1960s and 1970s, the main paradigm in cognitive
psychology was to characterize humans as information processors that processed information much like a
computer. While some of the assumptions of the original model proved to be overly restrictive and other
approaches have become popular, updated HIPS models continue to be useful for HCI research. Given the
importance of this model for this research, a more complete treatment is provided in Section 2.4.1 below.
Computational approaches also adopt the computer metaphor as a theoretical framework but conceptualize
the cognitive system in terms of the goals, planning, and action involved in task performance. Tasks are
analyzed not in terms of the amount of information processed in the various stages but in terms of how the
system deals with new information [Preece et al. 1994].
Connectionist approaches simulate behavior through neural network or Parallel Distributed Processing (PDP)
models in which cognition is represented as a web of interconnected nodes. Connectionist models have
become increasingly accepted in cognitive psychology [Ashcraft 1994], and this fact has been reflected in HCI
research [Preece et al. 1994].
Human Factors/Actors. Bannon [1991, 28] argues that the term human factors should be replaced with the
term human actors to indicate "emphasis is placed on the person as an autonomous agent that has the capacity
to regulate and coordinate his or her behavior, rather than being a simple passive element in a human-machine
system." The change is supposed to facilitate focusing on the way people act in real work settings instead of
viewing them as information processors.
Distributed Cognition. An emerging theoretical framework is distributed cognition. The goal of distributed
cognition is to conceptualize cognitive activities as embodied and situated within the work context in which they
occur [Hutchins 1990; Hutchins and Klausen 1992].
The human factors/actors and distributed cognition models are not appropriate to the current study. The
connectionist models show great promise but are not yet sufficiently developed to be useful for this research. The
information processor models are, however, appropriate and sufficiently mature; they provide the primary cognitive
theoretical base for the dissertation. Computational approaches are also utilized in that the study analyzes the
cognitive system in terms of the task planning involved in task performance.
2.4 Human Information Processing System (HIPS) Models and Related Topics
2.4.1 General Model
One of the major paradigms in cognitive science is the Human Information Processing System model. In this model,
humans are characterized as information processors, in which information enters the mind, is processed in a series
of ordered stages, and then exits [Preece et al. 1994]. Figure 2.1 summarizes one version of the basic model
[Barber 1988].
Figure 2.1 Human Information Processing Stages (adapted from Barber [1988])
An early attempt to apply the model was Card et al.'s The Psychology of Human-Computer Interaction [1983]. In
that work, the authors stated that the human mind is also an information-processing system and developed a
simplified model of it that they called the Model Human Processor. Based on this model, they made predictions
about the usability of various user interfaces, performed experiments, and reported their findings. The results were
equivocal, and subsequent cognitive psychology research has shown that the serial stage approach to cognition of
the original model is overly simplistic.
The original model also did not include memory and attention. Later versions do include these processes, and
Cowan [1995], in his exhaustive examination of the intersection of memory and attention, discusses a number of
these. Figure 2.2 summarizes a model that does include memory and attention [Barber 1988].
Figure 2.2 Extended Stages of the Information Processing Model (adapted from Barber [1988])
HIPS models, such as Anderson's ACT-R [1993], continue to be developed and are useful. Further, the information
processing approach has recently been described as the primary metatheory of cognitive psychology [Ashcraft
1994].
2.4.2 Coping with Attention as a Limited Resource
One of the earliest psychological definitions of attention is that of William James [1890, vol. 1, 403-404]:
Everyone knows what attention is. It is the taking possession of the mind, in clear and vivid form, of one
out of what seem several simultaneously possible objects or trains of thought. Focalization,
concentration of consciousness are of its essence. It implies withdrawal from some things in order to
deal more effectively with others . . . (emphasis added)
This appeal to intuition explicitly states that attention is a limited resource.
In reaction to the introspection methodology of James, the Behaviorist movement asserted that the study of internal
representations and processes was unscientific. Since behaviorists dominated American psychological thought
during the first half of the Twentieth Century, little or no work was done on attention in America during this period. In
Europe, Gestalt psychology became dominant at this time and that school, while not actively hostile to attention
studies, did not encourage work in the area. World War II however led to a rethinking of psychological approaches
and acceptance of using the experimental techniques developed by the behaviorists to study internal states and
processes [Cowan 1995].
An example of this rethinking is the work of Broadbent [1952] and Cherry [1953]. They used a technique to study
attention in which different spoken messages are presented to a subject's two ears at the same time. Their research
shows that subjects are able to attend to one message if the messages are distinguished by physical (rather than
merely semantic) cues, but recall almost nothing of the nonattended channel. In 1956, Miller reviewed a series of
experiments that utilized a different methodology and noted that, across many domains, subjects could keep in mind
no more than about seven "chunks" simultaneously. These findings were among the first experimental evidence that
attentional capacity is a limited resource.
More recent experimental work continues to indicate that attention is a limited resource [Cowan 1995]. Even
those cognitive psychologists who have recently challenged the very concept of attention assume their
"attention" analog is limited. One example of this would be Allport [1980] and Wickens [1984], who argue that
the concept of attention should be replaced with the concept of multiple limited processing resources.
Based on an examination of the exhaustive review by Cowan [1995] of the intersection of memory and attention, the
Shiffrin [1988, 739] definition appears to be representative of contemporary thought:
Attention has been used to refer to all those aspects of human cognition that the subject can control . . .
and to all aspects of cognition having to do with limited resources or capacity, and methods of dealing
with such constraints. (emphasis added)
Since human cognitive resources are limited, cognitively complex tasks may overload these resources and
decrease the quality and/or quantity of outputs. Various approaches to measuring the cognitive complexity of tasks
have been developed. In HCI, an informal view of complexity is often utilized. For example, Grant [1990, sec. 1.3]
defines a complex task as one for which there are a large number of potential practical strategies. This definition is not
inconsistent with the measure assumed by Simon [1962] in his paper on the use of hierarchical decomposition to
decrease the complexity of problem solving.
Simon [1990] argues that humans develop mechanisms to enable them to deal with complex, real-life situations
despite their limited cognitive resources. One such mechanism is task planning. According to Fredericksen and
Breuleaux [1990], task planning is a cognitive bargain in which the time and effort spent working with an abstract,
and therefore, smaller problem space during planning minimizes actual work on the task in the original, detailed
problem space.
Earley and Perry [1987, 279] define a task plan as "a cognitively based routine for attaining a particular objective
and consists of multiple steps." Newell and Simon [1972] identify planning from verbal protocols as those passages
in which:
Diagrams can group together all information that is used together, thus avoiding large amounts of search for the
elements needed to make a problem-solving inference.
Diagrams typically use location to group information about a single element, avoiding the need to match
symbolic labels.
Diagrams automatically support a large number of perceptual inferences, which are extremely easy for humans.
Figure 2.3. Winn [1994] Processes Involved in the Perception and Comprehension of Graphics
Zhang [1997] proposes a theoretical framework for external representation based problem solving. In an experiment
she conducted using a Tic-Tac-Toe board and its logical isomorphs, the results show that Tic-Tac-Toe behavior is
determined by the configuration of the board. External representations are thus shown to be more than just memory
aids and a representational determinism is suggested. This last point is particularly relevant to this dissertation since
it states that the form of representation determines what information can be perceived in a diagram.
2.6 Types of Diagrammatic Models
Selection of diagrammatic models to be included in the research task requires an appropriate typology. Two
diagrammatic model typologies were examined, Wieringa [1998] and Visible Systems [1999].
2.6.1 Wieringa 1998
Wieringa, in his discussion of graphical structures or models that may be used in software specification techniques,
lists four general classes:
1. Decomposition Specification Techniques. These represent the conceptual structure of data in a database
system. Examples include Entity-Relationship Diagrams (ERDs) and such ERD extensions as OO class
diagrams.
2. Communication Specification Techniques. These show how the conceptual components interact to realize
external system interactions. Examples include Dataflow Diagrams (DFDs), Context Diagrams, SADT Activity
Diagrams, Object Communication Diagrams, SDL Block Diagrams, Sequence Diagrams, and Collaboration
Diagrams.
3. Function Specification Techniques. These specify the external functions of a system or the functions of
system components. Examples include Function Refinement Trees, Event-Response Specifications, and Use Case
Diagrams.
4. Behavior Specification Techniques. These show how functions of a system or its components are ordered in
time. Examples include Process Graphs, JSD Process Structure Diagrams, Finite (and Extended Finite) State
Diagrams, Mealy Machines, Moore Machines, Statecharts, and Process Dependency Diagrams.
2.6.2 Visible Systems
The methods listing in Visible Systems [1999] was examined as representative of practitioner-oriented, CASE-tool-based typologies. Seven models are listed; of these, six are diagrammatic in nature.
1. Functional Decomposition Model. Shows the business functions and the processes they support drawn in a
hierarchical structure; also known as the Business Model. This type of model is of a high-level functional nature
and specifically applies to functions and not to the data that those functions use. It is generally appropriate for
defining the overall functioning of an enterprise, not for individual projects.
2. Data Model. Shows the data entities of an application and the relationships between the entities. Entities and
relationships can be selected in subsets to produce views of the data model. The diagramming technique
normally used to depict graphically the data model is the Entity Relationship Diagram (ERD) and the model is
sometimes referred to as the Entity-Relationship Model.
3. Process Model. Shows how things occur in the organization via a sequence of processes, actions, stores,
inputs and outputs. Processes are decomposed into more detail, producing a layered hierarchical structure. The
diagramming technique used for process modeling in structured analysis is the Data Flow Diagram (DFD).
Several notations are available for representing process modeling, with the most widely used being
Yourdon/DeMarco and Gane & Sarson.
4. Product Model. Shows a hierarchical, top-down design map of how the application is to be programmed, built,
integrated, and tested. The modeling technique used in structured design is the structure chart. It is a tree or
hierarchical diagram that defines the overall architecture of a program or system by showing the program
modules and their interrelationships.
5. State Transition Model (Real Time Model). Shows how objects transition to and from various states or
conditions and the events or triggers that cause them to change between the different states.
6. Object Class Model. Shows classes of objects, subclasses, aggregations and inheritance and defines
structures and packaging of data for an application.
2.6.3 Evaluation of Typologies in Prior Work
In evaluating these two typologies for this research, two problems were noted:
1. Neither classification scheme includes diagrammatic representations of Graphical User Interfaces (GUIs). While
such representations are not technically graphs (and thus not discussed by Wieringa) and are not listed in Visible
Systems, they may be used to specify parts of a system and are therefore appropriate to this research.
2. Wieringa's work is based on the theoretical characteristics of graphs while Visible Analyst is representative of
practitioner-oriented, CASE-tool-based typologies. Neither is appropriate to the research of this dissertation
since neither captures factors likely to affect the cognitive processing of practitioners in evaluating software
diagrammatic models.
While it would be relatively easy to add diagrammatic representations of GUIs to Wieringa or Visible Analyst, it was
concluded that the second problem disqualified them for the purposes of this research. Further review of several
leading systems analysis and design texts [Fertuck 1995; Hoffer et al. 1998; Kendall and Kendall 1995] did not yield
an appropriate typology of diagrammatic models, and it was therefore deemed necessary to develop one
specifically for this dissertation.
2.6.4 Diagrammatic Model Typology Development
The first step in the development process was to consult several systems analysis and design and structured
techniques texts for classification insights and to derive lists of commonly used diagrammatic models. These
included Fertuck [1995], Hoffer et al. [1998], Kendall and Kendall [1995], and Martin and McClure [1985].
Martin and McClure make a major distinction between hierarchical diagrams (i.e., those having one overall node or
root and which do not remerge) and mesh or network diagrams (i.e., those not having a single overall node or root
or which do remerge). For the purposes of this research, this distinction is operationalized as the categorical
variable hierarchical/not hierarchical.
The diagram types categorized in table 2.1 are: Functional Decomposition I and II, Data Flow, Structure Charts, Data Navigation, Data Analysis, Flow Charts, HIPO (Overview, VTC and Detail), Warnier-Orr (Data and Process), Michael Jackson Data-Structure, System Network and Program-Structure, Nassi-Shneiderman Charts, Action I and II, Inverted-L, Entity-Relationship, UML Sequence, UML Collaboration, UML State, UML Activity, UML Use Case, UML Class, and a typical GUI.
Martin and McClure also make a major distinction between diagrams showing sequence and those that do not.
Sequence usually implies temporal directionality; for this dissertation, the distinction is broadened to include the
possibility of logical and other forms of directionality and is operationalized as the categorical variable directional/not
directional.
A distinction found in all texts referenced is between data-oriented and process-oriented diagrams. Inspection of
diagram types shows that the distinction is actually a data/process orientation continuum. For the purposes of this
dissertation, this continuum is collapsed into the categorical variable data/hybrid/process oriented.
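Taken together, the three categorical variables define a 2 x 2 x 3 grid of twelve possible categories. The sketch
below (Python) illustrates the scheme; the few category assignments shown are hypothetical placeholders rather
than the dissertation's actual categorization, which is given in table 2.1.

    # A minimal sketch of the 2 x 2 x 3 classification grid; the diagram-type
    # assignments below are hypothetical placeholders, not the categorization
    # reported in table 2.1.
    from itertools import product

    HIERARCHY = ("hierarchical", "not hierarchical")
    DIRECTION = ("directional", "not directional")
    ORIENTATION = ("data", "hybrid", "process")

    diagram_types = {
        "Warnier-Orr (Data)":  ("hierarchical", "directional", "data"),
        "Data Flow":           ("hierarchical", "directional", "hybrid"),
        "Entity-Relationship": ("not hierarchical", "not directional", "data"),
        "Typical GUI":         ("not hierarchical", "not directional", "hybrid"),
    }

    # Build every possible category, place each diagram type in its cell, and
    # keep only the populated cells (the collapse shown later in table 2.2).
    grid = {cell: [] for cell in product(HIERARCHY, DIRECTION, ORIENTATION)}
    for name, cell in diagram_types.items():
        grid[cell].append(name)
    populated = {cell: names for cell, names in grid.items() if names}

    print(f"{len(populated)} of {len(grid)} categories populated")
    for cell, names in sorted(populated.items()):
        print(" x ".join(cell), "->", ", ".join(names))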
As a test of the feasibility of the classification scheme, twenty diagram types from Martin and McClure, UML
diagrams from Harmon and Watson [1998], and a model of a "typical" GUI were then categorized. The results of this
categorization are shown in table 2.1.
Table 2.1 Diagrammatic Model Types
[Table 2.1 arranges the diagram types in a 2 x 2 x 3 matrix of categories numbered I through XII (hierarchical/not
hierarchical by directional/not directional by data/hybrid/process orientation). The diagram types categorized are:
Functional Decomposition I and II, Structure Charts, Data Flow, Data Navigation, Data Analysis, Flow Charts, HIPO
(Overview, Detail, and VTC), Warnier-Orr (Data and Process), Michael Jackson Data Structure, System Network,
and Program Structure, Nassi-Shneiderman Charts, Action I and II, Inverted-L, Entity-Relationship, the UML
Sequence, Collaboration, State, Activity, Use Case, and Class diagrams, and a typical GUI.]
Inspection of table 2.1 shows that only seven of the twelve (2 x 2 x 3) possible categories are actually populated.
Table 2.2 shows the categorization of the diagram types after collapsing unpopulated categories.
Table 2.2 Diagrammatic Model Types (Collapsed)
[Table 2.2 retains the seven populated categories: I (hierarchical, directional, data), II (hierarchical, directional,
hybrid), III (hierarchical, directional, process), VIII (not hierarchical, directional, hybrid), IX (not hierarchical,
directional, process), X (not hierarchical, not directional, data), and XI (not hierarchical, not directional, hybrid),
with the diagram types of table 2.1 distributed among them.]
other qualities necessary for the purposes of the system. In other words, software defects are defined in terms of
missing qualities. Other research reviewed is not inconsistent with this approach. For example, Boehm et al. [1978]
and Bass et al. [1998] develop typologies of software qualities, and the definition in Grady [1992, 122] of a defect as
"any flaw in the specification, design, or implementation of a product" inherently includes software qualities.
Therefore, the primary focus of the first section below is on typologies of software qualities. The second section
reviews other software defect typologies, and the third section discusses the development of the typology used in
this research.
2.7.1 Software Quality Typologies
An interesting early software qualities typology is the Software Quality Characteristics Tree (SQCT) of Boehm
et al. [1978]. The SQCT is a hierarchical scheme in which the highest-level construct, General Utility, is
determined by two second-level constructs, As-Is Utility and Maintainability, and one third-level construct,
Portability. The second-level constructs are each in turn determined by three third-level constructs: Reliability,
Efficiency, and Human Engineering for As-Is Utility, and Testability, Understandability, and Modifiability for
Maintainability. The third-level constructs are determined by various combinations of twelve primitive
characteristics (Device Independence, Completeness, Accuracy, Consistency, Device Efficiency, Accessibility,
Communicativeness, Structuredness, Self-Descriptiveness, Conciseness, Legibility, and Augmentability),
which are strongly differentiated from one another.
Figure 2.4 Boehm et al. [1978] Software Quality Characteristics Tree (adapted)
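The tree in figure 2.4 can be summarized as a nested structure. The sketch below (Python) simply restates the
constructs named above; because the text does not spell out which primitive characteristics determine which
third-level constructs, the twelve primitives are kept as a single flat list.

    # A rough restatement of the SQCT hierarchy described above; the assignment
    # of primitive characteristics to third-level constructs is left open.
    SQCT = {
        "General Utility": {                                  # highest-level construct
            "As-Is Utility": ["Reliability", "Efficiency", "Human Engineering"],
            "Maintainability": ["Testability", "Understandability", "Modifiability"],
            "Portability": [],    # third-level construct tied directly to the root
        }
    }

    PRIMITIVES = [
        "Device Independence", "Completeness", "Accuracy", "Consistency",
        "Device Efficiency", "Accessibility", "Communicativeness",
        "Structuredness", "Self-Descriptiveness", "Conciseness",
        "Legibility", "Augmentability",
    ]

    for construct, determinants in SQCT["General Utility"].items():
        print(construct, "->", ", ".join(determinants) or "(none listed)")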
The Grady [1992] software defect model is shown below in figure 2.5. It is also a hierarchical model (with the root at
the bottom) that classifies defects according to origin, type, and mode. Grady describes six types of software
defects, corresponding to the five defect origins plus a residual "Other" category:
1. Specifications/Requirements Defect. A mistake in the definition of the customer/target needs for a system or
system component. Such mistakes can be in functional requirements, performance requirements, test
requirements, development standards, and so on.
2. Design Defect. A mistake in the design of a system or system component. Such mistakes can be in algorithms,
control logic, data structures, database access, input/output formats, interface descriptions, and so on.
3. Code Defect. A mistake in the implementation of a computer program. Such mistakes can be in product or test
code, JCL, build files, and so on.
4. Documentation Defect. A mistake in any non-code product material delivered to a customer. Such mistakes
can be in user manuals, installation instructions, data sheets, product demos, and so on. Mistakes in
requirements specification documents, design documents, or code listings are assumed to be specification
defects, design defects, and coding defects, respectively.
5. Environmental Support Defect. Defects that arise as a result of the system development and/or testing
environment. Such mistakes can be in the build/configuration process, the development/integration tools, the
testing environment, and so on.
6. Other.
Extending or changing capabilities. This category includes corrective maintenance and extensibility.
Deleting unwanted capabilities.
Adapting to new operating environments.
Restructuring.
7. Portability (NDR) is the ability of a system to run under different computing environments.
8. Reusability (NDR) relates to the design of a system so that the system's structure or some of its components
can be reused in future applications. Bass et al. [1998, 84] note that "Reusability is actually a special case
of modifiability..."
9. Integrability (NDR) is the ability to make the separately developed components of the system work correctly
together.
10. Software testability (NDR) refers to the ease with which software can be made to demonstrate its faults through
(typically execution-based) testing.
This research uses Bass et al. [1998] as the basis for the qualities dimension of the software defects typology.
2.7.2 Other Defect Dimensions
Review of the literature yields three other dimensions for the classification of software defects.
2.7.2.1 Class
Class refers to whether the defect consists of required logic or other structure that is missing (M), incorrect (I),
or extra (E) [Ebenau and Strauss 1994].
While extra functionality may increase storage requirements or otherwise decrease efficiency, the impact on
functionality is generally less severe than that caused by the other two types.
2.7.2.2 Severity
The defect severity categories generally listed are major (J), minor (N), and (sometimes) trivial (T) [Ebenau and
Strauss 1994; Gilb and Graham 1993; Kelly et al. 1992].
A major defect is defined as one "that is expected to cause product failure, departure from specifications, or prevent
further correct development of the product" [Ebenau and Strauss 1994, 92]. A minor defect is defined as one "that
reduces the effectiveness, or confuses a product's representation, format, or development process characteristics,
but is not expected to impact the operation or further development of the product" [p. 92].
2.7.2.3 Cause
Humphrey [1995], following Gale [1990], lists five categories of basic defect causes:
1. Education. You did not understand how to do something.
2. Communication. You were not properly informed about something.
3. Oversight. You omitted doing something.
4. Transcription. You knew what to do but made a mistake in doing it.
5. Process. Your process somehow misdirected your actions.
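For reference, the candidate dimensions reviewed in this section can be written down as plain enumerations. The
Python sketch below simply transcribes the categories and letter codes given above.

    # Candidate classification dimensions for software defects, transcribed
    # from the sources cited above.
    from enum import Enum

    class DefectClass(Enum):          # Ebenau and Strauss [1994]
        MISSING = "M"
        INCORRECT = "I"
        EXTRA = "E"

    class Severity(Enum):             # Ebenau and Strauss [1994]; Gilb and Graham [1993]
        MAJOR = "J"
        MINOR = "N"
        TRIVIAL = "T"

    class Cause(Enum):                # Humphrey [1995], following Gale [1990]
        EDUCATION = "did not understand how to do something"
        COMMUNICATION = "was not properly informed"
        OVERSIGHT = "omitted doing something"
        TRANSCRIPTION = "knew what to do but made a mistake doing it"
        PROCESS = "the process misdirected the actions"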
2.7.3 Development of the Defect Typology
The four dimensions discussed above produce a four-dimensional defect space. However, examination shows that
dimensional simplification is appropriate.
1. Defect cause cannot be determined directly from examination of software diagrammatic models.
2. Defect severity is defined in terms of impact on system functionality. Given that functionality is a type of
technical quality, a separate dimension would be redundant.
Further simplification is achieved by ignoring extra functionality defects of the class dimension. The rationale for this
reduction is that, while defects associated with extra functionality may increase storage requirements or otherwise
decrease efficiency, the impact on functionality is generally less severe than that caused by missing and incorrect
defects.
Change is also necessary on the qualities dimension. Six of the Bass et al. [1998] qualities are not readily
discernable from diagrammatic models and are consequently not appropriate to the typology. However, according to
Boehm et al. [1978], the primitive quality Structuredness partially determines three of the six. Similarly, Fenton and
Neil [2001] list Structuredness as an internal attribute associated with the external attributes reliability (or
availability), maintainability, and reusability. The six non-discernable qualities are listed below; a "B" marks an
association identified by Boehm et al., and an "F" marks one identified by Fenton and Neil.
Availability (F)
Maintainability (B, F)
Portability
Reusability (B, F)
Integrability
Testability (B)
Since Structuredness is associated with four of the six non-discernable qualities and is readily discernable from a
diagrammatic model, it is substituted as a partial proxy.
During the early development of the research task, several subjects noted that the scope of the diagrammatic
models was not consistent. From a theoretical perspective, lack of Scope Consistency is an instance of a general
consistency problem. In the structured approach to IS development, data and process models are supposed to
model the same system but are fundamentally separate. This separateness leads to multiple problems, including
lack of consistency [Repa 2001]. Consideration was given to adding the broader quality consistency to the typology,
but this was rejected because (1) some subjects perceived lack of Scope Consistency to be a separate issue and
(2) lack of Scope Consistency is different in that it can generally be discerned readily by comparing data and
process models, while other consistency problems become apparent only after significant functional analysis. Lack
of Scope Consistency would be expected to impact negatively on the integrability and maintainability of the specified
system.
The resulting matrix is a two-dimensional defect space based on quality affected and class. It should be noted that
Scope Consistency and Structuredness are treated as logical variables; the quality is either present or missing.
Table 2.3 shows the resulting matrix.
[Table 2.3 crosses the QUALITY dimension (Functionality, Performance, Security, Usability, Structuredness, and
Scope Consistency) with the CLASS dimension (Missing, Incorrect).]
[A second matrix, with a typical diagram noted for each diagrammatic model category (among them Warnier-Orr
(Data and Process), Structure Charts, Data Flow, Flow Charts, Entity-Relationship, and a typical GUI), relates the
model categories to the same quality and class (M = missing, I = incorrect) dimensions.]
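A defect record in this two-dimensional space can be represented very simply. The sketch below (Python) follows
the quality and class categories of the typology; the example defect at the end is hypothetical.

    # A minimal representation of the two-dimensional defect space (quality
    # affected x class). Structuredness and Scope Consistency are logical
    # qualities, so defects against them are recorded only as "missing".
    from dataclasses import dataclass
    from enum import Enum

    class Quality(Enum):
        FUNCTIONALITY = "Functionality"
        PERFORMANCE = "Performance"
        SECURITY = "Security"
        USABILITY = "Usability"
        STRUCTUREDNESS = "Structuredness"
        SCOPE_CONSISTENCY = "Scope Consistency"

    class DefectClass(Enum):
        MISSING = "M"
        INCORRECT = "I"

    @dataclass
    class Defect:
        quality: Quality
        defect_class: DefectClass
        description: str = ""

    # Hypothetical example: a required calculation absent from a process model.
    seeded = Defect(Quality.FUNCTIONALITY, DefectClass.MISSING,
                    "required calculation absent from the process model")
    print(seeded)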
cannot inform this research effort. The first part of the literature review therefore provides context rather than
explicating applicable theory.
Three techniques from disciplines outside information systems for evaluating visual artifacts that convey meaning are
reviewed. While work on the evaluation of human-computer interaction (HCI) approaches proves not to be directly
applicable to this research, one of the HCI paradigms, the Human Information Processing System (HIPS) model, is
found to be relevant. The HIPS model is reviewed, as is cognitive science work on attention and the comprehension
of graphics.
Two other areas are identified as necessary for the development of the research task and tools: (1) types of
diagrammatic models and (2) types of software defects. The literature is reviewed and new typologies are
developed.