
Software assessment

through metrics
and beyond

Tudor Gîrba
www.tudorgirba.com


Forward Engineering is the traditional process of moving from high-level abstractions and logical, implementation-independent designs to the physical implementation of a system.
[Figure: forward engineering, from the design blueprint down to the code]
In most projects, the actual development happens only at the code level, with little documentation maintenance. In the beginning, the code looks just like the blueprint says. However, after a couple of years, the system is no longer in a tidy state.

[Figure: forward engineering as planned vs. actual development, the code drifts away from the blueprint]

Software systems are large and complex, and they manipulate large amounts of data. Thus, approaching their understanding in an ad-hoc manner does not serve us.

For example, when it comes to understanding code, reading is still the prevalent technique. However, this does not scale. To put it in perspective: a person reading one line every two seconds would need 250,000 x 2 seconds, approximately 139 hours or roughly one month of full-time work, to read a quarter of a million lines of code. And this is just for reading the code.

Informed decisions need to be based on reality, and not on what we think reality is. Thus, we need means to understand reality as accurately as possible, and as fast as possible.

Reverse Engineering is the process of analyzing a subject system to identify the system's components and their interrelationships and create representations of the system in another form or at a higher level of abstraction.

Elliot Chikofsky and James Cross II, "Reverse Engineering and Design Recovery: A Taxonomy," IEEE Software, vol. 7, no. 1, January 1990, pp. 13-17.

[Figure: forward engineering and reverse engineering wrapped around the actual development]
Tool vendors tend to promote reverse engineering as a technical problem that can be solved by a smart tool.

But, is this enough? Let's take a closer look.


[Figure: reverse engineering produces a higher-level representation from the code]

In fact, reverse engineering only goes half way. The result of a reverse engineering tool is just another
piece of information. A higher-level one, but just a piece of information nevertheless.

The goal of an assessment is to support decision making by answering two questions:


- What is the current situation?
- What can we do about it?

To achieve this, we need to interpret the available information ourselves. Assessment is a human activity, not a tool issue. To assess large amounts of data, we do need tools, but the interpretation is the crucial part.
Various studies report assessment to account for as much as 50% of the total development effort. Approaching it ad-hoc does not serve anybody.

Assessment is an important discipline and it should be addressed explicitly in the development process.
assessment is a discipline

Software assessment
through metrics
and beyond

When you can measure what you are talking about and express it in numbers, you know something about it; but when you cannot measure, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.

You cannot control what you cannot measure.
Tom DeMarco
Software metrics are now widespread. Various tools exist that offer a large range of metrics.

However, in practice, their use for decision making is still limited. Why is that?

First, what is a software metric? Actually, the correct term is measure, but in practice, software metric is treated as a synonym.

Metrics are functions that assign numbers to products, processes and resources.

To be precise, the definition of the measure must specify:


- domain: do we measure people's height or width?
- range: do we measure height in centimeters or inches?
- mapping rules: do we allow shoes to be worn?

What can be measured? Anything:
- internal vs. external attributes
- direct vs. indirect measures
- system size, complexity, cohesion, coupling
- static vs. dynamic vs. historical characteristics
- etc.
T.J. McCabe. A Measure of Complexity. In IEEE Transactions on Software Engineering 2(4), p. 308-320, December 1976.

Cyclomatic complexity (CYCLO) counts the number of independent paths through the code of a function.

+ it reveals the minimum number of tests to write
- interpretation can't directly lead to improvement action
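To make the counting concrete, here is a hypothetical Java method (not from the original slides); CYCLO is one plus the number of decision points:

public static String grade(int score) {
    if (score < 0 || score > 100)      // +1 for the if, +1 for the ||
        throw new IllegalArgumentException("score out of range");
    if (score >= 90) return "A";       // +1
    if (score >= 75) return "B";       // +1
    return "C";
}
// CYCLO = 1 + 4 = 5: there are five independent paths through the method,
// so at least five tests are needed to cover them all.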

Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design. In IEEE Transactions on Software Engineering 20(6), p. 476-493, June 1994.

Weighted Method Count (WMC) sums up the complexity of a class's methods (usually in terms of CYCLO).

+ it is configurable, thus adaptable to our precise needs
- interpretation can't directly lead to improvement action
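A minimal sketch of the computation, assuming the per-method CYCLO values are already available (the names are illustrative, not the API of any tool):

// WMC with CYCLO as the weight: sum the complexities of all methods.
static int wmc(int[] methodComplexities) {
    int sum = 0;
    for (int cyclo : methodComplexities)
        sum += cyclo;
    return sum;
}
// e.g., a class whose four methods have CYCLO values {1, 1, 3, 5} has
// WMC = 10; the configurability lies in choosing a different weight,
// such as LOC per method, when that fits our needs better.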

Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design. In IEEE Transactions on Software Engineering 20(6), p. 476-493, June 1994.

Depth of Inheritance Tree (DIT) is the (maximum) depth level of a class in a class hierarchy.

+ inheritance is measured
- only the potential and not the real impact is quantified

Shyam R. Chidamber and Chris F. Kemerer. A Metrics Suite for Object Oriented Design. In IEEE Transactions on Software Engineering 20(6), p. 476-493, June 1994.

Coupling Between Objects (CBO) shows the number of classes from which methods or attributes are used.

+ it takes into account real dependencies, not just declared ones
- no differentiation of types and/or intensity of coupling

J.M. Bieman and B.K. Kang. Cohesion and Reuse in an Object-Oriented System. In Proceedings of the ACM Symposium on Software Reusability, April 1995.

Tight Class Cohesion (TCC) counts the relative number of method pairs that access attributes of the class in common (in the slide's example, TCC = 2 / 10 = 0.2).

+ interpretation can lead to improvement action
+ ratio values allow comparison between systems
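A hypothetical class that reproduces the value above: 5 methods form 5 x 4 / 2 = 10 method pairs, and only two of those pairs access an attribute in common:

class Example {
    int x, y;
    void a() { x++; }   // uses x
    void b() { x--; }   // uses x, so the pair (a, b) is connected
    void c() { y++; }   // uses y
    void d() { y--; }   // uses y, so the pair (c, d) is connected
    void e() { }        // touches no attribute
}
// 2 connected pairs out of 10 possible pairs: TCC = 2 / 10 = 0.2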

In general, tools offer a ton of metrics, typically in the form of a table. The problem is: what to do with them?

Metric   Value
LOC      35175
NOM      3618
NOC      384
CYCLO    5579
NOP      19
CALLS    15128
FANOUT   8590
AHH      0.12
ANDC     0.31

... and now what?


What should we measure, and what should we do with a measurement?

The Goal-Question-Metric (GQM) paradigm offers a framework to guide the use of metrics:

- Set the goals.
- From each goal, derive a set of questions.
- For each question, decide which metrics are required to answer it.

For example: the goal could be to reduce maintenance cost; a derived question is which classes are the hardest to change; candidate metrics to answer it are WMC and CBO.

V. Basili and D. Rombach. The TAME Project: Towards Improvement-Oriented Software Environments. In IEEE Transactions on Software Engineering 14(6), June 1988.

Metrics alone have no interpretation. To interpret the numbers, we need to relate them to a system of reference. This is achieved through thresholds.

Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice, Springer-Verlag, 2006. 

The book provides a pragmatic approach to using metrics for quantifying and judging object-oriented
design.
The Overview Pyramid offers a metrics overview of the whole system, organized into Size (bottom left), Communication (bottom right) and Inheritance (top):

- NOP: Number of Packages (19)
- NOC: Number of Classes (384)
- NOM: Number of Methods (3618)
- LOC: Number of Lines of Code (35175)
- CYCLO: Cyclomatic complexity (5579)
- CALLS: Number of Operation Calls (invocations) (15128)
- FANOUT: Number of Called Classes (8590)
- ANDC: Average Number of Derived Classes (0.31)
- AHH: Average Hierarchy Height (0.12)

The proportions printed on the sides of the pyramid are ratios between adjacent metrics: NOC/NOP = 384/19 ≈ 20.21, NOM/NOC = 3618/384 ≈ 9.42, LOC/NOM = 35175/3618 ≈ 9.72, CYCLO/LOC = 5579/35175 ≈ 0.15, CALLS/NOM = 15128/3618 ≈ 4.18, FANOUT/CALLS = 8590/15128 ≈ 0.56. Being size-independent, these ratios are what make different systems comparable.

Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice, Springer-Verlag, 2006. 

Colors denote quantitative, not qualitative, characteristics (close to high / close to average / close to low). The thresholds were determined statistically by studying a large number of systems. This provides a good basis for comparing systems.

Quality models have long been proposed as a means to aggregate metrics in a high level overview.

Classic models, such as the Factor-Criteria-Metric model proposed by McCall in 1977, are typically decomposed into a tree, with metrics providing the leaf values, which are then aggregated towards the root by means of thresholds.

The strong point of these models is that they offer one concise answer about the state of affairs. However, what is that answer good for?

Problem 1: the granularity of metrics is too fine-grained
- they capture symptoms, not causes of problems
- in isolation, they don't lead to solutions for improvement

Problem 2: decomposing the problem into metrics implies an implicit mapping
- we don't reason in terms of metrics, but in terms of principles

Detection Strategies are metric-based queries to detect design flaws:

Rule 1: METRIC 1 > Threshold 1
AND
Rule 2: METRIC 2 < Threshold 2
=> Quality problem

Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice, Springer-Verlag, 2006.

E.g.: a God Class centralizes too much intelligence.


ATFD: Access to Foreign Data, counts distinct attributes accessed from other classes
WMC: Weighted Method Count
TCC: Tight Class Cohesion

A class is detected as a GodClass when all three symptoms co-occur:
- the class uses directly more than a few attributes of other classes: ATFD > FEW
- the functional complexity of the class is very high: WMC ≥ VERY HIGH
- class cohesion is low: TCC < ONE THIRD

Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice, Springer-Verlag, 2006.
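A minimal sketch of how such a strategy reads in code; the ClassMetrics record and the threshold constants are illustrative placeholders, not an actual tool API (the book derives the numeric thresholds statistically):

class GodClassDetector {
    // Hypothetical container for the metric values of one class.
    record ClassMetrics(int atfd, int wmc, double tcc) {}

    // Illustrative threshold values; real ones come from statistical data.
    static final int FEW = 3;
    static final int VERY_HIGH = 47;
    static final double ONE_THIRD = 1.0 / 3;

    // The GodClass detection strategy: all three symptoms must co-occur.
    static boolean isGodClass(ClassMetrics m) {
        return m.atfd() > FEW          // uses many attributes of other classes
            && m.wmc() >= VERY_HIGH    // very high functional complexity
            && m.tcc() < ONE_THIRD;    // low cohesion
    }
}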


Detection strategies provide a suitable vocabulary to reason about design problems. They both describe the problem in terms that engineers can relate to, and provide a path to action.

[Figure: the correlation web of disharmonies, linking God Class, Brain Class, Brain Method, Feature Envy, Data Class, Intensive Coupling, Extensive Coupling, Shotgun Surgery, Significant Duplication, Refused Parent Bequest, Tradition Breaker and Futile Hierarchy, grouped into Identity, Collaboration and Classification Disharmonies]

Michele Lanza and Radu Marinescu. Object-Oriented Metrics in Practice, Springer-Verlag, 2006.

There is much more to be said about metrics, but the essence is:
- use metrics for a goal, not for producing nice-looking charts
- there is no magic metric
- interpret them in concert and in context

Software assessment
through metrics
and beyond

Metrics are but one component of software analysis. Queries and visualizations are also important tools.

[Figure: examples of analyses: metric values (e.g., McCabe = 21, NOM, LOC), a code query (select: #isGod...), and visualizations]

Furthermore, there is no such thing as a perfect analysis. Various analyses have various usages. Thus, a tool should not strive to provide magic answers; instead, it should allow the analyst to be in charge of deciding what is important at every step of the analysis process.

In particular, the analyst should always be informed of the decisions taken by the analysis algorithm, and should be allowed to combine and relate multiple results.

For example, this visualization, called System Complexity, shows class hierarchies. Furthermore, for
each class, the corresponding node shows three metrics:
- the height shows the number of methods in the class
- the width shows the number of attributes
- the color shows the number of lines of code

This visualization provides a map of the software system (in this case showing approximately 2000 classes). If we highlight on this map the classes suspected of being GodClasses, we get a better understanding of the overall problem.

What is an analysis in general?

Webster's definition of analysis:

- Detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation.
- The process of separating something into its constituent elements. Often contrasted with synthesis.

If you want to be able to interpret the output of an analysis, you need to control both the input and the decisions taken during the analysis.

analysis

control to interpret

Here is a small example: how many methods are there in this class?

public class Library {
    List books;
    public Library() {…}
    public void addBook(Book b) {…}
    public void removeBook(Book b) {…}
    private boolean hasBook(Book b) {…}
    protected List getBooks() {…}
    protected void setBooks(List books) {…}
    public boolean equals(…) {…}
}

NOM = ?

Counting every defined member, NOM = 7.

But is a constructor a method? If the metric computation does not consider it a method, we get 6 instead of 7.

What about setters and getters? Are they to be considered methods? If not, we have only 4.

Do we count the non-public methods as well? Perhaps the metric is just about the public ones. In that case, we actually have only 3 methods.

And equals() is a method expected by Java, so we might as well not consider it a real method. That leaves 2.

So how many methods are there? NOM = 7, 6, 4, 3 or 2? All of these are valid answers, depending on what we understand by the question.

Now, if we turn the situation around and you get a report that says a class has 70 methods, what does it mean? You have to know what the actual computation does. And this is a simple metric.

your responsibility
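Knowing what the computation does is easiest when the choices are explicit. Here is a minimal sketch of a configurable NOM, built around a hypothetical Method descriptor (not the API of any particular tool), that can produce any of the answers above depending on its flags:

import java.util.List;

class NomExample {
    // Hypothetical method descriptor; a real tool extracts this from source code.
    record Method(boolean isConstructor, boolean isAccessor,
                  boolean isPublic, boolean isJavaContract) {}

    // Each flag makes one interpretation choice explicit.
    static long nom(List<Method> methods,
                    boolean countConstructors, boolean countAccessors,
                    boolean countNonPublic, boolean countJavaContracts) {
        return methods.stream()
            .filter(m -> countConstructors || !m.isConstructor())
            .filter(m -> countAccessors || !m.isAccessor())          // getters/setters
            .filter(m -> countNonPublic || m.isPublic())
            .filter(m -> countJavaContracts || !m.isJavaContract())  // e.g., equals()
            .count();
    }
}
// With all flags true this yields 7 for Library; turning them off one by one
// yields 6, 4, 3 and finally 2.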

The best feedback is continuous and contextual.

It is important for feedback to be continuous, so that we can get an idea of which recent actions provoked the unwanted situation.

It is important for feedback to be contextual, because only when feedback is related to the details of our situation can we easily map it to action.

assessment is a discipline
Moose is an extensive platform for software and data analysis. Its main goal is to assist and enable a human in the process of assessment.

To this end, it offers multiple services covering metrics, queries, visualizations, data mining, duplication detection etc., it handles multiple languages, and it enables the analyst to build custom dedicated tools.

More information can be found at:

- http://moosetechnology.org
- http://themoosebook.org

Software assessment
through metrics
and beyond

Tudor Gîrba
www.tudorgirba.com


creativecommons.org/licenses/by/3.0/
