
Reverse Engineering of Design Patterns

from Java Source Code

Nija Shi
Department of Computer Science
University of California, Davis
Davis, California 95616-8562 U.S.A.
shini@cs.ucdavis.edu

Ronald A. Olsson
Department of Computer Science
University of California, Davis
Davis, California 95616-8562 U.S.A.
olsson@cs.ucdavis.edu

ABSTRACT
This paper presents a new, fully automated pattern detection approach. This approach enhances existing source code analysis tools by bringing program understanding to the design level. The new approach is based on our reclassification of the GoF patterns. We argue that the GoF pattern catalog classifies design patterns in the forward-engineering sense, which can be misleading in the reverse-engineering sense. This paper also describes our tool, PINOT, that implements this new approach. Our tool is faster and more accurate than existing pattern detection tools in detecting patterns in Java AWT, JHotDraw, javac, and Apache Ant.

Categories and Subject Descriptors
D.1.5 [Programming Techniques]: Object-oriented Programming; D.2.2 [Software Engineering]: Design Tools and Techniques - Computer-aided software engineering (CASE); D.2.11 [Software Engineering]: Software Architectures - Patterns

General Terms
Design, languages, verification

Keywords
Design patterns, static analysis, reverse engineering, Java

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WOODSTOCK ’97 El Paso, Texas USA
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

1. INTRODUCTION
A design pattern abstracts a reusable object-oriented design that solves a common recurring design problem in a particular context [18]. An object-oriented design describes the roles, responsibilities, and collaboration of participating classes and instances. Every design pattern has its own unique intent. Since design patterns can make software development more efficient and effective, they are widely used in practice. For example, the Singleton pattern is used to implement java.awt.Toolkit in the Java AWT package (a GUI toolkit), the Composite, Interpreter, and Visitor patterns form the basic architecture of Jikes (a Java compiler written in C++) [5], the Flyweight pattern is used in Apache Ant (a Java build tool) to control the helper objects in a project, etc. During the design phase, design patterns serve as a shared vocabulary for communicating design issues. During the coding phase, design patterns provide clear guidelines on how to create a problem-specific implementation. While design patterns are useful in the forward engineering process, they are equally important in the reverse engineering process.

Software projects are usually documented as the software evolves. However, documentation gradually becomes obsolete over time, due to employee turnover or inadequate project management. As a result, software companies often find themselves spending lots of time and money on training new developers to get up to speed. Source code contains all the information needed for documentation, but it cannot speak for itself without a good reverse engineering tool.

Most reverse engineering tools enhance program understanding by extracting structural relationships and call graphs from the source code. Such features are also embedded in some IDEs. However, without proper documentation, it would still take a lot of effort for a developer to become proficient with the source code. Therefore, a powerful reverse engineering tool should be able to extract the intent and design of the source code. We believe that by tracing the common variations of a pattern implementation, the roles of the participating classes can be identified and the intent of the corresponding source code is then revealed.

A pattern detection tool can be characterized by its false positive and false negative rates, defined as in Figure 1. The false positive rate reflects the degree of soundness, while the false negative rate reflects the degree of completeness of a pattern detection tool. Together the false positive and false negative rates determine the “accuracy” of a pattern detection tool. A good pattern detection tool has low false positive and false negative rates. However, the accuracy rates can vary across patterns. In general, recognizing program behavior is, of course, an undecidable problem, hence a fully automated static analysis will not be able to achieve 0% false positive and false negative rates.

\[ \text{False Positive Rate} = \frac{\text{Number of Incorrect Pattern Instances}}{\text{Number of All Detected Pattern Instances}} \times 100\% \]

\[ \text{False Negative Rate} = \frac{\text{Number of Undetected Correct Pattern Instances}}{\text{Number of Correct Pattern Instances}} \times 100\% \]

Figure 1: Definitions of false positive rate and false negative rate

Past efforts have used structural relationships (such as the generalization and association relationships) to find design patterns. However, pattern detection tools that use structural-based analysis fail to distinguish between patterns that are structurally identical but differ in behavior (e.g., the Strategy and the State patterns). Other approaches attempt to verify pattern behavior using dynamic analysis to distinguish such patterns and to reduce the false positive rate. While these attempts are able to capture program behavior, they fail to interpret and verify program intent that is unique for each design pattern. Furthermore, dynamic analysis depends on the coverage of test data, which can increase the false negative rate if the coverage is not complete.

The GoF book presents a pattern catalog for forward engineering, but the same classification can be misleading for reverse engineering. Current approaches lack a proper pattern classification for reverse engineering. A pattern classification for reverse engineering should indicate whether or not every pattern is detectable and whether there exist common search strategies to categorize detectable patterns. Thus, we reclassified the GoF patterns into five categories in the reverse-engineering sense (see Section 4). Based on this reclassification, we automated the pattern recognition process using only static analysis that combines structural analysis and static program analysis. Our approach uses structural analysis to look for pattern structures from source code and static program analysis to verify whether the semantics matches the pattern behavior.

We have some promising preliminary results from our initial prototype, PINOT (Pattern INference and recOvery Tool), in recovering design patterns from the Java AWT package, JHotDraw (a GUI framework) [4, 17], javac, and Apache Ant. In particular, the Java AWT has been used as a benchmarking suite for pattern detection on Java source code [24, 29]. Our results on speed and accuracy in analyzing the Java AWT are promising.

The rest of this paper is organized as follows. Section 2 critiques current pattern detection tools. Section 3 presents an example that motivates our approach. Section 4 explains our reclassification of the GoF patterns. Section 5 describes our initial prototype of PINOT and its accuracy and performance. Section 6 concludes the paper and covers our future work.

2. CRITIQUE OF CURRENT APPROACHES
Based on detection methods, current approaches can be categorized according to the kind of analysis they perform: pure static or a combination of static and dynamic. Table 1 summarizes representative current approaches. For each pattern detection tool, the target language, detection techniques, case studies (generally some well-known applications), and the patterns identified in the case studies are shown.

2.1 Pure Static Approaches
Previous work [14, 12, 21, 24, 13, 29, 19, 25, 33, 31, 27] uses structural analysis to find GoF design patterns [18] from source. Structural relationships of the code include class inheritance, interface hierarchies, attributes, method invocations, parameters and return types, object creations, and variable access within a method.

Some previous work [14, 21, 33] extracts structural relationships from C++ source and stores this information in a database. Design patterns are recovered through queries to the database. Reference [13] uses the same schema defined for UML class diagrams for extracting abstract semantics graphs from C++ source. It defines an XML pattern language for users to define patterns. Patterns are recovered based on graph comparison. SPQR [31] first translates the abstract syntax graph obtained by gcc to a format recognized by a theorem prover. Then it runs the input on the theorem prover that recognizes some design patterns predefined using denotational semantics. Patterns are recovered based on formal analysis.

These approaches heavily rely on the accuracy of the information extracted in the first stage. Although extracting structural relationships seems straightforward, it is complicated by variations in the implementations of some relationships, such as aggregation [29, 24]. Thus, these approaches can result in higher false positive or false negative rates.

FUJABA [2, 24] extends the work from [29] and uses a bottom-up-top-down approach to speed up the search and to reduce the false positive rate. It uses a combination of structural relationships to indicate a pattern. Thus, when such information is obtained from the bottom-up search, even partially, FUJABA assumes the existence of a possible pattern and tries to complete the rest of the search (the top-down search) to confirm that such a pattern actually exists. This iterative approach allows going back to the annotated abstract syntax tree (AST) for further analysis on demand. Follow-on work [25] introduces fuzzy logic to make the search speed tolerable for larger-scale systems. Like Reference [21], FUJABA is a semi-automatic detection tool. The pattern detection engine is bundled with the FUJABA Tool Suite RE (a software round-trip engineering tool for Java), which is in parallel with the work in Reference [11]. The pattern recognition process in FUJABA’s recent work [23] is more user driven. Its authors believe pattern detection requires human intervention to overcome scalability problems caused by implementation variations in different problem domains. Thus, this approach assumes that users have a fair amount of knowledge of the analyzed code. However, reverse engineering tools for design patterns are typically used in understanding legacy code, where users may not be able to provide any feedback during the reverse engineering process.
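
The false positive and false negative rates on which this critique relies follow directly from the definitions in Figure 1. The Java sketch below computes both rates from raw counts. The class name, the method names, and the counts used in main are hypothetical and are not taken from any tool or case study discussed here.

```java
// Minimal sketch of the Figure 1 definitions; all names and numbers are illustrative.
class DetectionRates {

    /** False positive rate in percent: incorrect instances over all detected instances. */
    static double falsePositiveRate(int incorrectDetected, int allDetected) {
        if (allDetected <= 0) {
            throw new IllegalArgumentException("no detected pattern instances");
        }
        return 100.0 * incorrectDetected / allDetected;
    }

    /** False negative rate in percent: undetected correct instances over all correct instances. */
    static double falseNegativeRate(int undetectedCorrect, int allCorrect) {
        if (allCorrect <= 0) {
            throw new IllegalArgumentException("no correct pattern instances");
        }
        return 100.0 * undetectedCorrect / allCorrect;
    }

    public static void main(String[] args) {
        // Hypothetical run: 10 instances reported, 2 of them wrong;
        // 4 instances really exist in the code, 1 of them was missed.
        System.out.println("False positive rate: " + falsePositiveRate(2, 10) + "%");
        System.out.println("False negative rate: " + falseNegativeRate(1, 4) + "%");
    }
}
```

Note that the two rates have different denominators: the first is relative to what the tool reported, the second to what actually exists in the code, so a tool can score well on one and poorly on the other.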
Tools                | Language  | Techniques                       | Case Studies                                  | Patterns Identified
SPOOL [21]           | C++       | Database query                   | ET++, two classified systems from Bell Canada | Template Method, Factory Method, Bridge
DP++ [14]            | C++       | Database query                   | DTK                                           | Composite, Flyweight, Class Adapter
Vokac et al. [33]    | C++       | Database query                   | SuperOffice CRM                               | Singleton, Template Method, Decorator, Observer
Antoniol et al. [12] | C++       | Software metric                  | Leda, libg++, socket, galib, groff, mec       | Bridge, Adapter
SPQR [31]            | C++       | Formal semantic                  | Some C++ test programs                        | Decorator
Balanyi et al. [13]  | C++       | XML matching                     | Jikes, Leda, Star Office Calc, Writer         | Builder, Bridge, Prototype, Proxy, Strategy, Template Method, Factory Method
PTIDEJ [11, 19]      | Java      | Constraint solver                | Java AWT, java.net package                    | Composite, Facade
FUJABA [24, 25, 34]  | Java      | Fuzzy logic and dynamic analysis | Java AWT                                      | Bridge, Strategy, Composite
Heuzeroth et al. [20]| Java      | Dynamic analysis                 | Java Swing                                    | Observer, Mediator, CoR, Visitor
KT [15]              | SmallTalk | Dynamic analysis                 | KT, three SmallTalk programs                  | Composite, Decorator, Template Method
MAISA [26]           | UML       | UML matching                     | Nokia DX200 Switching System                  | Abstract Factory

Table 1: Representative Current Approaches

2.2 Static and Dynamic Approaches
The approaches in Section 2.1 are limited to finding patterns that are distinctive in structure. However, some patterns aim at program behavior, which cannot be determined by analyzing only structural relationships. Other approaches (e.g., those in References [20] and [34]) suggest using dynamic analysis to analyze behavior. They first obtain structural information from source code. Next, for a particular pattern, they compute a list of candidate classes. Then, assuming how these candidates should behave, they verify the behavior during runtime. Reference [34] uses dynamic analysis as part of pattern identification. This approach complicates the search by expanding the set of candidate classes and results in analyzing more unrelated execution traces. We believe that structural analysis should be used to narrow down the search space.

Without any experimental results or proof, References [20] and [34] claim that traditional data-flow and control-flow analysis is not feasible when polymorphism and dynamic method binding are involved. However, the critical behavior in a design pattern is defined in the base class. Therefore, we rarely have to trace every possible path happening in the subclass(es) (see the Chain of Responsibility pattern and other behavioral patterns in [18]). More importantly, dynamic analysis relies on a good coverage of test data to exercise every possible execution path; such test data is not often available. Even if test data is available in a distribution, the runtime results may be misleading since the data was not originally designed for recognizing behavior of a particular pattern (e.g., a distribution might include a validation or benchmark suite).

KT [15] uses algorithms to search for patterns in programs written in SmallTalk. It does not search for patterns that are structurally identical, e.g., the Strategy, State, and Command patterns. These patterns share a common idea: the reification of responsibility. However, these patterns have very different intents and, therefore, very different behaviors. KT failed to find the Chain of Responsibility (CoR) pattern. KT’s search algorithm for the pattern is based on only dynamic analysis. It analyzes an object-message diagram interpreted from a call tree constructed during runtime. This process removes unnecessary message calls unrelated to the search. Then the pattern should be identified if the object-message diagram captures the right pattern behavior. However, this approach failed to find the Chain of Responsibility pattern due to an improper message logging mechanism and insufficient test data.

2.3 Other Approaches
MAISA [6, 26, 32] measures software quality at the design level. From a system’s architectural description (which includes UML class, activity, component, and sequence diagrams), MAISA is able to find design patterns and anti-patterns [16]. However, the number of patterns found was limited. Only the Abstract Factory pattern (which represents a “good” pattern) and the Blob anti-pattern¹ (which represents a “bad” pattern) were found in their analyzed system. Recovering design patterns from architectural descriptions is not likely to be effective in practice for two reasons. First, during software development, architectural requirements and descriptions are usually laid out at the beginning of the development cycle, but are rarely reiterated and detailed as the project evolves. Second, to use MAISA to find patterns, one needs to first extract a set of UML diagrams (including both structural and behavioral diagrams) from source; however, how to recover system behavior is still ongoing research in the UML community.

¹ The Blob pattern describes the lack of OO design, which requires refactoring techniques to break the blob into object components. However, this work identifies the “Blob” when unsynchronized shared memory is found using the UML component and sequence diagrams.

3. MOTIVATING EXAMPLE
The Singleton pattern is probably the most commonly used pattern. It is generally perceived to be the simplest pattern to detect [33, 27], since it does not require analyzing its interaction with other classes. The intent of the Singleton pattern is to ensure that a class has only one instance [18]. However, verifying this intent is not an easy task.

The key features (which we later refer to as the “Singleton class structure” in Section 4.2) to implement the Singleton pattern in Java include: a private constructor (so that no other class, inside or outside of its package, can instantiate the Singleton class); a private static variable, instance, that holds the Singleton instance; and a public static getInstance() method that returns a Singleton class type. The getInstance() method serves as a global access point to
instance. Some pattern detection tools, such as FUJABA and Reference [27], stop here and conclude that an implementation of the Singleton pattern is found. Other work described previously in Section 2 did not describe how their tools detect the Singleton pattern. However, the same structural requirements also apply to creational patterns that place no restriction on the number of instances being created.

Consider the following variations of implementing the Singleton pattern. If instance is initialized statically, then it is trivial to make sure that instance is not created again by any methods declared in the Singleton class. Now, if lazy initialization is used (instance created on first call to getInstance()), then it requires knowing when and how the Singleton instance gets created. Further, if more conditions are involved in getInstance() (e.g., getInstance() will not return instance unless it is not currently in use), then it requires more intelligence to figure out under what circumstances instance gets created and returned.

We have tested FUJABA on various programs with class definitions resembling the Singleton structure, but that behave differently in getInstance(), such as returning an entirely new instance upon each call. We found that FUJABA falsely identified these programs as correct instances of the Singleton pattern. Also, the use of dynamic analysis to recognize such variations will not be helpful in verifying the intent. Dynamic analysis can spot a different address (object reference) being returned by getInstance(), so it can conclude the absence of the pattern. However, it cannot prove that a class implements the Singleton pattern simply because getInstance() seems to be returning the same address for a certain period of time. Thus, a different approach is needed.

4. GoF PATTERNS RECLASSIFIED
The GoF book [18] illustrates 23 common design patterns and categorizes them based on their purposes and scopes. Purposes have three categories: creational, structural, and behavioral. Creational patterns focus on how objects get created. Structural patterns focus on class organization by roles using structural relationships, such as class inheritances, interface hierarchies, and attribute associations. Behavioral patterns focus on separating object responsibilities based on polymorphism and delegation. Patterns are grouped by their scopes into either object or class patterns. Class patterns deal with relationships between classes. This relationship is established statically during compile-time through inheritance. Object patterns deal with dynamic relationships between objects during runtime. Most patterns described in the GoF book are object patterns.

Based on the GoF categorization, some researchers [13, 34] believe structural patterns can be identified based on only structural relationships and require the least effort to analyze. Creational patterns come next, since statements of object creation can be easily detected. Behavioral patterns are considered the most difficult to detect, since analysis on the behavior in the method body is required. However, that view is not entirely accurate. For example, the Singleton pattern (a creational pattern) requires not only detecting the existence of object creation, but it also requires verifying the behavior of the method body that creates and returns the Singleton instance (see Section 3). The Flyweight pattern (a structural pattern) requires behavioral analysis to verify whether all flyweight objects in the flyweight pool are singletons and are created on demand (see Section 4.3). The Template Method and Visitor patterns (both behavioral patterns) define their behavior in the class definitions, which can be identified based on structural analysis (see Section 4.2). While this categorization is useful for programmers, it is not helpful for pattern detection.

Instead of using purposes and scopes, patterns should be categorized, in the reverse-engineering sense, by how they are implemented syntactically and semantically. Some patterns are designed to structurally decouple classes and objects, which is related to the syntactic aspect of programming; other patterns require specific actions implemented in the method bodies, which involves the semantic aspect of programming. Thus, we divide the GoF patterns based on their structural and behavioral resemblances into five categories:

• Patterns that are already provided in the language

• Patterns that can be detected using syntax-based analysis

• Patterns that can be detected using semantic-based analysis

• Patterns that are domain-specific

• Patterns that are only generic concepts

Figure 2 illustrates this categorization of the GoF patterns implemented in Java. The squares are the design patterns, and the ovals are structural sub-patterns (the building blocks of the design patterns; some of the sub-patterns here are used in References [24, 31]). The unboxed text indicates the search criteria along the edge to another design pattern. The following sections are organized based on this categorization. Each section discusses the common characteristics and search strategies of the patterns in a category.

4.1 Language-provided Patterns
Design patterns are so widely used today that many languages (such as Java, Python, etc.) and packages (such as JDK, STL, etc.) implement some of the common design patterns to make programming easier. Java provides the Iterator (as in java.util.Enumeration, java.util.Iterator, and the for-each loop) and Prototype (as the clone() method in java.lang.Object) patterns. In practice, developers tend to use such built-in facilities to efficiently and effectively build software systems. Such pattern instances can be recognized by matching specific names.

4.2 Syntax-based Patterns
Patterns in this category can be identified syntactically. Syntactic structures include declarations, generalization, association, and delegation relationships. At this stage, only the existence of these characteristics is important. Analyzing exactly when, how, and why these structures take effect during runtime occurs in the next stage, where semantic-based analysis takes place. Patterns that can be identified at this stage are: Bridge, Composite, Adapter, Facade, Proxy, Template Method, and Visitor. In particular, the Bridge, Composite, and Template Method patterns have been successfully identified in previous work (see Table 1). Our approach uses similar strategies, but modified for Java.
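
To illustrate what a check for the mere existence of syntactic characteristics looks like, the sketch below tests a class for the Singleton class structure of Section 3: only private constructors, a private static field of the class's own type, and a public static method returning that type. It is only an illustration under stated assumptions. It uses java.lang.reflect on loaded classes for brevity, whereas PINOT analyzes source code inside a modified Jikes compiler, and the names LazySingleton, FakeSingleton, and SingletonStructureCheck are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Lazily initialized Singleton: structure and behavior both match the pattern.
class LazySingleton {
    private static LazySingleton instance;
    private LazySingleton() {}
    public static LazySingleton getInstance() {
        if (instance == null) {
            instance = new LazySingleton();
        }
        return instance;
    }
}

// Same structure, but an entirely new instance is returned upon each call.
class FakeSingleton {
    private static FakeSingleton instance;
    private FakeSingleton() {}
    public static FakeSingleton getInstance() {
        instance = new FakeSingleton();
        return instance;
    }
}

class SingletonStructureCheck {

    /** Purely syntactic test: checks that the structure exists, not how it behaves. */
    static boolean hasSingletonStructure(Class<?> c) {
        Constructor<?>[] constructors = c.getDeclaredConstructors();
        if (constructors.length == 0) {
            return false; // interfaces and primitives declare no constructors
        }
        for (Constructor<?> k : constructors) {
            if (!Modifier.isPrivate(k.getModifiers())) {
                return false; // some other class could instantiate c
            }
        }
        boolean hasInstanceField = false;
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isPrivate(m) && Modifier.isStatic(m) && f.getType() == c) {
                hasInstanceField = true;
            }
        }
        boolean hasAccessor = false;
        for (Method method : c.getDeclaredMethods()) {
            int m = method.getModifiers();
            if (Modifier.isPublic(m) && Modifier.isStatic(m) && method.getReturnType() == c) {
                hasAccessor = true;
            }
        }
        return hasInstanceField && hasAccessor;
    }

    public static void main(String[] args) {
        System.out.println("LazySingleton has structure: " + hasSingletonStructure(LazySingleton.class));
        System.out.println("FakeSingleton has structure: " + hasSingletonStructure(FakeSingleton.class));
        System.out.println("FakeSingleton returns one instance: "
                + (FakeSingleton.getInstance() == FakeSingleton.getInstance()));
    }
}
```

Both classes pass the structural test even though FakeSingleton creates a new object on every call, which is the kind of case that a structure-only check cannot separate from a real Singleton and that the later semantic-based stage has to resolve.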
Figure 2: A Reclassification for Reverse Engineering of the 23 GoF Patterns

The Visitor pattern provides a way to define a new operation to be performed on an object structure without changing the classes of the elements on which it operates [18]. The syntactic structures involved are: a method declaration accept (e.g., void accept(Visitor v)), defined in the element class to invite a visitor; and a method invocation visit (e.g., v.visit(this)), where an element exposes itself to the visitor.

The Object Adapter (adapter vs. adaptee), Facade (facade vs. subparts), and Proxy (proxy vs. real) patterns share a common goal: to define a new class to hide other class(es) for system integration or simplification. We will refer to the Object Adapter pattern as the Adapter pattern. The Adapter and Proxy patterns each hide one class, whereas the Facade pattern hides multiple classes (to be distinguished from the Adapter pattern). By “hiding”, we mean the hidden classes are not directly accessed (by reference or delegation) from others except for the one that is hiding.

Some basic syntactic structures also need to be identified for detecting semantic-based patterns in the next stage. The Singleton class structure is based on the syntactic features described in Section 3. The context-interface association, virtual delegation, factory interface, and the 1:N aggregation are also identified for further semantic-based analysis. The next section further discusses these structures.

4.3 Semantic-based Patterns
The syntax-based analysis (above) identifies the structural aspect of a pattern. Patterns in the semantic-based category require verifying pattern behavior defined in method bodies. PINOT uses data-flow, control-flow, and state-machine analyses to verify when, how, and why certain behavior occurs.

The Singleton pattern is based on a Singleton class structure described in Section 3. Then, forward data-flow analysis is applied to verify that only a single instance is created for this Singleton class. Further, our previous work identified whether the Double-checked Locking (DCL) pattern is used in conjunction with the Singleton pattern [30].

The Abstract Factory and Factory Method patterns share a common syntactic structure, which we call “factory interface”. The factory interface contains at least one abstract method that returns another abstract type to indicate that an object will be returned and that the returned object’s actual type is not determined until runtime. Backward data-flow analysis is applied for each factory method candidate to verify that a newly created object is returned.

The State and Strategy patterns share the same syntactic structure, which we call the “context-interface” structure. Each has a concrete context class that keeps a reference to an abstract class representing a state or strategy. For the Strategy pattern, the strategy reference is decided by the context class, whereas for the State pattern, the state reference is modified by the concrete state classes. The two patterns are distinguished by using inter-procedural analysis to trace the write access on this reference.

The Observer² (subject vs. listeners), Mediator (mediator vs. colleagues), and Flyweight (factory vs. flyweights) patterns are based on a 1:N aggregation relationship. Each pattern involves a reference that keeps a collection of objects. The Observer and Mediator patterns differ in their communication styles. The subject in the Observer pattern pushes information to the listeners, whereas the mediator in the Mediator pattern acts like a communication hub relaying information between the colleagues. Control-flow analysis is used to verify whether a method invocation to a listener or a colleague is embedded in a loop that traverses the entire set. If such behavior is absent and the communication is bidirectional, then a candidate pattern instance is identified as a Mediator pattern. The subject class detected here is often an “observable” class, which serves as a relay between the actual subject class and the set of listeners. Thus, a subject is further identified by checking the call dependents of the observable class.

² Based on the push-model communication.

The objects in the collection of the Flyweight pattern are unique (by some intrinsic state) shared objects. A getFlyweight(key) method in the Flyweight pattern maintains the collection, making sure that a new object is only created and returned on request. A newly created flyweight object is assigned a unique key, which is used for subsequent retrieval. A possible getFlyweight(key) method is identified by its method declaration, which indicates that a flyweight object type is returned. State-machine analysis is applied here to verify the behavior in getFlyweight(key). Thus, the statements in the method body of getFlyweight(key) are translated to a statechart. The statechart should match a template statechart for the Flyweight pattern.

The Chain of Responsibility and Decorator patterns each involve a chain of objects. These linked objects are subclasses of the same superclass. Both patterns share a common syntactic structure: when a method is invoked in an object, the object may invoke the same virtual method in its adjacent object. The difference between the two patterns is whether this method invocation happens conditionally. If so, then a Chain of Responsibility is found, since a request gets passed along the chain when needed. Otherwise, the method invocation is mandatory, which indicates layers of decorators must take place along the chain. Control-flow analysis is applied to determine whether or not the method invocation occurs conditionally.

4.4 Generic Concepts
While useful in practice, the Builder and Memento patterns are only generic concepts lacking traceable implementation patterns. The Builder pattern is a creational pattern that separates the building logic from the actual object creation, so that the building logic is reusable [18]. In practice, this pattern is often used for system bootstrapping, in which object creation may not be involved in the initial configuration. The Builder pattern was detected in Reference [13] (as shown in Table 1) with an 86% false positive rate. The Memento pattern “captures and externalizes an object’s internal state so that the object can be restored to this state later” [18]. However, the pattern neither defines the representation for a state nor the requirement of a data structure for the memo pool. To our knowledge, this pattern has not been included in any of the pattern detection tools we listed in Table 1. Generic concepts are difficult to detect, because they lack definite structural and behavioral aspects.

4.5 Domain-specific Patterns
Some patterns in the GoF book build on top of other patterns and are specialized for a particular domain. Such patterns include the Interpreter and Command patterns. The Interpreter pattern uses the structure of the Composite pattern and the behavior of the Visitor pattern. Based on this formation and a grammar of a language, the Interpreter pattern interprets the language. We consider this pattern a special case of realizing the Composite and the Visitor patterns. The Command pattern is basically a realization of the Bridge pattern that separates the user interface from the actual implementation for command execution. The Command pattern also suggests incorporating the Composite pattern to support multi-commands and undoable operations and using the Memento pattern to store the history of executed commands. We consider these patterns domain-specific. Such patterns are possible to detect (although no previous work has targeted them), but their detection will require more complicated analysis techniques or heuristics.

5. PINOT
Current pattern detection tools have one or more of the following drawbacks:

• Purely structural analysis can be limiting. Analysis based on limited structural relationships is not sufficient to analyze more complicated patterns and implementation variations.

• Dynamic analysis can be misleading. As indicated in our motivating example (Section 3), dynamic analysis captures system behavior, but it is not practical in verifying the logic of a program.

• A semi-automatic approach is not practical. Requiring human intervention to aid the process of pattern recovery means the reverse engineer must have a fair amount of knowledge of the source code being analyzed, which defeats the purpose of a reverse engineering tool.

³ The purpose of XMI is to define a simple interchange of metadata between modeling tools (based on OMG-UML) and modeling repositories (OMG-MOF) in a distributed heterogeneous environment.

To overcome these drawbacks, we are developing PINOT (Pattern INference and recOvery Tool), a fully automated pattern detection tool using pure static analysis that combines structural and semantic analysis. PINOT requires no extra runtime data or human intervention during the detection process. To use PINOT, users only need to provide the location of the top directory that stores the Java files. Then PINOT automatically recovers the pattern instances in the source code. PINOT outputs its results as text or XMI (XML Metadata Interchange³ [10]), which users can
view with UML editors. Currently, PINOT supports Ar- the Object Adapter and Proxy patterns. They argue that
goUML [1]. the Object Adapter pattern is indistinguishable from a class
that simply refers to another class [12, 15]. However, call-
5.1 Implementation Overview dependents analysis was not considered for these patterns to
PINOT is a modification of Jikes [5] (a Java compiler writ- verify whether the hidden classes are intended to be hidden.
ten in C++). The benefits of using a compiler include: Hidden classes are not necessarily inner or private classes.
visibility and accessibility checking, resolution of variable Instead, they are designed to be hidden and to be accessed
aliasing, simple query of class dependencies and generaliza- through an additional access layer. Thus, the results from
tion, and a direct mapping from each symbol (such as class, analyzing call dependents is only relative and depend on the
method, and variable symbols) to the corresponding decla- completeness of the analyzed source code. For example, ex-
ration and definition in the abstract syntax tree (AST). We tensible source packages (such as Java AWT and JHotDraw)
chose Jikes instead of javac because Jikes is an open-source are not considered complete: they require user source code
project, and PINOT will be open-source too. We also in- to form a complete program. Thus, our experiment ana-
vestigated other alternatives with which to build PINOT, lyzed solely the source packages; the results (Section 5.3)
such as static program analysis tools and parsers. However, show the pattern instances within each package.
these tools do not provide the flexibility we need to go back The java.util.Observable implements a push-model Ob-
to the AST when further semantic analysis is necessary. server pattern that provides an internal data structure to
PINOT uses existing Jikes symbol table information and store its listeners and a fixed subject-listeners communica-
additional information: a DelegationTable to record each tion mechanism. The order in which notifications will be de-
method invocation, which is helpful in identifying pat- livered is unspecified [3]. Since the Observer pattern can be
terns that rely on delegation; and a ReadAccessTable and a applied to various contexts in practice, different implemen-
WriteAccessTable, which record read and write accesses for tation of internal data structures and communication mech-
each variable and method. For behavioral analysis, we use anisms are desired. Thus, PINOT detects common variants
data-flow and control-flow analyses to simulate and analyze of this pattern.
execution paths, and extract statecharts from a method’s An 1:N aggregation can be implemented as an ar-
AST. Each pattern requires different combination of struc- ray, linked-list, or using collection classes derived from
tural and semantic analysis. java.util.Collection. PINOT currently identifies arrays and
Java collection classes. Identifying linked-list structures re-
5.2 Detecting the GoF Patterns quires shape analysis, which is not currently implemented in
PINOT. Reference [20] identifies the Mediator pattern using
Based on the reclassification discussed Section 4, PINOT
both static and dynamic analyses. However, the dynamic
focuses on detecting syntax- and semantic-based patterns.
analysis is used to verify the correctness of the pattern’s
The rest of this section discusses the implementation vari-
syntactic structure. Reference [33] detects a pull-model Ob-
ants for detecting certain patterns.
server pattern (which we consider an example of the Medi-
Reference [20] detects the Visitor pattern using a combina-
ator pattern) using only static analysis. Reference [20] de-
tion of static and dynamic analyses. Our syntax-based anal-
tects a push-model Observer pattern using both static and
ysis is the same as their static analysis. However, they use
dynamic analyses. Their static analysis returns a list of can-
dynamic analysis for type checking, whereas we use Jikes’
didate methods defined in a subject class that adds, removes,
existing static analysis.
and notifies listeners. The dynamic analysis then monitors
Detecting the Adapter, Facade, and Proxy patterns, re-
the behavior based on this assumption. However, they did
quires analysis on call dependents (Section 4.2). The search
not enforce loop iteration for notifying the listeners. PINOT
strategies for detecting each of these patterns involve the
does not analyze the actual add or remove methods, because
following steps: 1. check the types between the hiding class
these methods come for free whenever an 1:N aggregation
vs. the hidden class(es); 2. verify that the hidden object(s)
exists. The distinctive behavior for the Observer pattern is
are created locally in the hiding class; 3. analyze call graphs
its communication style.
to verify all communications to the hidden objects. In Step
The Flyweight pattern relies on a key-value data structure
1, PINOT checks the types of the hiding and hidden classes
to store and retrieve unique flyweight objects, which is de-
based on their pattern definition. In this case, the proxy
fined as java.util.Dictionary in Java. Identifying subclasses of
and the real classes must be subclasses of the same super
this abstract dictionary class should facilitate pattern detec-
type. However, the adapter/facade and adaptee/subparts
tion. However, PINOT does not (yet) recognize this variant
are from separate class hierarchies. In particular, we enforce
implementation.
the type of the adaptee in the Adapter pattern to be a con-
Although not restricted for the Decorator pattern defined
crete instead of abstract. This requirement disambiguates
in the GoF book, PINOT reduces the false positive rate by
the Adaptor pattern from the Strategy pattern. Steps 2 and
enforcing the decorate(...) operation in the decorator class
3 are based on the information obtained by PINOT. PINOT
to do more than just invoking the decorated class’s deco-
keeps track of how variables get initialized. If a particular
rate(...) operation. Reference [15] treats the Decorator pat-
variable is initialized by class creation, then PINOT records
tern as syntax-based pattern that is structurally similar to
the variable’s possible concrete type. PINOT also records
the Composite pattern (1:1 aggregation), which did not ana-
a set of call dependents for each class. Thus, PINOT can
lyze the decorate(...) operation. Reference [31] defines both
perform set operation on the two sets of call dependents be-
structural and behavioral aspects of the Decorator pattern,
tween the hiding and hidden classes. In the Facade pattern,
but restricts the decorate(...) operation to be defined in the
the set of call dependents for the hidden classes is the union
base decorator class.
of each hidden class.
Past attempts have partially identified or even excluded
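The criterion that separates Chain of Responsibility from Decorator (conditional versus mandatory forwarding, Section 4.3), together with the requirement that a decorator do more than forward the call, can be illustrated with a small Java sketch. The interface Component and the classes Core, Handler, and Decorator are hypothetical names chosen for illustration; they are not taken from PINOT or from any analyzed package. The sketch shows only the source shapes being distinguished, not PINOT's control-flow analysis.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common supertype shared by every link in the chain. */
interface Component {
    void handle(String request, List<String> log);
}

/** Terminal object at the end of either chain. */
class Core implements Component {
    @Override
    public void handle(String request, List<String> log) {
        log.add("core:" + request);
    }
}

/** Chain of Responsibility shape: the call to next is guarded by a condition. */
class Handler implements Component {
    private final String accepted;
    private final Component next;

    Handler(String accepted, Component next) {
        this.accepted = accepted;
        this.next = next;
    }

    @Override
    public void handle(String request, List<String> log) {
        if (accepted.equals(request)) {
            log.add("handled:" + accepted);   // the request stops here
        } else {
            next.handle(request, log);        // forwarded only when needed
        }
    }
}

/** Decorator shape: the call to inner always happens, plus extra work. */
class Decorator implements Component {
    private final String name;
    private final Component inner;

    Decorator(String name, Component inner) {
        this.name = name;
        this.inner = inner;
    }

    @Override
    public void handle(String request, List<String> log) {
        log.add("decorated-by:" + name);      // behavior added by this layer
        inner.handle(request, log);           // unconditional forwarding
    }
}

class ChainVersusDecorator {
    static List<String> runChain(String request) {
        Component chain = new Handler("a", new Handler("b", new Core()));
        List<String> log = new ArrayList<>();
        chain.handle(request, log);
        return log;
    }

    static List<String> runDecorators(String request) {
        Component stack = new Decorator("outer", new Decorator("inner", new Core()));
        List<String> log = new ArrayList<>();
        stack.handle(request, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runChain("b"));        // [handled:b]
        System.out.println(runDecorators("x"));   // [decorated-by:outer, decorated-by:inner, core:x]
    }
}
```

In Handler the recursive call sits inside a branch, so some requests never reach the adjacent object; in Decorator every path through handle reaches inner.handle, and the extra log entry is what distinguishes it from a plain forwarding wrapper.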
5.3 Results

We tested PINOT on the following Java source packages: Java AWT 1.3, JHotDraw [4], javac 1.4, and Apache Ant. Table 2 shows the number of classes, lines of code, number of files, and the total processing time for each package. The total number of classes also includes inner and anonymous classes. The total processing time includes the time to parse the source code, construct symbol tables, record association and delegation relationships, analyze pattern instances, and finally print out the results. We ran PINOT on each of these packages on a Linux machine with a 1.4GHz Intel processor and 512MB of memory. Table 3 shows the number of pattern instances recovered by PINOT. The patterns are listed in the order of the categorization from the GoF book to show the system design from a forward-engineering perspective. We manually verified all the pattern instances detected by PINOT and found them to be correct implementations of the patterns. However, we have not yet determined the false negative rate of PINOT. The rest of this section gives an overview of the detected pattern instances. Due to space limitations, it focuses on Java AWT and just touches on the other packages. Reference [9] contains a comprehensive list of pattern instances reported by PINOT.

Package     Number of Classes   KLOC    Number of Files   Time (sec)
java.awt    485                 142.8   345               18.31
JHotDraw    464                 71.7    484               21.09
javac       190                 33.8    66                6.93
ant         526                 72.4    232               15.81

Table 2: Timing Results

                     Packages
                     java.awt   JHotDraw   javac   ant
Creational
  Abstract Factory   7          4          0       0
  Factory Method     1          6          2       0
  Singleton          3          1          0       1
Structural
  Adapter            3          0          3       8
  Bridge             3          27         0       1
  Composite          1          4          0       3
  Decorator          0          6          1       12
  Facade             0          13         1       3
  Flyweight          0          0          0       2
  Proxy              5          11         1       16
Behavioral
  CoR                1          0          0       0
  Mediator           3          0          4       1
  Observer           4          25         2       0
  State              0          2          0       0
  Strategy           10         44         0       12
  TemplateMethod     3          6          0       3
  Visitor            0          0          1       0

Table 3: Results of Pattern Instances Recovered

PTIDEJ [19] and FUJABA [24] have been tested on the Java AWT package, as shown in Table 1. However, Reference [19] neither specified which Java AWT version was used nor illustrated any accuracy or performance results for PTIDEJ. FUJABA [24, 25, 29] was tested on the entire Java AWT version 1.3 package, but the authors neither made clear FUJABA’s coverage of pattern implementations nor stated whether the results of pattern instances were comprehensive⁴. Further, PINOT is significantly faster than FUJABA. FUJABA took approximately 22 minutes (on a Pentium III 933MHz processor with 1GB of memory) to analyze the entire Java AWT package, excluding the time for parsing [24]. However, we are not certain if this time includes displaying the results graphically.

⁴ Reference [24] indicates that FUJABA reported only a constellation of classes related to java.awt.Component. Reference [27] assumes that FUJABA covers all 23 GoF pattern definitions.

Based on our experimentation with FUJABA, we discovered that the tool works on small-scale programs, but has limited pattern recognition capability on larger programs. For example, we were unable to reproduce the reported FUJABA results on Java AWT. Therefore, we use the “Pattern Stories: JavaAWT” webpage (Stories) [8] as a reference to verify our results. This website is a discussion board where developers can share their experiences with design patterns. It discusses the pattern instances that appear in the Java AWT (unfortunately, no version number is mentioned). The pattern instances reported on this webpage were obtained from user experience; thus, the results are not comprehensive.

PINOT, FUJABA [24, 25, 29], and Stories report pattern instances of the Bridge (java.awt.Component vs. java.awt.ComponentPeer), Composite (java.awt.Container vs. java.awt.Component), and Strategy (java.awt.Container vs. java.awt.LayoutManager) patterns. PINOT detects two more Bridge pattern instances: java.awt.font.GlyphVector vs. java.awt.Shape and java.awt.MenuComponent vs. java.awt.MenuComponentPeer. PINOT detects nine more instances of the Strategy pattern. In particular, PINOT detects java.awt.dnd.DropTargetAutoScroller as the context class and java.awt.dnd.AutoScroll as the strategy interface. Another Strategy instance is formed by java.awt.image.renderable.RenderableImageProducer (as the context) and java.awt.image.renderable.RenderableImage (as the strategy interface).

Both PINOT and Stories report java.awt.Toolkit as a Singleton class. PINOT further identifies java.awt.Toolkit as implementing the multithreaded Singleton pattern without double-checked locking. The other two Singleton classes detected by PINOT are java.awt.GraphicsEnvironment and java.awt.image.ColorModel. Both PINOT and Stories report the Chain of Responsibility (CoR) pattern formed by java.awt.Container and java.awt.Component, where getForeground(), getBackground(), getFont(), getLocale(), etc., are the event handlers.

Both PINOT and Stories report java.awt.Container as a mediator and java.awt.Component as the type of colleagues constituting the Mediator pattern. PINOT also detects two other instances of the Mediator pattern: java.awt.EventDispatchThread (as a mediator) vs. java.awt.EventQueue (as the type of colleagues) and java.awt.MediaTracker vs. java.awt.image.ImageObserver. Although the name of the latter suggests an Observer pattern instance, PINOT detects communication from the colleagues to the mediator and identifies them as participants of the Mediator pattern.
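The "multithreaded Singleton pattern without double-checked locking" reported above for java.awt.Toolkit can be pictured with the following sketch. It assumes the phrase refers to a lazily created instance whose accessor is synchronized as a whole; the class LazyRegistry and the helper SingletonDemo are hypothetical names for illustration and do not reproduce the source of java.awt.Toolkit.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical singleton; the name is illustrative and unrelated to java.awt.Toolkit. */
class LazyRegistry {
    private static LazyRegistry instance;   // created on first request

    private LazyRegistry() { }

    /** The whole accessor is synchronized, so no double-checked locking is involved. */
    static synchronized LazyRegistry getInstance() {
        if (instance == null) {
            instance = new LazyRegistry();
        }
        return instance;
    }
}

class SingletonDemo {
    /** Calls getInstance() from several threads and counts the distinct objects seen. */
    static int distinctInstances(int threads) {
        Set<LazyRegistry> seen = Collections.synchronizedSet(new HashSet<LazyRegistry>());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(LazyRegistry.getInstance()));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(8));   // 1
    }
}
```

A detector looking for this variant would need to see a private constructor, a static field of the class's own type, and a synchronized static accessor whose only creation site is guarded by a null check on that field.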
Stories suggests java.awt.Toolkit as an Abstract Factory class responsible for creating all AWT components. However, PINOT requires analyzing the actual class creation behavior implemented in a concrete factory class to identify the Abstract Factory pattern. Each platform provides one concrete implementation of java.awt.Toolkit. For example, sun.awt.motif.MToolkit is provided for Linux, while sun.awt.windows.WToolkit is provided for Windows. Since our experiment focused on only java.awt.*, PINOT was unable to analyze sun.awt.motif.MToolkit. Among the Abstract Factory pattern instances detected, PINOT detects java.awt.font.GraphicAttribute as an Abstract Factory class, since java.awt.font.ImageGraphicAttribute creates and returns a java.awt.geom.Rectangle2D.Float (which implements the Rectangle interface, which describes a rectangle, of the same package) through its getBounds() method. PINOT detects java.awt.SystemColor as a Factory Method class with createContext(...) as its factory method to create and return a PaintContext used to generate a solid color pattern.

The Observer instance reported on Stories is java.util.Observable, which is not within java.awt.*. PINOT detects the Observer pattern formed by java.awt.Container and java.awt.Component, where addNotify(), removeNotify(), and applyComponentOrientation() in java.awt.Container are “notify” methods to its collection of Component objects. In particular, the addNotify() method makes this container displayable by connecting it to a native screen resource. This method causes any of its containing components to be made displayable [3], while removeNotify() does the opposite.

Stories reported update in java.awt.Component as a template method and paint as a primitive method. PINOT requires a template method to be declared “final” to distinguish it from the primitive methods, which can be reified. Thus, PINOT rejects this pattern instance. A Template Method instance detected by PINOT is in java.awt.Toolkit, where getSystemEventQueue() is the template method and getSystemEventQueueImpl() is a primitive method.

Absent from Stories but detected by PINOT are the Adapter and Proxy patterns. PINOT detects java.awt.dnd.DropTargetContext.TransferableProxy as a proxy class for java.awt.datatransfer.StringSelection, which is a utility class that enables easy data transfer of strings. From the way java.awt.datatransfer is set up, users need not directly call the methods in StringSelection [7]. PINOT also detects FillAdapter as an adapter that adapts java.awt.geom.GeneralPath (which can be used to describe arbitrary shapes composed of line and curve segments) to java.awt.BasicStroke (which defines properties that control how lines are drawn).

Stories reported java.awt.LayoutManager as an instance of the Flyweight pattern. However, PINOT rejects this instance because no flyweight pool is detected.

JHotDraw uses some well-known design patterns [17], which motivated us to test PINOT on it. PINOT detects four design patterns in JHotDraw, as shown in Table 3. The particular pattern instances detected correspond to those reported in Java World [17], and we verified them manually.

PINOT detects the Visitor pattern in javac, which is typical for a compiler. However, our initial prototype of PINOT does not recognize complicated user-defined data structures; thus, PINOT fails to detect the Composite pattern, which exists in javac. It turns out that javac uses its own List (instead of java.util.List) to form the Composite pattern for its internal tree structure. PINOT detects various structural patterns used in Ant.

6. CONCLUSION AND FUTURE WORK

This paper discusses how pattern recognition significantly helps program understanding and the state-of-the-art pattern detection tools. Our contributions include: reclassifying the GoF patterns to facilitate pattern recognition and implementing PINOT, a fully automated pattern detection tool that is faster and more accurate than existing tools. Our future work with PINOT will: expand its pattern recognition capability; explore its use to detect design patterns in specific application domains; and extend its overall usability.

The current prototype of PINOT recognizes the common implementation variants for each design pattern. However, most design patterns are based on certain syntactic structures (Section 4.2), which also allow many implementation variants. We want PINOT to recognize more common user-defined data structures by incorporating more complicated static analysis techniques, shape analysis, template matching, etc.

The GoF book describes the general design patterns used in common object-oriented software design. However, more domain-specific design patterns have been proposed to solve problems in specific application domains. In particular, we are interested in exploring design patterns for high performance computing (HPC). So far, PINOT recognizes syntax- and semantic-based patterns based on its current vocabulary. To recognize HPC patterns, we need to expand PINOT’s vocabulary by adding recognition of concurrent programming in Java, e.g., the underlying locking, synchronization, and communication mechanisms that are both language- and package-specific (e.g., java.util.concurrent). We have also discovered that the GoF patterns are oftentimes the underlying platform for domain-specific patterns. A known example is the java.util.Collections class, which provides static methods that return synchronized Collection objects backed by the specified Collection objects that are not thread-safe. This design is based on the combination of the Decorator [18] and the Thread-safe Wrapper Facade [28] patterns. Recognizing these related patterns can speed up the pattern detection process. We will focus on the patterns for concurrent and networked objects [28] and parallel programming [22].

To make PINOT more usable, we plan to provide a language to allow users to define new design patterns, complete the viewing capability for pattern instances, and explore other uses of PINOT. The pattern detection algorithms are hand-coded into our current prototype of PINOT. Some pattern detection tools provide their own detection languages that generally reflect internal searching mechanisms. For example, FUJABA adopts a UML-like language, but is limited to UML’s class diagrams for matching the source code AST. However, a good language should be intuitive but not tied to the internal detection mechanism. We want to allow users to use both the structural and behavioral diagrams in UML2 to describe different aspects of a design pattern. We also plan to provide graphical output for pattern instances that PINOT recognizes. We currently support viewing for ArgoUML [1]. Finally, we want to explore PINOT’s capability in detecting antipatterns or other source code bugs from the design level.
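The java.util.Collections example mentioned in this section can be made concrete with a short sketch. Collections.synchronizedList is part of the standard library; the class SynchronizedWrapperDemo and its fill helper are hypothetical names introduced only for illustration. The sketch wraps a plain ArrayList, which is not thread-safe, in a synchronized view and appends to it from several threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedWrapperDemo {
    /** Appends perThread elements to target from each of several threads; returns the final size. */
    static int fill(List<Integer> target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return target.size();
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>();                       // not thread-safe on its own
        List<Integer> wrapped = Collections.synchronizedList(plain);   // decorator backed by plain
        System.out.println(fill(wrapped, 4, 1000));                    // 4000
        System.out.println(plain.size());                              // 4000, same backing list
    }
}
```

The wrapper exposes the same List interface as the object it wraps and forwards every call to it while holding a lock, which is the Decorator-like shape a detector would have to recognize before classifying the thread-safety aspect.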
7. REFERENCES

[1] ArgoUML. http://argouml.tigris.org/.
[2] FUJABA. http://www.fujaba.de.
[3] Java™ 2 Platform, Standard Edition, v 1.3.1 API Specification. http://java.sun.com/j2se/1.3/docs/api/.
[4] JHotDraw. http://www.jhotdraw.org.
[5] Jikes. http://jikes.sourceforge.net/.
[6] MAISA. http://www.cs.helsinki.fi/group/maisa/.
[7] O’Reilly Java Foundation Classes in a Nutshell. http://www.unix.org.ua/orelly/java-ent/jfc/index.htm.
[8] Pattern Stories: JavaAWT. http://wiki.cs.uiuc.edu/PatternStories/JavaAWT.
[9] PINOT’s Runtime Results. http://www.cs.ucdavis.edu/~shini/research/pinot/results.
[10] XMI Metadata Interchange. http://www.omg.org/technology/documents/formal/xmi.htm.
[11] H. Albin-Amiot, P. Cointe, Y.-G. Guehéneuc, and N. Jussien. Instantiating and detecting design patterns: putting bits and pieces together. In Proceedings of the 16th Annual International Conference on Automated Software Engineering, pages 166–173. IEEE Computer Society Press, November 2001.
[12] G. Antoniol, R. Fiutem, and L. Cristoforetti. Design pattern recovery in object-oriented software. In Proc. of the 6th International Workshop on Program Comprehension, pages 153–160. IEEE Computer Society Press, June 1998.
[13] Z. Balanyi and R. Ferenc. Mining design patterns from C++ source code. In Proc. of the International Conference on Software Maintenance, pages 305–314. IEEE Computer Society Press, September 2003.
[14] J. Bansiya. Automating design-pattern identification – DP++ is a tool for C++ programs. Dr. Dobb’s Journal, 1998.
[15] K. Brown. Design reverse engineering and automated design pattern detection in SmallTalk. Master’s thesis, North Carolina State University, 1998.
[16] W. H. Brown, R. Malveau, H. W. McCormick, and T. J. Mowbray. Anti-Patterns: Refactoring Software, Architectures, and Projects in Crisis. John Wiley & Sons, New York, New York, 1998.
[17] E. Gamma. Becoming a Programming Picasso with JHotDraw. http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-jhotdraw.html.
[18] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, Massachusetts, 1995.
[19] Y.-G. Guehéneuc and N. Jussien. Using explanations for design patterns identification. In Proceedings of the 1st IJCAI Workshop on Modelling and Solving Problems with Constraints, pages 57–64, August 2001.
[20] D. Heuzeroth, T. Holl, G. Hogstrom, and W. Lowe. Automatic design pattern detection. In Proc. of the 11th IEEE International Workshop on Program Comprehension, pages 94–103. IEEE Computer Society Press, May 2003.
[21] R. Keller, R. Shauer, S. Robitaille, and P. Pagé. Pattern-based reverse-engineering of design components. In Proc. of the 21st International Conference on Software Engineering, pages 226–235. IEEE Computer Society Press, May 1999.
[22] T. G. Mattson, B. A. Sanders, and B. L. Massingill. Patterns for Parallel Programming. Addison-Wesley, Reading, Massachusetts, 2005.
[23] J. Niere, M. Meyer, and L. Wendehals. User-driven adaption in rule-based pattern recognition. Technical report, 2004.
[24] J. Niere, W. Shafer, J. P. Wadsack, L. Wendehals, and J. Welsh. Towards pattern-based design recovery. In Proc. of the 24th International Conference on Software Engineering, pages 338–348. IEEE Computer Society Press, May 2002.
[25] J. Niere, J. P. Wadsack, and L. Wendehals. Handling large search space in pattern-based reverse engineering. In Proc. of the 11th IEEE International Workshop on Program Comprehension, pages 274–279. IEEE Computer Society Press, May 2003.
[26] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A. I. Verkamo. Software metrics by architectural pattern mining. In Proceedings of the International Conference on Software: Theory and Practice, pages 325–332. 16th IFIP World Computer Congress, August 2000.
[27] I. Philippow, D. Streitferdt, M. Riebisch, and S. Naumann. An approach for reverse engineering of design patterns. Software Systems Modeling, pages 55–70, 2005.
[28] D. Schmidt, M. Stal, H. Rohnert, and F. Buschmann. Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. John Wiley & Sons, Chichester, England, 2000.
[29] J. Seemann and J. W. von Gudenberg. Pattern-based design recovery of Java software. In Proceedings of the 6th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 10–16. ACM Press, 1998.
[30] N. Shi and R. A. Olsson. Reverse Engineering of Design Patterns for High Performance Computing. http://charm.cs.uiuc.edu/patHPC/papers/shi.ps, 2005. Workshop on Patterns in High Performance Computing at the University of Illinois at Urbana-Champaign.
[31] J. M. Smith and D. Stotts. SPQR: flexible automated design pattern extraction from source code. In Proc. of the 18th IEEE International Conference on Automated Software Engineering, pages 215–224. IEEE Computer Society Press, October 2003.
[32] A. I. Verkamo, J. Gustafsson, L. Nenonen, and J. Paakki. Design patterns in performance prediction. In Proceedings of the Second International Workshop on Software and Performance, pages 143–144. ACM Press, September 2000.
[33] M. Vokáč. An efficient tool for recovering design patterns from C++ code. Journal of Object Technology, July/August 2005. To appear.
[34] L. Wendehals. Improving design pattern instance recognition by dynamic analysis. In Proc. of the ICSE Workshop on Dynamic Analysis (WODA), pages 29–32. IEEE Computer Society Press, May 2003.
