Coursework 2014/15
Amending and Experimenting with the BinarySearchTree class
Note that the development work for this coursework may be completed in
pairs (recommended, but you may NOT work in groups larger than two) or
individually.
If you work as a pair, this means working on all four parts of the submission
together. The report is an individual report, but the development, testing and
experimentation should be done jointly. It is not intended that you delegate
different parts of the implementation work to partners individually, though it
is acceptable to identify who will take the lead on each function or part
(that is, who types the code into the computer once the algorithm and logic
have been agreed, and who sits and watches over their shoulder). You may also
decide that where one person takes the lead on implementing a function, the
other person takes the lead on testing it.
Part A - Extended Binary Search Trees
In the lecture based on chapter 9 of the textbook by Collins we met the
concept of the external path length of a tree and the External Path Length
Theorem. In the context of trees the material as presented in the
book/lecture is not all that interesting as the lower bound given by the
theorem for the external path length is typically much lower than the actual
external path length for a tree. It is noted at the end of section 9.4 that for a
two-tree the external path length can be proven to be at least k log2 k
(where k is the number of leaves, which we will now call external nodes)
rather than (k/2) floor(log2 k), and this is a more useful lower bound.
Many theoretical treatments of binary trees are framed in terms of an
extended binary tree in which leaves are appended wherever possible to the
nodes of the original tree (this is sometimes called decorating the tree). The
original nodes in such an extended tree are called internal nodes (and in
diagrams are usually represented by circles) and the appended nodes are
called external nodes (in diagrams, usually represented by squares). Among
the benefits that these treatments provide is that every extended binary
search tree is a two-tree and each node represents a distinct search outcome:
a successful search terminates at an internal node and an unsuccessful
search terminates at an external node. This contrasts with the situation
where an unsuccessful search for an element in a binary search tree without
external nodes terminates when the search falls off the tree.
To illustrate and expand on some of these points, consider the extended
binary search tree formed by inserting the sequence: 72, 31, 44, 87, 37, 75,
60, 24 into an initially empty tree.
COMP09044 Cswk
For simplicity, assume that the range of allowed values in the tree is 1-100.
The tree is shown below.
Internal path length (I) = (2x1) + (3x2) + (2x3) = 14
External path length (E) = (1x2) + (4x3) + (4x4) = 30
For an extended tree, E = 2n + I. For this tree: 30 = (2x8) + 14
The n+1 external nodes are notionally labeled with the values
that could be inserted at that position (assuming no duplicates
are allowed):
A = 1-23, B = 25-30, C = 32-36, D = 38-43, E = 45-59
F = 61-71, G = 73-74, H = 76-86, I = 88-100.
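[Figure: the extended binary search tree. Root 72 has children 31 and 87;
31 has children 24 and 44; 44 has children 37 and 60; 87 has left child 75.
The external nodes A to I are attached, left to right, wherever a child link
would otherwise be empty.]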
The labelling of the external nodes is purely for illustrative purposes, and
indicates where a search for any of the labelled values would terminate (so,
for example, a search for the value 22 would terminate in the external node
labelled A). In completing this coursework you will not need to store any
information in an external node (apart from the link to its parent).
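The path length figures quoted for the example tree can be checked mechanically. The following is a minimal sketch only, not part of the BinarySearchTree<E> class: the class name PathLengthDemo and all of its methods are hypothetical, and a null child link stands in for an external node.

```java
// Hypothetical helper for checking the path length figures in the example.
final class PathLengthDemo {

    // Minimal internal node; a null child link represents an external node.
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;
    private int size;

    // Builds a tree by inserting the values in the order given.
    static PathLengthDemo of(int... values) {
        PathLengthDemo tree = new PathLengthDemo();
        for (int v : values) {
            tree.insert(v);
        }
        return tree;
    }

    // Standard insertion at a leaf position; duplicates are ignored.
    void insert(int value) {
        if (root == null) {
            root = new Node(value);
            size++;
            return;
        }
        Node current = root;
        while (true) {
            if (value == current.value) {
                return;
            }
            if (value < current.value) {
                if (current.left == null) {
                    current.left = new Node(value);
                    size++;
                    return;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = new Node(value);
                    size++;
                    return;
                }
                current = current.right;
            }
        }
    }

    int size() { return size; }

    // Sum of the depths of all internal nodes (the root has depth 0).
    int internalPathLength() { return internal(root, 0); }

    // Sum of the depths of all external nodes.
    int externalPathLength() { return external(root, 0); }

    private static int internal(Node n, int depth) {
        if (n == null) {
            return 0;
        }
        return depth + internal(n.left, depth + 1) + internal(n.right, depth + 1);
    }

    private static int external(Node n, int depth) {
        if (n == null) {
            return depth; // an external node sits at this depth
        }
        return external(n.left, depth + 1) + external(n.right, depth + 1);
    }
}
```

Building the tree with PathLengthDemo.of(72, 31, 44, 87, 37, 75, 60, 24) and comparing externalPathLength() with 2 * size() + internalPathLength() is one way to confirm the relationship E = 2n + I for this example.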
The first part of the coursework is to amend the BinarySearchTree<E> class
discussed in the lecture so that it represents an extended binary search tree.
Whenever a new value is inserted into the tree the result is that an external
node is replaced by an internal node containing that value and two external
nodes (where the external nodes are the left and right children of the new
internal node).
To ensure that the Entry<E> class supports both internal and external nodes,
add the following methods to the Entry<E> class:
public boolean isExternal(); /* returns true if this entry is an
external node and false otherwise */
class so that (except perhaps when inserting the first item into an empty
tree) an insertion always inserts into an external node in the extended tree.
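To make the idea of inserting into an external node concrete, here is a rough sketch under stated assumptions. It is not the BinarySearchTree<E> class from the lecture: the class name ExtendedTreeSketch, the makeInternal() helper and the external flag are all hypothetical, and the sketch converts the external node in place rather than replacing it with a new object. It also treats an empty tree as a single external node, which is only one possible way of handling the first insertion.

```java
// Hypothetical sketch of an extended binary search tree.
final class ExtendedTreeSketch<E extends Comparable<E>> {

    static final class Entry<E> {
        E element;                 // stays null while the entry is external
        Entry<E> left, right, parent;
        private boolean external = true;

        Entry(Entry<E> parent) { this.parent = parent; }

        // Returns true if this entry is an external node and false otherwise.
        public boolean isExternal() { return external; }

        // Assumed helper: turns this external node into an internal node
        // holding the element, with two new external children.
        void makeInternal(E element) {
            this.element = element;
            this.external = false;
            this.left = new Entry<>(this);
            this.right = new Entry<>(this);
        }
    }

    // Assumption: an empty tree is represented by one external node.
    private Entry<E> root = new Entry<>(null);
    private int size;

    // Walks down to the external node where the search ends, then converts it.
    boolean add(E element) {
        Entry<E> current = root;
        while (!current.isExternal()) {
            int cmp = element.compareTo(current.element);
            if (cmp == 0) {
                return false; // no duplicates
            }
            current = (cmp < 0) ? current.left : current.right;
        }
        current.makeInternal(element);
        size++;
        return true;
    }

    // A successful search ends at an internal node, an unsuccessful one
    // at an external node.
    boolean contains(E element) {
        Entry<E> current = root;
        while (!current.isExternal()) {
            int cmp = element.compareTo(current.element);
            if (cmp == 0) {
                return true;
            }
            current = (cmp < 0) ? current.left : current.right;
        }
        return false;
    }

    int size() { return size; }

    int externalCount() { return countExternal(root); }

    private static <T> int countExternal(Entry<T> e) {
        return e.isExternal() ? 1 : countExternal(e.left) + countExternal(e.right);
    }
}
```

With n values inserted, externalCount() should report n + 1, matching the count of external nodes given in the worked example.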
An alternative idea, involving the use of left and right rotations, is to revise
the insertion algorithm so that it inserts at the root, and this has the
potential advantage that the most recently inserted items are near the top of
the tree. If an application is more likely to search for elements that have
been inserted recently this approach should reduce the number of
comparisons required to find the element.
If new elements are inserted at the root, rather than in a leaf, the
sequence of insertions used for the extended tree above would result in
this tree:
Internal path length (I) = (1x1) + (2x2) + (4x3) = 17
External path length (E) = (1x1) + (8x4) = 33
E = 2n + I = (2x8) + 17 = 33
A = 1-23, B = 25-30, C = 32-36, D = 38-43, E = 45-59
F = 61-71, G = 73-74, H = 76-86, I = 88-100.
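[Figure: the extended tree after root insertion. Root 24 has external node A
as its left child and 60 as its right child; 60 has children 37 and 75;
37 has children 31 and 44; 75 has children 72 and 87. The external nodes
B to I are attached, left to right, below 31, 44, 72 and 87.]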
Note that the last four elements added were 37, 75, 60 and 24 and these are
indeed the values closest to the root, with the last added value being in the
root node.
As we have not discussed root insertion at all algorithmically (though we
have discussed rotation in connection with AVL trees) here is a starting point,
courtesy of Robert Sedgewick (the code in the two boxes is from chapter
twelve of his book, Algorithms in Java, Parts 1-4). His Node class has a field
called l for the left child and a field called r for the right child, and less(x,y)
returns true if x is less than y and false otherwise. Note that you will need to
think about how to deal with the parent, which Sedgewick's code does not
deal with (the book/lecture did include discussion of this so look it up in one
of those places if you need help). It is vital that the parent links are correctly
updated after a rotation, as the successor() method relies on them.
Program 12.18 Rotations in BSTs
The twin routines perform the rotation operation on a BST. A right rotation
makes the old root the right subtree of the new root (the old left subtree of
the root); a left rotation makes the old root the left subtree of the new root
(the old right subtree of the root).
private Node rotR(Node h) {Node x = h.l; h.l = x.r; x.r = h; return x;}
private Node rotL(Node h) {Node x = h.r; h.r = x.l; x.l = h; return x;}
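As a rough illustration of how rotations can be combined with parent links to insert at the root, consider the sketch below. It is written for this illustration and is not the code from Sedgewick's book or from the lecture: the class RootInsertSketch, its int-valued Node and the parentsConsistent() check are hypothetical, external nodes are omitted (null links are used instead), and duplicates are assumed not to occur.

```java
// Hypothetical sketch: root insertion using rotations that maintain parents.
final class RootInsertSketch {

    static final class Node {
        final int value;
        Node left, right, parent;
        Node(int value) { this.value = value; }
    }

    Node root;

    // Inserts the value as a leaf, then rotates it up to the root.
    void insertAtRoot(int value) {
        root = insert(root, value);
        root.parent = null;
    }

    private Node insert(Node h, int value) {
        if (h == null) {
            return new Node(value);
        }
        if (value < h.value) {
            h.left = insert(h.left, value);
            h.left.parent = h;
            return rotateRight(h);   // lift the left child above h
        } else {
            h.right = insert(h.right, value);
            h.right.parent = h;
            return rotateLeft(h);    // lift the right child above h
        }
    }

    // Right rotation: the left child x becomes the root of this subtree.
    // The caller is responsible for re-attaching x to h's old parent.
    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        if (x.right != null) {
            x.right.parent = h;
        }
        x.right = h;
        x.parent = h.parent;
        h.parent = x;
        return x;
    }

    // Left rotation: the mirror image of rotateRight.
    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        if (x.left != null) {
            x.left.parent = h;
        }
        x.left = h;
        x.parent = h.parent;
        h.parent = x;
        return x;
    }

    // Sum of the depths of all nodes (the root has depth 0).
    static int internalPathLength(Node n, int depth) {
        if (n == null) {
            return 0;
        }
        return depth + internalPathLength(n.left, depth + 1)
                     + internalPathLength(n.right, depth + 1);
    }

    // Checks that every node's parent link points at its actual parent.
    static boolean parentsConsistent(Node n, Node expectedParent) {
        if (n == null) {
            return true;
        }
        return n.parent == expectedParent
                && parentsConsistent(n.left, n)
                && parentsConsistent(n.right, n);
    }
}
```

Running the insertion sequence from Part A through insertAtRoot() and then calling parentsConsistent(root, null) is one way to check that no parent link was missed during a rotation.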
/**
 * Constructor - creates an Item and sets its value
 * @param value - the value for the Item
 */
public Item(Integer value) {
    this.value = value;
}

/**
 * The value of this Item
 * @return the Item's value
 */
public Integer value() {
    return value;
}

/**
 * Compares the value of this Item with that of other according to
 * the contract for Comparable.
 * Increments the count of comparisons.
 */
@Override
public int compareTo(Item other) {
    compCount++;
    return value.compareTo(other.value);
}

/**
 * Returns the total number of comparisons performed on instances
 * of type Item since the counter was last reset (or the total if
 * it has never been reset).
 * @return the count of calls to compareTo() and equals() for type
 * Item
 */
public static long getCompCount() {
    return compCount;
}

/**
 * Resets the count of comparisons to zero.
 */
public static void resetCompCount() {
    compCount = 0;
}
...
Investigate and comment on what advantage, if any, the tree that inserts at
the root has over the other two trees when searches involve only the most
recently inserted 10% of items.
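One possible shape for such an experiment is sketched below, using TreeSet as the collection under test. This is an illustration only: CountingItem is a hypothetical stand-in modelled on the Item class (its field declarations are assumed, since they are not shown above), and RecentSearchExperiment and its method name are invented for the sketch. The same loop could be pointed at either of your BinarySearchTree variants.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical stand-in for Item: counts every call to compareTo().
final class CountingItem implements Comparable<CountingItem> {
    private static long compCount = 0;   // assumed field
    private final Integer value;         // assumed field

    CountingItem(Integer value) { this.value = value; }

    @Override
    public int compareTo(CountingItem other) {
        compCount++;
        return value.compareTo(other.value);
    }

    static long getCompCount() { return compCount; }

    static void resetCompCount() { compCount = 0; }
}

// Hypothetical experiment driver.
final class RecentSearchExperiment {

    // Inserts n distinct items in a shuffled order, then counts the
    // comparisons used to search for the most recently inserted 10%.
    static long comparisonsForRecentSearches(int n, long seed) {
        List<CountingItem> items = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            items.add(new CountingItem(i));
        }
        Collections.shuffle(items, new Random(seed)); // fixed seed for repeatability

        TreeSet<CountingItem> set = new TreeSet<>();
        for (CountingItem item : items) {
            set.add(item);
        }

        // The tail of the list holds the most recently inserted items.
        List<CountingItem> recent = items.subList(n - n / 10, n);

        CountingItem.resetCompCount(); // ignore the cost of building the set
        for (CountingItem item : recent) {
            if (!set.contains(item)) {
                throw new IllegalStateException("inserted item not found");
            }
        }
        return CountingItem.getCompCount();
    }
}
```

Dividing the returned count by the number of searches gives an average number of comparisons per search, which can then be compared across the different trees for the same seed and the same n.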
Part D - Serialization
There is a separate handout providing information and guidance on
serialization. For this final part you are asked to have your
BinarySearchTree<E> class declare that it implements Serializable, so that
the state of a tree can be serialized by passing a tree to the writeObject()
method of an ObjectOutputStream, and deserialized by calling the
readObject() method of an ObjectInputStream and assigning the Object
returned to a BinarySearchTree variable. Marks will be awarded for this
working correctly and on the basis of the extent to which the serialized and
runtime forms of the tree are decoupled.
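One common way to decouple the serialized form from the runtime form is to mark the node structure transient and write only the elements. The sketch below illustrates that idea under stated assumptions and is not taken from the serialization handout: the class SerializableTreeSketch and its roundTrip() and preOrder() helpers are hypothetical, external nodes are omitted, and the element type is assumed to be serializable at run time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the stream holds a count and the elements only.
final class SerializableTreeSketch<E extends Comparable<E>> implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node<E> {   // deliberately not Serializable
        final E element;
        Node<E> left, right;
        Node(E element) { this.element = element; }
    }

    private transient Node<E> root;  // rebuilt in readObject()
    private transient int size;

    boolean add(E element) {
        if (root == null) { root = new Node<>(element); size++; return true; }
        Node<E> current = root;
        while (true) {
            int cmp = element.compareTo(current.element);
            if (cmp == 0) return false;
            if (cmp < 0) {
                if (current.left == null) { current.left = new Node<>(element); size++; return true; }
                current = current.left;
            } else {
                if (current.right == null) { current.right = new Node<>(element); size++; return true; }
                current = current.right;
            }
        }
    }

    int size() { return size; }

    List<E> preOrder() {
        List<E> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static <T> void collect(Node<T> n, List<T> out) {
        if (n == null) return;
        out.add(n.element);
        collect(n.left, out);
        collect(n.right, out);
    }

    // Writes the size followed by the elements in pre-order.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(size);
        for (E element : preOrder()) out.writeObject(element);
    }

    // Re-inserting in pre-order with leaf insertion rebuilds the same shape.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            @SuppressWarnings("unchecked")
            E element = (E) in.readObject();
            add(element);
        }
    }

    // Serializes the tree to memory and reads it straight back.
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> SerializableTreeSketch<T> roundTrip(SerializableTreeSketch<T> tree) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(tree);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableTreeSketch<T>) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the stream contains no node objects, the internal representation (for example, adding external nodes or parent links) could change without altering the serialized format. Note that this sketch rebuilds the tree with leaf insertion; a tree that inserts at the root would need a different rebuilding strategy to preserve its shape.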
Summary and Marking Scheme
1. Part A (25%)
a. Update the class to implement an extended binary search tree.
b. Document your updates.
c. Write an application to test your class and document your test
results.
d. Put this class in a package called part1.
2. Part B (20%)
a. Copy your class into a new package called part2 and update the
class to provide the option of inserting at the root.
b. Write an application to test your class when inserting at the root
and document your test results.
c. Run your tests that involve insertion again with a tree instance
that inserts using the standard algorithm (inserting in a leaf) to
confirm that the normal insertion mechanism still performs as
expected with the changes that you have made.
3. Part C (15% - for the approach taken and the data, but many of the
marks for the report will relate to your discussion of the results)
a. Write an application to investigate the performance of the
revised BinarySearchTree class produced in part B, as outlined
above. (In your report, contrast the performance with that of an
instance of TreeSet and discuss the results).
4. Part D (10%)
a. Implement serialization for your BinarySearchTree class and test
that you can serialize and deserialize an instance of the class.
Produce and submit an individual report (30%) on the work covering
the above points and critically appraising your work (if you did this as a
pair, this means that each of you should submit your own report). Note
that the report must be included. As noted above, for parts of the task
that involve your comparing things and discussing things, the comparison
and the discussion should be included in the report and the marks for the
report reflect this.
Note that you must include an acknowledgement in your report for any
sources you have used (on the web, textbooks etc) in developing or
commenting on design, code and in your testing, and any source code used
from sources other than the course textbook should be indicated in the code
by a comment at the beginning and end of the code stating the source from
which it was taken. If you receive any advice from anyone you should
acknowledge this also and indicate what part of the work it related to. You
should not share any of the code that you produce with anyone other than
your partner.
What You Should Submit
Each of you should submit the work, whether you worked as part of a
pair or on your own. Submit your report (as a Word or rich text format
document, or as a PDF) in a zip file which also contains all the Java source
code that you have written (you do not need to upload the Eclipse project
files; just the source code files will be fine). Include your Banner ID, and
that of your partner if you worked as a pair, in your report, but do not
include your name anywhere in the submission. Submit all the files for your
submission as a zip file named XXXX.zip (where XXXX are the last four
digits of your Banner ID) using the link in the Assignments section for this
module on Moodle, by 23:00 on Monday 12 January 2015.