
CS 240 Tutorial 10 Notes

Range Trees: Used for range searching multi-dimensional data, e.g., given some points and ranges $[x_{lo}, x_{hi}]$, $[y_{lo}, y_{hi}]$, find all points (x, y) which satisfy

$x_{lo} \le x \le x_{hi}$ and $y_{lo} \le y \le y_{hi}$.


Brute-force method is to search all n points and check which lie in the range: O(n) cost.
Range trees are a way of preprocessing the data to make such searches more efficient.
If all the points lie in the range, even just outputting them takes Ω(n) time, so we can't hope to do
better than brute-force in the worst case.
But we can do better in the case when the output size (say k) is significantly smaller than n.
In 2 dimensions, a search in a range tree costs O(k + (log n)^2).
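As a baseline for comparison, the brute-force method described above can be sketched in a few lines of Python. The function name `brute_force_range` is an illustrative choice, and the sample points are the four points used in the examples of these notes.

```python
def brute_force_range(points, xlo, xhi, ylo, yhi):
    """Check every point against both ranges: O(n) work regardless of output size."""
    return [(x, y) for (x, y) in points
            if xlo <= x <= xhi and ylo <= y <= yhi]


points = [(1, 8), (2, 6), (3, 4), (5, 7)]
print(brute_force_range(points, 2.5, 6.5, 2.5, 6.5))  # [(3, 4)]
```

Every query touches all n points, even when only one of them is reported.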
1-dimensional range trees are balanced binary search trees where the points are stored in the leaves. To allow
binary search to work, each non-leaf contains the value of the largest leaf in its left child's subtree.
Example: Draw the perfectly balanced 1-dimensional range trees on the x and y coordinates of the points
(1, 8), (2, 6), (3, 4), (5, 7).
Answer:

        2                   6
      /   \               /   \
     1     3             4     7
    / \   / \           / \   / \
   1   2 3   5         4   6 7   8
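One way to build such a tree is sketched below in Python. The names `Node` and `build` are illustrative, and the sketch assumes the keys are already sorted and distinct. It gives the left half the extra key when the count is odd, which is one of several ways to keep the tree balanced.

```python
class Node:
    """Leaf: value is the stored key. Non-leaf: value is the largest key in the left subtree."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def build(keys):
    """Return (root, largest key in the tree) for a sorted, non-empty list of distinct keys."""
    if len(keys) == 1:
        return Node(keys[0]), keys[0]
    mid = (len(keys) + 1) // 2           # left half gets the extra key when odd
    left, left_max = build(keys[:mid])
    right, right_max = build(keys[mid:])
    return Node(left_max, left, right), right_max


x_root, _ = build([1, 2, 3, 5])
y_root, _ = build([4, 6, 7, 8])
print(x_root.value, x_root.left.value, x_root.right.value)  # 2 1 3
print(y_root.value, y_root.left.value, y_root.right.value)  # 6 4 7
```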

Example: Search for the range [2.5, 6.5] in both trees.


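Answer: Nodes in [brackets] are those on the search paths; leaves marked with * are those which are in
range:

Tree on x:

          [2]
         /   \
       1      [3]
      / \     /  \
     1   2  [3]* [5]*

Tree on y:

          [6]
         /   \
      [4]     [7]
      / \     /  \
   [4]*  6* [7]   8

In the x tree the leaves 3 and 5 are reported. In the y tree the leaves 4 and 6 are reported; the leaf 7 lies
on a search path but is out of range.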

In general, you want to output all the leaves which lie between the search paths, and possibly the two leaves
which lie on the search paths; check if these points lie in range manually. (Explain how pseudocode works.)
A range tree on two dimensions x and y is a range tree on x with leaves containing the points (x, y) and
non-leaves containing a pointer to a range tree on y (built only from the points which appear as descendants
of that non-leaf).

Example: Draw the perfectly balanced 2-dimensional range tree using the previous points.
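Answer: The primary tree on x stores the points in its leaves:

                 2
             /       \
           1           3
         /   \       /   \
     (1, 8) (2, 6) (3, 4) (5, 7)

Each non-leaf points to a range tree on y built from its descendants.

Tree associated with 2:

                 6
             /       \
           4           7
         /   \       /   \
     (3, 4) (2, 6) (5, 7) (1, 8)

Tree associated with 1:         Tree associated with 3:

          6                               4
        /   \                           /   \
    (2, 6) (1, 8)                   (3, 4) (5, 7)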
Range search works the same way as in the 1-dimensional case, except instead of just printing the leaves which
appear between the search paths, you search the associated trees of the subtrees which lie between the search
paths, using the next pair of range values.
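The 2-dimensional search described above might be sketched in Python as follows. All names (`Node`, `build`, `walk`, `query`) are illustrative, and the sketch assumes distinct x coordinates and distinct y coordinates. The same `walk` routine follows the two search paths at both levels; only the action taken on the subtrees between the paths differs.

```python
class Node:
    def __init__(self, key, left=None, right=None, point=None):
        self.key, self.left, self.right = key, left, right
        self.point = point    # set on leaves only
        self.assoc = None     # y-tree, set on non-leaves of the x-tree only


def build(points, dim):
    """Balanced range tree on coordinate dim (0 = x, 1 = y)."""
    pts = sorted(points, key=lambda p: p[dim])
    if len(pts) == 1:
        return Node(pts[0][dim], point=pts[0])
    mid = (len(pts) + 1) // 2
    # key = largest coordinate in the left subtree
    node = Node(pts[mid - 1][dim], build(pts[:mid], dim), build(pts[mid:], dim))
    if dim == 0:
        node.assoc = build(pts, 1)   # y-tree over this node's descendants
    return node


def leaves(node):
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)


def walk(node, lo, hi, on_leaf, on_subtree):
    """Follow both search paths for [lo, hi] (lo <= hi assumed)."""
    while node.point is None and not (lo <= node.key < hi):
        node = node.left if hi <= node.key else node.right
    if node.point is not None:
        on_leaf(node)
        return
    a, b = node.left, node.right      # the paths split here
    while a.point is None:            # path towards lo
        if lo <= a.key:
            on_subtree(a.right)
            a = a.left
        else:
            a = a.right
    on_leaf(a)
    while b.point is None:            # path towards hi
        if hi <= b.key:
            b = b.left
        else:
            on_subtree(b.left)
            b = b.right
    on_leaf(b)


def query(root, xlo, xhi, ylo, yhi):
    out = []

    def check(leaf):                  # leaves on a search path are tested by hand
        x, y = leaf.point
        if xlo <= x <= xhi and ylo <= y <= yhi:
            out.append(leaf.point)

    def search_y(sub):                # subtree whose x values are all in range
        if sub.point is not None:
            check(sub)
        else:
            walk(sub.assoc, ylo, yhi, check, lambda t: out.extend(leaves(t)))

    walk(root, xlo, xhi, check, search_y)
    return sorted(out)


root = build([(1, 8), (2, 6), (3, 4), (5, 7)], 0)
print(query(root, 2.5, 6.5, 2.5, 6.5))  # [(3, 4)]
```

Rebuilding each associated tree by sorting from scratch keeps the sketch short; it is not the most efficient way to construct the structure.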
Pseudocode for 1-dimensional range search:
rangesearch(node, lo, hi)
    loop
        if node is a leaf
            print node.value if lo <= node.value <= hi
            return
        else if lo <= node.value and hi <= node.value
            node = node.left
        else if lo > node.value and hi > node.value
            node = node.right
        else
            nodelo = node.left
            nodehi = node.right
            exit loop
    loop
        if nodelo is a leaf
            print nodelo.value if lo <= nodelo.value <= hi
            exit loop
        else if lo <= nodelo.value
            print all leaves in nodelo.right
            nodelo = nodelo.left
        else
            nodelo = nodelo.right
    loop
        if nodehi is a leaf
            print nodehi.value if lo <= nodehi.value <= hi
            exit loop
        else if hi <= nodehi.value
            nodehi = nodehi.left
        else
            print all leaves in nodehi.left
            nodehi = nodehi.right
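The pseudocode above could be rendered in Python roughly as follows. This is a sketch, not a reference implementation: `Node`, `build`, and `leaves` are illustrative helpers, results are collected in a list instead of printed, and the keys are assumed to be distinct with lo <= hi.

```python
class Node:
    """Leaf: value is the stored key. Non-leaf: value is the largest key in the left subtree."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def is_leaf(self):
        return self.left is None


def build(keys):
    """Return (root, largest key) for a balanced tree over sorted, distinct keys."""
    if len(keys) == 1:
        return Node(keys[0]), keys[0]
    mid = (len(keys) + 1) // 2
    left, left_max = build(keys[:mid])
    right, right_max = build(keys[mid:])
    return Node(left_max, left, right), right_max


def leaves(node):
    """All keys in the leaves below node, left to right."""
    if node.is_leaf():
        return [node.value]
    return leaves(node.left) + leaves(node.right)


def range_search(node, lo, hi):
    out = []
    # First loop: descend while both bounds lead the same way.
    while True:
        if node.is_leaf():
            return [node.value] if lo <= node.value <= hi else []
        if lo <= node.value and hi <= node.value:
            node = node.left
        elif lo > node.value and hi > node.value:
            node = node.right
        else:
            node_lo, node_hi = node.left, node.right   # the search paths split
            break
    # Second loop: follow the path for lo, reporting subtrees to its right.
    while not node_lo.is_leaf():
        if lo <= node_lo.value:
            out.extend(leaves(node_lo.right))
            node_lo = node_lo.left
        else:
            node_lo = node_lo.right
    if lo <= node_lo.value <= hi:
        out.append(node_lo.value)
    # Third loop: follow the path for hi, reporting subtrees to its left.
    while not node_hi.is_leaf():
        if hi <= node_hi.value:
            node_hi = node_hi.left
        else:
            out.extend(leaves(node_hi.left))
            node_hi = node_hi.right
    if lo <= node_hi.value <= hi:
        out.append(node_hi.value)
    return sorted(out)


x_root, _ = build([1, 2, 3, 5])
y_root, _ = build([4, 6, 7, 8])
print(range_search(x_root, 2.5, 6.5))  # [3, 5]
print(range_search(y_root, 2.5, 6.5))  # [4, 6]
```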

Note: Part of the definition of a range tree is that it is balanced, e.g., an AVL tree. This complicates adding
new points into a range tree. For example:
Before:

        2
      /   \
  (2, 3)    3
          /   \
     (3, 4)  (4, 5)

After adding (5, 6):

        2
      /   \
  (2, 3)    3
          /   \
     (3, 4)     4
              /   \
         (4, 5)  (5, 6)

After rotating:

               3
           /       \
         2           4
       /   \       /   \
  (2, 3) (3, 4) (4, 5) (5, 6)

And now the tree associated with 3 has to be set to the tree previously associated with 2, and the tree
associated with 2 has to contain (2, 3) and (3, 4).
In general,

      2                            3
    /   \                        /   \
   A      3        rotate      2       4
        /   \     ------->   /   \     |
       B      4             A     B    C
              |
              C

where A, B, and C denote subtrees, and the tree associated with 2 has to be set to the tree associated with A with the points in B added.
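The bookkeeping during a rotation can be illustrated with a small Python sketch. To keep it short, each node's associated structure is assumed to be a plain list of points sorted by y, standing in for a full range tree on y; `Node` and `rotate_left` are illustrative names.

```python
from heapq import merge


def by_y(p):
    return p[1]


class Node:
    def __init__(self, key=None, left=None, right=None, point=None):
        self.key, self.left, self.right, self.point = key, left, right, point
        if point is not None:
            self.assoc = [point]
        else:
            # Stand-in for the associated y-tree: descendants sorted by y.
            self.assoc = list(merge(left.assoc, right.assoc, key=by_y))


def rotate_left(node):
    """Rotate node's right child above it and repair the associated structures."""
    pivot = node.right
    node.right = pivot.left          # subtree B moves under the old root
    pivot.left = node
    pivot.assoc = node.assoc         # new root covers what the old root covered
    node.assoc = list(merge(node.left.assoc, node.right.assoc, key=by_y))  # A plus B
    return pivot


def leaf(p):
    return Node(point=p)


n4 = Node(4, leaf((4, 5)), leaf((5, 6)))
n3 = Node(3, leaf((3, 4)), n4)
n2 = Node(2, leaf((2, 3)), n3)
root = rotate_left(n2)
print(root.key, root.left.key, root.right.key)  # 3 2 4
print(root.left.assoc)                          # [(2, 3), (3, 4)]
```

The merge step touches every point in A and B, so a rotation near the root is no longer a constant-time operation.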
A3Q1. (a) Find a way to sort an array A[1..n] with O(log n) distinct elements in O(n log log n) time.
Idea: If one could find the frequency with which each element appears, then one could just sort the O(log n)
distinct elements and add enough copies of the elements to achieve the proper frequency.
How to find the correct frequencies? Easiest thing is to store them in an associative array, loop through A
and for each item encountered, increment its corresponding frequency.
Using an unordered array, finding the counter to increment costs O(log n) (the length of the associative array)
and there are O(n) increments, so this costs O(n log n).
Using an ordered array, finding the counter to increment costs O(log log n) by binary search, so the total
incrementing cost is O(n log log n). If the item hasn't been added to the array yet, inserting it costs
O(log n), but that only happens O(log n) times.
Sorting the distinct elements costs O(log n log log n), and outputting the final sorted answer with correct
frequency costs O(n). Total cost: O(n log log n).
Note: An AVL tree can also be used to implement the associative array.
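A possible Python sketch of the ordered-array idea is shown below. The name `sort_few_distinct` is illustrative. Because the distinct elements are kept in sorted order as they are inserted, this version needs no separate sorting step at the end.

```python
from bisect import bisect_left


def sort_few_distinct(a):
    keys = []     # distinct elements seen so far, kept sorted
    counts = []   # counts[i] is the frequency of keys[i]
    for item in a:
        i = bisect_left(keys, item)          # binary search over the distinct elements
        if i < len(keys) and keys[i] == item:
            counts[i] += 1
        else:
            keys.insert(i, item)             # linear in len(keys), once per distinct element
            counts.insert(i, 1)
    out = []
    for key, count in zip(keys, counts):
        out.extend([key] * count)            # emit each element with its frequency
    return out


print(sort_few_distinct([3, 1, 3, 2, 1, 3]))  # [1, 1, 2, 3, 3, 3]
```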
Alternate Idea: Use Quicksort with a 3-way partition

    < pivot | = pivot | > pivot

and pivoting on the median.
However, this is harder to analyze, and you need to know how to compute medians in linear time.
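The shape of the 3-way partition can be seen in the Python sketch below. It is not the linear-time version: `statistics.median_low` sorts its input internally, so it is used here only as a placeholder for a linear-time median selection, and `quicksort3` is an illustrative name.

```python
from statistics import median_low


def quicksort3(a):
    if len(a) <= 1:
        return list(a)
    pivot = median_low(a)                  # placeholder, NOT linear time
    less = [v for v in a if v < pivot]
    equal = [v for v in a if v == pivot]   # all copies of the pivot are finished at once
    greater = [v for v in a if v > pivot]
    return quicksort3(less) + equal + quicksort3(greater)


print(quicksort3([3, 1, 2, 3, 1, 2, 2]))  # [1, 1, 2, 2, 2, 3, 3]
```

Each level of recursion removes at least one distinct value from further consideration, which is where the dependence on the number of distinct elements comes from.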
(b) Sorting arrays with many duplicates is a special case of the sorting problem, and the Ω(n log n) bound
only applies to sorting in the general case. If you have extra information about what you are trying to sort
you can possibly beat the lower bound, as in this case.

A3Q2. (a) Pseudocode for finding the height of a binary tree:


height(node)
    if node is empty then
        return 0
    else
        return 1 + max(height(node.left), height(node.right))
Cost: height is called on every node in the tree, so cost is Ω(n).
Also, height is called exactly once for every node and empty child in the tree. Every node has at most 2
empty children, so there are O(n) total height calls, each costing O(1).
Thus total cost is Θ(n).
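In Python the same recursion might look like the sketch below, with `None` playing the role of the empty tree and a minimal illustrative `Node` class.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right


def height(node):
    if node is None:          # empty tree
        return 0
    return 1 + max(height(node.left), height(node.right))


tree = Node(Node(Node()), Node())
print(height(tree))  # 3
```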
(b) Pseudocode for finding the height of an AVL tree:
height(node)
    if node is empty then
        return 0
    else if node.balance <= 0
        return 1 + height(node.left)
    else
        // node.balance = 1
        return 1 + height(node.right)
Cost: height is called exactly once on each node along a single path from the root to a deepest leaf (going
left on ties), plus once on an empty child. In total, it is called Θ(height) times, each costing Θ(1). Since the
height of AVL trees is Θ(log n), the total cost is Θ(log n).
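A Python sketch of the same idea follows. It assumes the convention balance = height(right) - height(left), which is what the pseudocode above implies. For demonstration only, the illustrative `Node` constructor computes each balance with the full recursive height; a real AVL tree would maintain it during updates.

```python
def full_height(node):
    """Plain recursive height, used here only to set up balance factors."""
    if node is None:
        return 0
    return 1 + max(full_height(node.left), full_height(node.right))


class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        # Assumed convention: balance = height(right) - height(left).
        self.balance = full_height(right) - full_height(left)


def avl_height(node):
    """Follow the taller side only, so one root-to-leaf path is visited."""
    if node is None:
        return 0
    if node.balance <= 0:
        return 1 + avl_height(node.left)
    return 1 + avl_height(node.right)   # node.balance = 1


right_heavy = Node(Node(), Node(Node()))
print(right_heavy.balance, avl_height(right_heavy))  # 1 3
```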
Also, $\log n \in o(n)$, since by L'Hôpital's rule

$$\lim_{n \to \infty} \frac{\log n}{n} = \lim_{n \to \infty} \frac{1/n}{1} = 0.$$
