
Data Structures and Algorithms

(CS210/ESO207/ESO211)
Lecture 8
Inventing a new data structure (Binary Search Tree)
Important Notice

There are basically two ways of introducing a new or innovative solution to a problem. One way is to just explain it without giving any clue as to how its inventor came up with it. Another way is to start from scratch and retrace the route that the inventor might have followed to arrive at the solution. This journey passes through various hurdles and questions, each hinting at a better insight into the problem, provided we have patience and an open mind. Which of these two ways is better?

I believe that the second way is better and more effective, and the current lecture is based on it. The data structure we shall invent is called a Binary Search Tree. It is one of the most fundamental and versatile data structures; we shall realize this fact many times during the course.
Recap from Previous Lecture
Data Structure List:

Mathematical Modeling of List

Implementation of List
- Array based implementation
- Link based implementation

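Link based Implementation for lists
(recap from previous lecture)
[Figure: a singly linked list, accessed through Head, in which each node stores a Value and the address of the next (or right) node; and a doubly linked list, in which each node additionally stores the address of the previous (left) node.]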
Doubly Linked List based implementation versus array based implementation of List
Time complexity per operation:
Operation            Array based   Doubly linked list based
IsEmpty(L)           O(1)          O(1)
Search(x,L)          O(n)          O(n)
Successor(p,L)       O(1)          O(1)
Predecessor(p,L)     O(1)          O(1)
CreateEmptyList(L)   O(1)          O(1)
Insert(x,p,L)        O(n)          O(1)
Delete(p,L)          O(n)          O(1)
MakeListEmpty(L)     O(1)          O(1)
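The O(1) entries for Insert(x,p,L) and Delete(p,L) in the doubly linked list column hold because, once we have a pointer p to a node, only a constant number of links change. A minimal Python sketch of this, assuming a circular list with a sentinel head node and assuming that Insert(x,p,L) places x right after p (the names `DNode`, `DList`, `insert_after` and `delete` are illustrative, not from the lecture):

```python
class DNode:
    """A node of a doubly linked list: a value plus links to both neighbours."""
    def __init__(self, value=None):
        self.value = value
        self.prev = None   # address of previous (left) node
        self.next = None   # address of next (right) node


class DList:
    """Circular doubly linked list with a sentinel head node (a choice of this sketch)."""
    def __init__(self):
        self.head = DNode()            # sentinel; stores no value
        self.head.prev = self.head     # an empty list links to itself
        self.head.next = self.head

    def is_empty(self):
        return self.head.next is self.head

    def insert_after(self, x, p):
        """Insert value x right after node p in O(1): only four links change."""
        q = DNode(x)
        q.prev, q.next = p, p.next
        p.next.prev = q
        p.next = q
        return q

    def delete(self, p):
        """Remove node p in O(1) by linking its two neighbours to each other."""
        p.prev.next = p.next
        p.next.prev = p.prev

    def values(self):
        out, p = [], self.head.next
        while p is not self.head:
            out.append(p.value)
            p = p.next
        return out


L = DList()
a = L.insert_after(10, L.head)   # list: 10
b = L.insert_after(30, a)        # list: 10 30
L.insert_after(20, a)            # list: 10 20 30
L.delete(b)                      # list: 10 20
print(L.values())                # [10, 20]
```

The sentinel removes the special cases of inserting into an empty list or deleting the first or last node.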
Arrays are very rigid.

A fundamental data structure problem
Maintain a telephone directory.
Operations:
- Search the phone # of a person with name x
- Insert a new record (name, phone #, ...)

Time complexity per operation:
Operation   Array based   Linked list based
Search      O(n)          O(n)
Insert      O(1)          O(1)

Can we improve it? Yes. Keep the array sorted according to the names and do binary search for x. Search then takes O(log n) time, but Insert takes O(n) time, since the records after the new one must be shifted.

Can we achieve an O(log n) bound for each operation?
It seems difficult to achieve O(log n) time complexity for the Insert operation using arrays, due to their rigidity, which we have seen. So let us focus on doubly linked lists to explore whether it is possible to achieve an efficient search time using them.
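The sorted-array option discussed above (binary search for Search, shifting elements for Insert) can be seen in a short Python sketch. It stores only names for brevity, and `binary_search` and `sorted_insert` are hypothetical helper names: finding the position takes O(log n) comparisons, but making room for a new record shifts up to n elements.

```python
def binary_search(names, x):
    """Return the index of x in the sorted list `names`, or -1; O(log n) comparisons."""
    lo, hi = 0, len(names) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if names[mid] == x:
            return mid
        elif names[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def sorted_insert(names, x):
    """Insert x keeping `names` sorted. Locating the slot is O(log n), but
    shifting the later elements one place to the right costs O(n)."""
    lo, hi = 0, len(names)
    while lo < hi:                      # find the first index i with names[i] >= x
        mid = (lo + hi) // 2
        if names[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    names.append(None)                  # grow the array by one slot
    for i in range(len(names) - 1, lo, -1):
        names[i] = names[i - 1]         # the O(n) shift that makes arrays rigid
    names[lo] = x


directory = ["Asha", "Dev", "Meera", "Ravi"]
sorted_insert(directory, "Kiran")
print(directory)                          # ['Asha', 'Dev', 'Kiran', 'Meera', 'Ravi']
print(binary_search(directory, "Meera"))  # 3
```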
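Reorganizing the doubly linked list

Let us keep the elements of the list sorted according to the unique ID numbers of persons. For ease of understanding, we use the numbers (unique IDs) of persons instead of their names, so a search query will have as input the unique ID of a person instead of his/her name.

[Figure: the sorted doubly linked list 2, 5, ..., 46, ..., 83, 96, occupying positions 1 to n and accessed through head.]

Can we reduce the number of steps of Search(i,L) to n/2?

[Figure: head now points to the middle element 46 at position n/2; its left and right links lead to 41 (position n/2-1) and 53 (position n/2+1), so a search walks through only one half of the list.]

Can we reduce the number of steps of Search(i,L) to n/4?

[Figure: each half is again entered at its middle: the left link of 46 leads to 28 (position n/4), whose neighbours are 25 and 31, and the right link of 46 leads to 67 (position 3n/4), whose neighbours are 65 and 73.]

Take a pause for a few minutes to imagine the structure that will emerge if we pursue this idea further.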
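A new data structure emerges
[Figure: continuing the reorganization, head points to 46; the left and right links of 46 point to 28 and 67; those of 28 point to 5 and 35, and those of 67 point to 49 and 83; finally, 5, 35, 49 and 83 point to 2 and 25, 31 and 41, 48 and 53, and 73 and 96, respectively.]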
To analyze it mathematically, let us remove the irrelevant details.

Spend some time over this data structure to see its characteristics. What does it look like?
Nature is a source of inspiration
[Figures: a real tree with its root, joints and leaves labelled, and an abstract tree with its root, nodes, edges and leaves labelled.]
Binary Tree: A mathematical model
Definition: A collection of nodes is said to form a binary tree if
1. There is exactly one node with no incoming edge. This node is called the
root of the tree.
2. Every node other than root node has exactly one incoming edge.
3. Each node has at most two outgoing edges.
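The three conditions of the definition can be checked mechanically, which is a handy way to approach the question that follows. A small Python sketch, assuming a tree is given as a set of nodes and a list of directed edges (u, v) meaning u → v; the function name `is_binary_tree` is ours:

```python
from collections import Counter


def is_binary_tree(nodes, edges):
    """Check conditions 1-3 of the definition for the directed edges (u, v), u -> v."""
    indeg = Counter(v for _, v in edges)     # number of incoming edges per node
    outdeg = Counter(u for u, _ in edges)    # number of outgoing edges per node
    roots = [n for n in nodes if indeg[n] == 0]
    # Only edge counts are checked, exactly as in conditions 1-3.
    return (len(roots) == 1                              # 1. exactly one root
            and all(indeg[n] <= 1 for n in nodes)        # 2. others have one incoming edge
            and all(outdeg[n] <= 2 for n in nodes))      # 3. at most two outgoing edges


tree = is_binary_tree({"a", "b", "c", "d"}, [("a", "b"), ("a", "c"), ("c", "d")])
two_parents = is_binary_tree({"a", "b", "c"}, [("a", "c"), ("b", "c")])
three_children = is_binary_tree({"a", "b", "c", "d"}, [("a", "b"), ("a", "c"), ("a", "d")])
print(tree, two_parents, three_children)   # True False False
```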

Which of these are not binary trees?
Binary Tree: some terminology
If there is an edge from node u to node v, then u is called the parent of v and v is called a child of u.
For a node v, subtree(v) denotes the binary tree consisting of v together with all its descendants.
The height of a binary tree T is the maximum number of edges on a path from the root to any leaf node in T.

[Figure: a binary tree T on the nodes p, q, r, u, v, x, y, z, with subtree(x), subtree(y) and subtree(v) outlined.]
parent(y) = x
parent(v) = u
children(y) = {r}
children(x) = {y, q}
height(T) = 4
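A Python sketch of the height definition and of subtree sizes, assuming a minimal `Node` class with `left` and `right` fields (our own naming). We take the height of an empty tree to be -1, so that a single node has height 0.

```python
class Node:
    """A binary tree node: a value plus links to at most two children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def height(v):
    """Maximum number of edges on a path from v down to a leaf (-1 for an empty tree)."""
    if v is None:
        return -1
    return 1 + max(height(v.left), height(v.right))


def size(v):
    """Number of nodes in subtree(v)."""
    if v is None:
        return 0
    return 1 + size(v.left) + size(v.right)


# Example tree:    a
#                 / \
#                b   c
#                     \
#                      d
T = Node("a", Node("b"), Node("c", None, Node("d")))
print(height(T), size(T))   # 2 4
```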
Varieties of Binary trees
[Figure: two binary trees T1 and T2, one perfectly balanced and the other skewed.]
We call a binary tree perfectly balanced if, for every node, the numbers of nodes in the subtrees of its two children differ by at most 1.
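Perfect balance can be tested in one bottom-up pass that returns, for every node, both the size of its subtree and whether that subtree is perfectly balanced, so the whole check takes O(n) time. A Python sketch (the `Node` class and the function names are illustrative):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def size_and_balance(v):
    """Return (number of nodes in subtree(v), whether subtree(v) is perfectly balanced)."""
    if v is None:
        return 0, True
    left_size, left_ok = size_and_balance(v.left)
    right_size, right_ok = size_and_balance(v.right)
    ok = left_ok and right_ok and abs(left_size - right_size) <= 1
    return 1 + left_size + right_size, ok


def is_perfectly_balanced(T):
    return size_and_balance(T)[1]


balanced = Node(2, Node(1), Node(3))
skewed = Node(1, None, Node(2, None, Node(3)))
print(is_perfectly_balanced(balanced), is_perfectly_balanced(skewed))   # True False
```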
Binary Search Tree (BST)
Definition: A binary tree T storing values is said to be a binary search tree if, for each node v in T:
- If left(v) <> NULL, then value(v) > value of every node in subtree(left(v)).
- If right(v) <> NULL, then value(v) < value of every node in subtree(right(v)).
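Note that the definition compares value(v) with every node of a subtree, not just with its children. The following Python sketch checks exactly that by returning the minimum and maximum value of each subtree; in the second example, 50 > 28 holds locally, yet the tree fails because 50 lies in the left subtree of 46. The names are illustrative.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def bst_info(v):
    """For a non-empty subtree(v), return (is it a BST?, min value, max value)."""
    ok, lo, hi = True, v.value, v.value
    if v.left is not None:
        left_ok, left_min, left_max = bst_info(v.left)
        # value(v) must exceed every value in subtree(left(v)), i.e. its maximum
        ok = ok and left_ok and left_max < v.value
        lo = left_min
    if v.right is not None:
        right_ok, right_min, right_max = bst_info(v.right)
        # value(v) must be below every value in subtree(right(v)), i.e. its minimum
        ok = ok and right_ok and right_min > v.value
        hi = right_max
    return ok, lo, hi


def is_bst(T):
    return T is None or bst_info(T)[0]


good = Node(46, Node(28, Node(5), Node(35)), Node(67))
bad = Node(46, Node(28, None, Node(50)), Node(67))   # 50 sits in the left subtree of 46
print(is_bst(good), is_bst(bad))   # True False
```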






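[Figure: the tree from above, now viewed as a binary search tree, with head pointing to its root 46.]
Look at the similarity of a BST with a sorted array: reading the nodes from left to right gives the values in sorted order. This similarity will be exploited to search for an element efficiently in a BST.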
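Searching in a Binary Search Tree: Search(T,x)
[Figure: the BST T from above.]
Example: Search(T,33), i.e., searching for 33 in T. The search visits 46, 28, 35 and 31; since 33 > 31 and 31 has no right child, 33 is not present in T.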
Search(T,x)
{ p ← T;
  Found ← FALSE;
  while(Found = FALSE & p <> NULL)
  { if(value(p) = x) Found ← TRUE;
    else if(value(p) < x) p ← right(p);
    else p ← left(p);
  }
  return p;
}
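A direct Python translation of the Search(T,x) pseudocode, with NULL written as `None`. The tree `T` hard-codes our reading of the 15-node example figure (root 46); the `Node` class is an assumption of this sketch.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def search(T, x):
    """Search(T,x): return the node storing x, or None (NULL) if x is absent."""
    p = T
    found = False
    while not found and p is not None:
        if p.value == x:
            found = True
        elif p.value < x:
            p = p.right          # x can only be in the right subtree
        else:
            p = p.left           # x can only be in the left subtree
    return p


# Our reading of the 15-node example tree from the figures.
T = Node(46,
         Node(28, Node(5, Node(2), Node(25)), Node(35, Node(31), Node(41))),
         Node(67, Node(49, Node(48), Node(53)), Node(83, Node(73), Node(96))))

print(search(T, 33))          # None: the path 46, 28, 35, 31 falls off the tree
print(search(T, 53).value)    # 53
```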
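Insertion in a Binary Search Tree: Insert(T,x)
[Figure: the BST T from above, with 50 added as a new leaf.]
Example: Insert(T,50), i.e., inserting 50 into T. As in Search(T,50), we follow the path 46, 67, 49, 53; the search falls off the tree at left(53) = NULL, so 50 is attached there as the left child of 53.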
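One possible insertion routine in Python, a sketch rather than an official solution to the homework below: it walks down exactly as Search(T,x) does and attaches x where the search falls off the tree. It assumes unique keys, as with the unique IDs above, and ignores duplicates. On our reading of the example tree, 50 ends up as the left child of 53.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def insert(T, x):
    """Insert x into the BST rooted at T and return the root (new if T was empty)."""
    new = Node(x)
    if T is None:
        return new
    p = T
    while True:
        if x == p.value:
            return T                 # x already present: nothing to do
        if x < p.value:
            if p.left is None:       # the search falls off here: attach x
                p.left = new
                return T
            p = p.left
        else:
            if p.right is None:
                p.right = new
                return T
            p = p.right


T = Node(46,
         Node(28, Node(5, Node(2), Node(25)), Node(35, Node(31), Node(41))),
         Node(67, Node(49, Node(48), Node(53)), Node(83, Node(73), Node(96))))
T = insert(T, 50)
print(T.right.left.right.left.value)   # 50, i.e. the left child of 53
```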
Homework 1
Write pseudocode for the Insert(T,x) operation, similar to the pseudocode we wrote for Search(T,x).
Homework 2

Design an algorithm for the following problem:

Given a sorted array A storing n elements, build a perfectly
balanced binary search tree storing all elements of A in O(n)
time.
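One way to meet the O(n) bound, sketched in Python: make the middle element the root and recurse on the two halves. Passing index bounds instead of slicing keeps the work per element constant, so the total time is O(n); the two halves differ in size by at most 1 at every node. The names `build_balanced` and `Node` are illustrative.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def build_balanced(A, lo=0, hi=None):
    """Return a perfectly balanced BST on the sorted elements A[lo:hi] in O(hi - lo) time."""
    if hi is None:
        hi = len(A)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2                          # the middle element becomes the root
    return Node(A[mid],
                build_balanced(A, lo, mid),       # left half: smaller elements
                build_balanced(A, mid + 1, hi))   # right half: larger elements


A = [2, 5, 25, 28, 31, 35, 41, 46, 48, 49, 53, 67, 73, 83, 96]
T = build_balanced(A)
print(T.value, T.left.value, T.right.value)   # 46 28 67
```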
Homework 3
What does the following algorithm accomplish?
Traversal(T)
{ p ← T;
  if(p = NULL) return;
  else { if(left(p) <> NULL) Traversal(left(p));
         print(value(p));
         if(right(p) <> NULL) Traversal(right(p));
       }
}

Ponder over this algorithm for a few minutes to understand what it is doing. You might like to try it out on some example of a BST.

Answer: It prints the elements of the binary search tree T in increasing order of their values. What is its time complexity?
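A Python version of Traversal(T) that collects values into a list instead of printing them (our change, to make the output checkable). It visits each node exactly once, which is why the traversal takes O(n) time on n nodes. The small recursive `insert` is only there to build an example BST; all names are illustrative.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(p, x):
    """Recursive BST insert; returns the root of the updated subtree."""
    if p is None:
        return Node(x)
    if x < p.value:
        p.left = insert(p.left, x)
    else:
        p.right = insert(p.right, x)
    return p


def traversal(p, out):
    """Traversal(T): left subtree, then value(p), then right subtree."""
    if p is None:
        return
    traversal(p.left, out)
    out.append(p.value)          # stands in for print(value(p))
    traversal(p.right, out)


T = None
for x in [46, 28, 67, 5, 35, 49, 83, 2, 25, 31, 41, 48, 53, 73, 96]:
    T = insert(T, x)
out = []
traversal(T, out)
print(out)   # [2, 5, 25, 28, 31, 35, 41, 46, 48, 49, 53, 67, 73, 83, 96]
```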
A question
Time complexity of Search(T,x) and Insert(T,x) in a binary search tree T = ??
Answer: O(Height(T)), since each operation follows a single path downward from the root.
Time complexity of searching and inserting in a perfectly balanced binary search tree on n nodes
[Figure: the perfectly balanced BST T from above.]
O(log n) time, since a perfectly balanced binary tree on n nodes has height O(log n).
Time complexity of searching and inserting in a skewed binary search tree on n nodes
[Figure: a skewed BST T2 storing 11, 14, 18, 19, 23, 39 and 48.]
O(n) time!
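A hurdle on our way
Since elements may be inserted into an (initially empty) binary search tree in arbitrary order, we may get a skewed tree whose height is proportional to n. For example, inserting the elements in increasing order produces a tree in which every node has only a right child. This forces O(n) time complexity for the Search and Insert operations.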
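The hurdle is easy to reproduce. In the Python sketch below (all names are illustrative), `insert` returns the depth of each new node, so the height of the resulting tree is the largest depth seen. Inserting 0, 1, ..., 99 in increasing order gives height 99, whereas inserting 127 keys middle-first (the order in which a perfectly balanced tree would be built) keeps the height at 6.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(T, x):
    """Plain (unbalanced) BST insert. Returns (root, depth of the new node in edges)."""
    if T is None:
        return Node(x), 0
    p, depth = T, 0
    while True:
        depth += 1
        if x < p.value:
            if p.left is None:
                p.left = Node(x)
                return T, depth
            p = p.left
        else:                          # equal keys would go right; none occur here
            if p.right is None:
                p.right = Node(x)
                return T, depth
            p = p.right


def height_after_inserting(keys):
    """Insert keys one by one into an empty BST and return the final height."""
    T, h = None, 0
    for k in keys:
        T, d = insert(T, k)
        h = max(h, d)
    return h


def middle_first(lo, hi):
    """Yield lo, ..., hi-1 with the middle first, then each half recursively."""
    if lo < hi:
        mid = (lo + hi) // 2
        yield mid
        yield from middle_first(lo, mid)
        yield from middle_first(mid + 1, hi)


print(height_after_inserting(range(100)))             # 99: a skewed tree
print(height_after_inserting(middle_first(0, 127)))   # 6: perfectly balanced
```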


Therefore, we need to modify our algorithm so that the height of the tree remains logarithmic in the number of nodes in the tree. How can we do it?


Since it is difficult to maintain a perfectly balanced BST efficiently (think it over), we maintain a partially balanced binary search tree instead.
Partially balanced Binary Search Tree
Terminology: Henceforth, the size of a binary tree means the number of nodes present in it.

Definition: A binary search tree T is said to be partially balanced at a node v if either
- subtree(v) consists of one or two nodes, or
- the ratio of the sizes of the subtrees rooted at the two children of v (larger to smaller) is at most 2.
If neither of these conditions holds at v, T is said to be partially imbalanced at node v. (A missing child counts as a subtree of size 0, so a node with three or more nodes in its subtree and only one child is partially imbalanced.)

Definition: A binary search tree T is said to be partially balanced if it is partially balanced at each node; otherwise T is called partially imbalanced.
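The definition translates into a test on the two child subtree sizes at each node. A Python sketch, assuming (as the size-ratio condition implies) that a missing child counts as a subtree of size 0; `balanced_at`, `size_and_balance` and `is_partially_balanced` are our names:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def balanced_at(left_size, right_size):
    """Partial balance at a node whose children's subtrees have the given sizes."""
    if 1 + left_size + right_size <= 2:      # subtree(v) has one or two nodes
        return True
    small, large = sorted((left_size, right_size))
    return small > 0 and large <= 2 * small  # ratio of the sizes is at most 2


def size_and_balance(v):
    """Return (size of subtree(v), whether T is partially balanced at every node in it)."""
    if v is None:
        return 0, True
    ls, lok = size_and_balance(v.left)
    rs, rok = size_and_balance(v.right)
    return 1 + ls + rs, lok and rok and balanced_at(ls, rs)


def is_partially_balanced(T):
    return size_and_balance(T)[1]


ok = Node(4, Node(2, Node(1)), Node(5))               # child sizes 2 and 1: ratio 2
not_ok = Node(5, Node(3, Node(2), Node(4)), Node(6))  # child sizes 3 and 1: ratio 3
print(is_partially_balanced(ok), is_partially_balanced(not_ok))   # True False
```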
Balancing BST periodically: preserving O(log n) height after each operation
Each node v in T maintains an additional field size(v), which is the number of nodes in subtree(v).

Keep the Search(T,x) operation unchanged.

Modify the Insert(T,x) operation as follows:
- Carry out the normal insert and update the size field of the appropriate nodes, namely those on the path from the root to the newly inserted node.
- If T becomes partially imbalanced at any node v, perfectly balance subtree(v).
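Perfectly balancing the subtree at a node v
[Figure: a node v at which T is partially imbalanced: the subtree of one child of v has size k, while that of the other child has size greater than 2k. After perfect balancing, the sizes of the two subtrees differ by at most 1.]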
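Putting the pieces together, here is one possible Python sketch of the modified Insert. Assumptions of this sketch, not of the lecture: keys are unique (duplicates are ignored), and when several nodes on the insertion path become imbalanced we rebuild the highest one, which also repairs all the nodes below it. Perfect balancing reuses the ideas of Homework 3 (an in-order traversal lists the nodes in sorted order) and Homework 2 (rebuild from the middle), relinking the existing nodes in O(size(subtree(v))) time. All names are illustrative.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.size = 1                      # size(v): number of nodes in subtree(v)


def size(v):
    return 0 if v is None else v.size


def balanced_at(v):
    """Partial balance at v, read off the stored size fields."""
    if v.size <= 2:
        return True
    small, large = sorted((size(v.left), size(v.right)))
    return small > 0 and large <= 2 * small


def flatten(v, out):
    """In-order traversal (as in Homework 3): append the nodes of subtree(v) in sorted order."""
    if v is not None:
        flatten(v.left, out)
        out.append(v)
        flatten(v.right, out)


def build(nodes, lo, hi):
    """Relink nodes[lo:hi] into a perfectly balanced BST (as in Homework 2)."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    v = nodes[mid]
    v.left = build(nodes, lo, mid)
    v.right = build(nodes, mid + 1, hi)
    v.size = hi - lo
    return v


def perfectly_balance(v):
    """Perfectly balance subtree(v) in O(size(v)) time; return its new root."""
    nodes = []
    flatten(v, nodes)
    return build(nodes, 0, len(nodes))


def insert(T, x):
    """Modified Insert(T,x); returns the (possibly new) root of T."""
    if T is None:
        return Node(x)
    path, p = [], T
    while p is not None:                   # normal BST search for the insertion point
        if x == p.value:
            return T                       # duplicates ignored (unique keys assumed)
        path.append(p)
        p = p.left if x < p.value else p.right
    parent = path[-1]
    if x < parent.value:
        parent.left = Node(x)
    else:
        parent.right = Node(x)
    for v in path:                         # only nodes on the path gain a descendant
        v.size += 1
    for i, v in enumerate(path):           # top-down, so the first hit is the highest
        if not balanced_at(v):
            new_v = perfectly_balance(v)
            if i == 0:
                return new_v               # the whole tree was rebalanced
            up = path[i - 1]
            if up.left is v:
                up.left = new_v
            else:
                up.right = new_v
            break
    return T


def verify(v):
    """Recompute size(v) from scratch, asserting the size fields and partial balance."""
    if v is None:
        return 0
    s = 1 + verify(v.left) + verify(v.right)
    assert s == v.size and balanced_at(v)
    return s


def height(v):
    return -1 if v is None else 1 + max(height(v.left), height(v.right))


T = None
for k in range(1000):                      # increasing order: worst case for a plain BST
    T = insert(T, k)
nodes = []
flatten(T, nodes)
print(verify(T), height(T))
print([v.value for v in nodes] == list(range(1000)))   # True: still a valid BST
```

Inserting keys in increasing order, the worst case for a plain BST, now keeps the tree partially balanced; `verify` recomputes every size field and checks the balance condition at each node.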
Notice that the modified Insert operation described above is not difficult to understand and implement. But it is quite nontrivial to show that this algorithm solves our problem. In particular, the following facts are not immediate:
- The height of a partially balanced BST on n nodes is O(log n).
- The total time spent in perfectly balancing the various partially imbalanced subtrees during a sequence of n insert operations is small.

Hopefully you have now realized that analyzing an algorithm is sometimes more difficult than just designing it. We shall see many such examples during this course and the next course (CS345).
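How to analyze this algorithm?
H(n): maximum height of a partially balanced BST on n nodes.
Question: How to show that H(n) = O(log n)?

If a tree T on n nodes is partially balanced, the maximum size of a subtree rooted at any child of root(T) will be ?? Answer: $\tfrac{2}{3}(n-1)$, because the subtrees of the two children together contain $n-1$ nodes and the larger one contains at most twice as many nodes as the smaller one.

H(1) = 0;
H(2) = 1;
H(3) = 1;
Hence, since the subtrees of the children of the root are themselves partially balanced,
$H(n) \le 1 + H\left(\tfrac{2}{3}n\right) \le 1 + 1 + H\left(\left(\tfrac{2}{3}\right)^2 n\right) \le \cdots = O(\log_{3/2} n)$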
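The bound can also be checked numerically. The Python sketch below (our own, with the hypothetical name `max_heights`) computes H(n) exactly for small n directly from the definition: for n ≥ 3 the root's two child subtrees have sizes a ≤ b with a + b = n - 1 and b ≤ 2a, and each of them is itself partially balanced. It then compares H(n) with $\log_{3/2} n$.

```python
import math


def max_heights(N):
    """Return H with H[n] = maximum height of a partially balanced BST on n nodes (1 <= n <= N)."""
    H = [0] * (N + 1)                  # H[0] is unused; H[1] = 0
    if N >= 2:
        H[2] = 1                       # two nodes: the root and one child
    for n in range(3, N + 1):
        best = 0
        for a in range(1, (n - 1) // 2 + 1):   # a = size of the smaller child subtree
            b = n - 1 - a                      # b = size of the larger one
            if b <= 2 * a:                     # partial balance at the root
                best = max(best, 1 + max(H[a], H[b]))
        H[n] = best
    return H


H = max_heights(1000)
print(H[1], H[2], H[3])                 # 0 1 1, matching the values above
print(H[1000], math.log(1000, 1.5))     # log_{3/2} 1000 is about 17.04
```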
How to analyze this algorithm?
Each Search and Insert takes O(log n) time, excluding the time spent in rebalancing. What about the time spent in perfectly balancing various subtrees?

Goal: Show that the time complexity for any arbitrary sequence of n insert and q search operations is O((n + q) log n). In other words, the average time per operation is O(log n).

Hints (answer these questions and then try to achieve the above goal):
- How many nodes get their size field changed during a single insert operation? Answer: O(log n).
- Starting from a perfectly balanced subtree(v), after how many increments in the size field of node v will we have to rebalance subtree(v)? Answer: at least (1/2)·size(subtree(v)).
- What is the time complexity of perfectly balancing subtree(v)? Answer: O(size(subtree(v))). Use ideas from Homework 2 and Homework 3 to show this.

Try to achieve the goal stated above (the hints will be very useful). We shall accomplish the same goal in a class next week.

