Data Structures and Algorithm Analysis
Dr. Malek Mouhoub
Computer Science Department University of Regina Fall 2010
Malek Mouhoub, CS340 Fall 2010
1. Algorithm Analysis
• 1.1 Mathematics Review
• 1.2 Introduction to Algorithm Analysis
• 1.3 Asymptotic notation and Growth of functions
• 1.4 Case Study
• 1.5 Data Structures and Algorithm Analysis
1.1 Mathematics Review
Exponents

• $X^A X^B = X^{A+B}$
• $\dfrac{X^A}{X^B} = X^{A-B}$
• $(X^A)^B = X^{AB}$
• $X^N + X^N = 2X^N \neq X^{2N}$
• $2^N + 2^N = 2^{N+1}$
Logarithms

• By default, logarithms used in this course are to the base 2.
• $X^A = B \Leftrightarrow \log_X B = A$
• $\log_A B = \dfrac{\log_C B}{\log_C A}$, for $A, B, C > 0$ and $A \neq 1$
• $\log AB = \log A + \log B$, for $A, B > 0$
• $\log A/B = \log A - \log B$
• $\log (A^B) = B \log A$
• $\log X < X$ for all $X > 0$
• $\log 1 = 0$, $\log 2 = 1$, $\log 1024 = 10$, $\log 1{,}048{,}576 = 20$
Summations

• $\sum_{i=1}^{N} a_i = a_1 + a_2 + \cdots + a_N$
• $\lim_{N\to\infty} \sum_{i=1}^{N} a_i = \sum_{i=1}^{\infty} a_i = a_1 + a_2 + \cdots$ (infinite sum)
• Linearity:
  – $\sum_{i=1}^{N} (c\,a_i + d\,b_i) = c \sum_{i=1}^{N} a_i + d \sum_{i=1}^{N} b_i$
  – $\sum_{i=1}^{N} \Theta(f(i)) = \Theta\left(\sum_{i=1}^{N} f(i)\right)$
• General algebraic manipulations:
  – $\sum_{i=1}^{N} f(N) = N f(N)$
  – $\sum_{i=n_0}^{N} f(i) = \sum_{i=1}^{N} f(i) - \sum_{i=1}^{n_0-1} f(i)$
• Geometric series:
  – $\sum_{i=0}^{N} A^i = \dfrac{A^{N+1} - 1}{A - 1}$
  – if $0 < A < 1$ then $\sum_{i=0}^{N} A^i \leq \dfrac{1}{1 - A}$
• Arithmetic series:
  – $\sum_{i=1}^{N} i = \dfrac{N(N+1)}{2} = \dfrac{N^2 + N}{2} \approx \dfrac{N^2}{2}$
  – $\sum_{i=1}^{N} i^2 = \dfrac{N(N+1)(2N+1)}{6} \approx \dfrac{N^3}{3}$
  – $\sum_{i=1}^{N} i^k \approx \dfrac{N^{k+1}}{k+1}$ for $k \neq -1$
    ∗ if $k = -1$ then $H_N = \sum_{i=1}^{N} \dfrac{1}{i} \approx \log_e N$
    ∗ the error in this approximation is $\gamma \approx 0.57721566$
Products

• $\prod_{i=1}^{N} a_i = a_1 \times a_2 \times \cdots \times a_N$
• $\log \left( \prod_{i=1}^{N} a_i \right) = \sum_{i=1}^{N} \log a_i$
Proving statements

• Proof by induction:
  1. Base case: establish that the theorem is true for some small value(s).
  2. Inductive hypothesis: assume the theorem is true for all cases up to some limit k.
  3. Given this assumption, show that the theorem is true for k + 1.
• Proof by counterexample: find an example showing that the theorem is not true.
• Proof by contradiction: assume that the theorem is false and show that this assumption implies that some known property is false; hence the original assumption was erroneous.
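As a worked instance of the inductive pattern above (an illustrative example, not from the slides), consider proving the arithmetic-series formula $\sum_{i=1}^{N} i = \frac{N(N+1)}{2}$:

```latex
\textbf{Base case.} For $N = 1$: $\sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Inductive hypothesis.} Assume $\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$ for some $k \ge 1$.

\textbf{Inductive step.}
\[
  \sum_{i=1}^{k+1} i \;=\; \frac{k(k+1)}{2} + (k+1)
                     \;=\; (k+1)\Bigl(\frac{k}{2} + 1\Bigr)
                     \;=\; \frac{(k+1)(k+2)}{2},
\]
which is the formula for $k+1$. \qed
```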
1.2 Introduction to algorithm analysis
The boss gives the following problem to Mr Dupont, a freshly hired BSc in computer science (to test him, or maybe just for fun):

T(1) = 3, T(2) = 10, T(n) = 2T(n−1) − T(n−2)

What is T(100)?
Mr Dupont decides to directly code a recursive function tfn, in Java, to solve the problem:

if (n==1) { return 3; }
else if (n==2) { return 10; }
else { return 2*tfn(n-1) - tfn(n-2); }
1. First mistake: no analysis of the problem
⇒ risk of being fired!
2. Second mistake: bad choice of programming language
⇒ increased risk of being fired!
n = 1   →  3
n = 2   →  10
n = 3   →  17
n = 35  →  it takes 4.19 seconds
n = 100 →  he waits... and then kills the program!

Mr Dupont then decides to use C:

if (n==1) return 3;
if (n==2) return 10;
return 2*tfn(n-1) - tfn(n-2);

n = 35  →  it takes only 1.25 seconds
n = 50  →  he waits... and then kills the program!
Finally, Mr Dupont decides to (experimentally) analyze the problem: he times both programs and plots the results.

[Figure: running time in seconds (logarithmic scale, 0.1 to 100) versus N (25 to 40).]

It seems that each time n increases by 1, the time increases by a factor of about 1.62. At n = 40 the C program took 13.79 seconds, so for n = 100 he estimates:

13.79 × 1.62^60 seconds ≈ 1,627,995 years!!!
• Mr Dupont remembers he has seen this kind of problem in one of the courses he has taken (data structures course).
• After consulting his course notes, Mr Dupont decides to use Dynamic Programming:

int t[n+1];
t[1] = 3;
t[2] = 10;
for (i = 3; i <= n; i++)
    t[i] = 2*t[i-1] - t[i-2];
return t[n];
This solution provides a much better time complexity, but at the expense of space complexity:
• n = 100 takes only a fraction of a second,
• but for n = 10 , 000 , 000 (a test that may make the boss happy if it succeeds) a segmentation fault occurs. Too much memory required.
• Mr Dupont analyses the problem again:
– there is no reason to keep all the values, only the last 2:

if (n==1) return 3;
last = 3; current = 10;
for (i = 3; i <= n; i++) {
    temp = current;
    current = 2*current - last;
    last = temp;
}
return current;
– At n = 100,000,000 it takes 3.00 seconds.
– At n = 200,000,000 it takes 5.99 seconds.
– At n = 300,000,000 it takes 8.99 seconds.
How to solve such problems?
1. Analyze the problem on paper in order to ﬁnd an efﬁcient algorithm in terms of time and memory space complexity.
(a) First look at the problem:

    T(1) = 3
    T(2) = 10
    T(3) = 17
    T(4) = 24
    T(5) = 31

    Each step increases the result by 7.

(b) Guess: T(n) = 7n − 4

(c) Proof by induction.

2. Code:

return 7*n - 4;
An algorithm analyst might ask:
1. What makes the first program so slow?
2. How fast are the three programs asymptotically?
3. Is the last version really the ultimate solution?
Let us look at the recursion tree for the first program at n = 4.

Here each circle represents one call to the routine tfn. So, for n = 4 there are 5 such calls.

In general, a call to tfn(n) requires a recursive call to tfn(n-1) (represented by the shaded region on the left) and a call to tfn(n-2) (shaded region on the right).
If we let f(n) represent the number of calls needed to compute T(n), then:

f(n) = f(n−1) + f(n−2) + 1
f(1) = f(2) = 1
This is a version of the famous Fibonacci recurrence.
It is known that f(n) ≈ 1.618^n.

This agrees very well with the times we presented earlier, where each increase of n by 1 increases the time by a factor of a little under 1.62.

We say such growth is exponential, with asymptotic growth rate O(1.618^n).
This answers question (1).
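The call-count recurrence can be checked directly by instrumenting the slow routine with a counter (a sketch of ours; the counter variable and helper f are additions, not from the slides):

```cpp
#include <cassert>

long long calls = 0;   // counts invocations of tfn (our instrumentation)

// The slow recursive routine from the story, instrumented.
long long tfn(int n) {
    ++calls;
    if (n == 1) return 3;
    if (n == 2) return 10;
    return 2 * tfn(n - 1) - tfn(n - 2);
}

// f(n): number of calls needed to compute T(n), via the recurrence
// f(n) = f(n-1) + f(n-2) + 1 with f(1) = f(2) = 1.
long long f(int n) {
    if (n <= 2) return 1;
    return f(n - 1) + f(n - 2) + 1;
}
```

Computing tfn(4) makes exactly f(4) = 5 calls, matching the recursion tree described above.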
In the second and third programs there was a loop:

for (i = 3; i <= n; i++)

This loop contained two or three assignments, a multiplication, and a subtraction.

We say such a loop takes O(n) time.
This means that running time is proportional to n.
Recall that increasing n from 100 million to 300 million increased the time from approximately 3 to approximately 9 seconds.
The last program has one multiplication and one subtraction and takes O (1) or constant time.
This answers question (2).
• The answer to the last question is also NO. If the boss asked for
T (123456789879876543215566340014733134213)
we would get integer overﬂow on most computers.
• Switching to a ﬂoating point representation would be of no value since we need to maintain all the signiﬁcant digits in our results.
• The only alternative is to use a method to represent and manipulate large integers.
• A natural way to represent a large integer is to use an array of integers, where each array slot stores one digit.
• Addition and subtraction require a linear-time algorithm.
• A simple algorithm for multiplication requires a quadratic-time cost.
The third question, “is the last program the ultimate solution”, is more of a computer science question.
A computer scientist might ask:
1. How do you justify counting function calls in the first case, counting array assignments in the second, counting variable assignments in the third, and counting arithmetic operations in the last?
2. Is it really true that you can multiply two arbitrarily large numbers together in constant time?
3. Is the last program really the ultimate one?
CS questions with an engineering orientation:
• What general techniques can we use to solve computational problems?
• What data structures are best, and in what situations?
• Which models should we use to analyze algorithms in practice?
• When trying to improve the efficiency of a given program, which aspects should we focus on first?
1.3 Asymptotic notation and Growth of functions
• The running time of an algorithm almost always depends on the amount of input: more input means more time. Thus the running time T is a function of the amount of input N, i.e. T(N) = f(N), where N is in general a natural number.
• The exact value of the function depends on :
– the speed of the machine;
– the quality of the compiler and optimizer;
– the quality of the program that implements the algorithm;
– the fundamentals of the algorithm.
• Typically, the last item is most important.
Worst-case versus Average-case

• Worst-case running time is a bound over all inputs of a certain size N (a guarantee).
• Average-case running time is an average over all inputs of a certain size N (a prediction).
Θ-notation

For a given function g(n), we denote by Θ(g(n)) the set of functions

Θ(g(n)) = { f(n) : ∃ c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀ }

We say that g(n) is an asymptotically tight bound for f(n).

Example: the running time of insertion sort is T(n) = Θ(n²).
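For instance (an illustration of the definition, not from the slides), $f(n) = 3n^2 + 2n$ is in $\Theta(n^2)$, since for all $n \ge 1$:

```latex
3n^2 \;\le\; 3n^2 + 2n \;\le\; 3n^2 + 2n^2 \;=\; 5n^2,
\qquad\text{so } c_1 = 3,\; c_2 = 5,\; n_0 = 1 \text{ witness the definition.}
```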
Ω-notation

For a given function g(n), we denote by Ω(g(n)) the set of functions

Ω(g(n)) = { f(n) : ∃ c and n₀ such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n₀ }

We say that g(n) is an asymptotic lower bound for f(n).
Big-Oh notation

For a given function g(n), we denote by O(g(n)) the set of functions

O(g(n)) = { f(n) : ∃ c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n₀ }

We say that g(n) is an asymptotic upper bound for f(n).

Note that O-notation is, in general, used informally to describe asymptotically tight bounds (Θ-notation).
Big-Oh notation

• Exponential: dominant term is some constant times 2^N.
• Cubic: dominant term is some constant times N³. We say O(N³).
• Quadratic: dominant term is some constant times N². We say O(N²).
• O(N log N): dominant term is some constant times N log N.
• Linear: dominant term is some constant times N. We say O(N).
• Logarithmic: dominant term is some constant times log N.
• Constant: c.

Note: Big-Oh ignores leading constants.
Dominant Term Matters
• Suppose we estimate 35N² + N + N³ by its dominant term N³.
• For N = 10,000:
  – Actual value is 1,003,500,010,000.
  – Estimate is 1,000,000,000,000.
  – The error in the estimate is 0.35%, which is negligible.
• For large N , dominant term is usually indicative of algorithm’s behavior.
• For small N , dominant term is not necessarily indicative of behavior, BUT, typically programs on small inputs run so fast we don’t care anyway.
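The numbers above are easy to verify directly (a sketch of ours; 64-bit arithmetic is assumed to avoid overflow):

```cpp
// Evaluate 35N^2 + N + N^3 exactly and compare it with the
// dominant-term estimate N^3.
long long actual(long long N)   { return 35*N*N + N + N*N*N; }
long long estimate(long long N) { return N*N*N; }
```

For N = 10,000 the actual value is 1,003,500,010,000 against an estimate of 1,000,000,000,000, a relative error of about 0.35%.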
Example 1: Computing the Minimum
• Minimum item in an array
– Given an array of N items, ﬁnd the smallest.
• Obvious algorithm is sequential scan.
• Running time is O ( N ) (linear) because we repeat a ﬁxed amount of work for each element in the array.
• A linear algorithm is as good as we can hope for, because we have to examine every element in the array, a process that requires linear time.
Example 2: Closest Points

• Closest points in the plane
  – Given N points in a plane (that is, an x-y coordinate system), find the pair of points that are closest together.
• Fundamental problem in graphics.
• Solution : Calculate the distance between each pair of points, and retain the minimum distance.
• N ( N − 1) / 2 pairs of points, so the algorithm is quadratic.
• Better algorithms that use more subtle observations are known.
Example 3: Collinear Points in the Plane

• Collinear points in the plane
  – Given N points in the plane, determine if any three form a straight line.
• Important in graphics: collinear points introduce nasty degenerate cases that require special handling.
• Solution: enumerate all groups of three points; for each triplet, check whether the points are collinear. This is a cubic algorithm.
1.4 Case Study
• Examine a problem with several different solutions.
– Will look at four algorithms
– Some algorithms much easier to code than others.
– Some algorithms much easier to prove correct than others.
– Some algorithms much, much faster (or slower) than others.
The problem
• Maximum Contiguous Subsequence Sum Problem
  – Given N integers (possibly negative) A₁, A₂, …, A_N, find (and identify the sequence corresponding to) the maximum value of A_i + A_{i+1} + ··· + A_j.
• The maximum contiguous subsequence sum is zero if all the integers are negative.
• Examples:
  – −2, 11, −4, 13, −4, 2 (the answer is 20: 11 − 4 + 13)
  – 1, −3, 4, −2, −1, 6 (the answer is 7: 4 − 2 − 1 + 6)
Brute Force Algorithm
int MaxSubSum1(const vector<int> & A)
{
    int MaxSum = 0;
    for (int i = 0; i < A.size(); i++)
        for (int j = i; j < A.size(); j++)
        {
            int ThisSum = 0;
            for (int k = i; k <= j; k++)
                ThisSum += A[k];
            if (ThisSum > MaxSum)
                MaxSum = ThisSum;
        }
    return MaxSum;
}
Analysis
• Loop of size N inside of loop of size N inside of loop of size N means O ( N ^{3} ) , or cubic algorithm.
• Slight overestimate that results from some loops being of size less than N is not important.
Actual Running time
• For N = 100 , actual time is 0.47 seconds on a particular computer.
• Can use this to estimate time for larger inputs :
T(N) = cN³
T(10N) = c(10N)³ = 1000cN³ = 1000 T(N)
• Input size increasing by a factor of 10 means that the running time increases by a factor of 1,000.
• For N=1000, estimate an actual time of 470 seconds. (Actual was 449 seconds).
• For N=10,000, estimate 449000 seconds (6 days).
How to improve
• Remove a loop; not always possible.
• Here it is: the innermost loop is unnecessary because it throws away information.
• ThisSum for the next j is easily obtained from the old value of ThisSum:
  – Need A_i + A_{i+1} + ··· + A_{j−1} + A_j
  – Just computed A_i + A_{i+1} + ··· + A_{j−1}
  – What we need is what we just computed, plus A_j.
The Better Algorithm
int MaxSubSum2(const vector<int> & A)
{
    int MaxSum = 0;
    for (int i = 0; i < A.size(); i++)
    {
        int ThisSum = 0;
        for (int j = i; j < A.size(); j++)
        {
            ThisSum += A[j];
            if (ThisSum > MaxSum)
                MaxSum = ThisSum;
        }
    }
    return MaxSum;
}
Analysis
• Same logic as before: now the running time is quadratic, or O(N²).
• As we will see, this algorithm is still usable for inputs in the tens of thousands.
• Recall that the cubic algorithm was not practical for this amount of input.
Actual running time
• For N = 100 , actual time is 0.0111 seconds on the same particular computer.
• Can use this to estimate time for larger inputs :
T(N) = cN²
T(10N) = c(10N)² = 100cN² = 100 T(N)

• Input size increasing by a factor of 10 means that the running time increases by a factor of 100.
• For N = 1,000, estimate a running time of 1.11 seconds. (Actual was 1.12 seconds.)
• For N = 10,000, estimate 111 seconds (= actual).
Linear Algorithms
• Linear algorithm would be best.
• Running time is proportional to amount of input. Hard to do better for an algorithm.
• If the input increases by a factor of ten, then so does the running time.
Recursive algorithm
• Use a divide-and-conquer approach.
• The maximum subsequence either
– lies entirely in the ﬁrst half
– lies entirely in the second half
– starts somewhere in the ﬁrst half, goes to the last element in the ﬁrst half, continues at the ﬁrst element in the second half, ends somewhere in the second half.
• Compute all three possibilities, and use the maximum.
• First two possibilities easily computed recursively.
Computing the third case
• Idea :
1. Find the largest sum in the ﬁrst half, that includes the last element in the ﬁrst half.
2. Find the largest sum in the second half that includes the ﬁrst element in the second half.
3. Add the 2 sums together.
• Implementation :
– Easily done with two loops.
– For the maximum sum in the first half that includes the last element of the first half, use a right-to-left scan starting at that last element.
– For the other maximum sum, use a left-to-right scan starting at the first element of the second half.
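The divide-and-conquer algorithm described above can be sketched as follows (an illustrative implementation of ours in the style of MaxSubSum1/2; the textbook's version may differ in details):

```cpp
#include <vector>
#include <algorithm>
using std::vector; using std::max;

// Divide-and-conquer maximum contiguous subsequence sum on a[left..right].
int maxSumRec(const vector<int>& a, int left, int right) {
    if (left == right)               // base case: one element
        return max(a[left], 0);      // the empty subsequence sums to 0

    int center   = (left + right) / 2;
    int maxLeft  = maxSumRec(a, left, center);        // case 1
    int maxRight = maxSumRec(a, center + 1, right);   // case 2

    // Case 3: best sum ending at 'center' (right-to-left scan) ...
    int border = 0, maxLeftBorder = 0;
    for (int i = center; i >= left; --i) {
        border += a[i];
        maxLeftBorder = max(maxLeftBorder, border);
    }
    // ... plus best sum starting at 'center + 1' (left-to-right scan).
    border = 0;
    int maxRightBorder = 0;
    for (int j = center + 1; j <= right; ++j) {
        border += a[j];
        maxRightBorder = max(maxRightBorder, border);
    }

    return max({maxLeft, maxRight, maxLeftBorder + maxRightBorder});
}

int MaxSubSum3(const vector<int>& a) {
    return a.empty() ? 0 : maxSumRec(a, 0, (int)a.size() - 1);
}
```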
Analysis
• Let T ( N ) = the time for an algorithm to solve a problem of size N .
• Then T (1) = 1 (1 will be the quantum time unit; constants don’t matter).
• T ( N ) = 2 T ( N/ 2) + N
– Two recursive calls, each of size N/ 2 . The time to solve each recursive call is T ( N/ 2) by the above deﬁnition.
– Case three takes O ( N ) time; we use N , because we will throw out the constants eventually.
Bottom Line
T(1)  = 1                          = 1 ∗ 1
T(2)  = 2 ∗ T(1) + 2   = 4    = 2 ∗ 2   = 2^1 ∗ 2
T(4)  = 2 ∗ T(2) + 4   = 12   = 4 ∗ 3   = 2^2 ∗ 3
T(8)  = 2 ∗ T(4) + 8   = 32   = 8 ∗ 4   = 2^3 ∗ 4
T(16) = 2 ∗ T(8) + 16  = 80   = 16 ∗ 5  = 2^4 ∗ 5
T(32) = 2 ∗ T(16) + 32 = 192  = 32 ∗ 6  = 2^5 ∗ 6
T(64) = 2 ∗ T(32) + 64 = 448  = 64 ∗ 7  = 2^6 ∗ 7

T(N) = 2^k ∗ (k + 1) = N(1 + log N) = O(N log N), where N = 2^k.
N log N
• Any recursive algorithm that solves two half-sized problems and does linear non-recursive work to combine/split these solutions will always take O(N log N) time, because the above analysis will always hold.
• This is a very signiﬁcant improvement over quadratic.
• It is still not as good as O(N), but it is not that far away either. There is a linear-time algorithm for this problem; its running time is clear, but its correctness is non-trivial.
The Linear-time Algorithm

/** Linear-time maximum contiguous subsequence sum algorithm. */
int maxSubSum4( const vector<int> & a )
{
    int maxSum = 0, thisSum = 0;

    for( int j = 0; j < a.size( ); j++ )
    {
        thisSum += a[ j ];

        if( thisSum > maxSum )
            maxSum = thisSum;
        else if( thisSum < 0 )
            thisSum = 0;
    }
    return maxSum;
}
The Logarithm
• Formal Deﬁnition
– For any B, N > 0: log_B N = K if B^K = N.
– If the base B is omitted, it defaults to 2 in computer science.
• Examples :
– log 32 = 5 (because 2 ^{5} = 32 )
– log 1024 = 10
– log 1048576 = 20
– log 1 billion = about 30
• The logarithm grows much more slowly than N, and more slowly than √N.
Examples of the Logarithm
Bits in a binary number: how many bits are required to represent N consecutive integers?

Repeated doubling: starting from X = 1, how many times must X be doubled before it is at least as large as N?

Repeated halving: starting from X = N, if N is repeatedly halved, how many iterations must be applied to make N smaller than or equal to 1? (Halving rounds up.)

The answer to all of the above is log N, rounded up.
Why log N
• B bits represent 2^B integers. Thus 2^B must be at least as big as N, so B is at least log N. Since B must be an integer, round up if needed.
• Same logic for the other examples.
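The argument can be phrased as a tiny routine (an illustrative helper of ours) that computes ⌈log₂ N⌉ by repeated doubling:

```cpp
// Count how many doublings of X = 1 are needed to reach at least N,
// i.e. ceil(log2 N) for N >= 1.
int ceilLog2(long long N) {
    int bits = 0;
    long long x = 1;
    while (x < N) { x *= 2; ++bits; }
    return bits;
}
```

For example, 1,024 integers need 10 bits, and so do 1,000, since 2^9 = 512 < 1,000 ≤ 1,024 = 2^10.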
Repeated Halving Principle
• An algorithm is O (log N ) if it takes constant time to reduce the problem size by a constant fraction (which is usually 1/2).
• Reason : there will be log N iterations of constant work.
Static Searching
• Given an integer X and an array A , return the position of X in A or an indicator that it is not present. If X occurs more than once, return any occurrence. The array A is not altered.
• If the input array is not sorted, the solution is to use a sequential search. Running times:
  – Unsuccessful search: O(N); every item is examined.
  – Successful search:
    ∗ Worst case: O(N); every item is examined.
    ∗ Average case: O(N/2); half the items are examined.
• Can we do better if we know the array is sorted?
Binary Search
• Yes! Use a binary search.
• Look in the middle:
Case 1: if X is less than the item in the middle, look in the subarray to the left of the middle.
Case 2: if X is greater than the item in the middle, look in the subarray to the right of the middle.
Case 3: if X is equal to the item in the middle, we have a match.
Base case: if the subarray is empty, X is not found.
• This is logarithmic by the repeated halving principle.
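A sketch of binary search over a sorted vector, following the four cases above (illustrative; returning −1 for "not found" is our convention):

```cpp
#include <vector>
using std::vector;

// Binary search in a sorted array. Returns a position of x,
// or -1 if x is absent. O(log N) by the repeated halving principle.
int binarySearch(const vector<int>& a, int x) {
    int low = 0, high = (int)a.size() - 1;
    while (low <= high) {                     // empty subarray => not found
        int mid = low + (high - low) / 2;
        if (x < a[mid])      high = mid - 1;  // case 1: go left
        else if (x > a[mid]) low  = mid + 1;  // case 2: go right
        else                 return mid;      // case 3: match
    }
    return -1;                                // base case
}
```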
Binary Search Continued
• Binary search is an example of a data structure implementation:
– Insert : O ( N ) time per operation, because we must insert and maintain the array in sorted order.
– Delete : O ( N ) time per operation, because we must slide elements that are to the right of the deleted element over one spot to maintain contiguity.
– Find : O (log N ) time per operation, via binary search.
• In this course we examine different data structures. Generally we allow Insert, Delete, and Find, but Find and Delete are usually restricted.
1.5 Data Structures and Algorithm Analysis
[Figure: examples of data organizations — a scalar item, a sequential vector, an n-dimensional space, a linked list, a hierarchical tree.]
• The most important property to express of any entity in a system is its type.
• In this course we use entities that are structured objects (e.g., an object that is a collection of other objects).
• When determining its type, the kinds of distinguishing properties include:

Ordering: are elements ordered or unordered? If ordering matters, is the order partial or total? Are elements removed FIFO (queues), LIFO (stacks), or by priority (priority queues)?

Duplicates: are duplicates allowed?

Boundedness: is the object bounded in size or unbounded? Can the bound change, or is it fixed at creation time?

Associative access: are elements retrieved by an index or a key? Is the type of the index built-in (e.g. as for sequences and arrays) or user-definable (e.g. as for symbol tables and hash tables)?

Shape: is the structure of the object linear, hierarchical, acyclic, n-dimensional, or arbitrarily complex (e.g. graphs, forests)?
Abstract Data Type (ADT)
• Set of data together with a set of operations.
• Definition of an ADT [what to do?]:
  – Definition of the data and the set of operations (functions).
• Implementation of an ADT [how to do it?]:
  – How the objects and operations are implemented.
  ⇒ use the C++ class.
Array Implementation of Lists
• Contiguous allocation of memory to store the elements of the list.
• An estimate (overestimate) of the maximum size of the list is required
  ⇒ waste of memory space.
• O(N) for find; constant time for findKth.
• But O(N) is required for insertion and deletion in the worst case.
  ⇒ building a list by N successive inserts would require O(N²) in the worst case.
Figure 1: Contiguous allocation of memory to store the elements of the list (e.g., 34 12 52 16 22). findKth = List[Kth]: O(1); find(X): O(n); remove(X) and removeKth: O(n); insert(kth, X): O(n).
Linked Lists
• Non-contiguous allocation of memory.
• O ( N ) for find
• O ( N ) for findKth (but better time in practice if the calls to findKth are in sorted order by the argument).
• Constant time for insertion and deletion.
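The constant-time insertion and deletion can be sketched with a minimal singly linked node (an illustration of ours, not the course's list class):

```cpp
// A minimal singly linked list node.
struct Node {
    int   data;
    Node* next;
};

// Insert a new node holding x immediately after node p: O(1),
// only two links change and no elements are shifted.
Node* insertAfter(Node* p, int x) {
    Node* n = new Node{x, p->next};
    p->next = n;
    return n;
}

// Unlink and free the node following p, also in O(1).
void removeAfter(Node* p) {
    Node* victim = p->next;
    p->next = victim->next;
    delete victim;
}
```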
Figure 2: Non-contiguous allocation of memory. The list 34 12 52 16 22 is stored as nodes, each holding a data field and a link to the next node. printList(): O(n); find(x): O(n); findKth(i): O(i).