
Data Structures and Algorithms

Sorting Algorithms

Tahira Mahboob

Bubble sort


Compare each element (except the last one) with its neighbor to the right
  If they are out of order, swap them
  This puts the largest element at the very end
  The last element is now in the correct and final place
Compare each element (except the last two) with its neighbor to the right
  If they are out of order, swap them
  This puts the second largest element next to last
  The last two elements are now in their correct and final places
Compare each element (except the last three) with its neighbor to the right
  Continue as above until you have no unsorted elements on the left

Example of bubble sort


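Each row is one pass: the starting state, then the state after each comparison.

Pass 1:  7 2 8 5 4  ->  2 7 8 5 4  ->  2 7 8 5 4  ->  2 7 5 8 4  ->  2 7 5 4 8
Pass 2:  2 7 5 4 8  ->  2 7 5 4 8  ->  2 5 7 4 8  ->  2 5 4 7 8
Pass 3:  2 5 4 7 8  ->  2 5 4 7 8  ->  2 4 5 7 8
Pass 4:  2 4 5 7 8  ->  2 4 5 7 8  (done)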

Code for bubble sort


void bubbleSort(int a[], int size) {
    int outer, inner;
    for (outer = size - 1; outer > 0; outer--) {      // counting down
        for (inner = 0; inner < outer; inner++) {     // bubbling up
            if (a[inner] > a[inner + 1]) {            // if out of order...
                int temp = a[inner];                  // ...then swap
                a[inner] = a[inner + 1];
                a[inner + 1] = temp;
            }
        }
    }
}

Analysis of bubble sort




for (outer = size - 1; outer > 0; outer--) {
    for (inner = 0; inner < outer; inner++) {
        if (a[inner] > a[inner + 1]) {
            // code for swap omitted
        }
    }
}
Let n = size = the number of elements in the array
The outer loop is executed n-1 times (call it n, that's close enough)
Each time the outer loop is executed, the inner loop is executed
  Inner loop executes n-1 times at first, linearly dropping to just once
  On average, inner loop executes about n/2 times for each execution of the outer loop
In the inner loop, the comparison is always done (constant time), the swap might be done (also constant time)
Result is n * n/2 * k for some constant k, that is, O(k * n^2/2) = O(n^2)
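
The count in this analysis can be checked directly. The sketch below uses the same loop structure with a counter added; the function name bubble_sort_count and the counter are illustrative additions, not part of the slides. For an array of n elements the loops perform (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, which is about n^2/2.

```c
/* Bubble sort with a comparison counter (illustrative helper, not from the slides).
   Sorts a[0..size-1] in place and returns the number of comparisons performed. */
long bubble_sort_count(int a[], int size)
{
    long comparisons = 0;
    for (int outer = size - 1; outer > 0; outer--) {
        for (int inner = 0; inner < outer; inner++) {
            comparisons++;                      /* the comparison is always done */
            if (a[inner] > a[inner + 1]) {      /* the swap is only sometimes done */
                int temp = a[inner];
                a[inner] = a[inner + 1];
                a[inner + 1] = temp;
            }
        }
    }
    return comparisons;
}
```

For the five-element example array 7 2 8 5 4 this gives 4 + 3 + 2 + 1 = 10 comparisons, regardless of the initial order.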


Insertion Sort

Implemented by inserting a particular element at the appropriate position.
The first iteration starts by comparing the 1st element with the 0th element.
In the second iteration, the 2nd element is compared with the 0th and 1st elements.
In general, in every iteration an element is compared with the elements before it.
Once the position for the element in question is found, space is created for it by shifting the larger elements one position to the right.

Example of insertion sort
(the number in parentheses is how many elements were shifted in that step)

5 7 0 3 4 2 6 1 (0)
5 7 0 3 4 2 6 1 (0)
0 5 7 3 4 2 6 1 (2)
0 3 5 7 4 2 6 1 (2)
0 3 4 5 7 2 6 1 (2)
0 2 3 4 5 7 6 1 (4)
0 2 3 4 5 6 7 1 (1)
0 1 2 3 4 5 6 7 (6)

Code for insertion sort

void insertionsort(int a[], int size)
{
    int i, temp, j;
    for (i = 1; i < size; i++) {
        temp = a[i];                      // element to insert
        for (j = i - 1; j >= 0; j--) {
            if (a[j] > temp)
                a[j + 1] = a[j];          // shift larger element right
            else
                break;
        }
        a[j + 1] = temp;                  // drop the element into the gap
    }
}

Analysis of insertion sort




We run once through the outer loop, inserting each of n elements; this is a factor of n
On average, there are n/2 elements already sorted
  The inner loop looks at (and moves) half of these
  This gives a second factor of n/4
Hence, the time required for an insertion sort of an array of n elements is proportional to n^2/4
Discarding constants, we find that insertion sort is O(n^2)
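
The n^2/4 estimate can be compared against the example array 5 7 0 3 4 2 6 1. The sketch below is an insertion sort with a shift counter added; the name insertion_sort_shifts and the counter are assumptions made for illustration. For that array (n = 8) it counts 0 + 2 + 2 + 2 + 4 + 1 + 6 = 17 shifts, close to n^2/4 = 16.

```c
/* Insertion sort that also counts how many elements are shifted right
   (illustrative helper, not from the slides). Sorts a[0..size-1] in place. */
long insertion_sort_shifts(int a[], int size)
{
    long shifts = 0;
    for (int i = 1; i < size; i++) {
        int temp = a[i];                   /* element to insert */
        int j = i - 1;
        while (j >= 0 && a[j] > temp) {
            a[j + 1] = a[j];               /* shift larger element right */
            shifts++;
            j--;
        }
        a[j + 1] = temp;                   /* drop the element into the gap */
    }
    return shifts;
}
```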

Summary
 

Bubble sort and insertion sort are both O(n^2)
As we will see later, we can do much better than this with somewhat more complicated sorting algorithms
Within O(n^2),
  Bubble sort is very slow, and should probably never be used for anything
  Insertion sort is usually the faster of the two; in fact, for small arrays (say, 10 or 15 elements), insertion sort is faster than more complicated sorting algorithms
Another algorithm, selection sort, and insertion sort are both good enough for small arrays.

Divide and Conquer Algorithms




 

Base case: the problem is small enough, solve directly
Divide the problem into two or more similar and smaller subproblems
Recursively solve the subproblems
Combine solutions to the subproblems
We will study two divide and conquer algorithms
  Merge Sort
  Quick Sort

Divide and Conquer Algorithms




Base case
  single element (n=1), return
Divide A into two subarrays: FirstPart, SecondPart
Two subproblems:
  sort the FirstPart
  sort the SecondPart
Recursively
  sort FirstPart
  sort SecondPart
Combine sorted FirstPart and sorted SecondPart

Algorithm
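void mergesort(int lo, int hi)
{
    if (lo < hi) {
        int m = (lo + hi) / 2;    // midpoint splits the range in two
        mergesort(lo, m);         // sort FirstPart
        mergesort(m + 1, hi);     // sort SecondPart
        merge(lo, m, hi);         // combine the two sorted parts
    }
}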

MergeSort (Example)

[Figures: a 22-slide step-by-step walkthrough of MergeSort; the images are not reproduced in this text]

Merge (example)

FirstPart:   14 23 45 98
SecondPart:   6 33 42 67

Merged output after each step:
  6
  6 14
  6 14 23
  6 14 23 33
  6 14 23 33 42
  6 14 23 33 42 45
  6 14 23 33 42 45 67
  6 14 23 33 42 45 67 98
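
The slides call merge(lo, m, hi) without showing its body. The following is one possible sketch, not the author's code: it passes the array and a scratch buffer explicitly (the parameters a and tmp and the name merge_sort are assumptions made here, whereas the slide version apparently works on a shared array).

```c
#include <string.h>

/* Merge the sorted runs a[lo..m] and a[m+1..hi] using tmp as scratch space.
   tmp must have room for at least hi + 1 ints. */
void merge(int a[], int tmp[], int lo, int m, int hi)
{
    int i = lo, j = m + 1, k = lo;
    while (i <= m && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];   /* take the smaller head */
    while (i <= m)  tmp[k++] = a[i++];                 /* leftovers of FirstPart */
    while (j <= hi) tmp[k++] = a[j++];                 /* leftovers of SecondPart */
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof a[0]);
}

/* Recursive merge sort of a[lo..hi], same shape as the slide's mergesort. */
void merge_sort(int a[], int tmp[], int lo, int hi)
{
    if (lo < hi) {
        int m = lo + (hi - lo) / 2;
        merge_sort(a, tmp, lo, m);
        merge_sort(a, tmp, m + 1, hi);
        merge(a, tmp, lo, m, hi);
    }
}
```

The scratch buffer is where the O(n) extra space of merge sort comes from. Taking the element from FirstPart on ties (<=) keeps equal elements in their original order.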

Merge-Sort Analysis
Time:
  Most of the work is in the merging
  Total time: O(n log n)
Space:
  O(n), more space than other sorts.


Quick sort Algorithm


Given an array of n elements (e.g., integers):
  If array only contains one element, return
  Else
    pick one element to use as pivot.
    Partition elements into two sub-arrays:
      Elements less than or equal to pivot
      Elements greater than pivot
    Quicksort two sub-arrays
  Return results

Quicksort
To sort a[left...right]
1. if left < right
   1.1. Partition a[left...right] such that all a[left...p-1] are less than a[p], and all a[p+1...right] are >= a[p]
   1.2. Quicksort a[left...p-1]
   1.3. Quicksort a[p+1...right]
2. Terminate

Partitioning


A key step in the Quicksort algorithm is partitioning the array


We choose some (any) number p in the array to use as a pivot
We partition the array into three parts:

  [ numbers < p ]  [ p ]  [ numbers >= p ]

Partitioning


 

Choose an array value (say, the first) to use as the pivot
Starting from the left end, find the first element that is greater than or equal to the pivot
Searching backward from the right end, find the first element that is less than the pivot
Interchange (swap) these two elements
Repeat, searching from where we left off, until done

Partitioning


To partition a[left...right]

1. Set p = a[left], l = left + 1, r = right
2. while l <= r, do
   2.1. while l <= right and a[l] < p, set l = l + 1
   2.2. while r > left and a[r] >= p, set r = r - 1
   2.3. if l < r, swap a[l] and a[r]
3. Set a[left] = a[r], a[r] = p
4. Terminate

Example of partitioning
        

choose pivot:      4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
search:            4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
swap:              4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
search:            4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
swap:              4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
search:            4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
swap:              4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
search:            4 3 3 1 2 2 3 1 4 9 8 9 6 5 6   (left > right)
swap with pivot:   1 3 3 1 2 2 3 4 4 9 8 9 6 5 6
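
The partitioning scheme and the Quicksort outline can be sketched in C as follows. This is a minimal version, not the author's code. The loop bounds l <= r and l <= right are chosen so that l may run one past the right end; this is an assumption of the sketch that keeps two-element and already-partitioned ranges correct.

```c
/* Partition a[left..right] around the pivot a[left].
   Returns the pivot's final index. */
int partition(int a[], int left, int right)
{
    int p = a[left];
    int l = left + 1, r = right;
    while (l <= r) {
        while (l <= right && a[l] < p) l++;    /* first element >= pivot */
        while (r > left && a[r] >= p) r--;     /* last element < pivot */
        if (l < r) {
            int t = a[l]; a[l] = a[r]; a[r] = t;
        }
    }
    a[left] = a[r];                            /* put the pivot between the parts */
    a[r] = p;
    return r;
}

/* Sort a[left..right] by recursive partitioning. */
void quicksort(int a[], int left, int right)
{
    if (left < right) {
        int p = partition(a, left, right);
        quicksort(a, left, p - 1);
        quicksort(a, p + 1, right);
    }
}
```

Called on the 15-element example array with left = 0 and right = 14, partition performs the same three swaps as the trace and leaves the pivot 4 at index 7.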

Analysis of Quick sort: best case

Suppose each partition operation divides the array almost exactly in half
Then the depth of the recursion is log2 n
  Because that's how many times we can halve n
However, there are many recursions!
  How can we figure this out?
  We note that
    Each partition is linear over its subarray
    All the partitions at one level cover the array


Partitioning at various levels

Best case
  

 

We cut the array size in half each time
So the depth of the recursion is log2 n
At each level of the recursion, all the partitions at that level do work that is linear in n
O(log2 n) * O(n) = O(n log2 n)
Hence in the best case (and, as it turns out, in the average case) quicksort has time complexity O(n log2 n)
What about the worst case?
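
As a cross-check of the best-case bound, the same argument can be written as a recurrence. The constant c (cost per element of one partition pass) and the base case T(1) = c are assumptions made for illustration, with n taken to be a power of two:

```latex
T(n) = 2\,T(n/2) + c\,n, \qquad T(1) = c
\quad\Longrightarrow\quad
T(n) = c\,n\log_2 n + c\,n = O(n \log_2 n)
```

For example, T(2) = 2c + 2c = 4c and T(4) = 2(4c) + 4c = 12c, which both agree with c n log2 n + c n.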

Worst case


In the worst case, partitioning always divides the size n array into these three parts:
  A length one part, containing the pivot itself
  A length zero part, and
  A length n-1 part, containing everything else
We don't recur on the zero-length part
Recurring on the length n-1 part requires (in the worst case) recurring to depth n-1

Worst case partitioning

Worst case for quicksort




  

In the worst case, recursion may be n levels deep (for an array of size n)
But the partitioning work done at each level is still n
O(n) * O(n) = O(n^2)
So worst case for Quicksort is O(n^2)
When does this happen?
  When the array is sorted to begin with!
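
The worst-case depth can be observed with a small experiment. The sketch below is illustrative only: partition_first follows the first-element-pivot scheme described earlier, and quicksort_depth is a hypothetical variant that sorts as usual but also reports the deepest recursion level reached. On an already sorted array of n distinct elements it reports n - 1.

```c
/* Partition a[left..right] around a[left]; returns the pivot's final index. */
static int partition_first(int a[], int left, int right)
{
    int p = a[left];
    int l = left + 1, r = right;
    while (l <= r) {
        while (l <= right && a[l] < p) l++;
        while (r > left && a[r] >= p) r--;
        if (l < r) {
            int t = a[l]; a[l] = a[r]; a[r] = t;
        }
    }
    a[left] = a[r];
    a[r] = p;
    return r;
}

/* Quicksort that returns the deepest level of recursion reached
   (0 if the range has fewer than two elements). */
int quicksort_depth(int a[], int left, int right)
{
    if (left >= right)
        return 0;
    int p = partition_first(a, left, right);
    int d1 = quicksort_depth(a, left, p - 1);     /* part less than pivot */
    int d2 = quicksort_depth(a, p + 1, right);    /* part >= pivot */
    return 1 + (d1 > d2 ? d1 : d2);
}
```

With sorted input the pivot is always the smallest element, so one part is empty and the other has every remaining element; each level shrinks the problem by only one.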

Typical case for quicksort




  

If the array is sorted to begin with, Quicksort is terrible: O(n^2)
It is possible to construct other bad cases
However, Quicksort is usually O(n log2 n)
The constants are so good that Quicksort is generally the fastest sorting algorithm known in practice
Most real-world sorting is done by Quicksort

Picking a better pivot




Before, we picked the first element of the subarray to use as a pivot
  If the array is already sorted, this results in O(n^2) behavior
  It's no better if we pick the last element
We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half
  Such a value is called a median: half of the values in the array are larger, half are smaller
  The easiest way to find the median is to sort the array and pick the value in the middle (!)

Median of three
 

Obviously, it doesn't make sense to sort the array in order to find the median to use as a pivot
Instead, compare just three elements of our (sub)array: the first, the last, and the middle
  Take the median (middle value) of these three as pivot
  It's possible (but not easy) to construct cases which will make this technique O(n^2)
Suppose we rearrange (sort) these three numbers so that the smallest is in the first position, the largest in the last position, and the other in the middle
  This lets us simplify and speed up the partition loop
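
A minimal sketch of the three-element rearrangement, assuming the middle index is computed as left + (right - left) / 2. The names median_of_three and swap_int are illustrative and do not come from the slides.

```c
static void swap_int(int *x, int *y)
{
    int t = *x; *x = *y; *y = t;
}

/* Sort a[left], a[mid], a[right] in place so that
   a[left] <= a[mid] <= a[right]; returns mid, the index of the median. */
int median_of_three(int a[], int left, int right)
{
    int mid = left + (right - left) / 2;
    if (a[mid] < a[left])   swap_int(&a[mid], &a[left]);     /* left <= mid */
    if (a[right] < a[left]) swap_int(&a[right], &a[left]);   /* left is smallest */
    if (a[right] < a[mid])  swap_int(&a[right], &a[mid]);    /* right is largest */
    return mid;
}
```

After the call, a[left] is no larger than the pivot and a[right] is no smaller, so both ends can act as sentinels for the partition scans.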

Heap Sort
  

Uses a heap as its data structure
In-place sorting algorithm: memory efficient
Time complexity: O(n log(n))

What is a Heap
A heap is a common way to implement a priority queue and can be represented by a binary tree with the following properties:
  Structure property: A heap is a completely filled binary tree, with the exception of the bottom row, which is filled from left to right
  Heap Order property: For every node x in the heap, the value of the parent of x is greater than or equal to the value of x (known as a maxHeap).

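Example: a heap

[Figure: a maxHeap drawn as a binary tree. Root 53; its children 44 and 25; third row 15, 21, 13, 18; the bottom row, filled from the left, includes 12]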

Algorithm


Step 1. Build Heap: O(n)
  Build a binary tree taking N items as input, ensuring the heap structure property is held; in other words, build a complete binary tree.
  Heapify the binary tree, making sure the binary tree satisfies the Heap Order property.
Step 2. Perform n deleteMax operations: O(log(n)) each
  Delete the maximum element in the heap, which is the root node, and place this element at the end of the sorted array.

Simplifying things


For speed and efficiency we can represent the heap with an array. Place the root at array index 1, its left child at index 2, its right child at index 3, and so on, level by level.

Index:   1  2  3  4  5  6  7  8  9 10 11
Value:  53 44 25 15 21 13 18  3 12  5  7

[Figure: the same heap drawn as a binary tree, with each node labeled by its array index]

For any node i, the following formulas apply:


The index of its parent = i / 2 (integer division)
Index of left child = 2 * i
Index of right child = 2 * i + 1
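
These index formulas translate directly into C. In the sketch below the heap occupies a[1..n] and a[0] is unused, matching the 1-based indexing above. The helper names and the checker is_max_heap are illustrative additions, not from the slides.

```c
/* Index arithmetic for a heap stored in a[1..n] (index 0 unused). */
int heap_parent(int i) { return i / 2; }       /* integer division */
int heap_left(int i)   { return 2 * i; }
int heap_right(int i)  { return 2 * i + 1; }

/* Returns 1 if a[1..n] satisfies the max-heap order property, else 0. */
int is_max_heap(const int a[], int n)
{
    for (int i = 2; i <= n; i++)               /* every node except the root */
        if (a[heap_parent(i)] < a[i])
            return 0;
    return 1;
}
```

For the array 53 44 25 15 21 13 18 3 12 5 7, node 4 (value 15) has children at indices 8 and 9 (values 3 and 12), and the checker confirms the heap order property.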

Sample Run


Start with unordered array of data

Array representation:
  21 15 25 3 5 12 7 19 45 2 9

Binary tree representation (level by level):
  21
  15 25
  3 5 12 7
  19 45 2 9

Sample Run


Heapify the binary tree

[Figures: the tree after each heapify step; the images are not reproduced in this text]

Resulting heap, array representation:
  45 21 25 19 9 12 7 15 3 2 5

Step 2: perform n - 1 deleteMax(es): swap the first (maximum) element with the last element in the heap, then re-heapify the remaining elements. The deleted element stays in the last node's position.

Array after each deleteMax (heap | sorted part):
  45 21 25 19 9 12 7 15 3 2 5 |
  25 21 12 19 9 5 7 15 3 2 | 45
  21 19 12 15 9 5 7 2 3 | 25 45
  19 15 12 3 9 5 7 2 | 21 25 45
  15 9 12 3 2 5 7 | 19 21 25 45
  12 9 7 3 2 5 | 15 19 21 25 45
  9 5 7 3 2 | 12 15 19 21 25 45
  (the remaining deleteMax steps continue in the same way)

and finally

  2 3 5 7 9 12 15 19 21 25 45
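
The two steps of the sample run can be put together as a short C sketch. This is one possible implementation, not the author's. Note the assumption that it uses ordinary 0-based C arrays, so the children of node i sit at 2i + 1 and 2i + 2 rather than at 2i and 2i + 1. The names sift_down and heap_sort are illustrative.

```c
/* Restore the max-heap property for the subtree rooted at i in a[0..n-1]. */
static void sift_down(int a[], int n, int i)
{
    for (;;) {
        int largest = i;
        int l = 2 * i + 1, r = 2 * i + 2;      /* 0-based child indices */
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i)
            return;                            /* heap order holds here */
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;                           /* continue one level down */
    }
}

void heap_sort(int a[], int n)
{
    /* Step 1: build the heap bottom-up, starting at the last parent. */
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(a, n, i);

    /* Step 2: n - 1 deleteMax operations. */
    for (int end = n - 1; end > 0; end--) {
        int t = a[0]; a[0] = a[end]; a[end] = t;   /* max goes to its final place */
        sift_down(a, end, 0);                      /* re-heapify the smaller heap */
    }
}
```

Because the sorted part grows at the end of the same array that holds the shrinking heap, no second array is needed.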

Conclusion
 

 

1st Step: Build heap, O(n) time complexity
2nd Step: perform n deleteMax operations, each with O(log(n)) time complexity
Total time complexity = O(n log(n))
Pros: fast sorting algorithm, memory efficient, especially for very large values of n.
Cons: usually slower than the other O(n log(n)) sorting algorithms