
TABLE OF CONTENTS

INTRODUCTION

CLASSIFICATION OF SORTING ALGORITHMS

BUBBLE SORT

INSERTION SORT

SHELL SORT

MERGE SORT

HEAP SORT

QUICK SORT

BUCKET SORT

RADIX SORT
INTRODUCTION

A sorting algorithm is an algorithm that puts the elements of a list in a specific order. Sorting is a basic building block of many programs: it helps optimize the performance of a program and can considerably simplify its code. This has made sorting algorithms a lasting area of interest for computer scientists.

The list is an abstract data type that implements an ordered collection of values
where the values may occur more than once in the list. The order in which the list
is sorted can be numerical or lexicographical.

This report aims to give an overview of the various sorting algorithms in use. It explains, in detail but within a limited scope, how some of the most widely used sorting algorithms work and where they are used. With regard to the different types of sorting algorithms, the report also presents the criteria on which the algorithms are classified, among them complexity, stability, and the general method followed.

For every sorting algorithm, a code fragment is provided for the convenience of the reader. The report is intended to be used as study material, not merely as reading material.
CLASSIFICATION OF SORTING ALGORITHMS

Sorting algorithms are classified based on various factors. These factors also
influence the efficiency of the sorting algorithm. They are often
classified by:

Computational Complexity – Comparisons can be made on worst, average, and best-case behaviour in terms of Big O notation. For a sorting algorithm on a list of size n, good behaviour is O(n log n) and bad behaviour is O(n²). The ideal behaviour for a sort is O(n), but this is not achievable in the average case by comparison-based sorts.

Memory Usage – There are in-place algorithms and out-of-place algorithms. In-place algorithms require just O(1) or O(log n) extra memory beyond the items being sorted, so they do not need auxiliary locations where data is temporarily stored, as out-of-place sorting algorithms do.

Recursion – Some algorithms are recursive, some are non-recursive, and others may be implemented either way (e.g., merge sort).

Stability – Stable sorting algorithms maintain the relative order of records with equal keys (i.e., values). As an example, suppose the following set of pairs is to be sorted by their first component: ( 2 , 6 ) ( 2 , 1 ) ( 3 , 5 ) ( 4 , 3 ) ( 7 , 2 ). They can be sorted in two ways:
(2,6) (2,1) (3,5) (4,3) (7,2) - order of the equal keys not changed
(2,1) (2,6) (3,5) (4,3) (7,2) - order of the equal keys changed

The algorithm that does not change the relative order is the stable one. An unstable algorithm can always be implemented to be stable by using the original position of the data as the tie breaker for equal values. This requires additional computational cost and memory; a brief sketch of this technique is given at the end of this section.

Comparison – Algorithms can also be classified by whether or not they are comparison sorts. A comparison sort examines the data only by comparing two elements with a comparison operator. Most of the widely used sorts are comparison sorts.

General Method – According to the method it follows, a sorting algorithm is classified as insertion, exchange, selection, merging, etc. Examples of exchange sorts and selection sorts are bubble sort and heap sort respectively.

All these factors will be considered in the detailed study of the widely used and popular sorting algorithms that follows.
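Returning to the stability point above, here is a minimal Java sketch of the tie-breaking technique (the class and method names are made up for this illustration): each element carries its original position, and that position is compared only when the keys are equal.

import java.util.Arrays;
import java.util.Comparator;

class StableTieBreakDemo {
    // Each element is a {key, originalIndex} pair; the original index acts as
    // the tie breaker, so equal keys keep their input order regardless of
    // which comparison sort is used underneath.
    static void sortByKeyWithTieBreak(int[][] pairs) {
        Arrays.sort(pairs, Comparator
                .<int[]>comparingInt(p -> p[0])   // primary: the key
                .thenComparingInt(p -> p[1]));    // secondary: original position
    }

    public static void main(String[] args) {
        // Keys from the stability example above, tagged with their positions.
        int[][] data = { {2, 0}, {2, 1}, {3, 2}, {4, 3}, {7, 4} };
        sortByKeyWithTieBreak(data);
        System.out.println(Arrays.deepToString(data));
    }
}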
BUBBLE SORT

Bubble sort is a simple and straightforward method of sorting data that is widely used in computer science education. It works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The smaller elements bubble their way to the top, hence the name bubble sort. Bubble sort is a comparison sort.

Bubble sort has worst-case and average complexity of О(n²), where n is the number of items being sorted. Sorting 100 elements therefore takes on the order of 10,000 comparisons. There are many other sorting algorithms that do substantially better, with worst-case or average complexity of O(n log n). Even insertion sort, which also has a worst case of О(n²), performs better than bubble sort in practice. The use of bubble sort is therefore not practical when n is large. The performance of bubble sort also depends on the position of the elements. Large elements at the beginning of the list do not pose a problem, as they are quickly swapped towards the end. Small elements towards the end, however, move to the beginning extremely slowly. Cocktail sort is a variant of bubble sort that addresses this problem, but it still retains the О(n²) worst-case complexity.

Let us take the array of numbers "5 1 4 2 8" and sort it from lowest to greatest using the bubble sort algorithm.

First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), the algorithm compares the first two elements, 5 and 1, and swaps them.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), these elements are already in order (8 > 5), so the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now the array is already sorted, but the algorithm does not know whether it is finished. It needs one whole pass without any swap to know the list is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Finally, the array is sorted, and the algorithm can terminate.

The performance of bubble sort can be improved marginally. After the first pass, the greatest element has arrived at the last position of the array, position n-1. Subsequent passes need not compare that position again, so each pass can be one step shorter than the previous one. This roughly halves the total number of comparisons, although the complexity still remains О(n²).
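As a concrete reference, here is a short bubble sort sketch in Java that applies the improvement just described: each pass stops one position earlier, and the whole sort stops as soon as a pass makes no swaps (the method name is illustrative).

void bubbleSort(int[] arr) {
    int n = arr.length;
    boolean swapped = true;
    // After each pass the largest remaining element is in its final position,
    // so the next pass can stop one element earlier.
    while (swapped && n > 1) {
        swapped = false;
        for (int i = 1; i < n; i++) {
            if (arr[i - 1] > arr[i]) {       // adjacent pair in the wrong order
                int tmp = arr[i - 1];
                arr[i - 1] = arr[i];
                arr[i] = tmp;
                swapped = true;
            }
        }
        n--;                                 // shrink the unsorted region
    }
}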

Due to its simplicity, bubble sort is often used to introduce the concept of a sorting algorithm to introductory computer science students. The Jargon File, which famously calls bogo-sort 'the archetypical perversely awful algorithm', also calls bubble sort 'the generic bad algorithm'. D. Knuth, in his book 'The Art of Computer Programming', concludes that bubble sort seems to have nothing to recommend it. Researchers such as Owen Astrachan have shown experimentally that insertion sort performs better even on random lists, and have gone as far as to recommend that bubble sort no longer be taught.
INSERTION SORT

Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly sorted lists, and is often used as part of more sophisticated algorithms. It is a comparison sort in which the sorted list is built one entry at a time. It works by taking elements from the list one by one and inserting them at their correct position in a new sorted list. The new list and the remaining elements can share the array's space, but insertion is expensive, requiring all following elements to be shifted over by one.

Insertion sort has an average and worst-case complexity of O(n²). The best case is an already sorted list, for which the running time is linear, i.e. O(n). The worst case is a list in reverse order. Since the running time is quadratic even in the average case, insertion sort is not considered suitable for large lists. However, it is one of the fastest algorithms when the list contains fewer than about 10 elements. A variant of insertion sort, Shell sort, is more efficient for larger lists; the next chapter gives more detail on it. Insertion sort is stable and in-place, requiring only a constant amount of additional memory space, and it can sort a list as it receives it. Compared to more advanced algorithms like quick sort, heap sort, or merge sort, insertion sort is much less efficient on large lists.

The following example shows how insertion sort proceeds. Consider again the array of numbers "5 1 4 2 8". In each pass, the next unsorted element is inserted into the sorted part at the front of the array.

First Pass (inserting 1):
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), 1 < 5, so 5 is shifted right and 1 is placed at the front.
Second Pass (inserting 4):
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), 4 < 5, so 5 is shifted right and 4 is placed before it.
( 1 4 5 2 8 ) → ( 1 4 5 2 8 ), 4 > 1, so no further shifting is needed.
Third Pass (inserting 2):
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), 2 < 5
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), 2 < 4
( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), 2 > 1, so the insertion stops here.
Fourth Pass (inserting 8):
( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), 8 > 5, so 8 stays where it is.

Instead of repeated swapping, the correct position can be found directly with a binary search and the element shifted into place.

Binary search only pays off when comparisons, rather than moves, dominate the cost. Since insertion into an array is expensive, one can instead use a linked list for the sort; but binary search cannot then be applied, since linked lists do not allow random access. In 2004, Bender, Farach-Colton, and Mosteiro produced a new variant of insertion sort, called library sort, that leaves a small number of unused gaps spread throughout the array. The benefit is that elements only need to be shifted until a gap is reached.
void insertionSort(int[] arr) {
    int i, j, newValue;
    for (i = 1; i < arr.length; i++) {
        newValue = arr[i];
        j = i;
        // shift larger elements one position to the right
        while (j > 0 && arr[j - 1] > newValue) {
            arr[j] = arr[j - 1];
            j--;
        }
        // insert the element at its correct position
        arr[j] = newValue;
    }
}
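As mentioned above, the repeated swapping can be replaced by a binary search that locates the insertion point, followed by one block shift. A rough sketch of that idea (the method name is illustrative):

void binaryInsertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int newValue = arr[i];
        // Binary search in arr[0..i) for the first position holding a value
        // greater than newValue (this keeps the sort stable).
        int lo = 0, hi = i;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (arr[mid] <= newValue)
                lo = mid + 1;
            else
                hi = mid;
        }
        // Shift arr[lo..i) one position to the right and drop the element in.
        System.arraycopy(arr, lo, arr, lo + 1, i - lo);
        arr[lo] = newValue;
    }
}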
SHELL SORT

Shell sort was invented by Donald Shell in 1959 and is named after its inventor. It is an improved version of insertion sort: it combines ideas from insertion sort and bubble sort to give much better efficiency than either of these traditional algorithms. Shell sort improves insertion sort by comparing elements separated by a gap of several positions. This lets an element take "bigger steps" toward its expected position. Multiple passes over the data are made with smaller and smaller gap sizes. The last step of Shell sort is a plain insertion sort, but by then the array is guaranteed to be almost sorted. An implementation of Shell sort can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort. The effect is that the data sequence is partially sorted. The process is repeated, but each time with a smaller number of columns. In the last step, the array consists of only one column. In practice the data sequence is not held in a two-dimensional array, but in a one-dimensional array that is indexed appropriately.

Though Shell sort is a simple algorithm, analysing its complexity is a laborious task. The original Shell sort algorithm has O(n²) complexity for comparisons and exchanges. The gap sequence is a major factor that improves or deteriorates the performance of the algorithm. The original gap sequence suggested by Donald Shell was to begin with N/2 and halve the number until it reaches 1; with this gap sequence the worst-case running time is O(n²). Other popular gap sequences and their worst-case running times are as follows: O(n^(3/2)) for Hibbard's increments of the form 2^k − 1, O(n^(4/3)) for Sedgewick's increment sequences, and O(n log² n) for Pratt's increments of the form 2^i·3^j; better running times may exist but are unproven. The existence of an O(n log n) worst-case implementation of Shell sort (which is the optimal performance for comparison sort algorithms) was ruled out by Poonen, Plaxton, and Suel.

Let 3 7 9 0 5 1 6 8 4 2 0 6 1 5 7 3 4 9 8 2 be the data sequence to be sorted. First, it is arranged in an array with 7 columns (left); then the columns are sorted (right):

3 7 9 0 5 1 6        3 3 2 0 5 1 5
8 4 2 0 6 1 5        7 4 4 0 6 1 6
7 3 4 9 8 2          8 7 9 9 8 2

Data elements 8 and 9 have now already come towards the end of the sequence, but a small element (2) is also still there. In the next step, the sequence is arranged in 3 columns, which are again sorted:

3 3 2        0 0 1
0 5 1        1 2 2
5 7 4        3 3 4
4 0 6        4 5 6
1 6 8        5 6 8
7 9 9        7 7 9
8 2          8 9
Now the sequence is almost completely sorted. When arranging it in one column
in the last step, it is only a 6, an 8 and a 9 that have to move a little bit to their
correct position.
The best known sequence, according to research by Marcin Ciura, is 1, 4, 10, 23, 57, 132, 301, 701, 1750. That study also concluded that "comparisons rather than moves should be considered the dominant operation in Shellsort." Another sequence that performs very well on large arrays is obtained by raising the Fibonacci numbers (leaving out one of the starting 1's) to the power of twice the golden ratio, which gives the following sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481, 2034035, 9651787, 45806244, 217378076, 1031612713, ….
Algorithm Shellsort

void shellsort(int[] a, int n)
{
    int i, j, k, h, v;
    // Gap sequence; the last gap must be 1 so the final pass is a plain insertion sort.
    int[] cols = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                  1968, 861, 336, 112, 48, 21, 7, 3, 1};
    for (k = 0; k < cols.length; k++)
    {
        h = cols[k];
        // Insertion sort on the elements that lie h positions apart.
        for (i = h; i < n; i++)
        {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v)
            {
                a[j] = a[j - h];
                j = j - h;
            }
            a[j] = v;
        }
    }
}
MERGE SORT

Merge sort is a comparison sort that is very effective on large lists, with a worst-case complexity of O(n log n). It was invented by John von Neumann in 1945. Merge sort is an example of a divide and conquer algorithm. The algorithm works as follows:

1. If the list is of length 0 or 1, then it is already sorted. Otherwise:


2. Divide the unsorted list into two sub-lists of about half the size.
3. Sort each sub-list recursively by re-applying merge sort.
4. Merge the two sub-lists back into one sorted list.

The two main ideas behind the algorithm are that sorting small lists takes fewer steps than sorting long lists, and that creating a sorted list from two already sorted lists is easier than from two unsorted lists. Merge sort is a stable sort, i.e. the order of equal inputs is preserved in the sorted list.

As mentioned above, merge sort has an average and worst-case performance of O(n log n) when sorting n objects. Its worst case is comparable to quick sort's best case, and in the worst case merge sort does about 39% fewer comparisons than quick sort does in the average case. The main disadvantage of merge sort is that straightforward recursive implementations incur method-call overhead for every recursive call, which costs time and memory. It is not difficult, however, to code an iterative, non-recursive merge sort and avoid this overhead. Merge sort also does not sort in place, so extra memory must be allocated for the sorted output to be stored in. On the other hand, some variants (natural merge sort) take only O(n) time when the input is already sorted, which is equivalent to running through the list once and checking that it is presorted. Sorting in place is possible, for example by using linked lists, but it is complicated; in such cases heap sort is often preferable. Merge sort is stable as long as the merge operation is implemented properly.
Consider a list "3 5 4 9 2" to be sorted using merge sort. First the list is repeatedly divided into smaller lists:

( 3 5 4 9 2 )
( 3 5 ) ( 4 9 2 )
( 3 ) ( 5 ) ( 4 ) ( 9 2 )
( 9 ) ( 2 )

Now let us consider the comparisons that take place in the algorithm. According to the algorithm, a list containing 0 or 1 elements is already sorted, and it is merged to form a larger sorted list. Accordingly, 3 and 5 are merged as ( 3 5 ), and 9 and 2 are merged as ( 2 9 ). Next, ( 4 ) and ( 2 9 ) are two sorted lists to be merged; after the comparisons the new sorted list is ( 2 4 9 ). Finally the two sorted lists ( 3 5 ) and ( 2 4 9 ) are merged. The comparisons are shown below; at each step the smaller of the two front elements is appended to the output on the right:

( 3 5 ) ( 2 4 9 ) → ( 2 )
( 3 5 ) ( 4 9 )   → ( 2 3 )
( 5 )   ( 4 9 )   → ( 2 3 4 )
( 5 )   ( 9 )     → ( 2 3 4 5 9 )

Thus the merged list is ( 2 3 4 5 9 ), which is the required sorted output.
Various programming languages use either merge sort or a variant of the algorithm
as their in-built method for sorting.
public int[] mergeSort(int[] array)
{
    if (array.length > 1)
    {
        // Split the array into two halves (the second half gets the extra element).
        int elementsInA1 = array.length / 2;
        int elementsInA2 = elementsInA1;
        if ((array.length % 2) == 1)
            elementsInA2 += 1;

        int[] arr1 = new int[elementsInA1];
        int[] arr2 = new int[elementsInA2];
        for (int i = 0; i < elementsInA1; i++)
            arr1[i] = array[i];
        for (int i = elementsInA1; i < elementsInA1 + elementsInA2; i++)
            arr2[i - elementsInA1] = array[i];

        // Sort each half recursively.
        arr1 = mergeSort(arr1);
        arr2 = mergeSort(arr2);

        // Merge the two sorted halves back into array.
        int i = 0, j = 0, k = 0;
        while (arr1.length != j && arr2.length != k)
        {
            if (arr1[j] < arr2[k])
            {
                array[i] = arr1[j];
                i++;
                j++;
            }
            else
            {
                array[i] = arr2[k];
                i++;
                k++;
            }
        }
        // Copy any remaining elements of either half.
        while (arr1.length != j)
        {
            array[i] = arr1[j];
            i++;
            j++;
        }
        while (arr2.length != k)
        {
            array[i] = arr2[k];
            i++;
            k++;
        }
    }
    return array;
}
HEAPSORT

Heapsort is a much more efficient version of selection sort. It works similarly to selection sort, by determining the largest (or smallest) element of the list, placing it at the end (or beginning) of the list, and then continuing with the rest of the list, but it accomplishes this task more efficiently by using a data structure called a heap, a special type of binary tree. It is guaranteed that the root element of the heap is the largest element in a max-heap (or the smallest element in a min-heap). When the largest element is removed from the heap, there is no need to search for the next largest element, as the heap rearranges itself so that the next largest element becomes the root. Finding the next largest element and moving it to the top takes only O(log n) time, so the whole Heapsort algorithm takes just O(n log n) time.

A heap is a specialised tree-based data structure in which, if B is a child node of A, then key(A) ≥ key(B). Therefore in such a heap the root element is always the largest element. The operations that can be performed on a heap include inserting a new element, deleting the element at the root, and so on. For elementary Heapsort algorithms, the binary heap data structure is widely used. The operations that can be done and their algorithms are given in Appendix B.

As Heapsort has O(n log n) complexity, it is always compared with quick sort and merge sort. Quick sort has an O(n²) worst case, which makes it risky and inefficient for large lists, although in practice quick sort tends to run faster on smaller lists because of caching and other factors. Since Heapsort gives a guaranteed worst case, it is used in embedded and real-time systems where such guarantees are a great concern. Compared with merge sort, the main advantage of Heapsort is that it requires only a constant amount of auxiliary storage space, whereas merge sort requires O(n) auxiliary space. Merge sort in turn has several advantages over Heapsort, among them that it is stable and that it adapts easily to linked lists and to lists stored on slow media.
Let us study an example that demonstrates the heap structure used in Heapsort. For the list ( 11 9 34 25 17 109 53 ), building a max-heap rearranges the elements as ( 109 25 53 9 17 34 11 ): 109 is the root, its children are 25 and 53, and their children are 9, 17, 34 and 11.
An interesting alternative to Heapsort is introsort, which combines quick sort and Heapsort, keeping the worst-case property of Heapsort and the average-case performance of quick sort.
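For completeness, here is a minimal in-place Heapsort sketch in Java using a max-heap and an iterative sift-down; the helper names are illustrative and are not the routines referred to in Appendix B.

void heapSort(int[] a) {
    int n = a.length;
    // Build a max-heap by sifting down every internal node, last parent first.
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(a, i, n);
    // Repeatedly move the root (largest element) behind the unsorted region
    // and restore the heap property on the remaining elements.
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
        siftDown(a, 0, end);
    }
}

void siftDown(int[] a, int root, int size) {
    while (2 * root + 1 < size) {
        int child = 2 * root + 1;                          // left child
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                                       // right child is larger
        if (a[root] >= a[child])
            break;                                         // heap property holds
        int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
        root = child;
    }
}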
QUICK SORT


Quicksort is a divide and conquer algorithm which relies on a partition operation: to


partition an array, we choose an element, called a pivot, move all smaller elements before
the pivot, and move all greater elements after it. This can be done efficiently in linear
time and in-place. We then recursively sort the lesser and greater sublists. Efficient
implementations of quicksort (with in-place partitioning) are typically unstable sorts and
somewhat complex, but are among the fastest sorting algorithms in practice. Together
with its modest O(log n) space usage, this makes quicksort one of the most popular
sorting algorithms, available in many standard libraries. The most complex issue in
quicksort is choosing a good pivot element; consistently poor choices of pivots can result
in drastically slower O(n²) performance, but if at each step we choose the median as the
pivot then it works in O(n log n).
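A compact in-place quicksort sketch in Java, using the last element as the pivot (the Lomuto partition scheme); this is only one of several reasonable pivot choices discussed above.

void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi)
        return;
    int pivot = a[hi];                           // last element chosen as the pivot
    int i = lo;                                  // boundary of the "less than pivot" region
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {                      // move smaller elements before the boundary
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
        }
    }
    int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;   // put the pivot into its final place
    quickSort(a, lo, i - 1);                     // recursively sort the smaller elements
    quickSort(a, i + 1, hi);                     // recursively sort the greater elements
}

The sort is started on a whole array with quickSort(arr, 0, arr.length - 1).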

BUCKET SORT


Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying bucket sort. It is therefore most effective on data whose values are limited in range (e.g. sorting a million integers ranging from 1 to 1000). A variation of this method, called the single-buffered count sort, is faster than quicksort and takes about the same time to run on any set of data.
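A simple bucket sort sketch in Java for non-negative values known to be at most maxValue; the number of buckets and the use of a library sort inside each bucket are illustrative assumptions, not fixed by the algorithm.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

void bucketSort(int[] a, int maxValue) {
    int bucketCount = 10;                               // illustrative choice
    List<List<Integer>> buckets = new ArrayList<>();
    for (int b = 0; b < bucketCount; b++)
        buckets.add(new ArrayList<>());
    // Scatter: map each value into a bucket according to its magnitude.
    for (int value : a)
        buckets.get((int) ((long) value * bucketCount / (maxValue + 1))).add(value);
    // Sort each bucket individually, then gather the buckets in order.
    int i = 0;
    for (List<Integer> bucket : buckets) {
        Collections.sort(bucket);
        for (int value : bucket)
            a[i++] = value;
    }
}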

RADIX SORT


Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving the relative order of equal elements using a stable sort. Then we sort by the next bit, and so on from right to left, until the list ends up sorted. Most often, counting sort is used to accomplish the bitwise sorting, since the number of values a bit can take is minimal: only '0' or '1'.
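A least-significant-digit radix sort sketch in Java for non-negative integers. It uses a stable counting sort on decimal digits rather than single bits; the base is an illustrative choice, and the helper name is made up.

void radixSort(int[] a) {
    int max = 0;
    for (int v : a)
        max = Math.max(max, v);                 // assumes non-negative values
    // Process one decimal digit at a time, least significant digit first.
    for (int exp = 1; max / exp > 0; exp *= 10)
        countingSortByDigit(a, exp);
}

// Stable counting sort keyed on the digit selected by exp (1, 10, 100, ...).
void countingSortByDigit(int[] a, int exp) {
    int[] output = new int[a.length];
    int[] count = new int[10];
    for (int v : a)
        count[(v / exp) % 10]++;
    for (int d = 1; d < 10; d++)
        count[d] += count[d - 1];               // prefix sums give final positions
    for (int i = a.length - 1; i >= 0; i--) {   // walk backwards to keep stability
        int digit = (a[i] / exp) % 10;
        output[--count[digit]] = a[i];
    }
    System.arraycopy(output, 0, a, 0, a.length);
}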

SORTING ALGORITHMS BY GENERAL METHOD

Theory: computational complexity theory, Big O notation, total order, lists, stability, comparison sort.
Exchange sorts: bubble sort, cocktail sort, odd-even sort, comb sort, gnome sort, quicksort.
Selection sorts: selection sort, heapsort, smoothsort, Cartesian tree sort, tournament sort.
Insertion sorts: insertion sort, Shell sort, tree sort, library sort, patience sorting.
Merge sorts: merge sort, strand sort, Timsort.
Non-comparison sorts: radix sort, bucket sort, counting sort, pigeonhole sort, burstsort, bead sort.
Others: topological sorting, sorting networks, bitonic sorter, Batcher odd-even mergesort, pancake sorting.
Ineffective/humorous sorts: bogosort, stooge sort.

In mathematics, computer science, and related fields, big O notation describes the
limiting behavior of a function when the argument tends towards a particular value or
infinity, usually in terms of simpler functions. Big O notation allows its users to simplify
functions in order to concentrate on their growth rates: different functions with the same
growth rate may be represented using the same O notation.
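For example, f(n) = 3n² + 2n + 1 is O(n²), because 3n² + 2n + 1 ≤ 6n² for every n ≥ 1; the lower-order terms and the constant factor are absorbed into the notation.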

The common notations, their names, and typical examples are:

O(1) (constant): Determining if a number is even or odd; using a constant-size lookup table or hash table.
O(α(n)) (inverse Ackermann): Amortized time per operation using a disjoint set.
O(log* n) (iterated logarithmic): The find algorithm of Hopcroft and Ullman on a disjoint set.
O(log log n) (log-logarithmic): Amortized time per operation using a bounded priority queue[4].
O(log n) (logarithmic): Finding an item in a sorted array with binary search.
O((log n)^c) (polylogarithmic): Deciding if n is prime with the AKS primality test.
O(n^c), 0 < c < 1 (fractional power): Searching in a kd-tree.
O(n) (linear): Finding an item in an unsorted list; adding two n-digit numbers.
O(n log n) (linearithmic, loglinear, or quasilinear): Performing a fast Fourier transform; heapsort, quicksort (best case), or merge sort.
O(n²) (quadratic): Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort (worst case or naive implementation), Shell sort, quicksort (worst case), or insertion sort.
O(n³) (cubic): Multiplying two n×n matrices by the simple algorithm; finding the shortest paths in a weighted digraph with the Floyd-Warshall algorithm; inverting a dense n×n matrix using LU or Cholesky decomposition.
O(n^c) (polynomial or algebraic; grows faster than cubic if and only if c > 3): Tree-adjoining grammar parsing; maximum matching for bipartite graphs.
L-notation: Factoring a number using the special or general number field sieve.
O(c^n) (exponential or geometric): Finding the exact solution to the travelling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force.
O(n!) (factorial or combinatorial): Solving the travelling salesman problem via brute-force search; finding the determinant with expansion by minors.
O(2^(2^n)) (double exponential): Deciding the truth of a given statement in Presburger arithmetic.
REFERENCES

http://www.algolist.net/Algorithms/Sorting/Insertion_sort
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.1393
