INTRODUCTION
BUBBLE SORT
INSERTION SORT
SHELL SORT
HEAP SORT
MERGE SORT
QUICK SORT
BUCKET SORT
RADIX SORT
INTRODUCTION
The list is an abstract data type that implements an ordered collection of values,
where a value may occur more than once. The order in which a list is sorted can be
numerical or lexicographical.
This report aims to give a glimpse of the various sorting algorithms in use. It
gives a detailed (but not exhaustive) explanation of the working and the uses of
some of the most widely used sorting algorithms. It also presents the bases on
which sorting algorithms are classified, among them complexity, stability, and the
general method followed.
For every sorting algorithm, we have provided a code fragment for the
convenience of the reader. The report is intended as study material, not just
reading material.
CLASSIFICATION OF SORTING ALGORITHMS
Sorting algorithms are classified based on various factors, and these factors also
influence the efficiency of the algorithm. They are often classified by:
- Computational complexity: the worst, average, and best case behaviour in terms
of the size of the list.
- Stability: a stable algorithm does not change the relative order of elements
with equal values. An unstable algorithm can always be implemented to be stable
by using the original position of each element as the tie-breaker for equal
values, at the cost of additional computation and memory.
- General method followed: for example insertion, exchange, selection, or merging.
All these factors will be considered when we will be doing a detailed study of
the widely used and popular sorting algorithms.
BUBBLE SORT
Bubble sort has worst case and average complexity of O(n²), where n is the
number of items being sorted; sorting 100 elements therefore takes on the order
of 10,000 comparisons. Many other sorting algorithms do substantially better,
with a worst or average case of O(n log n). Even insertion sort, which also has
a worst case of O(n²), performs better than bubble sort. The use of bubble sort
is therefore not practical when n is large. The performance of bubble sort also
depends on the positions of the elements. Large elements at the beginning of the
list do not pose a problem, as they are quickly swapped towards the end. Small
elements towards the end, however, move to the beginning extremely slowly.
Cocktail sort is a variant of bubble sort that addresses this problem, but it
still retains the O(n²) worst case complexity.
Let us take the array of numbers ( 5 1 4 2 8 ) and sort it from lowest to
greatest using the bubble sort algorithm. Each step below shows the array before
and after the comparison of one pair of adjacent elements.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), the algorithm compares the first two elements and
swaps them since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), swap since 5 > 4.
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), swap since 5 > 2.
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), these elements are already in order (8 > 5), so
the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), swap since 4 > 2.
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now the array is already sorted, but the algorithm does not know whether it is
complete. It needs one whole pass without any swap to know the array is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Finally, the array is sorted, and the algorithm can terminate.
The performance of bubble sort can be improved marginally. When the first pass
is over, the greatest element has moved to the last position, i.e. position n−1
in the array. Further passes need not compare that position, so each pass can be
one step shorter than the previous one. This roughly halves the total number of
comparisons, although the complexity still remains O(n²).
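The passes described above can be sketched in Java, in the style of the code fragments elsewhere in this report. This is an illustrative implementation (the class and method names are our own); the boolean flag implements the "one whole pass without any swap" termination test, and shortening n each pass implements the marginal improvement just described:

```java
import java.util.Arrays;

public class BubbleSortDemo {
    static void bubbleSort(int[] arr) {
        int n = arr.length;
        boolean swapped = true;
        while (swapped) {               // one whole pass without a swap ends the sort
            swapped = false;
            for (int i = 1; i < n; i++) {
                if (arr[i - 1] > arr[i]) {
                    int tmp = arr[i - 1];   // swap adjacent out-of-order elements
                    arr[i - 1] = arr[i];
                    arr[i] = tmp;
                    swapped = true;
                }
            }
            n--;  // the largest element has bubbled to the end; shorten the next pass
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8};
        bubbleSort(a);
        System.out.println(Arrays.toString(a));  // [1, 2, 4, 5, 8]
    }
}
```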
INSERTION SORT
Insertion sort is a simple sorting algorithm that is relatively efficient for
small lists and mostly sorted lists, and is often used as part of more
sophisticated algorithms. It is a comparison sort in which the sorted list is
built one entry at a time. It works by taking elements from the list one by one
and inserting them in their correct position in a new sorted list. The new list
and the remaining elements can share the array's space, but insertion is
expensive, requiring all following elements to be shifted over by one.
Insertion sort has average and worst case complexity of O(n²). The best case is
an already sorted list, for which the running time is linear, i.e. O(n); the
worst case is a list in reverse order. Since the running time is quadratic in
the average case as well, insertion sort is not considered suitable for large
lists. However, it is one of the fastest algorithms when the list contains fewer
than about 10 elements. A variant of insertion sort, Shell sort, is more
efficient for larger lists; the next chapter gives more detail on it.
Insertion sort is stable and in-place, requiring only a constant amount of
additional memory. It can also sort a list as it receives it. Compared to more
advanced algorithms such as quick sort, heap sort, or merge sort, insertion sort
is much less efficient on large lists.
The following example shows how insertion sort works. Let us consider the array
of numbers ( 5 1 4 2 8 ). In each pass, the next unsorted element is inserted
into the sorted part at the front of the array.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), the algorithm compares 1 with 5, finds 1 < 5, and
brings 1 to the front by shifting 5 one place to the right.
Second Pass:
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), since 4 < 5, 4 is inserted before 5; since 1 < 4,
no further shifting is needed.
Third Pass:
( 1 4 5 2 8 ) → ( 1 2 4 5 8 ), 2 is compared with 5, 4, and 1 in turn and
inserted after 1.
Fourth Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 ), 8 is already in its correct position, so nothing
is moved and the list is sorted.
Instead of finding the insertion point by successive swaps, it can be found
directly with binary search and the intervening elements shifted in one go.
Binary search helps only with the comparisons, however; the number of shifts
stays the same. Since insertion into an array is tedious, a linked list can be
used for the sort, but binary search is then impossible, as linked lists do not
allow random access. In 2004 Bender, Farach-Colton, and Mosteiro produced a new
variant of insertion sort, called library sort, that leaves a small number of
unused gaps spread throughout the array. The benefit is that elements need to be
shifted only until a gap is reached.
void insertionSort(int[] arr) {
    for (int i = 1; i < arr.length; i++) {
        int newValue = arr[i];
        int j = i;
        // shift larger elements one position to the right
        while (j > 0 && arr[j - 1] > newValue) {
            arr[j] = arr[j - 1];
            j--;
        }
        arr[j] = newValue;
    }
}
SHELL SORT
Shell sort was invented by Donald Shell in 1959 and is named after its inventor.
It is an improved version of insertion sort, and is more efficient than both
insertion sort and bubble sort, the traditional simple algorithms. Shell sort
improves insertion sort by comparing elements separated by a gap of several
positions. This lets an element take "bigger steps" toward its expected
position. Multiple passes over the data are made with smaller and smaller gap
sizes. The last step of Shell sort is a plain insertion sort, but by then the
array is guaranteed to be almost sorted. An implementation of Shell sort can be
described as arranging the data sequence in a two-dimensional array and then
sorting the columns of the array using insertion sort. The effect is that the
data sequence is partially sorted. The process is then repeated, each time with
a smaller number of columns, until in the last step the array consists of a
single column. In practice the data sequence is not held in a two-dimensional
array, but in a one-dimensional array that is indexed appropriately.
Though Shell sort is a simple algorithm, analysing its complexity is a laborious
task. The original Shell sort algorithm has O(n²) complexity for comparisons and
exchanges. The gap sequence is a major factor that improves or degrades the
performance of the algorithm. The original gap sequence suggested by Donald
Shell begins with N/2 and halves the gap until it reaches 1; with this sequence
the worst case running time is O(n²). Other gap sequences in popular use, with
their worst case running times, are: O(n^(3/2)) for Hibbard's increments of
2^k − 1, O(n^(4/3)) for Sedgewick's increments, and O(n log² n) for Pratt's
increments 2^i · 3^j; better running times are possible but unproven. The
existence of an O(n log n) worst case implementation of Shell sort (the optimum
for comparison sort algorithms) was ruled out by Poonen, Plaxton, and Suel.
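A sketch of Shell sort in Java using Shell's original N/2, N/4, …, 1 gap sequence; the inner loop is exactly a gapped insertion sort, and with gap = 1 the final pass is plain insertion sort. Names are our own, for illustration:

```java
import java.util.Arrays;

public class ShellSortDemo {
    static void shellSort(int[] arr) {
        // Shell's original gap sequence: N/2, N/4, ..., 1
        for (int gap = arr.length / 2; gap > 0; gap /= 2) {
            // gapped insertion sort: each element takes "bigger steps"
            for (int i = gap; i < arr.length; i++) {
                int value = arr[i];
                int j = i;
                while (j >= gap && arr[j - gap] > value) {
                    arr[j] = arr[j - gap];  // shift the gap-separated predecessor up
                    j -= gap;
                }
                arr[j] = value;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {35, 33, 42, 10, 14, 19, 27, 44};
        shellSort(a);
        System.out.println(Arrays.toString(a));  // [10, 14, 19, 27, 33, 35, 42, 44]
    }
}
```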
MERGE SORT
Merge sort is a comparison sort that is very effective on large lists, with a
worst case complexity of O(n log n). It was invented by John von Neumann in
1945. Merge sort is an example of a divide and conquer algorithm, and proceeds
as follows:
1. If the list has 0 or 1 elements, it is already sorted.
2. Otherwise, divide the unsorted list into two sublists of about half the size.
3. Sort each sublist recursively by re-applying merge sort.
4. Merge the two sorted sublists back into one sorted list.
The two main ideas behind the algorithm are that sorting small lists takes fewer
steps than sorting long lists, and that creating a sorted list from two sorted
lists is easier than from two unsorted lists. Merge sort is a stable sort, i.e.
the order of equal inputs is preserved in the sorted list.
As mentioned above, merge sort has average and worst case performance of
O(n log n) when sorting n objects, so its worst case matches quick sort's best
case. In the worst case, merge sort does about 39% fewer comparisons than quick
sort does in the average case. The main disadvantage of merge sort is that
recursive implementations incur method call overhead, costing time and memory,
although it is not difficult to code an iterative, non-recursive merge sort that
avoids this. Also, merge sort does not sort in place, so it requires extra
memory to be allocated for the sorted output. One advantage of natural variants
of merge sort is O(n) complexity when the input is already sorted, which is
equivalent to running through the list once and checking that it is presorted.
Sorting in place is possible using linked lists, but it is very complicated, and
in such cases heap sort is usually preferable. Merge sort remains stable as long
as the merge operation is implemented properly.
Consider the list ( 3 5 4 9 2 ) to be sorted using merge sort. First the list is
divided into smaller lists:
( 3 5 4 9 2 )
( 3 5 ) ( 4 9 2 )
( 3 ) ( 5 ) ( 4 ) ( 9 2 )
( 9 ) ( 2 )
Now let us consider the comparisons that take place in the algorithm. According
to the algorithm, a list containing 0 or 1 elements is already sorted, and it is
merged to form a larger sorted list. Accordingly, 3 and 5 are merged as ( 3 5 ),
and 9 and 2 are merged as ( 2 9 ).
Now ( 4 ) and ( 2 9 ) are two sorted lists to be merged; after comparisons the
new sorted list is ( 2 4 9 ). The two sorted lists left to merge are ( 3 5 ) and
( 2 4 9 ). The comparisons made, and the merged output built up so far, are
shown below:
( 3 5 ) ( 2 4 9 ) → ( 2 )
( 3 5 ) ( 4 9 ) → ( 2 3 )
( 5 ) ( 4 9 ) → ( 2 3 4 )
( 5 ) ( 9 ) → ( 2 3 4 5 9 )
Thus the merged list is ( 2 3 4 5 9 ), which is the required sorted output.
Various programming languages use either merge sort or a variant of the algorithm
as their in-built method for sorting.
public int[] mergeSort(int array[])
{ if(array.length > 1)
  {
    int elementsInA1 = array.length/2;
    int elementsInA2 = elementsInA1;
    if((array.length % 2) == 1)
      elementsInA2 += 1;
    int arr1[] = new int[elementsInA1];
    int arr2[] = new int[elementsInA2];
    for(int i = 0; i < elementsInA1; i++)
      arr1[i] = array[i];
    for(int i = elementsInA1; i < array.length; i++)
      arr2[i - elementsInA1] = array[i];
    arr1 = mergeSort(arr1);
    arr2 = mergeSort(arr2);
    int i = 0, j = 0, k = 0;
    // merge the two sorted halves back into array
    while(arr1.length != j && arr2.length != k)
    {
      if(arr1[j] <= arr2[k])  // <= keeps equal elements in order (stable)
      {
        array[i] = arr1[j];
        i++;
        j++;
      }
      else
      {
        array[i] = arr2[k];
        i++;
        k++;
      }
    }
    while(arr1.length != j)
    { array[i] = arr1[j];
      i++;
      j++;
    }
    while(arr2.length != k)
    { array[i] = arr2[k];
      i++;
      k++;
    }
  }
  return array;
}
HEAPSORT
As Heapsort has O(n log n) worst case complexity, it is often compared with
quick sort and merge sort. Quick sort has an O(n²) worst case, which makes it
risky and inefficient for large lists, although quick sort usually runs faster
on typical inputs because of cache behaviour and other constant factors. Since
Heapsort guarantees O(n log n) even in the worst case, it is preferred in
embedded and real-time systems where worst case behaviour is a serious concern.
Compared with merge sort, the main advantage Heapsort has is that it requires
only a constant amount of auxiliary storage space, in contrast to merge sort's
O(n) auxiliary space. Merge sort in turn has advantages over Heapsort, among
them that merge sort is stable and adapts easily to linked lists and to lists on
slow media such as disks.
Let us study an example which demonstrates the working of Heapsort. For the list
( 11 9 34 25 17 109 53 ), building a max-heap (largest element at the root)
gives, for example:

        109
       /    \
     25      53
    /  \    /  \
   9   17  34   11

Repeatedly swapping the root with the last element of the heap and restoring the
heap property then yields the sorted list ( 9 11 17 25 34 53 109 ).
An interesting alternative to Heapsort is introsort which combines quick sort and
Heapsort, keeping the worst case property of Heapsort and average case property
of Quicksort.
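A Heapsort sketch in Java makes the procedure concrete: build a max-heap bottom-up, then repeatedly move the root (the maximum) to the end and sift the new root down. This is an illustrative implementation with names of our own, not code from a particular source:

```java
import java.util.Arrays;

public class HeapSortDemo {
    static void heapSort(int[] arr) {
        int n = arr.length;
        // build the max-heap bottom-up, starting from the last parent node
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(arr, i, n);
        for (int end = n - 1; end > 0; end--) {
            int tmp = arr[0]; arr[0] = arr[end]; arr[end] = tmp;  // move max to the end
            siftDown(arr, 0, end);  // restore the heap property on the remaining prefix
        }
    }

    static void siftDown(int[] arr, int root, int size) {
        while (2 * root + 1 < size) {
            int child = 2 * root + 1;                              // left child
            if (child + 1 < size && arr[child + 1] > arr[child]) child++;  // larger child
            if (arr[root] >= arr[child]) return;                   // heap property holds
            int tmp = arr[root]; arr[root] = arr[child]; arr[child] = tmp;
            root = child;
        }
    }

    public static void main(String[] args) {
        int[] a = {11, 9, 34, 25, 17, 109, 53};  // the list from the example above
        heapSort(a);
        System.out.println(Arrays.toString(a));  // [9, 11, 17, 25, 34, 53, 109]
    }
}
```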
QUICK SORT
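As with the other algorithms in this report, a code fragment illustrates the idea. This sketch picks the last element as the pivot and uses a Lomuto-style partition, one of several common partitioning schemes; the names are our own:

```java
import java.util.Arrays;

public class QuickSortDemo {
    // recursive quicksort: partition around a pivot, then sort each side
    static void quickSort(int[] arr, int lo, int hi) {
        if (lo >= hi) return;               // 0 or 1 elements: already sorted
        int pivot = arr[hi];                // last element as pivot (Lomuto scheme)
        int p = lo;                         // boundary of the "less than pivot" region
        for (int i = lo; i < hi; i++) {
            if (arr[i] < pivot) {
                int t = arr[i]; arr[i] = arr[p]; arr[p] = t;
                p++;
            }
        }
        int t = arr[p]; arr[p] = arr[hi]; arr[hi] = t;  // place pivot in final position
        quickSort(arr, lo, p - 1);
        quickSort(arr, p + 1, hi);
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));  // [1, 2, 4, 5, 8]
    }
}
```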
BUCKET SORT
Bucket sort is a sorting algorithm that works by partitioning an array into a finite number
of buckets. Each bucket is then sorted individually, either using a different sorting
algorithm or by recursively applying bucket sort. It is thus most effective on data whose
values are limited in range (e.g. sorting a million integers ranging from 1 to 1000). A
variation of this method, called the single buffered count sort, is claimed to be faster
than quicksort and to take about the same time to run on any set of data.
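A minimal bucket sort sketch in Java for integers in a known range [0, max]. The bucket count of 10 and the use of Collections.sort for the per-bucket sort are arbitrary illustrative choices; any sorting algorithm could be applied within a bucket:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BucketSortDemo {
    static void bucketSort(int[] arr, int max) {
        int buckets = 10;
        List<List<Integer>> bucket = new ArrayList<>();
        for (int i = 0; i < buckets; i++) bucket.add(new ArrayList<>());
        // partition: map each value to a bucket; the mapping is monotonic,
        // so bucket i holds only values smaller than those in bucket i+1
        for (int v : arr) bucket.get(v * (buckets - 1) / max).add(v);
        int i = 0;
        for (List<Integer> b : bucket) {
            Collections.sort(b);              // sort each bucket individually
            for (int v : b) arr[i++] = v;     // concatenate buckets back in order
        }
    }

    public static void main(String[] args) {
        int[] a = {29, 3, 71, 42, 7, 100, 55};
        bucketSort(a, 100);
        System.out.println(Arrays.toString(a));  // [3, 7, 29, 42, 55, 71, 100]
    }
}
```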
RADIX SORT
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k)
time by treating them as bit strings. We first sort the list by the least significant bit
while preserving the relative order of elements using a stable sort. Then we sort by the
next bit, and so on from right to left, and the list ends up sorted. Most often, the
counting sort algorithm is used for the bitwise sorting, since the number of values a bit
can take is only two: '0' and '1'.
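The bitwise procedure above can be sketched in Java. Each pass is a stable two-way split (a counting sort on a single bit), applied from the least significant bit upward; the sketch assumes non-negative integers, and the names are our own:

```java
import java.util.Arrays;

public class RadixSortDemo {
    // LSD radix sort for non-negative ints, one bit per pass;
    // each pass is a stable counting sort on a single bit
    static void radixSort(int[] arr) {
        int[] out = new int[arr.length];
        for (int bit = 0; bit < 31; bit++) {
            int zeros = 0;
            for (int v : arr)
                if (((v >> bit) & 1) == 0) zeros++;      // count the 0-bit group
            int zi = 0, oi = zeros;  // zeros go first; both groups keep their order
            for (int v : arr) {
                if (((v >> bit) & 1) == 0) out[zi++] = v;
                else out[oi++] = v;
            }
            System.arraycopy(out, 0, arr, 0, arr.length);
        }
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(a);
        System.out.println(Arrays.toString(a));  // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```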
BIG O NOTATION
In mathematics, computer science, and related fields, big O notation describes the
limiting behavior of a function when the argument tends towards a particular value or
infinity, usually in terms of simpler functions. Big O notation allows its users to simplify
functions in order to concentrate on their growth rates: different functions with the same
growth rate may be represented using the same O notation.