
Unit – 5: Files and Advanced sorting

File Organization: Sequential File Organization, Direct File Organization, Indexed Sequential
File Organization.
Advanced sorting: Sorting on Several keys, List and Table sorts, Summary of Internal sorting,

• Sorting Techniques: Basic concepts


Sorting means arranging the elements of an array in some relevant order, which may be either ascending or
descending. That is, if A is an array of N elements, then sorting A in ascending order arranges its
elements in such a way that A[0] <= A[1] <= A[2] <= ...... <= A[N-1].
For example, if we have an array that is declared and initialized as
int A[] = {21, 34, 11, 9, 1, 0, 22};
Then the sorted array (ascending order) can be given as:
A[] = {0, 1, 9, 11, 21, 22, 34};
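The example above can be reproduced with the standard library function qsort. This is only a minimal sketch: the names compare_ints and sort_ascending are chosen here for illustration and do not come from the notes.

```c
#include <stdlib.h>

/* Comparator for qsort: returns a negative, zero, or positive value as
   *x is less than, equal to, or greater than *y. Written without
   subtraction so that extreme values cannot overflow. */
static int compare_ints(const void *x, const void *y)
{
    int a = *(const int *)x;
    int b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sort the first n elements of arr in ascending order. */
void sort_ascending(int *arr, size_t n)
{
    qsort(arr, n, sizeof arr[0], compare_ints);
}
```

Calling sort_ascending(A, 7) on the array A declared above rearranges it into {0, 1, 9, 11, 21, 22, 34}. Swapping the roles of a and b in the comparator would give descending order instead.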
A sorting algorithm is an algorithm that puts the elements of a list in a certain order, which can
be numerical order, lexicographical order, or any user-defined order. Sorting can be implemented
in different ways, for example by selection, by insertion, or by merging. Efficient sorting algorithms are
widely used to speed up other algorithms, such as search and merge algorithms, which require sorted
lists to work correctly. There are two types of sorting:
• Internal sorting, which deals with sorting data stored in the computer's main memory. If all the data
that is to be sorted fits in main memory at one time, an internal sorting method is used.
• External sorting, which deals with sorting data stored in files. External sorting is applied when
the data is too voluminous to be held in memory. When the data that is to be sorted cannot all be
accommodated in memory at the same time and some of it has to be kept in auxiliary storage
such as a hard disk, floppy disk or magnetic tape, an external sorting method is used.
Some common internal sorting algorithms include:

1. Bubble Sort.
2. Insertion Sort.
3. Quick Sort.
4. Heap Sort.
5. Radix Sort.
6. Selection Sort.
External sorting
External sorting is a sorting technique that can handle massive amounts of data. It is usually applied when the
data being sorted does not fit into the main memory (RAM) and, therefore, a slower memory (usually a
magnetic disk or even a magnetic tape) needs to be used. The most common external sorting algorithm is
external merge sort, which sorts chunks that fit in memory and then merges the sorted chunks.

The complexity of a sorting algorithm

The complexity of a sorting algorithm describes its running time as a function of the number 'n' of items to
be sorted. Which sorting method is suitable for a problem depends on several considerations, which vary
from problem to problem. The most noteworthy of these considerations are:

• The time the programmer spends writing the sorting program
• The amount of machine time necessary for running the program
• The amount of memory necessary for running the program

The Efficiency of Sorting Techniques


To estimate the time required to sort an array of 'n' elements by a particular method, the normal approach
is to analyze the method and find the number of comparisons (or exchanges) it requires. Most sorting
techniques are data sensitive, so these counts depend on the order in which the elements appear in the input
array.
Sorting techniques are therefore analyzed in the following cases:

• Best case
• Worst case
• Average case

The result of analyzing each case is usually a formula giving the time required to sort an input of size
'n'. Most of the sort methods have time requirements that range from O(n log n) to O(n^2).
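To make the idea of counting comparisons concrete, here is a small sketch of insertion sort instrumented with a comparison counter. The function name insertion_sort_count is an assumption made for this illustration.

```c
#include <stddef.h>

/* Insertion sort that returns the number of key comparisons performed. */
long insertion_sort_count(int *a, size_t n)
{
    long comparisons = 0;

    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;

        /* Shift larger elements one place right until key's slot is found. */
        while (j > 0) {
            comparisons++;
            if (a[j - 1] <= key)
                break;
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

For n = 7 this version makes 6 comparisons (n - 1) on an already sorted array and 21 comparisons (n(n - 1)/2) on a reverse sorted array, which is the best case and the worst case of the method respectively.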

Analysis of different sorting techniques

This section covers the important properties of different sorting techniques, including their complexity,
stability and memory constraints.
Time complexity Analysis –
Comparison based sorting –
In comparison based sorting, elements of an array are compared with each other to find the sorted array.

• Bubble sort and Insertion sort –
Average and worst case time complexity: O(n^2)
Best case time complexity: O(n), when the array is already sorted (for bubble sort, this assumes the version
that stops after a pass with no swaps).
Worst case: when the array is in completely reverse order.
• Selection sort –
Best, average and worst case time complexity: O(n^2), which is independent of the distribution of data.
• Merge sort –
Best, average and worst case time complexity: O(n log n), which is independent of the distribution of data.
• Heap sort –
Best, average and worst case time complexity: O(n log n), which is independent of the distribution of data.
• Quick sort –
It is a divide and conquer approach with recurrence relation:
T(n) = T(k) + T(n-k-1) + cn
where k is the number of elements smaller than the pivot.
Worst case: when the array is sorted or reverse sorted and the first or last element is chosen as the pivot,
the partition algorithm divides the array into two subarrays with 0 and n-1 elements. Therefore,
T(n) = T(0) + T(n-1) + cn
Solving this we get, T(n) = O(n^2)
Best case: the partition algorithm divides the array into two subarrays of equal size. Therefore,
T(n) = 2T(n/2) + cn
Solving this we get, T(n) = O(n log n)
Average case: the splits are not exactly equal, but averaged over all input orders the running time is
still O(n log n).
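The worst case of quick sort can be observed directly by counting comparisons. The sketch below assumes the Lomuto partition scheme with the last element as pivot, which is one common choice and not necessarily the one intended by these notes; the function names are illustrative.

```c
/* Lomuto partition of a[lo..hi] around the last element.
   Returns the final index of the pivot. */
static int partition(int *a, int lo, int hi, long *comparisons)
{
    int pivot = a[hi];
    int i = lo;

    for (int j = lo; j < hi; j++) {
        (*comparisons)++;
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

static void quick_sort_range(int *a, int lo, int hi, long *comparisons)
{
    if (lo >= hi)
        return;
    int p = partition(a, lo, hi, comparisons);
    quick_sort_range(a, lo, p - 1, comparisons);   /* T(k)         */
    quick_sort_range(a, p + 1, hi, comparisons);   /* T(n - k - 1) */
}

/* Sort a[0..n-1] and return the number of comparisons made. */
long quick_sort_count(int *a, int n)
{
    long comparisons = 0;
    quick_sort_range(a, 0, n - 1, &comparisons);
    return comparisons;
}
```

With this pivot rule, an already sorted array of 7 elements costs 6 + 5 + 4 + 3 + 2 + 1 = 21 comparisons, i.e. n(n - 1)/2, because every partition leaves one side empty.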

Non-comparison based sorting –

In non-comparison based sorting, elements of array are not compared with each other to find the sorted array.
• Radix sort –
Best, average and worst case time complexity: O(nk), where k is the maximum number of digits in the
elements of the array.
• Count sort –
Best, average and worst case time complexity: O(n+k), where k is the size of the count array (the range of
key values).
• Bucket sort –
Best and average time complexity: O(n+k), where k is the number of buckets.
Worst case time complexity: O(n^2), if all elements belong to the same bucket.
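As an illustration of sorting without comparing elements to each other, the following sketch implements a least-significant-digit radix sort that uses one counting pass per decimal digit. It assumes non-negative int keys and base 10; the names counting_pass and radix_sort are chosen for this example.

```c
#include <stdlib.h>
#include <string.h>

/* One stable counting-sort pass on the decimal digit selected by exp
   (1 for units, 10 for tens, ...). out must have room for n elements. */
static void counting_pass(const int *in, int *out, size_t n, long long exp)
{
    size_t count[10] = {0};

    for (size_t i = 0; i < n; i++)
        count[(in[i] / exp) % 10]++;

    /* Prefix sums: count[d] becomes one past the last slot for digit d. */
    for (int d = 1; d < 10; d++)
        count[d] += count[d - 1];

    /* Walk right to left so that equal digits keep their relative order. */
    for (size_t i = n; i-- > 0; ) {
        int d = (int)((in[i] / exp) % 10);
        out[--count[d]] = in[i];
    }
}

/* LSD radix sort for non-negative ints. Returns 0 on success, or -1 if
   the temporary buffer cannot be allocated. */
int radix_sort(int *a, size_t n)
{
    if (n < 2)
        return 0;

    int max = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* One pass per decimal digit of the largest key: k passes of O(n). */
    for (long long exp = 1; max / exp > 0; exp *= 10) {
        counting_pass(a, tmp, n, exp);
        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
    return 0;
}
```

Each pass is a counting sort over 10 possible digit values, so the total work is proportional to nk for keys with at most k digits. The pass has to be stable, otherwise the order established by earlier digits would be lost.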
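In-place/Out-of-place technique –
A sorting technique is in-place if it sorts the array using only a small, constant amount of extra memory. This
is a different property from internal sorting, which only means that all the data fits in main memory.
A technique that needs extra memory proportional to the input is out-of-place; this should likewise not be
confused with external sorting. Among the comparison based techniques discussed, only merge sort is an
out-of-place technique, as it requires an extra array to merge the sorted subarrays. (Quick sort is usually
also counted as in-place, although its recursion uses stack space.)
Among the non-comparison based techniques discussed, all are out-of-place techniques. Counting sort uses a
counting array and bucket sort uses an array of buckets for sorting the array.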
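The extra array needed by merge sort is visible in a short sketch. This is a plain top-down version written for illustration; the function names and the choice to allocate the scratch array once are assumptions of this example.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid) and a[mid..hi), using tmp as scratch. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    /* On ties the left element is taken first, which keeps the sort stable. */
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];

    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

static void merge_sort_range(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    size_t mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);
    merge_sort_range(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Sort a[0..n). Returns 0 on success, or -1 if the scratch array of n
   elements cannot be allocated. */
int merge_sort(int *a, size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);   /* the O(n) extra array */
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The scratch array tmp has the same length as the input, which is where the O(n) extra space of merge sort comes from.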
Online/Offline technique –
A sorting technique is considered online if it can accept new data while the procedure is ongoing, i.e. the
complete data is not required to start the sorting operation. Among the comparison based techniques discussed,
only insertion sort qualifies, because it processes the array from left to right and keeps the part processed
so far sorted; if new elements are added to the right, the ongoing operation is not affected.
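The online behaviour of insertion sort can be sketched as a function that accepts one new element at a time. The name insert_sorted and the caller-managed capacity are assumptions made for this example.

```c
#include <stddef.h>

/* Insert value into a[0..*n), which must already be sorted, and increase
   *n by one. The caller guarantees room for one more element. */
void insert_sorted(int *a, size_t *n, int value)
{
    size_t j = *n;

    /* Shift larger elements right to open a slot for the new value. */
    while (j > 0 && a[j - 1] > value) {
        a[j] = a[j - 1];
        j--;
    }
    a[j] = value;
    (*n)++;
}
```

Starting from an empty array and calling insert_sorted once for each arriving value keeps the array sorted after every call, so the sort never needs to see the complete input in advance.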
Stable/Unstable technique –
A sorting technique is stable if it does not change the relative order of elements with the same value.
Out of the comparison based techniques, bubble sort, insertion sort and merge sort are stable. Selection
sort is unstable, as it may change the order of elements with the same value.
For example, consider the array 4, 4, 1, 3.
In the first iteration, the minimum element found is 1 and it is swapped with the 4 at the 0th position. That 4
now comes after the 4 at the 1st position, so their relative order has changed. (In this particular array the
next swap happens to restore the order; in the array 4, 4, 1 the change persists in the final result.)
Similarly, quick sort and heap sort are also unstable.
Out of the non-comparison based techniques, counting sort is stable, bucket sort is stable when the sort used
inside each bucket is stable, and the stability of radix sort depends on the underlying algorithm used to sort
each digit.
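Instability only becomes visible when equal keys carry other data. The sketch below attaches a tag to each key and runs selection sort on the three records (4,a), (4,b), (1,c); the struct record type and the function name are hypothetical and exist only for this demonstration.

```c
#include <stddef.h>

struct record {
    int key;
    char tag;   /* distinguishes records that have equal keys */
};

/* Selection sort on the key field only. */
void selection_sort_records(struct record *r, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;

        /* Find the smallest key in the unsorted part r[i..n). */
        for (size_t j = i + 1; j < n; j++)
            if (r[j].key < r[min].key)
                min = j;

        /* This long-distance swap is what can reorder equal keys. */
        if (min != i) {
            struct record t = r[i];
            r[i] = r[min];
            r[min] = t;
        }
    }
}
```

After sorting, the records are (1,c), (4,b), (4,a): the two records with key 4 have exchanged their original order, which a stable sort would not do.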
Analysis of sorting techniques:
• When the array is almost sorted, or when all elements of the input array are identical, insertion sort can
be preferred, as it takes the least time in these cases.
• When the order of the input is not known, merge sort is preferred, as it has a worst case time complexity of
O(n log n) and it is stable as well.
• When the array is already sorted, insertion sort and bubble sort give a complexity of O(n), but quick sort
(with the first or last element as pivot) gives a complexity of O(n^2).
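The linear best case of bubble sort depends on stopping as soon as a pass makes no swap. A minimal sketch of that variant, with the illustrative name bubble_sort_passes, returns the number of passes so the effect can be checked:

```c
#include <stddef.h>

/* Bubble sort with an early exit. Returns the number of passes made. */
size_t bubble_sort_passes(int *a, size_t n)
{
    size_t passes = 0;
    int swapped = 1;

    /* After each pass the largest remaining element sits at index end - 1. */
    for (size_t end = n; swapped && end > 1; end--) {
        swapped = 0;
        passes++;
        for (size_t i = 1; i < end; i++) {
            if (a[i - 1] > a[i]) {
                int t = a[i - 1];
                a[i - 1] = a[i];
                a[i] = t;
                swapped = 1;
            }
        }
    }
    return passes;
}
```

On an already sorted array of 7 elements the function stops after a single pass, while on the reverse sorted array it needs 6 passes. Without the swapped flag, every input would cost the full number of passes.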

• Summary of Time and Space Complexity Comparison Table:


SORTING         TIME COMPLEXITY                           SPACE COMPLEXITY
ALGORITHM       BEST CASE     AVERAGE CASE  WORST CASE    BEST CASE
Bubble Sort     Ω(N)          Θ(N^2)        O(N^2)        O(1)
Selection Sort  Ω(N^2)        Θ(N^2)        O(N^2)        O(1)
Insertion Sort  Ω(N)          Θ(N^2)        O(N^2)        O(1)
Merge Sort      Ω(N log N)    Θ(N log N)    O(N log N)    O(N)
Heap Sort       Ω(N log N)    Θ(N log N)    O(N log N)    O(1)
Quick Sort      Ω(N log N)    Θ(N log N)    O(N^2)        O(log N)
Radix Sort      Ω(N k)        Θ(N k)        O(N k)        O(N + k)
Count Sort      Ω(N + k)      Θ(N + k)      O(N + k)      O(k)
Bucket Sort     Ω(N + k)      Θ(N + k)      O(N^2)        O(N)
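Heap sort is the comparison based entry that combines O(N log N) time in every case with O(1) extra space. A sketch of how that is achieved, using an array-based max-heap built inside the input array, is given below; the helper name sift_down is a conventional choice and not taken from these notes.

```c
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at index root,
   considering only the elements a[0..n). */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t largest = root;
        size_t left = 2 * root + 1;
        size_t right = left + 1;

        if (left < n && a[left] > a[largest])
            largest = left;
        if (right < n && a[right] > a[largest])
            largest = right;
        if (largest == root)
            return;

        int t = a[root];
        a[root] = a[largest];
        a[largest] = t;
        root = largest;
    }
}

/* Sort a[0..n) in ascending order without any auxiliary array. */
void heap_sort(int *a, size_t n)
{
    /* Build a max-heap bottom-up, starting from the last parent node. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);

    /* Repeatedly move the maximum to the end and shrink the heap by one. */
    for (size_t end = n; end > 1; end--) {
        int t = a[0];
        a[0] = a[end - 1];
        a[end - 1] = t;
        sift_down(a, 0, end - 1);
    }
}
```

Because sift_down is written as a loop rather than a recursive call, the only extra storage is a handful of local variables.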