Do you fear the mathematical formulas for calculating complexity of a sorting algorithm? This is very normal to have though! There is nothing wrong with it as people like me are not gifted with such a mathematical skills. However, sorting algorithms are the stepping stones in the study of algorithm and complexity. Here, we will take practical examples and verify our theoretical understanding. Let's take two simple sorting algorithms and analyse with coding and not with those scary mathematical formulas. I hope many of the fundamental stuffs will be clear. We'll gradually go in advanced topics also.
First let me give a brief intro to those who are wondering what the hell is this "sorting"? Sorting is a very well known problem in computer science where you have to put a given set of elements in some order. Here we will deal with a set of integer numbers and we will sort them in ascending order. For example, if our given data set is {3,5,1,6,7,1} then our algorithm output should be: {1,1,3,5,6,7}.
Selection sort:
This is the simplest sorting algorithm that comes to mind whenever a sorting of array problem comes into picture.
Pseudo code:
Function: SelectionSort (Array to sort: array)
void selection_sort(int* array, int n)
{
int i=0, j=0, MinElementIndex, temp;
for (i=0; i<n; i++)
{
MinElementIndex = i;
for (j=i+1; j<n; j++)
{
if (array[j] < array[MinElementIndex])
MinElementIndex = j;
}
if (array[i]!=array[MinElementIndex])
{
temp = array[i];
array[i] = array[MinElementIndex];
array[MinElementIndex] = temp;
}
}
}
Having this function in a C file we will analyze this extensively. We will now write a shell script to do that. The script will generate a data set and provide it to our selection sort algorithm. The C program, writes the result in a file after completion of the algorithm and also reports the algorithm running time.The script will then see if the result reported by our algorithm is right or not. If it is right it continues to test our algorithm with next data set. We will run the algorithm with different data sets varying the value of n (the number of elements in the data set) to see how the algorithm running time varies with n.
To generate the data set, I have used environment variable $RANDOM provided by linux shell. This gives a pseudo random integer between 0 and 32767 with each reference. You can refer to my script here.
- n := length of array
- For i : 1 to n
- do
- MinElementIndex = i
- For j : i+1 to n
- do
- if (array[j] < array[MinElementIndex])
- MinElementIndex = j
- done
- if ( i != MinElementIndex )
- Swap ( array[i], array[MinElementIndex] )
- End Function
Brief Explanation of how selection sort works: Find the minimum element and place at the beginning of the array. In each iteration select the min of the remaining array and place it in its place.Now, let's write a simple C code to realize the algorithm.
void selection_sort(int* array, int n)
{
int i=0, j=0, MinElementIndex, temp;
for (i=0; i<n; i++)
{
MinElementIndex = i;
for (j=i+1; j<n; j++)
{
if (array[j] < array[MinElementIndex])
MinElementIndex = j;
}
if (array[i]!=array[MinElementIndex])
{
temp = array[i];
array[i] = array[MinElementIndex];
array[MinElementIndex] = temp;
}
}
}
Having this function in a C file we will analyze this extensively. We will now write a shell script to do that. The script will generate a data set and provide it to our selection sort algorithm. The C program, writes the result in a file after completion of the algorithm and also reports the algorithm running time.The script will then see if the result reported by our algorithm is right or not. If it is right it continues to test our algorithm with next data set. We will run the algorithm with different data sets varying the value of n (the number of elements in the data set) to see how the algorithm running time varies with n.
To generate the data set, I have used environment variable $RANDOM provided by linux shell. This gives a pseudo random integer between 0 and 32767 with each reference. You can refer to my script here.
It's time for seat and watch the graphs created for us by the script. Before that let me explain the nature of the data set we are providing as the running time will vary on that. There are some notions in sorting algorithm domain also on the basis of the nature of the data set.
Best case: Where the given data set is already sorted.
Worst case: Where the given data set is in the reverse order of the required order. In our case it is decreasing order.
Avg case: Where the data set elements are random i.e. there is no particular ascending or descending pattern in them.
For a best case scenario, selection algorithm does not realize that everything is in order. It blindly traverses all the elements and tries to sort. Although, there would not be any swap but it will do all the iteration... Thus, for this algorithm, there is no improvement for best case scenario. Check out the graph below which plots number of elements vs algorithm running time.
Bubble sort:
Let us take another simple sorting approach. Unlike selection sort, its main funda is to find the maximum number in the array and put it in its position. However, doing this it can recognize the best case scenario pretty quickly and thus gives much better performance in best case scenario than avg case or worst case. Pseudo code for bubble sort:
Function: BubbleSort (Array to sort: array)
Now running this algorithm with our random data sets as earlier we get the following graph. Now we can clearly see that for best case scenario the algorithm almost takes constant time which corresponds to our theoretical understanding also.
The C program can be viewed here.
Time Complexity:
Are you wondering as I didn't talk about the actual O(n2)? Well it is very easy to observe from these two graphs though. Both the graphs start from n = 1000 onward. So, for this value of n, n2 would be 10,00,000, whereas the highest time goes up to 1,20,000 and 2,50,000 for selection sort and bubble sort respectively. So, the running time is definitely well below the order of n2 mark. And that matches with the theoretical understanding also.
Wondering how these graphs are plot? I've use gnuplot tool in linux. You can find a good tutorial on this here.
Now, let us compare two sorting algorithms' running time.
From the graph above, we can see that in the worst case scenario, selection sort takes much less time than the bubble sort although they are of same order, O(n2). Let's see why is it so? This is because for sorting of same array, selection sort algorithm makes less number of swaps than the bubble sort algorithm. See yourself with manually simulating the algorithms for some simple array.
How do you feel now? A bit more confident in this domain? Ok, for sake of simplicity I've stored the numbers in an array. Ideally, these arrays should be replaced by linked lists... Next time we will see more complicated sorting algorithms like merge sort: understand and compare them. See you then...
Looking forward for your feedback...
Best case: Where the given data set is already sorted.
Worst case: Where the given data set is in the reverse order of the required order. In our case it is decreasing order.
Avg case: Where the data set elements are random i.e. there is no particular ascending or descending pattern in them.
For a best case scenario, selection algorithm does not realize that everything is in order. It blindly traverses all the elements and tries to sort. Although, there would not be any swap but it will do all the iteration... Thus, for this algorithm, there is no improvement for best case scenario. Check out the graph below which plots number of elements vs algorithm running time.
Bubble sort:
Let us take another simple sorting approach. Unlike selection sort, its main funda is to find the maximum number in the array and put it in its position. However, doing this it can recognize the best case scenario pretty quickly and thus gives much better performance in best case scenario than avg case or worst case. Pseudo code for bubble sort:
Function: BubbleSort (Array to sort: array)
- n := length of array
- swapped = true
- while ( swapped == true )
- do
- swapped = false
- for i : 1 to n-1
- do
- if ( array[i] > array[i+1] )
- swap( array[i], array[i+1] )
- swapped = true
- End if
- done
- done
Now running this algorithm with our random data sets as earlier we get the following graph. Now we can clearly see that for best case scenario the algorithm almost takes constant time which corresponds to our theoretical understanding also.
The C program can be viewed here.
Time Complexity:
Are you wondering as I didn't talk about the actual O(n2)? Well it is very easy to observe from these two graphs though. Both the graphs start from n = 1000 onward. So, for this value of n, n2 would be 10,00,000, whereas the highest time goes up to 1,20,000 and 2,50,000 for selection sort and bubble sort respectively. So, the running time is definitely well below the order of n2 mark. And that matches with the theoretical understanding also.
Wondering how these graphs are plot? I've use gnuplot tool in linux. You can find a good tutorial on this here.
Now, let us compare two sorting algorithms' running time.
From the graph above, we can see that in the worst case scenario, selection sort takes much less time than the bubble sort although they are of same order, O(n2). Let's see why is it so? This is because for sorting of same array, selection sort algorithm makes less number of swaps than the bubble sort algorithm. See yourself with manually simulating the algorithms for some simple array.
How do you feel now? A bit more confident in this domain? Ok, for sake of simplicity I've stored the numbers in an array. Ideally, these arrays should be replaced by linked lists... Next time we will see more complicated sorting algorithms like merge sort: understand and compare them. See you then...
Looking forward for your feedback...



This comment has been removed by the author.
ReplyDeleteFirst of all, its a very nice explanation. I have one doubt. Does average case running time depends on the randomness of the data set? I guess it should depend. If so, then can we characterize this dependency? For example, in the figure where you have calculated the running time for bubble sort, the best case is near to zero. However, the average case almost coincides with worst case. Why the pattern is like that? For example, if the input data set is 90% sorted, then should not the running time for average case come closer to the best case? If the average case means the data set is 50% random (for example what we called random with probability 0.5 Poisson), then what should be the running time?
ReplyDeleteThanks Sandy.
DeleteYes I agree with you that pattern of the input data plays a big role here. I think if we do the following things the pattern would be clear:
1. Analyse the pattern of $RANDOM
2. Make 50%, 90% of input data sorted and see.
I think avg case graph in those case should go in the middle of best case and worst case as you said. I'll do these experiments. Thanks again for the feedback...
Hi Abhirup,
ReplyDeleteGreat job. But I have a few queries and feedbacks.
1. The legends are lost in the midst of their experimental values (in the 2nd graph)
2. I got a bit confused int he section "Time Complexity".
I thought you would be explaining order of computational complexity which depends on size of array 'n'. So, the measurement of complexity should have been number of times compared instead of 'runtime in usec'. Even 'run time' has a relation with the number of computation (should be somewhat linear), and so they can be substituted by each other. But if u substitute '# comparisons' with 'runtime' then you cannot expect that (time taken to run ~ n^2 )
Anyways, keep up the good work.
Thanks boss...
DeleteYah, I thought in the same direction that "running time" can be a substitute of the number of comparison as you said). I did this not to complicate things with introducing a new parameter: number of comparison and all that.
However, the Big O notation tells that n^2 would be the upper bound of the number of comparisons if the input approaches infinity. So,is it totally out of place to compare run time with n^2? Although, actual run time can have other parts of the equation which O(n^2) has omitted [like O(5n^2+3n) ~ O(n^2)]. However, from a simplicity stand point, I thought to express time complexity only in terms of time taken by the algo and tried to map it to theoretical understanding.
Great job brother ! Hope to read more articles in coming days . Keep up the good work .
ReplyDelete1.Heap Sort in C
ReplyDelete2.Bubble Sort in C
3.Insertion Sort in C
4.Selection Sort in C
5.Quick Sort in C
Here is complete description of Heap Sort Algorithm with example and screenshots
ReplyDeletehttp://geeksprogrammings.blogspot.in/2013/08/explain-heapsort-with-example.html