Sorting It Out
By Rod Stephens
In previous articles I described trees in general and Btrees in particular. You can use these trees to sort lists of data. Simply add the data items to the tree, keeping the items in their proper order. When you have added all of the items to the tree, you can read them back out in sorted order.
Sorting using trees is generally pretty fast, but there are many other sorting algorithms that are less complicated and some of them, like countingsort, are much faster. This article describes several important sorting algorithms that are useful under different circumstances.
Algorithm books spend a great deal of time on sorting algorithms. My soontobepublished book Visual Basic Algorithms, for example, spends an entire chapter describing different sorting algorithms. There are several reasons why sorting algorithms are so popular.
First, sorting is a common task in computer applications. Almost any data is more meaningful if you can sort it and display it in various ways. Second, sorting algorithms demonstrate many important algorithmic techniques such as binary subdivision, recursion, and linear interpolation. Finally, different sorting algorithms behave differently for different data so no algorithm is best under all circumstances.
Bubblesort is a specialized algorithm designed for sorting items that are already mostly in sorted order. If only one or two items in your list are out of order, bubblesort is very fast. If the items in your list are initially arranged randomly, bubblesort is extremely slow. For this reason you should be careful when you use bubblesort.
The idea behind the algorithm is to scan through the list looking for two adjacent items that are out of order. When you find two such items, you swap them and continue down the list. You repeat this process until all of the items are in order.
Figure 1 shows a small list where the item 1 is out of order. During the first pass through the list, the algorithm will find that items 4 and 1 are out of order so it swaps them. During the next pass it finds that items 3 and 1 are out of order so it swaps them. During the third pass it swaps items 2 and 1 and the list is in its final order. The way in which the item 1 seems to bubble towards the top of the list is what gives bubblesort its name.
Figure 1. In a bubblesort, item 1 slowly "bubbles" to the top.
You can improve the algorithm if you alternate upward and downward passes through the list. During downward passes an item that is too far down in the list, like the item 1 in the previous example, can move up only one position. An item that is too far up in the list might move many positions. If you add upward passes through the list, you will be able to move items many positions up through the list as well.
During each pass through the list, at least one new item reaches its final position. If the items in your list begin mostly in sorted order, the algorithm will need only one or two passes through the list to finish the ordering. If you have a list of 1,000 items with only one out of order, the algorithm would require only 2,000 steps to put the list in its proper order.
If the items begin arranged randomly, the algorithm may need one pass per item in the list. The algorithm would need up to 1 million steps to arrange a list of 1,000 items.
Listing 1 shows the VB code for the improved bubblesort algorithm.
' min and max are the minimum and maximum indexes
' of the items that might still be out of order.
Sub BubbleSort (List() As Long, ByVal min As Integer, _
ByVal max As Integer)
Dim last_swap As Integer
Dim i As Integer
Dim j As Integer
Dim tmp As Long
' Repeat until we are done.
Do While min < max
' Bubble up.
last_swap = min  1
' For i = min + 1 To max
i = min + 1
Do While i <= max
' Find a bubble.
If List(i  1) > List(i) Then
' See where to drop the bubble.
tmp = List(i  1)
j = i
Do
List(j  1) = List(j)
j = j + 1
If j > max Then Exit Do
Loop While List(j) < tmp
List(j  1) = tmp
last_swap = j  1
i = j + 1
Else
i = i + 1
End If
Loop
' Update max.
max = last_swap  1
' Bubble down.
last_swap = max + 1
' For i = max  1 To min Step 1
i = max  1
Do While i >= min
' Find a bubble.
If List(i + 1) < List(i) Then
' See where to drop the bubble.
tmp = List(i + 1)
j = i
Do
List(j + 1) = List(j)
j = j  1
If j < min Then Exit Do
Loop While List(j) > tmp
List(j + 1) = tmp
last_swap = j + 1
i = j  1
Else
i = i  1
End If
Loop
' Update min.
min = last_swap + 1
Loop
End Sub

Listing 1. Bubblesort.
Selectionsort is a very simple algorithm. First you search the list for the smallest item. Then you swap that item with the item at the top of the list. Next you find the second smallest item and swap it with the second item in the list. You continue finding the next smallest item and swapping it into its final position in the list until you have swapped all of the items to their final positions. The VB code for selectionsort is shown in Listing 2.
Sub Selectionsort (List() As Long, min As Integer, _
max As Integer)
Dim i As Integer
Dim j As Integer
Dim best_value As Long
Dim best_j As Integer
For i = min To max  1
best_value = List(i)
best_j = i
For j = i + 1 To max
If List(j) < best_value Then
best_value = List(j)
best_j = j
End If
Next j
List(best_j) = List(i)
List(i) = best_value
Next i
End Sub

Listing 2. Selectionsort.
While looking for the Ith smallest item, you must examine each of the N  I items that you have not yet placed in their final positions. Then the total number of steps the algorithm needs is:
N + (N  1) + (N  2) + ... + 1 = N * (N + 1) / 2
This function is on the order of N2. That means if you increase the number of items in the list by a factor of 2, the run time of the algorithm will increase by a factor of roughly 22 = 4. There are several other sorting algorithms that require only about N * log(N) steps (quicksort is one described below), so selectionsort is not a very fast algorithm for large lists.
Selectionsort is fine for small lists, however. It is very simple so it is easy to program, debug, and maintain over time. In fact it is so simple that it is actually faster than the more complicated algorithms if the list you are sorting is very small. If your list contains only a dozen or so items, selectionsort will probably be your best choice.
Quicksort is a recursive algorithm that uses a divideandconquer technique. While the list of items to be sorted contains at least two items, quicksort divides it into two sublists and recursively calls itself to sort the sublists.
The quicksort routine first checks to see if the list it is sorting contains fewer than two items. If so, it simply returns.
Otherwise the subroutine picks an item from the list to use as a dividing point. It then places all of the items that belong before this dividing point in the left part of the list. It places all of the other items in right part of the list. The subroutine then recursively calls itself to sort two smaller sublists.
There are several ways in which the quicksort routine might pick the dividing item. One of the easiest is to simply use the first item in the sublist being sorted. If the list is initially arranged randomly, that item will be a reasonable choice. Chances are good that the item will belong somewhere in the middle of the list and the two sublists the algorithm creates will be reasonably equal in size.
If the numbers are initially sorted or almost sorted, or if they are initially sorted in reverse order, then this method fails miserably. In that case the first item in the list will divide the list into one sublist that contains almost every item and another that will contain almost no items. Since the larger sublist does not shrink much, the algorithm makes little headway. In this case the algorithm will require on the order of N2 steps. This is the same order of performance given by selectionsort, only this algorithm is much more complicated.
A better method for selecting the dividing item is to choose one randomly. Then no matter how the items in the list are arranged, chances are the item you select will belong near the middle of the list and the sublists will be fairly evenly sized.
As long as the sublists are fairly equal in size, the algorithm will require on the order of N * log(N) steps. It can be proven that this is the fastest time possible for a sorting algorithm that sorts using comparisons. By using a little randomness, this algorithm avoids the possibility of its worst case N2 behavior and gives an expected case performance of N * log(N). Quicksort is very fast in practice as well as theory, so it is the favorite sorting algorithm of many programmers.
Listing 2 shows the VB code for the quicksort routine.
Sub Quicksort (List() As Long, min As Integer, max As Integer)
Dim med_value As Long
Dim hi As Integer
Dim lo As Integer
Dim i As Integer
' If the list has no more than 1 element, it's sorted.
If min >= max Then Exit Sub
' Pick a dividing item.
i = Int((max  min + 1) * Rnd + min)
med_value = List(i)
' Swap it to the front so we can find it easily.
List(i) = List(min)
' Move the items smaller than this into the left
' half of the list. Move the others into the right.
lo = min
hi = max
Do
' Look down from hi for a value < med_value.
Do While List(hi) >= med_value
hi = hi  1
If hi <= lo Then Exit Do
Loop
If hi <= lo Then
List(lo) = med_value
Exit Do
End If
' Swap the lo and hi values.
List(lo) = List(hi)
' Look up from lo for a value >= med_value.
lo = lo + 1
Do While List(lo) < med_value
lo = lo + 1
If lo >= hi Then Exit Do
Loop
If lo >= hi Then
lo = hi
List(hi) = med_value
Exit Do
End If
' Swap the lo and hi values.
List(hi) = List(lo)
Loop
' Sort the two sublists
Quicksort List(), min, lo  1
Quicksort List(), lo + 1, max
End Sub

Listing 3. Quicksort.
The discussion of quicksort above mentions that the fastest possible sorting algorithms that use comparisons use on the order of N * log(N) steps. Countingsort does not use comparisons so it is not bound by that result. In fact countingsort is so fast it seems to sort using magic rather than comparisons.
On the other hand, countingsort only works under special circumstances. First, the items you are sorting must be integers. You cannot use countingsort to sort strings. Second, the range of values the items have must be fairly limited. If your items range from 1 to 1,000, countingsort will work extremely well. If the items range from 1 to 30,000, countingsort will not work as well. If the items range from 1 to 10 billion, you should go back to quicksort.
The algorithm starts by allocating a temporary array of integers with bounds that cover the range of your items. If your items range from min_item to max_item, the algorithm creates an array like this:
Dim Counts() As Integer
ReDim Counts(min_item To max_item)
Next the algorithm looks through each of the items in the list and increments the Counts entry corresponding to that item. When this stage is finished, Counts(I) holds a count of the number of items that have value I. Keep in mind that the ReDim statement initializes each of these entries to 0 so you do not need to do this yourself.
For I = min To Max
Counts(List(I)) = Counts(List(I)) + 1
Next I
The program then runs through the Counts array converting the counts into offsets in the sorted list. For example, suppose the items in the list have values between 1 and 1,000. There might be 15 items with value 1, 7 with value 2, 11 with value 3, and so forth. The items with value 1 will begin at position 1 in the sorted list. Since there are 15 of them, the items with value 2 will start at position 16. There are 7 of those so the items with value 3 will start at position 23, etc.
next_spot = 1
For i = min_value To max_value
this_count = counts(i)
counts(i) = next_spot
next_spot = next_spot + this_count
Next i
When this stage is complete, the entry Counts(I) indicates the position in the sorted list where the first item with value I belongs.
Now the algorithm reads through the list again, placing each item in the correct position in the sorted array. As the algorithm places each item, it updates the corresponding Counts entry so the next item with the same value goes into the next position in the array.
If you are sorting data records rather than just numbers, you will need to use a temporary array. Since you place each item directly in its final location in the sorted array, you cannot store the sorted list in the same array as the original list. If you did, you would overwrite another item in the list that might not yet have been moved to its correct location. If your program needs the items in the original array, you will have to copy them back out of the temporary array when you have finished sorting them.
To sort N numbers that have a range that spans M values, countingsort executes roughly 2 * N + M steps. First it reads the N items to fill in the Counts array. Then it runs through the M values in the Counts array converting them from counts into offsets. Finally it moves the N items to their correct sorted positions.
If N is large and M is relatively small, 2 * N + M is much faster than the N * log(N) performance given by quicksort. To sort 30,000 numbers that ranged from 1 to 10,000, for example, quicksort might execute more than 400,000 steps. Countingsort would execute only about 70,000 steps and take less than a fifth as long.
The VB code for countingsort is shown in Listing 4.
Sub Countingsort (List() As Long, sorted_list() As Long, _
min As Integer, max As Integer, min_value As Long, _
max_value As Long)
Dim counts() As Integer
Dim i As Integer
Dim this_count As Integer
Dim next_offset As Integer
' Create the Counts array.
ReDim counts(min_value To max_value)
' Count the items.
For i = min To max
counts(List(i)) = counts(List(i)) + 1
Next i
' Convert the counts into offsets.
next_offset = min
For i = min_value To max_value
this_count = counts(i)
counts(i) = next_offset
next_offset = next_offset + this_count
Next i
' Place the items in the sorted array.
For i = min To max
sorted_list(counts(List(i))) = List(i)
counts(List(i)) = counts(List(i)) + 1
Next i
End Sub

Listing 4. Countingsort.
Table 1 summarizes the strengths and weaknesses of the algorithms presented here. As you can see, each performs well under some circumstances and badly under others. Here are some guidelines to help you select the right algorithm for your situation.
 If your list is more than 99% sorted already, use bubblesort.
 If you have a very small list (under 100 items or so), use selectionsort.
 If the items in your list are integers ranging over a small number of values (up to several thousand), use countingsort.
 Otherwise use quicksort.
Algorithm 
Advantages 
Disadvantages 
Bubblesort 
Very fast for lists that are almost sorted 
Very slow sorting all other lists 
Selectionsort 
Very simple 
Slow for large lists 
Easy to understand 
Very fast for small lists 
Quicksort 
Very fast for large lists 
Trouble if there are lots of duplicate data values 
Countingsort 
Extremely fast when the data is distributed over a small range (e.g. all values are between 1 and 1,000) 
Slower when data range is large 
Requires extra memory 
Only works with integer data 
Table 1. Comparison of Sorting Algorithms.
Program SORT allows you to run each of these algorithms. You can enter the number of items to sort and the maximum value the items should have. When you press the Go button, the program builds a list of random numbers between 1 and the maximum you specified. The program then sorts the numbers and displays the amount of time it took.
If you check the Start Sorted check box, the program will begin by sorting the list using Quicksort. If you also fill in a value in the # Unsorted field, the program will then switch some items around so the list contains the indicated number of unsorted items. The program will then sort the partially sorted list using the algorithm you have selected. This allows you to test bubblesort on lists that are mostly sorted.
Finally, you can fill in a value in the Repetitions field to make the program sort the list many repeatedly. You will need to sort small lists many times to see that selectionsort is faster than quicksort and countingsort for very small lists.
Using the program you can verify that:
 Selectionsort is fastest for small lists (use many repetitions).
 Quicksort is faster for larger lists.
 Countingsort is faster still if the number of items is larger than the largest item.
 Quicksort is faster than countingsort if the largest item is large (like 30,000) compared to the number of items (like 1,000).
 Bubblesort is fastest if the list starts mostly sorted.
Downloads
Download a 4K zip file containing program SORT described in this article. This program demonstrates the bubblesort, selectionsort, quicksort, and countingsort algorithms.
Note: This article uses Integers for indexes into the item array and Longs for the data type of the items being sorted. That makes it easier for you to tell which variables represent items and which represent indexes.
This also makes it easier to switch the data types of the items. If you need to sort Strings or Singles, for example, change the data types of the Long variables.
Because the code uses Integer indexes, the routines only work for 32,000 or so items. If you need more items, change the index variables' data types to Long.
This article originally appeared in the December 1995 issue of
Visual Basic Developer
magazine. If you like the article, visit the magazine's Web
site to see if you would like to subscribe.

