1.2 Simple Sorting1.1 Binary Search1 Algorithms and Data Structures in Java1.3 Recursion

1.2 Simple Sorting

As we have seen in the previous section, having an array sorted is an important prerequisite for later searching it. In general, when using an array as a "database" it is a good idea to keep the elements sorted according to some criterion: names can be sorted in alphabetical order, products by registration numbers, cities by zip codes, and so on. Entities may be also sorted according to different criteria, e.g., a database of books may be sorted according to their titles or their ISBN numbers (but not both at the same time). Because sorting is an important problem in computer science, it has been extensively studied and numerous methods of varying sophistication have been developed. In this section, we will investigate a basic sorting algorithm called  insertion sort which is good for moderately sized arrays (up to some tens of elements). For larger arrays, we will see later a much faster algorithm.

The basic idea of insertion sort is as follows: we assume that at the start of iteration i of the algorithm, the first i elements of the array are already sorted. The array therefore consists of a left part which is sorted and a right part which is unsorted.

Insertion Sort

We then take the (i+1)-th element x and insert it into the appropriate position of the sorted part to the left of x by shifting the elements larger than x one position to the right.

Insertion Sort

Now we have the first i+1 elements sorted. After n iterations (where n is the length of the array), therefore the whole array is sorted.

The following applet illustrates the insertion sort of an array of 10 elements.

If you don't see the applet, your Browser does not support Java. Click here, to view a precomputed image.
The arrow outer represents the program variable i above, the arrow inner the variable j (see below). The arrow temp denotes the temporary variable x. Use the button Step to step through the algorithm or Run to execute the algorithm.

Applet

The algorithm is expressed by the following Java method:

  // ------------------------------------------------
  // insertionSort(a)
  // sort 'a' by the insertion sort algorithm.
  //
  // Input:
  //   'a' is not null.
  // Output:
  //   'a' has the same elements as the old 'a'.
  //   'a' is sorted.
  // ------------------------------------------------
  static void insertionSort(int[] a)
  {
    final int n = a.length;
    for (int i = 1; i < n; i++)
    {
      // a is sorted from 0 to i-1
      int x = a[i];
      int j = i-1;
      while (j >= 0 && a[j] > x)
      {
        a[j+1] = a[j];
        j = j-1;
      }
      a[j+1] = x;
   }
  }

In this method, we use the variable x as a temporary buffer to hold the value of a[i] while the larger elements in the sorted part of the array are shifted one position to the right. When all elements have been shifted, this value is written back into the appropriate position of the array.

In the first iteration of the outer loop, the algorithm performs at most one comparison operation, in the second iteration at most two, and so on. In the worst case (the input array is sorted in  descending order), we therefore have

1+2+...+(n-1) = n*(n-1)/2  n2/2
comparisons (the number of element swap operations is approximately the same). We therefore call the algorithm  quadratic in the length n of the array. In  average, however, in each iteration of the outer loop only half of the maximum number of comparisons have to be performed until the appropriate insertion point is found.
Why?
The algorithm therefore needs in average approximately n2/4 steps.

Sorting an array by this algorithm is therefore much slower than searching for an element in an array (which takes at most n steps, if the array is unsorted, and at most log2 n steps, if the array is sorted). For instance, if the size of the array is 1024, approximately 262000 comparison operations have to be performed to sort the array. If the array has 8192 elements, approximately 16 million comparisons have to be performed.

However, for data that are already sorted or almost sorted, the insertion sort does much better. When data is in order, the inner loop is never executed.

Why?
In this case, the algorithm needs only n steps. If the data is almost sorted, the inner loop is only very infrequently executed and the algorithm runs in "almost" n steps. It is therefore a simple and efficient way to order a database that is only slightly out of order.
© Wolfgang Schreiner; February 3, 2005

1.2 Simple Sorting1.1 Binary Search1 Algorithms and Data Structures in Java1.3 Recursion