October 14, 2021

Counting and Bucket Sort

Nikhil Kumar SinghVrishchik

DURATION

15min

Counting sort

Counting sort is a sorting algorithm that works on the range of the input values. Counting sort is somewhat different from other sorting techniques, as it is a linear sorting algorithm. This means it runs in learn time O(N), whereas the best comparison-based sorting algorithms have the complexity of O(N log N) (where N is the number of elements in an array).

As stated above, the values are to be in a range. So, for algorithm implementation, we take an auxiliary array which stores the count of the occurrences of each element in the array lying between 0 to N -1, then the array is used to produce the sorted version of the array using arithmetic operations.

Here we are using an extra array which increases the space complexity. This makes it a bit more costly than other sorting algorithms, as it works only with ranges and is therefore only suitable where the element values’ range is close to the size of the array and the range = max_element - min_element + 1.

Now we will show an example of the process to illustrate this more clearly.

Let’s consider an array A of length N which we are going to sort using counting sort, and an array aux of length K where K is max_element - min_element + 1.

Basic terminology:

The aux array is 0 indexed, which is initially initiated to 0.
Every i ranging from 0 <= i <=N-1, store the number of occurrences of i values in the original array or contain the arr[i] - min element occurrences.

PSEUDOCODE:

Use an aux array to store the counts for all unique integers in our input array A.
Modify the aux array such that the elements at each index store the sum of previous counts.
Now iterate over A and output each integer to its corresponding position in the output array.

Our example array A: [ 1 , 4 , 1 , 2 , 7 , 5 , 2] and N=7;

In our explanation, we have simplified things by taking the value of K just greater than N.

STEPS:

Let’s consider the values to be in the range of 0 to 9.

Take the count of each unique element in the range

Now modify the aux array such that each ith element contains the sum of previous counts.

The above-modified array indicated the position of each object in the output sequence. Now to make it indexed, rotate the array clockwise once.

Build the output using the above info.

Code Implementation(C++):

The code below is implemented and handles the negative number. Here we ground the minimum element and store the count of that minimum element at the 0th index.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <bits/stdc++.h>

using namespace std;

void countingSort(vector < int > & arr) {
  int max = * max_element(arr.begin(), arr.end());
  int min = * min_element(arr.begin(), arr.end());
  int range = max - min + 1;

  vector < int > aux(range), output(arr.size());
  for (int i = 0; i < arr.size(); i++)
    aux[arr[i] - min]++;

  for (int i = 1; i < aux.size(); i++)
    aux[i] += aux[i - 1];

  for (int i = arr.size() - 1; i >= 0; i--) {
    output[aux[arr[i] - min] - 1] = arr[i];
    aux[arr[i] - min]--;
  }

  for (int i = 0; i < arr.size(); i++)
    arr[i] = output[i];
}
int main() {
  int n;
  cin >> n;

  vector < int > arr(n);
  for (int i = 0; i < n; i++) cin >> arr[i];

  countingSort(arr);

  for (int i = 0; i < arr.size(); i++)
    cout << arr[i] << " ";
  return 0;
}

Time Complexity:

Space complexity: O(N)

Modifying aux array: O(K)

Output integers to output array: O(N)

Overall time complexity: O(K + N)

In practice, it becomes O(N)

A couple of remarks:

Counting sort is stable: if two elemsta A[i] and A[j] have the same key value and i < j ,A[i] will appear before A[j] in the output.
If instead of going from N-1 to 0 (as above), i goes from 0 to N-1, the sort is not stable.
Counting sort can be very efficient if k is much smaller than N.

Bucket Sort

Bucket sort is a starting algorithm mainly used when we have data uniformly distributed over a range. As the name suggests, in bucket sort we have several groups called buckets, each containing specified elements of the input array. Each bucket is then sorted by either recursively applying the same bucket algorithm or suitable sorting algorithms.

Basic steps for bucket sort:

Partition the range into a fixed number of groups (buckets),
Iterate over the elements once you have put the elements into the appropriate bucket,
Sort the buckets individually using some sorting algorithm,
Now, concatenate the buckets together.

Use cases of bucket sort:

For sorting floating-point numbers.
For input uniformly distributed over a range.

Scatter-gather Approach

For further understanding we are going to discuss the scatter-gather approach. As the name suggests we first scatter the given elements into several buckets, then sort the buckets using a sorting algorithm, and at last, gather them in order.

Here is an example which will more clearly explain the process.

Let’s consider an array A of length N which we are going to sort using bucket sort.

Here A = { 10 , 8 , 20 , 7 , 16 , 18 , 12 , 1 , 23 ,11 } and N = 9

As we can observe, the elements are in a range of about [0,25). The buckets would be of the following ranges.

As you see above we have distributed our range into five equal buckets. If there is an element like 9 it would go into the second bucket ranging from 5-10. Similarly, every item of the array is placed into its respective bucket.

After placing the elements into buckets or scattering the array elements:

Now, we should individually sort the buckets using any stable sorting algorithms.

Now, at last, we should gather them bucket by bucket.

Code Implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <stdio.h>

int getMax(int a[], int n) // function to get maximum element from the given array  
{
  int max = a[0];
  for (int i = 1; i < n; i++)
    if (a[i] > max)
      max = a[i];
  return max;
}
void bucket(int a[], int n) // function to implement bucket sort  
{
  int max = getMax(a, n); //max is the maximum element of array  
  int bucket[max], i;
  for (int i = 0; i <= max; i++) {
    bucket[i] = 0;
  }
  for (int i = 0; i < n; i++) {
    bucket[a[i]]++;
  }
  for (int i = 0, j = 0; i <= max; i++) {
    while (bucket[i] > 0) {
      a[j++] = i;
      bucket[i]--;
    }
  }
}
void printArr(int a[], int n) // function to print array elements  
{
  for (int i = 0; i < n; ++i)
    printf("%d  ", a[i]);
}
int main() {
  int a[] = {
    54,
    12,
    84,
    57,
    69,
    41,
    9,
    5
  };
  int n = sizeof(a) / sizeof(a[0]);
  printf("Before sorting array elements are - \n");
  printArr(a, n);
  bucket(a, n);
  printf("\nAfter sorting array elements are - \n");
  printArr(a, n);
}

Bucket sort complexity:

Best case complexity - occurs when the array is already sorted and when the elements are uniformly distributed in buckets. It could improve further if the elements in the bucket are sorted as well. As we know the individual buckets aren’t sorted, we can improve our complexity further by using insertion sort which makes our overall complexity linear i.e., O(n+k), where O(n) is for distribution among buckets and O(k) for insertion sort among buckets. The best-case time complexity of bucket sort is O(n + k).
Average case complexity - occurs when the elements are not in proper sorted order, not ascending and not descending. Bucket sort runs in linear time, even when the elements are uniformly distributed. The average case time complexity of bucket sort is O(n + K).
**Worst case complexity -**occurs when elements are in very close range. It means they differ marginally, and due to this property they are placed in the same bucket, causing some buckets to consist of more elements than others. The complexity will get worse when the elements are in reverse order. The worst-case time complexity of bucket sort is O(n2).

The advantages of bucket sort are -

Fewer comparisons than other algorithms,
Works faster because of the uniform distribution of elements.

The limitations of bucket sort are -

Not necessarily a stable algorithm,
Efficiency decreases with an increase in the size of the array,
Uses extra space, costing a space complexity of O(n).

Chat on Discord

October 14, 2021

Counting and Bucket Sort

DURATION

categories

Tags

share

FREELANCE OPPORTUNITIES

Counting sort

Code Implementation(C++):

Time Complexity:

Bucket Sort

Scatter-gather Approach

Code Implementation:

Bucket sort complexity:

Recommended for you

Tree Data Structure

Binary Indexed Trees

Data Structures