Double Hashing

Nikhil Kumar SinghVrishchik

DURATION

11min

Tags

C++

Love to compete?

Join Topcoder Challenges

Double hashing is a probing method which works according to a constant multiple of another hash function, representation:

P(k,x) = x*H₂(k), where H₂(k) is another hash function.

Both hash functions must hash the same type of keys. Double hashing boils down to linear hashing, except for the fact that the constant is unknown until the runtime. Since it is similar to linear probing we might face the same issue of infinite cycle as you can see below:

Let’s take an example to prove our point. Linear function is P(X) = 4x and H₁(k) = 3, and N is 8:

H(k) + P(0) mod N = 3 The cycle {3,7} makes it impossible to
H(k) + P(1) mod N = 7 reach buckets {1,2,4,5,6}. This would cause
H(k) + P(2) mod N = 3 an infinite loop if all the buckets {3,7}
H(k) + P(3) mod N = 7 are already occupied.
H(k) + P(4) mod N = 3
H(k) + P(5) mod N = 7
H(k) + P(6) mod N = 3
H(k) + P(7) mod N = 7

Now to fix the issue of the infinite loop we choose our table size to be a prime number and also calculate the value of δ.

δ = H₂(k) mod N

If the value of δ >=1, δ < N and the gcd of δ and N is 1 since N is prime, therefore, with these conditions, we can assure that the probing sequence is guaranteed to have order N, making us hit every single slot in our hash table.

Let’s suppose the keys we are using are of type T. Now the main question revolves around how we construct our secondary hash function H₂(k). Given the fact it would be nice to have a systematic way to be able to effectively produce a new hash function every time we need one.

In basic programming the keys we need to hash are always composed of the same fundamental building blocks, like integers, strings, etc. As we have many great hash functions for fundamental building blocks we can use and merge them to construct our hash function H₂(k). Mainly these functions are chosen from the pool of hash functions called universal hash functions which operate mainly on fundamental data types.

Let’s take a hash table of size N and we will insert some key-value pairs. Below is the hash table in detail:

Probing function: P(x) = x*H₂(k)
Table size: N = 7 (prime number)
Max load factor: α = 0.75
Threshold before resize: N * α = 5

Now in case of exceeding the threshold of our hash table, we can go ahead and extend our hash table by multiplying it by two and selecting the next prime number to it. In our case 2*N = 14 and the next prime number would be 17.

Code implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#include <bits/stdc++.h>

using namespace std;

#define TABLE_SIZE 7
#define PRIME 7

vector < int > hash_table(TABLE_SIZE, -1);
int size = 0;

int h1(int key) {
  return (key % TABLE_SIZE);
}

int h2(int key) {
  return (PRIME - (key % PRIME));
}

void insert(int key) {
  if (size == TABLE_SIZE)
    return;

  int idx = h1(key);

  if (hash_table[idx] != -1) {
    int idx2 = h2(key);
    int i = 1;
    while (1) {
      int newidx = (idx + i * idx2) % TABLE_SIZE;

      if (hash_table[newidx] == -1) {
        hash_table[newidx] = key;
        break;
      }
      i++;
    }
  } else
    hash_table[idx] = key;
  size++;
}

void display() {
  cout << "Hash Table" << endl;
  for (int i = 0; i < TABLE_SIZE; i++)
    if (hash_table[i] != -1)
      cout << i << " --> " << hash_table[i] << endl;
    else
      cout << i << " --> ∅" << endl;
}

int main() {
  int n, x;
  cin >> n;

  for (int i = 0; i < n; i++) {
    cin >> x;
    insert(x);
  }

  display();
  return 0;
}

Input:

6
8 56 89 7 23 6

Output:

Hash Table
0 --> 56
1 --> 8
2 --> 23
3 --> 7
4 --> ∅
5 --> 89
6 --> 6

Chat on Discord

December 28, 2021

Double Hashing

DURATION

categories

Tags

share

Join Topcoder Challenges

Code implementation:

Recommended for you

List of awesome learning resources

Binary Search

Competitive Programming at Topcoder: A Step by Step Guide