Double hashing is a probing method which works according to a constant multiple of another hash function, representation:
P(k,x) = x*H2(k), where H2(k) is another hash function.
Both hash functions must hash the same type of keys. Double hashing boils down to linear hashing, except for the fact that the constant is unknown until the runtime. Since it is similar to linear probing we might face the same issue of infinite cycle as you can see below:
Let’s take an example to prove our point. Linear function is P(X) = 4x and H1(k) = 3, and N is 8:
H(k) + P(0) mod N = 3 The cycle {3,7} makes it impossible to
H(k) + P(1) mod N = 7 reach buckets {1,2,4,5,6}. This would cause
H(k) + P(2) mod N = 3 an infinite loop if all the buckets {3,7}
H(k) + P(3) mod N = 7 are already occupied.
H(k) + P(4) mod N = 3
H(k) + P(5) mod N = 7
H(k) + P(6) mod N = 3
H(k) + P(7) mod N = 7
Now to fix the issue of the infinite loop we choose our table size to be a prime number and also calculate the value of δ.
δ = H2(k) mod N
If the value of δ >=1, δ < N and the gcd of δ and N is 1 since N is prime, therefore, with these conditions, we can assure that the probing sequence is guaranteed to have order N, making us hit every single slot in our hash table.
Let’s suppose the keys we are using are of type T. Now the main question revolves around how we construct our secondary hash function H2(k). Given the fact it would be nice to have a systematic way to be able to effectively produce a new hash function every time we need one.
In basic programming the keys we need to hash are always composed of the same fundamental building blocks, like integers, strings, etc. As we have many great hash functions for fundamental building blocks we can use and merge them to construct our hash function H2(k). Mainly these functions are chosen from the pool of hash functions called universal hash functions which operate mainly on fundamental data types.
Let’s take a hash table of size N and we will insert some key-value pairs. Below is the hash table in detail:
Probing function: P(x) = x*H2(k)
Table size: N = 7 (prime number)
Max load factor: α = 0.75
Threshold before resize: N * α = 5
Now in case of exceeding the threshold of our hash table, we can go ahead and extend our hash table by multiplying it by two and selecting the next prime number to it. In our case 2*N = 14 and the next prime number would be 17.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#include <bits/stdc++.h>
using namespace std;
#define TABLE_SIZE 7
#define PRIME 7
vector < int > hash_table(TABLE_SIZE, -1);
int size = 0;
int h1(int key) {
return (key % TABLE_SIZE);
}
int h2(int key) {
return (PRIME - (key % PRIME));
}
void insert(int key) {
if (size == TABLE_SIZE)
return;
int idx = h1(key);
if (hash_table[idx] != -1) {
int idx2 = h2(key);
int i = 1;
while (1) {
int newidx = (idx + i * idx2) % TABLE_SIZE;
if (hash_table[newidx] == -1) {
hash_table[newidx] = key;
break;
}
i++;
}
} else
hash_table[idx] = key;
size++;
}
void display() {
cout << "Hash Table" << endl;
for (int i = 0; i < TABLE_SIZE; i++)
if (hash_table[i] != -1)
cout << i << " --> " << hash_table[i] << endl;
else
cout << i << " --> ∅" << endl;
}
int main() {
int n, x;
cin >> n;
for (int i = 0; i < n; i++) {
cin >> x;
insert(x);
}
display();
return 0;
}
Input:
6
8 56 89 7 23 6
Output:
Hash Table
0 --> 56
1 --> 8
2 --> 23
3 --> 7
4 --> ∅
5 --> 89
6 --> 6