Linear Hashing – Exam Summary
Idea: Linear hashing is a dynamic hashing scheme that
grows gradually by splitting buckets one by one in a fixed order,
avoiding a full rehash of all keys.
1. Parameters & Notation
- Each bucket = 1 primary block + 0 or more overflow blocks.
- Primary block capacity: $C$ (e.g. $C = 3$ records).
- Level $i$: number of bits currently used by base hash.
- Split pointer $n$: index of next bucket to split.
- Number of primary buckets at level $i$: $2^i$ (conceptually).
- Global occupancy:
\[
\text{occ} = \frac{\text{\#records}}{\text{\#buckets} \times C}
\]
- Threshold $\alpha$ (e.g. $\alpha = 0.75$ or $\alpha = 1.0$).
2. Hash Functions
We assume a base hash $h(k)$ which gives an integer. Then:
\[
h_i(k) = h(k) \bmod 2^i
\]
Special simple case often used in examples:
\[
h_i(k) = k \bmod 2^i
\]
Linear hashing effectively uses two variants:
- $h_i(k)$ for unsplit buckets,
- $h_{i+1}(k)$ for buckets that have already been split in the current round.
3. Bucket Selection Rule
Given current level $i$ and split pointer $n$, to find the bucket for key $k$:
\[
b = h_i(k)
\]
\[
\text{if } b < n \text{ then } b = h_{i+1}(k)
\]
So:
- If $b \ge n$: bucket index $b$ (has not been split in this round).
- If $b < n$: this bucket was already split, so we use one more bit ($h_{i+1}$).
4. Insertion Algorithm (High-Level)
- Compute bucket index $b$ with the rule above.
- Insert record into bucket $b$:
- If primary block has < $C$ records → store in primary.
- Else → append to overflow chain of that bucket.
- Recompute global occupancy:
\[
\text{occ} = \frac{\text{\#records}}{\text{\#buckets} \times C}
\]
- If $\text{occ} > \alpha$ (threshold) → perform one split.
Pseudocode (Insert)
Insert(key):
b = h_i(key)
if b < n:
b = h_{i+1}(key)
// insert into bucket b (primary then overflow)
if bucket[b].primary_count < C:
place key in primary block of bucket b
else
place key in overflow chain of bucket b
if Occupancy() > alpha:
SplitOneBucket()
5. Split Algorithm
When we decide to split, we always split the bucket whose index is n (split pointer).
Steps:
- Let $s = n$ be the bucket to split.
- Create a new bucket with index:
\[
s' = s + 2^i
\]
- Collect all records (primary + overflow) from bucket $s$.
- Rehash each record using:
\[
b = h_{i+1}(k) = h(k) \bmod 2^{i+1}
\]
and put it into bucket $b$ (which is either $s$ or $s'$).
- Increment split pointer: $n \leftarrow n + 1$.
- If $n = 2^i$:
- Increment level: $i \leftarrow i + 1$
- Reset split pointer: $n \leftarrow 0$
Pseudocode (Split)
SplitOneBucket():
s = n // bucket to split
old_i = i
s_new = s + 2^old_i // new bucket index
move all records from bucket[s] into temp list L
clear bucket[s]
ensure bucket[s_new] exists and is empty
for each key in L:
b = h_{old_i + 1}(key) // = h(key) mod 2^(old_i+1)
insert key into bucket[b]
n = n + 1
if n == 2^old_i:
i = old_i + 1
n = 0
6. Split Sequence
Split pointer $n$ always increases in order, so the bucket indices that get split form a fixed sequence:
- At level $i = 1$: split buckets $0, 1$
- At level $i = 2$: split buckets $0, 1, 2, 3$
- At level $i = 3$: split buckets $0, 1, 2, 3, 4, 5, 6, 7$
Overall split sequence (bucket numbers):
$0, 1,\; 0, 1, 2, 3,\; 0, 1, 2, 3, 4, 5, 6, 7,\; \dots$
7. Overflow Handling
- Each bucket has:
- Primary block: up to $C$ records.
- Overflow blocks: additional records in a chain.
- Overflow appears when more than $C$ records hash to the same bucket.
- On split, we move:
- all records from primary + overflow into a temp list,
- then rehash them with $h_{i+1}$ and redistribute.
- Splits tend to reduce overflow by spreading records into more buckets.
- Worst case: many records all hash to the same value (e.g. all keys identical) → overflow chain grows and splitting helps very little for that specific bucket.
8. Typical Exam Points
- Understand variables: $i$, $n$, occupancy, bucket capacity.
- Be able to compute:
- bucket index using $h_i$ and the rule with $n$,
- when a split occurs (occupancy > threshold),
- how records move during a split.
- Be able to trace:
- a sequence of inserts,
- split order,
- contents of each bucket (including overflow).
- Conceptual: why linear hashing grows incrementally instead of rebuilding the whole table.