Linear Hashing – Exam Summary

Idea: Linear hashing is a dynamic hashing scheme that grows gradually by splitting buckets one by one in a fixed order, avoiding a full rehash of all keys.

1. Parameters & Notation

Each bucket = 1 primary block + 0 or more overflow blocks.
Primary block capacity: $C$ (e.g. $C = 3$ records).
Level $i$: number of bits currently used by base hash.
Split pointer $n$: index of next bucket to split.
Number of primary buckets at level $i$: $2^i$ (conceptually).
Global occupancy: \[ \text{occ} = \frac{\text{\#records}}{\text{\#buckets} \times C} \]
Threshold $\alpha$ (e.g. $\alpha = 0.75$ or $\alpha = 1.0$).

2. Hash Functions

We assume a base hash $h(k)$ which gives an integer. Then:

\[ h_i(k) = h(k) \bmod 2^i \] Special simple case often used in examples: \[ h_i(k) = k \bmod 2^i \] Linear hashing effectively uses two variants:

$h_i(k)$ for unsplit buckets,
$h_{i+1}(k)$ for buckets that have already been split in the current round.

3. Bucket Selection Rule

Given current level $i$ and split pointer $n$, to find the bucket for key $k$:

\[ b = h_i(k) \] \[ \text{if } b < n \text{ then } b = h_{i+1}(k) \] So:

If $b \ge n$: bucket index $b$ (has not been split in this round).
If $b < n$: this bucket was already split, so we use one more bit ($h_{i+1}$).

4. Insertion Algorithm (High-Level)

Compute bucket index $b$ with the rule above.
Insert record into bucket $b$:
- If primary block has < $C$ records → store in primary.
- Else → append to overflow chain of that bucket.
Recompute global occupancy: \[ \text{occ} = \frac{\text{\#records}}{\text{\#buckets} \times C} \]
If $\text{occ} > \alpha$ (threshold) → perform one split.

Pseudocode (Insert)

Insert(key):
    b = h_i(key)
    if b < n:
        b = h_{i+1}(key)

    // insert into bucket b (primary then overflow)
    if bucket[b].primary_count < C:
        place key in primary block of bucket b
    else
        place key in overflow chain of bucket b

    if Occupancy() > alpha:
        SplitOneBucket()

5. Split Algorithm

When we decide to split, we always split the bucket whose index is n (split pointer).

Steps:

Let $s = n$ be the bucket to split.
Create a new bucket with index: \[ s' = s + 2^i \]
Collect all records (primary + overflow) from bucket $s$.
Rehash each record using: \[ b = h_{i+1}(k) = h(k) \bmod 2^{i+1} \] and put it into bucket $b$ (which is either $s$ or $s'$).
Increment split pointer: $n \leftarrow n + 1$.
If $n = 2^i$:
- Increment level: $i \leftarrow i + 1$
- Reset split pointer: $n \leftarrow 0$

Pseudocode (Split)

SplitOneBucket():
    s      = n             // bucket to split
    old_i  = i
    s_new  = s + 2^old_i   // new bucket index

    move all records from bucket[s] into temp list L
    clear bucket[s]
    ensure bucket[s_new] exists and is empty

    for each key in L:
        b = h_{old_i + 1}(key)  // = h(key) mod 2^(old_i+1)
        insert key into bucket[b]

    n = n + 1
    if n == 2^old_i:
        i = old_i + 1
        n = 0

6. Split Sequence

Split pointer $n$ always increases in order, so the bucket indices that get split form a fixed sequence:

At level $i = 1$: split buckets $0, 1$
At level $i = 2$: split buckets $0, 1, 2, 3$
At level $i = 3$: split buckets $0, 1, 2, 3, 4, 5, 6, 7$

Overall split sequence (bucket numbers): $0, 1,\; 0, 1, 2, 3,\; 0, 1, 2, 3, 4, 5, 6, 7,\; \dots$

7. Overflow Handling

Each bucket has:
- Primary block: up to $C$ records.
- Overflow blocks: additional records in a chain.
Overflow appears when more than $C$ records hash to the same bucket.
On split, we move:
- all records from primary + overflow into a temp list,
- then rehash them with $h_{i+1}$ and redistribute.
Splits tend to reduce overflow by spreading records into more buckets.
Worst case: many records all hash to the same value (e.g. all keys identical) → overflow chain grows and splitting helps very little for that specific bucket.

8. Typical Exam Points

Understand variables: $i$, $n$, occupancy, bucket capacity.
Be able to compute:
- bucket index using $h_i$ and the rule with $n$,
- when a split occurs (occupancy > threshold),
- how records move during a split.
Be able to trace:
- a sequence of inserts,
- split order,
- contents of each bucket (including overflow).
Conceptual: why linear hashing grows incrementally instead of rebuilding the whole table.