Linear Hashing – Exam Summary

Idea: Linear hashing is a dynamic hashing scheme that grows gradually by splitting buckets one by one in a fixed order, avoiding a full rehash of all keys.

1. Parameters & Notation

2. Hash Functions

We assume a base hash $h(k)$ which gives an integer. Then:

\[ h_i(k) = h(k) \bmod 2^i \] Special simple case often used in examples: \[ h_i(k) = k \bmod 2^i \] Linear hashing effectively uses two variants:

3. Bucket Selection Rule

Given current level $i$ and split pointer $n$, to find the bucket for key $k$:

\[ b = h_i(k) \] \[ \text{if } b < n \text{ then } b = h_{i+1}(k) \] So:

4. Insertion Algorithm (High-Level)

  1. Compute bucket index $b$ with the rule above.
  2. Insert record into bucket $b$:
  3. Recompute global occupancy: \[ \text{occ} = \frac{\text{\#records}}{\text{\#buckets} \times C} \]
  4. If $\text{occ} > \alpha$ (threshold) → perform one split.

Pseudocode (Insert)

Insert(key):
    b = h_i(key)
    if b < n:
        b = h_{i+1}(key)

    // insert into bucket b (primary then overflow)
    if bucket[b].primary_count < C:
        place key in primary block of bucket b
    else
        place key in overflow chain of bucket b

    if Occupancy() > alpha:
        SplitOneBucket()

5. Split Algorithm

When we decide to split, we always split the bucket whose index is n (split pointer).

Steps:

  1. Let $s = n$ be the bucket to split.
  2. Create a new bucket with index: \[ s' = s + 2^i \]
  3. Collect all records (primary + overflow) from bucket $s$.
  4. Rehash each record using: \[ b = h_{i+1}(k) = h(k) \bmod 2^{i+1} \] and put it into bucket $b$ (which is either $s$ or $s'$).
  5. Increment split pointer: $n \leftarrow n + 1$.
  6. If $n = 2^i$:

Pseudocode (Split)

SplitOneBucket():
    s      = n             // bucket to split
    old_i  = i
    s_new  = s + 2^old_i   // new bucket index

    move all records from bucket[s] into temp list L
    clear bucket[s]
    ensure bucket[s_new] exists and is empty

    for each key in L:
        b = h_{old_i + 1}(key)  // = h(key) mod 2^(old_i+1)
        insert key into bucket[b]

    n = n + 1
    if n == 2^old_i:
        i = old_i + 1
        n = 0

6. Split Sequence

Split pointer $n$ always increases in order, so the bucket indices that get split form a fixed sequence:

Overall split sequence (bucket numbers): $0, 1,\; 0, 1, 2, 3,\; 0, 1, 2, 3, 4, 5, 6, 7,\; \dots$

7. Overflow Handling

8. Typical Exam Points