1.[15 Points] Suppose relation R is stored in 10,000 disk blocks. Estimate the number of main memory blocks needed to use the a two-pass, sort-based algorithm to perform the following operations: (a) δ (duplicate elimination), (b) γ (group by), and (c) a binary operation such as join or union?

We have formula two-pass external sort on a relation of size \(B\) blocks \[ B \le M(M-1). \] With \(B = 10{,}000\): \[ M^2 - M - 10{,}000 \ge 0 \;\Rightarrow\; M \ge \frac{1+\sqrt{1+4\cdot 10{,}000}}{2} = \frac{1+\sqrt{40{,}001}}{2} \approx 100.5. \] The minimal is \[ M = 101. \]

a.\(\delta\) duplicate elimination: requires sorting \(R\) in two passes \(101\) buffer blocks.
b.\(\gamma\) group by: also \(101\) buffer blocks.

\[B(R)+B(S)\le\ M^2\]

10000+10000=20000

\[M \ge\sqrt{20000}\] => 142

c.a binary operation such as join or union: \(142\) buffers.

2. [30 Points] Suppose in a linear hash structure, blocks can hold three records. Begin with the hash table having two empty blocks corresponding to buckets 0 and 1. We then insert the following 16 records into the hash structure where the hashed keys are 0000, 0001, 0010, 0011,...,1110, 1111, in that order. Show the hash structure after insertion of the records with the hash keys: 0111, 1001, and 1100. Do the above separately in two cases with occupancy ratio: (a) 75% and (b) 100%. You need to show 3 hash structures in case (a) and also 3 in case (b).

linear hashing: \[ h_i(k) = k \bmod 2^i, \] if \(h_i(k) < p\) \[ h_{i+1}(k) = k \bmod 2^{i+1}. \]

(a) Occupancy Ratio = 75%

case 0111

\(i = 2,\; p = 0\), number of buckets = 4

[0]: 0000, 0100
[1]: 0001, 0101
[2]: 0010, 0110
[3]: 0011, 0111
  
case 1001

\(i = 2,\; p = 1\), number of buckets = 5

[0]: 0000, 1000
[1]: 0001, 0101, 1001
[2]: 0010, 0110
[3]: 0011, 0111
[4]: 0100
  
case 1100

\(i = 2,\; p = 2\), number of buckets = 6

[0]: 0000, 1000
[1]: 0001, 1001
[2]: 0010, 0110, 1010
[3]: 0011, 0111, 1011
[4]: 0100, 1100
[5]: 0101
  

(b) Occupancy Ratio = 100%

case 0111

\(i = 1,\; p = 1\), number of buckets = 3

[0]: 0000, 0100
[1]: 0001, 0011, 0101 ,0111 
[2]: 0010, 0110
  

overflow block: 0111

case 1001

\(i = 2,\; p = 0\), number of buckets = 4

[0]: 0000, 0100, 1000
[1]: 0001, 0101, 1001
[2]: 0010, 0110
[3]: 0011, 0111
  
case 1100

\(i = 2,\; p = 1\), number of buckets = 5

[0]: 0000, 1000
[1]: 0001, 0101, 1001
[2]: 0010, 0110, 1010
[3]: 0011, 0111, 1011
[4]: 0100, 1100
  

3. [15 Points] Construct an extensible hash structure and insert 16 records with their keys hashed into 4 bits as follows: 1111, 1110,···, 0000, in the order specified. Start with two empty buckets corresponding to 0 and 1, show the organization of the hash structure after inserting each record. Assume each bucket can hold 3 records.

1.insert key 1111

NumDirectorydepthKeys
B101Null
B2111111

2.insert key 1110

NumDirectorydepthKeys
B101Null
B2111111, 1110

3.insert key 1101

NumDirectorydepthKeys
B101Null
B2111111, 1110, 1101

4.insert key 1100

NumDirectorydepthKeys
B1000, 001, 010, 0111Null
B2100, 1012Null
B311031101, 1100
B411131111, 1110

5.insert key 1011

NumDirectorydepthKeys
B1000, 001, 010, 0111Null
B2100, 10121011
B311031101, 1100
B411131111, 1110

6. 1010

NumDirectorydepthKeys
B1000, 001, 010, 0111Null
B2100, 10121011, 1010
B311031101, 1100
B411131111, 1110

7. 1001

NumDirectorydepthKeys
B1000, 001, 010, 0111Null
B2100, 10121011, 1010, 1001
B311031101, 1100
B411131111, 1110

8. 1000

NumDirectorydepthKeys
B1000, 001, 010, 0111Null
B210031001, 1000
B310131011, 1010
B411031101, 1100
B511131111, 1110

9. 0111

NumDirectorydepthKeys
B1000, 001, 010, 01110111
B210031001, 1000
B310131011, 1010
B411031101, 1100
B511131111, 1110

10. 0110

NumDirectorydepthKeys
B1000, 001, 010, 01110111, 0110
B210031001, 1000
B310131011, 1010
B411031101, 1100
B511131111, 1110

11. 0101

NumDirectorydepthKeys
B1000, 001, 010, 01110111, 0110, 0101
B210031001, 1000
B310131011, 1010
B411031101, 1100
B511131111, 1110

12. 0100

NumDirectorydepthKeys
B1000, 0012Null
B201030101, 0100
B301130111, 0110
B410031001, 1000
B510131011, 1010
B611031101, 1100
B711131111, 1110

13. 0011

NumDirectorydepthKeys
B1000, 00120011
B201030101, 0100
B301130111, 0110
B410031001, 1000
B510131011, 1010
B611031101, 1100
B711131111, 1110
v

14. 0010

NumDirectorydepthKeys
B1000, 00120011, 0010
B201030101, 0100
B301130111, 0110
B410031001, 1000
B510131011, 1010
B611031101, 1100
B711131111, 1110

15. 0001

NumDirectorydepthKeys
B1000, 00120011, 0010, 0001
B201030101, 0100
B301130111, 0110
B410031001, 1000
B510131011, 1010
B611031101, 1100
B711131111, 1110

16. 0000

NumDirectorydepthKeys
B100030001, 0000
B200130011, 0010
B301030101, 0100
B401130111, 0110
B510031001, 1000
B610131011, 1010
B711031101, 1100
B811131111, 1110

4. [10 Points] Applying the run-length encoding method for each of the following bitmap vectors, what would be the result obtained? (a). 01100000001000001000000 (b). 10000010000001001101

(a) Bitmap: 01100000001000001000000

Drop trailing zeros:
01100000001000001
then lengths: [1, 0, 7, 5]
i=1,prefix 0,binary 1, code = 0+1
i=0,prefix 0,binary 0, code = 0+0
i=7,prefix 110,binary 111, code = 110+111
i=5,prefix 110,binary 101, code = 110+101
Answer:0100110111110101

(b) Bitmap: 10000010000001001101

lengths: [0, 5, 6, 2, 0, 1]
i=0,binary 0,prefix 0, code= 00
i=5,101,110,110101
i=6,110,110,110110
i=2,10,10,1010
i=0,00
i=1,01
Answer: 0011010111011010100001

5. [10 Points] Suppose A is the key attribute of a relation R that has n tuples. Calculate the number of bits used to represent the bitmap vectors for A

In lecture we have: total bits = number of records × number of distinct values,which is m x n
In this question A is a key, every tuple has a different A value.
So the number of distinct values of A = number of tuples = n.
Then we have:

\[ \underbrace{n}_{\text{bitmaps}} \times \underbrace{n}_{\text{bits per bitmap}} \;=\; n^2\ \text{bits}. \]