[10 Points] Consider the disk introduced in question 1 of Assignment 1, but with 10000 tracks only, numbered 0 to 9999. Suppose the r/w head is currently at track 6521.

Disk tracks: 0–9999
Current head position: track 6521
Seek overhead: 1 ms
Time to move over 4000 tracks: 1 ms
Rotational speed: 8800 RPM
Each track: 256 blocks × 8 sectors × 512 bytes
Gaps: 10%

(a). What is the expected number of tracks to move over to access the next block requested?

The next requested track is uniformly distributed over 0–9999. There are 6521 tracks to the left of the head (0–6520) and 3478 to the right (6522–9999); with probability 1/10000 the target is track 6521 itself, contributing distance 0.

Average left distance = (1 + 2 + ··· + 6521) / 6521 = 6522 / 2 = 3261 tracks
Average right distance = (1 + 2 + ··· + 3478) / 3478 = 3479 / 2 = 1739.5 tracks
E[distance] = (6521/10000) × 3261 + (3478/10000) × 1739.5 = 2731.4962 tracks

Expected number of tracks to move = 2731.4962 tracks
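The weighted average can be double-checked by brute force, averaging the distance over every possible target track (parameters taken from the problem):

```python
# Brute-force check of the expected seek distance: average |target - head|
# over a uniformly random target track in 0..9999.
head, tracks = 6521, 10_000
expected = sum(abs(t - head) for t in range(tracks)) / tracks
print(expected)  # 2731.4962
```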

(b). What is the expected I/O time to transfer the next block requested?

seek = 1 + (2731.4962 / 4000) = 1.6829 ms

Rotational speed = 8800 RPM = 8800 / 60 = 146.67 rotations/sec
Time for one rotation = 1 / 146.67 = 6.82 ms
Average rotational latency = 6.82 / 2 = 3.41 ms

Each track holds 256 blocks, so the time to pass over one block (counting each block together with its share of the 10% gaps):
Time per block = 6.82 / 256 = 0.0266 ms

I/O = seek + rotational latency + transfer
I/O = 1.6829 + 3.41 + 0.0266 ≈ 5.12 ms

Expected I/O time to transfer the next block = 5.12 ms
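The same arithmetic in a few lines (all values from the problem statement):

```python
# Expected I/O time = seek + average rotational latency + one-block transfer (ms)
seek = 1 + 2731.4962 / 4000     # 1 ms overhead + proportional move over expected distance
rotation = 60_000 / 8800        # ms per full rotation at 8800 RPM (~6.82 ms)
latency = rotation / 2          # on average, half a rotation
transfer = rotation / 256       # 256 blocks per track
io = seek + latency + transfer
print(round(io, 2))  # 5.12
```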

[10 Points] Suppose table T, which has 100000 records, is stored in sequential disk blocks, where each record is 220 bytes and each block is 4K bytes. Also suppose the average seek time is 10 ms, the rotational speed is 7500 rounds per minute, and the time to transfer a block is 0.6 ms. Suppose a user query requires accessing n random tuples in T. Determine for what value of n performing an exhaustive search to access the desired records would be faster than performing n random block accesses.

As in lecture, we assume the unspanned (slotted-page) model, so records do not cross block boundaries: \[ \text{Records per 4 KB block} = \left\lfloor \frac{4\times1024}{220} \right\rfloor = 18 \] \[ \left\lceil \frac{100000}{18} \right\rceil = 5556 \text{ blocks} \]

Rotational speed = \( 7500 \, \text{RPM} = \frac{7500}{60} = 125 \, \text{rotations/sec} \)

\[ T_{\text{rotation}} = \frac{1}{125} = 8 \text{ ms} \] \[ T_{\text{rot,avg}} = \frac{T_{\text{rotation}}}{2} = 4 \text{ ms} \]

\[ T_{\text{rand}} = T_{\text{seek}} + T_{\text{rot,avg}} + T_{\text{transfer}} \] \[ T_{\text{rand}} = 10 + 4 + 0.6 = 14.6 \text{ ms} \] \[ \text{Random read total} = n \times 14.6 \text{ ms} \]

\[ T_{\text{scan}} = T_{\text{seek}} + T_{\text{rot,avg}} + B \times T_{\text{transfer}} \] \[ T_{\text{scan}} = 10 + 4 + 5556 \times 0.6 = 3347.6 \text{ ms} \]

\[ T_{\text{scan}} < n \times T_{\text{rand}} \] \[ 3347.6 < 14.6n \] \[ n > \frac{3347.6}{14.6} \approx 229.3 \]

Break-even point: the exhaustive scan is faster once \( n > 229.3 \), i.e. for \( n \ge 230 \) tuples.
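A short check of the break-even arithmetic (all parameters from the problem statement):

```python
import math

records, rec_size, blk_size = 100_000, 220, 4096
per_block = blk_size // rec_size          # unspanned: 18 records fit per 4 KB block
blocks = math.ceil(records / per_block)   # 5556 blocks in the sequential file
t_rand = 10 + 4 + 0.6                     # seek + avg rotational latency + transfer (ms)
t_scan = 10 + 4 + blocks * 0.6            # one seek/latency, then read every block
print(blocks, round(t_scan, 1), round(t_scan / t_rand, 1))  # 5556 3347.6 229.3
```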

[10 Points] Suppose a disk block can fit up to 30 data records or 200 key-pointer pairs. Also suppose no data or index block is more than 80% full. We want to store a table that has n records. If the total number of data and index blocks needed is k, what would be the minimum value of k (as a function of n), if (i) we use a dense index, or (ii) use a sparse index?

A block can hold up to 30 data records or 200 key–pointer pairs. No data or index block is allowed to be more than 80% full.

Max usable data records per block: \(0.8 \times 30 = 24\).

Max usable index entries per block: \(0.8 \times 200 = 160\).

Let the table have \(n\) records. Then the number of data blocks needed is \[ D = \left\lceil \frac{n}{24} \right\rceil \]

A dense index has one index entry per record, so the total number of index entries is \(n\). With 160 entries per index block, \[ \text{dense index blocks} = \left\lceil \frac{n}{160} \right\rceil \] Therefore, the minimum total number of blocks (data + index) is \[ k_{\text{dense}}(n) = \frac{n}{24} + \frac{n}{160} = \frac{23}{480}\,n \approx 0.04792\,n \]

A sparse index has one entry per data block, pointing to the first record of that block, so the total number of index entries is \(D\). With 160 entries per index block, \[ \text{sparse index blocks} = \left\lceil \frac{D}{160} \right\rceil = \left\lceil \frac{n/24}{160} \right\rceil . \] Therefore, the minimum total number of blocks (data + index) is \[ k_{\text{sparse}}(n) = \frac{n}{24} + \frac{n/24}{160} = \frac{161}{3840}\,n \approx 0.04193\,n \]

Answer: \[ k_{\text{dense}}(n) = \frac{23}{480}n, \qquad k_{\text{sparse}}(n) = \frac{161}{3840}n. \]
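The two coefficients can be verified with exact fractions (a quick sketch; the capacities are the 80%-full limits computed above, and the ceilings are ignored, which is exact in the limit of large \(n\)):

```python
from fractions import Fraction

data_per_block, idx_per_block = 24, 160   # usable capacities at the 80% fill limit

# blocks needed per record: data blocks + index blocks
dense = Fraction(1, data_per_block) + Fraction(1, idx_per_block)
sparse = Fraction(1, data_per_block) * (1 + Fraction(1, idx_per_block))
print(dense, sparse)  # 23/480 161/3840
```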

[5 Points] Consider the data and parity block organization of 5 disks shown below, in which the \(B_i\) represent data blocks and the \(P_i\) represent the parity blocks for data blocks \(B_{5i-4}\) to \(B_{5i}\). Explain what major problem may be represented by this organization.

This layout puts the parity block of a stripe on the same disk as one of that stripe's data blocks.

For example, P1 (the parity for B1…B5) is on D1, which also holds B1; P2 (the parity for B6…B10) is on D2, which also holds B6.

If disk D1 fails, we lose both B1 and P1. Reconstructing B1 requires P1 together with B2…B5, but P1 is gone, so B1 is unrecoverable.

In a correct RAID 5 layout, the parity block of a stripe must reside on a different disk from every data block of that stripe, so that any single-disk failure remains recoverable. This organization violates that rule, turning a single disk failure into a data-loss hazard.
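The recovery rule being violated is plain XOR parity; the toy sketch below (with made-up 4-byte block contents) shows why losing a data block and its parity together is fatal:

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical 4-byte contents for B1..B5 and their parity P1 = B1 ^ B2 ^ ... ^ B5
blocks = [bytes([i] * 4) for i in range(1, 6)]
parity = reduce(xor, blocks)

# If only B1's disk fails, B1 = P1 ^ B2 ^ ... ^ B5:
recovered = reduce(xor, blocks[1:], parity)
assert recovered == blocks[0]
# But in the layout above, B1 and P1 share disk D1, so one failure destroys
# both operands needed for this reconstruction.
```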

[15 Points] We want to build a B+ tree index with order n = 3, and insert a sequence of records with the key values 1, 2, 3, ···, in that order. Start with a leaf node containing pointers to the records with the keys 1, 2, 3, and keep inserting every subsequent record, updating the tree. To accommodate a newly inserted node and maintain the tree, when we need to split a leaf node N, suppose we divide the pointers contained in N into 2 and 2, and when splitting an interior node M, we split the pointers in M into 3 (in the left node) and 2 (in the right one).

(a). Determine the insertion of which record (i.e., with what key) will cause the tree to reach 4 levels for the first time.

Start: one leaf [1,2,3]; level = 1.
Insert 4: the leaf splits 2|2 and a new root appears; level = 2.
From then on, every two inserts split the rightmost leaf and add one child pointer to the rightmost interior node.
The root can hold 4 children, so it splits (making level = 3) when it needs a 5th child: this happens after inserting 9 and 10, so the level first becomes 3 on key 10.
After each 3|2 interior split, the rightmost level-2 node restarts with 2 children, so it overflows again every three leaf splits: at keys 16, 22, and 28. The first two overflows just add a child to the root; on the third, the root itself needs a 5th child and splits.

The tree first reaches 4 levels when inserting key 28.
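This growth pattern can be checked with a small simulation, a sketch assuming the problem's split rules (leaf 2|2, interior 3|2) and using the fact that strictly increasing keys only ever touch the rightmost path of the tree:

```python
def first_key_reaching_level(target_levels):
    # leaf  = key count of the rightmost leaf (tree starts as one leaf [1,2,3])
    # spine = pointer counts of the rightmost interior node at each height, bottom-up
    leaf = 3
    spine = []
    levels, key = 1, 3
    while levels < target_levels:
        key += 1
        leaf += 1
        if leaf == 4:                # leaf overflow: split 2|2, right leaf keeps 2 keys
            leaf = 2
            i = 0
            while True:              # propagate the new pointer up the right spine
                if i == len(spine):  # overflow reached the top: new root, tree grows
                    spine.append(2)
                    levels += 1
                    break
                spine[i] += 1
                if spine[i] == 5:    # interior overflow: split 3|2, right node keeps 2
                    spine[i] = 2
                    i += 1
                else:
                    break
    return key

print(first_key_reaching_level(4))  # 28
```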

(b). When the level becomes 4, what is the total number of pointers used in the index nodes from the root down to the leaf nodes? Do not count the pointers in the leaf nodes that point to the data records.

Level 1: [19]
  Level 2: [7, 13]
     Level 3: [3, 5]    
        leaves: [1,2] [3,4] [5,6]
     Level 3: [9, 11]    
        leaves: [7,8] [9,10] [11,12]
     Level 3: [15, 17]   
        leaves: [13,14] [15,16] [17,18]
  Level 2: [25]
      Level 3: [21, 23]   
        leaves: [19,20] [21,22] [23,24]
      Level 3: [27]       
        leaves: [25,26] [27,28]
Root pointers: 2
Level 2 pointers: 3 + 2 = 5
Level 3 pointers: 3 + 3 + 3 + 3 + 2 = 14
Total: 2 + 5 + 14 = 21

At that moment, the index uses 21 pointers in total.
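Since an index node with k keys holds k + 1 child pointers, the count can be read off the key lists in the tree above:

```python
# Key counts of each index node at the moment the tree reaches 4 levels,
# taken from the tree drawn above (root, level-2 nodes, level-3 nodes).
key_counts = {
    "root":    [1],              # [19]
    "level 2": [2, 1],           # [7,13] and [25]
    "level 3": [2, 2, 2, 2, 1],  # [3,5] [9,11] [15,17] [21,23] [27]
}
# An index node with k keys has k + 1 child pointers.
pointers = {lvl: sum(k + 1 for k in ks) for lvl, ks in key_counts.items()}
print(pointers, sum(pointers.values()))
```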

[20 Points] It is well known that not every hash function works well. Suppose we use a hash table and insert records with integer keys k using the hash function h(k) = k² mod b, where b is the number of data buckets in the hash table. For each of the following choices of b, explain the problems and comment on the usefulness of h.

(a) b = 2

The values of k² mod 2 are {0, 1}: even keys hash to bucket 0 and odd keys to bucket 1, so both buckets are used and the load is even. The problem is the table itself: with only two buckets, collisions are heavy no matter how good the hash is, so b = 2 is not practical.

(b) b = 10

The squares mod 10 are {0, 1, 4, 5, 6, 9}: only 6 of the 10 buckets are ever used, and buckets 2, 3, 7, 8 are never hit. This wastes 40% of the table and overloads the remaining buckets, so h is a poor hash here.

(c) b = 16

The squares mod 16 are {0, 1, 4, 9}: only 4 of the 16 buckets ever receive keys, and the other 12 stay permanently empty. The clustering is extreme, so h is a very poor hash here.

(d) When is this hash useful?

In general, k² mod b is not a good hash because it can only produce the quadratic residues mod b, never all of {0, …, b−1}.
Even when b is an odd prime p, only about (p + 1)/2 of the buckets are reachable, so nearly half the table is wasted.
The only case above where every bucket is used is b = 2, and a 2-bucket table is not practical.
For most hash tables we need a function (e.g. k mod b) that spreads keys across all b buckets.
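The reachable buckets for each b can be enumerated directly, since k² mod b depends only on k mod b:

```python
def reachable_buckets(b):
    """Buckets that h(k) = k*k % b can ever produce (k*k % b repeats with period b)."""
    return sorted({(k * k) % b for k in range(b)})

for b in (2, 10, 16):
    print(b, reachable_buckets(b))
# 2 [0, 1]
# 10 [0, 1, 4, 5, 6, 9]
# 16 [0, 1, 4, 9]
```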