Relational Model & DBMS

“Megatron 747” (Teaching DBMS)

Idea Data as Unix files; schema in separate file. Example tuple encodings like Smith#123#CS.

Query SELECT * FROM Students; → list tuples.

Projection+Filter→File SELECT * FROM R WHERE Cond | T; reads schema, scans R, writes qualifying tuples to T, and registers new file T.

Naïve Query Execution (Scan-Join)

disk capacity=tracks*blocks*sectors*bytes

2^30=1tb

circle pi*diameters, bytes/(90%*diameters*pi), inner and outer are different, everage inner+outer/2

maxseek=d1+maxtrack/trackpersec,averageseek=d1+1/3(maxtrack/trackpersec)

T_fullrotation=time/rpm,T_averagerot=T_fullrotation/2

two pass:RX2+SX2+R+S,$\frac{block}{M}$ < M-1

$\frac{k}{k+1}$

Serverbreak_p:percent_break x recover_hour/8760 hour

raid0: 0 fault tolerance,raid1:mirror,raid3:1diskfail byte level xor,raid4,large read slow writer block level xor, raid5:1 disk can fail distributed parity balance performance,raid6:2disk can fail

raid4:XOR:00:0,01:1,10:1,11:0

number of track to move: left+right, left/total*left_avg+right/total*right_avg

seek=seek_overhead+track_avg/trackpersec

Record_per_block=$\frac{kb*1024}{record_bytes}$

Require_blocks=record_num/Record_per_block

Tscan=Tseek+Trot_avg+B*Ttransfer

Trand=Tseek+Trot_avg+Trans

Tscan < Trand*Tuple_num

dense Index Block:total_block=$\frac{index entries}{Record_per_block}+\frac{index entries}{Index_per_block}$

Sparse Index Block:total_block=$\frac{index entries}{Record_per_block}+\frac{index entries/Record_per_block}{Index_per_block}$

$RPM=\frac{600000}{avg_dalay*2}

read whole tracks=Tseek+avg_dalay*2*track_number

seek time=overhead+(time per cylinder)*N/3,random accesses distance = N/3.

The first-level index (on data) is typically dense, not sparse. The second-level index (on the first-level index) is sparse, not dense.

Clustering Index Built on a non-key attribute of an ordered data file where records with the same value are stored together (clustered).

Secondary Index on Key Attribute A secondary index built on an attribute that is also a key

Sparse Primary Index is is the usual case for a primary index. order file One entry per data block (not every record).

Dense Secondary Index One entry for every record in the data file. Built on a non-ordering attribute

Memory, Storage & Disks

Access-time & Throughput (Rules of Thumb)

  • Disk Access Time (approx.): $T_{access}=T_{seek}+T_{rot}+T_{xfer}$ (seek ms + rotational delay ms + transfer). (Generic formula; aligns with hierarchy notion that “Disk I/O dominates computation”.)
  • Block Transfer Time: $T_{xfer}=\dfrac{\text{BlockSize}}{\text{TransferRate}}$.
  • Scan I/O Count (unindexed): $\approx$ #blocks in file.

Files, Blocks & Records

Phase1 runs=$\frac{diskblock}{Memorysize}$

Mergesort $\frac{diskblock}{M} <= M-1, \sqrt{diskblock}+1$

Mergesort (M-1)*M=Block size

Sortest distance first, elevator: move up then down

Primary Index Built on a sorted (ordered) data file. The index key is the primary key (ordering key) of the data file. There is one index entry per data block (not per record). Points to the first record in each block.

Secondary Index Built on a non-ordering field (not the primary key). The data file itself may be unordered or ordered on another field. There can be multiple secondary indexes on the same file. Typically has one index entry per record, because data are not ordered on this field.

Secondary Level Index (or Multi-Level Index) This is not a different kind of index (primary or secondary) — it’s about the structure of the index. When an index (primary or secondary) is too large to fit in memory, we build another index on top of it — a second-level index. This creates a hierarchy (multi-level index) that speeds up lookups.

Search Key A field or combination of fields used to look up records in an index. It is a physical access concept, used in file organization and indexing. Does not have to be unique — multiple records can have the same search key value.

Variable-Length & Variable-Format Records

Insertions & Deletions (Ordered Files)

Pointer Swizzling & Addresses

  • Heap file page count: $P=\left\lceil \dfrac{N \cdot R}{B} \right\rceil$  (N tuples × tuple size / block size)
  • Scan time (rough): $T_{scan}\approx P \cdot (T_{seek}+T_{rot}+T_{xfer})$
  • Avg. join (NLJ, no index): $N_R \cdot N_S$ comparisons (tuple-wise), or page-wise $P_R\cdot P_S$.
  • Block reorg overhead: offset-table update $\ll$ tuple rewrite (keeps inserts cheap).
  • Swizzle invariant: all in-mem objects must have valid reverse entries to unswizzle safely before eviction.

Primary Index Ordering (primary) key Yes Block Sparse 1 entry/block
Sparse Primary Index Ordering key Yes Block Sparse Common primary index
Dense Primary Index Primary key No Record Dense Used if file unordered
Secondary Index Non-ordering field No Record Dense One per record
Sparse Secondary Index Non-ordering field No – ❌ Not practical
Secondary Index on Key Attribute Non-ordering unique field No Record Dense Each record unique
Clustering Index Non-key attribute (with duplicates) Yes Distinct value Spar