Key Points from the Slides + What to Expect on the Test

Course context: Advanced Database Technology and Applications (COMP 6521, Fall 2025). This page distills the lecture deck into exam-ready bullets and practice prompts. :contentReference[oaicite:0]{index=0}

Databases & DBMS’s

Why DBMS?

  • Manage very large amounts of data (v.l.a.d.)
  • Provide flexible & efficient access
  • Support concurrency for many users
  • Ensure security and atomic operations
Real-world concurrency examples: multiple users editing a file; two ATM withdrawals from the same account. Expect questions about why a DBMS is needed beyond a file system.

Relational Model (RM)

  • Based on tables (relations); most widely used
  • Vendors: Oracle, Informix, Sybase; runner-ups: IBM DB2, MySQL
  • Object-relational = OO extensions to RM
  • RM foundational to major DB research & practice

Three Aspects of a DBMS (Course mapping)

  1. Modeling & Conceptual Design (COMP 353/5531)
  2. Programming: Queries & Transactions (COMP 353/5531)
  3. DBMS Implementation (COMP 451/6521)

Case Study: “Megatron 747” (Toy DBMS)

How it “works”

  • Stores relations as Unix files; schemas in a schema file
  • Simple SELECT execution by scanning tuples and checking conditions
  • Joins executed by nested loops over files
  • | T” pipe writes results to a new file and appends schema info

What’s wrong with it (and why it matters)

  • Rigid tuple layout → small edits can force full rewrites
  • No indexes → expensive full scans; brute-force joins
  • No buffer manager → excessive disk I/O
  • No concurrency control → inconsistent updates
  • No reliability (crash safety) and coarse security
Exam angle: diagnose deficiencies; propose DBMS features (buffering, indexing, locking/transactions, logging/ARIES-style ideas, fine-grained privileges) to fix them.

Storage & Memory Hierarchy

Hierarchy & Scale

  • Registers/CacheMain MemoryDisksTertiary (tapes/CD jukeboxes)
  • Trade-off: speed vs capacity vs cost
  • Nonvolatile: disks/tapes (retain data after power off)
  • Volatile: DRAM (requires refresh; power-off loses state)
Key idea: Disk I/O dominates cost; CPU time during transfer is comparatively negligible → motivates the I/O Model (count disk I/Os).

Disk Geometry & Terminology

  • Platter with tracks; two surfaces; many platters → many surfaces
  • Cylinder: tracks at the same radius across surfaces (fast to read without seeking)
  • Sector: smallest unit with ECC (e.g., 4096 B)
  • Block: group of sectors (e.g., 16 KB)

Controller Responsibilities

  • Buffering data in/out, scheduling head movement
  • Managing bad sectors (remapping/marking unusable)
Expect definitions and short problems converting sectors↔blocks↔capacity.

Disk Timing, Latency Components & I/O Model

Access Time = Seek + Rotational + Transfer

  • Seek: move head to cylinder (often dominant)
  • Rotational delay: wait for sector to rotate under head (avg ≈ ½ revolution)
  • Transfer: time for block to pass under head

Example parameters you saw: 7200 RPM (8.33 ms per rotation), 4 KB sectors, 16 KB blocks, gap overhead, etc.

Worked Examples (you should be ready to compute):

  • Average rotational delay at 7200 RPM → ≈ 4.16 ms
  • Average seek time via probability over distances (result proportional to c + k·N/3)
  • Block transfer time with inter-sector gaps
  • Total read vs. read-modify-write with verify
Exam angle: plug numbers, show units, state assumptions (e.g., average vs worst case, same cylinder vs random).

Designing I/O-Efficient DBMS Algorithms

Guidelines

  • Exploit locality: place blocks used together on the same cylinder
  • Buffer hot pages (roots of indexes, directory pages)
  • Use a block fully once it’s in memory (batch work; sequential scans)
  • Avoid random I/O: prefer sequential access patterns
Link to “Megatron” fixes: add a buffer manager, build indexes (B+-trees), implement cost-based join algorithms (e.g., sort-merge, hash-join), concurrency control (2PL/MVCC), and recovery (logging/checkpoint).

Likely Exam Topics & Practice

What I’d expect on the test

  • Concepts: DB vs DBMS; why DBMS over file system; RM vs object-relational
  • Megatron critique: identify problems; propose concrete DBMS components to fix each
  • Disk geometry: cylinder/track/sector/block; controller roles
  • Memory hierarchy: volatile vs nonvolatile; why I/O dominates
  • Timing math: compute seek/rotational/transfer; average vs worst-case
  • I/O model reasoning: translate algorithm steps into # of I/Os; compare plans
short answers numerical problems explain/fix questions

Quick Practice (with hints)

  1. Concept: Give two reasons a DBMS needs a buffer manager.
    Hint: amortize seek/rotation, reuse hot pages, reduce random I/O.
  2. Geometry: Define a cylinder and explain why it matters for layout.
    Hint: contiguous tracks at same radius → no seek between surfaces.
  3. Timing: A 7200-RPM disk, 16 KB block, average seek 6.5 ms, gaps = 10% of track. Estimate average access time for a random block.
    Sketch: 6.5 ms (seek) + 4.16 ms (rotation) + ~0.13 ms (transfer).
  4. Megatron: Why are brute-force joins unacceptable at scale, and which join would you prefer?
    Hint: quadratic pairings; consider hash-join (I/O-friendly) or sort-merge.
  5. Security: Why is file-level protection too coarse?
    Hint: need per-relation/attribute privileges and views.

Mini Cheat-Sheet (drop on your formula page)

  • Rotational delay (avg) = ½ rotation = (60 / RPM) / 2 seconds
  • Transfer time ≈ (block degrees / 360°) × rotation time (account for gaps)
  • Random seek (avg) often modeled ≈ c + k·(N/3) for N cylinders
  • Total access ≈ seek + rotational + transfer (+ any verify/write cost)
  • I/O model: count disk block transfers; CPU time during transfer ≈ negligible

Self-Check: Fixing “Megatron”

Problem → Fix

  • Rigid tuples → Slotted pages, schema-aware storage
  • Full scans → B+-tree / hash indexes
  • No buffering → Buffer manager with replacement (LRU/Clock)
  • No concurrency → 2PL / MVCC
  • No recovery → Write-ahead logging, checkpoints
  • Coarse security → GRANT/REVOKE, views, column-level ACLs
Exam tip: When asked to “improve” a toy DBMS, map each symptom to a standard DBMS component and briefly state how it reduces I/O or prevents anomalies.