Key Points from the Slides + What to Expect on the Test
Course context: Advanced Database Technology and Applications (COMP 6521, Fall 2025). This page distills the lecture deck into exam-ready bullets and practice prompts. :contentReference[oaicite:0]{index=0}
Databases & DBMS’s
Why DBMS?
- Manage very large amounts of data (v.l.a.d.)
- Provide flexible & efficient access
- Support concurrency for many users
- Ensure security and atomic operations
Real-world concurrency examples: multiple users editing a file; two ATM withdrawals from the same account. Expect questions about why a DBMS is needed beyond a file system.
Relational Model (RM)
- Based on tables (relations); most widely used
- Vendors: Oracle, Informix, Sybase; runner-ups: IBM DB2, MySQL
- Object-relational = OO extensions to RM
- RM foundational to major DB research & practice
Three Aspects of a DBMS (Course mapping)
- Modeling & Conceptual Design (COMP 353/5531)
- Programming: Queries & Transactions (COMP 353/5531)
- DBMS Implementation (COMP 451/6521)
Case Study: “Megatron 747” (Toy DBMS)
How it “works”
- Stores relations as Unix files; schemas in a schema file
- Simple SELECT execution by scanning tuples and checking conditions
- Joins executed by nested loops over files
- “| T” pipe writes results to a new file and appends schema info
What’s wrong with it (and why it matters)
- Rigid tuple layout → small edits can force full rewrites
- No indexes → expensive full scans; brute-force joins
- No buffer manager → excessive disk I/O
- No concurrency control → inconsistent updates
- No reliability (crash safety) and coarse security
Exam angle: diagnose deficiencies; propose DBMS features (buffering, indexing, locking/transactions, logging/ARIES-style ideas, fine-grained privileges) to fix them.
Storage & Memory Hierarchy
Hierarchy & Scale
- Registers/Cache → Main Memory → Disks → Tertiary (tapes/CD jukeboxes)
- Trade-off: speed vs capacity vs cost
- Nonvolatile: disks/tapes (retain data after power off)
- Volatile: DRAM (requires refresh; power-off loses state)
Key idea: Disk I/O dominates cost; CPU time during transfer is comparatively negligible → motivates the I/O Model (count disk I/Os).
Disk Geometry & Terminology
- Platter with tracks; two surfaces; many platters → many surfaces
- Cylinder: tracks at the same radius across surfaces (fast to read without seeking)
- Sector: smallest unit with ECC (e.g., 4096 B)
- Block: group of sectors (e.g., 16 KB)
Controller Responsibilities
- Buffering data in/out, scheduling head movement
- Managing bad sectors (remapping/marking unusable)
Expect definitions and short problems converting sectors↔blocks↔capacity.
Disk Timing, Latency Components & I/O Model
Access Time = Seek + Rotational + Transfer
- Seek: move head to cylinder (often dominant)
- Rotational delay: wait for sector to rotate under head (avg ≈ ½ revolution)
- Transfer: time for block to pass under head
Example parameters you saw: 7200 RPM (8.33 ms per rotation), 4 KB sectors, 16 KB blocks, gap overhead, etc.
Worked Examples (you should be ready to compute):
- Average rotational delay at 7200 RPM → ≈ 4.16 ms
- Average seek time via probability over distances (result proportional to
c + k·N/3) - Block transfer time with inter-sector gaps
- Total read vs. read-modify-write with verify
Exam angle: plug numbers, show units, state assumptions (e.g., average vs worst case, same cylinder vs random).
Designing I/O-Efficient DBMS Algorithms
Guidelines
- Exploit locality: place blocks used together on the same cylinder
- Buffer hot pages (roots of indexes, directory pages)
- Use a block fully once it’s in memory (batch work; sequential scans)
- Avoid random I/O: prefer sequential access patterns
Link to “Megatron” fixes: add a buffer manager, build indexes (B+-trees), implement cost-based join algorithms (e.g., sort-merge, hash-join), concurrency control (2PL/MVCC), and recovery (logging/checkpoint).
Likely Exam Topics & Practice
What I’d expect on the test
- Concepts: DB vs DBMS; why DBMS over file system; RM vs object-relational
- Megatron critique: identify problems; propose concrete DBMS components to fix each
- Disk geometry: cylinder/track/sector/block; controller roles
- Memory hierarchy: volatile vs nonvolatile; why I/O dominates
- Timing math: compute seek/rotational/transfer; average vs worst-case
- I/O model reasoning: translate algorithm steps into # of I/Os; compare plans
short answers
numerical problems
explain/fix questions
Quick Practice (with hints)
- Concept: Give two reasons a DBMS needs a buffer manager.Hint: amortize seek/rotation, reuse hot pages, reduce random I/O.
- Geometry: Define a cylinder and explain why it matters for layout.Hint: contiguous tracks at same radius → no seek between surfaces.
- Timing: A 7200-RPM disk, 16 KB block, average seek 6.5 ms, gaps = 10% of track. Estimate average access time for a random block.Sketch: 6.5 ms (seek) + 4.16 ms (rotation) + ~0.13 ms (transfer).
- Megatron: Why are brute-force joins unacceptable at scale, and which join would you prefer?Hint: quadratic pairings; consider hash-join (I/O-friendly) or sort-merge.
- Security: Why is file-level protection too coarse?Hint: need per-relation/attribute privileges and views.
Mini Cheat-Sheet (drop on your formula page)
- Rotational delay (avg) = ½ rotation =
(60 / RPM) / 2seconds - Transfer time ≈ (block degrees / 360°) × rotation time (account for gaps)
- Random seek (avg) often modeled ≈
c + k·(N/3)for N cylinders - Total access ≈ seek + rotational + transfer (+ any verify/write cost)
- I/O model: count disk block transfers; CPU time during transfer ≈ negligible
Self-Check: Fixing “Megatron”
Problem → Fix
- Rigid tuples → Slotted pages, schema-aware storage
- Full scans → B+-tree / hash indexes
- No buffering → Buffer manager with replacement (LRU/Clock)
- No concurrency → 2PL / MVCC
- No recovery → Write-ahead logging, checkpoints
- Coarse security → GRANT/REVOKE, views, column-level ACLs
Exam tip: When asked to “improve” a toy DBMS, map each symptom to a standard DBMS component and briefly state how it reduces I/O or prevents anomalies.