Query Execution & Algorithms – Exam Notes

1. Query Processor Overview

2. Relational Algebra & Bags

Basic SELECT–FROM–WHERE is roughly \(\pi(\sigma(\text{product of input relations}))\).

Sets vs. Bags

Bag semantics

Let \(\mathrm{Card}(t, R)\) be the number of copies of tuple \(t\) in \(R\).

RA & SQL mapping

3. Basic RA Operators

4. Expression Trees

5. Scans & Iterator Model

Scanning a relation \(R\)

Iterator interface

6. Cost Parameters

Sequential scan of \(R\):

7. Classification of Algorithms

8. One-Pass Algorithms

8.1 Tuple-at-a-time (selection, projection)

8.2 Duplicate Elimination \(\delta(R)\)

8.3 Grouping \(\gamma_{L}(R)\)

8.4 One-pass Binary Operators

General requirement:

\[ \min\big(B(R), B(S)\big) \le M \]

9. Nested-Loop Joins (One-and-a-half-pass)

9.1 Tuple-based nested-loop join

for each tuple r in R:
  for each tuple s in S:
    if join condition holds:
      output (r, s)
    

9.2 Block-based nested-loop join

Assume \(B(S) \le B(R)\) and \(B(S) > M\).

Cost formula:

\[ \text{Cost} = \frac{B(S)}{M-1} \cdot \big((M-1) + B(R)\big) = B(S) + \frac{B(S)B(R)}{M-1} \approx \frac{B(S)B(R)}{M} \]

10. Two-Pass Sort-Based Algorithms (TPMMS)

10.1 External sort (TPMMS)

10.2 Duplicate elimination via sorting

10.3 Sort-based set operations

10.4 Sort-merge join

For join \(R(Y) \bowtie S(Y)\):

11. Two-Pass Hash-Based Algorithms

11.1 Hash partitioning

11.2 Hash-based duplicate elimination

11.3 Hash-based grouping

11.4 Partitioned hash-join

For \(R \bowtie S\) on attributes \(Y\):

12. Index-Based Algorithms

12.1 Clustering

12.2 Index-based selection \(\sigma_{a=c}(R)\)

After index lookup:

12.3 Index nested-loop join

Join \(R(X,Y) \bowtie S(Y,Z)\), index on \(S.Y\):

12.4 Joins with sorted (B-tree) indexes

13. Multipass Sort-Based Algorithms

13.1 Sorting a single relation

Let \(s(M,k)\) = max #blocks we can sort in \(k\) passes.

13.2 Multipass sort-based join

Let \(j(M,k)\) = max total blocks \(B(R)+B(S)\) joinable in \(k\) passes.

14. Multipass Hash-Based Algorithms

14.1 Unary operations (single relation)

Let \(u(M,k)\) = max #blocks we can handle in \(k\) passes.

14.2 Binary operations

Let \(j(M,k)\) = max #blocks of the smaller relation we can handle in \(k\) passes.

15. Quick Strategy Summary