Boolean Retrieval

Mini lesson and interactive query demo

Try it

Use operators AND, OR, NOT, and parentheses ( ). Terms are case-insensitive; operators are case-insensitive too.

Example queries: information AND retrieval (web OR internet) AND crawling ranking AND NOT pagerank "inverted index"

How the demo works

  1. Tokenize each document into terms (lowercased, stripped of punctuation).
  2. Build an inverted index: term → {docIDs}.
  3. Parse the query to a syntax tree (supports quotes for phrases).
  4. Evaluate the tree using set operations over posting lists.
  5. Highlight matched terms/phrases and list matching documents.

Study cheatsheet

Core concepts

  • Tokenization & normalization
  • Dictionary & postings lists
  • Conjunctive vs disjunctive queries
  • Skip pointers & efficient merging

Operator precedence

NOT > AND > OR (unless parentheses override)

Phrases in "double quotes" are treated as exact sequences.

Complexity intuition

Conjunctive queries cost ~ the sum of traversed posting lengths; ordering terms by increasing df can reduce work.