COMP 479/6791 — Information Retrieval & Web Search

Fall 2025 • Department of Computer Science & Software Engineering
Instructor: Sabine Bergler, PhD • Credits: 4 • Format: 3h lecture + 2h lab / week

What you'll learn

IR Foundations

  • Boolean retrieval, dictionaries, tolerant retrieval
  • Inverted indexes, postings lists, index construction & compression
  • Scoring, term weighting, vector space model
  • Evaluation: precision, recall, F-measure; relevance feedback & query expansion

ML for IR

  • Naïve Bayes and vector-space classification
  • Support Vector Machines
  • Flat & hierarchical clustering; cluster evaluation

Web Search

  • Web crawling & indexing at scale
  • Link analysis & PageRank
  • XML & structured data for search
  • Applications: spam filtering, sentiment & categorization

Textbook: Introduction to Information Retrieval by Manning, Raghavan & Schütze (Cambridge, 2008), free online edition.

Course admin

Contact & Office Hours

  • Emergencies/private: sabine.bergler@concordia.ca
  • Course comms: Moodle messages
  • Office: ER 1143
  • Hours: Tue or Wed, 11:00–12:00 (or by appointment)

Labs

  • Section D DI — Thu 14:45–16:35
  • Section D DJ — Wed 14:45–16:35
  • Section D DK — Tue 14:45–16:35

Prereqs & Background

  • Prereqs: COMP 233 or ENGR 371; COMP 352
  • Programming: recursion; lists & vectors; basic Python; processing large/ASCII files
  • Math: discrete math (relations), complexity, graphs, vectors & matrices
  • Skills: writing informative academic reports

Assessments

Weights & Dates

Item         Weight        Date                  Notes
Midterm      20% or 0%     Oct 23, 2025          Replaced by final if final is higher
Final Exam   40% or 60%    TBD by exams office   Counts 60% if higher than midterm
Project 1    15%           Oct 16, 2025          Implementation focus
Project 2    25%           Nov 18, 2025          Experiment using P2 code

Exams test theory; projects test practical implementation. Projects are individual.
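
The weighting rule in the table above can be sketched as a short calculation. This is only an illustration: it assumes every score is a percentage from 0 to 100, and the function name `course_grade` and its parameters are hypothetical. The official computation is whatever the course outline and instructor specify.

```python
def course_grade(midterm, final_exam, project1, project2):
    """Combine component scores (each assumed to be 0-100) into a course percentage.

    Assumed reading of the table: if the final exam score is higher than the
    midterm score, the midterm counts 0% and the final counts 60%; otherwise
    the midterm counts 20% and the final counts 40%.
    """
    if final_exam > midterm:
        exam_part = 0.60 * final_exam
    else:
        exam_part = 0.20 * midterm + 0.40 * final_exam
    # Project weights are fixed: Project 1 at 15%, Project 2 at 25%.
    return exam_part + 0.15 * project1 + 0.25 * project2


# Final higher than midterm: the midterm is replaced by the final.
replaced = course_grade(midterm=60, final_exam=80, project1=90, project2=70)

# Midterm higher than final: both exams count at their base weights.
kept = course_grade(midterm=80, final_exam=60, project1=90, project2=70)
```

Under these assumptions, a weak midterm cannot lower the grade below what the final exam alone would give for the combined 60% exam share.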

Lab & Project Notes

  • Use NLTK for preprocessing (see book).
  • Design choices matter — better decisions earn more points.
  • Discuss ideas in lab and on Moodle (not other channels).
  • Project 1 and Final Project demos are required in lab.

Tentative weekly schedule

Week   Topic                                                    Suggested Readings
1      Boolean retrieval; term vocabulary; postings lists       Ch. 1–2
2      Dictionaries & tolerant retrieval; index construction    Ch. 3–4
3      Index compression                                        Ch. 5
4      Scoring; term weighting; vector space model              Ch. 6
5      Evaluation; relevance feedback; query expansion          Ch. 7–9
6      Naïve Bayes; vector-space classification                 Ch. 13–14
7      Reading week
8      Midterm
9      Support Vector Machines                                  Ch. 15
10     Flat & hierarchical clustering; cluster evaluation       Ch. 16–17
11     RAG (topic tbd)
12     MedHopQA (topic tbd)


© 2025 • This page is a student-friendly summary of the official course outline. Always check Moodle / instructor announcements for updates.