What you'll learn
IR Foundations
- Boolean retrieval, dictionaries, tolerant retrieval
- Inverted indexes, postings lists, index construction & compression
- Scoring, term weighting, vector space model
- Evaluation: precision, recall, F-measure; relevance feedback & query expansion
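To give a flavour of the first topic block, here is a minimal sketch of an inverted index with Boolean AND retrieval over a toy corpus (the documents and function names are invented for illustration, not course-provided code):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of doc IDs containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Intersect postings to answer a Boolean AND query."""
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}
index = build_index(docs)
print(boolean_and(index, "home", "sales", "july"))  # -> [2, 3]
```

Real postings lists are kept sorted and intersected with a merge walk (and compressed on disk); sets keep the sketch short.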
ML for IR
- Naïve Bayes and vector-space classification
- Support Vector Machines
- Flat & hierarchical clustering; cluster evaluation
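As a taste of the classification unit, a multinomial Naive Bayes classifier with add-one smoothing; the toy training set mirrors the China/not-China worked example in the textbook's classification chapter (Ch. 13):

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Multinomial Naive Bayes training with log priors."""
    class_docs = defaultdict(int)
    class_terms = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_docs[label] += 1
        terms = text.lower().split()
        class_terms[label].update(terms)
        vocab.update(terms)
    total = sum(class_docs.values())
    priors = {c: math.log(n / total) for c, n in class_docs.items()}
    return priors, class_terms, vocab

def classify(model, text):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    priors, class_terms, vocab = model
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        denom = sum(class_terms[c].values()) + len(vocab)
        score = prior + sum(
            math.log((class_terms[c][t] + 1) / denom)
            for t in text.lower().split() if t in vocab
        )
        if score > best_score:
            best, best_score = c, score
    return best

train = [
    ("Chinese Beijing Chinese", "china"),
    ("Chinese Chinese Shanghai", "china"),
    ("Chinese Macao", "china"),
    ("Tokyo Japan Chinese", "not-china"),
]
model = train_nb(train)
print(classify(model, "Chinese Chinese Chinese Tokyo Japan"))  # -> china
```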
Web Search
- Web crawling & indexing at scale
- Link analysis & PageRank
- XML & structured data for search
- Applications: spam filtering, sentiment & categorization
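The link-analysis topic can be previewed with a power-iteration PageRank over a hypothetical three-page web (graph and parameter values chosen for illustration only):

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [outlinks]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            out = links[u]
            if not out:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v in out:
                    new[v] += damping * rank[u] / len(out)
        rank = new
    return rank

# Hypothetical web: A links to B and C, B links to C, C links back to A.
r = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(max(r, key=r.get))  # C collects the most inlink mass here
```

Note how the ranks always sum to 1: the damping term redistributes a fixed (1 − d) share uniformly each iteration.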
Textbook: Introduction to Information Retrieval by Manning, Raghavan & Schütze (Cambridge University Press, 2008) — free online edition.
Course admin
Contact & Office Hours
- Emergencies/private: sabine.bergler@concordia.ca
- Course comms: Moodle messages
- Office: ER 1143
- Hours: Tue or Wed, 11:00–12:00 (or by appointment)
Labs
- Section D, Lab DI — Thu 14:45–16:35
- Section D, Lab DJ — Wed 14:45–16:35
- Section D, Lab DK — Tue 14:45–16:35
Prereqs & Background
- Prereqs: COMP 233 or ENGR 371; COMP 352
- Programming: recursion; lists & vectors; basic Python; processing large ASCII/text files
- Math: discrete math (relations), complexity, graphs, vectors & matrices
- Skills: writing informative academic reports
Assessments
Weights & Dates
| Item | Weight | Date | Notes |
|---|---|---|---|
| Midterm | 20% or 0% | Oct 23, 2025 | Replaced by final if final is higher |
| Final Exam | 40% or 60% | TBD by exams office | Counts 60% if higher than midterm |
| Project 1 | 15% | Oct 16, 2025 | Implementation focus |
| Project 2 | 25% | Nov 18, 2025 | Experiment using P2 code |
Exams test theory; projects test practical implementation. Projects are individual.
Lab & Project Notes
- Use NLTK for preprocessing (see book).
- Design choices matter — better decisions earn more points.
- Discuss ideas in lab and on Moodle (not other channels).
- Project 1 and Final Project demos are required in lab.
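The NLTK preprocessing the labs call for typically means tokenization, lowercasing, and stopword removal. A standard-library sketch of that pipeline (the stopword list below is a tiny illustrative subset, not NLTK's corpus; in the projects use NLTK's tokenizers and `stopwords` list as the book describes):

```python
import re

# Illustrative subset only; NLTK ships a much larger English stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "over"}

def preprocess(text):
    """Tokenize on alphanumeric runs, lowercase, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The Quick Brown Fox jumps over the lazy dog!"))
# -> ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```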
Tentative weekly schedule
| Week | Topic | Suggested Readings |
|---|---|---|
| 1 | Boolean retrieval; term vocabulary; postings lists | Ch. 1–2 |
| 2 | Dictionaries & tolerant retrieval; index construction | Ch. 3–4 |
| 3 | Index compression | Ch. 5 |
| 4 | Scoring; term weighting; vector space model | Ch. 6 |
| 5 | Evaluation; relevance feedback; query expansion | Ch. 7–9 |
| 6 | Naïve Bayes; vector-space classification | Ch. 13–14 |
| 7 | Reading week | — |
| 8 | Midterm | — |
| 9 | Support Vector Machines | Ch. 15 |
| 10 | Flat & hierarchical clustering; cluster evaluation | Ch. 16–17 |
| 11 | RAG (topic TBD) | — |
| 12 | MedHopQA (topic TBD) | — |
Course learning outcomes
- Perform basic text processing & tokenization
- Build and query an inverted index
- Implement keyword-based document retrieval
- Apply the vector space model for search & classification
- Use clustering for organizing documents
- Design and evaluate a simple web crawler
Graduate attributes emphasized
- Knowledge base: Text cleaning; tokenization; indexing; search; MapReduce; vector-space modeling; clustering
- Design: Crawl & index the web; assess page sentiment at scale
- Tools: Linux, Java, IDEs; adapt/implement algorithms; Python & NLTK for NLP preprocessing
- Work: Individual projects; final project integrates crawling, indexing, ranking, sentiment
Resources
- Textbook: Introduction to Information Retrieval (free online)
- NLTK Book: Natural Language Processing with Python
- On-Campus Support: Student services via Gina Cody School
© 2025 • This page is a student-friendly summary of the official course outline. Always check Moodle / instructor announcements for updates.