Mock Exam – Introduction to Information Retrieval
60 multiple-choice questions. Select exactly one option per question.
I. Boolean Retrieval (Q1–Q10)
Q1. In the Boolean retrieval model, a query is interpreted as:
Q2. The data structure that maps each term to the list of documents containing it is called:
Q3. Which operation is NOT a standard Boolean operator in IR?
Q4. A term–document incidence matrix is typically avoided in large IR systems because it is:
Q5. Which best describes processing the query “A AND B AND NOT C”?
Q6. Why is grep inadequate as an IR engine for large collections?
Q7. Query optimization in Boolean retrieval typically focuses on:
Q8. In an inverted index, postings lists are commonly stored:
Q9. Which query would likely return zero results in strict Boolean retrieval but nonzero in ranked retrieval?
Q10. “Feast or famine” refers to:
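As a study aid for this section, here is a minimal sketch of an inverted index (Q2) and the sorted-postings-list intersection used for Boolean AND (Q5, Q8). The documents and terms are invented for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index: term -> sorted list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists (Boolean AND) in O(len(p1) + len(p2))."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

docs = ["new home sales", "home prices rise", "new car sales"]
idx = build_index(docs)
print(intersect(idx["new"], idx["sales"]))  # → [0, 2]
```

Because both lists are kept sorted by doc ID, the merge walks each list once, which is why postings-list ordering (Q8) matters for query processing cost.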
II. Vocabulary, Tokenization & Postings (Q11–Q20)
Q11. A token is best defined as:
Q12. Normalization addresses:
Q13. Which is a challenge in tokenization?
Q14. A positional index stores, for each posting:
Q15. Phrase queries (e.g., “to be or not”) are supported efficiently by:
Q16. Skip pointers are used to:
Q17. The distinction between type and token implies:
Q18. Case-folding is an example of:
Q19. Handling multi-language documents primarily affects:
Q20. A postings list for term t contains:
III. Dictionaries & Tolerant Retrieval (Q21–Q30)
Q21. A key downside of hash tables for term dictionaries is:
Q22. B-trees are preferred over binary trees for term dictionaries because they:
Q23. The permuterm index is primarily used to support:
Q24. For wildcard query mon* with a B-tree dictionary, a system fetches all terms in range:
Q25. Levenshtein distance counts the minimum number of:
Q26. Soundex is mainly designed to:
Q27. A k-gram index helps with:
Q28. A limitation of hash-based dictionaries is the need to:
Q29. For wildcard query *X, the permuterm idea allows lookup as:
Q30. “Tolerant retrieval” refers to techniques that:
IV. Index Construction (Q31–Q40)
Q31. A key hardware fact driving IR design is that:
Q32. BSBI stands for:
Q33. The SPIMI algorithm’s key idea includes:
Q34. In BSBI, after creating partial indexes for blocks, the next step is to:
Q35. A main advantage of SPIMI over BSBI is:
Q36. MapReduce helps in indexing primarily by:
Q37. Dynamic indexing commonly uses a small in-memory index to:
Q38. Disk I/O efficiency improves by:
Q39. In the Reuters RCV1 example, the number of non-positional postings is on the order of:
Q40. Fault tolerance in large IR systems is often achieved by:
V. Index Compression (Q41–Q50)
Q41. The main motivation for dictionary compression is to:
Q42. Postings compression improves performance because:
Q43. Gap encoding stores:
Q44. Variable Byte (VB) coding uses the high bit of each byte to:
Q45. Gamma coding represents an integer via:
Q46. Heaps’ Law relates vocabulary size M to number of tokens T as:
Q47. Zipf’s Law states that term frequency is approximately proportional to:
Q48. A dictionary-as-string structure typically stores for each term:
Q49. Front-coding is especially useful when:
Q50. In Reuters compression examples, γ-coded postings are typically:
VI. Scoring, TF–IDF & Vector Space Model (Q51–Q60)
Q51. Ranked retrieval addresses the Boolean “feast or famine” by:
Q52. Term frequency (tf) in a document typically:
Q53. Inverse document frequency (idf) downweights terms that:
Q54. A common idf formula is:
Q55. The vector space model represents documents and queries as:
Q56. Cosine similarity between vectors q and d is:
Q57. A limitation of the Jaccard coefficient for ranking is that it:
Q58. In tf-idf weighting, the weight of term t in document d typically increases with:
Q59. Length normalization in cosine similarity primarily:
Q60. Which statement is TRUE?
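To tie together Q52–Q59, here is a minimal sketch of tf-idf weighting with cosine-normalized document vectors, using log-scaled tf and idf = log10(N/df). The three documents are invented for illustration; real systems would also handle stemming, stop words, and query weighting.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """tf-idf weights: (1 + log10 tf) * log10(N / df), then
    length-normalize each document vector (Q58, Q59)."""
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))                       # document frequency (Q53)
    idf = {t: math.log10(N / df[t]) for t in df}   # idf (Q54)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(v1, v2):
    """Dot product of two unit-length sparse vectors (Q56)."""
    return sum(w * v2.get(t, 0.0) for t, w in v1.items())

docs = ["gold silver truck", "shipment of gold", "delivery of silver"]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))
```

Because the vectors are pre-normalized, cosine similarity reduces to a dot product, and a document of any length scores 1.0 against itself, which is the point of length normalization in Q59.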