We have a vocabulary of 4 words: [w1, w2, w3, w4]. Each document is represented by a binary vector of length 4 whose i-th entry is 1 if word wi appears in the document and 0 otherwise.
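
To make the representation concrete, here is a minimal Python sketch of the encoding. The helper name `to_binary_vector` and the two example documents are hypothetical, invented only for illustration. The sketch also assumes that repeated words still map to 1 and that words outside the vocabulary are ignored, neither of which is stated in the setup above.

```python
# Vocabulary from the text, in a fixed order so that position i corresponds to word w(i+1).
VOCAB = ["w1", "w2", "w3", "w4"]


def to_binary_vector(doc_words, vocab=VOCAB):
    """Return a list of 0/1 flags, one per vocabulary word.

    An entry is 1 if the corresponding vocabulary word occurs at least once
    in the document and 0 otherwise. Counts are discarded (assumption), and
    out-of-vocabulary words are silently ignored (assumption).
    """
    present = set(doc_words)
    return [1 if word in present else 0 for word in vocab]


# Hypothetical documents, given as lists of tokens.
doc_a = ["w1", "w3", "w3"]  # "w3" occurs twice but still yields a single 1
doc_b = ["w2", "w4", "w5"]  # "w5" is not in the vocabulary and is dropped

vec_a = to_binary_vector(doc_a)
vec_b = to_binary_vector(doc_b)

print(vec_a)
print(vec_b)
```

With 4 vocabulary words there are 2^4 = 16 possible binary vectors, so any two documents containing the same set of vocabulary words are indistinguishable under this representation.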