Bag of Words
   - Bag of Words techniques (like latent semantic analysis) are a good
       approximation to semantics.
 
   - You take a bunch of documents (typically paragraphs) and build
       a term by document matrix.
 
   - The idea is that the meaning of the document is the words that
       are in it, and the meaning of the word is the documents it is in.
 
   - You typically remove stop words (like the, of, and and).
 
   - You can also reduce the matrix into two smaller ones (using a
       Singular Value Decomposition(SVD) algorithm).
 
   - So a 3000x5000 document by word matrix becomes a 3000x300 document
       matrix and a 5000x300 word matrix.
 
   - They typically use a cosine measurement to measure similarity.
 
   - Terms that are similar will point in a similar direction (in 300
       dimensional space).
 
   - You can add words together, and you can compare documents with
       words or each other.