Document Classification
- Carlos did this work. It's a bit of a hassle to get PyNN and
NEST going, but I think he just used my standard mechanism of
dual booting into Ubuntu and uploading.
- He was also working in Python 3, and I think my code is in
Python 2; I've now switched to Python 3. (Note that PyNN and NEST
are updated pretty frequently, and I've found it easiest to switch
versions in a batch and infrequently.)
- Carlos grabbed 1200 documents from a Kaggle set, with 200 of
those held out for testing.
- He used Word2Vec to get a vector representation of each word
(a 100-dimensional vector).
- Two methods were used to combine the word vectors of a document:
simple averaging and Term Frequency-Inverse Document Frequency
(TF-IDF) weighting; see the first sketch after these notes.
- Either way, each document was then represented by a vector of
100, and these vectors were used as input.
- He also turned on the neurons adjacent to each feature's neuron
(for generalization); see the second sketch below.
- The documents come from four categories.
- The standard two-layer approach is used for training (see the
PyNN sketch below).
- The system gets 81% of the test set correct (74% with TF-IDF
weighting).
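
First, a minimal sketch of the document-vector step, assuming gensim
and scikit-learn. The placeholder corpus, the tokenisation, and the
exact form of the TF-IDF weighting are my assumptions, not Carlos's
actual code; only the 100-dimensional vectors and the 1000/200 split
come from the notes above.

import numpy as np
from collections import Counter
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder: replace with the 1200 tokenised Kaggle documents.
docs = [["replace", "with", "real", "tokenised", "documents"]] * 1200
train, test = docs[:1000], docs[1000:]   # 200 held out for testing

# 100-dimensional word vectors, as in the notes above.
w2v = Word2Vec(sentences=train, vector_size=100, min_count=1)

def average_vector(tokens, model):
    # Plain average of the word vectors in one document.
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

# TF-IDF weighting: weight each word vector by its tf * idf score.
tfidf = TfidfVectorizer()
tfidf.fit(" ".join(t) for t in train)
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def tfidf_vector(tokens, model):
    counts = Counter(tokens)
    vecs, weights = [], []
    for w, c in counts.items():
        if w in model.wv and w in idf:
            vecs.append(model.wv[w])
            weights.append((c / len(tokens)) * idf[w])
    return (np.average(vecs, axis=0, weights=weights)
            if vecs else np.zeros(model.vector_size))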
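
Second, one way the "adjacent neurons" generalization could be
encoded. The bin count per feature and the value range are my
guesses; the notes only say that the neurons next to a feature's
neuron are also turned on.

import numpy as np

BINS = 10  # assumed number of input neurons per feature

def encode(doc_vec, lo=-1.0, hi=1.0):
    # Map a 100-d document vector to a binary input pattern,
    # activating the neuron for each feature's bin plus its
    # immediate neighbours (the generalization step).
    pattern = np.zeros((len(doc_vec), BINS), dtype=int)
    for i, v in enumerate(doc_vec):
        b = int(np.clip((v - lo) / (hi - lo) * BINS, 0, BINS - 1))
        for n in (b - 1, b, b + 1):
            if 0 <= n < BINS:
                pattern[i, n] = 1
    return pattern.ravel()   # 100 features * 10 bins = 1000 inputs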
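
Finally, a structural PyNN sketch of the two-layer network: an
encoded input layer projecting all-to-all onto four category
neurons, one per document category. The cell model, the weights,
and the learning rule of the standard two-layer approach are not
given in these notes, so the numbers below are placeholders.

import pyNN.nest as sim

sim.setup(timestep=1.0)

n_inputs = 1000                                    # assumed encoded-input size
inputs = sim.Population(n_inputs, sim.SpikeSourceArray(spike_times=[]))
categories = sim.Population(4, sim.IF_curr_exp())  # one neuron per category

# All-to-all connections; the weights would be set and adjusted by
# the training procedure, which is not reproduced here.
proj = sim.Projection(inputs, categories, sim.AllToAllConnector(),
                      sim.StaticSynapse(weight=0.0))

categories.record('spikes')

# One presentation: load this document's spike times into the input
# layer and run; training loops over the 1000 training documents.
sim.run(100.0)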