Document Classification
- Carlos did this work. It's a bit of a hassle to get PyNN and
NEST going, but I think he just used my standard mechanism of
dual booting into Ubuntu and uploading.
- He was also working in Python 3, and I think my code is in
Python 2; I've now switched to Python 3. (Note that PyNN and NEST
are updated pretty frequently, and I've found it easiest to switch
versions in a batch and infrequently.)
- Carlos grabbed 1200 documents from a Kaggle set, with 200 of
those held out for testing.
- He used Word2Vec to get a vector representation of each word
(a 100-dimensional vector).
- Two methods were used to combine the word vectors of a document:
simple averaging and Term Frequency-Inverse Document Frequency
(TF-IDF) weighting; see the first sketch after these notes.
- Either way, each document was then represented by a vector of
100, and these vectors were used as input.
- He also turned on the neurons adjacent to each feature's neuron
(for generalization); see the second sketch below.
- The documents come from four categories.
- The standard two-layer approach is used for training (see the
PyNN sketch below).
- The system gets 81% of the test set correct (74% with TF-IDF
weighting).
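
First, a minimal sketch of the document-vector step, assuming gensim
and scikit-learn. The placeholder corpus, the tokenisation, and the
exact form of the TF-IDF weighting are my assumptions, not Carlos's
actual code; only the 100-dimensional vectors and the 1000/200 split
come from the notes above.

import numpy as np
from collections import Counter
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder: replace with the 1200 tokenised Kaggle documents.
docs = [["replace", "with", "real", "tokenised", "documents"]] * 1200
train, test = docs[:1000], docs[1000:]   # 200 held out for testing

# 100-dimensional word vectors, as in the notes above.
w2v = Word2Vec(sentences=train, vector_size=100, min_count=1)

def average_vector(tokens, model):
    # Plain average of the word vectors in one document.
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

# TF-IDF weighting: weight each word vector by its tf * idf score.
tfidf = TfidfVectorizer()
tfidf.fit(" ".join(t) for t in train)
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def tfidf_vector(tokens, model):
    counts = Counter(tokens)
    vecs, weights = [], []
    for w, c in counts.items():
        if w in model.wv and w in idf:
            vecs.append(model.wv[w])
            weights.append((c / len(tokens)) * idf[w])
    return (np.average(vecs, axis=0, weights=weights)
            if vecs else np.zeros(model.vector_size))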
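
Second, one way the "adjacent neurons" generalization could be
encoded. The bin count per feature and the value range are my
guesses; the notes only say that the neurons next to a feature's
neuron are also turned on.

import numpy as np

BINS = 10  # assumed number of input neurons per feature

def encode(doc_vec, lo=-1.0, hi=1.0):
    # Map a 100-d document vector to a binary input pattern,
    # activating the neuron for each feature's bin plus its
    # immediate neighbours (the generalization step).
    pattern = np.zeros((len(doc_vec), BINS), dtype=int)
    for i, v in enumerate(doc_vec):
        b = int(np.clip((v - lo) / (hi - lo) * BINS, 0, BINS - 1))
        for n in (b - 1, b, b + 1):
            if 0 <= n < BINS:
                pattern[i, n] = 1
    return pattern.ravel()   # 100 features * 10 bins = 1000 inputs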
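
Finally, a structural PyNN sketch of the two-layer network: an
encoded input layer projecting all-to-all onto four category
neurons, one per document category. The cell model, the weights,
and the learning rule of the standard two-layer approach are not
given in these notes, so the numbers below are placeholders.

import pyNN.nest as sim

sim.setup(timestep=1.0)

n_inputs = 1000                                    # assumed encoded-input size
inputs = sim.Population(n_inputs, sim.SpikeSourceArray(spike_times=[]))
categories = sim.Population(4, sim.IF_curr_exp())  # one neuron per category

# All-to-all connections; the weights would be set and adjusted by
# the training procedure, which is not reproduced here.
proj = sim.Projection(inputs, categories, sim.AllToAllConnector(),
                      sim.StaticSynapse(weight=0.0))

categories.record('spikes')

# One presentation: load this document's spike times into the input
# layer and run; training loops over the 1000 training documents.
sim.run(100.0)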