
Text Engineering Component Projects

My interpretation of the Message Understanding Conferences was that text extraction could be done effectively by a cascade of finite state automata (FSAs); a good example of this approach is Jerry Hobbs' FASTUS system. The GATE system uses such a cascade as at least the introductory part of its standard text extraction system. So, some of these standard components could be built as a first step toward processing language. One cascade would be:
  1. Tokenizer
  2. Sentence Splitter
  3. Part of Speech Tagger (e.g. Brill)
  4. Simple Phrase Parser
  5. Phrase Combination
  6. Template Filling
Note that only template filling is specific to text extraction. The other components could be reused in other systems, such as dialog agents. As all of these components can be built from FSAs, each could be implemented readily on neuromorphic systems. Moreover, all of these subprojects could be combined into one system. Running the whole cascade on one board would reduce the delays of moving data onto and off the board.
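
To make the cascade idea concrete, here is a minimal sketch in ordinary Python of the first four stages chained together, each stage consuming the previous stage's output. Everything in it is an assumption for illustration: the function names, the tag set, and the tiny LEXICON are hypothetical, the lookup tagger is a stand-in rather than a Brill tagger, and nothing here is neuromorphic or taken from GATE or FASTUS. It only shows that each stage can be written as a simple finite state pass.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained tagger (e.g. Brill).
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "cat": "NN",
           "chased": "VB", "saw": "VB"}
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Stage 1: split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Stage 2: close a sentence whenever an end-of-sentence token is seen."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no final punctuation
        sentences.append(current)
    return sentences


def tag(sentence):
    """Stage 3: lexicon lookup; unknown words default to NN (an assumption)."""
    tagged = []
    for tok in sentence:
        default = "NN" if tok.isalnum() else "PUNCT"
        tagged.append((tok, LEXICON.get(tok.lower(), default)))
    return tagged


def chunk_noun_phrases(tagged):
    """Stage 4: recognise the pattern DT? NN+ with a small hand-written FSA."""
    phrases, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NN":  # one or more nouns
            k += 1
        if k > j:  # at least one noun was consumed: accept the phrase
            phrases.append(" ".join(tok for tok, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return phrases


def run_cascade(text):
    """Run stages 1 to 4 in order, returning noun phrases per sentence."""
    return [chunk_noun_phrases(tag(s)) for s in split_sentences(tokenize(text))]
```

Calling run_cascade("The dog chased a cat. A cat saw the dog!") yields one list of noun phrases per sentence. Phrase combination and template filling would be further stages appended to the same chain.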