Text Extraction
- I competed in the Message Understanding Conferences (MUC) about 10 years ago.
- Systems competed to automatically extract database entries
from text.
- The best systems got about 60% precision and 60% recall.
- The text was free-form and often ungrammatical.
- It was in a restricted domain (Central American terrorist events, or
rocket launches).
- When MUC-1 started, most systems couldn't even get one result.
- Over the 10 years or so that they ran, the systems went from toy
to marketable.
- The competitions stopped because there were no more advances.
- The best systems used cascades of finite state automata.
- The key was to build these automata quickly, which was done
by having people (even novices) mark up documents.
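
A minimal sketch of the cascade idea, not any actual MUC system. Each stage is a regular expression (a finite-state recognizer) that runs over the output of the previous stage: first entities are tagged, then an event pattern fills a database-style record. The patterns, the function names (tag_entities, extract_events, extract, precision_recall) and the sample sentence are all hypothetical. The scoring is also a simplification: it counts a record as correct only on an exact match of the whole record, which is an assumption made here for illustration.

```python
import re

# Hypothetical toy patterns, chosen only for illustration.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Stage 1: entity recognizers. Each regular expression is a small
# finite-state automaton that wraps what it finds in a <LABEL:...> tag.
ENTITY_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")),
    ("LOC", re.compile(r"\b(?:San Salvador|Guatemala City|Tegucigalpa)\b")),
]

# Stage 2: an event recognizer that runs over the tagged text from
# stage 1, never over the raw text. It stays within one sentence
# by refusing to cross a full stop.
EVENT_PATTERN = re.compile(
    r"\b(?P<type>bombing|kidnapping|attack)\b[^.]*?"
    r"<LOC:(?P<location>[^>]+)>[^.]*?"
    r"<DATE:(?P<date>[^>]+)>"
)


def tag_entities(text):
    """Stage 1: replace each recognized entity with a labeled tag."""
    for label, pattern in ENTITY_PATTERNS:
        text = pattern.sub(
            lambda m, label=label: "<%s:%s>" % (label, m.group(0)), text
        )
    return text


def extract_events(tagged):
    """Stage 2: turn each event match into a database-style record."""
    return [m.groupdict() for m in EVENT_PATTERN.finditer(tagged)]


def extract(text):
    """Run the full cascade: stage 1, then stage 2 on its output."""
    return extract_events(tag_entities(text))


def precision_recall(predicted, gold):
    """Score whole records by exact match (a simplifying assumption)."""
    pred = {tuple(sorted(e.items())) for e in predicted}
    ref = {tuple(sorted(e.items())) for e in gold}
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


text = ("A bombing occurred in San Salvador on 12 March 1989. "
        "Rebels claimed the attack.")
events = extract(text)
gold = events + [
    {"type": "kidnapping", "location": "Guatemala City", "date": "3 May 1990"}
]
print(events)
print(precision_recall(events, gold))  # (1.0, 0.5)
```

In this sketch the second sentence yields no record, because the event pattern needs a tagged location and date in the same sentence. Adding a new stage or a new pattern only means appending another regular expression, which is the sense in which such automata can be built quickly from marked-up examples.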