GATE
- GATE is the General Architecture for Text Engineering.
- I'm sure Rob, Hamish, Yorick and others will disagree with
parts of what I'm saying but...
- Rob Gaizauskas put in a MUC entry for (I think) MUC-4. He
was finishing up his PhD, and shortly thereafter got a lecturing
job at Sheffield.
- Yorick Wilks, a really impressive NLP professor, suggested
that they make a tool for generating these systems.
- So they hired Hamish Cunningham, and away they went.
- In the almost two decades since, they've kept improving the
tool.
- It's component-based, so anyone can add a component. At some
point my doctoral thesis system (Plink) was one of the parsers.
- During MUC, the TE community figured out that you needed to use
machine learning to make your systems work.
- So GATE also comes with a lot of data (text resources) that one
can use, but also see the Linguistic Data Consortium.
- GATE is now used for a lot more than Text Extraction.
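
To make the "component-based" point concrete, here is a minimal sketch of a pipeline where each component adds annotations to a shared document. This is not GATE's actual API (GATE itself is written in Java); the names Document, tokenizer, capitalized_word_tagger and run_pipeline are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; this is not GATE's real API.
Annotation = Tuple[int, int, str]  # (start offset, end offset, label)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# A component is any function that takes a document and adds annotations to it.
Component = Callable[[Document], None]


def tokenizer(doc: Document) -> None:
    """Add a Token annotation for each whitespace-separated word."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotations.append((start, end, "Token"))
        pos = end


def capitalized_word_tagger(doc: Document) -> None:
    """Add a NameCandidate annotation for each capitalized token."""
    for start, end, label in list(doc.annotations):
        if label == "Token" and doc.text[start].isupper():
            doc.annotations.append((start, end, "NameCandidate"))


def run_pipeline(doc: Document, components: List[Component]) -> Document:
    """Run each component in order over the same document."""
    for component in components:
        component(doc)
    return doc


doc = run_pipeline(
    Document("Hamish works at Sheffield"),
    [tokenizer, capitalized_word_tagger],
)
names = [doc.text[s:e] for s, e, label in doc.annotations if label == "NameCandidate"]
print(names)  # ['Hamish', 'Sheffield']
```

Adding a new component here just means writing another function with the same signature and putting it in the list, which is the property that lets outside systems (a parser, say) be plugged in.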