The Brill Tagger
- An old but still solid piece of technology is the Brill part-of-speech
tagger. It's included in GATE and is also freely available elsewhere.
- Part-of-speech tagging goes through open text and assigns each
word its lexical category.
- Examples of lexical categories are verbs, nouns, adjectives and
determiners.
- Typically, later stages of processing (e.g. parsing) take the
tagged text as input.
- Note that most open-class words are lexically
ambiguous. E.g. "note" is both a verb and a noun.
- The Brill tagger is trainable. That is, you give it a
body of text in which the part of speech of each word is already marked.
- It then learns to tag unseen text.
- It's about 97% accurate, i.e. it gives the correct tag to roughly
97 out of every 100 words.
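
To make "trainable" a bit more concrete, here is a toy sketch in Python. It is not the Brill tagger's actual code, only a simplified illustration in the spirit of the transformation-based learning that the Brill tagger uses: start by giving every word its most frequent tag, then learn correction rules from the mistakes. The tiny corpus, the tag names, the single rule template ("change tag A to B when the previous tag is C") and all function names are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training text: each sentence is a list of (word, tag).
TRAIN = [
    [("the", "DET"), ("note", "NOUN"), ("is", "VERB"), ("short", "ADJ")],
    [("a", "DET"), ("note", "NOUN"), ("is", "VERB"), ("here", "ADV")],
    [("they", "PRON"), ("note", "VERB"), ("the", "DET"), ("error", "NOUN")],
    [("we", "PRON"), ("note", "VERB"), ("a", "DET"), ("change", "NOUN")],
    [("the", "DET"), ("note", "NOUN"), ("is", "VERB"), ("long", "ADJ")],
]
DEFAULT_TAG = "NOUN"  # assumed fallback for words never seen in training


def train_baseline(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b wherever the previous tag is c."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(predicted, gold):
    return sum(p != g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))


def learn_rules(corpus, lexicon, max_rules=5):
    """Greedily pick the rule that removes the most errors, then repeat."""
    gold = [[t for _, t in s] for s in corpus]
    current = [[lexicon.get(w, DEFAULT_TAG) for w, _ in s] for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining mistake.
        candidates = set()
        for cur, gld in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != gld[i]:
                    candidates.add((cur[i], gld[i], cur[i - 1]))
        best_rule, best_errors = None, count_errors(current, gold)
        for rule in sorted(candidates):
            errors = count_errors([apply_rule(t, rule) for t in current], gold)
            if errors < best_errors:
                best_rule, best_errors = rule, errors
        if best_rule is None:  # no candidate improves the tagging
            break
        rules.append(best_rule)
        current = [apply_rule(t, best_rule) for t in current]
    return rules


def tag(words, lexicon, rules):
    """Tag a list of words: baseline lookup first, then the learned rules in order."""
    tags = [lexicon.get(w, DEFAULT_TAG) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))


def accuracy(corpus, lexicon, rules):
    """Fraction of words in a tagged corpus that receive the correct tag."""
    total = sum(len(s) for s in corpus)
    wrong = sum(
        predicted != gold
        for s in corpus
        for (_, predicted), (_, gold) in zip(tag([w for w, _ in s], lexicon, rules), s)
    )
    return (total - wrong) / total


lexicon = train_baseline(TRAIN)
rules = learn_rules(TRAIN, lexicon)
print(rules)                                        # [('NOUN', 'VERB', 'PRON')]
print(tag(["they", "note", "a", "change"], lexicon, rules))
```

On this toy data the baseline alone always tags "note" as a noun, and the one learned rule turns it into a verb when it follows a pronoun. The accuracy function measures correctness the same way the 97% figure above is usually read, as the share of words given the right tag; a real evaluation would use held-out text rather than the training sentences.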