Lexicon
- The lexicon is normally thought of as just the dictionary.
- However, it typically holds a lot more than that.
- For one thing, it can record each word's part of speech.
- A good system for assigning parts of speech is the Brill tagger.
- Note that there are closed-class and open-class words.
Closed-class words include prepositions and determiners. These
sets rarely gain new members.
- Open-class words include nouns and verbs. New ones are coined
all the time.
- The lexicon will also break a word into its parts (root and
suffix). This can make processing more efficient, since the root
only has to be stored once rather than once per inflected form.
- You also often want to have gazetteers: lists of proper nouns and
their associated semantic categories. (E.g. Ford is a car company,
and London is a city.)
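
The points above can be pulled together in a minimal sketch of a toy
lexicon. Everything in it is an assumption made for illustration: the
word lists, the tag names (DET, PREP, NOUN, VERB), the suffix list, and
the function names split_word and look_up are all hypothetical. It
ignores ambiguity (e.g. "walks" as a noun) and spelling changes at the
root/suffix boundary, and it is not the Brill tagger.

```python
# Toy lexicon. All entries, tag names, and function names below are
# illustrative assumptions, not taken from any real system.

# Closed-class words: a small, fixed set that can be listed exhaustively.
CLOSED_CLASS = {"the": "DET", "a": "DET", "in": "PREP", "of": "PREP", "on": "PREP"}

# Open-class words: only roots are stored, not every inflected form.
OPEN_CLASS = {"walk": "VERB", "build": "VERB", "car": "NOUN", "city": "NOUN"}

# Gazetteer: proper nouns mapped to a semantic category.
GAZETTEER = {"Ford": "car company", "London": "city"}

# Suffixes to try stripping, checked in this order.
SUFFIXES = ["ing", "ed", "s"]


def split_word(word):
    """Return (root, suffix); suffix is "" if no known root is found."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in OPEN_CLASS:
                return root, suffix
    return word, ""


def look_up(word):
    """Return a dict describing the word, or None if it is unknown."""
    # Proper nouns are matched case-sensitively against the gazetteer.
    if word in GAZETTEER:
        return {"root": word, "suffix": "", "pos": "PROPER_NOUN",
                "word_class": "open", "semantic_category": GAZETTEER[word]}

    lower = word.lower()
    if lower in CLOSED_CLASS:
        return {"root": lower, "suffix": "", "pos": CLOSED_CLASS[lower],
                "word_class": "closed", "semantic_category": None}

    root, suffix = split_word(lower)
    if root in OPEN_CLASS:
        return {"root": root, "suffix": suffix, "pos": OPEN_CLASS[root],
                "word_class": "open", "semantic_category": None}

    return None  # a new open-class word the lexicon has not seen yet


if __name__ == "__main__":
    for w in ["The", "walked", "cars", "London", "blorf"]:
        print(w, "->", look_up(w))
```

Storing only roots keeps the open-class table small: "walk", "walks",
"walked", and "walking" all resolve to the single entry for "walk". An
unknown word returns None, which is where a real system would have to
guess, since new open-class words appear all the time.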