Lexical and Syntactic Systems
- NLP takes advantage of centuries of linguistics research.
- Humans seem to rely on two complex systems for language: the
lexical system and the syntactic system.
- The lexicon is the dictionary.
- It's a bit more than just a list of words.
- There are many lexical categories: verb, noun, adjective,
determiner, number, and so on. The exact number of categories is
not clear and certainly varies from language to language.
- There are also ways of deriving multiple word forms from a root
(e.g. the plural of most nouns just adds s to the root).
- Morphological analysis deals with this; stemming is the simple
case of stripping affixes to recover the root.
- The Brill part-of-speech tagger is a good choice for
part-of-speech tagging. You can download it for English, and
it's trainable.
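As a rough illustration of the lexicon points above, here is a minimal Python sketch of a dictionary with lexical categories plus a few suffix-stripping rules. The LEXICON entries, the rule list, and the function name analyse are all made up for this example; it is not how the Brill tagger works, and real morphological analysers handle far more cases (irregular forms, ambiguity between categories, and so on).

```python
# Toy lexicon: maps a root form to its lexical category.
# The entries are illustrative assumptions, not a real dictionary.
LEXICON = {
    "the": "determiner",
    "dog": "noun",
    "cat": "noun",
    "box": "noun",
    "chase": "verb",
    "walk": "verb",
    "big": "adjective",
    "two": "number",
}

# (suffix to strip, text to add back, categories allowed, feature label)
# A small hand-picked set of regular English patterns (an assumption).
RULES = [
    ("es", "", ("noun",), "plural"),
    ("s", "", ("noun",), "plural"),
    ("s", "", ("verb",), "3rd-person singular"),
    ("ed", "", ("verb",), "past"),
    ("d", "", ("verb",), "past"),
    ("ing", "", ("verb",), "progressive"),
    ("ing", "e", ("verb",), "progressive"),
]


def analyse(word):
    """Return (root, category, feature) for a word, or None if unknown.

    The word is looked up as-is first; otherwise each rule strips a
    suffix and the remainder is looked up in LEXICON.
    """
    w = word.lower()
    if w in LEXICON:
        return (w, LEXICON[w], "base")
    for suffix, add_back, categories, feature in RULES:
        if w.endswith(suffix) and len(w) > len(suffix):
            root = w[: -len(suffix)] + add_back
            if LEXICON.get(root) in categories:
                return (root, LEXICON[root], feature)
    return None


if __name__ == "__main__":
    for token in ["The", "dogs", "boxes", "chased", "walks", "chasing"]:
        print(token, "->", analyse(token))
```

Note that the lexical category constrains which rules apply: "walks" is analysed as a verb form here only because "walk" is listed as a verb in the toy lexicon, which is one reason part-of-speech tagging and morphology are usually handled together.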
- The syntactic system involves taking lexemes and combining them
into phrases, clauses and sentences.
- This is typically called parsing.
- There is a lot of ambiguity in natural language, and
syntactic ambiguity is prevalent.
- Grammar rules are often used to define the syntax
of a language. There is no standard grammar for English.
- The best parsers get around 90% of attachments correct.
- Syntax tree: a sentence can typically be broken down
into a tree whose nodes are noun, verb, determiner, adverb, and
adjective phrases.
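To make parsing, syntactic ambiguity, and syntax trees concrete, here is a minimal Python sketch of a chart parser (CKY style) over a tiny hand-written grammar. The grammar, the word list, and the function names parse and bracket are assumptions invented for this example; treating "I" directly as a noun phrase is a simplification. It only shows how grammar rules produce trees, and how one sentence can get more than one tree depending on where a prepositional phrase attaches.

```python
from collections import defaultdict

# Binary grammar rules: (left child, right child) -> parent categories.
# A toy grammar written for this example, not a grammar of English.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}

# Toy lexicon: word -> lexical categories.
LEXICAL = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def parse(words):
    """Return every S tree covering the whole word list."""
    n = len(words)
    # chart[(i, j)][cat] holds all trees of category cat over words[i:j].
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for cat in LEXICAL.get(word.lower(), []):
            chart[(i, i + 1)][cat].append((cat, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the children
                for lcat, ltrees in list(chart[(i, k)].items()):
                    for rcat, rtrees in list(chart[(k, j)].items()):
                        for parent in BINARY.get((lcat, rcat), []):
                            for lt in ltrees:
                                for rt in rtrees:
                                    chart[(i, j)][parent].append(
                                        (parent, lt, rt))
    return chart[(0, n)]["S"]


def bracket(tree):
    """Render a tree as a bracketed string."""
    if len(tree) == 2:  # leaf: (category, word)
        return f"({tree[0]} {tree[1]})"
    return f"({tree[0]} {bracket(tree[1])} {bracket(tree[2])})"


if __name__ == "__main__":
    sentence = "I saw the man with the telescope".split()
    for tree in parse(sentence):
        print(bracket(tree))
```

With this grammar the example sentence gets two trees: one where "with the telescope" attaches to the verb phrase (the seeing was done with the telescope) and one where it attaches to the noun phrase (the man has the telescope). Choosing between such attachments is the kind of decision that parser accuracy figures measure.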