Traditional Text Engineering systems are broken into separate modules.
A reasonable modular breakdown would include:
- Word Recognition, which has been explained.
- Part of Speech Analysis. This is a categorisation task: the system
learns the parts of speech and assigns examples to them. It can take
advantage of context.
- Parsing, which has been explained.
- Co-Reference Resolution involves aligning anaphors (e.g. pronouns)
with existing entities. Existing entities are either active CAs,
recently inactive CAs, or recently activated bound n-tuples.
The question is which CA to activate for a given anaphor. This is
decided by the activation of the candidate entities.
- Discourse Analysis. This is the least understood module. It would
involve building up temporary structure, and allowing some of
that structure to persist. In some sense, it could be implemented
with very complex grammar rules.
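
The co-reference step in the list above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not
the system's implementation: the text says the choice is decided by
activation, and the sketch assumes a simple winner-take-all rule over
the candidate entities. The Candidate class, the resolve_anaphor
function, the kind labels and the activation values are all
hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A hypothetical stand-in for an existing entity (a CA or bound n-tuple)."""
    name: str
    kind: str          # assumed labels: "active_ca", "recently_inactive_ca", "bound_tuple"
    activation: float  # assumed scalar summary of the entity's current activation


def resolve_anaphor(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate with the highest activation (assumed winner-take-all).

    Returns None when there is no existing entity to align with.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.activation)


# Illustrative values only; they are not taken from any experiment.
candidates = [
    Candidate("John", "recently_inactive_ca", 0.4),
    Candidate("Mary", "active_ca", 0.9),
    Candidate("gave(John, book)", "bound_tuple", 0.6),
]

referent = resolve_anaphor(candidates)
print(referent.name if referent else "no referent")
```

In a real CA-based system the competition would presumably emerge from
the network dynamics rather than from an explicit maximum; the explicit
max here only makes the selection criterion visible.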