Text Mining
- I did my doctoral work in Natural Language Processing.
- One subfield of data mining is text mining.
- It involves processing text documents.
- Search engines are one example of a text mining system.
- Lightweight approaches using bags of words can be used
to develop useful systems. Here a document is just the bag
of words that appear in it; word order is ignored.
- Heavier-weight techniques that process text more fully can
also be used.
- These take advantage of parsing and dialogue analysis
techniques developed over decades for natural language processing
systems.
- One example application is information extraction. This was
made popular by the Message Understanding Conferences (MUC).
- Roughly, a domain-specific system is developed. The system
can then automatically generate database entries by reading
texts.
- For example, systems were built for understanding terrorist
events in Central America. A news article could be read, and
the system would tell you what event happened (e.g. kidnapping),
who did it, when, and where, among other things.
- These systems don't perform as well as people (though people
aren't perfect either), but one system could process 1000
documents in a minute.
- See James Allen's Natural Language Understanding and
NIST's page on the Message Understanding Conference.
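The bag-of-words idea, and one way a simple search engine might use it, can be sketched in Python. This is a minimal sketch under stated assumptions: tokenization is a plain regular expression, documents are compared by cosine similarity over raw word counts, and the function names and sample documents are hypothetical. Real search engines add term weighting (such as tf-idf) and inverted indexes on top of this.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them.
    Word order is discarded: the document becomes a multiset of words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing word returns 0 without inserting it.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Rank document names (dict of name -> text) by similarity to the query."""
    q = bag_of_words(query)
    scores = {name: cosine(q, bag_of_words(text)) for name, text in documents.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical sample documents.
docs = {
    "kidnap": "Guerrillas kidnapped a mayor in the capital on Tuesday.",
    "weather": "Heavy rain is expected in the capital this weekend.",
    "sports": "The home team won the final on Sunday.",
}

print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
print(search("mayor kidnapped", docs)[0])  # kidnap
```

The first print shows the cost of the lightweight approach: "dog bites man" and "man bites dog" are the same bag, so anything that depends on word order needs the heavier techniques.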
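The idea of reading a text and producing a database entry can be illustrated with a deliberately tiny pattern-based template filler. This is not how the MUC systems worked: they used far richer parsing and discourse analysis, and their templates had many more slots. The trigger verbs, regular expressions, slot names, and the `extract_event` function below are all hypothetical, chosen only to show the shape of the task (text in, structured row out).

```python
import re

# Hypothetical trigger verbs for a terrorism domain; a real system would
# rely on much richer, domain-specific knowledge.
EVENT_TRIGGERS = {
    "kidnapping": ("kidnapped", "abducted"),
    "bombing": ("bombed", "detonated"),
    "murder": ("killed", "murdered", "assassinated"),
}

# Location: "in" followed by one or more capitalized words.
LOCATION = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")
# Date: "on" followed by a day of the week.
DATE = re.compile(r"\bon (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b")


def extract_event(sentence):
    """Fill a simple event template (a would-be database row) from one sentence.

    Slots that no pattern matches are left as None.
    """
    row = {"event": None, "perpetrator": None, "date": None, "location": None}

    for event, verbs in EVENT_TRIGGERS.items():
        for verb in verbs:
            trigger = re.search(r"\b" + verb + r"\b", sentence)
            if trigger:
                row["event"] = event
                # Crude assumption: everything before the verb is the perpetrator.
                row["perpetrator"] = sentence[: trigger.start()].strip() or None
                break
        if row["event"]:
            break

    location = LOCATION.search(sentence)
    if location:
        row["location"] = location.group(1)
    date = DATE.search(sentence)
    if date:
        row["date"] = date.group(1)
    return row


print(extract_event("Urban guerrillas kidnapped a mayor in San Salvador on Tuesday."))
# {'event': 'kidnapping', 'perpetrator': 'Urban guerrillas', 'date': 'Tuesday', 'location': 'San Salvador'}
```

Surface patterns like these break quickly (passive voice, pronouns, events described across several sentences), which is exactly where the parsing and dialogue analysis techniques mentioned above come in.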