Text
- In the data mining community, text is often called unstructured data.
(The computational linguist in me shudders.)
- This means that a text, such as a newspaper article, is not
just a fixed-length vector of features.
- I think language understanding is AI-complete. That is, to properly
understand natural language, you need a system that is intelligent.
- We don't have one of those yet (and probably won't in my
lifetime), but we can get a long way with the techniques we already
have.
- First of all, it is important to note that there are different
kinds of documents. Email is not a newspaper article.
- You can take advantage of this. The first paragraph of a newspaper
article is usually a good summary, and it's pretty easy to extract.
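
To make the last point concrete, here is a minimal sketch of first-paragraph extraction. It assumes the article is plain text, that paragraphs are separated by blank lines, and that any headline or byline has already been removed. The function name first_paragraph is made up for this illustration.

```python
import re


def first_paragraph(text: str) -> str:
    """Return the first paragraph of a plain-text article as one line."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks inside the paragraph into single spaces.
    return " ".join(paragraphs[0].split())


article = """Council approves new bridge.
Work starts in May.

The vote was 7 to 2.
"""
print(first_paragraph(article))
```

The same trick would work poorly on email, where the first paragraph is often a greeting or quoted text, which is why it pays to know what kind of document you have.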