Categorisation and Classification
- Perhaps the most common datamining task is categorisation.
- Data items are defined by a vector of fields. One of the
fields is the category.
- Your system learns from the data items to categorise new
items.
- Classification is similar but the data items don't contain
the category.
- I'm not a big datamining guy, but a standard machine learning
research paradigm is to use a standard benchmark.
- A really commonly used one (the most commonly used one) is
University of California at Irvine's Categorisation Benchmark
- Its used to compare algorithms.
- Its also not a bad way to exercise your datamining skills.
- Note that no algorithm is better than all others on all data sets;
there's no free lunch.