Benchmarks
- Benchmarks are a common mechanism for comparing
computer algorithms, but can be used for a range of
tasks.
- The idea is that there are some standardized tasks.
- A new algorithm can be applied to the task, and its
results compared to those of earlier systems.
- Again, it's fair because anyone can do it.
- I have had a fair few thesis students do this on the
UC Irvine (UCI) categorisation database.
- N-Fold validation. With machine learning algorithms, it's easy
to learn the training set. The real problem is generally to see
how the system performs on unseen data. The system
can be trained on most of the data (say 3/4) and tested
on the remaining fourth. It can then be retrained with a different
fourth held out, etc., until every fourth has been used for testing once.
This is 4-Fold validation.
- The downside is that the benchmark may not fully
represent the overall problem. Better
algorithms may do worse on the benchmark, but better
on open versions of the problem.
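
To make the N-Fold validation idea concrete, here is a minimal sketch in plain Python. It follows the usual convention: each fold is held out for testing exactly once while the remaining folds are used for training, and the fold accuracies are averaged. The function names (k_fold_indices, cross_validate, train_threshold, predict_threshold) and the toy one-dimensional data are assumptions made up for illustration; they are not from any particular library or benchmark.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # shuffle so folds are not ordered
    folds = [indices[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, k, train_fn, predict_fn):
    """Return the mean accuracy over the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict_fn(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical toy learner: a threshold halfway between the two class means.
# It assumes both classes appear in the training data and class 1 has larger values.
def train_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Made-up, well-separated data: class 0 is 0..9, class 1 is 20..29.
xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10
accuracy = cross_validate(xs, ys, 4, train_threshold, predict_threshold)
print(accuracy)
```

Any learner can be swapped in by passing a different train_fn and predict_fn, which is what makes the same split reusable when comparing algorithms on one benchmark.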