Benchmarks

Benchmarks are a common mechanism for comparing computer algorithms, but can be used for a range of tasks.
The idea is that there are some standardized tasks.
A new algorithm can be applied to one of these tasks, and its results compared to those of earlier systems.
Again, it's fair because anyone can run the same benchmark.
I have a fair few thesis students do this on the Cal Irvine categorisation database.
N-Fold validation. With machine learning algorithms, it's easy to learn the training set. The problem is generally to see how the system performs on unseen data. The data is split into N equal parts (folds). The system can be trained on all but one of the parts (say three of four quarters) and tested on the part that was held out. It can then be retrained with a different quarter held out, and so on, until every quarter has been used for testing once. This is 4-Fold validation.
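The rotation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. It uses the conventional arrangement, in which each fold is held out for testing in turn and the remaining folds are used for training. The helper names (k_fold_indices, majority_class, cross_validate) and the toy labels are made up for the example. The "learner" simply predicts the most common training label, standing in for whatever algorithm is being benchmarked.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, test_indices) pair per fold."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    # Assign items to folds round-robin. No shuffling is done here, so
    # shuffle the data first if its order is meaningful.
    folds = [list(range(start, n_items, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, test))
    return splits


def majority_class(train_labels: Sequence[str]) -> str:
    """Hypothetical stand-in learner: predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels: Sequence[str], k: int) -> float:
    """Mean accuracy on the held-out fold, averaged over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        prediction = majority_class([labels[i] for i in train])
        correct = sum(1 for i in test if labels[i] == prediction)
        scores.append(correct / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    toy_labels = ["a", "a", "a", "b", "a", "a", "a", "b"]  # made-up data
    print(cross_validate(toy_labels, k=4))
```

Every item is tested exactly once and is never in the training set of the round that tests it, which is what makes the averaged score an estimate of performance on unseen data.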
The downside is that the benchmark may not fully represent the overall problem. An algorithm that does better on open versions of the problem may still do worse on the benchmark.