Precision and Recall
- I've said things like: with POS tagging, the best systems get
"about 98% of the words right."
- This is a simplification; as in information retrieval,
most of this TE work is evaluated with Precision and Recall.
- In the case of POS tagging, each word gets a tag, so a single
accuracy figure like 98% is reasonable,
- but in general, a system doesn't have to (or may not be able
to) propose an answer for each question. Also, it may want to
propose multiple answers.
- So, given a set of answers A (generated by the system) and a
set of gold standard answers G, how correct is the system?
- Precision is the fraction of the answers the system gives
that are correct: count(A intersect G)/count(A).
- It's pretty easy to get a high precision; only guess when you're
very confident.
- Recall is the fraction of the gold standard answers that the
system gets: count(A intersect G)/count(G).
- It's pretty easy to get a high recall; just give lots of different
answers for each question.
- The trick is to get both high recall and high precision.
- So, typically an f-measure is used, which is the harmonic mean
of P and R: 2PR/(P+R).
- In the case of POS all the measures are close to .98.
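- To make the definitions concrete, here is a minimal sketch in Python.
It assumes answers can be represented as (question, answer) pairs in
sets; the function name precision_recall_f and the toy data are
made up for illustration, not taken from any particular system.

```python
def precision_recall_f(system_answers, gold_answers):
    """Return (precision, recall, f-measure) for two collections of answers.

    Assumption: an answer counts as correct only if it matches a gold
    answer exactly, so set intersection is enough to count correct answers.
    """
    a = set(system_answers)   # A: answers generated by the system
    g = set(gold_answers)     # G: gold standard answers
    correct = len(a & g)      # count(A intersect G)

    # Guard against empty sets so we never divide by zero.
    precision = correct / len(a) if a else 0.0
    recall = correct / len(g) if g else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data: four questions, one gold answer each.
gold = {("q1", "Paris"), ("q2", "1969"), ("q3", "Everest"), ("q4", "Mozart")}

# The system skips q4 and proposes two answers each for q2 and q3.
system = {
    ("q1", "Paris"),
    ("q2", "1969"), ("q2", "1968"),
    ("q3", "K2"), ("q3", "Everest"),
}

p, r, f = precision_recall_f(system, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Here 3 of the 5 proposed answers are correct (P = 3/5) and 3 of the 4
gold answers are found (R = 3/4). Guessing twice on q2 and q3 helps
recall but costs precision, which is the trade-off described above.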