Precision and Recall

I've said things like "with POS tagging, the best systems get about 98% of the words right."
This is a simplification. As with information retrieval systems, most of this TE work is evaluated using precision and recall.
In the case of POS tagging, each word gets exactly one tag, so reporting 98% is reasonable.
In general, though, a system doesn't have to (or may not be able to) propose an answer for every question, and it may also want to propose multiple answers.
So, given a set of answers A (generated by the system) and a set of gold-standard answers G, how correct is the system?
Precision is the fraction of the answers the system gives that are correct: count(A intersect G)/count(A).
It's pretty easy to get high precision: only guess when you're very confident.
Recall is the fraction of the correct (gold-standard) answers that the system actually gives: count(A intersect G)/count(G).
It's pretty easy to get high recall: just give lots of different answers for each question.
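The two definitions and the two cheap strategies can be sketched in Python. This is a minimal sketch assuming answers are hashable items (here hypothetical (question, answer) pairs) so that A and G can be Python sets; returning 0.0 for an empty set is an assumed convention, not part of the definitions above.

```python
def precision(system: set, gold: set) -> float:
    """count(A intersect G) / count(A); 0.0 if the system proposes nothing (assumed convention)."""
    return len(system & gold) / len(system) if system else 0.0

def recall(system: set, gold: set) -> float:
    """count(A intersect G) / count(G); 0.0 if there are no gold answers (assumed convention)."""
    return len(system & gold) / len(gold) if gold else 0.0

# Hypothetical gold standard: one correct answer for each of four questions.
gold = {("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")}

# A middle-of-the-road system: answers three questions, two of them correctly.
system = {("q1", "a"), ("q2", "b"), ("q3", "x")}
print(precision(system, gold), recall(system, gold))  # 0.6666666666666666 0.5

# "Only guess when you're very confident": one answer, and it is right.
cautious = {("q1", "a")}
print(precision(cautious, gold), recall(cautious, gold))  # 1.0 0.25

# "Give lots of different answers": every candidate answer for every question.
greedy = {(q, c) for q in ("q1", "q2", "q3", "q4") for c in "abcdx"}
print(precision(greedy, gold), recall(greedy, gold))  # 0.2 1.0
```

The cautious system gets perfect precision but finds only a quarter of the answers; the greedy one finds everything but four out of five of its answers are wrong.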
The trick is to get both high precision and high recall.
So, typically the F-measure is used, which is the harmonic mean of the two: F = 2PR/(P+R).
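A small Python sketch of the F-measure formula, applied to the precision/recall pairs of the hypothetical cautious and greedy systems. Returning 0.0 when P + R = 0 is an assumed convention to avoid dividing by zero.

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R); 0.0 if both are 0 (assumed convention)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)

print(f_measure(1.0, 0.25))            # 0.4  (cautious: perfect P, low R)
print(round(f_measure(0.2, 1.0), 3))   # 0.333 (greedy: low P, perfect R)
print(f_measure(0.5, 0.5))             # 0.5  (balanced)
```

The harmonic mean is pulled toward the smaller of the two values: the greedy system's arithmetic mean would be 0.6, but its F is only about 0.333, so neither cheap strategy scores well.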
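In the case of POS tagging, the system proposes exactly one tag per word, so count(A) = count(G); precision, recall, and F-measure are therefore all equal to the per-word accuracy, close to .98.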
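To check that the measures coincide for POS tagging, one can represent a tagging as a set of (position, tag) pairs. This is a sketch with a hypothetical five-word tag sequence in which the tagger gets four of five tags right.

```python
def tag_pairs(tags):
    """Represent a tagging as a set of (position, tag) pairs so set-based P/R apply."""
    return set(enumerate(tags))

# Hypothetical gold and system tags for a five-word sentence (one tag per word).
gold_tags   = ["DT", "NN", "VBZ", "JJ", "NN"]
system_tags = ["DT", "NN", "VBZ", "NN", "NN"]

A, G = tag_pairs(system_tags), tag_pairs(gold_tags)
correct = len(A & G)
p = correct / len(A)   # precision
r = correct / len(G)   # recall; len(A) == len(G) because each word gets exactly one tag
accuracy = sum(s == g for s, g in zip(system_tags, gold_tags)) / len(gold_tags)
print(p, r, accuracy)  # 0.8 0.8 0.8
```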