Systems vs. People

The precision and recall measurements make use of a gold standard.
The gold standard is developed by people, and it's not perfect.
A good indicator of how reliable the standard is, is inter-annotator agreement.
If the F-measure between annotators is roughly the same as your system's F-measure, then you've done about as well as you could.
In MUC-7, the best Named Entity system got a 93 F-score and inter-annotator agreement was 97.
For template elements, the best score was 87.
For the full template, the inter-annotator score was 97 and the best score was 51 (by SRA).
I got 47 on TE (template elements) and 1 on ST (scenario template) in about a day's work on the new domain.
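
The comparison between annotator agreement and system score can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: annotations are treated as exact-match (start, end, type) tuples, one annotator is treated as the gold standard for the other, and the data and the function name precision_recall_f are made up for the example. The official MUC scorer is more elaborate than this.

```python
def precision_recall_f(gold, response):
    """Exact-match precision, recall and balanced F between two annotation sets."""
    gold, response = set(gold), set(response)
    correct = len(gold & response)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical annotations: (start token, end token, entity type).
annotator_a = {(0, 2, "PERSON"), (5, 6, "ORG"), (9, 10, "LOC"), (12, 14, "ORG")}
annotator_b = {(0, 2, "PERSON"), (5, 6, "ORG"), (9, 10, "LOC"), (12, 13, "ORG")}
system = {(0, 2, "PERSON"), (5, 6, "LOC"), (9, 10, "LOC")}

# Agreement between the two people: annotator A plays the role of gold.
_, _, agreement_f = precision_recall_f(annotator_a, annotator_b)

# System score against the same gold standard.
_, _, system_f = precision_recall_f(annotator_a, system)

print(f"inter-annotator F: {agreement_f:.2f}")
print(f"system F:          {system_f:.2f}")
```

With exact matching, swapping which annotator counts as gold only swaps precision and recall, so the F value between two annotators is the same either way. The gap between the two printed numbers is the room a system still has before it reaches the level of agreement between people.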

Inter-Annotator Agreement