The following information is relevant only if you need to evaluate on something other than the full set of errors annotated in a gold-standard dataset.

In some circumstances we may want to evaluate system performance on only a subset of the errors annotated in a dataset. A typical scenario is a dataset annotated for a wide range of error types, where we only want to evaluate system performance on the detection and correction of preposition usage errors.

The HOO evaluation and reporting tools therefore allow specific types to be selected for use in the computation and presentation of scores. We provide a configuration mechanism that maps abstract categories to the error types indicated in edit structures. This is achieved by means of a types.config file, which should exist in the directory in which the tools are run; the contents of a typical types.config file are shown here. In this example, each aggregate line indicates a user-specified mapping from an abstract category (for example, conj) to a set of error types used in edit structures (for example, RC, UC and MC). These aggregations are useful in organising the reporting of type-based evaluations, since the granularity of edit types used in annotations is often finer than is required for reporting.
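
To make the shape of this mapping concrete, the following is a minimal Python sketch of how aggregate clauses could be read into a category-to-types table. It is not the HOO tools' own code: the function name parse_types_config is hypothetical, and the sketch assumes one whitespace-separated clause per line, with the clause for conj written in the same form as the aggregate lines given further down this page.

```python
def parse_types_config(text):
    """Parse 'aggregate <category> <type> [<type> ...]' lines into a dict.

    Assumption: one whitespace-separated clause per line; lines that are
    not aggregate clauses are ignored.
    """
    mapping = {}
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines and anything that is not a complete aggregate clause.
        if len(fields) < 3 or fields[0] != "aggregate":
            continue
        category, types = fields[1], fields[2:]
        # Repeated clauses for one category accumulate their types.
        mapping.setdefault(category, []).extend(types)
    return mapping


# Clause inferred from the description above: conj covers RC, UC and MC.
example = "aggregate conj RC UC MC\n"
print(parse_types_config(example))  # {'conj': ['RC', 'UC', 'MC']}
```

An evaluation restricted to conj would then consider only edits whose type is one of the three listed values.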

Nothing stops the user from reporting at the level of the fine-grained types that exist in the data. This can be achieved by providing aggregate clauses that simply map the individual types of interest to themselves. For the sake of clarity, however, it may make sense to use distinct labels; for example, we might have:

aggregate WrongPunct RP
aggregate MissingPunct MP
aggregate UnnecessaryPunct UP

This has the added benefit of providing a means of controlling the particular type labels used in outputs.
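
As an illustration of how such one-to-one clauses control output labels, here is a small Python sketch that inverts the three clauses above into a type-to-label lookup and counts edits per label. The helper names label_for_type and count_by_label, the list of edit types, and the placeholder type XX are all hypothetical; the real tools may organise this differently.

```python
from collections import Counter

CONFIG = """\
aggregate WrongPunct RP
aggregate MissingPunct MP
aggregate UnnecessaryPunct UP
"""


def label_for_type(config_text):
    """Invert aggregate clauses into an {edit type: reporting label} lookup."""
    lookup = {}
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "aggregate":
            for edit_type in fields[2:]:
                lookup[edit_type] = fields[1]
    return lookup


def count_by_label(edit_types, lookup):
    """Count edits per reporting label, ignoring types that are not selected."""
    return Counter(lookup[t] for t in edit_types if t in lookup)


lookup = label_for_type(CONFIG)
# Hypothetical edit types from some annotated data; XX is not selected.
counts = count_by_label(["RP", "MP", "RP", "XX"], lookup)
print(dict(counts))  # {'WrongPunct': 2, 'MissingPunct': 1}
```

Types that appear in no aggregate clause, such as XX here, simply drop out of the report.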
