evalrun.py


NAME

evalrun - evaluate system-produced edit sets against a gold-standard dataset

SYNOPSIS

evalrun [-c casematching] [-h] [-m measures] [-o resultsfile] [-r regime] [-t types] goldedits sysedits

DESCRIPTION

Evalrun takes two directories containing sets of files of edit structures, and pairwise compares the contents of corresponding files. The first-specified directory, goldedits, is assumed to contain the gold-standard edits; the second, sysedits, contains the system-produced edits to be evaluated against that gold standard. The results of the evaluation are written to the specified results file, resultsfile; if no results file is specified, the results are written to the standard output.
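
For example, assuming a gold-standard directory named Gold and a directory of system-produced edits named Run1 (both names are purely illustrative), the following invocation writes the evaluation results to results.txt:

    evalrun -o results.txt Gold Run1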

Evalrun first ensures that there is a system-produced edit set corresponding to each edit set in the gold-standard dataset, correspondence being determined by filename matching as specified in the HOO File Naming Conventions. If any gold-standard edit set has no corresponding system edit set, evalrun produces an error message and exits without performing any evaluation.

If there is a system-produced edit set corresponding to each gold-standard edit set, evalrun calls the appropriate functions exported by evalfrag for each such pair, passing on the -m, -r and -t options provided. If any of these options is not specified, its default value is inserted into the calls to the relevant functions. The output for each pair of edit sets is written to an intermediate results file whose name is derived from the corresponding fragment name.
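
Since prf, nobonus and all are the documented defaults for the -m, -r and -t options respectively, the following two invocations are equivalent (the directory names are again illustrative):

    evalrun Gold Run1
    evalrun -m prf -r nobonus -t all Gold Run1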

Once all the pairs of edit sets have been evaluated, the contents of these intermediate results files are combined to produce a dataset results file. The dataset results file thus provides data on a system's ability to detect, recognize and correct errors of the specified types for a given gold-standard dataset, under a specified scoring regime.
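
Because the results are written to the standard output when -o is not given, the dataset results file can equally well be captured by shell redirection:

    evalrun Gold Run1 > results.txt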

OPTIONS
-c casematching
Specifies whether system corrections should case-match gold-standard corrections. See evalfrag for an explanation.
-h
Prints out a help message and exits.
-m measures
Specifies the measures (Precision, Recall and/or F-score) that should be computed; defaults to prf. See evalfrag for an explanation.
-o resultsfile
Specifies the file to be used to contain the results of the evaluation processing, as outlined in the DESCRIPTION above.
-r regime
Specifies the scoring regime to be used; defaults to nobonus. See evalfrag for an explanation.
-t types
Specifies the error types to be included in the scoring; defaults to all. See evalfrag for an explanation.
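
These options can be combined in a single invocation. For instance, assuming that the measures argument accepts any subset of the letters p, r and f (an assumption here; see evalfrag for the authoritative behaviour), the following invocation scores precision and recall only, under the default nobonus regime, writing the results to results.txt:

    evalrun -m pr -o results.txt Gold Run1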

