The ACL RD-TEC 1.0
The ACL Reference Dataset for Terminology Extraction and Classification (ACL RD-TEC), ver. 1.0, was developed for benchmarking automatic term recognition algorithms (see QasemiZadeh and Handschuh, 2014). A manually validated terminology is the main component of ACL RD-TEC 1.0.
It comprises more than 80,000 manually annotated candidate terms, each labelled as a valid term, an invalid term, or a technology term. More than 25,000 candidates are valid terms, of which about 13,000 are technology terms. In short, "technology terms" are the items of computational linguistics jargon that denote processes, methods and algorithms: terms that signify practical solutions to NLP problems.
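As a rough illustration of how a validated terminology of this kind can be used for benchmarking, the sketch below computes precision at k for a ranked list of extracted candidates against a set of valid terms. The function name `precision_at_k` and the toy term strings are hypothetical, and no assumption is made about the actual file format of the resource.

```python
def precision_at_k(ranked_candidates, valid_terms, k):
    """Fraction of the top-k ranked candidates that are annotated as valid.

    If fewer than k candidates are supplied, the score is computed over
    the candidates that are actually available.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_candidates[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for term in top_k if term in valid_terms)
    return hits / len(top_k)


# Hypothetical toy data, not taken from the dataset.
valid_terms = {"machine translation", "parsing", "language model"}
ranked = ["machine translation", "previous work", "parsing", "same time"]

p_at_2 = precision_at_k(ranked, valid_terms, 2)
p_at_4 = precision_at_k(ranked, valid_terms, 4)
```

In a real experiment, the set of valid terms would be read from the annotated term list, and the ranked list would come from the term recognition algorithm under evaluation.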
So that results obtained in an evaluation experiment can be reproduced, the dataset is built on a fixed set of documents, i.e. the ACL Anthology Reference Corpus (ACL ARC), 2006 release (ibid.). However, other uses are also possible; for instance, see the list of 13 semantically related terms in the ACL ARC 2.0. The annotation process for validating terms was carried out by Behrang QasemiZadeh as part of his PhD research. The annotation was done on the output of the extraction method explained in Zadeh and Handschuh (2014b).
The complete resource, including the preprocessed and segmented ACL ARC, is available from atmykitchen.info.
Here, you can also download the complete list of annotated candidate terms, the list of annotated terms, and the list of technology terms.
Note that, in a separate effort, QasemiZadeh and Schumann introduced the ACL RD-TEC 2.0, which complements this resource by providing annotations of terms in context.
For commercial use, ACL RD-TEC 1.0 is now also available through ELRA.
Please attribute this dataset by citing Zadeh and Handschuh (2014).
Based on the ACL Anthology Reference Corpus (ACL ARC) at http://acl-arc.comp.nus.edu.sg/. This dataset is also available via ELRA under reference ELRA-T0375. Permissions beyond the scope of this license may be available; for inquiries please contact Behrang QasemiZadeh or ELRA.