Annotations | ACL RD-TEC
Annotation Files
This folder contains all the candidate terms that have been manually annotated as a technology term (marked by 2), a valid term (marked by 1), or an invalid term (marked by 0).
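The three annotation codes can be decoded with a small lookup table. The following is a minimal sketch in Python, assuming the annotations have already been read into (term, code) pairs; the helper names and the sample terms are hypothetical illustrations and are not taken from the annotation files.

```python
from collections import Counter

# Annotation codes as described above:
# 2 = technology term, 1 = valid term, 0 = invalid term.
LABEL_NAMES = {2: "technology term", 1: "valid term", 0: "invalid term"}


def label_name(code):
    """Return the readable category for an annotation code (0, 1 or 2)."""
    try:
        return LABEL_NAMES[int(code)]
    except (KeyError, TypeError, ValueError):
        raise ValueError(f"unknown annotation code: {code!r}")


def count_by_label(annotated_terms):
    """Count how many (term, code) pairs fall into each category."""
    return Counter(label_name(code) for _term, code in annotated_terms)


# Hypothetical sample pairs, for illustration only.
sample = [
    ("machine translation", 2),
    ("parse tree", 1),
    ("in this paper", 0),
    ("speech recognition", 2),
]
counts = count_by_label(sample)
```

Accepting the code as either an integer or a numeric string is a convenience, since codes read from a text file usually arrive as strings.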
Annotation files available for download:
Size: | Name: | Description:
719.306 | _ALL_ANNOTATED_CANDID_TERM.ZIP | The list of more than 80,000 manually annotated terms. Each line of the file represents the following information:
83.081.599 | _ALL_ANNOTATION_IN_SENTENCE.ZIP | This file lists all the sentence IDs from the SEPID_CORPUS that contain at least one valid or technology term. If a sentence contains more than one valid or technology term, it is listed more than once. Annotated invalid terms are excluded from this file. The structure of this file is as follows:
2.530.845 | _ALL_ANNOTATION_MAP_TO_ACL_ARC_ID.ZIP | This file gives a mapping between the publications in ACL ARC and the annotated valid terms and technology terms. The structure of the file is as follows:
2.858.754 | _ALL_ANNOTATION_MAP_TO_ACL_ARC_ID_HUMAN_READABLE.ZIP | This file has the same content as the _ALL_ANNOTATION_MAP_TO_ACL_ARC_ID.ZIP file above, but in a human-readable format. Technology terms and valid terms are mapped onto ACL ARC identifiers, and the titles of the papers and their authors are also given.
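Because _ALL_ANNOTATION_IN_SENTENCE.ZIP lists a sentence once for every valid or technology term it contains, it can be convenient to collapse the repeated entries into one record per sentence. The following is a minimal sketch in Python, assuming the file has already been parsed into (sentence_id, term) pairs. The on-disk layout is not reproduced on this page, so the parsing step is left out, and the sentence IDs and terms below are hypothetical.

```python
from collections import defaultdict


def group_terms_by_sentence(records):
    """Collapse repeated sentence IDs into a sentence_id -> [terms] mapping.

    `records` is an iterable of (sentence_id, term) pairs; this pair layout
    is an assumption, not the documented structure of the file.
    """
    grouped = defaultdict(list)
    for sentence_id, term in records:
        # Keep first-seen order and skip exact repeats of the same term.
        if term not in grouped[sentence_id]:
            grouped[sentence_id].append(term)
    return dict(grouped)


# Hypothetical records, for illustration only.
records = [
    ("S1", "machine translation"),
    ("S1", "parse tree"),
    ("S2", "speech recognition"),
]
by_sentence = group_terms_by_sentence(records)
```

With this mapping, the number of annotated terms in a sentence is simply the length of its list.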
Additional Notes:
- To locate terms in the corpus at any level of text-segment granularity (e.g. paragraph, section, etc.), or to locate them by a specific location (e.g. in the topic sentences only), please use the provided index files for the lists of candidate terms.
- Using the additional indices provided in SEPID_CORPUS, you can map the annotated terms onto publications (as in the above _ALL_ANNOTATION_MAP_TO_ACL_ARC_ID.ZIP), people, institutes, etc. These annotations can therefore be combined with the annotation layers provided in the ACL ARC, e.g. the citation network.