The ACL RD-TEC 2.0
As the successor to the ACL RD-TEC 1.0, the ACL RD-TEC 2.0 comprises 300 unique abstracts from the ACL Anthology Corpus, manually annotated for the terms they contain.
These terms are tagged with several categories of computational linguistics concepts: technologies, systems, language resources, language resources (specific product), models, measure and measurement-related terms, and a class label for residuals (i.e., other). See the guidelines here.
Counting each annotator's work separately, there are 471 annotated abstracts in total; 171 of the 300 unique abstracts are annotated by both annotators (myself and Anne Schumann).
The manually annotated corpora produced by each of the participating annotators can be browsed in the NoSkE engine at these links for Annotator 1 and for Annotator 2. To see the annotated terms in an abstract, click the links provided in the A1 and A2 columns of the table below.
More information about the dataset can be found in the following publication:
QasemiZadeh and Schumann, The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods, LREC 2016.
Obtaining the annotated corpus
- The ACL RD-TEC has a permanent home at the LINDAT/CLARIN repository of the Institute of Formal and Applied Linguistics, Charles University in Prague: http://hdl.handle.net/11372/LRT-1661.
- The Git repository containing the data and some tools can also be browsed here.
- You can also use the local NoSkE instances to collect data.
- The corpus is also hosted by the Lindat KonText system at UFAL.
Summary:
- Total number of abstracts annotated by at least one annotator: 300
- Total number of files annotated by the first annotator: 282
- Total number of files annotated by the second annotator: 189
- Total number of files annotated by both annotators: 171
- Average inter-annotator agreement on annotated boundaries: 0.8
- Average inter-annotator agreement on annotated boundaries and their assigned class: 0.71
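The counts in the summary are related by simple inclusion-exclusion. The short check below is only an illustration; the variable names are made up for this sketch, and the numbers are copied from the list above.

```python
# Counts copied from the summary list (variable names are illustrative).
first_annotator = 282   # files annotated by the first annotator
second_annotator = 189  # files annotated by the second annotator
both_annotators = 171   # files annotated by both

# Inclusion-exclusion: abstracts annotated by at least one annotator.
unique_abstracts = first_annotator + second_annotator - both_annotators

# Annotated files when each annotator's copy is counted separately.
annotated_files = first_annotator + second_annotator

# Files seen by exactly one of the two annotators.
only_first = first_annotator - both_annotators
only_second = second_annotator - both_annotators

print(unique_abstracts, annotated_files, only_first, only_second)
# prints: 300 471 111 18
```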
Lists of abstracts:
In the following table, the annotators of each file are marked in the A1 and A2 columns (each mark links to the terms that the annotator tagged in the abstract). IAAS and IAAA show the inter-annotator agreement on term boundaries and on term boundaries together with their semantic classes, respectively. Bear in mind that IAAS is an upper bound on IAAA.
Inter-annotator agreement computed per concept category can be seen here.
Year | ACL ID | Title | A1 | A2 | IAAS | IAAA |
---|---|---|---|---|---|---|
1978 | T78‑1001 | Testing The Psychological Reality of a Representational Model | ✓ | ✓ | 0.87 | 0.82 |
1978 | T78‑1028 | Fragments of a Theory of Human Plausible Reasoning | ✓ | ✓ | 0.86 | 0.78 |
1978 | T78‑1031 | PATH‑BASED AND NODE‑BASED INFERENCE IN SEMANTIC NETWORKS | ✓ | ✓ | 0.94 | 0.73 |
1980 | C80‑1039 | ON FROFF: A TEXT PROCESSING SYSTEM FOR ENGLISH TEXTS AND FIGURES | ✓ | ✓ | 0.53 | 0.53 |
1980 | C80‑1073 | ATNS USED AS A PROCEDURAL DIALOG MODEL | ✓ | ✓ | 0.75 | 0.5 |
1980 | P80‑1004 | Metaphor ‑ A Key to Extensible Semantic Analysis | ✓ | ✓ | 0.69 | 0.21 |
1980 | P80‑1019 | Expanding the Horizons of Natural Language Interfaces | ✓ | ✓ | 0.64 | 0.64 |
1980 | P80‑1026 | Flexiable Parsing | ✓ | ✓ | 0.6 | 0.53 |
1981 | P81‑1032 | Dynamic Strategy Selection in Flexible Parsing | ✓ | |||
1981 | P81‑1033 | A Construction‑Specific Approach to Focused Interaction in Flexible Parsing | ✓ | |||
1982 | C82‑1054 | AN IMPROVED LEFT‑CORNER PARSING ALGORITHM | ✓ | ✓ | 1 | 1 |
1982 | J82‑3002 | An Efficient Easily Adaptable System for Interpreting Natural Language Queries | ✓ | ✓ | 0.88 | 0.85 |
1982 | P82‑1035 | Scruffy Text Understanding: Design and Implementation of 'Tolerant' Understanders | ✓ | ✓ | 0.51 | 0.51 |
1983 | E83‑1021 | AN APPROACH TO NATURAL LANGUAGE IN THE SI‑NETS PARADIGM | ✓ | |||
1983 | E83‑1029 | NATURAL LANGUAGE INPUT FOR SCENE GENERATION | ✓ | |||
1983 | P83‑1003 | Crossed Serial Dependencies: A low‑power parseable extension to GPSG | ✓ | |||
1983 | P83‑1004 | Formal Constraints on Metarules | ✓ | |||
1983 | P83‑1021 | PARSING AS DEDUCTION | ✓ | |||
1984 | P84‑1020 | LIMITED DOMAIN SYSTEMS FOR LANGUAGE TEACHING | ✓ | ✓ | 1 | 1 |
1984 | P84‑1034 | A PROPER TREATMEMT OF SYNTAX AND SEMANTICS IN MACHINE TRANSLATION | ✓ | ✓ | 0.93 | 0.93 |
1984 | P84‑1047 | Entity‑Oriented Parsing | ✓ | ✓ | 0.8 | 0.55 |
1984 | P84‑1064 | A COMPUTATIONAL THEORY OF DISPOSITIONS | ✓ | ✓ | 0.89 | 0.79 |
1984 | P84‑1078 | Controlling Lexical Substitution in Computer Text Generation | ✓ | ✓ | 0.83 | 0.78 |
1985 | E85‑1037 | A PROBLEM SOLVING APPROACH TO GENERATING TEXT FROM SYSTEMIC GRAMMARS | ✓ | |||
1985 | E85‑1041 | THE STRUCTURE OF COMMUNICATIVE CONTEXT OF DIALOGUE INTERACTION | ✓ | |||
1985 | P85‑1015 | Parsing with Discontinuous Constituents | ✓ | |||
1985 | P85‑1019 | Semantic Caseframe Parsing and Syntactic Generality | ✓ | |||
1985 | P85‑1024 | A PRAGMATICS‑BASED APPROACH TO UNDERSTANDING INTERSENTENTIAL ELLIPSIS | ✓ | |||
1986 | C86‑1081 | A LOGICAL FORMALISM FOR THE REPRESENTATION OF DETERMINERS | ✓ | ✓ | 0.86 | 0.86 |
1986 | C86‑1132 | SYNTHESIZING WEATHER FORECASTS FROM FORMATFED DATA | ✓ | ✓ | 0.88 | 0.88 |
1986 | J86‑1002 | THE CORRECTION OF ILL‑FORMED INPUT USING HISTORY‑BASED EXPECTATION WITH APPLICATIONS TO SPEECH UNDERSTANDING | ✓ | ✓ | 0.74 | 0.64 |
1986 | J86‑3001 | Attention, Intentions, And The Structure Of Discourse | ✓ | ✓ | 0.83 | 0.82 |
1986 | J86‑4002 | REFERENCE IDENTIFICATION AND REFERENCE IDENTIFICATION FAILURES | ✓ | ✓ | 0.86 | 0.84 |
1986 | P86‑1011 | The Relationship Between Tree Adjoining Grammars And Head Grammars | ✓ | ✓ | 0.92 | 0.73 |
1986 | P86‑1038 | A LOGICAL SEMANTICS FOR FEATURE STRUCTURES | ✓ | ✓ | 0.84 | 0.66 |
1987 | E87‑1037 | A Comparison of Rule‑Invocation Strategies in Context‑Free Chart Parsing | ✓ | |||
1987 | E87‑1043 | ITERATION, HABITUALITY AND VERB FORM SEMANTICS | ✓ | |||
1987 | J87‑1003 | SIMULTANEOUS‑DISTRIBUTIVE COORDINATION AND CONTEXT‑FREENESS | ✓ | |||
1987 | J87‑3001 | PROCESSING DICTIONARY DEFINITIONS WITH PHRASAL PATTERN HIERARCHIES | ✓ | |||
1987 | P87‑1022 | A CENTERING APPROACH TO PRONOUNS | ✓ | |||
1988 | A88‑1001 | The Multimedia Articulation of Answers in a Natural Language Database Query System | ✓ | ✓ | 0.76 | 0.63 |
1988 | A88‑1003 | An Architecture for Anaphora Resolution | ✓ | ✓ | 0.53 | 0.43 |
1988 | C88‑1007 | Machine Translation Using Isomorphic UCGs | ✓ | ✓ | 0.59 | 0.56 |
1988 | C88‑1044 | On the Generation and Interpretation of Demonstrative Expressions | ✓ | ✓ | 0.71 | 0.71 |
1988 | C88‑1066 | Parsing with Category Coocurrence Restrictions | ✓ | ✓ | 0.87 | 0.69 |
1988 | C88‑2086 | Solving Some Persistent Presupposition Problems | ✓ | ✓ | 0.57 | 0.57 |
1988 | C88‑2130 | Directing the Generation of Living Space Descriptions | ✓ | ✓ | 1 | 0.67 |
1988 | C88‑2132 | Island Parsing and Bidirectional Charts | ✓ | ✓ | 0.55 | 0.48 |
1988 | C88‑2160 | Interactive Translation : a new approach | ✓ | ✓ | 0.43 | 0.43 |
1988 | C88‑2162 | NETL: A System for Representing and Using Real‑World Knowledge | ✓ | ✓ | 0.83 | 0.45 |
1988 | C88‑2166 | COMPLEX: A Computational Lexicon for Natural Language Systems | ✓ | ✓ | 0.7 | 0.64 |
1988 | J88‑3002 | MODELING THE USER IN NATURAL LANGUAGE SYSTEMS | ✓ | ✓ | 0.9 | 0.86 |
1989 | E89‑1006 | TENSES AS ANAPHORA | ✓ | |||
1989 | E89‑1016 | User studies and the design of Natural Language Systems | ✓ | |||
1989 | H89‑1027 | THE MIT SUMMIT SPEECH RECOGNITION SYSTEM: A PROGRESS REPORT | ✓ | |||
1989 | H89‑1036 | Lexicalized TAGs, Parsing and Lexicons | ✓ | |||
1989 | H89‑2019 | A PROPOSAL FOR SLS EVALUATION | ✓ | |||
1989 | H89‑2028 | A CSR‑NL INTERFACE SPECIFICATION: Version 1.5 | ✓ | |||
1989 | J89‑4003 | A FORMAL MODEL FOR CONTEXT‑FREE LANGUAGES AUGMENTED WITH REDUPLICATION | ✓ | |||
1989 | P89‑1008 | CONVERSATIONALLY RELEVANT DESCRIPTIONS | ✓ | |||
1990 | C90‑1013 | Generation for Dialogue Translation Using Typed Feature Structure Unification | ✓ | ✓ | 0.91 | 0.67 |
1990 | C90‑2032 | Sentence disambiguation by document preference sets oriented | ✓ | ✓ | 0.7 | 0.7 |
1990 | C90‑3014 | A phonological knowledge base system using unification‑based formalism: a case study of Korean phonology | ✓ | ✓ | 0.67 | 0.33 |
1990 | C90‑3045 | Synchronous Tree‑Adjoining Grammars | ✓ | ✓ | 0.71 | 0.67 |
1990 | C90‑3046 | Japanese Sentence Analysis as Argumentation | ✓ | ✓ | 0.88 | 0.62 |
1990 | C90‑3063 | Automatic Processing of Large Corpora for the Resolution of Anaphora References | ✓ | ✓ | 0.69 | 0.36 |
1990 | C90‑3072 | Spelling‑checking for Highly Inflective Languages | ✓ | ✓ | 0.86 | 0.83 |
1990 | H90‑1016 | Toward a Real‑Time Spoken Language System Using Commercial Hardware | ✓ | ✓ | 0.5 | 0.42 |
1990 | H90‑1060 | A New Paradigm for Speaker‑Independent Training and Speaker Adaptation | ✓ | ✓ | 0.83 | 0.74 |
1990 | J90‑3002 | AN EDITOR FOR THE EXPLANATORY AND COMBINATORY DICTIONARY OF CONTEMPORARY FRENCH (DECFC) | ✓ | ✓ | 0.88 | 0.83 |
1990 | P90‑1014 | Free Indexation: Combinatorial Analysis and A Compositional Algorithm | ✓ | ✓ | 0.75 | 0.46 |
1991 | E91‑1012 | Non‑deterministic Recursive Ascent Parsing | ✓ | |||
1991 | E91‑1043 | A BIDIRECTIONAL MODEL FOR NATURAL LANGUAGE PROCESSING | ✓ | |||
1991 | E91‑1050 | A Language for the Statement of Binary Relations over Feature Structures | ✓ | |||
1991 | H91‑1010 | New Results with the Lincoln Tied‑Mixture HMM CSR System | ✓ | |||
1991 | H91‑1067 | Automatic Acquisition of Subcategorization Frames from Tagged Text | ✓ | |||
1991 | H91‑1077 | A PROPOSAL FOR LEXICAL DISAMBIGUATION | ✓ | |||
1991 | P91‑1016 | The Acquisition and Application of Context Sensitive Grammar for English | ✓ | |||
1991 | P91‑1025 | Resolving Translation Mismatches With Information Flow | ✓ | |||
1992 | A92‑1026 | Robust Processing of Real‑World Natural‑Language Texts | ✓ | ✓ | 0.77 | 0.77 |
1992 | A92‑1027 | An Efficient Chart‑based Algorithm for Partial‑Parsing of Unrestricted Texts | ✓ | ✓ | 0.84 | 0.8 |
1992 | C92‑1052 | Temporal Structure of Discourse | ✓ | ✓ | 0.92 | 0.73 |
1992 | C92‑1055 | Syntactic Ambiguity Resolution Using A Discrimination and Robustness Oriented Adaptive Learning Algorithm | ✓ | ✓ | 0.76 | 0.71 |
1992 | C92‑2068 | Quasi‑Destructive Graph Unification with Structure‑Sharing | ✓ | ✓ | 0.5 | 0.5 |
1992 | C92‑2115 | A Similarity‑Driven Transfer System | ✓ | ✓ | 1 | 0.91 |
1992 | C92‑3165 | Interactive Speech Understanding | ✓ | ✓ | 0.67 | 0.67 |
1992 | C92‑4199 | Recognizing Unregistered Names for Mandarin Word Identification | ✓ | ✓ | 0.76 | 0.7 |
1992 | C92‑4207 | Reconstructing Spatial Image from Natural Language Texts | ✓ | ✓ | 0.58 | 0.48 |
1992 | H92‑1003 | Multi‑Site Data Collection for a Spoken Language Corpus: MADCOW | ✓ | ✓ | 0.94 | 0.88 |
1992 | H92‑1010 | Spoken Language Processing in the Framework of Human‑Machine Communication at LIMSI | ✓ | ✓ | 0.89 | 0 |
1992 | H92‑1016 | The MIT ATIS System: February 1992 Progress Report | ✓ | ✓ | 0.8 | 0.75 |
1992 | H92‑1017 | Recent Improvements and Benchmark Results for Paramax ATIS System | ✓ | ✓ | 0.67 | 0.59 |
1992 | H92‑1026 | Towards History‑based Grammars: Using Richer Models for Probabilistic Parsing | ✓ | ✓ | 0.74 | 0.55 |
1992 | H92‑1036 | MAP Estimation of Continuous Density HMM: Theory and Applications | ✓ | ✓ | 0.83 | 0.79 |
1992 | H92‑1045 | One Sense Per Discourse | ✓ | ✓ | 0.83 | 0.77 |
1992 | H92‑1060 | A Relaxation Method for Understanding Spontaneous Speech Utterances | ✓ | ✓ | 0.6 | 0.46 |
1992 | H92‑1074 | CSR Corpus Development | ✓ | ✓ | 0.69 | 0.64 |
1992 | H92‑1095 | Language Understanding Research at Paramax | ✓ | ✓ | 0.74 | 0.74 |
1992 | M92‑1025 | GE NLTOOLSET: DESCRIPTION OF THE SYSTEM AS USED FOR MUC‑4 | ✓ | |||
1993 | E93‑1004 | Talking About Trees | ✓ | |||
1993 | E93‑1013 | LFG Semantics via Constraints | ✓ | |||
1993 | E93‑1020 | A Computational Treatment of Sentence‑Final 'then' | ✓ | |||
1993 | E93‑1023 | A Probabilistic Context‑free Grammar for Disambiguation in Morphological Parsing | ✓ | |||
1993 | E93‑1025 | A Discourse Copying Algorithm for Ellipsis and Anaphora Resolution | ✓ | |||
1993 | E93‑1043 | Coping With Derivation in a Morphological Component | ✓ | |||
1993 | H93‑1076 | Speech and Text‑Image Processing in Documents | ✓ | |||
1993 | P93‑1014 | A UNIFICATION‑BASED PARSER FOR RELATIONAL GRAMMAR | ✓ | |||
1994 | A94‑1007 | Symmetric Pattern Matching Analysis for English Coordinate Structures | ✓ | ✓ | 0.84 | 0.63 |
1994 | A94‑1011 | Exploiting Sophisticated Representations for Document Retrieval | ✓ | ✓ | 0.77 | 0.66 |
1994 | A94‑1017 | Real‑Time Spoken Language Translation Using Associative Processors | ✓ | ✓ | 0.79 | 0.68 |
1994 | A94‑1026 | Handling Japanese Homophone Errors in Revision Support System for Japanese Texts; REVISE | ✓ | ✓ | 0.82 | 0.82 |
1994 | C94‑1026 | A Part‑of‑Speech‑Based Alignment Algorithm | ✓ | ✓ | 0.86 | 0.74 |
1994 | C94‑1030 | AN EVALUATION TO DETECT AND CORRECT ERRONEOUS CHARACTERS WRONGLY SUBSTITUTED, DELETED AND INSERTED IN JAPANESE AND ENGLISH SEN~IENCES USING MARKOV MODELS | ✓ | ✓ | 0.93 | 0.86 |
1994 | C94‑1052 | TGE: Tlinks Generation Environment. | ✓ | ✓ | 0.84 | 0.71 |
1994 | C94‑1061 | CONCURRENT LEXICAIJZEID DEPENDENCY PARSING: THE ParseTalk MODEL | ✓ | ✓ | 0.83 | 0.6 |
1994 | C94‑1077 | Emergent Parsing and Generation with Generalized | ✓ | ✓ | 0.92 | 0.88 |
1994 | C94‑1079 | PRINCIPAR‑‑An Efficient, Broad‑coverage, Principle‑based Parser | ✓ | ✓ | 0.74 | 0.5 |
1994 | C94‑1080 | CONCURRENT LEXICALIZED DEPENDENCY PARSING: A BEHAVIORAL VIEW ON ParseTalk EVENTS | ✓ | ✓ | 0.84 | 0.71 |
1994 | C94‑1082 | LHIP: Extended DCGs for Configurable Robust Parsing | ✓ | ✓ | 0.6 | 0.6 |
1994 | C94‑2151 | Non‑constituent coordination: Theory and practice | ✓ | ✓ | 0.75 | 0.75 |
1994 | H94‑1014 | Language Modeling with Sentence‑Level Mixtures | ✓ | ✓ | 0.86 | 0.81 |
1994 | H94‑1034 | Tagging Speech Repairs | ✓ | ✓ | 0.75 | 0.7 |
1994 | H94‑1084 | Integrated Text and Image Understanding for Document Understanding | ✓ | ✓ | 0.76 | 0.76 |
1995 | E95‑1021 | Tagging French ‑ comparing a statistical and a constraint‑based method | ✓ | |||
1995 | E95‑1033 | ParseTalk about Sentence‑ and Text‑Level Anaphora | ✓ | |||
1995 | E95‑1036 | Splitting the Reference Time: Temporal Anaphora and Quantification in DRT | ✓ | |||
1995 | P95‑1013 | Compilation of HPSG to TAG | ✓ | |||
1995 | P95‑1025 | Statistical Sense Disambiguation with Relatively Small Corpora Using Dictionary Definitions | ✓ | |||
1995 | P95‑1027 | A Quantitative Evaluation of Linguistic Tests for the Automatic Prediction of Semantic Markedness | ✓ | |||
1995 | P95‑1034 | Two‑Level, Many‑Paths Generation | ✓ | |||
1995 | P95‑1053 | Conciseness through Aggregation in Text Generation | ✓ | |||
1997 | A97‑1015 | The Domain Dependence of Parsing | ✓ | |||
1997 | A97‑1020 | Reading more into Foreign Languages | ✓ | |||
1997 | A97‑1021 | Large‑Scale Acquisition of LCS‑Based Lexicons for Foreign Language Tutoring | ✓ | |||
1997 | A97‑1027 | Dutch Sublanguage Semantic Tagging combined with Mark‑Up Technology | ✓ | |||
1997 | A97‑1028 | A Statistical Profile of the Named Entity Task | ✓ | |||
1997 | A97‑1034 | Using SGML as a Basis for Data‑Intensive NLP | ✓ | |||
1997 | A97‑1042 | Identifying Topics by Position | ✓ | |||
1997 | A97‑1050 | Semi‑Automatic Acquisition of Domain‑Specific Translation Lexicons | ✓ | |||
1997 | A97‑1052 | Corpus Data TP FP FN | ✓ | |||
1997 | P97‑1002 | Fast Context‑Free Parsing Requires Fast Boolean Matrix Multiplication | ✓ | |||
1997 | P97‑1006 | Document Classification Using a Finite Mixture Model | ✓ | |||
1997 | P97‑1015 | Probing the lexicon in evaluating commercial MT systems | ✓ | |||
1997 | P97‑1017 | Machine Transliteration | ✓ | |||
1997 | P97‑1026 | Sentence Planning as Description Using Tree Adjoining Grammar | ✓ | |||
1997 | P97‑1040 | Efficient Generation in Primitive Optimality Theory | ✓ | |||
1997 | P97‑1050 | Efficient Construction of Underspecified Semantics under Massive Ambiguity | ✓ | |||
1997 | P97‑1052 | On Interpreting F‑Structures as UDRSs | ✓ | |||
1997 | P97‑1058 | Approximating Context‑Free Grammars with a Finite‑State Calculus | ✓ | |||
1997 | P97‑1071 | Contrastive accent in a data‑to‑speech system | ✓ | |||
1997 | P97‑1072 | Towards resolution of bridging descriptions | ✓ | |||
1999 | E99‑1005 | Determinants of Adjective‑Noun Plausibility | ✓ | |||
1999 | E99‑1014 | Full Text Parsing using Cascades of Rules: an Information Extraction Perspective | ✓ | |||
1999 | E99‑1015 | An annotation scheme for discourse‑level argumentation in research articles | ✓ | |||
1999 | E99‑1023 | Representing Text Chunks | ✓ | |||
1999 | E99‑1029 | Parsing with an Extended Domain of Locality | ✓ | |||
1999 | E99‑1034 | Finding content‑bearing terms using term similarities | ✓ | |||
1999 | E99‑1038 | Focusing on focus: a formalization | ✓ | |||
1999 | P99‑1022 | Dynamic Nonlocal Language Modeling via Hierarchical Topic‑Based Adaptation | ✓ | |||
1999 | P99‑1025 | Construct Algebra: Analytical Dialog Management | ✓ | |||
1999 | P99‑1036 | A Part of Speech Estimation Method for Japanese Unknown Words using a Statistical Model of Morphology and Context | ✓ | |||
1999 | P99‑1038 | Two Accounts of Scope Availability and Semantic Underspecification | ✓ | |||
1999 | P99‑1058 | A semantically‑derived subset of English for hardware verification | ✓ | |||
1999 | P99‑1062 | Semantic Analysis of Japanese Noun Phrases : A New Approach to Dictionary‑Based Understanding | ✓ | |||
1999 | P99‑1068 | Mining the Web for Bilingual Text | ✓ | |||
1999 | P99‑1080 | A Pylonic Decision‑Tree Language Model with Optimal Question Selection | ✓ | |||
2001 | H01‑1001 | Activity detection for information access to oral communication | ✓ | ✓ | 0.74 | 0.43 |
2001 | H01‑1017 | Dialogue Interaction with the DARPA Communicator Infrastructure: The Development of Useful Software | ✓ | ✓ | 0.62 | 0.62 |
2001 | H01‑1040 | Intelligent Access to Text: Integrating Information Extraction Technology into Text Browsers | ✓ | ✓ | 0.83 | 0.67 |
2001 | H01‑1041 | Interlingua‑Based Broad‑Coverage Korean‑to‑English Translation in CCLING | ✓ | ✓ | 0.84 | 0.82 |
2001 | H01‑1042 | Is That Your Final Answer? | ✓ | ✓ | 0.95 | 0.95 |
2001 | H01‑1049 | Listen‑Communicate‑Show (LCS): Spoken Language Command of Agent‑based Remote Information Access | ✓ | ✓ | 0.9 | 0.9 |
2001 | H01‑1055 | Natural Language Generation in Dialog Systems | ✓ | ✓ | 1 | 1 |
2001 | H01‑1058 | On Combining Language Models : Oracle Approach | ✓ | ✓ | 0.98 | 0.88 |
2001 | H01‑1068 | A Three‑Tiered Evaluation Approach for Interactive Spoken Dialogue Systems | ✓ | ✓ | 0.67 | 0.67 |
2001 | H01‑1070 | Towards an Intelligent Multilingual Keyboard System | ✓ | ✓ | 1 | 1 |
2001 | N01‑1003 | SPoT: A Trainable Sentence Planner | ✓ | ✓ | 0.84 | 0.82 |
2001 | P01‑1004 | Low‑cost, High‑performance Translation Retrieval: Dumber is Better | ✓ | ✓ | 0.91 | 0.91 |
2001 | P01‑1007 | Guided Parsing of Range Concatenation Languages | ✓ | ✓ | 0.78 | 0.68 |
2001 | P01‑1008 | Extracting Paraphrases from a Parallel Corpus | ✓ | ✓ | 0.95 | 0.95 |
2001 | P01‑1009 | Alternative Phrases and Natural Language Information Retrieval | ✓ | ✓ | 0.69 | 0.64 |
2001 | P01‑1047 | Extending Lambek grammars: a logical account of minimalist grammars | ✓ | ✓ | 0.93 | 0.93 |
2001 | P01‑1056 | Evaluating a Trainable Sentence Planner for a Spoken Dialogue System | ✓ | ✓ | 0.93 | 0.93 |
2001 | P01‑1070 | Using Machine Learning Techniques to Interpret WH‑questions | ✓ | ✓ | 0.8 | 0.67 |
2003 | N03‑1001 | Effective Utterance Classification with Unsupervised Phonotactic Models | ✓ | ✓ | 0.94 | 0.87 |
2003 | N03‑1004 | In Question Answering, Two Heads Are Better Than One | ✓ | ✓ | 0.76 | 0.76 |
2003 | N03‑1012 | Semantic Coherence Scoring Using an Ontology | ✓ | ✓ | 0.96 | 0.92 |
2003 | N03‑1017 | Statistical Phrase‑Based Translation | ✓ | ✓ | 1 | 1 |
2003 | N03‑1018 | A Generative Probabilistic OCR Model for NLP Applications | ✓ | ✓ | 0.78 | 0.71 |
2003 | N03‑1026 | Statistical Sentence Condensation using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical‑Functional Grammar | ✓ | ✓ | 0.76 | 0.63 |
2003 | N03‑1033 | Feature‑Rich Part‑of‑Speech Tagging with a Cyclic Dependency Network | ✓ | ✓ | 0.88 | 0.73 |
2003 | N03‑2003 | Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class‑Dependent Mixtures | ✓ | ✓ | 0.92 | 0.92 |
2003 | N03‑2006 | Adaptation Using Out‑of‑Domain Corpus within EBMT | ✓ | ✓ | 0.67 | 0.61 |
2003 | N03‑2015 | Unsupervised Learning of Morphology for English and Inuktitut | ✓ | ✓ | 0.97 | 0.97 |
2003 | N03‑2017 | Word Alignment with Cohesion Constraint | ✓ | ✓ | 1 | 0.73 |
2003 | N03‑2025 | Bootstrapping for Named Entity Tagging Using Concept‑based Seeds | ✓ | ✓ | 0.84 | 0.78 |
2003 | N03‑2036 | A Phrase‑Based Unigram Model for Statistical Machine Translation | ✓ | ✓ | 0.88 | 0.85 |
2003 | N03‑3010 | Cooperative Model Based Language Understanding in Dialogue | ✓ | ✓ | 0.95 | 0.95 |
2003 | N03‑4004 | TAP‑XL: An Automated Analyst's Assistant | ✓ | ✓ | 0.67 | 0.67 |
2003 | N03‑4010 | JAVELIN: A Flexible, Planner‑Based Architecture for Question Answering | ✓ | ✓ | 1 | 1 |
2003 | P03‑1002 | Using Predicate‑Argument Structures for Information Extraction | ✓ | ✓ | 0.88 | 0.71 |
2003 | P03‑1005 | Hierarchical Directed Acyclic Graph Kernel: Methods for Structured Natural Language Data | ✓ | ✓ | 1 | 0.89 |
2003 | P03‑1009 | Clustering Polysemic Subcategorization Frame Distributions Semantically | ✓ | ✓ | 0.64 | 0.64 |
2003 | P03‑1022 | A Machine Learning Approach to Pronoun Resolution in Spoken Dialogue | ✓ | ✓ | 0.95 | 0.95 |
2003 | P03‑1030 | Optimizing Story Link Detection is not Equivalent to Optimizing New Event Detection | ✓ | ✓ | 1 | 0.95 |
2003 | P03‑1031 | Corpus‑based Discourse Understanding in Spoken Dialogue Systems | ✓ | ✓ | 0.83 | 0.76 |
2003 | P03‑1033 | Flexible Guidance Generation using User Model in Spoken Dialogue Systems | ✓ | ✓ | 0.86 | 0.86 |
2003 | P03‑1050 | Unsupervised Learning of Arabic Stemming using a Parallel Corpus | ✓ | ✓ | 0.96 | 0.96 |
2003 | P03‑1051 | Language Model Based Arabic Word Segmentation | ✓ | ✓ | 0.74 | 0.72 |
2003 | P03‑1058 | Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study | ✓ | ✓ | 0.81 | 0.81 |
2003 | P03‑1068 | Towards a Resource for Lexical Semantics: A Large German Corpus with Extensive Semantic Annotation | ✓ | ✓ | 0.8 | 0.8 |
2003 | P03‑1070 | Towards a Model of Face‑to‑Face Grounding | ✓ | ✓ | 0.97 | 0.97 |
2003 | P03‑2036 | Comparison between CFG filtering techniques for LTAG and HPSG | ✓ | ✓ | 1 | 0.91 |
2004 | C04‑1035 | Classifying Ellipsis in Dialogue: A Machine Learning Approach | ✓ | |||
2004 | C04‑1036 | Feature Vector Quality and Distributional Similarity | ✓ | |||
2004 | C04‑1068 | Filtering Speaker‑Specific Words from Electronic Discussions | ✓ | |||
2004 | C04‑1080 | Part of Speech Tagging in Context | ✓ | |||
2004 | C04‑1096 | Generation of Relative Referring Expressions based on Perceptual Grouping | ✓ | |||
2004 | C04‑1103 | Direct Orthographical Mapping for Machine Transliteration | ✓ | |||
2004 | C04‑1106 | Lower and higher estimates of the number of "true analogies" between sentences contained in a large multilingual corpus | ✓ | |||
2004 | C04‑1112 | A Lemma‑Based Approach to a Maximum Entropy Word Sense Disambiguation System for Dutch | ✓ | |||
2004 | C04‑1116 | Term Aggregation: Mining Synonymous Expressions using Personal Stylistic Variations | ✓ | |||
2004 | C04‑1128 | Detection of Question‑Answer Pairs in Email Conversations | ✓ | |||
2004 | C04‑1147 | Fast Computation of Lexical Affinity Models | ✓ | |||
2004 | C04‑1192 | Fine‑Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets | ✓ | |||
2004 | N04‑1022 | Minimum Bayes‑Risk Decoding for Statistical Machine Translation | ✓ | |||
2004 | N04‑1024 | Evaluating Multiple Aspects of Coherence in Student Essays | ✓ | |||
2004 | N04‑4028 | Confidence Estimation for Information Extraction | ✓ | |||
2004 | P04‑2005 | Automatic Acquisition of English Topic Signatures Based on a Second Language | ✓ | |||
2004 | P04‑2010 | A Machine Learning Approach to German Pronoun Resolution | ✓ | |||
2005 | H05‑1005 | Improving Multilingual Summarization: Using Redundancy in the Input to Correct MT errors | ✓ | ✓ | 0.62 | 0.57 |
2005 | H05‑1012 | A Maximum Entropy Word Aligner for Arabic‑English Machine Translation | ✓ | ✓ | 0.83 | 0.83 |
2005 | H05‑1032 | Bayesian Learning in Text Summarization | ✓ | |||
2005 | H05‑1064 | Hidden‑Variable Models for Discriminative Reranking | ✓ | |||
2005 | H05‑1095 | Translating with non‑contiguous phrases | ✓ | ✓ | 0.96 | 0.92 |
2005 | H05‑1101 | Some Computational Complexity Results for Synchronous Context‑Free Grammars | ✓ | ✓ | 0.75 | 0.57 |
2005 | H05‑1115 | Using Random Walks for Question‑focused Sentence Retrieval | ✓ | |||
2005 | H05‑1117 | Automatically Evaluating Answers to Definition Questions | ✓ | ✓ | 0.67 | 0.67 |
2005 | H05‑2007 | Pattern Visualization for Machine Translation Output | ✓ | ✓ | 0.67 | 0.57 |
2005 | I05‑2013 | Automatic recognition of French expletive pronoun occurrences | ✓ | |||
2005 | I05‑2014 | BLEU in characters: towards automatic MT evaluation in languages without word delimiters | ✓ | ✓ | 0.73 | 0.73 |
2005 | I05‑2021 | Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation | ✓ | ✓ | 0.77 | 0.43 |
2005 | I05‑2043 | Trend Survey on Japanese Natural Language Processing Studies over the Last Decade | ✓ | |||
2005 | I05‑2044 | Two‑Phase Shift‑Reduce Deterministic Dependency Parser of Chinese | ✓ | |||
2005 | I05‑2048 | Statistical Machine Translation Part I: Hands‑On Introduction | ✓ | ✓ | 0.95 | 0.79 |
2005 | I05‑3022 | Chinese Word Segmentation in FTRD Beijing | ✓ | |||
2005 | I05‑4007 | Cross‑lingual Conversion of Lexical Semantic Relations: Building Parallel Wordnets | ✓ | |||
2005 | I05‑4008 | Taiwan Child Language Corpus: Data Collection and Annotation | ✓ | |||
2005 | I05‑4010 | Harvesting the Bitexts of the Laws of Hong Kong From the Web | ✓ | ✓ | 0.53 | 0.53 |
2005 | I05‑5003 | Using Machine Translation Evaluation Techniques to Determine Sentence‑level Semantic Equivalence | ✓ | ✓ | 0.68 | 0.57 |
2005 | I05‑5004 | A Class‑oriented Approach to Building a Paraphrase Corpus | ✓ | |||
2005 | I05‑5008 | Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation | ✓ | ✓ | 0.7 | 0.59 |
2005 | I05‑5009 | Evaluating Contextual Dependency of Paraphrases using a Latent Variable Model | ✓ | |||
2005 | I05‑6010 | Some remarks on the Annotation of Quantifying Noun Groups in Treebanks | ✓ | |||
2005 | I05‑6011 | Annotating Honorifics Denoting Social Ranking of Referents | ✓ | ✓ | 0.87 | 0.87 |
2005 | J05‑1003 | Discriminative Reranking for Natural Language Parsing | ✓ | ✓ | 0.78 | 0.66 |
2005 | J05‑4003 | Improving Machine Translation Performance by Exploiting Non‑Parallel Corpora | ✓ | ✓ | 0.67 | 0.62 |
2005 | P05‑1010 | Probabilistic CFG with latent annotations | ✓ | |||
2005 | P05‑1018 | Modeling Local Coherence: An Entity‑based Approach | ✓ | |||
2005 | P05‑1028 | Exploring and Exploiting the Limited Utility of Captions in Recognizing Intention in Information Graphics | ✓ | |||
2005 | P05‑1032 | Scaling Phrase‑Based Statistical Machine Translation to Larger Corpora and Longer Phrases | ✓ | ✓ | 0.85 | 0.64 |
2005 | P05‑1034 | Dependency Treelet Translation: Syntactically Informed Phrasal SMT | ✓ | ✓ | 0.95 | 0.89 |
2005 | P05‑1039 | What to do when lexicalization fails: parsing German with suffix analysis and smoothing | ✓ | |||
2005 | P05‑1046 | Unsupervised Learning of Field Segmentation Models for Information Extraction | ✓ | |||
2005 | P05‑1048 | Word Sense Disambiguation vs. Statistical Machine Translation | ✓ | ✓ | 0.86 | 0.67 |
2005 | P05‑1053 | Exploring Various Knowledge in Relation Extraction | ✓ | |||
2005 | P05‑1056 | Using Conditional Random Fields For Sentence Boundary Detection In Speech | ✓ | |||
2005 | P05‑1057 | Log‑linear Models for Word Alignment | ✓ | |||
2005 | P05‑1058 | Alignment Model Adaptation for Domain‑Specific Word Alignment | ✓ | |||
2005 | P05‑1067 | Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars | ✓ | ✓ | 0.93 | 0.58 |
2005 | P05‑1069 | A Localized Prediction Model for Statistical Machine Translation | ✓ | ✓ | 0.85 | 0.64 |
2005 | P05‑1073 | Joint Learning Improves Semantic Role Labeling | ✓ | |||
2005 | P05‑1074 | Paraphrasing with Bilingual Parallel Corpora | ✓ | ✓ | 0.91 | 0.77 |
2005 | P05‑1076 | Automatic Acquisition of Adjectival Subcategorization from Corpora | ✓ | |||
2005 | P05‑2008 | Using Emoticons to reduce Dependency in Machine Learning Techniques for Sentiment Classification | ✓ | |||
2005 | P05‑2013 | Automatic Induction of a CCG Grammar for Turkish | ✓ | |||
2005 | P05‑2016 | Dependency‑Based Statistical Machine Translation | ✓ | ✓ | 0.82 | 0.67 |
2005 | P05‑3001 | An Information‑State Approach to Collaborative Reference | ✓ | |||
2005 | P05‑3025 | Interactively Exploring a Machine Translation Model | ✓ | ✓ | 0.42 | 0.42 |
2005 | P05‑3030 | Organizing English Reading Materials for Vocabulary Learning | ✓ | |||
2006 | E06‑1004 | Computational Complexity of Statistical Machine Translation | ✓ | ✓ | 0.89 | 0.85 |
2006 | E06‑1018 | Word Sense Induction: Triplet‑Based Clustering and Automatic Evaluation | ✓ | ✓ | 0.69 | 0.69 |
2006 | E06‑1022 | Addressee Identification in Face‑to‑Face Meetings | ✓ | ✓ | 0.62 | 0.62 |
2006 | E06‑1031 | CDER: Efficient MT Evaluation Using Block Movements | ✓ | ✓ | 0.87 | 0.71 |
2006 | E06‑1035 | Automatic Segmentation of Multiparty Dialogue | ✓ | ✓ | 0.73 | 0.73 |
2006 | E06‑1041 | Structuring Knowledge for Reference Generation: A Clustering Algorithm | ✓ | ✓ | 0.75 | 0.67 |
2006 | N06‑2009 | Answering the Question You Wish They Had Asked: The Impact of Paraphrasing for Question Answering | ✓ | ✓ | 0.67 | 0.67 |
2006 | N06‑2038 | A Comparison of Tagging Strategies for Statistical Information Extraction | ✓ | ✓ | 0.88 | 0.88 |
2006 | N06‑4001 | InfoMagnets: Making Sense of Corpus Data | ✓ | ✓ | 0.42 | 0.42 |
2006 | P06‑1013 | Ensemble Methods for Unsupervised WSD | ✓ | ✓ | 0.67 | 0.67 |
2006 | P06‑1018 | Polarized Unification Grammars | ✓ | ✓ | 0.87 | 0.87 |
2006 | P06‑1052 | An Improved Redundancy Elimination Algorithm for Underspecified Representations | ✓ | ✓ | 0.87 | 0.56 |
2006 | P06‑2001 | Using Machine Learning Techniques to Build a Comma Checker for Basque | ✓ | ✓ | 0.93 | 0.93 |
2006 | P06‑2012 | Unsupervised Relation Disambiguation Using Spectral Clustering | ✓ | ✓ | 0.86 | 0.86 |
2006 | P06‑2059 | Automatic Construction of Polarity‑tagged Corpus from HTML Documents | ✓ | ✓ | 0.71 | 0.62 |
2006 | P06‑2110 | Word Vectors and Two Kinds of Similarity | ✓ | ✓ | 0.92 | 0.67 |
2006 | P06‑3007 | Investigations on Event‑Based Summarization | ✓ | ✓ | 0.56 | 0.47 |
2006 | P06‑4007 | FERRET: Interactive Question‑Answering for Real‑World Environments | ✓ | ✓ | 0.86 | 0.86 |
2006 | P06‑4011 | Computational Analysis of Move Structures in Academic Abstracts | ✓ | ✓ | 0.77 | 0.73 |
2006 | P06‑4014 | Re‑Usable Tools for Precision Machine Translation | ✓ | ✓ | 0.77 | 0.55 |
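If the table is copied as plain text, its rows can be read with a few lines of Python. The following is a minimal sketch, not a tool shipped with the corpus: `Row` and `parse_row` are hypothetical names, and the code assumes that cells are separated by `|` and that rows without agreement scores simply have empty IAAS and IAAA cells. It does not try to decide which annotator a single mark belongs to. The three sample rows are taken verbatim from the table.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    year: int
    acl_id: str
    title: str
    iaas: Optional[float]  # agreement on term boundaries
    iaaa: Optional[float]  # agreement on boundaries and classes


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited table row (hypothetical helper)."""
    cells = [cell.strip() for cell in line.split("|")]
    cells += [""] * (7 - len(cells))  # pad short rows with empty cells
    iaas = float(cells[5]) if cells[5] else None
    iaaa = float(cells[6]) if cells[6] else None
    return Row(int(cells[0]), cells[1], cells[2], iaas, iaaa)


SAMPLE = [
    "1978 | T78‑1001 | Testing The Psychological Reality of a Representational Model | ✓ | ✓ | 0.87 | 0.82 |",
    "1981 | P81‑1032 | Dynamic Strategy Selection in Flexible Parsing | ✓ | |||",
    "1992 | H92‑1010 | Spoken Language Processing in the Framework of Human‑Machine Communication at LIMSI | ✓ | ✓ | 0.89 | 0 |",
]

rows = [parse_row(line) for line in SAMPLE]
# Keep only rows that carry both agreement scores.
scored = [r for r in rows if r.iaas is not None and r.iaaa is not None]

# IAAS is an upper bound on IAAA for every scored row.
for r in scored:
    assert r.iaaa <= r.iaas

print(len(rows), len(scored))
# prints: 3 2
```

Keeping missing scores as `None` rather than 0 matters here, because a genuine IAAA of 0 does occur in the table (H92‑1010).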