lr,2-3-I05-4010,bq The resultant <term> bilingual corpus </term> , 10.4 M <term> English words </term> and 18.3 M <term> Chinese characters </term> , is an authoritative and comprehensive <term> text collection </term> covering the specialized domain of HK laws .
lr-prod,17-4-H92-1074,bq This paper presents an overview of the <term> CSR corpus </term> , reviews the definition and development of the <term> CSR pilot corpus </term> , and examines the dynamic challenge of extending the <term> CSR corpus </term> to meet future needs .
lr,21-5-P03-1051,bq To improve the <term> segmentation </term> <term> accuracy </term> , we use an <term> unsupervised algorithm </term> for automatically acquiring new <term> stems </term> from a 155 million <term> word </term> <term> unsegmented corpus </term> , and re-estimate the <term> model parameters </term> with the expanded <term> vocabulary </term> and <term> training corpus </term> .
lr,6-3-P06-1052,bq We evaluate the algorithm on a <term> corpus </term> , and show that it reduces the degree of <term> ambiguity </term> significantly while taking negligible runtime .
lr,3-3-P05-1034,bq We align a <term> parallel corpus </term> , project the <term> source dependency parse </term> onto the target <term> sentence </term> , extract <term> dependency treelet translation pairs </term> , and train a <term> tree-based ordering model </term> .
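A minimal sketch of the align-then-project step in Python; the one-to-one alignment assumption, the head-projection rule, and all names below are illustrative simplifications, not the paper's implementation.

    # Project a source dependency parse onto a target sentence through
    # word alignments (simplified; assumes one-to-one links).
    def project_dependencies(src_heads, alignment):
        """src_heads[i] = index of the head of source word i (-1 = root).
        alignment maps a source index to a target index."""
        tgt_heads = {}
        for src_dep, src_head in enumerate(src_heads):
            if src_dep in alignment:
                tgt_dep = alignment[src_dep]
                # The root stays the root; otherwise follow the aligned head.
                tgt_heads[tgt_dep] = alignment.get(src_head, -1) if src_head != -1 else -1
        return tgt_heads

    # Source "the cat sleeps" with heads [1, 2, -1]; toy alignment to a
    # hypothetical three-word target sentence.
    print(project_dependencies([1, 2, -1], {0: 0, 1: 1, 2: 2}))
    # {0: 1, 1: 2, 2: -1}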
lr,13-1-N03-2006,bq In order to boost the <term> translation quality </term> of <term> EBMT </term> based on a small-sized <term> bilingual corpus </term> , we use an out-of-domain <term> bilingual corpus </term> and , in addition , the <term> language model </term> of an in-domain <term> monolingual corpus </term> .
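One hedged way to picture combining an out-of-domain translation score with an in-domain language model score is a log-linear mixture; the weight and the toy scores below are assumptions for the sketch, not the paper's system.

    # Log-linear combination of a translation-model score and an
    # in-domain language-model score (both log-probabilities).
    def combined_score(tm_logprob, lm_logprob, lm_weight=0.4):
        return (1 - lm_weight) * tm_logprob + lm_weight * lm_logprob

    # Rank two candidate translations; the in-domain LM breaks the tie.
    candidates = [("hypothesis a", -4.2, -6.1), ("hypothesis b", -4.5, -3.0)]
    best = max(candidates, key=lambda c: combined_score(c[1], c[2]))
    print(best[0])  # hypothesis b: the better in-domain LM score wins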
lr,7-2-P05-2016,bq The only <term> bilingual resource </term> required is a <term> sentence-aligned parallel corpus </term> .
lr,29-2-C88-2130,bq The <term> model </term> is embodied in a program , <term> APT </term> , that can reproduce segments of actual tape-recorded descriptions , using <term> organizational and discourse strategies </term> derived through analysis of our <term> corpus </term> .
lr,23-2-C04-1116,bq This paper proposes a new methodology to improve the <term> accuracy </term> of a <term> term aggregation system </term> using each author 's text as a coherent <term> corpus </term> .
lr,28-2-P03-1051,bq Our method is seeded by a small <term> manually segmented Arabic corpus </term> and uses it to bootstrap an <term> unsupervised algorithm </term> to build the <term> Arabic word segmenter </term> from a large <term> unsegmented Arabic corpus </term> .
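The seed-and-bootstrap loop can be sketched as follows; the greedy longest-match segmenter and the toy lexicon are simplifications standing in for the paper's algorithm.

    from collections import Counter

    def segment(word, lexicon):
        """Greedy longest-match segmentation; single characters fall through."""
        pieces, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in lexicon or j == i + 1:
                    pieces.append(word[i:j])
                    i = j
                    break
        return pieces

    seed_lexicon = {"wa", "al", "kitab"}   # read off the manually segmented seed
    raw_words = ["walkitab", "alkitab", "kitab"]

    # Segment the raw corpus and harvest residual strings as stem candidates.
    candidates = Counter()
    for w in raw_words:
        for piece in segment(w, seed_lexicon):
            if piece not in seed_lexicon:
                candidates[piece] += 1
    print(candidates.most_common())  # [('l', 1)]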
lr,50-3-C04-1147,bq In comparison with previous <term> models </term> , which either use arbitrary <term> windows </term> to compute <term> similarity </term> between <term> words </term> or use <term> lexical affinity </term> to create <term> sequential models </term> , in this paper we focus on <term> models </term> intended to capture the <term> co-occurrence patterns </term> of any pair of <term> words </term> or <term> phrases </term> at any distance in the <term> corpus </term> .
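A toy illustration of counting co-occurrences of word pairs at any distance: every pair within the same sentence is counted once, however far apart the words are. The sentence-level scope is an assumption for the sketch.

    from collections import Counter
    from itertools import combinations

    def pair_counts(sentences):
        counts = Counter()
        for sent in sentences:
            for w1, w2 in combinations(sent, 2):
                counts[frozenset((w1, w2))] += 1
        return counts

    corpus = [["the", "corpus", "is", "large"], ["a", "large", "corpus"]]
    print(pair_counts(corpus)[frozenset(("large", "corpus"))])  # 2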
lr,19-2-N03-4010,bq The demonstration will focus on how <term> JAVELIN </term> processes <term> questions </term> and retrieves the most likely <term> answer candidates </term> from the given <term> text corpus </term> .
lr,19-5-C90-3063,bq An experiment was performed to resolve <term> references </term> of the <term> pronoun </term> <term> it </term> in <term> sentences </term> that were randomly selected from the <term> corpus </term> .
lr,30-2-C04-1192,bq The method exploits recent advances in <term> word alignment </term> and <term> word clustering </term> based on <term> automatic extraction of translation equivalents </term> , supported by available <term> aligned wordnets </term> for the <term> languages </term> in the <term> corpus </term> .
lr-prod,26-4-H90-1060,bq With only 12 <term> training speakers </term> for <term> SI recognition </term> , we achieved a 7.5 % <term> word error rate </term> on a standard <term> grammar </term> and <term> test set </term> from the <term> DARPA Resource Management corpus </term> .
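For reference, word error rate is the edit distance between hypothesis and reference divided by the reference length; a minimal sketch on toy strings, not the DARPA Resource Management setup.

    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution/match
        return d[-1][-1] / len(ref)

    print(wer("show all flights to boston", "show flights to austin"))  # 0.4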
lr,29-5-J05-4003,bq We also show that a good-quality <term> MT system </term> can be built from scratch by starting with a very small <term> parallel corpus </term> ( 100,000 <term> words </term> ) and exploiting a large <term> non-parallel corpus </term> .
lr,15-2-C90-3063,bq This paper presents an <term> automatic scheme </term> for collecting <term> statistics </term> on <term> co-occurrence patterns </term> in a large <term> corpus </term> .
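A minimal sketch of such a collection scheme with a fixed window over a tokenized corpus; the window size and the toy text are illustrative choices, not the paper's settings.

    from collections import Counter

    def window_cooccurrences(tokens, window=3):
        counts = Counter()
        for i, w in enumerate(tokens):
            # Count each word with every word up to `window` positions to its right.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[(w, tokens[j])] += 1
        return counts

    tokens = "the engine overheated because the engine was old".split()
    stats = window_cooccurrences(tokens)
    print(stats[("the", "engine")])  # 2: the pair recurs within the window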
lr-prod,15-3-H94-1014,bq The models were constructed using a 5K <term> vocabulary </term> and trained using a 76 million <term> word </term> <term> Wall Street Journal text corpus </term> .
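A hedged sketch of training with a fixed vocabulary: tokens outside the most frequent types are mapped to <unk> before counts are taken. The vocabulary size and toy text stand in for the 5K-vocabulary WSJ setup.

    from collections import Counter

    def train_bigrams(tokens, vocab_size=3):
        # Keep the vocab_size most frequent types; everything else becomes <unk>.
        vocab = {w for w, _ in Counter(tokens).most_common(vocab_size)}
        mapped = [w if w in vocab else "<unk>" for w in tokens]
        return Counter(zip(mapped, mapped[1:]))

    tokens = "the market rose the market fell the index rose".split()
    print(train_bigrams(tokens).most_common(3))
    # [(('the', 'market'), 2), ...] with rare words collapsed to <unk>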