lr,2-3-I05-4010,bq |
The resultant
<term>
bilingual
corpus
</term>
, 10.4 M
<term>
English words
</term>
and 18.3 M
<term>
Chinese characters
</term>
, is an authoritative and comprehensive
<term>
text collection
</term>
covering the specific and special domain of HK laws .
|
#8255
The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr-prod,17-4-H92-1074,bq |
This paper presents an overview of the
<term>
CSR corpus
</term>
, reviews the definition and development of the
<term>
CSR pilot
corpus
</term>
, and examines the dynamic challenge of extending the
<term>
CSR corpus
</term>
to meet future needs .
|
#19620
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,21-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented
corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training corpus
</term>
.
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,6-3-P06-1052,bq |
We evaluate the algorithm on a
<term>
corpus
</term>
, and show that it reduces the degree of
<term>
ambiguity
</term>
significantly while taking negligible runtime .
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,3-3-P05-1034,bq |
We align a
<term>
parallel
corpus
</term>
, project the
<term>
source dependency parse
</term>
onto the target
<term>
sentence
</term>
, extract
<term>
dependency treelet translation pairs
</term>
, and train a
<term>
tree-based ordering model
</term>
.
|
#9248
We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr-prod,7-4-H92-1074,bq |
This paper presents an overview of the
<term>
CSR
corpus
</term>
, reviews the definition and development of the
<term>
CSR pilot corpus
</term>
, and examines the dynamic challenge of extending the
<term>
CSR corpus
</term>
to meet future needs .
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,13-1-N03-2006,bq |
In order to boost the
<term>
translation quality
</term>
of
<term>
EBMT
</term>
based on a small-sized
<term>
bilingual
corpus
</term>
, we use an out-of-domain
<term>
bilingual corpus
</term>
and , in addition , the
<term>
language model
</term>
of an in-domain
<term>
monolingual corpus
</term>
.
|
#3093
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,7-2-P05-2016,bq |
The only
<term>
bilingual resource
</term>
required is a
<term>
sentence-aligned parallel
corpus
</term>
.
|
#9803
The only bilingual resource required is a sentence-aligned parallel corpus. |
lr,29-2-C88-2130,bq |
The
<term>
model
</term>
is embodied in a program ,
<term>
APT
</term>
, that can reproduce segments of actual tape-recorded descriptions , using
<term>
organizational and discourse strategies
</term>
derived through analysis of our
<term>
corpus
</term>
.
|
#15495
The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,23-2-C04-1116,bq |
This paper proposes a new methodology to improve the
<term>
accuracy
</term>
of a
<term>
term aggregation system
</term>
using each author 's text as a coherent
<term>
corpus
</term>
.
|
#6137
This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr,28-2-P03-1051,bq |
Our method is seeded by a small
<term>
manually segmented Arabic corpus
</term>
and uses it to bootstrap an
<term>
unsupervised algorithm
</term>
to build the
<term>
Arabic word segmenter
</term>
from a large
<term>
unsegmented Arabic
corpus
</term>
.
|
#4668
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,50-3-C04-1147,bq |
In comparison with previous
<term>
models
</term>
, which either use arbitrary
<term>
windows
</term>
to compute
<term>
similarity
</term>
between
<term>
words
</term>
or use
<term>
lexical affinity
</term>
to create
<term>
sequential models
</term>
, in this paper we focus on
<term>
models
</term>
intended to capture the
<term>
co-occurrence patterns
</term>
of any pair of
<term>
words
</term>
or
<term>
phrases
</term>
at any distance in the
<term>
corpus
</term>
.
|
#6400
In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,19-2-N03-4010,bq |
The demonstration will focus on how
<term>
JAVELIN
</term>
processes
<term>
questions
</term>
and retrieves the most likely
<term>
answer candidates
</term>
from the given
<term>
text
corpus
</term>
.
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,34-5-P03-1051,bq |
To improve the
<term>
segmentation
</term><term>
accuracy
</term>
, we use an
<term>
unsupervised algorithm
</term>
for automatically acquiring new
<term>
stems
</term>
from a 155 million
<term>
word
</term><term>
unsegmented corpus
</term>
, and re-estimate the
<term>
model parameters
</term>
with the expanded
<term>
vocabulary
</term>
and
<term>
training
corpus
</term>
.
|
#4741
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,19-5-C90-3063,bq |
An experiment was performed to resolve
<term>
references
</term>
of the
<term>
pronoun
</term><term>
it
</term>
in
<term>
sentences
</term>
that were randomly selected from the
<term>
corpus
</term>
.
|
#16689
An experiment was performed to resolve references of the pronoun it in sentences that were randomly selected from the corpus. |
lr,30-2-C04-1192,bq |
The method exploits recent advances in
<term>
word alignment
</term>
and
<term>
word clustering
</term>
based on
<term>
automatic extraction of translation equivalents
</term>
and being supported by available
<term>
aligned wordnets
</term>
for the
<term>
languages
</term>
in the
<term>
corpus
</term>
.
|
#6480
The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. |
lr-prod,26-4-H90-1060,bq |
With only 12
<term>
training speakers
</term>
for
<term>
SI recognition
</term>
, we achieved a 7.5 %
<term>
word error rate
</term>
on a standard
<term>
grammar
</term>
and
<term>
test set
</term>
from the
<term>
DARPA Resource Management
corpus
</term>
.
|
#17099
With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus. |
lr,29-5-J05-4003,bq |
We also show that a good-quality
<term>
MT system
</term>
can be built from scratch by starting with a very small
<term>
parallel corpus
</term>
( 100,000
<term>
words
</term>
) and exploiting a large
<term>
non-parallel
corpus
</term>
.
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,15-2-C90-3063,bq |
This paper presents an
<term>
automatic scheme
</term>
for collecting
<term>
statistics
</term>
on
<term>
co-occurrence patterns
</term>
in a large
<term>
corpus
</term>
.
|
#16631
This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
lr-prod,15-3-H94-1014,bq |
The models were constructed using a 5K
<term>
vocabulary
</term>
and trained using a 76 million
<term>
word
</term><term>
Wall Street Journal text
corpus
</term>
.
|
#21261
The models were constructed using a 5K vocabulary and trained using a 76 million word Wall Street Journal text corpus. |