lr,2-3-I05-4010,bq | in detail . The resultant <term> bilingual | corpus | </term> , 10.4 M <term> English words </term> | #8255 The resultant bilingual corpus, 10.4M English words and 18.3M Chinese characters, is an authoritative and comprehensive text collection covering the specific and special domain of HK laws. |
lr-prod,17-4-H92-1074,bq | definition and development of the <term> CSR pilot | corpus | </term> , and examines the dynamic challenge | #19620 This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,21-5-P03-1051,bq | million <term> word </term><term> unsegmented | corpus | </term> , and re-estimate the <term> model | #4728 To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,6-3-P06-1052,bq | </term> . We evaluate the algorithm on a <term> | corpus | </term> , and show that it reduces the degree | #11183 We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,3-3-P05-1034,bq | component </term> . We align a <term> parallel | corpus | </term> , project the <term> source dependency | #9248 We align a parallel corpus, project the source dependency parse onto the target sentence, extract dependency treelet translation pairs, and train a tree-based ordering model. |
lr-prod,7-4-H92-1074,bq | paper presents an overview of the <term> CSR | corpus | </term> , reviews the definition and development | #19609 This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,13-1-N03-2006,bq | </term> based on a small-sized <term> bilingual | corpus | </term> , we use an out-of-domain <term> bilingual | #3093 In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,7-2-P05-2016,bq | required is a <term> sentence-aligned parallel | corpus | </term> . All other <term> resources </term> | #9803 The only bilingual resource required is a sentence-aligned parallel corpus. |
lr,29-2-C88-2130,bq | </term> derived through analysis of our <term> | corpus | </term> . <term> Chart parsing </term> is <term> | #15495 The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,23-2-C04-1116,bq | each author 's text as a coherent <term> | corpus | </term> . Our approach is based on the idea | #6137 This paper proposes a new methodology to improve the accuracy of a term aggregation system using each author's text as a coherent corpus. |
lr,28-2-P03-1051,bq | </term> from a large <term> unsegmented Arabic | corpus | </term> . The <term> algorithm </term> uses a | #4668 Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,50-3-C04-1147,bq | phrases </term> at any distance in the <term> | corpus | </term> . The framework is flexible , allowing | #6400 In comparison with previous models, which either use arbitrary windows to compute similarity between words or use lexical affinity to create sequential models, in this paper we focus on models intended to capture the co-occurrence patterns of any pair of words or phrases at any distance in the corpus. |
lr,19-2-N03-4010,bq | candidates </term> from the given <term> text | corpus | </term> . The operation of the <term> system | #3682 The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,34-5-P03-1051,bq | <term> vocabulary </term> and <term> training | corpus | </term> . The resulting <term> Arabic word | #4741 To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,19-5-C90-3063,bq | that were randomly selected from the <term> | corpus | </term> . The results of the experiment show | #16689 An experiment was performed to resolve references of the pronoun it in sentences that were randomly selected from the corpus. |
lr,30-2-C04-1192,bq | for the <term> languages </term> in the <term> | corpus | </term> . The <term> wordnets </term> are aligned | #6480 The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. |
lr-prod,26-4-H90-1060,bq | </term> from the <term> DARPA Resource Management | corpus | </term> . This <term> performance </term> is | #17099 With only 12 training speakers for SI recognition, we achieved a 7.5% word error rate on a standard grammar and test set from the DARPA Resource Management corpus. |
lr,29-5-J05-4003,bq | and exploiting a large <term> non-parallel | corpus | </term> . Thus , our method can be applied | #9098 We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,15-2-C90-3063,bq | co-occurrence patterns </term> in a large <term> | corpus | </term> . To a large extent , these <term> | #16631 This paper presents an automatic scheme for collecting statistics on co-occurrence patterns in a large corpus. |
lr-prod,15-3-H94-1014,bq | word </term><term> Wall Street Journal text | corpus | </term> . Using the <term> BU recognition system | #21261 The models were constructed using a 5K vocabulary and trained using a 76 million word Wall Street Journal text corpus. |