lr,12-4-C92-1055,bq |
possible variations between the
<term>
training
|
corpus
|
</term>
and the real tasks are also taken
|
#17893
To make the proposed algorithm robust, the possible variations between the training corpus and the real tasks are also taken into consideration by enlarging the separation margin between the correct candidate and its competing members. |
lr-prod,7-4-H92-1074,bq |
paper presents an overview of the
<term>
CSR
|
corpus
|
</term>
, reviews the definition and development
|
#19609
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,19-2-N03-4010,bq |
candidates
</term>
from the given
<term>
text
|
corpus
|
</term>
. The operation of the
<term>
system
|
#3682
The demonstration will focus on how JAVELIN processes questions and retrieves the most likely answer candidates from the given text corpus. |
lr,19-5-J05-4003,bq |
starting with a very small
<term>
parallel
|
corpus
|
</term>
( 100,000
<term>
words
</term>
) and
|
#9088
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,29-2-C88-2130,bq |
</term>
derived through analysis of our
<term>
|
corpus
|
</term>
.
<term>
Chart parsing
</term>
is
<term>
|
#15495
The model is embodied in a program, APT, that can reproduce segments of actual tape-recorded descriptions, using organizational and discourse strategies derived through analysis of our corpus. |
lr,6-3-P06-1052,bq |
</term>
. We evaluate the algorithm on a
<term>
|
corpus
|
</term>
, and show that it reduces the degree
|
#11183
We evaluate the algorithm on a corpus, and show that it reduces the degree of ambiguity significantly while taking negligible runtime. |
lr,20-1-N03-2006,bq |
, we use an out-of-domain
<term>
bilingual
|
corpus
|
</term>
and , in addition , the
<term>
language
|
#3100
In order to boost the translation quality of EBMT based on a small-sized bilingual corpus, we use an out-of-domain bilingual corpus and, in addition, the language model of an in-domain monolingual corpus. |
lr,7-2-P03-1051,bq |
by a small
<term>
manually segmented Arabic
|
corpus
|
</term>
and uses it to bootstrap an
<term>
|
#4648
Our method is seeded by a small manually segmented Arabic corpus and uses it to bootstrap an unsupervised algorithm to build the Arabic word segmenter from a large unsegmented Arabic corpus. |
lr,3-3-H92-1026,bq |
process
</term>
in a novel way . We use a
<term>
|
corpus
|
of bracketed sentences
</term>
, called a
|
#18946
We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. |
lr-prod,7-2-H92-1074,bq |
now old
<term>
Resource Management ( RM )
|
corpus
|
</term>
that has fueled
<term>
DARPA speech
|
#19565
This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. |
lr,6-1-H92-1003,bq |
recently collected
<term>
spoken language
|
corpus
|
</term>
for the
<term>
ATIS ( Air Travel Information
|
#18532
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. |
lr,8-1-P06-2059,bq |
method of building
<term>
polarity-tagged
|
corpus
|
</term>
from
<term>
HTML documents
</term>
.
|
#11401
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. |
lr,22-2-P03-1050,bq |
a small ( 10K sentences )
<term>
parallel
|
corpus
|
</term>
as its sole
<term>
training resources
|
#4469
The stemming model is based on statistical machine translation and it uses an English stemmer and a small (10K sentences) parallel corpus as its sole training resources. |
lr,11-4-P05-1074,bq |
extracted from a
<term>
bilingual parallel
|
corpus
|
</term>
to be ranked using
<term>
translation
|
#9729
We define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. |
lr-prod,2-3-H92-1074,bq |
for the past 5 years . The new
<term>
CSR
|
corpus
|
</term>
supports research on major new problems
|
#19583
The new CSR corpus supports research on major new problems including unlimited vocabulary, natural grammar, and spontaneous speech. |
lr-prod,29-4-H92-1074,bq |
dynamic challenge of extending the
<term>
CSR
|
corpus
|
</term>
to meet future needs .
<term>
Language
|
#19631
This paper presents an overview of the CSR corpus, reviews the definition and development of the CSR pilot corpus, and examines the dynamic challenge of extending the CSR corpus to meet future needs. |
lr,29-5-J05-4003,bq |
and exploiting a large
<term>
non-parallel
|
corpus
|
</term>
. Thus , our method can be applied
|
#9098
We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. |
lr,21-5-P03-1051,bq |
million
<term>
word
</term><term>
unsegmented
|
corpus
|
</term>
, and re-estimate the
<term>
model
|
#4728
To improve the segmentation accuracy, we use an unsupervised algorithm for automatically acquiring new stems from a 155 million word unsegmented corpus, and re-estimate the model parameters with the expanded vocabulary and training corpus. |
lr,52-3-A94-1011,bq |
, and does not require a
<term>
pre-tagged
|
corpus
|
</term>
to fit . One of the distinguishing
|
#19998
A novel method for adding linguistic annotation to corpora is presented which involves using a statistical POS tagger in conjunction with unsupervised structure finding methods to derive notions of noun group, verb group, and so on which is inherently extensible to more sophisticated annotation, and does not require a pre-tagged corpus to fit. |
lr,9-1-P03-1068,bq |
of a large ,
<term>
semantically annotated
|
corpus
|
</term>
resource as reliable basis for the
|
#4943
We describe the ongoing construction of a large, semantically annotated corpus resource as reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. |