It gives users the ability to spend their time finding more data relevant to their task , and gives them translingual reach into other <term> languages </term> by leveraging <term> human language technology </term> .
It works by calculating <term> eigenvectors </term> of an <term> adjacency graph </term> 's <term> Laplacian </term> to recover a <term> submanifold </term> of data from a <term> high dimensionality space </term> and then performing <term> cluster number estimation </term> on the <term> eigenvectors </term> .
This data collection effort has been co-ordinated by <term> MADCOW ( Multi-site ATIS Data COllection Working group ) </term> .
Our <term> document understanding technology </term> is implemented in a system called <term> IDUS ( Intelligent Document Understanding System ) </term> , which creates the data for a <term> text retrieval application </term> and the <term> automatic generation of hypertext links </term> .
[P03-1058] In this paper , we evaluate an approach to automatically acquire <term> sense-tagged training data </term> from <term> English-Chinese parallel corpora </term> , which are then used for disambiguating the <term> nouns </term> in the <term> SENSEVAL-2 English lexical sample task </term> .
[H05-1012] This paper presents a <term> maximum entropy word alignment algorithm </term> for <term> Arabic-English </term> based on <term> supervised training data </term> .
[P03-1033] Moreover , the <term> models </term> are automatically derived by <term> decision tree learning </term> using real <term> dialogue data </term> collected by the <term> system </term> .
[P03-1058] A central problem of <term> word sense disambiguation ( WSD ) </term> is the lack of <term> manually sense-tagged data </term> required for <term> supervised learning </term> .
[N03-1001] This paper describes a method for <term> utterance classification </term> that does not require <term> manual transcription </term> of <term> training data </term> .
[H05-1095] Experimental results are presented that demonstrate how the proposed <term> method </term> allows better generalization from the <term> training data </term> .
[P01-1004] We take a selection of both <term> bag-of-words and segment order-sensitive string comparison methods </term> , and run each over both <term> character - and word-segmented data </term> , in combination with a range of <term> local segment contiguity models </term> ( in the form of <term> N-grams </term> ) .
[N03-2003] Sources of <term> training data </term> suitable for <term> language modeling </term> of <term> conversational speech </term> are limited .
[H90-1060] In addition , combination of the <term> training speakers </term> is done by averaging the <term> statistics </term> of <term> independently trained models </term> rather than the usual pooling of all the <term> speech data </term> from many <term> speakers </term> prior to <term> training </term> .
[P03-1058] On a subset of the most difficult <term> SENSEVAL-2 nouns </term> , the <term> accuracy </term> difference between the two approaches is only 14.0 % , and the difference could narrow further to 6.5 % if we disregard the advantage that <term> manually sense-tagged data </term> have in their <term> sense coverage </term> .
[N03-2003] In this paper , we show how <term> training data </term> can be supplemented with <term> text </term> from the <term> web </term> filtered to match the <term> style </term> and/or <term> topic </term> of the target <term> recognition task </term> , but also that it is possible to get bigger performance gains from the <term> data </term> by using <term> class-dependent interpolation </term> of <term> N-grams </term> .
[J05-4003] Using this <term> approach </term> , we extract <term> parallel data </term> from large <term> Chinese , Arabic , and English non-parallel newspaper corpora </term> .
[N03-1012] An evaluation of our <term> system </term> against the <term> annotated data </term> shows that it successfully classifies 73.2 % of 2,284 <term> SRHs </term> in a <term> German corpus </term> as either coherent or incoherent ( given a <term> baseline </term> of 54.55 % ) .
[N01-1003] The <term> SPR </term> uses <term> ranking rules </term> automatically learned from <term> training data </term> .
[H05-2007] We describe a <term> method </term> for identifying systematic <term> patterns </term> in <term> translation data </term> using <term> part-of-speech tag sequences </term> .