This paper describes a method for <term> utterance classification </term> that does not require <term> manual transcription </term> of <term> training data </term> .
Surprisingly , learning <term> phrases </term> longer than three <term> words </term> and learning <term> phrases </term> from <term> high-accuracy word-level alignment models </term> does not have a strong impact on performance .
Since multiple <term> candidates </term> for the <term> understanding </term> result can be obtained for a <term> user utterance </term> due to the <term> ambiguity </term> of <term> speech understanding </term> , it is not appropriate to decide on a single <term> understanding result </term> after each <term> user utterance </term> .
Along the way , we present the first comprehensive comparison of <term> unsupervised methods for part-of-speech tagging </term> , noting that published results to date have not been comparable across <term> corpora </term> or <term> lexicons </term> .
However , such an approach does not work well when there is no distinctive <term> attribute </term> among <term> objects </term> .
Our study reveals that the proposed method not only reduces an extensive system development effort but also improves the <term> transliteration accuracy </term> significantly .
But <term> computational linguists </term> seem to be quite dubious about <term> analogies between sentences </term> : they would not be numerous enough to be of any use .
According to our assumption , most of the words with similar <term> context features </term> in each author 's <term> corpus </term> tend not to be <term> synonymous expressions </term> .
While <term> sentence extraction </term> as an approach to <term> summarization </term> has been shown to work in <term> documents </term> of certain <term> genres </term> , because of the conversational nature of <term> email communication </term> where <term> utterances </term> are made in relation to one made previously , <term> sentence extraction </term> may not capture the necessary <term> segments </term> of <term> dialogue </term> that would make a <term> summary </term> coherent .
The <term> method </term> combined the <term> log-likelihood </term> under a <term> baseline model </term> ( that of <term> Collins [ 1999 ] </term> ) with evidence from an additional 500,000 <term> features </term> over <term> parse trees </term> that were not included in the original <term> model </term> .
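A worked form of such a combination, sketched here as a generic log-linear reranking score rather than the cited model's exact formulation: for a candidate parse tree t, score(t) = \log P_{baseline}(t) + \sum_k w_k f_k(t), where P_{baseline} is the probability assigned by the baseline model, the f_k are the additional features over parse trees, and the w_k are learned weights.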
We train a <term> maximum entropy classifier </term> that , given a pair of <term> sentences </term> , can reliably determine whether or not they are <term> translations </term> of each other .
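A minimal sketch of this kind of setup, assuming scikit-learn's logistic regression as the maximum entropy model; the feature extractor and the toy sentence pairs below are purely illustrative stand-ins, not the authors' actual features or data:

    # Hypothetical sketch: classify whether two sentences are translations of each other.
    from sklearn.linear_model import LogisticRegression

    def pair_features(src, tgt):
        # Illustrative features only: length ratio and shared-token count.
        src_toks, tgt_toks = src.split(), tgt.split()
        return [len(src_toks) / max(len(tgt_toks), 1),
                len(set(src_toks) & set(tgt_toks))]

    # Toy labeled pairs: 1 = translations of each other, 0 = not.
    pairs = [("the cat sleeps", "le chat dort", 1),
             ("the cat sleeps", "il pleut beaucoup", 0)]
    X = [pair_features(s, t) for s, t, _ in pairs]
    y = [label for _, _, label in pairs]

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([pair_features("the cat sleeps", "le chat dort")]))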
Using a state-of-the-art <term> Chinese word sense disambiguation model </term> to choose <term> translation candidates </term> for a typical <term> IBM statistical MT system </term> , we find that <term> word sense disambiguation </term> does not yield significantly better <term> translation quality </term> than the <term> statistical machine translation system </term> alone .
In this paper we study a set of problems that are of considerable importance to <term> Statistical Machine Translation ( SMT ) </term> but which have not been addressed satisfactorily by the <term> SMT research community </term> .
We also find that the <term> transcription errors </term> inevitable in <term> ASR output </term> have a negative impact on models that combine <term> lexical-cohesion and conversational features </term> , but do not change the general preference of approach for the two tasks .
<term> InfoMagnets </term> aims at making <term> exploratory corpus analysis </term> accessible to researchers who are not experts in <term> text mining </term> .
After several experiments , and trained on a small <term> corpus </term> of 100,000 <term> words </term> , the system correctly guesses where not to place <term> commas </term> , with a <term> precision </term> of 96 % and a <term> recall </term> of 98 % .
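For reference, the standard definitions behind those two figures, stated in general form rather than the system's specific evaluation setup: precision = TP / (TP + FP), the fraction of the system's decisions that are correct, and recall = TP / (TP + FN), the fraction of the true cases that the system recovers.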
In this paper , we report a system <term> FROFF </term> which can make a fair copy not only of texts but also of the graphs and tables indispensable to our papers .
This paper defends that view , but claims that direct imitation of human performance is not the best way to implement many of these <term> non-literal aspects of communication </term> ; that the new technology of powerful <term> personal computers </term> with integral <term> graphics displays </term> offers techniques superior to those of humans for these aspects , while still satisfying <term> human communication needs </term> .
However , this is not the only area in which the principles of the system might be used , and the aim in building it was simply to demonstrate the workability of the general mechanism , and provide a framework for assessing developments of it .
<term> Determiners </term> play an important role in conveying the <term> meaning </term> of an <term> utterance </term> , but they have often been disregarded , perhaps because it seemed more important to devise methods to grasp the <term> global meaning </term> of a <term> sentence </term> , even if not in a precise way .