At MIT Lincoln Laboratory , we have been developing a <term> Korean-to-English machine translation system </term> , <term> CCLINC ( Common Coalition Language System at Lincoln Laboratory ) </term> .
We have built and will demonstrate an application of this approach called <term> LCS-Marine </term> .
We have demonstrated this capability in several field exercises with the Marines and are currently developing applications of this <term> technology </term> in <term> new domains </term> .
Recent advances in <term> Automatic Speech Recognition technology </term> have put the goal of naturally sounding <term> dialog systems </term> within reach .
<term> Techniques for automatically training </term> modules of a <term> natural language generator </term> have recently been proposed , but a fundamental concern is whether the <term> quality </term> of <term> utterances </term> produced with <term> trainable components </term> can compete with <term> hand-crafted template-based or rule-based approaches </term> .
Surprisingly , learning <term> phrases </term> longer than three <term> words </term> and learning <term> phrases </term> from <term> high-accuracy word-level alignment models </term> does not have a strong impact on performance .
<term> FSM </term> provides two strategies for <term> language understanding </term> and has high accuracy but little robustness and flexibility .
Experimental results have shown that a <term> system </term> that exploits the proposed <term> method </term> performs sufficiently well and that holding multiple <term> candidates </term> for <term> understanding </term> results is effective .
On a subset of the most difficult <term> SENSEVAL-2 nouns </term> , the <term> accuracy </term> difference between the two approaches is only 14.0 % , and the difference could narrow further to 6.5 % if we disregard the advantage that <term> manually sense-tagged data </term> have in their <term> sense coverage </term> .
The results show that the <term> features </term> in terms of which we formulate our <term> heuristic principles </term> have significant <term> predictive power </term> , and that <term> rules </term> that closely resemble our <term> Horn clauses </term> can be learnt automatically from these <term> features </term> .
Along the way , we present the first comprehensive comparison of <term> unsupervised methods for part-of-speech tagging </term> , noting that published results to date have not been comparable across <term> corpora </term> or <term> lexicons </term> .
This paper investigates some <term> computational problems </term> associated with <term> probabilistic translation models </term> that have recently been adopted in the literature on <term> machine translation </term> .
This tends to support the view that despite recent speculative claims to the contrary , current <term> SMT models </term> do have limitations in comparison with dedicated <term> WSD models </term> , and that <term> SMT </term> should benefit from the better predictions made by the <term> WSD models </term> .
Over the last few years dramatic improvements have been made , and a number of comparative evaluations have shown that <term> SMT </term> gives results competitive with <term> rule-based translation systems </term> while requiring significantly less development time .
In this paper we study a set of problems that are of considerable importance to <term> Statistical Machine Translation ( SMT ) </term> but which have not been addressed satisfactorily by the <term> SMT research community </term> .
Over the last decade , a variety of <term> SMT algorithms </term> have been built and empirically tested whereas little is known about the <term> computational complexity </term> of some of the fundamental problems of <term> SMT </term> .
We first apply approaches that have been proposed for <term> predicting top-level topic shifts </term> to the problem of <term> identifying subtopic boundaries </term> .
We also find that the <term> transcription errors </term> inevitable in <term> ASR output </term> have a negative impact on models that combine <term> lexical-cohesion and conversational features </term> , but do not change the general preference of approach for the two tasks .
Finally , we have shown that these results can be improved by training on a larger and more homogeneous <term> corpus </term> , that is , a larger <term> corpus </term> written by a single <term> author </term> .