At MIT Lincoln Laboratory , we have been developing a <term> Korean-to-English machine translation system </term><term> CCLINC ( Common Coalition Language System at Lincoln Laboratory ) </term> .
Having been trained on <term> Korean newspaper articles </term> on missiles and chemical biological warfare , the <term> system </term> produces the <term> translation output </term> sufficient for content understanding of the <term> original document </term> .
The issue of <term> system response </term> to <term> users </term> has been extensively studied by the <term> natural language generation community </term> , though rarely in the context of <term> dialog systems </term> .
The <term> oracle </term> knows the <term> reference word string </term> and selects the <term> word string </term> with the best <term> performance </term> ( typically , <term> word or semantic error rate </term> ) from a list of <term> word strings </term> , where each <term> word string </term> has been obtained by using a different <term> LM </term> .
<term> Techniques for automatically training </term> modules of a <term> natural language generator </term> have recently been proposed , but a fundamental concern is whether the <term> quality </term> of <term> utterances </term> produced with <term> trainable components </term> can compete with <term> hand-crafted template-based or rule-based approaches </term> .
<term> Link detection </term> has been regarded as a core technology for the <term> Topic Detection and Tracking tasks </term> of <term> new event detection </term> .
<term> Dialogue strategies </term> based on <term> user modeling </term> are implemented in the <term> Kyoto city bus information system </term> , which has been developed at our laboratory .
Along the way , we present the first comprehensive comparison of <term> unsupervised methods for part-of-speech tagging </term> , noting that published results to date have not been comparable across <term> corpora </term> or <term> lexicons </term> .
While <term> sentence extraction </term> as an approach to <term> summarization </term> has been shown to work in <term> documents </term> of certain <term> genres </term> , because of the conversational nature of <term> email communication </term> where <term> utterances </term> are made in relation to one made previously , <term> sentence extraction </term> may not capture the necessary <term> segments </term> of <term> dialogue </term> that would make a <term> summary </term> coherent .
This paper investigates some <term> computational problems </term> associated with <term> probabilistic translation models </term> that have recently been adopted in the literature on <term> machine translation </term> .
Much effort has been put in designing and evaluating dedicated <term> word sense disambiguation ( WSD ) models </term> , in particular with the <term> Senseval </term> series of workshops .
Surprisingly however , the <term> WSD </term><term> accuracy </term> of <term> SMT models </term> has never been evaluated and compared with that of the dedicated <term> WSD models </term> .
Over the last few years dramatic improvements have been made , and a number of comparative evaluations have shown that <term> SMT </term> gives results competitive with <term> rule-based translation systems </term> while requiring significantly less development time .
<term> STTK </term> has been developed by the presenter and co-workers over a number of years and is currently used as the basis of <term> CMU 's SMT system </term> .
It has also successfully been coupled with <term> rule-based and example based machine translation modules </term> to build a <term> multi engine machine translation system </term> .
In this paper we study a set of problems that are of considerable importance to <term> Statistical Machine Translation ( SMT ) </term> but which have not been addressed satisfactorily by the <term> SMT research community </term> .
Over the last decade , a variety of <term> SMT algorithms </term> have been built and empirically tested whereas little is known about the <term> computational complexity </term> of some of the fundamental problems of <term> SMT </term> .
The correlation of the new <term> measure </term> with <term> human judgment </term> has been investigated systematically on two different <term> language pairs </term> .
We first apply approaches that have been proposed for <term> predicting top-level topic shifts </term> to the problem of <term> identifying subtopic boundaries </term> .
As evidence of its usefulness and usability , it has been used successfully in a research context to uncover relationships between <term> language </term> and <term> behavioral patterns </term> in two distinct domains : <term> tutorial dialogue </term> ( Kumar et al. , submitted ) and <term> on-line communities </term> ( Arguello et al. , 2006 ) .