other,31-1-H01-1040,bq In this paper we show how two standard outputs from <term> information extraction ( IE ) systems </term> - <term> named entity annotations </term> and <term> scenario templates </term> - can be used to enhance access to <term> text collections </term> via a standard <term> text browser </term> .
tech,36-1-H01-1040,bq In this paper we show how two standard outputs from <term> information extraction ( IE ) systems </term> - <term> named entity annotations </term> and <term> scenario templates </term> - can be used to enhance access to <term> text collections </term> via a standard <term> text browser </term> .
tech,38-3-H01-1040,bq We also report results of a preliminary , <term> qualitative user evaluation </term> of the <term> system </term> , which while broadly positive indicates further work needs to be done on the <term> interface </term> to make <term> users </term> aware of the increased potential of <term> IE-enhanced text browsers </term> .
other,11-7-H01-1042,bq Subjects were given a set of up to six extracts of <term> translated newswire text </term> .
other,20-2-P01-1008,bq We present an <term> unsupervised learning algorithm </term> for <term> identification of paraphrases </term> from a <term> corpus of multiple English translations </term> of the same <term> source text </term> .
other,31-1-N03-1018,bq In this paper , we introduce a <term> generative probabilistic optical character recognition ( OCR ) model </term> that describes an end-to-end process in the <term> noisy channel framework </term> , progressing from generation of <term> true text </term> through its transformation into the <term> noisy output </term> of an <term> OCR system </term> .
other,37-3-N03-1018,bq We present an implementation of the <term> model </term> based on <term> finite-state models </term> , demonstrate the <term> model </term> 's ability to significantly reduce <term> character and word error rate </term> , and provide evaluation results involving <term> automatic extraction </term> of <term> translation lexicons </term> from <term> printed text </term> .
other,13-2-N03-2003,bq In this paper , we show how <term> training data </term> can be supplemented with <term> text </term> from the <term> web </term> filtered to match the <term> style </term> and/or <term> topic </term> of the target <term> recognition task </term> , but also that it is possible to get bigger performance gains from the <term> data </term> by using <term> class-dependent interpolation </term> of <term> N-grams </term> .
other,24-1-N03-4010,bq The <term> JAVELIN system </term> integrates a flexible , <term> planning-based architecture </term> with a variety of <term> language processing modules </term> to provide an <term> open-domain question answering capability </term> on <term> free text </term> .
lr,19-2-N03-4010,bq The demonstration will focus on how <term> JAVELIN </term> processes <term> questions </term> and retrieves the most likely <term> answer candidates </term> from the given <term> text corpus </term> .
lr,1-3-P03-1050,bq No <term> parallel text </term> is needed after the <term> training phase </term> .
lr,0-4-P03-1050,bq <term> Monolingual , unannotated text </term> can be used to further improve the <term> stemmer </term> by allowing it to adapt to a desired <term> domain </term> or <term> genre </term> .
lr,26-6-P03-1050,bq Our <term> resource-frugal approach </term> results in 87.5 % <term> agreement </term> with a state of the art , proprietary <term> Arabic stemmer </term> built using <term> rules </term> , <term> affix lists </term> , and <term> human annotated text </term> , in addition to an <term> unsupervised component </term> .
other,16-7-P03-1050,bq <term> Task-based evaluation </term> using <term> Arabic information retrieval </term> indicates an improvement of 22-38 % in <term> average precision </term> over <term> unstemmed text </term> , and 96 % of the performance of the proprietary <term> stemmer </term> above .
tech,3-1-C04-1116,bq We present a <term> text mining method </term> for finding <term> synonymous expressions </term> based on the <term> distributional hypothesis </term> in a set of coherent <term> corpora </term> .
This paper proposes a new methodology to improve the <term> accuracy </term> of a <term> term aggregation system </term> using each author 's text as a coherent <term> corpus </term> .
other,26-4-P04-2005,bq Our method takes advantage of the different way in which <term> word senses </term> are lexicalised in <term> English </term> and <term> Chinese </term> , and also exploits the large amount of <term> Chinese text </term> available in <term> corpora </term> and on the <term> Web </term> .
lr,11-4-P04-2010,bq Furthermore , we present a standalone system that resolves <term> pronouns </term> in <term> unannotated text </term> by using a fully automatic sequence of <term> preprocessing modules </term> that mimics the manual <term> annotation process </term> .
tech,24-5-P04-2010,bq Although the system performs well within a limited textual domain , further research is needed to make it effective for <term> open-domain question answering </term> and <term> text summarisation </term> .
other,35-1-I05-4010,bq In this paper we present our recent work on harvesting <term> English-Chinese bitexts </term> of the laws of Hong Kong from the <term> Web </term> and aligning them to the <term> subparagraph </term> level via utilizing the <term> numbering system </term> in the <term> legal text hierarchy </term> .