ACL RD-TEC 1.0 Summarization of E95-1010
Paper Title:
TEXT ALIGNMENT IN THE REAL WORLD: IMPROVING ALIGNMENTS OF NOISY TRANSLATIONS USING COMMON LEXICAL FEATURES, STRING MATCHING STRATEGIES AND N-GRAM COMPARISONS
Authors: Mark W. Davis, Ted E. Dunning, and William C. Ogden
Primarily assigned technology terms:
- algorithm
- alignment algorithm
- alignment process
- approximation
- automatic alignment
- co-occurrence matching
- data fusion
- deep analysis
- dynamic programming
- feature matching
- hard matching
- heuristic segmentation
- information retrieval
- information retrieval system
- language understanding
- matching
- matching algorithm
- memory management
- multi-lingual information retrieval
- n-gram matching
- number matching
- programming framework
- programming system
- retrieval system
- retrieving
- scoring
- segmentation
- smoothing
- string match
- string matching
- text alignment
- text segmentation
- tokenization
- translation process
- translator
- translators
Other assigned terms:
- abbreviations
- alignment probability
- approach
- bilingual dictionaries
- case
- characters
- chunks
- co-occurrence
- computational overhead
- corpora
- data set
- derivation
- dictionaries
- dictionary
- distribution
- document
- ellipsis
- english language
- english text
- english translations
- fact
- feature
- heuristic
- heuristics
- histogram
- implementation
- information sources
- knowledge
- language expression
- lexical feature
- lexical features
- measure
- measures
- method
- multi-lingual information
- n-gram
- n-gram match
- n-grams
- names
- noise
- norm
- paragraph
- paragraphs
- parallel corpora
- parallel text
- parallel texts
- phrase
- posteriori probability
- priori
- probabilities
- probability
- probability density
- procedure
- process
- proper names
- questionnaire
- segments
- sentence
- sentence boundaries
- sentence boundary
- sentences
- source text
- sources of information
- standard deviation
- statistics
- technical terms
- technique
- term
- terms
- test corpus
- text
- text segments
- training
- training set
- transcriptions
- translations
- understanding
- uniform distribution
- window size
- word
- words