ACL RD-TEC 1.0 Summarization of P03-1020
Paper Title:
TRUECASING
Authors: Lucian Vlad Lita and Abe Ittycheriah and Salim Roukos and Nanda Kambhatla
Primarily assigned technology terms:
- algorithm
- approximation
- automatic content extraction
- automatic machine translation
- automatic speech recognition
- bootstrap
- capitalization
- case disambiguation
- case restoration
- character recognition
- chinese-to-english translation
- classification
- computing
- decoding
- disambiguation
- entity recognition
- error reduction
- greedy approach
- hidden markov
- hidden markov model
- language modeling
- language processing
- machine translation
- machine translation evaluation
- machine translation system
- markov model
- maximum entropy
- mention detection
- mention detection task
- modeling
- named entity recognition
- natural language processing
- nlp
- normalization
- optical character recognition
- post-processing
- probability estimation
- processing
- recognition
- rule-based system
- sentence splitting
- speech recognition
- splitting
- surface form restoration
- tagger
- tagging
- text processing
- translation evaluation
- translation evaluation method
- translation system
- viterbi
- viterbi algorithm
- weighting
Other assigned terms:
- ace corpus
- ambiguous word
- approach
- asr output
- baseline score
- beam
- bigram
- bleu
- bleu score
- bleu scores
- boundary information
- broadcast news
- broadcast news data
- case
- case information
- context information
- corpora
- data sets
- detection task
- distribution
- entropy
- estimation
- evaluation method
- evaluations
- f-measure
- fact
- feature
- feature space
- interpolation
- labeling
- language model
- language models
- large training
- lattice
- lexical item
- lexical items
- local context
- machine translation output
- mapping
- meaning
- method
- model parameters
- morphological features
- n-gram
- n-grams
- named entities
- named entity
- names
- natural language
- natural language text
- nist
- nlp tasks
- nouns
- organization names
- perplexity
- person names
- precision
- probabilities
- probability
- procedure
- process
- proper name
- punctuation
- qualitative analysis
- random sample
- semantic
- semantic categories
- sentence
- sentence boundaries
- sentence level
- sentences
- statistical approach
- statistical model
- statistics
- surface form
- system performance
- technique
- terms
- test data
- test set
- text
- text corpora
- text segment
- tokens
- training
- training corpus
- training data
- training examples
- training material
- transformation
- transition probabilities
- translation output
- translations
- trigram
- trigram language model
- unigram
- unigram model
- vocabulary
- weighting scheme
- word
- words