ACL RD-TEC 1.0 Summarization of I05-2039
Paper Title:
THE INFLUENCE OF DATA HOMOGENEITY ON NLP SYSTEM PERFORMANCE
Primarily assigned technology terms:
Other assigned terms:
- approach
- arithmetic mean
- array
- bleu
- case
- characters
- coefficient
- community
- contemporary english
- conversation
- corpora
- correlation
- data homogeneity
- dialogues
- distribution
- edit distance
- error rate
- estimation
- evaluation measures
- fact
- frequency counts
- geometric mean
- grammars
- japanese language
- japanese sentences
- knowledge
- language model
- language model perplexity
- language models
- large corpora
- large corpus
- lexeme
- lexemes
- measure
- measures
- method
- model complexity
- model perplexity
- mt quality
- multilingual corpus
- n-gram
- n-gram model
- nist
- nlp community
- objective translation
- perplexity
- probabilistic models
- reference translation
- reference translations
- semantic
- sentence
- sentences
- signal
- similarity scores
- speech database
- standard deviation
- style
- sublanguage
- system performance
- target language
- terms
- test corpus
- test set
- text
- training
- training corpus
- training data
- transcriptions
- transcripts
- translation quality
- translations
- trees
- user
- word
- word error rate
- word frequency
- words