ACL RD-TEC 1.0 Summarization of W06-1109
Paper Title:
STUDY OF SOME DISTANCE MEASURES FOR LANGUAGE AND ENCODING IDENTIFICATION
Primarily assigned technology terms:
- add-k smoothing
- algorithm
- categorization
- computational linguistics
- em algorithm
- encoding
- graphical user interface
- internet
- java
- language processing
- matching
- measuring
- morphology
- natural language processing
- nlp
- partial matching
- processing
- pruning
- reading
- sampling
- smoothing
- statistical method
- tagging
- text categorization
- text encoding
- translator
- translators
- user interface
Other assigned terms:
- alphabet
- approach
- association for computational linguistics
- bigram
- case
- characters
- cross entropy
- data sets
- distance measure
- distribution
- document
- entropy
- esperanto
- fact
- heuristics
- hindi
- implementation
- knowledge
- kullback-leibler distance
- labeling
- language model
- language models
- likelihood
- linguistic
- linguistics
- mappings
- measure
- measures
- meta-data
- method
- multilingual text
- mutual information
- n-gram
- n-gram model
- n-gram score
- n-grams
- natural language
- nlp applications
- noise
- probabilities
- probability
- sentences
- similarity measure
- similarity measures
- similarity score
- statistical approach
- statistics
- terms
- test data
- text
- training
- training data
- training size
- trigram
- user
- web page
- web pages
- west european languages
- word
- word morphology
- word n-gram model
- words