ACL RD-TEC 1.0 Summarization of P05-1064
Paper Title:
A PHONOTACTIC LANGUAGE MODEL FOR SPOKEN LANGUAGE IDENTIFICATION
Authors: Haizhou Li and Bin Ma
Primarily assigned technology terms:
- acoustic modeling
- algorithm
- categorization
- classification
- classifier
- classifiers
- computing
- database
- databases
- decoding
- decomposition
- digital signal processing
- dimension reduction
- document categorization
- document representation
- error rate reduction
- hidden markov
- hidden markov models
- identification
- information retrieval
- k-nearest-neighbor
- language identification
- language modeling
- language processing
- language recognition
- latent semantic analysis
- modeling
- morphology
- n-gram language modeling
- natural language processing
- normalization
- phone recognition
- phoneme recognition
- processing
- rate reduction
- recognition
- recognizer
- semantic analysis
- signal processing
- singular value decomposition
- smoothing
- spectral analysis
- speech production
- speech recognition
- speech recognizer
- spoken document categorization
- spoken language identification
- statistical language modeling
- table look-up
- text categorization
- text-categorization
- tokenization
- tokenizer
- vector quantization
- vector space modeling
- viterbi
- viterbi algorithm
- weighting
Other assigned terms:
- acoustic model
- acoustic models
- acoustic score
- alphabet
- analogy
- approach
- bigram
- categorization problem
- channel noise
- characters
- chinese characters
- co-occurrence
- co-occurrences
- concept
- conditional probability
- conditional probability distribution
- conversation
- conversational telephone speech
- cyrillic script
- dependent vocabulary
- development set
- distance measure
- distance metric
- distribution
- document
- document vector
- document vectors
- error rate
- evaluation data
- evaluation set
- evaluations
- fact
- feature
- finite mixture model
- formalism
- frame
- french
- function words
- hindi
- histogram
- language model
- language models
- latent semantic
- latin alphabet
- lexical word
- linear combination
- linguistic
- markov models
- measure
- methodology
- model size
- n-gram
- n-gram language model
- n-grams
- natural language
- nist
- noise
- norm
- phoneme
- phonemes
- phonotactic distance
- phonotactic information
- probability
- probability distribution
- procedure
- projection
- prosody
- recognition evaluation
- semantic
- semantic domain
- signal
- spoken language
- statistical framework
- statistics
- suffixes
- syntax
- technique
- technology
- term-document matrix
- terms
- text
- text categorization problem
- text documents
- tokens
- training
- training corpus
- training data
- training documents
- training set
- trigram
- unigram
- utterance
- vector space
- vocabulary
- weighting scheme
- word
- words