ACL RD-TEC 1.0 Summarization of W05-0708
Paper Title:
POS TAGGING OF DIALECTAL ARABIC: A MINIMALLY SUPERVISED APPROACH
Authors: Kevin Duh and Katrin Kirchhoff
Primarily assigned technology terms:
- algorithm
- analyzer
- clustering
- clustering algorithm
- contextual model interpolation
- data generation
- data sharing
- decoding
- distribution-based clustering
- eca tagger
- expectation-maximization
- expectation-maximization algorithm
- frequency counting
- graphical model
- hidden markov
- hidden markov model
- hmm tagger
- joint training
- language modeling
- language processing
- language processing technology
- likelihood estimate
- linear interpolation
- markov model
- matching
- maximum likelihood
- model interpolation
- modeling
- morphological analysis
- morphological analyzer
- morphological analyzers
- naive bayes
- natural language processing
- nlp
- nlp system
- part-of-speech tagger
- pos tagging
- preprocessing
- processing
- processing technology
- root-based clustering
- scoring
- smoothing
- spelling
- splitting
- statistical modeling
- stemmer
- supervised training
- tagger
- taggers
- tagging
- tokenization
- training method
- training process
- trigram tagger
- unsupervised tagging
- unsupervised training
- viterbi
- viterbi algorithm
- weighting
Other assigned terms:
- affix
- affixation
- affixes
- ambiguous words
- annotation
- approach
- bias
- bilingual dictionary
- case
- chinese/english corpora
- cluster
- clusters
- conditional probability
- contextual model
- corpora
- corpus frequency
- corpus size
- data set
- data sets
- data sparseness
- development set
- dialectal speech
- dictionary
- distribution
- evaluation set
- fact
- generation
- generation process
- hmm model
- interpolation
- joint probability
- knowledge
- language corpora
- lexical model
- lexicon
- lexicon entry
- likelihood
- mapping
- maximum likelihood estimate
- method
- modern standard arabic
- msa treebank
- mutual information
- n-gram
- n-grams
- natural language
- noise
- notational simplicity
- noun phrase
- opinions
- parallel corpora
- part-of-speech
- particle
- phrase
- prepositions
- probabilistic model
- probabilistic models
- probabilities
- probability
- probability distribution
- probability distributions
- process
- punctuation
- relative frequency
- russian
- sentence
- spoken language
- spoken language corpora
- standard arabic
- stem
- stems
- substring
- suffix
- tag sequence
- tagger lexicon
- tagging accuracy
- tagging performance
- tags
- tagset
- technique
- technology
- terms
- text
- tokens
- toolkit
- topics
- training
- training corpus
- training data
- training set
- transcriptions
- transcripts
- treebank
- treebank corpus
- trigram
- trigram model
- unannotated corpora
- verb
- vocabulary
- word
- word alignments
- word features
- word fragments
- word order
- word sequences
- word types
- words