ACL RD-TEC 1.0 Summarization of P05-1071
Paper Title:
ARABIC TOKENIZATION, PART-OF-SPEECH TAGGING AND MORPHOLOGICAL DISAMBIGUATION IN ONE FELL SWOOP
Authors: Nizar Habash and Owen Rambow
Primarily assigned technology terms:
- algorithm
- analyzer
- arabic morphological analyzer
- chunking
- classification
- classifier
- classifiers
- corpus-based evaluation
- data preparation
- data representation
- databases
- decoding
- disambiguation
- error reduction
- identification
- learner
- learning
- machine learning
- matching
- morphological analysis
- morphological analyzer
- morphological analyzers
- morphological disambiguation
- morphological generator
- parsing
- part-of-speech tagging
- phrase chunking
- pos tagging
- processing
- root identification
- rule-based classifier
- segmentation
- splitting
- stem identification
- support vector machines
- tagger
- tagging
- tokenization
- unsupervised identification
- unsupervised learning
- unsupervised segmentation
- viterbi
- viterbi decoding
- word tokenization
Other assigned terms:
- affixation
- affixes
- ambiguity
- annotation
- approach
- arabic orthography
- arabic treebank
- backoff
- binary feature
- buckwalter lexicon
- case
- confidence measure
- confidence score
- corpora
- data consortium
- dictionary
- english penn treebank
- exponential model
- f-measure
- fact
- feature
- feature sets
- generation
- gold standard
- heuristics
- implementation
- inflection
- interpretation
- knowledge
- large corpus
- lexicon
- linguistic
- linguistic data
- linguistic data consortium
- linguistic features
- linguistic knowledge
- meaning
- measure
- measures
- mood
- morphological features
- morphological variation
- nouns
- nunation
- orthography
- part-of-speech
- part-of-speech tag
- particles
- parts-of-speech
- penn treebank
- phrase
- pos tag
- precision
- prefixes and suffixes
- prepositions
- process
- pronouns
- punctuation
- relation
- representations
- run-time
- sentence
- stem
- stems
- suffix
- suffixes
- support vector
- symbols
- tag set
- tagging accuracy
- tags
- tagset
- term
- terms
- test corpora
- test corpus
- text
- tokens
- training
- training corpora
- training corpus
- training data
- treebank
- trigram
- trigram model
- unannotated corpus
- unigram
- verb
- word
- word classes
- word form
- word stem
- words