ACL RD-TEC 1.0 Summarization of H01-1035
Paper Title:
INDUCING MULTILINGUAL TEXT ANALYSIS TOOLS VIA ROBUST PROJECTION ACROSS ALIGNED CORPORA
Authors: D. Yarowsky and G. Ngai and R. Wicentowski
Primarily assigned technology terms:
- 5-fold cross-validation
- algorithm
- analysis technique
- analyzer
- automatic word alignment
- bigram training
- bilingual parsing
- boosting
- bootstrapping
- bracketing
- classification
- clustering
- concurrent parsing
- coupled parsing
- cross-language resource projection
- cross-validation
- direct projection
- em bootstrapping
- entity classification
- entity tagger
- error reduction
- giza
- induction
- information retrieval
- joint parsing
- language generation
- learning
- learning algorithms
- learning system
- lemmatization
- lemmatizer
- lexicalized learning
- linking
- model training
- modeling
- modeling technique
- morphological analysis
- morphological analyzer
- morphological analyzers
- morphology
- morphology induction
- multilingual projection
- n-gram modeling
- named entity tagger
- named-entity classification
- noise reduction
- noise-robust tagger
- noun-phrase bracketing
- np bracketing
- parsing
- part-of-speech tagger
- part-of-speech tagging
- phrasal alignment
- pos tagger
- pos tagging
- pos-tagging
- re-estimation
- segmentation
- signal amplification
- smoothing
- spelling
- statistical mt
- string transformation
- string-transduction-based morphology induction
- supervised learning
- tagger
- tagger training
- taggers
- tagging
- target language generation
- text analysis
- tool development
- training algorithm
- transduction
- transformation-based learning
- translation model training
- trie-based modeling
- verb mapping
- weighting
- word alignment
- word reordering
- word segmentation
- word-alignment
Other assigned terms:
- adjective
- adverb
- affix
- aligned corpus
- ambiguity
- annotated corpora
- annotation
- annotation projection
- approach
- backoff
- backoff model
- basenp
- bias
- bigram
- bigram model
- bilingual corpora
- bilingual corpus
- bilingual text
- bilingual text corpora
- bitext
- brown corpus
- case
- characters
- chunk
- classification accuracy
- cluster
- cohesion
- collocation
- context similarity
- corpora
- cross-language resource
- czech morphology
- data set
- data sets
- dictionary
- direct transfer
- distribution
- english noun phrase
- english translation
- entity class
- error rate
- evaluation data
- evaluation set
- f-measure
- fact
- foreign language
- formalism
- free word order
- french
- generation
- genre
- grammars
- human annotation
- inflection
- knowledge
- lemma
- lexical choice
- linguistic
- linguistic annotation
- mapping
- measure
- methodology
- model performance
- model probability
- model size
- mt formalism
- n-gram
- named entity
- named-entity
- noise
- noun phrase
- noun phrases
- nouns
- parallel corpora
- parallel corpus
- parallel text
- parallelism
- part-of-speech
- parts of speech
- parts-of-speech
- phrase
- pos tag
- precision
- probabilities
- probability
- process
- projection
- pronoun
- relative frequency
- reordering
- seed
- seed words
- sentence
- sentences
- sequence model
- signal
- source language
- stem
- suffix
- system evaluation
- tag sequence
- tag set
- tags
- tagset
- target language
- technique
- terms
- test data
- test data set
- test set
- text
- text corpora
- theory
- tokens
- training
- training data
- training set
- transformation
- translation model
- translations
- verb
- verbal inflection
- vocabulary
- word
- word alignments
- word order
- word type
- words