ACL RD-TEC 1.0 Summarization of J03-3002
Paper Title:
THE WEB AS A PARALLEL CORPUS
Authors: Philip Resnik and Noah A. Smith
Primarily assigned technology terms:
- algorithm
- alignment algorithm
- alignment process
- analysis tool
- approximation
- automatic language identification
- automatic lexical acquisition
- bipartite matching
- bootstrapping
- browser
- candidate generation
- candidate pair classification
- chunk alignment
- classification
- classifier
- classifier construction
- classifiers
- computational linguistics
- computing
- crawling
- cross-language information retrieval
- cross-language ir
- cross-lingual information retrieval
- cross-validation
- database
- databases
- decision tree
- decision tree classifier
- decision tree induction
- decision tree learner
- decision trees
- document translation
- dynamic programming
- dynamic programming algorithm
- dynamic programming technique
- english machine translation
- expectation-maximization
- greedy approximation
- identification
- induction
- information retrieval
- interactive cross-language ir
- internet
- language identification
- language processing
- learner
- learning
- lemmatizer
- lexical acquisition
- lexical matching
- lexicon induction
- linear regression
- linking
- machine translation
- machine-learning
- matching
- matching algorithm
- maximum weighted bipartite matching
- mining
- morphological analysis
- multilingual natural language processing
- n-gram language identification
- natural language processing
- parallelization
- parameter estimation
- pattern matching
- pearson correlation
- preprocessing
- processing
- programming algorithm
- programming technique
- querying
- random selection
- rating
- recognition
- reduction strategy
- regression
- regular expression
- sample selection
- sampling
- scoring
- search
- search engine
- search engines
- sentence alignment
- single classifier
- statistical mt
- statistical text-based sentence alignment
- statistical translation
- strand classifier
- string matching
- structural alignment
- structural matching
- structural translation
- structural translation recognition
- supervised learning
- supervised training
- text alignment
- text retrieval
- text search
- text-based sentence alignment
- thresholding
- tokenization
- tokenizer
- training procedure
- translation detection
- translation lexicon induction
- translators
- tree classifier
- tree induction
- tuning
- wayback machine
- web crawling
- web interface
- web mining
- web search
- web-mining
- weighted bipartite matching
- weighting
- word-to-word translation
- world wide web
Other assigned terms:
- abbreviations
- agreement score
- anchor
- anchors
- annotation
- annotator
- annotators
- approach
- arabic text
- arabic-english parallel corpus
- association for computational linguistics
- attribute-value pairs
- axioms
- basque
- bilingual dictionary
- bilingual lexicon
- bilingual lexicons
- bilingual text
- bitext
- case
- characters
- chunk
- chunks
- classification performance
- classification task
- cluster
- co-occurrence
- coefficient
- community
- computational complexity
- computational linguists
- confidence scores
- content words
- corpora
- corpus size
- correlation
- correlation coefficient
- cross-validation experiment
- data consortium
- data set
- development set
- dictionaries
- dictionary
- disk
- distribution
- document
- document collections
- document frequency
- document length
- document set
- document structure
- dutch
- edit distance
- english-chinese corpus
- english-chinese parallel corpus
- estimation
- evaluation measures
- evaluations
- events
- f measure
- f score
- f-measure
- fact
- feature
- french
- french translation
- generation
- generation process
- generation system
- genre
- gold standard
- heuristic
- html document
- human annotators
- human judgment
- human judgments
- implementation
- index
- information sources
- information theory
- internet archive
- inverse document frequency
- joint probability
- knowledge
- language pair
- language pairs
- language resources
- language-dependent knowledge
- lemma
- lexical translation
- lexical word
- lexicon
- lexicon entries
- linear regression model
- linguistic
- linguistic data
- linguistic data consortium
- linguistic knowledge
- linguistic resources
- linguistics
- linguists
- machine translation output
- mapping
- markup
- matching process
- mean average precision
- meaning
- measure
- measures
- mechanisms
- method
- multilingual corpus
- multinomial distribution
- mutual information
- n-gram
- named entities
- names
- natural language
- noisy translation lexicon
- paragraph
- parallel corpora
- parallel corpus
- parallel text
- parallel texts
- pearson correlation coefficient
- precision
- prefixes and suffixes
- probabilistic model
- probabilities
- probability
- probability distribution
- procedure
- process
- projection
- punctuation
- queries
- random order
- random sample
- regression model
- relation
- representations
- search space
- seed
- semantic
- semantic network
- sentence
- sentence level
- sentences
- similarity measure
- similarity score
- size of the corpus
- suffixes
- tags
- technique
- terms
- test collection
- test set
- text
- text length
- theory
- tokens
- training
- training data
- training material
- translation lexicon
- translation model
- translation models
- translation output
- translation pair
- translation pairs
- translation probabilities
- translation quality
- translational equivalence
- translations
- tree
- tree structures
- trees
- vertex
- vocabulary
- vocabulary size
- web page
- web pages
- web site
- web-based document
- weighted edit distance
- word
- word frequency
- word order
- word pair
- word-to-word translation model
- wordnet
- words