ACL RD-TEC 1.0 Summarization of N04-1016
Paper Title:
THE WEB AS A BASELINE: EVALUATING THE PERFORMANCE OF UNSUPERVISED WEB-BASED MODELS FOR A RANGE OF NLP TASKS
Authors: Mirella Lapata and Frank Keller
Primarily assigned technology terms:
- algorithm
- bracketing
- candidate selection
- classification
- classifier
- classifiers
- compiler
- compound noun interpretation
- context-sensitive spelling
- context-sensitive spelling correction
- countability learning
- disambiguation
- ensemble learning
- expectation maximization
- expectation maximization algorithm
- gsearch
- language generation
- language translation
- latent semantic analysis
- learning
- learning methods
- learning task
- machine learning
- machine learning methods
- machine translation
- matching
- maximization algorithm
- memory-based classifier
- memory-based learning
- natural language generation
- nlp
- noun countability detection
- noun interpretation
- noun paraphrasing
- parameter setting
- paraphrasing
- parsing
- pattern matching
- query expansion
- right-branching
- search
- search engine
- semantic analysis
- spelling
- spelling correction
- statistical approaches
- supervised method
- syntactic disambiguation
- target word selection
- transformation-based learning
- unsupervised method
- word selection
Other assigned terms:
- adjective
- altavista model
- ambiguity
- approach
- backoff
- backoff model
- bag of words
- baseline model
- bigram
- bigram model
- bilingual lexica
- bilingual sentence
- british english
- brown corpus
- case
- characters
- co-occurrence
- co-occurrence frequency
- comlex
- compound noun
- compounds
- concept
- concepts
- conditional probability
- context words
- corpora
- countability
- data set
- data sets
- data sparseness
- data sparseness problem
- dependency model
- determiner
- determiners
- development set
- dictionary
- dictionary entries
- disk
- encyclopedia
- estimation
- fact
- feature
- feature set
- generation
- generation task
- gold standard
- head noun
- heuristic
- heuristics
- hypothesis
- inflected forms
- interpretation
- japanese-to-english semantic transfer dictionary
- language corpora
- large corpus
- latent semantic
- lexica
- lexical entries
- lexicon
- likelihood
- linguistic
- linguistic information
- message
- method
- mixture models
- model parameters
- modifier
- morphological information
- n-gram
- n-grams
- natural language
- nlp tasks
- noise
- noun countability
- nouns
- opinion
- paraphrases
- part-of-speech
- part-of-speech tags
- parts of speech
- pos information
- preposition
- prepositions
- probabilistic model
- probabilities
- probability
- probability estimates
- queries
- query
- random sample
- relation
- relative frequency
- search term
- semantic
- semantic and pragmatic
- semantic information
- semantic interpretation
- semantic relations
- semantic transfer
- semantic transfer dictionary
- sentence
- source language
- sparseness problem
- spelling error
- statistical information
- subject-verb agreement
- supervised model
- syntactic relations
- syntax
- syntax and semantics
- tags
- target language
- target word
- taxonomy
- term
- terms
- test corpus
- test data
- test set
- text
- thesaurus
- tokens
- training
- training data
- transfer dictionary
- transitive closure
- translation accuracy
- translation candidates
- translations
- trigram
- trigram model
- unigram
- unigram model
- verb
- wall street journal corpus
- word
- word corpus
- word features
- word sequence
- word window
- word-based model
- words