ACL RD-TEC 1.0 Summarization of J03-3005
Paper Title:
USING THE WEB TO OBTAIN FREQUENCIES FOR UNSEEN BIGRAMS
Authors: Frank Keller and Mirella Lapata
Primarily assigned technology terms:
- algorithm
- analyzer
- bracketing
- candidate selection
- chart parser
- chi-square test
- chunking
- class-based probability estimation
- class-based smoothing
- clustering
- computational linguistics
- computing
- context-sensitive spelling
- context-sensitive spelling correction
- correlation analysis
- cross-validation
- databases
- disambiguation
- distance-weighted averaging
- em-based clustering
- em-based smoothing
- estimation method
- example-based machine translation
- expectation maximization
- gsearch
- heuristic method
- hypothesis testing
- internet
- language modeling
- learning
- learning algorithms
- linear interpolation
- machine translation
- magnitude estimation
- measuring
- mining
- modeling
- morphological analyzer
- nlp
- parameter tuning
- parser
- part-of-speech tagger
- pos tagging
- predictor
- probabilistic modeling
- probability estimation
- pseudodisambiguation
- querying
- randomization
- rating
- regression
- resampling
- retrieving
- sampling
- search
- search engine
- search engines
- sense disambiguation
- significance testing
- smoothing
- smoothing method
- smoothing technique
- smoothing techniques
- spelling
- spelling correction
- tagger
- tagging
- tuning
- word sense disambiguation
Other assigned terms:
- adjective
- ambiguity
- anaphora
- annotation
- approach
- association measure
- bias
- bigram
- british english
- british national corpus
- case
- class-based approach
- class-based model
- clustering model
- co-occurrence
- co-occurrence frequency
- co-occurrences
- coefficient
- compounds
- concept
- concept hierarchy
- conditional model
- conditional probabilities
- conditional probability
- conditional probability model
- context-free grammar
- corpora
- corpus evidence
- corpus frequency
- corpus size
- correlation
- correlation coefficient
- correlations
- data set
- data sets
- data sparseness
- data sparseness problem
- development set
- dictionary
- distribution
- distributional similarity
- error rate
- estimation
- evaluations
- fact
- french
- frequency counts
- genre
- grammar
- head noun
- heuristic
- heuristics
- human judgments
- hypothesis
- interpolation
- interpretation
- joint probability
- joint probability model
- language model
- linguistic
- linguistic data
- linguistic phenomenon
- linguistics
- linguistics literature
- manual annotation
- measure
- method
- model parameters
- n-gram
- n-grams
- nantc coefficient
- nlp tasks
- noise
- nominal anaphora
- nouns
- parameter settings
- part of speech
- part-of-speech
- pp attachment
- predicate-argument
- probabilities
- probability
- probability estimates
- probability model
- procedure
- queries
- query
- questionnaire
- random order
- relation
- selectional association
- semantic
- semantic class
- semantic classes
- semantic hierarchy
- semantic similarity
- sense ambiguity
- sentence
- sparse data
- sparseness problem
- spoken language
- statistics
- syntactic patterns
- syntactic relations
- syntax
- tagged corpus
- taxonomy
- technique
- terms
- test data
- test set
- text
- text corpus
- theoretical linguistics
- training
- training corpus
- training data
- training set
- transcripts
- transformation
- translations
- tree
- trigram
- trigram language model
- unigram
- verb
- webexp software package
- word
- word error rate
- word sense
- word sense ambiguity
- word senses
- word sequences
- wordnet
- wordnet taxonomy
- words