ACL RD-TEC 1.0 Summarization of E06-1030
Paper Title:
WEB TEXT CORPUS FOR NATURAL LANGUAGE PROCESSING
Authors: Vinci Liu and James R. Curran
Primarily assigned technology terms:
- algorithm
- approximation
- automatic thesaurus extraction
- boundary detection
- breadth-first search
- c++
- candidate selection
- character encoding
- classifier
- context-sensitive spelling
- context-sensitive spelling correction
- disambiguation
- disambiguation problem
- encoding
- entropy classifier
- extraction system
- information retrieval
- language processing
- learner
- linking
- machine translation
- maximum entropy
- maximum entropy classifier
- memory-based learner
- n-gram estimation
- natural language processing
- nlp
- nlp systems
- processing
- querying
- question answering
- question answering system
- random walk
- repair
- sampling
- search
- search engine
- search engines
- semi-supervised algorithm
- sentence boundary detection
- set disambiguation
- spelling
- spelling correction
- supervised method
- text processing
- thesaurus extraction
- tokenisation
- unsupervised method
- web server
- web spider
- winnow method
- world wide web
Other assigned terms:
- adjective
- approach
- bias
- bigram
- british national corpus
- brown corpus
- case
- co-occurrence
- collocation
- community
- context vectors
- context window
- corpora
- corpus size
- correlation
- countability
- data sets
- dictionary
- document
- entropy
- estimation
- evaluation method
- extraction process
- fact
- frame
- frequency counts
- generation
- genre
- gold standard
- human judgement
- hypothesis
- implementation
- linguistic
- linguistic information
- method
- n-gram
- natural language
- nlp applications
- nlp tasks
- noun similarity
- nouns
- paragraphs
- parts of speech
- penn treebank
- penn treebank project
- phrase
- phrase attachment
- pp attachment
- preposition
- prepositional phrase
- prepositional phrase attachment
- process
- punctuation
- punctuation information
- queries
- relation
- running time
- seed
- sentence
- sentence boundaries
- sentence boundary
- sentences
- server
- statistics
- synonym
- synonyms
- tags
- target word
- technique
- term
- terms
- test set
- testing set
- text
- text corpus
- thesaurus
- tokens
- topics
- training
- training data
- training set
- transformation
- translation candidate
- tree
- treebank
- treebank project
- typographical errors
- unigram
- vocabulary
- web corpus
- web documents
- web graph
- web page
- web pages
- web text
- word
- word corpus
- words