ACL RD-TEC 1.0 Summarization of W06-1705
Paper Title:
ANNOTATED WEB AS CORPUS
Authors: Paul Rayson and James Walkerdine and William H. Fletcher and Adam Kilgarriff
Primarily assigned technology terms:
- adobe pdf
- annotation system
- automated tagging
- batch processing
- caching
- character encoding
- chart parser
- coding
- comparative evaluation
- computational linguistics
- computing
- corpus annotation
- corpus collection
- corpus linguistics
- corpus processing
- crawler
- database
- development environment
- digital library
- disambiguation
- distributed computation
- distributed corpus annotation
- distributed processing
- encoding
- frequency profiling
- gsearch
- instant messaging
- instant messenger
- instantiation
- internet
- internet search
- language engineering
- language processing
- learning
- lemmatisation
- linguistic analysis
- load balancing
- messaging
- morphology
- multivalent pdf extracttext
- natural language processing
- network computing
- nlp
- object-oriented
- p2p application framework
- parallel processing
- parser
- parsers
- parsing
- part-of-speech tagging
- processing
- processing tools
- processor
- search
- search engine
- search engines
- searching
- sense disambiguation
- sense tagging
- shallow parsing
- statistical natural language processing
- taggers
- tagging
- word sense disambiguation
- word sense tagging
Other assigned terms:
- academic writing
- annotated corpora
- annotated corpus
- annotation
- annotation framework
- annotation process
- annotator
- approach
- british national corpus
- case
- characters
- co-occurrence
- community
- computing infrastructure
- corpora
- corpus size
- data sets
- dictionaries
- dictionary
- dictionary entries
- discourse
- distribution
- document
- english dictionary
- frequency counts
- gold-standard sub-corpus
- grid
- hypothesis
- implementation
- intention
- internet archive
- lexical frequency
- lexicography
- linguist
- linguistic
- linguistic annotation
- linguistics
- linguistics research
- linguists
- mark-up
- mechanisms
- metadata
- method
- methodology
- n-grams
- natural language
- parse
- part-of-speech
- part-of-speech tags
- parts-of-speech
- phrase
- probabilities
- process
- representative corpora
- seed
- sentences
- server
- sparse data
- sparse data problem
- statistical natural language
- syntactic level
- syntax
- system development
- tags
- teaching
- technique
- technology
- terms
- text
- thesaurus
- trees
- user
- wall street journal corpus
- web corpus
- web pages
- word
- word co-occurrence
- word corpus
- word frequencies
- word frequency
- word sense
- word types
- words