ACL RD-TEC 1.0 Summarization of J93-2004
Paper Title:
BUILDING A LARGE ANNOTATED CORPUS OF ENGLISH: THE PENN TREEBANK
Authors: Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini
Primarily assigned technology terms:
- algorithm
- atis
- automatic construction
- bootstrap
- bracketing
- cd-rom
- classification
- computational linguistics
- deterministic parser
- emacs
- information system
- language processing
- language understanding
- learning
- matching
- message understanding
- modelling
- natural language processing
- nlp
- parser
- parsing
- part-of-speech tagging
- pattern matching
- pos tagging
- processing
- recognition
- search
- searching
- speech recognition
- spoken language systems
- spoken language understanding
- stochastic parsing
- syntactic analysis
- syntactic annotation
- syntactic parser
- taggers
- tagging
- tagging process
- text understanding
Other assigned terms:
- adjective
- adverb
- ambiguity
- ambiguous words
- american english
- annotated corpora
- annotated corpus
- annotation
- annotation scheme
- annotation schemes
- annotation task
- annotator
- annotators
- approach
- association for computational linguistics
- attachment site
- benchmark
- brown corpus
- case
- chunk
- chunks
- composition
- contextual information
- corpora
- data consortium
- determiner
- device
- distribution
- emacs editor
- error rate
- fact
- feature
- foreign word
- genre
- grammar
- grammars
- grammatical coverage
- grammatical structure
- human annotators
- intention
- labeling
- large corpora
- lexical item
- lexical redundancy
- lexicon
- linguistic
- linguistic context
- linguistic data
- linguistic data consortium
- linguistic theory
- linguistics
- lisp
- main verb
- manual tagging
- mapping
- measure
- measures
- mechanisms
- message
- message understanding conference
- methodology
- modifier
- morpheme
- morphemes
- muc-3
- natural language
- noun phrase
- noun phrases
- nouns
- parallel corpus
- parse
- parse tree
- parsed corpus
- parser output
- parsing models
- part of speech
- part-of-speech
- particle
- particles
- past participle
- penn treebank
- penn treebank corpus
- penn treebank project
- penn treebank tagset
- permutation
- personal pronoun
- personal pronouns
- phrase
- pos information
- pos tag
- pragmatic information
- predeterminer
- predicate-argument
- predicate-argument structure
- predicate-argument structures
- predicates
- preposition
- prepositional phrase
- prepositional phrases
- prepositions
- preprocessor
- process
- pronoun
- pronouns
- proper noun
- punctuation
- reflexive pronoun
- relative clauses
- representations
- sbar
- semantic
- sentence
- sentences
- signal
- skeletal syntactic structure
- sparse data
- spoken language
- statistical models
- subcorpus
- subordinate clauses
- symbol
- symbols
- syntactic categories
- syntactic category
- syntactic context
- syntactic function
- syntactic information
- syntactic representation
- syntactic structure
- tagged corpus
- tagged text
- tagging task
- tags
- tagset
- text
- theoretical linguistics
- theories
- theory
- tokens
- topics
- training
- training material
- transcripts
- transitivity
- translations
- tree
- tree structure
- treebank
- treebank corpus
- treebank project
- trees
- unannotated text
- understanding
- verb
- verb lexicon
- vocabulary
- wh-determiner
- word
- words