ACL RD-TEC 1.0 Summarization of A97-1034
Paper Title:
USING SGML AS A BASIS FOR DATA-INTENSIVE NLP
Authors: David McKelvie and Chris Brew and Henry Thompson
Primarily assigned technology terms:
- binding
- corpus annotation
- corpus development
- corpus preparation
- corpus processing
- database
- disambiguation
- editing
- encoding
- error detection
- extraction system
- gate system
- hyperlinking
- identification
- indexing
- information extraction
- information extraction system
- interfaces
- language engineering
- language processing
- linguistic analysis
- linking
- markup language
- matching
- natural language processing
- nlp
- nlp system
- non-sgml database
- object-oriented
- parser
- parsers
- parsing
- part of speech tagging
- part-of-speech tagger
- pre-processing
- processing
- processing tools
- program interfaces
- programming language
- query language
- reading
- regular expression
- search
- search engine
- searching
- segmentation
- sequential corpus processing
- sgml parser
- software development
- solaris
- speech tagger
- speech tagging
- tagger
- tagging
- text editor
- tokenisation
- user interface
- validation
- version control
- word segmentation
Other assigned terms:
- annotated corpora
- annotated corpus
- annotation
- annotation scheme
- approach
- british national corpus
- case
- concept
- corpora
- data structures
- determiners
- disk
- distribution
- document
- document structure
- document type definition
- fact
- french
- generalisation
- generation
- hierarchical structure
- implementation
- index
- indexing scheme
- innovation corpus
- intention
- interoperability
- large corpora
- large scale corpora
- large text corpora
- lexicography
- lexicon
- linguistic
- linguistic annotation
- linguistic structures
- linguists
- mapping
- maptask corpus
- markup
- mechanisms
- method
- modular architecture
- names
- natural language
- nlp applications
- paragraphs
- part of speech
- part-of-speech
- part-of-speech tags
- penn treebank
- penn treebank tagset
- phrase
- process
- public domain software
- queries
- query
- regular expressions
- search results
- sentence
- sentence boundaries
- sentence boundary
- sentences
- sgml document
- sgml stream
- size of the corpus
- statistics
- structural information
- structured text
- style
- syntactic structures
- syntax
- system architecture
- tags
- tagset
- technology
- terms
- text
- text corpora
- tipster architecture
- transformation
- tree
- tree structures
- treebank
- user
- word
- word alignments
- word level
- words