ACL RD-TEC 1.0 Summarization of A00-1012
Paper Title:
EXPERIMENTS ON SENTENCE BOUNDARY DETECTION
Authors: Mark Stevenson and Robert Gaizauskas
Primarily assigned technology terms:
- algorithm
- asr system
- automatic speech recognition
- automatic system
- boundary detection
- capitalization
- classification
- complementary evaluation
- computing
- content analysis
- dialogue analysis
- disambiguation
- disambiguation problem
- discourse analysis
- frequency estimation
- information extraction
- information retrieval
- lazy learning
- learning
- learning algorithm
- linguistic analysis
- machine learning
- machine learning algorithm
- maximum entropy
- maximum entropy approach
- memory-based learning
- memory-based learning algorithm
- neural network
- nlp
- nlp technology
- parsers
- parsing
- qualitative evaluation
- recognition
- recognition systems
- sense disambiguation
- sentence boundary detection
- sentence splitting
- speech recognition
- speech recognition systems
- splitting
- tagger
- taggers
- tagging
- transcription
- trigram speech recognition
- word sense disambiguation
Other assigned terms:
- annotation
- annotator
- annotators
- approach
- asr output
- boundary information
- break
- british national corpus
- broadcast news
- brown corpus
- capitalization information
- case
- case information
- characters
- classification task
- classification tasks
- computational approach
- detection task
- disambiguation task
- discourse
- entropy
- error rate
- estimation
- evaluation metrics
- feature
- human annotation
- human annotators
- human performance
- input text
- inter-annotator agreement
- kappa
- kappa statistic
- kappa value
- knowledge
- lexical information
- linguistic
- markup
- measure
- measures
- method
- methodology
- opinion
- part of speech
- part of speech tags
- pause
- penn treebank
- phoneme
- phrase
- pitch
- pre-pausal lengthening
- precision
- priori
- probabilities
- probability
- probability distributions
- process
- prosodic information
- punctuation
- punctuation marks
- recognition model
- sense disambiguation problem
- sentence
- sentence boundaries
- sentence boundary
- sentences
- speech information
- speech tag
- spoken language
- statistic
- statistics
- suffix
- symbol
- symbols
- tags
- technologies
- technology
- television
- terms
- test corpus
- test data
- test set
- text
- tokens
- training
- training corpus
- training example
- training examples
- training text
- transcribed speech
- transcriptions
- transcripts
- tree
- treebank
- trigram
- trigram model
- vocabulary
- wall street journal text
- word
- word boundaries
- word boundary
- word error rate
- word sense
- words