ACL RD-TEC 1.0 Summarization of W97-0304
Paper Title:
TEXT SEGMENTATION USING EXPONENTIAL MODELS
Authors: Doug Beeferman and Adam Berger and John Lafferty
Primarily assigned technology terms:
- algorithm
- cd-rom
- continuous speech recognition
- decision tree
- decision tree algorithm
- decision trees
- detection and tracking
- document summarization
- dynamic time warping
- feature induction
- feature selection
- greedy algorithm
- hypothesizing
- identification
- illustration
- induction
- induction algorithm
- information retrieval
- iterative scaling
- language modeling
- language processing
- learning
- learning methods
- linear interpolation
- machine learning
- machine learning methods
- maximum entropy
- maximum likelihood
- modeling
- monitoring
- natural language processing
- normalization
- paragraph segmentation
- partitioning
- predictor
- processing
- pruning
- recognition
- reporting
- search
- segmentation
- segmentation algorithm
- segmenter
- smoothing
- speech processing
- speech recognition
- spelling
- spelling correction
- splitting
- stochastic process
- summarization
- text segmentation
- text tiling
- time warping
- tokenizer
- topic detection
- topic detection and tracking
- training algorithm
- tree algorithm
- viterbi
- viterbi search
Other assigned terms:
- anchors
- annotation
- approach
- backoff
- bag of words
- binary features
- broadcast news
- broadcast news corpus
- cache
- case
- co-occurrence
- cohesion
- concepts
- conditional probability
- content words
- continuous speech
- conversation
- corpora
- cosine measure
- data consortium
- dictionary
- discourse
- discourse units
- distribution
- document
- document length
- edit distance
- entropy
- error metric
- evaluation metric
- events
- experimental results
- exponential distribution
- exponential model
- f-measure
- fact
- feature
- feature-based approach
- geometric mean
- interpolation
- knowledge
- language model
- language models
- large corpora
- large corpus
- lexical cohesion
- lexical cohesiveness
- lexical features
- likelihood
- linear combination
- linguistic
- linguistic data
- linguistic data consortium
- linguistic features
- log-linear model
- measure
- measures
- method
- model probability
- mutual information
- n-grams
- natural language
- news corpus
- pairs of words
- paragraph
- paragraphs
- pauses
- personal pronoun
- phrase
- precision
- probabilities
- probability
- probability distribution
- probability distributions
- procedure
- process
- pronoun
- segment boundaries
- segment boundary
- segmentation problem
- segments
- semantic
- semantic network
- sentence
- sentence boundaries
- sentence level
- sentences
- size of the corpus
- statistic
- statistical approach
- statistical framework
- statistical model
- statistics
- string edit distance
- style
- symbol
- target sentence
- tdt corpus
- technique
- television
- term
- terms
- test data
- text
- text corpora
- text corpus
- text segments
- tokens
- topics
- training
- training and test data
- training data
- training set
- transcripts
- tree
- trees
- trigram
- trigram model
- uniform distribution
- user
- utterance
- vocabulary
- wall street journal corpus
- word
- word corpus
- word repetition
- words
- wsj corpus