ACL RD-TEC 1.0 Summarization of W04-3210
Paper Title:
AUTOMATIC PARAGRAPH IDENTIFICATION: A STUDY ACROSS LANGUAGES AND DOMAINS
Authors: Caroline Sporleder and Mirella Lapata
Primarily assigned technology terms:
- algorithm
- automatic method
- automatic speech recognition
- boosting
- boosting algorithm
- boundary detection
- boundary identification
- boundary insertion
- categorisation
- classification
- classifier
- classifiers
- combined classifier
- editing
- encoding
- graph search
- identification
- information extraction
- language modelling
- language processing
- learner
- learning
- learning approach
- learning methods
- learning system
- machine learning
- machine learning approach
- machine translation
- modelling
- morphology
- multi-document summarisation
- natural language processing
- nlp
- paragraph boundary identification
- paragraph formation
- paragraph identification
- paragraph insertion
- parser
- parsers
- parsing
- part-of-speech tagging
- pre-processing
- predictor
- processing
- recogniser
- recognition
- recognition systems
- search
- search algorithm
- segmentation
- sentence splitting
- speech recognition
- speech recognition systems
- speech-to-text
- splitting
- summarisation
- supervised learning
- tagging
- text categorisation
- text segmentation
- text simplification
- text-segmentation
- texttiling
- topic segmentation
Other assigned terms:
- abbreviation
- anaphora
- anaphora structure
- anaphors
- annotation
- approach
- authorship
- authorship attribution
- break
- broadcast news
- case
- characters
- chunks
- classification accuracy
- classification task
- co-occurrence
- coefficient
- content words
- corpora
- cue words
- data set
- data sets
- development set
- device
- dialogues
- discourse
- distribution
- english corpus
- entropy
- error rate
- europarl corpus
- fact
- feature
- generation
- german corpus
- human performance
- hypotheses
- kappa
- kappa coefficient
- knowledge
- language model
- language models
- leaf
- manual annotation
- measure
- measures
- method
- n-gram
- n-gram models
- named entities
- names
- natural language
- news corpus
- orthography
- paragraph
- paragraph length
- paragraphs
- parse
- parse tree
- part-of-speech
- part-of-speech tags
- penn treebank
- prediction task
- probability
- process
- pronoun
- punctuation
- punctuation mark
- punctuation marks
- root node
- segments
- semantic
- sentence
- sentence boundaries
- sentences
- source language
- stems
- style
- syntactic features
- tags
- target language
- term
- terms
- test corpus
- test set
- text
- toolkit
- topics
- training
- training data
- training examples
- training set
- training size
- transcripts
- tree
- treebank
- unigram
- vocabulary
- word
- word co-occurrence
- word features
- word lists
- word order
- words
- writing system
- written texts