ACL RD-TEC 1.0 Summarization of H05-1062
Paper Title:
ROBUST NAMED ENTITY EXTRACTION FROM LARGE SPOKEN ARCHIVES
Authors: Benoit Favre, Frédéric Bechet, and Pascal Nocéra
Primarily assigned technology terms:
- asr search
- asr system
- asr transcription
- automatic speech recognition
- broadcast news transcription
- capitalization
- computational linguistics
- databases
- detection and tracking
- document retrieval
- entity extraction
- entity recognition
- entity recognition system
- extraction systems
- finite state
- finite state machine
- finite state machines
- human language
- human language technology
- indexing
- information extraction
- information extraction systems
- language learning
- language processing
- language technology
- learning
- maximum entropy
- measuring
- message understanding
- model adaptation
- modeling
- named entity extraction
- named entity recognition
- natural language learning
- natural language processing
- ne extraction
- ne tagger
- news transcription
- nlp
- numerical information extraction
- parsing
- processing
- recognition
- recognition system
- scoring
- search
- search process
- segmentation
- smoothing
- speaker variation
- speech recognition
- statistical tagger
- story segmentation
- story-based adaptation
- tagger
- topic detection
- topic detection and tracking
- transcription
- transducer
- transducers
- weighting
Other assigned terms:
- ambiguity
- ambiguity rate
- annotation
- approach
- asr output
- association for computational linguistics
- broadcast news
- broadcast news data
- composition
- corpora
- correlation
- development set
- distribution
- document
- document content
- document frequency
- document retrieval evaluation
- domain information
- entity type
- entity types
- entropy
- entropy models
- error rate
- evaluations
- events
- extraction process
- f-measure
- fact
- feature
- french
- grammars
- hmm-based model
- hypotheses
- hypothesis
- implementation
- index
- index terms
- inverse document frequency
- knowledge
- labeling
- language model
- language models
- language resources
- lattice
- lattices
- lexicon
- likelihood
- linguistics
- maximum entropy models
- measure
- measures
- message
- message understanding conferences
- metadata
- method
- n-best list
- n-gram
- n-gram model
- named entities
- named entity
- names
- natural language
- nist
- noise
- noisy input
- oracle
- part-of-speech
- part-of-speech tags
- pauses
- precision
- probability
- process
- proper names
- recognition errors
- retrieval task
- search space
- sentence
- sentence boundaries
- slot
- speech input
- statistical model
- symbols
- tag sequence
- tags
- tagset
- technology
- term
- terms
- test corpora
- test corpus
- test data
- test set
- testing data
- text
- text corpora
- toolkit
- topics
- training
- training and testing data
- training corpus
- training data
- training set
- transcript
- transcriptions
- transcripts
- trigram
- understanding
- vocabulary
- word
- word error rate
- word lattice
- word lattices
- word lists
- word sequence
- word sequences
- word string
- words
- written texts