ACL RD-TEC 1.0 Summarization of P05-2021
Paper Title:
SPEECH RECOGNITION OF CZECH—INCLUSION OF RARE WORDS HELPS
Authors: Petr Podvesky and Pavel Machek
Primarily assigned technology terms:
- automaton
- continuous speech recognition
- decoder
- error rate reduction
- htk toolkit
- language model automaton
- language modeling
- large vocabulary continuous speech recognition
- linear interpolation
- modeling
- morphological generation
- morphology
- normalization
- pruning
- rate reduction
- recognition
- recognition systems
- search
- smoothing
- speech recognition
- speech recognition systems
- state automaton
- tagger
- transducer
- transducers
- tuning
- viterbi
- vocabulary adaptation
- vocabulary selection
Other assigned terms:
- acoustic model
- acoustic models
- approach
- back-off model
- baseline model
- bigram
- break
- broadcast news
- broadcast news data
- case
- continuous speech
- corpora
- czech morphology
- czech national corpus
- dependency treebank
- dictionaries
- dictionary
- disk
- distribution
- domain knowledge
- error rate
- estimation
- events
- fact
- free word order
- generation
- interpolation
- knowledge
- language model
- language models
- large corpora
- lattice
- lattices
- lemma
- mapping
- maps
- measure
- method
- model size
- morph
- morphemes
- names
- oracle
- prague dependency treebank
- probability
- probability distribution
- procedure
- recognition accuracy
- recognition phase
- relative frequency
- russian
- search space
- sentences
- signal
- speech corpora
- stems
- tags
- terms
- test data
- test set
- text
- text corpora
- text corpus
- tokens
- toolkit
- training
- training corpus
- training data
- transcribed speech
- transcriptions
- treebank
- triphone
- uniform distribution
- unigram
- unigram model
- unigram probability
- utterance
- viterbi criterion
- vocabulary
- vocabulary growth
- word
- word error rate
- word order
- word sequences
- words