ACL RD-TEC 1.0 Summarization of P03-1036
Paper Title:
UNSUPERVISED SEGMENTATION OF WORDS USING PRIOR DISTRIBUTIONS OF MORPH LENGTH AND FREQUENCY
Primarily assigned technology terms:
- algorithm
- analyzer
- approximation
- automatic segmentation
- expectation-maximization
- expectation-maximization algorithm
- grouping
- information retrieval
- language modelling
- language processing
- likelihood estimate
- maximum likelihood
- model optimization
- modelling
- morphological analysis
- morphology
- morphology discovery
- natural language processing
- nlp
- normalization
- optimization
- parser
- probabilistic method
- processing
- processor
- reasoning
- recognition
- recursive mdl
- recursive segmentation
- search
- search algorithm
- segmentation
- segmentation algorithm
- splitting
- statistical language modelling
- stochastic process
- terminology
- text segmentation
- two-level morphology
- unsupervised algorithm
- unsupervised segmentation
- viterbi
- viterbi algorithm
- word discovery
Other assigned terms:
- affixes
- allomorphy
- alphabet
- ambiguity
- approach
- bayesian framework
- brown corpus
- case
- characters
- coefficient
- conditional probabilities
- convergence
- corpora
- corpus size
- data set
- data sets
- density function
- distribution
- english corpus
- english language
- evaluation measure
- evaluation measures
- evaluation method
- f-measure
- fact
- frequency distribution
- generation
- generation process
- generative model
- hapax legomena
- implementation
- knowledge
- language model
- length distribution
- lexicon
- likelihood
- linguistic
- linguistic theory
- mapping
- mappings
- maximum likelihood estimate
- measure
- measures
- method
- minimum description length
- morph
- morpheme
- morphemes
- morphological lexicon
- morphological structure
- n-gram
- n-gram models
- natural language
- nlp applications
- orthography
- phonological rules
- poisson distribution
- precision
- prior probability
- priori
- probabilistic model
- probabilities
- probability
- probability density
- probability density function
- probability distribution
- probability distributions
- probability value
- procedure
- process
- punctuation
- punctuation marks
- recursive structure
- representations
- segments
- semantic
- semantic similarity
- standard terminology
- statistical language model
- stem
- stems
- suffix
- suffixes
- tags
- technique
- test set
- text
- text corpus
- theory
- tokens
- training
- understanding
- uniform distribution
- verb
- vocabulary
- word
- word form
- word sequences
- word types
- words