ACL RD-TEC 1.0 Summarization of W06-1701
Paper Title:
WEB-BASED FREQUENCY DICTIONARIES FOR MEDIUM DENSITY LANGUAGES
Authors: András Kornai and Péter Halácsy and Viktor Nagy and Csaba Oravecz and Viktor Trón and Dániel Varga
Primarily assigned technology terms:
- analyzer
- capitalization
- character encoding
- classification
- coding
- crawling
- database
- databases
- disambiguation
- encoding
- hmms
- language technology
- lemmatization
- matching
- maximum entropy
- morphological analysis
- morphological analyzer
- morphological disambiguation
- morphology
- nlp
- open source software
- pos tagger
- pos tagging
- preprocessing
- processing
- psycholinguistic experiment design
- reading
- recognition
- search
- spellchecker
- tagger
- tagging
- topic classification
- training process
- unsupervised training
- viterbi
- viterbi search
- word selection
Other assigned terms:
- ambiguity
- annotation
- authorship
- bias
- bigram
- brown corpus
- cache
- capitalization information
- case
- characters
- coefficient
- coherence
- community
- compounds
- corpora
- corpus size
- derivational morphology
- dictionaries
- dictionary
- disambiguation system
- disambiguation task
- distribution
- entropy
- fact
- frequency counts
- frequency distribution
- frequency list
- generative models
- genre
- grammaticality
- hypothesis
- inflectional morphology
- labeling
- language models
- language usage
- lemma
- lexicon
- linguistic
- linguistic data
- linguistic information
- linguistics
- long distance dependencies
- manual tagging
- markov models
- meanings
- measure
- measures
- method
- monolingual corpora
- morpheme
- morphemes
- morphological annotation
- morphological lexicon
- n-gram
- nlp tasks
- noun phrases
- nouns
- opennlp package
- parallel corpus
- pos information
- precision
- probabilistic model
- probabilities
- probability
- process
- punctuation
- quantifier
- queries
- query
- reuters corpus
- sentence
- sentences
- statistics
- stem
- stems
- style
- suffix
- surface form
- syntax
- tag model
- tagging model
- technology
- test corpus
- test set
- text
- theoretical linguistics
- theory
- tokens
- topics
- training
- training corpora
- training corpus
- training material
- trigram
- unigram
- vocabulary
- vowel
- web corpus
- word
- word form
- word frequency
- word usage
- wordform
- words