ACL RD-TEC 1.0 Summarization of P98-1050
Paper Title:
MULTEXT-EAST: PARALLEL AND COMPARABLE CORPORA AND LEXICONS FOR SIX CENTRAL AND EASTERN EUROPEAN LANGUAGES
Authors: Ludmila Dimitrova, Tomaz Erjavec, Nancy Ide, Heiki Jaan Kaalep, Vladimir Petkevic, and Dan Tufis
Primarily assigned technology terms:
- capitalization
- classification
- corpus annotation
- corpus development
- decomposition
- dictionary look-up
- disambiguation
- electronic text encoding
- encoding
- language engineering
- markup language
- pos tagging
- processing
- resource development
- segmenter
- splitting
- tagging
- text encoding
- text representation
- tokenizer
- word splitting
Other assigned terms:
- abbreviations
- ambiguity
- annotation
- cluster
- clusters
- comparable corpora
- comparable corpus
- compounding
- corpora
- corpus encoding standard
- data type
- dictionaries
- dictionary
- dictionary entries
- dictionary entry
- distribution
- document
- eagles
- feature
- homography
- inflected form
- input text
- interpretation
- language model
- lemma
- lexical items
- lexical specification
- lexicon
- linguistic
- linguistic corpora
- mapping
- mapping rules
- markup
- mechanisms
- methodology
- morphosyntactic description
- natural language
- paragraph
- parallel corpus
- part of speech
- part-of-speech
- phrase
- procedure
- process
- punctuation
- rule format
- sentence
- sentence boundaries
- sentence boundary
- sentence level
- sentences
- speech data
- statistical language model
- tags
- tagset
- terms
- text
- tokens
- translations
- word
- word compounding
- wordform
- words