ACL RD-TEC 1.0 Summarization of J03-3002

Paper Title:
THE WEB AS A PARALLEL CORPUS

Authors: Philip Resnik and Noah A. Smith

Primarily assigned technology terms:

Other assigned terms:

  • abbreviations
  • agreement score
  • anchor
  • anchors
  • annotation
  • annotator
  • annotators
  • approach
  • arabic text
  • arabic-english parallel corpus
  • association for computational linguistics
  • attribute-value pairs
  • axioms
  • basque
  • bilingual dictionary
  • bilingual lexicon
  • bilingual lexicons
  • bilingual text
  • bitext
  • case
  • characters
  • chunk
  • chunks
  • classification performance
  • classification task
  • cluster
  • co-occurrence
  • coefficient
  • community
  • computational complexity
  • computational linguists
  • confidence scores
  • content words
  • corpora
  • corpus size
  • correlation
  • correlation coefficient
  • cross-validation experiment
  • data consortium
  • data set
  • development set
  • dictionaries
  • dictionary
  • disk
  • distribution
  • document
  • document collections
  • document frequency
  • document length
  • document set
  • document structure
  • dutch
  • edit distance
  • english-chinese corpus
  • english-chinese parallel corpus
  • estimation
  • evaluation measures
  • evaluations
  • events
  • f measure
  • f score
  • f-measure
  • fact
  • feature
  • french
  • french translation
  • generation
  • generation process
  • generation system
  • genre
  • gold standard
  • heuristic
  • html document
  • human annotators
  • human judgment
  • human judgments
  • implementation
  • index
  • information sources
  • information theory
  • internet archive
  • inverse document frequency
  • joint probability
  • knowledge
  • language pair
  • language pairs
  • language resources
  • language-dependent knowledge
  • lemma
  • lexical translation
  • lexical word
  • lexicon
  • lexicon entries
  • linear regression model
  • linguistic
  • linguistic data
  • linguistic data consortium
  • linguistic knowledge
  • linguistic resources
  • linguistics
  • linguists
  • machine translation output
  • mapping
  • markup
  • matching process
  • mean average precision
  • meaning
  • measure
  • measures
  • mechanisms
  • method
  • multilingual corpus
  • multinomial distribution
  • mutual information
  • n-gram
  • named entities
  • names
  • natural language
  • noisy translation lexicon
  • paragraph
  • parallel corpora
  • parallel corpus
  • parallel text
  • parallel texts
  • pearson correlation coefficient
  • precision
  • prefixes and suffixes
  • probabilistic model
  • probabilities
  • probability
  • probability distribution
  • procedure
  • process
  • projection
  • punctuation
  • queries
  • random order
  • random sample
  • regression model
  • relation
  • representations
  • search space
  • seed
  • semantic
  • semantic network
  • sentence
  • sentence level
  • sentences
  • similarity measure
  • similarity score
  • size of the corpus
  • suffixes
  • tags
  • technique
  • terms
  • test collection
  • test set
  • text
  • text length
  • theory
  • tokens
  • training
  • training data
  • training material
  • translation lexicon
  • translation model
  • translation models
  • translation output
  • translation pair
  • translation pairs
  • translation probabilities
  • translation quality
  • translational equivalence
  • translations
  • tree
  • tree structures
  • trees
  • vertex
  • vocabulary
  • vocabulary size
  • web page
  • web pages
  • web site
  • web-based document
  • weighted edit distance
  • word
  • word frequency
  • word order
  • word pair
  • word-to-word translation model
  • wordnet
  • words

Extracted Section Types:



This page last edited on 10 May 2017.
