ACL RD-TEC 1.0 Summarization of I05-4010

Paper Title:
Harvesting the Bitexts of the Laws of Hong Kong from the Web

Authors: Chunyu Kit, Xiaoyue Liu, KingKui Sin and Jonathan J. Webster

Other assigned terms:

  • aligned corpus
  • alignment procedure
  • american national corpus
  • anchor
  • anchors
  • annotation
  • bilingual corpora
  • bilingual corpus
  • bilingual lexicon
  • bilingual text
  • bitext
  • break
  • characters
  • chinese characters
  • chinese language
  • chinese-english language pair
  • community
  • corpora
  • corpus size
  • data sets
  • english language
  • exact match
  • feature
  • hansard corpus
  • hierarchical structure
  • knowledge
  • language pair
  • language pairs
  • lexical resources
  • lexicon
  • linguistic
  • linguistic data
  • markup
  • methodology
  • names
  • nlp community
  • paragraph
  • paragraphs
  • parallel corpora
  • parallel corpus
  • parallel texts
  • penn treebank
  • penn treebank corpus
  • procedure
  • schema
  • statistics
  • tags
  • technology
  • term
  • terms
  • text
  • text collection
  • text structure
  • training
  • training data
  • translation knowledge
  • translation models
  • translation quality
  • treebank
  • treebank corpus
  • web page
  • web pages
  • word
  • words
  • xml format
  • xml schema

Extracted Section Types:


This page last edited on 10 May 2017.
