ACL RD-TEC 1.0 Summarization of I05-4010
Paper Title:
HARVESTING THE BITEXTS OF THE LAWS OF HONG KONG FROM THE WEB
Authors: Chunyu Kit and Xiaoyue Liu and KingKui Sin and Jonathan J. Webster
Primarily assigned technology terms:
- automatic recognition
- automatic translation
- bilingual terminology
- crawler
- data annotation
- data exchange
- encoding
- example-based machine translation
- harvesting
- identification
- information retrieval
- information system
- java
- learning
- learning techniques
- machine learning
- machine learning techniques
- machine translation
- machine translation technology
- matching
- mining
- nlp
- recognition
- statistical translation
- terminology
- text alignment
- text extraction
- translation technology
- word alignment
- xml markup
Other assigned terms:
- aligned corpus
- alignment procedure
- american national corpus
- anchor
- anchors
- annotation
- bilingual corpora
- bilingual corpus
- bilingual lexicon
- bilingual text
- bitext
- break
- characters
- chinese characters
- chinese language
- chinese-english language pair
- community
- corpora
- corpus size
- data sets
- english language
- exact match
- feature
- hansard corpus
- hierarchical structure
- knowledge
- language pair
- language pairs
- lexical resources
- lexicon
- linguistic
- linguistic data
- markup
- methodology
- names
- nlp community
- paragraph
- paragraphs
- parallel corpora
- parallel corpus
- parallel texts
- penn treebank
- penn treebank corpus
- procedure
- schema
- statistics
- tags
- technology
- term
- terms
- text
- text collection
- text structure
- training
- training data
- translation knowledge
- translation models
- translation quality
- treebank
- treebank corpus
- web page
- web pages
- word
- words
- xml format
- xml schema