ACL RD-TEC 1.0 Summarization of J05-4005

Paper Title:
CHINESE WORD SEGMENTATION AND NAMED ENTITY RECOGNITION: A PRAGMATIC APPROACH

Authors: Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang

Primarily assigned technology terms:

Other assigned terms:

  • abbreviation
  • abbreviations
  • adaptation paradigm
  • affix
  • affixation
  • ambiguity
  • analogy
  • annotated corpus
  • annotated training corpora
  • annotated training corpus
  • annotated training set
  • annotation
  • annotators
  • approach
  • array
  • association for computational linguistics
  • automata
  • backoff
  • bigram
  • bigram model
  • bilingual dictionary
  • binary feature
  • binary features
  • case
  • character bigram model
  • character sequence
  • characters
  • chinese characters
  • chinese language
  • chinese nouns
  • chinese sentence
  • chinese text
  • chinese treebank
  • chinese word
  • chinese words
  • classification problem
  • co-occurrence
  • co-occurrence frequency
  • concept
  • concepts
  • context dependency
  • context information
  • context model
  • context models
  • contextual information
  • convergence
  • corpora
  • correlations
  • data set
  • data sets
  • decision rule
  • derivation
  • dictionaries
  • dictionary
  • distribution
  • document
  • document frequency
  • entropy
  • entropy models
  • error rate
  • estimation
  • evaluation measures
  • evaluation methodology
  • evaluations
  • experimental results
  • experimental setting
  • f-measure
  • fact
  • feature
  • feature value
  • generation
  • generation process
  • generative model
  • generative models
  • generative probability
  • gold test set
  • grammar
  • heuristic
  • heuristic rules
  • heuristics
  • human annotation
  • human annotators
  • hypothesis
  • implementation
  • input string
  • inverse document frequency
  • keyword
  • knowledge
  • language model
  • language models
  • language processing applications
  • large corpus
  • large training
  • lattice
  • lexical word
  • lexicon
  • lexicon entry
  • likelihood
  • linguist
  • linguistic
  • linguistic knowledge
  • linguistics
  • linguists
  • mapping
  • maximum entropy models
  • measure
  • measures
  • method
  • methodology
  • mixture models
  • model parameter
  • model parameters
  • model probability
  • morpheme
  • morpheme boundary
  • morphological rules
  • msr gold test
  • mutual information
  • n-gram
  • n-gram models
  • named entities
  • named entity
  • names
  • natural language
  • natural language processing applications
  • nlp applications
  • nlp tasks
  • nouns
  • open test
  • ordered list
  • organization names
  • parameter values
  • person names
  • plural noun
  • precision
  • probabilistic models
  • probabilities
  • probability
  • probability distribution
  • procedure
  • process
  • pronouns
  • pronunciation
  • punctuation
  • raw text corpus
  • regular expressions
  • relation
  • relative frequency
  • schema
  • search space
  • seed
  • segmentation bakeoff
  • segmented corpus
  • semantic
  • sentence
  • sentence boundaries
  • sentences
  • source language
  • statistical approach
  • statistical information
  • statistical models
  • statistical significance
  • statistics
  • stem
  • stems
  • stochastic model
  • style
  • substring
  • suffix
  • symbols
  • syntactic level
  • syntactic structure
  • system description
  • system evaluation
  • tags
  • taxonomy
  • term
  • term distribution
  • term frequency
  • terminals
  • terms
  • test corpus
  • test data
  • test set
  • text
  • text corpus
  • theory
  • time expressions
  • tokens
  • training
  • training and test data
  • training corpora
  • training corpus
  • training criterion
  • training data
  • training material
  • training samples
  • training set
  • training size
  • transformation
  • translations
  • tree
  • tree structures
  • treebank
  • trigram
  • trigram language model
  • unigram
  • upenn chinese treebank
  • vector space
  • verb
  • word
  • word boundaries
  • word boundary
  • word candidate
  • word classes
  • word formation
  • word lattice
  • word segmentation performance
  • word sequence
  • word type
  • word types
  • words
  • wrapper

Extracted Section Types:


This page last edited on 10 May 2017.
