ACL RD-TEC 1.0 Summarization of J05-4005

Paper Title:
CHINESE WORD SEGMENTATION AND NAMED ENTITY RECOGNITION: A PRAGMATIC APPROACH

Authors: Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang

Primarily assigned technology terms:

Other assigned terms:

abbreviation
abbreviations
adaptation paradigm
affix
affixation
ambiguity
analogy
annotated corpus
annotated training corpora
annotated training corpus
annotated training set
annotation
annotators
approach
array
association for computational linguistics
automata
backoff
bigram
bigram model
bilingual dictionary
binary feature
binary features
case
character bigram model
character sequence
characters
chinese characters
chinese language
chinese nouns
chinese sentence
chinese text
chinese treebank
chinese word
chinese words
classification problem
co-occurrence
co-occurrence frequency
concept
concepts
context dependency
context information
context model
context models
contextual information
convergence
corpora
correlations
data set
data sets
decision rule
derivation
dictionaries
dictionary
distribution
document
document frequency
entropy
entropy models
error rate
estimation
evaluation measures
evaluation methodology
evaluations
experimental results
experimental setting
f-measure
fact
feature
feature value
generation
generation process
generative model
generative models
generative probability
gold test set
grammar
heuristic
heuristic rules
heuristics
human annotation
human annotators
hypothesis
implementation
input string
inverse document frequency
keyword
knowledge
language model
language models
language processing applications
large corpus
large training
lattice
lexical word
lexicon
lexicon entry
likelihood
linguist
linguistic
linguistic knowledge
linguistics
linguists
mapping
maximum entropy models
measure
measures
method
methodology
mixture models
model parameter
model parameters
model probability
morpheme
morpheme boundary
morphological rules
msr gold test
mutual information
n-gram
n-gram models
named entities
named entity
names
natural language
natural language processing applications
nlp applications
nlp tasks
nouns
open test
ordered list
organization names
parameter values
person names
plural noun
precision
probabilistic models
probabilities
probability
probability distribution
procedure
process
pronouns
pronunciation
punctuation
raw text corpus
regular expressions
relation
relative frequency
schema
search space
seed
segmentation bakeoff
segmented corpus
semantic
sentence
sentence boundaries
sentences
source language
statistical approach
statistical information
statistical models
statistical significance
statistics
stem
stems
stochastic model
style
substring
suffix
symbols
syntactic level
syntactic structure
system description
system evaluation
tags
taxonomy
term
term distribution
term frequency
terminals
terms
test corpus
test data
test set
text
text corpus
theory
time expressions
tokens
training
training and test data
training corpora
training corpus
training criterion
training data
training material
training samples
training set
training size
transformation
translations
tree
tree structures
treebank
trigram
trigram language model
unigram
upenn chinese treebank
vector space
verb
word
word boundaries
word boundary
word candidate
word classes
word formation
word lattice
word segmentation performance
word sequence
word type
word types
words
wrapper

Extracted Section Types:

Download the PDF file from the ACL Anthology.
Browse this paper on the University of Michigan CLAIR Group's interface.