TOKENIZATION: Related Papers in ACL Anthology

Concordance view (Keyword-In-Context) for the term tokenization in the ACL ARC 2.0;
Concordance view in the ACL ARC 1.0 (Sketch Engine Service).

TOKENIZATION can be found in the following ACL ARC 1.0 documents:

(ACL ID: A00-1032) language independent morphological analysis
(ACL ID: A00-1033) a divide-and-conquer strategy for shallow parsing of german free texts
(ACL ID: A00-2035) tagging sentence boundaries
(ACL ID: A00-2040) a finite state and data-oriented method for grapheme to phoneme conversion
(ACL ID: A92-1018) a practical part-of-speech tagger
(ACL ID: A92-1047) lexical processing in the clare system
(ACL ID: A94-1030) improving chinese tokenization with linguistic filters on statistical lexical acquisition
(ACL ID: A97-1051) mixed-initiative development of language processing systems
(ACL ID: C00-1020) a client/server architecture for word sense disambiguation
(ACL ID: C00-2095) a formalism for universal segmentation of text
(ACL ID: C00-2126) word order acquisition from corpora
(ACL ID: C00-2144) thistle and interarbora
(ACL ID: C00-2152) an integrated architecture for example-based machine translation
(ACL ID: C00-2168) acquisition of a language computational model for nlp
(ACL ID: C02-1002) a cheap and fast way to build useful translation lexicons
(ACL ID: C02-1021) (semi-)automatic detection of errors in pos-tagged corpora
(ACL ID: C02-1094) integrating linguistic and performance-based constraints for assigning phrase breaks
(ACL ID: C02-2005) scaled log likelihood ratios for the detection of abbreviations in text corpora
(ACL ID: C04-1018) playing the telephone game
(ACL ID: C04-1021) modern natural language interfaces to databases
(ACL ID: C04-1075) a high-performance coreference resolution system using a constraint-based multi-agent strategy
(ACL ID: C04-1109) discriminative slot detection using kernel methods
(ACL ID: C04-1156) knowledge intensive word alignment with knowa
(ACL ID: C04-1163) a semantic-based approach to interoperability of classification hierarchies
(ACL ID: C92-1033) ttp
(ACL ID: C92-1063) the typology of unknown words
(ACL ID: C92-4173) tokenization as the initial phase in nlp
(ACL ID: C96-1020) beyond skeleton parsing
(ACL ID: C96-2136) context-based spelling correction for japanese ocr
(ACL ID: E06-1032) re-evaluating the role of bleu in machine translation research
(ACL ID: E06-1047) parsing arabic dialects
(ACL ID: E06-1051) exploiting shallow linguistic information for relation extraction from biomedical literature
(ACL ID: E06-2024) a suite of shallow processing tools for portuguese
(ACL ID: E95-1010) text alignment in the real world
(ACL ID: E99-1001) named entity recognition without gazetteers
(ACL ID: E99-1013) complementing wordnet with roget's and corpus-based thesauri for information retrieval
(ACL ID: E99-1026) japanese dependency structure analysis based on maximum entropy models
(ACL ID: E99-1045) encoding a parallel corpus for automatic terminology extraction
(ACL ID: H01-1038) integrated feasibility experiment for bio-security
(ACL ID: H91-1068) fast text processing for information retrieval
(ACL ID: H93-1037) lingstat
(ACL ID: H94-1029) the automatic component of the lingstat machine-aided translation system
(ACL ID: H94-1050) weighted rational transductions and their application to human language processing
(ACL ID: H94-1096) the automatic component of the lingstat machine-aided translation system
(ACL ID: I05-2038) syntax annotation for the genia corpus
(ACL ID: I05-5003) using machine translation evaluation techniques to determine sentence-level semantic equivalence
(ACL ID: J01-2004) probabilistic top-down parsing and language modeling
(ACL ID: J01-4004) a machine learning approach to coreference resolution of noun phrases
(ACL ID: J02-2002) the combinatory morphemic lexicon
(ACL ID: J03-3002) the web as a parallel corpus
(ACL ID: J03-4001) dependency parsing with an extended finite-state approach
(ACL ID: J05-3002) sentence fusion for multidocument news summarization
(ACL ID: J92-1002) an estimate of an upper bound for the entropy of english
(ACL ID: J94-3004) the reconstruction engine
(ACL ID: J96-3004) a stochastic finite-state word-segmentation algorithm for chinese
(ACL ID: J97-2002) adaptive multilingual sentence boundary disambiguation
(ACL ID: J97-3001) a rule-based hyphenator for modern greek
(ACL ID: J97-3003) automatic rule induction for unknown-word guessing
(ACL ID: J97-4004) critical tokenization and its properties
(ACL ID: M91-1023) gte
(ACL ID: M91-1031) synchronetics
(ACL ID: M92-1030) mitre-bedford
(ACL ID: M93-1013) mitre-bedford
(ACL ID: M93-1021) unisys
(ACL ID: M95-1008) knight-ridder information's value adding name finder
(ACL ID: M95-1009) lockheed martin
(ACL ID: M95-1012) mitre
(ACL ID: M95-1014) the nyu system for muc-6 or where's the syntax?
(ACL ID: M95-1015) university of pennsylvania
(ACL ID: M95-1016) description of the saic dx system as used for muc-6
(ACL ID: M98-1016) description of the kent ridge digital labs system used for muc-7
(ACL ID: M98-1018) nyu
(ACL ID: M98-1028) appendix e
(ACL ID: M98-1029) appendix f
(ACL ID: N01-1025) chunking with support vector machines
(ACL ID: N01-1029) a structured language model based on context-sensitive probabilistic left-corner parsing
(ACL ID: N03-1018) a generative probabilistic ocr model for nlp applications
(ACL ID: N04-1008) automatic question answering
(ACL ID: N04-1013) speed and accuracy in shallow and deep stochastic parsing
(ACL ID: N04-2002) identifying chemical names in biomedical text
(ACL ID: N04-2007) a preliminary look into the use of named entity information for bioscience text tokenization
(ACL ID: N04-3008) senseclusters - finding clusters that represent word senses
(ACL ID: N04-4026) a unigram orientation model for statistical machine translation
(ACL ID: N04-4038) automatic tagging of arabic text
(ACL ID: N06-1055) semantic role labeling of nominalized predicates in chinese
(ACL ID: N06-2013) arabic preprocessing schemes for statistical machine translation
(ACL ID: N06-2035) weblog classification for fast splog filtering
(ACL ID: N06-2038) a comparison of tagging strategies for statistical information extraction
(ACL ID: N06-4005) aqualog
(ACL ID: P01-1002) invited talk
(ACL ID: P01-1041) japanese named entity recognition based on a simple rule generator and decision tree learning
(ACL ID: P01-1058) evaluating cetempúblico, a free resource for portuguese
(ACL ID: P01-1069) text chunking using regularized winnow
(ACL ID: P02-1056) an integrated architecture for shallow and deep processing
(ACL ID: P03-1011) loosely tree-based alignment for machine translation
(ACL ID: P03-1015) combining deep and shallow approaches in parsing german
(ACL ID: P03-1023) coreference resolution using competition learning approach
(ACL ID: P03-1037) parametric models of linguistic count data
(ACL ID: P03-2019) integrating information extraction and automatic hyperlinking
(ACL ID: P04-1004) analysis of mixed natural and symbolic input in mathematical dialogs
(ACL ID: P04-1021) a joint source-channel model for machine transliteration
(ACL ID: P04-1025) extracting regulatory gene expression networks from pubmed
(ACL ID: P04-1030) head-driven parsing for word lattices
(ACL ID: P04-1057) error mining for wide-coverage grammar engineering
(ACL ID: P04-3014) improving bitext word alignments via syntax-based reordering of english
(ACL ID: P05-1046) unsupervised learning of field segmentation models for information extraction
(ACL ID: P05-1052) extracting relations with integrated information using kernel methods
(ACL ID: P05-1064) a phonotactic language model for spoken language identification
(ACL ID: P05-1069) a localized prediction model for statistical machine translation
(ACL ID: P05-1071) arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
(ACL ID: P05-1073) joint learning improves semantic role labeling
(ACL ID: P05-1075) a nonparametric method for extraction of candidate phrasal terms
(ACL ID: P05-3009) the linguist’s search engine
(ACL ID: P06-1001) combination of arabic preprocessing schemes for statistical machine translation
(ACL ID: P06-1015) espresso
(ACL ID: P06-1016) modeling commonality among related classes in relation extraction
(ACL ID: P06-1042) error mining in parsing results
(ACL ID: P06-1078) incorporating speech recognition confidence into discriminative named entity recognition of speech data
(ACL ID: P06-1090) a clustered global phrase reordering model for statistical machine translation
(ACL ID: P06-1091) a discriminative global training algorithm for statistical mt
(ACL ID: P06-1108) event extraction in a plot advice agent
(ACL ID: P06-2005) a phrase-based statistical model for sms text normalization
(ACL ID: P06-2006) evaluating the accuracy of an unlexicalized statistical parser on the parc depbank
(ACL ID: P06-2024) towards a modular data model for multi-layer annotated corpora
(ACL ID: P06-2058) obfuscating document stylometry to preserve author anonymity
(ACL ID: P06-2105) a logic-based semantic approach to recognizing textual entailment
(ACL ID: P06-2114) sinhala grapheme-to-phoneme conversion and rules for schwa epenthesis
(ACL ID: P06-3008) discursive usage of six chinese punctuation marks
(ACL ID: P06-4018) nltk
(ACL ID: P89-1012) dictionaries, dictionary grammars and dictionary entry parsing
(ACL ID: P92-1021) lattice-based word identification in clare
(ACL ID: P94-1002) multi-paragraph segmentation of expository text
(ACL ID: P96-1015) directed replacement
(ACL ID: P97-1039) a portable algorithm for mapping bitext correspondence
(ACL ID: P97-1063) a word-to-word model of translational equivalence
(ACL ID: P98-1030) terminology finite-state preprocessing for computational lfg
(ACL ID: P98-1076) one tokenization per source
(ACL ID: P98-2189) ranking text units according to textual saliency, connectivity and topic aptness
(ACL ID: P99-1046) statistical models for topic segmentation
(ACL ID: W00-0409) multi-document summarization by visualizing topical content
(ACL ID: W00-0504) mandarin-english information (mei)
(ACL ID: W00-0506) pre-processing closed captions for machine translation
(ACL ID: W00-0602) some challenges of developing fully-automated systems for taking audio comprehension exams
(ACL ID: W00-1103) use of dependency tree structures for the microcontext extraction
(ACL ID: W00-1104) semantic indexing using wordnet senses
(ACL ID: W00-1205) sinica treebank
(ACL ID: W00-1217) how should a large corpus be built? - a comparative study of closure in annotated newspaper corpora from two chinese sources, towards building a larger representative corpus merged from representative sublanguage collections
(ACL ID: W00-1302) what's yours and what's mine
(ACL ID: W00-1320) a statistical model for parsing and word-sense disambiguation
(ACL ID: W01-1011) gist-it
(ACL ID: W01-1312) a multilingual approach to annotating and extracting temporal information
(ACL ID: W01-1409) building a statistical machine translation system from scratch
(ACL ID: W01-1412) a comparative study on translation units for bilingual lexicon extraction
(ACL ID: W02-0606) unsupervised discovery of morphologically related words based on orthographic and semantic similarity
(ACL ID: W02-0814) evaluating the results of a memory-based word-expert approach to unrestricted word sense disambiguation
(ACL ID: W02-0903) boosting automatic lexical acquisition with morphological information
(ACL ID: W02-1013) from words to corpora
(ACL ID: W02-1031) the superarv language model
(ACL ID: W02-1302) towards a road map on human language technology
(ACL ID: W02-1506) adapting existing grammars
(ACL ID: W02-1817) automatic recognition of chinese unknown words based on roles tagging
(ACL ID: W03-0308) treq-al
(ACL ID: W03-0504) summarization of noisy documents
(ACL ID: W03-0801) the talent system
(ACL ID: W03-0806) blueprint for a high performance nlp infrastructure
(ACL ID: W03-0810) accelerating corporate research in the development, application, and deployment of human language technologies
(ACL ID: W03-0909) surfaces and depths in text understanding
(ACL ID: W03-1211) question answering on a case insensitive corpus
(ACL ID: W03-1301) gene name extraction using flybase resources
(ACL ID: W03-1309) protein name tagging for biomedical annotation in text
(ACL ID: W03-1315) an investigation of various information sources for classifying biological names
(ACL ID: W03-1606) normalization and paraphrasing using symbolic methods
(ACL ID: W03-1701) unsupervised training for overlapping ambiguity resolution in chinese word segmentation
(ACL ID: W03-1731) chunking-based chinese word tokenization
(ACL ID: W03-2008) natural language analysis of patent claims
(ACL ID: W04-0407) representation and treatment of multiword expressions in basque
(ACL ID: W04-0409) integrating morphology with multi-word expression processing in turkish
(ACL ID: W04-0804) senseval-3 task
(ACL ID: W04-0810) a first evaluation of logic form identification systems
(ACL ID: W04-0821) the university of maryland senseval-3 system descriptions
(ACL ID: W04-0850) the duluth lexical sample systems in senseval-3
(ACL ID: W04-1111) a statistical model for hangeul-hanja conversion in terminology domain
(ACL ID: W04-1221) biomedical named entity recognition using conditional random fields and rich feature sets
(ACL ID: W04-1601) computer processing of arabic script-based languages. current state and future directions
(ACL ID: W04-1602) developing an arabic treebank
(ACL ID: W04-1606) issues in arabic orthography and morphology analysis
(ACL ID: W04-1609) an unsupervised approach for bootstrapping arabic sense tagging
(ACL ID: W04-1703) nlp-based scripting for call activities
(ACL ID: W04-1806) automatically inducing ontologies from corpora
(ACL ID: W04-2602) towards full automation of lexicon construction
(ACL ID: W04-2710) annotating wordnet
(ACL ID: W04-3101) a resource for constructing customized test suites for molecular biology entity identification systems
(ACL ID: W04-3111) integrated annotation for biomedical information extraction
(ACL ID: W04-3217) automatic analysis of plot for story rewriting
(ACL ID: W04-3238) spelling correction as an iterative process that exploits the collective knowledge of web users
(ACL ID: W04-3245) from machine translation to computer assisted translation using finite-state models
(ACL ID: W05-0101) teaching applied natural language processing
(ACL ID: W05-0110) teaching language technology at the north-west university
(ACL ID: W05-0111) hands-on nlp for an interdisciplinary audience
(ACL ID: W05-0304) parallel entity and treebank annotation
(ACL ID: W05-0603) search engine statistics beyond the n-gram
(ACL ID: W05-0705) modifying a natural language processing system for european languages to treat arabic in information processing and information retrieval applications
(ACL ID: W05-0706) choosing an optimal architecture for segmentation and pos-tagging of modern hebrew
(ACL ID: W05-0708) pos tagging of dialectal arabic
(ACL ID: W05-0804) bilingual word spectral clustering for statistical machine translation
(ACL ID: W05-0822) portage
(ACL ID: W05-0826) combining linguistic data views for phrase-based smt
(ACL ID: W05-0903) preprocessing and normalization for automatic evaluation of machine translation
(ACL ID: W05-1306) corpus design for biomedical natural language processing
(ACL ID: W06-0115) the third international chinese language processing bakeoff
(ACL ID: W06-0117) france telecom r&d beijing word segmenter for sighan bakeoff 2006
(ACL ID: W06-1008) a fast and accurate method for detecting english-japanese parallel texts
(ACL ID: W06-1104) automatically creating datasets for measures of semantic relatedness
(ACL ID: W06-1317) classification of discourse coherence relations
(ACL ID: W06-1501) the hidden tag model
(ACL ID: W06-1650) automatically assessing review helpfulness
(ACL ID: W06-1658) entity annotation based on inverse index operations
(ACL ID: W06-1708) the problem of ontology alignment on the web
(ACL ID: W06-1905) keyword translation accuracy and cross-lingual question answering in chinese and japanese
(ACL ID: W06-1910) experiments adapting an open-domain question answering system to the geographical domain using scope-based resources
(ACL ID: W06-2205) recognition of synonyms by a lexical graph
(ACL ID: W06-2701) representing and querying multi-dimensional markup for question answering
(ACL ID: W06-2702) annotation and disambiguation of semantic types in biomedical text
(ACL ID: W06-2713) representing and accessing multilevel linguistic annotation using the meaning format
(ACL ID: W06-2716) layering and merging linguistic annotations
(ACL ID: W06-2914) word distributions for thematic segmentation in a support vector machine approach
(ACL ID: W06-2920) conll-x shared task on multilingual dependency parsing
(ACL ID: W06-3108) discriminative reordering models for statistical machine translation
(ACL ID: W06-3114) manual and automatic evaluation of machine translation between european languages
(ACL ID: W06-3115) ntt system description for the wmt2006 shared task
(ACL ID: W06-3118) portage
(ACL ID: W06-3303) term generalization and synonym resolution for biological abstracts
(ACL ID: W06-3306) human gene name normalization using text matching with automatically extracted synonym dictionaries
(ACL ID: W06-3312) postnominal prepositional phrase attachment in proteomics
(ACL ID: W06-3328) bootstrapping and evaluating named entity recognition in the biomedical domain
(ACL ID: W06-3711) ibm mastor system
(ACL ID: W06-3812) chinese whispers - an efficient graph clustering algorithm and its application to natural language processing problems
(ACL ID: W93-0109) the automatic acquisition of frequencies of verb subcategorization frames from tagged corpora
(ACL ID: W95-0114) compiling bilingual lexicon entries from a non-parallel english-chinese corpus
(ACL ID: W96-0101) using word class for part-of-speech disambiguation
(ACL ID: W97-0102) commercial implementation of text recognition tools for vlc
(ACL ID: W97-0312) learning to tag multilingual texts through observation
(ACL ID: W97-0502) automatic message indexing and full text retrieval for a communication aid
(ACL ID: W97-0909) nlp and industry
(ACL ID: W97-1008) what makes a word
(ACL ID: W97-1015) a comparative study of the application of different learning techniques to natural language interfaces
(ACL ID: W97-1508) lexical resource reconciliation in the xerox linguistic environment
(ACL ID: W98-0203) coreference as the foundations for link analysis over free text databases
(ACL ID: W98-0211) how to build a (quite general) linguistic diagram editor
(ACL ID: W98-1002) tagarab
(ACL ID: W98-1102) encoding linguistic corpora
(ACL ID: W98-1118) exploiting diverse knowledge sources via maximum entropy in named entity recognition
(ACL ID: W99-0313) a two-level approach to coding dialogue for discourse structure
(ACL ID: W99-0605) cross-language information retrieval for technical documents
(ACL ID: W99-0612) language independent named entity recognition combining morphological and contextual evidence
(ACL ID: W99-0634) corpus-based learning for noun phrase coreference resolution
(ACL ID: X96-1043) tipster text phase ii architecture design version 2.1p 19 june 1996
(ACL ID: X98-1004) the common pattern specification language
(ACL ID: X98-1010) coreference resolution strategies from an application perspective
(ACL ID: X98-1012) research in information extraction
(ACL ID: X98-1013) information extraction research and applications
(ACL ID: X98-1021) overview of the university of pennsylvania's tipster project
(ACL ID: X98-1023) improving robust domain independent summarization
(ACL ID: X98-1026) automated text summarization and the summarist system

* See also: a list of terms related to tokenization.
