Corpus: ACL ARC 2.0, segmented, PoS-tagged and cleaned (to an extent) – statistics and info

ACL ARC 2.0 pre-processed

Counts
Tokens: 96,737,944
Words: 78,151,628
Sentences: 4,052,234
Paragraphs: 0
Documents: 25,504
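The raw counts are easier to interpret as averages. The snippet below is a small sketch that derives a few ratios from the figures listed above. The dictionary name `COUNTS` and the function `derived_averages` are hypothetical helpers for illustration, and the ratios are plain arithmetic on the listed totals, not statistics reported by the corpus tool. Paragraphs are left out because their count is 0.

```python
# Corpus totals copied from the Counts list above.
COUNTS = {
    "tokens": 96_737_944,
    "words": 78_151_628,
    "sentences": 4_052_234,
    "documents": 25_504,
}


def derived_averages(c: dict) -> dict:
    """Return simple ratios computed from the raw corpus totals (hypothetical helper)."""
    return {
        "tokens_per_sentence": c["tokens"] / c["sentences"],
        "words_per_sentence": c["words"] / c["sentences"],
        "tokens_per_document": c["tokens"] / c["documents"],
        "sentences_per_document": c["sentences"] / c["documents"],
        "word_share_of_tokens": c["words"] / c["tokens"],
    }


if __name__ == "__main__":
    for name, value in derived_averages(COUNTS).items():
        print(f"{name}: {value:.2f}")
```

On these totals, sentences average roughly 23.9 tokens (about 19.3 words), and documents average roughly 3,793 tokens.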
General info
Language: English
Encoding: UTF-8
Compiled: 05/01/2016 18:58:56
Lexicon sizes
word: 1,049,494
lemma: 975,733
tag: 46
lc (lowercased word form): 914,330
lemma_lc (lowercased lemma): 888,148
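The lexicon sizes can be related to each other and to the token count. The following is a sketch that assumes each lexicon size is the number of distinct values of that attribute, for example distinct word forms for `word`. The names `LEXICON`, `TOKENS` and `lexicon_ratios` are hypothetical, and the ratios are derived arithmetic, not figures reported by the corpus tool.

```python
# Lexicon sizes and token count copied from the lists above.
# Assumption: each size is the number of distinct values of that attribute.
LEXICON = {
    "word": 1_049_494,
    "lemma": 975_733,
    "lc": 914_330,        # lowercased word forms
    "lemma_lc": 888_148,  # lowercased lemmas
}
TOKENS = 96_737_944


def lexicon_ratios(lex: dict, tokens: int) -> dict:
    """Compare vocabulary sizes across attributes (hypothetical helper)."""
    return {
        # distinct word forms per running token (type-token ratio)
        "type_token_ratio": lex["word"] / tokens,
        # share of word forms left after lowercasing
        "lc_vs_word": lex["lc"] / lex["word"],
        # share of word forms left after lemmatisation
        "lemma_vs_word": lex["lemma"] / lex["word"],
        # share of lowercased forms left after also lemmatising
        "lemma_lc_vs_lc": lex["lemma_lc"] / lex["lc"],
    }


if __name__ == "__main__":
    for name, value in lexicon_ratios(LEXICON, TOKENS).items():
        print(f"{name}: {value:.4f}")
```

Under that assumption, lowercasing alone reduces the number of distinct word forms by about 13 percent, while lemmatisation reduces it by about 7 percent.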

Structures and attributes
