Corpus oana-fa: the 1984 corpus, Farsi version – statistics and info
The annotated 1984 Persian Corpus in the MULTEXT-East Framework
Counts | |
---|---|
Tokens | 108437 |
Words | 95682 |
Sentences | 6605 |
Paragraphs | 1266 |
Documents | 1 |
General info | |
---|---|
Language | Persian |
Encoding | UTF-8 |
Compiled | 11/09/2015 15:38:07 |
Tagset doc | Description |
Infolink | More info |
Lexicon sizes | |
---|---|
word | 11322 |
tag | 428 |
lemma | 6612 |
rtag | 12 |
lc | 11322 |
lemma_lc | 6612 |
Structures and attributes
- doc 1
  - id 1
- p 1266
  - id 1266
- s 6605
  - id 6605
- g 14832
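
The figures above can be combined into a few derived ratios. The following is a minimal sketch, not part of the corpus tooling: the numbers are copied from the Counts and Lexicon sizes tables, while the dictionary names, the `ratio` helper, and the reading of "tokens minus words" as non-word tokens (punctuation and similar) are assumptions made for illustration.

```python
# Figures copied from the "Counts" and "Lexicon sizes" tables.
counts = {"tokens": 108437, "words": 95682, "sentences": 6605, "paragraphs": 1266}
lexicon = {"word": 11322, "lemma": 6612, "tag": 428}


def ratio(numerator, denominator):
    """Plain division, kept as a helper so each derived figure reads clearly."""
    return numerator / denominator


derived = {
    # Average sentence length in tokens.
    "tokens_per_sentence": ratio(counts["tokens"], counts["sentences"]),
    # Average paragraph length in sentences.
    "sentences_per_paragraph": ratio(counts["sentences"], counts["paragraphs"]),
    # Assumption: tokens that are not counted as words (e.g. punctuation).
    "non_word_tokens": counts["tokens"] - counts["words"],
    # Distinct word forms divided by all tokens (a type/token ratio).
    "type_token_ratio": ratio(lexicon["word"], counts["tokens"]),
    # Average number of distinct word forms per lemma.
    "word_forms_per_lemma": ratio(lexicon["word"], lexicon["lemma"]),
}

for name, value in derived.items():
    if isinstance(value, float):
        print(f"{name}: {value:.3f}")
    else:
        print(f"{name}: {value}")
```

The type/token ratio here divides by the token count because the `word` lexicon is taken to cover every token position; dividing by the word count instead would be an equally defensible choice under a different assumption.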