FASpell Dataset
The FASpell dataset was developed for evaluating spell checking methods. It contains pairs of misspelt Persian words and their corresponding corrected forms, similar to the ASpell dataset used for English.
The dataset consists of two parts:
- faspell_main: list of 5050 pairs collected from mistakes made by elementary school pupils and professional typists.
- faspell_ocr: list of 800 pairs collected from the output of a Farsi OCR system.
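Each entry in faspell_main and faspell_ocr is a (misspelt, corrected) pair, so one natural use is to measure how often a spell checker proposes the corrected form. The following is a minimal sketch and is not part of the dataset distribution. It assumes a hypothetical layout of one tab-separated pair per line. The names `parse_pairs`, `top_k_accuracy`, and the toy checker are illustrative only, so check the actual file format from LINDAT/CLARIN before reusing it.

```python
# Minimal sketch: score a spell checker against (misspelt, correct) word pairs.
# ASSUMPTION: each line holds "misspelt<TAB>correct". The real FASpell file
# layout may differ, so adapt parse_pairs() to the distributed files.
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def parse_pairs(lines: Iterable[str]) -> List[Pair]:
    """Parse hypothetical tab-separated lines into (misspelt, correct) pairs."""
    pairs: List[Pair] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        misspelt, correct = line.split("\t", 1)
        pairs.append((misspelt.strip(), correct.strip()))
    return pairs


def top_k_accuracy(
    pairs: List[Pair], suggest: Callable[[str], List[str]], k: int = 1
) -> float:
    """Fraction of pairs whose correct form is among the first k suggestions."""
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if right in suggest(wrong)[:k])
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy checker with invented suggestions, for illustration only.
    toy_suggestions = {"کتاپ": ["کتاب"], "مدرسح": ["مدرسی", "مدرسه"]}

    def toy_suggest(word: str) -> List[str]:
        return toy_suggestions.get(word, [])

    demo = parse_pairs(["کتاپ\tکتاب", "مدرسح\tمدرسه"])
    print(top_k_accuracy(demo, toy_suggest, k=1))  # 0.5
    print(top_k_accuracy(demo, toy_suggest, k=2))  # 1.0
```

Reporting accuracy at several values of k separates a checker that ranks the intended word first from one that only includes it somewhere in its candidate list.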
Obtaining the data
To download the dataset, please visit the LINDAT/CLARIN repository.
For further information about the data and the reported performance results, see:
- Barari, L., & QasemiZadeh, B. (2005). CloniZER spell checker, adaptive language independent spell checker. In AIML 2005 Conference CICC, Cairo, Egypt (pp. 65-71).
- QasemiZadeh, B., Ilkhani, A., & Ganjeii, A. (2006, June). Adaptive language independent spell checking using intelligent traverse on a tree. In Cybernetics and Intelligent Systems, 2006 IEEE Conference on (pp. 1-6). IEEE.