FASpell Dataset

The FASpell dataset was developed for evaluating spell-checking methods. It contains a set of pairs of misspelt Persian words and their corresponding corrected forms, similar to the ASpell dataset used for English.

The dataset consists of two parts:

- faspell_main: a list of 5050 pairs collected from mistakes made by elementary school pupils and professional typists.
- faspell_ocr: a list of 800 pairs collected from the output of a Farsi OCR system.
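
As a rough illustration of how such pairs can be used for evaluation, the sketch below parses (misspelt, corrected) pairs and measures how often a spell checker returns the corrected form. The file layout is an assumption, not something stated here: one pair per line, tab-separated, misspelt form first, with lines starting with `#` treated as comments. The names `parse_pairs`, `accuracy` and `toy_corrector` are hypothetical, and the sample lines are Latin-script placeholders, not entries from the dataset.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def parse_pairs(lines: Iterable[str]) -> List[Pair]:
    """Parse (misspelt, corrected) pairs from text lines.

    Assumed layout (not confirmed by the dataset description):
    tab-separated fields, misspelt form first, corrected form second.
    Blank lines, '#' comment lines and malformed lines are skipped.
    """
    pairs: List[Pair] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not enough columns to form a pair
        pairs.append((fields[0], fields[1]))
    return pairs


def accuracy(pairs: List[Pair], correct: Callable[[str], str]) -> float:
    """Fraction of pairs for which correct(misspelt) equals the corrected form."""
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if correct(wrong) == right)
    return hits / len(pairs)


# Placeholder lines standing in for a real file; NOT taken from FASpell.
sample_lines = ["#misspelt\tcorrected", "teh\tthe", "", "wrod\tword\n"]
sample_pairs = parse_pairs(sample_lines)


def toy_corrector(word: str) -> str:
    """Stand-in for a real spell checker: fixes a single known typo."""
    return {"teh": "the"}.get(word, word)


print(accuracy(sample_pairs, toy_corrector))  # prints 0.5
```

To run this on a downloaded file, the lines could be read with `open(path, encoding="utf-8")` and passed to `parse_pairs`; the encoding and column order should be checked against the actual files first.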

Obtaining the data

To download the dataset, please visit the LINDAT/CLARIN repository.

For further information about the data and the reported performance, see:

- Barari, L., & QasemiZadeh, B. (2005). CloniZER spell checker adaptive language independent spell checker. In AIML 2005 Conference CICC, Cairo, Egypt (pp. 65-71).
- QasemiZadeh, B., Ilkhani, A., & Ganjeii, A. (2006, June). Adaptive language independent spell checking using intelligent traverse on a tree. In Cybernetics and Intelligent Systems, 2006 IEEE Conference on (pp. 1-6). IEEE.