FASpell Dataset

The FASpell dataset was developed for evaluating spell-checking methods. It contains pairs of misspelt Persian words and their corrected forms, similar to the ASpell dataset used for English.

The dataset consists of two parts:

  • faspell_main: list of 5050 pairs collected from mistakes made by elementary school pupils and professional typists.
  • faspell_ocr: list of 800 pairs collected from the output of a Farsi OCR system.
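
Since both parts are lists of (misspelt, corrected) pairs, a spell checker can be scored by how often its suggestion matches the corrected form. The following is a minimal sketch, not the evaluation procedure used by the dataset authors. It assumes one tab-separated pair per line, which may not match the actual file layout. The names read_pairs, accuracy, SAMPLE and lookup are hypothetical, and the sample pairs are English placeholders, not entries from FASpell.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple


def read_pairs(handle: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (misspelt, corrected) pairs from tab-separated lines.

    The two-column, tab-separated layout is an assumption; adjust the
    delimiter and column order to the real files.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def accuracy(pairs: List[Tuple[str, str]], correct: Callable[[str], str]) -> float:
    """Fraction of pairs for which correct(misspelt) equals the corrected form."""
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if correct(wrong) == right)
    return hits / len(pairs)


# Placeholder data for illustration only (not taken from FASpell).
SAMPLE = "recieve\treceive\nadress\taddress\nwich\twhich\n"
pairs = read_pairs(io.StringIO(SAMPLE))

# A toy "spell checker": a lookup table that knows two of the three words.
lookup = {"recieve": "receive", "adress": "address"}
score = accuracy(pairs, lambda word: lookup.get(word, word))
print(f"{score:.2f}")
```

To evaluate on the real data, the in-memory sample would be replaced by an opened file, for example open(path, encoding="utf-8"), where path points to the downloaded faspell_main or faspell_ocr file.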

Obtaining the data

To download the dataset, please visit the LINDAT/CLARIN Repository.

For further information about the data and the results obtained on it, see:

