Arabic Spell Checkers
Natural Language Processing - CS465
Supervised by: Dr. Amal Al-Saif
Done by: Hanan Al-Mohammadi, Mona Al-Mutairi
Imam Muhammad ibn Saud University, Department of Computer Science and Information System
First Paper
“An Approach for Analyzing and Correcting Spelling Errors for Non-native Arabic learners”
o Based on a question-and-answer environment: the learner types answers to spelling questions.
First Paper
• Error Detection
Two types of errors:
1. Ill-formed word errors.
o Detected with Buckwalter’s Arabic Morphological Analyzer.
Ex. ‘ ’ is an ill-formed version of the word ‘ ’.
2. Semantically incorrect errors.
Ex. A spelling question displays a happy face and asks the learner to write a word that describes the picture, and the learner enters ‘ ’/helped instead of ‘ ’/happy.
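The two error types can be told apart with a validity check followed by a meaning check. The following is a minimal Python sketch, assuming a toy lexicon with English glosses stands in for Buckwalter's analyzer. The Buckwalter-style transliterations (`sEyd`, `sAEd`), the `LEXICON` table, and the `classify_answer` function are illustrative assumptions, not taken from the paper.

```python
# Toy stand-in for a morphological analyzer: valid word -> English gloss.
# The entries are hypothetical Buckwalter-style transliterations.
LEXICON = {
    "sEyd": "happy",
    "sAEd": "helped",
}


def classify_answer(answer: str, expected_gloss: str) -> str:
    """Label a learner's answer to a spelling question."""
    gloss = LEXICON.get(answer)
    if gloss is None:
        # No valid analysis found: the word is ill-formed.
        return "ill-formed"
    if gloss != expected_gloss:
        # A valid word, but not the one the question asks for.
        return "semantically incorrect"
    return "correct"
```

Because the question fixes the expected meaning in advance, a valid but wrong word can be flagged without any sentence context.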
First Paper
• Error Correction
Edit distance technique.
• Filtering
1. Morphological Analyzer Filter.
Ex. After applying the correction techniques to the word ‘ ’, ‘ ’ appears as a candidate correction, so the morphological filter excludes it.
2. Gloss Filter.
Ex. The user misspells the word ‘ ’/happy as ‘ ’ (the second letter ‘ ’ is incorrectly replaced by the short vowel Fatha). Applying the correction techniques yields two possible corrections, ‘ ’/happy and ‘ ’/helped, both valid Arabic words. Applying the gloss filter excludes ‘ ’/helped.
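The correct-then-filter flow can be sketched in Python as follows. This is a rough illustration only: it uses plain Levenshtein distance, a hypothetical candidate pool, and a toy gloss table in place of the morphological analyzer. The transliterated words and the names `CANDIDATE_POOL`, `LEXICON`, and `correct` are assumptions, not the paper's implementation.

```python
# Toy gloss table standing in for the morphological analyzer (hypothetical).
LEXICON = {"sEyd": "happy", "sAEd": "helped"}
# Hypothetical strings a correction step might propose; "sydd" is a non-word.
CANDIDATE_POOL = ["sEyd", "sAEd", "sydd"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(misspelled: str, expected_gloss: str, max_distance: int = 2):
    # Step 1: keep candidates close to the misspelling.
    near = [c for c in CANDIDATE_POOL
            if edit_distance(misspelled, c) <= max_distance]
    # Step 2: morphological filter drops strings that are not valid words.
    valid = [c for c in near if c in LEXICON]
    # Step 3: gloss filter drops valid words with the wrong meaning.
    return [c for c in valid if LEXICON[c] == expected_gloss]
```

With this toy data, `correct("sayd", "happy")` keeps all three pool entries after step 1, drops the non-word in step 2, and drops the "helped" candidate in step 3.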
First Paper
• Evaluation:
Done using real test data composed of 190 misspelled words, including both single-error and multi-error misspellings with up to three errors per word. The average word length is 5 letters.
• Result
Recall of 80% or more and precision of 90% or more were achieved for each type of spelling error.
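For reference, precision and recall are usually computed from counts of correct and incorrect outcomes. The sketch below assumes the standard definitions; the slides do not say how the paper counted true positives, and the function name `precision_recall` is illustrative.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall) from raw counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A ratio with a zero denominator is reported as 0.0.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

With made-up counts of 9 true positives, 1 false positive, and 3 false negatives, the function returns a precision of 0.9 and a recall of 0.75. These numbers are for illustration and are unrelated to the paper's data.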
Second Paper
“Towards Automatic Spell Checking for Arabic”
• Composed of an Arabic morphological analyzer, a lexicon, a spelling detector, and a spelling corrector.
• Spelling detection
• Two possibilities:
1. The misspelled word is an invalid word. Ex. ‘ ’ for ‘ ’
2. The misspelled word is a valid word. Ex. ‘ ’ in place of ‘ ’
Second Paper
• Spelling correction:
• Add a missing character: the candidates for the misspelled ‘ ’ are ‘ ’, ‘ ’ and ‘ ’.
• Replace an incorrect character: the candidates for the misspelled ‘ ’ are ‘ ’, ‘ ’ and ‘ ’.
• Remove an excessive character: the candidates for the misspelled word ‘ ’ are ‘ ’, ‘ ’.
• Add a space to split words: the candidates for the misspelled word ‘ ’ are ‘ ’, ‘ ’.
• Arabic morphological analyzer
• Breaks the inflected word ‘ ’ down into the prefix ‘ ’, the suffix ‘ ’, and the stem ‘ ’, then checks the stem lexicon: if the stem has an entry in the lexicon, the stem is correct.
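The four correction operations and the stem check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the toy lexicon, the affix lists, and the Buckwalter-style transliterations are hypothetical, and the names `suggest` and `has_valid_stem` are not from the paper.

```python
# Toy lexicon in Buckwalter-style transliteration (hypothetical entries).
LEXICON = {"ktb", "ktAb", "mktb", "fy"}
ALPHABET = sorted(set("".join(LEXICON)))

# Hypothetical affix lists for the stem check; "" means "no affix".
PREFIXES = ("", "Al", "w")
SUFFIXES = ("", "At", "wn")


def suggest(word: str):
    """Return (single-word candidates, two-word splits) for a misspelling."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Add a missing character at every position.
    added = {left + c + right for left, right in splits for c in ALPHABET}
    # Replace each character with every other letter.
    replaced = {left + c + right[1:]
                for left, right in splits if right for c in ALPHABET}
    # Remove one excessive character.
    removed = {left + right[1:] for left, right in splits if right}
    singles = sorted(w for w in added | replaced | removed
                     if w in LEXICON and w != word)
    # Add a space: both halves must be lexicon words.
    pairs = [(left, right) for left, right in splits
             if left in LEXICON and right in LEXICON]
    return singles, pairs


def has_valid_stem(word: str) -> bool:
    """Strip one known prefix and suffix, then look the stem up."""
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if (len(word) >= len(prefix) + len(suffix)
                    and word.startswith(prefix) and word.endswith(suffix)):
                stem = word[len(prefix):len(word) - len(suffix)]
                if stem in LEXICON:
                    return True
    return False
```

Generating every single-edit variant and keeping only lexicon hits is one common way to build candidates; the slides do not say whether the paper enumerates edits this way or searches the lexicon directly.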
Second Paper
• Evaluation:
This approach is theoretical; no experimental results were reported.
Third Paper
- Algorithm defined by B. Haddad and M. Yassen
- Error patterns
Simple Errors: Editing Errors and Boundary Problems
Substitution: (/ → /, fāl → qāl, he said), the letter (/ /, f) was mistakenly typed in place of (/ /, q).
Deletion: (/ → /, ’sḫdama → ’staḫdama, he or it used), the letter (/ /, t) is missing.
Insertion: (/ → /, makttūb → maktūb, a letter in the sense of a message), an extra (/ /, t) is inserted.
Transposition: (/ → /, ’ğmitā‘ → ’ğtimā‘, meeting), the letter (/ /, t) is swapped with m.
Boundary problems: (/ → /, ra’īs’alğami‘h → ra’īs ’alğami‘h) and (/ → /, fa qāl → faqāl, and then he said).
Cognitive and Phonetic Mistakes
(/ or → /, hādā or hāzā → hadā, the particle that)
Syntax Errors
(/ → /, the girl went to [the] school), (/ /, dahaba) instead of (/ /, dahabat).
Semantic Errors
(/ → /, red rebuking cells → red blood cells), (/ /, ’ldam, the rebuking) instead of (/ /, ’ldam, the blood).
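The editing and boundary patterns can be recognized by comparing a misspelling with its correction. The sketch below is an illustrative Python classifier, not the algorithm of Haddad and Yassen. The test strings in the usage note are Buckwalter-style consonant skeletons of the slide's examples, which is an assumption made so that transposed letters are adjacent, as they would be in Arabic script.

```python
def _one_removed(longer: str, shorter: str) -> bool:
    """True if deleting exactly one character of `longer` gives `shorter`."""
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def classify_error(wrong: str, right: str) -> str:
    """Name the single-edit pattern that turns `right` into `wrong`."""
    if wrong == right:
        return "none"
    # Same letters, different spacing: a word-boundary problem.
    if wrong.replace(" ", "") == right.replace(" ", ""):
        return "boundary"
    if len(wrong) == len(right):
        diff = [i for i, (a, b) in enumerate(zip(wrong, right)) if a != b]
        if len(diff) == 1:
            return "substitution"
        if (len(diff) == 2 and diff[1] == diff[0] + 1
                and wrong[diff[0]] == right[diff[1]]
                and wrong[diff[1]] == right[diff[0]]):
            return "transposition"
    # The misspelling lacks one letter of the correct word.
    if len(wrong) + 1 == len(right) and _one_removed(right, wrong):
        return "deletion"
    # The misspelling has one extra letter.
    if len(wrong) == len(right) + 1 and _one_removed(wrong, right):
        return "insertion"
    return "other"
```

For example, `classify_error("fAl", "qAl")` gives `"substitution"`, `classify_error("Asxdm", "Astxdm")` gives `"deletion"`, `classify_error("mkttwb", "mktwb")` gives `"insertion"`, and `classify_error("AjmtAE", "AjtmAE")` gives `"transposition"`. Cognitive, syntax, and semantic errors need more than a string comparison and are out of scope for this sketch.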
Third Paper
- Knowledge base: D&C = (DAWKB, NDAKB, CORSTR)
- Derivative Arabic Word Knowledge Base (DAWKB)
- For each valid Arabic root there is a certain number of consistent patterns.
- A root-pattern relationship denotes a word that has at least one lexical occurrence in the Arabic vocabulary.
- dwj = ( Prefji + PtjΘsubMGRi + Suffji ) MSR PNGRi
- Database for NDW & AW
Considered as stems or lexemes collected in the knowledge base.
- Non-Word Recognition and Error Correction Strategy
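One loose reading of the derivative-word formula is "prefix + (pattern filled with root consonants) + suffix", restricted to root-pattern pairs that the knowledge base lists as attested. The Python sketch below follows that reading, which is an assumption: the slide does not define the symbols. The digit-slot pattern notation, the `ROOT_PATTERNS` set, and the function names are all hypothetical.

```python
# Hypothetical knowledge base: (root, pattern) pairs with a lexical occurrence.
# Digits in a pattern mark the slots for the 1st, 2nd, and 3rd root consonant.
ROOT_PATTERNS = {
    ("ktb", "1A23"),   # fills to "kAtb"
    ("ktb", "m12w3"),  # fills to "mktwb"
}


def fill_pattern(root: str, pattern: str) -> str:
    """Substitute root consonants into the numbered slots of a pattern."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch
                   for ch in pattern)


def derive(root: str, pattern: str, prefix: str = "", suffix: str = ""):
    """Build a derivative word, or return None for an unattested pair."""
    if (root, pattern) not in ROOT_PATTERNS:
        return None
    return prefix + fill_pattern(root, pattern) + suffix
```

Here `derive("ktb", "m12w3", prefix="Al")` yields `"Almktwb"`, while an unlisted pair such as `("ktb", "1w23")` yields `None`, which mirrors the idea that only consistent root-pattern combinations count as derivative words.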
Fourth Paper
- Paper proposed by A. Hattab and A. Hussein.
- The proposed system consists of three models.
- The detection and correction model classifies words as non-words or misspellings.
Fourth Paper
Evaluation:
- The proposed system was run twice: first without the detection and correction method, then with it.
- The same data was used in both experiments. The results of these experiments are shown in Tables:
- The detection and correction algorithm outperformed the Bayes algorithm by about 10%. Without checking for misspelling errors, accuracy is 68.85%, while the average accuracy of the classification system with misspelling detection and correction is 71.77%.