This document presents an overview of spell checking techniques in natural language processing. It discusses how spell checkers work by scanning text, comparing words against a dictionary, and using language-dependent algorithms. Two categories of spelling errors are described: real-word errors, in which the mistyped word is itself a valid dictionary word, and non-word errors, in which it is not. Techniques for error detection include dictionary lookup and n-gram comparison using the Jaccard coefficient. The Levenshtein distance and Jaccard coefficient algorithms are then explained and shown to produce suggestions by measuring the similarity between source and target words. The presentation concludes that these algorithms filter dictionary words and provide accurate suggestions to correct spelling mistakes in text.
3. Abstract
In this project, I propose a simple, flexible, and efficient spell-checker editor application based on edit-distance scores and supervised learning. I integrate the Levenshtein distance (LD) and Jaccard coefficient algorithms to achieve the required target; both algorithms measure the similarity between two strings, which we will refer to as the source string (s) and the target string (t). My approach is to design an NLP-based text editor with an automatic spell checker that suggests corrections for the user's mistakes. I use a novel scoring scheme to integrate the words retrieved by each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. These algorithms require a training data set, which is simply data in the form of a dictionary. While content is being written in the editor, the backend process runs tokenization and distance calculation, filters the results further using the algorithms mentioned above, and finally suggests appropriate results.
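The scoring pipeline described in the abstract can be sketched as follows. The abstract does not specify the exact scoring scheme, so the equal weights, the LD normalisation, and the tiny word list below are illustrative assumptions, not the project's actual implementation.

```python
# Sketch of the combined scoring idea: blend a normalised edit-distance
# similarity with the bigram Jaccard coefficient, then rank dictionary
# words by the overall score. Weights and word list are assumptions.

def levenshtein(s, t):
    """Classic dynamic-programming edit distance."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def bigrams(word):
    """Set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def jaccard(s, t):
    a, b = bigrams(s), bigrams(t)
    return len(a & b) / len(a | b) if a | b else 1.0

def overall_score(s, t, w_ld=0.5, w_j=0.5):
    """Weighted blend of LD similarity and Jaccard coefficient."""
    max_len = max(len(s), len(t)) or 1
    ld_sim = 1 - levenshtein(s, t) / max_len  # normalise LD into [0, 1]
    return w_ld * ld_sim + w_j * jaccard(s, t)

dictionary = ["cat", "cut", "cot", "dog"]
misspelling = "caat"
ranked = sorted(dictionary, key=lambda w: overall_score(misspelling, w),
                reverse=True)
print(ranked[0])  # "cat" scores highest
```

Normalising the edit distance by the longer string's length puts both signals on a comparable 0–1 scale before blending.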
4. Introduction
Spell check is the process of detecting, and sometimes providing
suggestions for, incorrectly spelled words in a text.
In computing, a spell checker is an application program that flags words
in a document that may not be spelled correctly.
A spell checker may be a stand-alone tool capable of operating on a block of
text, or part of a larger application such as a word processor or an
electronic dictionary.
A basic spell checker carries out the following processes:
It scans the text and extracts the words contained in it.
It then compares each word against a known list of correctly spelled words (i.e.
a dictionary).
An additional step is a language-dependent algorithm for handling
morphology.
5. Spelling errors can be divided into two categories:
Real-word errors
Non-word errors
Real-word errors: error words that are nevertheless acceptable words in
the dictionary.
Non-word errors: error words that cannot be found in the
dictionary.
Real-word errors are complex to provide suggestions for, because the words
themselves are valid, so they might not be flagged or suggested.
6. 2. ERROR DETECTION TECHNIQUES
A. Dictionary Lookup Technique:
In this technique, every word of the input text is checked for its presence
in the dictionary.
If the word is present in the dictionary, then it is a correct
word.
Otherwise it is put into the list of error words.
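The dictionary lookup step above can be sketched in a few lines. The word list is a stand-in; a real checker would load a full dictionary file.

```python
# Minimal sketch of dictionary-lookup error detection: scan the text,
# extract the words, and flag every word absent from the dictionary.

DICTIONARY = {"this", "is", "a", "simple", "test"}  # illustrative word list

def detect_errors(text):
    """Return the words of `text` that are not in the dictionary."""
    words = text.lower().split()
    return [w for w in words if w not in DICTIONARY]

print(detect_errors("This is a simpel test"))  # ['simpel']
```

Whitespace splitting is the simplest possible tokenizer; a real editor would also strip punctuation before the lookup.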
7. B. ALGORITHMS FOR ERROR WORDS
1. N-grams Based Technique using the Jaccard coefficient
String 1 – “statistics”
String 2 – “statistical”
If n is set to 2 (bigrams are being extracted), then the similarity of the two
strings is calculated as follows.
Initially, the two strings are split into n-grams:
statistics → st ta at ti is st ti ic cs (9 bigrams)
statistical → st ta at ti is st ti ic ca al (10 bigrams)
Coefficient = |A ∩ B| / |A ∪ B|
where A and B are the sets of bigrams of the two strings. Here
A ∩ B = {st, ta, at, ti, is, ic} has 6 elements and A ∪ B has 9, so the
coefficient is 6/9 ≈ 0.67.
8. 2. The Levenshtein Algorithm (LD)
Levenshtein distance (LD) is a measure of the similarity between two strings,
which we will refer to as the source string (s) and the target string (t). The
distance is the number of deletions, insertions, or substitutions required to
transform s into t. For example:
If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are
needed; the strings are already identical.
If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change
"s" to "n") is sufficient to transform s into t.
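The definition above can be implemented with the standard dynamic-programming table, shown here as a short sketch that reproduces the two examples.

```python
def levenshtein(s, t):
    """Edit distance between s and t (insertions, deletions, substitutions)."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i   # delete all i characters of s
    for j in range(n + 1):
        dp[0][j] = j   # insert all j characters of t
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

print(levenshtein("test", "test"))  # 0
print(levenshtein("test", "tent"))  # 1
```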
9. The Levenshtein distance algorithm has been used in:
Spell checking
Speech recognition
DNA analysis
Plagiarism detection
Operations: insertion, deletion, substitution
In this algorithm, a cost is calculated by comparing the source word with each
target word, and the lowest-cost word is suggested to the user.
e.g. Cat → Cut
Here there is one substitution of a single letter.
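The lowest-cost suggestion step can be sketched as: compute the edit distance from the typed word to every dictionary word and return the cheapest matches first. The small word list is an illustrative assumption.

```python
def levenshtein(s, t):
    """Compact two-row edit distance (insert, delete, substitute)."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

def suggest(word, dictionary, k=3):
    """Rank dictionary words by edit-distance cost, lowest cost first."""
    return sorted(dictionary, key=lambda w: levenshtein(word, w))[:k]

print(suggest("cut", ["cat", "cot", "dog", "cart"]))  # ['cat', 'cot', 'cart']
```

Because Python's sort is stable, ties on cost keep the dictionary's original order; a real editor might break ties by word frequency instead.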
13. Conclusion
In this presentation we have seen error detection and correction
techniques. The words suggested to the end user are based
on two algorithms: the Jaccard coefficient and the
Levenshtein distance.
These algorithms filter the dictionary
words and provide exact suggestions to the user, so that
the text the user enters in the editor is error free and
does not contain any spelling mistakes.