Spell Correction Systems for E-commerce engines
Anjan Goswami, HuiZhong Duan (Search Science, WalmartLabs)
The Spell correction problem
Rich literature [KCG90, Pet80].
Active research area [CB04].
A combination of NLP, machine learning [DH11, BB01, LDZ12], and systems problems [Kuk92].
Spell correction for e-commerce
Critical site feature for e-commerce.
Impact of ML-based spell correction:
Adds revenue.
Reduces bounce rate.
Reduces null results.
Departments such as pharmacy can see large revenue gains from spell correction.
Spell correction for e-commerce
The science is the same as in any other large-scale spell correction system.
Demand-side (query) and supply-side (catalog) corpora.
Conversion focus.
User Interfaces.
Spell correction Evaluation
Accuracy for misspelled queries.
Accuracy for correctly spelled queries.
Business metrics.
Coverage.
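A minimal sketch of the accuracy and coverage metrics above, assuming a labeled set in which each query carries its intended form and the system's suggestion (if any). The `LabeledQuery` type and the coverage definition used here (share of misspelled queries that received any change) are assumptions for illustration; business metrics such as revenue or bounce rate need online experiments and are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledQuery:
    query: str                 # what the user typed
    gold: str                  # intended query; equals `query` when already spelled correctly
    suggestion: Optional[str]  # system output; None means no correction was offered


def evaluate(data: List[LabeledQuery]) -> dict:
    misspelled = [d for d in data if d.query != d.gold]
    correct = [d for d in data if d.query == d.gold]

    def final(d: LabeledQuery) -> str:
        # Without a suggestion, the original query is searched unchanged.
        return d.suggestion if d.suggestion is not None else d.query

    def rate(items, pred) -> float:
        return sum(1 for d in items if pred(d)) / len(items) if items else 0.0

    return {
        # Share of misspelled queries fixed to the intended form.
        "acc_misspelled": rate(misspelled, lambda d: final(d) == d.gold),
        # Share of correct queries left alone (low values signal over-correction).
        "acc_correct": rate(correct, lambda d: final(d) == d.gold),
        # Assumed definition: share of misspelled queries that received any change.
        "coverage": rate(misspelled, lambda d: final(d) != d.query),
    }


report = evaluate([
    LabeledQuery("covr", "cover", "cover"),
    LabeledQuery("visio tv", "vizio tv", None),
    LabeledQuery("x345678", "x345677", "x345670"),
    LabeledQuery("cover", "cover", None),
    LabeledQuery("life of pi", "life of pi", "life of pie"),
])
```

Keeping the two accuracies separate makes the trade-off visible: an aggressive corrector raises accuracy on misspelled queries while lowering it on correctly spelled ones.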
The problem
Error statistics
Approximately 26% of web queries contain spelling errors [JM].
E-commerce queries can be expected to have a similar error rate.
Error Types
Typographic errors: Covr ← Cover
Cognitive errors: Visio Tv ← Vizio Tv
Non-English word errors: X345678 ← X345677
Contextual errors: life of Pie ← Life of Pi
Challenges
General Challenges
Large candidate pool: queries
Open dictionary: all terms are feasible
Efficiency: happens before search is executed
User behavior: query formulation is different from typical writing
Devices: different devices may cause different types of typos
Under-correction: even when a term is spelled correctly, it may still need correction
Over-correction: a term that does not look correct could still be a good search term
Languages: different languages pose different challenges.
Query Spelling Challenges
Special Challenges (and Opportunities) in e-Commerce
optimization target: linguistic correctness or conversion?
unique dictionary: model numbers, etc.
high cost for over-correction
availability of inventory data
availability of conversion data
General problems
Error modeling
Candidate generation
Ranking and selection of the best candidate.
Modeling
A Noisy Channel Framework
Given user input query q, for every candidate correction c, compute the
conditional probability p(c|q)
$$p(c \mid q) = \frac{p(q \mid c) \cdot p(c)}{p(q)} \propto p(q \mid c) \cdot p(c) \qquad (1)$$
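Because p(q) is identical for every candidate, ranking by equation (1) only needs p(q|c) · p(c). A minimal sketch, assuming both models are supplied as plain functions; `rank_candidates`, `source_prob`, `error_prob`, and the toy tables are hypothetical. It works in log space to avoid underflow with tiny probabilities, and including the query itself as a candidate lets "no correction" win.

```python
import math
from typing import Callable, List, Tuple


def rank_candidates(
    query: str,
    candidates: List[str],
    source_prob: Callable[[str], float],       # p(c), e.g. from a language model
    error_prob: Callable[[str, str], float],   # p(q|c), from an error model
) -> List[Tuple[str, float]]:
    """Sort candidates by log p(q|c) + log p(c); p(q) is the same for all c and is dropped."""
    scored = []
    for c in candidates:
        p_c, p_q_given_c = source_prob(c), error_prob(query, c)
        if p_c > 0 and p_q_given_c > 0:  # log(0) is undefined; such candidates are impossible
            scored.append((c, math.log(p_q_given_c) + math.log(p_c)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy probability tables (made-up numbers, for illustration only).
SOURCE = {"cover": 1e-2, "covert": 1e-3, "covr": 1e-7}
ERROR = {("covr", "cover"): 1e-2, ("covr", "covert"): 5e-3, ("covr", "covr"): 0.95}

ranked = rank_candidates(
    "covr",
    ["cover", "covert", "covr"],
    source_prob=lambda c: SOURCE.get(c, 0.0),
    error_prob=lambda q, c: ERROR.get((q, c), 0.0),
)
```

Here "covr" has by far the highest p(q|c) (typing what you meant is likely), but its tiny source probability lets "cover" win.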
A Noisy Channel Framework (cont.)
Source model p(c)
Captures: how likely the user is to pick query c in the first place
Typically: a language model
Rationale: common phrases have high probabilities
Error model p(q|c)
Captures: how likely c is to be misspelled as q
Straightforward model: edit distance
Rationale: a misspelled query should not be too different from the intended query
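One simple way to turn edit distance into an error model is to let p(q|c) decay with the distance between q and c. A sketch, assuming an unweighted Levenshtein distance and an exponential decay whose rate `alpha` is a hypothetical tuning parameter; the scores are unnormalized, which is enough for ranking.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def error_score(query: str, candidate: str, alpha: float = 2.0) -> float:
    """Unnormalized p(q|c) that decays exponentially with edit distance (alpha is hypothetical)."""
    return math.exp(-alpha * edit_distance(query, candidate))
```

For example, "covr" is one edit from "cover" and two from "covert", so `error_score` prefers "cover".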
A Noisy Channel Framework (cont.)
More on Source model p(c)
Linguistic correctness is important
It should also reflect query popularity
In e-commerce, we also need to consider query conversion and query revenue
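The slides do not say how these signals should be combined (the Discriminative Models part lists this as an open problem). One simple option, assumed here purely for illustration, is a linear mixture of three normalized distributions: language-model probability, query popularity, and conversions. The function name and the mixture weights are hypothetical.

```python
from typing import Dict, Tuple


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total > 0 else {}


def blended_prior(
    lm_prob: Dict[str, float],      # p_LM(c) from a language model
    query_freq: Dict[str, float],   # how often c is issued as a query
    conversions: Dict[str, float],  # orders (or revenue) attributed to c
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical; should sum to 1
) -> Dict[str, float]:
    """Mixture of three distributions; it sums to 1 when lm_prob does and the weights do."""
    w_lm, w_pop, w_conv = weights
    pop, conv = normalize(query_freq), normalize(conversions)
    keys = set(lm_prob) | set(pop) | set(conv)
    return {
        k: w_lm * lm_prob.get(k, 0.0) + w_pop * pop.get(k, 0.0) + w_conv * conv.get(k, 0.0)
        for k in keys
    }


prior = blended_prior(
    lm_prob={"vizio tv": 0.6, "visio tv": 0.4},
    query_freq={"vizio tv": 90, "visio tv": 10},
    conversions={"vizio tv": 50},
)
```

A mixture keeps every term a probability, so the result can still be multiplied by p(q|c) in equation (1); the weights would have to be tuned, for example against conversion in an A/B test.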
A Noisy Channel Framework (cont.)
Language Model
n-gram language model: data sparsity grows as n goes up
backoff to, or interpolation with, lower-order n-grams is necessary
smoothing is important
Good-Turing smoothing: use the count of items seen once to estimate the probability of unseen (zero-frequency) items
Additive smoothing: add a pseudo-count to every term/phrase
Kneser-Ney smoothing: absolute discounting combined with backoff/interpolation to a lower-order distribution based on how many distinct contexts a word appears in
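A minimal sketch of two of the ideas above for a bigram LM: additive smoothing and linear interpolation with a lower-order (unigram) model. This is neither Good-Turing nor Kneser-Ney. `k` (the pseudo-count) and `lam` (the bigram weight) are hypothetical values that would normally be tuned on held-out queries.

```python
from collections import Counter
from typing import List


class InterpolatedBigramLM:
    """Bigram LM with add-k smoothing, linearly interpolated with an add-k unigram LM."""

    def __init__(self, sentences: List[List[str]], k: float = 0.1, lam: float = 0.7):
        self.k, self.lam = k, lam  # pseudo-count and bigram weight (hypothetical values)
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams) + 1  # +1 leaves room for one unseen-word type
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w: str) -> float:
        return (self.unigrams[w] + self.k) / (self.total + self.k * self.vocab_size)

    def p_bigram(self, prev: str, w: str) -> float:
        # Additive smoothing: every bigram gets k pseudo-counts, so unseen pairs are nonzero.
        return (self.bigrams[(prev, w)] + self.k) / (self.unigrams[prev] + self.k * self.vocab_size)

    def prob(self, prev: str, w: str) -> float:
        # Interpolation: mix the sparse bigram estimate with the denser unigram estimate.
        return self.lam * self.p_bigram(prev, w) + (1 - self.lam) * self.p_unigram(w)


lm = InterpolatedBigramLM([["vizio", "tv"], ["vizio", "tv"], ["visio", "software"]])
```

With this toy corpus, "tv" after "vizio" gets a much higher probability than after "visio", while an unseen word still receives a small nonzero probability.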
A Noisy Channel Framework (cont.)
More on Error model p(q|c)
A weighted edit model is better: p(a → e) > p(a → n)
Context matters: p(a → e | context = "be...")
Multi-word errors need to be considered, e.g. p("gopro" | "go pro"); these can be modeled by an HMM, a joint sequence model, etc.
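A sketch of a weighted edit distance in which the cost of an operation depends on the characters involved. The cost functions are hypothetical: vowel-for-vowel substitutions are cheaper than other substitutions (so a → e beats a → n), and inserting or deleting a space is cheap so that "gopro" and "go pro" end up close. Context-dependent costs and the HMM or joint-sequence approaches are not implemented; in practice the costs would be learned from data rather than hand-set.

```python
from typing import Callable

VOWELS = set("aeiou")


def sub_cost(a: str, b: str) -> float:
    """Hypothetical substitution costs: vowel-for-vowel swaps are cheap."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5
    return 1.0


def indel_cost(ch: str) -> float:
    """Hypothetical insertion/deletion costs: a space is cheap, to handle 'gopro' vs 'go pro'."""
    return 0.3 if ch == " " else 1.0


def weighted_edit_distance(
    q: str,
    c: str,
    sub: Callable[[str, str], float] = sub_cost,
    indel: Callable[[str], float] = indel_cost,
) -> float:
    # prev[j] = cost of turning the current prefix of q into c[:j]; start with q = "".
    prev = [0.0]
    for ch in c:
        prev.append(prev[-1] + indel(ch))
    for qa in q:
        cur = [prev[0] + indel(qa)]
        for j, cb in enumerate(c, start=1):
            cur.append(min(
                prev[j] + indel(qa),        # q has an extra character
                cur[j - 1] + indel(cb),     # q is missing a character of c
                prev[j - 1] + sub(qa, cb),  # substitution (or match)
            ))
        prev = cur
    return prev[-1]
```

A cost can be read as a negative log probability, so lower weighted distance corresponds to higher p(q|c).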
A Noisy Channel Framework (cont.)
Hierarchical Error models
Character level error model
p(a → e | context = "be...")
generalizes well
less accurate
Syllable level error model
Word level error model
p(pi → pie | context = "life of ...")
sparse data
more accurate
Phrase level error model
...
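One way to combine levels of the hierarchy is to back off: trust the sparse but accurate word-level estimate when the intended word has been observed often, and fall back to the general character-level estimate otherwise. A sketch under that assumption; the function name and the `min_count` threshold are hypothetical, and the word-level counts ignore context (such as "life of ...") for brevity.

```python
from typing import Dict, Tuple


def hierarchical_error_prob(
    q_word: str,
    c_word: str,
    word_pairs: Dict[Tuple[str, str], int],  # counts of observed (typed, intended) word pairs
    word_totals: Dict[str, int],             # how often each intended word was observed
    char_level_prob: float,                  # p(q|c) from a character-level model
    min_count: int = 20,                     # hypothetical reliability threshold
) -> float:
    """Interpolate word-level and character-level estimates by the amount of word-level evidence."""
    total = word_totals.get(c_word, 0)
    if total == 0:
        return char_level_prob  # no word-level data at all: use the general model
    word_prob = word_pairs.get((q_word, c_word), 0) / total
    # Trust the word-level estimate more as evidence for c_word accumulates.
    weight = min(total / min_count, 1.0)
    return weight * word_prob + (1 - weight) * char_level_prob
```

With 10 observations of "pi", 3 of them typed as "pie", and a character-level estimate of 0.001, the result is halfway between 0.3 and 0.001.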
Discriminative Models
Why?
Noisy channel model is a generative framework
Multiplying the two probabilities is problematic because they are estimated in different ways
It is unclear how to merge several signals into one probability estimate (e.g. linguistic correctness vs. popularity vs. revenue)
Other heuristic and domain-specific features cannot be subsumed in the noisy channel model
Discriminative Models (cont.)
How?
Learn to score each ⟨q, c⟩ pair so that the best correction has the highest score
Challenges
Obtaining large-scale training data: text parsing, human annotation
Learning methods
Classification
Learning to Rank
Structural learning
Efficiency: use the noisy channel model to retrieve a handful of candidates
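The efficiency point above suggests a two-stage design: a cheap noisy-channel score prunes the candidate list, then a discriminative model scores the survivors with richer features. A minimal sketch, assuming a linear model over a feature dictionary; `two_stage_correct`, the feature names, the weights, and the toy numbers are all hypothetical.

```python
import heapq
from typing import Callable, Dict, List

Features = Dict[str, float]


def two_stage_correct(
    query: str,
    candidates: List[str],
    channel_score: Callable[[str, str], float],  # cheap noisy-channel log score
    featurize: Callable[[str, str], Features],   # richer features of the <q, c> pair
    weights: Features,                           # weights learned offline
    k: int = 5,
) -> str:
    # Stage 1: keep the k best candidates under the cheap generative score.
    shortlist = heapq.nlargest(k, candidates, key=lambda c: channel_score(query, c))

    # Stage 2: rescore only the shortlist with a linear discriminative model.
    def score(c: str) -> float:
        return sum(weights.get(name, 0.0) * value for name, value in featurize(query, c).items())

    return max(shortlist, key=score)


# Toy signals (made-up numbers); "covr" itself is a candidate so "no change" can win.
CHANNEL = {"covr": -0.5, "cover": -1.0, "covert": -2.0, "clover": -3.0}
CONVERSION = {"covr": 0.0, "cover": 5.0, "covert": 1.0, "clover": 1.0}

best = two_stage_correct(
    "covr",
    list(CHANNEL),
    channel_score=lambda q, c: CHANNEL[c],
    featurize=lambda q, c: {"channel": CHANNEL[c], "conversion": CONVERSION[c]},
    weights={"channel": 1.0, "conversion": 1.0},
    k=2,
)
```

The channel score alone would keep "covr" unchanged; the conversion feature in the second stage flips the decision to "cover". Only k candidates are featurized, which keeps the expensive stage cheap.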
Discriminative Models (cont.)
Discriminative models such as SVMs can also be used to rerank the spelling candidates.
Deep neural networks have shown recent successes.
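The weights of such a reranker can be learned from pairs of (correct candidate, wrong candidate) for each query. The sketch below uses a pairwise perceptron as a simpler stand-in for a ranking SVM, which would instead minimize a regularized hinge loss over the same pairs; the training-data format and function names are assumptions.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def dot(w: Features, x: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pairwise_perceptron(
    groups: List[Tuple[Features, List[Features]]],  # (correct candidate, wrong candidates) per query
    epochs: int = 10,
    lr: float = 1.0,
) -> Features:
    """Learn weights so that each correct candidate outscores every wrong one for its query."""
    w: Features = {}
    for _ in range(epochs):
        for good, bads in groups:
            for bad in bads:
                if dot(w, good) <= dot(w, bad):  # ranking mistake (ties count as mistakes)
                    for k in set(good) | set(bad):
                        w[k] = w.get(k, 0.0) + lr * (good.get(k, 0.0) - bad.get(k, 0.0))
    return w


# Toy training pairs: features are a noisy-channel log score and a conversion signal.
GROUPS = [
    ({"channel": -1.0, "conversion": 5.0}, [{"channel": -0.5, "conversion": 0.0}]),
    ({"channel": -0.2, "conversion": 2.0}, [{"channel": -2.0, "conversion": 0.0}]),
]
w = train_pairwise_perceptron(GROUPS)
```

Learning from pairs means only relative scores matter, which matches how the reranker is used: picking the best candidate per query.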
Systems for Spelling Correction
Candidate generation for Spelling Correction
Given a word, find all neighboring words within edit distance k.
Given a word, find potential close matches with a hashing trick.
Generate candidates using heuristic rules for common errors.
N-gram-based techniques.
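A sketch of the first technique: enumerate every string one edit away (deletion, adjacent transposition, substitution, insertion), repeat k times, and keep only strings that appear in the vocabulary. The alphabet (lowercase letters and digits), the function names, and the vocabulary are assumptions. The number of generated strings grows roughly as (n · |alphabet|)^k for a word of length n, which is why k is usually kept at 1 or 2 and why hashing tricks matter at scale.

```python
import string
from typing import Set

ALPHABET = string.ascii_lowercase + string.digits  # assumption: lowercase alphanumeric queries


def edits1(word: str) -> Set[str]:
    """All strings one deletion, adjacent transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + ch + r[1:] for l, r in splits if r for ch in ALPHABET}
    inserts = {l + ch + r for l, r in splits for ch in ALPHABET}
    return deletes | transposes | replaces | inserts


def candidates(word: str, vocabulary: Set[str], k: int = 2) -> Set[str]:
    """Vocabulary words reachable from word in at most k edits (k rounds of edits1)."""
    frontier, seen = {word}, {word}
    for _ in range(k):
        frontier = {e for w in frontier for e in edits1(w)} - seen
        seen |= frontier
    return seen & vocabulary


VOCAB = {"cover", "covert", "clover", "vizio", "tv"}
```

For "covr", one edit reaches only "cover"; two edits also reach "covert" and "clover".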
Candidate generation scaling up
Distributed implementation.
Hashing tricks.
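One well-known hashing trick is the symmetric delete index (used, for example, by the SymSpell library): every dictionary word is stored under all strings obtained by deleting up to `max_edits` characters, and a query is looked up under its own delete variants. The slides do not say which hashing trick they use, so this is an illustrative assumption, and the function names are hypothetical. Lookups only touch hash buckets, and the index could be sharded by key for a distributed implementation. Candidates should still be verified with a true edit distance, because sharing a delete variant does not guarantee the words are within `max_edits` of each other.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def deletes(word: str, max_edits: int) -> Set[str]:
    """All strings obtainable from word by deleting up to max_edits characters."""
    out = {word}
    for d in range(1, min(max_edits, len(word)) + 1):
        for idx in combinations(range(len(word)), d):
            out.add("".join(ch for i, ch in enumerate(word) if i not in idx))
    return out


def build_index(vocabulary: Set[str], max_edits: int = 1) -> Dict[str, Set[str]]:
    """Hash each delete variant to the dictionary words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for w in vocabulary:
        for key in deletes(w, max_edits):
            index[key].add(w)
    return index


def lookup(query: str, index: Dict[str, Set[str]], max_edits: int = 1) -> Set[str]:
    """Dictionary words sharing at least one delete variant with the query."""
    found: Set[str] = set()
    for key in deletes(query, max_edits):
        found |= index.get(key, set())
    return found


INDEX = build_index({"cover", "covert", "clover", "vizio"})
```

For example, "covr" matches "cover" through the shared key "covr", and "visio" matches "vizio" through the shared key "viio".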
Spell correction for E-commerce
UI for the spell correction.
Input data: whether to include item titles or not?
Impact of autocorrection on conversion.
References I
[BB01] Michele Banko and Eric Brill, Scaling to very very large corpora for natural language disambiguation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2001, pp. 26–33.
[CB04] Silviu Cucerzan and Eric Brill, Spelling correction as an iterative process that exploits the collective knowledge of web users, EMNLP, vol. 4, 2004, pp. 293–300.
[DH11] Huizhong Duan and Bo-June Paul Hsu, Online spelling correction for query completion, Proceedings of the 20th international conference on World wide web, ACM, 2011, pp. 117–126.
[JM] Daniel Jurafsky and James H Martin, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition.
References II
[KCG90] Mark D Kernighan, Kenneth W Church, and William A Gale, A spelling correction program based on a noisy channel model, Proceedings of the 13th conference on Computational linguistics-Volume 2, Association for Computational Linguistics, 1990, pp. 205–210.
[Kuk92] Karen Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys (CSUR) 24 (1992), no. 4, 377–439.
[LDZ12] Yanen Li, Huizhong Duan, and ChengXiang Zhai, A generalized hidden markov model with discriminative training for query spelling correction, Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, ACM, 2012, pp. 611–620.
References III
[Pet80] James L Peterson, Computer programs for detecting and correcting spelling errors, Communications of the ACM 23 (1980), no. 12, 676–687.
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

Error Types
Typographic errors: Covr ← Cover
Cognitive errors: Visio Tv ← Vizio Tv
Non-English word errors: X345678 ← X345677
Contextual errors: life of Pie ← Life of Pi
Challenges
General challenges:
Large candidate pool: queries
Open dictionary: all terms are feasible
Efficiency: correction happens before the search is executed
User behavior: query formulation differs from typical writing
Devices: different devices may cause different types of typos
Under-correction: even if a term is in a correct form, it may still need correction
Over-correction: a term that doesn't appear correct could still be a good search term
Languages: different languages have different challenges
Query Spelling Challenges
Special challenges (and opportunities) in e-commerce:
Optimization target: linguistic correctness or conversion?
Unique dictionary: model numbers, etc.
High cost of over-correction
Availability of inventory data
Availability of conversion data
General problems
Error modeling
Candidate generation
Ranking and selection of the best candidate
Modeling: A Noisy Channel Framework
Given a user input query q, for every candidate correction c, compute the conditional probability p(c|q):
$$p(c \mid q) = \frac{p(q \mid c)\, p(c)}{p(q)} \propto p(q \mid c)\, p(c) \tag{1}$$
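Because the denominator p(q) in Eq. (1) is the same for every candidate, ranking needs only p(q|c)·p(c), which in practice is summed in log space to avoid underflow. The sketch below assumes both models are already available as functions; the toy tables `SOURCE_P` and `ERROR_P` and the helper names are hypothetical. `posterior` renormalizes over the retrieved candidates only, so it approximates p(c|q) rather than computing it exactly.

```python
import math
from typing import Callable, Dict, Iterable

# Hypothetical toy models for the query "covr" (illustrative numbers, not real data).
SOURCE_P = {"cover": 0.6, "cove": 0.1, "covr": 0.001}  # p(c)
ERROR_P = {("covr", "cover"): 0.01, ("covr", "cove"): 0.005, ("covr", "covr"): 0.9}  # p(q|c), keyed (q, c)


def noisy_channel_scores(
    query: str,
    candidates: Iterable[str],
    error_logprob: Callable[[str, str], float],
    source_logprob: Callable[[str], float],
) -> Dict[str, float]:
    """log p(q|c) + log p(c) per candidate; log p(q) is dropped because it does not depend on c."""
    return {c: error_logprob(query, c) + source_logprob(c) for c in candidates}


def posterior(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize log scores into p(c|q) over the given candidate set (log-sum-exp for stability)."""
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


if __name__ == "__main__":
    scores = noisy_channel_scores(
        "covr",
        SOURCE_P,
        error_logprob=lambda q, c: math.log(ERROR_P[(q, c)]),
        source_logprob=lambda c: math.log(SOURCE_P[c]),
    )
    print(max(scores, key=scores.get))  # cover
    print(posterior(scores))
```

Note that the literal query "covr" loses despite its high p(q|c), because its source probability is tiny.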
A Noisy Channel Framework (cont.)
Source model p(c)
Captures: how likely the user is to pick query c in the first place
Typically: a language model
Rationale: common phrases have high probabilities
Error model p(q|c)
Captures: how likely c is to be misspelled as q
Straightforward model: edit distance
Rationale: a misspelled query should not be too different from the intended query
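A minimal sketch of the two models described above, assuming a plain Levenshtein distance for the error model and raw query-log frequencies for the source model. The penalty `alpha` and the counts in `QUERY_COUNTS` are hypothetical, and exp(-alpha * d) is an unnormalized score rather than a proper probability.

```python
import math
from collections import Counter

# Hypothetical query-log counts standing in for the source model p(c).
QUERY_COUNTS = Counter({"cover": 500, "cove": 40, "clover": 30})


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = cur
    return prev[-1]


def error_logprob(query: str, candidate: str, alpha: float = 2.0) -> float:
    """Unnormalized log p(q|c): every edit costs the same assumed penalty alpha."""
    return -alpha * levenshtein(query, candidate)


def source_logprob(candidate: str) -> float:
    """Maximum-likelihood unigram estimate of p(c) from the toy counts (no smoothing)."""
    return math.log(QUERY_COUNTS[candidate] / sum(QUERY_COUNTS.values()))


def correct(query: str) -> str:
    """Noisy channel argmax over the known queries."""
    return max(QUERY_COUNTS, key=lambda c: error_logprob(query, c) + source_logprob(c))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(correct("covr"))                   # cover
```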
A Noisy Channel Framework (cont.)
More on the source model p(c)
Linguistic correctness is important
It should also reflect query popularity
In e-commerce, we also need to consider query conversion and query revenue
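One way to fold conversion and revenue into the source model, sketched under stated assumptions: the log-linear form, the weights `w_pop`, `w_conv`, `w_rev`, the pseudo-count, and the `QueryStats` fields are all hypothetical and are not presented as the authors' method. The result is an unnormalized score that could stand in for log p(c) when ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryStats:
    searches: int   # times the query was issued (popularity)
    orders: int     # searches that led to a purchase (conversion)
    revenue: float  # revenue attributed to the query


def source_score(stats: QueryStats, total_searches: int,
                 w_pop: float = 1.0, w_conv: float = 0.5, w_rev: float = 0.1,
                 pseudo: float = 1.0) -> float:
    """Hypothetical log-linear blend used in place of log p(c); all weights are assumptions."""
    log_pop = math.log((stats.searches + pseudo) / (total_searches + pseudo))
    log_conv = math.log((stats.orders + pseudo) / (stats.searches + 2 * pseudo))  # smoothed conversion rate
    log_rev = math.log1p(stats.revenue)
    return w_pop * log_pop + w_conv * log_conv + w_rev * log_rev


if __name__ == "__main__":
    total = 2_000_000
    vizio = QueryStats(searches=1000, orders=80, revenue=20000.0)  # toy numbers
    visio = QueryStats(searches=1000, orders=2, revenue=300.0)     # toy numbers
    print(source_score(vizio, total) > source_score(visio, total))  # True
```

With equal search volume in these toy numbers, popularity alone cannot separate the two spellings, while the conversion and revenue terms can. Whether such a blend actually helps is an empirical question.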
A Noisy Channel Framework (cont.)
Language model
n-gram language model: data sparsity increases as n goes up
Backoff to, or interpolation with, lower-order n-grams is necessary
Smoothing is important
Good-Turing smoothing: use the count of items seen once to estimate the probability of unseen items
Additive smoothing: add a pseudo-count to terms/phrases
Kneser-Ney smoothing: a smart way of combining backoff and interpolation, in which the lower-order estimate reflects how many distinct contexts a word appears in rather than its raw frequency
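A small illustration of additive smoothing combined with interpolation, as listed above. This is a sketch with assumed defaults (`k = 0.1`, `lam = 0.7`) and a toy corpus. It does not implement Good-Turing or Kneser-Ney, and a real source model would be trained on query logs.

```python
import math
from collections import Counter
from typing import List


class InterpolatedBigramLM:
    """Bigram LM with add-k (additive) smoothing, linearly interpolated with a unigram LM."""

    def __init__(self, sentences: List[List[str]], k: float = 0.1, lam: float = 0.7):
        self.k, self.lam = k, lam
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for words in sentences:
            padded = ["<s>"] + words + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves probability mass for unseen words
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, w: str) -> float:
        return (self.unigrams[w] + self.k) / (self.total + self.k * self.vocab_size)

    def bigram_prob(self, prev: str, w: str) -> float:
        return (self.bigrams[(prev, w)] + self.k) / (self.unigrams[prev] + self.k * self.vocab_size)

    def prob(self, prev: str, w: str) -> float:
        # Interpolation: the sparse bigram estimate is always mixed with the denser unigram estimate.
        return self.lam * self.bigram_prob(prev, w) + (1 - self.lam) * self.unigram_prob(w)

    def logprob(self, words: List[str]) -> float:
        padded = ["<s>"] + words + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(padded, padded[1:]))


if __name__ == "__main__":
    lm = InterpolatedBigramLM([["life", "of", "pi"], ["life", "of", "pi"], ["pie", "recipe"]])
    print(lm.logprob(["life", "of", "pi"]), lm.logprob(["life", "of", "pie"]))
```

Unseen words still receive a small nonzero probability, which keeps the noisy channel product from collapsing to zero.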
A Noisy Channel Framework (cont.)
More on the error model p(q|c)
A weighted edit model is better: p(a → e) > p(a → n)
Context matters: p(a → e | context = "be...")
Multi-word errors need to be considered: p("gopro" | "go pro"); these can be modeled by an HMM, a joint sequence model, etc.
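A sketch of a weighted edit distance in which common confusions cost less than rare ones, so that exp(-cost) can serve as an unnormalized p(q|c). The cost table is hypothetical (such costs could be estimated from logged corrections). Context-dependent costs such as p(a → e | context = "be...") are not modeled, and treating the space as an ordinary character is only a crude stand-in for the HMM or joint sequence models mentioned above.

```python
from typing import Dict, Tuple

# Hypothetical costs (lower = more likely error); in practice these could be -log p(edit)
# estimated from logged (intended, typed) pairs.
SUB_COST: Dict[Tuple[str, str], float] = {
    ("a", "e"): 0.3, ("e", "a"): 0.3,  # vowel confusions are common
    ("s", "z"): 0.4, ("z", "s"): 0.4,  # e.g. "visio" typed for "vizio"
}
DEFAULT_SUB = 1.0
DEFAULT_INDEL = 1.0
SPACE_INDEL = 0.2  # dropping or adding a space ("go pro" -> "gopro") is cheap


def sub_cost(intended: str, typed: str) -> float:
    return 0.0 if intended == typed else SUB_COST.get((intended, typed), DEFAULT_SUB)


def indel_cost(ch: str) -> float:
    return SPACE_INDEL if ch == " " else DEFAULT_INDEL


def weighted_edit_distance(intended: str, typed: str) -> float:
    """Cheapest weighted edit sequence turning the intended query c into the typed query q."""
    n, m = len(intended), len(typed)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel_cost(intended[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel_cost(typed[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(intended[i - 1]),  # a character of c was dropped
                d[i][j - 1] + indel_cost(typed[j - 1]),     # an extra character was typed in q
                d[i - 1][j - 1] + sub_cost(intended[i - 1], typed[j - 1]),
            )
    return d[n][m]


if __name__ == "__main__":
    print(weighted_edit_distance("go pro", "gopro"))  # 0.2
    print(weighted_edit_distance("bat", "bet"), weighted_edit_distance("bat", "bnt"))
```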
A Noisy Channel Framework (cont.)
Hierarchical error models
Character-level error model, e.g. p(a → e | context = "be..."): generalizes well, but is less accurate
Syllable-level error model
Word-level error model, e.g. p(pi → pie | context = "life of ..."): data is sparse, but it is more accurate
Phrase-level error model
...
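A sketch of backing off between two levels of the hierarchy: a sparse word-level table is consulted first, and a character-level model is used when the table has no entry. The table contents, `CHAR_EDIT_LOGPROB`, and the simple fallback are assumptions for illustration. Because there is no discounting, the scores do not form a normalized distribution, and syllable- and phrase-level models are omitted.

```python
import math
from typing import Dict, Tuple

# Hypothetical word-level table: log p(typed word | intended word, preceding context),
# e.g. mined from query reformulation logs. Accurate but sparse.
WORD_LEVEL: Dict[Tuple[str, str, str], float] = {
    ("life of", "pi", "pie"): math.log(0.2),
}
CHAR_EDIT_LOGPROB = math.log(0.01)  # hypothetical log-probability charged per character edit


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def error_logprob(context: str, intended: str, typed: str) -> float:
    """Use the word level when it has evidence; otherwise back off to the character level."""
    key = (context, intended, typed)
    if key in WORD_LEVEL:
        return WORD_LEVEL[key]  # word level: specific and context-aware
    return levenshtein(intended, typed) * CHAR_EDIT_LOGPROB  # character level: generalizes to unseen pairs


if __name__ == "__main__":
    print(error_logprob("life of", "pi", "pie"))  # word-level evidence
    print(error_logprob("apple", "pi", "pie"))    # falls back to one character edit
```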
Discriminative Models
Why?
The noisy channel model is a generative framework
Multiplying the two models is difficult because their probabilities are estimated in different ways
How to merge several signals into one probability estimate is unknown (e.g. linguistic correctness vs. popularity vs. revenue)
Other heuristic and domain-specific features cannot be subsumed in the noisy channel model
Discriminative Models (cont.)
How? Learn to score a <q, c> pair so that the best correction has the highest score
Challenges:
Obtaining large-scale training data: text parsing, human annotation
Learning methods: classification, learning to rank, structural learning
Efficiency: use the noisy channel model to retrieve a handful of candidates first
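A sketch of the learning-to-rank option: a pairwise perceptron learns weights so that, for each query, the correct candidate outscores the others. The perceptron is a deliberately simple stand-in for the classifiers, rankers, or SVMs mentioned in these slides. The two features and the training groups in `TRAIN` are hypothetical, and in the setup described above the candidates would come from the noisy channel model.

```python
from typing import List, Sequence, Tuple

Group = Tuple[List[str], List[List[float]], int]  # (candidates, feature vectors, index of the correct one)

# Hypothetical features per <q, c> pair: [noisy channel log score, log conversion rate of c].
TRAIN: List[Group] = [
    (["visio tv", "vizio tv"], [[-1.0, -6.0], [-2.0, -2.5]], 1),
    (["covr", "cover"], [[-3.0, -7.0], [-1.0, -3.0]], 1),
    (["cover", "clover"], [[-0.5, -3.0], [-4.0, -3.5]], 0),
]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_pairwise_perceptron(groups: List[Group], epochs: int = 20, lr: float = 1.0) -> List[float]:
    """Learn w so that w·x(gold) > w·x(other) for every other candidate of the same query."""
    w = [0.0] * len(groups[0][1][0])
    for _ in range(epochs):
        for _cands, feats, gold in groups:
            for i, x in enumerate(feats):
                if i == gold:
                    continue
                diff = [g - o for g, o in zip(feats[gold], x)]
                if dot(w, diff) <= 0:  # ranking mistake or tie: move w toward the gold candidate
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rerank(w: Sequence[float], candidates: List[str], feats: List[List[float]]) -> List[str]:
    order = sorted(range(len(candidates)), key=lambda i: dot(w, feats[i]), reverse=True)
    return [candidates[i] for i in order]


if __name__ == "__main__":
    w = train_pairwise_perceptron(TRAIN)
    print(w, rerank(w, *TRAIN[0][:2]))
```

In the first toy group, the misspelling has the better noisy channel score, but the learned weight on conversion lets "vizio tv" win. This illustrates how a discriminative model can merge signals that the generative framework has no principled way to combine.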
Discriminative Models (cont.)
Discriminative models such as SVMs can also be used to rerank the spelling candidates.
Recent successes have been reported with deep neural networks.
Systems for Spelling Correction
Candidate generation for Spelling Correction
Given a word, find all neighboring words within edit distance k.
Given a word, find potential close matches using the hashing trick.
Generate candidates using heuristic rules for common errors.
N-gram based techniques.
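A sketch of the first strategy: enumerate all strings within k edits and keep those that appear in a dictionary, in the style of Peter Norvig's well-known spelling corrector (deletion, adjacent transposition, substitution and insertion each count as one edit). `ALPHABET` and the toy dictionary are assumptions. The neighborhood grows roughly as (alphabet size × word length)^k, so brute-force enumeration becomes expensive for k ≥ 2.

```python
from typing import Set

# Assumed alphabet; including the space lets one edit split or join words ("gopro" <-> "go pro").
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "


def edits1(word: str) -> Set[str]:
    """All strings one deletion, adjacent transposition, substitution or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def candidates_within_k(word: str, dictionary: Set[str], k: int = 2) -> Set[str]:
    """Dictionary words reachable from word in at most k edits (breadth-first over edit neighborhoods)."""
    seen = {word}
    frontier = {word}
    for _ in range(k):
        frontier = {e for w in frontier for e in edits1(w)} - seen
        seen |= frontier
    return seen & dictionary


if __name__ == "__main__":
    vocab = {"cover", "cove", "clover", "covers", "vizio", "go pro"}
    print(candidates_within_k("covr", vocab, k=1))
    print(candidates_within_k("gopro", vocab, k=1))
```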
Candidate generation: scaling up
Distributed implementation.
Hashing tricks.
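The slides do not say which hashing trick is used. One common choice is the symmetric-delete index popularized by SymSpell, sketched here as an assumption. Dictionary words and queries are both reduced to their variants with up to k deletions, matches are found by hash lookup instead of enumerating insertions and substitutions, and candidates are then verified with the true edit distance. Partitioning keys across shards by `crc32` hints at how the index could be distributed; the in-process `shards` list is a hypothetical stand-in for separate machines.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def deletion_keys(word: str, k: int) -> Set[str]:
    """The word plus every string obtained from it by deleting up to k characters."""
    keys, frontier = {word}, {word}
    for _ in range(k):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        keys |= frontier
    return keys


class DeletionIndex:
    """Symmetric-delete hash index; each shard stands in for one machine in a distributed setup."""

    def __init__(self, dictionary: Iterable[str], k: int = 2, n_shards: int = 4):
        self.k, self.n_shards = k, n_shards
        self.shards: List[DefaultDict[str, Set[str]]] = [defaultdict(set) for _ in range(n_shards)]
        for word in dictionary:
            for key in deletion_keys(word, k):
                self.shards[self.shard_of(key)][key].add(word)

    def shard_of(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % self.n_shards

    def lookup(self, query: str) -> Set[str]:
        hits: Set[str] = set()
        for key in deletion_keys(query, self.k):
            hits |= self.shards[self.shard_of(key)].get(key, set())
        # A shared key only bounds the distance by 2k, so verify the true edit distance.
        return {w for w in hits if levenshtein(query, w) <= self.k}


if __name__ == "__main__":
    index = DeletionIndex({"cover", "cove", "clover", "covers", "vizio"}, k=2)
    print(index.lookup("covr"))
```

Query-time work depends on the number of deletion variants of the query, roughly (word length)^k, rather than on alphabet size, which is the usual argument for this trick.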
Spell correction for E-commerce
UI for spell correction.
Input data: whether or not to include item titles?
Impact of autocorrection on conversion.
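A sketch of how a confidence score might drive the UI choice between leaving the query alone, showing a suggestion, and autocorrecting. The `Action` names and all thresholds are hypothetical placeholders. Given the high cost of over-correction and the open question of how autocorrection affects conversion, such thresholds would need to be tuned through controlled experiments.

```python
from enum import Enum


class Action(Enum):
    KEEP = "search the original query"
    SUGGEST = "search the original query and show 'Did you mean ...?'"
    AUTOCORRECT = "search the corrected query and offer a link back to the original"


def choose_action(confidence: float, original_result_count: int,
                  auto_threshold: float = 0.95,
                  null_auto_threshold: float = 0.8,
                  suggest_threshold: float = 0.5) -> Action:
    """Pick a UI treatment from the estimated probability that the top correction is what the user meant.

    All thresholds are placeholders. Over-correction is costly, so replacing a query that
    already returns results requires the highest confidence.
    """
    if confidence >= auto_threshold:
        return Action.AUTOCORRECT
    if original_result_count == 0 and confidence >= null_auto_threshold:
        # Assumption: when the original query returns nothing, correcting more eagerly is worth the risk.
        return Action.AUTOCORRECT
    if confidence >= suggest_threshold:
        return Action.SUGGEST
    return Action.KEEP


if __name__ == "__main__":
    print(choose_action(0.85, original_result_count=0))    # Action.AUTOCORRECT
    print(choose_action(0.85, original_result_count=120))  # Action.SUGGEST
```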
References I
[BB01] Michele Banko and Eric Brill, Scaling to very very large corpora for natural language disambiguation, Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2001, pp. 26–33.
[CB04] Silviu Cucerzan and Eric Brill, Spelling correction as an iterative process that exploits the collective knowledge of web users, EMNLP, vol. 4, 2004, pp. 293–300.
[DH11] Huizhong Duan and Bo-June Paul Hsu, Online spelling correction for query completion, Proceedings of the 20th International Conference on World Wide Web, ACM, 2011, pp. 117–126.
[JM] Daniel Jurafsky and James H. Martin, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition.
References II
[KCG90] Mark D. Kernighan, Kenneth W. Church, and William A. Gale, A spelling correction program based on a noisy channel model, Proceedings of the 13th Conference on Computational Linguistics, Volume 2, Association for Computational Linguistics, 1990, pp. 205–210.
[Kuk92] Karen Kukich, Techniques for automatically correcting words in text, ACM Computing Surveys (CSUR), vol. 24, no. 4, 1992, pp. 377–439.
[LDZ12] Yanen Li, Huizhong Duan, and ChengXiang Zhai, A generalized hidden Markov model with discriminative training for query spelling correction, Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2012, pp. 611–620.
References III
[Pet80] James L. Peterson, Computer programs for detecting and correcting spelling errors, Communications of the ACM, vol. 23, no. 12, 1980, pp. 676–687.