Spell Checking in Deezer Search Engine
Marion Baranes (Search-Scientist)
WiMLDS Paris
April 19th, 2018
WiMLDS 2018/04 : Spell Checking in DEEZER Search Engine
Deezer Search Engine
/01
What is Deezer’s Search Engine?
Spell checking in Search Engine
Main features:
- Search across multiple types (artists, albums, tracks, playlists, podcasts, ...)
- Localized and personalized ranking
What do we do in the Deezer Search team?
Extra features:
- Top result
- Trends prediction
- Related queries
- Advanced search
- Search by tags
- Spell checking
Some numbers about Deezer Search
2.5M daily users
9M requests/day
Large catalog: 53M tracks, 7M albums, 2M artists, 9M playlists, ...
≈ 100 milliseconds to return a result
25% of stream sessions come from Search
/02
Our Spell Checking System
Why do we need spelling correction?
misunderstanding
disengagement
unsubscription
….
Error prediction
Originally, we used a fuzzy matching approach to handle misspelled queries.
In a search engine, doing so:
● introduces noise in the search results
● increases the number of attempted requests
● increases the search engine response time
→ We chose instead to predict users' future misspelled queries
Prediction system
/A Spell checking system
/B Learn user's misspelled queries
/C Generate new misspelled queries
A. Search Engine with spell checking
The user's query first goes through the Spell Checking module, which decides what is sent to the Search Engine:
1. Is this a frequent query?
   yes: search this query as a frequent query
   no: go to question 2
2. Is this a known misspelled query?
   yes: search the frequent query associated with this mistake
   no: search the user's query as typed
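The decision flow of the Spell Checking module can be sketched as two dictionary lookups. This is only an illustration: the function name `resolve_query`, the lowercase normalisation and the two in-memory containers are assumptions, not Deezer's actual implementation.

```python
from typing import Dict, Set, Tuple


def resolve_query(
    query: str,
    frequent_queries: Set[str],
    known_misspellings: Dict[str, str],
) -> Tuple[str, str]:
    """Return (query to send to the search engine, branch taken).

    frequent_queries and known_misspellings are hypothetical in-memory
    stand-ins for the data the spell checking module would hold.
    """
    q = query.strip().lower()  # assumed normalisation
    if q in frequent_queries:
        return q, "frequent"
    if q in known_misspellings:
        # Search the frequent query associated with this mistake.
        return known_misspellings[q], "corrected"
    # Neither frequent nor a known mistake: search the query as typed.
    return q, "unchanged"


frequent = {"daft punk"}
mistakes = {"daft pink": "daft punk"}
print(resolve_query("Daft Pink", frequent, mistakes))
```

Because both checks are exact lookups, this path adds very little latency compared with a fuzzy search over the whole catalog.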
B. Learn user's misspelled queries
From data
→ Link misspelled queries to frequent queries using behavioral similarity:
● group the queries that express the same user need, using temporal and textual features
● flag the reformulated queries in each group
Eg. in the sequence below, q3 is flagged as a reformulation and is a frequent query, so the misspelled query is q2.
q0: daft
q1: daft p (q0 → q1: insertion at end)
q2: daft pink (q1 → q2: insertion at end)
q3: daft punk (q2 → q3: substitution)
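A minimal sketch of this grouping step, under stated assumptions: consecutive queries of one user are grouped when they are close in time and textually similar, and a non-prefix query that precedes a frequent final query is kept as a misspelling candidate. The thresholds `MAX_GAP_SECONDS` and `MIN_SIMILARITY`, the prefix heuristic and the use of `difflib` are illustrative choices, not the features used in the talk.

```python
from difflib import SequenceMatcher

MAX_GAP_SECONDS = 30  # hypothetical temporal threshold
MIN_SIMILARITY = 0.6  # hypothetical textual threshold


def group_queries(log):
    """Split one user's time-ordered (timestamp, query) log into groups."""
    groups = []
    for ts, query in log:
        if groups:
            last_ts, last_query = groups[-1][-1]
            close_in_time = ts - last_ts <= MAX_GAP_SECONDS
            similar = SequenceMatcher(None, last_query, query).ratio() >= MIN_SIMILARITY
            if close_in_time and similar:
                groups[-1].append((ts, query))
                continue
        groups.append([(ts, query)])
    return groups


def misspelling_pairs(groups, frequent_queries):
    """Return (frequent query, misspelled query) candidates."""
    pairs = []
    for group in groups:
        final = group[-1][1]
        if final not in frequent_queries:
            continue
        for _, earlier in group[:-1]:
            # Prefixes of the final query are treated as typing in progress.
            if not final.startswith(earlier):
                pairs.append((final, earlier))
    return pairs


log = [(0, "daft"), (2, "daft p"), (4, "daft pink"), (9, "daft punk"), (300, "queen")]
groups = group_queries(log)
pairs = misspelling_pairs(groups, {"daft punk", "queen"})
print(pairs)
```

On the slide's example, q0 and q1 are discarded as prefixes and only the pair (daft punk, daft pink) is kept.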
a) Validation by graphical similarity
b) Validation by phonetic similarity
daft punk - daft pink
lacrim - lace
pierpoljak - pierre paul jacque
polo & pan - pollo
reseaux - resa
pharrell williams - farel williams
havan - havana
...
B. Learn user's misspelled queries
Validation of pairs
● Damerau-Levenshtein score
counts the number of operations (insertion, deletion, transposition of two adjacent characters, substitution) needed to convert one string into another
● Jaro-Winkler score
measures similarity from the number of characters two strings have in common and the number of transpositions among them. The Winkler variant favours strings that share the same prefix by boosting the score when their first characters match.
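Both scores can be sketched in a few lines of plain Python. The distance below is the optimal string alignment variant of Damerau-Levenshtein, and the Jaro-Winkler function uses the usual prefix scale of 0.1 over at most 4 characters. Which variants and parameters were actually used is not stated in the slides, so these are assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = half_transpositions = 0
    for i in range(len(s1)):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)


print(osa_distance("daft punk", "daft pink"))
print(round(jaro_winkler("daft punk", "daft pink"), 3))
```

A pair would then be validated when its distance is low enough or its similarity high enough. The thresholds are not given in the slides.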
B. Learn user's misspelled queries
Validation of pairs - graphical similarity
The pronunciation of a word depends on the speaker's mother tongue.
Eg. for the name Schubert:
● English: /ʃubət/ (≈ chubet)
● French: /ʃubεʁ/ (≈ chuber)
[Figure: European language families (Romanic, Baltic, Hellenic, Germanic, Slavic, Uralic). © http://www.listlanguage.com/european-languages-tree.html]
B. Learn user's misspelled queries
Validation of pairs - phonetic similarity
Generate a phonetic version of both the frequent query and the misspelled query of each pair, then compare them.
Eg. billy gin - billie jean ≈ /bilidjin/
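To make the idea concrete, here is a deliberately toy phonetic key. The rewrite rules in `TOY_RULES` are hypothetical and were chosen only so that the slide's example collapses to the same key. A real phonetizer would be language-dependent and far richer.

```python
# Toy, hypothetical rewrite rules applied in order. Not a real phonetizer.
TOY_RULES = [
    ("ll", "l"),
    ("ie", "i"),
    ("ea", "i"),
    ("y", "i"),
    ("j", "dj"),
    ("gi", "dji"),
]


def toy_phonetic_key(query: str) -> str:
    """Map a query to a rough phonetic key using the toy rules above."""
    key = query.lower().replace(" ", "")
    for pattern, replacement in TOY_RULES:
        key = key.replace(pattern, replacement)
    return key


def same_pronunciation(frequent: str, misspelled: str) -> bool:
    """Validate a pair when both queries share the same phonetic key."""
    return toy_phonetic_key(frequent) == toy_phonetic_key(misspelled)


print(toy_phonetic_key("billy gin"), toy_phonetic_key("billie jean"))
```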
C. Generate new misspelled queries
How to predict a spelling error?
● Formal analogy
● Analogy for spell checking
● Extraction of spell checking rules
Formal analogy means that the relation between the four objects of an analogy has to be graphemic, i.e. it holds on their written form:
complicated : complication :: created : creation
x : y :: z : t
C. Generate new misspelled queries
Formal analogy
Stroppa and Yvon (2005, 2006) define formal analogy with two notions:
(1) An object can be split into sub-parts called factors.
(2) Two pairs of objects share a relation of analogy if all factors can be exchanged:
○ inside each pair of objects,
○ between the two pairs of objects.
x = x1 x2 = complicat + ed     y = y1 y2 = complicat + ion
z = z1 z2 = creat + ed         t = t1 t2 = creat + ion
with x1 = y1, z1 = t1, x2 = z2 and y2 = t2.
To solve the analogy [x : y :: z : ?], we can therefore predict the expected form t, which is composed of factors of z and y: t = z1 y2.
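A minimal sketch of solving [x : y :: z : ?] for the simplest case only, where x and y share a prefix (x1 = y1) and differ by their suffix. The function name `solve_analogy` is illustrative, and the general definition by Stroppa and Yvon allows more than two factors, which this sketch does not handle.

```python
from os.path import commonprefix
from typing import Optional


def solve_analogy(x: str, y: str, z: str) -> Optional[str]:
    """Solve x : y :: z : t when x = x1 x2, y = x1 y2 and z = z1 x2."""
    stem = commonprefix([x, y])            # x1 = y1
    x2, y2 = x[len(stem):], y[len(stem):]
    if not z.endswith(x2):                 # the analogy needs x2 = z2
        return None
    z1 = z[:len(z) - len(x2)]
    return z1 + y2                         # t = z1 y2


print(solve_analogy("complicated", "complication", "created"))
```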
C. Generate new misspelled queries
Analogy for spell checking
1. Create a training corpus of pairs of frequent and misspelled queries.
2. Detect the common factors and extract the remaining factors:
   S y n Cole
   S i n Cole
3. Extract the relevant information and create weighted spell checking rules:
   syn Cole : sin Cole; mistake: y → i; previous context: [s]; next context: [n]
   lykke Li : likke Li; mistake: y → i; previous context: [l]; next context: [k]
   resulting rule: [sl] y → i [nk]
Eg. Marilyn Manson → Marilin Manson
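The three steps above can be sketched for the single-substitution case as follows. The data layout (one entry per mistake holding its left contexts, right contexts and a count used as weight), the lowercasing and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def extract_rule(correct: str, misspelled: str):
    """Return (prev, src, dst, next) for a one-character substitution, else None."""
    if len(correct) != len(misspelled):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(correct, misspelled)) if a != b]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    prev = correct[i - 1] if i > 0 else "^"               # "^" marks string start
    nxt = correct[i + 1] if i + 1 < len(correct) else "$"  # "$" marks string end
    return prev, correct[i], misspelled[i], nxt


def learn_rules(pairs):
    """Merge the contexts of identical mistakes and count them as a weight."""
    rules = defaultdict(lambda: {"prev": set(), "next": set(), "weight": 0})
    for correct, misspelled in pairs:
        rule = extract_rule(correct.lower(), misspelled.lower())
        if rule is None:
            continue
        prev, src, dst, nxt = rule
        entry = rules[(src, dst)]
        entry["prev"].add(prev)
        entry["next"].add(nxt)
        entry["weight"] += 1
    return dict(rules)


def generate_misspellings(query: str, rules):
    """Apply every rule whose left and right contexts match in the query."""
    q = query.lower()
    out = []
    for (src, dst), ctx in rules.items():
        for i, ch in enumerate(q):
            if ch != src:
                continue
            prev = q[i - 1] if i > 0 else "^"
            nxt = q[i + 1] if i + 1 < len(q) else "$"
            if prev in ctx["prev"] and nxt in ctx["next"]:
                out.append(q[:i] + dst + q[i + 1:])
    return out


rules = learn_rules([("syn Cole", "sin Cole"), ("lykke Li", "likke Li")])
print(generate_misspellings("Marilyn Manson", rules))
```

Here the learned rule [sl] y → i [nk] matches "Marilyn" because the y is preceded by l and followed by n.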
C. Generate new misspelled queries
Extraction of spell checking rules
Evaluation and conclusion
/03
Results in Search
We suggest or force a correction depending on our confidence and the frequency of the request:
Evaluation
The quality of our system can only be evaluated through user feedback:
→ out of ≈ 500 000 queries extracted from desktop search, ≈ 10 000 are handled by our spelling system

                       Force correction   Suggest correction   Total
Accepted by the user   84%                10%                  94%
Rejected by the user   3%                 3%                   6%
Total                  87%                13%                  100%
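As a quick sanity check of these figures, the snippet below recomputes the row and column totals and the share of queries handled by the spelling system. It assumes the percentages are shares of the ≈ 10 000 handled queries, which the slide does not state explicitly.

```python
# Percentages as read from the table (assumed to be shares of handled queries).
table = {
    "accepted": {"force": 84, "suggest": 10},
    "rejected": {"force": 3, "suggest": 3},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in ("force", "suggest")}

share_handled = 10_000 / 500_000                       # about 1 query in 50
share_accepted = share_handled * row_totals["accepted"] / 100

print(row_totals, col_totals)
print(f"handled: {share_handled:.1%}, handled and accepted: {share_accepted:.2%}")
```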
Conclusion
Around 1 query in 50 is misspelled and correctly fixed
(per day and per distinct user on desktop search)
Next steps for spell checking in Deezer Search Engine:
- Improve the current system
- Personalize the current system
- Localize the current system
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 

Spell Checking in Deezer Search Engine

  • 1. Spell Checking in Deezer Search Engine. Marion Baranes (Search-Scientist), WiMLDS Paris, April 19th, 2018
  • 2. WiMLDS 2018/04 : Spell Checking in DEEZER Search Engine Deezer Search Engine /01
  • 3. What is Deezer’s Search Engine? Main features: search across multiple types (artist, album, track, playlist, podcast, ...); localized and personalized ranking.
  • 5. What do we do in the Deezer Search team? Extra features: top result, trends prediction, related queries, advanced search, search by tags, spell checking.
  • 6. Some numbers about Deezer Search: 2.5M daily users; 9M requests/day; a large catalog (53M tracks, 7M albums, 2M artists, 9M playlists, ...); ≈ 100 milliseconds to find a result; 25% of stream sessions come from Search.
  • 7. /02 Our Spell Checking System
  • 8. Why do we need spelling correction? To avoid misunderstanding, disengagement, unsubscription, ...
  • 9. Error prediction. Originally, we used a fuzzy approach to handle misspelled queries. In a search engine, doing so: ● introduces noise in search results ● increases the number of attempted requests ● increases search engine response time → We chose to predict users' future misspelled queries.
  • 10. Spell checking system (/A, /B, /C): learn users' misspelled queries; generate new misspelled queries; prediction system.
  • 11. A. Search Engine with spell checking. The user's query goes through the spell checking module. Is this a frequent query? If yes, search this query as a frequent query. If no, is this a known misspelled query? If yes, search the frequent query associated with this mistake. If no, search the user's query as typed.
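The decision flow on slide 11 can be sketched as two dictionary lookups. This is only an illustration under assumptions: the function name `resolve_query`, the lowercase normalization, and the in-memory sets are hypothetical and not taken from the talk.

```python
from typing import Dict, Set


def resolve_query(query: str,
                  frequent: Set[str],
                  known_misspellings: Dict[str, str]) -> str:
    """Return the query that would be sent to the search engine."""
    # Assumption: queries are compared after trimming and lowercasing.
    normalized = query.strip().lower()
    if normalized in frequent:
        # Frequent query: search it as is.
        return normalized
    if normalized in known_misspellings:
        # Known mistake: search the associated frequent query instead.
        return known_misspellings[normalized]
    # Unknown query: search what the user typed.
    return normalized


# Hypothetical data, reusing example pairs from the slides.
FREQUENT = {"daft punk", "billie jean"}
MISSPELLINGS = {"daft pink": "daft punk", "billy gin": "billie jean"}

print(resolve_query("Daft Pink", FREQUENT, MISSPELLINGS))
```

Both lookups are constant time, which is consistent with the motivation given on slide 9 for predicting misspellings ahead of time rather than running a fuzzy search at query time.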
  • 12. B. Learn users' misspelled queries from data. → Link misspelled queries with frequent queries using behavioral similarity: ● group the queries that express the same user need, using temporal and textual features; ● flag reformulated queries in a group. E.g. daft (q0) → daft p (q1): insertion at end; daft p (q1) → daft pink (q2): insertion at end; daft pink (q2) → daft punk (q3): substitution. Here q3, flagged as reformulated, is a frequent query; the misspelled query is q2.
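A minimal sketch of the grouping step on slide 12, assuming a query log of (timestamp in seconds, query) tuples. The 30-second gap, the 0.5 similarity threshold, and the use of `difflib.SequenceMatcher` as the textual feature are illustrative assumptions, not the features used by the Deezer team.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, query text)


def group_queries(events: List[Event],
                  max_gap_seconds: float = 30.0,
                  min_similarity: float = 0.5) -> List[List[Event]]:
    """Group consecutive queries that are close in time and in text."""
    groups: List[List[Event]] = []
    for timestamp, query in events:
        if groups:
            last_time, last_query = groups[-1][-1]
            close_in_time = timestamp - last_time <= max_gap_seconds
            similar = SequenceMatcher(None, last_query, query).ratio() >= min_similarity
            if close_in_time and similar:
                groups[-1].append((timestamp, query))
                continue
        groups.append([(timestamp, query)])
    return groups


def classify_step(before: str, after: str) -> str:
    """Name the edit between two consecutive queries, as on the slide."""
    if len(after) > len(before) and after.startswith(before):
        return "insertion at end"
    if len(before) == len(after) and sum(a != b for a, b in zip(before, after)) == 1:
        return "substitution"
    return "other"


# Hypothetical session reproducing the slide's example.
EVENTS = [(0, "daft"), (1, "daft p"), (2, "daft pink"), (6, "daft punk"), (300, "lacrim")]
first_group = [query for _, query in group_queries(EVENTS)[0]]
steps = [classify_step(a, b) for a, b in zip(first_group, first_group[1:])]
print(first_group, steps)
```

In this sketch the "insertion at end" steps correspond to the user still typing, while the final substitution marks the reformulation that links the misspelled query to the frequent one.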
  • 13. B. Learn users' misspelled queries: validation of pairs. a) Validation by graphical similarity; b) validation by phonetic similarity. Example pairs: daft punk - daft pink; lacrim - lace; pierpoljak - pierre paul jacque; polo & pan - pollo; reseaux - resa; pharrell williams - farel williams; havan - havana; ...
  • 14. B. Learn users' misspelled queries: validation of pairs, graphical similarity. ● The Damerau-Levenshtein score counts the number of operations (insertion, deletion, transposition, substitution) needed to convert one string into another. ● The Jaro-Winkler score is based on the number of matching characters and transpositions between two strings. It favours strings that share the same prefix.
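Both scores from slide 14 can be written in a few lines. The sketch below uses the optimal string alignment variant of Damerau-Levenshtein and the textbook Jaro-Winkler formula with a prefix weight of 0.1 over at most four characters. Function names are illustrative, and nothing here claims to be the exact implementation or thresholds used at Deezer.

```python
def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1], from matches and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [a[i] for i in range(len(a)) if a_matched[i]]
    b_seq = [b[j] for j in range(len(b)) if b_matched[j]]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(osa_distance("daft punk", "daft pink"), jaro_winkler("daft punk", "daft pink"))
```

A candidate pair could then be accepted when the edit distance is small or the Jaro-Winkler score is high; the cut-off values would have to be tuned on real data.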
  • 15. B. Learn users' misspelled queries: validation of pairs, phonetic similarity. The pronunciation of a word depends on the speaker's mother tongue. E.g. for the name Schubert: ● English: /ʃubət/ (≈ chubet) ● French: /ʃubεʁ/ (≈ chuber). Language families shown: Romanic, Baltic, Hellenic, Germanic, Slavic, Uralic (© http://www.listlanguage.com/european-languages-tree.html).
  • 16. B. Learn users' misspelled queries: validation of pairs, phonetic similarity. A phonetic version is generated for each pair of frequent query and misspelled query, e.g. billy gin - billie jean ≈ /bilidjin/.
  • 17. C. Generate new misspelled queries. How to predict a spelling error? ● Formal analogy ● Analogy for spell checking ● Extraction of spell checking rules
  • 18. C. Generate new misspelled queries: formal analogy. Formal analogy means that the relation between the four objects has to be graphemic: complicated : complication :: created : creation (x : y :: z : t).
  • 19. C. Generate new misspelled queries: formal analogy. Formal analogy means that the relation between the four objects has to be graphemic. Stroppa and Yvon (2005, 2006) define formal analogy with two notions: (1) an object can be split into sub-parts called factors; (2) two pairs of objects share a relation of analogy if all factors can be exchanged: ○ inside each pair of objects, ○ between the two pairs of objects. Example: x = x1 x2 = complicat + ed, y = y1 y2 = complicat + ion, z = z1 z2 = creat + ed, t = t1 t2 = creat + ion, with x1 = y1, z1 = t1, x2 = z2 and y2 = t2. With t the expected form that resolves the analogy [x : y :: z : ?], we can predict t, composed of factors of y and z.
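To make the factor exchange concrete, here is a deliberately simplified solver for [x : y :: z : ?]. It assumes each object splits into exactly two factors, a shared stem followed by an ending, which is much narrower than the general definition of Stroppa and Yvon. The name `solve_analogy` is hypothetical.

```python
import os
from typing import Optional


def solve_analogy(x: str, y: str, z: str) -> Optional[str]:
    """Solve x : y :: z : t for t, assuming a stem + ending factorization."""
    # x = x1 + x2 and y = y1 + y2, with x1 == y1 the longest common prefix.
    stem = os.path.commonprefix([x, y])
    x2, y2 = x[len(stem):], y[len(stem):]
    # z must share its second factor with x (x2 == z2).
    if not z.endswith(x2):
        return None
    z1 = z[:len(z) - len(x2)]
    # t takes its first factor from z and its second factor from y.
    return z1 + y2


print(solve_analogy("complicated", "complication", "created"))
```

With the slide's example, the stem is "complicat", the exchanged endings are "ed" and "ion", and the predicted form is built from "creat" and "ion".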
  • 20. C. Generate new misspelled queries: analogy for spell checking
  • 21. C. Generate new misspelled queries: analogy for spell checking
  • 22. C. Generate new misspelled queries: analogy for spell checking
  • 23. C. Generate new misspelled queries: extraction of spell checking rules. 1. Create a training corpus with pairs of frequent and misspelled queries. 2. Detect the common factors and extract the remaining factors: S y n Cole / S i n Cole. 3. Extract the relevant information and create weighted spell checking rules: syn Cole : sin Cole gives mistake y → i, previous context [s], next context [n]; lykke Li : likke Li gives mistake y → i, previous context [l], next context [k]. Resulting rule: [sl] y → i [nk]. E.g. Marilyn Manson → Marilin Manson.
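The three steps on slide 23 can be sketched as follows. Assumptions are labeled in the comments: queries are lowercased, each pair differs in a single contiguous span, the rule weight is a simple occurrence count (the talk does not say how rules are weighted), and all function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple

Rule = Dict[str, object]


def extract_rule(correct: str, misspelled: str) -> Tuple[str, str, str, str]:
    """Return (source, target, previous context, next context) for one pair."""
    prefix = os.path.commonprefix([correct, misspelled])
    rest_c, rest_m = correct[len(prefix):], misspelled[len(prefix):]
    # Common suffix = common prefix of the reversed remainders.
    suffix = os.path.commonprefix([rest_c[::-1], rest_m[::-1]])[::-1]
    source = rest_c[:len(rest_c) - len(suffix)]
    target = rest_m[:len(rest_m) - len(suffix)]
    return source, target, prefix[-1:], suffix[:1]


def build_rules(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], Rule]:
    """Merge the contexts of identical mistakes; weight = count (assumption)."""
    rules: Dict[Tuple[str, str], Rule] = {}
    for correct, misspelled in pairs:
        source, target, prev_ctx, next_ctx = extract_rule(correct, misspelled)
        rule = rules.setdefault((source, target),
                                {"prev": set(), "next": set(), "weight": 0})
        rule["prev"].add(prev_ctx)
        rule["next"].add(next_ctx)
        rule["weight"] += 1
    return rules


def generate_misspellings(query: str,
                          rules: Dict[Tuple[str, str], Rule]) -> List[str]:
    """Apply every rule wherever both contexts match."""
    results = []
    for (source, target), rule in rules.items():
        if not source:
            continue  # pure insertions are skipped in this sketch
        start = query.find(source)
        while start != -1:
            end = start + len(source)
            prev_ctx = query[max(start - 1, 0):start]
            next_ctx = query[end:end + 1]
            if prev_ctx in rule["prev"] and next_ctx in rule["next"]:
                results.append(query[:start] + target + query[end:])
            start = query.find(source, start + 1)
    return results


# The two training pairs shown on the slide, lowercased.
PAIRS = [("syn cole", "sin cole"), ("lykke li", "likke li")]
RULES = build_rules(PAIRS)
print(generate_misspellings("marilyn manson", RULES))
```

The two pairs merge into the single rule [sl] y → i [nk] with weight 2, and applying it to "marilyn manson" yields the predicted misspelling from the slide.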
  • 24. /03 Evaluation and conclusion
  • 25. Results in Search. We suggest or force a correction depending on our confidence and the frequency of the request.
  • 26. Evaluation. The quality of our system can only be evaluated through user feedback: → of ≈ 500 000 queries extracted from desktop search, ≈ 10 000 are affected by our spelling system.

                                Force correction   Suggest correction   Total
        Accepted by the user          84%                 10%             94%
        Rejected by the user           3%                  3%              6%
        Total                         87%                 13%            100%

  • 27. Conclusion. Around 1 query in 50 is misspelled and well corrected (per day and per distinct user on desktop search). Next steps for spell checking in Deezer Search Engine: improve, personalize and localize the current system.