SlideShare a Scribd company logo
Fuzzy Name Matching
with Elasticsearch
November 16th 2015
Chris Mack @cgmack
mack@basistech.com
4
02
Why Match Names?
1. Security
2. Fraud
3. Commerce
5
01
Quick survey: How many of you...
• Regularly develop Elastic applications?
• Develop Elastic applications that include names of…
...People?
...Places?
...Products?
...Organizations?
• Have names in languages beside English?
6
03
What Makes Name Matching Hard?
7
01
Name Variety
8
01
Name Variety
9
01
Name Ambiguity
10
01
How Would You Solve It?
11
01
Best Practice: field per variation type?
http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names
• Create a multi_field type with a
field per possible variation
• Complex query against each field
• Generally gives high recall
12
01
Can’t a name field type do this?
• Manage all the subfields
• Contribute score that reflects phenomena
• Be part of queries using many field types
• Have multiple fields per document
• Have multiple values per field (coming soon)
13
01
But what if variations co-occur?
“Jesus Alfonso Lopez Diaz”
v.
“LobezDias, Chuy”
1) Reordered.
2) Nickname for first name.
3) Missing 2nd Name.
4) Two spelling differences.
5) Missing space.
14
01
Can We Do Better?
• Incorporate our proprietary name matching
• Provide similarity scores to name pairs
• Uses Elasticsearch's Rescore query
• Allows for higher precision ranking and tresholding
• Provides multi-lingual name search
Demo
16
01
How does it work?
• Plugin contains custom mapper which does all the
work behind the scenes
17
01
What happens at index time?
• NameMapper indexes keys for different phenomena
in separate (sub) fields
18
01
Plug-in Implementation
19
01
What happens at query time?
• Step #1: NameMapper generates analogous keys for
a custom Lucene query that finds good candidates f
or re-scoring
20
01
What else happens at query time?
• Step #2: Uses Rescore query to score names in the
best candidate documents and reorder accordingly
- Tuned for high precision name matching
- Computationally expensive
21
01
What does that function do?
• The 'name_score' function matches the query name
against the indexed name in every candidate
document and returns the similarity score
22
01
Plug-in Implementation
23
01
Rescore Params: Tradeoff Accuracy vs. Speed
• window_size
- Controls how many of the top
documents to rescore
- Tradeoff accuracy vs speed
• minScoreToCheck - (Added by Us)
- Score threshold top doc must meet
to be rescored
- Tradeoff accuracy vs speed
24
01
Rescore Params - Integration w/Query
• rescore_query
- Calls the name_score function to get score
- Combine rescore_queries to query across multiple
fields
• query_weight
- Controls how much weight is given to main query
- Allows for queries on other non-name fields
• rescore_query_weight
- Controls how much weight is given to rescore query
25
01
Summary: How it works
• Central Problem
- Name Variety
- Name Ambiguity
• Custom field type
- Splits a single field into multiple fields covering different phenomena
- Supports multiple name fields in a document as well as multivalued fields
- Intercepts the query to inject a custom Lucene query
• Custom rescore function
- Rescores documents with algorithm specific to name matching
- Limits intense calculations to only top candidates
- Highly configurable
26
01
Resources
• Code
- https://github.com/cgmack/elastic_meetups
• This Presentation
-
Fuzzy Name Matching
with Elasticsearch
November 16th 2015
Chris Mack @cgmack
mack@basistech.com
29
01
Suggested Questions:
• What is names are in unstructured text?
• What if the names are in other text fields?
• How did you implement multi-valued fields?
• How does it scale?
• How do you handle names not in English?
• How does this relate to the theme of Entity-Centric
Search?
• How do plug-in’s scores relate to Elastic scores?
• How can I learn more?

More Related Content

What's hot

Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsAnshum Gupta
 
Pivotal tracker를 활용한 팀 프로젝트 관리
Pivotal tracker를 활용한 팀 프로젝트 관리Pivotal tracker를 활용한 팀 프로젝트 관리
Pivotal tracker를 활용한 팀 프로젝트 관리Byungjin Park
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Introduction to spaCy
Introduction to spaCyIntroduction to spaCy
Introduction to spaCyRyo Takahashi
 
What is front-end development ?
What is front-end development ?What is front-end development ?
What is front-end development ?Mahmoud Shaker
 
TypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painTypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painSander Mak (@Sander_Mak)
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Traversing Graphs with Gremlin
Traversing Graphs with GremlinTraversing Graphs with Gremlin
Traversing Graphs with GremlinArtem Chebotko
 
REST Easy with Django-Rest-Framework
REST Easy with Django-Rest-FrameworkREST Easy with Django-Rest-Framework
REST Easy with Django-Rest-FrameworkMarcel Chastain
 
Node JS Crash Course
Node JS Crash CourseNode JS Crash Course
Node JS Crash CourseHaim Michael
 
Database Automation with MySQL Triggers and Event Schedulers
Database Automation with MySQL Triggers and Event SchedulersDatabase Automation with MySQL Triggers and Event Schedulers
Database Automation with MySQL Triggers and Event SchedulersAbdul Rahman Sherzad
 
XML Schema
XML SchemaXML Schema
XML Schemayht4ever
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Overview of React.JS - Internship Presentation - Week 5
Overview of React.JS - Internship Presentation - Week 5Overview of React.JS - Internship Presentation - Week 5
Overview of React.JS - Internship Presentation - Week 5Devang Garach
 

What's hot (20)

Learn react-js
Learn react-jsLearn react-js
Learn react-js
 
Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIs
 
Pivotal tracker를 활용한 팀 프로젝트 관리
Pivotal tracker를 활용한 팀 프로젝트 관리Pivotal tracker를 활용한 팀 프로젝트 관리
Pivotal tracker를 활용한 팀 프로젝트 관리
 
Introduction to XML
Introduction to XMLIntroduction to XML
Introduction to XML
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Xpath presentation
Xpath presentationXpath presentation
Xpath presentation
 
Introduction to spaCy
Introduction to spaCyIntroduction to spaCy
Introduction to spaCy
 
What is front-end development ?
What is front-end development ?What is front-end development ?
What is front-end development ?
 
TypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painTypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the pain
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Traversing Graphs with Gremlin
Traversing Graphs with GremlinTraversing Graphs with Gremlin
Traversing Graphs with Gremlin
 
REST Easy with Django-Rest-Framework
REST Easy with Django-Rest-FrameworkREST Easy with Django-Rest-Framework
REST Easy with Django-Rest-Framework
 
03 namespace
03 namespace03 namespace
03 namespace
 
Semantic web
Semantic webSemantic web
Semantic web
 
Node JS Crash Course
Node JS Crash CourseNode JS Crash Course
Node JS Crash Course
 
Database Automation with MySQL Triggers and Event Schedulers
Database Automation with MySQL Triggers and Event SchedulersDatabase Automation with MySQL Triggers and Event Schedulers
Database Automation with MySQL Triggers and Event Schedulers
 
TypeScript
TypeScriptTypeScript
TypeScript
 
XML Schema
XML SchemaXML Schema
XML Schema
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Overview of React.JS - Internship Presentation - Week 5
Overview of React.JS - Internship Presentation - Week 5Overview of React.JS - Internship Presentation - Week 5
Overview of React.JS - Internship Presentation - Week 5
 

Similar to Fuzzy Name Matching with Rosette

Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologyLucidworks
 
Fuzzy Name Matching in Solr
Fuzzy Name Matching in SolrFuzzy Name Matching in Solr
Fuzzy Name Matching in SolrChristopher Mack
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...lucenerevolution
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformLucidworks (Archived)
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformTrey Grainger
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedDainius Jocas
 
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...apidays
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrlucenerevolution
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesCidar Mendizabal
 
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...Ekta Grover
 
Reactive Development: Commands, Actors and Events. Oh My!!
Reactive Development: Commands, Actors and Events.  Oh My!!Reactive Development: Commands, Actors and Events.  Oh My!!
Reactive Development: Commands, Actors and Events. Oh My!!David Hoerster
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Microservices - Is it time to breakup?
Microservices - Is it time to breakup? Microservices - Is it time to breakup?
Microservices - Is it time to breakup? Dave Nielsen
 
Introducción a NoSQL
Introducción a NoSQLIntroducción a NoSQL
Introducción a NoSQLMongoDB
 
Adding data sources to the reporter
Adding data sources to the reporterAdding data sources to the reporter
Adding data sources to the reporterRogan Hamby
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that growGibraltar Software
 

Similar to Fuzzy Name Matching with Rosette (20)

Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
 
Fuzzy Name Matching in Solr
Fuzzy Name Matching in SolrFuzzy Name Matching in Solr
Fuzzy Name Matching in Solr
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Lessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at VintedLessons Learned While Scaling Elasticsearch at Vinted
Lessons Learned While Scaling Elasticsearch at Vinted
 
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic ...
 
Building a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solrBuilding a near real time search engine & analytics for logs using solr
Building a near real time search engine & analytics for logs using solr
 
Large Data Volume Salesforce experiences
Large Data Volume Salesforce experiencesLarge Data Volume Salesforce experiences
Large Data Volume Salesforce experiences
 
Designing DDD Aggregates
Designing DDD AggregatesDesigning DDD Aggregates
Designing DDD Aggregates
 
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...
PyCon 2013 - Experiments in data mining, entity disambiguation and how to thi...
 
Reactive Development: Commands, Actors and Events. Oh My!!
Reactive Development: Commands, Actors and Events.  Oh My!!Reactive Development: Commands, Actors and Events.  Oh My!!
Reactive Development: Commands, Actors and Events. Oh My!!
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Microservices - Is it time to breakup?
Microservices - Is it time to breakup? Microservices - Is it time to breakup?
Microservices - Is it time to breakup?
 
Introducción a NoSQL
Introducción a NoSQLIntroducción a NoSQL
Introducción a NoSQL
 
Adding data sources to the reporter
Adding data sources to the reporterAdding data sources to the reporter
Adding data sources to the reporter
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»QADay
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

Fuzzy Name Matching with Rosette

  • 1. Fuzzy Name Matching with Elasticsearch November 16th 2015 Chris Mack @cgmack mack@basistech.com
  • 2.
  • 3.
  • 4. 4 02 Why Match Names? 1. Security 2. Fraud 3. Commerce
  • 5. 5 01 Quick survey: How many of you... • Regularly develop Elastic applications? • Develop Elastic applications that include names of… ...People? ...Places? ...Products? ...Organizations? • Have names in languages beside English?
  • 6. 6 03 What Makes Name Matching Hard?
  • 10. 10 01 How Would You Solve It?
  • 11. 11 01 Best Practice: field per variation type? http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names • Create a multi_field type with a field per possible variation • Complex query against each field • Generally gives high recall
  • 12. 12 01 Can’t a name field type do this? • Manage all the subfields • Contribute score that reflects phenomena • Be part of queries using many field types • Have multiple fields per document • Have multiple values per field (coming soon)
  • 13. 13 01 But what if variations co-occur? “Jesus Alfonso Lopez Diaz” v. “LobezDias, Chuy” 1) Reordered. 2) Nickname for first name. 3) Missing 2nd Name. 4) Two spelling differences. 5) Missing space.
  • 14. 14 01 Can We Do Better? • Incorporate our proprietary name matching • Provide similarity scores to name pairs • Uses Elasticsearch's Rescore query • Allows for higher precision ranking and tresholding • Provides multi-lingual name search
  • 15. Demo
  • 16. 16 01 How does it work? • Plugin contains custom mapper which does all the work behind the scenes
  • 17. 17 01 What happens at index time? • NameMapper indexes keys for different phenomena in separate (sub) fields
  • 19. 19 01 What happens at query time? • Step #1: NameMapper generates analogous keys for a custom Lucene query that finds good candidates f or re-scoring
  • 20. 20 01 What else happens at query time? • Step #2: Uses Rescore query to score names in the best candidate documents and reorder accordingly - Tuned for high precision name matching - Computationally expensive
  • 21. 21 01 What does that function do? • The 'name_score' function matches the query name against the indexed name in every candidate document and returns the similarity score
  • 23. 23 01 Rescore Params: Tradeoff Accuracy vs. Speed • window_size - Controls how many of the top documents to rescore - Tradeoff accuracy vs speed • minScoreToCheck - (Added by Us) - Score threshold top doc must meet to be rescored - Tradeoff accuracy vs speed
  • 24. 24 01 Rescore Params - Integration w/Query • rescore_query - Calls the name_score function to get score - Combine rescore_queries to query across multiple fields • query_weight - Controls how much weight is given to main query - Allows for queries on other non-name fields • rescore_query_weight - Controls how much weight is given to rescore query
  • 25. 25 01 Summary: How it works • Central Problem - Name Variety - Name Ambiguity • Custom field type - Splits a single field into multiple fields covering different phenomena - Supports multiple name fields in a document as well as multivalued fields - Intercepts the query to inject a custom Lucene query • Custom rescore function - Rescores documents with algorithm specific to name matching - Limits intense calculations to only top candidates - Highly configurable
  • 27.
  • 28. Fuzzy Name Matching with Elasticsearch November 16th 2015 Chris Mack @cgmack mack@basistech.com
  • 29. 29 01 Suggested Questions: • What is names are in unstructured text? • What if the names are in other text fields? • How did you implement multi-valued fields? • How does it scale? • How do you handle names not in English? • How does this relate to the theme of Entity-Centric Search? • How do plug-in’s scores relate to Elastic scores? • How can I learn more?