SlideShare a Scribd company logo
Name Matching
with Elasticsearch
June 25, 2015
Graham Morehead
gmorehead@basistech.com
April 15 2013 2:49 PM .
Real life example
David K. Murgatroyd
VP of Engineering
Boarding Pass
Best Practice using Elasticsearch?
● NameMapper
(http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names)
"mappings": { ... "type": "multi_field", "fields": {
"pty_surename": { "type": "string", "analyzer": "simple" },
"metaphone": { "type": "string", "analyzer": "metaphone" },
"porter": { "type": "string", "analyzer": "porter" } …
● rescore_query
“Jesus Alfonso Lopez Diaz”
vs.
“LobezDias, Chuy”
RNI
Rescore Query
Main Query
Plug-in Implementation
match : {
name:
"Bob Smitty" }
bool:
name.Key1:...
name.Key2:...
name.Key3:...
User Query
Rescore
name_score : {
field : "name",
name : "Bob
Smitty")
name:"Robert
Smith"
dob:2/13/1987
score : .79
Indexing
{ name: "Robert
Smith"
dob:"
1987/02/13" }
{ name: "Robert
Smith"
name.Key1:…
name.Key2:…
name.Key3:…
dob:
"1987/02/13" }
User Doc
Index
subset
Demo
Elastic + RNI
Name Matching
with Elasticsearch
June 25, 2015
Graham Morehead
gmorehead@basistech.com
How could you use such a Field?
● Plugin contains custom mapper which does
all the work behind the scenes
PUT /ofac/ofac/_mapping
{
"ofac" : {
"properties" : {
"name" : { "type:" : "rni_name" }
"aka" : { "type:" : "rni_name" }
}
}
}
What happens at index time?
● NameMapper indexes keys for different
phenomena in separate (sub) fields
@Override
public void parse(ParseContext context) throws IOException {
Name name = NameBuilder.data(nameString).build();
//Generate keys for name
Collection<FieldSpec> fields = helper.deriveFieldsForName(name);
//Parse each key with the appropriate Mapper
for (FieldSpec field : fields) {
Mapper mapper = keyMappers.get(field.getField().fieldName());
context = context.createExternalValueContext(field.getStringValue());
mapper.parse(context);
}
}
What happens at query time?
● Step #1: NameMapper generates analogous
keys for a custom Lucene query that finds
good candidates for re-scoring
@Override
public Query termQuery(Object value, @Nullable QueryParseContext context) {
//Parse name string
Name name = NameBuilder.data(value.toString()).build();
QuerySpec spec = helper.buildQuerySpec(new NameIndexQuery(name));
//Build Lucene query
Query query = spec.accept(new ESQueryVisitor(names.indexName() + "."));
return query;
}
What else happens at query time?
● Step #2: Uses a Rescore query to score names in the
best candidate documents and reorder accordingly
○ Tuned for high precision name matching
○ Computationally expensive
"rescore" : {
"query" : {
"rescore_query" : {
"function_score" : {
"name_score" : {
"field" : "name",
"query_name" : "LobEzDiaS, Chuy"
}
...
● The 'name_score' function matches the
query name against the indexed name in
every candidate document and returns the
similarity score
@Override
public double score(int docId, float subQueryScore) {
//Create a scorer for the query name
CachedScorer cs = createCachedScorer(queryName);
//Retrieve name data from doc values
nameByteData.setDocument(docId);
Name indexName = bytesToName(nameByteData.valueAt(i).bytes);
//Score the query against the indexed name in this document
return cs.score(indexName);
}
What does that function do?

More Related Content

What's hot

Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
Edureka!
 
Introduction to Modern Identity with Auth0's Developer
 Introduction to Modern Identity with Auth0's Developer Introduction to Modern Identity with Auth0's Developer
Introduction to Modern Identity with Auth0's Developer
Product School
 
Derbycon - The Unintended Risks of Trusting Active Directory
Derbycon - The Unintended Risks of Trusting Active DirectoryDerbycon - The Unintended Risks of Trusting Active Directory
Derbycon - The Unintended Risks of Trusting Active Directory
Will Schroeder
 
BloodHound Unleashed.pdf
BloodHound Unleashed.pdfBloodHound Unleashed.pdf
BloodHound Unleashed.pdf
n00py1
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
Philippe Mizrahi
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
Roberto Franchini
 
Not a Security Boundary
Not a Security BoundaryNot a Security Boundary
Not a Security Boundary
Will Schroeder
 
Json Web Token - JWT
Json Web Token - JWTJson Web Token - JWT
Json Web Token - JWT
Prashant Walke
 
Keep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFSKeep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFS
DataWorks Summit
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHound
DirkjanMollema
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
Taras Matyashovsky
 
Understanding OpenID
Understanding OpenIDUnderstanding OpenID
Understanding OpenID
Prabath Siriwardena
 
introduction Azure OpenAI by Usama wahab khan
introduction  Azure OpenAI by Usama wahab khanintroduction  Azure OpenAI by Usama wahab khan
introduction Azure OpenAI by Usama wahab khan
Usama Wahab Khan Cloud, Data and AI
 
Backtrack 5 - network pentest
Backtrack 5 - network pentestBacktrack 5 - network pentest
Backtrack 5 - network pentest
Dan H
 
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
Abhay Bhargav
 
Dynamics 365 CRM Javascript Customization
Dynamics 365 CRM Javascript CustomizationDynamics 365 CRM Javascript Customization
Dynamics 365 CRM Javascript Customization
Sanjaya Prakash Pradhan
 
OAuth 2.0 and OpenId Connect
OAuth 2.0 and OpenId ConnectOAuth 2.0 and OpenId Connect
OAuth 2.0 and OpenId Connect
Saran Doraiswamy
 
DerbyCon 2019 - Kerberoasting Revisited
DerbyCon 2019 - Kerberoasting RevisitedDerbyCon 2019 - Kerberoasting Revisited
DerbyCon 2019 - Kerberoasting Revisited
Will Schroeder
 
RedisConf17 - Redis as Java Session Store
RedisConf17 - Redis as Java Session StoreRedisConf17 - Redis as Java Session Store
RedisConf17 - Redis as Java Session Store
Redis Labs
 
SharePoint goes Microsoft Graph
SharePoint goes Microsoft GraphSharePoint goes Microsoft Graph
SharePoint goes Microsoft Graph
Markus Moeller
 

What's hot (20)

Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
Talend Interview Questions and Answers | Talend Online Training | Talend Tuto...
 
Introduction to Modern Identity with Auth0's Developer
 Introduction to Modern Identity with Auth0's Developer Introduction to Modern Identity with Auth0's Developer
Introduction to Modern Identity with Auth0's Developer
 
Derbycon - The Unintended Risks of Trusting Active Directory
Derbycon - The Unintended Risks of Trusting Active DirectoryDerbycon - The Unintended Risks of Trusting Active Directory
Derbycon - The Unintended Risks of Trusting Active Directory
 
BloodHound Unleashed.pdf
BloodHound Unleashed.pdfBloodHound Unleashed.pdf
BloodHound Unleashed.pdf
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
 
Not a Security Boundary
Not a Security BoundaryNot a Security Boundary
Not a Security Boundary
 
Json Web Token - JWT
Json Web Token - JWTJson Web Token - JWT
Json Web Token - JWT
 
Keep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFSKeep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFS
 
aclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHoundaclpwn - Active Directory ACL exploitation with BloodHound
aclpwn - Active Directory ACL exploitation with BloodHound
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
 
Understanding OpenID
Understanding OpenIDUnderstanding OpenID
Understanding OpenID
 
introduction Azure OpenAI by Usama wahab khan
introduction  Azure OpenAI by Usama wahab khanintroduction  Azure OpenAI by Usama wahab khan
introduction Azure OpenAI by Usama wahab khan
 
Backtrack 5 - network pentest
Backtrack 5 - network pentestBacktrack 5 - network pentest
Backtrack 5 - network pentest
 
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
An Attacker's View of Serverless and GraphQL Apps - Abhay Bhargav - AppSec Ca...
 
Dynamics 365 CRM Javascript Customization
Dynamics 365 CRM Javascript CustomizationDynamics 365 CRM Javascript Customization
Dynamics 365 CRM Javascript Customization
 
OAuth 2.0 and OpenId Connect
OAuth 2.0 and OpenId ConnectOAuth 2.0 and OpenId Connect
OAuth 2.0 and OpenId Connect
 
DerbyCon 2019 - Kerberoasting Revisited
DerbyCon 2019 - Kerberoasting RevisitedDerbyCon 2019 - Kerberoasting Revisited
DerbyCon 2019 - Kerberoasting Revisited
 
RedisConf17 - Redis as Java Session Store
RedisConf17 - Redis as Java Session StoreRedisConf17 - Redis as Java Session Store
RedisConf17 - Redis as Java Session Store
 
SharePoint goes Microsoft Graph
SharePoint goes Microsoft GraphSharePoint goes Microsoft Graph
SharePoint goes Microsoft Graph
 

Viewers also liked

OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
Basis Technology
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLR
Basis Technology
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for Elasticsearch
Basis Technology
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Basis Technology
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
Jos van Dongen
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010
Robin.Russell
 
The Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsThe Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsAlyona Medelyan
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
Ashnikbiz
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Roland Bouman
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
Uday Kothari
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Alexandre Rafalovitch
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BIEdureka!
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-wekalucboudreau
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
mattcasters
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et Développement
Mohamed hedi Abidi
 
Nantes JUG - Elasticsearch
Nantes JUG - ElasticsearchNantes JUG - Elasticsearch
Nantes JUG - Elasticsearch
David Pilato
 
Elasticsearch - Montpellier JUG
Elasticsearch - Montpellier JUGElasticsearch - Montpellier JUG
Elasticsearch - Montpellier JUG
David Pilato
 
Tirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearchTirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearch
Séven Le Mesle
 

Viewers also liked (19)

OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian CarrierOSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
OSDF 2013 - Autopsy 3: Extensible Desktop Forensics by Brian Carrier
 
Optimizing multilingual search in SOLR
Optimizing multilingual search in SOLROptimizing multilingual search in SOLR
Optimizing multilingual search in SOLR
 
Rosette Search Essentials for Elasticsearch
Rosette Search Essentials for ElasticsearchRosette Search Essentials for Elasticsearch
Rosette Search Essentials for Elasticsearch
 
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
Moving Beyond Entity Extraction to Entity Resolution - Human Language Technol...
 
Final Doc_1.1
Final Doc_1.1Final Doc_1.1
Final Doc_1.1
 
World Domination with Pentaho EE?
World Domination with Pentaho EE?World Domination with Pentaho EE?
World Domination with Pentaho EE?
 
How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010How can iceland produce so many professional players sept 2010
How can iceland produce so many professional players sept 2010
 
The Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text AnalyticsThe Next Generation SharePoint: Powered by Text Analytics
The Next Generation SharePoint: Powered by Text Analytics
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
 
Slides pentaho-hadoop-weka
Slides pentaho-hadoop-wekaSlides pentaho-hadoop-weka
Slides pentaho-hadoop-weka
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
ElasticSearch : Architecture et Développement
ElasticSearch : Architecture et DéveloppementElasticSearch : Architecture et Développement
ElasticSearch : Architecture et Développement
 
Nantes JUG - Elasticsearch
Nantes JUG - ElasticsearchNantes JUG - Elasticsearch
Nantes JUG - Elasticsearch
 
Elasticsearch - Montpellier JUG
Elasticsearch - Montpellier JUGElasticsearch - Montpellier JUG
Elasticsearch - Montpellier JUG
 
Tirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearchTirer le meilleur de ses données avec ElasticSearch
Tirer le meilleur de ses données avec ElasticSearch
 

Similar to Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
Matteo Moci
 
Simple fuzzy name matching in solr
Simple fuzzy name matching in solrSimple fuzzy name matching in solr
Simple fuzzy name matching in solr
David Murgatroyd
 
Data access 2.0? Please welcome: Spring Data!
Data access 2.0? Please welcome: Spring Data!Data access 2.0? Please welcome: Spring Data!
Data access 2.0? Please welcome: Spring Data!
Oliver Gierke
 
An introduction into Spring Data
An introduction into Spring DataAn introduction into Spring Data
An introduction into Spring Data
Oliver Gierke
 
Querydsl fin jug - june 2012
Querydsl   fin jug - june 2012Querydsl   fin jug - june 2012
Querydsl fin jug - june 2012
Timo Westkämper
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs偉格 高
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
Philips Kokoh Prasetyo
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
Alberto Paro
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
Alberto Paro
 
C# 7.0 Hacks and Features
C# 7.0 Hacks and FeaturesC# 7.0 Hacks and Features
C# 7.0 Hacks and Features
Abhishek Sur
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Trey Grainger
 
Webinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDBWebinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDB
MongoDB
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
Toshiaki Katayama
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with Morphia
MongoDB
 
[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)
croquiscom
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Lucidworks
 
Next-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and PrismaNext-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and Prisma
Nikolas Burk
 

Similar to Simple fuzzy Name Matching in Elasticsearch - Graham Morehead (20)

Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Simple fuzzy name matching in solr
Simple fuzzy name matching in solrSimple fuzzy name matching in solr
Simple fuzzy name matching in solr
 
Data access 2.0? Please welcome: Spring Data!
Data access 2.0? Please welcome: Spring Data!Data access 2.0? Please welcome: Spring Data!
Data access 2.0? Please welcome: Spring Data!
 
An introduction into Spring Data
An introduction into Spring DataAn introduction into Spring Data
An introduction into Spring Data
 
Querydsl fin jug - june 2012
Querydsl   fin jug - june 2012Querydsl   fin jug - june 2012
Querydsl fin jug - june 2012
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
 
Lucene in Action
Lucene in ActionLucene in Action
Lucene in Action
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!Elasticsearch: You know, for search! and more!
Elasticsearch: You know, for search! and more!
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
C# 7.0 Hacks and Features
C# 7.0 Hacks and FeaturesC# 7.0 Hacks and Features
C# 7.0 Hacks and Features
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 
Webinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDBWebinar: Simplifying Persistence for Java and MongoDB
Webinar: Simplifying Persistence for Java and MongoDB
 
d3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlind3sparql.js demo at SWAT4LS 2014 in Berlin
d3sparql.js demo at SWAT4LS 2014 in Berlin
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with Morphia
 
[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)[2019-07] GraphQL in depth (serverside)
[2019-07] GraphQL in depth (serverside)
 
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis TechnologySimple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
Simple Fuzzy Name Matching in Solr: Presented by Chris Mack, Basis Technology
 
Next-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and PrismaNext-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and Prisma
 

More from Basis Technology

Product Update: Customization with Rosette
Product Update: Customization with RosetteProduct Update: Customization with Rosette
Product Update: Customization with Rosette
Basis Technology
 
Smart Matching for Screening Webinar - May 2020
Smart Matching for Screening Webinar - May 2020Smart Matching for Screening Webinar - May 2020
Smart Matching for Screening Webinar - May 2020
Basis Technology
 
Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020
Basis Technology
 
Rosette Product Update (May 2019)
Rosette Product Update (May 2019)Rosette Product Update (May 2019)
Rosette Product Update (May 2019)
Basis Technology
 
Simple fuzzy name matching in elasticsearch paris meetup
Simple fuzzy name matching in elasticsearch   paris meetupSimple fuzzy name matching in elasticsearch   paris meetup
Simple fuzzy name matching in elasticsearch paris meetup
Basis Technology
 
Gregor Stewart - OSIRA 2014
Gregor Stewart - OSIRA 2014Gregor Stewart - OSIRA 2014
Gregor Stewart - OSIRA 2014
Basis Technology
 
Basis Technology showcase at elasticsearch meetup in Japan
Basis Technology showcase at elasticsearch meetup in JapanBasis Technology showcase at elasticsearch meetup in Japan
Basis Technology showcase at elasticsearch meetup in Japan
Basis Technology
 
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff GodboldHLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
Basis Technology
 
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian CarrierHLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
Basis Technology
 
OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
OSS 2013 - Real World Facets with Entity Resolution by Benson MarguliesOSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
Basis Technology
 
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
Basis Technology
 
HLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
HLT 2013 - From Research to Reality: Advances in HLT by David MurgatroydHLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
HLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
Basis Technology
 
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics PlatformAutopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
Basis Technology
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
Basis Technology
 
Autopsy 3.0 - Open Source Digital Forensics Conference
Autopsy 3.0 - Open Source Digital Forensics ConferenceAutopsy 3.0 - Open Source Digital Forensics Conference
Autopsy 3.0 - Open Source Digital Forensics Conference
Basis Technology
 
Big Data Triage with Rosette Human Language Technology Conference
Big Data Triage with Rosette Human Language Technology ConferenceBig Data Triage with Rosette Human Language Technology Conference
Big Data Triage with Rosette Human Language Technology Conference
Basis Technology
 
Multilingual Search and Text Analytics with Solr - Open Source Search Conference
Multilingual Search and Text Analytics with Solr - Open Source Search ConferenceMultilingual Search and Text Analytics with Solr - Open Source Search Conference
Multilingual Search and Text Analytics with Solr - Open Source Search Conference
Basis Technology
 

More from Basis Technology (17)

Product Update: Customization with Rosette
Product Update: Customization with RosetteProduct Update: Customization with Rosette
Product Update: Customization with Rosette
 
Smart Matching for Screening Webinar - May 2020
Smart Matching for Screening Webinar - May 2020Smart Matching for Screening Webinar - May 2020
Smart Matching for Screening Webinar - May 2020
 
Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020Understanding Names with Neural Networks - May 2020
Understanding Names with Neural Networks - May 2020
 
Rosette Product Update (May 2019)
Rosette Product Update (May 2019)Rosette Product Update (May 2019)
Rosette Product Update (May 2019)
 
Simple fuzzy name matching in elasticsearch paris meetup
Simple fuzzy name matching in elasticsearch   paris meetupSimple fuzzy name matching in elasticsearch   paris meetup
Simple fuzzy name matching in elasticsearch paris meetup
 
Gregor Stewart - OSIRA 2014
Gregor Stewart - OSIRA 2014Gregor Stewart - OSIRA 2014
Gregor Stewart - OSIRA 2014
 
Basis Technology showcase at elasticsearch meetup in Japan
Basis Technology showcase at elasticsearch meetup in JapanBasis Technology showcase at elasticsearch meetup in Japan
Basis Technology showcase at elasticsearch meetup in Japan
 
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff GodboldHLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
HLT 2013 - Big Data Navigation and Discovery by Stefan Andreasen & Jeff Godbold
 
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian CarrierHLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
 
OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
OSS 2013 - Real World Facets with Entity Resolution by Benson MarguliesOSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
OSS 2013 - Real World Facets with Entity Resolution by Benson Margulies
 
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
HLT 2013 - Adapting News-Trained Entity Extraction to New Domains and Emergin...
 
HLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
HLT 2013 - From Research to Reality: Advances in HLT by David MurgatroydHLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
HLT 2013 - From Research to Reality: Advances in HLT by David Murgatroyd
 
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics PlatformAutopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
Autopsy 3: Free Open Source End-to-End Windows-based Digital Forensics Platform
 
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology ConferenceA Lightning Introduction To Clouds & HLT - Human Language Technology Conference
A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
 
Autopsy 3.0 - Open Source Digital Forensics Conference
Autopsy 3.0 - Open Source Digital Forensics ConferenceAutopsy 3.0 - Open Source Digital Forensics Conference
Autopsy 3.0 - Open Source Digital Forensics Conference
 
Big Data Triage with Rosette Human Language Technology Conference
Big Data Triage with Rosette Human Language Technology ConferenceBig Data Triage with Rosette Human Language Technology Conference
Big Data Triage with Rosette Human Language Technology Conference
 
Multilingual Search and Text Analytics with Solr - Open Source Search Conference
Multilingual Search and Text Analytics with Solr - Open Source Search ConferenceMultilingual Search and Text Analytics with Solr - Open Source Search Conference
Multilingual Search and Text Analytics with Solr - Open Source Search Conference
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Simple fuzzy Name Matching in Elasticsearch - Graham Morehead

  • 1. Name Matching with Elasticsearch June 25, 2015 Graham Morehead gmorehead@basistech.com
  • 2. April 15 2013 2:49 PM .
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. Real life example David K. Murgatroyd VP of Engineering Boarding Pass
  • 9. Best Practice using Elasticsearch? ● NameMapper (http://stackoverflow.com/questions/20632042/elasticsearch-searching-for-human-names) "mappings": { ... "type": "multi_field", "fields": { "pty_surename": { "type": "string", "analyzer": "simple" }, "metaphone": { "type": "string", "analyzer": "metaphone" }, "porter": { "type": "string", "analyzer": "porter" } … ● rescore_query
  • 10. “Jesus Alfonso Lopez Diaz” vs. “LobezDias, Chuy”
  • 11. RNI
  • 12.
  • 13.
  • 14.
  • 15. Rescore Query Main Query Plug-in Implementation match : { name: "Bob Smitty" } bool: name.Key1:... name.Key2:... name.Key3:... User Query Rescore name_score : { field : "name", name : "Bob Smitty") name:"Robert Smith" dob:2/13/1987 score : .79 Indexing { name: "Robert Smith" dob:" 1987/02/13" } { name: "Robert Smith" name.Key1:… name.Key2:… name.Key3:… dob: "1987/02/13" } User Doc Index subset
  • 16. Demo
  • 18. Name Matching with Elasticsearch June 25, 2015 Graham Morehead gmorehead@basistech.com
  • 19. How could you use such a Field? ● Plugin contains custom mapper which does all the work behind the scenes PUT /ofac/ofac/_mapping { "ofac" : { "properties" : { "name" : { "type:" : "rni_name" } "aka" : { "type:" : "rni_name" } } } }
  • 20. What happens at index time? ● NameMapper indexes keys for different phenomena in separate (sub) fields @Override public void parse(ParseContext context) throws IOException { Name name = NameBuilder.data(nameString).build(); //Generate keys for name Collection<FieldSpec> fields = helper.deriveFieldsForName(name); //Parse each key with the appropriate Mapper for (FieldSpec field : fields) { Mapper mapper = keyMappers.get(field.getField().fieldName()); context = context.createExternalValueContext(field.getStringValue()); mapper.parse(context); } }
  • 21. What happens at query time? ● Step #1: NameMapper generates analogous keys for a custom Lucene query that finds good candidates for re-scoring @Override public Query termQuery(Object value, @Nullable QueryParseContext context) { //Parse name string Name name = NameBuilder.data(value.toString()).build(); QuerySpec spec = helper.buildQuerySpec(new NameIndexQuery(name)); //Build Lucene query Query query = spec.accept(new ESQueryVisitor(names.indexName() + ".")); return query; }
  • 22. What else happens at query time? ● Step #2: Uses a Rescore query to score names in the best candidate documents and reorder accordingly ○ Tuned for high precision name matching ○ Computationally expensive "rescore" : { "query" : { "rescore_query" : { "function_score" : { "name_score" : { "field" : "name", "query_name" : "LobEzDiaS, Chuy" } ...
  • 23. ● The 'name_score' function matches the query name against the indexed name in every candidate document and returns the similarity score @Override public double score(int docId, float subQueryScore) { //Create a scorer for the query name CachedScorer cs = createCachedScorer(queryName); //Retrieve name data from doc values nameByteData.setDocument(docId); Name indexName = bytesToName(nameByteData.valueAt(i).bytes); //Score the query against the indexed name in this document return cs.score(indexName); } What does that function do?