Dave Moore
david.moore@elastic.co
Real-Time Entity Resolution
With Elasticsearch
1
Disambiguation
"Named Entity Recognition": single attributes in unstructured text
vs.
"Entity Resolution": multiple attributes in structured data
Person
Field Value
Name Alice Jones
DOB 1984-01-01
Street 123 Main St
Credit Card 4040 0000 2020 8080
Phone 202-555-1234
2
What is entity resolution?
Health Care
Patient ID
We need to identify
and their medical
many hand-written
Mixing up records puts
at risk of injury or
Sales & Marketing
Customer Intel
We have reps
managing many
sources of info on
leads and customers.
Our view of the buyer
is fragmented and that
makes us less
effective. We're losing
pipeline.
Security & Compliance
Fraud
We need to track a
person or device that is
hiding its tracks.
Connecting the dots is a
laborious process and
we can't keep up with
our incident backlog.
Military, IC, Law
Surveillance
We need to track a
person or device that is
hiding its identity. Our
timely success is
critical to public safety
and national security.
Privacy Compliance
GDPR
We must find and
manage all PII to
respond to inquiries.
Failure to comply risks
fines of €20 million or
4% annual turnover.
IT
MDM
MDM is a slow and
bureaucratic process.
We can solve our own
data quality problems
faster and better. And
we still need query
time entity resolution.
3
Examples
4
Why is identity hard to track?
Ali Jones
123 W Main Street
ABC Wigdets
4040 0000 2020 8008
+1 (202) 555 1234
5
1. Identity is Vague
Allie Jones
123 Main St
ABC Widgets, Inc.
4040 0000 2020 8080
202-555-1234
Icons by icons8
Ali Jones
123 W Main Street
ABC Wigdets
4040 0000 2020 8008
+1 (202) 555 1234
Alison Jones-Smith
555 Brooad Street
XYZ Tech
3030 5500 9999 0000
2025559867
6
2. Identity Changes
Allie Jones
123 Main St
ABC Widgets, Inc.
4040 0000 2020 8080
202-555-1234
Allison Smith
555 Broad St
XYZ Technology Corp.
3030 5050 9999 0000
202-555-9876
Ali Jones
123 W Main Street
ABC Wigdets
4040 0000 2020 8008
+1 (202) 555 1234
Alison Jones-Smith
555 Brooad Street
XYZ Tech
3030 5500 9999 0000
2025559867
7
3. Identity is Messy
Allie Jones
123 Main St
ABC Widgets, Inc.
4040 0000 2020 8080
202-555-1234
Allison Smith
555 Broad St
XYZ Technology Corp.
3030 5050 9999 0000
202-555-9876
8
4. Identity is Diverse
Ali Jones
123 W Main Street
ABC Wigdets
4040 0000 2020 8008
+1 (202) 555 1234
Alison Jones-Smith
555 Brooad Street
XYZ Tech
3030 5500 9999 0000
2025559867
Allie Jones
123 Main St
ABC Widgets, Inc.
4040 0000 2020 8080
202-555-1234
Allison Smith
555 Broad St
XYZ Technology Corp.
3030 5050 9999 0000
202-555-9876
???
???
???
???
9
Entity Resolution
connects the dots despite these challenges
Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234
Allie Jones 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234
Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234
Allie Jones 132 W Main Street ABC Widgets 4040 0000 2020 8080 202 555 1234
Allie Smith 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234
Allie Smith 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234
Ali Smith 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234
Allie Smith 555 Broad St ABC Widgets, Inc 4040 0000 2020 8080 202-555-1234
Allie Smith 555 Broad Street XYZ Tech Corp 3030 5050 9999 0000 202.555.1234
Allie Smith 555 Broad Street XYZ Technology Corp 3030 5050 9999 0000 202-555-9876
10
Comparison to Search
Search Resolution
name:"Allie Jones" AND street:"123 Main St" name:"Allie Jones" AND street:"123 Main St"
Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234
Allie Jones 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234
Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234
Ali Jones 132 Mane Street ABC Widgets 4024 0071 4970 1227 888-555-5555
Aly Jonas 113 Main Street Acme Corp. 4716 1035 4536 4671 610-555-5555
Allie Jones 132 W Main Street ABC Widgets 4040 0000 2020 8080 202-555-9876
Al Jones 132 E Main St Mom & Pop, LLC 3772 733741 52501 1-610-555-0000
Aly Jones 113 Main St, #102 Acme Corp. 4716 1035 4536 4671 610-555-5555
Ali Jones 132 Mane Street ABC Widgets 4024 0071 4970 1227 888-555-1234
Aly Jonas 113 Main Street Acme Corp. 4781 9105 0533 4481 610-555-2345
Allie Johns 132 W Main Street ABC Widgets 4088 0110 2044 8180 202-555-3456
Elle Jeon 132 E Main St Mom & Pop, LLC 3502 730741 52203 1-610-555-4567
Elle Jones 113 Main St, #102 Acme Corp. 4716 1035 4536 4671 610-555-5678
Eli Jones 132 Mane Street ABC Widgets 4224 0065 4800 1337 888-555-6789
Eli Joans 113 Main Street Acme Corp. 4206 1035 4536 4081 610-555-7890
Allie Jeans 132 N Mean Street ABC Widgets 4240 0101 02020 8888 202-555-8901
Search engine ranks results once.
True hits mixed with noise.
Search engine filters results recursively.
True hits isolated and transitively linked.
11
Real-Time
12
Batch vs. Real-Time
Batch
How is it used? Resolve all entities in advance (partitioning, pairwise scoring, connected components).
How long does it take? Docs + (Docs/Partitions)² + Components² (hours for billions of documents).
When is it necessary? Population or network analysis.
Real-Time
How is it used? Resolve one entity on query (recursive Boolean query).
How long does it take? Indices * Attributes * Hops (milliseconds for a handful of each).
When is it necessary? Individual analysis.
Most solutions have a real-time phase, sometimes applied after batch resolution.
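The two cost expressions can be evaluated literally to compare their growth. This is only a sketch: the function names and the sample numbers are hypothetical, and the results are abstract operation counts, not timings.

```python
def batch_cost(docs: int, partitions: int, components: int) -> float:
    """Literal reading of: Docs + (Docs/Partitions)^2 + Components^2."""
    return docs + (docs / partitions) ** 2 + components ** 2


def real_time_cost(indices: int, attributes: int, hops: int) -> int:
    """Literal reading of: Indices * Attributes * Hops."""
    return indices * attributes * hops


# Hypothetical sizes, chosen only to show the difference in scale.
batch = batch_cost(docs=1_000_000, partitions=1_000, components=10_000)
real_time = real_time_cost(indices=2, attributes=5, hops=3)
print(f"batch: {batch:,.0f} operations, real-time: {real_time} queries")
```

The batch expression is dominated by its squared terms, while the real-time expression stays small as long as the number of indices, attributes, and hops stays small.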
Robust matching
• Token normalization
• Phonetic matching
• Fuzzy transpositions
• Boolean logic filtering
• Fine-tune search parameters
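Two of these techniques can be illustrated outside of Elasticsearch. The sketch below is plain Python, not the analyzers or fuzzy queries themselves: `normalize_phone` and `osa_distance` are hypothetical helpers, and dropping a leading "1" as a US country code is an assumption. The sample values come from the records shown earlier in the deck.

```python
import re


def normalize_phone(raw: str) -> str:
    """Token normalization: keep digits only, drop a leading US country code (assumption)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def osa_distance(a: str, b: str) -> int:
    """Edit distance in which swapping two adjacent characters counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "gd" vs. "dg".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


phones = ["202-555-1234", "202.555.1234", "+1 (202) 555 1234"]
normalized = {normalize_phone(p) for p in phones}
typo_distance = osa_distance("wigdets", "widgets")
print(normalized, typo_distance)
```

All three phone formats collapse to the single token 2025551234, and "Wigdets" is one transposition away from "Widgets".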
13
Real-Time
Why Elasticsearch
Suited for operations
• Horizontal scaling
• Real-time response rates
• Flexible index mappings
14
Approach
• Fast – Get results in real-time. From milliseconds to low seconds.
• Generic – Resolve any type of entity. People, companies, locations, sessions, etc.
• Transitive – Resolve over multiple hops of matches. Capture changing identities.
• Multi-source – Resolve over multiple indices with disparate mappings.
• Accommodating – Operate on data as it exists. Avoid transforming and reindexing data.
• Logical – Logic is easier to read, troubleshoot, and optimize than statistics.
• 100% Elasticsearch – Operate within existing search infrastructure.
Goals
15
Approach
1. Entity modeling – What is the entity? What are its attributes?
2. Analyzers – How are you indexing each attribute?
3. Matchers – What is the query logic for each attribute?
4. Resolvers – What combinations of matching attributes imply a resolution?
5. Metadata maps – Which matchers apply to which indexed fields?
6. Recursive queries – How to repeat the queries until completion?
Steps
16
zentity
zentity.io
An open source Elasticsearch plugin
for real-time entity resolution
17
POST _zentity/resolution/person
{
"attributes": {
"name": "Alice Jones",
"dob": "1984-01-01",
"phone": [ "555-123-4567", "555-987-6543" ]
}
}
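A minimal sketch of building this request from Python with the standard library. The base URL is an assumption (a local Elasticsearch node with the zentity plugin installed), and `build_resolution_request` is a hypothetical helper; only the endpoint path and the body come from the slide.

```python
import json
import urllib.request

# Assumption: Elasticsearch with the zentity plugin listens on this address.
BASE_URL = "http://localhost:9200"


def build_resolution_request(entity_type: str, attributes: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to _zentity/resolution/<entity_type>."""
    body = json.dumps({"attributes": attributes}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/_zentity/resolution/{entity_type}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_resolution_request(
    "person",
    {
        "name": "Alice Jones",
        "dob": "1984-01-01",
        "phone": ["555-123-4567", "555-987-6543"],
    },
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```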
18
Demos
19
Customer intelligence
Gather everything we know about a customer.
Web traffic sessionization
Track a bot that cycles through IP addresses, cookies, and user agent signatures.
Fraud detection
Determine if a health care provider was blacklisted under a different name.
Dave Moore
email: david.moore@elastic.co
zentity: zentity.io
Contact
@elastic
www.elastic.co
Extra Content
22
Approach
23
Step 1. Entity Modeling
Name the entity type: Person
Define its attributes. Study them in your data sets.

Attribute                 Uniqueness  Consistency  Presence
Name – First Name         Moderate    Moderate     Extreme
Name – Last Name          Moderate    Moderate     High
Address – Street          High        Low          Moderate
Address – City            Low         Moderate     Moderate
Address – Province        Low         High         High
Address – Postal Code     Low         High         High
Address – Country         Low         High         High
Date of Birth             Moderate    High         Moderate
Phone Number              Moderate    Moderate     Moderate
Email Address             High        Extreme      Moderate
IP Address                High        Extreme      Low
Credit Card Number        Extreme     Extreme      Low
Social Security Number    Extreme     High         None
This model is independent from your indices.
You can reuse and extend this model as you add or amend indices.
Name – First Name
Name – Last Name
Address – Street
Address – City
Address – Province
Address – Postal Code
Address – Country
Date of Birth
Phone Number
Email Address
IP Address
Credit Card Number
Social Security Number
Phonetic
"Alice Jones" => ["ALAC","JAN"]
Standard
"Alice Jones" => ["ALICE","JONES"]
25
Step 2. Analyzers
Take the attributes. Define their analyzers. Put them in your index mappings.
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "phonetic": {
            "type": "phonetic",
            "encoder": "nysiis"
          }
        },
        "analyzer": {
          "phonetic": {
            "filter": [
              "icu_normalizer",
              "icu_folding",
              "phonetic"
            ],
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "fields": {
            "phonetic": {
              "type": "text",
              "analyzer": "phonetic"
            }
          }
        }
      }
    }
  }
}
Person
Analyzers are powerful. But they must be defined prior to indexing.
Give careful thought to your analyzers to avoid having to reindex data.
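As a sketch, the settings and mappings from this step can be assembled into a single index-creation body. The helper `phonetic_text_field` and the extra `last_name` field are illustrative assumptions. The `phonetic` token filter comes from the analysis-phonetic plugin and the two ICU filters from the analysis-icu plugin, so both would need to be installed. The `_doc` level follows the mapping-type syntax shown on the slide; newer Elasticsearch versions may not accept it.

```python
import json


def phonetic_text_field() -> dict:
    """A text field with a phonetic sub-field, as in the first_name mapping."""
    return {
        "type": "text",
        "fields": {"phonetic": {"type": "text", "analyzer": "phonetic"}},
    }


index_body = {
    "settings": {"index": {"analysis": {
        "filter": {"phonetic": {"type": "phonetic", "encoder": "nysiis"}},
        "analyzer": {"phonetic": {
            "tokenizer": "standard",
            "filter": ["icu_normalizer", "icu_folding", "phonetic"],
        }},
    }}},
    "mappings": {"_doc": {"properties": {
        "first_name": phonetic_text_field(),
        "last_name": phonetic_text_field(),  # hypothetical second field
    }}},
}

# The body that would accompany an index-creation request.
print(json.dumps(index_body, indent=2))
```

Because the analyzers are part of the index definition, they have to be in place before any documents are indexed, which is the point made above about avoiding a reindex.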
Phonetic
{
  "match": {
    "{{ field }}": {
      "query": "{{ value }}",
      "fuzziness": 0
    }
  }
}
Standard
{
  "match": {
    "{{ field }}": {
      "query": "{{ value }}",
      "fuzziness": 2
    }
  }
}
27
Step 3. Matchers
Take the attributes.
Name – First Name
Name – Last Name
Address – Street
Address – City
Address – Province
Address – Postal Code
Address – Country
Date of Birth
Phone Number
Email Address
IP Address
Credit Card Number
Social Security Number
Define their Boolean query logic. Use templates for variables.
Person
{{ field }} – The field of an index.
{{ value }} – The value of an attribute.
We will replace these at query time.
Understand that each matcher will be combined
into one large Boolean query.
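A minimal sketch of filling the two template variables at query time. The `render` helper is hypothetical and is not how zentity does it internally. It works on parsed dictionaries rather than raw strings, so attribute values containing quotes need no escaping.

```python
PHONETIC_MATCHER = {"match": {"{{ field }}": {"query": "{{ value }}", "fuzziness": 0}}}
STANDARD_MATCHER = {"match": {"{{ field }}": {"query": "{{ value }}", "fuzziness": 2}}}


def render(template, field: str, value: str):
    """Return a copy of the template with both placeholders replaced."""
    if isinstance(template, dict):
        return {render(k, field, value): render(v, field, value) for k, v in template.items()}
    if isinstance(template, list):
        return [render(item, field, value) for item in template]
    if isinstance(template, str):
        return template.replace("{{ field }}", field).replace("{{ value }}", value)
    return template  # numbers such as the fuzziness setting pass through


phonetic_clause = render(PHONETIC_MATCHER, "first_name.phonetic", "Alice")
standard_clause = render(STANDARD_MATCHER, "first_name", "Alice")
print(phonetic_clause)
print(standard_clause)
```

Each rendered clause is an ordinary match query, ready to be placed inside the larger Boolean query.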
29
Step 4. Resolvers
Take the attributes.
Name – First Name
Name – Last Name
Address – Street
Address – City
Address – Province
Address – Postal Code
Address – Country
Date of Birth
Phone Number
Email Address
IP Address
Credit Card Number
Social Security Number
Determine which combinations of matching attributes imply a resolution.
[ Name – First, Name – Last, Address – Street, Address – City, Address – Province ]
[ Name – First, Name – Last, Address – Street, Address – Postal Code ]
[ Name – First, Name – Last, Date of Birth, Address – City, Address – Province ]
[ Name – First, Name – Last, Date of Birth, Address – Postal Code ]
[ Name – First, Name – Last, Phone Number ]
[ Name – First, Name – Last, Email Address ]
[ Name – First, Name – Last, IP Address ]
[ Name – First, Name – Last, Credit Card Number ]
[ Name – First, Name – Last, Social Security Number ]
[ Email Address, Phone Number ]
[ Email Address, IP Address ]
[ Email Address, Credit Card Number ]
[ IP Address, Credit Card Number ]
Person
Avoid resolving on a single attribute such as Social Security Number.
Corroboration among multiple attributes helps prevent snowballs.
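One plausible way to express resolvers as query logic, sketched under assumptions: every attribute in a resolver must match (a `bool` filter), and any one resolver is sufficient (a `bool` should with `minimum_should_match` of 1). The attribute keys, the sample clauses, and `build_resolver_query` are hypothetical; the exact query that zentity generates may differ.

```python
RESOLVERS = [
    ["first_name", "last_name", "phone"],
    ["first_name", "last_name", "email"],
    ["email", "phone"],
]


def build_resolver_query(resolvers, clauses):
    """Combine per-attribute clauses: AND within a resolver, OR across resolvers.

    A resolver is usable only if a clause exists for every one of its attributes.
    """
    alternatives = [
        {"bool": {"filter": [clauses[attr] for attr in resolver]}}
        for resolver in resolvers
        if all(attr in clauses for attr in resolver)
    ]
    if not alternatives:
        return None
    return {"bool": {"should": alternatives, "minimum_should_match": 1}}


# Clauses for the attribute values known so far (no email yet).
clauses = {
    "first_name": {"match": {"first_name": "Allie"}},
    "last_name": {"match": {"last_name": "Jones"}},
    "phone": {"term": {"phone": "2025551234"}},
}
query = build_resolver_query(RESOLVERS, clauses)
print(query)
```

With no email known, only the first resolver applies, so the query requires first name, last name, and phone to match together. A phone number on its own produces no query at all, which is the corroboration the slide recommends.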
31
Step 5. Metadata Maps
Take the attributes.
Name – First Name
Name – Last Name
Address – Street
Address – City
Address – Province
Address – Postal Code
Address – Country
Date of Birth
Phone Number
Email Address
IP Address
Credit Card Number
Social Security Number
Map them to the fields of the relevant indices.
users.first_name
users.last_name
users.phone
users.email
customers.fname
customers.lname
customers.tel
customers.email
customers.cc
customers.zip
Person
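The mapping above can be held in a small lookup structure. This is a sketch: the attribute keys (`first_name`, `credit_card`, and so on) are names chosen here for illustration, and a fuller map would also record which matcher applies to each field.

```python
# Index name -> {attribute of the entity model: field name in that index}.
METADATA_MAP = {
    "users": {
        "first_name": "first_name",
        "last_name": "last_name",
        "phone": "phone",
        "email": "email",
    },
    "customers": {
        "first_name": "fname",
        "last_name": "lname",
        "phone": "tel",
        "email": "email",
        "credit_card": "cc",
        "postal_code": "zip",
    },
}


def fields_for(attribute: str) -> dict:
    """Return the field that holds an attribute in each index that has it."""
    return {
        index: fields[attribute]
        for index, fields in METADATA_MAP.items()
        if attribute in fields
    }


print(fields_for("phone"))
print(fields_for("credit_card"))
```

A phone number can be searched in both indices under different field names, while a credit card number can only be searched in customers.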
32
Step 6. Recursive Queries
With each query, new inputs might be found in different attributes.
Use the metadata map and your resolvers to determine if you can
create new queries for the new inputs.
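The loop can be sketched in memory, with a Python dictionary standing in for the indices and exact comparison standing in for the matchers. Everything here is illustrative: the records are loosely based on the Allie Jones examples, and the attribute names, resolvers, and `resolve` function are assumptions.

```python
RESOLVERS = [["name", "phone"], ["name", "street"], ["phone", "street"]]

DOCS = {
    "d1": {"name": "allie jones", "phone": "2025551234", "street": "123 main st"},
    "d2": {"name": "allie smith", "phone": "2025551234", "street": "123 main st"},
    "d3": {"name": "allie smith", "phone": "2025551234", "street": "555 broad st"},
    "d4": {"name": "allie smith", "phone": "2025559876", "street": "555 broad st"},
    "d5": {"name": "aly jonas", "phone": "6105555555", "street": "113 main st"},
}


def resolve(seed, docs, resolvers):
    """Repeat the query until a hop finds no new documents."""
    known = {attr: {value} for attr, value in seed.items()}
    remaining = dict(docs)
    resolved = []  # (hop number, document id) in order of discovery
    hop = 0
    while True:
        hop += 1
        # A document matches if every attribute of some resolver has a known value.
        hits = [
            doc_id
            for doc_id, doc in remaining.items()
            if any(all(doc.get(a) in known.get(a, set()) for a in r) for r in resolvers)
        ]
        if not hits:
            return resolved
        for doc_id in hits:
            doc = remaining.pop(doc_id)
            resolved.append((hop, doc_id))
            # New values found in this hop become inputs for the next one.
            for attr, value in doc.items():
                known.setdefault(attr, set()).add(value)


result = resolve({"name": "allie jones", "phone": "2025551234"}, DOCS, RESOLVERS)
print(result)
```

Starting from one name and one phone number, each hop learns a new value (a street, then a new surname, then a new street) that unlocks the next record, so d1 through d4 are linked over four hops. The unrelated record d5 is never matched because no resolver is satisfied by the known values.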

More Related Content

What's hot

ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerDataWorks Summit
Β 
Flexible Indexing with Postgres
Flexible Indexing with PostgresFlexible Indexing with Postgres
Flexible Indexing with PostgresEDB
Β 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedis Labs
Β 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
Β 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
Β 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureDataWorks Summit
Β 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
Β 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesDataWorks Summit
Β 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
Β 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationEyad Garelnabi
Β 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
Β 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
Β 
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...Edureka!
Β 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinDatabricks
Β 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningDatabricks
Β 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseMike Dirolf
Β 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetDataWorks Summit/Hadoop Summit
Β 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
Β 

What's hot (20)

Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
Β 
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
Β 
Flexible Indexing with Postgres
Flexible Indexing with PostgresFlexible Indexing with Postgres
Flexible Indexing with Postgres
Β 
RedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory OptimizationRedisConf18 - Redis Memory Optimization
RedisConf18 - Redis Memory Optimization
Β 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Β 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
Β 
Transactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and futureTransactional operations in Apache Hive: present and future
Transactional operations in Apache Hive: present and future
Β 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Β 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
Β 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
Β 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Β 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
Β 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Β 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
Β 
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
Β 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Β 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File Pruning
Β 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source Database
Β 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
Β 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
Β 

Similar to Real time entity resolution with elasticsearch - haystack 2018

Real-Time Entity Resolution with Elasticsearch - Haystack 2018
Real-Time Entity Resolution with Elasticsearch - Haystack 2018Real-Time Entity Resolution with Elasticsearch - Haystack 2018
Real-Time Entity Resolution with Elasticsearch - Haystack 2018zentity.io
Β 
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...In:Confidence 2019 - Balancing the conflicting objectives of data access and ...
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...Privitar
Β 
Privacy solutions decode2021_jon_oliver
Privacy solutions decode2021_jon_oliverPrivacy solutions decode2021_jon_oliver
Privacy solutions decode2021_jon_oliverJonathanOliver26
Β 
How We Did It: The Case of the Credit Card Breach
How We Did It: The Case of the Credit Card BreachHow We Did It: The Case of the Credit Card Breach
How We Did It: The Case of the Credit Card BreachTeradata
Β 
Mastering Location Data – a new paradigm in network analytics
Mastering Location Data – a new paradigm in network analyticsMastering Location Data – a new paradigm in network analytics
Mastering Location Data – a new paradigm in network analyticsPrecisely
Β 
Global AI Bootcamp Singapore - Keynote
Global AI Bootcamp Singapore - KeynoteGlobal AI Bootcamp Singapore - Keynote
Global AI Bootcamp Singapore - KeynoteAlex Smith
Β 
The Domains of Identity & Self-Sovereign Identity MyData 2018
The Domains of Identity & Self-Sovereign Identity MyData 2018The Domains of Identity & Self-Sovereign Identity MyData 2018
The Domains of Identity & Self-Sovereign Identity MyData 2018Kaliya "Identity Woman" Young
Β 
Introduction of Artificial Intelligence
Introduction of Artificial IntelligenceIntroduction of Artificial Intelligence
Introduction of Artificial IntelligenceAkhileshwar Nirala
Β 
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, Huntsville
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, HuntsvilleAll Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, Huntsville
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, HuntsvilleClearedJobs.Net
Β 
Self-Sovereign Identity: Lightening Talk at RightsCon
Self-Sovereign Identity: Lightening Talk at RightsCon Self-Sovereign Identity: Lightening Talk at RightsCon
Self-Sovereign Identity: Lightening Talk at RightsCon Kaliya "Identity Woman" Young
Β 
Trusting AI with important decisions
Trusting AI with important decisionsTrusting AI with important decisions
Trusting AI with important decisionsLouis Dorard
Β 
Cybersecurity for Marketing
Cybersecurity for Marketing Cybersecurity for Marketing
Cybersecurity for Marketing Alert Logic
Β 
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...Ravi Chandra
Β 
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3ClearedJobs.Net
Β 
Database Design Disasters
Database Design DisastersDatabase Design Disasters
Database Design DisastersRichie Rump
Β 
What You Need to Know About Robotic Process Automation: How It Works & Real-W...
What You Need to Know About Robotic Process Automation: How It Works & Real-W...What You Need to Know About Robotic Process Automation: How It Works & Real-W...
What You Need to Know About Robotic Process Automation: How It Works & Real-W...Captricity
Β 
What i learned at the infosecurity isaca north america expo and conference 2019
What i learned at the infosecurity isaca north america expo and conference 2019What i learned at the infosecurity isaca north america expo and conference 2019
What i learned at the infosecurity isaca north america expo and conference 2019Ulf Mattsson
Β 
Internet of Things Primer
Internet of Things PrimerInternet of Things Primer
Internet of Things PrimerStephen Bates
Β 
Supporting IT by David Meares
Supporting IT by David MearesSupporting IT by David Meares
Supporting IT by David MearesAlex Cachia
Β 

Similar to Real time entity resolution with elasticsearch - haystack 2018 (20)

Real-Time Entity Resolution with Elasticsearch - Haystack 2018
Real-Time Entity Resolution with Elasticsearch - Haystack 2018Real-Time Entity Resolution with Elasticsearch - Haystack 2018
Real-Time Entity Resolution with Elasticsearch - Haystack 2018
Β 
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...In:Confidence 2019 - Balancing the conflicting objectives of data access and ...
In:Confidence 2019 - Balancing the conflicting objectives of data access and ...
Β 
Privacy solutions decode2021_jon_oliver
Privacy solutions decode2021_jon_oliverPrivacy solutions decode2021_jon_oliver
Privacy solutions decode2021_jon_oliver
Β 
How We Did It: The Case of the Credit Card Breach
How We Did It: The Case of the Credit Card BreachHow We Did It: The Case of the Credit Card Breach
How We Did It: The Case of the Credit Card Breach
Β 
Mastering Location Data – a new paradigm in network analytics
Mastering Location Data – a new paradigm in network analyticsMastering Location Data – a new paradigm in network analytics
Mastering Location Data – a new paradigm in network analytics
Β 
Global AI Bootcamp Singapore - Keynote
Global AI Bootcamp Singapore - KeynoteGlobal AI Bootcamp Singapore - Keynote
Global AI Bootcamp Singapore - Keynote
Β 
The Domains of Identity & Self-Sovereign Identity MyData 2018
The Domains of Identity & Self-Sovereign Identity MyData 2018The Domains of Identity & Self-Sovereign Identity MyData 2018
The Domains of Identity & Self-Sovereign Identity MyData 2018
Β 
Introduction of Artificial Intelligence
Introduction of Artificial IntelligenceIntroduction of Artificial Intelligence
Introduction of Artificial Intelligence
Β 
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, Huntsville
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, HuntsvilleAll Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, Huntsville
All Clearances or Cyber Virtual Job Fair Handbook June 3, 2020, Huntsville
Β 
Self-Sovereign Identity: Lightening Talk at RightsCon
Self-Sovereign Identity: Lightening Talk at RightsCon Self-Sovereign Identity: Lightening Talk at RightsCon
Self-Sovereign Identity: Lightening Talk at RightsCon
Β 
Trusting AI with important decisions
Trusting AI with important decisionsTrusting AI with important decisions
Trusting AI with important decisions
Β 
Cybersecurity for Marketing
Cybersecurity for Marketing Cybersecurity for Marketing
Cybersecurity for Marketing
Β 
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...
TECHNOLOGY: Solution to our woos not Politicians & INTERNET of THINGS in Nuts...
Β 
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3
Huntsville All Clearances or Cyber Virtual Job Fair Handbook June 3
Β 
Database Design Disasters
Database Design DisastersDatabase Design Disasters
Database Design Disasters
Β 
IOT presentation
IOT presentationIOT presentation
IOT presentation
Β 
What You Need to Know About Robotic Process Automation: How It Works & Real-W...
What You Need to Know About Robotic Process Automation: How It Works & Real-W...What You Need to Know About Robotic Process Automation: How It Works & Real-W...
What You Need to Know About Robotic Process Automation: How It Works & Real-W...
Β 
What i learned at the infosecurity isaca north america expo and conference 2019
What i learned at the infosecurity isaca north america expo and conference 2019What i learned at the infosecurity isaca north america expo and conference 2019
What i learned at the infosecurity isaca north america expo and conference 2019
Β 
Internet of Things Primer
Internet of Things PrimerInternet of Things Primer
Internet of Things Primer
Β 
Supporting IT by David Meares
Supporting IT by David MearesSupporting IT by David Meares
Supporting IT by David Meares
Β 

Real time entity resolution with elasticsearch - haystack 2018

  • 2. 1 Disambiguation Entity Entity Single attributes in unstructured text "Named Entity Recognition" Multiple attributes in structured data "Entity Resolution" vs. Person Field Value Name Alice Jones DOB 1984-01-01 Street 123 Main St Credit Card 4040 0000 2020 8080 Phone 202-555-1234
  • 3. 2 What is entity resolution?
  • 4. Health Care Patient ID We need to identify and their medical many hand-written Mixing up records puts at risk of injury or Sales & Marketing Customer Intel We have reps managing many sources of info on leads and customers. Our view of the buyer is fragmented and that makes us less effective. We're losing pipeline. Security & Compliance Fraud We need to track a person or device that is hiding its tracks. Connecting the dots is a laborious process and we can't keep up with our incident backlog. Military, IC, Law Surveillance We need to track a person or device that is hiding its identity. Our timely success is critical to public safety and national security. Privacy Compliance GDPR We must find and manage all PII to respond to inquiries. Failure to comply risks fines of €20 million or 4% annual turnover. IT MDM MDM is a slow and bureaucratic process. We can solve our own data quality problems faster and better. And we still need query time entity resolution. 3 Examples
  • 5. 4 Why is identity hard to track?
  • 6. Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 5 1. Identity is Vague Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Icons by icons8
  • 7. Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Alison Jones-Smith 555 Brooad Street XYZ Tech 3030 5500 9999 0000 2025559867 6 2. Identity Changes Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allison Smith 555 Broad St XYZ Technology Corp. 3030 5050 9999 0000 202-555-9876 Icons by icons8
  • 8. Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Alison Jones-Smith 555 Brooad Street XYZ Tech 3030 5500 9999 0000 2025559867 7 3. Identity is Messy Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allison Smith 555 Broad St XYZ Technology Corp. 3030 5050 9999 0000 202-555-9876 Icons by icons8
  • 9. 8 4. Identity is Diverse Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Alison Jones-Smith 555 Brooad Street XYZ Tech 3030 5500 9999 0000 2025559867 Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allison Smith 555 Broad St XYZ Technology Corp. 3030 5050 9999 0000 202-555-9876 ??? ??? ??? ??? Icons by icons8
  • 10. 9 Entity Resolution connects the dots despite these challenges
  • 11. Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allie Jones 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234 Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Allie Jones 132 W Main Street ABC Widgets 4040 0000 2020 8080 202 555 1234 Allie Smith 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allie Smith 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234 Ali Smith 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Allie Smith 555 Broad St ABC Widgets, Inc 4040 0000 2020 8080 202-555-1234 Allie Smith 555 Broad Street XYZ Tech Corp 3030 5050 9999 0000 202.555.1234 Allie Smith 555 Broad Street XYZ Technology Corp 3030 5050 9999 0000 202-555-9876 10 Comparison to Search Search Resolution name:"Allie Jones" AND street:"123 Main St" name:"Allie Jones" AND street:"123 Main St" Allie Jones 123 Main St ABC Widgets, Inc. 4040 0000 2020 8080 202-555-1234 Allie Jones 123 Main Street ABC Widgets 4040 0000 2020 8080 202.555.1234 Ali Jones 123 W Main Street ABC Wigdets 4040 0000 2020 8008 +1 (202) 555 1234 Ali Jones 132 Mane Street ABC Widgets 4024 0071 4970 1227 888-555-5555 Aly Jonas 113 Main Street Acme Corp. 4716 1035 4536 4671 610-555-5555 Allie Jones 132 W Main Street ABC Widgets 4040 0000 2020 8080 202-555-9876 Al Jones 132 E Main St Mom & Pop, LLC 3772 733741 52501 1-610-555-0000 Aly Jones 113 Main St, #102 Acme Corp. 4716 1035 4536 4671 610-555-5555 Ali Jones 132 Mane Street ABC Widgets 4024 0071 4970 1227 888-555-1234 Aly Jonas 113 Main Street Acme Corp. 4781 9105 0533 4481 610-555-2345 Allie Johns 132 W Main Street ABC Widgets 4088 0110 2044 8180 202-555-3456 Elle Jeon 132 E Main St Mom & Pop, LLC 3502 730741 52203 1-610-555-4567 Elle Jones 113 Main St, #102 Acme Corp. 4716 1035 4536 4671 610-555-5678 Eli Jones 132 Mane Street ABC Widgets 4224 0065 4800 1337 888-555-6789 Eli Joans 113 Main Street Acme Corp. 
4206 1035 4536 4081 610-555-7890 Allie Jeans 132 N Mean Street ABC Widgets 4240 0101 02020 8888 202-555-8901 Search engine ranks results once. True hits mixed with noise. Search engine filters results recursively. True hits isolated and transitively linked.
  • 13. 12 Batch vs. Real-Time Batch: How is it used? Resolve all entities in advance (partitioning, pairwise scoring, connected components). How long does it take? Docs + (Docs/Partitions)² + Components² (hours for billions of documents). When is it necessary? Population or network analysis. Real-Time: How is it used? Resolve one entity on query (recursive Boolean query). How long does it take? Indices * Attributes * Hops (milliseconds for a handful of each). When is it necessary? Individual analysis. Most solutions have a real-time phase, sometimes applied after batch resolution.
  • 14. Robust matching • Token normalization • Phonetic matching • Fuzzy transpositions • Boolean logic filtering • Fine-tune search parameters 13 Real-Time Why Elasticsearch Suited for operations • Horizontal scaling • Real-time response rates • Flexible index mappings
  • 15. 14 Approach • Fast – Get results in real-time. From milliseconds to low seconds. • Generic – Resolve any type of entity. People, companies, locations, sessions, etc. • Transitive – Resolve over multiple hops of matches. Capture changing identities. • Multi-source – Resolve over multiple indices with disparate mappings. • Accommodating – Operate on data as it exists. Avoid transforming and reindexing data. • Logical – Logic is easier to read, troubleshoot, and optimize than statistics. • 100% Elasticsearch – Operate within existing search infrastructure. Goals
  • 16. 15 Approach 1. Entity modeling – What is the entity? What are its attributes? 2. Analyzers – How are you indexing each attribute? 3. Matchers – What is the query logic for each attribute? 4. Resolvers – What combinations of matching attributes imply a resolution? 5. Metadata maps – Which matchers apply to which indexed fields? 6. Recursive queries – How to repeat the queries until completion? Steps
  • 17. 16 zentity zentity.io An open source Elasticsearch plugin for real-time entity resolution
  • 18. zentity zentity.io An open source Elasticsearch plugin for real-time entity resolution 17 POST _zentity/resolution/person { "attributes": { "name": "Alice Jones", "dob": "1984-01-01", "phone": [ "555-123-4567", "555-987-6543" ] } }
  • 20. 19 Demos Customer intelligence Gather everything we know about a customer. Web traffic sessionization Track a bot that cycles through IP addresses, cookies, and user agent signatures. Fraud detection Determine if a health care provider was blacklisted under a different name.
  • 24. 23 Step 1. Entity Modeling Person Name the entity type. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Define its attributes. Study them in your data sets. Uniqueness Consistency Presence Moderate Moderate High Low Low Low Low Moderate Moderate High High Extreme Extreme Moderate Moderate Low Moderate High High High High Moderate Extreme Extreme Extreme High Extreme High Moderate Moderate High High High Moderate Moderate Moderate Low Low None Icons by icons8
  • 25. 24 Step 1. Entity Modeling Person Name the entity type. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Define its attributes. Study them in your data sets. Uniqueness Consistency Presence Moderate Moderate High Low Low Low Low Moderate Moderate High High Extreme Extreme Moderate Moderate Low Moderate High High High High Moderate Extreme Extreme Extreme High Extreme High Moderate Moderate High High High Moderate Moderate Moderate Low Low None This model is independent from your indices. You can reuse and extend this model as you add or amend indices. Icons by icons8
  • 26. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Phonetic "Alice Jones" => ["ALAC","JAN"] Standard "Alice Jones" => ["ALICE","JONES"] 25 Step 2. Analyzers Take the attributes. Define their analyzers. Put them in your index mappings. { "settings": { "index": { "analysis": { "filter": { "phonetic": { "type": "phonetic", "encoder": "nysiis" } }, "analyzer": { "phonetic": { "filter": [ "icu_normalizer", "icu_folding", "phonetic" ], "tokenizer": "standard" } } } } } } { "mappings": { "_doc": { "properties": { "first_name": { "type": "text", "fields": { "phonetic": { "type": "text", "analyzer": "phonetic" } } } } } } } Person Icons by icons8
  • 27. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Phonetic "Alice Jones" => ["ALAC","JAN"] Standard "Alice Jones" => ["ALICE","JONES"] 26 Step 2. Analyzers Take the attributes. Define their analyzers. Put them in your index mappings. { "settings": { "index": { "analysis": { "filter": { "phonetic": { "type": "phonetic", "encoder": "nysiis" } }, "analyzer": { "phonetic": { "filter": [ "icu_normalizer", "icu_folding", "phonetic" ], "tokenizer": "standard" } } } } } } { "mappings": { "_doc": { "properties": { "first_name": { "type": "text", "fields": { "phonetic": { "type": "text", "analyzer": "phonetic" } } } } } } } Person Analyzers are powerful. But they must be defined prior to indexing. Give careful thought to your analyzers to avoid having to reindex data. Icons by icons8
  • 28. Phonetic { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": 0 } } } Standard { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": 2 } } } 27 Step 3. Matchers Take the attributes. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Define their Boolean query logic. Use templates for variables. Person {{ field }} – The field of an index. {{ value }} – The value of an attribute. We will replace these at query time. Icons by icons8
  • 29. Phonetic { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": 0 } } } Standard { "match": { "{{ field }}": { "query": "{{ value }}", "fuzziness": 2 } } } 28 Step 3. Matchers Take the attributes. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Define their Boolean query logic. Use templates for variables. Person {{ field }} – The field of an index. {{ value }} – The value of an attribute. We will replace these at query time. Understand that each matcher will be combined into one large Boolean query. Icons by icons8
  • 30. 29 Step 4. Resolvers Take the attributes. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Determine which combinations of matching attributes imply a resolution. [ Name – First, Name – Last, Address – Street, Address – City, Address – State ] [ Name – First, Name – Last, Address – Street, Address – Postal Code ] [ Name – First, Name – Last, Date of Birth, Address – City, Address – State ] [ Name – First, Name – Last, Date of Birth, Address – Postal Code ] [ Name – First, Name – Last, Phone Number ] [ Name – First, Name – Last, Email Address ] [ Name – First, Name – Last, IP Address ] [ Name – First, Name – Last, Credit Card Number ] [ Name – First, Name – Last, Social Security Number] [ Email Address, Phone Number ] [ Email Address, IP Address ] [ Email Address, Credit Card Number ] [ IP Address, Credit Card Number ] Person Icons by icons8
  • 31. 30 Step 4. Resolvers Take the attributes. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Determine which combinations of matching attributes imply a resolution. [ Name – First, Name – Last, Address – Street, Address – City, Address – State ] [ Name – First, Name – Last, Address – Street, Address – Postal Code ] [ Name – First, Name – Last, Date of Birth, Address – City, Address – State ] [ Name – First, Name – Last, Date of Birth, Address – Postal Code ] [ Name – First, Name – Last, Phone Number ] [ Name – First, Name – Last, Email Address ] [ Name – First, Name – Last, IP Address ] [ Name – First, Name – Last, Credit Card Number ] [ Name – First, Name – Last, Social Security Number] [ Email Address, Phone Number ] [ Email Address, IP Address ] [ Email Address, Credit Card Number ] [ IP Address, Credit Card Number ] Person Avoid resolving on a single attribute such as Social Security Number. Corroboration among multiple attributes helps prevent snowballs. Icons by icons8
  • 32. 31 Step 5. Metadata Maps Take the attributes. Name – First Name Name – Last Name Address – Street Address – City Address – Province Address – Postal Code Address – Country Date of Birth Phone Number Email Address IP Address Credit Card Number Social Security Number Map them to the fields of the relevant indices. users.first_name users.last_name users.phone users.email customers:fname customers:lname customers:tel customers:email customers:cc customers:zip Person Icons by icons8
  • 33. 32 Step 6. Recursive Queries With each query, new inputs might be found in different attributes. Use the metadata map and your resolvers to determine if you can create new queries for the new inputs.
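
The way resolvers, metadata maps, and recursive queries fit together (Steps 4 to 6) can be illustrated with a minimal Python sketch. It is not zentity's implementation: the in-memory lists stand in for Elasticsearch indices, exact equality stands in for the matchers, and the names `RESOLVERS`, `FIELD_MAP`, `doc_attributes`, `matches`, and `resolve`, as well as the sample records, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Resolvers: combinations of attributes that, when all match, imply a resolution.
RESOLVERS = [
    {"first_name", "last_name", "phone"},
    {"email", "phone"},
]

# Metadata map: which field holds each attribute in each (hypothetical) index.
FIELD_MAP = {
    "users": {"first_name": "first_name", "last_name": "last_name",
              "phone": "phone", "email": "email"},
    "customers": {"first_name": "fname", "last_name": "lname",
                  "phone": "tel", "email": "email"},
}

def doc_attributes(index, doc):
    """Translate a document's fields into entity attributes via the metadata map."""
    return {attr: doc[field] for attr, field in FIELD_MAP[index].items() if field in doc}

def matches(attrs, known):
    """True if some resolver has every one of its attributes equal to a known value."""
    return any(
        all(a in attrs and attrs[a] in known[a] for a in resolver)
        for resolver in RESOLVERS
    )

def resolve(indices, inputs, max_hops=10):
    """Repeat the search until a hop finds no new documents (or max_hops is reached)."""
    known = defaultdict(set)
    for attr, values in inputs.items():
        known[attr].update(values)
    resolved = set()  # (index name, document position) pairs
    for _ in range(max_hops):
        new_hits = []
        for index, docs in indices.items():
            for i, doc in enumerate(docs):
                if (index, i) in resolved:
                    continue
                attrs = doc_attributes(index, doc)
                if matches(attrs, known):
                    new_hits.append((index, i, attrs))
        if not new_hits:
            break
        # Values found in this hop become inputs for the next hop.
        for index, i, attrs in new_hits:
            resolved.add((index, i))
            for attr, value in attrs.items():
                known[attr].add(value)
    return resolved

# Made-up sample data with disparate field names across indices.
INDICES = {
    "users": [
        {"first_name": "Allie", "last_name": "Jones",
         "phone": "202-555-1234", "email": "allie@example.com"},
        {"first_name": "Eli", "last_name": "Jones",
         "phone": "888-555-6789", "email": "eli@example.com"},
    ],
    "customers": [
        {"fname": "Allison", "lname": "Smith",
         "tel": "202-555-1234", "email": "allie@example.com"},
        {"fname": "Allison", "lname": "Smith",
         "tel": "202-555-9876", "email": "asmith@example.com"},
    ],
}

result = resolve(INDICES, {"first_name": ["Allie"], "last_name": ["Jones"],
                           "phone": ["202-555-1234"]})
print(sorted(result))
```

In this sketch the first hop finds the `users` record by name and phone. The email it contributes lets the second hop reach the first `customers` record through the email and phone resolver. The second `customers` record shares only a name with the entity, so no resolver is satisfied and it stays unresolved, which is the kind of corroboration the resolver slide recommends for preventing snowballs.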