SlideShare a Scribd company logo
Elastiknn
Elasticsearch Plugin for Nearest Neighbor Search
Alex Klibisz
● Software and infrastructure engineer at CiBO Technologies
○ Carbon credit marketplace for farmland
○ Primarily work on web services and data pipelines using Scala, Postgres, and Kubernetes
● Nearest neighbor search hobbyist
Overview
1. Nearest neighbor search
2. Elastiknn Live Demo
3. Performance, Trade-offs, Alternatives
~30 minutes followed by Q&A
Nearest Neighbor Search
(KNN, Similarity search, Vector search, …)
1.0
1.4
0.5
(1, 1.5)
● Generalizes to vectors with many dimensions (~10 to 10000)
● Vectors represent discrete entities (users, items, images, videos, etc.)
● Distance corresponds to semantic distance
○ Two items w/ similar properties should have smaller distance than two very different items
○ A user should be closer to an item s/he prefers than to an item they don’t
○ Several ways to compute distance: Euclidean, Angular, Manhattan, Jaccard, Hamming
○ Similarity is the inverse of distance
● Exact KNN is usually too slow
○ Scales with the size of dataset
○ Approximate methods tradeoff search quality and speed
Elastiknn Demo
Amazon Reviews Dataset
● Properties, image vectors, reviews for ~9M Amazon products
○ Image vectors computed via convolutional neural network
● Combine standard Elasticsearch queries with nearest neighbors queries
Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering; R.
He, J. McAuley; WWW, 2016
Image-based recommendations on styles and substitutes; J. McAuley, C. Targett, J. Shi, A. van den
Hengel; SIGIR, 2015
Demo…
Elastiknn Functionality Summarized
1. Store dense floating point and sparse boolean vectors
2. Exact and approximate nearest neighbors
a. Dense vectors: L2 (Euclidean), L1 (Manhattan), Angular
b. Sparse vectors: Jaccard, Hamming
3. Integrate with standard ES queries
4. Changes to vectors reflected immediately in search results.
5. Runs entirely in the Elasticsearch JVM
Performance, Alternatives, Trade-offs
Performance = recall vs. queries/second for a specific dataset
Single shard, single thread
> 0.9 recall, ~100 queries/second
3 shards, 9 threads, 20 parallel queries
> 0.9 recall, ~580 queries/second
elastiknn.com/performance
Alternatives
● Elasticsearch X-Pack
○ Supports exact queries on dense vectors since 7.3.0
○ No approximate queries
○ Tolerable latency if you can narrow down to ~10k vectors using other properties
● Opendistro K-NN Plugin
○ Supports exact and approximate queries for Euclidean and Angular similarity
○ ~3x faster than Elastiknn
○ Stores vectors and runs queries using NMSLib sidecar binary
- Stackoverflow: similar image search by pHash distance in Elasticsearch (Sep 2015)
- Paper: Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines (Jun 2017)
- Plugin: Elasticsearch vector scoring (Sep 2017, deprecated)
- Stackoverflow: query nearest neighbors of a point in Elasticsearch (Oct 2017)
- Paper: Towards Practical Visual Search Engine within Elasticsearch (June 2018)
- Paper: Large-Scale Image Retrieval with Elasticsearch (June 2018)
- Plugin (POC): ElastiK Nearest Neighbors (July 2018, deprecated)
- Elasticsearch PR: LSH for approximate nearest neighbour search (July 2019, closed)
- Elasticsearch 7.3: using vectors in document scoring (Aug 2019)
- Paper: Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors (Oct 2019)
- Plugin: Elastiknn (Feb 2020)
- Plugin: Opendistro ANN (April 2020)
- Lucene: Various additions to natively support vectors and graph-based search (ongoing)
Elastiknn Literature Review (Google doc)
Related Work
Trade-offs
Pros
1. You don’t need a separate nearest neighbor search system
2. Changes are reflected immediately in search results
Cons
1. Order of magnitude slower than offline/batch methods
a. Elastiknn still handles 100s of queries/second
b. See ann-benchmarks for fastest overall solutions
Summary
Summary
● Elastiknn brings exact and approximate KNN to Elasticsearch
● Store and search dense and sparse vectors w/ five similarity functions
● Apache 2.0 License, accepting issues and PRs
● Future topics
○ How does it actually work?
○ A few interesting JVM/Lucene performance optimizations
github.com/alexklibisz/elastiknn, www.elastiknn.com
[0.33, -0.1, 0.24, …]
- Vector space modeling on music data (iHeartRadio)
- StarSpace (Facebook)
- Word2Vec Tutorial (Chris McCormick)
- Stop Using Word2Vec (SticthFix, Chris Moody)
- Listing Embeddings in Search Ranking (Airbnb)
- Approximate Nearest Neighbors and Vector Models (Erik Bernhardsson)
- Pinterest Visual Search
?
users
items
images
videos
...
Elasticsearch (x-pack) Opendistro K-NN Elastiknn
JVM-only ✓ ✓
Dense L2, Exact ✓ ✓ ✓
Dense L2, Approx ✓ ✓
Dense Angular, Exact ✓ ✓* ✓
Dense Angular, Approx ✓* ✓
Dense L1, Exact ✓ ✓
Dense L1, Approx
Sparse Jaccard, Exact ✓** ✓
Sparse Jaccard, Approx ✓
Sparse Hamming, Exact ✓** ✓
Sparse Hamming, Approx ✓
* experimental ** deprecated

More Related Content

What's hot

Data Structures Using Object Oriented Programming
Data Structures Using Object Oriented ProgrammingData Structures Using Object Oriented Programming
Data Structures Using Object Oriented Programming
mallikamt
 
Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
Harisankar H
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
Shehaaz Saif
 
Denclue Algorithm - Cluster, Pe
Denclue Algorithm - Cluster, PeDenclue Algorithm - Cluster, Pe
Denclue Algorithm - Cluster, Pe
Tauhidul Khandaker
 
Reduce your Angular state options with NGXS
Reduce your Angular state options with NGXSReduce your Angular state options with NGXS
Reduce your Angular state options with NGXS
Yohan Lasorsa
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
Darshak Mehta
 
Pregel
PregelPregel
Pregel
Weiru Dai
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
Riyad Parvez
 
Making Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems ReproducibleMaking Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems Reproducible
Michel Albonico
 
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
Ruo Ando
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
wolf4ood
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
Tomas Sirny
 
Poster
PosterPoster
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)
Hein Min Htike
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
MLconf
 

What's hot (15)

Data Structures Using Object Oriented Programming
Data Structures Using Object Oriented ProgrammingData Structures Using Object Oriented Programming
Data Structures Using Object Oriented Programming
 
Large scale graph processing
Large scale graph processingLarge scale graph processing
Large scale graph processing
 
Apache cassandra an introduction
Apache cassandra  an introductionApache cassandra  an introduction
Apache cassandra an introduction
 
Denclue Algorithm - Cluster, Pe
Denclue Algorithm - Cluster, PeDenclue Algorithm - Cluster, Pe
Denclue Algorithm - Cluster, Pe
 
Reduce your Angular state options with NGXS
Reduce your Angular state options with NGXSReduce your Angular state options with NGXS
Reduce your Angular state options with NGXS
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
Pregel
PregelPregel
Pregel
 
Pregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph ProcessingPregel: A System For Large Scale Graph Processing
Pregel: A System For Large Scale Graph Processing
 
Making Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems ReproducibleMaking Elasticity Testing of Cloud-Based Systems Reproducible
Making Elasticity Testing of Cloud-Based Systems Reproducible
 
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
A Lock-Free Algorithm of Tree-Based Reduction for Large Scale Clustering on G...
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
 
ElasticSearch as (only) datastore
ElasticSearch as (only) datastoreElasticSearch as (only) datastore
ElasticSearch as (only) datastore
 
Poster
PosterPoster
Poster
 
Peer sim (p2p network)
Peer sim (p2p network)Peer sim (p2p network)
Peer sim (p2p network)
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 

Similar to Using Elastiknn for exact and approximate nearest neighbor search

Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
Sease
 
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
Sease
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
Sease
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
Brodmann17
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
ivan provalov
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
ehtshamelahi
 
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
dbpublications
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
Charlie Hull
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
Divij Sehgal
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Wee Hyong Tok
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
Yves Raimond
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
Darshan Patel
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
ananth
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
ObjectRocket
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
Frank Kelly
 
Elasticsearch Architechture
Elasticsearch ArchitechtureElasticsearch Architechture
Elasticsearch Architechture
Anurag Sharma
 
Comparative Study of Object Detection Algorithms
Comparative Study of Object Detection AlgorithmsComparative Study of Object Detection Algorithms
Comparative Study of Object Detection Algorithms
IRJET Journal
 
Exploring Direct Concept Search
Exploring Direct Concept SearchExploring Direct Concept Search
Exploring Direct Concept Search
Steve Rowe
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
vatsal199567
 

Similar to Using Elastiknn for exact and approximate nearest neighbor search (20)

Dense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdfDense Retrieval with Apache Solr Neural Search.pdf
Dense Retrieval with Apache Solr Neural Search.pdf
 
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
Neural Search Comes to Apache Solr_ Approximate Nearest Neighbor, BERT and Mo...
 
Neural Search Comes to Apache Solr
Neural Search Comes to Apache SolrNeural Search Comes to Apache Solr
Neural Search Comes to Apache Solr
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
 
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
Recognition and Detection of Real-Time Objects Using Unified Network of Faste...
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
 
Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015Spark Meetup @ Netflix, 05/19/2015
Spark Meetup @ Netflix, 05/19/2015
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
Exploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better TogetherExploring MongoDB & Elasticsearch: Better Together
Exploring MongoDB & Elasticsearch: Better Together
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
 
Elasticsearch Architechture
Elasticsearch ArchitechtureElasticsearch Architechture
Elasticsearch Architechture
 
Comparative Study of Object Detection Algorithms
Comparative Study of Object Detection AlgorithmsComparative Study of Object Detection Algorithms
Comparative Study of Object Detection Algorithms
 
Exploring Direct Concept Search
Exploring Direct Concept SearchExploring Direct Concept Search
Exploring Direct Concept Search
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
 

More from FaithWestdorp

Observability from the Home
Observability from the HomeObservability from the Home
Observability from the Home
FaithWestdorp
 
Elasticsearch Goes to Congress
Elasticsearch Goes to CongressElasticsearch Goes to Congress
Elasticsearch Goes to Congress
FaithWestdorp
 
Eliminate your zombie technology ray myers - 11-5-2020
Eliminate your zombie technology   ray myers - 11-5-2020Eliminate your zombie technology   ray myers - 11-5-2020
Eliminate your zombie technology ray myers - 11-5-2020
FaithWestdorp
 
Mejorando las busquedas en nuestras aplicaciones web con elasticsearch
Mejorando las busquedas en nuestras aplicaciones web con elasticsearchMejorando las busquedas en nuestras aplicaciones web con elasticsearch
Mejorando las busquedas en nuestras aplicaciones web con elasticsearch
FaithWestdorp
 
Evolving with Elastic: GetSet Learning
Evolving with Elastic: GetSet LearningEvolving with Elastic: GetSet Learning
Evolving with Elastic: GetSet Learning
FaithWestdorp
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
FaithWestdorp
 
Examining OpenData with a Search Index using Elasticsearch
Examining OpenData with a Search Index using ElasticsearchExamining OpenData with a Search Index using Elasticsearch
Examining OpenData with a Search Index using Elasticsearch
FaithWestdorp
 
From the trenches: scaling a large log management deployment
From the trenches: scaling a large log management deploymentFrom the trenches: scaling a large log management deployment
From the trenches: scaling a large log management deployment
FaithWestdorp
 
Logstash and Maxmind: not just for GEOIP anymore
Logstash and Maxmind: not just for GEOIP anymoreLogstash and Maxmind: not just for GEOIP anymore
Logstash and Maxmind: not just for GEOIP anymore
FaithWestdorp
 
Elasticsearch's aggregations & esctl in action or how i built a cli tool...
Elasticsearch's aggregations & esctl in action  or how i built a cli tool...Elasticsearch's aggregations & esctl in action  or how i built a cli tool...
Elasticsearch's aggregations & esctl in action or how i built a cli tool...
FaithWestdorp
 
Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
 Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex... Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
FaithWestdorp
 
Introduction to machine learning using Elastic
Introduction to machine learning using ElasticIntroduction to machine learning using Elastic
Introduction to machine learning using Elastic
FaithWestdorp
 
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
FaithWestdorp
 
Elastic Observability
Elastic Observability Elastic Observability
Elastic Observability
FaithWestdorp
 
Threat hunting with Elastic APM
Threat hunting with Elastic APMThreat hunting with Elastic APM
Threat hunting with Elastic APM
FaithWestdorp
 
Guide to Data Visualization in Kibana
Guide to Data Visualization in KibanaGuide to Data Visualization in Kibana
Guide to Data Visualization in Kibana
FaithWestdorp
 
Elastic's recommendation on keeping services up and running with real-time vi...
Elastic's recommendation on keeping services up and running with real-time vi...Elastic's recommendation on keeping services up and running with real-time vi...
Elastic's recommendation on keeping services up and running with real-time vi...
FaithWestdorp
 
Esctl in action elastic user group presentation aug 25 2020
Esctl in action   elastic user group presentation aug 25 2020Esctl in action   elastic user group presentation aug 25 2020
Esctl in action elastic user group presentation aug 25 2020
FaithWestdorp
 

More from FaithWestdorp (18)

Observability from the Home
Observability from the HomeObservability from the Home
Observability from the Home
 
Elasticsearch Goes to Congress
Elasticsearch Goes to CongressElasticsearch Goes to Congress
Elasticsearch Goes to Congress
 
Eliminate your zombie technology ray myers - 11-5-2020
Eliminate your zombie technology   ray myers - 11-5-2020Eliminate your zombie technology   ray myers - 11-5-2020
Eliminate your zombie technology ray myers - 11-5-2020
 
Mejorando las busquedas en nuestras aplicaciones web con elasticsearch
Mejorando las busquedas en nuestras aplicaciones web con elasticsearchMejorando las busquedas en nuestras aplicaciones web con elasticsearch
Mejorando las busquedas en nuestras aplicaciones web con elasticsearch
 
Evolving with Elastic: GetSet Learning
Evolving with Elastic: GetSet LearningEvolving with Elastic: GetSet Learning
Evolving with Elastic: GetSet Learning
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
 
Examining OpenData with a Search Index using Elasticsearch
Examining OpenData with a Search Index using ElasticsearchExamining OpenData with a Search Index using Elasticsearch
Examining OpenData with a Search Index using Elasticsearch
 
From the trenches: scaling a large log management deployment
From the trenches: scaling a large log management deploymentFrom the trenches: scaling a large log management deployment
From the trenches: scaling a large log management deployment
 
Logstash and Maxmind: not just for GEOIP anymore
Logstash and Maxmind: not just for GEOIP anymoreLogstash and Maxmind: not just for GEOIP anymore
Logstash and Maxmind: not just for GEOIP anymore
 
Elasticsearch's aggregations & esctl in action or how i built a cli tool...
Elasticsearch's aggregations & esctl in action  or how i built a cli tool...Elasticsearch's aggregations & esctl in action  or how i built a cli tool...
Elasticsearch's aggregations & esctl in action or how i built a cli tool...
 
Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
 Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex... Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
Searching for NLP: Using Elasticsearch to Create MVPs of NLP-enabled User Ex...
 
Introduction to machine learning using Elastic
Introduction to machine learning using ElasticIntroduction to machine learning using Elastic
Introduction to machine learning using Elastic
 
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
Upgrade your attack model: finding and stopping fileless attacks with MITRE A...
 
Elastic Observability
Elastic Observability Elastic Observability
Elastic Observability
 
Threat hunting with Elastic APM
Threat hunting with Elastic APMThreat hunting with Elastic APM
Threat hunting with Elastic APM
 
Guide to Data Visualization in Kibana
Guide to Data Visualization in KibanaGuide to Data Visualization in Kibana
Guide to Data Visualization in Kibana
 
Elastic's recommendation on keeping services up and running with real-time vi...
Elastic's recommendation on keeping services up and running with real-time vi...Elastic's recommendation on keeping services up and running with real-time vi...
Elastic's recommendation on keeping services up and running with real-time vi...
 
Esctl in action elastic user group presentation aug 25 2020
Esctl in action   elastic user group presentation aug 25 2020Esctl in action   elastic user group presentation aug 25 2020
Esctl in action elastic user group presentation aug 25 2020
 

Recently uploaded

Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 

Recently uploaded (20)

Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 

Using Elastiknn for exact and approximate nearest neighbor search

  • 1. Elastiknn Elasticsearch Plugin for Nearest Neighbor Search
  • 2. Alex Klibisz ● Software and infrastructure engineer at CiBO Technologies ○ Carbon credit marketplace for farmland ○ Primarily work on web services and data pipelines using Scala, Postgres, and Kubernetes ● Nearest neighbor search hobbyist
  • 3. Overview 1. Nearest neighbor search 2. Elastiknn Live Demo 3. Performance, Trade-offs, Alternatives ~30 minutes followed by Q&A
  • 4. Nearest Neighbor Search (KNN, Similarity search, Vector search, …)
  • 6. ● Generalizes to vectors with many dimensions (~10 to 10000) ● Vectors represent discrete entities (users, items, images, videos, etc.) ● Distance corresponds to semantic distance ○ Two items w/ similar properties should have smaller distance than two very different items ○ A user should be closer to an item s/he prefers than to an item they don’t ○ Several ways to compute distance: Euclidean, Angular, Manhattan, Jaccard, Hamming ○ Similarity is the inverse of distance ● Exact KNN is usually too slow ○ Scales with the size of dataset ○ Approximate methods tradeoff search quality and speed
  • 8. Amazon Reviews Dataset ● Properties, image vectors, reviews for ~9M Amazon products ○ Image vectors computed via convolutional neural network ● Combine standard Elasticsearch queries with nearest neighbors queries Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering; R. He, J. McAuley; WWW, 2016 Image-based recommendations on styles and substitutes; J. McAuley, C. Targett, J. Shi, A. van den Hengel; SIGIR, 2015
  • 10. Elastiknn Functionality Summarized 1. Store dense floating point and sparse boolean vectors 2. Exact and approximate nearest neighbors a. Dense vectors: L2 (Euclidean), L1 (Manhattan), Angular b. Sparse vectors: Jaccard, Hamming 3. Integrate with standard ES queries 4. Changes to vectors reflected immediately in search results. 5. Runs entirely in the Elasticsearch JVM
  • 12. Performance = recall vs. queries/second for a specific dataset Single shard, single thread > 0.9 recall, ~100 queries/second 3 shards, 9 threads, 20 parallel queries > 0.9 recall, ~580 queries/second
  • 14. Alternatives ● Elasticsearch X-Pack ○ Supports exact queries on dense vectors since 7.3.0 ○ No approximate queries ○ Tolerable latency if you can narrow down to ~10k vectors using other properties ● Opendistro K-NN Plugin ○ Supports exact and approximate queries for Euclidean and Angular similarity ○ ~3x faster than Elastiknn ○ Stores vectors and runs queries using NMSLib sidecar binary
  • 15. - Stackoverflow: similar image search by pHash distance in Elasticsearch (Sep 2015) - Paper: Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines (Jun 2017) - Plugin: Elasticsearch vector scoring (Sep 2017, deprecated) - Stackoverflow: query nearest neighbors of a point in Elasticsearch (Oct 2017) - Paper: Towards Practical Visual Search Engine within Elasticsearch (June 2018) - Paper: Large-Scale Image Retrieval with Elasticsearch (June 2018) - Plugin (POC): ElastiK Nearest Neighbors (July 2018, deprecated) - Elasticsearch PR: LSH for approximate nearest neighbour search (July 2019, closed) - Elasticsearch 7.3: using vectors in document scoring (Aug 2019) - Paper: Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors (Oct 2019) - Plugin: Elastiknn (Feb 2020) - Plugin: Opendistro ANN (April 2020) - Lucene: Various additions to natively support vectors and graph-based search (ongoing) Elastiknn Literature Review (Google doc) Related Work
  • 16. Trade-offs Pros 1. You don’t need a separate nearest neighbor search system 2. Changes are reflected immediately in search results Cons 1. Order of magnitude slower than offline/batch methods a. Elastiknn still handles 100s of queries/second b. See ann-benchmarks for fastest overall solutions
  • 18. Summary ● Elastiknn brings exact and approximate KNN to Elasticsearch ● Store and search dense and sparse vectors w/ five similarity functions ● Apache 2.0 License, accepting issues and PRs ● Future topics ○ How does it actually work? ○ A few interesting JVM/Lucene performance optimizations github.com/alexklibisz/elastiknn, www.elastiknn.com
  • 19.
  • 20. [0.33, -0.1, 0.24, …] - Vector space modeling on music data (iHeartRadio) - StarSpace (Facebook) - Word2Vec Tutorial (Chris McCormick) - Stop Using Word2Vec (SticthFix, Chris Moody) - Listing Embeddings in Search Ranking (Airbnb) - Approximate Nearest Neighbors and Vector Models (Erik Bernhardsson) - Pinterest Visual Search ? users items images videos ...
  • 21. Elasticsearch (x-pack) Opendistro K-NN Elastiknn JVM-only ✓ ✓ Dense L2, Exact ✓ ✓ ✓ Dense L2, Approx ✓ ✓ Dense Angular, Exact ✓ ✓* ✓ Dense Angular, Approx ✓* ✓ Dense L1, Exact ✓ ✓ Dense L1, Approx Sparse Jaccard, Exact ✓** ✓ Sparse Jaccard, Approx ✓ Sparse Hamming, Exact ✓** ✓ Sparse Hamming, Approx ✓ * experimental ** deprecated