SlideShare a Scribd company logo
1 of 55
Approaching Join Index 
yet another one join algorithm 
Mikhail Khludnev 
principal engineer
• Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation 
commerce technology solutions 
• Record of outperformance with Tier 1 retail clients 
• Fortune 1000 client relationships 
PRIVILEGED AND CONFIDENTIAL
About Me 
● principal engineer at Grid Dynamics 
● spoke at few last LuceneRevolutions 
● contributed BlockJoin query parser for Solr - {!parent} 
● blogged about it at http://blog.griddynamics.com/ 
● tried to fix threads at DataImportHandler 
http://google.com/+MikhailKhludnev
You are expected to know 
● how Lucene searches/filters 
● how it counts facets 
● that there are segments 
● what is DocValues 
● why to join 
● RDBMS joins: nested loop join, sort-merge join and hash join.
I’m expected to know 
● query-time join 
● index-time join 
● yet another one join
Lucene/Solr Is Strong 
● searching 
○ filtering 
● analytics 
○ facets 
○ pivots 
○ stats
Lucene/Solr Is Weak in 
● robust joins 
○ multiple entities 
○ relations
Joins in General
Joins in General
Joins in General 
PK=FK
Joins in General 
PK=FK
Joins in General 
PK=FK
children 
Joins in General 
1:M 
parents
Executing Join Query 
q
Executing Join Query 
q
Executing Join Query 
q
Executing Join Query 
q 
fq
Executing Join Query 
q 
fq
Join in General 
parents ∩ join-relation ∩ children
Join in General 
parents ∩ join-relation ∩ children 
● input 
● output 
● enumeration order
JoinUtil 
q 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
JoinUtil 
q 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25” 
“25” 
“17” 
...
JoinUtil 
q 
PK 
“1” →△ 
… 
“17”→△ 
“25”→△ 
... 
“25” 
“17” 
... 
FK[doc#] 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
JoinUtil 
q 
PK FK[doc#] 
“1” →△ 
… 
“17”→△ 
“25”→△ 
... 
“25” 
“17” 
... 
fq 
“17” 
“17” 
“25” 
“25” 
“56” 
“56” 
“56” 
“25” 
“4” 
“61” 
“25”
Block Join 
doc#
Block Join 
doc#
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q 
fq
Block Join 
1 doc# 
0 
0 
1 
0 
0 
1 
0 
0 
1 
0 
q 
fq
Comparison 
JoinUtil BlockJoin 
searching slow fast 
reindexing
Comparison 
JoinUtil BlockJoin 
searching slow fast 
reindexing fast slow
Comparison 
doc# 
JoinUtil BlockJoin 
searching slow fast 
reindexing fast slow
Comparison 
doc# 
JoinUtil BlockJoin 
searching slow fast 
reindexing fast slow
Comparison 
doc# 
JoinUtil BlockJoin 
searching slow fast 
reindexing fast slow
Comparison 
JoinUtil BlockJoin 
searching slow < ? < fast 
reindexing fast > ? > slow
Join Index 
q 
doc#[doc#] 
3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
doc#[doc#] 
3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
doc#[doc#] 
fq 3 
6 
0 
3 
10 
0 
6 
3
Join Index 
q 
fq 
doc#[doc#] 
4 
1 
2 
6 
8 
5 10 
9
Join Index 
q 
doc#[doc#] 
fq 
4 
1 
2 
6 
8 
5 10 
9
Join Index 
q 
doc#[doc#] 
fq 
4 
1 
2 
6 
8 
5 10 
9
Benchmarking 2.9 M docs 
Latency, ms 
the bigger the worse 
27 
14 
10 
BlockJoin 
(i-time) 
Join Index 
JoinUtil 
(q-time) 
https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark
Indexing is still a problem 
doc#[doc#] 
4 
3 
6 
1 
0 
3 
2 
10 
0 
6 
3 
8 
5 10 
9
Further Plans 
● incremental join-index update 
● perhaps just calculate and cache it 
● or put to dedicated index 
● join in both directions 
● calculate optimal execution plan of segments enumeration
Summary 
● Joins in General 
● JoinUtil vs Block-join 
● {!scorejoin } - SOLR-6234 
● updatable DocValues 
● opportunities for improving query-time joins: 
○ eliminate term enum 
○ choose lower cardinality side for enumeration
References 
● Searching relational content with Lucene's BlockJoinQuery 
http://blog.mikemccandless.com/ 
● Solr Experience: search parent-child relations. Part I 
Solr block-join support 
http://blog.griddynamics.com/ 
● https://wiki.apache.org/solr/Join 
● http://www.slideshare.net/martijnvg/document-relations 
● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin } 
● Updatable DocValues Under the Hood 
http://shaierera.blogspot.com/ 
● Subject: How to openIfChanged the most recent merge? 
at: java-dev@lucene.apache.org
Thanks for Joining us! 
http://goo.gl/aYsxiD
Off scope
Joins’ Zoo in Lucene 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join } 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join 
{!parent}
Joins’ Zoo in Lucene 
Workarounds 
● term positions/SpanQueries 
● FieldCollapsing/Grouping 
● term decoration 
○ spatial 
● multivalue fields 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join }, 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join 
{!parent}
Joins’ Zoo in Lucene 
Workarounds 
● term positions/SpanQueries 
● FieldCollapsing/Grouping 
● term decoration 
○ spatial 
● multivalue fields 
True Joins 
● query-time join 
○ JoinUtil 
○ {!join }, 
○ {!scorejoin } - SOLR-6234 
● index-time join aka block-join 
{!parent}
Two phase update problem 
Subject: How to openIfChanged the most recent merge? 
at: java-dev@lucene.apache.org
JoinUtil 
● query-time 
● indexing is fast 
● searching is slow, why? 
○ expensive term enum 
○ single enumeration order 
BlockJoin 
● index-time 
● reindexing whole block is as expensive as mandatory 
● searching is darn fast, however 
○ can’t reorder child docs
Approaching Join Index - Lucene/Solr Revolution 2014

More Related Content

What's hot

Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Lucidworks
 

What's hot (20)

Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Stardog Linked Data Catalog
Stardog Linked Data CatalogStardog Linked Data Catalog
Stardog Linked Data Catalog
 
Access Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataAccess Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked Data
 
Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016Linked Open Data - Masaryk University in Brno 8.11.2016
Linked Open Data - Masaryk University in Brno 8.11.2016
 
Integrate ManifoldCF with Solr
Integrate ManifoldCF with SolrIntegrate ManifoldCF with Solr
Integrate ManifoldCF with Solr
 
Tagging search solution design Advanced edition
Tagging search solution design Advanced editionTagging search solution design Advanced edition
Tagging search solution design Advanced edition
 
RDFa Tutorial
RDFa TutorialRDFa Tutorial
RDFa Tutorial
 
Practical Elasticsearch - real world use cases
Practical Elasticsearch - real world use casesPractical Elasticsearch - real world use cases
Practical Elasticsearch - real world use cases
 
Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)Your Data, Your Search, ElasticSearch (EURUKO 2011)
Your Data, Your Search, ElasticSearch (EURUKO 2011)
 
Elasticsearch - under the hood
Elasticsearch - under the hoodElasticsearch - under the hood
Elasticsearch - under the hood
 
Bea con anatomy-of-web-attack
Bea con anatomy-of-web-attackBea con anatomy-of-web-attack
Bea con anatomy-of-web-attack
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
Side by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSide by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and Solr
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
Solr vs ElasticSearch
Solr vs ElasticSearchSolr vs ElasticSearch
Solr vs ElasticSearch
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017
 

Similar to Approaching Join Index - Lucene/Solr Revolution 2014

48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
Ashwin Kumar
 
Pl sql student guide v 3
Pl sql student guide v 3Pl sql student guide v 3
Pl sql student guide v 3
Nexus
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
tienmixon
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
adkinspaige22
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
poulterbarbara
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQL
Samuel Lampa
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
Databricks
 

Similar to Approaching Join Index - Lucene/Solr Revolution 2014 (20)

Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
Migrating to Database 12c Multitenant - New Opportunities To Get It Right!
 
Qcon beijing 2010
Qcon beijing 2010Qcon beijing 2010
Qcon beijing 2010
 
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
48742447 11g-sql-fundamentals-ii-additional-practices-and-solutions
 
Pl sql student guide v 3
Pl sql student guide v 3Pl sql student guide v 3
Pl sql student guide v 3
 
Building Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic IntegrationBuilding Bridges with Taxonomy: Enabling Semantic Integration
Building Bridges with Taxonomy: Enabling Semantic Integration
 
MySQL as a Document Store
MySQL as a Document StoreMySQL as a Document Store
MySQL as a Document Store
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx# 107444   Cust Rothwell   AuCengage  Pg. No. ii Title  I.docx
# 107444 Cust Rothwell AuCengage Pg. No. ii Title I.docx
 
OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]
OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]
OpenJDK: How to Join In on All the Fun [JavaOne 2017 CON3667]
 
Iterative Methodology for Personalization Models Optimization
 Iterative Methodology for Personalization Models Optimization Iterative Methodology for Personalization Models Optimization
Iterative Methodology for Personalization Models Optimization
 
Webinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in controlWebinar: Deploy Microsoft Teams and stay in control
Webinar: Deploy Microsoft Teams and stay in control
 
Bring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptxBring your Content into Focus using Data Driven Analytics.pptx
Bring your Content into Focus using Data Driven Analytics.pptx
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
OUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source CodeOUGLS 2016: Guided Tour On The MySQL Source Code
OUGLS 2016: Guided Tour On The MySQL Source Code
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQL
 
Ensure Optimal Performance and Scalability: Implementing a Robust and Reliabl...
Ensure Optimal Performance and Scalability: Implementing a Robust and Reliabl...Ensure Optimal Performance and Scalability: Implementing a Robust and Reliabl...
Ensure Optimal Performance and Scalability: Implementing a Robust and Reliabl...
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 

More from Grid Dynamics

Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Grid Dynamics
 

More from Grid Dynamics (20)

Are you keeping up with your customer
Are you keeping up with your customer Are you keeping up with your customer
Are you keeping up with your customer
 
"Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,..."Implementing data quality automation with open source stack" - Max Martynov,...
"Implementing data quality automation with open source stack" - Max Martynov,...
 
"How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like..."How to build cool & useful voice commerce applications (such as devices like...
"How to build cool & useful voice commerce applications (such as devices like...
 
"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D"Challenges for AI in Healthcare" - Peter Graven Ph.D
"Challenges for AI in Healthcare" - Peter Graven Ph.D
 
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
Dynamic Talks: "Applications of Big Data, Machine Learning and Artificial Int...
 
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
Dynamic Talks: "Digital Transformation in Banking & Financial Services… a per...
 
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
Dynamic Talks: "Data Strategy as a Conduit for Data Maturity and Monetization...
 
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
Dynamics Talks: "Writing Spark Pipelines with Less Boilerplate Code" - Egor P...
 
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul..."Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
"Trends in Building Advanced Analytics Platform for Large Enterprises" - Atul...
 
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
The New Era of Public Safety Records Management: Dynamic talks Chicago 9/24/2019
 
Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...Dynamic Talks: "Implementing data quality automation with open source stack" ...
Dynamic Talks: "Implementing data quality automation with open source stack" ...
 
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav..."Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
"Implementing AI for New Business Models and Efficiencies" - Parag Shrivastav...
 
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
Reducing No-shows and Late Cancelations in Healthcare Enterprise" - Shervin M...
 
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
Customer intelligence: a Machine Learning Approach: Dynamic talks Atlanta 8/2...
 
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud..."ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
"ML Services - How do you begin and when do you start scaling?" - Madhura Dud...
 
Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...Realtime Contextual Product Recommendations…that scale and generate revenue -...
Realtime Contextual Product Recommendations…that scale and generate revenue -...
 
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
Decision Automation in Marketing Systems using Reinforcement Learning: Dynami...
 
Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...Best practices for enterprise-grade microservices implementations with Google...
Best practices for enterprise-grade microservices implementations with Google...
 
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
Attribution Modelling 101: Credit Where Credit is Due!: Dynamic talks Seattle...
 
Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...Building an algorithmic price management system using ML: Dynamic talks Seatt...
Building an algorithmic price management system using ML: Dynamic talks Seatt...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Approaching Join Index - Lucene/Solr Revolution 2014

  • 1. Approaching Join Index yet another one join algorithm Mikhail Khludnev principal engineer
  • 2. • Grid Dynamics is a Silicon Valley-based leading provider of scalable, next-generation commerce technology solutions • Record of outperformance with Tier 1 retail clients • Fortune 1000 client relationships PRIVILEGED AND CONFIDENTIAL
  • 3. About Me ● principal engineer at Grid Dynamics ● spoke at few last LuceneRevolutions ● contributed BlockJoin query parser for Solr - {!parent} ● blogged about it at http://blog.griddynamics.com/ ● tried to fix threads at DataImportHandler http://google.com/+MikhailKhludnev
  • 4. You are expected to know ● how Lucene searches/filters ● how it counts facets ● that there are segments ● what is DocValues ● why to join ● RDBMS joins: nested loop join, sort-merge join and hash join.
  • 5. I’m expected to know ● query-time join ● index-time join ● yet another one join
  • 6. Lucene/Solr Is Strong ● searching ○ filtering ● analytics ○ facets ○ pivots ○ stats
  • 7. Lucene/Solr Is Weak in ● robust joins ○ multiple entities ○ relations
  • 13. children Joins in General 1:M parents
  • 19. Join in General parents ∩ join-relation ∩ children
  • 20. Join in General parents ∩ join-relation ∩ children ● input ● output ● enumeration order
  • 21. JoinUtil q FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 22. JoinUtil q FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25” “25” “17” ...
  • 23. JoinUtil q PK “1” →△ … “17”→△ “25”→△ ... “25” “17” ... FK[doc#] “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 24. JoinUtil q PK FK[doc#] “1” →△ … “17”→△ “25”→△ ... “25” “17” ... fq “17” “17” “25” “25” “56” “56” “56” “25” “4” “61” “25”
  • 27. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0
  • 28. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q
  • 29. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q fq
  • 30. Block Join 1 doc# 0 0 1 0 0 1 0 0 1 0 q fq
  • 31. Comparison JoinUtil BlockJoin searching slow fast reindexing
  • 32. Comparison JoinUtil BlockJoin searching slow fast reindexing fast slow
  • 33. Comparison doc# JoinUtil BlockJoin searching slow fast reindexing fast slow
  • 34. Comparison doc# JoinUtil BlockJoin searching slow fast reindexing fast slow
  • 35. Comparison doc# JoinUtil BlockJoin searching slow fast reindexing fast slow
  • 36. Comparison JoinUtil BlockJoin searching slow < ? < fast reindexing fast > ? > slow
  • 37. Join Index q doc#[doc#] 3 6 0 3 10 0 6 3
  • 38. Join Index q doc#[doc#] 3 6 0 3 10 0 6 3
  • 39. Join Index q doc#[doc#] fq 3 6 0 3 10 0 6 3
  • 40. Join Index q fq doc#[doc#] 4 1 2 6 8 5 10 9
  • 41. Join Index q doc#[doc#] fq 4 1 2 6 8 5 10 9
  • 42. Join Index q doc#[doc#] fq 4 1 2 6 8 5 10 9
  • 43. Benchmarking 2.9 M docs Latency, ms the bigger the worse 27 14 10 BlockJoin (i-time) Join Index JoinUtil (q-time) https://github.com/m-khl/lucene-solr/tree/dvjoin-benchmark
  • 44. Indexing is still a problem doc#[doc#] 4 3 6 1 0 3 2 10 0 6 3 8 5 10 9
  • 45. Further Plans ● incremental join-index update ● perhaps just calculate and cache it ● or put to dedicated index ● join in both directions ● calculate optimal execution plan of segments enumeration
  • 46. Summary ● Joins in General ● JoinUtil vs Block-join ● {!scorejoin } - SOLR-6234 ● updatable DocValues ● opportunities for improving query-time joins: ○ eliminate term enum ○ choose lower cardinality side for enumeration
  • 47. References ● Searching relational content with Lucene's BlockJoinQuery http://blog.mikemccandless.com/ ● Solr Experience: search parent-child relations. Part I Solr block-join support http://blog.griddynamics.com/ ● https://wiki.apache.org/solr/Join ● http://www.slideshare.net/martijnvg/document-relations ● https://issues.apache.org/jira/browse/SOLR-6234 {!scorejoin } ● Updatable DocValues Under the Hood http://shaierera.blogspot.com/ ● Subject: How to openIfChanged the most recent merge? at: java-dev@lucene.apache.org
  • 48. Thanks for Joining us! http://goo.gl/aYsxiD
  • 50. Joins’ Zoo in Lucene True Joins ● query-time join ○ JoinUtil ○ {!join } ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {!parent}
  • 51. Joins’ Zoo in Lucene Workarounds ● term positions/SpanQueries ● FieldCollapsing/Grouping ● term decoration ○ spatial ● multivalue fields True Joins ● query-time join ○ JoinUtil ○ {!join }, ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {!parent}
  • 52. Joins’ Zoo in Lucene Workarounds ● term positions/SpanQueries ● FieldCollapsing/Grouping ● term decoration ○ spatial ● multivalue fields True Joins ● query-time join ○ JoinUtil ○ {!join }, ○ {!scorejoin } - SOLR-6234 ● index-time join aka block-join {!parent}
  • 53. Two phase update problem Subject: How to openIfChanged the most recent merge? at: java-dev@lucene.apache.org
  • 54. JoinUtil ● query-time ● indexing is fast ● searching is slow, why? ○ expensive term enum ○ single enumeration order BlockJoin ● index-time ● reindexing whole block is as expensive as mandatory ● searching is darn fast, however ○ can’t reorder child docs