Privileged and Confidential
Point Field Types in Solr
Evolution of Range Filters
amikryukov@griddynamics.com
Agenda
1. Recap: From query parser to TopDocsCollector.
2. TermQuery search flow.
3. How are range filters implemented?
4. Optimizations for Range Filters.
5. Point Fields.
Recap: From query parser to TopDocsCollector
? What is a query parser?
? What is the difference between Query and Scorer? And why do you need a Collector?
Recap: Query execution flow
LeafReader
Recap: From query parser to TopDocsCollector
? What is the difference between TermsEnum and PostingsEnum?
Recap: inverted index, terms, posting list
TermQuery search flow
q=Price:10
TermQuery(field='Price', val='10')
TermQuery → TermWeight → TermScorer
TermQuery
Idea: iterate over the posting list of the term `10`.
q=Price:10
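The idea above can be sketched in a few lines of Python. This is only a toy model, not Lucene code: the `build_inverted_index` and `term_query` helpers and the sample documents are hypothetical, and scoring is left out entirely.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted posting list of document ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)

def term_query(index, term):
    """Yield matching doc ids by walking the term's posting list in order."""
    for doc_id in index.get(term, []):
        yield doc_id

# Hypothetical documents: doc id -> values of the Price field.
docs = {1: ["10"], 2: ["25"], 3: ["10", "40"], 4: ["7"]}
price_index = build_inverted_index(docs)

# q=Price:10 touches exactly one posting list.
matches = list(term_query(price_index, "10"))
print(matches)
```

The cost of the query is proportional to the length of a single posting list, which is the baseline the range-filter slides compare against.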
TermQuery source code
TermScorer source code
How are range filters implemented?
term -> document ids
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
q=PRICE:[423 TO 642]
q=PRICE:[423 TO 642] → q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
MultiTermQuery
Naive implementation
In total: 11 should clauses, one per indexed term inside the range.
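A minimal Python sketch of this naive expansion, using the term dictionary from the slide. The `naive_range_query` helper is hypothetical and only models the idea of one should clause per term; it is not how Lucene's MultiTermQuery is implemented.

```python
import bisect

# Term dictionary from the slide: term -> posting list (document ids).
POSTINGS = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}

def naive_range_query(postings, low, high):
    """Expand [low, high] into one clause per indexed term, then union the postings."""
    terms = sorted(postings)
    first = bisect.bisect_left(terms, low)    # first term >= low
    last = bisect.bisect_right(terms, high)   # one past the last term <= high
    clauses = terms[first:last]
    doc_ids = set()
    for term in clauses:
        doc_ids.update(postings[term])
    return clauses, sorted(doc_ids)

clauses, doc_ids = naive_range_query(POSTINGS, 423, 642)
print(len(clauses), doc_ids)
```

The number of clauses grows with the number of distinct terms inside the range, which is what the following optimizations try to reduce.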
Optimizations for Range Filters
? How can we improve the naive implementation of RangeFilterQuery?
Original values
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
Trie
Trie*Field index time
Additional values
Shift 1:
42* -> [1, 2]
44* -> [3, 4]
52* -> [5, 7]
63* -> [5, 6, 7]
64* -> [5, 6, 7]
Shift 2:
4** -> [1, 2, 3, 4]
5** -> [5, 7]
6** -> [5, 6, 7]
Shift 0 corresponds to the original values.
Exploit the Trie*Field (since Lucene 2.9)
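The index-time step can be illustrated with a small Python sketch. It is a decimal simplification made up for this example: the `trie_terms` and `build_trie_index` helpers are hypothetical, and the real implementation works on binary prefixes rather than decimal digits.

```python
from collections import defaultdict

# Original values from the slide: term -> posting list (document ids).
POSTINGS = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}

def trie_terms(value, width=3):
    """Return the term for every shift: 421 -> ['421', '42*', '4**']."""
    digits = str(value).zfill(width)
    return [digits[:width - shift] + "*" * shift for shift in range(width)]

def build_trie_index(postings, width=3):
    """Index each value under its original term and all of its prefix terms."""
    index = defaultdict(set)
    for value, doc_ids in postings.items():
        for term in trie_terms(value, width):
            index[term].update(doc_ids)
    return {term: sorted(ids) for term, ids in index.items()}

trie_index = build_trie_index(POSTINGS)
for term in ("42*", "63*", "4**"):
    print(term, trie_index[term])
```

Every value is indexed once per shift, so the term dictionary grows, but each prefix term already holds the union of the postings below it.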
Trie*Field query time
In total: 6 should clauses in the end (423, 44*, 5**, 63*, 641, 642).
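The query-time step can be sketched the same way. The code below is a decimal toy model with hypothetical helper names (`split_range`, `trie_range_query`), not Lucene's implementation: it covers the range with the widest possible prefix terms and keeps only the terms that actually exist in the index.

```python
from collections import defaultdict

POSTINGS = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}
WIDTH = 3  # assumed fixed number of decimal digits per value

def build_trie_index(postings, width=WIDTH):
    """Index every value under its original term and all prefix terms."""
    index = defaultdict(set)
    for value, doc_ids in postings.items():
        digits = str(value).zfill(width)
        for shift in range(width):
            index[digits[:width - shift] + "*" * shift].update(doc_ids)
    return index

def split_range(low, high, width=WIDTH):
    """Cover [low, high] with the widest prefix terms that fit entirely inside it."""
    terms = []

    def visit(prefix, shift):
        start = prefix * 10 ** shift
        end = start + 10 ** shift - 1
        if end < low or start > high:
            return  # prefix lies completely outside the range
        if low <= start and end <= high:
            terms.append(str(prefix).zfill(width - shift) + "*" * shift)
            return  # prefix lies completely inside: one term is enough
        for digit in range(10):  # partial overlap: descend one digit
            visit(prefix * 10 + digit, shift - 1)

    for digit in range(10):
        visit(digit, width - 1)
    return terms

def trie_range_query(index, low, high, width=WIDTH):
    """Keep only covering terms that exist in the index and union their postings."""
    clauses = [t for t in split_range(low, high, width) if t in index]
    doc_ids = set()
    for term in clauses:
        doc_ids |= index[term]
    return clauses, sorted(doc_ids)

clauses, doc_ids = trie_range_query(build_trie_index(POSTINGS), 423, 642)
print(clauses, doc_ids)
```

For q=PRICE:[423 TO 642] this yields the same documents as the naive expansion with fewer clauses, because `44*`, `5**` and `63*` each replace several original terms.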
Isn't that enough? What about the distribution of terms?
The trie-based approach does not take the distribution of terms into account.
q=PRICE:[100 TO 2002222]
Original values
1 -> [1]
100 -> [2]
2000001 -> [3]
2000022 -> [3]
2000222 -> [4]
2002222 -> [5]
50000005 -> [7]
Isn't that enough?
IO efficiency:
● We need to store all original and additional values.
● We need to read all terms of the field at search time.
Additional values
10* -> [2]
1** -> [1, 2]
200002* -> [3]
200022* -> [4]
20002** -> [4]
200**** -> [3, 4, 5]
200222* -> [5]
20022** -> [5]
2002*** -> [5]
Point Fields
Point fields (since Lucene 6.0) replace the now-deprecated numeric fields (Trie*Field) and the numeric range query: they have better overall performance and are more general, supporting multiple dimensions.
● Based on the Bkd-tree ("Bkd-Tree: A Dynamic Scalable kd-Tree"). It naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields, which always index the same precision levels for every value regardless of how the points are distributed.
● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index structure to locate the blocks at search time.
● Supports multi-dimensional points (maps, 3D models).
Bkd-Tree
Binary Space Partitioning tree
B - Blocked
Number of points in the cell = 2
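A one-dimensional Python sketch of the blocked structure, with at most 2 points per cell as on the slide. The `build_bkd` and `leaves` helpers and the dictionary-based node layout are assumptions made for illustration; a real Bkd-tree handles several dimensions and stores its leaf blocks on disk.

```python
def build_bkd(points, max_leaf=2):
    """Build a 1-D blocked tree from (value, doc_id) pairs."""
    if not points:
        raise ValueError("at least one point is required")
    return _build(sorted(points), max_leaf)

def _build(points, max_leaf):
    # A cell that is small enough becomes a leaf block.
    if len(points) <= max_leaf:
        return {"min": points[0][0], "max": points[-1][0], "points": points}
    # Otherwise split at the median, so both halves hold about the same
    # number of points regardless of how the values are distributed.
    mid = len(points) // 2
    left = _build(points[:mid], max_leaf)
    right = _build(points[mid:], max_leaf)
    return {"min": left["min"], "max": right["max"], "children": [left, right]}

def leaves(node):
    """Return the leaf blocks from left to right."""
    if "points" in node:
        return [node["points"]]
    return [block for child in node["children"] for block in leaves(child)]

# Sparse values from the earlier slide: (value, doc_id).
POINTS = [(1, 1), (100, 2), (2000001, 3), (2000022, 3),
          (2000222, 4), (2002222, 5), (50000005, 7)]

tree = build_bkd(POINTS, max_leaf=2)
for block in leaves(tree):
    print(block)
```

Because the split is driven by the points themselves, the seven sparse values end up in four small blocks instead of producing a prefix term at every precision level.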
Bkd-Tree adapts to particular distribution
Example from https://www.elastic.co/blog/lucene-points-6.0
Point Fields: index time
Disk
Heap
In Lucene, a cell holds up to 1024 points.
Point Fields: search time
Disk
Heap
q=PRICE:[100 TO 2002222]
● If a block only partially overlaps the query, we have to check every value inside it.
● If a block is fully contained within the query, the documents with values in that cell are collected efficiently, without testing each point.
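The two cases can be sketched in Python on a one-dimensional toy tree with 2 points per block. The `build`, `all_docs` and `search` helpers and the `stats` counters are hypothetical names for this example and do not correspond to Lucene's API.

```python
def build(points, max_leaf=2):
    """Build a 1-D blocked tree from sorted (value, doc_id) pairs."""
    if len(points) <= max_leaf:
        return {"min": points[0][0], "max": points[-1][0], "points": points}
    mid = len(points) // 2
    left, right = build(points[:mid], max_leaf), build(points[mid:], max_leaf)
    return {"min": left["min"], "max": right["max"], "children": [left, right]}

def all_docs(node):
    """Collect every doc id below a node without looking at the values."""
    if "points" in node:
        return [doc for _, doc in node["points"]]
    return [doc for child in node["children"] for doc in all_docs(child)]

def search(node, low, high, stats):
    if node["max"] < low or node["min"] > high:
        return []                      # cell outside the query: skip it
    if low <= node["min"] and node["max"] <= high:
        docs = all_docs(node)          # cell fully contained: bulk collect
        stats["bulk"] += len(docs)
        return docs
    if "points" in node:               # leaf partially overlaps: test each value
        stats["checked"] += len(node["points"])
        return [doc for value, doc in node["points"] if low <= value <= high]
    return [doc for child in node["children"]
            for doc in search(child, low, high, stats)]

POINTS = sorted([(1, 1), (100, 2), (2000001, 3), (2000022, 3),
                 (2000222, 4), (2002222, 5), (50000005, 7)])
stats = {"bulk": 0, "checked": 0}
result = sorted(set(search(build(POINTS), 100, 2002222, stats)))
print(result, stats)
```

In this run only the block that straddles the upper bound has its values tested one by one; the other matching blocks are collected in bulk.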
Performance testing (Lucene 6.0)
Point Fields
Links
Numeric Range Queries in Lucene/Solr
http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html
Lucene Search Essentials: Scorers, Collectors and Custom Queries
https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13
Multi-dimensional points, coming in Apache Lucene 6.0
https://www.elastic.co/blog/lucene-points-6.0
Bkd-Tree: A Dynamic Scalable kd-Tree
https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
The Evolution of Lucene & Solr Numerics from Strings to Points
https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save

More Related Content

What's hot

Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveDataWorks Summit
 
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...confluent
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query OptimizationMongoDB
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesTaro L. Saito
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1Jean-François Gagné
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScaleMariaDB plc
 
ODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live DiskODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live DiskRuggero Citton
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group ReplicationKenny Gryp
 
Paris Redis Meetup Introduction
Paris Redis Meetup IntroductionParis Redis Meetup Introduction
Paris Redis Meetup IntroductionGregory Boissinot
 
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorialMySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorialFrederic Descamps
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
Exadata db node update
Exadata db node updateExadata db node update
Exadata db node updatepat2001
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바NeoClova
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication confluent
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...HostedbyConfluent
 

What's hot (20)

Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache Hive
 
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...
Apache kafka meet_up_zurich_at_swissre_from_zero_to_hero_with_kafka_connect_2...
 
Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 
ODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live DiskODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live Disk
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
Paris Redis Meetup Introduction
Paris Redis Meetup IntroductionParis Redis Meetup Introduction
Paris Redis Meetup Introduction
 
Elk
Elk Elk
Elk
 
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorialMySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
Exadata db node update
Exadata db node updateExadata db node update
Exadata db node update
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 

Similar to Point field types in Solr. Evolution of the Range Queries.

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comJungsu Heo
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sqlj9soto
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupSease
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xshradha ambekar
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1Jungsu Heo
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)Jason Huynh
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersJonathan Levin
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2Itamar Haber
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB RoadmapMongoDB
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive DataArt
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksNeo4j
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxjexp
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatVimal Das Kammath
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB RoadmapMongoDB
 

Similar to Point field types in Solr. Evolution of the Range Queries. (20)

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sql
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & Tricks
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
 

Recently uploaded

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Recently uploaded (20)

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Point field types in Solr. Evolution of the Range Queries.

  • 1. Privileged and Confidential Point Field Types in Solr Evolution of Range Filters amikryukov@griddynamics.com
  • 2. Agenda: 1. Recap: from query parser to TopDocsCollector. 2. TermQuery search flow. 3. How are range filters implemented? 4. Optimizations for range filters. 5. Point fields.
  • 3. Recap: From query parser to TopDocsCollector. What is a query parser?
  • 4. Recap: From query parser to TopDocsCollector. What is a query parser? What is the difference between a Query and a Scorer, and why do you need a Collector?
  • 5. Recap: Query execution flow. LeafReader
  • 6. Recap: From query parser to TopDocsCollector. What is a query parser? What is the difference between a Query and a Scorer? What is the difference between TermsEnum and PostingsEnum?
  • 7. Recap: inverted index, terms, posting list.
  • 8. TermQuery search flow. q=Price:10
  • 9. TermQuery search flow. q=Price:10 becomes TermQuery(field='Price', val='10'). Execution chain: TermQuery -> TermWeight -> TermScorer.
  • 10. TermQuery. Idea: iterate over the posting list of the term `10`. q=Price:10
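The idea on this slide can be sketched in a few lines. This is a toy illustration, not Lucene code: the in-memory dictionary, its terms and doc ids, and the helper `term_query` are all assumptions made up for the example.

```python
# Toy inverted index for the Price field: term -> sorted posting list of doc ids.
# Terms and doc ids are invented for illustration only.
PRICE_POSTINGS = {
    "10": [1, 4, 9],
    "20": [2, 4],
    "35": [3],
}


def term_query(postings, term):
    """Yield matching doc ids by walking the posting list of a single term."""
    for doc_id in postings.get(term, []):
        yield doc_id


# q=Price:10 touches exactly one posting list.
matches = list(term_query(PRICE_POSTINGS, "10"))
```

A term that is absent from the dictionary simply yields no documents, so the cost of the query is proportional to the length of one posting list.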
  • 13. How are range filters implemented? Term -> document ids: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. q=PRICE:[423 TO 642]
  • 14. How are range filters implemented? Term -> document ids: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. q=PRICE:[423 TO 642] -> q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
  • 15. MultiTermQuery. Term -> document ids: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. q=PRICE:[423 TO 642] -> q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
  • 16. Naive implementation. Term -> document ids: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. q=PRICE:[423 TO 642] -> q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642. In total: 11 SHOULD clauses.
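The naive expansion can be sketched as follows, using the term table from the slide. The integer-keyed dictionary and the helper `naive_range_query` are assumptions for illustration and do not mirror Lucene's API.

```python
from bisect import bisect_left, bisect_right

# Term -> doc ids, copied from the slide.
POSTINGS = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4], 521: [5], 522: [7],
    632: [5], 633: [6], 634: [7], 641: [5], 642: [6], 644: [7],
}


def naive_range_query(postings, lo, hi):
    """Expand [lo, hi] into one clause per indexed term, then union the postings."""
    terms = sorted(postings)
    # Every indexed term inside the range becomes its own SHOULD clause.
    clauses = terms[bisect_left(terms, lo):bisect_right(terms, hi)]
    docs = set()
    for term in clauses:
        docs.update(postings[term])
    return clauses, sorted(docs)


clauses, docs = naive_range_query(POSTINGS, 423, 642)
```

The number of clauses grows with the number of distinct terms inside the range, which is what the optimizations on the following slides try to reduce.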
  • 17. Optimizations for range filters. How can we improve the naive implementation of RangeFilterQuery? Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7].
  • 18. Trie. Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7].
  • 19. Trie*Field, index time (since Lucene 2.9). Original values (shift 0): 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. Additional values (shift 1): 42* -> [1, 2]; 44* -> [3, 4]; 52* -> [5, 7]; 63* -> [5, 6, 7]; 64* -> [5, 6, 7]. Additional values (shift 2): 4** -> [1, 2, 3, 4]; 5** -> [5, 7]; 6** -> [5, 6, 7]. Exploit the Trie*Field.
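A rough sketch of the index-time step, under the slide's simplification that one shift drops one decimal digit (Lucene's Trie fields shift by `precisionStep` bits instead). The helpers `trie_terms` and `build_trie_index` are hypothetical names, not Lucene API.

```python
from collections import defaultdict

# Original term -> doc ids, copied from the slide.
ORIGINAL = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4], 521: [5], 522: [7],
    632: [5], 633: [6], 634: [7], 641: [5], 642: [6], 644: [7],
}


def trie_terms(value, digits=3):
    """Return the term for every shift: 423 -> ['423', '42*', '4**']."""
    s = str(value).zfill(digits)
    return [s[:digits - shift] + "*" * shift for shift in range(digits)]


def build_trie_index(original, digits=3):
    """Index each value under its full-precision term and all of its prefixes."""
    index = defaultdict(set)
    for value, docs in original.items():
        for term in trie_terms(value, digits):
            index[term].update(docs)
    return {term: sorted(docs) for term, docs in index.items()}


index = build_trie_index(ORIGINAL)
```

Each value is written once per shift, so the term dictionary grows from 13 to 21 entries here; that extra storage is the price paid for fewer clauses at query time.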
  • 20. Trie*Field, query time. Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. Additional values: 42* -> [1, 2]; 44* -> [3, 4]; 52* -> [5, 7]; 63* -> [5, 6, 7]; 64* -> [5, 6, 7]; 4** -> [1, 2, 3, 4]; 5** -> [5, 7]; 6** -> [5, 6, 7]. Exploit the Trie*Field. In total: 6 SHOULD clauses in the end.
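The query-time side can be sketched like this, again assuming the slide's decimal-digit trie rather than Lucene's bit-based `precisionStep`. The range is covered top-down with the coarsest prefixes that fit entirely inside it, and only prefixes that actually exist in the index become clauses. All helper names are hypothetical.

```python
# Original term -> doc ids, copied from the slide.
ORIGINAL = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4], 521: [5], 522: [7],
    632: [5], 633: [6], 634: [7], 641: [5], 642: [6], 644: [7],
}


def build_index(original, digits=3):
    """Index every value under its exact term and each shorter prefix."""
    index = {}
    for value, docs in original.items():
        s = str(value).zfill(digits)
        for shift in range(digits):
            index.setdefault(s[:digits - shift] + "*" * shift, set()).update(docs)
    return index


def cover(lo, hi, digits=3):
    """Cover [lo, hi] with the coarsest prefix terms lying fully inside it."""
    out = []

    def visit(node_lo, width):
        node_hi = node_lo + 10 ** width - 1
        if node_hi < lo or node_lo > hi:
            return  # cell is outside the range
        if lo <= node_lo and node_hi <= hi and width < digits:
            s = str(node_lo).zfill(digits)
            out.append(s[:digits - width] + "*" * width)
            return  # whole cell is inside: one prefix term is enough
        step = 10 ** (width - 1)
        for i in range(10):  # partial overlap: descend one digit
            visit(node_lo + i * step, width - 1)

    visit(0, digits)
    return out


def trie_range_query(index, lo, hi, digits=3):
    """Keep only the covering terms that exist in the index and union their docs."""
    clauses = [t for t in cover(lo, hi, digits) if t in index]
    docs = set().union(*(index[t] for t in clauses))
    return clauses, sorted(docs)


clauses, docs = trie_range_query(build_index(ORIGINAL), 423, 642)
```

For q=PRICE:[423 TO 642] the surviving clauses are 423, 44*, 5**, 63*, 641 and 642, which matches the six clauses counted on the slide and returns the same documents as the eleven-clause naive expansion.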
  • 21. Isn't it enough? Distribution of terms: the trie-based approach does not take the distribution of the terms into account. q=PRICE:[100 TO 2002222]. Original values: 1 -> [1]; 100 -> [2]; 2000001 -> [3]; 2000022 -> [3]; 2000222 -> [4]; 2002222 -> [5]; 50000005 -> [7].
  • 22. Isn't it enough? I/O efficiency: we need to store all original and additional values, and we need to read all terms of the field at search time. Original values: 1 -> [1]; 100 -> [2]; 2000001 -> [3]; 2000022 -> [3]; 2000222 -> [4]; 2002222 -> [5]; 50000005 -> [7]. Additional values: 10* -> [2]; 1** -> [1, 2]; 200002* -> [3]; 200022* -> [4]; 20002** -> [4]; 200**** -> [3, 4, 5]; 200222* -> [5]; 20022** -> [5]; 2002*** -> [5].
  • 23. Point Fields (since Lucene 6.0). This feature replaces the now-deprecated numeric fields (Trie*Field) and the numeric range query, since it has better overall performance and is more general, allowing multiple dimensions. ● Based on the Bkd-tree ("Bkd-Tree: A Dynamic Scalable kd-Tree"), which naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields, which always index the same precision levels for every value regardless of how the points are distributed. ● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index structure to locate the blocks at search time. ● Supports multi-dimensional points (maps, 3D models).
  • 24. Bkd-Tree. Binary space partitioning tree; the "B" stands for "blocked". Number of points in the cell = 2.
  • 25. Bkd-Tree adapts to the particular distribution. Example from https://www.elastic.co/blog/lucene-points-6.0
  • 26. Point Fields: index time. Disk, heap. In Lucene, the number of points in a cell is 1024.
  • 27. Point Fields: search time. Disk, heap. q=PRICE:[100 TO 2002222]. If a block overlaps with the query, we have to check every value inside it. If a block is fully contained within the query, the documents with values in that cell are collected efficiently without having to test each point.
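The two search-time cases can be sketched with a one-dimensional, in-memory toy tree. This is an assumption-heavy illustration, not Lucene's BKD implementation: there are no on-disk blocks, only one dimension, a cell size of 2 as on the earlier Bkd-Tree slide, and the names `Node`, `build`, `collect_all` and `range_query` are made up. The data is the set of original values from the distribution slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (value, doc id)


@dataclass
class Node:
    lo: int                                # smallest value in this cell
    hi: int                                # largest value in this cell
    points: Optional[List[Point]] = None   # set on leaf blocks only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, max_leaf=2):
    """Split the sorted points at the median until each block holds <= max_leaf."""
    points = sorted(points)
    node = Node(points[0][0], points[-1][0])
    if len(points) <= max_leaf:
        node.points = points
    else:
        mid = len(points) // 2
        node.left = build(points[:mid], max_leaf)
        node.right = build(points[mid:], max_leaf)
    return node


def collect_all(node):
    """Collect every doc in a cell without comparing any value to the query."""
    if node.points is not None:
        return {doc for _, doc in node.points}
    return collect_all(node.left) | collect_all(node.right)


def range_query(node, lo, hi, stats):
    if node.hi < lo or node.lo > hi:
        return set()                       # cell is outside the query
    if lo <= node.lo and node.hi <= hi:
        return collect_all(node)           # cell fully contained: no per-point test
    if node.points is not None:
        stats["tested"] += len(node.points)  # overlapping block: test each value
        return {doc for value, doc in node.points if lo <= value <= hi}
    return range_query(node.left, lo, hi, stats) | range_query(node.right, lo, hi, stats)


POINTS = [(1, 1), (100, 2), (2000001, 3), (2000022, 3),
          (2000222, 4), (2002222, 5), (50000005, 7)]
tree = build(POINTS)
stats = {"tested": 0}
docs = sorted(range_query(tree, 100, 2002222, stats))
```

With this data only the block holding 2002222 and 50000005 straddles the query boundary, so two values are tested individually, while the fully contained blocks are collected wholesale.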
  • 28. Performance testing (Lucene 6.0)
  • 30. Links. Numeric Range Queries in Lucene/Solr: http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html ; Lucene Search Essentials: Scorers, Collectors and Custom Queries: https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13 ; Multi-dimensional points, coming in Apache Lucene 6.0: https://www.elastic.co/blog/lucene-points-6.0 ; Bkd-Tree: A Dynamic Scalable kd-Tree: https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf ; The Evolution of Lucene & Solr Numerics from Strings to Points: https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save