The document summarizes recent developments in spatial and temporal search capabilities in Apache Lucene and Solr. It describes new features like the FlexPrefixTree for more optimized spatial indexing, approaches for indexing and searching date ranges using numeric prefix trees, and upcoming work on spatial heatmaps and term auto-prefixing to improve search performance. The presentation was given by David Smiley, a Lucene/Solr committer and expert, to provide an overview of the latest improvements.
3. Agenda
Spatial
• Polygons and Accuracy: SerializedDVStrategy
• FlexPrefixTree
• BBoxSpatialStrategy
• Student/Intern contributions, Geodesics
Temporal
• Dates, and Date Ranges
• Search
• Faceting
4. About David Smiley
• Freelance search consultant / developer
• Expert Lucene/Solr development skills,
advice (consulting), training
• Java (full-stack), Web, Spatial
• Apache Lucene / Solr committer & PMC,
Eclipse Locationtech PMC
• Authored 1st book on Solr, plus two editions
• Presented at several conferences & meetups
• Taught several Solr classes, self-developed & LucidWorks
5. Lucene Spatial Overview
• Multiple approaches to index spatial data
• abstract class SpatialStrategy (5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is most prominent, versatile
• Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
• Uses JTS Topology Suite lib for polygons
[Diagram: Shape; SpatialPrefixTree / Cell (Geohash | Quad); PrefixTreeStrategy; IntersectsPrefixTreeFilter, Contains…, Within…]
6. SpatialPrefixTrees and Accuracy
RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree
• Thus represents shapes as grid cells of varying precision by prefix
Example, a point shape:
• D, DR, DRT, DRT2, DRT2Y
Example, a polygon shape:
• Too many to list… 508 cells
More details here:
http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/
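To make the "grid cells of varying precision by prefix" idea concrete, here is a minimal, self-contained sketch of standard geohash encoding plus the list of prefixes that would stand for a point at each level. The class and method names are illustrative assumptions, not Lucene's API, and Lucene's own GeohashPrefixTree handles more (leaf markers, shape intersection) than this shows.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a point represented as geohash cells of increasing precision. */
final class GeohashPrefixes {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair as a geohash with the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate between longitude and latitude, longitude first
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Every prefix of the hash is a coarser cell that contains the point. */
    static List<String> prefixes(String geohash) {
        List<String> cells = new ArrayList<>();
        for (int i = 1; i <= geohash.length(); i++) {
            cells.add(geohash.substring(0, i));
        }
        return cells;
    }
}
```

Calling `GeohashPrefixes.prefixes("drt2y")` yields the five cells listed on the slide (in lower case), one per level.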
7. …continued
• For more accuracy, index more levels (longer prefixes)
• Points: linear relationship of levels to number of cells
• Non-points: exponential relationship…
RPT applies a distErrPct shape size ratio to non-point shapes to
trade accuracy for scalability
• distErrPct=0.025 (2.5% of the radius, the default):
• Massachusetts: level 6
• USA: level 4 (not as precise)
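The arithmetic behind distErrPct can be sketched as follows. This is a simplified model under stated assumptions: the shape's "radius" is taken as half the diagonal of its bounding box in degrees, and the grid is a hypothetical quad grid whose cell width halves at each level starting from 360°. It will not reproduce the geohash levels quoted on the slide; it only shows why a larger shape ends up at a coarser level for the same percentage.

```java
/** Illustrative only: how a size-relative error budget picks a grid level. */
final class DistErrLevel {
    /** Error budget in degrees: distErrPct times half the bounding-box diagonal. */
    static double distErr(double bboxWidthDeg, double bboxHeightDeg, double distErrPct) {
        double halfDiagonal = Math.hypot(bboxWidthDeg, bboxHeightDeg) / 2.0;
        return halfDiagonal * distErrPct;
    }

    /** Smallest level of the assumed quad grid whose cell width is <= distErrDeg. */
    static int levelForDistance(double distErrDeg, int maxLevels) {
        double cellWidth = 360.0;
        for (int level = 1; level <= maxLevels; level++) {
            cellWidth /= 2.0; // each level halves the cell width
            if (cellWidth <= distErrDeg) {
                return level;
            }
        }
        return maxLevels; // budget is finer than the grid can express
    }
}
```

With distErrPct=0.025, a 6° by 8° shape gets a budget of about 0.125° and a 60° by 80° shape about 1.25°, so the larger shape stops several levels earlier and indexes far fewer cells.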
8. SerializedDVStrategy (Lucene 4.7)
• Stores serialized geometry into Lucene BinaryDocValues
• It’s as accurate as the underlying geometry coordinates/shape
• But it’s not a spatial index – it’s retrievable on a per-document basis
• Use RPT + SerializedDV for speed and accuracy!
• More to come eventually:
• Solr adapter – SOLR-5728, ElasticSearch adapter #2361
• Speed: Skip the serialized geometry check for non-edge cells –
LUCENE-5579
9. Sample Code
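// Lucene 4.x spatial API; grid, ctx and point are assumed to be defined elsewhere.
SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, point);
// Phase 1: fast, approximate match against the indexed grid cells.
SpatialStrategy treeStrategy = new RecursivePrefixTreeStrategy(grid, "geometry");
// Phase 2: exact check against the geometry stored in BinaryDocValues.
SpatialStrategy verifyStrategy = new SerializedDVStrategy(ctx, "serialized_geometry");
Query treeQuery = new ConstantScoreQuery(treeStrategy.makeFilter(args));
// Run the tree query first, then verify only the documents it matched.
Query combinedQuery = new FilteredQuery(
    treeQuery,
    verifyStrategy.makeFilter(args),
    FilteredQuery.QUERY_FIRST_FILTER_STRATEGY);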
Code is from a related presentation by the Climate Corporation presented at FOSS4G 2014
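The two-phase idea in the sample above can be illustrated without Lucene. In this sketch a bounding-box test stands in for the approximate grid match and a ray-casting point-in-polygon test stands in for the exact serialized-geometry check. All names here are hypothetical, and the real strategies work on index terms and doc values rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: cheap approximate filter first, exact verification second. */
final class TwoPhaseFilter {
    static final class Polygon {
        final String id;
        final double[] xs;
        final double[] ys;

        Polygon(String id, double[] xs, double[] ys) {
            this.id = id;
            this.xs = xs;
            this.ys = ys;
        }

        /** Approximate test: is the point inside the polygon's bounding box? */
        boolean bboxContains(double x, double y) {
            double minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
            for (int i = 1; i < xs.length; i++) {
                minX = Math.min(minX, xs[i]);
                maxX = Math.max(maxX, xs[i]);
                minY = Math.min(minY, ys[i]);
                maxY = Math.max(maxY, ys[i]);
            }
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }

        /** Exact test: ray casting, counting edge crossings to the right of the point. */
        boolean contains(double x, double y) {
            boolean inside = false;
            for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
                boolean crosses = (ys[i] > y) != (ys[j] > y);
                if (crosses && x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                    inside = !inside;
                }
            }
            return inside;
        }
    }

    /** Phase 1 only: may include false positives near polygon edges. */
    static List<String> candidates(List<Polygon> docs, double x, double y) {
        List<String> ids = new ArrayList<>();
        for (Polygon p : docs) {
            if (p.bboxContains(x, y)) ids.add(p.id);
        }
        return ids;
    }

    /** Phase 1 then phase 2: the exact check runs only on phase-1 candidates. */
    static List<String> search(List<Polygon> docs, double x, double y) {
        List<String> ids = new ArrayList<>();
        for (Polygon p : docs) {
            if (p.bboxContains(x, y) && p.contains(x, y)) ids.add(p.id);
        }
        return ids;
    }
}
```

For a triangle with corners (0,0), (10,0) and (0,10), the point (8,8) passes the approximate phase but is rejected by the exact one, which is the kind of edge case the verification step exists for.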
10. FlexPrefixTree (Coming to Lucene 5)
• A new SpatialPrefixTree by Varun Shenoy (GSOC 2014) !
• LUCENE-4922; Still needs to be committed. Goal is for 5.0.
• More optimized, more flexible, than Geohash & Quad
• Configurable sub-cells at each level: 4, 16, 64, 256
• You choose trade-off between index speed/disk size & search speed
• Internally uses an integer coordinate system
• Rectangle searches are particularly fast; minimal floating-point conversion
• Cells are always squares (equal sides) – better for heatmaps
• YMMV: 10% - 100% faster than GeohashPrefixTree
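One way to picture "configurable sub-cells at each level" on an integer coordinate system is the sketch below. It is an assumption-laden illustration, not the FlexPrefixTree implementation from LUCENE-4922: each level consumes 1, 2, 3 or 4 bits per axis (giving 4, 16, 64 or 256 square sub-cells), and the cell index within its parent is built from those bits using only shifts and masks.

```java
/** Illustrative only: per-level cell indexes from integer coordinates. */
final class FlexCellPath {
    /**
     * @param x, y              non-negative integer coordinates using totalBitsPerAxis bits each
     * @param subCellsPerLevel  sub-cell count per level; each must be 4, 16, 64 or 256
     * @return the index of the containing cell within its parent, one entry per level
     */
    static int[] cellPath(int x, int y, int totalBitsPerAxis, int[] subCellsPerLevel) {
        int[] path = new int[subCellsPerLevel.length];
        int remaining = totalBitsPerAxis;
        for (int level = 0; level < subCellsPerLevel.length; level++) {
            // 4 -> 1 bit per axis, 16 -> 2, 64 -> 3, 256 -> 4
            int bits = Integer.numberOfTrailingZeros(subCellsPerLevel[level]) / 2;
            remaining -= bits;
            if (remaining < 0) {
                throw new IllegalArgumentException("levels exceed coordinate resolution");
            }
            int mask = (1 << bits) - 1;
            int cx = (x >>> remaining) & mask; // column within the parent cell
            int cy = (y >>> remaining) & mask; // row within the parent cell
            path[level] = (cy << bits) | cx;
        }
        return path;
    }
}
```

Fewer sub-cells per level means more levels and more index terms per shape but finer-grained pruning; more sub-cells per level means the opposite, which is the trade-off the slide refers to.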
11. BBoxSpatialStrategy (Lucene 4.10)
• Rectangles (BBox’s) only, one value per field
• Wide predicate support
• Equals, Intersects, Within, Contains, Disjoint
• Accurate (8-byte double floating point)
• Area overlap relevancy
• Weight search results by a combination of query shape overlap &
index shape overlap ratios
• Solr BBoxField…
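Area overlap relevancy can be sketched as a weighted blend of two ratios: how much of the query rectangle the intersection covers, and how much of the indexed rectangle it covers. The weighting parameter and method names below are assumptions for illustration; the actual Lucene/Solr scoring may combine the ratios differently.

```java
/** Illustrative only: scoring rectangles by overlap with a query rectangle. */
final class BBoxOverlap {
    /** Rectangles are {minX, minY, maxX, maxY} and are assumed to have positive area. */
    static double area(double[] r) {
        return (r[2] - r[0]) * (r[3] - r[1]);
    }

    static double intersectionArea(double[] a, double[] b) {
        double w = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
        double h = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
        return (w <= 0 || h <= 0) ? 0.0 : w * h;
    }

    /** queryWeight in [0, 1] balances the query-side ratio against the indexed-side ratio. */
    static double score(double[] query, double[] indexed, double queryWeight) {
        double overlap = intersectionArea(query, indexed);
        double queryRatio = overlap / area(query);     // share of the query covered
        double indexedRatio = overlap / area(indexed); // share of the indexed box covered
        return queryWeight * queryRatio + (1.0 - queryWeight) * indexedRatio;
    }
}
```

A small indexed box lying entirely inside a large query box scores high on the indexed-side ratio and low on the query-side ratio, so the weight decides whether such documents rank above boxes that merely straddle the query.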
15. Approach: Simple Two-field
(as you might do in SQL or any system without native range types)
• A start-time & end-time field pair
• A search window (time span) becomes two range queries
• details vary by predicate (Intersects, Contains, vs. Within)
• Single-valued only
• …even though Lucene supports multi-valued fields
• Theoretically possible but would be a lot of work
• because Lucene doesn’t store “position” info for numeric fields
• because numeric range/prefix queries are position-less
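The way each predicate turns into two range conditions on the start and end fields can be written out as below. This is a plain-Java sketch of the logic with inclusive bounds, not Lucene query code, and the names are illustrative.

```java
/** Illustrative only: predicates over a start/end field pair. */
final class TwoFieldRange {
    enum Predicate { INTERSECTS, WITHIN, CONTAINS }

    /** Does the indexed range [start, end] satisfy the predicate against [qStart, qEnd]? */
    static boolean matches(Predicate p, long start, long end, long qStart, long qEnd) {
        switch (p) {
            case INTERSECTS: // the two ranges share at least one instant
                return start <= qEnd && end >= qStart;
            case WITHIN:     // the indexed range lies inside the query window
                return start >= qStart && end <= qEnd;
            case CONTAINS:   // the indexed range covers the whole query window
                return start <= qStart && end >= qEnd;
            default:
                throw new AssertionError(p);
        }
    }
}
```

The sketch also hints at the multi-valued problem: if a document held the ranges [1, 2] and [8, 9] in two multi-valued fields, the Intersects conditions for the window [4, 5] could be satisfied by start=1 from one range and end=9 from the other, a false match, because nothing ties each start to its own end.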
16. Approach: 2D Spatial PrefixTree
• Lucene Spatial QuadPrefixTree
(2D) with RPT Strategy
• Use ‘x’ for start-time, ‘y’ for end-time
• A search window (time span)
becomes a rectangle query
• details vary by predicate (Intersects,
Contains, vs. Within)
• Cool…
• But floating-point edge issues
• Only ~50 levels supported; not 64
Details: http://wiki.apache.org/solr/SpatialForTimeDurations
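Treating a time range as the 2D point (x = start, y = end) turns each predicate into a rectangle query. The sketch below computes that rectangle using integer coordinates for clarity; the class, the rectangle layout and the world bounds are assumptions for illustration, and the linked wiki page describes the actual Solr setup.

```java
/** Illustrative only: a time range as a 2D point, a predicate as a rectangle. */
final class RangeAsPoint {
    enum Predicate { INTERSECTS, WITHIN, CONTAINS }

    /** Returns {minX, maxX, minY, maxY} where x is the start time and y is the end time. */
    static long[] queryRect(Predicate p, long qStart, long qEnd, long worldMin, long worldMax) {
        switch (p) {
            case INTERSECTS: // start <= qEnd and end >= qStart
                return new long[]{worldMin, qEnd, qStart, worldMax};
            case WITHIN:     // start >= qStart and end <= qEnd
                return new long[]{qStart, qEnd, qStart, qEnd};
            case CONTAINS:   // start <= qStart and end >= qEnd
                return new long[]{worldMin, qStart, qEnd, worldMax};
            default:
                throw new AssertionError(p);
        }
    }

    /** Does the point for the indexed range [start, end] fall inside the rectangle? */
    static boolean inRect(long[] rect, long start, long end) {
        return start >= rect[0] && start <= rect[1] && end >= rect[2] && end <= rect[3];
    }
}
```

Because each indexed range is a single point, multi-valued fields work naturally here: the start and end of one range can never be paired with those of another.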
17. Approach: DateRangePrefixTree (Lucene 5)
• A new 1D SpatialPrefixTree: NumberRangePrefixTree
• NumberRangePrefixTree w/ DateRangePrefixTree subclass
• NR-SPT: Configurable sub-cells per level; no level limit
• Not just for ranges; instances too
• Index/Search with NumberRangePrefixTreeStrategy
• Indexing, and search predicate code (e.g. Intersects…) completely re-used
• DateRangePrefixTree
• 9 Levels: 1M years, 1K years, years, months, days, hours, minutes,
seconds, millis
…continued…
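The nine levels can be pictured as a path with one component per level, so that a date instance maps to at most nine prefix terms. The sketch below uses a made-up "/"-separated term format and assumes non-negative years; the real DateRangePrefixTree encodes its terms differently.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one prefix term per level for a date instance. */
final class DateLevels {
    static final String[] LEVELS = {
        "1M years", "1K years", "years", "months", "days",
        "hours", "minutes", "seconds", "millis"
    };

    /** Returns the first `levels` prefix terms (1 to 9) for the timestamp. */
    static List<String> prefixTerms(LocalDateTime t, int levels) {
        if (levels < 1 || levels > LEVELS.length) {
            throw new IllegalArgumentException("levels must be between 1 and 9");
        }
        long year = t.getYear(); // assumption: non-negative year
        long[] parts = {
            year / 1_000_000,          // which million-year block
            (year % 1_000_000) / 1000, // which millennium inside it
            year % 1000,               // year inside the millennium
            t.getMonthValue(), t.getDayOfMonth(),
            t.getHour(), t.getMinute(), t.getSecond(),
            t.getNano() / 1_000_000    // milliseconds
        };
        List<String> terms = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            if (i > 0) path.append('/');
            path.append(parts[i]);
            terms.add(path.toString());
        }
        return terms;
    }
}
```

Truncating the input to a coarser precision, for example hours, simply stops the path early, which is how a value such as 2014-05-21T12 can be indexed with fewer terms than a full millisecond timestamp.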
18. Trade-offs of N/D-SPT
• Indexing:
• “Common” date-ranges use fewer than ~50 terms, but random millisecond
ranges use up to ~14K terms
• All date instances (not a range) <= 9 terms
• Comparison to 2D SPT: instance or range, always 50
• Search:
• Query for “common” query ranges faster than uncommon
• Comparison to 2D SPT:
• Contains & Within predicates: overlapping values per document get
coalesced, can’t be differentiated
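Why "common" ranges need few terms while arbitrary ones need many can be seen with a simplified covering routine. This sketch only knows three levels (year, month, day) and uses ISO-style strings as stand-in terms; it is an illustration of the greedy covering idea under those assumptions, not the NumberRangePrefixTree algorithm.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: cover an inclusive date range with the largest aligned cells. */
final class DateRangeCover {
    static List<String> cover(LocalDate start, LocalDate end) {
        List<String> terms = new ArrayList<>();
        LocalDate cur = start;
        while (!cur.isAfter(end)) {
            LocalDate yearEnd = LocalDate.of(cur.getYear(), 12, 31);
            LocalDate monthEnd = cur.withDayOfMonth(cur.lengthOfMonth());
            if (cur.getDayOfYear() == 1 && !yearEnd.isAfter(end)) {
                terms.add(String.valueOf(cur.getYear()));  // a whole year fits
                cur = yearEnd.plusDays(1);
            } else if (cur.getDayOfMonth() == 1 && !monthEnd.isAfter(end)) {
                terms.add(cur.toString().substring(0, 7)); // a whole month fits
                cur = monthEnd.plusDays(1);
            } else {
                terms.add(cur.toString());                 // fall back to a single day
                cur = cur.plusDays(1);
            }
        }
        return terms;
    }
}
```

A range aligned to year boundaries, such as 1990 through 1995, collapses to six terms, while a range with ragged ends needs extra month and day terms at each edge. With nine levels down to milliseconds, those ragged edges are what push random ranges toward the thousands of terms mentioned above.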
19. Solr DateRangeField
• Configuration in schema.xml:
<field name="dateRange" type="dateRange" />
<fieldType name="dateRange" class="solr.DateRangeField" />
• Index field data, examples:
• 2014-05-21T12:00:00.000Z (same as TrieDate)
• 2014-05-21T12 (truncated to desired precision)
• [1990 TO 1995]
• Query, examples:
• fq=dateRange:[* TO 2014-05-21]
• fq={!field f=dateRange op=Contains} [2000 TO 2014-05-21]
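When such a filter is sent over HTTP, the local-params syntax has to be URL-encoded. The helper below is a small sketch using only the Java standard library; the class and method names are hypothetical, and the `op` parameter name is taken from the slide (the editor's notes mention it may be renamed before release).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: building an encoded fq parameter for a DateRangeField query. */
final class DateRangeFilters {
    /** Builds the raw filter, e.g. {!field f=dateRange op=Contains}[2000 TO 2014-05-21]. */
    static String rangeFilter(String field, String op, String range) {
        return "{!field f=" + field + " op=" + op + "}" + range;
    }

    /** URL-encodes the filter so that braces, brackets and spaces survive the request. */
    static String asQueryParam(String filter) {
        return "fq=" + URLEncoder.encode(filter, StandardCharsets.UTF_8);
    }
}
```

For example, `DateRangeFilters.asQueryParam(DateRangeFilters.rangeFilter("dateRange", "Contains", "[2000 TO 2014-05-21]"))` produces a string that can be appended to a select URL after the `q` parameter.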
21. Date Faceting
• Option A: facet.range
• Not for indexed date-ranges
• Internally executes one query for each value & caches large bitset
• Option B: facet.interval (Solr 4.10)
• Not for indexed date-ranges
• Requires DocValues (more index data)
• Supports variable/custom intervals
• New work-in-progress option: Facet on DateRangeField
• Ranges are fixed/pre-determined (months, days, etc.)
• Optimized for thousands of ranges to count
• Each value-range is only 1 term!
22. Future stuff I’m excited about
• Continuing works in-progress
• Spatial heatmaps! Coming in January 2015!
• Lucene layer & Solr adapter
• Lucene term auto-prefixing LUCENE-5879
• Brings spatial, date, numeric, indexing/search to the next level!
• More prefix-tree optimizations
• Inner vs edge leaf cell differentiation for non-point shapes
• RPT + SerializedDVStrategy; skip accuracy checks for inner cells
• Don’t index leaf cells twice
23. That’s all for now; thanks for coming!
Need Lucene/Solr guidance or custom development?
Contact me!
Email: dsmiley@apache.org
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley
ETA: December 2014
Editor's Notes
There is a 3rd edition expected by the end of 2014.
508 cells at level 5 detail (same as the point example). 463 of these are “leaf” cells, and these get duplicated in the index with and without a leaf variant.
Disclaimer: the polygon picture here actually goes to level 6, but that’s not important.
distErrPct=0.025 tends to yield a few thousand cells or so.
distErrPct is independent of maximum configured precision.
More about
The geometry format is dictated by Spatial4j, which has its own format for the Spatial4j native shapes; other shapes (e.g. polygons) use WKB. There are plenty of opportunities for a more compact representation; WKB is a little hefty but it’s known to be fast to read nonetheless.
By the way, set PrefixGridScanLevel on the RPT strategy to be at least maxLevels (setting it to 100 is fine), such that it never scans. The scanning optimization has turned out to be very bad for non-point indexed shapes.
TODO: Update for latest trunk, and run some randomized tests (beasting) for a while, then commit to trunk. Then wait a little and back-port to 5x.
256 levels is only supported for point data.
No Hilbert Curve ordering yet.
Configurable levels is similar in concept to precisionStep in the numeric Trie fields, but here it’s configurable at each “step” (level).
https://issues.apache.org/jira/browse/LUCENE-4922
See the Solr Ref Guide for more info: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
That ENVELOPE syntax is WKT/CQL
In reverse chronological order. Note the middle 3 are works in progress. Non-coincidentally they all deal with geodesics. Geodesics is hard! Also, Varun was basically full-time at this.
distErrPct=0
Once FlexPrefixTree is committed, it would be great to add an ‘integer’ based 2D Shape (or upgrade it to ‘long’) and add some ease-of-use wrappers (Solr FieldType) to make this nicer
Could use configurable maximum depth.
Theoretically should be faster than DateField for date instances given “common” query ranges because DateSpatialPrefixTree is customized for common date ranges resulting in ~ <50 terms whereas Lucene numeric trie fields with precisionStep 6 will use ~680 terms (math: 2^6 * 64/6)
“op” local-param could get renamed to ‘pred’ by the time it’s released