Building Local/Geo Search with Apache Lucene and Solr

In this on-demand webinar, Grant Ingersoll, co-founder of Lucid Imagination and chairman of the Apache Lucene PMC, leads an in-depth technical workshop on the potential and application of the newly released Lucene and Solr geo-search functions. He is joined by Ryan McKinley, co-founder of Voyager GIS and Apache Lucene PMC member, and Sameer Maggon of AT&T Interactive, a Lucid Imagination customer that manages and delivers online and mobile advertising products across AT&T's media platforms.

Transcript

  • 1. Building Local/Geo Search with Apache Lucene and Solr
  • 2. Agenda Grant Ingersoll, Lucid Imagination Introduction Basics of geo-spatial search Tools available in Lucene and Solr Ryan McKinley, Voyager GIS Spatial search in Action: Sameer Maggon, AT&T Interactive How Solr powers local search at YP.com Lucid Imagination, Inc.
  • 3. Introductions Grant Ingersoll Lucene/Solr committer Co-author of upcoming “Taming Text” Ryan McKinley Lucene/Solr committer Co-founder of Voyager GIS Sameer Maggon Search Eng. Team lead at AT&T Interactive Active user of Lucene since 2001
  • 4. Use Cases Asset Management “Dude, where’s my map?” Social Networking Find all friends near me Targeted, local search results and ads “restaurants in Austin Texas” “Starbucks, 55313” Business Intelligence Restrict doc set for analysis by location
  • 5. Spatial Search Concepts Spatial Data Types Points (latitude/longitude) Lines Shapes Maps and overlays Streets, POI http://www.openstreetmap.org/?lat=44.9744&lon=-93.2484&zoom=14&layers=B000FTFT Integration with unstructured text Metadata, descriptions, user reviews, etc.
  • 6. Application Needs Query Parsing Efficient distance calculations: Euclidean, Great Circle (Haversine), Vincenty’s Filtering: Bounding Box Sort by Distance Relevance Enhancement Faceting Advanced: shape intersections, routes
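To make the "Great Circle (Haversine)" item on slide 6 concrete, here is a minimal Java sketch of the Haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the class and method names are hypothetical and are not part of Lucene or Solr.

```java
/** Great-circle distance on a sphere via the Haversine formula (illustrative sketch). */
final class HaversineSketch {
    /** Assumed mean Earth radius in kilometers; a common approximation. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Distance in kilometers between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // a is the squared half-chord length between the two points
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // c is the angular distance in radians
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }
}
```

Euclidean distance is cheaper but only reasonable over short ranges, and Vincenty's formulae are more accurate on an ellipsoid at higher cost; Haversine is a frequent middle ground.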
  • 7. Lucene 2.9/Solr 1.4 Features for Spatial Search Lucene/Solr are excellent for dealing with unstructured text 2.9/1.4 adds: Better Numeric handling for range searches Spatial contribution with features for (2.9 only, coming in 1.5): • Creating Cartesian Tiers (Grids) • Geohashes • Calculating distances • Filter implementations
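Slide 7 lists geohashes among the spatial contrib features. As a rough illustration of the idea, the following Java sketch encodes a latitude/longitude pair with the standard public geohash scheme (alternating longitude and latitude bisection, five bits per base-32 character). It is a standalone sketch with hypothetical names and does not use the contrib module's own geohash utility.

```java
/** Standard geohash encoding (illustrative sketch, not the Lucene contrib implementation). */
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point in decimal degrees into a geohash of the requested length. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean useLon = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            useLon = !useLon;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each added character refines the same cell, nearby points often share a prefix, which is what makes geohashes usable as indexed terms for coarse proximity filtering.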
  • 8. Query Parsing Query parsing is often the most difficult to get right User error, ambiguity in names Mixture of topic and location: bars in Minneapolis MN Geocoding translates addresses, POIs into lat/lon or other Several publicly available services: geonames.org, Google Maps Often have built-in throttles, so may not be effective for prod. Query logs are invaluable for developing an effective parser
  • 9. Filtering Range queries can significantly slow down search if done improperly Goal: reduce the number of terms to evaluate Solution 1: New Trie-based numeric capabilities Solution 2: Cartesian Tiers
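One common way to turn a "within N km" request into the range queries discussed on slide 9 is to compute a latitude/longitude bounding box around the center point and filter on it before any exact distance calculation. The Java sketch below shows that computation under stated assumptions: a spherical Earth of radius 6371 km, and a box that does not reach a pole or cross the antimeridian. The names are hypothetical, not Lucene or Solr API.

```java
/** Bounding box around a point for coarse filtering (illustrative sketch). */
final class BoundingBoxSketch {
    /** Assumed mean Earth radius in kilometers. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Returns {minLat, minLon, maxLat, maxLon} in degrees for a circle of radiusKm. */
    static double[] around(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        // Longitude span widens with latitude; assumes the box stays clear of the poles.
        double dLon = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] { lat - dLat, lon - dLon, lat + dLat, lon + dLon };
    }

    /** True if the point lies inside the box (both bounds inclusive). */
    static boolean contains(double[] box, double lat, double lon) {
        return lat >= box[0] && lat <= box[2] && lon >= box[1] && lon <= box[3];
    }
}
```

The four box values map onto two numeric range filters (one on latitude, one on longitude); documents that pass can then be checked or sorted with an exact distance function.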
  • 10. Cartesian Tiers Divide the space into grid cells and assign each cell an id Each tier breaks the space down into 2^tier grids Sample code using Lucene spatial contrib: CartesianTierPlotter pl = new CartesianTierPlotter(10, new SinusoidalProjector(), "spatial"); pl.getTierBoxId(latitude, longitude); See http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html
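The grid idea behind Cartesian tiers can be sketched without the contrib classes. The Java example below is a deliberately simplified stand-in: it assumes a plain latitude/longitude grid with 2^tier cells per axis and a row-major cell id. It is not how CartesianTierPlotter or SinusoidalProjector compute box ids, and all names here are hypothetical.

```java
/** Simplified tiered grid over lat/lon (illustrative; not the Lucene contrib algorithm). */
final class TierGridSketch {
    /** Returns a row-major cell id for the point at the given tier. */
    static long cellId(double lat, double lon, int tier) {
        int cellsPerAxis = 1 << tier; // assumption: 2^tier cells along each axis
        // Map the coordinates into [0, cellsPerAxis) and clamp the upper edge.
        int x = Math.min(cellsPerAxis - 1, (int) ((lon + 180.0) / 360.0 * cellsPerAxis));
        int y = Math.min(cellsPerAxis - 1, (int) ((lat + 90.0) / 180.0 * cellsPerAxis));
        return (long) y * cellsPerAxis + x;
    }
}
```

Indexing one such id per tier lets a radius search become a lookup of a handful of cell ids at the tier whose cell size best matches the search radius, rather than a wide numeric range scan.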
  • 11. What’s next? Tighter integration in Solr Work already under way Native field types, query parsing support, faceting support Resources java-user@lucene.apache.org, solr-user@lucene.apache.org https://issues.apache.org/jira/browse/SOLR-773 http://lucene.apache.org/java/2_9_1/api/contrib-spatial/index.html Many, many more general resources on the web
  • 12. Voyager Spatial Data Search Ryan McKinley Co-founder, Voyager GIS
  • 13. Where is my Data? • Files stored across the network – desktop, external drives, databases etc. • Many distinct data formats • Massive datasets keep getting bigger. • Poor cataloging tools • Limited metadata
  • 14. Voyager Solution Voyager is a search engine for your geographic data. • Find data with simple text search and geographic constraints • Keep data in its existing location (no need to import to a new system) • Tools to work with search results
  • 15. Implementation • Data Discovery / Extraction • Solr search • Wicket UI
  • 16. Data Extraction • For each result, we extract basic information: - ESRI ArcObjects - GDAL - PDFBox - Geotools - Tika - etc
  • 17. Geographic Search in Solr • Need to search by ‘extent’ not point • Works well with a standard RTree • Built a custom Lucene Filter to intersect/search within a given extent.
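Searching by extent rather than by point, as slide 17 describes, comes down to rectangle tests. The Java sketch below shows the two basic predicates (intersects and contains) on axis-aligned extents. It assumes extents are given as minX, minY, maxX, maxY and do not cross the antimeridian; it is not Voyager's custom Lucene Filter, and the class name is hypothetical.

```java
/** Axis-aligned geographic extent with basic spatial predicates (illustrative sketch). */
final class ExtentSketch {
    final double minX, minY, maxX, maxY;

    ExtentSketch(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** True if the two extents overlap or touch. */
    boolean intersects(ExtentSketch other) {
        return minX <= other.maxX && other.minX <= maxX
            && minY <= other.maxY && other.minY <= maxY;
    }

    /** True if the other extent lies entirely inside this one. */
    boolean contains(ExtentSketch other) {
        return minX <= other.minX && other.maxX <= maxX
            && minY <= other.minY && other.maxY <= maxY;
    }
}
```

An R-tree organizes many such rectangles hierarchically so that a query extent only needs to be tested against a small subset of them.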
  • 18. Work in Progress • Custom Gazetteer – “Building 12” > ‘-96.X 30.X -96.X 30.X’ • Named Entity Extraction – Geographic words that appear in titles / text get indexed with geographic properties
  • 19. Geographic Search in Solr 1.5+ • Standard API, pluggable implementation. – Standard Qparser, pluggable indexing • Single input ‘field’ could index multiple Lucene fields. • Share objects between different parts of the request cycle (only calculate distance once) • Augment results with calculated value – Manual or from function query
  • 20. How Solr powers local search at YP.com Sameer Maggon November 18, 2009 © 2008 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
  • 21. YP.com Technical Challenges Custom Relevance Model Scalability / Architecture Conclusion
  • 22. YP.com (beta) Local Search Site Focused on providing relevant results Uses Solr for search AT&T Proprietary (Restricted) Only for use by authorized individuals or any above-designated team(s) within the AT&T companies and not for general distribution
  • 23. Technical Challenges Relevancy: Topically relevant results; Constrained by contextual geographical search; Local relevancy is not just keyword and location – ratings, brands, etc. Scalability: 10s of millions of records; Response time less than 200ms; Fault resistant; More than 150 million searches per month
  • 24. Custom Relevance Model Topical + Geographical + Social Topical: Complex handling of multiword queries. Geographical: Distance modulation based on business density. Social: Business with 4.5 stars and 200 reviews is more relevant than 5.0 star 1 review
  • 25. Custom Relevance Model Topical + Geographical + Social Topical: Complex handling of multiword queries; Field Boosts for certain fields; Dismax to handle complex queries. Geographical: Distance modulation based on business density; LocalSolr as a geographic filter; Ability to modulate score based on business density. Social: Business with 4.5 stars and 200 reviews is more relevant than 5.0 star 1 review; CustomScoreQuery to tie all different scores together
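The slides do not say how YP.com weighs star ratings against review counts. One widely used technique that produces the ordering in the slide's example (4.5 stars with 200 reviews outranking 5.0 stars with 1 review) is a Bayesian average, sketched below in Java. The prior mean and prior weight are illustrative assumptions, the names are hypothetical, and this is not presented as AT&T Interactive's actual model.

```java
/** Bayesian average of star ratings (illustrative; not the YP.com relevance model). */
final class RatingScoreSketch {
    /**
     * Pulls a business's average rating toward priorMean; the fewer reviews it has,
     * the stronger the pull. priorWeight acts like a count of "virtual" reviews.
     */
    static double bayesianAverage(double avgRating, int reviewCount,
                                  double priorMean, double priorWeight) {
        return (priorWeight * priorMean + avgRating * reviewCount)
                / (priorWeight + reviewCount);
    }
}
```

With an assumed prior mean of 3.5 and a prior weight of 10, the 200-review business keeps a score close to its 4.5 average while the single-review business falls back toward the prior. A value like this could then be combined with topical and geographic scores, for example through a CustomScoreQuery as the slide mentions.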
  • 26. Geographic Sharding Score Combinations Performance was better Provisioning is a bit complex
  • 27. Search Architecture [diagram; labels: API Layer, Search Slaves, Masters, shards, replication, Feeder / Document Pipeline, rows]
  • 28. Bottom Line Solr has enabled us to innovate faster • Quick iterations of relevancy model and functionality • Open Platform with much more flexibility • Scalable Architecture to meet our business needs
  • 29. Bottom Line Solr has enabled us to innovate faster • Quick iterations of relevancy model and functionality • Open Platform with much more flexibility • Scalable Architecture to meet our business needs Thus, delivering value to our consumers
  • 30. Resources http://bit.ly/lucid-local
  • 31. Q&A
  • 32. http://bit.ly/lucid-local