Solr: 4 big features



Four Big Features:
* Faceting
* Query auto-complete
* Geospatial
* Scaling

Presented at a Meetup in Boston, 11 March 2014.


  1. APACHE SOLR
     Four Big Features:
     • Faceting
     • Query auto-complete
     • Geospatial
     • Scaling
     Presented by David Smiley at the Boston Java Meetup Group, March 2014
  2. About David Smiley
     ➢ Software Engineer (14 years)
        ○ Search (5 years)
        ○ Java, Web, Spatial
     ➢ Part-time employed at MITRE
     ➢ Part-time search consultant
     ➢ Apache Lucene / Solr committer & PMC member
     ➢ Published the first book on Solr
     ➢ Presented at several conferences
     ➢ Taught several Solr classes
  3. Faceting
     • Do you know what I mean by “faceting”?
     • AKA: faceted navigation, or parametric search
     • Popular apps:
        • eBay, Amazon, and many e-commerce sites
     • Apps I use that don’t use faceting but I wish they did:
        • … and all Maven repository software: Nexus, Artifactory, Archiva
        • JIRA
     • Compare this to:
  4. Faceted Navigation & Analytics by example…
     • Notice the counts
     • Optionally start with a keyword search or filter
     • Extremely useful feature supported by very few platforms: Solr, ElasticSearch, Sphinx, … (no DBs)
     Credit: Trey Grainger; CareerBuilder
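To make "notice the counts" concrete, here is a minimal in-memory sketch of what a facet count is: for each field, count how many documents in the current result set carry each value, and drill down by adding a filter. The documents and the field names `category` and `manufacturer` are hypothetical, and this illustrates the idea only, not how Solr computes facets internally.

```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, str]

def facet_counts(docs: List[Doc], fields: List[str], filters: Dict[str, str]) -> Dict[str, Counter]:
    """Count each field's values over the documents that match every filter."""
    # Apply filters first, so the counts always describe the current result set.
    matching = [d for d in docs if all(d.get(f) == v for f, v in filters.items())]
    counts: Dict[str, Counter] = {}
    for field in fields:
        counts[field] = Counter(d[field] for d in matching if field in d)
    return counts

docs = [
    {"category": "camera", "manufacturer": "Canon"},
    {"category": "camera", "manufacturer": "Nikon"},
    {"category": "lens", "manufacturer": "Canon"},
]
# No filter: counts over everything. Then drill down on manufacturer.
print(facet_counts(docs, ["category", "manufacturer"], {}))
print(facet_counts(docs, ["category", "manufacturer"], {"manufacturer": "Canon"}))
```

Selecting a facet value simply becomes another filter, and every count is recomputed over the narrower result set.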
  5. How to: Field Faceting
     • Index setup, in schema.xml:
         <field name="category" type="string" />
         <field name="manufacturer" type="string" />
     • Facet search:
         http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=category&facet.field=manufacturer
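Below is a minimal Python sketch of issuing the facet search above and reading the result. It assumes a local Solr at the URL shown on the slide and the JSON response format (`wt=json`). In that format, `facet_counts.facet_fields` lists each field's values as a flat alternating `[value, count, value, count, …]` array. Passing the parameters as a list of pairs lets `facet.field` repeat. `rows=0` is an addition here that skips returning documents.

```python
import json
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # URL from the slide

def facet_url(fields: Sequence[str], q: str = "*:*") -> str:
    """Build a field-faceting request; a list of pairs allows a repeated facet.field."""
    params: List[Tuple[str, str]] = [("q", q), ("rows", "0"), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", f) for f in fields]
    return SOLR_SELECT + "?" + urlencode(params)

def pairs(flat: list) -> List[Tuple[str, int]]:
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def fetch_facets(fields: Sequence[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Run the request against a live Solr and return the counts per field."""
    with urlopen(facet_url(fields)) as resp:  # requires a running Solr instance
        body = json.load(resp)
    facet_fields = body["facet_counts"]["facet_fields"]
    return {f: pairs(facet_fields[f]) for f in fields}

print(facet_url(["category", "manufacturer"]))
```

`urlencode` also percent-encodes characters such as `*` and `:`, which Solr decodes as usual.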
  6. How to: Numeric/Date Faceting
     • Index setup, in schema.xml:
         <field name="timestamp" type="tdate" />
     • Facet search (facet.range.gap is required; %2B is a URL-encoded "+"):
         http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.range=timestamp&facet.range.start=NOW/YEAR-10YEAR&facet.range.end=NOW/YEAR%2B1YEAR&facet.range.gap=%2B1YEAR
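Range faceting splits a numeric or date field into fixed-width buckets from `facet.range.start` to `facet.range.end`, stepping by `facet.range.gap`. The sketch below imitates those semantics on plain integers (years) instead of Solr date math. It assumes Solr's documented defaults. Each bucket includes its lower bound and excludes its upper bound (`facet.range.include=lower`). With `facet.range.hardend=false`, the last bucket keeps the full gap width even if that overshoots the end. This is an illustration, not Solr's implementation.

```python
from typing import Dict, Iterable

def range_facet(values: Iterable[int], start: int, end: int, gap: int) -> Dict[int, int]:
    """Count values into [lower, lower + gap) buckets, keyed by each bucket's lower bound."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    # Lower bounds: start, start+gap, ... while still below end.
    # Like hardend=false, the last bucket is not clipped to end.
    counts = {lower: 0 for lower in range(start, end, gap)}
    for v in values:
        if v < start:
            continue  # before the first bucket
        lower = start + ((v - start) // gap) * gap
        if lower in counts:  # values past the last bucket are ignored
            counts[lower] += 1
    return counts

# Years standing in for NOW/YEAR-10YEAR .. NOW/YEAR+1YEAR with a +1YEAR gap,
# assuming "now" falls in 2014.
print(range_facet([2004, 2005, 2005, 2013, 2020], start=2004, end=2015, gap=1))
```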
  7. Query Suggest / Autocomplete
     If you aren’t doing this then you really should!
  8. Several Types
     • Instant search
        • Direct navigation to documents, usually by name/title/id, etc.
        • Implement via edge n-grams or a Suggester
        • Ex: iTunes, Netflix, …
     • Query log completion
        • Searches user queries you’ve captured & indexed
        • Implement via edge n-grams or FreeTextSuggester
        • Ex: Google
     • Term completion
        • Completes indexed words
        • Implement via facet.prefix technique or a Suggester
     • Facet / field value completion
        • Ex: …
     Not mutually exclusive!
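Instant search via edge n-grams indexes every leading prefix of each word, so a partial query like "sma" becomes an ordinary exact term lookup. Here is a minimal in-memory sketch. It assumes lowercase whitespace tokenization and gram sizes 1 to 10. In Solr, this would roughly correspond to an `EdgeNGramFilterFactory` in the index-time analyzer only. The titles are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Leading prefixes of word, from min_gram up to max_gram characters long."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def build_index(titles: List[str]) -> Dict[str, Set[int]]:
    """Map each prefix gram to the ids of titles that contain a word with that prefix."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for word in title.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(doc_id)
    return index

def instant_search(index: Dict[str, Set[int]], titles: List[str], typed: str) -> List[str]:
    # The query side is NOT n-grammed: the typed prefix is looked up as a single term.
    ids = index.get(typed.lower().strip(), set())
    return [titles[i] for i in sorted(ids)]

titles = ["Smashing Pumpkins", "The Smiths", "Small Faces"]
idx = build_index(titles)
print(instant_search(idx, titles, "sma"))
```

The design trade-off is a larger index (one term per prefix) in exchange for cheap, exact lookups at query time.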
  9. Tools for Completing / Suggesting
     • The Suggester
        • A specialization of the spell-check Solr component
        • 8 implementations to choose from! Different pros/cons
        • Weighted? Analyzing? Infix? Highlight? Fuzzy? N-gram model?
     • Faceting with facet.prefix
        • Respects your current filters – don’t suggest a 0-result response
     • Edge n-grams, with standard search
     • Terms component
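The facet.prefix technique reuses faceting for term completion. It runs the user's current query and filters with `rows=0`, facets on a field, and keeps only indexed terms that start with what has been typed. `facet.mincount=1` drops terms that would lead to zero results. The sketch below builds such a request. It assumes a hypothetical field `name_terms` whose indexed terms are lowercased. Because facet.prefix is matched against raw indexed terms without analysis, the client lowercases the input itself.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"  # core name from the earlier slides

def facet_prefix_params(typed: str, field: str, filters: Sequence[str] = (),
                        q: str = "*:*", limit: int = 10) -> List[Tuple[str, str]]:
    """Request parameters for term completion via faceting with facet.prefix."""
    params = [
        ("q", q),
        ("rows", "0"),                    # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),           # hypothetical field of lowercased terms
        ("facet.prefix", typed.lower()),  # not analyzed by Solr, so lowercase here
        ("facet.mincount", "1"),          # never suggest a term with zero hits
        ("facet.limit", str(limit)),
        ("facet.sort", "count"),          # most frequent completions first
    ]
    # Keep the user's current filters so suggestions respect them.
    params += [("fq", fq) for fq in filters]
    return params

url = SOLR_SELECT + "?" + urlencode(facet_prefix_params("Sma", "name_terms", ["type:artist"]))
print(url)
```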
  10. Sample Suggester Search
     • Search:
         http://localhost:8983/solr/mbartists/a_term_suggest?q=sma
     • Response:
         {
           "responseHeader":{
             "status":0,
             "QTime":1},
           "spellcheck":{
             "suggestions":[
               "sma",{
                 "numFound":4,
                 "startOffset":0,
                 "endOffset":3,
                 "suggestion":[
                   "small",
                   "smart",
                   "smash",
                   "smalley"]},
               "collation","small"]}}
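In the response format shown above, `suggestions` is a flat list that alternates between a key and its value. First comes the query term "sma" followed by its details, then "collation" followed by the collated query. The small sketch below pairs those entries up and extracts the completions. It runs on the slide's sample response, with the JSON made valid.

```python
import json
from typing import Dict, List

SAMPLE = """
{"responseHeader":{"status":0,"QTime":1},
 "spellcheck":{"suggestions":[
   "sma",{"numFound":4,"startOffset":0,"endOffset":3,
          "suggestion":["small","smart","smash","smalley"]},
   "collation","small"]}}
"""

def completions(response: Dict) -> Dict[str, List[str]]:
    """Map each term that was suggested for to its list of completions."""
    flat = response["spellcheck"]["suggestions"]
    result: Dict[str, List[str]] = {}
    # Walk the flat [key, value, key, value, ...] list two entries at a time.
    for key, value in zip(flat[0::2], flat[1::2]):
        if key == "collation":
            continue  # a whole rewritten query, not one term's suggestions
        result[key] = value["suggestion"]
    return result

print(completions(json.loads(SAMPLE)))
```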
  11. Geospatial Features
     • Lucene/Solr can index text, numbers, dates, and spatial data
     • Features:
        • Index latitude & longitude coordinates or any X Y pairs
        • Index polygons or other geometry
        • Query by point-radius, rectangle, or polygon geometry
           • Including “IsWithin” vs “Intersects” vs “Contains” predicates
        • 2d/flat Euclidean OR geodetic spherical world model
        • Sort or relevancy-boost by distance to indexed points
     The NoSQL solutions with the best spatial support are CouchDB, MongoDB, Solr, and ElasticSearch
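The three predicates differ in which shape must enclose which. This flat 2D sketch uses only axis-aligned rectangles, a simplification since Solr also handles points, circles, polygons, and geodetic shapes. It treats the indexed shape as the subject. IsWithin means the indexed shape lies inside the query shape. Contains means the indexed shape encloses the query shape. Intersects means the two share at least one point. The shapes are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on a flat plane; edges count as part of the shape."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

def intersects(indexed: Rect, query: Rect) -> bool:
    # Overlap on both axes (touching edges counts).
    return (indexed.min_x <= query.max_x and query.min_x <= indexed.max_x
            and indexed.min_y <= query.max_y and query.min_y <= indexed.max_y)

def is_within(indexed: Rect, query: Rect) -> bool:
    # Every point of the indexed shape lies inside the query shape.
    return (query.min_x <= indexed.min_x and indexed.max_x <= query.max_x
            and query.min_y <= indexed.min_y and indexed.max_y <= query.max_y)

def contains(indexed: Rect, query: Rect) -> bool:
    # The indexed shape encloses the query shape: IsWithin with the roles swapped.
    return is_within(query, indexed)

city = Rect(0, 0, 10, 10)    # hypothetical indexed shape
search = Rect(5, 5, 20, 20)  # hypothetical query shape
print(intersects(city, search), is_within(city, search), contains(city, search))
```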
  12. How to: Spatial Filter & Sort
     • Index setup, in schema.xml:
         <field name="geo" type="location_rpt" />
     • Index "latitude,longitude" in your document: 37.7752,-100.0232
     • Filter & sort (d is the radius in kilometers):
         http://localhost:8983/solr/collection1/select?q=*:*&fq={!geofilt}&sort=geodist()%20asc&sfield=geo&pt=45.15,-93.85&d=5
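Conceptually, `{!geofilt}` keeps documents whose point in `sfield` lies within `d` kilometers of `pt`, and `geodist()` sorts by that same great-circle distance. Below is a minimal sketch of that logic using the haversine formula on a sphere with an approximate mean Earth radius of 6371 km. It does not reproduce Solr's exact radius constant or its grid-based filtering. The document ids and points are hypothetical.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # approximate mean radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geofilt_sorted(docs: Dict[str, Tuple[float, float]], pt: Tuple[float, float],
                   d_km: float) -> List[Tuple[str, float]]:
    """Keep docs within d_km of pt, nearest first (like fq={!geofilt} plus sort=geodist() asc)."""
    hits = [(doc_id, haversine_km(pt[0], pt[1], lat, lon)) for doc_id, (lat, lon) in docs.items()]
    return sorted((h for h in hits if h[1] <= d_km), key=lambda h: h[1])

docs = {"a": (45.16, -93.85), "b": (45.15, -93.80), "c": (37.7752, -100.0232)}
print(geofilt_sorted(docs, pt=(45.15, -93.85), d_km=5))
```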
  13. Cool Technology Under the Hood
     • Grid/tile-based recursive indexed structure, using a prefix tree (trie) indexing approach on a standard Lucene inverted index
     • Future:
        • Precise indexed shapes
        • Geodetic polygons
        • Hilbert curve ordering
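The prefix-tree idea is to index each point as the string of grid cells that contain it, from coarse to fine. Every prefix of that string names a larger enclosing cell, so a spatial query becomes a search over a modest number of cell terms. Solr's RPT field can use a geohash grid or a quad-tree grid. A minimal geohash encoder (the standard geohash algorithm, not Lucene's code) illustrates the idea.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a point as a geohash: each character narrows the cell to 1 of 32 sub-cells."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)

def cell_terms(lat: float, lon: float, max_level: int = 8) -> list:
    """Enclosing cells from coarse to fine: roughly the terms a prefix-tree index stores for a point."""
    full = geohash(lat, lon, max_level)
    return [full[:i] for i in range(1, max_level + 1)]

print(cell_terms(57.64911, 10.40744, 5))
```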
  14. Scaling Solr
     Solr’s mechanisms for scaling:
     • Replication
        • Eliminates single point of failure
        • Reduces query load on any one node
        • Backups
     • Distributed search (for sharded indexes)
        • For large, multi-million-document collections
     • SolrCloud
        • Combines distributed search and real-time replicated indexing
        • Centrally manages configuration
        • A higher-level logical API that manages lots of coordination underneath
        • Advanced: doc routing, shard splitting, migration
  15. Replication & Sharding
     Illustrated with a metaphor of an encyclopedia at a library: 26 shards (volumes A through Z), 3 replicas (three complete copies of the set)
       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
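In the encyclopedia metaphor, a search has to consult one copy of every volume, and any of the three copies will do. A minimal scatter-gather sketch shows the flow. For each shard it picks a replica round-robin, takes that shard's local top k, and merges the local results into a global top k. It assumes documents were already scored against the query (the scores are hypothetical inputs). Real distributed search does more, such as fetching stored fields in a second phase, and this sketch omits that.

```python
import heapq
import itertools
from typing import Dict, Iterator, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)

def local_top_k(replica: Dict[str, float], k: int) -> List[Hit]:
    """What one shard returns: its k best (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in replica.items()))

class Cluster:
    def __init__(self, shards: List[List[Dict[str, float]]]):
        # shards[i] lists the replicas of shard i; replicas hold identical documents.
        self._pickers: List[Iterator[Dict[str, float]]] = [
            itertools.cycle(replicas) for replicas in shards
        ]

    def search(self, k: int) -> List[Hit]:
        # Scatter: ask one replica of every shard (round-robin spreads the query load).
        partials = [local_top_k(next(picker), k) for picker in self._pickers]
        # Gather: the global top k must be among the shards' local top-k lists.
        return heapq.nlargest(k, itertools.chain.from_iterable(partials))

shard_a = {"apple": 0.9, "avocado": 0.4}
shard_b = {"banana": 0.7, "blueberry": 0.95}
cluster = Cluster([[shard_a, dict(shard_a)], [shard_b, dict(shard_b)]])
print(cluster.search(k=2))
```

Sharding makes each search touch less data per node, and replication lets more searches run in parallel. The two scale different things.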
  16. Nice Admin Screen UI
  17. More Advanced SolrCloud Features
     • Document routing customization
        • Answers: Which shard does a document belong in?
        • Hash (i.e. random) distribution
        • Or keep certain related documents together (ex: for the same user)
           • Helps scale when searching by a subset
        • Or manage it yourself manually (ex: index by month)
     • Shard splitting
        • When your shard(s) get too big
        • Live; no downtime
     • Inter-collection document migration
        • Copies a subset of one collection to another, possibly new, collection
        • Live; no downtime
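Below is a sketch of prefix-based document routing in the spirit of SolrCloud's compositeId router. An id such as "user1!doc7" keeps all of user1's documents on one shard. Note the assumptions. `zlib.crc32` stands in for the MurmurHash3 hash Solr actually uses. Shards here own equal slices of the top 16 hash bits, so co-location holds for any shard count. Solr's exact bit layout and range arithmetic are not reproduced.

```python
import zlib

def route_hash(doc_id: str) -> int:
    """32-bit routing hash; ids shaped like 'prefix!rest' share their top 16 bits."""
    if "!" in doc_id:
        prefix, rest = doc_id.split("!", 1)
        high = zlib.crc32(prefix.encode("utf-8")) & 0xFFFF0000  # identical for one prefix
        low = zlib.crc32(rest.encode("utf-8")) & 0x0000FFFF     # spreads docs within the prefix
        return high | low
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # plain hash distribution

def shard_for(doc_id: str, num_shards: int) -> int:
    """Shard index in [0, num_shards), from equal slices of the top 16 hash bits."""
    top16 = route_hash(doc_id) >> 16
    return (top16 * num_shards) >> 16

for doc in ["user1!doc1", "user1!doc2", "user2!doc1", "plain-id-42"]:
    print(doc, shard_for(doc, 3))
```

Searching "just user1's documents" can then target a single shard instead of all of them, which is the "helps scale when searching by a subset" point on the slide.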
  18. That’s all for now; thanks for coming!
     Need Lucene/Solr guidance or custom development? Contact me:
     ETA: June 2014