Solr: 4 big features

Four Big Features:
* Faceting
* Query auto-complete
* Geospatial
* Scaling

Presented at a Meetup in Boston, 11 March 2014.

  • 1. APACHE SOLR
      Four Big Features:
      •  Faceting
      •  Query auto-complete
      •  Geospatial
      •  Scaling
      March 2014. Presented by David Smiley at the Boston Java Meetup Group
  • 2. About David Smiley
      ➢  Software Engineer (14 years)
          ○  Search (5 years)
          ○  Java, Web, Spatial
      ➢  Part-time employed at MITRE
      ➢  Part-time search consultant
      ➢  Apache Lucene / Solr committer & PMC
      ➢  Published 1st book on Solr
      ➢  Presented at several conferences
      ➢  Taught several Solr classes
  • 3. Faceting
      •  Do you know what I mean by “faceting”?
      •  AKA: faceted navigation, or parametric search
      •  Popular Apps:
          •  eBay, Amazon, and many e-commerce sites
      •  Apps I use that don’t use faceting but I wish they did:
          •  and all Maven repository software: Nexus, Artifactory, Archiva
          •  JIRA
      •  Compare this to:
  • 4. Faceted Navigation & Analytics by example…
      •  Notice the counts
      •  Optionally start with a keyword search or filter
      •  Extremely useful feature supported by very few platforms: Solr, ElasticSearch, Sphinx, … (no DBs)
      Credit: Trey Grainger; CareerBuilder
  • 5. How to: Field Faceting
      •  Index setup (schema.xml):
           <field name="category" type="string" />
           <field name="manufacturer" type="string" />
      •  Facet search:
           http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=category&facet.field=manufacturer
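The facet search above can be assembled and read from a client roughly as follows. This is a minimal sketch, not part of the original slides: the helper names `facet_url` and `facet_counts` and the sample counts are made up for illustration, and it assumes Solr's default JSON layout, where each facet field comes back as a flat list alternating value and count.

```python
from urllib.parse import urlencode


def facet_url(base_url, fields, query="*:*"):
    """Build a select URL that requests field facets as JSON."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", name) for name in fields]
    return base_url + "?" + urlencode(params)


def facet_counts(response, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment (invented values), shaped like Solr's JSON output.
sample = {"facet_counts": {"facet_fields": {
    "category": ["electronics", 14, "memory", 3, "camera", 1]}}}

url = facet_url("http://localhost:8983/solr/collection1/select",
                ["category", "manufacturer"])
counts = facet_counts(sample, "category")
print(url)
print(counts)
```

Building the query with `urlencode` keeps the repeated `facet.field` parameter intact and takes care of escaping `*:*`.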
  • 6. How to: Numeric/Date Faceting
      •  Index setup (schema.xml):
           <field name="timestamp" type="tdate" />
      •  Facet search (Solr also requires facet.range.gap; URL-encode the "+" as %2B when sending):
           http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.range=timestamp&facet.range.start=NOW/YEAR-10YEAR&facet.range.end=NOW/YEAR+1YEAR&facet.range.gap=+1YEAR
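One practical wrinkle with range faceting is that date math contains "+", which a raw URL would turn into a space. The sketch below builds the request with proper escaping. It is an illustration under assumptions: the one-year `facet.range.gap` is a value chosen here (Solr requires some gap for range faceting), and the variable names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = [
    ("q", "*:*"),
    ("wt", "json"),
    ("facet", "true"),
    ("facet.range", "timestamp"),
    ("facet.range.start", "NOW/YEAR-10YEAR"),
    ("facet.range.end", "NOW/YEAR+1YEAR"),
    ("facet.range.gap", "+1YEAR"),  # assumed bucket size: one bucket per year
]

# urlencode escapes "+" as %2B and "/" as %2F so the date math survives transport.
range_url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)

# Decoding the query string again shows the server would see the original values.
decoded = parse_qs(urlsplit(range_url).query)
print(range_url)
print(decoded["facet.range.end"])
```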
  • 7. Query Suggest / Autocomplete If you aren’t doing this then you really should!
  • 8. Several Types
      •  Instant search
          •  Direct navigation to documents, usually by name/title/id, etc.
          •  Implement via edge n-grams or a Suggester
          •  Ex: iTunes, Netflix, …
      •  Query log completion
          •  Searches user queries you’ve captured & indexed
          •  Implement via edge n-grams or FreeTextSuggester
          •  Ex: Google
      •  Term completion
          •  Completes indexed words
          •  Implement via facet.prefix technique or a Suggester
      •  Facet / field value completion
          •  Ex:
      Not mutually exclusive!
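To make "edge n-grams" concrete: each word is indexed under every one of its leading prefixes, so a partially typed word matches with an ordinary exact-term lookup. The following is a toy in-memory sketch of that idea, not how Solr implements it (in Solr this is done by an analysis filter at index time); the titles and function names are invented.

```python
def edge_ngrams(text, min_len=1, max_len=10):
    """Return the leading prefixes of a lowercased term."""
    term = text.lower()
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]


def build_index(titles):
    """Map every edge n-gram of every word to the titles containing it."""
    index = {}
    for title in titles:
        for word in title.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(title)
    return index


titles = ["Small Faces", "Smashing Pumpkins", "The Smiths"]
index = build_index(titles)

# What the user has typed so far is looked up as a plain term.
matches = sorted(index.get("sma", set()))
print(edge_ngrams("small"))
print(matches)
```

The trade-off is a larger index in exchange for cheap prefix queries, which is why the n-gram length is usually capped.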
  • 9. Tools for Completing / Suggesting
      •  The Suggester
          •  A specialization of the spell-check Solr component
          •  8 implementations to choose from! Different pros/cons
          •  Weighted? Analyzing? Infix? Highlight? Fuzzy? N-gram model?
      •  Faceting with facet.prefix
          •  Respects your current filters, so it never suggests a completion that would return 0 results
      •  Edge n-grams, with standard search
      •  Terms component
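The facet.prefix technique amounts to an ordinary facet request restricted to values starting with what the user typed, with the user's active filters passed along as `fq`. A minimal sketch of the request parameters follows; the field name `name_terms` and the filter `type:artist` are hypothetical, while the parameter names are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def term_completion_params(field, typed, filters=(), limit=10):
    """Request parameters for completing `typed` against the terms of `field`."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                    # only facet counts are needed, no documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", typed.lower()),  # assumes the field holds lowercased terms
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),          # drop completions with no matching documents
    ]
    # Passing the current filters keeps suggestions consistent with the result set.
    params += [("fq", f) for f in filters]
    return params


params = term_completion_params("name_terms", "Sma", filters=["type:artist"])
query_string = urlencode(params)
print(query_string)
```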
  • 10. Sample Suggester Search
      Search:
        http://localhost:8983/solr/mbartists/a_term_suggest?q=sma
      Response:
        {
          "responseHeader":{
            "status":0,
            "QTime":1},
          "spellcheck":{
            "suggestions":[
              "sma",{
                "numFound":4,
                "startOffset":0,
                "endOffset":3,
                "suggestion":[
                  "small",
                  "smart",
                  "smash",
                  "smalley"]},
              "collation","small"]}}
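Note that "suggestions" in this response is a flat list alternating a name and its value, not a map, which is easy to trip over in client code. A small sketch of pulling the completions out, using the slide's response written as valid JSON (the function name `suggestions_for` is made up for illustration):

```python
import json

raw = """
{"responseHeader": {"status": 0, "QTime": 1},
 "spellcheck": {"suggestions": [
     "sma", {"numFound": 4, "startOffset": 0, "endOffset": 3,
             "suggestion": ["small", "smart", "smash", "smalley"]},
     "collation", "small"]}}
"""


def suggestions_for(response, token):
    """Return the completions listed for `token`, or an empty list."""
    entries = response["spellcheck"]["suggestions"]
    # Walk the list two items at a time: (name, value), (name, value), ...
    for name, value in zip(entries[0::2], entries[1::2]):
        if name == token:
            return value["suggestion"]
    return []


response = json.loads(raw)
suggestions = suggestions_for(response, "sma")
print(suggestions)
```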
  • 11. Geospatial Features
      •  Lucene/Solr can index text, numbers, dates, and spatial data
      •  Features:
          •  Index latitude & longitude coordinates or any X Y pairs
          •  Index polygons or other geometry
          •  Query by point-radius, rectangle, or polygon geometry
              •  Including “IsWithin” vs “Intersects” vs “Contains” predicates
          •  2d/flat Euclidean OR geodetic spherical world model
          •  Sort or relevancy-boost by distance to indexed points
      The NoSQL solutions with the best spatial support are CouchDB, MongoDB, Solr, and ElasticSearch
  • 12. How to: Spatial Filter & Sort
      •  Index setup (schema.xml):
           <field name="geo" type="location_rpt" />
      •  Index latitude comma longitude in your document:
           37.7752,-100.0232
      •  Filter & sort:
           http://localhost:8983/solr/collection1/select?q=*:*&fq={!geofilt}&sort=geodist() asc&sfield=geo&pt=45.15,-93.85&d=5
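In the filter above, `pt` is the center point, `sfield` the spatial field, and `d` the radius in kilometers. The sketch below builds those parameters and mimics the point-radius test on the client with the haversine formula. It is an approximation for intuition only: Solr does the real work against its index, and the mean Earth radius and helper names here are assumptions of this example.

```python
import math
from urllib.parse import urlencode

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofilt_params(field, lat, lon, radius_km):
    """Parameters for a point-radius filter sorted by distance, nearest first."""
    return [
        ("q", "*:*"),
        ("fq", "{!geofilt}"),
        ("sfield", field),
        ("pt", f"{lat},{lon}"),
        ("d", str(radius_km)),
        ("sort", "geodist() asc"),
    ]


params = geofilt_params("geo", 45.15, -93.85, 5)
print(urlencode(params))

# The document from the slide (37.7752,-100.0232) is far outside a 5 km circle.
distance = haversine_km(45.15, -93.85, 37.7752, -100.0232)
print(distance > 5)
```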
  • 13. Cool Technology Under the Hood
      •  Grid / tile based recursive indexed structure using a prefix tree / trie indexing approach on standard Lucene inverted index
      •  Future:
          •  Precise indexed shapes
          •  Geodetic polygons
          •  Hilbert curve ordering
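Geohash is one well-known grid encoding with this prefix-tree property: each extra character subdivides the current cell, so a cell's code is a prefix of the codes of all cells inside it, and cells become plain terms in an inverted index. The sketch below is the standard geohash encoding written out for illustration; whether a given Solr field type uses geohash or another grid depends on its configuration, which is not stated in the slides.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=5):
    """Encode a point as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash(42.605, -5.603, precision=3)
fine = geohash(42.605, -5.603, precision=5)
print(coarse, fine)
```

Because the coarse cell's code is a prefix of the fine one, "is this point inside that cell" reduces to a term prefix match.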
  • 14. Scaling Solr
      Solr’s mechanisms for scaling:
      •  Replication
          •  Eliminates single point of failure
          •  Reduces query load on any one node
          •  Backups
      •  Distributed-search (for sharded indexes)
          •  For large collections with many millions of documents
      •  SolrCloud
          •  Combines distributed-search and real-time replicated indexing
          •  Centrally manages configuration
          •  A higher level logical API, manages lots of coordination underneath
          •  Advanced: doc routing, shard splitting, migration
  • 15. Replication & Sharding
      Illustrated with a metaphor of an encyclopedia at a library
      [Figure: three identical sets of volumes A through Z. 26 Shards (one per volume), 3 Replicas (three copies of the full set)]
  • 16. Nice Admin Screen UI
  • 17. More Advanced SolrCloud Features
      •  Document routing customization
          •  Answers: Which shard does a document belong in?
          •  Hash (i.e. random) distribution
          •  Or keep certain related documents together (ex: for same user)
              •  Helps scale when searching by a subset
          •  Or manage it yourself manually (ex: index by month)
      •  Shard splitting
          •  When your shard(s) get to be too big
          •  Live; no down-time
      •  Inter-collection document migration
          •  Copies a subset of one collection to another, possibly new collection
          •  Live; no down-time
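The two routing ideas above, hash distribution and keeping related documents together, can be sketched in a few lines. This is a deliberately simplified stand-in and not SolrCloud's actual router: it uses CRC32 modulo the shard count, whereas SolrCloud assigns hash ranges to shards. The `key!id` convention mirrors the composite-id style of prefixing a document id with a routing key; the ids and function name are invented.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from the routing key (text before '!'), else the whole id."""
    route_key = doc_id.split("!", 1)[0] if "!" in doc_id else doc_id
    # A stable hash spreads keys pseudo-randomly but deterministically.
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


NUM_SHARDS = 4
for doc_id in ["user42!doc1", "user42!doc2", "user7!doc1", "doc99"]:
    print(doc_id, "-> shard", shard_for(doc_id, NUM_SHARDS))
```

All documents sharing the prefix `user42` land on one shard, so a search restricted to that user only needs to query that shard.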
  • 18. That’s all for now; thanks for coming! Need Lucene/Solr guidance or custom development? Contact me: ETA: June 2014