Inside Solr 5 - Bangalore Solr/Lucene Meetup


What's new in Solr 5 and what's in store in the near term

Published in: Software

  1. October 13-15, 2015 • Austin, TX
  2. Inside Apache Solr 5
  3. Apache Solr + Lucidworks: Community, Customers, Products
  4. Search is more than just a box.
  5. Search makes data personal. contextual. actionable.
  6. Search can be smarter. location search history query security context Personal, contextual, relevant results: consumer-like simplicity and power in the enterprise.
  7. Columns: Product Offering, Environment, Features, Support Level, Additional Support, Availability, Response Time, Number of Incidents, Pricing Model
     • Solr Enterprise (Production): 24x7, SLA-Backed, Unlimited Incidents, Per Node. Additional support: Dev Support (4 Contacts), Operational Support, Regular Health Checks. Features: Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI
     • Fusion (Production): 24x7, SLA-Backed, Unlimited Incidents, Per Node. Additional support: Dev Support (4 Contacts), Operational Support, Regular Health Checks. Features: Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning
     • Developer Support (Development): 9x5, SLA-Backed, Unlimited Incidents, Per Named Developer. How-To Support, Knowledge Base, Fusion Support
  8. Inside Apache Solr 5
     • Get Started
     • Dig In
     • Go Big
     • Get Finished
     • Sneak peek
  9. Get Started
     • Easy to start/stop: ./bin/solr {start|stop}
     • Create collections: ./bin/solr create -c <COLL_NAME>
     • No more WAR! The web container (Jetty) is now an implementation detail
     • Scripts to support installing and running Solr as a service on Linux
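The start and create steps above are easy to script. A minimal sketch, assuming Solr 5 is unpacked under /opt/solr and listens on the default port 8983 (both are assumptions, as are the helper names); it only builds the command lines, and `subprocess.run(argv, check=True)` would execute them against a real install:

```python
from typing import List

# Assumption: location of the unpacked Solr 5 distribution.
SOLR_DIR = "/opt/solr"


def solr_command(*args: str) -> List[str]:
    """Build the argv for one bin/solr invocation."""
    return [f"{SOLR_DIR}/bin/solr", *args]


def start(port: int = 8983) -> List[str]:
    # Equivalent to: ./bin/solr start -p <port>
    return solr_command("start", "-p", str(port))


def stop(port: int = 8983) -> List[str]:
    # Equivalent to: ./bin/solr stop -p <port>
    return solr_command("stop", "-p", str(port))


def create_collection(name: str) -> List[str]:
    # Equivalent to: ./bin/solr create -c <COLL_NAME>
    return solr_command("create", "-c", name)


# Print the commands instead of running them, so the sketch is safe
# to execute on a machine without Solr installed.
for argv in (start(), create_collection("demo"), stop()):
    print(" ".join(argv))
```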
  10. Your Content, Your Way
     JSON’s great:
     • Solr 5 “does the right thing” for JSON out of the box
     Except when it isn’t:
     • Most data isn’t JSON
     • Solr handles CSV, XML, and rich content out of the box, without having to install plugins
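To illustrate the point about JSON and CSV, here is a rough sketch of sending both formats to the same /update endpoint and letting the Content-Type header select the parser. The collection name "demo", the field names, and the local URL are assumptions. The requests are only constructed here; `urllib.request.urlopen(req)` would send them to a running Solr:

```python
import json
import urllib.request

# Assumption: a local Solr on the default port.
SOLR_URL = "http://localhost:8983/solr"


def json_update(collection, docs):
    """Build a POST that indexes a list of JSON documents."""
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def csv_update(collection, csv_text):
    """Build a POST that indexes CSV rows (first row holds field names)."""
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/update?commit=true",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "application/csv"},
    )


json_req = json_update("demo", [{"id": "1", "title": "Inside Solr 5"}])
csv_req = csv_update("demo", "id,title\n2,Lucene 5 Highlights\n")

for req in (json_req, csv_req):
    print(req.get_method(), req.full_url, req.get_header("Content-type"))
```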
  11. Your Content, Your Way
     • Solr 5 will ship Tika 1.7, adding:
     • OCR support
     • PST and Matlab
     • Better date handling
     • More flexibility with spatial units
  12. Dig In
  13. Pivots and Stats
     • Stats and pivot faceting now work together
     • Focused on accuracy of results
     • First few steps in unification of all facet types with stats and aggregations
     • got-stats-in-my-facets/
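As a sketch of how stats and pivot facets are tied together in a request: the stats field is given a tag, and the pivot refers to that tag through a `stats` local param. The field names (`cat`, `inStock`, `price`) and the tag name are assumptions borrowed from a typical product catalog, not from the slides:

```python
from urllib.parse import parse_qs, urlencode


def pivot_stats_params(pivot_fields, stats_field, tag="piv"):
    """Build query params that compute stats inside each pivot bucket."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("stats", "true"),
        # Tag the stats field so the pivot can reference it.
        ("stats.field", f"{{!tag={tag}}}{stats_field}"),
        ("facet", "true"),
        # Ask the pivot to attach the tagged stats to every bucket.
        ("facet.pivot", f"{{!stats={tag}}}{','.join(pivot_fields)}"),
    ]


params = pivot_stats_params(["cat", "inStock"], "price")
query_string = urlencode(params)
print(query_string)
```

Appending the resulting query string to a collection's /select URL would return, for each `cat` / `inStock` bucket, the statistics of `price` for the documents in that bucket.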
  14. API Goodness
     • Schema API: REST API for adding field types and dynamic fields
     • Managing request handlers through an API
     • Implicit registration of the replication, Real Time Get, and administration handlers
     • Improved APIs for managing collections
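A hedged sketch of what a Schema API call can look like: one POST to a collection's /schema endpoint carrying `add-field-type` and `add-dynamic-field` commands. The collection name, the type name `text_demo`, and the `*_demo` pattern are made up for illustration, and the collection is assumed to use a managed (mutable) schema. The request is built but not sent:

```python
import json
import urllib.request

# Assumption: a local Solr on the default port.
SOLR_URL = "http://localhost:8983/solr"


def schema_request(collection, commands):
    """Build a POST to the Schema API with one or more commands."""
    return urllib.request.Request(
        f"{SOLR_URL}/{collection}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


commands = {
    # Hypothetical field type with a standard tokenizer.
    "add-field-type": {
        "name": "text_demo",
        "class": "solr.TextField",
        "analyzer": {"tokenizer": {"class": "solr.StandardTokenizerFactory"}},
    },
    # Any field whose name ends in _demo gets the new type.
    "add-dynamic-field": {
        "name": "*_demo",
        "type": "text_demo",
        "stored": True,
    },
}

schema_req = schema_request("demo", commands)
print(schema_req.get_method(), schema_req.full_url)
```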
  15. Lucene 5 Highlights
     • Stronger index safety guarantees
     • Reduced memory usage in a number of areas
     • No more FieldCache (replaced with UninvertingReader)
     • Multi-valued sorting and suggesters
     • Better IO defaults when using SSDs
     • More efficient handling of merging stored fields
  16. Go Big
     • Many scaling improvements focused on interactions with ZooKeeper:
        • Split cluster state management reduces chattiness in large multi-tenant implementations
        • Performance of Overseer operations improved by more than 40%
        • Better timeout defaults based on real-world testing
     • See my Lucene Revolution Keynote for more details:
  17. Distributed IDF
     • IDF = Inverse Document Frequency = a measure of the relative importance of a word in a collection
     • 4 implementations:
        • LocalStatsCache: local stats
        • ExactStatsCache: one-time-use aggregation
        • ExactSharedStatsCache: stats shared across requests
        • LRUStatsCache: stats shared in an LRU cache across requests
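A small worked example of why distributed IDF matters. It uses the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and two made-up shards in which the same term is rare on one and common on the other. With local stats each shard scores the term differently; aggregating the counts first, which is conceptually what the exact stats caches do, gives one shared value. The shard names and counts are invented for illustration:

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for a single term.
shards = {
    "shard1": {"num_docs": 1000, "doc_freq": 10},   # term is rare here
    "shard2": {"num_docs": 1000, "doc_freq": 500},  # term is common here
}

# Local stats: each shard computes IDF from its own counts only.
local_idf = {
    name: classic_idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()
}

# Global stats: sum the counts across shards, then compute IDF once.
total_docs = sum(s["num_docs"] for s in shards.values())
total_df = sum(s["doc_freq"] for s in shards.values())
global_idf = classic_idf(total_df, total_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local stats, a match from shard1 gets a much larger IDF than an equally good match from shard2, so merged rankings are skewed toward the shard where the term happens to be rare.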
  18. Get Finished
     • Ease of getting started means nothing if you can’t stay running in production
     • Jepsen tests simulate network partitions, data loss, i.e. “The Real World”
     • LucidWorks/jepsen/tree/solr-jepsen
  19. Stability Improvements
     • Protection of ZK content
     • ReplicationHandler now has an option to throttle the speed of replication
     • More control over terminating long-running queries
     • Finite default timeouts for select and update requests
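One way to bound a long-running query is the `timeAllowed` request parameter (in milliseconds); whether this is the exact mechanism the slide has in mind is an assumption. The sketch below builds such a query and checks a response for the `partialResults` flag in the response header. The sample response is hand-written for illustration, not captured from a real server:

```python
from urllib.parse import parse_qs, urlencode


def select_params(query, time_allowed_ms):
    """Build /select params that cap the time spent on the search."""
    return urlencode({"q": query, "wt": "json", "timeAllowed": str(time_allowed_ms)})


def is_partial(response):
    """Return True if the response header marks the results as partial."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


query_string = select_params("title:solr", 500)

# Hand-written sample of a response that was cut short (illustrative only).
sample_response = {
    "responseHeader": {"status": 0, "partialResults": True},
    "response": {"numFound": 0, "docs": []},
}

print(query_string)
print("partial:", is_partial(sample_response))
```

A client that sees partial results can retry with a larger budget or show the user what it has, instead of letting one expensive query tie up the node.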
  21. Near Term Road Map
     • Facets and Analytics:
        • Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)
        • Percentiles via t-digest (SOLR-6350)
     • Replication performance (SOLR-6816)
     • Finish off Config APIs (various)
     • Data-location-aware ValueSource implementation for fast-changing distributed data
     • First-class support for more languages OOTB
  22. Resources
     Release Notes:
     • Solr:
     • Lucene: ReleaseNote50
     Lucidworks:
     Shalin Shekhar Mangar
     • Twitter:
  23. Credits
     • What’s new in Solr 5.0 - Anshum Gupta
     • Lucidworks webinar “Inside Solr 5” - Grant Ingersoll
     • apache-solr-5