October 13-15, 2015 • Austin, TX
http://lucenerevolution.org
Inside Apache Solr 5
COMMUNITY
CUSTOMERS PRODUCTS
Apache Solr +
Lucidworks
Search is more than just a box.
Search makes data personal, contextual, actionable.
Search can be smarter.
location search history query security context
Personal, contextual, relevant results: consumer-like simplicity and power in the enterprise.
Product Offering
Columns: Environment, Features, Support Level, Additional Support, Availability, Response Time, Number of Incidents, Pricing Model
Environments: Production, Development

Solr Enterprise
• Availability: 24x7
• Response Time: SLA-Backed
• Number of Incidents: Unlimited
• Pricing Model: Per Node
• Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks
• Features: Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI

Fusion
• Availability: 24x7
• Response Time: SLA-Backed
• Number of Incidents: Unlimited
• Pricing Model: Per Node
• Support: Dev Support (4 Contacts), Operational Support, Regular Health Checks
• Features: Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning

Developer Support
• Availability: 9x5
• Response Time: SLA-Backed
• Number of Incidents: Unlimited
• Pricing Model: Per Named Developer
• Support: How-To Support, Knowledge Base, Fusion Support
Inside Apache Solr 5
• Get Started
• Dig In
• Go Big
• Get Finished
• Sneak Peek
Get Started
• Easy to start/stop:
./bin/solr {start|stop}
• Create collections:
./bin/solr create -c <COLL_NAME>
• No more WAR! The web container (Jetty) is now an implementation detail
• Scripts to support installing and running Solr as a service on Linux
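The two commands above can be scripted. The following is a minimal sketch, assuming the script runs from the Solr install directory so that ./bin/solr resolves; the helper names and the collection name "demo" are hypothetical and used only for illustration. By default it only builds the commands (dry run) and does not execute them.

```python
import subprocess
from typing import List

SOLR_BIN = "./bin/solr"  # path as shown on the slide; adjust to your install


def solr_command(action: str, *args: str) -> List[str]:
    """Build the argument list for one bin/solr invocation."""
    return [SOLR_BIN, action, *args]


def bootstrap(collection: str, dry_run: bool = True) -> List[List[str]]:
    """Start Solr, then create a collection; returns the commands in order."""
    commands = [
        solr_command("start"),
        solr_command("create", "-c", collection),
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if bin/solr exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "demo" is a hypothetical collection name
    for cmd in bootstrap("demo"):
        print(" ".join(cmd))
```

Passing dry_run=False would actually invoke the script, which is only meaningful on a machine with Solr 5 unpacked.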
Your Content, Your Way
JSON’s great:
• Solr 5 “does the right thing” for JSON out of the box
Except when it isn’t:
• Most data isn’t JSON
• Solr handles CSV, XML, and rich content out of the box, without having to install plugins
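As a rough illustration of sending JSON and CSV to the same update handler, here is a sketch that builds (but does not send) the HTTP requests. It assumes a default local install at http://localhost:8983/solr and a hypothetical collection named "demo"; the helper names are made up for this example.

```python
import json
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed default local install


def update_request(collection: str, body: str, content_type: str) -> urllib.request.Request:
    """Build a POST to the collection's /update handler with an immediate commit."""
    url = f"{SOLR_URL}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def json_update(collection: str, docs: list) -> urllib.request.Request:
    """Index a list of documents expressed as JSON objects."""
    return update_request(collection, json.dumps(docs), "application/json")


def csv_update(collection: str, csv_text: str) -> urllib.request.Request:
    """Index rows of CSV; the first line names the fields."""
    return update_request(collection, csv_text, "application/csv")


if __name__ == "__main__":
    # Hypothetical documents; nothing is sent over the network here.
    req = json_update("demo", [{"id": "1", "title": "Inside Solr 5"}])
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Only the Content-Type header differs between the two formats; the endpoint stays the same.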
Your Content, Your Way
• Solr 5 will ship Tika 1.7, adding:
• OCR support
• PST and Matlab
• Better Date Handling
• More flexibility with spatial units
Dig In
Pivots and Stats
• Stats and pivot faceting now work together
• Focused on accuracy of results
• First few steps in unification of all facet types with stats and aggregations
• http://lucidworks.com/blog/you-got-stats-in-my-facets/
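A sketch of what a combined request might look like: a stats.field is given a tag, and the pivot facet refers to that tag so stats are computed per pivot bucket. The field names (cat, inStock, price) and the tag name are assumptions for illustration, not taken from the slides.

```python
from urllib.parse import urlencode


def pivot_stats_params(pivot_fields, stats_field, tag="piv1"):
    """Build query parameters linking a tagged stats.field to a pivot facet."""
    pivot_spec = ",".join(pivot_fields)
    return [
        ("q", "*:*"),
        ("rows", "0"),  # only facet/stats output is of interest
        ("stats", "true"),
        ("stats.field", "{!tag=%s}%s" % (tag, stats_field)),
        ("facet", "true"),
        ("facet.pivot", "{!stats=%s}%s" % (tag, pivot_spec)),
    ]


if __name__ == "__main__":
    # Hypothetical fields: pivot on category then stock status, stats on price
    params = pivot_stats_params(["cat", "inStock"], "price")
    print("/select?" + urlencode(params))
```

The tag is what ties the two features together; without it the stats would be computed once over the whole result set rather than per pivot value.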
API Goodness
• Schema API: REST API for adding field types and dynamic fields
• Managing request handlers through an API
• Implicit registration of the replication, Real Time Get, and administration handlers
• Improved APIs for managing collections
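To make the Schema API bullet concrete, here is a sketch of a JSON body that adds one field type and one dynamic field in a single request. It would be POSTed to the collection's /schema endpoint of a managed schema; the type name, analyzer choice and field pattern below are hypothetical.

```python
import json


def add_field_type(name: str, cls: str, **props) -> dict:
    """Definition for an add-field-type command."""
    return {"name": name, "class": cls, **props}


def add_dynamic_field(pattern: str, field_type: str, stored: bool = True) -> dict:
    """Definition for an add-dynamic-field command."""
    return {"name": pattern, "type": field_type, "stored": stored}


def schema_payload() -> str:
    """Serialize both commands into one request body."""
    commands = {
        "add-field-type": add_field_type(
            "text_simple",  # hypothetical type name
            "solr.TextField",
            analyzer={"tokenizer": {"class": "solr.StandardTokenizerFactory"}},
        ),
        "add-dynamic-field": add_dynamic_field("*_simple", "text_simple"),
    }
    return json.dumps(commands, indent=2)


if __name__ == "__main__":
    # Body for: POST http://localhost:8983/solr/<collection>/schema
    print(schema_payload())
```

Sending both commands in one body keeps the type and the field that depends on it in a single change.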
Lucene 5 Highlights
• Stronger index safety guarantees
• Reduced memory usage in a number of areas
• No more FieldCache (replaced w/
UninvertingReader)
• Multi-valued sorting and suggesters
• Better IO defaults when using SSDs
• More efficient handling of merging stored fields
Go Big
• Many scaling improvements focused on interactions with ZooKeeper:
• Split cluster state management reduces chattiness in large multi-tenant implementations
• Overseer operations are more than 40% faster
• Better timeout defaults based on real-world testing
• See my Lucene Revolution Keynote for more details:
http://bit.ly/shalinRevKeynote
Distributed IDF
• IDF = Inverse Document Frequency = A measure of the
relative importance of a word in a collection
• 4 implementations:
• LocalStatsCache: Local Stats
• ExactStatsCache: One time use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across
requests
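Why distributed IDF matters can be shown with a little arithmetic. The sketch below assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and entirely made-up shard statistics; it compares the IDF each shard would compute from local stats with the IDF computed from stats aggregated across shards.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic TF-IDF style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one term: shard -> (docFreq, numDocs)
SHARDS = {"shard1": (10, 1000), "shard2": (500, 1000)}


def local_idfs(shards: dict) -> dict:
    """IDF as each shard sees it using only its own statistics."""
    return {name: classic_idf(df, n) for name, (df, n) in shards.items()}


def global_idf(shards: dict) -> float:
    """IDF from document frequency and document count summed over all shards."""
    total_df = sum(df for df, _ in shards.values())
    total_docs = sum(n for _, n in shards.values())
    return classic_idf(total_df, total_docs)


if __name__ == "__main__":
    for name, idf in local_idfs(SHARDS).items():
        print(f"{name}: local idf = {idf:.3f}")
    print(f"global idf = {global_idf(SHARDS):.3f}")
```

With local stats the same term is weighted very differently on the two shards, so scores from different shards are not directly comparable; aggregated stats give every shard the same weight for the term.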
Get Finished
• Ease of getting started means nothing if you can’t stay running in production
• Jepsen tests simulate network partitions, data loss, i.e. “The Real World”
• https://github.com/LucidWorks/jepsen/tree/solr-jepsen
• http://bit.ly/solr-jepsen
Stability Improvements
• Protection of ZK content
• ReplicationHandler now has an option to throttle the
speed of replication
• More control over terminating long running queries
• Finite default timeouts for select and update requests
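One way to bound a long-running query is the timeAllowed request parameter (in milliseconds). The sketch below builds such a request URL and checks a parsed JSON response for the partialResults flag in the response header; the base URL, collection name and query are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed default local install


def select_url(collection: str, query: str, time_allowed_ms: int) -> str:
    """Build a /select URL that asks Solr to stop searching after a time budget."""
    params = {"q": query, "wt": "json", "timeAllowed": str(time_allowed_ms)}
    return f"{SOLR_URL}/{collection}/select?{urlencode(params)}"


def is_partial(response: dict) -> bool:
    """True if the response header reports that the time budget cut the search short."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


if __name__ == "__main__":
    # Hypothetical collection and query
    print(select_url("demo", "title:solr", 500))
```

A client that sees partial results can decide whether to show them as-is or retry with a larger budget.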
WELCOME TO THE FUTURE
Near Term Road Map
• Facets and Analytics:
• Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)
• Percentiles via t-digest (SOLR-6350)
• Replication performance (SOLR-6816)
• Finish off Config APIs (various)
• Data location aware ValueSource implementation for fast changing distributed data
• First class support for more languages OOTB
Resources
Release Notes:
• Solr: http://wiki.apache.org/solr/ReleaseNote50
• Lucene: https://wiki.apache.org/lucene-java/ReleaseNote50
Lucidworks: http://www.lucidworks.com
Shalin Shekhar Mangar
• shalin@apache.org
• Twitter: https://twitter.com/shalinmangar
Credits
What’s new in Solr 5.0 — Anshum Gupta
• http://www.slideshare.net/anshumg/solr-50
Lucidworks webinar “Inside Solr 5” - Grant Ingersoll
• http://www.slideshare.net/lucidworks/webinar-inside-apache-solr-5
Inside Solr 5 - Bangalore Solr/Lucene Meetup
