SlideShare a Scribd company logo
1 of 19
Lucene Performance
        Workshop




Lucid Imagination, Inc.




                          Lucid Imagination, Inc.   1
Intro


About the speaker and Lucid Imagination
Agenda
 Lucene and performance
 Lucid Gaze for Lucene: UI and API
 Key statistics
 Examples
 Q & A session




                           Lucid Imagination, Inc.




                                                     Lucid Imagination, Inc.   2
Lucene and performance
  Perceived performance issues can have different causes
  Classic JVM problems, classic solutions
   heap size
   garbage collection
   stack size
   HotSpot
  Lucene/Search-related issues: beyond JVM tuning
    Indexing performance: indexing too slow, strange
    slowdowns during indexing
    Search performance: search too slow in general, or for
                           Lucid Imagination, Inc.



    certain types of queries


                                            Lucid Imagination, Inc.   3
Common Lucene performance issues


   Indexing:
     Too many segments being created
     Too many Token-s / TokenStream-s
     Too many Documents / Fields
   Searching:
     Too many IndexReader-s / IndexSearcher-s
     High RAM usage of IndexReader
     Slow response times for certain queries
   Application-level logging may not be up to the task
   Profiler is too low-level and too intrusive
                               Lucid Imagination, Inc.

   We need a lightweight probe to peek at vital Lucene statistics


                                                         Lucid Imagination, Inc.   4
Lucid Gaze for Lucene (LG4L)
Target audience and applications
  Tool for developers
  Performance monitoring
  Statistics collection
  Drop-in replacement for lucene-core-2.4.1.jar




                           Lucid Imagination, Inc.




                                                     Lucid Imagination, Inc.   5
Available information

Statistics (per time unit):
  IndexReader / IndexWriter:
    Number of documents and fields retrieved / created
    Number of IndexWriter / IndexReader / Directory instances created
      And the number of live instances!
    Memory consumption of IndexWriter / IndexReader instances
  Analysis:
    Number of Analyzers / TokenFilters / Tokenizers
    Number of TokenStream-s and Token-s
  Search:
    Number of searches and their average time
                            Lucid Imagination, Inc.
    Number of opened IndexSearcher-s
  Storage:
      Number of Lucene Directory instances created
                                                      Lucid Imagination, Inc.   6
Available information: metrics

Lists and histograms
  Count and a list of Analyzer, Tokenizer, TokenFilter
  instances
  Directory implementations
  Top-N queries:
    Queries with largest numbers of hits
    Queries that took longest to execute


  All this data is available as log, persistent DB and through the API

                              Lucid Imagination, Inc.




                                                        Lucid Imagination, Inc.   7
In-memory and RRD storage

Retaining historical values of collected statistics
  In-memory
    No files, no configuration hassles
    Concise overview periodically written to log (optional)
    Uses Java logging
  RRD (Round-Robin Database)
    Persistent round-robin database
        Single database of a constant size
        E.g. hourly, daily, weekly, monthly, yearly statistics
    Suitable for long-term monitoring
    Many more metrics and statistics tracked
    Can be accessed concurrentlyImagination,other applications
                             Lucid
                                   from Inc.




                                                                 Lucid Imagination, Inc.   8
Configuration

Java properties or gaze.properties
  List of properties supplied as -Dlucid.gaze...
  gaze.properties on classpath
  Configurability:
    Turning on/off selected monitors
    Producing debug output
    Using in-memory or RRD log retention
    Configuring RRD archives (to scale historical data over different
    periods)


                               Lucid Imagination, Inc.




                                                         Lucid Imagination, Inc.   9
API

Facade with static methods: LuceneCore
  Programmatic access to all statistics groups
  Retrieve top-N queries
  Retrieving additional metrics (e.g. histograms of analyzers,
  tokenizers, directory implementations, tracking of IndexReader /
  IndexWriter instances and their memory consumption)
  Enabling / disabling monitors
  Resetting statistics (useful for creating snapshots)


                             Lucid Imagination, Inc.




                                                       Lucid Imagination, Inc.   10
Example: indexing performance tuning

 Based on the contrib/benchmark suite
   Test impact of number of buffered docs
   Other interesting observations
     Number of documents / fields
     Number of tokens / token streams
     Number of IndexReader / Directory instances
     Number of IndexSearchers




                              Lucid Imagination, Inc.




                                                        Lucid Imagination, Inc.   11
Example: console output




                Lucid Imagination, Inc.




                                          Lucid Imagination, Inc.   12
Example: RRD Inspector




               Lucid Imagination, Inc.




                                         Lucid Imagination, Inc.   13
RRD Inspector (2)




             Lucid Imagination, Inc.




                                       Lucid Imagination, Inc.   14
RRD Inspector (3)




             Lucid Imagination, Inc.




                                       Lucid Imagination, Inc.   15
Performance impact of
     performance monitoring
Overhead of using LG4L
  Benchmarks (in contrib/benchmark) slower by ~10-15% on
  average, memory consumption higher by ~10%


  Remember: you can turn off some monitors!




                          Lucid Imagination, Inc.




                                                    Lucid Imagination, Inc.   16
Conclusions


Lucene can perform fantastically
   ... but it can't outmaneuver sub-optimal design or weak
  configuration
LG4L helps to understand the causes of poor performance
  Insight into high-level statistics that relate to Lucene API
  Round-robin database for tracking historical data
  LG4L is lightweight!




                              Lucid Imagination, Inc.




                                                        Lucid Imagination, Inc.   17
Q&A




Download and documentation:
  http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene
                           Lucid Imagination, Inc.




                                                     Lucid Imagination, Inc.   18
Example: LG4L with Solr


INFO: * AnalysisStats:
INFO: * AnalysisStats:
INFO:
INFO:    counters: {toks=258}
         counters: {toks=258}
INFO:
INFO:    metrics: {tns={=2, WhitespaceTokenizer=8},
         metrics: {tns={=2,
tfs={WordDelimiterFilter=8, StopFilter=8, SynonymFilter=8, LowerCaseFilter=8,
tfs={WordDelimiterFilter=8,                                  LowerCaseFilter=8,
EnglishPorterFilter=8},
EnglishPorterFilter=8},
ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1,
ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1,
org.apache.solr.analysis.TokenizerChain=6,
org.apache.solr.analysis.TokenizerChain=6,
org.apache.solr.schema.FieldType$DefaultAnalyzer=13,
org.apache.lucene.analysis.WhitespaceAnalyzer=1,
org.apache.lucene.analysis.WhitespaceAnalyzer=1,
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer=1}}
INFO: * DocumentStats:
INFO: * DocumentStats:
INFO:
INFO:    counters: {docs=1, fields=20}
         counters: {docs=1,
INFO: * IndexStats:
INFO: * IndexStats:
INFO:
INFO:    counters: {ir_isdC=1, ir_C=4, iw_C=0, ir_newC=7, ir_tpC=12,
         counters: {ir_isdC=1,              iw_C=0,
iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0}
iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0}
INFO: * SearchStats:
INFO: * SearchStats:
INFO:
INFO:    counters: {dfC=11, rwrC=3, rwrT=30265, srchrC=14, srchT=90133076,
         counters: {dfC=11,           rwrT=30265, srchrC=14,
srchC=6}
srchC=6}
INFO: * StoreStats:
INFO: * StoreStats:
INFO:
INFO:    counters: {dirC=8}
         counters: {dirC=8}       Lucid Imagination, Inc.
INFO:
INFO:    metrics: {dir_t={FSDirectory=8}}
         metrics: {dir_t={FSDirectory=8}}


 … but of course you should use LucidGaze for Solr instead!

                                                        Lucid Imagination, Inc.   19

More Related Content

Viewers also liked

across the universe
across the universeacross the universe
across the universetanica
 
"Search, APIs,Capability Management and the Sensis Journey"
"Search, APIs,Capability Management and the Sensis Journey""Search, APIs,Capability Management and the Sensis Journey"
"Search, APIs,Capability Management and the Sensis Journey"Lucidworks (Archived)
 
Hellosong
HellosongHellosong
Hellosongtanica
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombsstanica
 
20101023 ie9 cache
20101023 ie9 cache20101023 ie9 cache
20101023 ie9 cache彰 村地
 
Maroon5
Maroon5Maroon5
Maroon5tanica
 
Speed Up Web 2012
Speed Up Web 2012Speed Up Web 2012
Speed Up Web 2012彰 村地
 
Tate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceTate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceLucidworks (Archived)
 
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...Technology opportunities in hampton roads (kaszubowski ), nasa technology day...
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...Marty Kaszubowski
 
I love you mommy
I love you mommyI love you mommy
I love you mommyNyiah
 
Highly Relevant Search Result Ranking for Law Enforcement
Highly Relevant Search Result Ranking for Law EnforcementHighly Relevant Search Result Ranking for Law Enforcement
Highly Relevant Search Result Ranking for Law EnforcementLucidworks (Archived)
 
Impact of open source search on the intelligence community
Impact of open source search on the intelligence communityImpact of open source search on the intelligence community
Impact of open source search on the intelligence communityLucidworks (Archived)
 
Getting started with Lucidworks Enterprise
Getting started with Lucidworks EnterpriseGetting started with Lucidworks Enterprise
Getting started with Lucidworks EnterpriseLucidworks (Archived)
 

Viewers also liked (17)

What’s new in apache lucene 3.0
What’s new in apache lucene 3.0What’s new in apache lucene 3.0
What’s new in apache lucene 3.0
 
across the universe
across the universeacross the universe
across the universe
 
Juan gris
Juan grisJuan gris
Juan gris
 
Joan Miro
Joan MiroJoan Miro
Joan Miro
 
"Search, APIs,Capability Management and the Sensis Journey"
"Search, APIs,Capability Management and the Sensis Journey""Search, APIs,Capability Management and the Sensis Journey"
"Search, APIs,Capability Management and the Sensis Journey"
 
Hellosong
HellosongHellosong
Hellosong
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombss
 
20101023 ie9 cache
20101023 ie9 cache20101023 ie9 cache
20101023 ie9 cache
 
Maroon5
Maroon5Maroon5
Maroon5
 
Speed Up Web 2012
Speed Up Web 2012Speed Up Web 2012
Speed Up Web 2012
 
Tate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceTate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search Experience
 
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...Technology opportunities in hampton roads (kaszubowski ), nasa technology day...
Technology opportunities in hampton roads (kaszubowski ), nasa technology day...
 
I love you mommy
I love you mommyI love you mommy
I love you mommy
 
Highly Relevant Search Result Ranking for Law Enforcement
Highly Relevant Search Result Ranking for Law EnforcementHighly Relevant Search Result Ranking for Law Enforcement
Highly Relevant Search Result Ranking for Law Enforcement
 
Impact of open source search on the intelligence community
Impact of open source search on the intelligence communityImpact of open source search on the intelligence community
Impact of open source search on the intelligence community
 
Getting started with Lucidworks Enterprise
Getting started with Lucidworks EnterpriseGetting started with Lucidworks Enterprise
Getting started with Lucidworks Enterprise
 
Solr & Lucene at Etsy
Solr & Lucene at EtsySolr & Lucene at Etsy
Solr & Lucene at Etsy
 

Similar to Understanding Lucene Search Performance

Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrLucidworks (Archived)
 
December 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopDecember 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopYahoo Developer Network
 
SplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's PizzaSplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's PizzaSplunk
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsDamien Dallimore
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Dan Cundiff
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical OverviewDavid Lutz
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackRohit Sharma
 
2015 03-16-elk at-bsides
2015 03-16-elk at-bsides2015 03-16-elk at-bsides
2015 03-16-elk at-bsidesJeremy Cohoe
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionSplunk
 
SplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.comSplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.comSplunk
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsDigitalOcean
 
Stabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerStabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerPradeep Kilambi
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga towerGordon Chung
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksLucidworks
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
 
Splunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT OperationsSplunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT OperationsTimur Bagirov
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 

Similar to Understanding Lucene Search Performance (20)

Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Getting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for SolrGetting started faster with LucidWorks for Solr
Getting started faster with LucidWorks for Solr
 
December 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over HadoopDecember 2013 HUG: Hunk - Splunk over Hadoop
December 2013 HUG: Hunk - Splunk over Hadoop
 
SplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's PizzaSplunkLive! Detroit April 2013 - Domino's Pizza
SplunkLive! Detroit April 2013 - Domino's Pizza
 
Integrating Splunk into your Spring Applications
Integrating Splunk into your Spring ApplicationsIntegrating Splunk into your Spring Applications
Integrating Splunk into your Spring Applications
 
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
Splunk All the Things: Our First 3 Months Monitoring Web Service APIs - Splun...
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical Overview
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK Stack
 
2015 03-16-elk at-bsides
2015 03-16-elk at-bsides2015 03-16-elk at-bsides
2015 03-16-elk at-bsides
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
SplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.comSplunkLive! Salt Lake City June 2013 - Ancestry.com
SplunkLive! Salt Lake City June 2013 - Ancestry.com
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Stabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out CeilometerStabilizing the Jenga tower: Scaling out Ceilometer
Stabilizing the Jenga tower: Scaling out Ceilometer
 
Stabilising the jenga tower
Stabilising the jenga towerStabilising the jenga tower
Stabilising the jenga tower
 
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Game Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid Meetup
 
Splunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT OperationsSplunk in Nordstrom: IT Operations
Splunk in Nordstrom: IT Operations
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Using the Splunk Java SDK
Using the Splunk Java SDKUsing the Splunk Java SDK
Using the Splunk Java SDK
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 

More from Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 

More from Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Understanding Lucene Search Performance

  • 1. Lucene Performance Workshop Lucid Imagination, Inc. Lucid Imagination, Inc. 1
  • 2. Intro About the speaker and Lucid Imagination Agenda Lucene and performance Lucid Gaze for Lucene: UI and API Key statistics Examples Q & A session Lucid Imagination, Inc. Lucid Imagination, Inc. 2
  • 3. Lucene and performance Perceived performance issues can have different causes Classic JVM problems, classic solutions heap size garbage collection stack size HotSpot Lucene/Search-related issues: beyond JVM tuning Indexing performance: indexing too slow, strange slowdowns during indexing Search performance: search too slow in general, or for Lucid Imagination, Inc. certain types of queries Lucid Imagination, Inc. 3
  • 4. Common Lucene performance issues Indexing: Too many segments being created Too many Token-s / TokenStream-s Too many Documents / Fields Searching: Too many IndexReader-s / IndexSearcher-s High RAM usage of IndexReader Slow response times for certain queries Application-level logging may not be up to the task Profiler is too low-level and too intrusive Lucid Imagination, Inc. We need a lightweight probe to peek at vital Lucene statistics Lucid Imagination, Inc. 4
  • 5. Lucid Gaze for Lucene (LG4L) Target audience and applications Tool for developers Performance monitoring Statistics collection Drop-in replacement for lucene-core-2.4.1.jar Lucid Imagination, Inc. Lucid Imagination, Inc. 5
  • 6. Available information Statistics (per time unit): IndexReader / IndexWriter: Number of documents and fields retrieved / created Number of IndexWriter / IndexReader / Directory instances created And the number of live instances! Memory consumption of IndexWriter / IndexReader instances Analysis: Number of Analyzers / TokenFilters / Tokenizers Number of TokenStream-s and Token-s Search: Number of searches and their average time Lucid Imagination, Inc. Number of opened IndexSearcher-s Storage: Number of Lucene Directory instances created Lucid Imagination, Inc. 6
  • 7. Available information: metrics Lists and histograms Count and a list of Analyzer, Tokenizer, TokenFilter instances Directory implementations Top-N queries: Queries with largest numbers of hits Queries that took longest to execute All this data is available as log, persistent DB and through the API Lucid Imagination, Inc. Lucid Imagination, Inc. 7
  • 8. In-memory and RRD storage Retaining historical values of collected statistics In-memory No files, no configuration hassles Concise overview periodically written to log (optional) Uses Java logging RRD (Round-Robin Database) Persistent round-robin database Single database of a constant size E.g. hourly, daily, weekly, monthly, yearly statistics Suitable for long-term monitoring Many more metrics and statistics tracked Can be accessed concurrentlyImagination,other applications Lucid from Inc. Lucid Imagination, Inc. 8
  • 9. Configuration Java properties or gaze.properties List of properties supplied as -Dlucid.gaze... gaze.properties on classpath Configurability: Turning on/off selected monitors Producing debug output Using in-memory or RRD log retention Configuring RRD archives (to scale historical data over different periods) Lucid Imagination, Inc. Lucid Imagination, Inc. 9
  • 10. API Facade with static methods: LuceneCore Programmatic access to all statistics groups Retrieve top-N queries Retrieving additional metrics (e.g. histograms of analyzers, tokenizers, directory implementations, tracking of IndexReader / IndexWriter instances and their memory consumption) Enabling / disabling monitors Resetting statistics (useful for creating snapshots) Lucid Imagination, Inc. Lucid Imagination, Inc. 10
  • 11. Example: indexing performance tuning Based on the contrib/benchmark suite Test impact of number of buffered docs Other interesting observations Number of documents / fields Number of tokens / token streams Number of IndexReader / Directory instances Number of IndexSearchers Lucid Imagination, Inc. Lucid Imagination, Inc. 11
  • 12. Example: console output Lucid Imagination, Inc. Lucid Imagination, Inc. 12
  • 13. Example: RRD Inspector Lucid Imagination, Inc. Lucid Imagination, Inc. 13
  • 14. RRD Inspector (2) Lucid Imagination, Inc. Lucid Imagination, Inc. 14
  • 15. RRD Inspector (3) Lucid Imagination, Inc. Lucid Imagination, Inc. 15
  • 16. Performance impact of performance monitoring Overhead of using LG4L Benchmarks (in contrib/benchmark) slower by ~10-15% on average, memory consumption higher by ~10% Remember: you can turn off some monitors! Lucid Imagination, Inc. Lucid Imagination, Inc. 16
  • 17. Conclusions Lucene can perform fantastically ... but it can't outmaneuver sub-optimal design or weak configuration LG4L helps to understand the causes of poor performance Insight into high-level statistics that relate to Lucene API Round-robin database for tracking historical data LG4L is lightweight! Lucid Imagination, Inc. Lucid Imagination, Inc. 17
  • 18. Q&A Download and documentation: http://www.lucidimagination.com/Downloads/LucidGaze-for-Lucene Lucid Imagination, Inc. Lucid Imagination, Inc. 18
  • 19. Example: LG4L with Solr INFO: * AnalysisStats: INFO: * AnalysisStats: INFO: INFO: counters: {toks=258} counters: {toks=258} INFO: INFO: metrics: {tns={=2, WhitespaceTokenizer=8}, metrics: {tns={=2, tfs={WordDelimiterFilter=8, StopFilter=8, SynonymFilter=8, LowerCaseFilter=8, tfs={WordDelimiterFilter=8, LowerCaseFilter=8, EnglishPorterFilter=8}, EnglishPorterFilter=8}, ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1, ans={org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer=1, org.apache.solr.analysis.TokenizerChain=6, org.apache.solr.analysis.TokenizerChain=6, org.apache.solr.schema.FieldType$DefaultAnalyzer=13, org.apache.lucene.analysis.WhitespaceAnalyzer=1, org.apache.lucene.analysis.WhitespaceAnalyzer=1, org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer=1}} INFO: * DocumentStats: INFO: * DocumentStats: INFO: INFO: counters: {docs=1, fields=20} counters: {docs=1, INFO: * IndexStats: INFO: * IndexStats: INFO: INFO: counters: {ir_isdC=1, ir_C=4, iw_C=0, ir_newC=7, ir_tpC=12, counters: {ir_isdC=1, iw_C=0, iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0} iw_segs=0, iw_buf=0, ir_tdC=10, ir_ram=1343504, iw_ram=0} INFO: * SearchStats: INFO: * SearchStats: INFO: INFO: counters: {dfC=11, rwrC=3, rwrT=30265, srchrC=14, srchT=90133076, counters: {dfC=11, rwrT=30265, srchrC=14, srchC=6} srchC=6} INFO: * StoreStats: INFO: * StoreStats: INFO: INFO: counters: {dirC=8} counters: {dirC=8} Lucid Imagination, Inc. INFO: INFO: metrics: {dir_t={FSDirectory=8}} metrics: {dir_t={FSDirectory=8}} … but of course you should use LucidGaze for Solr instead! Lucid Imagination, Inc. 19