Solr Anti-patterns
Rafał Kuć, Sematext Group, Inc.
About me
Sematext consultant & engineer co-founder
Father & husband
The (not so) perfect migration
From 3.1 to 4.10 (and hopefully not back)
3.1: March 2011; 4.10: September 2014
The lonely solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />
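<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>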
And faulty indexing
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<str name="msg">missing content stream</str>
<int name="code">400</int>
</lst>
</response>
109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
at org.apache.solr.core.SolrCore.execute(
at org.apache.solr.servlet.SolrDispatchFilter.execute(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
at org.eclipse.jetty.servlet.ServletHandler.doHandle(
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
at org.eclipse.jetty.server.session.SessionHandler.doHandle(
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
at org.eclipse.jetty.servlet.ServletHandler.doScope(
at org.eclipse.jetty.server.session.SessionHandler.doScope(
at org.eclipse.jetty.server.handler.ContextHandler.doScope(
at org.eclipse.jetty.server.handler.ScopedHandler.handle(
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
at org.eclipse.jetty.server.handler.HandlerCollection.handle(
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
at org.eclipse.jetty.server.Server.handle(
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(
at org.eclipse.jetty.http.HttpParser.parseNext(
at org.eclipse.jetty.http.HttpParser.parseAvailable(
at org.eclipse.jetty.server.BlockingHttpConnection.handle(
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
at org.eclipse.jetty.util.thread.QueuedThreadPool$
at Source)
Let’s make that right
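<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="stream.contentType">application/json</str>
</lst>
</requestHandler>
<luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
<nrtMode>true</nrtMode>
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>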
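With a single update handler, the client only has to send a body together with a matching Content-Type. A minimal Python sketch, assuming Solr listens on localhost:8983 (the host and port used by the curl examples in this deck) and using hypothetical field names `id` and `title`. It only builds the request, so nothing is sent until the caller decides to:

```python
import json
import urllib.request

# Assumed base URL; host and port follow the curl examples in this deck.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build a POST request that carries the documents as a JSON content stream."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical documents; the field names are illustrative only.
    request = build_update_request([{"id": "1", "title": "hello"}])
    print(request.get_method(), request.full_url)
    # Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

A request built this way always has a body, which is what the "missing content stream" error complains about when it is absent.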
The old schema.xml
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
The new schema.xml
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
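Before a migration it can help to list which field types still use the legacy classes. A rough Python sketch, assuming the schema is available as a well-formed XML string; the set of legacy classes is taken from the old schema.xml above, and the sample schema in the demo is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Legacy classes as they appear in the old schema.xml slide.
LEGACY_CLASSES = {
    "solr.IntField", "solr.LongField", "solr.FloatField", "solr.DoubleField",
    "solr.DateField", "solr.SortableIntField", "solr.SortableLongField",
    "solr.SortableFloatField", "solr.SortableDoubleField",
}


def find_legacy_field_types(schema_xml):
    """Return (name, class) pairs for fieldType elements that use a legacy class."""
    root = ET.fromstring(schema_xml)
    return [
        (field_type.get("name"), field_type.get("class"))
        for field_type in root.iter("fieldType")
        if field_type.get("class") in LEGACY_CLASSES
    ]


if __name__ == "__main__":
    # Made-up sample schema, for illustration only.
    sample = """
    <schema name="example" version="1.5">
      <types>
        <fieldType name="int" class="solr.IntField" omitNorms="true"/>
        <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
      </types>
    </schema>
    """
    for name, cls in find_legacy_field_types(sample):
        print(f"{name}: {cls} is a candidate for a Trie-based replacement")
```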
Threads? What threads?
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
I see deadlocks
OK, so now we can actually run queries
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
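A quick way to see which limit a node actually runs with is to read it from the Jetty configuration. A small Python sketch, assuming the file is plain XML with a literal number in `maxThreads`; the `Configure` wrapper in the sample is an assumption about the surrounding file, not something shown on the slide:

```python
import xml.etree.ElementTree as ET


def read_max_threads(jetty_xml):
    """Return maxThreads of the first QueuedThreadPool, or None if not found."""
    root = ET.fromstring(jetty_xml)
    for new in root.iter("New"):
        if not new.get("class", "").endswith("QueuedThreadPool"):
            continue
        for setter in new.findall("Set"):
            text = (setter.text or "").strip()
            # Only literal numbers are handled; property references are skipped.
            if setter.get("name") == "maxThreads" and text.isdigit():
                return int(text)
    return None


if __name__ == "__main__":
    # Sample fragment modelled on the slide; the Configure wrapper is assumed.
    sample = """
    <Configure id="Server" class="org.eclipse.jetty.server.Server">
      <Set name="ThreadPool">
        <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
          <Set name="minThreads">10</Set>
          <Set name="maxThreads">200</Set>
        </New>
      </Set>
    </Configure>
    """
    print(read_max_threads(sample))
```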
The ZooKeeper
The ZooKeeper – production
Let’s cache everything
<filterCache class="solr.LRUCache"
<queryResultCache class="solr.LRUCache"
autowarmCount="524288"/>
<documentCache class="solr.LRUCache"
And now let’s look at the warmup times
OK, show us the way, "Mr. Consultant"
<filterCache class="solr.FastLRUCache"
<queryResultCache class="solr.LRUCache"
autowarmCount="8000"/>
<documentCache class="solr.LRUCache"
Let’s look at the warmup times again
Bulks are for noobs
[Diagram: three Application boxes, each paired with a single Doc]
But let’s use bulks, just in case
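Bulk indexing means grouping documents on the client side and sending one request per group. A minimal Python sketch, assuming documents are plain dicts; the batch size of 1000 is an arbitrary assumption to tune, and the request bodies are only built here, not sent:

```python
import json
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_bodies(docs, size=1000):
    """Yield one JSON request body per batch instead of one per document."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical documents with only an id field.
    docs = ({"id": str(i)} for i in range(2500))
    bodies = list(bulk_bodies(docs))
    print(len(bodies))  # 3 requests instead of 2500
```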
We need to refresh and hard commit
Maybe we should only refresh?
OK, let’s go easy with refreshing
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">100</str>
<result name="response" numFound="3284000" start="3000000">
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">5</str>
<result name="response" numFound="3284000" start="3000000">
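Why is this slow no matter how small `rows` is? To return a page at offset `start`, each shard has to collect its top `start + rows` entries, and in a distributed setup the coordinating node merges the entries from every shard. A back-of-the-envelope Python sketch; the shard count of 4 in the second call is an assumption for illustration, not a number from the slides:

```python
def deep_paging_cost(start, rows, shards=1):
    """Return (entries collected per shard, entries merged by the coordinator)."""
    per_shard = start + rows
    merged = per_shard * shards
    return per_shard, merged


if __name__ == "__main__":
    print(deep_paging_cost(3000000, 100))            # (3000100, 3000100)
    print(deep_paging_cost(3000000, 100, shards=4))  # (3000100, 12000400)
```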
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="error">
<str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
Caused by: java.lang.OutOfMemoryError: Java heap space
<int name="code">500</int>
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">189</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">*</str>
<result name="response" numFound="3284000" start="0">
<str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">184</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
<result name="response" numFound="3284000" start="0">
<str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
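The two requests above generalize to a loop: start with `cursorMark=*`, pass each `nextCursorMark` back in, and stop when the mark no longer changes. A minimal Python sketch, assuming a hypothetical `fetch_page(cursor_mark, rows)` callable that performs the HTTP request and returns a dict with `docs` and `nextCursorMark`; the fake fetcher in the demo stands in for Solr and its marks are plain offsets, unlike real cursor marks:

```python
def iterate_with_cursor(fetch_page, rows=100):
    """Yield documents page by page until the cursor stops advancing.

    `fetch_page(cursor_mark, rows)` is a hypothetical callable supplied by the
    caller; it must return a dict with "docs" and "nextCursorMark" keys.
    """
    cursor_mark = "*"  # start of the result set, as in the first request
    while True:
        page = fetch_page(cursor_mark, rows)
        yield from page["docs"]
        next_mark = page["nextCursorMark"]
        if next_mark == cursor_mark:
            break  # same mark returned again: nothing left to read
        cursor_mark = next_mark


if __name__ == "__main__":
    # Fake data and fetcher, for illustration only.
    DATA = [{"id": str(i)} for i in range(5)]

    def fake_fetch(cursor_mark, rows):
        start = 0 if cursor_mark == "*" else int(cursor_mark)
        docs = DATA[start:start + rows]
        return {"docs": docs, "nextCursorMark": str(start + len(docs))}

    print([doc["id"] for doc in iterate_with_cursor(fake_fetch, rows=2)])
```

The sort has to include the unique key as a tie-breaker, which is why the requests above sort by `score desc,id desc`.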
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
<?xml version="1.0" encoding="UTF-8"?>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9967</int>
<lst name="params">
<result name="response" numFound="3284000" start="0">
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="tag">
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
<?xml version="1.0" encoding="UTF-8"?>
<lst name="error">
<str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(
<int name="code">500</int>
Now let’s look at performance
Magic happens with small changes
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
Monitoring in production
And remember…
Quick summary
We are hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open source?
We’re hiring worldwide!
Thank you!
Rafał Kuć

Solr Anti - patterns

  • 1.
  • 2. Solr Anti - patterns Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext
  • 3. About me Sematext consultant & engineer co-founder Father & husband
  • 4. The (not so) perfect migration
  • 5. From 3.1 to 4.10 (and hopefully not back) March 2011 September 2014
  • 6. The lonely solrconfig.xml <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" /> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" /> <luceneMatchVersion>LUCENE_31</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  • 8. And faulty indexing <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">0</int> </lst> <lst name="error"> <str name="msg">missing content stream</str> <int name="code">400</int> </lst> </response> 109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore ľ org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody( at org.apache.solr.handler.RequestHandlerBase.handleRequest( at org.apache.solr.core.SolrCore.execute( at org.apache.solr.servlet.SolrDispatchFilter.execute( at org.apache.solr.servlet.SolrDispatchFilter.doFilter( at org.apache.solr.servlet.SolrDispatchFilter.doFilter( at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter( at org.eclipse.jetty.servlet.ServletHandler.doHandle( at org.eclipse.jetty.server.handler.ScopedHandler.handle( at at org.eclipse.jetty.server.session.SessionHandler.doHandle( at org.eclipse.jetty.server.handler.ContextHandler.doHandle( at org.eclipse.jetty.servlet.ServletHandler.doScope( at org.eclipse.jetty.server.session.SessionHandler.doScope( at org.eclipse.jetty.server.handler.ContextHandler.doScope( at org.eclipse.jetty.server.handler.ScopedHandler.handle( at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle( at org.eclipse.jetty.server.handler.HandlerCollection.handle( at org.eclipse.jetty.server.handler.HandlerWrapper.handle( at org.eclipse.jetty.server.Server.handle( at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest( at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest( at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete( at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete( at org.eclipse.jetty.http.HttpParser.parseNext( at org.eclipse.jetty.http.HttpParser.parseAvailable( at org.eclipse.jetty.server.BlockingHttpConnection.handle( at$ at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob( at org.eclipse.jetty.util.thread.QueuedThreadPool$ at Source)
  • 9. Let’s make that right <requestHandler name="/update" class="solr.UpdateRequestHandler" /> <requestHandler name="/update/json" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="stream.contentType">application/json</str> </lst> </requestHandler> <luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> <nrtMode>true</nrtMode> <updateLog> <str name="dir"> ${solr.ulog.dir:} </str> </updateLog>
  • 10. The old schema.xml <fieldType name="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  • 12. The new schema.xml <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  • 13. Threads? What threads? <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 16. OK, so now we can actually run queries <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">10000</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 22. The ZooKeeper – production
  • 23. The ZooKeeper – production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 27. Let’s cache everything <filterCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
  • 28. And now let’s look at the warmup times
  • 30. OK, show us the way, “Mr. Consultant” <filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/> <queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/> <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  • 31. Let’s look at the warmup times again
  • 33. Bulks are for noobs Application Application Application Doc Doc Doc
  • 35. But let’s use bulks, just in case
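The point of these slides is that sending one document per request is the anti-pattern, and grouping documents into bulk requests is the fix. A minimal sketch of the batching step follows; the batch size of 1000 and the function names are illustrative assumptions, not values from the talk. Each yielded string is a JSON array of documents that could serve as the body of one update request.

```python
import json
from itertools import islice


def batches(docs, size):
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_bodies(docs, size=1000):
    """Yield one JSON array per batch; each is the body of a single update request."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk)


# Hypothetical documents: 2500 docs become 3 requests instead of 2500.
sample_docs = [{"id": str(i)} for i in range(2500)]
bodies = list(bulk_bodies(sample_docs, size=1000))
```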
  • 37. We need to refresh and hard commit <autoCommit> <maxTime>1000</maxTime> <openSearcher>true</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 38. Maybe we should only refresh? <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 39. OK, let’s go easy with refreshing <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>30000</maxTime> </autoSoftCommit>
  • 40. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 41. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">100</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response>
  • 42. But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">5</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="error"> <str name="msg">java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError( at org.apache.solr.servlet.SolrDispatchFilter.doFilter( . . . Caused by: java.lang.OutOfMemoryError: Java heap space . . . </str> <int name="code">500</int> </lst> </response>
  • 43. But I really need all that data Query
  • 44. But I really need all that data
  • 46. But I really need all that data Response
  • 47. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  • 48. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">189</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">*</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str> </response>
  • 49. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA='
  • 50. Use the scroll Luke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc&cursorMark=AoIIP4AAACY5OTk5OTA=' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">184</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str> </response>
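The two requests above are the first steps of a loop: start with `cursorMark=*`, then keep passing back the returned `nextCursorMark` until it stops changing. A minimal sketch of that loop follows. It assumes the JSON response writer (`wt=json`) instead of the XML shown on the slides, reuses the `localhost:8983/solr/select` address from the curl commands, and uses hypothetical function names. The sort must include the unique key field (`id` here) as a tie-breaker, as in the slides.

```python
import json
import urllib.parse
import urllib.request

# Assumed address, taken from the curl examples in these slides.
SELECT_URL = "http://localhost:8983/solr/select"


def fetch_page(params):
    """Run one select request and return the parsed JSON response."""
    query = urllib.parse.urlencode(dict(params, wt="json"))
    with urllib.request.urlopen(SELECT_URL + "?" + query) as resp:
        return json.load(resp)


def iter_all_docs(q="*:*", sort="score desc,id desc", rows=100, fetch=fetch_page):
    """Yield every matching document by following cursor marks."""
    cursor = "*"  # "*" asks for the first page
    while True:
        page = fetch({"q": q, "sort": sort, "rows": rows, "cursorMark": cursor})
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        # An unchanged cursor mark means there are no more results.
        if next_cursor == cursor:
            return
        cursor = next_cursor
```

The `fetch` parameter exists only so the loop can be exercised without a running Solr; in normal use the default is enough, for example `for doc in iter_all_docs(): ...`.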
  • 51. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
  • 52. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9967</int> <lst name="params"> ... </lst> </lst> <result name="response" numFound="3284000" start="0"> . . . </result> <lst name="facet_counts"> <lst name="facet_fields"> <lst name="tag"> ... </lst> </lst> </lst> </response>
  • 53. Limiting faceting, why bother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> . . . <lst name="error"> <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space . . . Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.request.SimpleFacets.getFieldCacheCounts( . . . </str> <int name="code">500</int> </lst> </response>
  • 54. Now let’s look at performance
  • 59. Magic happens with small changes curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
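The small change is bounding the facet request: `facet.limit=100` and `facet.mincount=1` instead of `facet.limit=-1` and `facet.mincount=0`. One way to keep the unbounded variant out of application code is to build the query through a helper that refuses it. This is a minimal sketch with a hypothetical function name; it leaves out the parameters elided by `…` in the slide and reuses the address from the curl commands.

```python
import urllib.parse

# Assumed address, taken from the curl examples in these slides.
SELECT_URL = "http://localhost:8983/solr/select"


def facet_query_url(field, limit=100, mincount=1, base=SELECT_URL):
    """Build a select URL with bounded faceting on one field."""
    if limit < 0:
        # facet.limit=-1 means "return every value", the anti-pattern shown earlier.
        raise ValueError("facet.limit must be non-negative")
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("facet.mincount", mincount),
    ]
    return base + "?" + urllib.parse.urlencode(params)


url = facet_query_url("tag")
```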
  • 70. We are hiring! Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open-source? We’re hiring worldwide!