Solr Anti - patterns
Rafał Kuć, Sematext Group, Inc.
@kucrafal
@sematext
http://sematext.com
About me
Sematext consultant & engineer
Solr.pl co-founder
Father & husband
The (not so) perfect migration
http://en.wikipedia.org/wiki/Bird_migration
http://www.likesbooks.com/aarafterhours/?p=750
From 3.1 to 4.10 (and hopefully not back)
March 2011 September 2014
The lonely solrconfig.xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" />
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />
<luceneMatchVersion>LUCENE_31</luceneMatchVersion>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
DOC
DOC
DOC
And faulty indexing
EXCEPTIONS :)
And faulty indexing
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<str name="msg">missing content stream</str>
<int name="code">400</int>
</lst>
</response>
109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore ľ org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Let’s make that right
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="stream.contentType">application/json</str>
</lst>
</requestHandler>
<luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
<nrtMode>true</nrtMode>
<updateLog>
<str name="dir">
${solr.ulog.dir:}
</str>
</updateLog>
The old schema.xml
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.IntField" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
<fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
The old schema.xml
The new schema.xml
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
Threads? What threads?
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
I see deadlocks
Threads? What threads?
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">200</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
OK, so now we can actually run queries
<Set name="ThreadPool">
<New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
<Set name="detailedDump">false</Set>
</New>
</Set>
The ZooKeeper
The ZooKeeper
The ZooKeeper
The ZooKeeper
The ZooKeeper
The ZooKeeper – production
The ZooKeeper – production
-DzkHost=zk1:2181,zk2:2181,zk3:2181
The ZooKeeper – production
-DzkHost=zk1:2181,zk2:2181,zk3:2181
The ZooKeeper – production
-DzkHost=zk1:2181,zk2:2181,zk3:2181
The ZooKeeper – production
-DzkHost=zk1:2181,zk2:2181,zk3:2181
Let’s cache everything
<filterCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="524288"/>
<queryResultCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="524288"/><documentCache class="solr.LRUCache"
size="1048576"
initialSize="1048576"
autowarmCount="0"/>
And now let’s look at the warmup times
And now let’s look at the warmup times
OK, show us the way „Mr. Consultant”
<filterCache class="solr.FastLRUCache"
size="1024"
initialSize="1024"
autowarmCount="512"/>
<queryResultCache class="solr.LRUCache"
size="16000"
initialSize="16000"
autowarmCount="8000"/><documentCache class="solr.LRUCache"
size="16384"
initialSize="16384"
autowarmCount="0"/>
Let’s look at the warmup times again
Let’s look at the warmup times again
Bulks are for noobs
Application Application Application
Doc Doc Doc
Bulks are for noobs
Application Application Application
Doc Doc Doc
But let’s use bulks, just in case
But let’s use bulks, just in case
We need to refresh and hard commit
<autoCommit>
<maxTime>1000</maxTime>
<openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
Maybe we should only refresh?
<autoCommit>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
OK, let’s go easy with refreshing
<autoCommit>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>30000</maxTime>
</autoSoftCommit>
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">100</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="3000000">
.
.
.
</result>
</response>
But I really need all that data
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9418</int>
<lst name="params">
<str name="start">3000000</str>
<str name="q">*:*</str>
<str name="rows">5</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="3000000">
.
.
.
</result>
</response>
But I really need all that data
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error">
<str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
.
.
.
Caused by: java.lang.OutOfMemoryError: Java heap space
.
.
.
</str>
<int name="code">500</int>
</lst>
</response>
curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
But I really need all that data
Query
But I really need all that data
But I really need all that data
But I really need all that data
Response
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">189</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">*</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
<doc>
...
</doc>
.
.
.
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str>
</response>
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc
&cursorMark=AoIIP4AAACY5OTk5OTA='
Use the scroll Luke
curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc
&cursorMark=AoIIP4AAACY5OTk5OTA='
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">184</int>
<lst name="params">
<str name="sort">score desc,id desc</str>
<str name="q">*:*</str>
<str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str>
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
<doc>
...
</doc>
.
.
.
</result>
<str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str>
</response>
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=-1&facet.mincount=0'
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">9967</int>
<lst name="params">
...
</lst>
</lst>
<result name="response" numFound="3284000" start="0">
.
.
.
</result>
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="tag">
...
</lst>
</lst>
</lst>
</response>
Limiting faceting, why bother?
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=-1&facet.mincount=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
.
.
.
<lst name="error">
<str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str>
<str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space
.
.
.
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685)
.
.
.
</str>
<int name="code">500</int>
</lst>
</response>
Now let’s look at performance
Now let’s look at performance
Now let’s look at performance
Now let’s look at performance
Now let’s look at performance
Magic happens with small changes
curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&…
facet.limit=100&facet.mincount=1'
Magic happens with small changes
Magic happens with small changes
Magic happens with small changes
Magic happens with small changes
Magic happens with small changes
Magic happens with small changes
Magic happens with small changes
Monitoring in production
http://sematext.com/spm/index.html
And remember…
<luceneMatchVersion>
3.1
</luceneMatchVersion>
Quick summary
http://www.soothetube.com/2013/12/29/thats-all-folks/
We are hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with and in open – source?
We’re hiring world – wide!
http://sematext.com/about/jobs.html
Thank you!
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com

Solr Anti - patterns

  • 2.
    Solr Anti -patterns Rafał Kuć, Sematext Group, Inc. @kucrafal @sematext http://sematext.com
  • 3.
    About me Sematext consultant& engineer Solr.pl co-founder Father & husband
  • 4.
    The (not so)perfect migration http://en.wikipedia.org/wiki/Bird_migration http://www.likesbooks.com/aarafterhours/?p=750
  • 5.
    From 3.1 to4.10 (and hopefully not back) March 2011 September 2014
  • 6.
    The lonely solrconfig.xml <requestHandlername="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" /> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" /> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" /> <luceneMatchVersion>LUCENE_31</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  • 7.
  • 8.
    And faulty indexing <?xmlversion="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">0</int> </lst> <lst name="error"> <str name="msg">missing content stream</str> <int name="code">400</int> </lst> </response> 109173 [qtp1223685984-20] ERROR org.apache.solr.core.SolrCore ľ org.apache.solr.common.SolrException: missing content stream at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Unknown Source)
  • 9.
    Let’s make thatright <requestHandler name="/update" class="solr.UpdateRequestHandler" /> <requestHandler name="/update/json" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="stream.contentType">application/json</str> </lst> </requestHandler> <luceneMatchVersion>LUCENE_4.10.0</luceneMatchVersion> <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/> <nrtMode>true</nrtMode> <updateLog> <str name="dir"> ${solr.ulog.dir:} </str> </updateLog>
  • 10.
    The old schema.xml <fieldTypename="int" class="solr.IntField" omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
  • 11.
    <fieldType name="int" class="solr.IntField"omitNorms="true"/> <fieldType name="long" class="solr.LongField" omitNorms="true"/> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="double" class="solr.DoubleField" omitNorms="true"/> <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/> <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/> The old schema.xml
  • 12.
    The new schema.xml <fieldTypename="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  • 13.
    Threads? What threads? <Setname="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 14.
  • 15.
    Threads? What threads? <Setname="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">200</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 16.
    OK, so nowwe can actually run queries <Set name="ThreadPool"> <New class="org.eclipse.jetty.util.thread.QueuedThreadPool"> <Set name="minThreads">10</Set> <Set name="maxThreads">10000</Set> <Set name="detailedDump">false</Set> </New> </Set>
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
    The ZooKeeper –production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 24.
    The ZooKeeper –production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 25.
    The ZooKeeper –production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 26.
    The ZooKeeper –production -DzkHost=zk1:2181,zk2:2181,zk3:2181
  • 27.
    Let’s cache everything <filterCacheclass="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/> <queryResultCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="524288"/><documentCache class="solr.LRUCache" size="1048576" initialSize="1048576" autowarmCount="0"/>
  • 28.
    And now let’slook at the warmup times
  • 29.
    And now let’slook at the warmup times
  • 30.
    OK, show usthe way „Mr. Consultant” <filterCache class="solr.FastLRUCache" size="1024" initialSize="1024" autowarmCount="512"/> <queryResultCache class="solr.LRUCache" size="16000" initialSize="16000" autowarmCount="8000"/><documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>
  • 31.
    Let’s look atthe warmup times again
  • 32.
    Let’s look atthe warmup times again
  • 33.
    Bulks are fornoobs Application Application Application Doc Doc Doc
  • 34.
    Bulks are fornoobs Application Application Application Doc Doc Doc
  • 35.
    But let’s usebulks, just in case
  • 36.
    But let’s usebulks, just in case
  • 37.
    We need torefresh and hard commit <autoCommit> <maxTime>1000</maxTime> <openSearcher>true</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 38.
    Maybe we shouldonly refresh? <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit>
  • 39.
    OK, let’s goeasy with refreshing <autoCommit> <maxTime>60000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>30000</maxTime> </autoSoftCommit>
  • 40.
    But I reallyneed all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 41.
    <?xml version="1.0" encoding="UTF-8"?> <response> <lstname="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">100</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> But I really need all that data curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 42.
    <?xml version="1.0" encoding="UTF-8"?> <response> <lstname="responseHeader"> <int name="status">0</int> <int name="QTime">9418</int> <lst name="params"> <str name="start">3000000</str> <str name="q">*:*</str> <str name="rows">5</str> </lst> </lst> <result name="response" numFound="3284000" start="3000000"> . . . </result> </response> But I really need all that data <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="error"> <str name="msg">java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448) . . . Caused by: java.lang.OutOfMemoryError: Java heap space . . . </str> <int name="code">500</int> </lst> </response> curl -XGET 'localhost:8983/solr/select?q=*:*&start=3000000&rows=100'
  • 43.
    But I reallyneed all that data Query
  • 44.
    But I reallyneed all that data
  • 45.
    But I reallyneed all that data
  • 46.
    But I reallyneed all that data Response
  • 47.
    Use the scrollLuke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc'
  • 48.
    Use the scrollLuke curl -XGET 'localhost:8983/solr/select?q=*:*&cursorMark=*&sort=score+desc,id+desc' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">189</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">*</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5OTA=</str> </response>
  • 49.
    Use the scrollLuke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc &cursorMark=AoIIP4AAACY5OTk5OTA='
  • 50.
    Use the scrollLuke curl -XGET 'localhost:8983/solr/select?q=*:*&sort=score+desc,id+desc &cursorMark=AoIIP4AAACY5OTk5OTA=' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">184</int> <lst name="params"> <str name="sort">score desc,id desc</str> <str name="q">*:*</str> <str name="cursorMark">AoIIP4AAACY5OTk5OTA=</str> </lst> </lst> <result name="response" numFound="3284000" start="0"> <doc> ... </doc> . . . </result> <str name="nextCursorMark">AoIIP4AAACY5OTk5ODE=</str> </response>
  • 51.
    Limiting faceting, whybother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0'
  • 52.
    Limiting faceting, whybother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">9967</int> <lst name="params"> ... </lst> </lst> <result name="response" numFound="3284000" start="0"> . . . </result> <lst name="facet_counts"> <lst name="facet_fields"> <lst name="tag"> ... </lst> </lst> </lst> </response>
  • 53.
    Limiting faceting, whybother? curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=-1&facet.mincount=0' <?xml version="1.0" encoding="UTF-8"?> <response> . . . <lst name="error"> <str name="msg">Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space</str> <str name="trace">org.apache.solr.common.SolrException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space . . . Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:685) . . . </str> <int name="code">500</int> </lst> </response>
  • 54.
    Now let’s lookat performance
  • 55.
    Now let’s lookat performance
  • 56.
    Now let’s lookat performance
  • 57.
    Now let’s lookat performance
  • 58.
    Now let’s lookat performance
  • 59.
    Magic happens withsmall changes curl -XGET 'localhost:8983/solr/select?q=*:*&facet=true&facet.field=tag&… facet.limit=100&facet.mincount=1'
  • 60.
    Magic happens withsmall changes
  • 61.
    Magic happens withsmall changes
  • 62.
    Magic happens withsmall changes
  • 63.
    Magic happens withsmall changes
  • 64.
    Magic happens withsmall changes
  • 65.
    Magic happens withsmall changes
  • 66.
    Magic happens withsmall changes
  • 67.
  • 68.
  • 69.
  • 70.
    We are hiring! DigSearch? Dig Analytics? Dig Big Data? Dig Performance? Dig Logging? Dig working with and in open – source? We’re hiring world – wide! http://sematext.com/about/jobs.html
  • 71.