Apache Solr: lessons learned
@jeroenrosenberg
Frontend of Lucene
Lucene + XML/JSON API + field types + caching + faceting + grouping
Indexing
Lucene's inverted index
Efficient when many documents share the same value
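To illustrate why the inverted index is efficient when many documents share a value, here is a toy sketch (plain Python dicts, not Lucene's on-disk format): each distinct value is stored once as a term, and the documents that contain it are just a list of ids. The field names and sample documents are made up for illustration, and there is no text analysis, as for a Solr `string` field.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, dict[str, str]]) -> dict[str, dict[str, list[int]]]:
    """Map field -> term -> sorted doc ids (a toy postings list, no analysis)."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for field, value in docs[doc_id].items():
            # The whole value is the term, as for a Solr "string" field.
            index[field][value].append(doc_id)
    return index


def lookup(index: dict[str, dict[str, list[int]]], field: str, term: str) -> list[int]:
    """Return the postings list for field:term, or [] if the term is absent."""
    return index.get(field, {}).get(term, [])


docs = {
    1: {"country": "AN", "name": "hotel1"},
    2: {"country": "AN", "name": "hotel2"},
    3: {"country": "AN", "name": "hotel3"},
    4: {"country": "NL", "name": "hotel4"},
}
index = build_inverted_index(docs)
print(lookup(index, "country", "AN"))  # [1, 2, 3]
print(len(index["country"]))           # 2 distinct terms for 4 documents
```

Four documents produce only two `country` terms; the more documents share a value, the more that single term entry is reused.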
Field types
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
Field type definition
...
<fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/>
...
<field name="date" type="pdate" indexed="false" stored="true"/>
<field name="range_date" type="pdate" indexed="true" stored="false"/>
<copyField source="date" dest="range_date"/>
Field type definition
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
Schemaless
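A dynamic field such as `*_s` lets documents use field names that were never declared. The sketch below models how a field name is resolved, assuming the rules as I understand them from the Solr documentation: an explicit `<field>` wins, then the longest matching `<dynamicField>` pattern (a single leading or trailing `*`), with equal-length patterns tried in schema order. The function name is hypothetical.

```python
from typing import Optional


def resolve_field_type(name: str, fields: dict[str, str],
                       dynamic_fields: dict[str, str]) -> Optional[str]:
    """Return the type for a field name: explicit <field> first, then <dynamicField> patterns."""
    if name in fields:
        return fields[name]
    # Longer patterns are tried first; sorted() is stable (also with reverse=True),
    # so equal-length patterns keep their declaration (schema) order.
    for pattern in sorted(dynamic_fields, key=len, reverse=True):
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return dynamic_fields[pattern]
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return dynamic_fields[pattern]
    return None  # Solr would reject a document with this undeclared field


fields = {"id": "string", "name": "string"}
dynamic_fields = {"*_s": "string"}
print(resolve_field_type("city_s", fields, dynamic_fields))  # string
print(resolve_field_type("city", fields, dynamic_fields))    # None
```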
Segments
Tune the merge factor
Merge factor = max. # of segments (of similar size) before they are merged
Low merge factor: faster search, but slower indexing
High merge factor: faster indexing, but slower search
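The trade-off can be made concrete with a simplified model of a log-style merge policy (an assumption for illustration: real Lucene merge policies work on segment sizes in bytes and have more settings). Every flush writes a one-document segment; whenever a level holds `merge_factor` segments, they are merged into one segment on the next level, which rewrites all their documents.

```python
def simulate_merges(num_flushes: int, merge_factor: int) -> tuple[int, int]:
    """Return (segments_left, docs_copied_by_merges) for a toy log merge policy."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels: list[int] = []  # levels[k] = number of segments at level k
    docs_copied = 0
    for _ in range(num_flushes):
        level = 0
        size = 1  # a segment at level k holds merge_factor**k docs
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # merge_factor segments on this level: merge them into one bigger segment
            levels[level] = 0
            size *= merge_factor
            docs_copied += size  # the merged segment rewrites all of its docs
            level += 1
    return sum(levels), docs_copied


print(simulate_merges(99, 2))   # (4, 546)
print(simulate_merges(99, 10))  # (18, 90)
```

With a merge factor of 2 a search only has to visit 4 segments, but the documents were rewritten 546 times in total; with 10 there are 18 segments to search, but merging copied only 90 documents.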
Don't commit. Ever.
Don't commit often.
Sharding
Manual distribution
[Diagram: index distributor, cores core1, core2 and core3 (each labelled foo), replication]
Look Ma, no downtime!
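The slides don't say how the index distributor decides which core gets a document. One common approach, shown here only as an assumed sketch, is to hash the unique key so that updates and deletes for a document always reach the same shard. The function names are hypothetical; the shard addresses are the ones used in the distributed search example.

```python
import zlib


def shard_for(doc_id: str, shards: list[str]) -> str:
    """Pick a shard from a stable hash of the unique key.

    zlib.crc32 is used because Python's built-in hash() of a str is
    randomized per process, so it would route the same id differently
    after a restart.
    """
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distribute(doc_ids: list[str], shards: list[str]) -> dict[str, list[str]]:
    """Group document ids by the shard the (hypothetical) distributor sends them to."""
    buckets: dict[str, list[str]] = {shard: [] for shard in shards}
    for doc_id in doc_ids:
        buckets[shard_for(doc_id, shards)].append(doc_id)
    return buckets


shards = ["solr2:7070/solr/foo", "solr3:7070/solr/foo"]
print(distribute([f"hotel{i}" for i in range(1, 7)], shards))
```

Note that with modulo routing, changing the number of shards moves most documents to a different shard, so the index has to be redistributed.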
Distributed search
q=name:hotel1&shards=solr2:7070/solr/foo,solr3:7070/solr/foo&partialResults=true
Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="fl">*</str>
    <bool name="partialResults">true</bool>
    <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str>
  </lst>
</requestHandler>
Distributed search
q=name:hotel1&qt=distributedSearch
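Conceptually, the node receiving a request with `shards` asks every shard for its own top `rows` hits and merges them into a global top `rows`. The sketch below models only that merge step (real Solr does more, e.g. a second round-trip to fetch stored fields). The `tolerant` flag is a hypothetical stand-in for the "give me partial results if a shard is down" behaviour the slides ask for with `partialResults=true`; shard `solr4` is made up to play the failing shard.

```python
import heapq
from typing import Callable

Hit = tuple[float, str]  # (score, doc id)


def distributed_search(shards: dict[str, Callable[[int], list[Hit]]],
                       rows: int, tolerant: bool = True) -> dict:
    """Ask every shard for its own top `rows` hits, then keep the global top `rows`."""
    hits: list[Hit] = []
    failed: list[str] = []
    for name, search in shards.items():
        try:
            hits.extend(search(rows))
        except Exception:
            if not tolerant:
                raise
            failed.append(name)  # skip the shard, but report it
    top = heapq.nlargest(rows, hits, key=lambda hit: hit[0])
    return {"docs": top, "partial": bool(failed), "failed": failed}


def down(rows: int) -> list[Hit]:
    raise ConnectionError("shard unreachable")


result = distributed_search({
    "solr2": lambda rows: [(3.0, "a1"), (1.0, "a2")][:rows],
    "solr3": lambda rows: [(2.5, "b1"), (0.5, "b2")][:rows],
    "solr4": down,
}, rows=3)
print(result["docs"])    # [(3.0, 'a1'), (2.5, 'b1'), (1.0, 'a2')]
print(result["failed"])  # ['solr4']
```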
Caching
Field value cache: maps field names (facets) to a mapping of doc ids to terms
Filter cache: doc ids of results per filter query
Document cache: stored fields for each doc
Query result cache: ordered set of doc ids of the top N results
Autowarming
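A toy model of the filter cache and autowarming, assuming an LRU eviction policy: each filter query string maps to the set of matching doc ids, repeated filters are intersected, and when a new searcher opens, the most recently used filters are recomputed up front (in the spirit of Solr's `autowarmCount` setting). Class and function names, and the sample documents, are hypothetical.

```python
from collections import OrderedDict
from typing import Callable

Compute = Callable[[str], frozenset]  # fq string -> matching doc ids


class FilterCache:
    """Toy LRU cache: filter query string -> frozenset of matching doc ids."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, frozenset]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq: str, compute: Compute) -> frozenset:
        if fq in self.entries:
            self.entries.move_to_end(fq)  # mark as most recently used
            self.hits += 1
            return self.entries[fq]
        self.misses += 1
        doc_ids = compute(fq)
        self.entries[fq] = doc_ids
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return doc_ids

    def autowarm(self, compute: Compute, count: int) -> "FilterCache":
        """Build the cache for a new searcher by re-running the `count` most recently used filters."""
        warmed = FilterCache(self.capacity)
        keys = list(self.entries)[-count:] if count > 0 else []
        for fq in keys:
            warmed.entries[fq] = compute(fq)  # compute against the new index
        return warmed


def apply_filters(cache: FilterCache, fqs: list[str], compute: Compute) -> frozenset:
    """Intersect the cached doc sets of every fq (repeated fq parameters are ANDed)."""
    if not fqs:
        raise ValueError("need at least one fq")
    return frozenset.intersection(*(cache.get(fq, compute) for fq in fqs))


DOCS = {1: {"country": "AN", "duration": "7"},
        2: {"country": "AN", "duration": "14"},
        3: {"country": "NL", "duration": "7"}}


def compute_filter(fq: str) -> frozenset:
    field, value = fq.split(":", 1)  # toy parser: exact field:value only
    return frozenset(i for i, doc in DOCS.items() if doc.get(field) == value)


cache = FilterCache(capacity=2)
print(apply_filters(cache, ["country:AN", "duration:7"], compute_filter))  # frozenset({1})
apply_filters(cache, ["country:AN"], compute_filter)  # served from the cache
print(cache.hits, cache.misses)  # 1 2
print(list(cache.autowarm(compute_filter, count=1).entries))  # ['country:AN']
```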
Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]&fq=date:[NOW TO 2013-07-01T00:00:00Z]
Match all documents
q=*:*
Filter by field value
fq=country:AN
Range query with wildcard
fq=duration:[1 TO *]
Range query using DateMath syntax
fq=date:[NOW TO 2013-07-01T00:00:00Z]
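Because each `fq` is a separate parameter (and a separate filter cache entry), a client has to be able to repeat it. A minimal sketch using only the standard library: passing a list of pairs to `urlencode` keeps repeated keys and takes care of escaping spaces, colons, brackets and `*`. The helper name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def build_query(q: str, filters: list[str], **params: str) -> str:
    """Encode q plus one fq per filter; a list of pairs lets fq repeat."""
    pairs = [("q", q)] + [("fq", f) for f in filters] + sorted(params.items())
    return urlencode(pairs)


qs = build_query("*:*", ["country:AN", "duration:[1 TO *]",
                         "date:[NOW TO 2013-07-01T00:00:00Z]"], rows="10")
print(qs)
print(parse_qsl(qs))  # decodes back to the original (name, value) pairs
```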
Getting all results
q=*:*&rows=10000000
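If the goal is to walk through a large result set, one alternative to a single huge `rows` value is to page with `start` and `rows`. A sketch, assuming a hypothetical `fetch(start, rows)` that performs one Solr request and returns `(numFound, docs)`. Keep in mind that each request still makes Solr collect `start + rows` top documents, so very deep pages get slower.

```python
from typing import Callable, Iterator


def fetch_all(fetch: Callable[[int, int], tuple[int, list]],
              page_size: int = 1000) -> Iterator:
    """Yield every document by requesting start=0, page_size, 2*page_size, ..."""
    start = 0
    while True:
        num_found, docs = fetch(start, page_size)
        yield from docs
        start += page_size
        if not docs or start >= num_found:
            break


corpus = list(range(2500))  # stand-in for the matching documents


def fake_fetch(start: int, rows: int) -> tuple[int, list]:
    return len(corpus), corpus[start:start + rows]


print(sum(1 for _ in fetch_all(fake_fetch, page_size=1000)))  # 2500
```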
Faceting
A facet query...
rows=0&facet=true&facet.field=departureairport&facet.field=touroperator&facet.limit=-1&facet.mincount=1&f.touroperator.facet.limit=2
Enable faceting
facet=true
Suppress document results
rows=0
Specify a field name
facet.field=departureairport
...and another one
facet.field=touroperator
Unlimited field values (globally; the default limit is 100)
facet.limit=-1
Basically, always a good idea
Override the global limit for specific field names
f.touroperator.facet.limit=2
Only return field values with at least 1 matching document
facet.mincount=1
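To see how these parameters interact, here is a toy model of `facet.field` counting over single-valued fields. It assumes Solr's documented defaults as I understand them: `facet.mincount` defaults to 0 (so indexed terms with no matching document can appear), a negative `facet.limit` means unlimited and then values come back in term order, otherwise they are sorted by count. The sample data and function name are made up.

```python
from typing import Optional


def facet_counts(all_docs: dict[int, dict[str, str]], matching: set[int],
                 fields: list[str], limit: int = 100, mincount: int = 0,
                 field_limits: Optional[dict[str, int]] = None) -> dict[str, list[tuple[str, int]]]:
    """Toy facet.field counts; field_limits plays the role of f.<field>.facet.limit."""
    field_limits = field_limits or {}
    result = {}
    for field in fields:
        # Every term in the index is a candidate, so unmatched terms start at 0.
        counts = {doc[field]: 0 for doc in all_docs.values() if field in doc}
        for doc_id in matching:
            if field in all_docs[doc_id]:
                counts[all_docs[doc_id][field]] += 1
        items = [(term, n) for term, n in counts.items() if n >= mincount]
        field_limit = field_limits.get(field, limit)  # per-field override wins
        if field_limit < 0:
            items.sort(key=lambda item: item[0])  # unlimited: term (index) order
        else:
            items.sort(key=lambda item: (-item[1], item[0]))  # by count, ties by term
            items = items[:field_limit]
        result[field] = items
    return result


all_docs = {1: {"departureairport": "AMS", "touroperator": "A"},
            2: {"departureairport": "AMS", "touroperator": "B"},
            3: {"departureairport": "EIN", "touroperator": "A"},
            4: {"departureairport": "RTM", "touroperator": "C"}}
print(facet_counts(all_docs, {1, 2, 3}, ["departureairport", "touroperator"],
                   limit=-1, mincount=1, field_limits={"touroperator": 2}))
# {'departureairport': [('AMS', 2), ('EIN', 1)], 'touroperator': [('A', 2), ('B', 1)]}
```

Without `mincount=1`, `RTM` would be returned with a count of 0 even though none of the matching documents depart from there.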
Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true&facet.field={!ex=country}country&facet.limit=-1&facet.mincount=1
Tag a filter query...
fq={!tag=country}country:AN
...and exclude it when counting that facet field
facet.field={!ex=country}country
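What `{!tag}` and `{!ex}` buy you, as a toy model: the documents returned still pass every filter, but the counts for the `country` facet ignore the filter tagged `country`, so the other countries stay visible (and selectable) in the UI. Other filters still apply. The `board` field and the data are hypothetical.

```python
from collections import Counter

# (tag, field, value) mirrors fq={!tag=TAG}FIELD:VALUE
Filter = tuple[str, str, str]


def passes(doc: dict[str, str], filters: list[Filter],
           excluded: frozenset = frozenset()) -> bool:
    """True if doc matches every filter whose tag is not excluded."""
    return all(doc.get(field) == value
               for tag, field, value in filters if tag not in excluded)


def facet(docs: list[dict[str, str]], filters: list[Filter], field: str,
          excluded: frozenset = frozenset()) -> dict[str, int]:
    """Count values of `field` over docs that pass the non-excluded filters."""
    return dict(Counter(doc[field] for doc in docs
                        if field in doc and passes(doc, filters, excluded)))


docs = [{"country": "AN", "board": "AI"}, {"country": "AN", "board": "HB"},
        {"country": "NL", "board": "AI"}, {"country": "BE", "board": "AI"}]
filters = [("country", "country", "AN")]
print(facet(docs, filters, "country"))                         # {'AN': 2}
print(facet(docs, filters, "country", frozenset({"country"})))  # {'AN': 2, 'NL': 1, 'BE': 1}
```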
FACET ALL THE THINGS!
Grouping
A grouping query...
group=true&group.field=accoid&group.sort=price asc&sort=popularity asc&group.facets=UNGROUPED
Enable grouping
group=true
Specify the field name
group.field=accoid
Determines the group head (the top document within each group)
group.sort=price asc
Determines the order of the groups
sort=popularity asc
Only group heads are returned!
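A toy model of the two sort parameters, assuming the behaviour described in the Solr reference guide: `group.sort` picks the head within each group, while `sort` orders the groups by their best document according to `sort`, which need not be the head. Ascending keys only; the documents are made up.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def group_heads(docs: list[Doc], group_field: str,
                group_sort: Callable[[Doc], Any],
                sort: Callable[[Doc], Any]) -> list[Doc]:
    """Return one head per group (ascending sort keys)."""
    groups: dict[Any, list[Doc]] = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc)
    # sort: groups are ordered by their best document according to `sort`
    ordered = sorted(groups.values(), key=lambda members: min(sort(d) for d in members))
    # group.sort: the head is the best document within its group
    return [min(members, key=group_sort) for members in ordered]


docs = [{"id": "a1", "accoid": "A", "price": 500, "popularity": 3},
        {"id": "a2", "accoid": "A", "price": 400, "popularity": 5},
        {"id": "b1", "accoid": "B", "price": 300, "popularity": 1},
        {"id": "c1", "accoid": "C", "price": 200, "popularity": 2}]
heads = group_heads(docs, "accoid", group_sort=lambda d: d["price"],
                    sort=lambda d: d["popularity"])
print([d["id"] for d in heads])  # ['b1', 'c1', 'a2']
```

Group A is ranked by `a1` (popularity 3), but its head is the cheaper `a2`, which is the only document of that group that gets returned.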
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true
Solr 4.3 is coming
http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?
