Apache Solr lessons learned

Lessons learned while working with (a customized version of) Apache Solr for 3 years

Published in: Technology

Transcript of "Apache Solr lessons learned"

  1. Solr lessons learned, @jeroenrosenberg
  2. Frontend of Lucene
  3. Lucene + XML/JSON API + field types + caching + faceting + grouping + ...
  4-5. Indexing
  6. Lucene's inverted index
  7. Efficient when many docs share the same value
  8. Field types
  9-10. Field type definition:
      <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
      <field name="name" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
  11-12. Field type definition:
      ...
      <fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/>
      ...
      <field name="date" type="pdate" indexed="false" stored="true"/>
      <field name="range_date" type="pdate" indexed="true" stored="false"/>
      <copyField source="date" dest="range_date"/>
  13. Schemaless:
      <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  14. Segments
  15. Tune the merge factor
  16. Max. # of segments: a low maximum gives faster search, but slower indexing; a high maximum gives faster indexing, but slower search
  17. Don't commit. Ever.
  18. Don't commit often.
  19. Sharding
  20. Manual distribution
  21. [Diagram: an index distributor and three cores (core1, core2, core3), each labeled "foo"; labeled "replication"]
  22. Look Ma, no downtime!
  23. Distributed search: q=name:hotel1&shards=solr2:7070/solr/foo,solr3:7070/solr/foo&partialResults=true
  24-26. Distributed search config:
      <requestHandler name="distributedSearch" class="solr.SearchHandler" default="false">
        <lst name="defaults">
          <int name="rows">10</int>
          <str name="fl">*</str>
          <bool name="partialResults">true</bool>
          <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str>
        </lst>
      </requestHandler>
  27. Distributed search: q=name:hotel1&qt=distributedSearch
  28. Caching
  29. Cache types: field value, filter, document, query result
  30. Filter cache: doc ids of results per filter query
  31. Field value cache: field names (facets) mapped to a mapping of doc ids to terms
  32. Query result cache: ordered set of doc ids of the top N results
  33. Document cache: stored fields for each doc
  34. Autowarming
  35. Filter queries: q=*:*&fq=country:AN&fq=duration:[1 TO *]&fq=date:[NOW TO 2013-07-01T00:00:00Z]
  36. Match all documents: q=*:*
  37. Filter by field value: fq=country:AN
  38. Range query with wildcard: fq=duration:[1 TO *]; range query using DateMath syntax: fq=date:[NOW TO 2013-07-01T00:00:00Z]
  39. Getting all results: q=*:*&rows=10000000
  40. Faceting
  41. A facet query: rows=0&facet=true&facet.field=departureairport&facet.field=touroperator&facet.limit=-1&facet.mincount=1&f.touroperator.facet.limit=2
  42. Enable faceting: facet=true
  43. Suppress document results: rows=0
  44. Specify a field name: facet.field=departureairport ... and another one: facet.field=touroperator
  45-46. Unlimited field values (globally): facet.limit=-1. Basically, always a good idea
  47. Override global limit for specific field names: f.touroperator.facet.limit=2
  48. At least 1 document per field value: facet.mincount=1
  49. Multi-select faceting: q=*:*&fq={!tag=country}country:AN&facet=true&facet.field={!ex=country}country&facet.limit=-1&facet.mincount=1
  50. Tag a filter query: fq={!tag=country}country:AN ... and exclude it for a field value: facet.field={!ex=country}country
  51. FACET ALL THE THINGS!
  52. Grouping
  53. A grouping query: group=true&group.field=accoid&group.sort=price asc&sort=popularity asc&group.facets=UNGROUPED
  54. Enable grouping: group=true
  55. Specify the field name: group.field=accoid
  56-57. Determines group head: group.sort=price asc; determines order of document results: sort=popularity asc. Only group heads are returned!
  58. ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES
  59. debugQuery=true
  60. Solr 4.3 is coming: http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
  61. Queries?
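
The filter-query and facet-query slides in the transcript show raw parameter strings. As a rough sketch of how such strings might be assembled in client code, the Python below builds both with the standard library only. The helper names `build_filter_query` and `build_facet_query` are hypothetical and not part of Solr or of the original talk; the parameter names and values are taken from the slides.

```python
from urllib.parse import urlencode, parse_qsl


def build_filter_query(filters, q="*:*"):
    """Return a URL-encoded query string with one fq parameter per filter.

    Hypothetical helper: each entry in `filters` becomes its own fq
    parameter, as on the filter-query slides.
    """
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


def build_facet_query(fields, limit=-1, mincount=1, per_field_limits=None):
    """Return a URL-encoded facet-only query string (rows=0).

    Hypothetical helper: `per_field_limits` maps a field name to a limit
    that overrides the global facet.limit, using the f.<field>.facet.limit
    form shown on the slides.
    """
    params = [("rows", 0), ("facet", "true")]
    params += [("facet.field", name) for name in fields]
    params += [("facet.limit", limit), ("facet.mincount", mincount)]
    for name, field_limit in (per_field_limits or {}).items():
        params.append((f"f.{name}.facet.limit", field_limit))
    return urlencode(params)


# Values copied from the slides.
filter_qs = build_filter_query([
    "country:AN",
    "duration:[1 TO *]",
    "date:[NOW TO 2013-07-01T00:00:00Z]",
])
facet_qs = build_facet_query(
    ["departureairport", "touroperator"],
    per_field_limits={"touroperator": 2},
)

print(filter_qs)
print(facet_qs)
```

Passing a list of pairs to `urlencode` (rather than a dict) keeps repeated parameters such as `fq` and `facet.field`, and it percent-encodes characters like `:`, `[` and spaces that the slides show unescaped.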

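
The multi-select faceting slides pair a tagged filter query with a facet field that excludes that tag, so the facet counts for the selected field are computed as if its own filter were not applied. A minimal sketch of that pairing in Python follows. The function name `build_multiselect_facet_query` is hypothetical, and reusing the field name as the tag name is an assumption that simply mirrors the `country` example on the slides.

```python
from urllib.parse import urlencode, parse_qsl


def build_multiselect_facet_query(field, value, q="*:*"):
    """Return a URL-encoded query string for multi-select faceting.

    Hypothetical helper: the filter on `field` is tagged, and the facet
    on the same field excludes that tag. The tag name is assumed to be
    the field name, as in the slide example.
    """
    tag = field
    params = [
        ("q", q),
        ("fq", f"{{!tag={tag}}}{field}:{value}"),   # tagged filter query
        ("facet", "true"),
        ("facet.field", f"{{!ex={tag}}}{field}"),   # facet ignores the tagged filter
        ("facet.limit", -1),
        ("facet.mincount", 1),
    ]
    return urlencode(params)


# Values copied from the slides.
multiselect_qs = build_multiselect_facet_query("country", "AN")
print(multiselect_qs)
```

Decoding the result with `parse_qsl` gives back the same parameters as the slide's query, with `{!tag=country}` on the filter and `{!ex=country}` on the facet field.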