Solr & Lucene @ Etsy by Gregg Donovan

Slides from my talk on "Solr & Lucene @ Etsy" from the LuceneRevolution conference on May 26th, 2011 in San Francisco.

Transcript of "Solr & Lucene @ Etsy by Gregg Donovan"

  1. 1. Solr & Lucene at Etsy Gregg Donovan Technical Lead, Search gregg@etsy.com
  2. 2. 1.5 years Solr & Lucene at Etsy.com; 3 years Solr & Lucene at TheLadders.com
  3. 3. 8+ million members
  4. 4. 9.3 million items
  5. 5. 800k+ active sellers
  6. 6. 1+ billion pageviews / month
  7. 7. Maximize Solr out-of-the-box
  8. 8. Hack at a low-level
  9. 9. Know when to do each
  10. 10. Or
  11. 11. Don’t fear trunk
  12. 12. builds.apache.org/job/Solr-trunk/changes
  13. 13. http://localhost:8393/solr/placesuggest/select?q={!lucene}s*&sfield=latlong&pt=37.595804,-122.364521&sort=div(geodist(),sqrt(sum(population,50)))%20asc
  14. 14. {!lucene} {!field} {!term} {!boost} {!func} {!dismax} {!edismax}
  15. 15. Cheap ranking awesomeness
  16. 16. ExternalFileField ftw!
  17. 17. schema.xml:
        <fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true" class="solr.ExternalFileField" valType="float"/>
        <field name="hotness" type="file"/>
      /search/data/treasury/external_hotness.1306390802088:
        1=2.3
        2=1.7
        3=1.1
      Solr query:
        sort={!func}hotness+desc
  18. 18. ExternalFileField caveats
  19. 19. More relevance: boost query
  20. 20. http://localhost:8983/solr/listings/select?q={!boost b=$rel v=$qq}&rel=category:furniture^10+OR+((-material:acrylic)^5)&qq=desk
  21. 21. Impression tracking
  22. 22. etsy.com/search?q=desk&explain=1
  23. 23. Side-by-Side testing
  24. 24. Cheap performance wins
  25. 25. Put off sharding till you must
  26. 26. cat ${indexDir}/* > /dev/null
  27. 27. Return IDs, minimize stored fields
  28. 28. RAM: $10-20 / GB
  29. 29. SSD: 0.1ms vs 10ms seek
  30. 30. Custom?
  31. 31. solr-user
  32. 32. Tools for low-level hacking
  33. 33. Continuous deployment
  34. 34. One button. So easy a dog could do it.
  35. 35. MTTR > MTBF
  36. 36. github.com/etsy/logster
  37. 37. Tracking GC
  38. 38. export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
  39. 39. Alerting
  40. 40. Testing
  41. 41. SaveAsFixture
  42. 42. Profiling
  43. 43. Java Primitive Library: fastutil, trove4j
  44. 44. Know the hooks: SolrRequestHandler, SearchComponent, QParserPlugin, SolrEventListener, SolrCache, ValueSourceParser
  45. 45. SolrIndexSearcher gotchas: reference counting; using it as a cache key: WeakHashMap<SolrIndexSearcher,MyValue> myCache...
  46. 46. Example: personalized collections
  47. 47. fq={!term f=id}123 OR {!term f=id}456
  48. 48. Need a map of PK to docId
  49. 49. Use custom SolrCache plus SolrEventListener to fill it
  50. 50. github.com/giokincade/FastTermFilter
  51. 51. i18n currency sorting and filtering
  52. 52. currency.xml:
        <currencyConfig version="1.0">
          <currencies>
            <currency name="United States Dollar" symbol="$" code="USD"/>
            <currency name="Australian Dollar" symbol="$" code="AUD"/>
            <currency name="Canadian Dollar" symbol="$" code="CAD"/>
            <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
            ...
          </currencies>
          <rates>
            <rate from="USD" to="AUD" rate="1.168750"/>
            <rate from="USD" to="CAD" rate="1.085000"/>
            <rate from="USD" to="CZK" rate="20.107500"/>
            <rate from="USD" to="DKK" rate="5.323750"/>
            ...
          </rates>
        </currencyConfig>
  53. 53. price:[$10.00 to $50.00]
        price:[10.00USD to 50.00USD]
        price:20.00EUR
  54. 54. MoneyFieldType.java:
        @Override
        public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                                   final boolean minInclusive, final boolean maxInclusive) {
          // Parse both bounds of the range.
          final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
          final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

          // Reject ranges whose bounds are in different currencies.
          if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                new ParseException("Cannot parse range query " + part1 + " to " + part2
                    + ": range queries only supported when upper and lower bound have same currency."));
          }

          // Constant-score range filter over the field's value source for that currency.
          String currencyCode = p1.getCurrencyCode();
          final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);
          return new SolrConstantScoreQuery(
              new ValueSourceRangeFilter(vs, p1.getAmount() + "", p2.getAmount() + "", minInclusive, maxInclusive));
        }
  55. 55. Replication gotcha
  56. 56. SOLR-2202
  57. 57. Related Searches
  58. 58. Autosuggest!
  59. 59. bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely
  60. 60. The TermDictionary is not a whitelist
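
Slide 13 sorts place suggestions ascending by div(geodist(),sqrt(sum(population,50))), so places that are both near and populous come first. A minimal sketch of that arithmetic in plain Java, assuming geodist() behaves like a haversine distance in kilometres. The class name, the second coordinate pair, and the population figure are made up for illustration and do not come from the talk.

```java
final class PlaceSuggestScore {
    // Assumed mean Earth radius in kilometres.
    private static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine great-circle distance in kilometres between two lat/lon points. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Mirrors div(geodist(), sqrt(sum(population, 50))); a lower key sorts first. */
    static double sortKey(double distanceKm, long population) {
        return distanceKm / Math.sqrt(population + 50.0);
    }

    public static void main(String[] args) {
        // pt from slide 13, compared against an illustrative second point.
        double km = haversineKm(37.595804, -122.364521, 37.7749, -122.4194);
        System.out.printf("distance=%.1f km, key=%.4f%n", km, sortKey(km, 800_000L));
    }
}
```

One reading of the constant 50 is that it keeps the denominator non-zero for places with no recorded population; the slides do not state the reason.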
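
Slide 17 shows an ExternalFileField whose values live in a key=value file outside the index, with defVal="0" for documents that have no entry. The sketch below models only that lookup behaviour in plain Java. It is not Solr's loader, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ExternalHotness {
    private final Map<String, Float> values = new HashMap<>();
    private final float defVal;

    /** Parses "key=value" lines; blank or malformed lines are skipped. */
    ExternalHotness(String fileContents, float defVal) {
        this.defVal = defVal;
        for (String line : fileContents.split("\\R")) {
            String t = line.trim();
            int eq = t.indexOf('=');
            if (t.isEmpty() || eq <= 0) {
                continue;
            }
            try {
                values.put(t.substring(0, eq), Float.parseFloat(t.substring(eq + 1)));
            } catch (NumberFormatException e) {
                // Skip lines whose value is not a float.
            }
        }
    }

    /** Returns the external value for a key, or defVal when the key is absent. */
    float get(String key) {
        Float v = values.get(key);
        return v == null ? defVal : v;
    }

    public static void main(String[] args) {
        ExternalHotness hotness = new ExternalHotness("1=2.3\n2=1.7\n3=1.1", 0f);
        System.out.println(hotness.get("2") + " " + hotness.get("42"));
    }
}
```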
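
The boost query on slide 20 keeps the user's text in its own parameter (qq) and the relevance tweaks in another (rel), which q={!boost b=$rel v=$qq} then dereferences. A small sketch of assembling such a request with only the JDK; the class and method names are hypothetical, and the builder percent-encodes every value, so the output is the escaped form of the URL shown on the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class BoostQueryUrl {
    /** Builds a select URL whose q parameter dereferences $rel and $qq. */
    static String build(String baseUrl, String userQuery, String boostQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}"); // local params copied from slide 20
        params.put("rel", boostQuery);             // relevance tweaks, kept apart from user input
        params.put("qq", userQuery);               // what the user typed

        StringBuilder sb = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/listings/select",
                "desk", "category:furniture^10 OR ((-material:acrylic)^5)"));
    }
}
```

Keeping the user query in qq means the boost clauses can change without touching how user input is parsed.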
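
The GC flags on slide 38 include -XX:+PrintGCApplicationStoppedTime, which makes the JVM log how long application threads were paused. A hedged sketch of tallying those pauses from a log follows. The exact message text matched by the regular expression is an assumption based on HotSpot output of that era, the sample lines in main use invented values, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcPauseTally {
    // Assumed wording of the HotSpot "stopped time" log line.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    /** Sums the stopped-time values found in the given log lines, in seconds. */
    static double totalStoppedSeconds(Iterable<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample values in the assumed format.
        System.out.println(totalStoppedSeconds(Arrays.asList(
                "Total time for which application threads were stopped: 0.2500000 seconds",
                "some unrelated GC detail line",
                "Total time for which application threads were stopped: 0.5000000 seconds")));
    }
}
```

A total like this, computed per minute, is the kind of number a log-tailing tool could ship to a metrics system for alerting.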
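
Slide 45 suggests keying per-searcher data on the searcher itself through a WeakHashMap, so an entry can disappear once the old searcher is no longer referenced. A minimal sketch of that pattern, with a generic type parameter standing in for SolrIndexSearcher so it runs without Solr on the classpath; the class name and loader function are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class PerSearcherCache<S, V> {
    // Weak keys: an entry becomes collectable once nothing else references the searcher.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<S, V>());
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this searcher, computing it on first use. */
    V get(S searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PerSearcherCache<Object, String> cache =
                new PerSearcherCache<>(s -> "data for " + System.identityHashCode(s));
        Object searcher = new Object(); // stand-in for a SolrIndexSearcher
        System.out.println(cache.get(searcher).equals(cache.get(searcher)));
    }
}
```

The cached value must not hold a strong reference back to its key, otherwise the entry can never be collected.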
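
Slides 47 to 49 replace a long OR of {!term f=id} clauses with a map from primary key to docId. The sketch below shows the idea with JDK types only: look up each key and set its docId in a bit set that can serve as a filter. The slides do not show FastTermFilter's internals, so this is a stand-in, and all names and the sample mapping are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class PkDocIdFilter {
    private final Map<Long, Integer> pkToDocId;
    private final int maxDoc;

    /** The map is only valid for the searcher it was built against. */
    PkDocIdFilter(Map<Long, Integer> pkToDocId, int maxDoc) {
        this.pkToDocId = new HashMap<>(pkToDocId);
        this.maxDoc = maxDoc;
    }

    /** Returns a bit set with one bit per matching document; unknown keys are ignored. */
    BitSet filterFor(long... primaryKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (long pk : primaryKeys) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<Long, Integer> mapping = new HashMap<>();
        mapping.put(123L, 7); // illustrative PK-to-docId pairs
        mapping.put(456L, 2);
        System.out.println(new PkDocIdFilter(mapping, 10).filterFor(123L, 456L, 999L));
    }
}
```

Because docIds change when the index changes, the map has to be rebuilt for each new searcher, which fits slide 49's suggestion of filling a custom SolrCache from a SolrEventListener.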
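
Slides 52 to 54 accept prices such as $10.00, 10.00USD and 20.00EUR, and reject ranges whose bounds use different currencies. The talk does not show MoneyValue itself, so the following is a hypothetical stand-in that parses those three shapes and converts from USD with a rate taken from slide 52. Treating a leading "$" as the default currency is an assumption, made because currency.xml lists "$" for several currencies.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class MoneySketch {
    final BigDecimal amount;
    final String currencyCode;

    private MoneySketch(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    /** Parses "$10.00", "10.00USD" or "10.00"; anything without a code uses defaultCurrency. */
    static MoneySketch parse(String text, String defaultCurrency) {
        String s = text.trim();
        if (s.startsWith("$")) {
            // Assumption: "$" is ambiguous, so fall back to the default currency.
            return new MoneySketch(new BigDecimal(s.substring(1)), defaultCurrency);
        }
        if (s.length() > 3 && Character.isLetter(s.charAt(s.length() - 1))) {
            String code = s.substring(s.length() - 3);
            return new MoneySketch(new BigDecimal(s.substring(0, s.length() - 3)), code);
        }
        return new MoneySketch(new BigDecimal(s), defaultCurrency);
    }

    /** Range bounds are only comparable when both use the same currency. */
    static boolean sameCurrency(MoneySketch a, MoneySketch b) {
        return a.currencyCode.equals(b.currencyCode);
    }

    /** Converts a USD amount using a USD-based rate, rounded to two decimals. */
    static BigDecimal convertFromUsd(BigDecimal usdAmount, BigDecimal rate) {
        return usdAmount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        MoneySketch low = parse("10.00USD", "USD");
        MoneySketch high = parse("20.00EUR", "USD");
        System.out.println(sameCurrency(low, high));
        System.out.println(convertFromUsd(low.amount, new BigDecimal("1.085000"))); // USD to CAD rate from slide 52
    }
}
```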