Solr & Lucene @ Etsy by Gregg Donovan

Slides from my talk on "Solr & Lucene @ Etsy" from the LuceneRevolution conference on May 26th, 2011 in San Francisco.

Presentation Transcript

  • Solr & Lucene at Etsy Gregg Donovan Technical Lead, Search
  • 1.5 years Solr & Lucene at Etsy.com; 3 years Solr & Lucene at
  • 8+ million members
  • 9.3 million items
  • 800k+ active sellers
  • 1+ billion pageviews / month
  • Maximize Solr out-of-the-box
  • Hack at a low-level
  • Know when to do each
  • Or
  • Don’t fear trunk
  • http://localhost:8393/solr/placesuggest/select?q={!lucene}s*&sfield=latlong&pt=37.595804,-122.364521&sort=div(geodist(),sqrt(sum(population,50)))%20asc
  • {!lucene} {!field} {!term} {!boost} {!func} {!dismax} {!edismax}
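The sort expression in the place-suggest query above, div(geodist(),sqrt(sum(population,50))), divides distance by the square root of a smoothed population and sorts ascending, so large nearby places come first. The arithmetic can be sketched in plain Java as follows. This is an illustration only: the class and method names are hypothetical, and it assumes geodist() yields a great-circle distance in kilometres.

```java
// Hypothetical sketch of the place-suggest sort key; not Solr code.
final class PlaceRank {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle (haversine) distance between two lat/lon points, in km.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Mirrors div(geodist(), sqrt(sum(population, 50))); smaller sorts first.
    // The constant 50 keeps tiny places from producing a near-zero divisor.
    static double sortKey(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50);
    }
}
```

With this key, a city of 800,000 people 20 km away sorts ahead of a town of 100 people 5 km away, which is the behaviour the slide's expression is after.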
  • Cheap ranking awesomeness
  • ExternalFileField ftw!
  • schema.xml:
    <fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true" class="solr.ExternalFileField" valType="float"/>
    <field name="hotness" type="file"/>
    /search/data/treasury/external_hotness.1306390802088:
    1=2.3
    2=1.7
    3=1.1
    Solr query:
    sort={!func}hotness+desc
  • ExternalFileField caveats
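To make the ExternalFileField slides concrete, here is a minimal sketch of what the external file encodes: one key=value line per document, a default for keys that are missing, and a descending sort on the value. It is plain Java for illustration, not Solr's implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the external_hotness file semantics.
final class ExternalHotness {
    // Parses "key=value" lines into a key -> float map; malformed lines are skipped.
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> values = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String key = line.substring(0, eq).trim();
            values.put(key, Float.parseFloat(line.substring(eq + 1).trim()));
        }
        return values;
    }

    // Orders ids by hotness, highest first; ids absent from the file get defVal,
    // mirroring the defVal="0" attribute in the schema snippet.
    static List<String> sortByHotnessDesc(List<String> ids, Map<String, Float> hotness, float defVal) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparing((String id) -> hotness.getOrDefault(id, defVal)).reversed());
        return sorted;
    }
}
```

Feeding it the three lines from the slide and the ids 3, 4, 1, 2 gives the order 1, 2, 3, 4, with the unknown id 4 falling back to the default.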
  • More relevance: boost query
  • http://localhost:8983/solr/listings/select?q={!boost b=$rel v=$qq}&rel=category:furniture^10+OR+((-material:acrylic)^5)&qq=desk
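The boost-query URL above relies on parameter dereferencing: q refers to $rel and $qq, which are passed as separate request parameters. A client might assemble and encode such a request roughly as below. This is a sketch using only the JDK; the class and method names are hypothetical, and the base URL is the one from the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the boost-query request from the slide.
final class BoostQueryUrl {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // q stays fixed; the user's text goes in qq and the relevance boost in rel.
    static String build(String baseUrl, String userQuery, String boostQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}");
        params.put("rel", boostQuery);
        params.put("qq", userQuery);

        StringBuilder url = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(e.getKey()).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }
}
```

Keeping the user's text in its own parameter means it never has to be spliced into the local-params string by hand.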
  • Impression tracking
  • Side-by-Side testing
  • Cheap performance wins
  • Put off sharding till you must
  • cat ${indexDir}/* > /dev/null
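The cat command above reads every index file once and discards the bytes, which pulls the index into the operating system's page cache before queries arrive. The same warm-up can be sketched in Java, for instance from a deployment hook. The class name is hypothetical, and the sketch reads only the regular files directly inside the directory, as the shell glob does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical page-cache warmer: reads each index file and throws the bytes away.
final class IndexWarmer {
    // Returns the total number of bytes read, which is handy for logging.
    static long readAll(Path indexDir) throws IOException {
        long total = 0;
        byte[] buf = new byte[1 << 16];
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }
}
```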
  • Return IDs, minimize stored fields
  • RAM: $10-20 / GB
  • SSD: 0.1ms vs 10ms seek
  • Custom?
  • solr-user
  • Tools for low-level hacking
  • Continuous deployment
  • One button. So easy a dog could do it.
  • Tracking GC
  • export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
  • Alerting
  • Testing
  • SaveAsFixture
  • Profiling
  • Java primitive libraries: fastutil, trove4j
  • Know the hooks: SolrRequestHandler, SearchComponent, QParserPlugin, SolrEventListener, SolrCache, ValueSourceParser
  • SolrIndexSearcher gotchas: reference counting; using it as a cache key: WeakHashMap<SolrIndexSearcher,MyValue> myCache...
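The WeakHashMap idea on the slide above can be sketched generically. In the sketch below the type parameter S stands in for SolrIndexSearcher so that the example compiles without Solr on the classpath; the class name and the loader function are hypothetical. The point is that an entry becomes collectable once nothing else references its searcher.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-searcher cache; S is a stand-in for SolrIndexSearcher.
final class PerSearcherCache<S, V> {
    // Weak keys: an entry can be collected once its searcher is unreachable.
    // Caveat: if a value strongly references its own key, the entry never clears.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    // Computes the value on first use for a given searcher, then reuses it.
    V get(S searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }

    int size() {
        return cache.size();
    }
}
```

WeakHashMap compares keys with equals and hashCode rather than identity, so this only behaves as a per-instance cache when the key type does not override them.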
  • Example: personalized collections
  • fq={!term f=id}123 OR {!term f=id}456
  • Need a map of PK to docId
  • Use custom SolrCache plus SolrEventListener to fill it
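The primary-key-to-docId map from the slides above can be illustrated without any Solr classes. The sketch assumes that some hook (for example, a listener fired when a new searcher opens) hands over the primary key stored at each docId; that hook, the class name, and the method names are all hypothetical. Because Lucene docIds are only valid for a given index snapshot, the map would need rebuilding each time a new searcher opens.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical PK -> docId lookup, rebuilt for every new searcher.
final class PkToDocId {
    // pkByDocId[docId] is the primary key of the document at that docId.
    // A primitive map from fastutil or trove4j would avoid the boxing done here.
    static Map<Long, Integer> build(long[] pkByDocId) {
        Map<Long, Integer> map = new HashMap<>(pkByDocId.length * 2);
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            map.put(pkByDocId[docId], docId);
        }
        return map;
    }

    // Turns a user's collection of primary keys into a docId bit set,
    // skipping keys that are not in the current index snapshot.
    static BitSet toDocSet(Collection<Long> pks, Map<Long, Integer> pkToDocId) {
        BitSet bits = new BitSet();
        for (Long pk : pks) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

A bit set built this way can stand in for a long OR of {!term f=id} clauses when a collection holds many ids.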
  • i18n currency sorting and filtering
  • currency.xml:
    <currencyConfig version="1.0">
      <currencies>
        <currency name="United States Dollar" symbol="$" code="USD"/>
        <currency name="Australian Dollar" symbol="$" code="AUD"/>
        <currency name="Canadian Dollar" symbol="$" code="CAD"/>
        <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
        ...
      </currencies>
      <rates>
        <rate from="USD" to="AUD" rate="1.168750"/>
        <rate from="USD" to="CAD" rate="1.085000"/>
        <rate from="USD" to="CZK" rate="20.107500"/>
        <rate from="USD" to="DKK" rate="5.323750"/>
        ...
      </rates>
    </currencyConfig>
  • price:[$10.00 TO $50.00]
    price:[10.00USD TO 50.00USD]
    price:20.00EUR
  • @Override
    public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                               final boolean minInclusive, final boolean maxInclusive) {
      final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
      final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);
      if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2
                + ": range queries only supported when upper and lower bound have same currency."));
      }
      String currencyCode = p1.getCurrencyCode();
      final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);
      return new SolrConstantScoreQuery(new ValueSourceRangeFilter(vs,
          p1.getAmount() + "", p2.getAmount() + "", minInclusive, maxInclusive));
    }
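The slides do not show MoneyValue itself, so the following is only a guess at the kind of parsing and conversion the currency field needs, written as a self-contained sketch. Everything in it is hypothetical: the class name, the accepted syntax (an optional leading "$", a decimal amount, an optional three-letter code), the choice to map a bare "$" to the default currency, and the two-decimal rounding. The rate used in the test is the USD to AUD rate from currency.xml.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the MoneyValue class referenced on the slide.
final class MoneySketch {
    // Optional "$", a decimal amount, then an optional ISO-style currency code.
    private static final Pattern MONEY =
            Pattern.compile("\\$?(\\d+(?:\\.\\d+)?)([A-Z]{3})?");

    final BigDecimal amount;
    final String currencyCode;

    private MoneySketch(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    // "$10.00" and "10.00" take the default currency; "20.00EUR" keeps its own.
    static MoneySketch parse(String text, String defaultCurrency) {
        Matcher m = MONEY.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        String code = m.group(2) != null ? m.group(2) : defaultCurrency;
        return new MoneySketch(new BigDecimal(m.group(1)), code);
    }

    // Converts a USD amount using "from USD" rates like those in currency.xml.
    static BigDecimal fromUsd(BigDecimal usd, String to, Map<String, BigDecimal> usdRates) {
        if (to.equals("USD")) {
            return usd;
        }
        BigDecimal rate = usdRates.get(to);
        if (rate == null) {
            throw new IllegalArgumentException("No rate for " + to);
        }
        return usd.multiply(rate).setScale(2, RoundingMode.HALF_UP);
    }
}
```

Using BigDecimal rather than double keeps prices exact until the final rounding step, which matters when the converted value is compared against range bounds.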
  • Replication gotcha
  • SOLR-2202
  • Related Searches
  • Autosuggest!
  • bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely
  • The TermDictionary is not a whitelist