Solr at Etsy - Giovanni Fernandez-Kincade

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.

Published in: Technology, Business

  1. Solr@
  2. Things I’m not going to talk about: A/B Testing, i18n, Continuous Deployment
  3. About Us
  4. 10+ Million Listings, 500 qps
  5. Architecture Overview
  6. Architecture Overview: Thrift
  7. Architecture Overview: Thrift
        struct Listing {
          1: i64 listing_id
        }
        struct ListingResults {
          1: i64 count,
          2: list<Listing> listings
        }
        service Search {
          ListingResults search(1: string query)
        }
  8. Architecture Overview: Thrift
      Generated Java server code:
        public class Search {
          public interface Iface {
            public ListingResults search(String query) throws TException;
          }
        }
      Generated PHP client code:
        class SearchClient implements SearchIf {
          /**...**/
          public function search($query) {
            $this->send_search($query);
            return $this->recv_search();
          }
        }
  9. Architecture Overview: Thrift
      Why use Thrift?
        • Service Encapsulation
        • Reduced Network Traffic
  10. Architecture Overview: Thrift
      Why only return IDs?
        • Index Size
        • Easy to scale PK lookups
  11. The Search Server
  12. Architecture Overview: Search Server
        • Identical Code + Hardware
        • Roles/Behavior controlled by Env variables
        • Single Java Process
        • Solr running as a Jetty Servlet
        • Thrift Servers
        • Smoker
  13. Architecture Overview: Search Server
      Master-specific processes:
        • Incremental Indexer
        • External File Field Updaters
  14. Load Balancing
  15. Load Balancing: Thrift TSocketPool
  16. Load Balancing: Thrift TSocketPool
  17. Load Balancing: Thrift TSocketPool
  18. Load Balancing: Server Affinity
  19. Load Balancing: Server Affinity Algorithm
        $serversNew = array();  // ends up as, e.g., ["host2", "host3", "host1", "host4"]
        $numServers = count($servers);
        while ($numServers > 0) {
          // Take the first 4 chars of the md5sum of the server count
          // and the query, mod the available servers
          $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % $numServers;
          $keySet = array_keys($servers);
          $serverId = $keySet[$key];
          // Push the chosen server onto the new list and remove it
          // from the initial list
          array_push($serversNew, $servers[$serverId]);
          unset($servers[$serverId]);
          --$numServers;
        }
  20. Load Balancing: Server Affinity Algorithm
        $key = hexdec(substr(md5($query), 0, 4))
        “jewelry”: [“host2”, “host3”, “host1”, “host4”]
        “scarf”: [“host2”, “host3”, “host1”, “host4”]
  21. Load Balancing: Server Affinity Algorithm
        $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % count($servers);
        “jewelry”: [“host2”, “host3”, “host1”, “host4”]
        “scarf”: [“host2”, “host1”, “host4”, “host3”]
  22. Load Balancing: Server Affinity Results: 2%, 20%
  23. Load Balancing: Server Affinity Caveats
        • Stemming / Analysis
        • Be wary of query distribution
  24. Replication
  25. Replication: The Problem
  26. Replication: The Problem
  27. Replication: Multicast Rsync?
  28. Replication: Multicast Rsync?
        [15:25] <engineer> patrick: im gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. Ill be watching the graphs and what not, but let me know if you see anything funky with the network
        [15:26] <patrick> ok....
        [15:31] <keyur> is the site down?
  29. Replication: Multicast Rsync?
  30. Hmm... Bit Torrent?
  31. Replication: Bit Torrent POC
      Using BitTornado:
  32. Replication: Bit Torrent + Solr
      Fork of TTorrent: https://github.com/etsy/ttorrent
        • Multi-File Support
        • Performance Enhancements
  33. Replication: Bit Torrent + Solr
  34. Replication: Bit Torrent + Solr
  35. Replication: Bit Torrent + Solr
  36. Replication: Bit Torrent + Solr
  37. Solr InterOp
  38. QParsers
  39. “writing query strings is for suckers”
  40. Solr InterOp: QParsers
        http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3
  41. Solr InterOp: QParsers
        http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
  42. Solr InterOp: QParsers
        class PersonNameRealQParser extends QParser {
          public PersonNameRealQParser(String qstr, SolrParams localParams,
                                       SolrParams params, SolrQueryRequest req) {
            super(qstr, localParams, params, req);
          }
  43. Solr InterOp: QParsers
          @Override
          public Query parse() throws ParseException {
            // Exact match on the full name gets the highest boost
            TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
            exactFullNameQuery.setBoost(4.0f);
            // Split the user query on whitespace
            String[] userQueryTerms = qstr.split("\\s+");
            Query firstLastQuery = null;
            if (2 == userQueryTerms.length)
              firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
            else
              firstLastQuery = parseAsFirstOrLast(userQueryTerms);
            DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
            realNameQuery.add(exactFullNameQuery);
            realNameQuery.add(firstLastQuery);
            return realNameQuery;
          }
          // parseAsFirstAndLast and parseAsFirstOrLast are not shown on the slides
        }
  44. Solr InterOp: QParsers
      The QParserPlugin that returns our new QParser:
        public class PersonNameRealQParserPlugin extends QParserPlugin {
          public static final String NAME = "personrealqp";
          @Override
          public void init(NamedList args) {}
          @Override
          public QParser createParser(String qstr, SolrParams localParams,
                                      SolrParams params, SolrQueryRequest req) {
            return new PersonNameRealQParser(qstr, localParams, params, req);
          }
        }
  45. Solr InterOp: QParsers
      Registering the plugin in solrconfig.xml:
        <queryParser name="personrealqp" class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
  46. Custom Stemmer
  47. Solr InterOp: Custom Stemmer
  48. Solr InterOp: Custom Stemmer
        banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  49. Solr InterOp: Custom Stemmer
      First we extend KStemmer and intercept stem calls:
        public class LStemmer extends KStemmer {
          /**.....**/
          @Override
          String stem(String term) {
            String override = overrideStemTransformations.get(term);
            if (override != null) return override;
            return super.stem(term);
          }
        }
  50. Solr InterOp: Custom Stemmer
      Then create a TokenFilter that uses the new Stemmer:
        final class LStemFilter extends TokenFilter {
          /**.....**/
          protected LStemFilter(TokenStream input, int cacheSize) {
            super(input);
            stemmer = new LStemmer(cacheSize);
          }
          @Override
          public boolean incrementToken() throws IOException { /**....**/ }
        }
  51. Solr InterOp: Custom Stemmer
      Create a FilterFactory that exposes it:
        public class LStemFilterFactory extends BaseTokenFilterFactory {
          private int cacheSize = 20000;
          @Override
          public void init(Map<String, String> args) {
            super.init(args);
            String cacheSizeStr = args.get("cacheSize");
            if (cacheSizeStr != null) {
              cacheSize = Integer.parseInt(cacheSizeStr);
            }
          }
          @Override
          public TokenStream create(TokenStream in) {
            return new LStemFilter(in, cacheSize);
          }
        }
  52. Solr InterOp: Custom Stemmer
      And finally plug it into your analysis chain:
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="solr/common/conf/stopwords.txt"/>
          <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
  53. Thanks!
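
The server affinity algorithm on slides 19 to 21 is shown in PHP. As a rough illustration only, the same idea can be sketched in Java: hash the remaining server count together with the query, use the first four hex characters of the MD5 digest to pick a server, and repeat until every server has been placed. The class and method names (`ServerAffinity`, `orderServers`, `md5Hex`) are hypothetical, and the `"+"` separator is an assumption about what the slide intended.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class ServerAffinity {

    /** Hex MD5 digest of the input, zero-padded to 32 characters. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Orders the servers deterministically per query; the input list is not modified. */
    static List<String> orderServers(List<String> servers, String query) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            // First 4 hex chars of md5(count + "+" + query), mod the remaining count.
            // The "+" separator is an assumption, not confirmed by the slides.
            int key = Integer.parseInt(md5Hex(n + "+" + query).substring(0, 4), 16) % n;
            // Move the chosen server from the remaining list to the ordered list.
            ordered.add(remaining.remove(key));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("host1", "host2", "host3", "host4");
        System.out.println(orderServers(servers, "jewelry"));
        System.out.println(orderServers(servers, "scarf"));
    }
}
```

Because the ordering depends only on the query string and the server list, repeated requests for the same query try the same server first, which is presumably what lets that server's caches stay warm for it. The caveat on slide 23 still applies: the hash is taken over the raw query string, so queries that analyze to the same terms can still land on different servers.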
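
The override pattern behind the custom stemmer on slide 49 can be tried without Lucene or Solr on the classpath. The sketch below is not the Etsy implementation: `toyStem` is a deliberately naive stand-in for KStem, the class names are hypothetical, and the override entry mapping "banded" to itself is an assumed example of a term whose default stem is unwanted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class OverrideStemmerSketch {

    /** Toy stand-in for a real stemmer (NOT KStem): strips a trailing "ing" or "ed". */
    static String toyStem(String term) {
        if (term.endsWith("ing") && term.length() > 5) {
            return term.substring(0, term.length() - 3);
        }
        if (term.endsWith("ed") && term.length() > 4) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }

    /** Wraps a base stemmer and consults an override map before delegating to it. */
    static final class OverrideStemmer {
        private final Map<String, String> overrides;
        private final UnaryOperator<String> base;

        OverrideStemmer(Map<String, String> overrides, UnaryOperator<String> base) {
            this.overrides = new HashMap<>(overrides);
            this.base = base;
        }

        String stem(String term) {
            String override = overrides.get(term);
            return override != null ? override : base.apply(term);
        }
    }

    public static void main(String[] args) {
        // Hypothetical override: keep "banded" intact instead of stemming it.
        Map<String, String> overrides = Map.of("banded", "banded");
        OverrideStemmer stemmer = new OverrideStemmer(overrides, OverrideStemmerSketch::toyStem);
        System.out.println(stemmer.stem("banded"));   // override applies
        System.out.println(stemmer.stem("painted"));  // falls through to the toy stemmer
    }
}
```

Composition is used here instead of subclassing only to keep the sketch self-contained; the slides extend KStemmer directly, which has the same effect of intercepting each `stem` call before the default logic runs.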
