Solr at Etsy - Giovanni Fernandez-Kincade

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.

Published in: Technology, Business

  1. Solr @ Etsy
  2. Things I’m not going to talk about: A/B Testing, i18n, Continuous Deployment
  3. About Us
  4. 10+ Million Listings, 500 qps
  5. Architecture Overview
  6. Architecture Overview: Thrift
  7. Architecture Overview: Thrift
         struct Listing {
             1: i64 listing_id
         }
         struct ListingResults {
             1: i64 count,
             2: list<Listing> listings
         }
         service Search {
             ListingResults search(1:string query)
         }
  8. Architecture Overview: Thrift
     Generated Java server code:
         public class Search {
             public interface Iface {
                 public ListingResults search(String query) throws TException;
             }
         }
     Generated PHP client code:
         class SearchClient implements SearchIf {
             /**...**/
             public function search($query) {
                 $this->send_search($query);
                 return $this->recv_search();
             }
         }
  9. Architecture Overview: Thrift
     Why use Thrift?
     • Service Encapsulation
     • Reduced Network Traffic
  10. Architecture Overview: Thrift
      Why only return IDs?
      • Index Size
      • Easy to scale PK lookups
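Slide 10's point about returning only IDs can be illustrated with a small sketch: the search service hands back listing IDs in ranked order, and the caller resolves them with primary-key lookups. The `ListingHydrator` class, the in-memory `Map` standing in for the listing store, and the policy of skipping missing IDs are all assumptions made for illustration; the slides do not describe how the lookups are done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ListingHydrator {
    /**
     * Resolves the IDs returned by search against a primary-key store.
     * The Map is a stand-in for whatever database or cache holds listing data.
     */
    static List<String> hydrate(List<Long> listingIds, Map<Long, String> titlesById) {
        List<String> titles = new ArrayList<>(listingIds.size());
        for (Long id : listingIds) {
            String title = titlesById.get(id);
            if (title != null) {
                titles.add(title); // preserves the ranking order from search
            }
            // IDs missing from the store are skipped (hypothetical policy).
        }
        return titles;
    }

    public static void main(String[] args) {
        Map<Long, String> store = new HashMap<>();
        store.put(101L, "silver ring");
        store.put(202L, "wool scarf");
        // 999 is in the search result but not in the store.
        System.out.println(hydrate(List.of(202L, 999L, 101L), store));
    }
}
```

Because the index stores nothing but what is needed to match and rank, it stays small, and the lookup side can be scaled independently of search.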
  11. The Search Server
  12. Architecture Overview: Search Server
      • Identical Code + Hardware
      • Roles/Behavior controlled by Env variables
      • Single Java Process
      • Solr running as a Jetty Servlet
      • Thrift Servers
      • Smoker
  13. Architecture Overview: Search Server
      Master-specific processes:
      • Incremental Indexer
      • External File Field Updaters
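Slides 12 and 13 describe identical servers whose role is selected by environment variables, with some processes running only on the master. A minimal sketch of that idea follows. The variable name `SEARCH_ROLE`, the value `master`, and the process labels are hypothetical; the slides do not say which variables or names are actually used.

```java
import java.util.List;
import java.util.Map;

final class SearchServerRole {
    // Hypothetical variable name, chosen only for this example.
    static final String ROLE_VAR = "SEARCH_ROLE";

    /** Returns the processes a server should start for the given environment. */
    static List<String> processesFor(Map<String, String> env) {
        // equalsIgnoreCase(null) is false, so an unset variable means "not master".
        boolean master = "master".equalsIgnoreCase(env.get(ROLE_VAR));
        if (master) {
            return List.of("solr", "thrift-servers", "smoker",
                    "incremental-indexer", "external-file-field-updaters");
        }
        return List.of("solr", "thrift-servers", "smoker");
    }

    public static void main(String[] args) {
        // Same code on every host; only the environment differs.
        System.out.println(processesFor(System.getenv()));
    }
}
```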
  14. Load Balancing
  15. Load Balancing: Thrift TSocketPool
  16. Load Balancing: Thrift TSocketPool
  17. Load Balancing: Thrift TSocketPool
  18. Load Balancing: Server Affinity
  19. Load Balancing: Server Affinity Algorithm
          $serversNew = array();  // e.g. ["host2", "host3", "host1", "host4"]
          $numServers = count($servers);
          while ($numServers > 0) {
              // Take the first 4 chars of the md5sum of the server count
              // and the query, mod the available servers
              $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % $numServers;
              $keySet = array_keys($servers);
              $serverId = $keySet[$key];
              // Push the chosen server onto the new list and remove it
              // from the initial list
              array_push($serversNew, $servers[$serverId]);
              unset($servers[$serverId]);
              --$numServers;
          }
  20. Load Balancing: Server Affinity Algorithm
          $key = hexdec(substr(md5($query), 0, 4))
      "jewelry"  ["host2", "host3", "host1", "host4"]
      "scarf"    ["host2", "host3", "host1", "host4"]
  21. Load Balancing: Server Affinity Algorithm
          $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % count($servers);
      "jewelry"  ["host2", "host3", "host1", "host4"]
      "scarf"    ["host2", "host1", "host4", "host3"]
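The affinity ordering from slides 19 to 21 can be sketched in Java as follows. This is a port for illustration, not Etsy's code: the class and method names are invented here, and the `"+"` separator between the server count and the query is an assumption about what the slide's `. + .` was meant to express. Taking the first two bytes of the MD5 digest is equivalent to PHP's `hexdec(substr(md5(...), 0, 4))`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class ServerAffinity {
    /** First four hex characters of the MD5 of the input, as an int in [0, 65535]. */
    static int hashPrefix(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // Two bytes correspond to four hex characters.
            return ((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    /** Returns the servers in a query-dependent but repeatable order. */
    static List<String> order(List<String> servers, String query) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>(servers.size());
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            // Mixing the remaining count into the hash re-rolls the choice at
            // every step, so different queries get different fallback orders.
            int key = hashPrefix(n + "+" + query) % n;
            ordered.add(remaining.remove(key)); // remove(int index)
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("host1", "host2", "host3", "host4");
        System.out.println("jewelry -> " + order(servers, "jewelry"));
        System.out.println("scarf   -> " + order(servers, "scarf"));
    }
}
```

The same query always yields the same ordering, so repeated queries tend to land on the same server while the rest of the list acts as a fallback sequence.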
  22. Load Balancing: Server Affinity Results
      2% 20%
  23. Load Balancing: Server Affinity Caveats
      • Stemming / Analysis
      • Be wary of query distribution
  24. Replication
  25. Replication: The Problem
  26. Replication: The Problem
  27. Replication: Multicast Rsync?
  28. Replication: Multicast Rsync?
      [15:25] <engineer> patrick: im gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. Ill be watching the graphs and what not, but let me know if you see anything funky with the network
      [15:26] <patrick> ok....
      [15:31] <keyur> is the site down?
  29. Replication: Multicast Rsync?
  30. Hmm... BitTorrent?
  31. Replication: BitTorrent POC
      Using BitTornado:
  32. Replication: BitTorrent + Solr
      Fork of Ttorrent: https://github.com/etsy/ttorrent
      • Multi-File Support
      • Performance Enhancements
  33. Replication: BitTorrent + Solr
  34. Replication: BitTorrent + Solr
  35. Replication: BitTorrent + Solr
  36. Replication: BitTorrent + Solr
  37. Solr InterOp
  38. QParsers
  39. “writing query strings is for suckers”
  40. Solr InterOp: QParsers
      http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3
  41. Solr InterOp: QParsers
      http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
  42. Solr InterOp: QParsers
          class PersonNameRealQParser extends QParser {
              public PersonNameRealQParser(String qstr, SolrParams localParams,
                                           SolrParams params, SolrQueryRequest req) {
                  super(qstr, localParams, params, req);
              }
              // parse() is shown on the next slide
  43. Solr InterOp: QParsers
          @Override
          public Query parse() throws ParseException {
              // An exact match on the full name gets the highest boost
              TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
              exactFullNameQuery.setBoost(4.0f);
              // Split the user query on whitespace
              String[] userQueryTerms = qstr.split("\\s+");
              Query firstLastQuery = null;
              if (2 == userQueryTerms.length)
                  firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
              else
                  firstLastQuery = parseAsFirstOrLast(userQueryTerms);
              DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
              realNameQuery.add(exactFullNameQuery);
              realNameQuery.add(firstLastQuery);
              return realNameQuery;
          }
  44. Solr InterOp: QParsers
      The QParserPlugin that returns our new QParser:
          public class PersonNameRealQParserPlugin extends QParserPlugin {
              public static final String NAME = "personrealqp";
              @Override
              public void init(NamedList args) {}
              @Override
              public QParser createParser(String qstr, SolrParams localParams,
                                          SolrParams params, SolrQueryRequest req) {
                  return new PersonNameRealQParser(qstr, localParams, params, req);
              }
          }
  45. Solr InterOp: QParsers
      Registering the plugin in solrconfig.xml:
          <queryParser name="personrealqp"
                       class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
  46. Custom Stemmer
  47. Solr InterOp: Custom Stemmer
  48. Solr InterOp: Custom Stemmer
      banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  49. Solr InterOp: Custom Stemmer
      First we extend KStemmer and intercept stem calls:
          public class LStemmer extends KStemmer {
              /**.....**/
              @Override
              String stem(String term) {
                  String override = overrideStemTransformations.get(term);
                  if (override != null) return override;
                  return super.stem(term);
              }
          }
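The override pattern on slide 49 can be tried without Solr on the classpath. In the sketch below, `SuffixStemmer` is a deliberately crude stand-in for KStemmer and `OverrideStemmer` plays the role of `LStemmer`; both names, the suffix rules, and the choice to map "birding" to itself are assumptions for illustration, not the contents of Etsy's override table.

```java
import java.util.Map;

/** Crude stand-in for KStemmer: strips a couple of common suffixes. */
class SuffixStemmer {
    String stem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 4 && term.endsWith("ed")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }
}

/** Consults an override table first and falls back to the base stemmer. */
final class OverrideStemmer extends SuffixStemmer {
    private final Map<String, String> overrides;

    OverrideStemmer(Map<String, String> overrides) {
        this.overrides = Map.copyOf(overrides);
    }

    @Override
    String stem(String term) {
        String override = overrides.get(term);
        return override != null ? override : super.stem(term);
    }

    public static void main(String[] args) {
        SuffixStemmer base = new SuffixStemmer();
        OverrideStemmer custom = new OverrideStemmer(Map.of("birding", "birding"));
        System.out.println(base.stem("birding"));   // base rule strips "ing"
        System.out.println(custom.stem("birding")); // override wins
        System.out.println(custom.stem("walking")); // no override, base rule applies
    }
}
```

Keeping the overrides in a lookup table means individual terms can be corrected without touching the stemming rules themselves.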
  50. Solr InterOp: Custom Stemmer
      Then create a TokenFilter that uses the new Stemmer:
          final class LStemFilter extends TokenFilter {
              /**.....**/
              protected LStemFilter(TokenStream input, int cacheSize) {
                  super(input);
                  stemmer = new LStemmer(cacheSize);
              }
              @Override
              public boolean incrementToken() throws IOException { /**....**/ }
          }
  51. Solr InterOp: Custom Stemmer
      Create a FilterFactory that exposes it:
          public class LStemFilterFactory extends BaseTokenFilterFactory {
              private int cacheSize = 20000;
              @Override
              public void init(Map<String, String> args) {
                  super.init(args);
                  String cacheSizeStr = args.get("cacheSize");
                  if (cacheSizeStr != null) {
                      cacheSize = Integer.parseInt(cacheSizeStr);
                  }
              }
              @Override
              public TokenStream create(TokenStream in) {
                  return new LStemFilter(in, cacheSize);
              }
          }
  52. Solr InterOp: Custom Stemmer
      And finally plug it into your analysis chain:
          <analyzer>
              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
              <filter class="solr.ASCIIFoldingFilterFactory"/>
              <filter class="solr.StopFilterFactory" ignoreCase="true"
                      words="solr/common/conf/stopwords.txt"/>
              <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
              <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
          </analyzer>
  53. Thanks!