Solr at Etsy - Giovanni Fernandez-Kincade

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.

  1. 1. Solr @ Etsy
  2. 2. Things I’m not going to talk about: A/B Testing, i18n, Continuous Deployment
  3. 3. About Us
  4. 4. 10+ Million Listings, 500 qps
  5. 5. Architecture Overview
  6. 6. Architecture Overview Thrift
  7. 7. Architecture Overview Thrift

        struct Listing {
          1: i64 listing_id
        }

        struct ListingResults {
          1: i64 count,
          2: list<Listing> listings
        }

        service Search {
          ListingResults search(1: string query)
        }

  8. 8. Architecture Overview Thrift

        Generated Java server code:

        public class Search {
          public interface Iface {
            public ListingResults search(String query) throws TException;
          }
          /* ... */
        }

        Generated PHP client code:

        class SearchClient implements SearchIf {
          /**...**/
          public function search($query) {
            $this->send_search($query);
            return $this->recv_search();
          }
        }
  9. 9. Architecture Overview Thrift
        Why use Thrift?
        • Service Encapsulation
        • Reduced Network Traffic
  10. 10. Architecture Overview Thrift
        Why only return IDs?
        • Index Size
        • Easy to scale PK lookups
  11. 11. The Search Server
  12. 12. Architecture Overview Search Server
        • Identical Code + Hardware
        • Roles/Behavior controlled by Env variables
        • Single Java Process
        • Solr running as a Jetty Servlet
        • Thrift Servers
        • Smoker
  13. 13. Architecture Overview Search Server
        Master-specific processes:
        • Incremental Indexer
        • External File Field Updaters
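
The role switch described on the two Search Server slides can be sketched roughly as follows. This is a minimal illustration, not Etsy's code: the variable name `SEARCH_ROLE`, its values, and the component names are all assumptions, since the slides only say that roles are controlled by environment variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ServerRole {
    // Hypothetical variable name; the slides do not say which variables are used.
    static final String ROLE_VAR = "SEARCH_ROLE";

    /** Returns the names of the components to start for the given environment. */
    static List<String> componentsFor(Map<String, String> env) {
        List<String> components = new ArrayList<>();
        // Every box runs the same base set, because code and hardware are identical.
        components.add("solr");
        components.add("thrift-servers");
        components.add("smoker");
        // Only a master additionally runs the indexing-side processes.
        if ("master".equalsIgnoreCase(env.getOrDefault(ROLE_VAR, "slave"))) {
            components.add("incremental-indexer");
            components.add("external-file-field-updaters");
        }
        return components;
    }

    public static void main(String[] args) {
        // Uses the real process environment; unset means "slave" in this sketch.
        System.out.println(componentsFor(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the decision testable without touching the real process environment.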
  14. 14. Load Balancing
  15. 15. Load Balancing Thrift TSocketPool
  18. 18. Load Balancing Server Affinity
  19. 19. Load Balancing Server Affinity Algorithm

        $serversNew = array();  // example result: ["host2", "host3", "host1", "host4"]
        $numServers = count($servers);
        while ($numServers > 0) {
          // Take the first 4 chars of the md5sum of the server count
          // and the query, mod the available servers
          $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % ($numServers);
          $keySet = array_keys($servers);
          $serverId = $keySet[$key];
          // Push the chosen server onto the new list and remove it
          // from the initial list
          array_push($serversNew, $servers[$serverId]);
          unset($servers[$serverId]);
          --$numServers;
        }
  20. 20. Load Balancing Server Affinity Algorithm

        $key = hexdec(substr(md5($query), 0, 4))

        "jewelry"  ["host2", "host3", "host1", "host4"]
        "scarf"    ["host2", "host3", "host1", "host4"]

  21. 21. Load Balancing Server Affinity Algorithm

        $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % (count($servers));

        "jewelry"  ["host2", "host3", "host1", "host4"]
        "scarf"    ["host2", "host1", "host4", "host3"]
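
For readers on the Java side, the PHP affinity loop from the slides can be ported roughly as below. This is a sketch under assumptions: the class and method names are made up for illustration, and the query is assumed to be hashed as UTF-8 bytes. The first four hex characters of an MD5 digest are its first two bytes, which is what `hashPrefix` computes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class ServerAffinity {
    /** First four hex chars of md5(input) as an int in [0, 65535]. */
    static int hashPrefix(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // Four hex characters correspond to the first two digest bytes.
            return ((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    /** Reorders the servers into a query-specific preference list. */
    static List<String> order(List<String> servers, String query) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>(servers.size());
        while (!remaining.isEmpty()) {
            // The remaining count is part of the hash input, so every
            // position in the list is chosen by a different hash value.
            int key = hashPrefix(remaining.size() + "+" + query) % remaining.size();
            ordered.add(remaining.remove(key));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("host1", "host2", "host3", "host4");
        System.out.println("jewelry -> " + order(hosts, "jewelry"));
        System.out.println("scarf   -> " + order(hosts, "scarf"));
    }
}
```

The same query always yields the same ordering, so repeated searches land on the same server first, and the rest of the list gives a stable fallback order if that server is unavailable.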
  22. 22. Load Balancing Server Affinity Results 2% 20%
  23. 23. Load Balancing Server Affinity Caveats
        • Stemming / Analysis
        • Be wary of query distribution
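
One plausible reading of the "Stemming / Analysis" caveat is that two query strings which analyze to the same terms will still hash to different servers unless they are normalized first. The sketch below shows only a cheap stand-in for that normalization (trimming, lowercasing, collapsing whitespace); it is an assumption about the intent, and it does not perform stemming. The class and method names are hypothetical.

```java
import java.util.Locale;

class AffinityKey {
    /** Cheap stand-in for analysis: trim, lowercase, collapse runs of whitespace. */
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Both spellings produce the same affinity key.
        System.out.println(normalize("  Silver   RINGS ")); // prints "silver rings"
        System.out.println(normalize("silver rings"));      // prints "silver rings"
    }
}
```

Hashing the normalized string instead of the raw one would send both spellings to the same server, at the cost of keeping this normalization roughly in step with the real analysis chain.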
  24. 24. Replication
  25. 25. Replication The Problem
  27. 27. Replication Multicast Rsync?
  28. 28. Replication Multicast Rsync?

        [15:25] <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network
        [15:26] <patrick> ok
        ....
        [15:31] <keyur> is the site down?
  30. 30. Hmm...Bit Torrent?
  31. 31. Replication Bit Torrent POC Using BitTornado:
  32. 32. Replication Bit Torrent + Solr
        Fork of ttorrent: https://github.com/etsy/ttorrent
        Multi-File Support
        Performance Enhancements
  33. 33. Replication Bit Torrent + Solr
  37. 37. Solr InterOp
  38. 38. QParsers
  39. 39. “writing query strings is for suckers”
  40. 40. Solr InterOp QParsers

        http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3

  41. 41. Solr InterOp QParsers

        http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
  42. 42. Solr InterOp QParsers

        class PersonNameRealQParser extends QParser {

          public PersonNameRealQParser(String qstr, SolrParams localParams,
                                       SolrParams params, SolrQueryRequest req) {
            super(qstr, localParams, params, req);
          }

  43. 43. Solr InterOp QParsers

          @Override
          public Query parse() throws ParseException {
            // Exact match on the full name gets the highest boost.
            TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
            exactFullNameQuery.setBoost(4.0f);

            // Split the user query on runs of whitespace.
            String[] userQueryTerms = qstr.split("\\s+");
            Query firstLastQuery = null;
            if (2 == userQueryTerms.length)
              firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
            else
              firstLastQuery = parseAsFirstOrLast(userQueryTerms);

            // Score is the best of the two sub-queries (tie breaker 0).
            DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
            realNameQuery.add(exactFullNameQuery);
            realNameQuery.add(firstLastQuery);
            return realNameQuery;
          }
        }
  44. 44. Solr InterOp QParsers

        The QParserPlugin that returns our new QParser:

        public class PersonNameRealQParserPlugin extends QParserPlugin {
          public static final String NAME = "personrealqp";

          @Override
          public void init(NamedList args) {}

          @Override
          public QParser createParser(String qstr, SolrParams localParams,
                                      SolrParams params, SolrQueryRequest req) {
            return new PersonNameRealQParser(qstr, localParams, params, req);
          }
        }

  45. 45. Solr InterOp QParsers

        Registering the plugin in solrconfig.xml:

        <queryParser name="personrealqp"
                     class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
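
The `parse()` method in the QParser slides picks its branch from the number of whitespace-separated terms in the query. A minimal, Lucene-free sketch of just that step follows, assuming the split pattern is meant to be the regular expression `\s+` (one or more whitespace characters). The class name and the returned labels are hypothetical and only name the branch that would run.

```java
class NameTerms {
    /** Splits a raw name query on runs of whitespace. */
    static String[] terms(String qstr) {
        return qstr.trim().split("\\s+");
    }

    /** Names the branch a parse() like the one in the slides would take. */
    static String branchFor(String qstr) {
        return terms(qstr).length == 2 ? "first-and-last" : "first-or-last";
    }

    public static void main(String[] args) {
        System.out.println(branchFor("giovanni fernandez-kincade")); // first-and-last
        System.out.println(branchFor("giovanni"));                   // first-or-last
    }
}
```

With a literal `"s+"` pattern the string would instead be split on the letter "s", so a two-word name without an "s" would never reach the first-and-last branch.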
  46. 46. Custom Stemmer
  47. 47. Solr InterOp Custom Stemmer
  48. 48. Solr InterOp Custom Stemmer banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  49. 49. Solr InterOp Custom Stemmer

        First we extend KStemmer and intercept stem calls:

        public class LStemmer extends KStemmer {
          /**.....**/

          @Override
          String stem(String term) {
            // An explicit override wins over the stock KStem result.
            String override = overrideStemTransformations.get(term);
            if (override != null) return override;
            return super.stem(term);
          }
        }

  50. 50. Solr InterOp Custom Stemmer

        Then create a TokenFilter that uses the new Stemmer:

        final class LStemFilter extends TokenFilter {
          /**.....**/

          protected LStemFilter(TokenStream input, int cacheSize) {
            super(input);
            stemmer = new LStemmer(cacheSize);
          }

          @Override
          public boolean incrementToken() throws IOException {
            /**....**/
          }
        }
  51. 51. Solr InterOp Custom Stemmer

        Create a FilterFactory that exposes it:

        public class LStemFilterFactory extends BaseTokenFilterFactory {
          private int cacheSize = 20000;

          @Override
          public void init(Map<String, String> args) {
            super.init(args);
            String cacheSizeStr = args.get("cacheSize");
            if (cacheSizeStr != null) {
              cacheSize = Integer.parseInt(cacheSizeStr);
            }
          }

          @Override
          public TokenStream create(TokenStream in) {
            return new LStemFilter(in, cacheSize);
          }
        }

  52. 52. Solr InterOp Custom Stemmer

        And finally plug it into your analysis chain:

        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true"
                  words="solr/common/conf/stopwords.txt"/>
          <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
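
The override lookup at the heart of `LStemmer` can be tried without Lucene on the classpath. The sketch below is only an illustration of the pattern: the base stemmer is a toy suffix stripper, not KStem, and the override entries are made-up examples, since the slides do not show which stems Etsy actually mapped the listed words to.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class OverrideStemmer {
    private final Map<String, String> overrides;
    private final UnaryOperator<String> base;

    OverrideStemmer(Map<String, String> overrides, UnaryOperator<String> base) {
        this.overrides = overrides;
        this.base = base;
    }

    /** Returns the override for a term if one exists, else the base stem. */
    String stem(String term) {
        String override = overrides.get(term);
        return override != null ? override : base.apply(term);
    }

    public static void main(String[] args) {
        // Toy base stemmer (NOT KStem): strips a trailing "ing" or "ed".
        UnaryOperator<String> toy = t -> t.replaceAll("(ing|ed)$", "");
        // Hypothetical identity overrides that keep two words unstemmed.
        OverrideStemmer stemmer = new OverrideStemmer(
                Map.of("canning", "canning", "seeded", "seeded"), toy);
        System.out.println(stemmer.stem("canning"));  // canning (override wins)
        System.out.println(stemmer.stem("painting")); // paint (toy stemmer applies)
    }
}
```

Keeping the overrides in a plain map means the exception list can be changed without touching the stemming algorithm itself.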
  53. 53. Thanks!
