Solr at Etsy - Giovanni Fernandez-Kincade



See conference video - ...


Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.




Presentation Transcript

  • Solr @ Etsy
  • Things I’m not going to talk about: A/B Testing, i18n, Continuous Deployment
  • About Us
  • 10+ Million Listings, 500 qps
  • Architecture Overview
  • Architecture Overview: Thrift
  • Architecture Overview: Thrift
        struct Listing {
          1: i64 listing_id
        }
        struct ListingResults {
          1: i64 count,
          2: list<Listing> listings
        }
        service Search {
          ListingResults search(1:string query)
        }
  • Architecture Overview: Thrift
    Generated Java server code:
        public class Search {
          public interface Iface {
            public ListingResults search(String query) throws TException;
          }
          // rest of the generated class omitted
        }
    Generated PHP client code:
        class SearchClient implements SearchIf {
          /**...**/
          public function search($query) {
            $this->send_search($query);
            return $this->recv_search();
          }
        }
  • Architecture Overview: Thrift
    Why use Thrift?
      • Service Encapsulation
      • Reduced Network Traffic
  • Architecture Overview: Thrift
    Why only return IDs?
      • Index Size
      • Easy to scale PK lookups
  • The Search Server
  • Architecture Overview: Search Server
      • Identical Code + Hardware
      • Roles/Behavior controlled by Env variables
      • Single Java Process
      • Solr running as a Jetty Servlet
      • Thrift Servers
      • Smoker
  • Architecture Overview: Search Server
    Master-specific processes:
      • Incremental Indexer
      • External File Field Updaters
  • Load Balancing
  • Load Balancing: Thrift TSocketPool
  • Load Balancing: Server Affinity
  • Load Balancing: Server Affinity Algorithm
        $serversNew = array();  // e.g. ["host2", "host3", "host1", "host4"] once the loop finishes
        $numServers = count($servers);
        while ($numServers > 0) {
            // Take the first 4 chars of the md5sum of the server count
            // and the query, mod the available servers
            $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % ($numServers);
            $keySet = array_keys($servers);
            $serverId = $keySet[$key];
            // Push the chosen server onto the new list and remove it
            // from the initial list
            array_push($serversNew, $servers[$serverId]);
            unset($servers[$serverId]);
            --$numServers;
        }
  • Load Balancing: Server Affinity Algorithm
        $key = hexdec(substr(md5($query), 0, 4))
    “jewelry”  [“host2”, “host3”, “host1”, “host4”]
    “scarf”    [“host2”, “host3”, “host1”, “host4”]
  • Load Balancing: Server Affinity Algorithm
        $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % (count($servers));
    “jewelry”  [“host2”, “host3”, “host1”, “host4”]
    “scarf”    [“host2”, “host1”, “host4”, “host3”]
  • Load Balancing: Server Affinity Results
    2%   20%
  • Load Balancing: Server Affinity Caveats
      • Stemming / Analysis
      • Be wary of query distribution
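
The PHP ordering loop from the Server Affinity Algorithm slides can be sketched in Java, the language of the search server itself. This is a rough port for illustration, not Etsy's code: the class and method names (`ServerAffinity`, `order`, `first16BitsOfMd5`) are made up, and it assumes the hashed string is the remaining server count, a literal `+`, and the query.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Java port of the PHP server-affinity ordering (names are illustrative).
public class ServerAffinity {

    // Equivalent of PHP's hexdec(substr(md5($s), 0, 4)): the first two
    // digest bytes read as an unsigned 16-bit number.
    static int first16BitsOfMd5(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 8) | (d[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the servers in the order they should be tried for this query.
    // The input list is copied, so the caller's list is left untouched.
    static List<String> order(String query, List<String> servers) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            // Hash the remaining server count plus the query, mod the remaining servers
            int key = first16BitsOfMd5(n + "+" + query) % n;
            // Move the chosen server from the remaining list to the ordered list
            ordered.add(remaining.remove(key));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2", "host3", "host4");
        System.out.println("jewelry -> " + order("jewelry", hosts));
        System.out.println("scarf   -> " + order("scarf", hosts));
    }
}
```

The same query always yields the same ordering, so repeated searches land on the same server first and can reuse its caches. One possible reading of the Stemming / Analysis caveat is that queries differing only in case or word form hash differently unless they are normalized before hashing; that normalization step is not shown here.
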
  • Replication
  • Replication: The Problem
  • Replication: Multicast Rsync?
    [15:25] <engineer> patrick: im gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. Ill be watching the graphs and what not, but let me know if you see anything funky with the network
    [15:26] <patrick> ok....
    [15:31] <keyur> is the site down?
  • Hmm... BitTorrent?
  • Replication: BitTorrent POC
    Using BitTornado:
  • Replication: BitTorrent + Solr
    Fork of Ttorrent:
      • Multi-File Support
      • Performance Enhancements
  • Solr InterOp
  • QParsers
  • “writing query strings is for suckers”
  • Solr InterOp: QParsers
    http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3
  • Solr InterOp: QParsers
    http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
  • Solr InterOp: QParsers
        class PersonNameRealQParser extends QParser {
          public PersonNameRealQParser(String qstr, SolrParams localParams,
                                       SolrParams params, SolrQueryRequest req) {
            super(qstr, localParams, params, req);
          }
          @Override
          public Query parse() throws ParseException {
            // An exact match on the full name gets the highest boost
            TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
            exactFullNameQuery.setBoost(4.0f);
            // Split the user query on whitespace
            String[] userQueryTerms = qstr.split("\\s+");
            Query firstLastQuery = null;
            if (2 == userQueryTerms.length)
              firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
            else
              firstLastQuery = parseAsFirstOrLast(userQueryTerms);
            DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
            realNameQuery.add(exactFullNameQuery);
            realNameQuery.add(firstLastQuery);
            return realNameQuery;
          }
          // parseAsFirstAndLast and parseAsFirstOrLast are not shown on the slides
        }
  • Solr InterOp: QParsers
    The QParserPlugin that returns our new QParser:
        public class PersonNameRealQParserPlugin extends QParserPlugin {
          public static final String NAME = "personrealqp";
          @Override
          public void init(NamedList args) {}
          @Override
          public QParser createParser(String qstr, SolrParams localParams,
                                      SolrParams params, SolrQueryRequest req) {
            return new PersonNameRealQParser(qstr, localParams, params, req);
          }
        }
  • Solr InterOp: QParsers
    Registering the plugin in solrconfig.xml:
        <queryParser name="personrealqp"
                     class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
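
To make the whitespace split and the two-term branch in `parse()` concrete, here is a dependency-free sketch that builds a query string instead of Lucene `Query` objects. It is only an approximation of the idea: the bodies of `parseAsFirstAndLast` and `parseAsFirstOrLast` are guesses (the slides do not show them), the class name `PersonNameQuerySketch` is invented, the field names are borrowed from the long URL earlier in the deck, and a plain `OR` stands in for the `DisjunctionMaxQuery`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, string-based stand-in for the custom QParser's branching logic.
public class PersonNameQuerySketch {

    // Assumption: exactly two terms are treated as "first last".
    static String parseAsFirstAndLast(String first, String last) {
        return "(first_name:" + first + " AND last_name:" + last + ")";
    }

    // Assumption: any other number of terms may match either name field.
    static String parseAsFirstOrLast(String[] terms) {
        List<String> clauses = new ArrayList<>();
        for (String t : terms) {
            clauses.add("first_name:" + t);
            clauses.add("last_name:" + t);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }

    static String parse(String qstr) {
        // Exact full-name match carries the highest boost, as on the slide
        String exactFullName = "full_name:\"" + qstr + "\"^4";
        // "\\s+" splits on runs of whitespace; a bare "s+" would split on the letter s
        String[] terms = qstr.trim().split("\\s+");
        String firstLast = (terms.length == 2)
                ? parseAsFirstAndLast(terms[0], terms[1])
                : parseAsFirstOrLast(terms);
        return exactFullName + " OR " + firstLast;
    }

    public static void main(String[] args) {
        System.out.println(parse("giovanni fernandez-kincade"));
        System.out.println(parse("giovanni"));
    }
}
```

The point of the custom parser is visible even in this toy form: the client sends only the raw name, and the field choices, boosts, and branching live in one place on the server.
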
  • Custom Stemmer
  • Solr InterOp: Custom Stemmer
  • Solr InterOp: Custom Stemmer
    banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  • Solr InterOp: Custom Stemmer
    First we extend KStemmer and intercept stem calls:
        public class LStemmer extends KStemmer {
          /**.....**/
          @Override
          String stem(String term) {
            String override = overrideStemTransformations.get(term);
            if (override != null) return override;
            return super.stem(term);
          }
        }
  • Solr InterOp: Custom Stemmer
    Then create a TokenFilter that uses the new Stemmer:
        final class LStemFilter extends TokenFilter {
          /**.....**/
          protected LStemFilter(TokenStream input, int cacheSize) {
            super(input);
            stemmer = new LStemmer(cacheSize);
          }
          @Override
          public boolean incrementToken() throws IOException { /**....**/ }
        }
  • Solr InterOp: Custom Stemmer
    Create a FilterFactory that exposes it:
        public class LStemFilterFactory extends BaseTokenFilterFactory {
          private int cacheSize = 20000;
          @Override
          public void init(Map<String, String> args) {
            super.init(args);
            String cacheSizeStr = args.get("cacheSize");
            if (cacheSizeStr != null) {
              cacheSize = Integer.parseInt(cacheSizeStr);
            }
          }
          @Override
          public TokenStream create(TokenStream in) {
            return new LStemFilter(in, cacheSize);
          }
        }
  • Solr InterOp: Custom Stemmer
    And finally plug it into your analysis chain:
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true"
                  words="solr/common/conf/stopwords.txt"/>
          <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
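
The override-before-delegate pattern in `LStemmer` can be tried without Lucene on the classpath. The sketch below is a simplification under stated assumptions: `NaiveStemmer` is a deliberately crude stand-in for KStemmer, `OverridingStemmer` and `OverrideStemmerSketch` are invented names, and the override table simply maps each protected term to itself. The slides do not say what the real table maps those words to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Lucene-free illustration of intercepting stem() calls.
public class OverrideStemmerSketch {

    // Stand-in for KStemmer: a deliberately naive suffix stripper.
    static class NaiveStemmer {
        String stem(String term) {
            if (term.endsWith("ing") && term.length() > 5) {
                return term.substring(0, term.length() - 3);
            }
            if (term.endsWith("ed") && term.length() > 4) {
                return term.substring(0, term.length() - 2);
            }
            return term;
        }
    }

    // Mirrors the LStemmer idea: consult an override table before delegating.
    static class OverridingStemmer extends NaiveStemmer {
        private final Map<String, String> overrides = new HashMap<>();

        // Assumption: each listed term is protected by mapping it to itself.
        OverridingStemmer(String... protectedTerms) {
            for (String t : protectedTerms) {
                overrides.put(t, t);
            }
        }

        @Override
        String stem(String term) {
            String override = overrides.get(term);
            if (override != null) return override;
            return super.stem(term);
        }
    }

    public static void main(String[] args) {
        NaiveStemmer plain = new NaiveStemmer();
        NaiveStemmer custom = new OverridingStemmer("threaded", "staining");
        System.out.println(plain.stem("threaded"));   // thread
        System.out.println(custom.stem("threaded"));  // threaded
        System.out.println(custom.stem("painted"));   // paint
    }
}
```

Terms in the table bypass the base stemmer entirely, while everything else still goes through it, so the table only needs to list the exceptions.
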
  • Thanks!