Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Solr@
Things I’m not going to
     talk about:
     A/B Testing
        i18n
Continuos Deployment
About
 Us
10+ Million Listings
     500 qps
Architecture
 Overview
Architecture Overview
Thrift
Architecture Overview
Thrift
      struct Listing {
         1: i64 listing_id
     }

     struct ListingResults {
      ...
Architecture Overview
Thrift
Generated Java server code:
 public class Search {

   public interface Iface {

     public ...
Architecture Overview
Thrift
Why use Thrift?
    • Service Encapsulation
    • Reduced Network Traffic
Architecture Overview
Thrift
Why only return IDs?
    • Index Size
    • Easy to scale PK lookups
The Search Server
Architecture Overview
Search Server


 • Identical Code + Hardware
 • Roles/Behavior controlled by Env variables
 • Single...
Architecture Overview
Search Server




Master-specific processes:
 • Incremental Indexer
 • External File Field Updaters
Load Balancing
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Server Affinity
Load Balancing
    Server Affinity Algorithm
$serversNew = array();
                                              [“host2”, ...
Load Balancing
Server Affinity Algorithm
              $key = hexdec(substr(md5($query),0,4))




  “jewelry”               ...
Load Balancing
Server Affinity Algorithm
      $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers));...
Load Balancing
Server Affinity Results


      2%        20%
Load Balancing
Server Affinity Caveats

     • Stemming / Analysis
     • Be wary of query distribution
Replication
Replication
The Problem
Replication
The Problem
Replication
Multicast Rsync?
Replication
Multicast Rsync?
[15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes
from host1 to host2 a...
Replication
Multicast Rsync?
Hmm...Bit Torrent?
Replication
Bit Torrent POC
Using BitTornado:
Replication
Bit Torrent + Solr
Fork of TTorent: https://github.com/etsy/ttorrent

                            Multi-File S...
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Solr InterOp
QParsers
“writing query strings
   is for suckers”
Solr InterOp
QParsers

  http://host:8393/solr/person/select/?q=_query_:%22{!dismax
  %20qf=$fqf%20v=$fnq}%22%20OR%20(_que...
Solr InterOp
QParsers

   http://host:8393/solr/person/select/?q={!personrealqp}giovanni
  %20fernandez-kincade
Solr InterOp
QParsers

 class PersonNameRealQParser extends QParser {
   public PersonNameRealQParser(String qstr, SolrPar...
Solr InterOp
   QParsers
  @Override
  public Query parse() throws ParseException {
    TermQuery exactFullNameQuery = new...
Solr InterOp
QParsers
The QParserPlugin that returns our new QParser:
  public class PersonNameRealQParserPlugin extends Q...
Solr InterOp
QParsers

Registering the plugin in solrconfig.xml:

   <queryParser name="personrealqp"
      class="com.etsy...
Custom Stemmer
Solr InterOp
Custom Stemmer
Solr InterOp
Custom Stemmer

 banded, banding, birding, bouldering, bounded, buffing, bundler, canning,
carded, circled, cou...
Solr InterOp
Custom Stemmer
First we extend KStemmer and intercept stem calls:
  public class LStemmer extends KStemmer {
...
Solr InterOp
 Custom Stemmer
Then create a TokenFilter that uses the new Stemmer:
 final class LStemFilter extends TokenFi...
Solr InterOp
Custom Stemmer
Create a FilterFactory that exposes it:
       public class LStemFilterFactory extends BaseTok...
Solr InterOp
Custom Stemmer
And finally plug it into your analysis chain:

 <analyzer>
    <tokenizer class="solr.Whitespac...
Thanks!
Solr at Etsy - Giovanni Fernandez-Kincade
Solr at Etsy - Giovanni Fernandez-Kincade
Solr at Etsy - Giovanni Fernandez-Kincade
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
Antresol mobile app
Next
Upcoming SlideShare
Antresol mobile app
Next
Download to read offline and view in fullscreen.

6

Share

Solr at Etsy - Giovanni Fernandez-Kincade

Download to read offline

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

Solr at Etsy - Giovanni Fernandez-Kincade

  1. 1. Solr@
  2. 2. Things I’m not going to talk about: A/B Testing i18n Continuos Deployment
  3. 3. About Us
  4. 4. 10+ Million Listings 500 qps
  5. 5. Architecture Overview
  6. 6. Architecture Overview Thrift
  7. 7. Architecture Overview Thrift struct Listing { 1: i64 listing_id } struct ListingResults { 1: i64 count, 2: list<Listing> listings } service Search { ListingResults search(1:string query) }
  8. 8. Architecture Overview Thrift Generated Java server code: public class Search { public interface Iface { public ListingResults search(String query) throws TException; } Generated PHP client code: class SearchClient implements SearchIf { /**...**/ public function search($query) { $this->send_search($query); return $this->recv_search(); }
  9. 9. Architecture Overview Thrift Why use Thrift? • Service Encapsulation • Reduced Network Traffic
  10. 10. Architecture Overview Thrift Why only return IDs? • Index Size • Easy to scale PK lookups
  11. 11. The Search Server
  12. 12. Architecture Overview Search Server • Identical Code + Hardware • Roles/Behavior controlled by Env variables • Single Java Process • Solr running as a Jetty Servlet • Thrift Servers • Smoker
  13. 13. Architecture Overview Search Server Master-specific processes: • Incremental Indexer • External File Field Updaters
  14. 14. Load Balancing
  15. 15. Load Balancing Thrift TSocketPool
  16. 16. Load Balancing Thrift TSocketPool
  17. 17. Load Balancing Thrift TSocketPool
  18. 18. Load Balancing Server Affinity
  19. 19. Load Balancing Server Affinity Algorithm $serversNew = array(); [“host2”, “host3”, “host1”, “host4”] $numServers = count($servers); while($numServers > 0) { // Take the first 4 chars of the md5sum of the server count // and the query, mod the available servers $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%($numServers); $keySet = array_keys($servers); $serverId = $keySet[$key]; // Push the chosen server onto the new list and remove it // from the initial list array_push($serversNew, $servers[$serverId]); unset($servers[$serverId]); --$numServers; }
  20. 20. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($query),0,4)) “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host3”, “host1”, “host4”]
  21. 21. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers)); “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host1”, “host4”, “host3”]
  22. 22. Load Balancing Server Affinity Results 2% 20%
  23. 23. Load Balancing Server Affinity Caveats • Stemming / Analysis • Be wary of query distribution
  24. 24. Replication
  25. 25. Replication The Problem
  26. 26. Replication The Problem
  27. 27. Replication Multicast Rsync?
  28. 28. Replication Multicast Rsync? [15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network [15:26]  <patrick> ok .... [15:31]  <keyur> is the site down?
  29. 29. Replication Multicast Rsync?
  30. 30. Hmm...Bit Torrent?
  31. 31. Replication Bit Torrent POC Using BitTornado:
  32. 32. Replication Bit Torrent + Solr Fork of TTorent: https://github.com/etsy/ttorrent Multi-File Support Performance Enhancements
  33. 33. Replication Bit Torrent + Solr
  34. 34. Replication Bit Torrent + Solr
  35. 35. Replication Bit Torrent + Solr
  36. 36. Replication Bit Torrent + Solr
  37. 37. Solr InterOp
  38. 38. QParsers
  39. 39. “writing query strings is for suckers”
  40. 40. Solr InterOp QParsers http://host:8393/solr/person/select/?q=_query_:%22{!dismax %20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf %20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq} %22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq= %22giovanni%20fernandez-kincade %22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_s yn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez- kincade&lqf=last_name^3
  41. 41. Solr InterOp QParsers http://host:8393/solr/person/select/?q={!personrealqp}giovanni %20fernandez-kincade
  42. 42. Solr InterOp QParsers class PersonNameRealQParser extends QParser {    public PersonNameRealQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      super(qstr, localParams, params, req);    }
  43. 43. Solr InterOp QParsers @Override   public Query parse() throws ParseException { TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));     exactFullNameQuery.setBoost(4.0f);     String[] userQueryTerms = qstr.split("s+");     Query firstLastQuery = null;     if (2 == userQueryTerms.length)       firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);     else       firstLastQuery = parseAsFirstOrLast(userQueryTerms);     DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);     realNameQuery.add(exactFullNameQuery);     realNameQuery.add(firstLastQuery);     return realNameQuery;   }
  44. 44. Solr InterOp QParsers The QParserPlugin that returns our new QParser: public class PersonNameRealQParserPlugin extends QParserPlugin {    public static final String NAME = "personrealqp";    @Override    public void init(NamedList args) {}    @Override    public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      return new PersonNameRealQParser(qstr, localParams, params, req);    } }
  45. 45. Solr InterOp QParsers Registering the plugin in solrconfig.xml: <queryParser name="personrealqp" class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
  46. 46. Custom Stemmer
  47. 47. Solr InterOp Custom Stemmer
  48. 48. Solr InterOp Custom Stemmer banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  49. 49. Solr InterOp Custom Stemmer First we extend KStemmer and intercept stem calls: public class LStemmer extends KStemmer { /**.....**/      @Override      String stem(String term) {          String override = overrideStemTransformations.get(term);          if(override != null) return override;          return super.stem(term);      } }
  50. 50. Solr InterOp Custom Stemmer Then create a TokenFilter that uses the new Stemmer: final class LStemFilter extends TokenFilter { /**.....**/         protected LStemFilter(TokenStream input, int cacheSize) { super(input); stemmer = new LStemmer(cacheSize); }          @Override public boolean incrementToken() throws IOException { /**....**/ }
  51. 51. Solr InterOp Custom Stemmer Create a FilterFactory that exposes it: public class LStemFilterFactory extends BaseTokenFilterFactory { private int cacheSize = 20000;      @Override public void init(Map<String, String> args) { super.init(args);      String cacheSizeStr = args.get("cacheSize");      if (cacheSizeStr != null) {       cacheSize = Integer.parseInt(cacheSizeStr);      }    }      @Override    public TokenStream create(TokenStream in) {     return new LStemFilter(in, cacheSize);    } }
  52. 52. Solr InterOp Custom Stemmer And finally plug it into your analysis chain: <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ASCIIFoldingFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="solr/common/conf/stopwords.txt"/> <filter class="com.etsy.solr.analysis.LStemFilterFactory" /> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer>
  53. 53. Thanks!
  • PhillipRessler

    Nov. 14, 2015
  • cstrep

    Jan. 16, 2015
  • gavinzhm

    Dec. 21, 2012
  • fleeting001

    Mar. 14, 2012
  • hypermin

    Feb. 20, 2012
  • fujohnwang

    Feb. 1, 2012

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011 Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items and people trying to find them over 10 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges.

Views

Total views

3,028

On Slideshare

0

From embeds

0

Number of embeds

54

Actions

Downloads

25

Shares

0

Comments

0

Likes

6

×