Data at Tumblr

                            Adam Laiacano
                        NYC Data Science Meetup

                           @adamlaiacano
                        adamlaiacano.tumblr.com

Monday, April 8, 13
What I Needed to Learn
                      When I Started My Job




About Me


                           Electrical Engineering background
                      Worked at CBS to learn more about stats / data

                              Joined Tumblr in August 2011
                              40th employee, now over 160




About Tumblr
                      blogging platform / social network
                             100,000,000 blogs!

                              unique signals:
                       asynchronous following graph
                           reblogs, likes, replies




About You

                Country   Month   Value
                  USA     March   10000
                  USA     April   12000
                  USA      May    14000
                Canada    March    7000
                Canada    April    6500
                Canada     May     5000
                 France   March    1200
                 France   April    1400
                 France    May     2000

                                      Pivot Table!

                Country   March   April    May
                  USA     10000   12000   14000
                Canada     7000    6500    5000
                France     1200    1400    2000

         In R (reshape package), each direction is one function call:

         pivoted <- cast(melted, country~month)
         melted  <- melt.data.frame(pivoted, id.vars='country')

                                     Who Cares?
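The same reshape works in pandas; a minimal sketch, not from the slides (column names are assumed to match the table above):

```python
import pandas as pd

# The long-format data from the slide
long_df = pd.DataFrame({
    'country': ['USA'] * 3 + ['Canada'] * 3 + ['France'] * 3,
    'month':   ['March', 'April', 'May'] * 3,
    'value':   [10000, 12000, 14000, 7000, 6500, 5000, 1200, 1400, 2000],
})

# Long -> wide ("cast" / pivot table)
wide = long_df.pivot(index='country', columns='month', values='value')

# Wide -> long ("melt"), recovering one row per (country, month)
back = wide.reset_index().melt(id_vars='country', value_name='value')
```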
One more question:




Hadoop




What tools we use

                      What we do with those tools




Plumbing




                             John D. Cook "The plumber programmer"
                             November 2011 http://bit.ly/XfcXrt

Pipes

                      1. Record events / actions
                      2. Store / archive everything
                      3. Extract information
                        a. Reports / BI
                        b. Back to Tumblr application




Step 1: Log Events
                      GiantOctopus: in-house event logging system.

                 Built-in variables:
                 • timestamp
                 • referring page
                 • user identifier
                 • action identifier
                 • location (city)
                 • language setting

                     GiantOctopus::log(
                         'posts',
                         array(
                             'send_to_fb'      => 1,
                             'send_to_twitter' => 0,
                         )
                     );
Scribe
                      Web Servers  →  Scribe Servers  →  HDFS

          Web servers write continuously to the Scribe servers;
          a daily cron job copies the collected logs into HDFS.
Step 2: Store in Hadoop
                              One huge computer:
                               300TB hard drive
                                 7.8TB of RAM
                       85 x 2 = 170 hex-core processors


                               One huge PITA:
                         awful docs (search-hadoop.com helps)
                               java everywhere
                           fragmented community
Hadoop

                         hive

                         pig

                      map/reduce




Hive

           "Basically SQL"

           Compiles to Java map/reduce

           About 100 hive tables

           Each "table" is really a directory of flat files

           Example — 10 most liked posts:

               SELECT
                   root_post_id,
                   count(*) AS likes
               FROM posts
               WHERE action = 'like'
               GROUP BY root_post_id
               ORDER BY likes DESC
               LIMIT 10;
Hive Partitions
                        File location in HDFS         Hive partition value
                      /posts/2013/03/26/*.lzo          dt='2013-03-26'
                      /posts/2013/03/27/*.lzo          dt='2013-03-27'
                      /posts/2013/03/28/*.lzo          dt='2013-03-28'


        Filtering on the partition column prunes the scan (204 mappers):

            SELECT action, COUNT(*) AS views
                FROM pageviews
                WHERE dt = "2012-03-05"
                GROUP BY action

        Filtering on a raw timestamp scans everything (22,895 mappers):

            SELECT action, COUNT(*) AS views
                FROM pageviews
                WHERE ts > 1330927200
                    AND ts < 1331013600
                GROUP BY action
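Epoch bounds for a timestamp filter like the one above can be derived from a calendar date; a minimal sketch assuming UTC day boundaries (the slide's own constants may have been computed in a different timezone):

```python
import calendar
import datetime

def day_bounds(d):
    """Return [start, end) epoch seconds covering calendar date d, in UTC."""
    start = calendar.timegm(d.timetuple())
    return start, start + 86400

lo, hi = day_bounds(datetime.date(2012, 3, 5))
print(lo, hi)  # 1330905600 1330992000
```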

Extending Hive: Streaming
                      • Add all .py files you’ll need to the query
                      • Sends each record to the Python script via stdin
                      • Can be used as a subquery in a “normal” Hive query

             add file helpers.py;

             FROM users
             SELECT
               TRANSFORM(id, email)
               USING 'helpers.py'
               AS (id_with_gmail);

                                          #!/usr/bin/python
                                          ## helpers.py
                                          import sys, re

                                          gmail = re.compile(r'.+@gmail.com')

                                          for row in sys.stdin:
                                              id, email = row.split('\t')
                                              if gmail.match(email):
                                                  print id
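The streaming filter can be sanity-checked locally before submitting the query; a sketch (Python 3 here, and `filter_gmail` is a hypothetical wrapper that mirrors the slide's helpers.py logic):

```python
import io
import re

gmail = re.compile(r'.+@gmail\.com')

def filter_gmail(stream):
    """Return ids whose email matches the gmail pattern, one per input row."""
    matched = []
    for row in stream:
        id_, email = row.rstrip('\n').split('\t')
        if gmail.match(email):
            matched.append(id_)
    return matched

# Simulate Hive piping tab-separated records to stdin
rows = io.StringIO("1\ta@gmail.com\n2\tb@yahoo.com\n")
print(filter_gmail(rows))  # ['1']
```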



Pig

            "Basically SQL" if you had to explain it piece by piece.

            "DataBag" == "DataFrame"

                posts = LOAD 'posts.tsv' AS (
                    root_post_id:int,
                    action:chararray
                );

                likes = FILTER posts BY action == 'like';
                grouped = GROUP likes BY root_post_id;

                counted = FOREACH grouped GENERATE
                    group AS root_post_id,
                    COUNT(likes.root_post_id) AS likes;

                sorted = ORDER counted BY likes DESC;

                top10 = LIMIT sorted 10;

                STORE top10 INTO 'top10.csv';
Extending Pig: Python UDFs
                                Extract word prefixes for type-ahead tag search.
                                The @outputSchema decorator tells Pig the type of
                                each value the UDF returns:

                                @outputSchema("t:(prefix:chararray)")
                                def prefixes(input, max_len=3):
                                    nchar = min(len(input), max_len) + 1
                                    return [input[:i] for i in range(1,nchar)]


                                 >>> prefixes('museum', max_len=6)
                                 ['m', 'mu', 'mus', 'muse', 'museu', 'museum']
Extending Pig: Java UDFs
                package com.tumblr.swine;

                import java.util.ArrayList;
                import java.util.List;

                public class Prefixes {

                      private int maxTermLen;

                      public Prefixes() {
                          this.maxTermLen = Integer.MAX_VALUE;
                      }

                      public Prefixes(int maxTermLen) {
                          this.maxTermLen = maxTermLen;
                      }

                      public List<String> get(String s) {
                          int size = s.length() < maxTermLen ? s.length() : maxTermLen;
                          ArrayList<String> results = new ArrayList<String>();
                          for (int i=1; i < size + 1; i++) {
                              results.add(s.substring(0,i));
                          }
                          return results;
                      }
                }




Extending Pig: Java UDFs

The Pig wrapper around the Prefixes class above:

    package com.tumblr.swine.pig;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.FuncSpec;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.DataType;
    import org.apache.pig.data.DefaultBagFactory;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;
    import org.apache.pig.impl.logicalLayer.FrontendException;
    import org.apache.pig.impl.logicalLayer.schema.Schema;

    public class Prefixes extends EvalFunc<DataBag> {

        public DataBag exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0)
                return null;
            try {
                DataBag output = DefaultBagFactory.getInstance().newDefaultBag();
                String word = (String) input.get(0);
                int max = Integer.MAX_VALUE;
                if (input.size() == 2) {
                    max = (Integer) input.get(1);
                }
                com.tumblr.swine.Prefixes prefixes = new com.tumblr.swine.Prefixes(max);
                for (String prefix : prefixes.get(word)) {
                    Tuple t = TupleFactory.getInstance().newTuple(1);
                    t.set(0, prefix);
                    output.add(t);
                }
                return output;
            } catch (Exception e) {
                System.err.println("Prefixes: failed to process input; error - " + e.getMessage());
                return null;
            }
        }

        @Override
        public Schema outputSchema(Schema input) {
            Schema bagSchema = new Schema();
            bagSchema.add(new Schema.FieldSchema("prefix", DataType.CHARARRAY));
            try {
                return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    bagSchema, DataType.BAG));
            } catch (FrontendException e) {
                return null;
            }
        }

        @Override
        public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
            List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>(2);
            Schema s = new Schema();
            s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
            funcSpecs.add(new FuncSpec(this.getClass().getName(), s));
            // Allow specifying optional max length of prefix
            s = new Schema();
            s.add(new Schema.FieldSchema(null, DataType.CHARARRAY));
            s.add(new Schema.FieldSchema(null, DataType.INTEGER));
            funcSpecs.add(new FuncSpec(this.getClass().getName(), s));
            return funcSpecs;
        }
    }
HUE


       Keeps query history

       Preview tables / results

       Save queries & templates




What tools we use

                      What we do with those tools




Spam


                      Classic example of supervised learning

                      Don't get too clever

                      Build good tooling!




Spam: Vowpal Wabbit
                 Online (continuously learning) system

                 Updates parameters with every new piece of information

                 Parallelizable, can run as service, very fast.

                 Loss functions:
                 •squared
                 •logistic
                 •hinge
                 •quantile
Spam: Vowpal Wabbit
                      Post:
                          blog:           'adamlaiacano',
                          tags:           ['free ipad', 'warez'],
                          location:       'US~NY-New York',
                          is_suspended:   0 or 1


                      Model:   is_suspended ~ free_ipad + warez + US~NY-New_York + .....




                      Squared loss function
                      Very high dimension: L1 regularization to avoid overfitting
                      Great precision, decent recall
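Records like the one above reach VW as plain-text lines. A hypothetical encoder sketch — the `<label> | features` layout is VW's standard input format, but `to_vw_line` and the field names are illustrative, not Tumblr's actual schema:

```python
def to_vw_line(post):
    """Encode a post dict as a Vowpal Wabbit line: '<label> | f1 f2 ...'."""
    # VW uses +1/-1 labels for binary classification
    label = 1 if post['is_suspended'] else -1
    # Feature names must not contain spaces
    feats = [t.replace(' ', '_') for t in post['tags']]
    feats.append(post['location'].replace(' ', '_'))
    return '%d | %s' % (label, ' '.join(feats))

post = {
    'blog': 'adamlaiacano',
    'tags': ['free ipad', 'warez'],
    'location': 'US~NY-New York',
    'is_suspended': 1,
}
print(to_vw_line(post))  # 1 | free_ipad warez US~NY-New_York
```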


Type-Ahead Search

                      Most popular tags for any letter combination

                      Store daily results in distributed Redis cluster

                 m:            [me, model, mine]
                 mu:           [muscle, muscles, music video]
                 mus:          [muscle, muscles, music video]
                 muse:         [muse, museum, nine muses]
                 museu:        [museum, metropolitan museum of art,
                                natural history museum]

Type-Ahead Search

                      Only keep popular prefixes: tag must occur 10 times

                      Only update keys that have changed.


                 - muse:        [muse, museum, nine muses]
                 + muse:        [muse, museum, arizona muse]
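Writing only changed keys amounts to a dict diff between two daily runs; a sketch with a hypothetical `changed_keys` helper (the Redis write itself is omitted):

```python
def changed_keys(old, new):
    """Return the prefix -> tag-list entries that differ from the previous run."""
    return {k: v for k, v in new.items() if old.get(k) != v}

old = {'muse':  ['muse', 'museum', 'nine muses'],
       'museu': ['museum', 'metropolitan museum of art']}
new = {'muse':  ['muse', 'museum', 'arizona muse'],
       'museu': ['museum', 'metropolitan museum of art']}

# Only the 'muse' key needs a Redis update
print(changed_keys(old, new))  # {'muse': ['muse', 'museum', 'arizona muse']}
```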



Questions?



                             @adamlaiacano

                      http://adamlaiacano.tumblr.com




