Flexible Indexing in Hadoop
Dmitriy Ryaboy @squarecog
Analytics Infrastructure @ Twitter
Hadoop Summit, San Jose, CA, June 2012
@JoinTheFlock | Hadoop Summit, June 14 2012   2
Hadoop is great at plowing
through data


       Image source: http://en.wikipedia.org/wiki/File:Snowplow_in_the_morning.jpg
And we do plow
Tens of thousands of jobs per day

100 TB (uncompressed) ingested daily

Many users and diverse use cases




Looking for needles in
haystacks.




With snowplows.

        Image Source: http://en.wikipedia.org/wiki/File:July_1903_-_on_the_Gaisberg,_nr_Salzburg.JPG
A Pig Script
 event_logs = load '/logs/lots_of_data'
                     using ThriftPigLoader('thrift.gen.LogEvent');
 filtered_logs = filter event_logs by event == 'something_rare';


 -- Then do stuff.




90% of the mappers in this job output no data.
We can do better...


Find smaller haystacks.




Use subpartitions!
• tablename/year/month/day/hour/bucket
• Only so many things you can partition by
• Up-front planning required
• Rewrite or duplicate for different query patterns




Keep the data sorted!
• Painful to maintain
• Only one sort order at a time
• Rewrite or duplicate for different query patterns




Trojan Layouts*
• Identify interesting column groupings
• Use different column groupings per HDFS block replica
• Requires changes to NN
• ... and increases load on NN




                             * http://infosys.uni-saarland.de/publications/JQD11.pdf
HBase!
• Good solution in many cases!
• Maintenance overhead
• All data must live in HBase
• Full table scans slower than MR
• Again with the up-front design
  • Secondary Indexes can help




Hive!
• That kind of works, actually.




Hive
Generic Interface for defining indexing behavior.


Reference implementation: “compact” index
 value -> list of HDFS blocks; drop unneeded blocks.


Other index types are available (bitmap indexes arrived in Hive 0.8)


It’ll even update indexes as you add partitions.




WIN!
Done, Right?




Hive
Good news if your data is in Hive!


Bad news if your world is a little bigger.


Indexing is tightly coupled to Hive.


No interoperability with the rest of the Hadoop stack.




Democracy of Tools
• Pig
• Raw Map-Reduce
• Cascading DSLs (Scalding, Cascalog, Py-Cascading)
• Mahout
• Maybe even Hive



      Image Source: http://en.wikipedia.org/wiki/File:20070124_sejm_sala_plenarna.jpg
Design Goals

• Minimal Job/Script modification required
• As low in the stack as possible
 • In fact, pretty sure we could get Hive to use this...
• No unnecessary copies of data
• Allow post-factum indexing
• Graceful degradation
• Flexible on-disk representation


Elephant-Twin
Twitter’s library for creating indexes in Hadoop
https://github.com/twitter/elephant-twin
https://github.com/twitter/elephant-twin-lzo




Block-Level Indexes
For each value, record the block it occurs in


“Block” can be HDFS block (100s of MBs)
Or LZO block (100s of KBs)
Or SequenceFile block
Or RCFile block ...


Ignore irrelevant blocks
Scan relevant blocks using original InputFormat
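
A minimal sketch of the block-level idea, assuming blocks are identified by small integer ids. The class and method names here (BlockIndexSketch, add, blocksFor) are hypothetical stand-ins for illustration and are not the Elephant-Twin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a block-level index; not the Elephant-Twin API.
class BlockIndexSketch {
    // value -> sorted set of block ids in which the value occurs
    private final Map<String, SortedSet<Integer>> index = new TreeMap<>();

    // Indexing pass: note that `value` was seen in block `blockId`.
    public void add(String value, int blockId) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(blockId);
    }

    // Retrieval: the only blocks a scan for `value` has to read.
    public List<Integer> blocksFor(String value) {
        SortedSet<Integer> blocks = index.get(value);
        return blocks == null ? new ArrayList<Integer>() : new ArrayList<>(blocks);
    }

    public static void main(String[] args) {
        // Four toy "blocks", each listing the event values it contains.
        String[][] blocks = {
            {"page_view", "click"},
            {"page_view"},
            {"page_view", "something_rare"},
            {"click", "page_view"},
        };
        BlockIndexSketch idx = new BlockIndexSketch();
        for (int b = 0; b < blocks.length; b++) {
            for (String event : blocks[b]) {
                idx.add(event, b);
            }
        }
        System.out.println("blocks to scan: " + idx.blocksFor("something_rare"));
    }
}
```

Every block not returned by blocksFor can be skipped; the returned blocks would still be read with the original InputFormat.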




Record-Level Indexes
For each value, record some representation of the record


Can be value + offset, as in bitmap indexes
Can be transformed projection of records, as in Lucene indexes


Some queries can be answered directly from index.
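
As an illustration of answering a query directly from a record-level index, here is a small bitmap-index sketch built on java.util.BitSet. The names (BitmapIndexSketch, add, count, either) are assumptions made for this example; the on-disk format used by Elephant-Twin is not shown in the slides.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a record-level bitmap index; not the Elephant-Twin API.
class BitmapIndexSketch {
    // value -> bitmap with bit i set when record i holds that value
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    public void add(int recordId, String value) {
        bitmaps.computeIfAbsent(value, v -> new BitSet()).set(recordId);
    }

    // Number of matching records, answered from the index alone.
    public int count(String value) {
        BitSet bits = bitmaps.get(value);
        return bits == null ? 0 : bits.cardinality();
    }

    // Records holding either value: a bitwise OR of the two bitmaps.
    public BitSet either(String a, String b) {
        BitSet result = new BitSet();
        if (bitmaps.containsKey(a)) result.or(bitmaps.get(a));
        if (bitmaps.containsKey(b)) result.or(bitmaps.get(b));
        return result;
    }

    public static void main(String[] args) {
        BitmapIndexSketch idx = new BitmapIndexSketch();
        String[] events = {"click", "page_view", "click", "page_view", "something_rare"};
        for (int i = 0; i < events.length; i++) {
            idx.add(i, events[i]);
        }
        System.out.println("clicks: " + idx.count("click"));
        System.out.println("click or rare: " + idx.either("click", "something_rare"));
    }
}
```

A count never touches the underlying records, which is the sense in which some queries can be served from the index itself.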




[Diagram] Indexing: Data -> InputFormat -> MR job -> Index
Creating an Index
public abstract class AbstractBlockIndexingJob {
    protected abstract List<String> getInput();
    protected abstract String getIndex();
    protected abstract String getInputFormat();
    protected abstract String getValueClass();
    protected abstract String getColumnName();
    protected abstract Job setMapper(Job job);
}

public abstract class AbstractLuceneIndexingJob {
  // Similar.
}




Creating an Index
Mapper transforms the records: emit <DocId, Value>
Key            Value
Block Offset   Column Value
Tweet Id       Text

Block helper:
public abstract class BlockIndexingMapper<KIN, VIN>
    extends Mapper<KIN, VIN, TextLongPairWritable, LongPairWritable> {}

Lucene helper:
public abstract class AbstractIndexingMapper<KIN, VIN, KOUT, VOUT>
    extends Mapper<KIN, VIN, KOUT, VOUT> {
  abstract protected boolean filter(KIN k, VIN v);
  abstract protected KOUT buildOutputKey(KIN k, VIN v);
}

Creating an Index
Reducer writes appropriately processed indexes and metadata.


MapFile block index:
public class MapFileIndexingReducer
    extends Reducer<TextLongPairWritable, LongPairWritable,
                    Text, ListLongPair>

Lucene index:
public abstract class AbstractLuceneIndexingReducer<KIN, VIN>
    extends Reducer<KIN, VIN, NullWritable, NullWritable> {
  protected abstract Document buildDocument(KIN k, VIN v);
}




Creating an Index: Metadata
struct FileIndexDescriptor {
    1: DocType docType
    2: IndexType indexType
    3: i32 indexVersion
    4: string sourcePath
    5: FileChecksum checksum
    6: list<IndexedField> indexedFields
}
struct ETwinIndexDescriptor {
    1: list<FileIndexDescriptor> fileIndexDescriptors
    2: i32 indexPart
    3: optional map<string, string> options
}
[Diagram] Retrieval: MR job + searchKey -> IndexedInputFormat -> Index -> Data
InputFormat
public class BlockIndexedFileInputFormat<K, V>
    extends FileInputFormat<K, V> {

  // Indexing jobs call this function to set up indexing-job-related parameters.
  public static void setIndexOptions(Job job,
      String inputformatClass, String valueClass,
      String indexDir, String columnName)

  // Searching jobs call this function to set up searching-job-related parameters.
  public static void setSearchOptions(Job job,
      String inputformatClass, String valueClass,
      String indexDir, BinaryExpression filter)
}




BinaryExpression
public BinaryExpression(
    Expression lhs, Expression rhs, OpType opType)

public static enum OpType {
    OP_PLUS (" + "),
    OP_MINUS(" - "),
    ...
    OP_EQ(" == "),
    OP_NE(" != "),
    ...
    OP_AND(" and "),
    OP_OR(" or "),
    ...
    TERM_COL(" Column "),
    TERM_CONST(" Constant ");
}
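
To make the role of such an expression tree concrete, here is a self-contained sketch that resolves a filter to the set of blocks worth scanning: equality looks up the index, "and" intersects, "or" unions. All types below (FilterSketch, Op, Column, Const, Binary) are hypothetical stand-ins that only mirror the shape of the constructor and OpType above; how Elephant-Twin actually evaluates a BinaryExpression is not shown in the slides.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in classes; not the Elephant-Twin implementation.
class FilterSketch {
    enum Op { EQ, AND, OR }

    interface Expr {}

    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
    }

    static final class Const implements Expr {
        final String value;
        Const(String value) { this.value = value; }
    }

    static final class Binary implements Expr {
        final Expr lhs;
        final Expr rhs;
        final Op op;
        Binary(Expr lhs, Expr rhs, Op op) { this.lhs = lhs; this.rhs = rhs; this.op = op; }
    }

    // Resolve a filter to the block ids that may contain matching records.
    // `index` maps "column=value" to the blocks holding that pair (an assumed layout).
    static Set<Integer> blocks(Expr e, Map<String, Set<Integer>> index) {
        Binary b = (Binary) e;
        switch (b.op) {
            case EQ: {
                String key = ((Column) b.lhs).name + "=" + ((Const) b.rhs).value;
                Set<Integer> hit = index.get(key);
                return new TreeSet<Integer>(hit == null ? Collections.<Integer>emptySet() : hit);
            }
            case AND: {
                Set<Integer> out = blocks(b.lhs, index);
                out.retainAll(blocks(b.rhs, index)); // both sides must match
                return out;
            }
            default: { // OR
                Set<Integer> out = blocks(b.lhs, index);
                out.addAll(blocks(b.rhs, index));    // either side may match
                return out;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("event=something_rare", new TreeSet<>(Arrays.asList(2)));
        index.put("event=click", new TreeSet<>(Arrays.asList(0, 3)));
        Expr rare = new Binary(new Column("event"), new Const("something_rare"), Op.EQ);
        Expr click = new Binary(new Column("event"), new Const("click"), Op.EQ);
        System.out.println(blocks(new Binary(rare, click, Op.OR), index));
    }
}
```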



Pig Integration
event_logs = load '/logs/lots_of_data'
    using ThriftPigLoader('thrift.gen.LogEvent');

filtered_logs = filter event_logs by event == 'something_rare';
-- Then do stuff.




Pig Integration
register elephant-twin-1.0.jar
event_logs = load '/logs/lots_of_data'
    using IndexedLZOPigLoader(
        'ThriftPigLoader',
        'thrift.gen.LogEvent',
        '/user/dmitriy/etwin');

-- Pig will automatically push this down into the Loader and InputFormat
filtered_logs = filter event_logs by event == 'something_rare';




Optimization: merge neighbors
           HDFS Block 1                       HDFS Block 2




Merge neighbors, share the scan.
(Limit expansion to size of HDFS block)
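
One way the neighbor merge could look, as a sketch only: matching blocks are byte ranges sorted by start offset, directly adjacent ranges are fused into one scan, and a merged range is never allowed to grow past a size cap standing in for the HDFS block size. The class name, the {start, end} array representation, and the maxSpanBytes parameter are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of neighbor merging; not the Elephant-Twin implementation.
class MergeNeighborsSketch {
    // Each span is {startOffset, endOffset} in bytes, end exclusive, sorted by start.
    static List<long[]> merge(List<long[]> spans, long maxSpanBytes) {
        List<long[]> merged = new ArrayList<>();
        long[] current = null;
        for (long[] span : spans) {
            boolean adjacent = current != null && current[1] == span[0];
            boolean fits = current != null && span[1] - current[0] <= maxSpanBytes;
            if (adjacent && fits) {
                current[1] = span[1]; // extend the running span, share the scan
            } else {
                current = new long[] {span[0], span[1]}; // start a new span
                merged.add(current);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<long[]> spans = Arrays.asList(
            new long[] {0, 10}, new long[] {10, 20},
            new long[] {20, 30}, new long[] {50, 60});
        for (long[] s : merge(spans, 25)) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```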


Scans are faster than random reads... allow gaps?
Turns out, not that much faster. Better to jump.


Optimization: combine small splits
[Diagram] HDFS Block 1 and HDFS Block 2, with three small matching spans combined into one generated split


Combine small relevant spans into single splits.
Try to take locality into account.
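
A rough sketch of split combining under simplifying assumptions: each relevant span has exactly one preferred host (real HDFS blocks have several replicas), spans are grouped by host so a generated split stays local, and spans are packed greedily until a target size would be exceeded. Span, combine, and targetBytes are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of split combining; not the Elephant-Twin implementation.
class CombineSplitsSketch {
    static final class Span {
        final String host;
        final long start;
        final long length;
        Span(String host, long start, long length) {
            this.host = host;
            this.start = start;
            this.length = length;
        }
    }

    static List<List<Span>> combine(List<Span> spans, long targetBytes) {
        // Group by host first so each generated split stays local to one node.
        Map<String, List<Span>> byHost = new LinkedHashMap<>();
        for (Span s : spans) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }
        List<List<Span>> splits = new ArrayList<>();
        for (List<Span> hostSpans : byHost.values()) {
            List<Span> current = new ArrayList<>();
            long bytes = 0;
            for (Span s : hostSpans) {
                // Close the running split when adding this span would overshoot.
                if (!current.isEmpty() && bytes + s.length > targetBytes) {
                    splits.add(current);
                    current = new ArrayList<>();
                    bytes = 0;
                }
                current.add(s);
                bytes += s.length;
            }
            if (!current.isEmpty()) {
                splits.add(current);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span("node-a", 0, 40), new Span("node-b", 100, 30),
            new Span("node-a", 200, 40), new Span("node-a", 300, 40));
        for (List<Span> split : combine(spans, 100)) {
            System.out.println(split.get(0).host + ": " + split.size() + " span(s)");
        }
    }
}
```

Packing by host trades a few extra splits for data-local reads; a span larger than the target still gets a split of its own.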



Applicability
Most keys occur in very few blocks!
Most frequent key only occurs in half the blocks.




Results
Applicable Jobs take 5-10x fewer resources


Ad-hoc jobs particularly likely to benefit


“Real” indexes are still faster...
 -- but they can be represented using the same abstraction




Future Work


  • Regex matching on keys
  • Better Pig pushdown support
  • MultiIndexInputFormat
  • Traditional indexes under ETwin
  • Index maintenance (via HCatalog?)




Image Source: http://en.wikipedia.org/wiki/File:Shasta_dam_under_construction_new_edit.jpg
Questions?
@squarecog


Sounds like fun? We are hiring.




Flexible In-Situ Indexing for Hadoop via Elephant Twin
