Flexible In-Situ Indexing for Hadoop via Elephant Twin

Slides from the Hadoop Summit 2012 presentation.

Flexible In-Situ Indexing for Hadoop via Elephant Twin Presentation Transcript

  • 1. Flexible Indexing in Hadoop. Dmitriy Ryaboy (@squarecog), Analytics Infrastructure @ Twitter. Hadoop Summit, San Jose, CA, June 2012
  • 2. @JoinTheFlock | Hadoop Summit, June 14 2012 2
  • 4. Hadoop is great at plowing through data. Image source: http://en.wikipedia.org/wiki/File:Snowplow_in_the_morning.jpg
  • 5. And we do plow
      • 10s of thousands of jobs per day
      • 100 TB (uncompressed) ingested daily
      • Many users and diverse use cases
  • 7. Looking for needles in haystacks. With snowplows. Image Source: http://en.wikipedia.org/wiki/File:July_1903_-_on_the_Gaisberg,_nr_Salzburg.JPG
  • 8. A Pig Script
        event_logs = load '/logs/lots_of_data' using ThriftPigLoader('thrift.gen.LogEvent');
        filtered_logs = filter event_logs by event == 'something_rare';
        -- Then do stuff.
      90% of the mappers in this job output no data. We can do better...
  • 9. Find smaller haystacks. Image Source: http://en.wikipedia.org/wiki/File:July_1903_-_on_the_Gaisberg,_nr_Salzburg.JPG
  • 14. Use subpartitions!
      • tablename/year/month/day/hour/bucket
      • Only so many things you can partition by
      • Up-front planning required
      • Rewrite or duplicate for different query patterns
  • 18. Keep the data sorted!
      • Painful to maintain
      • Only one sort order at a time
      • Rewrite or duplicate for different query patterns
  • 23. Trojan Layouts*
      • Identify interesting column groupings
      • Use different column groupings per HDFS block replica
      • Requires changes to the NameNode (NN)
      • ... and increases load on the NN
      * http://infosys.uni-saarland.de/publications/JQD11.pdf
  • 30. HBase!
      • Good solution in many cases!
      • Maintenance overhead
      • All data must live in HBase
      • Full table scans slower than MR
      • Again with the up-front design
          • Secondary indexes can help
  • 32. Hive!
      • That kind of works, actually.
  • 33. Hive
      • Generic interface for defining indexing behavior.
      • Reference implementation: “compact” index (value -> list of HDFS blocks; drop unneeded blocks).
      • Other indexes available (bitmap in 0.8).
      • It’ll even update indexes as you add partitions.
  • 34. WIN! Done, right?
  • 35. Hive
      • Good news if your data is in Hive!
      • Bad news if your world is a little bigger.
      • Indexing is tightly coupled to Hive.
      • No interoperability with the rest of the Hadoop stack.
  • 41. Democracy of Tools
      • Pig
      • Raw Map-Reduce
      • Cascading DSLs (Scalding, Cascalog, Py-Cascading)
      • Mahout
      • Maybe even Hive
      Image Source: http://en.wikipedia.org/wiki/File:20070124_sejm_sala_plenarna.jpg
  • 50. Design Goals
      • Minimal job/script modification required
      • As low in the stack as possible
          • In fact, pretty sure we could get Hive to use this...
      • No unnecessary copies of data
      • Allow post-factum indexing
      • Graceful degradation
      • Flexible on-disk representation
  • 51. Elephant-Twin: Twitter’s library for creating indexes in Hadoop
      • https://github.com/twitter/elephant-twin
      • https://github.com/twitter/elephant-twin-lzo
  • 52. Block-Level Indexes
      • For each value, record the block it occurs in
      • “Block” can be HDFS block (100s of MBs)
      • Or LZO block (100s of KBs)
      • Or SequenceFile block
      • Or RCFile block ...
      • Ignore irrelevant blocks
      • Scan relevant blocks using original InputFormat
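To make the block-level idea concrete, here is a minimal in-memory sketch. It is not Elephant-Twin's implementation: the class `BlockIndexSketch` and its methods are hypothetical names invented for illustration, and block ids are plain integers rather than HDFS or LZO block offsets.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy block-level index (hypothetical): maps each value to the ids of the blocks containing it. */
final class BlockIndexSketch {
    private final Map<String, SortedSet<Integer>> blocksByValue = new HashMap<>();

    /** Index one block: for each value seen in it, remember that this block contains the value. */
    void addBlock(int blockId, List<String> valuesInBlock) {
        for (String value : valuesInBlock) {
            blocksByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(blockId);
        }
    }

    /** Blocks worth scanning for this value; every other block can be ignored. */
    SortedSet<Integer> relevantBlocks(String value) {
        SortedSet<Integer> blocks = blocksByValue.get(value);
        return blocks == null
            ? Collections.<Integer>emptySortedSet()
            : Collections.unmodifiableSortedSet(blocks);
    }
}
```

A job filtering on a rare value would ask `relevantBlocks` first and then read only those blocks with the original InputFormat; a value that was never indexed yields an empty set.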
  • 53. Record-Level Indexes
      • For each value, record some representation of the record
      • Can be value + offset, as in bitmap indexes
      • Can be transformed projection of records, as in Lucene indexes
      • Some queries can be answered directly from index.
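A record-level index of the value + offset kind can be sketched the same way. Again this is an illustration under assumptions, not the library's code: `RecordIndexSketch` is a hypothetical class, and offsets are arbitrary byte positions. The `count` method shows how a query can sometimes be answered from the index alone, without touching the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy record-level index (hypothetical): maps each value to the offsets of matching records. */
final class RecordIndexSketch {
    private final Map<String, List<Long>> offsetsByValue = new HashMap<>();

    /** Record that a row with this value starts at the given byte offset. */
    void add(String value, long recordOffset) {
        offsetsByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(recordOffset);
    }

    /** Offsets to seek to when the matching records themselves are needed. */
    List<Long> offsetsOf(String value) {
        List<Long> offsets = offsetsByValue.get(value);
        return offsets == null
            ? Collections.<Long>emptyList()
            : Collections.unmodifiableList(offsets);
    }

    /** A count query answered directly from the index, with no data scan. */
    int count(String value) {
        return offsetsOf(value).size();
    }
}
```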
  • 54. Indexing [diagram; labels: MR Index job, InputFormat, Data]
  • 55. Creating an Index
        public abstract class AbstractBlockIndexingJob {
          protected abstract List<String> getInput();
          protected abstract String getIndex();
          protected abstract String getInputFormat();
          protected abstract String getValueClass();
          protected abstract String getColumnName();
          protected abstract Job setMapper(Job job);
        }

        public abstract class AbstractLuceneIndexingJob {
          // Similar.
        }
  • 56. Creating an Index
      Mapper transforms the records: emit <DocId, Value>

        Key            Value
        Block Offset   Column Value
        Tweet Id       Text

      Block helper:
        public abstract class BlockIndexingMapper<KIN, VIN>
            extends Mapper<KIN, VIN, TextLongPairWritable, LongPairWritable> {}

      Lucene helper:
        public abstract class AbstractIndexingMapper<KIN, VIN, KOUT, VOUT>
            extends Mapper<KIN, VIN, KOUT, VOUT> {
          abstract protected boolean filter(KIN k, VIN v);
          abstract protected KOUT buildOutputKey(KIN k, VIN v);
        }
  • 57. Creating an Index
      Reducer writes appropriately processed indexes and metadata.

      MapFile block index:
        public class MapFileIndexingReducer
            extends Reducer<TextLongPairWritable, LongPairWritable, Text, ListLongPair>

      Lucene index:
        public abstract class AbstractLuceneIndexingReducer<KIN, VIN>
            extends Reducer<KIN, VIN, NullWritable, NullWritable> {
          protected abstract Document buildDocument(KIN k, VIN v);
        }
  • 58. Creating an Index: Metadata
        struct FileIndexDescriptor {
          1: DocType docType
          2: IndexType indexType
          3: i32 indexVersion
          4: string sourcePath
          5: FileChecksum checksum
          6: list<IndexedField> indexedFields
        }

        struct ETwinIndexDescriptor {
          1: list<FileIndexDescriptor> fileIndexDescriptors
          2: i32 indexPart
          3: optional map<string, string> options
        }
  • 59. Retrieval [diagram; labels: MR job, searchKey, IndexedInputFormat, Index, Data]
  • 60. InputFormat
        public class BlockIndexedFileInputFormat<K, V> extends FileInputFormat<K, V> {

          // Indexing jobs call this function to set up indexing job related parameters.
          public static void setIndexOptions(Job job, String inputformatClass,
              String valueClass, String indexDir, String columnName)

          // Searching jobs call this function to set up searching job related parameters.
          public static void setSearchOptions(Job job, String inputformatClass,
              String valueClass, String indexDir, BinaryExpression filter)
        }
  • 61. BinaryExpression
        public BinaryExpression(Expression lhs, Expression rhs, OpType opType)

        public static enum OpType {
          OP_PLUS (" + "), OP_MINUS(" - "), ...
          OP_EQ(" == "), OP_NE(" != "), ...
          OP_AND(" and "), OP_OR(" or "), ...
          TERM_COL(" Column "), TERM_CONST(" Constant ");
        }
  • 62. Pig Integration
        event_logs = load '/logs/lots_of_data' using ThriftPigLoader('thrift.gen.LogEvent');
        filtered_logs = filter event_logs by event == 'something_rare';
        -- Then do stuff.
  • 63. Pig Integration
        register elephant-twin-1.0.jar;
        event_logs = load '/logs/lots_of_data' using IndexedLZOPigLoader(
            'ThriftPigLoader', 'thrift.gen.LogEvent', '/user/dmitriy/etwin');
        -- Pig will automatically push this down into the Loader and InputFormat
        filtered_logs = filter event_logs by event == 'something_rare';
  • 65. Optimization: merge neighbors [diagram; labels: HDFS Block 1, HDFS Block 2]
      • Merge neighbors, share the scan.
      • (Limit expansion to size of HDFS block)
  • 66. Optimization: merge neighbors [diagram; labels: HDFS Block 1, HDFS Block 2]
      • Scans are faster than random reads... allow gaps?
      • Turns out, not that much faster. Better to jump.
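The merge-neighbors step can be sketched as a single pass over the matching byte ranges. This is an illustrative sketch under stated assumptions, not the library's code: `MergeNeighborsSketch`, `Span`, and `maxSpanBytes` are hypothetical names, the input is assumed sorted and non-overlapping, and only exactly adjacent ranges are merged (no gaps), with the merged span capped at a size such as the HDFS block size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: merge adjacent matching ranges so they share one scan. */
final class MergeNeighborsSketch {
    /** Half-open byte range [start, end) within a file. */
    static final class Span {
        final long start;
        final long end;
        Span(long start, long end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** Input must be sorted by start and non-overlapping. Gaps are never bridged. */
    static List<Span> mergeNeighbors(List<Span> sortedMatches, long maxSpanBytes) {
        List<Span> merged = new ArrayList<>();
        Span current = null;
        for (Span next : sortedMatches) {
            boolean adjacent = current != null && current.end == next.start;
            boolean fits = current != null && next.end - current.start <= maxSpanBytes;
            if (adjacent && fits) {
                current = new Span(current.start, next.end); // extend the shared scan
            } else {
                if (current != null) merged.add(current);    // close the previous span
                current = next;                              // jump to the next match
            }
        }
        if (current != null) merged.add(current);
        return merged;
    }
}
```

Bridging small gaps would be a one-line change to the `adjacent` test, but the slide's finding is that jumping is the better choice, so the sketch does not do it.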
  • 67. Optimization: combine small splits [diagram; labels: HDFS Block 1, HDFS Block 2, match, match, match, Generated Split]
      • Combine small relevant spans into single splits.
      • Try to take locality into account.
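Combining small spans into splits can be sketched as greedy packing. This is an assumption-laden illustration rather than the library's algorithm: `CombineSplitsSketch` and `targetBytes` are hypothetical names, spans are represented only by their lengths in bytes, and locality (which the slide says the real code tries to honor) is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pack consecutive small spans into splits of bounded size. */
final class CombineSplitsSketch {
    /**
     * Groups span lengths (in bytes) into splits of at most targetBytes each.
     * A span larger than targetBytes gets a split of its own. Locality is ignored.
     */
    static List<List<Long>> combine(List<Long> spanLengths, long targetBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long length : spanLengths) {
            // Start a new split when adding this span would exceed the target.
            if (!current.isEmpty() && currentBytes + length > targetBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(length);
            currentBytes += length;
        }
        if (!current.isEmpty()) splits.add(current);
        return splits;
    }
}
```

With fewer, larger splits, a job launches fewer mappers for the same matching data; a locality-aware version would additionally group spans by the hosts that store them.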
  • 68. Applicability
      • Most keys occur in very few blocks!
      • Most frequent key only occurs in half the blocks.
  • 69. Results
      • Applicable jobs take 5-10x fewer resources
      • Ad-hoc jobs particularly likely to benefit
      • “Real” indexes still faster... but can be represented using the same abstraction
  • 75. Future Work
      • Regex matching on keys
      • Better Pig pushdown support
      • MultiIndexInputFormat
      • Traditional indexes under ETwin
      • Index maintenance (via HCatalog?)
      Image Source: http://en.wikipedia.org/wiki/File:Shasta_dam_under_construction_new_edit.jpg
  • 76. Questions? @squarecog. Sounds like fun? We are hiring.