Flexible In-Situ Indexing for Hadoop via Elephant Twin

Flexible Indexing in Hadoop
Dmitriy Ryaboy @squarecog
Analytics Infrastructure @ Twitter
Hadoop Summit, San Jose, CA June 2012

@JoinTheFlock | Hadoop Summit, June 14 2012

Hadoop is great at plowing
through data

Image source: http://en.wikipedia.org/wiki/File:Snowplow_in_the_morning.jpg

And we do plow:
Tens of thousands of jobs per day

100 TB (uncompressed) ingested daily

Many users and diverse use cases


Looking for needles in
haystacks.


Image Source: http://en.wikipedia.org/wiki/File:July_1903_-_on_the_Gaisberg,_nr_Salzburg.JPG

With snowplows.


A Pig Script
event_logs = load '/logs/lots_of_data'
    using ThriftPigLoader('thrift.gen.LogEvent');
filtered_logs = filter event_logs by event == 'something_rare';

-- Then do stuff.

90% of the mappers in this job output no data.
We can do better...


Find smaller haystacks.


Use subpartitions!
• tablename/year/month/day/hour/bucket
• Only so many things you can partition by
• Up-front planning required
• Rewrite or duplicate for different query patterns


Keep the data sorted!
• Painful to maintain
• Only one sort order at a time
• Rewrite or duplicate for different query patterns


Trojan Layouts*
• Identify interesting column groupings
• Use different column groupings per HDFS block replica
• Requires changes to the NameNode (NN)
• ... and increases load on the NN

* http://infosys.uni-saarland.de/publications/JQD11.pdf


HBase!
• Good solution in many cases!
• Maintenance overhead
• All data must live in HBase
• Full table scans slower than MR
• Again with the up-front design
• Secondary indexes can help


Hive!
• That kind of works, actually.


Hive
Generic interface for defining indexing behavior.

Reference implementation: “compact” index
value -> list of HDFS blocks; drop unneeded blocks.

Other index types are available (bitmap indexes as of Hive 0.8)

It’ll even update indexes as you add partitions.


WIN!
Done, right?


Hive
Good news if your data is in Hive!

Bad news if your world is a little bigger.

Indexing is tightly coupled to Hive.

No interoperability with the rest of the Hadoop stack.


Democracy of Tools

Image Source: http://en.wikipedia.org/wiki/File:20070124_sejm_sala_plenarna.jpg

• Pig
• Raw Map-Reduce
• Cascading DSLs (Scalding, Cascalog, Py-Cascading)
• Mahout
• Maybe even Hive


Design Goals

• Minimal job/script modification required
• As low in the stack as possible (in fact, pretty sure we could get Hive to use this...)
• No unnecessary copies of data
• Allow post-factum indexing of data that already exists
• Graceful degradation
• Flexible on-disk representation


Elephant-Twin
Twitter’s library for creating indexes in Hadoop
https://github.com/twitter/elephant-twin
https://github.com/twitter/elephant-twin-lzo


Block-Level Indexes
For each value, record the block it occurs in

A “block” can be an HDFS block (100s of MBs)
Or an LZO block (100s of KBs)
Or a SequenceFile block
Or an RCFile block ...

Ignore irrelevant blocks
Scan relevant blocks using original InputFormat
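Here is a minimal in-memory sketch of the block-level idea in Java. The class and method names (BlockIndexSketch, add, blocksToScan) are hypothetical and are not Elephant Twin's API. The 256 KB block size is only an illustrative LZO-sized block. The sketch shows both halves of the slide: first it records which blocks each value occurs in, then it skips every block that is not listed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Sketch of a block-level index. Hypothetical names; NOT the Elephant Twin API.
 * For each column value, it keeps the start offsets of the blocks containing it.
 */
public class BlockIndexSketch {

  // column value -> sorted start offsets (bytes) of the blocks containing that value
  private final Map<String, SortedSet<Long>> index = new HashMap<>();

  /** Records that value occurs somewhere in the block starting at blockOffset. */
  public void add(String value, long blockOffset) {
    index.computeIfAbsent(value, v -> new TreeSet<>()).add(blockOffset);
  }

  /** Blocks that must be scanned for value; every other block can be skipped. */
  public SortedSet<Long> blocksToScan(String value) {
    SortedSet<Long> blocks = index.get(value);
    return blocks == null
        ? Collections.emptySortedSet()
        : Collections.unmodifiableSortedSet(blocks);
  }

  public static void main(String[] args) {
    BlockIndexSketch idx = new BlockIndexSketch();
    long blockSize = 256 * 1024; // illustrative 256 KB LZO blocks
    // Built once by an indexing pass over the data:
    idx.add("something_common", 0);
    idx.add("something_common", blockSize);
    idx.add("something_common", 2 * blockSize);
    idx.add("something_rare", 2 * blockSize);

    // A filter on event == 'something_rare' reads 1 of the 3 blocks.
    System.out.println(idx.blocksToScan("something_rare")); // [524288]
    System.out.println(idx.blocksToScan("never_seen"));     // []
  }
}
```

The listed blocks would still be read with the original InputFormat, and each record would still be checked against the filter. A block-level index only says which blocks contain a value, not which records do.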


Record-Level Indexes
For each value, record some representation of the record

Can be value + offset, as in bitmap indexes
Can be a transformed projection of the record, as in Lucene indexes

Some queries can be answered directly from the index.
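A record-level bitmap index can be sketched with java.util.BitSet. There is one bitmap per value, and bit i is set when record i holds that value. In this sketch, record positions stand in for the offsets mentioned above, and the class name BitmapIndexSketch is hypothetical. The example answers a conjunction by intersecting two bitmaps. It also answers a count directly from the index, without reading any records.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a record-level bitmap index over one column (hypothetical names).
 * Bit i of bitmaps.get(v) is set if record i has value v.
 */
public class BitmapIndexSketch {

  private final Map<String, BitSet> bitmaps = new HashMap<>();
  private int numRecords = 0;

  /** Appends one record; its position is the number of records seen so far. */
  public void addRecord(String value) {
    bitmaps.computeIfAbsent(value, v -> new BitSet()).set(numRecords++);
  }

  /** Positions of records whose column equals value (a copy, safe to modify). */
  public BitSet matching(String value) {
    BitSet b = bitmaps.get(value);
    return b == null ? new BitSet() : (BitSet) b.clone();
  }

  /** Answered from the index alone, without touching the data. */
  public int count(String value) {
    return matching(value).cardinality();
  }

  public static void main(String[] args) {
    // Two single-column indexes over the same six records.
    BitmapIndexSketch event = new BitmapIndexSketch();
    BitmapIndexSketch country = new BitmapIndexSketch();
    String[][] records = {
        {"click", "US"}, {"view", "US"}, {"click", "JP"},
        {"click", "US"}, {"view", "JP"}, {"click", "JP"}
    };
    for (String[] r : records) {
      event.addRecord(r[0]);
      country.addRecord(r[1]);
    }
    // event == 'click' AND country == 'JP': intersect the two bitmaps.
    BitSet hits = event.matching("click");
    hits.and(country.matching("JP"));
    System.out.println(hits);                 // {2, 5}
    System.out.println(event.count("click")); // 4
  }
}
```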


Indexing: [diagram: Data -> InputFormat -> MR index job -> Index]


Creating an Index
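import java.util.List;
import org.apache.hadoop.mapreduce.Job;

public abstract class AbstractBlockIndexingJob {
  protected abstract List<String> getInput();   // input paths to index
  protected abstract String getIndex();         // index directory
  protected abstract String getInputFormat();   // InputFormat class used to read the data
  protected abstract String getValueClass();    // class of the records being indexed
  protected abstract String getColumnName();    // column to index
  protected abstract Job setMapper(Job job);    // configure the indexing mapper
}

public abstract class AbstractLuceneIndexingJob {
  // Similar.
}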


Creating an Index
Mapper transforms the records: emit <DocId, Value>
Key             Value
Block offset    Column value
Tweet ID        Text

Block helper:
public abstract class BlockIndexingMapper<KIN, VIN> extends
    Mapper<KIN, VIN, TextLongPairWritable, LongPairWritable> {}

Lucene helper:
public abstract class AbstractIndexingMapper<KIN, VIN, KOUT, VOUT>
    extends Mapper<KIN, VIN, KOUT, VOUT> {
  protected abstract boolean filter(KIN k, VIN v);
  protected abstract KOUT buildOutputKey(KIN k, VIN v);
}


Creating an Index
Reducer writes appropriately processed indexes and metadata.

MapFile block index:
public class MapFileIndexingReducer
    extends Reducer<TextLongPairWritable, LongPairWritable,
                    Text, ListLongPair> {
  // ...
}

Lucene index:
public abstract class AbstractLuceneIndexingReducer<KIN, VIN>
    extends Reducer<KIN, VIN, NullWritable, NullWritable> {
  protected abstract Document buildDocument(KIN k, VIN v);
}
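The map and reduce steps above can be simulated in plain Java to show the data flow of a block-indexing job without any Hadoop dependency. Everything in this sketch (IndexingJobSimulation, Record, map, reduce) is a hypothetical stand-in for the real mapper and reducer classes. One assumption needs flagging: the sketch groups on the column value, so each reduce call sees all block offsets for one value. How the real TextLongPairWritable key is composed is not shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java simulation of a block-indexing MR job (hypothetical names, no Hadoop). */
public class IndexingJobSimulation {

  /** One input record plus the offset of the block it was read from. */
  static final class Record {
    final long blockOffset;
    final String columnValue;
    Record(long blockOffset, String columnValue) {
      this.blockOffset = blockOffset;
      this.columnValue = columnValue;
    }
  }

  /** Map: emit <column value, block offset> for every record. */
  static Map.Entry<String, Long> map(Record r) {
    return Map.entry(r.columnValue, r.blockOffset);
  }

  /** Reduce: collapse one value's offsets into a sorted, de-duplicated list. */
  static List<Long> reduce(List<Long> offsets) {
    return new ArrayList<>(new TreeSet<>(offsets));
  }

  static SortedMap<String, List<Long>> run(List<Record> input) {
    // Shuffle: group every emitted pair by key, in key order (duplicates kept).
    SortedMap<String, List<Long>> shuffled = new TreeMap<>();
    for (Record r : input) {
      Map.Entry<String, Long> kv = map(r);
      shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    // One reduce call per distinct value produces that value's index entry.
    SortedMap<String, List<Long>> index = new TreeMap<>();
    shuffled.forEach((value, offsets) -> index.put(value, reduce(offsets)));
    return index;
  }

  public static void main(String[] args) {
    List<Record> input = List.of(
        new Record(0, "click"), new Record(0, "click"), new Record(0, "view"),
        new Record(4096, "click"), new Record(8192, "something_rare"));
    System.out.println(run(input));
    // {click=[0, 4096], something_rare=[8192], view=[0]}
  }
}
```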


Creating an Index: Metadata
struct FileIndexDescriptor {
1: DocType docType
2: IndexType indexType
3: i32 indexVersion
4: string sourcePath
5: FileChecksum checksum
6: list<IndexedField> indexedFields
}
struct ETwinIndexDescriptor {
1: list<FileIndexDescriptor> fileIndexDescriptors
2: i32 indexPart
3: optional map<string, string> options
}
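Per-file metadata could plausibly be used to decide, file by file, whether an index can be trusted. The sketch below rests on an assumption the slide does not state. It assumes a reader compares the stored checksum with the file's current checksum and falls back to a full scan of any file whose index is missing or stale. That is one way to get the graceful degradation listed in the design goals. FileIndexMeta keeps only two of FileIndexDescriptor's fields, with the checksum simplified to a String. In Hadoop, a file's current checksum can be obtained with FileSystem.getFileChecksum(Path). The paths and checksum values are made up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: decide per file whether its index is still valid,
 * falling back to a full scan when it is not. ASSUMPTION: this staleness
 * rule is illustrative and not taken from Elephant Twin.
 */
public class IndexFreshnessSketch {

  /** Two of FileIndexDescriptor's fields; checksum simplified to a String. */
  static final class FileIndexMeta {
    final String sourcePath;
    final String checksum;
    FileIndexMeta(String sourcePath, String checksum) {
      this.sourcePath = sourcePath;
      this.checksum = checksum;
    }
  }

  /** True only if the file was indexed and has not changed since. */
  static boolean canUseIndex(FileIndexMeta meta, String currentChecksum) {
    return meta != null && meta.checksum.equals(currentChecksum);
  }

  public static void main(String[] args) {
    Map<String, FileIndexMeta> indexed = new HashMap<>();
    for (String[] e : new String[][] {
        {"/logs/part-00000.lzo", "abc123"},
        {"/logs/part-00001.lzo", "def456"}}) {
      indexed.put(e[0], new FileIndexMeta(e[0], e[1]));
    }

    // Current checksums: part-00001 was rewritten, part-00002 was never indexed.
    Map<String, String> current = new LinkedHashMap<>();
    current.put("/logs/part-00000.lzo", "abc123");
    current.put("/logs/part-00001.lzo", "999999");
    current.put("/logs/part-00002.lzo", "777777");

    current.forEach((path, sum) -> System.out.println(path + " -> "
        + (canUseIndex(indexed.get(path), sum) ? "use index" : "full scan")));
    // /logs/part-00000.lzo -> use index
    // /logs/part-00001.lzo -> full scan
    // /logs/part-00002.lzo -> full scan
  }
}
```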

Retrieval: [diagram: MR job (searchKey) -> IndexedInputFormat -> Index, Data]


InputFormat
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class BlockIndexedFileInputFormat<K, V> extends
    FileInputFormat<K, V> {

  // Indexing jobs call this function to set up
  // indexing-job-related parameters.
  public static void setIndexOptions(Job job,
      String inputformatClass, String valueClass,
      String indexDir, String columnName) { /* ... */ }

  // Searching jobs call this function to set up
  // searching-job-related parameters.
  public static void setSearchOptions(Job job,
      String inputformatClass, String valueClass,
      String indexDir, BinaryExpression filter) { /* ... */ }

  // ... split computation and record reader omitted
}


BinaryExpression
public BinaryExpression(
Expression lhs, Expression rhs, OpType opType)

public static enum OpType {
OP_PLUS (" + "),
OP_MINUS(" - "),
...
OP_EQ(" == "),
OP_NE(" != "),
...
OP_AND(" and "),
OP_OR(" or "),
...
TERM_COL(" Column "),
TERM_CONST(" Constant ");
}
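Evaluating a filter against a block-level index can be sketched as a recursive walk over an expression tree shaped like BinaryExpression. An equality on an indexed column yields the blocks containing that value. AND intersects the two sides and OR takes their union. The classes and the reduced operator set here are hypothetical. Operators the index cannot use (NE in this sketch) and unindexed columns conservatively return every block, so the worst case is an ordinary full scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch (not the Elephant Twin classes): filter -> candidate blocks. */
public class FilterPruningSketch {

  /** Reduced operator set; the real OpType has many more. */
  enum Op { EQ, NE, AND, OR }

  /** Binary node: comparisons use column/constant, AND/OR use lhs/rhs. */
  static final class Expr {
    final Op op;
    final Expr lhs, rhs;
    final String column, constant;

    private Expr(Op op, Expr lhs, Expr rhs, String column, String constant) {
      this.op = op; this.lhs = lhs; this.rhs = rhs;
      this.column = column; this.constant = constant;
    }
    static Expr cmp(Op op, String column, String constant) {
      return new Expr(op, null, null, column, constant);
    }
    static Expr bool(Op op, Expr lhs, Expr rhs) {
      return new Expr(op, lhs, rhs, null, null);
    }
  }

  private final Map<String, Map<String, Set<Long>>> index; // column -> value -> blocks
  private final Set<Long> allBlocks;

  FilterPruningSketch(Map<String, Map<String, Set<Long>>> index, Set<Long> allBlocks) {
    this.index = index;
    this.allBlocks = allBlocks;
  }

  /** Blocks that may hold a matching record; always returns a fresh, mutable set. */
  Set<Long> candidates(Expr e) {
    switch (e.op) {
      case EQ: {
        Map<String, Set<Long>> byValue = index.get(e.column);
        if (byValue == null) {
          return new TreeSet<>(allBlocks); // column not indexed: scan everything
        }
        return new TreeSet<>(byValue.getOrDefault(e.constant, Set.of()));
      }
      case AND: {
        Set<Long> result = candidates(e.lhs);
        result.retainAll(candidates(e.rhs)); // both sides must be possible
        return result;
      }
      case OR: {
        Set<Long> result = candidates(e.lhs);
        result.addAll(candidates(e.rhs)); // either side may match
        return result;
      }
      default:
        // NE (and anything else): a block index cannot prove that a block holds
        // no record with a different value, so nothing is pruned.
        return new TreeSet<>(allBlocks);
    }
  }

  public static void main(String[] args) {
    Map<String, Map<String, Set<Long>>> index = new HashMap<>();
    index.put("event", Map.of("something_rare", Set.of(2L), "click", Set.of(0L, 1L, 2L)));
    index.put("country", Map.of("JP", Set.of(1L, 2L)));
    FilterPruningSketch pruner = new FilterPruningSketch(index, Set.of(0L, 1L, 2L, 3L));

    Expr rareInJp = Expr.bool(Op.AND,
        Expr.cmp(Op.EQ, "event", "something_rare"),
        Expr.cmp(Op.EQ, "country", "JP"));
    System.out.println(pruner.candidates(rareInJp)); // [2]

    Expr rareOrNotJp = Expr.bool(Op.OR,
        Expr.cmp(Op.EQ, "event", "something_rare"),
        Expr.cmp(Op.NE, "country", "JP"));
    System.out.println(pruner.candidates(rareOrNotJp)); // [0, 1, 2, 3]
  }
}
```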


Pig Integration
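event_logs = load '/logs/lots_of_data'
    using ThriftPigLoader('thrift.gen.LogEvent');
filtered_logs = filter event_logs by event == 'something_rare';

-- Then do stuff.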


Pig Integration
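register elephant-twin-1.0.jar;

event_logs = load '/logs/lots_of_data'
    using IndexedLZOPigLoader(
        'ThriftPigLoader',
        'thrift.gen.LogEvent',
        '/user/dmitriy/etwin');
filtered_logs = filter event_logs by event == 'something_rare';

-- Pig will automatically push this filter down into the Loader and InputFormat.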


Optimization: merge neighbors
[diagram: HDFS Block 1, HDFS Block 2]

Merge neighbors, share the scan.
(Limit expansion to the size of an HDFS block.)

Scans are faster than random reads, so should we allow gaps?
Turns out, not that much faster. Better to jump.
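The neighbor-merging rule can be sketched as a single pass over sorted byte ranges. The sketch treats each relevant block as a [start, end) range. It reads "limit expansion to size of HDFS block" as a cap on the merged span's length, which is an interpretation. Only exactly adjacent spans are merged, following the finding that jumping over gaps beats scanning through them. The names and toy sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: merge adjacent relevant blocks into shared scans. */
public class MergeNeighborsSketch {

  /** Byte range [start, end) of one block the index says is relevant. */
  static final class Span {
    final long start, end;
    Span(long start, long end) { this.start = start; this.end = end; }
    @Override public String toString() { return "[" + start + "," + end + ")"; }
  }

  /**
   * Merges spans that touch exactly (no gap is bridged) as long as the
   * merged span stays within maxLen bytes, e.g. the HDFS block size.
   */
  static List<Span> merge(List<Span> spans, long maxLen) {
    List<Span> sorted = new ArrayList<>(spans);
    sorted.sort(Comparator.comparingLong(s -> s.start));
    List<Span> merged = new ArrayList<>();
    for (Span s : sorted) {
      if (!merged.isEmpty()) {
        Span last = merged.get(merged.size() - 1);
        boolean adjacent = last.end == s.start;
        boolean fits = s.end - last.start <= maxLen;
        if (adjacent && fits) {
          merged.set(merged.size() - 1, new Span(last.start, s.end));
          continue;
        }
      }
      merged.add(s);
    }
    return merged;
  }

  public static void main(String[] args) {
    // Toy sizes: 256-byte relevant blocks, merged scans capped at 1024 bytes.
    List<Span> relevant = List.of(
        new Span(0, 256), new Span(256, 512),         // adjacent: merged
        new Span(1024, 1280), new Span(1280, 1536),   // gap before 1024: new scan
        new Span(1536, 1792), new Span(1792, 2048),
        new Span(2048, 2304));                        // would exceed the cap
    System.out.println(merge(relevant, 1024));
    // [[0,512), [1024,2048), [2048,2304)]
  }
}
```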


Optimization: combine small splits

[diagram: three small "match" spans combined into one generated split]

Combine small relevant spans into single splits.
Try to take locality into account.
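Combining small relevant spans into splits with locality in mind can be sketched in two steps. First, bucket the spans by the host that stores them. Then pack each bucket greedily up to a maximum split size. Everything here is hypothetical, including the simplification that each span lives on exactly one host (real HDFS blocks have several replicas). Hadoop's CombineFileInputFormat applies a related idea: it groups node-local blocks into combined splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: pack small relevant spans into combined, host-local splits. */
public class CombineSplitsSketch {

  /** A small matching byte range, simplified to live on a single host. */
  static final class Span {
    final String path;
    final long start, length;
    final String host;
    Span(String path, long start, long length, String host) {
      this.path = path; this.start = start; this.length = length; this.host = host;
    }
  }

  /** One generated split: several spans read by a single mapper, ideally on host. */
  static final class CombinedSplit {
    final String host;
    final List<Span> spans = new ArrayList<>();
    long totalLength = 0;
    CombinedSplit(String host) { this.host = host; }
  }

  static List<CombinedSplit> combine(List<Span> spans, long maxSplitSize) {
    // Group by host so that every combined split can be scheduled node-locally.
    Map<String, List<Span>> byHost = new TreeMap<>();
    for (Span s : spans) {
      byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
    }
    // Fill splits greedily per host; start a new split when the next span would
    // overflow. A span larger than maxSplitSize gets a split of its own.
    List<CombinedSplit> splits = new ArrayList<>();
    for (Map.Entry<String, List<Span>> entry : byHost.entrySet()) {
      CombinedSplit current = null;
      for (Span s : entry.getValue()) {
        if (current == null || current.totalLength + s.length > maxSplitSize) {
          current = new CombinedSplit(entry.getKey());
          splits.add(current);
        }
        current.spans.add(s);
        current.totalLength += s.length;
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    List<Span> matches = List.of(
        new Span("/logs/a.lzo", 0, 300, "host1"),
        new Span("/logs/a.lzo", 4000, 300, "host1"),
        new Span("/logs/b.lzo", 0, 300, "host2"),
        new Span("/logs/c.lzo", 800, 300, "host1"));
    for (CombinedSplit split : combine(matches, 1000)) {
      System.out.println(split.host + ": " + split.spans.size()
          + " span(s), " + split.totalLength + " bytes");
    }
    // host1: 3 span(s), 900 bytes
    // host2: 1 span(s), 300 bytes
  }
}
```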


Applicability
Most keys occur in very few blocks!
Even the most frequent key occurs in only half the blocks.


Results
Applicable jobs use 5-10x fewer resources.

Ad-hoc jobs are particularly likely to benefit.

“Real” indexes are still faster,
but they can be represented using the same abstraction.


Future Work

Image Source: http://en.wikipedia.org/wiki/File:Shasta_dam_under_construction_new_edit.jpg

• Regex matching on keys
• Better Pig pushdown support
• MultiIndexInputFormat
• Traditional indexes under ETwin
• Index maintenance (via HCatalog?)


Questions?
@squarecog

Sounds like fun? We are hiring.

