Beyond Batch
  HBase, Drill, & Storm

  Brad Anderson


©MapR Technologies
whoami
• Brad Anderson

• Solutions Architect at MapR (Atlanta)

• ATLHUG co-chair

• ‘boorad’ most places (Twitter, GitHub)

• banderson@maprtech.com
•    The open enterprise-grade distribution for Hadoop
     • Easy, dependable and fast
     • Open source with standards-based extensions

•    MapR is deployed at thousands of companies
     • From small Internet startups to the world’s largest enterprises

•    MapR customers analyze massive amounts of data:
     • Hundreds of billions of events daily
     • 90% of the world’s Internet population monthly
     • $1 trillion in retail purchases annually

•    MapR cloud partners
     • Google provides Hadoop on Google Compute Engine
     • Amazon for Elastic MapReduce instances
Beyond Batch
• HBase & M7

• Apache Drill

• Storm




Latency Matters

         Batch       Interactive   Streaming




HBase Issues
Reliability
• Compactions disrupt operations
• Very slow crash recovery
• Unreliable splitting

Business continuity
• Common hardware/software issues cause downtime
• Administration requires downtime
• No point-in-time recovery
• Complex backup process

Performance
• Many bottlenecks result in low throughput
• Limited data locality
• Limited # of tables

Manageability
• Compactions, splits and merges must be done manually (in reality)
• Basic operations like backup or table rename are complex
M7
    An integrated system for unstructured and structured data
     – Unified namespace for files and tables
     – Data management
     – Data protection
     – Disaster recovery
     – No additional administration

    An architecture that delivers reliability and performance
     – Fewer layers
     – No compactions
     – Seamless splits
     – Automatic merges
     – Single network hop
     – Instant recovery
     – Reduced read and write amplification

Unified Namespace
$ pwd
/mapr/default/user/boorad

$ ls
file1 file2 table1 table2

$ hbase shell
hbase(main):003:0> create '/user/boorad/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds

$ ls
file1 file2 table1 table2 table3

$ hadoop fs -ls /user/boorad
Found 5 items
-rw-r--r-- 3 mapr mapr       16 2012-09-28 08:34 /user/boorad/file1
-rw-r--r-- 3 mapr mapr       22 2012-09-28 08:34 /user/boorad/file2
trwxr-xr-x 3 mapr mapr       2 2012-09-28 08:32 /user/boorad/table1
trwxr-xr-x 3 mapr mapr       2 2012-09-28 08:33 /user/boorad/table2
trwxr-xr-x 3 mapr mapr       2 2012-09-28 08:38 /user/boorad/table3
Simplifying HBase Architecture

[Diagram: other distributions run an HBase JVM on top of a separate DFS JVM over ext3 disks; MapR removes the ext3 layer; M7 unifies files and tables in one layer directly over the disks.]
No RegionServers?
• No daemons to manage
• One network hop
• One cache
Region Assignment




Instant Recovery

    Apache HBase experiences an outage when any node
     crashes
     – Each RegionServer replays WAL before any region can be
       recovered
     – All regions served by that RegionServer cannot be accessed
    M7 provides instant recovery
     –   M7 uses small WALs
         •   Multiple WALs per region vs. 1 per RegionServer (1000 regions)
     –   Instant recovery on put
     –   1000-10000x faster recovery on get
    How?
     –   M7 leverages unique MapR-FS capabilities, not impacted by
         HDFS limitations
         •   Append support
         •   No limit to # of files
LSMT (FTW)
 Traditional disk-based index structures like B-Trees are expensive to maintain in real time
 Log Structured Merge Trees reduce the cost by deferring and batching index changes
 Writes
     – Writes go to an in-memory index
         • And a commit log in case the node crashes and recovery is needed
     – The in-memory index is occasionally merged into the disk-based index
         • This may trigger a compaction
 Reads
     – Reads hit the in-memory index and the disk-based index
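The write and read paths above can be sketched with a toy LSM tree. `ToyLsm` is a made-up class for illustration (not M7's implementation): writes go to a commit log and a sorted in-memory index, which is flushed to an immutable sorted segment when full; reads check the in-memory index first, then segments newest-first.

```java
import java.util.*;

// Toy LSM tree: memtable + commit log + immutable sorted segments.
// Lists stand in for on-disk files to keep the sketch self-contained.
class ToyLsm {
    private final int memtableLimit;
    private NavigableMap<String, String> memtable = new TreeMap<>();
    private final List<String> commitLog = new ArrayList<>();            // for crash recovery
    private final Deque<NavigableMap<String, String>> segments = new ArrayDeque<>();

    ToyLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    void put(String key, String value) {
        commitLog.add(key + "=" + value);   // log first, so a crash loses nothing
        memtable.put(key, value);           // then the in-memory index
        if (memtable.size() >= memtableLimit) flush();
    }

    private void flush() {                  // deferred, batched index change
        segments.addFirst(memtable);        // newest segment first
        memtable = new TreeMap<>();
        commitLog.clear();                  // safe once the flush is durable
    }

    String get(String key) {
        String v = memtable.get(key);       // in-memory index first
        if (v != null) return v;
        for (NavigableMap<String, String> seg : segments) {   // then disk, newest wins
            v = seg.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

Merging segments together (compaction) is the part the slide's next table compares across implementations; this sketch never compacts.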
Storage Subsystem Performance
What does it cost to merge the in-memory index into the disk-based index?

                        HBase-style         LevelDB-style     M7
Examples                BigTable, HBase,    Cassandra, Riak   M7
                        Cassandra, Riak
WAF                     Low                 High              Low
RAF                     High                Low               Low
I/O storms              Yes                 No                No
Disk space overhead     High (2x)           Low               Low
Skewed data handling    Bad                 Good              Good
Rewrite large values    Yes                 Yes               No

Terminology:
    Write-amplification factor (WAF): the ratio between writes to disk and
     application writes. Note that data must be rewritten in every indexed structure.
    Read-amplification factor (RAF): the ratio between reads from disk and
     application reads.
    Skewed data handling: when inserting values with similar keys (e.g., increasing
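As a quick worked example of the two definitions above (all numbers invented):

```java
// Amplification factors as defined in the terminology section.
class Amplification {
    // WAF: bytes written to disk (flushes + compaction rewrites) / bytes the app wrote
    static double waf(long diskBytesWritten, long appBytesWritten) {
        return (double) diskBytesWritten / appBytesWritten;
    }

    // RAF: bytes read from disk / bytes the app asked for
    static double raf(long diskBytesRead, long appBytesRead) {
        return (double) diskBytesRead / appBytesRead;
    }
}
```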
Other M7 Features
    Smaller disk footprint
     – HBase  stores key & column name for every version of
       every cell
     – M7 never repeats the key or column name

    Columnar layout
     – HBase supports 2-3 column families in practice
     – M7 supports 64 column families

    Online schema changes
     – No   need to disable table to add/remove/modify
         column families


Big Data Picture

                      Batch processing    Interactive analysis       Stream processing

Query runtime         Minutes to hours    Milliseconds to minutes    Never-ending

Data volume           TBs to PBs          GBs to PBs                 Continuous stream

Programming model     MapReduce           Queries                    DAG

Users                 Developers          Analysts and developers    Developers

Google project        MapReduce           Dremel

Open source project   Hadoop MapReduce    Apache Drill               Storm, S4
Google Dremel
• Interactive analysis of large-scale datasets
      • Trillion records at interactive speeds
      • Complementary to MapReduce
      • Used by thousands of Google employees
      • Paper published at VLDB 2010
• Model
      • Nested data model with schema
          • Most data at Google is stored/transferred in Protocol Buffers
          • Normalization (to relational) is prohibitive
      • SQL-like query language with nested data support
• Implementation
      • Column-based storage and processing
      • In-situ data access (GFS and Bigtable)
      • Tree architecture as in Web search (and databases)
Google BigQuery
• Hosted Dremel (Dremel as a Service)
• CLI (bq) and Web UI
• Import data from Google Cloud Storage or local files
          • Files must be in CSV format
          • Nested data not supported [yet] except built-in datasets
          • Schema definition required




DrQL Example

Input record (t):
  DocId: 10
  Links
    Forward: 20
    Forward: 40
    Forward: 60
  Name
    Language
      Code: 'en-us'
      Country: 'us'
    Language
      Code: 'en'
    Url: 'http://A'
  Name
    Url: 'http://B'
  Name
    Language
      Code: 'en-gb'
      Country: 'gb'

Query:
  SELECT DocId AS Id,
    COUNT(Name.Language.Code) WITHIN Name AS Cnt,
    Name.Url + ',' + Name.Language.Code AS Str
  FROM t
  WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Result:
  Id: 10
  Name
    Cnt: 2
    Language
      Str: 'http://A,en-us'
      Str: 'http://A,en'
  Name
    Cnt: 0
                                       * Example from the Dremel paper
Data Flow




Extensibility
• Nested query languages
      •   Pluggable model
      •   DrQL
      •   Mongo Query Language
      •   Cascading
• Distributed execution engine
      • Extensible model (e.g., Dryad)
      • Low-latency
      • Fault tolerant



Extensibility
• Nested data formats
      • Pluggable model
        • Column-based (ColumnIO/Dremel, Trevni, RCFile)
        • Row-based (RecordIO, Avro, JSON, CSV)
        • Schema (Protocol Buffers, Avro, CSV)
        • Schema-less (JSON, BSON)
• Scalable data sources
      • Pluggable model
      • Hadoop
      • HBase


Architecture


• Only the execution engine knows the physical attributes of the
  cluster
      • # nodes, hardware, file locations, …


• Public interfaces enable extensibility
      • Developers can build parsers for new query languages
      • Developers can provide an execution plan directly


• Each level of the plan has a human readable representation
      • Facilitates debugging and unit testing
Query Components
• Query components:
      •   SELECT
      •   FROM
      •   WHERE
      •   GROUP BY
      •   HAVING
      •   (JOIN)

• Key logical operators:
      •   Scan
      •   Filter
      •   Aggregate
      •   (Join)
Execution Engine Layers
• Drill execution engine has two layers
      • Operator layer is serialization-aware
          • Processes individual records
      • Execution layer is not serialization-aware
          • Processes batches of records (blobs)
          • Responsible for communication, dependencies and fault tolerance
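One way to picture this two-layer split (the names below are hypothetical, not Drill's actual classes): only operators touch individual records, while the execution layer just moves whole batches between pipeline stages without inspecting them.

```java
import java.util.*;
import java.util.function.*;

// Toy two-layer engine: record-at-a-time operators, batch-at-a-time execution.
class TwoLayerEngine {
    // Operator layer: serialization-aware, processes individual records.
    static List<String> filterOperator(List<String> batch, Predicate<String> keep) {
        List<String> out = new ArrayList<>();
        for (String record : batch) if (keep.test(record)) out.add(record);
        return out;
    }

    // Execution layer: never looks inside a record; just routes batches
    // through the pipeline stages (and, in a real engine, would handle
    // communication, dependencies, and fault tolerance here).
    @SafeVarargs
    static List<String> run(List<String> batch, UnaryOperator<List<String>>... stages) {
        for (UnaryOperator<List<String>> stage : stages) batch = stage.apply(batch);
        return batch;
    }
}
```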




Design Principles

Flexible
• Pluggable query languages
• Extensible execution engine
• Pluggable data formats
  • Column-based and row-based
  • Schema and schema-less

Easy
• Unzip and run
• Zero configuration
• Reverse DNS not needed
• IP addresses can change
• Clear and concise log messages

Fast
• C/C++ core with Java support
  • Google C++ style guide
• Min latency and max throughput (limited only by hardware)

Dependable
• No SPOF
• Instant recovery from crashes
Hadoop Integration
• Hadoop data sources
      • Hadoop FileSystem API (HDFS/MapR-FS)
      • HBase
• Hadoop data formats
      • Apache Avro
      • RCFile
• MapReduce-based tools to create column-based
  formats




Fully Open




Storm




Before Storm

[Diagram: queues feeding pools of workers]
Example (simplified)
Storm

                     Guaranteed data processing
                     Horizontal scalability
                     Fault-tolerance
                     No intermediate message brokers!
                     Higher level abstraction than
                     message passing
                     “Just works”
Concepts




Streams

Tuple   Tuple   Tuple   Tuple   Tuple   Tuple   Tuple

Unbounded sequence of tuples
Spouts

[Diagram: spouts emitting streams of tuples]

Source of streams
Spouts

public interface ISpout extends Serializable {
  void open(Map conf,
         TopologyContext context,
         SpoutOutputCollector collector);
  void close();
  void nextTuple();
  void ack(Object msgId);
  void fail(Object msgId);
}
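The interface's contract is easiest to see with a toy implementation. The sketch below drops Storm's collector and context for simplified stand-ins (the `emitted` list is hypothetical), but shows how `nextTuple`/`ack`/`fail` cooperate for guaranteed processing: a failed tuple is re-queued and emitted again.

```java
import java.util.*;

// Toy in-memory spout: replays unacked messages on fail().
class SentenceSpout {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Map<Integer, String> inFlight = new HashMap<>();  // msgId -> tuple
    private int nextId = 0;
    final List<String> emitted = new ArrayList<>();                 // stand-in for a collector

    SentenceSpout(Collection<String> source) { pending.addAll(source); }

    void nextTuple() {                   // Storm calls this in a loop
        String s = pending.poll();
        if (s == null) return;           // nothing to emit right now
        inFlight.put(nextId++, s);       // remember until acked
        emitted.add(s);
    }

    void ack(Object msgId) { inFlight.remove(msgId); }   // fully processed downstream

    void fail(Object msgId) {            // downstream failure: replay the tuple
        String s = inFlight.remove(msgId);
        if (s != null) pending.add(s);
    }
}
```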



Bolts

[Diagram: bolts consuming streams of tuples and emitting new streams]

Processes input streams and produces new streams
Bolts
public class DoubleAndTripleBolt extends BaseRichBolt {
  private OutputCollectorBase _collector;

  public void prepare(Map conf,
                      TopologyContext context,
                      OutputCollectorBase collector) {
    _collector = collector;
  }

  public void execute(Tuple input) {
    int val = input.getInteger(0);
    _collector.emit(input, new Values(val * 2, val * 3));
    _collector.ack(input);
  }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("double", "triple"));
  }
}
Topologies




                     Network of spouts and bolts
Trident
Cascading for Storm




Trident
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
  topology.newStream("spout1", spout)
    .each(new Fields("sentence"),
          new Split(),
          new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(),
                         new Count(),
                         new Fields("count"))
    .parallelismHint(6);
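The Trident topology above splits sentences into words, groups by word, and keeps a running count in state. A plain-Java sketch of the same dataflow, with hypothetical names and none of Trident's batching or fault tolerance:

```java
import java.util.*;

// each(Split) -> groupBy(word) -> persistentAggregate(Count), in plain Java.
// The Map plays the role of MemoryMapState.
class WordCountPipeline {
    private final Map<String, Long> wordCounts = new HashMap<>();   // persistent state

    void process(String sentence) {
        for (String word : sentence.split("\\s+")) {                // Split()
            wordCounts.merge(word, 1L, Long::sum);                  // Count() per group
        }
    }

    long count(String word) { return wordCounts.getOrDefault(word, 0L); }
}
```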
Interoperability




Spouts
• Kafka (with transactions)
• Kestrel
• JMS
• AMQP
• Beanstalkd
Bolts
 •Functions
 •Filters
 •Aggregation
 •Joins
 •Talk to databases, Hadoop write-behind


©MapR Technologies
Storm

[Diagram sequence: Raw Data flows into a Queue; Storm realtime processes and Hadoop batch processes both consume it, feeding Apps that deliver Business Value. Later builds add Parallel Cluster Ingest and drop the Queue, so Raw Data flows directly into Storm and Hadoop.]
Get Involved!
• Get more details on M7
      • http://mapr.com/products/mapr-editions/m7-edition

• Join the Apache Drill mailing list
      • drill-dev-subscribe@incubator.apache.org

• Watch TailSpout development
      • https://github.com/{tdunning | boorad}/mapr-spout

• Join MapR
      • jobs@mapr.com
      • banderson@maprtech.com

• @boorad

TriHUG - Beyond Batch


Editor's Notes

  • #6 HBase: random reads/writes; on 45% of all Hadoop clusters.
  • #20 Drill: remove the schema requirement; in-situ for real, since we’ll support multiple formats. Note: MR still needed for big joins, so to speak.
  • #21 Drill will support nested data; no schema required.
  • #22 Protocol Buffers are the conceptual data model; will support multiple data models; will have to define a way to describe data format (filtering, fields, etc.); schema-less will have a perf penalty; HBase will be one format.
  • #23 Likely to support these; could add HiveQL and more as well; could even be clever and route HiveQL to MR or Drill based on the query; Pig as well. Pluggability: data format, query language. Something 6-9 months from alpha quality; community driven, I can’t speak for the project. MapR: FS gives better chunk-size control; NFS support may make small test drivers easier; unified namespace will allow multi-cluster access; might even have a Drill component that autoformats data. Read-only model.
  • #24 Example query that Drill should support. Need to talk more here about what Dremel does.
  • #25 Load data into Drill (optional); could just use as-is in “row” format; multiple query languages; pluggability very important.
  • #26 Note: we have an already partially built execution engine.
  • #27 Note: we have an already partially built execution engine.
  • #34 Be prepared for Apache questions: committer vs. committee vs. contributor. If I can’t answer a question, ask them to answer and contribute. Lisa: need landing page; references to the paper and such at the end.
  • #38 Scaling is painful; poor fault tolerance; coding is hard.
  • #41 Tweets, stock ticks, manufacturing machine data, sensor messages.
  • #46 DAG; runs continuously.
  • #47 Abstractions like Cascading, Hive, Pig make MR approachable; code size reduction.
  • #50 Kestrel via Thrift; Kafka: transactional topologies, idempotency, process only once; ActiveMQ.
  • #52 Current architecture; data ingest tool for Hadoop (avoid Flume madness).
  • #53 Current architecture; data ingest tool for Hadoop (avoid Flume madness).