Microsoft Azure's Hadoop cloud service, HDInsight, offers Hadoop, Storm, and HBase as fully managed clusters. In this talk, you'll explore the architecture of HBase clusters in Azure, which is optimized for the cloud, and a set of unique challenges and advantages that come with that architecture. We'll also talk about common patterns and use cases utilizing HBase on Azure.
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
1. Optimizing HBase for the cloud in
Microsoft Azure HDInsight
Maxim Lukiyanov, Microsoft, Senior Program Manager
Ashit Gosalia, Microsoft, Principal Software Engineering Manager
May 7th 2015, HBaseCon 2015
2. About Us
Maxim Lukiyanov
Senior Program Manager,
Big Data team
Microsoft
Contact
email: maxluk@microsoft
@maxiluk
Ashit Gosalia
Principal Software Engineering
Manager, Big Data team
Microsoft
Contact
email: ashitg@microsoft
Maxim Lukiyanov, Ashit Gosalia2
10. Throughput Optimization = Cost Minimization
Decoupling of compute and storage
Removes capacity constraint
Which allows minimization of cluster size
to the exact level of throughput required
by workload
[Chart: axes Capacity and Price; series Local VM Storage and Cloud Storage]
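The sizing argument on this slide can be sketched in a few lines. This is a minimal illustration, not the speakers' model: the function name, the per-node throughput figure, the workload throughput, and the replication factor of 3 (the HDFS default) are all assumptions chosen for the example. Only the 10TB data size and the 1TB-per-node disk are borrowed from the use case slides.

```python
import math

def nodes_required(data_tb, workload_ops, node_disk_tb, node_ops,
                   replication=3, local_storage=True):
    """Hypothetical sizing rule: take the larger of the throughput-driven
    and capacity-driven node counts. With cloud storage the capacity
    constraint is dropped, so only throughput determines cluster size."""
    by_throughput = math.ceil(workload_ops / node_ops)
    if not local_storage:
        return max(1, by_throughput)
    by_capacity = math.ceil(data_tb * replication / node_disk_tb)
    return max(1, by_throughput, by_capacity)

# Assumed workload: 10 TB of data, 15,000 ops/s.
# Assumed node: 1 TB local disk, 10,000 ops/s.
local = nodes_required(10, 15_000, node_disk_tb=1, node_ops=10_000)
cloud = nodes_required(10, 15_000, node_disk_tb=1, node_ops=10_000,
                       local_storage=False)
print(local, cloud)
```

Under these assumed numbers the local-storage cluster is sized by capacity and the cloud-storage cluster by throughput, which is the effect the slide describes.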
11. Cost Comparison
Price of 6 node cluster / month    6 hs1.8xlarge VM = $21,000    6 Large VM = $1,400
Price of 100TB / month                                           Azure Blob Storage = $2,300
Total Price of Cluster / month     $21,000                       $3,700
6x cheaper than local HDFS
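The arithmetic behind the comparison can be checked directly. The prices below are the ones quoted on the slide; the constant names are only illustrative.

```python
# Monthly prices as quoted on the slide (USD).
LOCAL_CLUSTER = 21_000   # 6 hs1.8xlarge VMs; the slide lists no separate storage price
CLOUD_CLUSTER = 1_400    # 6 Large VMs
CLOUD_STORAGE = 2_300    # 100 TB in Azure Blob Storage

cloud_total = CLOUD_CLUSTER + CLOUD_STORAGE
ratio = LOCAL_CLUSTER / cloud_total

print(f"cloud total: ${cloud_total:,}/month")   # cloud total: $3,700/month
print(f"local / cloud: {ratio:.1f}x")           # local / cloud: 5.7x
```

The exact ratio is about 5.7, which the slide rounds to 6x.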
13. Use Cases
Key value store
Sensor data store
Time series store
14. Use case #1: key value store
Example
Product recommendation engine
Map-reduce populates HBase with
reference data
Recommendation service reads reference
data from HBase
10TB of data in 2 node cluster
Cloud optimization
In general throughput requirements vary
greatly by workload
In this extreme example:
40 nodes* -> 2 nodes
$9000/month -> $700/month = 12x
* All nodes in use case examples are Azure A3: 4 cores, 7GB
RAM, 1TB HDD
15. Use case #2: sensor data store
Example
Metric store for online advertising
platform
Storm cluster computes metrics (link
click counts, etc.) over the stream of
user activity events
Storm stores aggregates in HBase
8TB of data in 4 node cluster
Cloud optimization
32 nodes -> 4 nodes
$7000/month -> $1100/month = 6x
16. Use case #3: time series store
Example
Performance metric time series
30TB in 40 node HBase cluster
Cloud optimization – step 1
120 nodes -> 40 nodes
$27,000/month -> $9,700/month = 2.8x
Row key: metric + timestamp
Region updates: [diagram not preserved]
Cloud optimization – step 2
120 nodes -> 10 nodes
$27,000/month -> $2,800/month = 10x
30TB -> 400TB
Row key: day + metric + timestamp
Region updates: [diagram not preserved]
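The two row key layouts can be compared with a small sketch. HBase stores rows sorted lexicographically by key bytes, so the leading key component decides which rows sit next to each other. The function names, the `|` separator, the `YYYYMMDD` day format, and the sample metrics are assumptions for illustration; the slides give only the component order. The region-update diagrams are not preserved, so the reading offered here (a day prefix keeps each day's writes in one contiguous key range, leaving older ranges untouched) is an interpretation rather than the speakers' stated explanation.

```python
import struct
from datetime import datetime, timezone

def key_metric_first(metric: str, ts: datetime) -> bytes:
    """Step 1 layout: metric + timestamp (big-endian seconds)."""
    return metric.encode() + b"|" + struct.pack(">Q", int(ts.timestamp()))

def key_day_first(metric: str, ts: datetime) -> bytes:
    """Step 2 layout: day + metric + timestamp."""
    day = ts.strftime("%Y%m%d").encode()
    return day + b"|" + key_metric_first(metric, ts)

# Hypothetical samples: two metrics observed on two consecutive days.
samples = [
    ("cpu", datetime(2015, 5, 6, 23, 0, tzinfo=timezone.utc)),
    ("mem", datetime(2015, 5, 6, 23, 0, tzinfo=timezone.utc)),
    ("cpu", datetime(2015, 5, 7, 1, 0, tzinfo=timezone.utc)),
    ("mem", datetime(2015, 5, 7, 1, 0, tzinfo=timezone.utc)),
]

# Sorting by key bytes mimics the order in which HBase would store the rows.
by_metric = sorted(samples, key=lambda s: key_metric_first(*s))
by_day = sorted(samples, key=lambda s: key_day_first(*s))

print([(m, t.day) for m, t in by_metric])  # grouped by metric, days interleaved
print([(m, t.day) for m, t in by_day])     # grouped by day, metrics within a day
```

With the metric-first layout, new samples land in every metric's key range; with the day-first layout, they all land in the current day's range.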
22. Announcing
HBase on Azure Data Lake
Azure Data Lake
A hyper scale repository for big data
workloads
HDFS for the cloud
Unlimited capacity
High throughput, low latency
Strong consistency
Durable and highly available
Sign-up page for Public Preview
http://azure.microsoft.com/en-us/campaigns/data-lake/
24. Summary
Cost
Azure HBase offers a new low-cost
deployment option, up to 10x
cheaper for some workloads, through
direct integration with cloud
storage
Performance
Comparable to HDD-based
clusters (66% worse storage-
backed read latency)
Flexibility
Easy to shrink or recreate cluster
without data loss