SlideShare a Scribd company logo
Elastic HBase
on Mesos
Cosmin Lehene
Adobe
Industry Average Resource
Utilization <10%
used capacity
1-10%
spare / un-used capacity
90-99%
Cloud Resource Utilization
~60%
used capacity
60%
spare / un-used capacity
40%
Actual utilization: 3-6%
used capacity
1-10%
spare / un-used capacity
90-99%
Why
• peak load provisioning (can be 30X)
• resource imbalance (CPU vs. I/O vs. RAM bound)
• incorrect usage predictions
• all of the above (and others)
Typical HBase Deployment
• (mostly) static deployment footprint
• infrequent scaling out by adding more nodes
• scaling down uncommon
• OLTP, OLAP workloads as separate clusters
• < 32GB Heap (compressed OOPS, GC)
Wasted Resources
Idleness Costs
• idle servers draw > ~50% of the nominal power
• hardware deprecation accounts for ~40%
• public clouds idleness translates to 100% waste
(charged by time not by resource use)
Workload segregation
nulls economy of scale
benefits
• daily, weekly, seasonal variation (both up and down)
• load varies across workloads
• peaks are not synchronized
Load is not Constant
Opportunities
• datacenter as a single pool of shared resources
• resource oversubscription
• mixed workloads can scale elastically within pools
• shared extra capacity
Elastic HBase
Goals
Cluster Management
“Bill of Materials”
• single pool of resources
• multi-tenancy
• mixed short and long running tasks
• elasticity
• realtime scheduling
★ Mesos
★ Mesos
★ Mesos (through frameworks)
★ Marathon / Mesos
★ Marathon / Mesos
HBase “Bill of Materials”
• task portability
• statelessness
• auto discovery
• self contained binary
• resource isolation
✓ built-in (HDFS and ZK)
✓ built-in
★ docker
★ docker (through CGroups)
Node Level
Hardware
OS/Kernel
Mesos
Slave
Docker
Salt
Minion
Containers
Kafka
Broker
HBase
HRS
[APP]
Resource Management: Mesos
Kubernetes Marathon AuroraScheduling
Storage HDFS Tachyon HBase
Compute MapReduce Storm Spark
Cluster Level
Docker
(and containers
in general)
Why: Docker Containers
• “static link” everything (including the OS)
• standard interface (resources, lifecycle, events)
• lightweight
• just another process
• no overhead, native performance
• fine-grained resources
• e.g. 0.5 cores, 32MB RAM, 32MB disk
From .tgz/rpm + Puppet
to Docker
• goal: optimize for Mesos (not standalone)
• cluster, host agnostic (portability)
• env config injected through Marathon
• self contained:
• OS-base + JDK + HBase
• centos-7 + java-1.8u40 + hbase-1.0
Marathon
Marathon “runs” Applications
on Mesos
• REST API to start / stop / scale apps
• maintains desired state (e.g. # instances)
• kills / restarts unhealthy containers
• reacts to node failures
• constraints (e.g. locality)
Marathon Manifest
• env information:
• ZK, HDFS URIs
• container resources
• CPU, RAM
• cluster resources
• # container instances
Marathon “deployment”
• REST call
• Marathon (and Mesos) handle the actual
deployment automatically
Benefits
Easy
• no code needed
• trivial docker container
• could be released with HBase
• straight forward Marathon manifest
Efficiency
• improved resource utilization
• mixed workloads
• elasticity
Elasticity
• scale up / down based on load
• traffic spikes, compactions, etc.
• yield unused resources
Smaller, Better?
• multiple RS per node
• use all RAM without losing compressed OOPS
• smaller failure domain
• smaller heaps
• less GC-induced latency jitter
Simplified Tuning
• standard container sizes
• decoupled from physical hosts
• portable
• same tuning everywhere
• invariants based on resource ratios
• # threads to # cores to RAM to Bandwidth
Collocated Clusters
• multiple versions
• e.g 0.94, 0.98, 1.0
• simplifies multi-tenancy aspects
• e.g. cluster-per-table resource isolation
NEXT
Improvements
• drain regions before suspending
• schedule for data locality
• collocate Region Servers and HFiles blocks
• DN short-circuit through shared volumes
HBase Ergonomics
• auto-tune to available resources
• JVM heap
• number of threads, etc.
Disaggregating HBase
• HBase is an consistent, highly available, distributed
cache on top of HFiles in HDFS
• most *real* resource-wise, multi-tenant concerns
revolve around a (single) table
• each table could have it’s own cluster (minus some
security groups concerns)
HMaster as a Scheduler?
• could fully manage HRS lifecycle (start/stop)
• in conjunction to region allocation
• considerations:
• Marathon is a generic long-running app scheduler
• extend scheduling capabilities instead of
“reinventing” it?
FIN
Resources
• The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition
- http://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024
• Omega: flexible, scalable schedulers for large compute clusters - http://research.google.com/pubs/
pub41684.html
• Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center - https://www.cs.berkeley.edu/~alig/
papers/mesos.pdf
• https://github.com/mesosphere/marathon
Contact
• @clehene
• clehene@[gmail | adobe].com
• hstack.org

More Related Content

What's hot

Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
tomasbart
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
Joe Stein
 
Apache Mesos
Apache MesosApache Mesos
Apache Mesos
Puneet soni
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
Cloudera, Inc.
 
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
DataStax Academy
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
Joe Stein
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
HBaseCon
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
Cloudera, Inc.
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
Joe Stein
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
DataStax Academy
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
Discover Pinterest
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
HBaseCon
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
Ben Slater
 
DockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDocker, Inc.
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Joe Stein
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
Knoldus Inc.
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
Spark Summit
 
Introduction of mesos persistent storage
Introduction of mesos persistent storageIntroduction of mesos persistent storage
Introduction of mesos persistent storage
Zhou Weitao
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
Ceph Community
 

What's hot (20)

Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
Apache Mesos
Apache MesosApache Mesos
Apache Mesos
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
 
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
DockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and ContainerizationDockerCon14 Cluster Management and Containerization
DockerCon14 Cluster Management and Containerization
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
Introduction of mesos persistent storage
Introduction of mesos persistent storageIntroduction of mesos persistent storage
Introduction of mesos persistent storage
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
 

Viewers also liked

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
Fabio Fumarola
 
Docker Based Hadoop Provisioning
Docker Based Hadoop ProvisioningDocker Based Hadoop Provisioning
Docker Based Hadoop ProvisioningDataWorks Summit
 
Accumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and PigAccumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and PigJason Trost
 
Python meetup: coroutines, event loops, and non-blocking I/O
Python meetup: coroutines, event loops, and non-blocking I/OPython meetup: coroutines, event loops, and non-blocking I/O
Python meetup: coroutines, event loops, and non-blocking I/O
Buzzcapture
 
HBaseConEast2016: HBase on Docker with Clusterdock
HBaseConEast2016: HBase on Docker with ClusterdockHBaseConEast2016: HBase on Docker with Clusterdock
HBaseConEast2016: HBase on Docker with Clusterdock
Michael Stack
 
Qml 培訓課程 multi media
Qml 培訓課程   multi mediaQml 培訓課程   multi media
Qml 培訓課程 multi media
diro fan
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Protocol libraries the right way
Protocol libraries the right wayProtocol libraries the right way
Protocol libraries the right way
Leo Zhou
 
如何提高研发效率
如何提高研发效率如何提高研发效率
如何提高研发效率
Leo Zhou
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Cosmin Lehene
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
tatsuya6502
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
Cloudera, Inc.
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
Edureka!
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
DataWorks Summit/Hadoop Summit
 

Viewers also liked (17)

11. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/211. From Hadoop to Spark 2/2
11. From Hadoop to Spark 2/2
 
Docker Based Hadoop Provisioning
Docker Based Hadoop ProvisioningDocker Based Hadoop Provisioning
Docker Based Hadoop Provisioning
 
Accumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and PigAccumulo Nutch/GORA, Storm, and Pig
Accumulo Nutch/GORA, Storm, and Pig
 
Python meetup: coroutines, event loops, and non-blocking I/O
Python meetup: coroutines, event loops, and non-blocking I/OPython meetup: coroutines, event loops, and non-blocking I/O
Python meetup: coroutines, event loops, and non-blocking I/O
 
HBaseConEast2016: HBase on Docker with Clusterdock
HBaseConEast2016: HBase on Docker with ClusterdockHBaseConEast2016: HBase on Docker with Clusterdock
HBaseConEast2016: HBase on Docker with Clusterdock
 
Qml 培訓課程 multi media
Qml 培訓課程   multi mediaQml 培訓課程   multi media
Qml 培訓課程 multi media
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Protocol libraries the right way
Protocol libraries the right wayProtocol libraries the right way
Protocol libraries the right way
 
如何提高研发效率
如何提高研发效率如何提高研发效率
如何提高研发效率
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 

Similar to Elastic HBase on Mesos - HBaseCon 2015

HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
Nick Dimiduk
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
HBaseCon
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
Jean-Baptiste Poullet
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
Jean-Baptiste Poullet
 
Apache Spark
Apache SparkApache Spark
Apache Spark
SugumarSarDurai
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
larsgeorge
 
Azure DBA with IaaS
Azure DBA with IaaSAzure DBA with IaaS
Azure DBA with IaaS
Kellyn Pot'Vin-Gorman
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
saipriyacoool
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
JAX London
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Azure Databases with IaaS
Azure Databases with IaaSAzure Databases with IaaS
Azure Databases with IaaS
Kellyn Pot'Vin-Gorman
 

Similar to Elastic HBase on Mesos - HBaseCon 2015 (20)

HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Azure DBA with IaaS
Azure DBA with IaaSAzure DBA with IaaS
Azure DBA with IaaS
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Azure Databases with IaaS
Azure Databases with IaaSAzure Databases with IaaS
Azure Databases with IaaS
 

Recently uploaded

Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 

Recently uploaded (20)

Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 

Elastic HBase on Mesos - HBaseCon 2015

  • 2. Industry Average Resource Utilization <10% used capacity 1-10% spare / un-used capacity 90-99%
  • 3. Cloud Resource Utilization ~60% used capacity 60% spare / un-used capacity 40%
  • 4. Actual utilization: 3-6% used capacity 1-10% spare / un-used capacity 90-99%
  • 5. Why • peak load provisioning (can be 30X) • resource imbalance (CPU vs. I/O vs. RAM bound) • incorrect usage predictions • all of the above (and others)
  • 6. Typical HBase Deployment • (mostly) static deployment footprint • infrequent scaling out by adding more nodes • scaling down uncommon • OLTP, OLAP workloads as separate clusters • < 32GB Heap (compressed OOPS, GC)
  • 8. Idleness Costs • idle servers draw > ~50% of the nominal power • hardware deprecation accounts for ~40% • public clouds idleness translates to 100% waste (charged by time not by resource use)
  • 10. • daily, weekly, seasonal variation (both up and down) • load varies across workloads • peaks are not synchronized Load is not Constant
  • 11. Opportunities • datacenter as a single pool of shared resources • resource oversubscription • mixed workloads can scale elastically within pools • shared extra capacity
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 18. Goals
  • 19. Cluster Management “Bill of Materials” • single pool of resources • multi-tenancy • mixed short and long running tasks • elasticity • realtime scheduling ★ Mesos ★ Mesos ★ Mesos (through frameworks) ★ Marathon / Mesos ★ Marathon / Mesos
  • 20. HBase “Bill of Materials” • task portability • statelessness • auto discovery • self contained binary • resource isolation ✓ built-in (HDFS and ZK) ✓ built-in ★ docker ★ docker (through CGroups)
  • 22. Resource Management: Mesos Kubernetes Marathon AuroraScheduling Storage HDFS Tachyon HBase Compute MapReduce Storm Spark Cluster Level
  • 24. Why: Docker Containers • “static link” everything (including the OS) • standard interface (resources, lifecycle, events) • lightweight • just another process • no overhead, native performance • fine-grained resources • e.g. 0.5 cores, 32MB RAM, 32MB disk
  • 25. From .tgz/rpm + Puppet to Docker • goal: optimize for Mesos (not standalone) • cluster, host agnostic (portability) • env config injected through Marathon • self contained: • OS-base + JDK + HBase • centos-7 + java-1.8u40 + hbase-1.0
  • 26.
  • 28. Marathon “runs” Applications on Mesos • REST API to start / stop / scale apps • maintains desired state (e.g. # instances) • kills / restarts unhealthy containers • reacts to node failures • constraints (e.g. locality)
  • 29. Marathon Manifest • env information: • ZK, HDFS URIs • container resources • CPU, RAM • cluster resources • # container instances
  • 30.
  • 31. Marathon “deployment” • REST call • Marathon (and Mesos) handle the actual deployment automatically
  • 33. Easy • no code needed • trivial docker container • could be released with HBase • straight forward Marathon manifest
  • 34. Efficiency • improved resource utilization • mixed workloads • elasticity
  • 35. Elasticity • scale up / down based on load • traffic spikes, compactions, etc. • yield unused resources
  • 36. Smaller, Better? • multiple RS per node • use all RAM without losing compressed OOPS • smaller failure domain • smaller heaps • less GC-induced latency jitter
  • 37. Simplified Tuning • standard container sizes • decoupled from physical hosts • portable • same tuning everywhere • invariants based on resource ratios • # threads to # cores to RAM to Bandwidth
  • 38. Collocated Clusters • multiple versions • e.g 0.94, 0.98, 1.0 • simplifies multi-tenancy aspects • e.g. cluster-per-table resource isolation
  • 39. NEXT
  • 40. Improvements • drain regions before suspending • schedule for data locality • collocate Region Servers and HFiles blocks • DN short-circuit through shared volumes
  • 41. HBase Ergonomics • auto-tune to available resources • JVM heap • number of threads, etc.
  • 42. Disaggregating HBase • HBase is an consistent, highly available, distributed cache on top of HFiles in HDFS • most *real* resource-wise, multi-tenant concerns revolve around a (single) table • each table could have it’s own cluster (minus some security groups concerns)
  • 43. HMaster as a Scheduler? • could fully manage HRS lifecycle (start/stop) • in conjunction to region allocation • considerations: • Marathon is a generic long-running app scheduler • extend scheduling capabilities instead of “reinventing” it?
  • 44. FIN
  • 45. Resources • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition - http://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024 • Omega: flexible, scalable schedulers for large compute clusters - http://research.google.com/pubs/ pub41684.html • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center - https://www.cs.berkeley.edu/~alig/ papers/mesos.pdf • https://github.com/mesosphere/marathon
  • 46. Contact • @clehene • clehene@[gmail | adobe].com • hstack.org