Elastic HBase on Mesos
Cosmin Lehene
Adobe
Industry Average Resource Utilization: <10%
• used capacity: 1-10%
• spare / unused capacity: 90-99%
Cloud Resource Utilization: ~60%
• used capacity: 60%
• spare / unused capacity: 40%
Actual utilization: 3-6%
• used capacity: 1-10%
• spare / unused capacity: 90-99%
Why
• peak load provisioning (can be 30X)
• resource imbalance (CPU vs. I/O vs. RAM bound)
• incorrect usage predictions
• all of the above (and others)
Typical HBase Deployment
• (mostly) static deployment footprint
• infrequent scaling out by adding more nodes
• scaling down uncommon
• OLTP, OLAP workloads as separate clusters
• < 32GB heap (compressed OOPs, GC)
Wasted Resources
Idleness Costs
• idle servers draw more than ~50% of their nominal power
• hardware depreciation accounts for ~40%
• in public clouds, idleness translates to 100% waste (charged by time, not by resource use)
Workload segregation nullifies economy-of-scale benefits
Load is not Constant
• daily, weekly, seasonal variation (both up and down)
• load varies across workloads
• peaks are not synchronized
Opportunities
• datacenter as a single pool of shared resources
• resource oversubscription
• mixed workloads can scale elastically within pools
• shared extra capacity
Elastic HBase
Goals
Cluster Management “Bill of Materials”
• single pool of resources ★ Mesos
• multi-tenancy ★ Mesos
• mixed short and long running tasks ★ Mesos (through frameworks)
• elasticity ★ Marathon / Mesos
• realtime scheduling ★ Marathon / Mesos
Multitenancy
mixing multiple workloads
• daily, weekly variation
• balance resource usage
• e.g. cpu-bound + I/O bound
• off-peak scheduling (e.g. nightly batch jobs)
• No “analytics” clusters
HBase “Bill of Materials”
• Task portability
• statelessness ✓ built-in (HDFS and ZK)
• auto discovery ✓ built-in
• self contained binary ★ Docker
• resource isolation ★ Docker (through cgroups)
Node Level
• stack diagram: Hardware → OS/Kernel → Mesos Slave, Docker, Salt Minion → containers (Kafka Broker, HBase HRS, [APP])
Cluster Level
• layered diagram: Scheduling: Kubernetes, Marathon, Aurora; Storage: HDFS, Tachyon, HBase; Compute: MapReduce, Storm, Spark; Resource Management: Mesos
Docker (and containers in general)
Why: Docker Containers
• “static link” everything (including the OS)
• Standard interface (resources, lifecycle, events)
• lightweight
• Just another process
• No overhead, native performance
• fine-grained resources
• e.g. 0.5 cores, 32MB RAM, 32MB disk
From .tgz/rpm + Puppet to Docker
• Goal: optimize for Mesos (not standalone)
• cluster, host agnostic (portability)
• env config injected through Marathon
• Self contained:
• OS-base + JDK + HBase
• centos-7 + java-1.8u40 + hbase-1.0
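A minimal sketch of what such a self-contained image could look like, assuming a locally available hbase-1.0 binary tarball; the paths, versions and start command below are illustrative assumptions, not the exact image from the talk.

```dockerfile
# Sketch only: OS base + JDK + HBase, with no cluster-specific config baked in.
FROM centos:7

# JDK from the distro repositories
RUN yum install -y java-1.8.0-openjdk-headless && yum clean all

# ADD auto-extracts a local tarball; assumes hbase-1.0.0-bin.tar.gz sits next to the Dockerfile
ADD hbase-1.0.0-bin.tar.gz /opt/
ENV HBASE_HOME=/opt/hbase-1.0.0

# ZK / HDFS URIs are not baked in; Marathon injects them as env vars at launch,
# and an entrypoint would template them into hbase-site.xml before starting.
CMD ["/opt/hbase-1.0.0/bin/hbase", "regionserver", "start"]
```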
Marathon
Marathon “runs” Applications on Mesos
• REST API to start / stop / scale apps
• maintains desired state (e.g. # instances)
• kills / restarts unhealthy containers
• reacts to node failures
• constraints (e.g. locality)
Marathon Manifest
• env information:
• ZK, HDFS URIs
• container resources
• CPU, RAM
• cluster resources
• # container instances
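A hedged sketch of what such a manifest might look like, written as a Marathon app definition in Python; the image name, URIs, resource sizes, and env variable names are illustrative assumptions, not values from the talk.

```python
# Illustrative Marathon app definition for an HBase RegionServer container.
# All names, URIs and sizes below are assumptions for this sketch; the env
# variables are hypothetical and would be consumed by the container entrypoint.
regionserver_app = {
    "id": "/hbase/regionserver",
    "instances": 20,                                   # cluster resources: # container instances
    "cpus": 2.0,                                       # container resources: CPU
    "mem": 16384,                                      # container resources: RAM (MB)
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/hbase:1.0", "network": "HOST"},
    },
    "env": {                                           # env information injected at launch
        "HBASE_ZK_QUORUM": "zk1:2181,zk2:2181,zk3:2181",
        "HBASE_ROOTDIR": "hdfs://namenode:8020/hbase",
    },
    "constraints": [["hostname", "UNIQUE"]],           # e.g. at most one instance per host
}
```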
Marathon “deployment”
• REST call
• Marathon (and Mesos) handle the actual deployment automatically
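A minimal sketch of that REST call using Python's requests, assuming a Marathon endpoint at http://marathon:8080 and reusing the regionserver_app dict from the manifest sketch above.

```python
import requests

MARATHON = "http://marathon:8080"   # assumed Marathon endpoint

# One REST call creates the app; Marathon and Mesos place the containers.
resp = requests.post(f"{MARATHON}/v2/apps", json=regionserver_app)
resp.raise_for_status()

# Scaling up or down later is just an update of the desired instance count.
resp = requests.put(f"{MARATHON}/v2/apps/hbase/regionserver", json={"instances": 30})
resp.raise_for_status()
```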
Benefits
Easy
• no code needed
• trivial docker container
• could be released with HBase
• straightforward Marathon manifest
Efficiency
• Improved resource utilization
• mixed workloads
• elasticity
Elasticity
• Scale up / down based on load
• traffic spikes, compactions, etc.
• yield unused resources
Smaller, Better?
• multiple RS per node
• use all RAM without losing compressed OOPs
• smaller failure domain
• smaller heaps
• less GC-induced latency jitter
Simplified Tuning
• standard container sizes
• decoupled from physical hosts
• portable
• same tuning everywhere
• invariants based on resource ratios
• # threads to # cores to RAM to Bandwidth
Collocated Clusters
• multiple versions
• e.g. 0.94, 0.98, 1.0
• simplifies multi-tenancy aspects
• e.g. cluster-per-table resource isolation
NEXT
Improvements
• drain regions before suspending
• schedule for data locality
• collocate RegionServers and HFile blocks
• DataNode short-circuit reads through shared volumes
HBase Ergonomics
• auto-tune to available resources
• JVM heap
• number of threads, etc.
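A rough sketch of what such ergonomics could look like inside a container entrypoint, assuming cgroup v1 paths; the heap fraction and threads-per-core ratio are purely illustrative assumptions.

```python
# Sketch: derive heap size and handler count from the container's cgroup limits.
# Paths assume cgroup v1; the 0.8 heap fraction and 30-threads-per-core ratio
# are illustrative assumptions, not recommendations.
import os

def container_mem_bytes():
    with open("/sys/fs/cgroup/memory/memory.limit_in_bytes") as f:
        return int(f.read())

def container_cpus():
    with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
        quota = int(f.read())
    with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
        period = int(f.read())
    return quota // period if quota > 0 else (os.cpu_count() or 1)

heap_mb = int(container_mem_bytes() // (1024 * 1024) * 0.8)  # leave headroom for off-heap
handler_count = max(1, container_cpus()) * 30                # threads scaled to cores

# An entrypoint could template these into hbase-env.sh / hbase-site.xml before start.
print(f"HBASE_HEAPSIZE={heap_mb}")
print(f"hbase.regionserver.handler.count={handler_count}")
```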
Disaggregating HBase
• HBase is a consistent, highly available, distributed cache on top of HFiles in HDFS
• Most *real* resource-wise, multi-tenant concerns revolve around a (single) table
• Each table could have its own cluster (minus some security-group concerns)
HMaster as a Scheduler?
• could fully manage HRS lifecycle (start/stop)
• in conjunction with region allocation
• considerations:
• Marathon is a generic long-running app scheduler
• extend its scheduling capabilities instead of “reinventing” it?
FIN
Resources
• The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition - http://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024
• Omega: flexible, scalable schedulers for large compute clusters - http://research.google.com/pubs/pub41684.html
• Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center - https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
• https://github.com/mesosphere/marathon
Contact
• @clehene
• clehene@[gmail | adobe].com
• hstack.org
