HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
Valta: A Resource Management Layer over Apache HBase
Lars George | Director EMEA Services
Andrew Wang | Software Engineer
June 13, 2013
Background on HBase
• Write-heavy processing pipelines
• Web crawling, personalization, time-series
• Storing a lot of data (many TBs)
• Random reads/writes
• Tight MapReduce and Hadoop integration
Workloads
• Very much a shared system
• One system, multiple workloads
• Frontend doing random reads/writes
• Analytical MR doing sequential scans
• Bulk import/export with MR
• Hard to isolate multitenant workloads
Example: Rolling RS failures
• Happened in production
• Bad bulk import wiped out an entire cluster
• MR writes overload and kill a regionserver
• Its regions are reassigned to other RSs, which inherit the load
• Repeat until the whole cluster is dead
• Applies to any high-load traffic
Current state of the art
• Run separate clusters, replicate between
• $$$, poor utilization, more complex
• Namespace-based hardware partitioning
• Same issues as above
• Delay big tasks until periods of low load
• Ad-hoc, weak guarantees
Other Problems
• Long requests impact frontend latency
• I/O latency (HDFS, OS, disk)
• Unpredictable ops (compaction, cron, …)
• Some straightforward to fix, some not
Outline
• Project Valta (HBase)
• Resource limits
• Blueprint for further issues
• Request scheduling
• Auto-tuning scheduling for SLOs
• Multiple read replicas
Project Valta
• Need basic resource limits in HBase
• Single shared system
• Ill-behaved HBase clients are unrestricted
• Take resources from other clients
• Worst case: rolling RS failures
• Want to limit damage from bad clients
Resource Limits
• Collect RPC metrics
• Payload size and throughput
• Impose per-client throughput limits (sketch below)
• e.g. MR import limited to 100 1MB puts/s
• Limits are enforced per-regionserver
• Soft state
• Think of it as a firewall
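
A rough illustration of the per-client limit described above: a token bucket per client name, checked before each RPC is handled. ClientThrottle, admit(), and the refill policy are illustrative assumptions, not Valta's actual implementation; as on the slide, the state is soft and local to each regionserver.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of a per-client token bucket refilled in bytes/second.
    // All names are illustrative; this is not Valta's actual API.
    public class ClientThrottle {
        private final long bytesPerSecond;
        private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

        public ClientThrottle(long bytesPerSecond) {
            this.bytesPerSecond = bytesPerSecond;
        }

        // Called before an RPC is handled; rejects the call when the
        // client's bucket is empty, like a firewall dropping traffic.
        public boolean admit(String clientName, long payloadBytes) {
            Bucket b = buckets.computeIfAbsent(clientName, k -> new Bucket());
            return b.tryConsume(payloadBytes);
        }

        private class Bucket {
            private long tokens = bytesPerSecond; // soft state, per RS
            private long lastRefillMs = System.currentTimeMillis();

            synchronized boolean tryConsume(long n) {
                long now = System.currentTimeMillis();
                // Refill in proportion to elapsed time, capped at 1s worth.
                tokens = Math.min(bytesPerSecond,
                        tokens + (now - lastRefillMs) * bytesPerSecond / 1000);
                lastRefillMs = now;
                if (tokens < n) {
                    return false;
                }
                tokens -= n;
                return true;
            }
        }
    }

The slide's example of 100 1MB puts/s would map to a 100 MB/s bucket for the MR import client.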
Limitations
• Important first steps, still more to do
• Static limits need babysitting
• Workloads and the set of clients change over time
• Doesn’t fix some parts of HBase
• Compactions
• Doesn’t fix the rest of the stack
• HDFS, OS, disk
Blueprint
• Ideas on other QoS issues
• Full-stack request scheduling
• HBase, HDFS, OS, disk
• Auto-tuning to meet high-level SLOs
• Random latency (compaction, cron, …)
• Let’s file some JIRAs
Full-stack request scheduling
• Need scheduling in all layers
• HBase, HDFS, OS, disk
• Run high-priority requests first (see sketch below)
• Preemption of long operations
• Some pieces already available
• RPC priority field (HADOOP-9194)
• Client names in MR/HBase/HDFS
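
HADOOP-9194 adds a priority field to the RPC header; assuming that field is plumbed through, one way for handlers to drain calls in priority order rather than FIFO is a priority queue in front of the handler pool. PriorityCallQueue and Call are hypothetical names, not the real HBase RPC classes:

    import java.util.concurrent.PriorityBlockingQueue;

    // Sketch: handlers pull calls in priority order instead of FIFO.
    // Hypothetical classes, not HBase's actual RPC implementation.
    public class PriorityCallQueue {
        public static class Call implements Comparable<Call> {
            final int priority;      // e.g. the HADOOP-9194 RPC header field
            final long arrivalNanos; // FIFO among equal priorities
            final Runnable work;

            public Call(int priority, Runnable work) {
                this.priority = priority;
                this.arrivalNanos = System.nanoTime();
                this.work = work;
            }

            @Override
            public int compareTo(Call o) {
                int c = Integer.compare(o.priority, priority); // higher first
                return c != 0 ? c : Long.compare(arrivalNanos, o.arrivalNanos);
            }
        }

        private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();

        public void enqueue(Call call) {
            queue.put(call);
        }

        // Handler loop: always runs the highest-priority queued call next.
        public void handlerLoop() throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take().work.run();
            }
        }
    }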
HBase request scheduling
• Add more HBase scheduling hooks (sketched below)
• RPC handling
• Between HDFS I/Os
• During long coprocessors or scans
• Expose hooks to coprocessors
• Could be used by Valta
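
To make the proposal concrete, a hypothetical shape for those hooks is sketched below; HBase exposes no such interface, and every name here is an assumption:

    // Hypothetical hook interface for the scheduling points above;
    // not an actual HBase API.
    public interface RequestScheduler {
        // Called when an RPC arrives; may queue, delay, or reject it.
        void onRpcArrival(String clientName, int priority);

        // Called between HDFS I/Os inside a long scan, giving the
        // scheduler a preemption point to pause low-priority work.
        void yieldPoint(String clientName) throws InterruptedException;

        // Called by long-running coprocessors at safe points.
        default void coprocessorYield(String clientName) throws InterruptedException {
            yieldPoint(clientName);
        }
    }

A limiter like Valta's could then be implemented as one RequestScheduler among others.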
HDFS request scheduling
• Same scheduling hooks as in HBase
• RPC layer, between I/Os
• Bound # of requests per disk (sketch below)
• Reduces queue length and contention
• Preempt queues in OS and disk
• OS block layer (CFQ, ioprio_set)
• Disk controller (SATA NCQ, ???)
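
Bounding per-disk requests can be as simple as a semaphore per disk, sketched below; PerDiskGate and the bound of 4 in-flight I/Os are illustrative assumptions, not HDFS code:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Semaphore;

    // Sketch: cap in-flight I/Os per disk so OS and disk queues stay
    // short and high-priority requests see less queueing delay.
    public class PerDiskGate {
        private static final int MAX_IN_FLIGHT = 4; // assumed per-disk bound
        private final Map<String, Semaphore> gates = new ConcurrentHashMap<>();

        public void withDisk(String disk, Runnable io) throws InterruptedException {
            Semaphore gate = gates.computeIfAbsent(disk, d -> new Semaphore(MAX_IN_FLIGHT));
            gate.acquire(); // blocks when the disk is saturated
            try {
                io.run();   // the actual read or write
            } finally {
                gate.release();
            }
        }
    }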
High-level SLO enforcement
• Research work I did at Berkeley (Cake)
• Specify high-level SLOs directly to HBase
• “100ms 99th percentile latency for gets”
• Added hooks to HBase and HDFS
• System auto-tunes to satisfy SLOs (see sketch below)
• Read the paper or hit me up!
• http://www.umbrant.com/papers/socc12-cake.pdf
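
The core of the approach is a feedback loop: each interval, compare the measured 99th-percentile latency against the SLO and shift scheduling share between the frontend and batch clients. A minimal sketch, with illustrative names and step sizes rather than Cake's actual controller:

    // Sketch of an SLO feedback loop in the spirit of Cake; the 5%
    // step and the share bounds are assumptions, not Cake's values.
    public class SloTuner {
        private final long sloMillis;       // e.g. 100ms 99th percentile
        private double frontendShare = 0.5; // fraction of handler/I/O slots

        public SloTuner(long sloMillis) {
            this.sloMillis = sloMillis;
        }

        // Called once per control interval (e.g. every 10 seconds).
        public double adjust(long measuredP99Millis) {
            if (measuredP99Millis > sloMillis) {
                // Missing the SLO: give the frontend more of the resources.
                frontendShare = Math.min(0.95, frontendShare + 0.05);
            } else {
                // Meeting the SLO: release resources back to batch work.
                frontendShare = Math.max(0.05, frontendShare - 0.05);
            }
            return frontendShare;
        }
    }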
Multiple read replicas
• Also proposed for MTTR, availability
• Many unpredictable sources of latency
• Compactions
• Also: cron, MR spill, shared caches, network, …
• Sidestep the problem!
• Read from 3 RS, return the fastest result (sketched below)
• Unlikely all three will be slow
• Weaker consistency, better latency
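
The hedged read itself is easy to sketch: issue the same get to each replica and keep whichever answers first. The replica reads are passed in as Callables here because HBase had no replica read API at the time; everything below is an assumption:

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: send the same get to all three replica regionservers
    // and return the fastest result.
    public class HedgedGet {
        private final ExecutorService pool = Executors.newCachedThreadPool();

        public byte[] get(List<Callable<byte[]>> replicaReads) throws Exception {
            // invokeAny returns the first successful result and
            // cancels the remaining, slower replica reads.
            return pool.invokeAny(replicaReads);
        }
    }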
Conclusion
• HBase is a great system!
• Let’s make it multitenant
• Request limits
• Full-stack request scheduling
• High-level SLO enforcement
• Multiple read replicas