HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase

  1. Valta: A Resource Management Layer over Apache HBase
     Lars George | Director EMEA Services
     Andrew Wang | Software Engineer
     June 13, 2013
  2. Background on HBase
     • Write-heavy processing pipelines
     • Web crawling, personalization, time-series
     • Storing a lot of data (many TBs)
     • Random reads/writes
     • Tight MapReduce and Hadoop integration
  3. Workloads
     • Very much a shared system
     • One system, multiple workloads
       • Frontend doing random reads/writes
       • Analytical MR doing sequential scans
       • Bulk import/export with MR
     • Hard to isolate multitenant workloads
  4. Example: Rolling RS failures
     • Happened in production
     • Bad bulk import wiped out an entire cluster
       • MR writes kill the RS
       • Region gets reassigned
       • Repeat until cluster is dead
     • Applies to any high-load traffic
  5. Current state of the art
     • Run separate clusters, replicate between them
       • $$$, poor utilization, more complex
     • Namespace-based hardware partitioning
       • Same issues as above
     • Delay big tasks until periods of low load
       • Ad-hoc, weak guarantees
  6. Other Problems
     • Long requests impact frontend latency
     • I/O latency (HDFS, OS, disk)
     • Unpredictable ops (compaction, cron, …)
     • Some straightforward to fix, some not
  7. Outline
     • Project Valta (HBase)
       • Resource limits
     • Blueprint for further issues
       • Request scheduling
       • Auto-tuning scheduling for SLOs
       • Multiple read replicas
  8. Project Valta
  9. Project Valta
     • Need basic resource limits in HBase
     • Single shared system
       • Ill-behaved HBase clients are unrestricted
       • Take resources from other clients
       • Worst case: rolling RS failures
     • Want to limit damage from bad clients
  10. Resource Limits
      • Collect RPC metrics
        • Payload size and throughput
      • Impose per-client throughput limits
        • e.g. MR import limited to 100 1MB puts/s
      • Limits are enforced per-regionserver
        • Soft state
      • Think of it as a firewall
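The per-client limit above is essentially a token bucket of bytes per second kept in regionserver memory. A minimal sketch of that idea using Guava's RateLimiter; ClientThrottle and its methods are illustrative names, not Valta's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.google.common.util.concurrent.RateLimiter;

// Hypothetical per-client throttle: each client name maps to a token
// bucket of bytes/second. Limits live only in memory (soft state), so a
// regionserver restart simply resets them, firewall-style.
public class ClientThrottle {
  private final Map<String, RateLimiter> limits = new ConcurrentHashMap<>();

  // e.g. setLimit("mr-import", 100L * 1024 * 1024) caps the MR import
  // client at roughly 100 one-megabyte puts per second on this server.
  public void setLimit(String client, long bytesPerSecond) {
    limits.put(client, RateLimiter.create(bytesPerSecond));
  }

  // Called before servicing an RPC; blocks until the client's bucket
  // has enough tokens for the request payload. Unlimited clients pass.
  public void acquire(String client, long payloadBytes) {
    RateLimiter limiter = limits.get(client);
    if (limiter != null) {
      limiter.acquire((int) payloadBytes);
    }
  }
}
```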
  11. Implementation
      • Client-side table wrapper
      • Server-side coprocessor
      • GitHub: https://github.com/larsgeorge/Valta
      • Follow HBASE-8481: https://issues.apache.org/jira/browse/HBASE-8481
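The server-side half could look roughly like the sketch below: a RegionObserver (written against the 0.94-era coprocessor API) that charges each write against the throttle from the previous sketch before it is applied. ThrottleObserver and clientName() are illustrative, not the actual Valta code:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Hypothetical server-side enforcement point: a coprocessor that blocks
// each Put until the issuing client's token bucket has room for it.
public class ThrottleObserver extends BaseRegionObserver {
  private final ClientThrottle throttle = new ClientThrottle();

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                     Put put, WALEdit edit, boolean writeToWAL)
      throws IOException {
    // heapSize() approximates the payload being charged against the
    // client's bucket; a slow bucket stalls the write here.
    throttle.acquire(clientName(ctx), put.heapSize());
  }

  private String clientName(ObserverContext<RegionCoprocessorEnvironment> ctx) {
    // Placeholder: real code would pull the user or application name
    // out of the RPC request context.
    return "unknown";
  }
}
```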
  12. Limitations
      • Important first steps, still more to do
      • Static limits need baby-sitting
        • Workloads and the set of clients change over time
      • Doesn’t fix some parts of HBase
        • Compactions
      • Doesn’t fix the rest of the stack
        • HDFS, OS, disk
  13. Blueprint for further issues
  14. Blueprint
      • Ideas on other QoS issues
      • Full-stack request scheduling
        • HBase, HDFS, OS, disk
      • Auto-tuning to meet high-level SLOs
      • Random latency (compaction, cron, …)
      • Let’s file some JIRAs :)
  15. Full-stack request scheduling
      • Need scheduling in all layers
        • HBase, HDFS, OS, disk
      • Run high-priority requests first
        • Preemption of long operations
      • Some pieces already available
        • RPC priority field (HADOOP-9194)
        • Client names in MR/HBase/HDFS
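A toy illustration of the idea, assuming nothing about the actual Hadoop RPC internals: requests carry a priority tag (as the HADOOP-9194 RPC priority field would) and handler threads always dequeue the most urgent one first.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Toy priority-aware request queue: handlers always service the
// highest-priority pending request, so low-latency frontend calls jump
// ahead of bulk scans. Not the actual Hadoop RPC scheduler.
public class PriorityRpcQueue {
  // A request tagged with the priority carried in its RPC header.
  static class Request implements Comparable<Request> {
    final int priority;  // higher = more urgent
    final Runnable work;
    Request(int priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }
    @Override
    public int compareTo(Request other) {
      // Reverse the comparison so the queue's head is the most urgent.
      return Integer.compare(other.priority, this.priority);
    }
  }

  private final PriorityBlockingQueue<Request> queue =
      new PriorityBlockingQueue<>();

  public void submit(int priority, Runnable work) {
    queue.put(new Request(priority, work));
  }

  // Handler loop: take() blocks until a request is available and hands
  // back the most urgent one first.
  public void handlerLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      queue.take().work.run();
    }
  }
}
```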
  16. HBase request scheduling
      • Add more HBase scheduling hooks
        • RPC handling
        • Between HDFS I/Os
        • During long coprocessors or scans
      • Expose hooks to coprocessors
        • Could be used by Valta
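One shape such a hook might take (purely illustrative; no such interface existed in HBase at the time): a callback the regionserver invokes at natural yield points, which a scheduler, or a coprocessor like Valta, can use to pause or reorder long-running work.

```java
// Hypothetical yield-point hook: the regionserver would call this
// between HDFS I/Os or scan batches, giving a scheduler the chance to
// stall low-priority work while urgent requests are queued.
public interface SchedulingHook {
  // May block; returning resumes the in-flight operation.
  void yieldPoint(String client, long bytesProcessedSoFar);
}
```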
  17. HDFS request scheduling
      • Same scheduling hooks as in HBase
        • RPC layer, between I/Os
      • Bound # of requests per disk
        • Reduces queue length and contention
      • Preempt queues in OS and disk
        • OS block layer (CFQ, ioprio_set)
        • Disk controller (SATA NCQ, ???)
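A sketch of the "bound # of requests per disk" idea, assuming one semaphore per data directory; this is illustrative, not the DataNode's actual code, and the limit of 4 is an assumed tunable:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Caps in-flight I/Os per disk so the on-disk queue stays short and a
// burst of scans can't hide latency-sensitive reads behind it.
public class PerDiskIoBound {
  private static final int MAX_INFLIGHT_PER_DISK = 4;  // assumed tunable
  private final Map<String, Semaphore> disks = new ConcurrentHashMap<>();

  private Semaphore diskFor(String dataDir) {
    // Fair semaphore so waiting requests are served in arrival order.
    return disks.computeIfAbsent(
        dataDir, d -> new Semaphore(MAX_INFLIGHT_PER_DISK, true));
  }

  // Wrap each block read/write: acquire a slot, do the I/O, release.
  public void withIoSlot(String dataDir, Runnable io)
      throws InterruptedException {
    Semaphore slot = diskFor(dataDir);
    slot.acquire();
    try {
      io.run();
    } finally {
      slot.release();
    }
  }
}
```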
  18. High-level SLO enforcement
      • Research work I did at Berkeley (Cake)
      • Specify high-level SLOs directly to HBase
        • “100ms 99th percentile latency for gets”
      • Added hooks to HBase and HDFS
      • System auto-tunes to satisfy SLOs
      • Read the paper or hit me up!
        • http://www.umbrant.com/papers/socc12-cake.pdf
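In spirit, the auto-tuning is a feedback loop: measure the tail latency, compare it to the SLO, and adjust how much of the server the client gets. A deliberately simplified sketch of that shape, not Cake's actual controller (see the linked paper for the real design):

```java
// Simplified SLO feedback loop in the spirit of Cake: periodically
// compare measured p99 get latency to the target and nudge the client's
// scheduling share up or down. Real controllers are more careful about
// stability; this only shows the shape of the idea.
public class SloTuner {
  private final double targetP99Ms;  // e.g. 100.0 for "100ms p99 gets"
  private double share = 0.5;        // fraction of handler capacity

  public SloTuner(double targetP99Ms) {
    this.targetP99Ms = targetP99Ms;
  }

  // Called once per tuning interval with the latest measured p99;
  // returns the new share to hand to the scheduler.
  public double adjust(double measuredP99Ms) {
    if (measuredP99Ms > targetP99Ms) {
      share = Math.min(1.0, share * 1.1);   // missing SLO: give more
    } else {
      share = Math.max(0.05, share * 0.95); // beating SLO: reclaim some
    }
    return share;
  }
}
```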
  19. Multiple read replicas
      • Also proposed for MTTR, availability
      • Many unpredictable sources of latency
        • Compactions
        • Also: cron, MR spill, shared caches, network, …
      • Sidestep the problem!
        • Read from 3 RS, return the fastest result
        • Unlikely all three will be slow
      • Weaker consistency, better latency
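The "fastest of three" read can be sketched with standard Java concurrency, assuming a readFromReplica() call that contacts one regionserver (hypothetical; HBase had no such client API at the time). As the slide says, any replica may serve slightly stale data, so this trades consistency for tail latency:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Fire the same get at all replicas and keep whichever answers first.
// invokeAny() returns the first successful result and cancels the rest,
// so one slow or compacting regionserver no longer sets the latency.
public class ReplicaReader {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  public byte[] hedgedGet(List<Replica> replicas, byte[] row)
      throws Exception {
    List<Callable<byte[]>> reads = replicas.stream()
        .map(r -> (Callable<byte[]>) () -> r.readFromReplica(row))
        .toList();
    return pool.invokeAny(reads);  // fastest replica wins
  }

  // Stand-in for a client connection to one regionserver replica.
  interface Replica {
    byte[] readFromReplica(byte[] row) throws Exception;
  }
}
```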
  20. Conclusion
      • HBase is a great system!
      • Let’s make it multitenant
        • Request limits
        • Full-stack request scheduling
        • High-level SLO enforcement
        • Multiple read replicas
  21. Thanks!
      lars@cloudera.com
      andrew.wang@cloudera.com