HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase

Presented by: Lars George (Cloudera) and Andrew Wang (Cloudera)

Transcript

  • 1. Valta: A Resource Management Layer over Apache HBase (Lars George | Director EMEA Services; Andrew Wang | Software Engineer; June 13, 2013)
  • 2. Background on HBase
      • Write-heavy processing pipelines
          • Web crawling, personalization, time-series
      • Storing a lot of data (many TBs)
      • Random reads/writes
      • Tight MapReduce and Hadoop integration
  • 3. Workloads
      • Very much a shared system
      • One system, multiple workloads
          • Frontend doing random reads/writes
          • Analytical MR doing sequential scans
          • Bulk import/export with MR
      • Hard to isolate multitenant workloads
  • 4. Example: Rolling RS failures
      • Happened in production
          • Bad bulk import wiped out entire cluster
      • MR writes kill the RS
      • Region gets reassigned
      • Repeat until cluster is dead
      • Applies to any high-load traffic
  • 5. Current state of the art
      • Run separate clusters, replicate between them
          • $$$, poor utilization, more complex
      • Namespace-based hardware partitioning
          • Same issues as above
      • Delay big tasks until periods of low load
          • Ad hoc, weak guarantees
  • 6. Other Problems
      • Long requests impact frontend latency
      • I/O latency (HDFS, OS, disk)
      • Unpredictable ops (compaction, cron, …)
      • Some straightforward to fix, some not
  • 7. Outline
      • Project Valta (HBase)
          • Resource limits
      • Blueprint for further issues
          • Request scheduling
          • Auto-tuning scheduling for SLOs
          • Multiple read replicas
  • 8. Project Valta
  • 9. Project Valta
      • Need basic resource limits in HBase
          • Single shared system
      • Ill-behaved HBase clients are unrestricted
          • Take resources from other clients
          • Worst case: rolling RS failures
      • Want to limit damage from bad clients
  • 10. Resource Limits
      • Collect RPC metrics
          • Payload size and throughput
      • Impose per-client throughput limits
          • e.g., MR import limited to 100 puts/s of 1 MB each
      • Limits are enforced per RegionServer
          • Soft state
      • Think of it as a firewall
  • 11. Implementation
      • Client-side table wrapper
      • Server-side coprocessor
      • GitHub
          • https://github.com/larsgeorge/Valta
      • Follow HBASE-8481
          • https://issues.apache.org/jira/browse/HBASE-8481
  • 12. Limitations
      • Important first steps, still more to do
      • Static limits need babysitting
          • Dynamic workload and set of clients
      • Doesn’t fix some parts of HBase
          • Compactions
      • Doesn’t fix the rest of the stack
          • HDFS, OS, disk
  • 13. Blueprint for further issues
  • 14. Blueprint
      • Ideas on other QoS issues
          • Full-stack request scheduling
              • HBase, HDFS, OS, disk
          • Auto-tuning to meet high-level SLOs
          • Random latency (compaction, cron, …)
      • Let’s file some JIRAs
  • 15. Full-stack request scheduling
      • Need scheduling in all layers
          • HBase, HDFS, OS, disk
      • Run high-priority requests first
      • Preemption of long operations
      • Some pieces already available
          • RPC priority field (HADOOP-9194)
          • Client names in MR/HBase/HDFS
  • 16. HBase request scheduling
      • Add more HBase scheduling hooks
          • RPC handling
          • Between HDFS I/Os
          • During long coprocessors or scans
      • Expose hooks to coprocessors
          • Could be used by Valta
  • 17. HDFS request scheduling
      • Same scheduling hooks as in HBase
          • RPC layer, between I/Os
      • Bound # of requests per disk
          • Reduces queue length and contention
      • Preempt queues in OS and disk
          • OS block layer (CFQ, ioprio_set)
          • Disk controller (SATA NCQ, ???)
  • 18. High-level SLO enforcement
      • Research work I did at Berkeley (Cake)
      • Specify high-level SLOs directly to HBase
          • “100ms 99th percentile latency for gets”
      • Added hooks to HBase and HDFS
      • System auto-tunes to satisfy SLOs
      • Read the paper or hit me up!
          • http://www.umbrant.com/papers/socc12-cake.pdf
  • 19. Multiple read replicas
      • Also proposed for MTTR, availability
      • Many unpredictable sources of latency
          • Compactions
          • Also: cron, MR spill, shared caches, network, …
      • Sidestep the problem!
          • Read from 3 RS, return the fastest result
          • Unlikely all three will be slow
          • Weaker consistency, better latency
  • 20. Conclusion
      • HBase is a great system!
      • Let’s make it multitenant
          • Request limits
          • Full-stack request scheduling
          • High-level SLO enforcement
          • Multiple read replicas
  • 21. Thanks!
      • lars@cloudera.com
      • andrew.wang@cloudera.com
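Slide 10 describes per-client throughput limits that each RegionServer enforces on its own and keeps as soft state. The slides do not say which rate-limiting algorithm Valta uses. The sketch below assumes a token bucket per client name and counts requests rather than bytes. The class names `TokenBucket` and `PerClientLimiter` are hypothetical. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Admits up to `rate` requests per second, with bursts of up to `capacity`."""
    rate: float
    capacity: float
    tokens: float
    last: float

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """Hypothetical per-RegionServer limiter holding one bucket per client name.

    All state is in memory ("soft state"): after a restart every client
    simply starts again with a full bucket.
    """

    def __init__(self, limits: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits  # client name -> allowed requests per second
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def admit(self, client: str) -> bool:
        rate = self.limits.get(client)
        if rate is None:
            return True  # clients without a configured limit are not throttled
        now = self.clock()
        bucket = self.buckets.setdefault(
            client, TokenBucket(rate=rate, capacity=rate, tokens=rate, last=now))
        return bucket.try_acquire(now)
```

With a limit of 100 requests/s for an MR import client, a burst of 150 puts at the same instant admits 100. Half a second later, 50 more are admitted. Clients without a limit, such as a frontend, pass through untouched. Whether a rejected request should fail or be delayed is left open here.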
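Slide 11 lists a client-side table wrapper next to the server-side coprocessor. The sketch below shows only the general wrapper pattern: it paces writes on the client before delegating to the wrapped table. `Table`, `InMemoryTable`, and `ThrottledTable` are hypothetical stand-ins. They are neither the HBase client API nor the classes in the Valta repository.

```python
import time
from typing import Callable, Dict, Optional, Protocol


class Table(Protocol):
    """Hypothetical minimal table interface (not the real HBase client API)."""

    def put(self, row: str, value: bytes) -> None: ...


class InMemoryTable:
    """Stand-in backend so the sketch runs without a cluster."""

    def __init__(self) -> None:
        self.rows: Dict[str, bytes] = {}

    def put(self, row: str, value: bytes) -> None:
        self.rows[row] = value


class ThrottledTable:
    """Wraps any Table and spaces puts at least 1/max_puts_per_sec seconds apart."""

    def __init__(self, inner: Table, max_puts_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.inner = inner
        self.interval = 1.0 / max_puts_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None  # earliest time the next put may go out

    def put(self, row: str, value: bytes) -> None:
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Block the caller rather than letting it overload the RegionServer.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        self.inner.put(row, value)
```

A client-side wrapper only constrains clients that choose to use it. That is one reason to pair it with enforcement on the server.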
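Slides 15 and 16 call for two things: running high-priority requests first, and placing scheduling hooks inside long operations (between HDFS I/Os, during long scans) so those operations can be preempted. The sketch below models each hook as a generator that yields after every chunk of work. `CooperativeScheduler` and `scan` are hypothetical names. The plain integer priorities are an assumption and do not reflect the format of the RPC priority field in HADOOP-9194.

```python
import heapq
import itertools
from typing import Iterator, List, Optional, Tuple


def scan(name: str, chunks: int) -> Iterator[str]:
    """A long operation that yields after every chunk, i.e. at each scheduling hook."""
    for i in range(chunks):
        yield f"{name}:chunk{i}"


class CooperativeScheduler:
    """Runs one step of the highest-priority task at a time.

    A long, low-priority scan is preempted at its next yield point as soon
    as a higher-priority task is submitted. Equal priorities round-robin.
    """

    def __init__(self) -> None:
        self._ready: List[Tuple[int, int, Iterator[str]]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def submit(self, task: Iterator[str], priority: int) -> None:
        # heapq is a min-heap, so store the negated priority to pop the largest first.
        heapq.heappush(self._ready, (-priority, next(self._seq), task))

    def step(self) -> Optional[str]:
        """Advances one task by one chunk; returns None when nothing is runnable."""
        while self._ready:
            neg_prio, _, task = heapq.heappop(self._ready)
            try:
                out = next(task)
            except StopIteration:
                continue  # task finished; fall through to the next one
            heapq.heappush(self._ready, (neg_prio, next(self._seq), task))
            return out
        return None
```

Suppose a priority-10 get arrives while a priority-0 MR scan is two chunks in. The get runs at the very next step, and then the scan resumes where it paused. In this single-threaded model, the new request waits for at most one in-progress chunk of lower-priority work.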
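Slide 17 proposes bounding the number of requests per disk to keep queues short. One reading is that requests beyond the bound wait in an application-level queue instead of the OS or disk queue. There they can still be reordered by priority. The sketch below assumes that reading. `PerDiskAdmission` and its methods are hypothetical, and actual I/O submission is left out.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class PerDiskAdmission:
    """Keeps at most `max_outstanding` requests in flight per disk.

    Excess requests wait here, ordered by priority, instead of piling up
    in the OS or disk queue where they can no longer be reordered.
    """

    def __init__(self, max_outstanding: int) -> None:
        self.max_outstanding = max_outstanding
        self.in_flight: Dict[str, int] = defaultdict(int)
        self.waiting: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        self._seq = itertools.count()  # FIFO among equal priorities

    def submit(self, disk: str, request: str, priority: int) -> Optional[str]:
        """Returns the request if it may be issued now; otherwise queues it and returns None."""
        if self.in_flight[disk] < self.max_outstanding:
            self.in_flight[disk] += 1
            return request
        heapq.heappush(self.waiting[disk], (-priority, next(self._seq), request))
        return None

    def complete(self, disk: str) -> Optional[str]:
        """Marks one request on `disk` done; returns the next request to issue, if any."""
        self.in_flight[disk] -= 1
        if self.waiting[disk]:
            self.in_flight[disk] += 1
            return heapq.heappop(self.waiting[disk])[2]
        return None
```

With a bound of 2, a third scan on the same disk waits. A get that arrives later still overtakes it when a slot frees up, because it is queued here rather than already committed to the device.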
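Slide 18 gives an example SLO, “100ms 99th percentile latency for gets”. Enforcing such a target needs two pieces: measuring the 99th percentile from observed latencies, and adjusting how much of the system the frontend gets when the SLO is missed. The sketch below shows both. The nearest-rank percentile, the fixed step size, and the 80% slack threshold are assumptions for illustration. This is a simple feedback rule, not the controller described in the Cake paper.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def adjust_share(share: float, observed_ms: float, slo_ms: float,
                 step: float = 0.05, lo: float = 0.1, hi: float = 0.9) -> float:
    """One feedback step on the fraction of resources reserved for the SLO client.

    Missing the SLO grows the share; ample slack (below 80% of the target,
    an arbitrary threshold) hands some back to batch work. The result is
    clamped to [lo, hi] so neither side is starved completely.
    """
    if observed_ms > slo_ms:
        share += step
    elif observed_ms < 0.8 * slo_ms:
        share -= step
    return min(hi, max(lo, share))
```

Consider 100 get latencies where two exceed 100 ms (150 ms and 200 ms). The measured p99 is 150 ms, so the frontend's share rises by one step. If this is called once per measurement window, the share keeps rising while the SLO is missed and is handed back only when there is clear slack.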
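Slide 19 suggests reading from three RegionServers and returning the fastest result. The sketch below sends the same read to every replica in parallel and returns the first successful answer. The replica reads are hypothetical callables, not HBase client calls. As the slide notes, consistency is weaker: a lagging replica may win with a stale value.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def fastest_of(replica_reads: Sequence[Callable[[], T]],
               timeout: Optional[float] = None) -> T:
    """Sends the same read to every replica; returns the first successful result.

    A replica that fails fast does not win: its error is skipped and the call
    keeps waiting for the others. `timeout` bounds each wait for the next result.
    """
    pool = ThreadPoolExecutor(max_workers=len(replica_reads))
    try:
        pending = {pool.submit(read) for read in replica_reads}
        while pending:
            done, pending = wait(pending, timeout=timeout, return_when=FIRST_COMPLETED)
            if not done:
                raise TimeoutError("no replica answered in time")
            for future in done:
                if future.exception() is None:
                    return future.result()
        raise RuntimeError("every replica read failed")
    finally:
        # Do not block on the slower replicas; their results are discarded.
        pool.shutdown(wait=False)
```

The extra reads are the price of this design: each get costs three times the work, in exchange for latency that tracks the fastest of the three replicas instead of the slowest.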
