HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.

Presented by: Jean-Daniel Cryans (Cloudera) and Kevin O'Dell (Cloudera)

Published in: Technology

  1. HBase, Meet Ops. Ops, Meet HBase (Kevin O'Dell, Jean-Daniel Cryans)
  2. Kevin O'Dell: Systems Engineer. Extensive experience supporting customers.
  3. Jean-Daniel Cryans: Software Engineer. Builds and runs HBase.
  4. Agenda
      • Leveraging previous knowledge (Kevin)
      • Getting to production (JD)
  5. Goals
      • Help audience members understand how to operate HBase.
      • Empower audience members when talking to their own ops organization.
  6. Leveraging previous knowledge: Distributed Filesystem, Distributed Database, Java, OS, Network, Hardware
  7. Machines
      • Industry standard
      • No RAID controller (JBOD on the slaves)
      • Homogeneous environment is not necessary
      • Cores, spindles, and RAM
          • Different configurations for different uses
  8. Network
      • Leverage the existing infrastructure
      • No fancy equipment, no InfiniBand
      • Redundancy is key, no SPOF
      • TOR vs. core
      • 1Gb, 10Gb, and 40Gb
      • Bonding, VIPs, other such complexities
  9. Network
  10. Operating system
      • Production-ready Linux
      • Swap vs. swappiness
      • Basic FS -> Ext3/4
      • Cgroups
      • Recommended packages (sysstat, mce, iperf)
  11. Java
      • User space
      • Programs run in a contained JVM
      • JVM requires tuning
      • No leaks (usually), but overcommitting is easy
  12. Distributed Filesystem (HDFS)
      • Shared nothing
      • User-level filesystem
      • Not POSIX compliant
      • Immutable
      • Built-in redundancy
      • Linear scalability
  13. Distributed Filesystem
  14. Distributed Database (HBase)
      • Distributed hash table
      • Get, put, delete, scan, and CaS
      • Denormalization is necessary
      • Not a parallel database, just distributed
      • Write-ahead log / data durability
      • Master/slave replication
      • ACID compliance
  15. Distributed Database
  16. Getting to Production
      Things HBase doesn't come with:
      • Metrics
      • Automation
      • Alerting
  17. Metrics: Tony was really excited to try his new cluster
  18. Metrics
      You have no excuses:
      • Ganglia
      • Cacti
      • OpenTSDB
      • Hannibal
  19. Metrics - Ganglia
  20. Metrics - Cacti
  21. Metrics - OpenTSDB
  22. Metrics - Hannibal
  23. Metrics
      Metrics you want in your dashboards:
      • Call queues
      • IO wait
      • Compaction queues
  24. Metrics - Call Queues: View of all the machines together
  25. Metrics - Call Queues: Ceiling
  26. Metrics - Call Queues: Breaking it down per node
  27. Metrics - Call Queues: What’s up with this one?
  28. Metrics - IO Wait: Same time, breaking it down per node
  29. Metrics - IO Wait: Our machine is somewhere here...
  30. Metrics - IO Wait: Showing the previous machine (used to be yellow, sorry)
  31. Metrics - Compaction Queues: View of all the machines together, different time
  32. Metrics - Compaction Queues: Nice slope! Load is well distributed
  33. Metrics - Compaction Queues: Oh...
  34. Metrics - Compaction Queues: What is going on here?
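
The call-queue and IO-wait slides move from a view of all machines together to a per-node breakdown in order to find the one machine that behaves differently. A minimal sketch of that idea follows, assuming the per-node values have already been pulled from whatever metrics system is in use. The function name, the `factor` and `floor` parameters, and the sample host names are all hypothetical and do not come from the talk.

```python
from statistics import median


def find_outliers(per_node, factor=3.0, floor=1.0):
    """Return the nodes whose value is more than `factor` times the cluster median.

    `per_node` maps a host name to one metric sample (for example, call queue length).
    `floor` keeps the baseline from collapsing to zero on an idle cluster.
    Both defaults are illustrative assumptions, not recommended values.
    """
    if not per_node:
        return []
    baseline = max(median(per_node.values()), floor)
    return sorted(node for node, value in per_node.items() if value > factor * baseline)


# Hypothetical samples: one region server is far above its peers.
call_queue = {"rs1": 2, "rs2": 3, "rs3": 48, "rs4": 1}
print(find_outliers(call_queue))
```

Comparing against the median rather than the mean keeps a single hot node from dragging the baseline up with it.
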
  35. Metrics
      Want to learn more about metrics? See: “Using Metrics to Monitor and Debug Apache HBase” (5:00pm-5:20pm) with Elliott Clark
  36. 23:59:60
  37. Automation
      How fast can you:
      • Change an OS configuration on 100 machines?
      • Kill one process on said machines?
      • Reboot all your machines?
      • Reboot all your machines one by one, with some added configuration changes?
      • Add 10 new fully configured nodes?
  38. "Automation" - CSSH: Are you blind yet?
  39. Automation - Puppet
  40. Automation - Chef
  41. Automation - Fabric: $ fab
  42. Automation
      Common automations:
      • Rolling restart
      • Adding/removing nodes
      • Deploying new configurations
      • Finer re-balancing
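
Of the common automations listed, the rolling restart is the easiest to sketch. The example below is an illustration only: it uses plain Python rather than Fabric, Puppet, or Chef, it relies on the `graceful_stop.sh` script that ships in HBase's `bin` directory, and the install path and host names are assumptions. The command runner is passed in so the loop can be exercised without touching a cluster.

```python
import subprocess

# Assumed install location; adjust to wherever HBase lives on your machines.
GRACEFUL_STOP = "/usr/lib/hbase/bin/graceful_stop.sh"


def restart_command(host):
    """Build the command that drains, restarts, and reloads one region server."""
    return [GRACEFUL_STOP, "--restart", "--reload", host]


def rolling_restart(hosts, run=subprocess.call):
    """Restart hosts one at a time and stop at the first failure.

    `run` takes a command list and returns its exit code.
    Returns the hosts that were restarted successfully.
    """
    done = []
    for host in hosts:
        if run(restart_command(host)) != 0:
            break  # leave the remaining hosts untouched so a person can look
        done.append(host)
    return done


# Dry run with a fake runner that only prints what it would execute.
def dry_run(command):
    print(" ".join(command))
    return 0


rolling_restart(["rs1.example.com", "rs2.example.com"], run=dry_run)
```

Stopping at the first non-zero exit code is a deliberate choice here: a half-restarted cluster is easier to reason about than one where a broken change has been rolled out everywhere.
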
  43. Alerting
      HBase is just like any other system you are running, so maybe you've heard of...
  44. Alerting - Nagios
  45. Alerting - Zabbix
  46. Alerting
      What to alert on:
      • Previous metrics (call/compaction queues, IO)
      • Network bandwidth
      • Disk space
      • Number of regions
      • SMARTD
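
The alert list above amounts to a set of thresholds checked per machine. The sketch below shows one possible shape for such a check. The metric names and every limit in `THRESHOLDS` are made-up placeholders rather than values from the talk or from HBase, and in practice a system such as Nagios or Zabbix would own this logic.

```python
# Hypothetical metric names and limits, for illustration only.
THRESHOLDS = {
    "call_queue_length": 100,
    "compaction_queue_length": 50,
    "io_wait_percent": 30,
    "disk_used_percent": 80,
    "regions_per_server": 200,
}


def check(host, sample, thresholds=THRESHOLDS):
    """Return one alert message per metric in `sample` that exceeds its limit.

    Metrics missing from `sample` are skipped rather than treated as failures.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = sample.get(name)
        if value is not None and value > limit:
            alerts.append(f"{host}: {name}={value} exceeds {limit}")
    return alerts


# Hypothetical sample for one region server.
for message in check("rs3", {"call_queue_length": 250, "disk_used_percent": 91}):
    print(message)
```
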
  47. Backup
  48. Backup: No, you’re not the only one. Now drop that gun.
  49. Backup - Offline
      If you can manage to take your cluster offline for possibly an hour:
      1. Shut down HBase
      2. distcp to another cluster/separate folder
      3. Restart HBase
      * It's possible to run a distcp before shutting down. If you do, make sure you run distcp -update -delete for the second step.
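
The offline procedure can be written down as an ordered command plan. The sketch below only builds the commands and does not run them. The HDFS URIs are hypothetical, and it assumes HBase's `stop-hbase.sh` and `start-hbase.sh` scripts and the `hadoop` launcher are on the `PATH`. With `precopy=True` it follows the footnote: copy once while HBase is still up, then use `distcp -update -delete` during the outage so only the differences are transferred.

```python
def offline_backup_plan(src, dst, precopy=True):
    """Return the ordered list of commands for an offline HBase backup.

    `src` and `dst` are HDFS URIs of the HBase root directory and the backup target.
    """
    plan = []
    if precopy:
        # Bulk copy while the cluster is still serving, to shorten the outage.
        plan.append(["hadoop", "distcp", src, dst])
    plan.append(["stop-hbase.sh"])
    copy = ["hadoop", "distcp"]
    if precopy:
        # Second pass: sync changes and drop files that no longer exist at the source.
        copy += ["-update", "-delete"]
    plan.append(copy + [src, dst])
    plan.append(["start-hbase.sh"])
    return plan


# Hypothetical cluster addresses.
for step in offline_backup_plan("hdfs://prod-nn:8020/hbase", "hdfs://backup-nn:8020/hbase"):
    print(" ".join(step))
```
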
  50. Backup - Replication
      1. Create another HBase cluster (can be remote)
      2. Alter the families that need replication
      3. Make sure the same tables exist on the slave cluster
      * Replication isn't done inline with the inserts in the master cluster
      * See "Apache HBase Replication" with Chris Trezzo at 5:20PM
  51. Backup - Snapshot
      • Doesn't require copying data
      • Runs in less than 60 seconds
      • Minimal impact on performance
      * See the slides from "Apache HBase Table Snapshots" with Jonathan Hsieh & pals
  52. Thank You!