HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.

Presented by: Jean-Daniel Cryans (Cloudera) and Kevin O'Dell (Cloudera)

Transcript

  • 1. HBase, Meet Ops. Ops, Meet HBase (Kevin O'Dell, Jean-Daniel Cryans)
  • 2. Kevin O'Dell: Systems Engineer. Extensive experience supporting customers.
  • 3. Jean-Daniel Cryans: Software Engineer. Builds and runs HBase.
  • 4. Agenda
      • Leveraging previous knowledge (Kevin)
      • Getting to production (JD)
  • 5. Goals
      • Help audience members understand how to operate HBase.
      • Empower audience members when talking to their own ops organization.
  • 6. Leveraging previous knowledge: Distributed Filesystem, Distributed Database, Java, OS, Network, Hardware
  • 7. Machines
      • Industry standard
      • No RAID controller (JBOD on the slaves)
      • Homogeneous environment is not necessary
      • Cores, spindles, and RAM
      • Different configurations for different uses
  • 8. Network
      • Leverage the existing infrastructure
      • No fancy equipment, no InfiniBand
      • Redundancy is key, no SPOF
      • TOR vs. core
      • 1Gb, 10Gb, and 40Gb
      • Bonding, VIPs, other such complexities
  • 9. Network
  • 10. Operating system
      • Production-ready Linux
      • Swap vs. swappiness
      • Basic FS -> Ext3/4
      • Cgroups
      • Recommended packages (sysstat, mce, iperf)
  • 11. Java
      • User space
      • Programs run in a contained JVM
      • JVM requires tuning
      • No leaks (usually), but overcommitting is easy
  • 12. Distributed Filesystem (HDFS)
      • Shared nothing
      • User-level filesystem
      • Not POSIX compliant
      • Immutable
      • Built-in redundancy
      • Linear scalability
  • 13. Distributed Filesystem
  • 14. Distributed Database (HBase)
      • Distributed hash table
      • Get, put, delete, scan, and CAS
      • Denormalization is necessary
      • Not a parallel database, just distributed
      • Write-ahead log / data durability
      • Master/slave replication
      • ACID compliance
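
For readers coming from relational databases, the five operations on slide 14 can be illustrated with a toy, single-process table. This is only a sketch of the semantics: `ToyTable` and its method names are hypothetical, it is not the HBase client API, and it ignores column families, versions, and distribution entirely.

```python
from typing import Dict, Iterator, Optional, Tuple


class ToyTable:
    """In-memory stand-in for one table (hypothetical, NOT the HBase API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, row: str, value: str) -> None:
        self._rows[row] = value

    def get(self, row: str) -> Optional[str]:
        return self._rows.get(row)

    def delete(self, row: str) -> None:
        self._rows.pop(row, None)

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, str]]:
        # Rows are returned in sorted key order; start is inclusive,
        # stop is exclusive.
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, self._rows[key]

    def check_and_put(self, row: str, expected: Optional[str], value: str) -> bool:
        # CAS: write only if the current value matches what the caller expects.
        if self._rows.get(row) != expected:
            return False
        self._rows[row] = value
        return True
```

Everything is addressed by row key, and there are no joins, which is why the slide says denormalization is necessary.
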
  • 15. Distributed Database
  • 16. Getting to Production. Things HBase doesn't come with:
      • Metrics
      • Automation
      • Alerting
  • 17. Metrics: Tony was really excited to try his new cluster
  • 18. Metrics. You have no excuses:
      • Ganglia
      • Cacti
      • OpenTSDB
      • Hannibal
  • 19. Metrics - Ganglia
  • 20. Metrics - Cacti
  • 21. Metrics - OpenTSDB
  • 22. Metrics - Hannibal
  • 23. Metrics. Metrics you want in your dashboards:
      • Call queues
      • IO wait
      • Compaction queues
  • 24. Metrics - Call Queues: View of all the machines together
  • 25. Metrics - Call Queues: Ceiling
  • 26. Metrics - Call Queues: Breaking it down per node
  • 27. Metrics - Call Queues: What’s up with this one?
  • 28. Metrics - IO Wait: Same time, breaking it down per node
  • 29. Metrics - IO Wait: Our machine is somewhere here...
  • 30. Metrics - IO Wait: Showing the previous machine (used to be yellow, sorry)
  • 31. Metrics - Compaction Queues: View of all the machines together, different time
  • 32. Metrics - Compaction Queues: Nice slope! Load is well distributed
  • 33. Metrics - Compaction Queues: Oh...
  • 34. Metrics - Compaction Queues: What is going on here?
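
The per-node breakdowns above are done by eye on a graph. The same idea can be sketched in code: compare each node's value for one metric against the cluster median and flag the ones far above it. The function name, the `factor` and `floor` parameters, and the node names in the usage line are all assumptions made for illustration, not something from the talk.

```python
from statistics import median
from typing import Dict, List


def find_outliers(per_node: Dict[str, float],
                  factor: float = 3.0,
                  floor: float = 1.0) -> List[str]:
    """Return nodes whose value is more than `factor` times the median.

    `floor` (an assumed default) keeps an idle cluster, where the median
    is close to zero, from flagging every node with a tiny queue.
    """
    if not per_node:
        return []
    baseline = max(median(per_node.values()), floor)
    return sorted(node for node, value in per_node.items()
                  if value > factor * baseline)
```

For example, `find_outliers({"rs1": 2, "rs2": 3, "rs3": 40, "rs4": 2})` flags only `rs3`, the "what's up with this one?" machine.
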
  • 35. Metrics. Want to learn more about metrics? See: “Using Metrics to Monitor and Debug Apache HBase” (5:00pm-5:20pm) with Elliott Clark
  • 36. 23:59:60
  • 37. Automation. How fast can you:
      • Change an OS configuration on 100 machines?
      • Kill one process on said machines?
      • Reboot all your machines?
      • Reboot all your machines one by one, with some added configuration changes?
      • Add 10 new fully configured nodes?
  • 38. "Automation" - CSSH: Are you blind yet?
  • 39. Automation - Puppet
  • 40. Automation - Chef
  • 41. Automation - Fabric: $ fab
  • 42. Automation. Common automations:
      • Rolling restart
      • Adding/removing nodes
      • Deploying new configurations
      • Finer re-balancing
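
A rolling restart, the first automation on the list above, boils down to a loop: restart one node, wait until it is healthy again, and only then move on. The sketch below shows that control flow only. The `restart` and `is_healthy` callables are hypothetical hooks you would back with Puppet, Chef, Fabric, or your own scripts, and the retry defaults are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(nodes: Iterable[str],
                    restart: Callable[[str], None],
                    is_healthy: Callable[[str], bool],
                    retries: int = 30,
                    wait_seconds: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Restart nodes one at a time, stopping at the first that stays down."""
    done: List[str] = []
    for node in nodes:
        restart(node)
        for _ in range(retries):
            if is_healthy(node):
                break
            sleep(wait_seconds)
        else:
            # Never take down a second node while the first is still unhealthy.
            raise RuntimeError(f"{node} did not come back; restarted so far: {done}")
        done.append(node)
    return done
```

Stopping at the first failure is the design choice that matters: the remaining nodes keep serving while someone investigates.
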
  • 43. Alerting: HBase is just like any other system you are running, so maybe you've heard of...
  • 44. Alerting - Nagios
  • 45. Alerting - Zabbix
  • 46. Alerting. What to alert on:
      • Previous metrics (call/compaction queues, IO)
      • Network bandwidth
      • Disk space
      • Number of regions
      • SMARTD
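
In Nagios or Zabbix the items above become threshold checks. As a rough sketch of what such a check does, the function below compares one node's metrics against per-metric limits. The metric names and every threshold value are made-up placeholders, not recommendations from the talk; sensible limits depend on your hardware and workload.

```python
from typing import Dict, List

# Hypothetical metric names and limits, for illustration only.
THRESHOLDS: Dict[str, float] = {
    "call_queue_length": 100,
    "compaction_queue_length": 20,
    "io_wait_percent": 30,
    "disk_used_percent": 80,
    "regions_per_server": 200,
}


def check_alerts(metrics: Dict[str, float],
                 thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one message per metric that is missing or over its limit."""
    alerts: List[str] = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that stopped reporting is itself worth an alert.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts
```
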
  • 47. Backup
  • 48. Backup: No, you’re not the only one. Now drop that gun.
  • 49. Backup - Offline. If you can manage to take your cluster offline for possibly an hour:
      1. Shut down HBase
      2. distcp to another cluster/separate folder
      3. Restart HBase
      * It's possible to run a distcp before shutting down; if you do, make sure you run distcp -update -delete for the second step.
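
The offline procedure on slide 49 can be written down as an ordered list of commands. The sketch below only builds that list and does not run anything. It assumes the stock `stop-hbase.sh` and `start-hbase.sh` scripts are on the path (a managed cluster would stop and start HBase through its management tool instead), and the HDFS URLs in the usage line are placeholders.

```python
from typing import List


def offline_backup_plan(source: str, target: str,
                        precopy: bool = True) -> List[List[str]]:
    """Build the command sequence for an offline HBase backup.

    With precopy=True, a first distcp runs while HBase is still up, and the
    copy made after shutdown uses -update -delete so it only syncs what
    changed since then.
    """
    plan: List[List[str]] = []
    if precopy:
        plan.append(["hadoop", "distcp", source, target])
    plan.append(["stop-hbase.sh"])
    copy = ["hadoop", "distcp"]
    if precopy:
        copy += ["-update", "-delete"]
    plan.append(copy + [source, target])
    plan.append(["start-hbase.sh"])
    return plan
```

Calling `offline_backup_plan("hdfs://prod-nn/hbase", "hdfs://backup-nn/hbase")` yields four steps; each inner list can be handed to `subprocess.run` once you have checked it against your own environment.
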
  • 50. Backup - Replication
      1. Create another HBase cluster (can be remote)
      2. Alter the families that need replication
      3. Make sure the same tables exist on the slave cluster
      * Replication isn't done inline with the inserts in the master cluster
      * See "Apache HBase Replication" with Chris Trezzo at 5:20PM
  • 51. Backup - Snapshot
      • Doesn't require copying data
      • Runs in less than 60 seconds
      • Minimal impact on performance
      * See the slides from "Apache HBase Table Snapshots" with Jonathan Hsieh & pals
  • 52. Thank You!