•Distributed hash table
•Get, put, delete, scan, and CAS (check-and-set)
•Denormalization is necessary
•Not a parallel database, just distributed
•Write-ahead log / data durability
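The operations listed above can be illustrated with a minimal, single-process sketch. This is not the HBase client API, just the semantics of the five calls (note how a range scan relies on sorted keys, and how CAS only writes when the current value matches):

```python
# Minimal in-memory sketch of HBase-style key/value operations.
# Class and method names are illustrative, not the real HBase client API.
class KVStore:
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)

    def scan(self, start, stop):
        # Range scan over lexicographically sorted keys, [start, stop)
        return [(k, self._rows[k]) for k in sorted(self._rows) if start <= k < stop]

    def check_and_put(self, key, expected, value):
        # CAS: write only if the current value matches `expected`
        if self._rows.get(key) == expected:
            self._rows[key] = value
            return True
        return False
```

In HBase the equivalent of `check_and_put` is `checkAndPut`, and scans are served from regions holding sorted row ranges; denormalization matters because there are no joins across these simple calls.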
How fast can you:
•Change an OS configuration on 100 machines?
•Kill one process on said machines?
•Reboot all your machines?
•Reboot all your machines one by one, with
some added configuration changes?
•Add 10 new fully configured nodes?
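The "one by one, with configuration changes" case above is a rolling restart. A minimal sketch of the control loop, with stand-in functions where real automation would use ssh/pdsh or a config-management tool (the function names and config keys here are hypothetical):

```python
# Sketch of a rolling restart with a config change, one node at a time,
# so only one node is down at any moment and the cluster stays available.
# `apply_config` and `reboot` are stand-ins for real remote-execution calls.
def rolling_restart(nodes, new_config, apply_config, reboot):
    for node in nodes:
        apply_config(node, new_config)
        reboot(node)

# Example with in-memory stand-ins instead of real remote calls:
state = {}

def apply_config(node, cfg):
    state[node] = {"config": cfg, "rebooted": False}

def reboot(node):
    state[node]["rebooted"] = True

rolling_restart(["n1", "n2", "n3"], {"heap": "8g"}, apply_config, reboot)
```

The point of the questions on this slide is that if steps like these are scripted, each answer is minutes; if they are manual, each is hours.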
Backup - Offline
If you can afford to take your cluster offline for
an hour or so:
1.Shut down HBase
2.distcp the data to another cluster or a separate folder
* It's possible to run a first distcp before shutting down; make sure you run
distcp -update -delete for the second pass.
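The two-pass copy described above might look like this; the NameNode addresses and paths are placeholders for your own cluster layout:

```
# Pass 1: while HBase is still running (the copy may be inconsistent)
hadoop distcp hdfs://active-nn/hbase hdfs://backup-nn/hbase-backup

# Pass 2: after shutting HBase down, catch up changed files and
# remove files that were deleted since the first pass
hadoop distcp -update -delete hdfs://active-nn/hbase hdfs://backup-nn/hbase-backup
```

Doing the bulk of the copy while the cluster is still up is what keeps the actual downtime close to the second, incremental pass.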
Backup - Replication
1.Create another HBase cluster (can be remote)
2.Alter the families that need replication
3.Make sure the same tables exist on the slave cluster
* Replication isn't done inline with the inserts on the master cluster
* See "Apache HBase Replication" with Chris Trezzo at 5:20 PM
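The setup steps above are done from the hbase shell on the master cluster; a rough sketch, where the peer id, ZooKeeper quorum, table, and family names are all placeholders (older HBase versions also require disabling the table before the alter):

```
# Point the master cluster at the slave cluster's ZooKeeper quorum:
add_peer '1', 'slave-zk1,slave-zk2:2181:/hbase'

# Mark each family that should be shipped to the slave:
alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
```

Because edits are shipped asynchronously from the write-ahead log rather than inline with inserts, the slave can lag slightly behind the master.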
Backup - Snapshot
•Doesn't require copying data
•Runs in less than 60 seconds
•Minimal impact on performance
* See the slides from "Apache HBase Table Snapshots" with Jonathan Hsieh
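Snapshots are taken from the hbase shell; a sketch with placeholder table and snapshot names:

```
# Take a snapshot (fast: records file references, copies no data):
snapshot 'mytable', 'mytable-snap-1'

# Later, materialize it as a new table on the same cluster:
clone_snapshot 'mytable-snap-1', 'mytable_restored'

# Or copy it to another cluster for an off-site backup:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot 'mytable-snap-1' -copy-to hdfs://backup-nn/hbase
```

The snapshot itself only records references to existing HFiles, which is why it needs no data copy and finishes in seconds; data movement only happens if you later export or clone it.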