Data is flying around and HBase is just chugging along. You're adding servers weekly – daily? – to handle the excess capacity; life is good. But wait: one of your DBAs fat-fingers a command and deletes a table, a column family, the whole database. Or maybe your devs want to test out some new features – not on my production server! Or a customer makes a mistake and wants to get back to last Tuesday at 6 PM.
HBase has been around for a few years and, well, these aren't exactly new problems.
OK, if you've thought about this problem for at least 5 minutes, you've probably seen these before. You are probably even running them already.
OK, we can do better…
Just get a list of all the hfiles/hlogs and copy them over. Use hardlinks to ensure that we capture the same state for the table. This is getting better – we aren't directly impacting the cluster (except for bandwidth).
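The hardlink idea can be sketched in a few lines. This is a toy local-filesystem illustration (not HBase code, and the directory names are made up for the example) of capturing a table's on-disk state by linking to its files instead of copying the bytes:

```python
import os

def snapshot_by_hardlink(table_dir, snapshot_dir):
    """Capture the current set of store files without copying data.

    Each hardlink shares the underlying bytes with the original file,
    so even if the source file is later compacted away or deleted,
    the snapshot's link keeps the data alive (the filesystem does the
    reference counting for us).
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    for name in os.listdir(table_dir):
        src = os.path.join(table_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
```

Note that HDFS itself has no native hardlinks, which is exactly why HBASE-50 had to simulate them with reference counting in META.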
General trend down the stack – more knowledge of individual files, layout in HDFS, low-level functionality. Also trending towards a minimal impact on the running cluster – only take the hit on the wire, not through the HBase layer. HBASE-50: internal hardlinks using reference counting in META; a massive patch including restore, offline, and online snapshots. WAY too much to review.
And for a few years people were really sad and made do with existing tooling. We are starting to run HBase in some large companies, though, and they have stringent data requirements.
Story-ize the problem
Focus on TADA of the snapshots
Imagine you have 1000 servers, each with in-memory state. How would you save it? How would you save it fast? Any problems?
Example of stronger guarantees than HBase – currently, we only support transactions on a single row on a single server. This gives you a semi-omniscient view over all servers hosting a table – full cross-server consensus over multiple rows. WAY more than HBase gives you now.
Guarantee that all writes are filtered on a timestamp, flushing on the regionserver so all the information in the snapshot is present entirely in HFiles – NO WAL REPLAY!
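The timestamp-consistency rule above can be sketched as a toy filter (illustrative Python, not HBase internals): every write carries a timestamp, and the snapshot simply excludes anything newer than the snapshot's own timestamp, so each server can flush and report independently:

```python
def timestamp_consistent_view(cells, snapshot_ts):
    """Return only the cells visible at snapshot_ts.

    cells: iterable of (row, value, write_ts) tuples.
    Writes stamped after the snapshot timestamp are filtered out,
    so every server agrees on the same point-in-time view without
    coordinating with the others.
    """
    return [(row, value, ts) for (row, value, ts) in cells if ts <= snapshot_ts]
```

Combined with a flush on each regionserver, everything the filter admits already lives in HFiles, which is what makes restore possible with no WAL replay.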
Built-in
• Export
  – MapReduce job against HBase API
  – Output to single sequence file
• Copy Table
  – MapReduce job against HBase API
  – Output to another table
Yay
• Simple
• Heavily tested
• Can do point-in-time
Boo
• Slow
• High impact on running cluster
Replication
• Export all changes by tailing WAL
Yay
• Simple
• Gets all edits
• Minimal impact on running cluster
Boo
• Must be turned on from the beginning
• Can't turn it off and catch up later
• No built-in point-in-time
• Still need an ETL process to get multiple copies
(Facebook) Solution!¹ Mozilla did something similar²
1. issues.apache.org/jira/browse/HBASE-5509
2. github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java
Facebook Backup
• Copy existing hfiles, hlogs
Yay
• Through HDFS – doesn't impact running cluster
• Fast – distcp is 100% faster than M/R through HBase
Boo
• Not widely used
• Requires hardlinks
• Recovery requires WAL replay
• Point-in-time needs a filter
Backup through the ages
• HBase layer: Export, Copy Table, Replication, HBASE-50
• HDFS layer: Facebook Backup
Snapshot Types
• Offline – table is already disabled
• Globally consistent – consistent across all servers
• Timestamp consistent – point-in-time according to each server
Offline Snapshots
• Table is already disabled
• Requires minimal log replay
  – Especially if table is cleanly disabled
• State of the table when disabled
• Don't need to worry about changing state
Yay
• Fast!
• Simple!
Globally Consistent Snapshots
• All regions block writes until everyone agrees to snapshot
  – Two-phase commit-ish
• Time-bound to prevent infinite blocking
  – Unavailability SLA maintained per region
• No flushing – it's fast!
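The "two-phase commit-ish" flow with a time bound can be sketched like this – a simplified single-process model (hypothetical names, not the actual HBase coordination code) that shows why the snapshot aborts instead of blocking writes indefinitely:

```python
import time

def globally_consistent_snapshot(prepare_fns, timeout, clock=time.monotonic):
    """Two-phase-commit-ish snapshot with a time bound.

    prepare_fns: one callable per region; each blocks writes on its
    region and returns once the region is ready to snapshot.
    If every region prepares before the deadline we 'commit'
    (return True); otherwise we abort (return False) so that no
    region blocks writes past the unavailability SLA.
    """
    deadline = clock() + timeout
    for prepare in prepare_fns:      # phase 1: prepare (writes blocked)
        prepare()
        if clock() > deadline:       # the slowest region sets the pace
            return False             # abort: release writes, no snapshot
    return True                      # phase 2: commit on every region
```

The abort path is the whole point of the time bound: one slow region costs you the snapshot attempt, not cluster availability.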
Cross-Server Consistency Problems
• General distributed coordination problems
  – Block writes while waiting for all regions
  – Limited by the slowest region
  – More servers = higher P(failure)
• Stronger guarantees than currently in HBase
• Requires WAL replay to restore the table