HBase Snapshots
HBase User Group Meetup
10/29/12



Jesse Yates
So you wanna….
• Prevent data loss

• Recover to a point in time

• Backup your data

• Sandbox copy of data
Problem!
a BIG Problem…
• Petabytes of data

• 100’s of servers

• At a single point in time

• Millions of writes per second
Solution!
Solutions!
(Obvious) Solutions!
Built-in
• Export
    – MapReduce job against the HBase API
    – Output to sequence files in HDFS

• Copy Table
    – MapReduce job against the HBase API
    – Output to another table
      (example invocations below)

Yay
• Simple
• Heavily tested
• Can do point-in-time

Boo
• Slow
• High impact for running cluster
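
For concreteness, both built-in tools are launched from the command line as MapReduce jobs. A rough sketch, where the table name, output path, and time range are invented placeholders:

    # Export a table to sequence files in HDFS, optionally limited by version count and time range
    hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable 1 1351000000000 1351500000000

    # Copy a table into another table, optionally on a different cluster
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=mytable_copy --peer.adr=backup-zk:2181:/hbase mytable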
(Less Obvious) Solution!
Replication
• Export all changes by tailing WAL

YAY
• Simple
• Gets all edits
• Minimal impact on running cluster

Boo
• Must be turned on from the beginning
• Can’t turn it off and catch up
• No built-in point-in-time
• Still need ETL process to get multiple copies
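
Turning replication on looks roughly like the sketch below. The peer id, ZooKeeper quorum, and table/family names are made up; on 0.92/0.94, hbase.replication also has to be set to true in hbase-site.xml on both clusters.

    # HBase shell, on the source cluster
    add_peer '1', 'backup-zk1,backup-zk2,backup-zk3:2181:/hbase'
    # Mark the column family for replication (disable/enable needed unless online schema change is on)
    disable 'mytable'
    alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
    enable 'mytable'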
(Facebook) Solution!1
                    Mozilla did something similar2




1. issues.apache.org/jira/browse/HBASE-5509
2. github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java
Facebook Backup
• Copy existing hfiles, hlogs

Yay
• Through HDFS
   – Doesn’t impact running cluster
• Fast
   – distcp is 100% faster than M/R through HBase

Boo
• Not widely used
• Requires Hardlinks
• Recovery requires WAL replay
• Point-in-time needs filter
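
The heart of this approach is an HDFS-level copy, in the spirit of the sketch below (NameNode addresses and paths are invented). On a live table a raw copy like this is not by itself consistent, which is exactly why the hardlinks, WAL replay, and point-in-time filter above come into the picture.

    # Copy the table's HFiles and the write-ahead logs straight out of HDFS (0.92/0.94-era layout)
    hadoop distcp hdfs://prod-nn:8020/hbase/mytable hdfs://backup-nn:8020/backups/mytable-20121029
    hadoop distcp hdfs://prod-nn:8020/hbase/.logs   hdfs://backup-nn:8020/backups/mytable-20121029-logs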
Backup through the ages
[Diagram: the approaches plotted from the HBase layer down to the HDFS layer – Export, Copy Table, and Replication go through the HBase API, while the Facebook backup and HBASE-50 work directly against files in HDFS.]
Maybe this is harder than we thought…
We did some work…
Hardlink workarounds
• HBASE-5547
  – Move deleted hfiles to .archive directory


• HBASE-6610
  – FileLink: equivalent to Windows link files



Enough to get started….
Difficulties
• Coordinating many servers

• Minimizing unavailability

• Minimize time to restore

• Gotta be fast
HBASE-6055
 HBASE-50
Snapshots
• Fast
  – Zero-copy of files
• Point-in-time semantics
  – Part of how it's built
• Built-in recovery
  – Make a table from a snapshot
• SLA enforcement
  – Guaranteed max unavailability
Snapshots?
We’ve got a couple of those…
Snapshot Types
• Offline
  – Table is already disabled


• Globally consistent
  – Consistent across all servers


• Timestamp consistent
  – Point-in-time according to each server
Offline Snapshots
• Table is already disabled
• Requires minimal log replay
  – Especially if table is cleanly disabled
• State of the table when disabled
• Don’t need to worry about changing state

YAY
• Fast!
• Simple!
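
In the shell (as the feature eventually shipped; table and snapshot names below are placeholders), an offline snapshot is just:

    disable 'mytable'
    snapshot 'mytable', 'mytable-offline-20121029'
    enable 'mytable'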
But I can’t take my table offline!
Globally Consistent Snapshots
• All regions block writes until everyone agrees
  to snapshot
  – Two-phase commit-ish


• Time-bound to prevent infinite blocking
  – Unavailability SLA maintained per region


• No flushing – it's fast!
What could possibly go wrong?
Cross-Server Consistency Problems
• General distributed coordination problems
  – Block writes while waiting for all regions
  – Limited by slowest region
  – More servers = higher P(failure)

• Stronger guarantees than currently in HBase

• Requires WAL replay to restore table
I don’t need all that,
what else do you have?
Timestamp Consistent Snapshots
• All writes up to a TS are in the snapshot

• Leverages existing flush functionality

• Doesn’t block writes

• No WAL replay on recovery
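
This flush-based flavor is what the online snapshot command ended up doing by default in released versions; assuming a hypothetical table name:

    # Table stays enabled; regions flush so the snapshot is entirely in HFiles – no WAL replay on restore
    snapshot 'mytable', 'mytable-snap-20121029'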
Timestamp Consistent?
[Diagram: Put/Get/Delete/Mutate/etc. hit the MemStore; "timestamp in snapshot?" decides where each write goes – yes: the snapshot store, no: the future store.]
I’ve got a snapshot,
     now what?
Recovery
• Export snapshot
  – Send snapshot to another cluster


• Clone snapshot
  – Create new table from snapshot


• Restore table
  – Rollback table to specific state
Export Snapshot
• Copy a full snapshot to another cluster
  – All required HFiles/Hlogs
  – Lots of options


• Fancy dist-cp
  – Fast!
  – Minimal impact on running cluster
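
As shipped, the export is driven by a MapReduce tool along these lines (the snapshot name, destination cluster, and mapper count are placeholders):

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot mytable-snap-20121029 \
      -copy-to hdfs://backup-nn:8020/hbase \
      -mappers 16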
Clone Table
• New table from snapshot

• Create multiple tables from same snapshot

• Exact replica at the point-in-time

• Full Read/Write on new table
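
A one-liner in the shell (snapshot and new table names are placeholders):

    clone_snapshot 'mytable-snap-20121029', 'mytable_sandbox'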
Restore
• Replace existing table with snapshot

• Snapshots the current table first, just in case

• Minimal overhead
  – Handles creating/deleting regions
  – Fixes META for you
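
Restore requires the table to be disabled first; a sketch with placeholder names:

    disable 'mytable'
    restore_snapshot 'mytable-snap-20121029'
    enable 'mytable'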
Whew, that’s a lot!
Even more awesome!
Goodies
• Full support in shell

• Distributed Coordination Framework

• ‘Ragged Backup’ added along the way

• Coming in next CDH

• Backport to 0.94?
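
Beyond snapshot / clone_snapshot / restore_snapshot shown earlier, the shell support also covers housekeeping, for example (snapshot name is a placeholder):

    list_snapshots
    delete_snapshot 'mytable-snap-20121029'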
Special thanks!
• Matteo Bertozzi
  – All the recovery code
  – Shell support

• Jon Hsieh
  – Distributed Two-Phase Commit refactor

• All our reviewers…
  – Stack, Ted Yu, Jon Hsieh, Matteo
Thanks!
   Questions?

         Jesse Yates
      @jesse_yates
jesse.k.yates@gmail.com

Editor's Notes

  • #3 Data is flying around, HBase is just chugging along. You're adding servers weekly – daily? – to handle the excess capacity; life is good. But wait: one of your DBAs fat-fingers a command and deletes a table, a column family, the database. Or maybe your devs want to test out some new features – not on my production server! Or a customer makes a mistake and wants to get back to last Tuesday at 6 PM.
  • #6 HBase has been around for a few years and well, these aren’t exactly new problems.
  • #8 OK, if you've thought about this problem for at least 5 minutes, you've probably seen these before. You are probably even running them already.
  • #9 Ok, we can do better…
  • #13 Just get a list of all the hfiles/hlogs and copy them over. Use hardlinks to ensure that we have the same state for the table. This is getting better – we aren't directly impacting the cluster (except for bandwidth).
  • #14 General trend down the stack – more knowledge of individual files, layout in HDFS, low-level functionality. Also trending towards a minimal impact on the running cluster – only take the hit on the wire, not through the HBase layer. HBASE-50: internal hardlinks using reference counting in META; a massive patch including restore, offline and online snapshots. WAY too much to review.
  • #16 And for a few years people were really sad and made do with existing tooling. We are starting to run HBase in some large companies, though, and they have stringent data requirements.
  • #19 Story-ize the problem
  • #21 Focus on TADA of the snapshots
  • #28 Imagine you have 1000 servers, each with in memory state. How would you save it? How would you save it fast? Any problems?
  • #29 Example of stronger guarantees than HBase: currently, we only support transactions on a single row on a single server. This gives you a semi-omniscient view over all servers hosting a table – full cross-server consensus over multiple rows. WAY more than HBase gives you now.
  • #31 Guarantee that all writes are filtered on a timestamp, flushing on the regionserver so all the information in the snapshot is present entirely in HFiles – NO WAL REPLAY!
  • #40 http://www.flickr.com/photos/69382656@N04/6744068967/in/photostream