Jesse Yates: Hbase snapshots patch


  • Data is flying around; HBase is just chugging along. You’re adding servers weekly – daily? – to handle the excess capacity; life is good. But wait: one of your DBAs fat-fingers a command and deletes a table, a column family, the whole database. Or maybe your devs want to test out some new features – not on my production server! Or a customer makes a mistake and wants to get back to last Tuesday at 6 PM.
  • HBase has been around for a few years and, well, these aren’t exactly new problems.
  • OK, if you’ve thought about this problem for at least 5 minutes, you’ve probably seen these before. You’re probably even running them already.
  • OK, we can do better…
  • Just get a list of all the HFiles/HLogs and copy them over. Use hardlinks to ensure that we have the same state for the table. This is getting better – we aren’t directly impacting the cluster (except for bandwidth).
  • General trend down the stack – more knowledge of individual files, layout in HDFS, low-level functionality. Also trending towards a minimal impact on the running cluster – only take the hit on the wire, not through the HBase layer. HBASE-50: internal hardlinks using reference counting in META; a massive patch including restore, offline, and online snapshots. WAY too much to review.
  • And for a few years people were really sad and made do with existing tooling. We are starting to run HBase in some large companies though, with stringent data requirements.
  • Story-ize the problem
  • Focus on TADA of the snapshots
  • Imagine you have 1000 servers, each with in-memory state. How would you save it? How would you save it fast? Any problems?
  • Example of stronger guarantees than HBase offers – currently, we only support transactions on a single row on a single server. This gives you a semi-omniscient view over all servers hosting a table – full cross-server consensus over multiple rows. WAY more than HBase gives you now.
  • Guarantee that all writes are filtered on a timestamp, flushing on the region server so all the information in the snapshot is present entirely in HFiles – NO WAL REPLAY!

    1. HBase Snapshots – HBase User Group Meetup, 10/29/12 – Jesse Yates
    2. So you wanna….
       • Prevent data loss
       • Recover to a point in time
       • Back up your data
       • Sandbox copy of data
    3. Problem!
    4. a BIG Problem…
       • Petabytes of data
       • 100’s of servers
       • At a single point in time
       • Millions of writes per second
    5. Solution!
    6. Solutions!
    7. (Obvious) Solutions!
    8. Built-in
       • Export – MapReduce job against the HBase API – output to a single sequence file
       • Copy Table – MapReduce job against the HBase API – output to another table
       Yay
       • Simple
       • Heavily tested
       • Can do point-in-time
       Boo
       • Slow
       • High impact on a running cluster
    9. (Less Obvious) Solution!
    10. Replication
       • Export all changes by tailing the WAL
       Yay
       • Simple
       • Gets all edits
       • Minimal impact on the running cluster
       Boo
       • Must be turned on from the beginning
       • Can’t turn it off and catch up
       • No built-in point-in-time
       • Still need an ETL process to get multiple copies
    11. (Facebook) Solution!¹
        ¹ Mozilla did something similar
    12. Facebook Backup
       • Copy existing HFiles, HLogs
       Yay
       • Through HDFS – doesn’t impact the running cluster
       • Fast – distcp is 100% faster than M/R through HBase
       Boo
       • Not widely used
       • Requires hardlinks
       • Recovery requires WAL replay
       • Point-in-time needs a filter
    13. Backup through the ages
        Export · Copy Table · Replication
        HBase: HBASE-50
        HDFS:  Facebook
    14. Maybe this is harder than we thought…
    15. We did some work…
    16. Hardlink workarounds
       • HBASE-5547 – Move deleted HFiles to a .archive directory
       • HBASE-6610 – FileLink: equivalent to Windows link files
       Enough to get started….
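    The HBASE-5547 idea above is easy to sketch: rather than deleting an HFile (which would break any snapshot still referencing it), the file is moved into a parallel `.archive` tree that mirrors the live table layout. A minimal illustrative sketch in Python – not the real HBase code; the path layout and function name are invented:

    ```python
    import os
    import shutil

    def archive_hfile(root, rel_path):
        """Move root/rel_path under root/.archive/ instead of deleting it.

        A snapshot that references the file by relative path can still find it,
        because the layout under .archive mirrors the live layout.
        """
        src = os.path.join(root, rel_path)
        dst = os.path.join(root, ".archive", rel_path)
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # build mirror dirs lazily
        shutil.move(src, dst)
        return dst
    ```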
    17. Difficulties
       • Coordinating many servers
       • Minimizing unavailability
       • Minimizing time to restore
       • Gotta be fast
    18. HBASE-6055   HBASE-50
    19. Snapshots
       • Fast – zero-copy of files
       • Point-in-time semantics – part of how it’s built
       • Built-in recovery – make a table from a snapshot
       • SLA enforcement – guaranteed max unavailability
    20. Snapshots?
    21. We’ve got a couple of those…
    22. Snapshot Types
       • Offline – table is already disabled
       • Globally consistent – consistent across all servers
       • Timestamp consistent – point-in-time according to each server
    23. Offline Snapshots
       • Table is already disabled
       • Requires minimal log replay – especially if the table was cleanly disabled
       • State of the table when disabled
       • Don’t need to worry about changing state
       YAY
       • Fast!
       • Simple!
    24. But I can’t take my table offline!
    25. Globally Consistent Snapshots
       • All regions block writes until everyone agrees to snapshot – two-phase commit-ish
       • Time-bound to prevent infinite blocking – unavailability SLA maintained per region
       • No flushing – it’s fast!
    26. What could possibly go wrong?
    27. Cross-Server Consistency Problems
       • General distributed coordination problems – block writes while waiting for all regions – limited by the slowest region – more servers, higher P(failure)
       • Stronger guarantees than currently in HBase
       • Requires WAL replay to restore the table
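    Slides 25–27 describe a “two-phase commit-ish” protocol with a time bound: every region must acknowledge the prepare phase before a deadline, otherwise the whole snapshot aborts and writes resume. A toy coordinator loop, assuming hypothetical per-region `prepare`/`commit`/`abort` callbacks – none of this is HBase’s actual implementation:

    ```python
    import time

    def run_snapshot(regions, prepare, commit, abort, max_wait):
        """Time-bounded two-phase commit sketch; returns True on success."""
        deadline = time.monotonic() + max_wait
        prepared = []
        for region in regions:            # phase 1: ask every region to block writes
            if time.monotonic() > deadline or not prepare(region):
                for r in prepared:        # timed out or refused: roll everyone back
                    abort(r)
                return False
            prepared.append(region)
        for region in prepared:           # phase 2: everyone agreed, take the snapshot
            commit(region)
        return True
    ```

    The deadline is what keeps the unavailability SLA: a single slow region can only hold everyone else’s writes hostage for `max_wait`.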
    28. I don’t need all that, what else do you have?
    29. Timestamp Consistent Snapshots
       • All writes up to a TS are in the snapshot
       • Leverages existing flush functionality
       • Doesn’t block writes
       • No WAL replay on recovery
    30. Timestamp Consistent?
    31. Put/Get/Delete/Mutate/etc. → MemStore → “timestamp in snapshot?” – Yes → Snapshot Store; No → Future Store
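    The flow on slide 31 can be sketched as a toy memstore that routes each write by timestamp: cells at or below the snapshot timestamp land in the snapshot store (and will be flushed into the snapshot’s HFiles), later cells land in the future store. Class and field names are invented for illustration; this is not HBase’s MemStore:

    ```python
    class MemStore:
        """Toy memstore split for a timestamp-consistent snapshot."""

        def __init__(self, snapshot_ts):
            self.snapshot_ts = snapshot_ts
            self.snapshot_store = []   # flushed into the snapshot's HFiles
            self.future_store = []     # stays live after the snapshot completes

        def put(self, row, value, ts):
            cell = (row, value, ts)
            if ts <= self.snapshot_ts:     # "timestamp in snapshot?" – Yes
                self.snapshot_store.append(cell)
            else:                          # No – belongs to the future store
                self.future_store.append(cell)
    ```

    Because the split happens at write time, nothing ever blocks: the snapshot side can be flushed to disk while new writes keep flowing into the future store.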
    32. I’ve got a snapshot, now what?
    33. Recovery
       • Export snapshot – send a snapshot to another cluster
       • Clone snapshot – create a new table from a snapshot
       • Restore table – roll back a table to a specific state
    34. Export Snapshot
       • Copy a full snapshot to another cluster – all required HFiles/HLogs – lots of options
       • Fancy dist-cp – fast! – minimal impact on the running cluster
    35. Clone Table
       • New table from a snapshot
       • Create multiple tables from the same snapshot
       • Exact replica at the point-in-time
       • Full read/write on the new table
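    Slide 19’s “zero-copy” claim and the clone on slide 35 come together in the FileLink idea: a cloned table’s directory holds tiny link files that name the original HFiles instead of copying their data, so cloning is cheap no matter how big the table is. The layout below is invented for illustration and is not the real HBase on-disk format:

    ```python
    import os

    def clone_table(root, snapshot_files, new_table):
        """Create new_table as link files pointing at existing HFile paths."""
        table_dir = os.path.join(root, new_table)
        os.makedirs(table_dir, exist_ok=True)
        for hfile in snapshot_files:
            link = os.path.join(table_dir, os.path.basename(hfile) + ".link")
            with open(link, "w") as f:
                f.write(hfile)     # the link records only the original location
        return table_dir
    ```

    Reads on the clone would follow the link to the shared HFile; new writes go into the clone’s own files, which is why the clone is fully read/write without touching the snapshot.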
    36. Restore
       • Replace an existing table with a snapshot
       • Snapshots the current table, just in case
       • Minimal overhead – handles creating/deleting regions – fixes META for you
    37. Whew, that’s a lot!
    38. Even more awesome!
    39. Goodies
       • Full support in the shell
       • Distributed coordination framework
       • ‘Ragged backup’ added along the way
       • Coming in the next CDH
       • Backport to 0.94?
    40. Special thanks!
       • Matteo Bertozzi – all the recovery code, shell support
       • Jon Hsieh – distributed two-phase commit refactor
       • All our reviewers – Stack, Ted Yu, Jon Hsieh, Matteo
    41. Thanks! Questions? Jesse Yates