HBase Incremental Backup / Restore (2012/07/23)
How to perform Incremental Backup/Restore?

• HBase ships with a handful of useful tools:
  – CopyTable
  – Export / Import
CopyTable

• Purpose:
  – Copy part or all of a table, either to the same cluster or to another cluster
• Usage:
  – bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
• Options:
  – starttime: Beginning of the time range.
  – endtime: End of the time range. If endtime is omitted, the range runs from starttime to forever.
  – new.name: Name of the new (destination) table.
  – peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  – families: Comma-separated list of column families to copy.
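As a concrete sketch, a 24-hour incremental CopyTable run could be assembled like this. The table name, backup table name, and ZooKeeper quorum below are hypothetical placeholders, not from the slides; HBase timestamps are epoch milliseconds. The command is only echoed so it can be reviewed before running:

```shell
# Incremental CopyTable sketch: copy the last 24 hours of edits from
# 'usertable' to a backup table on a peer cluster.
# 'usertable', 'usertable_backup', and 'zk1,zk2,zk3' are placeholders.
ENDTIME=$(( $(date +%s) * 1000 ))                # now, in epoch milliseconds
STARTTIME=$(( ENDTIME - 24 * 60 * 60 * 1000 ))  # 24 hours earlier

CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=${STARTTIME} --endtime=${ENDTIME} \
  --new.name=usertable_backup \
  --peer.adr=zk1,zk2,zk3:2181:/hbase \
  usertable"

# Echo for review; run with: eval "$CMD"
echo "$CMD"
```

The start/end options restrict the underlying scan to cells whose timestamps fall inside the window, which is what makes the copy incremental rather than full.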
CopyTable (cont.)

• Limitation
  – Can only back up to another table (Scan + Put).
  – While a CopyTable job is running, rows may be inserted or updated concurrently, and these concurrent edits can leave the copy inconsistent.
Export

• Purpose:
  – Dump the contents of a table to HDFS as a sequence file
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<starttime> [<endtime>]]
• Options:
  – *tablename: The name of the table to export
  – *outputdir: The location in HDFS to store the exported data
  – starttime: Beginning of the time range
  – endtime: The matching end time for the time range of the scan
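For example, a one-day window could be exported as follows. The table name, output path, and literal timestamps are hypothetical; HBase timestamps are epoch milliseconds, and the command is echoed for review rather than executed:

```shell
# Export sketch: dump edits of 'usertable' (hypothetical name) written
# during one day into an HDFS directory (hypothetical path).
STARTTIME=1343001600000   # 2012-07-23 00:00:00 UTC, in milliseconds
ENDTIME=1343088000000     # 2012-07-24 00:00:00 UTC

CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.Export usertable /backup/usertable/2012/07/23 ${STARTTIME} ${ENDTIME}"

# Echo for review; run with: eval "$CMD"
echo "$CMD"
```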
Export (cont.)

• Limitation
  – Can only back up to HDFS as a sequence file (Scan + write to HDFS).
  – While an Export job is running, rows may be inserted or updated concurrently, and these concurrent edits can leave the exported data inconsistent.
Import

• Purpose:
  – Load data that has been exported back into HBase
• Usage
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
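A matching restore sketch, assuming a dated export layout (the path and table name are hypothetical and must point at an existing Export output directory):

```shell
# Restore sketch: re-import one previously exported day of 'usertable'.
# RESTORE_DIR is a hypothetical path following a /table/YYYY/MM/DD layout.
RESTORE_DIR="/backup/usertable/2012/07/01"

CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.Import usertable ${RESTORE_DIR}"

# Echo for review; run with: eval "$CMD"
echo "$CMD"
```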
Conclusion

• Regular (e.g. daily) incremental backup
  – Use Export and organize the output dir as a meaningful hierarchy:
      /table_name
        /2012 (year)
          /07 (month)
            /01 (day)
              /01 (hour)
              …
              /24
            /02
            …
            /31
  – Perform Import to restore data on demand
• To reduce the overhead, don't perform backups during peak time
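The daily scheme above might be scripted roughly as follows. The table name, base path, and cron schedule are assumptions, and the command is echoed for review rather than executed:

```shell
# Daily incremental backup sketch following the slide's layout:
# /table_name/YYYY/MM/DD. 'usertable' and the schedule are hypothetical.
TABLE="usertable"
END_MS=$(( $(date +%s) * 1000 ))    # now, in epoch milliseconds
START_MS=$(( END_MS - 86400000 ))   # 24 hours earlier
OUTDIR="/${TABLE}/$(date +%Y/%m/%d)"

CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.Export ${TABLE} ${OUTDIR} ${START_MS} ${END_MS}"

# Echo for review; schedule via cron during off-peak hours, e.g. 0 3 * * *
echo "$CMD"
```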
Questions?