How to perform Incremental Backup/Restore?
• HBase ships with a handful of useful tools
  – CopyTable
  – Export / Import

CopyTable
• Purpose:
  – Copy part or all of a table, either to the same cluster or to another cluster
• Usage:
  – bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
• Options:
  – starttime: Beginning of the time range.
  – endtime: End of the time range. If omitted, the range runs from starttime to forever.
  – new.name: New table's name.
  – peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  – families: Comma-separated list of ColumnFamilies to copy.
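
As an illustration of the options above, here is a minimal sketch that assembles a CopyTable command for a time window. It only prints the command, it does not run it. The helper name `copytable_cmd`, the table names, and the peer address are hypothetical placeholders. The sketch assumes cell timestamps are the default milliseconds since the epoch, so it converts from epoch seconds.

```shell
#!/bin/sh
# Sketch only: prints a CopyTable command instead of running it.
# copytable_cmd, the table names and the peer address are hypothetical.

# args: table start_s end_s [new_name] [peer_adr]
# start_s / end_s are epoch seconds; they are converted to the millisecond
# timestamps that HBase assigns to cells by default.
copytable_cmd() (
  cmd="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable"
  cmd="$cmd --starttime=$(( $2 * 1000 )) --endtime=$(( $3 * 1000 ))"
  if [ -n "${4:-}" ]; then cmd="$cmd --new.name=$4"; fi
  if [ -n "${5:-}" ]; then cmd="$cmd --peer.adr=$5"; fi
  echo "$cmd $1"
)

# Copy the edits of 2012-07-01 (UTC) into a table on a peer cluster.
copytable_cmd usertable 1341100800 1341187200 usertable_copy "zk1,zk2,zk3:2181:/hbase"
```

Once the printed command looks right, it can be pasted into a shell on a machine with the HBase client installed. The time range should include starttime and exclude endtime, so consecutive windows that share a boundary should not copy the same cell twice.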

CopyTable (cont.)
• Limitation
  – Can only back up to another table (Scan + Put)
  – Rows may be inserted or updated while a CopyTable job is running, and these concurrent edits may leave the copy inconsistent.

Export
• Purpose:
  – Dump the contents of a table to HDFS as a sequence file
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
• Options (* = required):
  – *tablename: The name of the table to export
  – *outputdir: The location in HDFS to store the exported data
  – versions: Number of cell versions to export (default: 1)
  – starttime: Beginning of the time range
  – endtime: End of the time range used for the scan
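
A time-bounded Export might be assembled as in the sketch below, which prints the command rather than running it. The helper name `export_cmd`, the table name, and the output path are hypothetical. Note that the Export tool takes a versions count as the third positional argument, ahead of starttime and endtime, so a time range cannot be given without it. Run Export with no arguments to confirm the usage for the HBase release in use.

```shell
#!/bin/sh
# Sketch only: prints an Export command instead of running it.
# export_cmd, the table name and the output directory are hypothetical.

# args: table outputdir start_s end_s [versions]
# start_s / end_s are epoch seconds, converted to milliseconds.
# versions defaults to 1 and must be passed before the time range.
export_cmd() (
  versions=${5:-1}
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export $1 $2 $versions $(( $3 * 1000 )) $(( $4 * 1000 ))"
)

# Dump the edits of 2012-07-01 (UTC) to a directory in HDFS.
export_cmd usertable /backup/usertable/2012/07/01 1341100800 1341187200
```

The output directory should not exist yet, because the underlying MapReduce job normally refuses to write into an existing directory.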

Export (cont.)
• Limitation
  – Can only back up to a sequence file in HDFS (Scan + Write to HDFS).
  – Rows may be inserted or updated while an Export job is running, and these concurrent edits may leave the dump inconsistent.

Import
• Purpose:
  – Load data that has been exported back into HBase
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>

Conclusion
• Regular (e.g. daily) incremental backup
  – Use Export and organize the output dir as a meaningful hierarchy
    • /table_name
        /2012 (year)
          /07 (month)
            /01 (date)
              /01 (hour)
              …
              /24
            /02
            …
            /31
  – Perform Import to restore data on demand
• To reduce the overhead, don’t run backups during peak time
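
The scheme above can be sketched as a pair of helpers: one prints the Export command for a single day into the year/month/date hierarchy, the other prints one Import command per directory to restore. Everything named here (`BACKUP_ROOT`, `backup_dir`, `daily_export_cmd`, `restore_cmds`, the table name) is a hypothetical placeholder. The sketch assumes default millisecond cell timestamps and the `<versions>` positional argument that Export expects before the time range.

```shell
#!/bin/sh
# Sketch only: prints the backup and restore commands instead of running them.
# BACKUP_ROOT, the function names and the table name are hypothetical.

BACKUP_ROOT=${BACKUP_ROOT:-/backup}

# args: table yyyy mm dd -> HDFS directory for that day's export.
backup_dir() {
  printf '%s/%s/%s/%s/%s\n' "$BACKUP_ROOT" "$1" "$2" "$3" "$4"
}

# args: table yyyy mm dd day_start_s
# day_start_s is the epoch second at which the day begins; the window
# covers the following 24 hours, with 1 version per cell.
daily_export_cmd() (
  start_ms=$(( $5 * 1000 ))
  end_ms=$(( ($5 + 86400) * 1000 ))
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export $1 $(backup_dir "$1" "$2" "$3" "$4") 1 $start_ms $end_ms"
)

# args: table dir... -> one Import per exported directory.
# Assumes the target table already exists with the same column families.
restore_cmds() (
  table=$1
  shift
  for dir in "$@"; do
    echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Import $table $dir"
  done
)

# Back up 2012-07-01 (UTC), then restore two consecutive days.
daily_export_cmd usertable 2012 07 01 1341100800
restore_cmds usertable "$(backup_dir usertable 2012 07 01)" "$(backup_dir usertable 2012 07 02)"
```

A cron entry scheduled outside peak hours could derive the date parts and the start second from `date` and run the printed Export command. An hourly level can be added the same way by appending an hour component to the path and using a 3600-second window.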