HBase Incremental Backup / Restore
2012/07/23
How to perform Incremental Backup/Restore?

• HBase ships with a handful of useful tools
  – CopyTable
  – Export / Import
CopyTable

• Purpose:
  – Copy part of or all of a table, either to the same cluster or
    another cluster
• Usage:
  – bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X]
    [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename

• Options:
  – starttime: Beginning of the time range.
  – endtime: End of the time range. If omitted, the range extends from
    starttime onward indefinitely.
  – new.name: New table's name.
  – peer.adr: Address of the peer cluster, given in the format
    hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  – families: Comma-separated list of ColumnFamilies to copy.
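As a sketch, a daily incremental copy might be assembled like this. The table name (`usertable`), backup table name, and ZooKeeper quorum are placeholders, and the command is echoed rather than executed since it needs a live cluster; HBase timestamps are epoch milliseconds.

```shell
# Hypothetical example: copy the last 24 hours of edits from "usertable"
# into a backup table on a peer cluster. All names are placeholders.
NOW_MS=$(( $(date +%s) * 1000 ))      # now, in epoch milliseconds
START_MS=$(( NOW_MS - 86400000 ))     # 24 hours earlier

# Build the CopyTable invocation (echoed, not run):
CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable"
CMD="$CMD --starttime=$START_MS --endtime=$NOW_MS"
CMD="$CMD --new.name=usertable_backup"
CMD="$CMD --peer.adr=zk1,zk2,zk3:2181:/hbase usertable"
echo "$CMD"
```

Dropping `--peer.adr` copies within the same cluster; dropping `--endtime` copies everything from `--starttime` onward.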
CopyTable (cont.)

• Limitation
  – Can only back up to another table (Scan + Put)
  – While CopyTable is running, rows may be inserted or updated
    concurrently, and these concurrent edits may leave the copy
    inconsistent.
Export

• Purpose:
  – Dump the contents of a table to HDFS as a sequence file
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename>
    <outputdir> [<starttime> [<endtime>]]

• Options:
  –   *tablename: The name of the table to export
  –   *outputdir: The location in HDFS to store the exported data
  –   starttime: Beginning of the time range
  –   endtime: End of the time range for the scan
Export (cont.)

• Limitation
  – Can only back up to HDFS as a sequence file (Scan + write to
    HDFS).
  – While Export is running, rows may be inserted or updated
    concurrently, and these concurrent edits may leave the export
    inconsistent.
Import

• Purpose:
  – Load data that has been exported back into HBase
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename>
    <inputdir>
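A matching restore sketch: point Import at the directory a previous Export produced. The table name and path are the same placeholders as above, and the target table must already exist; the command is echoed rather than executed.

```shell
# Hypothetical example: restore one exported day of "usertable" from its
# dated HDFS directory back into the (pre-created) table.
INDIR=/backup/usertable/2012/07/01

CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.Import usertable $INDIR"
echo "$CMD"
```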
Conclusion

• Regular (e.g. daily) incremental backup
  – Use Export and organize output dir as a meaningful hierarchy
     • /table_name
       /2012     (year)
         /07       (month)
           /01        (date)
           /02
            …
           /31
             /01        (hour)
             …
             /24
  – Perform Import to restore data on-demand
• To reduce the overhead, avoid running backups during peak
  hours
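Putting the pieces together, a minimal daily job following the hierarchy above might look like this. The table name, output root, and off-peak scheduling are assumptions; the Export command is echoed rather than executed.

```shell
# Hypothetical daily job: export yesterday's edits of "usertable" into the
# /table_name/YYYY/MM/DD hierarchy; schedule it off-peak (e.g. via cron).
TABLE=usertable
# GNU date, with a BSD date fallback, for the dated path component:
DAY=$(date -d yesterday +%Y/%m/%d 2>/dev/null || date -v-1d +%Y/%m/%d)
OUTDIR="/$TABLE/$DAY"

# 24-hour window as epoch milliseconds:
END_MS=$(( $(date +%s) * 1000 ))
START_MS=$(( END_MS - 86400000 ))

echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export $TABLE $OUTDIR $START_MS $END_MS"
```

Restoring a given day is then a matter of running Import against the matching directory, e.g. /usertable/2012/07/01.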
Questions?
