SlideShare a Scribd company logo
1 of 23
Download to read offline
© Hortonworks Inc. 2014
HBase Backup and Restore
Vladimir Rodionov
Ted Yu
Page 1
© Hortonworks Inc. 2014
About the authors
• Ted Yu:
• Been working on HBase for over 6 years
• HBase committer / PMC
• Senior Staff Engineer at Hortonworks
• Vladimir:
• active contributor to hbase (over 100 HBase JIRAs)
• completed most of the backup work based on IBM’s initial contribution
• Senior Staff Engineer at Hortonworks
Page 2
© Hortonworks Inc. 2014
HBase Backup – Why We Need It
• Database needs disaster recovery tool
• Previously users can perform snapshot
• However, execution cost for snapshot may be high – flush across region
servers is involved
• There was no incremental snapshot – whole dataset is captured by
snapshot
• Incremental backup doesn’t involve flushing, making continuous backup
possible
Page 3
© Hortonworks Inc. 2014
Brief History of Backup / Restore work
• Started by engineers at IBM – see HBASE-7912
• Initial design included backup manifest
• Vladimir / Ted picked up the work last year
• Vladimir rendered many iterations of patches for phase 2 work (see
HBASE-14123)
• Due to feedback from community, the design has gone thru major
changes
• Mostly tested by developers and QA engineers so far
Page 4
© Hortonworks Inc. 2014
HBase Backup Types
• Full backup – foundation for incremental backups
• Incremental backup – can be periodic to capture changes over time
• Supports table level backup
Page 5
© Hortonworks Inc. 2014
Required Configuration
• Set hbase.backup.enable to true
• BackupLogCleaner for
hbase.master.logcleaner.plugins
• LogRollMasterProcedureManager for
hbase.procedure.master.classes
• LogRollRegionServerProcedureManager for
hbase.procedure.regionserver.classes
• Backup may get stuck if not configured properly
Page 6
© Hortonworks Inc. 2014
Backup Strategy
• Intra-cluster backup is appropriate for testing
Page 7
© Hortonworks Inc. 2014
Backup Strategy: Dedicated HDFS Cluster
• backup on a separate HDFS archive cluster
Page 8
© Hortonworks Inc. 2014
Backup Strategy: Cloud or a Storage Vendor
• vendor can be a public cloud provider or a storage vendor who uses a
Hadoop compatible file system
Page 9
© Hortonworks Inc. 2014
Best Practices for Backup-and-Restore
• Secure a full backup image first
• Formulate a restore strategy and test it
• Define and use backup sets for groups of tables that are logical subsets
of the entire dataset
• Document the backup-and-restore strategy, and ideally log information
about each backup
Page 10
© Hortonworks Inc. 2014
Creating/Maintaining Backup Image
• Run the following command as hbase superuser:
• hbase backup create {{ full | incremental }
{backup_root_path} {[-t tables] | [-set
backup_set_name]}} [[-silent] | [-w
number_of_workers] | [-b bandwidth_per_worker]]
Page 11
© Hortonworks Inc. 2014
Using Backup Sets
• Reduces the amount of repetitive input of table names.
• “hbase backup set add” command.
• You can have multiple backup sets
• Backup set can be used in the “hbase backup create” or “hbase backup
restore” commands
Page 12
© Hortonworks Inc. 2014
Restoring a Backup Image
• You can only restore on a live HBase cluster
• Run the following command as hbase superuser
• hbase restore {[-set backup_set_name] | [backup_root_path] | [backupId]
| -t [tables]} [-m [table_mapping] | [-overwrite] | [-check]]
• hbase restore /tmp/backup_incremental backupId_1467823988425 -t
mytable1,mytable2 -overwrite
Page 13
© Hortonworks Inc. 2014
Backup table
• Backup table will keep track of all backup sessions
–Write/Read backup session state
–Write/Read backup session progress (per region server).
–Stores last backed up WAL file timestamp (per region server).
–Stores list of all backed up WAL files (for BackupLogCleaner )
–Stores backup sets
• Must be backed up and restored separately from other tables
• Information needed for restore is on hdfs
Page 14
© Hortonworks Inc. 2014
Incremental backups
• Use Write Ahead Logs (WALs) to capture the data changes since the
previous backup
• Log roll is executed across all RegionServers
• All the WAL files from incremental backups between the last full backup
and the incremental backup are converted to HFiles
• A process similar to the DistCp tool is used to move the source backup
files to the target file system
Page 15
© Hortonworks Inc. 2014
Filter WALs on backup to only include relevant edits
• Suppose incremental backup request is for table t, all the tables already
registered in a backup system, T, are union’ed with t
• For every table K in the union:
1. Convert new WAL files into HFile applying table filter for K
2. Move these HFile(s) to backup destination
Page 16
© Hortonworks Inc. 2014
Restore
• The full backup is restored from the full backup image.
• HFileSplitter job will collect all HFile(s), split them into new region
boundaries
• HBase Bulk Load utility is invoked by restore to import the HFiles as
restored data in the table.
Page 17
© Hortonworks Inc. 2014
Backup Manifest
• Backup image has the following:
• Backup Id, Backup Type, Backup Rootdir, Table List, start timestamp,
completion timestamp
• Mapping between region server and last recorded WAL timestamp
• Backup image keeps lineage of all previously created backup images
(ancestors)
• When backup image list covers the image being considered, it is
removed from restore
• See message BackupImage in Backup.proto
Page 18
© Hortonworks Inc. 2014
Bulk load support
• Bulk loaded Hfiles are recorded in backup table at the end of bulk load,
thru preCommitStoreFile() hook
• During incremental backup, these Hfiles are copied to backup
destination
• During restore, these Hfiles are loaded into target table
Page 19
© Hortonworks Inc. 2014
Limitations of the Backup-Restore
• Only one active backup session is supported.
• Both backup and restore can’t be canceled while in progress. (HBASE-
15997,15998)
• Single backup destination only is supported. HBASE-15476
• There is no merge for incremental images (HBASE-14135)
• Only superuser (hbase) is allowed to perform backup/restore
Page 20
© Hortonworks Inc. 2014
Credit
• Richard Ding
• Vladimir Rodionov
Page 21
© Hortonworks Inc. 2014
Q/A
Page 22
© Hortonworks Inc. 2014
Thank you.
Page 23

More Related Content

More from HBaseCon

hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at NeteaseHBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comHBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architectureHBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at HuaweiHBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon
 

More from HBaseCon (20)

hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
 

Recently uploaded

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

hbaseconasia2017: Backup / Restore feature in HBase

  • 1. © Hortonworks Inc. 2014 HBase Backup and Restore Vladimir Rodionov Ted Yu Page 1
  • 2. © Hortonworks Inc. 2014 About the authors • Ted Yu: • Been working on HBase for over 6 years • HBase committer / PMC • Senior Staff Engineer at Hortonworks • Vladimir: • active contributor to hbase (over 100 HBase JIRAs) • completed most of the backup work based on IBM’s initial contribution • Senior Staff Engineer at Hortonworks Page 2
  • 3. © Hortonworks Inc. 2014 HBase Backup – Why We Need It • Database needs disaster recovery tool • Previously users can perform snapshot • However, execution cost for snapshot may be high – flush across region servers is involved • There was no incremental snapshot – whole dataset is captured by snapshot • Incremental backup doesn’t involve flushing, making continuous backup possible Page 3
  • 4. © Hortonworks Inc. 2014 Brief History of Backup / Restore work • Started by engineers at IBM – see HBASE-7912 • Initial design included backup manifest • Vladimir / Ted picked up the work last year • Vladimir rendered many iterations of patches for phase 2 work (see HBASE-14123) • Due to feedback from community, the design has gone thru major changes • Mostly tested by developers and QA engineers so far Page 4
  • 5. © Hortonworks Inc. 2014 HBase Backup Types • Full backup – foundation for incremental backups • Incremental backup – can be periodic to capture changes over time • Supports table level backup Page 5
  • 6. © Hortonworks Inc. 2014 Required Configuration • Set hbase.backup.enable to true • BackupLogCleaner for hbase.master.logcleaner.plugins • LogRollMasterProcedureManager for hbase.procedure.master.classes • LogRollRegionServerProcedureManager for hbase.procedure.regionserver.classes • Backup may get stuck if not configured properly Page 6
  • 7. © Hortonworks Inc. 2014 Backup Strategy • Intra-cluster backup is appropriate for testing Page 7
  • 8. © Hortonworks Inc. 2014 Backup Strategy: Dedicated HDFS Cluster • backup on a separate HDFS archive cluster Page 8
  • 9. © Hortonworks Inc. 2014 Backup Strategy: Cloud or a Storage Vendor • vendor can be a public cloud provider or a storage vendor who uses a Hadoop compatible file system Page 9
  • 10. © Hortonworks Inc. 2014 Best Practices for Backup-and-Restore • Secure a full backup image first • Formulate a restore strategy and test it • Define and use backup sets for groups of tables that are logical subsets of the entire dataset • Document the backup-and-restore strategy, and ideally log information about each backup Page 10
  • 11. © Hortonworks Inc. 2014 Creating/Maintaining Backup Image • Run the following command as hbase superuser: • hbase backup create {{ full | incremental } {backup_root_path} {[-t tables] | [-set backup_set_name]}} [[-silent] | [-w number_of_workers] | [-b bandwidth_per_worker]] Page 11
  • 12. © Hortonworks Inc. 2014 Using Backup Sets • Reduces the amount of repetitive input of table names. • “hbase backup set add” command. • You can have multiple backup sets • Backup set can be used in the “hbase backup create” or “hbase backup restore” commands Page 12
  • 13. © Hortonworks Inc. 2014 Restoring a Backup Image • You can only restore on a live HBase cluster • Run the following command as hbase superuser • hbase restore {[-set backup_set_name] | [backup_root_path] | [backupId] | -t [tables]} [-m [table_mapping] | [-overwrite] | [-check]] • hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2 -overwrite Page 13
  • 14. © Hortonworks Inc. 2014 Backup table • Backup table will keep track of all backup sessions –Write/Read backup session state –Write/Read backup session progress (per region server). –Stores last backed up WAL file timestamp (per region server). –Stores list of all backed up WAL files (for BackupLogCleaner ) –Stores backup sets • Must be backed up and restored separately from other tables • Information needed for restore is on hdfs Page 14
  • 15. © Hortonworks Inc. 2014 Incremental backups • Use Write Ahead Logs (WALs) to capture the data changes since the previous backup • Log roll is executed across all RegionServers • All the WAL files from incremental backups between the last full backup and the incremental backup are converted to HFiles • A process similar to the DistCp tool is used to move the source backup files to the target file system Page 15
  • 16. © Hortonworks Inc. 2014 Filter WALs on backup to only include relevant edits • Suppose incremental backup request is for table t, all the tables already registered in a backup system, T, are union’ed with t • For every table K in the union: 1. Convert new WAL files into HFile applying table filter for K 2. Move these HFile(s) to backup destination Page 16
  • 17. © Hortonworks Inc. 2014 Restore • The full backup is restored from the full backup image. • HFileSplitter job will collect all HFile(s), split them into new region boundaries • HBase Bulk Load utility is invoked by restore to import the HFiles as restored data in the table. Page 17
  • 18. © Hortonworks Inc. 2014 Backup Manifest • Backup image has the following: • Backup Id, Backup Type, Backup Rootdir, Table List, start timestamp, completion timestamp • Mapping between region server and last recorded WAL timestamp • Backup image keeps lineage of all previously created backup images (ancestors) • When backup image list covers the image being considered, it is removed from restore • See message BackupImage in Backup.proto Page 18
  • 19. © Hortonworks Inc. 2014 Bulk load support • Bulk loaded Hfiles are recorded in backup table at the end of bulk load, thru preCommitStoreFile() hook • During incremental backup, these Hfiles are copied to backup destination • During restore, these Hfiles are loaded into target table Page 19
  • 20. © Hortonworks Inc. 2014 Limitations of the Backup-Restore • Only one active backup session is supported. • Both backup and restore can’t be canceled while in progress. (HBASE- 15997,15998) • Single backup destination only is supported. HBASE-15476 • There is no merge for incremental images (HBASE-14135) • Only superuser (hbase) is allowed to perform backup/restore Page 20
  • 21. © Hortonworks Inc. 2014 Credit • Richard Ding • Vladimir Rodionov Page 21
  • 22. © Hortonworks Inc. 2014 Q/A Page 22
  • 23. © Hortonworks Inc. 2014 Thank you. Page 23