Apache HBase Table Snapshots
Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
June 13, 2013
HBaseCon 2013
Outline
• Intro and Use Cases
• Usage Instructions
• Internals
• Snapshot Layout
• Snapshot Restoration
• Online Snapshots
• Conclusion
HBase Table Snapshots
A snapshot is a collection of metadata required to reconstitute the data near a particular point in time.
HBase Table Snapshots
• An inexpensive way to freeze the state of a table
• A mechanism that helps back up data within the cluster or to a remote cluster
• Recover from user error
• Bootstrap replication
Old: HBase-Supported Batch Backups
• Export / DistCp / Import
  • 3 batch MR jobs
  • Several extra copies of data
  • High latency (hours)
  • Impacts existing low-latency workloads
• CopyTable
  • 1 MR job
  • Single copy of data
  • Incremental table copies
  • High latency (hours)
  • Impacts existing workloads
[Diagram: Export, DistCp, and Import each run as a separate MR job; CopyTable runs as a single MR job]
Upcoming: HDFS Snapshots (or DistCP backup)
• Take an HDFS snapshot of all the files in the underlying HBase data directory.
  • HFiles, HLogs, and other metadata.
• Snapshots all tables in HBase
• Cannot clone tables
  • “Restore As”
• Targeted for Hadoop 2.1 / Hadoop 3.0 DistCp
[Diagram: HLog append, flush, compact, restart, recover]
New: HBase Snapshot-based Backups
• Snapshot, then Export
  • 1 MR job
  • Single copy of data
  • Little impact on low-latency workloads
• Export is like DistCp directly from HDFS
• No incremental snapshot copy
[Diagram: Snapshot, then Export]
Export
• Like distcp for a snapshot manifest
• Copy data files without going through HBase’s “front door”
Recover from User Error
• How do we recover from user error?
[Timeline: user error (drop 'table'); service is down ("Panic! Black magic!") for the recovery time; service is restored with major data loss]
Recovering from User Mistakes: Table Snapshots
• Snapshot the state of a table at a certain moment in time
• Restore it or Clone it later, creating a new read write table
• Export it to another cluster with minimal impact on HBase
[Timeline: periodic snapshots; user error (drop 'table'); service is down ("Keep calm!"); restore; service restored with minor data loss. Carry on.]
Usage
What an Admin needs to know
Configuration
• Simple hbase-site.xml configuration
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
• Enabled by default in 0.95+
• Requires user to enable in 0.94.6.1+.
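A quick way to confirm the setting is to parse hbase-site.xml and look for the property. The sketch below uses only the Python standard library; the helper name `snapshots_enabled` and the inline sample document are illustrative assumptions, not part of HBase.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for a real hbase-site.xml (assumption for the demo).
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

def snapshots_enabled(site_xml: str, default: bool = False) -> bool:
    """Return the value of hbase.snapshot.enabled found in the XML text.

    `default` is returned when the property is absent; pass True for
    versions where snapshots are enabled by default (0.95+ per the slide).
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.snapshot.enabled":
            return prop.findtext("value", "").strip().lower() == "true"
    return default

print(snapshots_enabled(SAMPLE_SITE_XML))
```

On a 0.94.6.1+ cluster a missing property means snapshots are off, so the default of False matches that case.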
Usage: Shell Commands
• snapshot 'table', 'snapshot'
  • Table can be offline or online
• list_snapshots [<regex>]
• clone_snapshot 'snapshot', 'dsttable'
• restore_snapshot 'snapshot'
• delete_snapshot 'snapshot'
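For periodic snapshots, the shell lines can be generated by a small script and piped into `hbase shell`. The following is a minimal sketch: the timestamped naming scheme and the helper names are assumptions chosen for illustration, and the disable/enable steps around the restore follow the note later in the deck that the table being restored is disabled.

```python
from datetime import datetime
from typing import List

def snapshot_name(table: str, when: datetime) -> str:
    """Build a timestamped snapshot name (naming scheme is an assumption)."""
    return f"{table}-{when.strftime('%Y%m%d-%H%M')}"

def snapshot_command(table: str, when: datetime) -> str:
    """Return the HBase shell line that snapshots `table`."""
    return f"snapshot '{table}', '{snapshot_name(table, when)}'"

def restore_commands(table: str, snapshot: str) -> List[str]:
    """Return shell lines that roll `table` back to `snapshot`."""
    return [
        f"disable '{table}'",
        f"restore_snapshot '{snapshot}'",
        f"enable '{table}'",
    ]

if __name__ == "__main__":
    taken_at = datetime(2013, 6, 13, 12, 0)
    print(snapshot_command("mytable", taken_at))
    print("\n".join(restore_commands("mytable", snapshot_name("mytable", taken_at))))
```

The script only produces text, so it can be reviewed before anything touches the cluster.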
Usage: Web UI
Export: Usage
• Copy “MySnapshot” to a remote HDFS
  • $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16
• With permission change on the copy
  • $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
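When exports are scripted, building the argument list in code avoids quoting mistakes. This sketch assembles the command line using only the options shown on the slide; the function name `export_snapshot_argv` is a hypothetical helper, and the resulting list would be handed to something like `subprocess.run` on a host with the `hbase` launcher installed.

```python
from typing import List, Optional

EXPORT_CLASS = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"

def export_snapshot_argv(
    snapshot: str,
    copy_to: str,
    mappers: Optional[int] = None,
    chuser: Optional[str] = None,
    chgroup: Optional[str] = None,
    chmod: Optional[str] = None,
) -> List[str]:
    """Assemble an ExportSnapshot command line; unset options are omitted."""
    argv = ["hbase", EXPORT_CLASS, "-snapshot", snapshot, "-copy-to", copy_to]
    optional = (
        ("-chuser", chuser),
        ("-chgroup", chgroup),
        ("-chmod", chmod),
        ("-mappers", mappers),
    )
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv

if __name__ == "__main__":
    print(" ".join(export_snapshot_argv("MySnapshot", "hdfs://srv2:8082/hbase", mappers=16)))
```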
Debugging and Info
• Dump a snapshot manifest
  • Writes to standard out
• Usage
  • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot
  • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files
Metrics
• Histograms of operation completion
  • snapshotTime
  • cloneTime
  • restoreTime
• Includes ‘extended’ metrics
  • Std deviation
  • Min/max
Table Snapshot Internals
Internals
• HBase Table HDFS Layout
• Snapshot HDFS layout
• Offline Snapshots
• Restore and Clone Snapshot
• Online Snapshots
Primer: HBase Table Layout in HDFS
• HRegions map directly to a directory structure with table name, encoded region name, column family, and HFiles.
• In HDFS:
/hbase/Table/<enc R1>/cf/<hfile f11>
/hbase/Table/<enc R1>/cf/<hfile f12>
/hbase/Table/<enc R2>/cf/<hfile f21>
/hbase/Table/<enc R2>/cf/<hfile f22>
/hbase/Table/<enc R3>/cf/<hfile f31>
/hbase/Table/<enc R3>/cf/<hfile f32>
[Diagram: Table with regions R1, R2, R3 holding HFiles F11, F21, F31]
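The directory convention above can be read back mechanically. The sketch below splits a path of the form shown on the slide into its four components; it models only that layout (later HBase versions add further directory levels), and `HFilePath` and `parse_hfile_path` are illustrative names.

```python
from typing import NamedTuple

class HFilePath(NamedTuple):
    table: str
    region: str   # encoded region name
    family: str   # column family
    hfile: str

def parse_hfile_path(path: str, root: str = "/hbase") -> HFilePath:
    """Split <root>/<table>/<encoded region>/<family>/<hfile> into parts."""
    prefix = root.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {root!r}")
    parts = path[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected table/region/family/hfile, got {parts}")
    return HFilePath(*parts)

if __name__ == "__main__":
    print(parse_hfile_path("/hbase/Table/enc-R1/cf/f11"))
```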
Table Snapshots in the File System
• A snapshot manifest contains references to files in the original table.
• Each snapshot is stored in the hbase/.hbase-snapshots dir.
[Diagram: under ./.hbase-snapshots, the TableSnapshot manifest lists regions R1, R2, R3 and references the HFiles F11, F21, F31 of Table]
Offline Snapshots
• Disable table, then create Snapshot Manifest
  • Created in temporary dir to guarantee snapshot creation atomicity
• Includes
  • Snapshot Metadata
  • Table Metadata/Schema (.tableinfo)
  • References to original HFiles
• Master-only file system operation
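The temporary-directory step is a common pattern: build everything in a working directory, then publish it with a single rename so readers never see a half-written snapshot. The sketch below shows that pattern on a local file system; it is not HBase's code, and the `.tmp` working directory name and `manifest.json` file are assumptions made for the illustration.

```python
import json
import os
import tempfile

def write_snapshot(snapshot_root: str, name: str, manifest: dict) -> str:
    """Write a manifest in a working dir, then publish it with one rename."""
    final_dir = os.path.join(snapshot_root, name)
    if os.path.exists(final_dir):
        raise FileExistsError(final_dir)
    tmp_root = os.path.join(snapshot_root, ".tmp")  # hypothetical working area
    os.makedirs(tmp_root, exist_ok=True)
    work_dir = tempfile.mkdtemp(prefix=name + "-", dir=tmp_root)
    with open(os.path.join(work_dir, "manifest.json"), "w") as fh:
        json.dump(manifest, fh)
    # A rename within one file system is atomic: readers see either no
    # snapshot directory or a complete one.
    os.rename(work_dir, final_dir)
    return final_dir
```

If the process dies before the rename, only the working directory is left behind and the snapshot simply does not exist.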
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
[Diagram: a compaction in R3 replaces F31 with a new file (F31+32), but the TableSnapshot manifest under ./.hbase-snapshots still references the original F31. "No more HFile??"]
HFile Archiver
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
• HFileCleaner ensures HFiles’ data remains available
[Diagram: after compaction R3 holds F31+32; the original F31 sits with the table files under ./.archive, where the TableSnapshot manifest can still reach it]
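The interplay between compaction, the archive, and the cleaner can be modeled with a few sets. This is a toy model under simplifying assumptions (file names only, one table, no timing), and the `Store` class is hypothetical; it only illustrates the rule that an archived file is deleted once no snapshot references it.

```python
class Store:
    """Toy model of a table directory, the archive, and snapshot manifests."""

    def __init__(self, hfiles):
        self.table = set(hfiles)  # live HFiles
        self.archive = set()      # HFiles moved aside by compaction
        self.snapshots = {}       # snapshot name -> referenced HFiles

    def snapshot(self, name):
        """Record references to the current live files; no data is copied."""
        self.snapshots[name] = set(self.table)

    def compact(self, inputs, output):
        """Replace `inputs` with `output`; inputs are archived, not deleted."""
        inputs = set(inputs)
        self.table -= inputs
        self.archive |= inputs
        self.table.add(output)

    def clean(self):
        """Delete archived files that no snapshot references; return them."""
        referenced = set().union(*self.snapshots.values())
        doomed = self.archive - referenced
        self.archive -= doomed
        return doomed

if __name__ == "__main__":
    store = Store({"F11", "F21", "F31"})
    store.snapshot("TableSnapshot")
    store.table.add("F32")                      # written after the snapshot
    store.compact({"F31", "F32"}, "F31+32")
    print(sorted(store.clean()), sorted(store.archive))
```

In the demo F32 is cleaned because nothing references it, while F31 stays in the archive for as long as TableSnapshot exists.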
Restore Snapshot Internals
Restore Operations
• Restore table
  • Rollback table to specific state
• Clone from snapshot (Restore As)
  • Create new read-write table from snapshot
  • There can be multiple replicas of a snapshot
• Export snapshot
  • Send snapshot and all its data to another cluster
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
• HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics
[Diagram: the Clone table has regions R1, R2, R3 built from the TableSnapshot manifest; its files point at the original table's HFiles or at archived copies such as F31 under ./.archive]
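The open-file-descriptor analogy means a link keeps working after the file it names has been moved to the archive. A minimal sketch of that lookup on a local file system follows; it checks only the live table directory and the archive, whereas the real HFileLink may consult other locations, and `resolve_link` is an illustrative name.

```python
import os
import tempfile
from typing import Optional

def resolve_link(root: str, table: str, region: str, family: str, hfile: str) -> Optional[str]:
    """Return the first existing location of an HFile: table dir, then archive."""
    relative = os.path.join(table, region, family, hfile)
    for base in (root, os.path.join(root, ".archive")):
        candidate = os.path.join(base, relative)
        if os.path.exists(candidate):
            return candidate
    return None

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    live = os.path.join(root, "Table", "R3", "cf", "F31")
    os.makedirs(os.path.dirname(live))
    open(live, "w").close()
    print(resolve_link(root, "Table", "R3", "cf", "F31") == live)
    # Simulate the archiver moving the file after a compaction.
    archived = os.path.join(root, ".archive", "Table", "R3", "cf", "F31")
    os.renames(live, archived)
    print(resolve_link(root, "Table", "R3", "cf", "F31") == archived)
```

Because the clone stores only the link, the same reference stays valid before and after the move.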
Restore: Rollback to an old state
• Rollback the existing table to snapshot state
• Restores original schema if altered
• Snapshots current table, just in case
• Minimal overhead
• Smarter delete table & clone snapshot
• Handles creating/deleting regions
• Restore META
Restore illustrated
• Rollback “Table” to the “TableSnapshot” state
• Region “R4” is not present in the snapshot
• “R4” will be removed from “Table”, files moved to “.archive”
• New files not present in the snapshots are moved to the archive
• HFileLinks are created to point to old files.
[Diagram: Table holds R1 (F11), R2 (F21), R3 (F31+32), and R4 (F41); the TableSnapshot manifest lists R1, R2, R3 with F31 kept under ./.archive. After the restore, F41 and F31+32 are in the archive and R3 links to F31 again]
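The steps illustrated above amount to set differences between the current table and the snapshot manifest. The sketch below computes such a plan for the example on the slides; it is a simplified model (region name mapped to a set of HFile names), and `plan_restore` is a hypothetical helper rather than the restore code itself.

```python
def plan_restore(table, snapshot):
    """Compute restore actions; both arguments map region -> set of HFiles."""
    plan = {"remove_regions": [], "archive": [], "link": []}
    # Regions that exist now but not in the snapshot are dropped entirely.
    for region in sorted(set(table) - set(snapshot)):
        plan["remove_regions"].append(region)
        plan["archive"] += [(region, f) for f in sorted(table[region])]
    for region in sorted(snapshot):
        current = table.get(region, set())
        wanted = snapshot[region]
        # Files written after the snapshot go to the archive.
        plan["archive"] += [(region, f) for f in sorted(current - wanted)]
        # Files the snapshot needs but the table lacks are linked back in.
        plan["link"] += [(region, f) for f in sorted(wanted - current)]
    return plan

if __name__ == "__main__":
    table = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31+32"}, "R4": {"F41"}}
    snapshot = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31"}}
    print(plan_restore(table, snapshot))
```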
Restore failures
• The table to restore is disabled
• META and HDFS operations may fail (network issue, server down, …)
• hbck can’t repair an incomplete restore...
• Restore again!
Export Snapshot
• Copy a full snapshot to another cluster
• All required HFiles, and Metadata
• Lots of options
• Fancy dist-cp
• Must resolve HFileLinks
• Faster than CopyTable or table export+import!
• Minimal impact on running cluster
Online Snapshots
Online snapshots
• Take a snapshot without making the table unavailable
  • No need to disable the table
  • Continue accepting reads and writes from clients
• Challenges
  • Coordinating Region Servers
  • Data is in memory
  • Consistency
Offline vs Online Snapshots
[Diagram: in an offline snapshot the master writes the manifest per region and then verifies; in an online snapshot the master runs a snapshot region subprocedure on the region servers (RS1 to RS4), each writes the manifest per region, and the master verifies]
Online Snapshots
• Each region can have data in the memstore and HLog that is not yet in an HFile
• A snapshot of the HFiles alone is missing the in-memory data!
• Flush so that all in-memory data is written to an HFile
• Then add the new HFile to the snapshot manifest
[Diagram: each region's memstore is flushed to a new HFile (F13, F23, F33), and the TableSnapshot manifest then references those files alongside the existing ones]
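The flush-then-reference sequence can be sketched with a toy region model. The classes and file naming below are assumptions for illustration only; the point is that each region's memstore is emptied into a new HFile before its file list is recorded in the manifest.

```python
class Region:
    """Toy region: a list of HFile names plus an in-memory buffer."""

    def __init__(self, name, hfiles):
        self.name = name
        self.hfiles = list(hfiles)
        self.memstore = []   # edits not yet written to an HFile
        self._flushes = 0

    def put(self, key, value):
        self.memstore.append((key, value))

    def flush(self):
        """Write buffered edits to a new HFile and clear the memstore."""
        if self.memstore:
            self._flushes += 1
            self.hfiles.append(f"{self.name}-flush{self._flushes}")
            self.memstore = []

def flush_snapshot(regions):
    """Flush every region, then record its HFiles in the manifest."""
    manifest = {}
    for region in regions:
        region.flush()
        manifest[region.name] = list(region.hfiles)
    return manifest

if __name__ == "__main__":
    r1 = Region("R1", ["F11"])
    r1.put("row1", "value")
    print(flush_snapshot([r1, Region("R2", ["F21"])]))
```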
Consistency
• Offline Snapshots
  • Fully consistent snapshot
• Online Flush Snapshot
  • “CopyTable” level consistency with a much smaller window.
  • Time bounded by slowest region server and region flush
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and not A is currently possible with flush snapshots
[Diagram: the master triggers the flush snapshot on RS1 and RS2. RS1 flushes first (F13); the client then issues Put A followed by Put B; RS2 flushes afterwards (F23, F33). Put B is in the snapshot but Put A is not!]
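The anomaly comes purely from the order of events, which a short simulation makes concrete. The model below is a sketch under simplifying assumptions: two regions on different region servers, a put lands in the memstore, and a region's contribution to the snapshot is whatever it has flushed at the moment its flush runs.

```python
def run_timeline():
    """Replay the slide's ordering and return the keys the snapshot holds."""
    regions = {
        "R1": {"mem": [], "files": []},  # hosted on RS1
        "R2": {"mem": [], "files": []},  # hosted on RS2
    }

    def put(region, key):
        regions[region]["mem"].append(key)

    def flush_and_snapshot(region):
        state = regions[region]
        state["files"].extend(state["mem"])
        state["mem"] = []
        return set(state["files"])

    snapshot = set()
    snapshot |= flush_and_snapshot("R1")  # RS1 flushes first
    put("R1", "A")                        # Put A arrives after R1's flush
    put("R2", "B")                        # ... then Put B
    snapshot |= flush_and_snapshot("R2")  # RS2 flushes later and captures B
    return snapshot

if __name__ == "__main__":
    print(run_timeline())
```

The client wrote A before B, yet the snapshot contains only B, which is exactly the outcome causal consistency would forbid.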
Online snapshot attempts can fail
• If an involved RS fails, the snapshot attempt will fail.
• Needs a way to prevent other table metadata operations
  • Table Metadata Locks (0.95+)
  • Avoids many snapshot failures caused by conflicts (e.g. online schema changes, splits)
• A failed attempt will report errors; the user must retry.
  • o.a.h.hbase.snapshot.HBaseSnapshotException
  • o.a.h.hbase.snapshot.CorruptedSnapshotException
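Since a failed attempt has to be retried by the caller, automation around snapshots usually wraps the call in a bounded retry loop. The sketch below shows one way to do that; `SnapshotFailed` is a hypothetical stand-in for whatever error the client surfaces, and `take_snapshot` is any callable supplied by the caller.

```python
import time

class SnapshotFailed(Exception):
    """Hypothetical stand-in for a snapshot error reported to the client."""

def snapshot_with_retry(take_snapshot, attempts=3, delay_seconds=0.0):
    """Call `take_snapshot` until it succeeds or `attempts` is exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return take_snapshot()
        except SnapshotFailed as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)  # give the cluster time to settle
    raise last_error
```

Only the snapshot error is retried; anything else propagates immediately so that unrelated bugs are not masked.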
Development Notes
How we collaborated, built, and tested
Table Snapshots Development
• Developed in a branch off of trunk; now merged and in 0.95 and trunk.
  • Feature is too big to include as a single patch
  • Does not destabilize trunk
  • Does not slow time-based release trains
• Later backported to 0.94.6.1
[Diagram: src branch, periodic sync, reintegrate into trunk]
System testing with Jenkins
• Concurrently load data while taking snapshots
  • Inject compactions, kill RSs, META RS, Master
• Create snapshot clones of the snapshots
  • Inject compactions, kill RSs, META RS, Master
Future Work:
• Alternative semantics and implementations
  • Log Roll Snapshot (HBASE-7291)
    • Store logs and replay on restore
    • Faster for snapshot, slower and more complicated for restore.
  • Timestamp Snapshot (HBASE-6866)
    • All updates before ts in snapshot, all after not in snapshot
    • Longer pause before snapshot taken
  • Globally-Consistent Snapshot (HBASE-6867)
    • Global write lock for all region nodes until the snapshot completes.
    • Expensive
• Repair tools
  • Manual repairs necessary (hbck does not support this yet)
Conclusions
Feature Summary by Version
• Apache 0.92.x and Apache <0.94.6.1: Copy Table, Import / Export
• Apache 0.94.6.1+: Copy Table, Import / Export, Offline snapshots, Flush Online Snapshot
• Apache 0.95.0 and Apache 0.96.0: Copy Table, Import / Export, Offline snapshots, Flush Online Snapshot, Table Locks
Key Contributors
• Jesse Yates (Salesforce.com)
  • HFileArchiver, Offline Snapshot, first draft online
• Matteo Bertozzi (Cloudera)
  • HFileLink, Restore, Clone, Testing, 0.94 backport
• Jonathan Hsieh (Cloudera)
  • Online Snapshots revamp, Testing, Branch Shepherd
• Ted Yu (Hortonworks)
  • Reviews
• Enis Soztutar (Hortonworks)
  • Table Locks on Snapshots
Thanks! Questions?
Matteo Bertozzi
@th30z
matteo.bertozzi@cloudera.com
Jonathan Hsieh
@jmhsieh
jon@cloudera.com
Jesse Yates
@jesse_yates
jesse.k.yates@gmail.com
HBaseCon 2013 6/13/201361

More Related Content

What's hot

Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
How Prometheus Store the Data
How Prometheus Store the DataHow Prometheus Store the Data
How Prometheus Store the DataHao Chen
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 

What's hot (20)

Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
How Prometheus Store the Data
How Prometheus Store the DataHow Prometheus Store the Data
How Prometheus Store the Data
 
Apache kudu
Apache kuduApache kudu
Apache kudu
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 

Viewers also liked

HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseCloudera, Inc.
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache HiveHBaseCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata StorageDataWorks Summit/Hadoop Summit
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Sematext Group, Inc.
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.
 

Viewers also liked (20)

HBase Snapshots
HBase SnapshotsHBase Snapshots
HBase Snapshots
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
 

HBaseCon 2013: Apache HBase Table Snapshots

  • 1. Apache HBase Table Snapshots
      Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
      Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
      Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
      June 13, 2013, HBaseCon 2013
  • 2. Outline
      • Intro and Use Cases
      • Usage Instructions
      • Internals
      • Snapshot Layout
      • Snapshot Restoration
      • Online Snapshots
      • Conclusion
  • 3. HBase Table Snapshots
      A snapshot is a collection of the metadata required to reconstitute a table's data near a particular point in time.
  • 4. HBase Table Snapshots
      • An inexpensive way to freeze the state of a table
      • A mechanism that helps back up data within the cluster or to a remote cluster
      • Recover from user error
      • Bootstrap replication
  • 5. Old: HBase-Supported Batch Backups
      • Export / DistCp / Import
          • 3 batch MR jobs
          • Several extra copies of data
          • High latency (hours)
          • Impacts existing low-latency workloads
      • CopyTable
          • 1 MR job
          • Single copy of data
          • Incremental table copies
          • High latency (hours)
          • Impacts existing workloads
      [Diagram labels: Export MR job, DistCp MR job, Import MR job; CopyTable MR job]
  • 6. Upcoming: HDFS Snapshots (or DistCp backup)
      • Take an HDFS snapshot of all the files in the underlying HBase data directory
          • HFiles, HLogs, and other metadata
      • Snapshots all tables in HBase
      • Cannot clone tables
          • "Restore As"
      • Targeted for Hadoop 2.1 / Hadoop 3.0
      [Diagram labels: DistCp; HLog append, flush, compact, restart, recover]
  • 7. New: HBase Snapshot-based Backups
      • Snapshot, then Export
      • 1 MR job
      • Single copy of data
      • Little impact on low-latency workloads
      • Export is like distcp directly from HDFS
      • No incremental snapshot copy
      [Diagram labels: Export Snapshot]
  • 8. Export
      • Like distcp for a snapshot manifest
      • Copies data files without going through HBase's "front door"
  • 9. Recover from User Error
      • How do we recover from user error?
      [Timeline diagram labels: User error: drop 'table'; Service is down! Panic!; Black magic!; Recovery time; Service is restored, major data loss]
  • 10. Recovering from User Mistakes: Table Snapshots
      • Snapshot the state of a table at a certain moment in time
      • Restore it or clone it later, creating a new read-write table
      • Export it to another cluster with minimal impact on HBase
      [Timeline diagram labels: Periodic snapshot; Periodic snapshot; User error: drop 'table'; Service is down! Keep calm!; Restore; Service restored, minor data loss. Carry on.]
  • 11. Usage: What an Admin Needs to Know
  • 12. Configuration
      • Simple hbase-site.xml configuration:
          <property>
            <name>hbase.snapshot.enabled</name>
            <value>true</value>
          </property>
      • Enabled by default in 0.95+
      • Must be enabled by the user in 0.94.6.1+
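To check whether a given hbase-site.xml turns snapshots on, the property above can be read programmatically. The following is a minimal sketch using only the Python standard library. The function name `snapshots_enabled`, its `default` parameter, and the sample `SITE` text are hypothetical; the right default depends on the HBase version, as the slide notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an hbase-site.xml file, for illustration only.
SITE = """<configuration>
  <property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
  </property>
</configuration>"""


def snapshots_enabled(site_xml: str, default: bool = False) -> bool:
    """Return the hbase.snapshot.enabled setting found in hbase-site.xml text.

    `default` is what to report when the property is absent; pick it to match
    the HBase version in use (the slide says 0.95+ enables snapshots by default).
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.snapshot.enabled":
            return prop.findtext("value", "").strip().lower() == "true"
    return default


if __name__ == "__main__":
    print(snapshots_enabled(SITE))
```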
  • 13. Usage: Shell Commands
      • snapshot 'table', 'snapshot'
          • Table can be offline or online
      • list_snapshots [<regex>]
      • clone_snapshot 'snapshot', 'dsttable'
      • restore_snapshot 'snapshot'
      • delete_snapshot 'snapshot'
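The periodic snapshots mentioned earlier in the deck could be scripted by generating shell commands like the ones above. This is only a sketch: the timestamped naming scheme, the retention count, and both helper names are assumptions rather than anything HBase prescribes, and the code only builds command text without running it.

```python
from datetime import datetime


def snapshot_command(table: str, now: datetime) -> str:
    """Build an HBase shell `snapshot` command with a timestamped name."""
    name = f"{table}-{now:%Y%m%d-%H%M%S}"  # hypothetical naming scheme
    return f"snapshot '{table}', '{name}'"


def delete_commands(snapshot_names, keep: int):
    """Build `delete_snapshot` commands for all but the newest `keep` names.

    Relies on the timestamp suffix sorting chronologically as plain text.
    """
    ordered = sorted(snapshot_names)
    stale = ordered[:-keep] if keep > 0 else ordered
    return [f"delete_snapshot '{name}'" for name in stale]


if __name__ == "__main__":
    print(snapshot_command("table", datetime(2013, 6, 13, 9, 0, 0)))
    existing = [
        "table-20130611-090000",
        "table-20130612-090000",
        "table-20130613-090000",
    ]
    for line in delete_commands(existing, keep=2):
        print(line)
```

The generated lines are meant to be pasted into, or piped to, the HBase shell by whatever scheduler is in use.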
  • 14-15. Usage: Web UI
      [Web UI screenshots]
  • 16. Export: Usage
      • Copy "MySnapshot" to a remote HDFS:
          $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16
      • With a permission change on the copy:
          $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
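When exports are driven from a script, it can help to assemble the ExportSnapshot argument list in one place. The sketch below uses only the flags that appear on the slide; the helper name `export_snapshot_argv` is hypothetical, and it assumes the launcher is invoked as `hbase <class name>`, as in the SnapshotInfo example later in the deck. It builds the command but does not execute it.

```python
import shlex


def export_snapshot_argv(snapshot, copy_to, mappers=None,
                         chuser=None, chgroup=None, chmod=None):
    """Build the argv for an ExportSnapshot run from the slide's flags."""
    argv = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
    ]
    # Optional flags are appended only when a value was supplied.
    optional = (("-chuser", chuser), ("-chgroup", chgroup),
                ("-chmod", chmod), ("-mappers", mappers))
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    argv = export_snapshot_argv("MySnapshot", "hdfs://srv2:8082/hbase",
                                mappers=16, chuser="MyUser",
                                chgroup="MyGroup", chmod=700)
    print(shlex.join(argv))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(argv, check=True)` would launch the job on a machine where the `hbase` launcher is on the PATH.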
  • 17. Debugging and Info
      • Dump a snapshot manifest
          • Writes to standard out
      • Usage:
          $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot
          $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files
  • 18. Metrics
      • Histograms of operation completion time
          • snapshotTime
          • cloneTime
          • restoreTime
      • Includes "extended" metrics
          • Standard deviation
          • Min/max
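As a rough illustration of what the extended metrics summarize, the sketch below computes min, max, mean, and standard deviation over a list of operation durations. The function name, the millisecond unit, and the sample values are hypothetical; HBase's own metrics code is not shown in the deck.

```python
import statistics


def summarize(durations_ms):
    """Summarize operation durations the way a histogram metric might."""
    return {
        "count": len(durations_ms),
        "min": min(durations_ms),
        "max": max(durations_ms),
        "mean": statistics.fmean(durations_ms),
        # Population standard deviation over the observed samples.
        "stddev": statistics.pstdev(durations_ms),
    }


if __name__ == "__main__":
    # Hypothetical snapshotTime samples, in milliseconds.
    print(summarize([100, 200, 300]))
```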
  • 20. Internals
      • HBase Table HDFS Layout
      • Snapshot HDFS Layout
      • Offline Snapshots
      • Restore and Clone Snapshot
      • Online Snapshots
  • 21. Primer: HBase Table Layout in HDFS
      • HRegions map directly to a directory structure of table name, encoded region name, column family, and HFiles.
      • In HDFS:
          /hbase/Table/<enc R1>/cf/<hfile f11>
          /hbase/Table/<enc R1>/cf/<hfile f12>
          /hbase/Table/<enc R2>/cf/<hfile f21>
          /hbase/Table/<enc R2>/cf/<hfile f22>
          /hbase/Table/<enc R3>/cf/<hfile f31>
          /hbase/Table/<enc R3>/cf/<hfile f32>
      [Diagram: "Table" with regions R1, R2, R3 holding HFiles F11, F21, F31]
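The four-level layout above can be expressed as a pair of small helpers that build and parse such paths. This is a sketch of the layout as drawn on the slide only; the names `HFilePath`, `hfile_path`, and `parse_hfile_path` are hypothetical, and the region and file names in the demo are placeholders.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class HFilePath(NamedTuple):
    table: str
    region: str   # encoded region name
    family: str   # column family
    hfile: str


def hfile_path(root: str, table: str, region: str, family: str, hfile: str) -> str:
    """Build <root>/<table>/<encoded region>/<family>/<hfile>."""
    return str(PurePosixPath(root) / table / region / family / hfile)


def parse_hfile_path(root: str, path: str) -> HFilePath:
    """Split a path under `root` back into its four components."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) != 4:
        raise ValueError(f"not a table/region/family/hfile path: {path}")
    return HFilePath(*parts)


if __name__ == "__main__":
    p = hfile_path("/hbase", "Table", "enc-R1", "cf", "f11")
    print(p)
    print(parse_hfile_path("/hbase", p))
```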
  • 22-23. Table Snapshots in the File System
      • A snapshot manifest contains references to files in the original table.
      • Each snapshot is stored in the hbase/.hbase-snapshots directory.
      [Diagram: "Table" with regions R1, R2, R3 and HFiles F11, F21, F31; the "TableSnapshot" manifest under ./.hbase-snapshots lists R1, R2, R3]
  • 24. Offline Snapshots
      • Disable the table, then create the snapshot manifest
      • Created in a temporary directory to guarantee that snapshot creation is atomic
      • Includes
          • Snapshot metadata
          • Table metadata/schema (.tableinfo)
          • References to the original HFiles
      • Master-only file system operation
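The "build in a temporary directory, then publish" idea can be sketched on a local file system as follows. This is a toy model, not HBase's implementation: the `manifest.json` file, the `.tmp` subdirectory, and the function name are all hypothetical, and a local `os.rename` stands in for the HDFS rename.

```python
import json
import os
import tempfile


def take_offline_snapshot(root: str, table: str, name: str) -> str:
    """Write a manifest of file references, then publish it with one rename."""
    table_dir = os.path.join(root, table)
    refs = {}
    # Record references (names only); no HFile data is copied.
    for region in sorted(os.listdir(table_dir)):
        region_dir = os.path.join(table_dir, region)
        for family in sorted(os.listdir(region_dir)):
            files = sorted(os.listdir(os.path.join(region_dir, family)))
            refs.setdefault(region, {})[family] = files

    snap_root = os.path.join(root, ".hbase-snapshots")
    tmp_dir = os.path.join(snap_root, ".tmp", name)  # hypothetical temp location
    os.makedirs(tmp_dir)
    with open(os.path.join(tmp_dir, "manifest.json"), "w") as f:
        json.dump({"table": table, "regions": refs}, f)

    # Readers see either no snapshot or a complete one, never a partial one.
    final_dir = os.path.join(snap_root, name)
    os.rename(tmp_dir, final_dir)
    return final_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for region, hfile in (("R1", "F11"), ("R2", "F21"), ("R3", "F31")):
            os.makedirs(os.path.join(root, "Table", region, "cf"))
            open(os.path.join(root, "Table", region, "cf", hfile), "w").close()
        snap = take_offline_snapshot(root, "Table", "TableSnapshot")
        with open(os.path.join(snap, "manifest.json")) as f:
            print(json.load(f))
```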
  • 25-27. HFile Life Cycle
      • Splits and compactions remove HFiles
      • What happens to references to these files?
      [Diagram: in "Table", region R3's file F31 is compacted into "F31 +32"; the "TableSnapshot" manifest under ./.hbase-snapshots still references the old file. Caption: "No more HFile??"]
  • 28-30. HFile Archiver
      • We archive old HFiles from compactions (HBASE-5547)
      • Files are stored in hbase/.archive
      • HFileCleaner ensures the HFiles' data remains available
      [Diagram: "Table" holds F11, F21, and the compacted "F31 +32"; the old F31 sits under ./.archive ("Table files"); the "TableSnapshot" manifest under ./.hbase-snapshots lists R1, R2, R3]
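The interaction between compaction, the archive, and the cleaner can be modeled in a few lines. This is a toy in-memory sketch under one stated assumption: that the cleaner may delete an archived file only when no snapshot references it. The class and method names are hypothetical and do not mirror HBase classes.

```python
class TableStore:
    """Toy model of live HFiles, archived HFiles, and snapshot references."""

    def __init__(self, hfiles):
        self.live = set(hfiles)    # files currently serving the table
        self.archive = set()       # files moved aside by compaction
        self.snapshots = {}        # snapshot name -> frozenset of file names

    def snapshot(self, name):
        # A snapshot records references only; no data is copied.
        self.snapshots[name] = frozenset(self.live)

    def compact(self, inputs, output):
        inputs = set(inputs)
        if not inputs <= self.live:
            raise ValueError("compaction inputs must be live HFiles")
        self.live -= inputs
        self.archive |= inputs     # archived instead of deleted
        self.live.add(output)

    def clean_archive(self):
        """Delete archived files that no snapshot references; return them."""
        referenced = set().union(*self.snapshots.values())
        removed = self.archive - referenced
        self.archive -= removed
        return removed


if __name__ == "__main__":
    store = TableStore({"F11", "F21", "F31"})
    store.snapshot("TableSnapshot")
    store.live.add("F32")                      # a later flush
    store.compact({"F31", "F32"}, "F31+32")
    print(sorted(store.clean_archive()), sorted(store.archive))
```

In the demo, F32 was never part of a snapshot and can go, while F31 stays in the archive because "TableSnapshot" still points at it.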
  • 32. Restore Operations
      • Restore table
          • Roll the table back to a specific state
      • Clone from snapshot (Restore As)
          • Create a new read-write table from a snapshot
          • There can be multiple replicas of a snapshot
      • Export snapshot
          • Send a snapshot and all its data to another cluster
  • 33-35. Clone: Creating a Table from a Snapshot
      • Convert the snapshot manifest info into a table.
      • HFileLinks (HBASE-6610) mimic Unix open-file-descriptor semantics
      [Diagram: "Clone" with regions R1, R2, R3 is created from the "TableSnapshot" manifest under ./.hbase-snapshots; "Table" holds F11, F21, and the compacted "F31 +32"; the old F31 sits under ./.archive ("Table files")]
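One way to picture the open-file-descriptor analogy is a link that keeps resolving after the original file has been moved to the archive. The sketch below is a toy model: the lookup order (table directory first, then archive) and the function name are assumptions made for illustration, not a description of the HFileLink code.

```python
def resolve_hfile_link(hfile, live, archive):
    """Return where a linked HFile can currently be read from.

    Checks the table's live files first, then the archive, so a link keeps
    working after a compaction has moved the file it points to.
    """
    for location, files in (("table", live), ("archive", archive)):
        if hfile in files:
            return location
    raise FileNotFoundError(hfile)


if __name__ == "__main__":
    live, archive = {"F11", "F21", "F31"}, set()
    print(resolve_hfile_link("F31", live, archive))   # before compaction

    # Compaction replaces F31 in the table and archives the old file.
    live = {"F11", "F21", "F31+32"}
    archive = {"F31"}
    print(resolve_hfile_link("F31", live, archive))   # the clone still reads F31
```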
  • 36. Restore: Roll Back to an Old State
      • Roll the existing table back to the snapshot state
          • Restores the original schema if altered
          • Snapshots the current table, just in case
      • Minimal overhead
          • A smarter "delete table and clone snapshot"
          • Handles creating/deleting regions
          • Restores META
  • 37-40. Restore Illustrated
      • Roll "Table" back to the "TableSnapshot" state
      • Region "R4" is not present in the snapshot
      • "R4" will be removed from "Table", and its files moved to ".archive"
      • New files not present in the snapshot are moved to the archive
      • HFileLinks are created to point to the old files
      [Diagram: "Table" starts with regions R1-R4 and HFiles F11, F21, "F31 +32", F41; the "TableSnapshot" manifest lists R1, R2, R3; after the restore, F41 and "F31 +32" sit under ./.archive next to the old F31]
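The restore steps illustrated above amount to a set difference between the current table and the snapshot. The following sketch computes such a plan for the example on the slides. It is a simplified model: regions are plain dictionaries of file names, and the function and key names are hypothetical.

```python
def restore_plan(current, snapshot):
    """Compare current table state with a snapshot.

    Both arguments map region name -> set of HFile names.
    """
    remove_regions = sorted(set(current) - set(snapshot))
    create_regions = sorted(set(snapshot) - set(current))
    archive_files, link_files = {}, {}

    # Regions absent from the snapshot are removed; their files are archived.
    for region in remove_regions:
        archive_files[region] = sorted(current[region])

    for region, snap_files in snapshot.items():
        cur_files = current.get(region, set())
        extra = cur_files - snap_files      # newer files, moved to the archive
        missing = snap_files - cur_files    # old files, re-attached via links
        if extra:
            archive_files[region] = sorted(extra)
        if missing:
            link_files[region] = sorted(missing)

    return {
        "remove_regions": remove_regions,
        "create_regions": create_regions,
        "archive_files": archive_files,
        "link_files": link_files,
    }


if __name__ == "__main__":
    current = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31+32"}, "R4": {"F41"}}
    snapshot = {"R1": {"F11"}, "R2": {"F21"}, "R3": {"F31"}}
    print(restore_plan(current, snapshot))
```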
  • 41. Restore failures • The table being restored is disabled • META and HDFS operations may fail (network issue, server down, …) • hbck can’t repair an incomplete restore... • Restore again!
  • 42. Export Snapshot • Copy a full snapshot to another cluster • All required HFiles and metadata • Lots of options • Fancy dist-cp • Must resolve HFileLinks • Faster than CopyTable or table export+import! • Minimal impact on running cluster
  • 44. Online snapshots • Take a snapshot without making the table unavailable • No need to disable the table • Continue accepting reads and writes from clients • Challenges • Coordinating Region Servers • Data is in memory • Consistency
  • 45. Offline vs Online Snapshots [Diagram, two panels. Offline: master; write manifest per region; verify. Online: master; RS1, RS2, RS3, RS4; snapshot region subprocedure; write manifest per region; verify]
  • 46. Online Snapshots • Each region can have data in the memstore and HLog that is not yet in an HFile • The snapshot is missing in-memory data! [Diagram: ./.hbase-snapshots, ./.archive, Table (R1, R2, R3; files F11, F21, F31; a memstore per region), TableSnapshot manifest (R1, R2, R3)]
  • 47-48. Online Snapshots • Flush so that all in-memory data is written to an HFile • Then add it to the snapshot manifest [Diagram: ./.hbase-snapshots, ./.archive, Table (R1, R2, R3; files F11, F21, F31 plus flushed files F13, F23, F33), TableSnapshot manifest (R1, R2, R3)]
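The flush-then-record step can be sketched with a toy region model. This is an illustration only: the `Region` class, the flush file naming, and both snapshot functions are hypothetical and do not reflect HBase internals. It contrasts a manifest written without flushing, which misses memstore data, with one written after each region flushes.

```python
class Region:
    """Toy region: a list of on-disk files plus an in-memory write buffer."""

    def __init__(self, name, hfiles, memstore):
        self.name = name
        self.hfiles = list(hfiles)
        self.memstore = dict(memstore)
        self._flushes = 0

    def flush(self):
        # Write buffered rows out as a new file, then empty the buffer.
        if not self.memstore:
            return
        self._flushes += 1
        self.hfiles.append(f"{self.name}-flush{self._flushes}")
        self.memstore.clear()


def snapshot_without_flush(regions):
    # Records only files already on disk; buffered rows are invisible.
    return {r.name: list(r.hfiles) for r in regions}


def flush_snapshot(regions):
    # Flush each region first, then record its file list in the manifest.
    manifest = {}
    for r in regions:
        r.flush()
        manifest[r.name] = list(r.hfiles)
    return manifest


regions = [Region("R1", ["F11"], {"row1": "v1"}), Region("R2", ["F21"], {})]
before = snapshot_without_flush(regions)
after = flush_snapshot(regions)
```

Here `before` lists only `F11` for R1, while `after` also lists the newly flushed file, so the buffered row is covered by the manifest.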
  • 49. Consistency • Offline Snapshots • Fully consistent snapshot • Online Flush Snapshot • “CopyTable” level consistency with a much smaller window. • Time bounded by slowest region server and region flush
  • 50. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible [Diagram: Master, RS1, RS2, and Client; Table (R1, R2, R3; files F11, F21, F31, F13; memstores on two regions); TableSnapshot manifest (R1, R2, R3); a flush snapshot (“Flush SS”) in progress]
  • 51. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible [Diagram: Master, RS1, RS2, and Client; Table (R1, R2, R3; files F11, F21, F31, F13; memstores); TableSnapshot manifest (R1, R2, R3); the client issues Put A … then Put B]
  • 52. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is possible with Flush Snapshots [Diagram: Master, RS1, RS2, and Client; Table (R1, R2, R3; files F11, F21, F31, F13, F23, F33; one remaining memstore); TableSnapshot manifest (R1, R2, R3); after the flush snapshot, Put B is in but Put A is not!]
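The "B and not A" outcome can be reproduced with a small simulation. This is a sketch under simplifying assumptions: `RegionServer` is a hypothetical class, each server captures its rows at its own moment, and the event order is chosen by hand to match the scenario on the slides (RS1 captures its state before Put A arrives, RS2 captures after Put B).

```python
class RegionServer:
    """Toy region server that records the rows present when it snapshots."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.captured = None

    def put(self, row):
        self.rows.append(row)

    def flush_snapshot(self):
        # Capture everything written to this server so far.
        self.captured = list(self.rows)


rs1, rs2 = RegionServer("RS1"), RegionServer("RS2")

# The servers do not snapshot at the same instant.
rs1.flush_snapshot()   # RS1 captures its state first
rs1.put("A")           # client: Put A lands on RS1 after RS1's capture
rs2.put("B")           # client: ... then Put B lands on RS2
rs2.flush_snapshot()   # RS2 captures later, so it includes B

snapshot_rows = set(rs1.captured) | set(rs2.captured)
```

The client wrote A before B, yet the combined snapshot holds only B. A causally consistent snapshot would have to contain A whenever it contains B.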
  • 53. Online snapshot attempts can fail • If an involved RS fails, the snapshot attempt fails. • Needs a way to prevent other table metadata operations • Table Metadata Locks (0.95+) • Avoid many snapshot failures caused by conflicts (e.g., online schema changes, splits) • A failed attempt reports errors; the user must retry. • o.a.h.hbase.snapshot.HBaseSnapshotException • o.a.h.hbase.snapshot.CorruptedSnapshotException
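Since a failed attempt must be retried by the caller, a client-side wrapper might look like the sketch below. Everything here is hypothetical: `SnapshotFailed` merely stands in for the snapshot exceptions named above, and `take_snapshot` is any callable supplied by the caller, not an HBase API.

```python
class SnapshotFailed(Exception):
    """Stand-in for a snapshot failure reported to the client."""


def snapshot_with_retry(take_snapshot, attempts=3):
    """Call take_snapshot() until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return take_snapshot()
        except SnapshotFailed as err:
            last_error = err  # remember the failure and try again
    raise last_error


# Example: a snapshot call that fails twice (say, a region server died)
# and succeeds on the third attempt.
calls = {"count": 0}

def flaky_snapshot():
    calls["count"] += 1
    if calls["count"] < 3:
        raise SnapshotFailed("region server unavailable")
    return "TableSnapshot"

result = snapshot_with_retry(flaky_snapshot, attempts=5)
```

A real caller would likely add a delay between attempts and give up on errors that indicate a corrupted snapshot rather than a transient failure.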
  • 54. Development Notes How we collaborated, built, and tested
  • 55. Table Snapshots Development • Developed in a branch off of trunk; since merged, and now in 0.95 and trunk. • Feature is too big to include as a single patch • Does not destabilize trunk • Does not slow time-based release trains • Later backported to 0.94.6.1 [Diagram: src branch synced from trunk, then reintegrated into trunk]
  • 56. System testing with Jenkins • Concurrently load data while taking snapshots • Inject compactions; kill RSs, the META RS, and the Master • Create snapshot clones of the snapshots • Inject compactions; kill RSs, the META RS, and the Master
  • 57. Future Work: • Alternative semantics and implementations • Log Roll Snapshot (HBASE-7291) • Store logs and replay on restore • Faster for snapshot, slower and more complicated for restore. • Timestamp Snapshot (HBASE-6866) • All updates before ts are in the snapshot, all after are not • Longer pause before the snapshot is taken • Globally-Consistent Snapshot (HBASE-6867) • Global write lock for all region nodes until the snapshot completes. • Expensive • Repair tools • Manual repairs necessary (hbck does not support this yet)
  • 59. Feature Summary by Version Apache 0.92.x Apache <0.94.6.1 Apache 0.94.6.1+ Apache 0.95.0 Apache 0.96.0 Copy Table Copy Table Copy Table Import / Export Import / Export Import / Export Offline snapshots Offline snapshots Flush Online Snapshot Flush Online Snapshot Table Locks
  • 60. Key Contributors • Jesse Yates (Salesforce.com) • HFileArchiver, Offline Snapshot, first draft online • Matteo Bertozzi (Cloudera) • HFileLink, Restore, Clone, Testing, 0.94 backport • Jonathan Hsieh (Cloudera) • Online Snapshots revamp, Testing, Branch Shepherd • Ted Yu (Hortonworks) • Reviews • Enis Soztutar (Hortonworks) • Table Locks on Snapshots
  • 61. Thanks! Questions? Matteo Bertozzi @th30z matteo.bertozzi@cloudera.com Jonathan Hsieh @jmhsieh jon@cloudera.com Jesse Yates @jesse_yates jesse.k.yates@gmail.com

Editor's Notes

  1. Talk about everything! Don’t gloss over internals
  2. Why does it cause extra latency? What “crushes” the cluster?
  3. Causes more expensive backups. Have a bunch of ‘write optimized files’ (HLogs) and have to convert them to ‘read optimized files’ (HFiles). This isn’t a cheap process.
  4. Don’t oversell! Just say what it is.
  5. Add a quick summary of what was just talked about BEFORE handoff!!!