The Current State of Apache HBase
What Has Become of the HBase Once Called a Volcano?
Cloudera
Hadoop / Spark Conference Japan 2019
• Apache HBase Committer
• Cloudera
• Sr. Software Engineer, Breakfix
•
•
• ( HBase/Phoenix)
• HBase
• Twitter: @brfrn169
• HBase
• HBase
• HBase
HBase
[Figure: release branches and the releases shown on them. Branches: master, branch-2, branch-2.0, branch-2.1, branch-2.2, branch-1, branch-1.4, branch-1.5. Releases: 2.0.0, 2.0.4, 2.0.5, 2.1.3, 2.1.4, 2.2.0, 1.4.9, 1.5.0.]
• HBase 0.98
• HBase 1.4.9
•
• HBase 1.5.0
• HBase 1
• HBase 2 HBase 2.1.x
HBase 2.2.0
• HBase 2
• CDH
• CDH 5.8+: HBase 1.2.0 (+ bugfixes and backports)
• CDH 6.0: HBase 2.0.1 (+ bugfixes and backports)
• CDH 6.1: HBase 2.1.1 (+ bugfixes and backports)
• HDP
• HDP 2.x: HBase 1.1.2 (+ bugfixes and backports)
• HDP 3.x: HBase 2.0.2 (+ bugfixes and backports)
HBase
HBase
• HBase 2.x
•
• Procedure version 2
• Assignment Manager version 2
•
• Backup/Restore
•
• Compacting Memstore
•
• Serial Replication
Procedure version 2
• Master (create/drop table, region assign, split)
• Master Procedure
Procedure version 2
• Example: CreateTableProcedure
Start -> PRE_OPERATION -> WRITE_FS_LAYOUT -> ADD_TO_META -> ASSIGN_REGIONS -> UPDATE_DESC_CACHE -> POST_OPERATION -> End
Procedure
ASSIGN_REGIONS
Region
Procedure
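The CreateTableProcedure states listed above can be read as a small state machine. The following is a minimal sketch, assuming that each completed state is recorded in a durable store so that a restarted Master can resume from the first unfinished state. `ProcedureStore`, `run_procedure`, and the `fail_at` parameter are hypothetical names used only for illustration; they are not the HBase API.

```python
from enum import Enum


class CreateTableState(Enum):
    # State names taken from the slide, in execution order.
    PRE_OPERATION = 1
    WRITE_FS_LAYOUT = 2
    ADD_TO_META = 3
    ASSIGN_REGIONS = 4
    UPDATE_DESC_CACHE = 5
    POST_OPERATION = 6


class ProcedureStore:
    """Hypothetical stand-in for a durable store; here it is only a dict."""

    def __init__(self):
        self.completed = {}  # proc_id -> list of completed state names

    def record(self, proc_id, state):
        self.completed.setdefault(proc_id, []).append(state.name)

    def done(self, proc_id):
        return list(self.completed.get(proc_id, []))


def run_procedure(proc_id, store, fail_at=None):
    """Run the remaining states in order.

    fail_at simulates a Master crash just before that state completes.
    Returns True when the procedure reaches End.
    """
    finished = set(store.done(proc_id))
    for state in CreateTableState:
        if state.name in finished:
            continue  # already recorded by an earlier run
        if state is fail_at:
            return False  # simulated crash; nothing recorded for this state
        store.record(proc_id, state)
    return True


store = ProcedureStore()
first = run_procedure(1, store, fail_at=CreateTableState.ASSIGN_REGIONS)
second = run_procedure(1, store)  # resumes at ASSIGN_REGIONS
```

In this sketch the second run skips the three states recorded before the simulated crash and continues from ASSIGN_REGIONS.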
Assignment Manager version 2
• Region
• Region
• HBCK
• Region Assignment Manager version 2
• Procedure version 2
• Region Zookeeper
•
•
• Region
• Region
• Master
Backup/Restore
•
•
• hbase backup create <type> <backup_path> [options]
• hbase restore <backup_path> <backup_id> [options]
• HDFS, S3, ADLS, WASB
•
• hbase snapshot
• Write Ahead Log (WAL)
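The two command forms above can be assembled from a script. This is a minimal sketch, assuming the backup `<type>` is either `full` or `incremental` (an assumption; the slide only shows `<type>`). The function names, the backup path, and the backup ID are hypothetical, and no `[options]` are filled in.

```python
def backup_create_command(backup_type, backup_path):
    """Build the argument list for: hbase backup create <type> <backup_path>."""
    if backup_type not in ("full", "incremental"):  # assumed set of types
        raise ValueError("unexpected backup type: " + backup_type)
    return ["hbase", "backup", "create", backup_type, backup_path]


def restore_command(backup_path, backup_id):
    """Build the argument list for: hbase restore <backup_path> <backup_id>."""
    return ["hbase", "restore", backup_path, backup_id]


# Hypothetical destination and backup ID, for illustration only.
create_args = backup_create_command("full", "hdfs://namenode:8020/backups")
restore_args = restore_command("hdfs://namenode:8020/backups", "backup_0000000000000")
```

The argument lists could be passed to `subprocess.run` on a host where the `hbase` command is available.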
Compacting Memstore
• Compacting Memstore
• in-memory flush
•
• in-memory compaction
•
• Flush
• Compaction
Compacting Memstore
• Default Memstore
[Figure sequence: writes go to the Active segment. When a flush starts, the Active segment becomes a Snapshot and a new Active segment takes the writes; the Snapshot is flushed to HDFS as an HFile. Once several HFiles have accumulated on HDFS, compaction merges them into one HFile.]
Compacting Memstore
• Compacting Memstore
[Figure sequence: writes go to the Active segment. An in-memory flush moves the Active segment into the Pipeline as a new segment (Pipeline #1, #2, #3) instead of writing it to HDFS. In-memory compaction merges the Pipeline segments into a single segment. Eventually a Pipeline segment is flushed to HDFS as an HFile.]
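The Active/Pipeline behavior can be imitated with plain dictionaries. This is a minimal sketch, not the HBase implementation: the class name, the two size limits, and the choice to flush the oldest Pipeline segment are assumptions made for illustration.

```python
class CompactingMemstoreSketch:
    def __init__(self, active_limit=2, pipeline_limit=3):
        self.active_limit = active_limit      # entries before an in-memory flush
        self.pipeline_limit = pipeline_limit  # segments before in-memory compaction
        self.active = {}    # receives writes
        self.pipeline = []  # immutable segments, newest first
        self.hfiles = []    # stand-in for HFiles on HDFS, oldest first

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.active_limit:
            self.in_memory_flush()

    def in_memory_flush(self):
        # Move the Active segment into the Pipeline; nothing is written to disk.
        self.pipeline.insert(0, self.active)
        self.active = {}
        if len(self.pipeline) >= self.pipeline_limit:
            self.in_memory_compaction()

    def in_memory_compaction(self):
        # Merge oldest to newest so that newer values overwrite older ones.
        merged = {}
        for segment in reversed(self.pipeline):
            merged.update(segment)
        self.pipeline = [merged]

    def flush_to_disk(self):
        # Write the oldest Pipeline segment out as a sorted "HFile".
        if self.pipeline:
            oldest = self.pipeline.pop()
            self.hfiles.append(dict(sorted(oldest.items())))

    def get(self, key):
        # Search newest data first: Active, Pipeline, then HFiles.
        for segment in [self.active] + self.pipeline + list(reversed(self.hfiles)):
            if key in segment:
                return segment[key]
        return None


store = CompactingMemstoreSketch()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6)]:
    store.put(key, value)
segments_after_compaction = len(store.pipeline)
store.flush_to_disk()
```

In this run six writes end up as four entries in one segment, because the overwritten values of `a` and `b` are dropped in memory before anything reaches disk.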
Serial Replication
• HBase Replication
•
•
• Push
• ( ) ( )
Push
• MySQL Pull
Serial Replication
• HBase Replication
[Figure sequence: in Cluster 1, the ReplicationSource on a RegionServer tails the WALs (WAL1, WAL2), takes entries 1, 2, 3, 4 from its Queue in order, and pushes them asynchronously to the ReplicationSink on a RegionServer in Cluster 2. The ReplicationSink applies them through HTable to the RegionServers of Cluster 2, which receive 1, 2, 3 in order.]
Serial Replication
• HBase Replication
• RegionServer Region move
Push
Serial Replication
• HBase Replication
[Figure sequence: RegionServer 1 in Cluster 1 has entries 1, 2, 3 in its Queue. The Region is then moved to RegionServer 2, which gets entry 4 in its own Queue. Both ReplicationSources push to the ReplicationSink in Cluster 2 independently: RegionServer 1 has pushed only 1 and 2 when RegionServer 2 pushes 4, so Cluster 2 receives 4 before 3. Inconsistent State!]
Serial Replication
• Serial Replication
• Serial Replication
Serial Replication
• HBase
[Figure sequence: a Client sends a Put to a Region on a RegionServer. The Region assigns a Sequence ID to the data (Cell) before writing the WAL, then writes the entry to the WAL (on HDFS) and to the MemStore. Successive Puts receive Sequence IDs 1, 2, 3, 4.]
Serial Replication
• Sequence ID
• Region (Cell)
• Multi Version Concurrency Control (MVCC)
• Serial Replication
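The ordering of the write path can be sketched in a few lines: a Sequence ID is taken first, then the entry is appended to the WAL, then it is applied to the MemStore. This is a minimal sketch under the assumption that Sequence IDs are a per-Region counter starting at 1; `RegionSketch` and its fields are hypothetical names, not HBase classes.

```python
import itertools


class RegionSketch:
    def __init__(self):
        self._next_seq_id = itertools.count(1)  # per-Region, monotonically increasing
        self.wal = []        # stand-in for the Write Ahead Log
        self.memstore = {}   # row -> (sequence ID, value)

    def put(self, row, value):
        seq_id = next(self._next_seq_id)        # assigned before the WAL write
        self.wal.append((seq_id, row, value))   # then appended to the WAL
        self.memstore[row] = (seq_id, value)    # then applied to the MemStore
        return seq_id


region = RegionSketch()
seq_ids = [region.put("row1", "a"), region.put("row2", "b"), region.put("row1", "c")]
```

Because every WAL entry carries its Sequence ID, a reader of the WAL can tell the order in which the Region accepted the writes.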
Serial Replication
[Figure sequence: the Region moves from RegionServer 1 (entries 1, 2, 3 in its Queue) to RegionServer 2 (entry 4 in its Queue), now with ZooKeeper and hbase:meta in the picture. hbase:meta holds the sequence of open sequence numbers for the region (3 in this example, recorded when the Region moves). ZooKeeper holds the last pushed Sequence ID, updated to 1, 2, and then 3 as RegionServer 1 pushes its entries. RegionServer 2 waits ("Wait") to push entry 4 until the last pushed Sequence ID reaches 3, then proceeds ("Go"), so Cluster 2 receives 1, 2, 3, 4 in order.]
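The wait-then-go rule in the figures can be expressed as one check before each push. This is a minimal sketch, assuming a single Region, one recorded value per Region move, and a single last pushed Sequence ID. The class and method names are hypothetical, and the two fields only stand in for what the slides place in hbase:meta and ZooKeeper.

```python
class SerialReplicationSketch:
    def __init__(self):
        self.barriers = []    # stand-in for hbase:meta: last Sequence ID before each move
        self.last_pushed = 0  # stand-in for ZooKeeper: the last pushed Sequence ID
        self.sink = []        # order in which the peer cluster receives entries

    def region_moved(self, last_seq_id_before_move):
        self.barriers.append(last_seq_id_before_move)

    def can_push(self, seq_id):
        # An entry written after a move must wait until everything written
        # before that move has been pushed.
        return all(self.last_pushed >= b for b in self.barriers if b < seq_id)

    def try_push(self, seq_id):
        if not self.can_push(seq_id):
            return False  # "Wait"
        self.sink.append(seq_id)  # "Go"
        self.last_pushed = max(self.last_pushed, seq_id)
        return True


repl = SerialReplicationSketch()
repl.region_moved(3)  # entries 1..3 stay with RegionServer 1
attempts = [
    repl.try_push(4),  # RegionServer 2 tries first and must wait
    repl.try_push(1),  # RegionServer 1
    repl.try_push(2),  # RegionServer 1
    repl.try_push(4),  # still waiting: last pushed is 2
    repl.try_push(3),  # RegionServer 1
    repl.try_push(4),  # now allowed
]
```

Without the `can_push` check, the first attempt would deliver 4 ahead of 3, which is the inconsistent state shown earlier in the deck.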
• Procedure version 2 / Assignment Manager version 2
•
• Backup/Restore
• Compacting Memstore
• Serial Replication Replication
HBase
HBase
• Evolving HBase in the Cloud
• HBase
• HBase on Persistent Memory
• HBase Persistent Memory
• Synchronous Replication
•
•
•
Evolving HBase in the Cloud
• HBASE-20951 Ratis LogService backed WALs
• IaaS (Amazon EC2, Google Compute Engine, Microsoft Azure Compute) HBase
• IaaS HBase
Evolving HBase in the Cloud
• IaaS
• Amazon EC2
•
AWS
• HDFS
• DataNode
• AWS
Evolving HBase in the Cloud
• Amazon EBS (Elastic Block Store), Google Persistent Disk
• Amazon EBS (Elastic Block Store), Google Persistent Disk
• Amazon S3, Google Cloud Storage
•
• Amazon EBS, Google Persistent Disk
Evolving HBase in the Cloud
• HBase HFile WAL HDFS
• HFile
• WAL short-lived, sub-second durability requirements
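[Figure: Puts arrive at the RegionServer and go to the WAL and the Memstore; a Flush writes the Memstore out as HFiles. The WAL and the HFiles are both stored on HDFS.]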
Evolving HBase in the Cloud
• HFile (S3 with S3Guard)
• WAL
• WAL
• sub-second durability requirements
• WAL
• traversable queue (FIFO)
• constant-time append complexity
• linear-time traversal
• sub-linear seek to an arbitrary offset
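The four properties listed for the WAL can be illustrated with an in-memory log. This is a minimal sketch of those properties only; it has no durability or replication, and `LogSketch` with its byte-offset scheme is a hypothetical illustration rather than the Ratis LogService API.

```python
import bisect


class LogSketch:
    def __init__(self):
        self._offsets = []   # starting byte offset of each record, ascending
        self._payloads = []  # record bodies in append order (FIFO)
        self._end = 0        # offset at which the next record will start

    def append(self, payload):
        """Amortized constant-time append; returns the record's offset."""
        offset = self._end
        self._offsets.append(offset)
        self._payloads.append(payload)
        self._end += len(payload)
        return offset

    def __iter__(self):
        """Linear-time traversal in append order."""
        return iter(self._payloads)

    def read_from(self, offset):
        """Binary search (sub-linear) for the first record starting at or after offset."""
        start = bisect.bisect_left(self._offsets, offset)
        return self._payloads[start:]


log = LogSketch()
offsets = [log.append(b"aa"), log.append(b"bbb"), log.append(b"c")]
tail = log.read_from(2)
```

The sorted offset list is what makes the seek sub-linear; the traversal itself never needs it.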
Evolving HBase in the Cloud
• Apache Ratis
• Apache Software Foundation
• RAFT Java
• Apache Hadoop Ozone
• Ratis Kafka DistributedLog
• HBase WAL
•
• Ratis
• WAL Ratis
Evolving HBase in the Cloud
• Ratis WAL Ratis LogService Ratis
• WAL HBase
• 2
1. Ratis LogService (RATIS-271)
2. HBase WAL (HBASE-20952)
• HDFS HDFS WAL 1
• Ratis LogService Kafka DistributedLog
Evolving HBase in the Cloud
•
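[Figure: RegionServer1, RegionServer2, and RegionServer3. Puts go to the Memstore and, through the New WAL API, to the Ratis LogService, which keeps WAL Storage on each of the three RegionServers and replicates it with RAFT. A Flush writes the Memstore out as HFiles on Amazon S3/Google Cloud Storage.]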
The Current State of Apache HBase: What Has Become of the HBase Once Called a Volcano?