
Apache HBase Today - What Has Become of the HBase Once Called a Volcano


  1. Apache HBase Today - What Has Become of the HBase Once Called a Volcano • Cloudera • Hadoop / Spark Conference Japan 2019
  2. Apache HBase Committer • Cloudera • Sr. Software Engineer, Breakfix • (HBase/Phoenix) • HBase • Twitter: @brfrn169
  3. • HBase • HBase • HBase
  4. HBase
  5. [Diagram: HBase branches and releases] Branches: master, branch-2, branch-2.2, branch-2.1, branch-2.0, branch-1, branch-1.5, branch-1.4 • Releases shown: 2.2.0, 2.1.4, 2.1.3, 2.0.5, 2.0.4, 2.0.0, 1.5.0, 1.4.9
  6. • HBase 0.98 • HBase 1.4.9 • HBase 1.5.0 • HBase 1 • HBase 2 HBase 2.1.x HBase 2.2.0 • HBase 2
  7. • CDH • CDH 5.8+: HBase 1.2.0 (+ bugfixes and backports) • CDH 6.0: HBase 2.0.1 (+ bugfixes and backports) • CDH 6.1: HBase 2.1.1 (+ bugfixes and backports) • HDP • HDP 2.x: HBase 1.1.2 (+ bugfixes and backports) • HDP 3.x: HBase 2.0.2 (+ bugfixes and backports)
  8. HBase
  9. HBase • HBase 2.x • Procedure version 2 • Assignment Manager version 2 • Backup/Restore • Compacting Memstore • Serial Replication
  10. Procedure version 2 • Master (create/drop table, region assign, split) • Master Procedure
  11. Procedure version 2 • Example: CreateTableProcedure [Diagram, slides 11-14: Start -> PRE_OPERATION -> WRITE_FS_LAYOUT -> ADD_TO_META -> ASSIGN_REGIONS -> UPDATE_DESC_CACHE -> POST_OPERATION -> End. The final frame annotates the ASSIGN_REGIONS step with per-Region Procedures.]
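
The CreateTableProcedure states listed above can be pictured as a resumable state machine. The following is a minimal sketch in Python (HBase itself is written in Java), assuming a hypothetical in-memory `store` dict that stands in for the Master's durable procedure storage. The function name, the `crash_before` parameter, and the numeric enum values are illustrative assumptions and are not HBase APIs.

```python
from enum import Enum


class CreateTableState(Enum):
    # State names are taken from the slide; the numeric values are arbitrary.
    PRE_OPERATION = 1
    WRITE_FS_LAYOUT = 2
    ADD_TO_META = 3
    ASSIGN_REGIONS = 4
    UPDATE_DESC_CACHE = 5
    POST_OPERATION = 6


def run_create_table(store, proc_id, crash_before=None):
    """Run (or resume) a hypothetical create-table procedure.

    `store` maps proc_id -> list of completed state names and stands in
    for durable procedure storage. `crash_before` simulates a Master
    failure just before the given state executes.
    """
    done = store.setdefault(proc_id, [])
    for state in CreateTableState:
        if state.name in done:
            continue  # completed before the crash, so skip it on resume
        if state is crash_before:
            return False  # simulated crash; progress so far stays in `store`
        # ... the real work for this step would happen here ...
        done.append(state.name)  # record progress after each step
    return True


store = {}
# First attempt stops before ASSIGN_REGIONS; three steps are recorded.
run_create_table(store, 1, crash_before=CreateTableState.ASSIGN_REGIONS)
# A second run resumes at ASSIGN_REGIONS instead of starting over.
run_create_table(store, 1)
```

The point of the sketch is only that recording each completed step lets a restarted run continue from where the previous one stopped.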
  15. Assignment Manager version 2 • Region • Region • HBCK • Region Assignment Manager version 2 • Procedure version 2 • Region Zookeeper • Region • Region • Master
  16. Backup/Restore • hbase backup create <type> <backup_path> [options] • hbase restore <backup_path> <backup_id> [options] • HDFS, S3, ADLS, WASB • hbase snapshot • Write Ahead Log (WAL)
  17. Compacting Memstore • Compacting Memstore • in-memory flush • in-memory compaction • Flush • Compaction
  18. Compacting Memstore • Default Memstore [Diagram, slides 18-23: a Write goes to the Active segment; the Active segment becomes a Snapshot and a new Active segment takes its place; Flush writes the Snapshot to HDFS as an HFile; Compaction merges the HFiles on HDFS into one HFile.]
  24. Compacting Memstore • Compacting Memstore [Diagram, slides 24-31: a Write goes to the Active segment; an in-memory flush moves the Active segment into the Pipeline (#1) and a new Active segment takes its place; further segments accumulate in the Pipeline (#1, #2, #3); in-memory compaction merges the Pipeline segments; Flush later writes Pipeline segments to HDFS as an HFile.]
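
The Compacting Memstore flow above (Active segment, Pipeline, in-memory flush, in-memory compaction, Flush) can be modelled with a toy class. This is a sketch under stated assumptions, not HBase's implementation: the class name, both thresholds, and the one-value-per-key simplification (real HBase keeps multiple cell versions) are all hypothetical.

```python
class CompactingMemstoreSketch:
    """Toy model of the flow in the diagram; not HBase's implementation."""

    def __init__(self, active_limit=2, pipeline_limit=2):
        self.active_limit = active_limit      # hypothetical threshold (cells)
        self.pipeline_limit = pipeline_limit  # hypothetical threshold (segments)
        self.active = {}     # mutable segment that receives writes
        self.pipeline = []   # immutable in-memory segments, oldest first
        self.hfiles = []     # stands in for HFiles on HDFS

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.active_limit:
            self._in_memory_flush()

    def _in_memory_flush(self):
        # Move the active segment into the pipeline; nothing goes to disk.
        self.pipeline.append(self.active)
        self.active = {}
        if len(self.pipeline) > self.pipeline_limit:
            self._in_memory_compaction()

    def _in_memory_compaction(self):
        # Merge pipeline segments into one; newer values overwrite older ones.
        merged = {}
        for segment in self.pipeline:  # oldest first, so later updates win
            merged.update(segment)
        self.pipeline = [merged]

    def flush(self):
        # Write the pipeline contents out as a single "HFile".
        merged = {}
        for segment in self.pipeline:
            merged.update(segment)
        self.pipeline = []
        if merged:
            self.hfiles.append(dict(sorted(merged.items())))


m = CompactingMemstoreSketch()
for key, value in [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("a", 3), ("d", 1)]:
    m.put(key, value)
# Six puts, no disk flush yet: the pipeline holds one merged segment.
m.flush()  # one "HFile" with four cells instead of six
```

In this toy run the three writes to key `a` collapse to one cell in memory, which illustrates why fewer and smaller files would reach disk.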
  32. Serial Replication • HBase Replication • Push • MySQL Pull
  33. Serial Replication • HBase Replication [Diagram, slides 33-45: in Cluster 1, a RegionServer holds WAL1 and WAL2 and a Queue of edits 1, 2, 3, 4; its ReplicationSource tails the WALs and pushes asynchronously to a ReplicationSink on a RegionServer in Cluster 2, which applies the edits through HTable to the RegionServers there. Edits 1, 2, and 3 are pushed one at a time, in order.]
  46. Serial Replication • HBase Replication • RegionServer Region move Push
  47. Serial Replication • HBase Replication [Diagram, slides 47-65: Cluster 1 has two RegionServers, each with its own Queue and ReplicationSource. RegionServer 1 queues edits 1, 2, 3; the Region is moved ("Move the Region") to RegionServer 2, which then queues edit 4. Both ReplicationSources push to the ReplicationSink in Cluster 2. Edits 1 and 2 arrive, then edit 4 arrives while edit 3 has not yet been pushed: "Inconsistent State!"]
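
The ordering problem in these frames can be reproduced with a short simulation. This is a hedged sketch: `push_independently`, the queue names `rs1` and `rs2`, and the explicit `schedule` list (which source gets to push next) are hypothetical devices for illustration and do not correspond to HBase classes.

```python
from collections import deque


def push_independently(queues, schedule):
    """Apply edits at the sink in whatever order the sources push them.

    `queues` maps a source name to its edits (Sequence IDs, in WAL order).
    `schedule` lists which source pushes next; no coordination is done.
    """
    pending = {name: deque(edits) for name, edits in queues.items()}
    applied = []
    for name in schedule:
        if pending[name]:
            applied.append(pending[name].popleft())
    return applied


# RegionServer 1 still holds edit 3 when RegionServer 2 pushes edit 4.
order = push_independently(
    {"rs1": [1, 2, 3], "rs2": [4]},
    ["rs1", "rs1", "rs2", "rs1"],
)
print(order)  # [1, 2, 4, 3]
```

Each source preserves its own order, yet the sink sees edit 4 before edit 3, which is the situation the slide labels as an inconsistent state.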
  66. Serial Replication • Serial Replication • Serial Replication
  67. Serial Replication • HBase [Diagram, slides 67-73: a Client sends a Put to a Region on a RegionServer; the RegionServer assigns a Sequence ID to the data (Cell) before writing the WAL; the edit is then written to the WAL (on HDFS) and to the MemStore. After four Puts, both the WAL and the MemStore hold edits 1, 2, 3, 4.]
  74. Serial Replication • Sequence ID • Region (Cell) • Multi Version Concurrency Control (MVCC) • Serial Replication
  75. Serial Replication [Diagram, slides 75-105: the same two-RegionServer setup, now with Zookeeper and hbase:meta shown as shared state. RegionServer 1 queues edits 1, 2, 3; when the Region is moved, "the sequence of open sequence numbers for the region" is recorded (3); RegionServer 2 then queues edit 4. As RegionServer 1 pushes edits 1 and 2, "the last pushed Sequence ID" is updated to 1, then 2. RegionServer 2 must "Wait" with edit 4 until RegionServer 1 has pushed edit 3 and the last pushed Sequence ID is 3; then it may "Go" and push edit 4. The sink receives 1, 2, 3, 4 in order.]
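
The Wait/Go behaviour in these frames can be sketched as a gate on the last pushed Sequence ID. This is a simplified model that follows the slides rather than HBase's actual code: `push_serially`, the `barrier` argument (the last Sequence ID written before the Region moved, 3 in the slides), and the single `last_pushed` variable standing in for the shared record are all assumptions for illustration.

```python
from collections import deque


def push_serially(queues, schedule, barrier):
    """Simulate pushes where edits after `barrier` wait for earlier ones.

    `barrier` is the last Sequence ID written before the Region moved.
    `last_pushed` models the shared "last pushed Sequence ID" record.
    """
    pending = {name: deque(edits) for name, edits in queues.items()}
    applied = []
    last_pushed = 0
    for name in schedule:
        if not pending[name]:
            continue
        seq_id = pending[name][0]
        if seq_id > barrier and last_pushed < barrier:
            continue  # "Wait": earlier edits have not all been pushed yet
        applied.append(pending[name].popleft())  # "Go"
        last_pushed = seq_id
    return applied


# RegionServer 2 tries to push edit 4 early, is held back, then proceeds.
order = push_serially(
    {"rs1": [1, 2, 3], "rs2": [4]},
    ["rs1", "rs1", "rs2", "rs1", "rs2"],
    barrier=3,
)
print(order)  # [1, 2, 3, 4]
```

The trade-off visible even in this toy is that the new RegionServer's pushes stall until the old one drains its queue, in exchange for the sink seeing edits in Sequence ID order.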
  106. • Procedure version 2 / Assignment Manager version 2 • Backup/Restore • Compacting Memstore • Serial Replication Replication
  107. HBase
  108. HBase • Evolving HBase in the Cloud • HBase • HBase on Persistent Memory • HBase Persistent Memory • Synchronous Replication
  110. Evolving HBase in the Cloud • HBASE-20951 Ratis LogService backed WALs • IaaS (Amazon EC2, Google Compute Engine, Microsoft Azure Compute) HBase • IaaS HBase
  111. Evolving HBase in the Cloud • IaaS • Amazon EC2 • AWS • HDFS • DataNode • AWS
  112. Evolving HBase in the Cloud • Amazon EBS (Elastic Block Store), Google Persistent Storage • Amazon EBS (Elastic Block Store), Google Persistent Storage • Amazon S3, Google Cloud Storage • Amazon EBS, Google Persistent Storage
  113. Evolving HBase in the Cloud • HBase HFile WAL HDFS • HFile • WAL: short-lived, sub-second durability requirements [Diagram: Puts reach the RegionServer and are written to the WAL and the Memstore; Flush writes the Memstore out as HFiles; the HFiles and the WAL are stored on HDFS.]
  114. Evolving HBase in the Cloud • HFile (S3 with S3Guard) • WAL • WAL • sub-second durability requirements • WAL • traversable queue (FIFO) • constant-time append complexity • linear-time traversal • sub-linear seek to an arbitrary offset
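
The access pattern listed for the WAL (FIFO traversal, constant-time append, linear-time traversal, sub-linear seek to an arbitrary offset) can be illustrated with a small in-memory structure. This is only a sketch of the data-structure requirements: the `AppendLog` class and its methods are hypothetical, it provides no durability at all, and it is unrelated to the Ratis LogService API.

```python
from bisect import bisect_right


class AppendLog:
    """Toy in-memory log with the access pattern listed on the slide."""

    def __init__(self):
        self._records = []  # record payloads in append (FIFO) order
        self._offsets = []  # start byte offset of each record, ascending
        self._size = 0      # total bytes appended so far

    def append(self, payload: bytes) -> int:
        # Amortized constant time: two list appends and an addition.
        offset = self._size
        self._records.append(payload)
        self._offsets.append(offset)
        self._size += len(payload)
        return offset

    def seek(self, offset: int) -> int:
        # Sub-linear: binary search for the record containing `offset`.
        if not 0 <= offset < self._size:
            raise ValueError("offset out of range")
        return bisect_right(self._offsets, offset) - 1

    def traverse(self, start_index: int = 0):
        # Linear time: yield records in the order they were appended.
        for i in range(start_index, len(self._records)):
            yield self._offsets[i], self._records[i]


log = AppendLog()
for payload in (b"aaa", b"bb", b"cccc"):
    log.append(payload)
index = log.seek(4)                 # byte 4 falls inside the second record
tail = list(log.traverse(index))    # replay from that record onward
```

Keeping a sorted list of record start offsets is what makes the seek logarithmic here; a log that stored only payloads would need a linear scan.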
  115. Evolving HBase in the Cloud • Apache Ratis • Apache Software Foundation • RAFT Java • Apache Hadoop Ozone • Ratis Kafka DistributedLog • HBase WAL • Ratis • WAL Ratis
  116. Evolving HBase in the Cloud • Ratis WAL Ratis LogService Ratis • WAL HBase • 2 1. Ratis LogService (RATIS-271) 2. HBase WAL (HBASE-20952) • HDFS HDFS WAL 1 • Ratis LogService Kafka DistributedLog
  117. Evolving HBase in the Cloud [Diagram: RegionServer1, RegionServer2, and RegionServer3 each receive Puts into a Memstore and write through the New WAL API to a Ratis LogService with its own WAL Storage; the LogServices are connected by RAFT; Flush writes HFiles to Amazon S3/Google Cloud Storage.]
