
Apache HBase Today - What Has Become of HBase, Once Called a Volcano?

Slides from a talk given at Hadoop / Spark Conference Japan 2019.
http://hadoop.apache.jp/

  1. Apache HBase Today - What Has Become of HBase, Once Called a Volcano? • Cloudera • Hadoop / Spark Conference Japan 2019
  2. About the speaker • Apache HBase Committer • Cloudera: Sr. Software Engineer, Breakfix • Focus: HBase/Phoenix • Twitter: @brfrn169
  3. Agenda • HBase releases • New features in HBase 2 • The future of HBase
  4. HBase releases
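  5. [Figure: HBase release branches. Branches: master, branch-2, branch-2.0, branch-2.1, branch-2.2, branch-1, branch-1.4, branch-1.5. Versions: 2.0.0, 2.0.4, 2.0.5 (branch-2.0); 2.1.3, 2.1.4 (branch-2.1); 2.2.0 (branch-2.2); 1.4.9 (branch-1.4); 1.5.0 (branch-1.5)]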
  6. Release lines • HBase 0.98 • HBase 1: HBase 1.4.9, HBase 1.5.0 • HBase 2: HBase 2.1.x, HBase 2.2.0
  7. HBase versions in vendor distributions • CDH • CDH 5.8+: HBase 1.2.0 (+ bugfixes and backports) • CDH 6.0: HBase 2.0.1 (+ bugfixes and backports) • CDH 6.1: HBase 2.1.1 (+ bugfixes and backports) • HDP • HDP 2.x: HBase 1.1.2 (+ bugfixes and backports) • HDP 3.x: HBase 2.0.2 (+ bugfixes and backports)
  8. New features in HBase 2
  9. Main new features in HBase 2.x • Procedure version 2 • Assignment Manager version 2 • Backup/Restore • Compacting Memstore • Serial Replication
  10. Procedure version 2 • A framework for running Master operations (create/drop table, region assign, split, etc.) • Each Master operation is implemented as a Procedure
  11-14. Procedure version 2 • Example: CreateTableProcedure, a state machine that moves from Start through PRE_OPERATION → WRITE_FS_LAYOUT → ADD_TO_META → ASSIGN_REGIONS → UPDATE_DESC_CACHE → POST_OPERATION to End • In the ASSIGN_REGIONS step, the Procedure spawns further Procedures that assign the Regions
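A Procedure like CreateTableProcedure can be pictured as a state machine whose progress is persisted after every step, so that a restarted Master resumes where it stopped instead of starting over. The sketch below is a minimal Python model under that assumption; `ProcedureStore` is a hypothetical in-memory stand-in for the Master's durable procedure store, and none of these classes are HBase's actual Java API.

```python
from enum import Enum


class CreateTableState(Enum):
    # Steps of CreateTableProcedure, in the order shown on the slides.
    PRE_OPERATION = 1
    WRITE_FS_LAYOUT = 2
    ADD_TO_META = 3
    ASSIGN_REGIONS = 4
    UPDATE_DESC_CACHE = 5
    POST_OPERATION = 6
    DONE = 7


class ProcedureStore:
    """Hypothetical in-memory stand-in for a durable procedure store."""

    def __init__(self):
        self._states = {}

    def persist(self, proc_id, state):
        self._states[proc_id] = state

    def load(self, proc_id):
        # A procedure with no saved state starts from the first step.
        return self._states.get(proc_id, CreateTableState.PRE_OPERATION)


class CreateTableProcedure:
    def __init__(self, proc_id, table, regions, store):
        self.proc_id = proc_id
        self.table = table
        self.regions = regions
        self.store = store
        self.children = []   # child procedures spawned by ASSIGN_REGIONS
        self.executed = []   # steps executed by this instance

    def execute_step(self, state):
        self.executed.append(state.name)
        if state is CreateTableState.ASSIGN_REGIONS:
            # One (hypothetical) region-assign child procedure per region.
            self.children = [f"assign:{self.table}/{r}" for r in self.regions]
        return CreateTableState(state.value + 1)

    def run(self, crash_after=None):
        """Run from the last persisted state; optionally simulate a crash."""
        state = self.store.load(self.proc_id)
        while state is not CreateTableState.DONE:
            next_state = self.execute_step(state)
            # Persist progress before moving on, so a restart resumes here.
            self.store.persist(self.proc_id, next_state)
            if crash_after is state:
                raise RuntimeError(f"simulated Master crash after {state.name}")
            state = next_state
```

Running it once with `crash_after=CreateTableState.ADD_TO_META` and then again with a fresh procedure object against the same store resumes at ASSIGN_REGIONS rather than repeating the earlier steps.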
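  15. Assignment Manager version 2 • Problems with the old Assignment Manager: Region assignment could fail or become inconsistent (e.g., Regions stuck in transition), often requiring HBCK to repair • Assignment Manager version 2 reimplements Region assignment • Built on Procedure version 2 • Region state is no longer managed through ZooKeeper; the Master tracks Region state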
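  16. Backup/Restore • Full and incremental backups of HBase tables • hbase backup create <type> <backup_path> [options] • hbase restore <backup_path> <backup_id> [options] • Backup destinations: HDFS, as well as S3, ADLS and WASB • Full backups are based on hbase snapshot; incremental backups use the Write Ahead Log (WAL)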
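In HBase's backup design, full backups build on snapshots and incremental backups on WAL files. That split can be illustrated with a toy model: a full backup copies the table state, an incremental backup keeps only the WAL edits written since the previous backup, and restore applies the full image and then replays the incremental edits in Sequence ID order. This is a hypothetical Python sketch; `ToyRegion`, `Backup` and `restore` are illustrative names, not the HBase backup API.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    backup_id: str
    kind: str                                      # "full" or "incremental"
    max_seq_id: int                                # last WAL Sequence ID covered
    snapshot: dict = field(default_factory=dict)   # table image (full backups)
    edits: list = field(default_factory=list)      # (seq_id, row, value) WAL edits (incremental)


class ToyRegion:
    """Hypothetical table whose writes go to a WAL, for illustration only."""

    def __init__(self):
        self.table = {}
        self.wal = []      # (seq_id, row, value), in write order
        self.seq_id = 0

    def put(self, row, value):
        self.seq_id += 1
        self.wal.append((self.seq_id, row, value))
        self.table[row] = value

    def full_backup(self, backup_id):
        # Like a snapshot: capture the whole table as of now.
        return Backup(backup_id, "full", self.seq_id, snapshot=dict(self.table))

    def incremental_backup(self, backup_id, previous):
        # Only the WAL edits written after the previous backup.
        new_edits = [e for e in self.wal if e[0] > previous.max_seq_id]
        return Backup(backup_id, "incremental", self.seq_id, edits=new_edits)


def restore(chain):
    """Restore a full backup, then replay incremental backups oldest first."""
    full, *incrementals = chain
    if full.kind != "full":
        raise ValueError("a restore chain must start with a full backup")
    table = dict(full.snapshot)
    for backup in incrementals:
        for _, row, value in sorted(backup.edits):   # replay in Sequence ID order
            table[row] = value
    return table
```

The incremental backup stays small because it only carries edits, which is why a chain of incrementals must always be anchored on a full backup.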
  17. Compacting Memstore • A new Memstore implementation: Compacting Memstore • Performs an in-memory flush • Performs in-memory compaction • Reduces the number of Flushes to disk and of Compactions
  18-23. Compacting Memstore • Default Memstore (for comparison): writes go to the Active segment; when it fills up, it becomes a Snapshot and a new Active segment takes further writes; the Snapshot is flushed to HDFS as an HFile; as HFiles accumulate on HDFS, Compaction merges them into a single HFile
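The default Memstore cycle can be modeled with plain dicts and lists. This is an illustrative Python sketch, not HBase code: `DefaultMemstore`, the entry-count `flush_size` threshold, and list-based "HFiles" are assumptions made for brevity (real flushes are triggered by size in bytes and run in the background).

```python
class DefaultMemstore:
    """Toy model of the default Memstore write path (illustrative only)."""

    def __init__(self, flush_size):
        self.flush_size = flush_size  # hypothetical: entries before a flush
        self.active = {}              # receives all writes
        self.snapshot = None          # frozen segment being flushed
        self.hfiles = []              # immutable sorted files "on HDFS"

    def put(self, row, value):
        self.active[row] = value
        if len(self.active) >= self.flush_size:
            self.flush()

    def flush(self):
        # 1. Active becomes the Snapshot; a fresh Active takes new writes.
        self.snapshot, self.active = self.active, {}
        # 2. The Snapshot is written out as a sorted, immutable HFile.
        self.hfiles.append(sorted(self.snapshot.items()))
        self.snapshot = None

    def compact(self):
        # Merge all HFiles into one; later files win for duplicate rows.
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())]
```

Every flush produces a new file on disk, and every overwritten row survives on disk until a Compaction merges the files; this disk-level work is what the Compacting Memstore aims to reduce.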
  24-31. Compacting Memstore • Compacting Memstore: writes go to the Active segment; an in-memory flush moves the Active segment into the in-memory Pipeline (Pipeline #1) and a new Active segment takes further writes; repeated in-memory flushes add Pipeline #2 and #3; in-memory compaction merges the Pipeline segments into a single segment (Pipeline #1); only later is the Pipeline flushed to HDFS as an HFile
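In the same toy style, the Compacting Memstore keeps flushed segments in an in-memory pipeline and merges them before anything reaches disk. The thresholds (`active_size`, `pipeline_limit`) and the class name are hypothetical; HBase's actual in-memory compaction policies and triggers differ in detail.

```python
class CompactingMemstore:
    """Toy model of in-memory flush and in-memory compaction (illustrative)."""

    def __init__(self, active_size, pipeline_limit):
        self.active_size = active_size        # hypothetical: entries before an in-memory flush
        self.pipeline_limit = pipeline_limit  # hypothetical: segments before in-memory compaction
        self.active = {}
        self.pipeline = []   # immutable in-memory segments, oldest first
        self.hfiles = []

    def put(self, row, value):
        self.active[row] = value
        if len(self.active) >= self.active_size:
            self.in_memory_flush()

    def in_memory_flush(self):
        # The Active segment joins the pipeline instead of going to HDFS.
        self.pipeline.append(self.active)
        self.active = {}
        if len(self.pipeline) >= self.pipeline_limit:
            self.in_memory_compaction()

    def in_memory_compaction(self):
        # Merge pipeline segments into one; newer segments win for duplicate
        # rows, so overwritten versions are dropped before reaching disk.
        merged = {}
        for segment in self.pipeline:
            merged.update(segment)
        self.pipeline = [merged]

    def flush_to_disk(self):
        # Flush the pipeline to HDFS as one HFile; Active keeps taking writes.
        merged = {}
        for segment in self.pipeline:
            merged.update(segment)
        self.hfiles.append(sorted(merged.items()))
        self.pipeline = []
```

Six writes that touch only three distinct rows end up as one three-row HFile, whereas the default Memstore sketch would have written the overwritten versions to disk as well.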
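  32. Serial Replication • HBase Replication • Asynchronous replication between clusters • Push-based: the source cluster pushes edits to the peer cluster • By contrast, MySQL replication is Pull-based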
  33-45. Serial Replication • How HBase Replication works: in Cluster 1, the ReplicationSource on each RegionServer tails the WALs (WAL1, WAL2) and queues the edits (1, 2, 3, 4) in order, then pushes them asynchronously to a ReplicationSink on a RegionServer in Cluster 2, which writes them through an HTable; edits 1, 2 and 3 arrive one after another
  46. Serial Replication • Problem with HBase Replication • When a Region moves to another RegionServer, the order in which its edits are pushed is not guaranteed
  47-65. Serial Replication • Example: edits 1, 2 and 3 for a Region are written on RegionServer 1 and queued in its ReplicationSource; the Region then moves (Move the Region) to RegionServer 2, so edit 4 is queued in RegionServer 2's ReplicationSource; each source pushes independently, so RegionServer 1 pushes 1 and 2, then RegionServer 2 pushes 4 while 3 is still waiting in RegionServer 1's queue; the peer cluster receives the edits out of order: Inconsistent State!
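The race in these slides can be reproduced with two independent queues. In this hypothetical Python sketch, each `ReplicationSource` pushes on its own schedule, and the sink simply applies edits in arrival order (last writer wins per row, an assumption made for illustration; the class names only echo the slide labels and are not HBase's implementation).

```python
from collections import deque


class ReplicationSink:
    """Peer-cluster side: applies edits in arrival order (last writer wins)."""

    def __init__(self):
        self.table = {}
        self.applied = []   # Sequence IDs in the order they arrived

    def apply(self, seq_id, row, value):
        self.table[row] = value
        self.applied.append(seq_id)


class ReplicationSource:
    """Per-RegionServer queue of WAL edits, pushed independently of others."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, seq_id, row, value):
        self.queue.append((seq_id, row, value))

    def push_one(self, sink):
        if self.queue:
            sink.apply(*self.queue.popleft())


def region_move_without_ordering():
    rs1, rs2, sink = ReplicationSource(), ReplicationSource(), ReplicationSink()
    for seq_id in (1, 2, 3):
        rs1.enqueue(seq_id, "row", f"v{seq_id}")   # written while on RegionServer 1
    rs2.enqueue(4, "row", "v4")                     # written after the Region moved
    # The two sources run on their own schedules; here RegionServer 2 overtakes.
    rs1.push_one(sink)
    rs1.push_one(sink)
    rs2.push_one(sink)
    rs1.push_one(sink)
    return sink
```

The source cluster's latest value for the row is `v4`, but in this run the peer applies the edits as 1, 2, 4, 3 and ends with `v3`.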
  66. Serial Replication • Serial Replication solves this problem • With Serial Replication, edits are pushed to the peer cluster in the order they were written
  67-73. Serial Replication • HBase write path: a Client sends a Put to the RegionServer hosting the Region; the Region assigns a Sequence ID to the data (Cell) before writing the WAL, then writes the edit to the WAL on HDFS and to the MemStore; successive Puts receive Sequence IDs 1, 2, 3, 4
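  74. Serial Replication • Sequence IDs • Increase monotonically within a Region and are assigned to every data item (Cell) • Also used for Multi Version Concurrency Control (MVCC) • Serial Replication relies on Sequence IDs to push edits in order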
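The write path and the role of Sequence IDs can be sketched as follows: the Region hands out increasing Sequence IDs, logs each edit to the WAL before applying it to the MemStore, and a reader with a read point sees only edits at or below it, which is the MVCC idea in miniature. This is a simplified, hypothetical model; real HBase also coordinates when the MVCC read point advances, which is omitted here.

```python
import itertools


class Region:
    """Simplified write path: Sequence ID -> WAL -> MemStore (hypothetical model)."""

    def __init__(self):
        self._seq_ids = itertools.count(1)   # per-Region, monotonically increasing
        self.wal = []                        # stands in for the WAL on HDFS
        self.memstore = {}                   # row -> list of (seq_id, value)

    def put(self, row, value):
        seq_id = next(self._seq_ids)                 # 1. assign before writing the WAL
        self.wal.append((seq_id, row, value))        # 2. append to the WAL
        self.memstore.setdefault(row, []).append((seq_id, value))  # 3. apply to MemStore
        return seq_id

    def get(self, row, read_point=None):
        """Newest value with seq_id <= read_point (MVCC-style visibility)."""
        visible = [(s, v) for s, v in self.memstore.get(row, [])
                   if read_point is None or s <= read_point]
        return max(visible)[1] if visible else None
```

Because the WAL order and the Sequence IDs agree within a Region, a replication source can use the Sequence ID alone to tell which of a Region's edits came first, even after the Region has moved.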
  75-105. Serial Replication • How it works: edits 1, 2 and 3 for the Region are queued on RegionServer 1; the Region then moves (Move the Region) to RegionServer 2, where edit 4 is queued • The replication state is kept in ZooKeeper and hbase:meta: when the Region moves, the sequence of open sequence numbers for the Region is recorded as a barrier (3 in the figure), and after each push the last pushed Sequence ID is updated • RegionServer 1 pushes 1 and then 2 • Edit 4 on RegionServer 2 lies past the barrier, so it must Wait until the last pushed Sequence ID reaches 3 • After RegionServer 1 pushes 3, RegionServer 2 can Go and pushes 4, so the peer cluster applies 1, 2, 3, 4 in order
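The barrier check can be sketched as below. This is a simplified, hypothetical Python model: the barrier is taken to be the last Sequence ID written before the move (as in the figure), and an edit past a barrier may be pushed only once the last pushed Sequence ID has reached that barrier. The real implementation's storage layout and checks are more involved, and `ReplicationState`, `Sink` and `SerialSource` are illustrative names.

```python
from collections import deque


class ReplicationState:
    """Shared state (hypothetical): per-Region barriers and last pushed Sequence ID."""

    def __init__(self):
        self.barriers = {}      # region -> list of barrier Sequence IDs
        self.last_pushed = {}   # region -> last Sequence ID pushed to the peer

    def can_push(self, region, seq_id):
        earlier = [b for b in self.barriers.get(region, []) if b < seq_id]
        if not earlier:
            return True          # first range: its queue order is already serial
        # Edits past a barrier wait until everything up to it has been pushed.
        return self.last_pushed.get(region, 0) >= max(earlier)


class Sink:
    def __init__(self):
        self.table = {}
        self.applied = []

    def apply(self, seq_id, row, value):
        self.table[row] = value
        self.applied.append(seq_id)


class SerialSource:
    def __init__(self, state):
        self.state = state
        self.queue = deque()    # (region, seq_id, row, value) in WAL order

    def enqueue(self, region, seq_id, row, value):
        self.queue.append((region, seq_id, row, value))

    def try_push(self, sink):
        """Push the head edit if allowed; return False if it must wait."""
        if not self.queue:
            return False
        region, seq_id, row, value = self.queue[0]
        if not self.state.can_push(region, seq_id):
            return False        # "Wait"
        self.queue.popleft()    # "Go"
        sink.apply(seq_id, row, value)
        self.state.last_pushed[region] = seq_id
        return True
```

Replaying the figure's sequence (RegionServer 1 pushes 1 and 2, RegionServer 2 tries 4, RegionServer 1 pushes 3, RegionServer 2 retries 4) produces the Wait and Go steps and delivers 1, 2, 3, 4 in order.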
  106. Summary • Procedure version 2 / Assignment Manager version 2 • Backup/Restore • Compacting Memstore • Serial Replication: ordering guarantees for Replication
  107. The future of HBase
  108-109. The future of HBase • Evolving HBase in the Cloud • HBase on Persistent Memory: using Persistent Memory in HBase • Synchronous Replication
  110. Evolving HBase in the Cloud • HBASE-20951: Ratis LogService backed WALs • Running HBase on IaaS (Amazon EC2, Google Compute Engine, Microsoft Azure Compute) • Aims to make HBase run better on IaaS
  111. Evolving HBase in the Cloud • IaaS • Amazon EC2 • AWS • HDFS • DataNode • AWS
  112. Evolving HBase in the Cloud • Storage options on IaaS • Block storage: Amazon EBS (Elastic Block Store), Google Persistent Disk • Object storage: Amazon S3, Google Cloud Storage
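  113. Evolving HBase in the Cloud • HBase keeps both HFiles and WALs in HDFS • HFiles and WALs have different requirements • WALs are short-lived and have sub-second durability requirements • [Figure: Puts reach a RegionServer, which writes the WAL and the Memstore; Memstore Flush writes HFiles to HDFS]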
  114. Evolving HBase in the Cloud • HFiles can be stored in object storage (S3 with S3Guard) • WALs are harder: they have sub-second durability requirements • What a WAL needs: a traversable queue (FIFO) with constant-time append complexity, linear-time traversal, and sub-linear seek to an arbitrary offset
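The four WAL requirements on this slide map onto a simple data structure: an append-only list gives FIFO order and (amortized) constant-time append, and a sorted list of entry offsets allows binary-search seeks. The toy `AppendOnlyLog` below illustrates those properties only; it is not the Ratis LogService API, and it keeps everything in memory rather than durably.

```python
import bisect


class AppendOnlyLog:
    """Toy WAL: FIFO, O(1) amortized append, linear traversal, O(log n) seek."""

    def __init__(self):
        self._entries = []   # payloads in append order
        self._offsets = []   # starting byte offset of each entry (always sorted)
        self._end = 0        # next byte offset

    def append(self, payload: bytes) -> int:
        """Append an entry and return its starting offset (amortized O(1))."""
        offset = self._end
        self._entries.append(payload)
        self._offsets.append(offset)
        self._end += len(payload)
        return offset

    def seek(self, offset: int) -> int:
        """Index of the first entry starting at or after offset (O(log n))."""
        return bisect.bisect_left(self._offsets, offset)

    def read_from(self, offset: int = 0):
        """Traverse entries in FIFO order starting at offset (linear time)."""
        for i in range(self.seek(offset), len(self._entries)):
            yield self._offsets[i], self._entries[i]
```

Offsets grow monotonically because entries are only ever appended, which is what keeps the offset list sorted and makes the binary search valid without any rebalancing.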
  115. Evolving HBase in the Cloud • Apache Ratis • An Apache Software Foundation project • A Java implementation of the Raft consensus protocol • Used by Apache Hadoop Ozone • Ratis, Kafka and DistributedLog were candidates for the HBase WAL • Ratis was chosen for the WAL
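  116. Evolving HBase in the Cloud • Build a WAL service on Ratis: the Ratis LogService • Make the WAL in HBase pluggable • Two pieces of work: 1. Ratis LogService (RATIS-271) 2. Revisiting the HBase WAL API (HBASE-20952) • The existing HDFS-based WAL remains as one implementation • The Ratis LogService is similar in concept to Kafka or DistributedLog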
  117. Evolving HBase in the Cloud • [Figure: RegionServer1, RegionServer2 and RegionServer3 receive Puts and write to their Memstores and, through the New WAL API, to WAL Storage in the Ratis LogService, replicated with Raft; Memstore Flush writes HFiles to Amazon S3/Google Cloud Storage]
