@TwitterAds | Confidential
@ctrezzo
HBaseCon 2013
Apache HBase Replication
Thursday, July 25, 13
About me
  Active contributor to Apache HBase
  Software Engineer @ Twitter
  Core Storage Team - Hadoop/HBase
  Follow me @ctrezzo
Agenda
  Introduction
  High-level Architecture
  Replication State
  Path of a replicated edit
  Replication Source
  Replication Sink
  Replication Source Manager
HBase replication
  Asynchronously copies data between two HBase clusters
  Push-based architecture
  WAL shipping technique similar to MySQL
Guarantees of replication
  Eventually consistent
  Delivers updates at least once
  Preserves atomicity of individual updates
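A minimal Python sketch of what "at least once" implies for the receiving side. It is not HBase code: `Edit`, `apply_edit`, `ship_with_retry`, and the in-memory `target` dict are hypothetical stand-ins. The assumption illustrated is that each edit carries its own timestamp, so a batch that is resent after a lost acknowledgement leaves the target in the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical stand-in for one replicated cell update."""
    row: str
    column: str
    timestamp: int
    value: str


def apply_edit(store: dict, edit: Edit) -> None:
    # Cells are keyed by (row, column, timestamp), so re-applying the
    # same edit overwrites the same key with the same value.
    store[(edit.row, edit.column, edit.timestamp)] = edit.value


def ship_with_retry(store: dict, batch: list, fail_first_ack: bool) -> int:
    """Ship a batch until it is acknowledged; return the attempt count."""
    attempts = 0
    acked = False
    while not acked:
        attempts += 1
        for edit in batch:
            apply_edit(store, edit)
        # Simulate a lost acknowledgement on the first attempt: the sink
        # applied the batch, but the source does not know and resends.
        acked = not (fail_first_ack and attempts == 1)
    return attempts


batch = [Edit("r1", "cf:a", 100, "x"), Edit("r2", "cf:a", 101, "y")]
target = {}
attempts = ship_with_retry(target, batch, fail_first_ack=True)
```

Here the batch is delivered twice, yet `target` ends up with exactly two cells.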
Administering Replication
  Set a parameter in hbase-site.xml
    hbase.replication => true
  Set up replication topologies
    add_peer, remove_peer, disable_peer, enable_peer, list_peers
  Create/alter tables with the replication scope set
    REPLICATION_SCOPE => '1'
High-Level Architecture
  [Figure: architecture diagram. Labels: Cluster 1 region server with ReplicationSourceManager, ReplicationSource, and HLog; Cluster 2 and Cluster 3 region servers with ReplicationSink and HTable; ZooKeeper znodes /state, /peers (/1, /2), /rs; ReplicationAdmin; numbered steps 1-4.]
Replication State
  Persistently stored in ZooKeeper
  Status
    Master kill switch
  Peers
    List of remote target clusters
  Queues
    List of remaining HLogs to replicate and the current position in each log
Path of a replicated edit
  [Figure: path of an edit from a region server in Cluster 1 (HLog, ReplicationSource) to region servers in Cluster 2 (ReplicationSink, HTable), with numbered steps 1-4.]
Path of a replicated edit
  [Figure: the same path drawn for Region Server 1, Region Server 2, and Region Server X in Cluster 1, each with its own ReplicationSource and HLog, shipping to ReplicationSink/HTable on region servers in Cluster 2.]
Replication Source
  End-point for shipping WAL entries
  One instance for each queue
  Runs as a separate thread on the region server
  Uses AdminProtocol RPC to synchronously ship entries
  Filters edits based on replication scope
  [Figure: the replicated-edit path diagram (Cluster 1 to Cluster 2), repeated.]
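The scope-based filtering done by the replication source can be sketched as follows. This is an illustration only: `WalEntry`, `filter_by_scope`, and the `scopes` map are hypothetical names, and the sketch assumes that a scope of 1 marks a column family for replication while 0 (or no entry) keeps its edits local.

```python
from dataclasses import dataclass

# Assumed meaning of the scope values: 1 = replicate, 0 = keep local.
REPLICATION_SCOPE_LOCAL = 0
REPLICATION_SCOPE_GLOBAL = 1


@dataclass(frozen=True)
class WalEntry:
    """Hypothetical WAL entry: one edit to one column family."""
    table: str
    family: str
    row: str


def filter_by_scope(entries, scopes):
    """Keep only entries whose (table, family) has global scope.

    `scopes` maps (table, family) to a scope value; families that are
    missing from the map are treated as local and dropped.
    """
    return [
        e for e in entries
        if scopes.get((e.table, e.family), REPLICATION_SCOPE_LOCAL)
        == REPLICATION_SCOPE_GLOBAL
    ]


scopes = {("users", "profile"): 1, ("users", "cache"): 0}
entries = [
    WalEntry("users", "profile", "r1"),
    WalEntry("users", "cache", "r1"),
    WalEntry("logs", "raw", "r9"),
]
to_ship = filter_by_scope(entries, scopes)
```

Only the edit to `users:profile` survives the filter and would be shipped.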
Replication Sink
  End-point for receiving shipped WAL entries
  One instance per region server
  Synchronously receives entries and applies them using HTable
  Batches rows in the same table
  [Figure: the replicated-edit path diagram (Cluster 1 to Cluster 2), repeated.]
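A rough sketch of the per-table batching the sink performs, assuming received entries arrive as `(table, row, value)` tuples. The tuple layout and `batch_by_table` are hypothetical, and the actual write through HTable is left out.

```python
from collections import defaultdict


def batch_by_table(entries):
    """Group (table, row, value) tuples by table, preserving arrival order."""
    batches = defaultdict(list)
    for table, row, value in entries:
        batches[table].append((row, value))
    return dict(batches)


received = [
    ("users", "r1", "a"),
    ("orders", "r7", "b"),
    ("users", "r2", "c"),
]
batches = batch_by_table(received)
# Each batch could then be applied with one write call per table
# instead of one call per row.
```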
Load balancing
  Balances load on the remote cluster using randomization
  Ships edits to a random subset of remote region servers
    Default is 10%
  [Figure: Cluster 1 shipping edits to Cluster 2, which has 20 region servers.]
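The randomized sink selection can be sketched like this. `choose_sinks` and the host names are made up for illustration, and rounding the subset size up to at least one server is an assumption rather than something the slide states.

```python
import math
import random


def choose_sinks(region_servers, ratio=0.1, rng=None):
    """Pick a random subset of remote region servers to ship edits to.

    `ratio` is the fraction of the remote cluster to use (the slide gives
    10% as the default). At least one server is always chosen.
    """
    rng = rng or random.Random()
    # round() trims floating-point noise (e.g. 30 * 0.1) before ceil().
    count = max(1, math.ceil(round(len(region_servers) * ratio, 9)))
    return rng.sample(region_servers, count)


remote = [f"rs{i}.cluster2.example.org" for i in range(20)]
sinks = choose_sinks(remote, ratio=0.1, rng=random.Random(42))
```

With the 20 region servers from the slide and a 10% ratio, each source picks 2 sinks.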
Path of a replicated edit
  [Figure: the replicated-edit path diagram (Cluster 1 to Cluster 2), repeated.]
Replication Source Manager
  Manages all replication sources
  Manages changes in replication state
    Log rolling
    Region server failure
    Addition/deletion of peer clusters
  [Figure: the high-level architecture diagram (Clusters 1-3, ReplicationSourceManager, ZooKeeper znodes /state, /peers, /rs), repeated.]
High-Level Architecture
  [Figure: the high-level architecture diagram (Clusters 1-3, ReplicationSourceManager, ZooKeeper znodes /state, /peers, /rs, ReplicationAdmin), repeated.]
Additional Resources
  Apache HBase user mailing list
    user@hbase.apache.org
  Apache HBase reference guide
    https://hbase.apache.org/book.html
  Tweet me
    @ctrezzo
Questions?
Replication State
  Persistently stored in ZooKeeper
  Three major replication znodes: Status, Peers, Queues
  /hbase/replication
    /state [Value: true]
    /peers
      /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
        /peer-state [Value: ENABLED]
      /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
        /peer-state [Value: DISABLED]
    /rs
      /hostname.example.org,6020,1234
        /1
          /23522342.23422 [Value: 254]
          /12340993.22342 [Value: 0]
        /2
          /23522342.23422 [Value: 34]
          /12340993.22342 [Value: 0]
      /hostname2.example.org,6020,1234
        /1
          /23522348.23443 [Value: 87]
          /12340999.22362 [Value: 0]
        /2
          /23522348.23443 [Value: 127]
          /12340999.22362 [Value: 0]
Status znode
  Master kill switch
  Controlled by start_replication, stop_replication
  Be careful what you wish for
  /hbase/replication
    /state [Value: true]
Peers znode
  A set of remote clusters registered as possible replication targets
  Identified by peer id
  Contains status of each peer cluster
  /hbase/replication
    /peers
      /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
        /peer-state [Value: ENABLED]
      /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
        /peer-state [Value: DISABLED]
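The peer values shown above appear to be three colon-separated fields: a comma-separated ZooKeeper quorum, a client port, and a parent znode. Assuming that reading is right, a small parser could look like this (`ClusterKey` and `parse_cluster_key` are illustrative names, not HBase APIs):

```python
from typing import NamedTuple


class ClusterKey(NamedTuple):
    quorum: tuple
    client_port: int
    znode_parent: str


def parse_cluster_key(value: str) -> ClusterKey:
    """Split 'host1,host2,host3:port:/znode' into its three parts."""
    quorum, port, znode_parent = value.split(":", 2)
    return ClusterKey(tuple(quorum.split(",")), int(port), znode_parent)


peer_1 = parse_cluster_key("zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase")
```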
Queues znode
  Queues identified by region server and peer id
  Queues contain list of HLogs and current position in log
  /hbase/replication
    /rs
      /hostname.example.org,6020,1234
        /1
          /23522342.23422 [Value: 254]
          /12340993.22342 [Value: 0]
        /2
          /23522342.23422 [Value: 34]
          /12340993.22342 [Value: 0]
      /hostname2.example.org,6020,1234
        /1
          /23522348.23443 [Value: 87]
          /12340999.22362 [Value: 0]
        /2
          /23522348.23443 [Value: 127]
          /12340999.22362 [Value: 0]
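To make the queue layout concrete, here is a sketch that models the tree above as nested dictionaries and flattens it back into znode paths. The dictionary shape and `queue_znode_paths` are assumptions made for illustration; the server names, HLog names, and positions are copied from the slide.

```python
# Queue state copied from the slide: region server -> peer id -> {HLog: position}.
queues = {
    "hostname.example.org,6020,1234": {
        "1": {"23522342.23422": 254, "12340993.22342": 0},
        "2": {"23522342.23422": 34, "12340993.22342": 0},
    },
    "hostname2.example.org,6020,1234": {
        "1": {"23522348.23443": 87, "12340999.22362": 0},
        "2": {"23522348.23443": 127, "12340999.22362": 0},
    },
}


def queue_znode_paths(queues, root="/hbase/replication/rs"):
    """Yield (znode path, position) for every HLog in every queue."""
    for server, peers in queues.items():
        for peer_id, hlogs in peers.items():
            for hlog, position in hlogs.items():
                yield f"{root}/{server}/{peer_id}/{hlog}", position


paths = dict(queue_znode_paths(queues))
```

Each region server keeps one queue per peer, so the same HLog shows up under both peer ids with its own position.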
Thursday, July 25, 13

More Related Content

What's hot

HBaseCon 2012 | HBase Filtering - Lars George, Cloudera
HBaseCon 2012 | HBase Filtering - Lars George, ClouderaHBaseCon 2012 | HBase Filtering - Lars George, Cloudera
HBaseCon 2012 | HBase Filtering - Lars George, ClouderaCloudera, Inc.
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
State of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersState of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersHBaseCon
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.Cloudera, Inc.
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudHBaseCon
 
HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreCloudera, Inc.
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseSankar H
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and  High-Demand EnvironmentHBaseCon 2015: HBase at Scale in an Online and  High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかToshihiro Suzuki
 
Inside HDFS Append
Inside HDFS AppendInside HDFS Append
Inside HDFS AppendYue Chen
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
 

What's hot (20)

HBaseCon 2012 | HBase Filtering - Lars George, Cloudera
HBaseCon 2012 | HBase Filtering - Lars George, ClouderaHBaseCon 2012 | HBase Filtering - Lars George, Cloudera
HBaseCon 2012 | HBase Filtering - Lars George, Cloudera
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
State of HBase: Meet the Release Managers
State of HBase: Meet the Release ManagersState of HBase: Meet the Release Managers
State of HBase: Meet the Release Managers
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond Panel
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the Cloud
 
HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at Xiaomi
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and  High-Demand EnvironmentHBaseCon 2015: HBase at Scale in an Online and  High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
Inside HDFS Append
Inside HDFS AppendInside HDFS Append
Inside HDFS Append
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
 

Similar to HBase Replication

PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsMydbops
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideLars Platzdasch
 
Postgresql_Replication.pptx
Postgresql_Replication.pptxPostgresql_Replication.pptx
Postgresql_Replication.pptxStephenEfange3
 
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groups
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability GroupsSQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groups
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groupsturgaysahtiyan
 
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HAGalera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HALudovico Caldara
 
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxnadirpervez2
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecFIRAT GULEC
 
Sql server replication step by step
Sql server replication step by stepSql server replication step by step
Sql server replication step by steplaonap166
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLRené Cannaò
 
FlashbackLoggingInternals.ppt
FlashbackLoggingInternals.pptFlashbackLoggingInternals.ppt
FlashbackLoggingInternals.pptssuser2e101e
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMakerKris Buytaert
 
Learn Oracle WebLogic Server 12c Administration
Learn Oracle WebLogic Server 12c AdministrationLearn Oracle WebLogic Server 12c Administration
Learn Oracle WebLogic Server 12c AdministrationRevelation Technologies
 
Oracle REST Data Services Best Practices/ Overview
Oracle REST Data Services Best Practices/ OverviewOracle REST Data Services Best Practices/ Overview
Oracle REST Data Services Best Practices/ OverviewKris Rice
 
[Altibase] 9 replication part2 (methods and controls)
[Altibase] 9 replication part2 (methods and controls)[Altibase] 9 replication part2 (methods and controls)
[Altibase] 9 replication part2 (methods and controls)altistory
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMark Swarbrick
 
HBase Replication for Bulk Loaded Data
HBase Replication for Bulk Loaded DataHBase Replication for Bulk Loaded Data
HBase Replication for Bulk Loaded DataAshish Singhi
 
Rails 3.1 sneak peak
Rails 3.1 sneak peakRails 3.1 sneak peak
Rails 3.1 sneak peakOleg Kossoy
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxMarco Gralike
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021Frederic Descamps
 

Similar to HBase Replication (20)

PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
 
Postgresql_Replication.pptx
Postgresql_Replication.pptxPostgresql_Replication.pptx
Postgresql_Replication.pptx
 
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groups
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability GroupsSQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groups
SQLSaturday Bulgaria : HA & DR with SQL Server AlwaysOn Availability Groups
 
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HAGalera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
 
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
 
Sql server replication step by step
Sql server replication step by stepSql server replication step by step
Sql server replication step by step
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
 
FlashbackLoggingInternals.ppt
FlashbackLoggingInternals.pptFlashbackLoggingInternals.ppt
FlashbackLoggingInternals.ppt
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMaker
 
Learn Oracle WebLogic Server 12c Administration
Learn Oracle WebLogic Server 12c AdministrationLearn Oracle WebLogic Server 12c Administration
Learn Oracle WebLogic Server 12c Administration
 
Oracle REST Data Services Best Practices/ Overview
Oracle REST Data Services Best Practices/ OverviewOracle REST Data Services Best Practices/ Overview
Oracle REST Data Services Best Practices/ Overview
 
[Altibase] 9 replication part2 (methods and controls)
[Altibase] 9 replication part2 (methods and controls)[Altibase] 9 replication part2 (methods and controls)
[Altibase] 9 replication part2 (methods and controls)
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
 
Sqlmap
SqlmapSqlmap
Sqlmap
 
HBase Replication for Bulk Loaded Data
HBase Replication for Bulk Loaded DataHBase Replication for Bulk Loaded Data
HBase Replication for Bulk Loaded Data
 
Rails 3.1 sneak peak
Rails 3.1 sneak peakRails 3.1 sneak peak
Rails 3.1 sneak peak
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
 
State of The Dolphin - May 2021
State of The Dolphin - May 2021State of The Dolphin - May 2021
State of The Dolphin - May 2021
 

Recently uploaded

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Recently uploaded (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

HBase Replication

  • 1. @TwitterAds | Confidential @ctrezzo HBaseCon 2013 Apache HBase Replication Thursday, July 25, 13
  • 2. About me: Active contributor to Apache HBase. Software Engineer @ Twitter, Core Storage Team (Hadoop/HBase). Follow me @ctrezzo.
  • 3. Agenda: Introduction; High-level Architecture; Replication State; Path of a replicated edit; Replication Source; Replication Sink; Replication Source Manager.
  • 4. HBase replication: Asynchronously copy data between two HBase clusters. Push-based architecture. WAL shipping technique similar to MySQL.
  • 5. Guarantees of replication: Eventually consistent. Deliver updates at least once. Atomicity of individual updates will be preserved.
  • 6. Administering Replication: Simply set a parameter in hbase-site.xml: hbase.replication => true. Set up replication topologies with add_peer, remove_peer, disable_peer, enable_peer, list_peers. Create/alter tables with replication scope set: REPLICATION_SCOPE => '1'.
  • 7. High-Level Architecture [Diagram. Labels: Cluster 1 region servers with HLog, ReplicationSourceManager and ReplicationSource; Cluster 2 and Cluster 3 region servers with ReplicationSink and HTable; Zookeeper with /state, /peers (/1, /2) and /rs; Replication Admin.]
  • 8. Replication State: Persistently stored in Zookeeper. Status: master kill switch. Peers: list of remote target clusters. Queues: list of remaining HLogs to replicate and current position in each log.
  • 9. Path of a replicated edit [Diagram. Labels: steps 1 to 4 from the HLog and ReplicationSource on a Cluster 1 region server to the ReplicationSink and HTable on Cluster 2 region servers.]
  • 10. Path of a replicated edit [Diagram. Labels: the same steps 1 to 4 repeated for Region Server 1, Region Server 2 and Region Server X in Cluster 1, each with its own HLog and ReplicationSource, shipping to ReplicationSink and HTable on Cluster 2 region servers.]
  • 11. Replication Source: End-point for shipping WAL entries. One instance for each queue. Runs as a separate thread on the region server. Uses AdminProtocol RPC to synchronously ship entries. Filters edits based on replication scope. [Diagram: path of a replicated edit from Cluster 1 to Cluster 2.]
  • 12. Replication Sink: End-point for receiving shipped WAL entries. One instance per region server. Synchronously receives entries and applies them using HTable. Batches rows in the same table. [Diagram: path of a replicated edit from Cluster 1 to Cluster 2.]
  • 13. Load balancing: Balances load on the remote cluster using randomization. Ships edits to a random subset of remote region servers. Default is 10%. [Diagram: Cluster 1 shipping to Cluster 2, which has 20 region servers.]
  • 14. Path of a replicated edit [Diagram. Labels: steps 1 to 4 from the HLog and ReplicationSource on a Cluster 1 region server to the ReplicationSink and HTable on Cluster 2 region servers.]
  • 15. Replication Source Manager: Manages all replication sources. Manages change in replication state: log rolling, region server failure, addition/deletion of peer clusters. [Diagram: high-level architecture with Cluster 1, Cluster 2, Cluster 3 and Zookeeper (/state, /peers, /rs).]
  • 16. High-Level Architecture [Diagram. Labels: Cluster 1 region servers with HLog, ReplicationSourceManager and ReplicationSource; Cluster 2 and Cluster 3 region servers with ReplicationSink and HTable; Zookeeper with /state, /peers (/1, /2) and /rs; Replication Admin.]
  • 17. Additional Resources: Apache HBase user mailing list: user@hbase.apache.org. Apache HBase reference guide: https://hbase.apache.org/book.html. Tweet me @ctrezzo.
  • 19. Replication State: Persistently stored in Zookeeper. Three major replication znodes: Status, Peers, Queues.
      /hbase/replication
        /state [VALUE: true]
        /peers
          /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
            /peer-state [Value: ENABLED]
          /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
            /peer-state [Value: DISABLED]
        /rs
          /hostname.example.org,6020,1234
            /1
              /23522342.23422 [VALUE: 254]
              /12340993.22342 [VALUE: 0]
            /2
              /23522342.23422 [VALUE: 34]
              /12340993.22342 [VALUE: 0]
          /hostname2.example.org,6020,1234
            /1
              /23522348.23443 [VALUE: 87]
              /12340999.22362 [VALUE: 0]
            /2
              /23522348.23443 [VALUE: 127]
              /12340999.22362 [VALUE: 0]
  • 20. Status znode: Master kill switch. Controlled by start_replication, stop_replication. Be careful what you wish for.
      /hbase/replication
        /state [VALUE: true]
  • 21. Peers znode: A set of remote clusters registered as possible replication targets. Identified by peer id. Contains status of each peer cluster.
      /hbase/replication
        /peers
          /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
            /peer-state [Value: ENABLED]
          /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
            /peer-state [Value: DISABLED]
  • 22. Queues znode: Queues identified by region server and peer id. Queues contain list of HLogs and current position in log.
      /hbase/replication
        /rs
          /hostname.example.org,6020,1234
            /1
              /23522342.23422 [VALUE: 254]
              /12340993.22342 [VALUE: 0]
            /2
              /23522342.23422 [VALUE: 34]
              /12340993.22342 [VALUE: 0]
          /hostname2.example.org,6020,1234
            /1
              /23522348.23443 [VALUE: 87]
              /12340999.22362 [VALUE: 0]
            /2
              /23522348.23443 [VALUE: 127]
              /12340999.22362 [VALUE: 0]
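
The load-balancing slide (13) says edits are shipped to a random subset of remote region servers, 10% by default, and its diagram shows a 20-server remote cluster. A minimal Python sketch of that selection follows. The function name `choose_sinks`, the `percent` parameter, the round-up rule and the host names are all assumptions made for illustration; this is not HBase's own code.

```python
import random


def choose_sinks(region_servers, percent=10, rng=None):
    """Pick a random subset of remote region servers to ship edits to.

    Hypothetical helper: `percent` mirrors the 10% default on the slide.
    Rounding up and keeping at least one sink are assumptions.
    """
    if not region_servers:
        return []
    rng = rng or random.Random()
    # Integer ceiling of len * percent / 100, never fewer than one sink.
    count = max(1, -(-len(region_servers) * percent // 100))
    # sample() draws without replacement, so no sink is chosen twice.
    return rng.sample(region_servers, count)


# Made-up host names for a 20-server remote cluster.
remote = [f"rs{i}.cluster2.example.org" for i in range(20)]
sinks = choose_sinks(remote, rng=random.Random(42))
print(len(sinks))  # 2
```

With 20 remote region servers and the 10% default, each selection yields two sinks, so different source region servers tend to spread their traffic over different targets.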
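
Each peer znode on slide 21 stores a value such as `zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase`, which reads as a ZooKeeper quorum, a client port and a parent znode separated by colons. The sketch below splits such a value in Python. The names `ClusterKey` and `parse_cluster_key` are hypothetical, and the three-field reading is inferred from the slide's example values.

```python
from typing import List, NamedTuple


class ClusterKey(NamedTuple):
    """Hypothetical container for the three fields of a peer value."""
    quorum: List[str]
    client_port: int
    znode_parent: str


def parse_cluster_key(value: str) -> ClusterKey:
    # Split on the first two colons only; the parent znode path comes last.
    quorum, port, parent = value.split(":", 2)
    return ClusterKey(quorum.split(","), int(port), parent)


key = parse_cluster_key("zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase")
print(key.quorum)        # ['zk1.host.com', 'zk2.host.com', 'zk3.host.com']
print(key.client_port)   # 2181
print(key.znode_parent)  # /hbase
```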
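
The queues layout on slide 22 nests region server, peer id and HLog name under `/hbase/replication/rs`, with the znode value holding the current position in that log. As a rough illustration, the Python sketch below turns one such path and value into a record and groups the HLogs per queue. `QueueEntry`, `parse_queue_znode` and `hlogs_by_queue` are hypothetical names; the paths and positions are the example values from the slide, not output from a real cluster.

```python
from collections import defaultdict
from typing import NamedTuple

QUEUES_ROOT = "/hbase/replication/rs"


class QueueEntry(NamedTuple):
    region_server: str
    peer_id: str
    hlog: str
    position: int


def parse_queue_znode(path: str, value: str) -> QueueEntry:
    """Split <root>/<region server>/<peer id>/<hlog> into its parts."""
    if not path.startswith(QUEUES_ROOT + "/"):
        raise ValueError(f"not a queue znode: {path}")
    parts = path[len(QUEUES_ROOT) + 1:].split("/")
    if len(parts) != 3:
        raise ValueError(f"expected region server/peer/hlog: {path}")
    region_server, peer_id, hlog = parts
    return QueueEntry(region_server, peer_id, hlog, int(value))


def hlogs_by_queue(entries):
    """Group HLog names by (region server, peer id), i.e. one list per queue."""
    queues = defaultdict(list)
    for e in entries:
        queues[(e.region_server, e.peer_id)].append(e.hlog)
    return dict(queues)


rs = "hostname.example.org,6020,1234"
entries = [
    parse_queue_znode(f"{QUEUES_ROOT}/{rs}/1/23522342.23422", "254"),
    parse_queue_znode(f"{QUEUES_ROOT}/{rs}/1/12340993.22342", "0"),
    parse_queue_znode(f"{QUEUES_ROOT}/{rs}/2/23522342.23422", "34"),
]
queues = hlogs_by_queue(entries)
print(queues[(rs, "1")])  # ['23522342.23422', '12340993.22342']
```

Keying the grouping on the (region server, peer id) pair matches the slide's statement that queues are identified by region server and peer id.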