© Cloudera, Inc. All rights reserved.
HBase Replication
Wellington Chevreuil
Overview
● Replication Basics
● Requirements
● HBase Shell Commands
● Implementation Details
● Monitoring
● Extra Tools
● Hands-on labs
Replication Basics
● Source-push strategy
● Master, Source, Originator - the cluster sending data.
● Slave, Destination, Target - the cluster receiving data.
● Can be cyclic and allows for multiple masters and slaves
○ A master can have multiple slaves
○ A slave can have multiple masters
○ A cluster can perform both master/slave roles on a given topology
● Eventual consistency
● Asynchronous
● Configurable at column family level
● Relies on WAL data
○ Any changes that bypass the WAL won't be replicated, such as bulk loads, the truncate command, or writes issued with WAL disabled.
● Tracked via ZooKeeper
● Work done by RegionServers
● Adds a source cluster ID to edit's metadata
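As a concrete example of the column-family-level configuration above, replication is switched on per family by setting its REPLICATION_SCOPE attribute from the HBase shell. This is a sketch requiring a live cluster; the table and family names are hypothetical:

```shell
# Enable replication for one column family only; scope 1 = replicate, 0 = don't.
# 'my_table' and 'cf1' are example names.
hbase shell <<'EOF'
alter 'my_table', {NAME => 'cf1', REPLICATION_SCOPE => '1'}
describe 'my_table'
EOF
```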
Requirements
● All RegionServers in each cluster must be reachable from all RegionServers in the other cluster
● The slave clusters' ZooKeeper quorums must be accessible from the master cluster
● Table structure must be the same in master and slave clusters
○ The column family target for replication must match on master/slave clusters
● If same Zookeeper Quorum is used for master/slave clusters,
zookeeper.znode.parent must be different
● Clusters can have varying sizes
● Clusters can have pre-existing data on target tables
○ In this case, only data added on master after replication has been enabled will be replicated
HBase Shell Commands
● add_peer
○ Adds a new slave to the current cluster.
● list_peers
○ Shows current list of slaves "known" by this cluster.
● disable_peer
○ Pauses replication, but keeps tracking new edits to be replicated.
● enable_peer
○ Resumes replication. All edits added since disable_peer execution will now be sent to related
slaves.
● remove_peer
○ Disables replication for the given slave.
○ No edits will be sent to the slave.
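A typical peer lifecycle with the commands above might look like the following sketch. The peer id, hostnames, and znode parent are examples, the syntax follows HBase 1.x era shells, and the session requires a live cluster:

```shell
# Run on the master (source) cluster.
hbase shell <<'EOF'
add_peer '1', "zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"
list_peers
disable_peer '1'   # pauses shipping; new edits keep queueing
enable_peer '1'    # resumes; queued edits are shipped to the slave
remove_peer '1'    # stops replication to this slave entirely
EOF
```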
HBase Shell Commands
● enable_table_replication
○ Sets the replication flag to true on all column families of the specified table.
● disable_table_replication
○ The opposite of the above.
● append_peer_tableCFs, remove_peer_tableCFs, set_peer_tableCFs,
show_peer_tableCFs, update_peer_config, get_peer_config, list_peer_configs,
list_replicated_tables.
○ General admin commands for changing/monitoring the configuration of tables currently targeted for replication
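For instance, scoping a peer to a single table and column family could look like the sketch below. Names are hypothetical, the string form of the tableCFs argument is assumed (as accepted by HBase 1.x shells), and a live cluster with peer '1' already configured is required:

```shell
hbase shell <<'EOF'
enable_table_replication 'my_table'
set_peer_tableCFs '1', "my_table:cf1"
show_peer_tableCFs '1'
list_replicated_tables
EOF
```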
Implementation Details - Deployment Overview
● This is a deployment diagram in the context of replication only, so only components relevant to the replication flow are highlighted.
● Note the absence of HMasters on both the master (source) and slave (destination) clusters.
● Zookeeper is of vital
importance, as it keeps the
registry of edits to be
replicated, as well as peers to
replicate to.
● RSes on Master cluster depend
on ZK from Slave cluster.
Implementation Details - Setup/Maintenance commands
● Shell commands interact directly with Zookeeper.
● Replication is kept on master cluster's Zookeeper znodes.
● No interaction with RSes when replication shell commands are run.
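Since that state lives in znodes, it can be inspected directly with the ZooKeeper CLI bundled with HBase. The paths below assume the default /hbase znode parent and access to the master cluster's quorum:

```shell
hbase zkcli ls /hbase/replication        # top-level replication znode
hbase zkcli ls /hbase/replication/peers  # one child per configured peer
hbase zkcli ls /hbase/replication/rs     # per-RS queues of WALs to replicate
```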
Implementation Details - Setup WAL and Replication
● RS init phase where replication service classes are created.
● Once replication related classes are properly initialized, the Replication instance is added to the list of WALActionsListeners.
● A WALFactory instance is created, with the list of listeners containing the Replication instance.
Implementation Details - Setup WAL and Replication
● Replication related classes are only initialized if "hbase.replication" is set to true.
● This happens between the following log messages in the RS startup logs:
● Replication Source/Sink implementation default: org.apache.hadoop.hbase.replication.regionserver.Replication
○ This is configurable via hbase.replication.source.service and hbase.replication.sink.service
INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=...
INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer
cluster=remote_peer_host:2181:/hbase
INFO org.apache.hadoop.hbase.wal.WALFactory: Instantiating WALProvider of type class
org.apache.hadoop.hbase.wal.BoundedRegionGroupingProvider
● Watch out for possible customer-specific configurations
Implementation Details - Setup WAL and Replication
● During creation of the WAL related classes, the WAL file is rolled.
● Replication was registered as a WAL listener earlier, so ReplicationSourceManager will be notified about the log roll.
● Using Zookeeper, ReplicationSourceManager adds the new WAL file to the queue of
logs (this will be under replication znodes).
Implementation Details - Setup WAL and Replication
● During WAL file rolling, no replication-specific log message is recorded.
● ReplicationSourceManager code will be notified about new WAL file creation
between below messages:
INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: WAL configuration: blocksize=128 MB, ...
...
INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/…
….
Implementation Details - Setup WAL and Replication
● Potential replication errors in this phase are mostly related to znode access, preventing the ZK queue from being initialized:
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.io.IOException: Failed replication handler create
at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:130)
at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:2662)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:2632)
at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1647)
at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1388)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:918)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.replication.ReplicationException: Could not initialize replication queues.
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.init(ReplicationQueuesZKImpl.java:85)
at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:122)
... 6 more
Implementation Details - Start Replication Thread
● From HRegionServer.startServiceThreads method, replication source and sink
threads are set and started.
● ReplicationSourceManager initialization involves several steps, to be detailed next.
● The ReplicationSink instance will be used to perform the actual sink work when the cluster acts as a destination cluster. To be detailed later.
Implementation Details - Start Replication Thread
● Once ReplicationSourceManager.addSource has completed for each peer, the following message is seen:
● Upon startup, the ReplicationSource.run method also logs the message below:
● Since this is asynchronous, it may occur before or after the previous message.
● It should be logged for each peer id.
INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Current list of replicators:
[host-1,60020,1510938412878, host1,60020,1510929825829] other RSs: [host-1,60020,1510938412878]
…
INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating
9fa10771-97b2-48ed-b635-b0bd474a99b2 -> 5f54f936-a5f8-4726-9d09-7bf1c709eeab
Implementation Details - New Peers
● ReplicationTrackerZKImpl receives notification about changes on replication znodes.
● New peer addition triggers peer list update on ReplicationPeersZKImpl.
● With at least one peer, ReplicationQueuesZKImpl will get notified about WAL file
creation.
INFO org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl: /hbase/replication/peers znode expired, triggering
peerListChanged event
...
INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=peer-host:2181:/hbase
Implementation Details - Shipping Edits
● Main work done by ReplicationSourceWorkerThread instances.
○ One per WAL group
○ Every WAL group has its own queue of WAL files to be processed.
○ Runs in the background indefinitely. Sleeps for replication.source.sleepforretries if the peer is disabled.
○ On each loop iteration:
■ Reads the current WAL being written.
■ Applies edit filters (keeps only edits for CFs marked for replication whose origin cluster ID differs from the peer's).
■ Connects to an RS on the remote cluster and ships the filtered edits via RPC.
■ Edits must be read and processed sequentially. If a shipment fails, replication will not progress for that WAL group, and lag may be observed.
Implementation Details - Shipping Edits (Source Side)
● HBaseInterClusterReplicationEndpoint.replicate() method detailed flow
● Uses its own thread pool for performing RPC calls
● Replicator class implements java.util.concurrent.Callable for async execution.
Implementation Details - Shipping Edits (Source Side)
● Replicator uses SinkPeer to discover the remote RS responsible for running the sink.
● ReplicationProtbufUtil is used to convert the request to protobuf and perform the RPC.
Implementation Details - Shipping Edits (Destination Side)
● ReplicationSink uses default client API to process put/delete operations.
● The RS running the sink is not necessarily the one hosting the regions where the entries will be placed.
● Coprocessors may get invoked.
Monitoring
● Some classes provide additional TRACE/DEBUG messages that can be turned on for
further troubleshooting.
● It is worth enabling these via the RS UI for specific classes only, instead of turning TRACE on for the whole HBase service:
○ ReplicationSource, HBaseReplicationEndpoint, HBaseInterClusterReplicationEndpoint
● JMX Metrics might also help get a state of replication:
○ shippedBatches, ageOfLastShippedOp, logReadInBytes.
■ Global and per WAL group id.
● ReplicationStatisticsThread also logs replication stats every 5 minutes:
INFO org.apache.hadoop.hbase.replication.regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2, current progress:
walGroup [host-1%2C60020%2C1511034265841.null0]: currently replicating from:
hdfs://nameservice1/hbase/WALs/host-1,60020,1511034265841/host-1%2C60020%2C1511034265841.null0.1511196279542 at position: 83
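The same JMX metrics can be pulled from the RegionServer's /jmx endpoint, which is handy for scripted checks. The hostname and the CDH-default info port 60030 below are examples, and a running RegionServer is required:

```shell
# Filter the RS JMX dump for the replication metrics mentioned above.
curl -s 'http://rs-host.example.com:60030/jmx' \
  | grep -iE 'ageOfLastShippedOp|sizeOfLogQueue|shippedBatches|logReadInBytes'
```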
Monitoring
● HBase shell status 'replication' command:
○ On source cluster:
○ On destination cluster:
1 live servers
Host-10-17-101-41.coe.cloudera.com:
SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, Replication Lag=0
SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Sat Nov 18 11:49:29 PST 2017
1 live servers
Host-10-17-103-206.coe.cloudera.com:
SOURCE:
SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Nov 20 08:40:19 PST 2017
Monitoring
● VerifyReplication
○ MR job that compares the records of a table between the source and destination clusters.
○ Prints counters with its findings:
1 test-1
...
17/11/20 10:43:12 INFO mapreduce.Job: map 0% reduce 0%
17/11/20 10:43:18 INFO mapreduce.Job: map 33% reduce 0%
17/11/20 10:43:19 INFO mapreduce.Job: map 67% reduce 0%
17/11/20 10:43:23 INFO mapreduce.Job: map 100% reduce 0%
17/11/20 10:43:24 INFO mapreduce.Job: Job job_1506585949780_0005 completed successfully
…
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
BADROWS=25
GOODROWS=11
ONLY_IN_SOURCE_TABLE_ROWS=25
...
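The job shown above can be launched as sketched below. The peer id '1' and table 'test-1' match the sample output; the time-range options are optional and shown with example values:

```shell
# Run on the source cluster against a live deployment.
# Positional args: <peerId> <tableName>.
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication \
  --starttime=1511000000000 --stoptime=1512000000000 \
  1 test-1
```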
Monitoring
● DumpReplicationQueues
hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues --distributed
...
Dumping replication peers and configurations:
Peer: 2
State: ENABLED
Cluster Name:
clusterKey=host-10-17-103-187.coe.cloudera.com,host-10-17-103-189.coe.cloudera.com,host-10-17-103-193.coe.cloudera.com:2181:/hbase,replicationEndpoint
Impl=null
Peer Table CFs: null
…
Dumping replication queue info for RegionServer: [host-10-17-101-41.coe.cloudera.com,60020,1511971261591]
replication queue: 1
Replication position for host-10-17-101-41.coe.cloudera.com%2C60020%2C1511971261591.null0.1512140473468: 13227
...
Extra Tools
● If data is already present on the source or destination cluster tables, some tools can be used to sync it:
○ CopyTable
■ https://hbase.apache.org/book.html#copy.table
○ Export Snapshots
■ https://hbase.apache.org/book.html#ops.snapshots.export
○ Bulk Load
■ https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
○ HashTable/SyncTable
■ Documented in the Apache HBase reference guide.
■ Best option, can be used even after replication is already enabled.
■ Allows for syncing deleted rows.
■ Only available from CDH 5.9.0 onwards
Extra Tools
● HashTable/SyncTable:
○ Two MR jobs
■ org.apache.hadoop.hbase.mapreduce.HashTable
■ org.apache.hadoop.hbase.mapreduce.SyncTable
○ Usage:
■ First, run HashTable MR job on the cluster whose state should be propagated to the remote peer. For example, if
we want to sync table "test-1" state on destination cluster with state from source cluster, run below at source:
● Where the first param is the table name, and the second is an HDFS path where the HashTable job writes the table's summary
$ hbase org.apache.hadoop.hbase.mapreduce.HashTable test-1 /tmp/test-1
Extra Tools
● HashTable/SyncTable:
○ Usage
■ Once HashTable has finished on source cluster, run SyncTable on destination cluster:
■ The --sourcezkcluster option and the HDFS path give the source cluster's ZK quorum and the HashTable output location on the source cluster's HDFS, respectively
■ The last two params are the table names on the source and destination clusters
■ This command would cause the table data on destination cluster to be in sync with the source
cluster
● If source cluster had more rows prior to the command, these additional rows would be
copied to destination.
● If the destination cluster had more rows than the source, the extra rows would be deleted from the destination.
$ hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=source_zk:2181:/hbase hdfs://source_nn:8020/tmp/test-1 test-1 test-1
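Before letting SyncTable mutate the target, a dry-run pass can report the divergences it would fix without writing anything. This reuses the example hosts and paths from the command above and assumes the --dryrun option available in SyncTable:

```shell
# Run on the destination cluster; reports differences without applying them.
hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun \
  --sourcezkcluster=source_zk:2181:/hbase \
  hdfs://source_nn:8020/tmp/test-1 test-1 test-1
```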
Labs Exercises
1. Problem 1: Replication related znodes not readable by RSes
2. Problem 2: Remote cluster not reachable by source cluster
3. Problem 3: Remote cluster is reachable, but sinks are not completing

More Related Content

What's hot

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Masahiko Sawada
 

What's hot (20)

Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Grafana
GrafanaGrafana
Grafana
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
 

Similar to HBase replication

Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
nadirpervez2
 

Similar to HBase replication (20)

Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql Cluster
 
Hbase 89 fb online configuration
Hbase 89 fb online configurationHbase 89 fb online configuration
Hbase 89 fb online configuration
 
How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service Discovery
 
What’s new in Galera 4
What’s new in Galera 4What’s new in Galera 4
What’s new in Galera 4
 
Galera Cluster 4 presentation at Percona Live Austin 2019
Galera Cluster 4 presentation at Percona Live Austin 2019 Galera Cluster 4 presentation at Percona Live Austin 2019
Galera Cluster 4 presentation at Percona Live Austin 2019
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
 
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityTechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
 
M|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera ClusterM|18 Under the Hood: Galera Cluster
M|18 Under the Hood: Galera Cluster
 
Galera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replicationGalera webinar migration to galera cluster from my sql async replication
Galera webinar migration to galera cluster from my sql async replication
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trenches
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
 
MySqL Failover by Weatherly Cloud Computing USA
MySqL Failover by Weatherly Cloud Computing USAMySqL Failover by Weatherly Cloud Computing USA
MySqL Failover by Weatherly Cloud Computing USA
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
 
Scale Apache with Nginx
Scale Apache with NginxScale Apache with Nginx
Scale Apache with Nginx
 
Openstack HA
Openstack HAOpenstack HA
Openstack HA
 
Deploy Rails Application by Capistrano
Deploy Rails Application by CapistranoDeploy Rails Application by Capistrano
Deploy Rails Application by Capistrano
 
Multi Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ VerisureMulti Source Replication With MySQL 5.7 @ Verisure
Multi Source Replication With MySQL 5.7 @ Verisure
 
03 h base-2-installation_andshell
03 h base-2-installation_andshell03 h base-2-installation_andshell
03 h base-2-installation_andshell
 

More from wchevreuil

Hadoop - TDC 2012
Hadoop - TDC 2012Hadoop - TDC 2012
Hadoop - TDC 2012
wchevreuil
 

More from wchevreuil (9)

Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdfCloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
 
HBase System Tables / Metadata Info
HBase System Tables / Metadata InfoHBase System Tables / Metadata Info
HBase System Tables / Metadata Info
 
HDFS client write/read implementation details
HDFS client write/read implementation detailsHDFS client write/read implementation details
HDFS client write/read implementation details
 
HBase RITs
HBase RITsHBase RITs
HBase RITs
 
Hbasecon2019 hbck2 (1)
Hbasecon2019 hbck2 (1)Hbasecon2019 hbck2 (1)
Hbasecon2019 hbck2 (1)
 
Web hdfs and httpfs
Web hdfs and httpfsWeb hdfs and httpfs
Web hdfs and httpfs
 
Hadoop tuning
Hadoop tuningHadoop tuning
Hadoop tuning
 
I nd t_bigdata(1)
I nd t_bigdata(1)I nd t_bigdata(1)
I nd t_bigdata(1)
 
Hadoop - TDC 2012
Hadoop - TDC 2012Hadoop - TDC 2012
Hadoop - TDC 2012
 

Recently uploaded

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 

Recently uploaded (20)

In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 

HBase replication

  • 1. © Cloudera, Inc. All rights reserved. HBase Replication Wellington Chevreuil
  • 2. © Cloudera, Inc. All rights reserved. Overview ● Replication Basics ● Requirements ● HBase Shell Commands ● Implementation Details ● Monitoring ● Extra Tools ● Hands-on labs
  • 3. © Cloudera, Inc. All rights reserved. Replication Basics ● Source-push strategy ● Master, Source, Originator - means the cluster sending data. ● Slave, Destination, Target - means cluster receiving data. ● Can be cyclic and allows for multiple masters and slaves ○ A master can have multiple slaves ○ A slave can have multiple masters ○ A cluster can perform both master/slave roles on a given topology ● Eventual consistency ● Asynchronous ● Configurable at column family level ● Relies on WAL data ○ Any changes that bypass WAL won't be replicated, such as bulk load, truncate command, or if skip wal has been enabled. ● Tracked via ZooKeeper ● Work done by RegionServers ● Adds a source cluster ID to edit's metadata
  • 4. © Cloudera, Inc. All rights reserved. Requirements ● All RegionServers must be accessible from all RegionServers from each cluster ● Zookeeper Quorum from slaves must be accessible by masters ● Table structure must be the same in master and slave clusters ○ The column family target for replication must match on master/slave clusters ● If same Zookeeper Quorum is used for master/slave clusters, zookeeper.znode.parent must be different ● Clusters can have varying sizes ● Clusters can have pre-existing data on target tables ○ In this case, only data added on master after replication has been enabled will be replicated
  • 5. HBase Shell Commands
    ● add_peer
      ○ Adds a new slave to the current cluster.
    ● list_peers
      ○ Shows the current list of slaves "known" by this cluster.
    ● disable_peer
      ○ Pauses replication, but keeps tracking new edits to be replicated.
    ● enable_peer
      ○ Resumes replication. All edits added since the disable_peer execution will now be sent to the related slaves.
    ● remove_peer
      ○ Disables replication for the given slave.
      ○ No edits will be sent to the slave.
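A minimal hbase shell session illustrating these commands, as a sketch only: the peer id '1' and the slave ZooKeeper quorum address are placeholders, not values from the deck.

```shell
# Run on the source (master) cluster. The quorum string is
# "<zk hosts>:<port>:<zookeeper.znode.parent>" of the slave cluster.
hbase shell <<'EOF'
add_peer '1', "slave-zk-1,slave-zk-2,slave-zk-3:2181:/hbase"
list_peers
disable_peer '1'   # pause shipping; new edits keep queueing
enable_peer '1'    # resume; queued edits are shipped to the slave
remove_peer '1'    # drop the peer and its replication queues
EOF
```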
  • 6. HBase Shell Commands
    ● enable_table_replication
      ○ Sets the replication flag to true on all column families of the specified table.
    ● disable_table_replication
      ○ The opposite of the above.
    ● append_peer_tableCFs, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config, get_peer_config, list_peer_configs, list_replicated_tables
      ○ General admin commands for changing and monitoring the configuration of tables currently targeted for replication
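Replication can also be flagged per column family by setting REPLICATION_SCOPE directly; the table and family names below are placeholders for illustration.

```shell
hbase shell <<'EOF'
# Mark only family 'cf1' of table 'test-1' for replication
# (REPLICATION_SCOPE => '1' enables it, '0' disables it).
alter 'test-1', {NAME => 'cf1', REPLICATION_SCOPE => '1'}
# Or flag every family of the table at once:
enable_table_replication 'test-1'
list_replicated_tables
EOF
```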
  • 7. Implementation Details - Deployment Overview
    ● This is a deployment diagram in the context of replication only, so only the components relevant to the main replication flow are highlighted.
    ● Note the absence of HMasters on both the master (source) and slave (destination) clusters.
    ● ZooKeeper is of vital importance, as it keeps the registry of edits to be replicated, as well as the peers to replicate to.
    ● RSes on the master cluster depend on the ZK of the slave cluster.
  • 8. Implementation Details - Setup/Maintenance Commands
    ● Shell commands interact directly with ZooKeeper.
    ● Replication state is kept in the master cluster's ZooKeeper znodes.
    ● There is no interaction with RSes when replication shell commands are run.
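Since the shell commands only write to ZooKeeper, the resulting state can be inspected with the bundled ZK CLI. A sketch, assuming the default zookeeper.znode.parent of /hbase and a placeholder peer id '1':

```shell
# List configured peers and per-RegionServer replication queues.
hbase zkcli ls /hbase/replication/peers
hbase zkcli ls /hbase/replication/rs
# Show what is stored for peer '1' (placeholder id).
hbase zkcli get /hbase/replication/peers/1
```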
  • 9. Implementation Details - Setup WAL and Replication
    ● RS init phase where the replication service classes are created.
    ● Once the replication related classes are properly initialized, the Replication instance is added to the list of WALActionListeners.
    ● A WALFactory instance is created, with the list of listeners containing the Replication instance.
  • 10. Implementation Details - Setup WAL and Replication
    ● Replication related classes are only initialised if "hbase.replication" is set to true.
    ● This happens between the following messages in the RS startup logs:
        INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=...
        INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=remote_peer_host:2181:/hbase
        INFO org.apache.hadoop.hbase.wal.WALFactory: Instantiating WALProvider of type class org.apache.hadoop.hbase.wal.BoundedRegionGroupingProvider
    ● Default Replication Source/Sink implementation: org.apache.hadoop.hbase.replication.regionserver.Replication
      ○ This is configurable via hbase.replication.source.service and hbase.replication.sink.service
    ● Watch out for possible customer specific configurations
  • 11. Implementation Details - Setup WAL and Replication
  • 12. Implementation Details - Setup WAL and Replication
    ● During creation of the WAL related classes, the WAL file is rolled.
    ● Replication was previously added as a WAL listener, so ReplicationSourceManager is notified about the log roll.
    ● Using ZooKeeper, ReplicationSourceManager adds the new WAL file to the queue of logs (this will be under the replication znodes).
  • 13. Implementation Details - Setup WAL and Replication
    ● During WAL file rolling, no replication specific log message is recorded.
    ● ReplicationSourceManager code is notified about the new WAL file creation between the messages below:
        INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: WAL configuration: blocksize=128 MB, ...
        ...
        INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/…
  • 14. Implementation Details - Setup WAL and Replication
    ● Potential replication errors in this phase will mostly be related to znode access, preventing the ZK queue from being initialized:
        ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
        java.io.IOException: Failed replication handler create
          at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:130)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:2662)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:2632)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1647)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1388)
          at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:918)
          at java.lang.Thread.run(Thread.java:745)
        Caused by: org.apache.hadoop.hbase.replication.ReplicationException: Could not initialize replication queues.
          at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.init(ReplicationQueuesZKImpl.java:85)
          at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:122)
          ... 6 more
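When this error appears, a quick check is whether the RS user can actually read and write the replication znodes. A sketch using the bundled ZK CLI, assuming the default /hbase parent znode:

```shell
# Inspect ACLs on the replication znodes; the RegionServer
# principal needs read/write access for queue initialization
# to succeed.
hbase zkcli getAcl /hbase/replication
hbase zkcli getAcl /hbase/replication/rs
```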
  • 15. Implementation Details - Start Replication Thread
    ● From the HRegionServer.startServiceThreads method, replication source and sink threads are set up and started.
    ● ReplicationSourceManager initialization involves several steps, to be detailed next.
    ● The ReplicationSink instance will be used to perform the actual sink work if the cluster acts as a destination cluster. To be detailed later.
  • 16. Implementation Details - Start Replication Thread
  • 17. Implementation Details - Start Replication Thread
    ● Once ReplicationSourceManager.addSource has completed properly for each peer, the following message is seen:
        INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Current list of replicators: [host-1,60020,1510938412878, host1,60020,1510929825829] other RSs: [host-1,60020,1510938412878] …
    ● Upon startup, the ReplicationSource.run method also logs the message below:
        INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 9fa10771-97b2-48ed-b635-b0bd474a99b2 -> 5f54f936-a5f8-4726-9d09-7bf1c709eeab
    ● Since this is asynchronous, it may occur before or after the previous message.
    ● It should be logged for each peer id.
  • 18. Implementation Details - New Peers
    ● ReplicationTrackerZKImpl receives notifications about changes on the replication znodes.
    ● A new peer addition triggers a peer list update on ReplicationPeersZKImpl.
    ● With at least one peer, ReplicationQueuesZKImpl will get notified about WAL file creation.
        INFO org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl: /hbase/replication/peers znode expired, triggering peerListChanged event
        ...
        INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=peer-host:2181:/hbase
  • 19. Implementation Details - Shipping Edits
    ● Main work done by ReplicationSourceWorkerThread instances.
      ○ One per WAL group.
      ○ Every WAL group has its own queue of WAL files to be processed.
      ○ Runs in the background indefinitely. Sleeps for replication.source.sleepforretries if the peer is disabled.
      ○ On each loop iteration:
        ■ Reads the current WAL being written.
        ■ Applies edit filters (keeps only edits for CFs marked for replication, whose origin cluster ID is not the same as the peer's).
        ■ For the filtered edits, connects to an RS on the remote cluster and ships them (via RPC).
        ■ Edits must be read (and processed) sequentially. If a shipment fails, replication will not progress for that WAL group, and lags may be seen.
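The per-WAL-group queues described above are visible in ZooKeeper: each RegionServer znode holds one child per peer, containing the queued WAL files. A sketch, where the RS znode name and the peer id '1' are placeholders:

```shell
# List the RegionServers that currently own replication queues,
# then drill into the WAL queue one RS keeps for peer '1'.
hbase zkcli ls /hbase/replication/rs
hbase zkcli ls /hbase/replication/rs/host-1,60020,1510938412878/1
```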
  • 20. Implementation Details - Shipping Edits (Source Side)
  • 21. Implementation Details - Shipping Edits (Source Side)
    ● Detailed flow of the HBaseInterClusterReplicationEndpoint.replicate() method.
    ● Uses its own thread pool for performing RPC calls.
    ● The Replicator class implements java.util.concurrent.Callable for async execution.
  • 22. Implementation Details - Shipping Edits (Source Side)
    ● Replicator uses SinkPeer to discover the remote RS responsible for running the sink.
    ● ReplicationProtbufUtil is used to convert the request to protobuf and perform the RPC.
  • 23. Implementation Details - Shipping Edits (Destination Side)
    ● ReplicationSink uses the default client API to process put/delete operations.
    ● The RS running the sink is not necessarily the one hosting the regions where the entries will be placed.
    ● Coprocessors may get invoked.
  • 24. Monitoring
    ● Some classes provide additional TRACE/DEBUG messages that can be turned on for further troubleshooting.
    ● Worth enabling it via the RS UI for specific classes only, instead of turning on TRACE for the whole HBase service:
      ○ ReplicationSource, HBaseReplicationEndpoint, HBaseInterClusterReplicationEndpoint
    ● JMX metrics might also help gauge the state of replication:
      ○ shippedBatches, ageOfLastShippedOp, logReadInBytes
        ■ Global and per WAL group id.
    ● ReplicationStatisticsThread also logs replication stats every 5 minutes:
        INFO org.apache.hadoop.hbase.replication.regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2, current progress: walGroup [host-1%2C60020%2C1511034265841.null0]: currently replicating from: hdfs://nameservice1/hbase/WALs/host-1,60020,1511034265841/host-1%2C60020%2C1511034265841.null0.1511196279542 at position: 83
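These metrics can also be pulled over HTTP from the RegionServer's JMX servlet. A sketch, assuming a placeholder hostname and the default CDH 5.x RS info port of 60030:

```shell
# Dump the RegionServer's Replication metrics bean as JSON.
# The qry parameter filters the JMX servlet output to one bean.
curl -s 'http://host-1:60030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Replication'
```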
  • 25. Monitoring
    ● HBase shell status 'replication' command:
      ○ On the source cluster:
          1 live servers
          Host-10-17-101-41.coe.cloudera.com:
            SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, Replication Lag=0
            SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Sat Nov 18 11:49:29 PST 2017
      ○ On the destination cluster:
          1 live servers
          Host-10-17-103-206.coe.cloudera.com:
            SOURCE:
            SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Nov 20 08:40:19 PST 2017
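The shell command also accepts an extra argument to narrow the output; a sketch of the variants available in HBase 1.x shells:

```shell
hbase shell <<'EOF'
status 'replication'            # both SOURCE and SINK metrics
status 'replication', 'source'  # source-side metrics only
status 'replication', 'sink'    # sink-side metrics only
EOF
```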
  • 26. Monitoring
    ● VerifyReplication
      ○ MR job that compares the records of the table on the source and destination clusters.
      ○ Prints counters with its findings (example run with args "1 test-1", i.e. peer 1, table test-1):
          17/11/20 10:43:12 INFO mapreduce.Job: map 0% reduce 0%
          17/11/20 10:43:18 INFO mapreduce.Job: map 33% reduce 0%
          17/11/20 10:43:19 INFO mapreduce.Job: map 67% reduce 0%
          17/11/20 10:43:23 INFO mapreduce.Job: map 100% reduce 0%
          17/11/20 10:43:24 INFO mapreduce.Job: Job job_1506585949780_0005 completed successfully
          …
          org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
            BADROWS=25
            GOODROWS=11
            ONLY_IN_SOURCE_TABLE_ROWS=25
          ...
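A sketch of how the job is typically launched from the source cluster; the peer id and table name here are illustrative, and the time bounds are optional:

```shell
# Compare table 'test-1' on this (source) cluster against the
# copy on peer '1'. Optional --starttime/--stoptime (epoch
# millis) restrict the comparison to a time window.
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 1 test-1
```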
  • 27. Monitoring
    ● DumpReplicationQueues
        $ hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues --distributed
        ...
        Dumping replication peers and configurations:
        Peer: 2
        State: ENABLED
        Cluster Name: clusterKey=host-10-17-103-187.coe.cloudera.com,host-10-17-103-189.coe.cloudera.com,host-10-17-103-193.coe.cloudera.com:2181:/hbase,replicationEndpointImpl=null
        Peer Table CFs: null
        …
        Dumping replication queue info for RegionServer: [host-10-17-101-41.coe.cloudera.com,60020,1511971261591]
        replication queue: 1
        Replication position for host-10-17-101-41.coe.cloudera.com%2C60020%2C1511971261591.null0.1512140473468: 13227
        ...
  • 28. Extra Tools
    ● In case data is already available in either the source or destination cluster tables, some tools can be used to sync the data:
      ○ CopyTable
        ■ https://hbase.apache.org/book.html#copy.table
      ○ Export Snapshots
        ■ https://hbase.apache.org/book.html#ops.snapshots.export
      ○ Bulk Load
        ■ https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
      ○ HashTable/SyncTable
        ■ Documented in the HBase reference guide.
        ■ Best option; can be used even after replication is already enabled.
        ■ Allows for syncing deleted rows.
        ■ Only available from CDH 5.9.0 onwards.
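For reference, a typical CopyTable invocation pushing an existing table to the remote cluster looks like this; the slave ZK address and table name are placeholders:

```shell
# Copy table 'test-1' to the destination cluster, identified
# by its ZK quorum and znode parent via the standard
# --peer.adr option.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=slave-zk:2181:/hbase test-1
```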
  • 29. Extra Tools
    ● HashTable/SyncTable:
      ○ Two MR jobs
        ■ org.apache.hadoop.hbase.mapreduce.HashTable
        ■ org.apache.hadoop.hbase.mapreduce.SyncTable
      ○ Usage:
        ■ First, run the HashTable MR job on the cluster whose state should be propagated to the remote peer. For example, to sync the state of table "test-1" on the destination cluster with its state on the source cluster, run the following at the source:
            $ hbase org.apache.hadoop.hbase.mapreduce.HashTable test-1 /tmp/test-1
          ● The first param is the table name, and the second is an HDFS path where the HashTable job should output the table's summary.
  • 30. Extra Tools
    ● HashTable/SyncTable:
      ○ Usage
        ■ Once HashTable has finished on the source cluster, run SyncTable on the destination cluster:
            $ hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=source_zk:2181:/hbase hdfs://source_nn:8020/tmp/test-1 test-1 test-1
        ■ The --sourcezkcluster option gives the ZK address of the source cluster; the HDFS path (on the source cluster's NN) is the HashTable output from the previous step.
        ■ The last two params are the table names on the source and destination clusters.
        ■ This command brings the table data on the destination cluster in sync with the source cluster:
          ● If the source cluster had more rows prior to the command, these additional rows are copied to the destination.
          ● If the destination cluster had more rows than the source, these rows are deleted from the destination.
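SyncTable also supports a --dryrun flag, useful for counting divergent cells before actually mutating the destination. A sketch, reusing the same placeholder addresses as the command above:

```shell
# Report what would change without writing to the target table.
hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun \
  --sourcezkcluster=source_zk:2181:/hbase \
  hdfs://source_nn:8020/tmp/test-1 test-1 test-1
```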
  • 31. Labs Exercises
    1. Problem 1: Replication related znodes not readable by RSes
    2. Problem 2: Remote cluster not reachable by the source cluster
    3. Problem 3: Remote cluster is reachable, but sinks are not completing