The State of HBase Replication
Jean-Daniel Cryans
May 5th, 2014
©2014 Cloudera, Inc. All rights reserved.
About me
• Software Engineer at Cloudera, Storage team
• Apache HBase committer since 2008, PMC member
Motivation for HBase Replication
• Even though HBase is:
• distributed;
• fault-tolerant;
• highly available; and
• almost magic.
The Current State
• It’s production-ready.
• It’s used to replicate data between thousands of nodes across continents.
• It’s used for Disaster Recovery, geo-distributed serving, and more.
Agenda
• Four Years of Replication
• Use Cases in Production
• Roadmap
Design
• Clusters are distinct
• Pull vs. push
• Sync vs. async
Clusters are Distinct
• An HBase cluster doesn’t span data centers (DCs) or HDFS instances
• .META. operations aren’t replicated
• Regions can be different
• Security has to be configured for each cluster
[Figure: A master cluster with 20 RS and a slave cluster with 15 RS]
Push instead of Pull
[Figure: MySQL replication uses pull. The MySQL slave (Cluster B) gets the binlog from the MySQL master (Cluster A) and applies it locally.]
[Figure: HBase replication uses push. An RS in Cluster A replicates entries to an RS in Cluster B, which applies them to its cluster.]
Async instead of Sync
[Figure: Synchronous replication. A Put reaches an RS in Cluster A, which writes it to its HLog and MemStore and forwards the Put to an RS in Cluster B. Cluster B writes it to its own HLog and MemStore and acks, and only then does Cluster A ack the client (8 steps in total).]
[Figure: Asynchronous replication. A Put reaches an RS in Cluster A, which writes it to its HLog and MemStore and acks the client (4 steps). Separately, an HLog tailing thread reads the HLog and sends the edits as a Put to an RS in Cluster B, which writes them to its HLog and MemStore and acks (5 steps).]
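The asynchronous path can be sketched as a toy model in Python. Everything here (the `RegionServer` class, `put`, `ship_edits`) is hypothetical and for illustration only; it is not HBase's API. The point is that the client's ack depends only on the local log and memstore, while a separate step tails the log and pushes edits to the peer.

```python
class RegionServer:
    """Toy region server: an append-only log plus an in-memory store."""

    def __init__(self, name):
        self.name = name
        self.hlog = []        # stands in for the HLog (write-ahead log)
        self.memstore = {}    # stands in for the MemStore

    def put(self, row, value):
        # Log the edit, apply it in memory, then ack the caller.
        self.hlog.append((row, value))
        self.memstore[row] = value
        return "ack"


def ship_edits(source, sink, position):
    """Tail the source log from `position` and push new edits to the sink.

    Returns the new position so the next call resumes where this one stopped.
    """
    for row, value in source.hlog[position:]:
        sink.put(row, value)
    return len(source.hlog)


rs_a = RegionServer("cluster-a-rs")
rs_b = RegionServer("cluster-b-rs")

# The client is acked before cluster B has seen anything.
client_ack = rs_a.put("row-x", "v1")
seen_by_b_before = "row-x" in rs_b.memstore

# Later, the tailing step catches cluster B up.
position = ship_edits(rs_a, rs_b, 0)
```

In a real deployment the tailing step runs continuously in its own thread; here it is called once by hand to keep the sketch deterministic.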
First Release - 0.90.0
• Simple master-slave (only one)
• Disabled by default
• Uses ZooKeeper (ZK) as a metadata store
Original Implementation
[Figure: On a region server in the master cluster, a Replication Source (with a ZooKeeper watcher) calls replicateLogEntries() on a region server in the slave cluster, where a Replication Sink applies the edits as Puts and Deletes through HTable.]
First Lesson Learned
• HDFS doesn’t support tailing files that are being written to. Instead, it requires:
• open()
• seek() // go where we stopped last time
• while (not EOF and not enoughData)
• read()
• close()
• repeat
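A minimal Python sketch of that loop, using a local file in place of HDFS. The `tail_once` helper and its `max_bytes` parameter are made up for this example; the real reader works on HLog files through the HDFS client.

```python
import os
import tempfile


def tail_once(path, position, max_bytes=1024):
    """Open, seek to where we stopped, read until EOF or enough data, close.

    Returns (data, new_position); call again later with new_position.
    """
    chunks = []
    total = 0
    with open(path, "rb") as f:          # open()
        f.seek(position)                 # seek() to where we stopped last time
        while total < max_bytes:         # stop once we have enough data
            chunk = f.read(min(256, max_bytes - total))   # read()
            if not chunk:                # EOF
                break
            chunks.append(chunk)
            total += len(chunk)
    # close() happens when the with-block exits
    return b"".join(chunks), position + total


# A writer appends to the file between two polls.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "ab") as w:
    w.write(b"edit1\n")
first, pos = tail_once(path, 0)
with open(path, "ab") as w:
    w.write(b"edit2\n")
second, pos = tail_once(path, pos)       # repeat
os.remove(path)
```

Each poll pays for a fresh open and seek, which is the cost the slide is pointing at.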
Second Lesson Learned
• Single-threaded, non-batched ZK is slow
• ZK didn’t have an atomic move operation
• Doubles the number of ops needed and opens the door to race conditions
[Figure: RS2 takes over RS1’s queue for peer 1. The znodes hlog1, hlog2, ... under /hbase/replication/RS1/1 are moved under /hbase/replication/RS2/1-RS1 one at a time, each in two steps: 1. create new hlog2, 2. delete old hlog2.]
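To see why the missing atomic move hurts, here is a toy Python model where a dict stands in for the znode tree. `move_queue` is hypothetical, not the HBase or ZooKeeper API; it only illustrates that each log needs a create and a delete, and that a crash between the two leaves the log in both queues.

```python
def move_queue(znodes, src, dst, crash_after=None):
    """Move every child of `src` under `dst` using create-then-delete.

    `znodes` maps a path to its data. `crash_after` (an op count) simulates
    the region server dying mid-move. Returns the number of ops issued.
    """
    ops = 0
    children = sorted(p for p in znodes if p.startswith(src + "/"))
    for old_path in children:
        new_path = dst + old_path[len(src):]
        for op in ("create", "delete"):
            if crash_after is not None and ops == crash_after:
                return ops
            if op == "create":
                znodes[new_path] = znodes[old_path]
            else:
                del znodes[old_path]
            ops += 1
    return ops


def fresh_tree():
    return {
        "/hbase/replication/RS1/1/hlog1": "pos=0",
        "/hbase/replication/RS1/1/hlog2": "pos=0",
    }


src = "/hbase/replication/RS1/1"
dst = "/hbase/replication/RS2/1-RS1"

# Full move: two logs cost four operations.
tree = fresh_tree()
ops = move_queue(tree, src, dst)

# Crash after the first create: hlog1 exists under both RS1 and RS2.
crashed = fresh_tree()
move_queue(crashed, src, dst, crash_after=1)
duplicated = (src + "/hlog1" in crashed) and (dst + "/hlog1" in crashed)
```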
Second Release - 0.92.0
• Cyclic replication
• Multi-slave (scope LOCAL or GLOBAL)
• Enable / disable peer
• Special configurations
Cyclic Replication
[Figure: Clusters 1, 2, and 3 replicate in a cycle. A Put for row X arrives at cluster 1 and is replicated to cluster 2, then to cluster 3. At cluster 3: “Row X is from 1. Don’t replicate!”]
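The loop-avoidance idea can be sketched in Python as follows. The `Cluster` class and its fields are invented for illustration: each edit carries the ID of the cluster where it was first written, and a cluster skips shipping an edit to the peer it originated from.

```python
class Cluster:
    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.peers = []      # clusters this one replicates to
        self.rows = {}       # row -> value
        self.received = 0    # how many writes (local or replicated) arrived

    def put(self, row, value, origin=None):
        # A client write has no origin yet, so tag it with our own ID.
        origin = self.cluster_id if origin is None else origin
        self.rows[row] = value
        self.received += 1
        for peer in self.peers:
            if peer.cluster_id == origin:
                continue     # "Row X is from 1. Don't replicate!"
            peer.put(row, value, origin)


c1, c2, c3 = Cluster(1), Cluster(2), Cluster(3)
c1.peers, c2.peers, c3.peers = [c2], [c3], [c1]   # 1 -> 2 -> 3 -> 1

c1.put("X", "v")
```

Replication is modeled here as a direct call to keep the example short; without the origin check, the same call chain would recurse forever.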
Multi-Slave
[Figure: A Put for row X on cluster 1 is replicated to both cluster 2 and cluster 3.]
Enable / Disable Peers
> disable_peer '2'
[Figure: On cluster 1’s RS, the HLog tailing thread keeps asking “Is the peer enabled?” before shipping to cluster 2’s RS. While the peer is disabled, HLogs accumulate in the queue.]
Special Configurations
• KEEP_DELETED_CELLS
• Must be used on slaves with replication when deleting data.
• MIN_VERSIONS
• With TTL, makes it easy to configure a slave that contains only the last few days of data.
Third Lesson Learned
• It’s easy to DDoS yourself.
• Replication was using the normal handlers...
• ... and using them to write back!
[Figure: A region server’s handlers (Handler1: Put, Handler2: Delete, Handler3: Replicate, Handler4: Get, Handler5: Put). The replicated Put goes in the queue.]
Fourth Lesson Learned
• Instinctively, what would something called stop_replication do?
• Good intentions, bad outcomes, HBASE-8861
[Figure: start/stop_replication, crossed out]
Third Release - 0.96.0 / 0.98.0
• Replication enabled by default!
• Completely refactored for readability/extensibility (Chris Trezzo)
• ReplicationSyncUp tool (HBASE-9047)
• Throttling (HBASE-9501)
• Finer grained replication controls (HBASE-8751)
ReplicationSyncUp Tool
• Works on an offline cluster
• Can finish replicating the queues in ZK
• Useful to finish draining a master cluster
[Figure: Two clusters, each made of HBase, HDFS, and ZooKeeper, connected by ReplicationSyncUp]
Finer Grained Replication Controls
> set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB"
• Meaning: enable replication to peer #2 for:
• All of table1
• cf1 and cf2 from table2
• cfA and cfB from table3
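One way to read that string, sketched in Python. `parse_table_cfs` and `should_replicate` are hypothetical helpers written for this example, not the parser HBase uses; they map each table to a list of column families, with `None` meaning all families.

```python
def parse_table_cfs(spec):
    """Parse 'table1; table2:cf1,cf2' into {table: [cfs] or None}."""
    result = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        table, _, cfs = entry.partition(":")
        families = [cf.strip() for cf in cfs.split(",") if cf.strip()]
        result[table.strip()] = families or None   # None = whole table
    return result


def should_replicate(table_cfs, table, family):
    """True if an edit to table:family goes to this peer."""
    if table not in table_cfs:
        return False
    families = table_cfs[table]
    return families is None or family in families


peer2 = parse_table_cfs("table1; table2:cf1,cf2; table3:cfA,cfB")
```

With this reading, an edit to `table2:cf3` or to any table not listed is simply not shipped to peer 2.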
Use Cases in Production
Flurry
• Two data centers, coast to coast
• Three clusters, in master-master pairs
• 1200 nodes
• 800 nodes
• 30 nodes
• Replication traffic: 2 Gbps
• Latency between DCs: 85 ms
Opower
• Two clusters, same data center
• Master: tens of nodes
• Slave: tens of nodes
• Replication traffic: 1 GB/day
• Bulk load replication traffic: 180 GB/day
• Recent use case
Lily HBase Indexer
• Collaboration between NGData & Cloudera.
• NGData are the creators of the Lily data management platform.
• Lily HBase Indexer
• Service which acts as an HBase replication listener.
• Custom sink writes to SolrCloud.
• Integrates Cloudera Morphlines library for ETL of rows.
Roadmap
Stop Relying on Permanent Znodes
• Current rule is to never rely on znodes to survive cluster restarts, upgrades, etc.
• State data should be kept in an HBase table.
• Notification done through a new mechanism
• See: https://issues.apache.org/jira/browse/HBASE-10295
Define a Replication Interface
• Replication is somewhat extensible but it lacks stable interfaces.
• The HBase Indexer is such an extension and it required surgery every time a committer sneezed.
• See: https://issues.apache.org/jira/browse/HBASE-10504
Distributed Counters
• Incrementing consists of:
1. Taking a lock;
2. Get’ing the current value; and
3. Put’ing the newly incremented value.
• This breaks in Master-Master because the Puts are overwriting each other.
• See https://issues.apache.org/jira/browse/HBASE-2804
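A toy Python run of the failure, with two dicts standing in for the two clusters. Nothing here is HBase API; `increment` follows the steps above (the lock is omitted because the sketch is single-threaded), and `replicate` ships the resulting Put, which carries the absolute value rather than the delta.

```python
def increment(cluster, row, amount=1):
    """Get the current value, add to it, Put the new value. Returns the Put."""
    new_value = cluster.get(row, 0) + amount
    cluster[row] = new_value
    return (row, new_value)


def replicate(put, peer):
    """Apply a replicated Put on the peer: it overwrites whatever is there."""
    row, value = put
    peer[row] = value


cluster_a, cluster_b = {}, {}

# Both clusters take one increment before replication catches up.
put_a = increment(cluster_a, "hits")
put_b = increment(cluster_b, "hits")

# Each side then receives the other's Put.
replicate(put_a, cluster_b)
replicate(put_b, cluster_a)

# Two increments happened, but both clusters end up with 1.
```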
More Tooling
• Replication management console, one shell to rule all the clusters!
• Replication bootstrapping tool.
• Tool that can move queues between region servers.
• Tool that can throttle replication on a live cluster.
Questions?
• Or ping me async:
• @jdcryans
• jdcryans@cloudera.com
• jdcryans on #hbase irc.freenode.net