Storage infrastructure using HBase behind LINE messages
NHN Japan Corp. LINE Server Task Force
Shunsuke Nakamura @sunsuk7tp
13.1.21 Hadoop Conference Japan 2013 Winter
To support LINE's users, we have built message storage that is:
• Large scale (tens of billions of rows per day)
• Responsive (under 10 ms)
• Highly available (dual clusters)
Outline
• About LINE
• LINE & storage requirements
• What we achieved
• Today's topics
  – IDC online migration
  – NN failover
  – Stabilizing LINE message cluster
• Conclusion
LINE: a global messenger powered by NHN Japan
• Devices: 5 different mobile platforms + desktop support
New Year 2013 in Japan
[Figure: number of requests in an HBase cluster, plotted per 1 min; New Year 2013 vs. usual peak hours, ×3]
"Happy New Year!" greetings in Japanese and Chinese caused a 3× traffic explosion, and LINE Storage had no problems :)
LINE on Hadoop
• Storage for service, backup and logs
• For HBase, M/R and log archive
• Bulk migration and ad-hoc analysis
• For HBase and sharded Redis
• Collecting Apache and Tomcat logs
• KPI and log analysis
LINE service requirements
LINE is a…
• Messaging service: should be fast
• Global service: downtime not allowed
But it is not a simple messaging service:
• Messages are synchronized between phones and PCs, so messages must be kept for a while.
LINE's storage requirements
• No data loss
• Eventual consistency
• Low latency
• HA
• Flexible schema
• Easy management
• Scale-out
Our selection is HBase
• Low latency for large amounts of data
• Linearly scalable
• Relatively low operating cost
  – Replication by nature
  – Automatic failover
• Data model fits our requirements
  – Semi-structured
  – Timestamp
[Figure: stored rows per day in a cluster (billions/day)]
What we achieved with HBase
• No data loss
  – Persistent
  – Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
  – Hundreds of billions of rows
  – Write: ~1 ms
  – Read: 1–10 ms
Many issues we had
• Heterogeneous storage coordination
• IDC online migration
• Flush & compaction storms caused by "too many HLogs"
• Row & column distribution
• Secondary index
• Region management
  – Load and size balancing
  – RS allocation
  – META region
  – M/R
• Monitoring for diagnostics
• Traffic bursts caused by decommissioning
• NN problems
• Performance degradation
  – Hotspot problem
  – Timeout bursts
  – GC problem
• Client bugs
  – Thread blocking on server failure (HBASE-6364)
Today's topics
• IDC online migration
• NN failover
• Stabilizing LINE message cluster
IDC online migration
Why?
• Move whole HBase clusters and data
• For better network infrastructure
• Without downtime
IDC online migration: before migration
• The App Server writes only to src-HBase; dst-HBase receives no writes yet.
IDC online migration
• The App Server writes to both src-HBase and dst-HBase (client-level replication)
IDC online migration
• New data: incremental replication
• Old data: bulk migration
• Cells in dst-HBase keep the same timestamps as in src-HBase
(The App Server writes to both src-HBase and dst-HBase.)
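A minimal sketch of why the timestamp rule matters, assuming a toy in-memory table in place of HBase. `ToyTable`, `dual_write` and `bulk_copy` are hypothetical names, not LINE's code; a real client would send Puts with an explicit cell timestamp, which HBase allows. Because every path writes the src timestamp, dst-HBase converges to the same cells no matter in which order dual writes and bulk-copied old data arrive.

```python
import time
from typing import Dict, Optional, Tuple


# Hypothetical toy stand-in for an HBase table: (row, column) -> {timestamp: value}.
class ToyTable:
    def __init__(self) -> None:
        self.cells: Dict[Tuple[str, str], Dict[int, str]] = {}

    def put(self, row: str, col: str, ts: int, value: str) -> None:
        # Same (row, column, timestamp) overwrites, so replaying a write is harmless.
        self.cells.setdefault((row, col), {})[ts] = value

    def get(self, row: str, col: str) -> Optional[Tuple[int, str]]:
        versions = self.cells.get((row, col))
        if not versions:
            return None
        ts = max(versions)  # newest timestamp wins
        return ts, versions[ts]


def dual_write(src: ToyTable, dst: ToyTable, row: str, col: str,
               value: str, ts: Optional[int] = None) -> int:
    """Client-level replication: one timestamp, chosen once, goes to both clusters."""
    if ts is None:
        ts = int(time.time() * 1000)  # epoch milliseconds
    src.put(row, col, ts, value)
    dst.put(row, col, ts, value)
    return ts


def bulk_copy(src: ToyTable, dst: ToyTable) -> None:
    """Bulk migration of old data, keeping each cell's src timestamp."""
    for (row, col), versions in src.cells.items():
        for ts, value in versions.items():
            dst.put(row, col, ts, value)
```

If dst-HBase stamped cells with its own clock instead, an old value copied late by the bulk migration could look newer than a value the App Server had already written through the dual-write path.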
LINE HBase Replicator & BulkMigrator
• Replicator: incremental replication
• BulkMigrator: bulk migration
LINE HBase Replicator
• Our own implementation
• Prefers pull to push: HBase's built-in replication pushes from src-HBase to dst-HBase, while LINE HBase Replicator pulls
  – Throughput throttling
  – Workload isolation between the replicator and the RS
• Rowkey conversion and filtering
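The slide names throughput throttling and rowkey conversion/filtering but not how they were built. As an illustration only, here is a token bucket (a standard throttling technique, swapped in as an assumption) plus a filter/convert step over hypothetical `(row, column, timestamp, value)` entries. `TokenBucket` and `convert_and_filter` are hypothetical names; the clock and sleep functions are injectable so the throttle can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[bytes, bytes, int, bytes]  # (row, column, timestamp, value); hypothetical shape


class TokenBucket:
    """Throttle: at most `rate` units per second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.clock, self.sleep = clock, sleep
        self.last = clock()

    def acquire(self, n: float = 1.0) -> None:
        """Block until n units may be sent (e.g. n = entries in the next batch)."""
        if n > self.capacity:
            raise ValueError("request larger than bucket capacity")
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            self.sleep((n - self.tokens) / self.rate)


def convert_and_filter(entries: Iterable[Entry],
                       keep: Callable[[Entry], bool],
                       convert_row: Callable[[bytes], bytes]) -> Iterator[Entry]:
    """Drop entries dst-HBase does not need and rewrite rowkeys for its schema."""
    for entry in entries:
        if keep(entry):
            row, col, ts, value = entry
            yield convert_row(row), col, ts, value
```

Pulling lets the replicator call `acquire` before each batch at its own pace, so a slow or busy destination throttles the replicator rather than the source RegionServer.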
LINE HBase Replicator: a simple daemon that replicates local regions
1. HLogTracker reads a checkpoint (ckpt) and selects the next HLog.
2. For each entry in the HLog:
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to dst-HBase
• Periodic checkpointing
• Generally, entries are replicated within seconds
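The loop above can be sketched as follows, assuming HLogs are given as an ordered mapping of name to entries and the checkpoint is a small JSON file. The checkpoint format, the batch and checkpoint intervals, and all function names are hypothetical; only the step order follows the slide.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, str, int, str]  # (row, column, timestamp, value); hypothetical HLog entry


@dataclass
class Checkpoint:
    hlog: Optional[str] = None  # HLog currently being replicated
    offset: int = 0             # entries of that HLog already shipped


def load_ckpt(path: str) -> Checkpoint:
    if not os.path.exists(path):
        return Checkpoint()
    with open(path) as f:
        return Checkpoint(**json.load(f))


def save_ckpt(path: str, ckpt: Checkpoint) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"hlog": ckpt.hlog, "offset": ckpt.offset}, f)
    os.replace(tmp, path)  # atomic rename: a crash never leaves a torn checkpoint


def replicate(hlogs: Dict[str, List[Entry]], ckpt_path: str,
              convert: Callable[[Entry], Optional[Entry]],
              put_batch: Callable[[List[Entry]], None],
              batch_size: int = 100, ckpt_every: int = 10) -> None:
    """One pass: resume from the checkpoint, ship remaining entries, checkpoint periodically."""
    names = sorted(hlogs)  # assumes HLog names sort in creation order
    ckpt = load_ckpt(ckpt_path)
    # HLogTracker step: resume the checkpointed HLog, or start from the oldest if none/unknown.
    start = names.index(ckpt.hlog) if ckpt.hlog in hlogs else 0
    for name in names[start:]:
        entries = hlogs[name]
        offset = ckpt.offset if name == ckpt.hlog else 0
        batches = 0
        while offset < len(entries):
            chunk = entries[offset:offset + batch_size]
            # Filter & convert each entry (None = drop), then send survivors as one batch.
            puts = [p for p in map(convert, chunk) if p is not None]
            if puts:
                put_batch(puts)
            offset += len(chunk)
            batches += 1
            if batches % ckpt_every == 0 or offset == len(entries):
                save_ckpt(ckpt_path, Checkpoint(name, offset))


def demo() -> List[Entry]:
    """Replicate two toy HLogs into a list, using a throwaway checkpoint file."""
    shipped: List[Entry] = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.json")
        hlogs = {"hlog.0001": [("r1", "m:a", 1, "x"), ("r2", "m:a", 2, "y")],
                 "hlog.0002": [("r3", "m:a", 3, "z")]}
        replicate(hlogs, path, lambda e: e, shipped.extend)
        hlogs["hlog.0002"].append(("r4", "m:a", 4, "w"))
        replicate(hlogs, path, lambda e: e, shipped.extend)  # resumes, ships only r4
    return shipped
```

Checkpointing only every few batches means a crash replays the entries since the last checkpoint (at-least-once delivery). That is acceptable when, as in the migration design, cells keep their src timestamps, so a replayed Put rewrites the same cell version.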
Bulk migration
1. MapReduce between any storages
   – Map tasks only
   – Read the source, write the destination
   – Task scheduling is a problem: it depends on region allocation
2. Non-MapReduce version (BulkMigrator)
   – Our own implementation
   – HBase → HBase
   – On each RS, scan & batch region by region
   – Throughput throttling
   – Slow, but easy to implement and debug
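A minimal sketch of the BulkMigrator idea (scan each local region, write in batches, throttle), assuming a toy in-memory table and `[start, end)` region boundaries. The rows-per-second cap and all names are hypothetical; the slide does not say how LINE's throttle was parameterized.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Row = Tuple[str, Dict[str, str]]  # (rowkey, {column: value})
Region = Tuple[str, str]          # [start_key, end_key); "" as end key = unbounded


def scan_region(table: Dict[str, Dict[str, str]], region: Region,
                batch_size: int) -> Iterator[List[Row]]:
    """Yield the region's rows in key order, batch_size rows at a time."""
    start, end = region
    batch: List[Row] = []
    for key in sorted(table):
        if key < start or (end and key >= end):
            continue
        batch.append((key, table[key]))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate_local_regions(table: Dict[str, Dict[str, str]],
                          local_regions: Sequence[Region],
                          write_batch: Callable[[List[Row]], None],
                          batch_size: int = 1000,
                          max_rows_per_sec: float = 5000.0,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Copy each region hosted on this RS, one region at a time, under a rows/sec cap."""
    copied = 0
    for region in local_regions:
        for batch in scan_region(table, region, batch_size):
            started = clock()
            write_batch(batch)
            copied += len(batch)
            # Throttle: make each batch take at least len(batch) / max_rows_per_sec seconds.
            budget = len(batch) / max_rows_per_sec
            elapsed = clock() - started
            if elapsed < budget:
                sleep(budget - elapsed)
    return copied
```

Running one such process per RS over only its local regions is what makes this approach independent of MapReduce task scheduling, at the cost of being slower.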
NN failover
Background
• Our HBase has a SPOF: the NameNode
• HA-NN setup based on "Apache Hadoop HA Configuration" http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/
• Furthermore, we added Pacemaker
  – Heartbeat alone can't detect whether the NN process is running
NameNode failure in October 2012
HA-NN failover failed
• The failure was not in the NameNode process itself
• Incorrect leader election under network partitioning
• Complicated configuration
  – Easy to get wrong, difficult to control
  – Pacemaker scripting was not straightforward
  – A VIP (virtual IP) is risky for HDFS
• DRBD split-brain problem
  – Protocol C
  – Unable to re-sync while the service is online
Now: in-house NN failure handling
• Bye-bye, old HA-NN
  – We had to restart whole HBase clusters after an NN failover
• Alternative ideas
  – Quorum-based leader election (using ZK)
  – Using an L4 switch
  – Implementing our own AvatarNode
• We chose a safer solution at the cost of a little downtime
In-house NN failure handling (1)
rsync with --link-dest periodically
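A sketch of one periodic snapshot step, assuming the NN metadata directory is copied to a backup location with rsync's `--link-dest` (a real rsync option: unchanged files become hard links to the previous snapshot, so each snapshot looks complete but costs only the changed files). The paths `/data/nn` and `/backup/nn`, the host name and the timestamp format are hypothetical, and the slides show neither the schedule nor how a consistent copy of a live name directory was ensured.

```python
import subprocess
from datetime import datetime
from typing import List, Optional


def snapshot_cmd(name_dir: str, backup_root: str, stamp: str,
                 prev_stamp: Optional[str] = None,
                 host: Optional[str] = None) -> List[str]:
    """Build an rsync command copying the NN metadata dir into backup_root/stamp."""
    cmd = ["rsync", "-a"]
    if prev_stamp is not None:
        # --link-dest is resolved on the receiving side; an absolute path avoids ambiguity.
        cmd.append(f"--link-dest={backup_root}/{prev_stamp}")
    dest = f"{backup_root}/{stamp}/"
    if host is not None:
        dest = f"{host}:{dest}"
    # Trailing slash on the source: copy the directory's contents, not the directory itself.
    cmd += [name_dir.rstrip("/") + "/", dest]
    return cmd


def take_snapshot(name_dir: str, backup_root: str, prev_stamp: Optional[str],
                  host: Optional[str] = None, dry_run: bool = True) -> str:
    """Run (or, with dry_run, only print) one snapshot; return its stamp for the next run."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    cmd = snapshot_cmd(name_dir, backup_root, stamp, prev_stamp, host)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return stamp
```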
In-house NN failure handling (2)
[Diagram: the NameNode fails (shown as a bomb).]
In-house NN failure handling (3)
Stabilizing LINE message cluster
Stabilizing LINE message cluster
Areas: H/W failure handling, RS GC storms, performance problems
• Case 1: "Too many HLogs"
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS
Case 1: "Too many HLogs"
• Effect
  – MemStore flush storm
  – Compaction storm
• Cause
  – Regions growing at different rates
  – Heterogeneous tables in one RS
• Solution
  – Region balancing
  – External flush scheduler
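The slides do not describe the external flush scheduler's policy, so the following is a hypothetical sketch. A RegionServer cannot archive an HLog while any region still has unflushed edits in it, and once the HLog count exceeds `hbase.regionserver.maxlogs` it forces flushes, which is how the storms start. The sketch therefore flushes, during assumed off-peak hours, the few regions whose oldest unflushed edit pins the oldest HLogs. `RegionStat`, the off-peak window and the per-round limit are all assumptions; `flush` stands in for an admin flush call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RegionStat:
    name: str
    memstore_bytes: int
    oldest_edit_seq: int  # seq id of the oldest unflushed edit; HLogs from here on stay pinned


def pick_regions_to_flush(stats: Sequence[RegionStat], max_per_round: int) -> List[str]:
    """Flush regions pinning the oldest HLogs first; skip empty MemStores."""
    candidates = [s for s in stats if s.memstore_bytes > 0]
    candidates.sort(key=lambda s: s.oldest_edit_seq)
    return [s.name for s in candidates[:max_per_round]]


def run_round(stats: Sequence[RegionStat], hour: int,
              flush: Callable[[str], None],
              off_peak_hours: Sequence[int] = range(3, 7),
              max_per_round: int = 2) -> List[str]:
    """One scheduler tick: during off-peak hours, flush a few regions and stop."""
    if hour not in off_peak_hours:
        return []
    names = pick_regions_to_flush(stats, max_per_round)
    for name in names:
        flush(name)  # injected; e.g. an admin "flush region" call
    return names
```

Capping flushes per round spreads the resulting compactions over time instead of triggering them all at once.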
Case 1: Number of HLogs
[Figure: number of HLogs across peak and off-peak hours, contrasting a better case (periodic flushes) with a worse case (repeated forced flushes leading to a flush storm).]
Case 2: Hotspot problems
• Effect
  – Excessive GC
  – RS performance degradation (high CPU usage)
• Cause
  – Get/Scan on:
    • a row or column that is updated too frequently
    • a row with too many columns (+ tombstones)
• Solution
  – Schema design and row/column distribution are important
  – Hotspot region isolation
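One common way to distribute a wide or hot row is to split it into a fixed number of physical rows by hashing the column qualifier. This is a swapped-in technique for illustration, not necessarily LINE's schema; the `#NN` suffix, bucket count and function names are hypothetical. It uses md5 rather than Python's built-in `hash()`, which is salted per process and would not give stable rowkeys.

```python
import hashlib
from typing import List


def bucket_of(column: str, buckets: int) -> int:
    """Stable bucket for a column qualifier (md5, not Python's per-process salted hash())."""
    digest = hashlib.md5(column.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def physical_row(logical_row: str, column: str, buckets: int) -> str:
    """Row that actually stores `column` of `logical_row` (2-digit suffix, so buckets <= 100)."""
    return f"{logical_row}#{bucket_of(column, buckets):02d}"


def physical_rows(logical_row: str, buckets: int) -> List[str]:
    """All rows a reader must fetch (or scan as one prefix range) to rebuild the logical row."""
    return [f"{logical_row}#{b:02d}" for b in range(buckets)]
```

The trade-off is on the read side: reading the whole logical row now means fetching several physical rows, so this fits rows that are written far more often than they are read in full.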
Case 3: META region workload isolation
• Effect
  1. High RS CPU
  2. Excessive timeouts
  3. META lookup timeouts
• Cause
  – Inefficient exception handling in the HBase client
  – The hotspot region and META on the same RS
• Solution
  – A META-only RS
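A hypothetical planner for the META-only RS, assuming the current assignment is available as a map from RS name to region names. It moves every non-META region off the RS hosting META, least-loaded target first. In HBase 0.94 the catalog table is named `.META.` (later releases use `hbase:meta`). The resulting moves could be issued with the HBase shell's `move` command; the slides don't say how the balancer was kept from moving regions back.

```python
from typing import Callable, Dict, List, Tuple


def is_meta(region: str) -> bool:
    # HBase 0.94 names the catalog table ".META."; newer releases use "hbase:meta".
    return region.startswith(".META.") or region.startswith("hbase:meta")


def plan_meta_isolation(assignment: Dict[str, List[str]],
                        meta_check: Callable[[str], bool] = is_meta
                        ) -> List[Tuple[str, str, str]]:
    """Moves (region, from, to) that leave the META-hosting RS with META only."""
    meta_servers = [s for s, regions in assignment.items()
                    if any(meta_check(r) for r in regions)]
    if len(meta_servers) != 1:
        raise ValueError("expected exactly one RS hosting META")
    meta_rs = meta_servers[0]
    # Current region count of every other RS; targets are chosen least-loaded first.
    load = {s: len(regions) for s, regions in assignment.items() if s != meta_rs}
    if not load:
        raise ValueError("need at least one other RS")
    moves: List[Tuple[str, str, str]] = []
    for region in sorted(assignment[meta_rs]):
        if meta_check(region):
            continue
        target = min(load, key=lambda s: (load[s], s))  # ties broken by name
        load[target] += 1
        moves.append((region, meta_rs, target))
    return moves
```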
Case 4: Region mappings to RS
• Effect
  – Region mappings are not restored on RS restart
  – Some region mappings aren't restored properly even after a graceful restart
    • graceful_stop.sh --restart --reload
• Cause
  – HBase does not support this well
• Solution
  – Periodically dump the mappings and restore them
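A minimal sketch of "periodic dump and restore", assuming the region-to-server assignment can be read as a dict and that moves are issued separately (for example with the HBase shell's `move`). HBase server names have the form `host,port,startcode`, and the start code changes on every RS restart, so the sketch compares only `host,port`. File format and function names are hypothetical.

```python
import json
from typing import Dict, List, Set, Tuple


def server_key(server_name: str) -> str:
    """'host,port,startcode' -> 'host,port': the start code changes on every RS restart."""
    return ",".join(server_name.split(",")[:2])


def dump_mapping(assignment: Dict[str, str], path: str) -> None:
    """Save region -> server (without start code); meant to run periodically."""
    with open(path, "w") as f:
        json.dump({r: server_key(s) for r, s in assignment.items()}, f, sort_keys=True)


def load_mapping(path: str) -> Dict[str, str]:
    with open(path) as f:
        return json.load(f)


def plan_restore(saved: Dict[str, str], current: Dict[str, str],
                 live_servers: Set[str]) -> List[Tuple[str, str, str]]:
    """Moves (region, from, to) that put regions back where the last dump had them."""
    moves = []
    for region, want in sorted(saved.items()):
        have = current.get(region)
        if have is None:
            continue  # region split, merged or dropped since the dump
        have = server_key(have)
        if want == have or want not in live_servers:
            continue  # already in place, or the old server is not back yet
        moves.append((region, have, want))
    return moves
```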
Summary
• IDC online migration
  – Without downtime
  – LINE HBase Replicator & BulkMigrator
• NN failover
  – A simple solution, usable even by someone asking "What's Hadoop?"
• Stabilizing LINE message cluster
  – Improved RS response time
Conclusion
We won 100M users with HBase.
LINE Storage is a successful example of a messaging service built on HBase.