Storage infrastructure using HBase behind LINE messages
Slides at hcj13w (http://hcj2013w.eventbrite.com/)





Storage infrastructure using HBase behind LINE messages Presentation Transcript

  • 1. Storage infrastructure using HBase behind LINE messages. NHN Japan Corp., LINE Server Task Force. Shunsuke Nakamura (@sunsuk7tp). Hadoop Conference Japan 2013 Winter, 2013-01-21.
  • 2. To support LINE's users, we have built message storage that is large scale (tens of billions of rows/day), responsive (under 10 ms), and highly available (dual clusters).
  • 3. Outline: About LINE; LINE & storage requirements; What we achieved; Today's topics (IDC online migration, NN failover, Stabilizing the LINE message cluster); Conclusion.
  • 4. LINE: a global messenger powered by NHN Japan. Devices: 5 different mobile platforms plus desktop support.
  • 5. (Image-only slide.)
  • 6. (Image-only slide.)
  • 7. New Year 2013 in Japan. Chart: number of requests in an HBase cluster, plotted per 1 min. At New Year 2013, traffic reached 3x the usual peak hours ("あけおめ!", "新年好!": Happy New Year greetings). Despite the 3x traffic explosion, LINE Storage had no problems :)
  • 8. LINE on Hadoop (diagram labels): storages for service, backup and log; for HBase, M/R and log archive; bulk migration and ad-hoc analysis; for HBase and Sharded-Redis; collecting Apache and Tomcat logs; KPI and log analysis.
  • 9. (Same diagram as slide 8.)
  • 10. LINE service requirements. LINE is a messaging service (should be fast) and a global service (downtime not allowed), but not a simple messaging service: messages are synchronized between phones and PCs, so messages should be kept for a while.
  • 11. LINE's storage requirements (diagram): no data loss, eventual consistency, low latency, HA, flexible schema management, easy scale-out.
  • 12. Our selection is HBase: low latency for large amounts of data; linearly scalable; relatively low operating cost (replication by nature, automatic failover); a data model that fits our requirements (semi-structured, timestamps).
  • 13. Chart: stored rows per day in a cluster (billions/day).
  • 14. What we achieved with HBase: no data loss (persistent, with data replication); automatic recovery from server failure; reasonable performance for large data sets (hundreds of billions of rows; writes in ~1 ms; reads in 1 to 10 ms).
  • 15. Many issues we had: heterogeneous storage coordination; IDC online migration; flush & compaction storms caused by "too many HLogs"; row & column distribution; secondary indexes; region management (load and size balancing, RS allocation, the META region, M/R); monitoring for diagnostics; traffic bursts on decommission; NN problems; performance degradation (hotspot problems, timeout bursts, GC problems); client bugs (thread blocking on server failure, HBASE-6364).
  • 16. Today's topics: IDC online migration; NN failover; stabilizing the LINE message cluster.
  • 17. Section divider: IDC online migration / NN failover / Stabilizing the LINE message cluster.
  • 18. Why? Move whole HBase clusters and their data, for better network infrastructure, without downtime.
  • 19. IDC online migration, before migration (diagram): the App Server writes only to src-HBase; dst-HBase is not yet receiving writes.
  • 20. IDC online migration: write to both clusters (client-level replication). Diagram: the App Server writes to both src-HBase and dst-HBase.
  • 21. IDC online migration: new data is handled by incremental replication, old data by bulk migration, and dst's timestamps equal src's (see the dual-write sketch after the transcript). Diagram: the App Server writes to both src-HBase and dst-HBase.
  • 22. LINE HBase Replicator & BulkMigrator: the Replicator handles incremental replication; the BulkMigrator handles bulk migration.
  • 23. LINE HBase Replicator: our own implementation; prefers pull to push, for throughput throttling and workload isolation between the replicator and the RS; rowkey conversion and filtering. Diagram: stock HBase replication pushes from src-HBase to dst-HBase, whereas the LINE HBase Replicator pulls from src-HBase into dst-HBase.
  • 24. LINE HBase Replicator, a simple daemon that replicates local regions: (1) HLogTracker reads a checkpoint and selects the next HLog; (2) for each entry in the HLog, filter & convert the HLog.Entry, then create Puts and batch them to the dst HBase. Periodic checkpointing; entries are generally replicated within seconds (see the replicator loop sketch after the transcript).
  • 25. Bulk migration: (1) MapReduce between any storages (map tasks only; read the source, write the destination; task scheduling depends on region allocation); (2) a non-MapReduce version (BulkMigrator): our own implementation, HBase to HBase, scanning & batching region by region on each RS, with throughput throttling; slow, but easy to implement and debug (see the BulkMigrator sketch after the transcript).
  • 26. Section divider: IDC online migration / NN failover / Stabilizing the LINE message cluster.
  • 27. Background: our HBase has a SPOF, the NameNode. Our HA setup followed "Apache Hadoop HA Configuration" (http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/), with Pacemaker added on top, because Heartbeat can't detect whether the NN is actually running.
  • 28. Previous setup: HA-NN with DRBD + VIP + Pacemaker (diagram).
  • 29. NameNode failure in October 2012 (diagram).
  • 30. HA-NN failover failed: the failure was not in the NameNode process itself; incorrect leader election under network partitioning; complicated configuration (easy to get wrong, difficult to control; Pacemaker scripting was not straightforward; a VIP is risky for HDFS); DRBD split-brain problem (Protocol C; unable to re-sync while the service is online).
  • 31. Now: in-house NN failure handling. Goodbye, old HA-NN (we had to restart whole HBase clusters after an NN failover). Alternative ideas considered: quorum-based leader election (using ZK), an L4 switch, implementing our own AvatarNode. We chose the safer solution, accepting a little downtime.
  • 32. In-house NN failure handling (1): rsync with --link-dest periodically (diagram).
  • 33. In-house NN failure handling (2) (diagram; the "Bomb" label marks the NN failure).
  • 34. In-house NN failure handling (3) (diagram).
  • 35. Section divider: IDC online migration / NN failover / Stabilizing the LINE message cluster.
  • 36. Stabilizing the LINE message cluster (diagram): Case 1: "too many HLogs"; Case 2: hotspot performance problems; Case 3: META region workload isolation; Case 4: region mappings to RS; plus H/W failure handling and RS GC storms.
  • 37. Case 1: "too many HLogs". Effect: MemStore flush storms, compaction storms. Cause: regions growing at different rates; heterogeneous tables on one RS. Solution: region balancing and an external flush scheduler (see the flush scheduler sketch after the transcript).
  • 38. Case 1: number of HLogs over time (chart). Better case: periodic flushes during off-peak keep the HLog count low and no forced flush is needed at peak. Worse case: repeated forced flushes during peak hours end in a flush storm.
  • 39. Case 2: hotspot problems. Effect: excessive GC; RS performance degradation (high CPU usage). Cause: Gets/Scans on a row or column that is updated too frequently, or on a row with too many columns (plus tombstones). Solution: schema design and row/column distribution are important; hotspot region isolation (see the rowkey distribution sketch after the transcript).
  • 40. Case 3: META region workload isolation. Effect: (1) high RS CPU, (2) excessive timeouts, (3) META lookup timeouts. Cause: inefficient exception handling in the HBase client; a hotspot region and META on the same RS. Solution: a META-only RS.
  • 41. Case 4: region mappings to RS. Effect: region mappings are not restored on RS restart, and some mappings aren't restored properly even after a graceful restart (graceful_stop.sh --restart --reload). Cause: HBase does not support this well. Solution: periodically dump the mappings and restore them (see the region mapping dump sketch after the transcript).
  • 42. Summary. IDC online migration: done without downtime, using the LINE HBase Replicator & BulkMigrator. NN failover: a solution simple enough for someone who asks "What's Hadoop?". Stabilizing the LINE message cluster: improved RS response time.
  • 43. Conclusion: we won 100M users by adopting HBase; LINE Storage is a successful example of a messaging service using HBase.
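
The sketches below expand on a few of the slides. They are editorial illustrations, not LINE's code, and they use the newer HBase 1.x-style Java client rather than the 0.9x API the clusters ran at the time; table and column names such as "messages" and "d" are placeholders.

Slides 20-21: client-level dual writes with preserved timestamps. A minimal sketch of the idea that every Put carries an explicit timestamp, so the row stored in dst-HBase matches src-HBase exactly:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch of client-level dual writes: each Put carries an explicit
 *  timestamp so the row stored in dst-HBase matches src-HBase exactly. */
public class DualWriteSketch {
    private final Table srcTable;
    private final Table dstTable;

    DualWriteSketch(Connection src, Connection dst) throws Exception {
        // "messages" and the column family "d" are placeholder names.
        this.srcTable = src.getTable(TableName.valueOf("messages"));
        this.dstTable = dst.getTable(TableName.valueOf("messages"));
    }

    void write(byte[] rowKey, byte[] qualifier, byte[] value) throws Exception {
        long ts = System.currentTimeMillis();           // one timestamp for both clusters
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("d"), qualifier, ts, value);
        srcTable.put(put);                              // primary write
        dstTable.put(put);                              // same Put, same timestamp, to dst-HBase
    }
}
```

In the real system the write to the destination cluster would presumably be asynchronous and failure-tolerant so the primary path is not slowed down; the sketch only shows the timestamp-preservation point.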
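Slide 24: the replicator loop (read a checkpoint, walk the next HLog, filter and convert each entry, batch Puts to the destination). The HLogTracker, checkpoint format, and conversion/filter rules are internal to LINE, so this sketch stubs them out as hypothetical interfaces and only illustrates the filter/convert/batch shape of the loop:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

/** Sketch of the replicator loop: filter WAL edits, convert rowkeys, and
 *  batch Puts to the destination cluster. HLogTracker and the filter /
 *  conversion rules are hypothetical stand-ins for LINE's internals. */
public class ReplicatorSketch {

    /** Hypothetical tracker: reads a checkpoint, returns the cells of the
     *  next unreplicated HLog entry, or null when fully caught up. */
    interface HLogTracker {
        List<Cell> nextEntryCells() throws Exception;
        void checkpoint() throws Exception;             // periodic checkpointing
    }

    static void replicate(HLogTracker tracker, Connection dst) throws Exception {
        Table dstTable = dst.getTable(TableName.valueOf("messages")); // placeholder table
        List<Put> batch = new ArrayList<>();
        List<Cell> cells;
        while ((cells = tracker.nextEntryCells()) != null) {
            for (Cell cell : cells) {
                byte[] row = CellUtil.cloneRow(cell);
                if (!shouldReplicate(row)) continue;    // filtering
                Put put = new Put(convertRowKey(row));  // rowkey conversion
                put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                              cell.getTimestamp(), CellUtil.cloneValue(cell));
                batch.add(put);
            }
            if (batch.size() >= 1000) {                 // batch to the dst HBase
                dstTable.put(batch);
                batch.clear();
                tracker.checkpoint();
            }
        }
        if (!batch.isEmpty()) dstTable.put(batch);
    }

    // Hypothetical rules; the real filter and conversion logic is not public.
    static boolean shouldReplicate(byte[] row) { return true; }
    static byte[] convertRowKey(byte[] row)    { return row; }
}
```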
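Slide 25: the BulkMigrator runs per region on each RS, scanning the source range and writing throttled batches to the destination. A minimal sketch of that scan-and-batch pattern, with illustrative batch size and sleep-based throttling rather than the actual mechanism:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

/** Sketch of the BulkMigrator idea: scan one region's key range on the
 *  source table, copy cells with their original timestamps, and write
 *  them to the destination in throttled batches. */
public class BulkMigratorSketch {

    static void migrateRegion(Connection src, Connection dst,
                              byte[] startKey, byte[] stopKey) throws Exception {
        Table srcTable = src.getTable(TableName.valueOf("messages")); // placeholder
        Table dstTable = dst.getTable(TableName.valueOf("messages"));

        Scan scan = new Scan();
        scan.setStartRow(startKey);                     // one region's boundaries
        scan.setStopRow(stopKey);
        scan.setCaching(500);                           // scanner batching

        List<Put> batch = new ArrayList<>();
        try (ResultScanner scanner = srcTable.getScanner(scan)) {
            for (Result result : scanner) {
                Put put = new Put(result.getRow());
                for (Cell cell : result.rawCells()) {
                    put.add(cell);                      // keeps the original timestamp
                }
                batch.add(put);
                if (batch.size() >= 100) {
                    dstTable.put(batch);
                    batch.clear();
                    Thread.sleep(50);                   // crude throughput throttling
                }
            }
        }
        if (!batch.isEmpty()) dstTable.put(batch);
    }
}
```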
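Slide 37: an external flush scheduler as part of the fix for "too many HLogs". The actual scheduler is not described beyond its name; a minimal sketch, assuming it is enough to flush selected tables on a timer through the admin API so HLogs can be archived before the log count forces a flush storm:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

/** Sketch of an external flush scheduler: flush the MemStores of selected
 *  tables on a timer so HLogs can be archived before the "too many HLogs"
 *  threshold triggers a flush storm. Period and table list are examples. */
public class FlushSchedulerSketch {

    static void start(Connection connection, List<TableName> tables) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try (Admin admin = connection.getAdmin()) {
                for (TableName table : tables) {
                    admin.flush(table);                 // explicit MemStore flush
                }
            } catch (Exception e) {
                e.printStackTrace();                    // keep the scheduler alive on errors
            }
        }, 10, 30, TimeUnit.MINUTES);                   // every 30 minutes, illustrative
    }
}
```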
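Slide 39: schema and row/column distribution matter for avoiding hotspots. The talk does not prescribe a technique; one common option, shown purely as an assumption and not as LINE's actual schema, is a hash-derived prefix ("salt") so that keys written around the same time spread across regions:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.util.Bytes;

/** Hash-prefixed ("salted") rowkeys: a common way to spread write bursts
 *  across regions. Illustrative only; not LINE's actual schema. */
public class SaltedRowKey {
    private static final int BUCKETS = 16;              // number of prefixes / pre-split regions

    static byte[] rowKey(String userId, long messageTimestamp) {
        byte bucket = (byte) ((userId.hashCode() & 0x7fffffff) % BUCKETS);
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        // bucket | userId | timestamp: one user stays in one bucket,
        // different users spread over BUCKETS regions.
        return Bytes.add(new byte[] { bucket }, user, Bytes.toBytes(messageTimestamp));
    }
}
```

The trade-off is that scans across users must fan out over all buckets; a scheme like this only fits access patterns that read one user (one bucket) at a time.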
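Slide 41: periodically dump region-to-RS mappings so they can be restored after a restart. The restore side depends on internal tooling (e.g. issuing region moves from the dump), so this sketch shows only the dump half, using the public RegionLocator API; the output format is an assumption:

```java
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;

/** Sketch of dumping region-to-RegionServer mappings so they can be
 *  replayed (e.g. via region move commands) after a restart. */
public class RegionMappingDump {

    static void dump(Connection connection, TableName table) throws Exception {
        try (RegionLocator locator = connection.getRegionLocator(table)) {
            for (HRegionLocation location : locator.getAllRegionLocations()) {
                // one line per region: encoded region name -> hosting server
                System.out.println(location.getRegionInfo().getEncodedName()
                        + "\t" + location.getServerName());
            }
        }
    }
}
```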