Storage infrastructure using HBase behind LINE messages

Slides at hcj13w (http://hcj2013w.eventbrite.com/)

  1. Storage infrastructure using HBase behind LINE messages. NHN Japan Corp., LINE Server Task Force. Shunsuke Nakamura (@sunsuk7tp). Hadoop Conference Japan 2013 Winter, 2013-01-21.
  2. To support LINE's users, we have built a message store that is large scale (tens of billions of rows per day), responsive (under 10 ms), and highly available (dual clusters).
  3. Outline: About LINE; LINE & storage requirements; What we achieved; Today's topics (IDC online migration, NN failover, Stabilizing the LINE message cluster); Conclusion.
  4. LINE: a global messenger powered by NHN Japan. Devices: 5 different mobile platforms plus desktop support.
  5. (image-only slide)
  6. (image-only slide)
  7. New Year 2013 in Japan: number of requests in an HBase cluster, usual peak hours vs. New Year 2013, plotted per minute. "Happy New Year!" greetings (あけおめ! in Japanese, 新年好! in Chinese) drove a 3x traffic explosion; LINE Storage had no problems :)
  8. LINE on Hadoop: storages for service data, backups, and logs. Components for HBase, M/R, and log archive; bulk migration and ad-hoc analysis; HBase and sharded Redis; collecting Apache and Tomcat logs; KPI and log analysis.
  9. (same diagram as the previous slide)
  10. LINE service requirements. LINE is a messaging service (it should be fast) and a global service (downtime is not allowed), but it is not a simple messaging service: messages are synchronized between phones and PCs, so messages must be kept for a while.
  11. LINE's storage requirements: no data loss, eventual consistency, low latency, high availability, flexible schema management, and easy scale-out.
  12. Our selection is HBase: low latency for large amounts of data; linearly scalable; relatively low operating cost (replication by nature, automatic failover); a data model that fits our requirements (semi-structured, timestamps).
  13. Stored rows per day in a cluster, in billions per day (chart).
  14. What we achieved with HBase: no data loss (persistence plus data replication); automatic recovery from server failures; reasonable performance for large data sets (hundreds of billions of rows, writes in about 1 ms, reads in 1-10 ms).
  15. Many issues we had: heterogeneous storage coordination; IDC online migration; flush & compaction storms caused by "too many HLogs"; row & column distribution; secondary indexes; region management (load and size balancing, RS allocation, the META region, M/R); monitoring for diagnostics; traffic bursts on decommission; NN problems; performance degradation (hotspots, timeout bursts, GC); client bugs (thread blocking on server failure, HBASE-6364).
  16. Today's topics: IDC online migration; NN failover; Stabilizing the LINE message cluster.
  17. (section divider) IDC online migration / NN failover / Stabilizing the LINE message cluster.
  18. Why? To move whole HBase clusters and their data onto a better network infrastructure, without downtime.
  19. IDC online migration, before migration: the app server writes only to src-HBase; dst-HBase is not yet in the write path (diagram).
  20. IDC online migration: write to both clusters (client-level replication); the app server writes to both src-HBase and dst-HBase (diagram).
  21. IDC online migration: new data is copied by incremental replication, old data by bulk migration, and dst's timestamps equal src's. (A sketch of such timestamp-preserving dual writes follows below.)
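As a rough illustration of the client-level replication above, the fragment below is a minimal sketch of dual writes that use one explicit timestamp for both clusters, so the cell versions in src-HBase and dst-HBase stay identical. The table, column family, and qualifier names are hypothetical, and batching, retries, and error handling are omitted.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DualWriter {
    private final HTable srcTable;   // table in the source IDC's cluster
    private final HTable dstTable;   // same table in the destination IDC's cluster

    public DualWriter(Configuration srcConf, Configuration dstConf) throws Exception {
        this.srcTable = new HTable(srcConf, "message");   // hypothetical table name
        this.dstTable = new HTable(dstConf, "message");
    }

    public void write(byte[] row, byte[] value) throws Exception {
        long ts = System.currentTimeMillis();             // one timestamp shared by both writes
        Put put = new Put(row);
        // hypothetical family "m" and qualifier "body"
        put.add(Bytes.toBytes("m"), Bytes.toBytes("body"), ts, value);
        srcTable.put(put);                                // primary write to the source cluster
        dstTable.put(put);                                // identical cell, identical timestamp, to dst
    }
}
```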
  22. LINE HBase Replicator & BulkMigrator: the Replicator handles incremental replication, the BulkMigrator handles bulk migration.
  23. LINE HBase Replicator: our own implementation. We prefer pull to push, which gives throughput throttling and workload isolation between the replicator and the RS; it also does rowkey conversion and filtering. Built-in HBase replication pushes from src-HBase to dst-HBase; the LINE HBase Replicator pulls from src-HBase into dst-HBase.
  24. LINE HBase Replicator: a simple daemon that replicates local regions. (1) The HLogTracker reads a checkpoint and selects the next HLog. (2) For each entry in the HLog: filter and convert the HLog.Entry, then create Puts and batch them to the dst HBase. Checkpointing is periodic, and entries are generally replicated within seconds. (A sketch of this loop follows below.)
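The fragment below is a minimal sketch of that per-region-server loop. The HLogTracker, HLogFile, and WalEntry types are hypothetical stand-ins for the daemon's own WAL-reading components (the slides do not publish them); only the HBase client calls (Put, HTable.put) are real API, and the filter and rowkey-conversion rules are reduced to placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class ReplicatorLoop {

    // Hypothetical stand-ins for the daemon's own WAL-reading components.
    interface WalEntry {
        byte[] row(); byte[] family(); byte[] qualifier(); long timestamp(); byte[] value();
    }
    interface HLogFile { Iterable<WalEntry> entries(); }
    interface HLogTracker {
        HLogFile nextLogAfterCheckpoint() throws Exception;   // reads the ckpt, picks the next HLog
        void checkpoint(HLogFile log) throws Exception;       // periodic checkpointing
    }

    public void run(HLogTracker tracker, HTable dstTable) throws Exception {
        while (true) {
            HLogFile log = tracker.nextLogAfterCheckpoint();
            List<Put> batch = new ArrayList<Put>();
            for (WalEntry entry : log.entries()) {
                if (!shouldReplicate(entry)) continue;          // filter
                Put put = new Put(convertRowKey(entry.row()));  // rowkey conversion
                put.add(entry.family(), entry.qualifier(),
                        entry.timestamp(), entry.value());      // keep the source timestamp
                batch.add(put);
            }
            dstTable.put(batch);                                // batch to dst HBase
            tracker.checkpoint(log);
        }
    }

    // Placeholder filter and conversion; the real rules are deployment-specific.
    private boolean shouldReplicate(WalEntry entry) { return true; }
    private byte[] convertRowKey(byte[] row) { return row; }
}
```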
  25. Bulk migration. Option 1: MapReduce between any two storages (map tasks only; read the source, write the destination; task scheduling depends on region allocation). Option 2: a non-MapReduce version, the BulkMigrator (our own implementation; HBase to HBase; on each RS it scans and batches region by region with throughput throttling; slower, but easy to implement and debug). A sketch of the per-region copy follows below.
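As a rough sketch of the second approach, the fragment below copies one region's key range from a source table to a destination table, preserving source timestamps and throttling with a simple sleep. Batch size, scanner caching, and the sleep interval are illustrative assumptions, not the actual BulkMigrator parameters.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class RegionCopier {
    public void copyRange(HTable src, HTable dst,
                          byte[] startRow, byte[] stopRow) throws Exception {
        Scan scan = new Scan(startRow, stopRow);       // one region's key range
        scan.setCaching(1000);                         // fetch rows in larger chunks
        ResultScanner scanner = src.getScanner(scan);
        List<Put> batch = new ArrayList<Put>();
        for (Result result : scanner) {
            Put put = new Put(result.getRow());
            for (KeyValue kv : result.raw()) {         // copy every cell as-is
                put.add(kv.getFamily(), kv.getQualifier(),
                        kv.getTimestamp(), kv.getValue());   // keep the source timestamp
            }
            batch.add(put);
            if (batch.size() >= 500) {                 // batch to the destination
                dst.put(batch);
                batch.clear();
                Thread.sleep(50);                      // crude throughput throttling
            }
        }
        if (!batch.isEmpty()) dst.put(batch);
        scanner.close();
    }
}
```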
  26. (section divider) IDC online migration / NN failover / Stabilizing the LINE message cluster.
  27. Background: our HBase has a SPOF, the NameNode. We followed "Apache Hadoop HA Configuration" (http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/) and additionally added Pacemaker, because Heartbeat cannot detect whether the NN process is actually running.
  28. Previous setup: HA-NN with DRBD + VIP + Pacemaker.
  29. NameNode failure in October 2012.
  30. HA-NN failover failed: the HA stack monitored the host rather than the NameNode process; leader election was incorrect under network partitioning; the configuration was complicated (easy to get wrong, hard to control; Pacemaker scripting was not straightforward; a VIP is risky for HDFS); and DRBD had a split-brain problem (protocol C; unable to re-sync while the service is online).
  31. Now: in-house NN failure handling. We said good-bye to the old HA-NN, which forced us to restart whole HBase clusters after an NN failover. Alternative ideas: quorum-based leader election using ZK; using an L4 switch; implementing our own AvatarNode. We chose the safer solution, accepting a little downtime instead.
  32. In-house NN failure handling (1): rsync with --link-dest runs periodically (a sketch follows below).
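The slide only states that rsync with --link-dest is run periodically. The fragment below is a minimal sketch, under stated assumptions, of what such periodic hard-linked snapshots of the NameNode metadata directory could look like when driven from Java; all paths and the schedule are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NameDirSnapshotter {
    private static String previous = "/backup/nn/seed";        // hypothetical seed snapshot

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() { snapshot(); }
        }, 0, 10, TimeUnit.MINUTES);                            // hypothetical interval
    }

    static synchronized void snapshot() {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        String target = "/backup/nn/" + stamp + "/";            // hypothetical backup location
        try {
            // Unchanged files are hard-linked against the previous snapshot, so each
            // snapshot is cheap to take but still usable on its own for recovery.
            new ProcessBuilder("rsync", "-a", "--delete",
                    "--link-dest=" + previous,
                    "/data/dfs/name/", target)                  // hypothetical dfs.name.dir
                    .inheritIO().start().waitFor();
            previous = target;
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```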
  33. In-house NN failure handling (2): the NameNode fails (diagram).
  34. In-house NN failure handling (3): (diagram).
  35. (section divider) IDC online migration / NN failover / Stabilizing the LINE message cluster.
  36. Stabilizing the LINE message cluster. Case 1: "too many HLogs". Case 2: hotspot performance problems. Case 3: META region workload isolation. Case 4: region mappings to RS. (Related: H/W failure handling and RS GC storms.)
  37. Case 1: "too many HLogs". Effect: MemStore flush storms and compaction storms. Cause: regions grow at different rates, and heterogeneous tables share a RS. Solution: region balancing and an external flush scheduler (sketched below).
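The fragment below is a minimal sketch of what an external flush scheduler could look like: it walks the regions of a busy table and flushes them one at a time, paced, so MemStores are drained before the "too many HLogs" threshold forces them all to flush at once. The table name and the pacing are illustrative; the actual in-house scheduler is not published.

```java
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class ExternalFlushScheduler {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, "message");              // hypothetical table
        while (true) {
            NavigableMap<HRegionInfo, ServerName> regions = table.getRegionLocations();
            for (HRegionInfo region : regions.keySet()) {
                admin.flush(region.getRegionNameAsString());     // flush one region at a time
                Thread.sleep(5000);                              // pace flushes to avoid a storm
            }
            Thread.sleep(60 * 60 * 1000L);                       // idle until the next round
        }
    }
}
```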
  38. Case 1: number of HLogs over time (chart). In the better case, periodic flushes keep the HLog count down through peak and off-peak hours with no forced flushes; in the worse case, repeated forced flushes pile up and end in a flush storm.
  39. Case 2: hotspot problems. Effect: excessive GC and RS performance degradation (high CPU usage). Cause: Gets/Scans against a row or column that is updated too frequently, or against a row with too many columns (plus tombstones). Solution: schema design and row/column distribution matter, and hotspot regions are isolated. (A generic key-distribution sketch follows below.)
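As one common way to distribute rows, the fragment below prefixes the row key with a short hash of the natural key so that hot, lexicographically adjacent keys land in different regions. This is a generic technique, not necessarily the schema LINE actually uses; the key layout is hypothetical.

```java
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class DistributedRowKey {
    // Prepend a short hash of the natural key (e.g. a user id) so that hot keys
    // that would otherwise sort next to each other spread across regions.
    public static byte[] rowKey(String userId, long messageId) {
        String prefix = MD5Hash.getMD5AsHex(Bytes.toBytes(userId)).substring(0, 4);
        return Bytes.toBytes(prefix + "|" + userId + "|" + messageId);
    }
}
```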
  40. Case 3: META region workload isolation. Effect: high CPU on a RS, excessive timeouts, and META lookup timeouts. Cause: inefficient exception handling in the HBase client, combined with a hotspot region and META living on the same RS. Solution: a RS that serves only META. (A sketch of pinning META follows below.)
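One way to pin .META. onto a dedicated RegionServer is to move it there explicitly, as in the minimal sketch below. The destination server name is hypothetical, and in practice that RS would also need to be kept clear of user regions (for example by keeping it out of normal balancing), which is not shown.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaIsolator {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Destination RS given as "host,port,startcode" (hypothetical value).
        byte[] dest = Bytes.toBytes("meta-rs1.example.com,60020,1358700000000");
        // Move the .META. region onto the RS reserved for it.
        admin.move(Bytes.toBytes(HRegionInfo.FIRST_META_REGIONINFO.getEncodedName()), dest);
        admin.close();
    }
}
```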
  41. Case 4: region mappings to RS. Effect: the region-to-RS mapping is not restored when a RS restarts, and some mappings are not restored properly even after a graceful restart (graceful_stop.sh --restart --reload). Cause: HBase does not support this well. Solution: periodically dump the mapping and restore it. (A dump sketch follows below.)
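The fragment below is a minimal sketch of the dump side: it records each region's encoded name and the RegionServer currently hosting it, so the assignment can later be replayed with HBaseAdmin.move(). The table name, dump-file format, and path are illustrative; error handling is omitted.

```java
import java.io.PrintWriter;
import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

public class RegionMappingDumper {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "message");                 // hypothetical table
        NavigableMap<HRegionInfo, ServerName> locations = table.getRegionLocations();
        PrintWriter out = new PrintWriter("region-mapping.tsv");    // hypothetical dump file
        for (Map.Entry<HRegionInfo, ServerName> e : locations.entrySet()) {
            // encodedRegionName <TAB> host,port,startcode
            out.println(e.getKey().getEncodedName() + "\t" + e.getValue().getServerName());
        }
        out.close();
        table.close();
        // Restore side (not shown): read the file back after the restart and call
        // admin.move(Bytes.toBytes(encodedName), Bytes.toBytes(serverName)) per line.
    }
}
```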
  42. Summary. IDC online migration: done without downtime, using the LINE HBase Replicator and BulkMigrator. NN failover: a solution simple enough for an operator who asks "What's Hadoop?". Stabilizing the LINE message cluster: improved RS response times.
  43. Conclusion: we won 100 million users while adopting HBase. LINE Storage is a successful example of a messaging service using HBase.
