
Zero-downtime Hadoop/HBase Cross-datacenter Migration


These slides were shared at HadoopCon Taiwan 2015.

Published in: Technology


  1. 1. Zero-downtime Hadoop/HBase Cross-datacenter Migration SPN, Trend Micro Scott Miao & Dumbo team members Sep. 19, 2015
  2. 2. Who am I • Scott Miao • RD, SPN, Trend Micro • Working on the Hadoop ecosystem since 2011 • Expertise in HDFS/MR/HBase • Contributor to HBase/HDFS • Speaker at HBaseCon 2014 • @takeshi.miao Our blog ‘Dumbo in TW’: http://dumbointaiwan.blogspot.tw/ HBaseCon 2014 sharing: http://www.slideshare.net/HBaseCon/case-studies-session-6 https://vimeo.com/99679688 2
  3. 3. Agenda • What problem we suffered • IDC migration • Zero downtime migration • Wrap up 3
  4. 4. What problem we suffered ? 4
  5. 5. #1 Network bandwidth insufficient
  6. 6. Old IDC Layout (since 2008, devices and servers views): 1Gb links from the 41U racks to the TOR switches, 20Gb uplinks toward the core switch, with ~12Gb already used by Hadoop plus other services' network traffic; HD NN: 8 cores, 72GB mem, 4TB disk; HD DN: 12 cores, 128GB mem, 6TB disk; no physical space left. 6
  7. 7. #2 Data storage capacity insufficient
  8. 8. Est. Data Growth 8 • ~2x data growth
  9. 9. 9
  10. 10. http://www.305startup.net/creative-new-business-ideas-2015/ 10
  11. 11. What options • Enhance the old IDC – Replace the 1Gb network with a 10Gb topology – Adjust server locations – Any chance of more physical space? • Migrate to a new IDC – 10Gb network topology – Server locations well defined – More physical space 11
  12. 12. What options • Migrate to public cloud – Provision on-demand • Instance type (NIC/CPU/Mem/Disk) and amount – Pay as you go – Need to optimize our existing services 12
  13. 13. IDC Migration 14
  14. 14. Recap… Network bandwidth and data storage capacity are insufficient
  15. 15. New IDC Layout (devices and servers views): 10Gb links from the 41U racks to the TOR switches, with 40Gb and 160Gb uplinks toward the core switch; HD NN: 16 cores, 128GB mem, 10TB disk; HD DN: 24 cores, 196GB mem, 72TB disk; network traffic becomes far less of a constraint; 2~3X total data storage capacity in terms of our data growth; room to grow up to 14 racks.
  16. 16. Now what ? Don’t forget our beloved elephant~ https://gigaom.com/2013/10/25/cloudera-ceo-were-taking-the-high-profit-road-in-hadoop/ http://www.pragsis.com/blog/how_install_hadoop_3_commands
  17. 17. YARN abstracts the computing frameworks from Hadoop http://hortonworks.com/hadoop/yarn/ 18
  18. 18. So we are not only doing a migration but also an upgrade
  19. 19. TMH6 vs. TMH7 – Hadoop: 2.0.0 (MRv1) -> 2.6.0 (highlights: YARN + MRv2, YARN + ???) – HBase: 0.94.2 -> 0.98.5 (MTTR improvements, Stripe Compaction) – Zookeeper: 3.4.5 -> 3.4.6 – Pig: 0.10.0 -> 0.14.0 (Pig on Tez) – Sqoop1: 1.4.2 -> 1.4.5 – Oozie: 4.0.1 -> 4.0.1 – JVM: Java6 -> Java7 (G1GC support)
  20. 20. How do we test our TMH7? How do our services port to and test with TMH7? Apache Bigtop PMC Evans Ye comes to the rescue in the next session 21
  21. 21. http://www.desktopwallpapers4.me/computers/hardware-28528/
  22. 22. Migration + Upgrade • Span two IDCs -> upgrade -> phase out old one 23 Old IDC 20 Gb New IDC
  23. 23. Migration + Upgrade • Build new one -> migrate -> phase out old one 24 Old IDC 20 Gb New IDC
  24. 24. 1. Build new one 2. migrate 3. phase out old one
  25. 25. Are we done? We are not even in the game!
  26. 26. 27
  27. 27. Zero downtime migration 28
  28. 28. http://www.whatdegreewhichuniversity.com/Student-Housing/Moving-out-of-home-in-2013.aspx 29
  29. 29. Data Access Pattern Analysis Hadoop/HDFS/MR
  30. 30. IDC Hadoop cluster data flow (Internet, log collectors, message queues, file compactors, data sourcing services, application services): 1. New files put to HDFS every few minutes (data in); 2. Files processed with Pig/MR (hourly/daily) on HDFS (data proc); 3. Result files fetched from HDFS for further processing (data out); 4. User requests served.
  31. 31. Data access patterns for Hadoop/HDFS/MR • Data in – New files put every couple of minutes • Computation – Data processed hourly or daily • Data out – Result files fetched by services for further processing 32
  32. 32. Categorize Data • Hot data – Files ingested within minutes • New data files put into Hadoop continuously – Digested by Pig/MR for services hourly or daily • Needed history data files, usually within the last couple of months – Synced by • Replicating the data-streaming ingestion (message queues + file compactors) • distcp cron jobs running every few minutes (see the sketch below) • Cold data – All data except hot • Time span of a couple of years • For monthly/quarterly/yearly reports • Ad-hoc queries – Copied by • distcp: run it and leave it alone 33
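As a rough illustration of the distcp-based hot data sync above, here is a minimal sketch of a cron entry; the 10-minute schedule, NameNode addresses (tmh6-nn/tmh7-nn), paths, and log file are assumptions, not the actual Dumbo setup.

  # Hypothetical crontab entry: copy today's hot-data directory from TMH6 to TMH7 every 10 minutes
  # -update copies only new/changed files; -skipcrccheck avoids checksum-type mismatches across Hadoop versions
  */10 * * * * hadoop distcp -update -skipcrccheck \
      hdfs://tmh6-nn:8020/data/hot/$(date +\%Y\%m\%d) \
      hdfs://tmh7-nn:8020/data/hot/$(date +\%Y\%m\%d) \
      >> /var/log/hot-data-distcp.log 2>&1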
  33. 33. Kerberos federation among our clusters • Please wait for our next session – Multi-Cluster Live Synchronization with Kerberos Federated Hadoop by Mammi Chang, Dumbo team 34 Old IDC TMH6 stg TMH6 prod Old IDC TMH7 stg TMH7 prod
  34. 34. Zero downtime migration for Hadoop/HDFS/MR: the old IDC (Hadoop TMH6, Old Service 1, Old Service 2, log collectors, message queues, file compactors) and the new IDC (Hadoop TMH7, Old Service 1', New Service 1, log collectors, message queue, file compactors) are connected over a 20Gb link; hot data is synced continuously in both pipelines, cold data is copied once. 35
  35. 35. Need services’ cooperation • From the services’ point of view there is no downtime • Latency for hot data sync – May introduce a latency of a few minutes – Because the distcp cron job runs every couple of minutes • Need services to – Delay their jobs by a couple of minutes 36
  36. 36. Seems pretty good! So are we done? Don’t forget our HBase XD
  37. 37. Data Access Pattern Analysis HBase
  38. 38. IDC Hadoop cluster data flow with HBase (Internet, log collectors, message queues, file compactors, data sourcing services, application services): 1. New files put to HDFS every few minutes (data in); 2. Files processed with Pig/MR (hourly/daily) into HBase (data proc); 3. Random reads from HBase (data out); 4. User requests served; 5. Random writes to HBase.
  39. 39. Data access patterns for HBase • Data in – Random write to HBase – Process/write data hourly or daily • Data out – Random read from HBase 40
  40. 40. Considerations for HBase data sync • What do we want? – All HBase data synced between the old and new clusters – Useless regions consolidated (region merge) • Rowkey: ‘<key>-<timestamp>’ • hbase.hregion.max.filesize – raised from 1GB to 4GB (see the sketch below) 41
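Below is a minimal sketch of applying the 1GB -> 4GB region size change per table from the HBase shell; the table name is a placeholder, and cluster-wide the same value would go into hbase.hregion.max.filesize in hbase-site.xml.

  hbase shell
  # 4GB expressed in bytes; regions split only after an HFile grows past this size
  alter '<table-name>', MAX_FILESIZE => '4294967296'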
  41. 41. Considerations for HBase data sync • Incompatible changes between the old & new HBase versions – API binary incompatible – HDFS-level folder structure changed – HDFS-level metadata file format changed • HFileV2 itself is not affected 42
  42. 42. Tools for HBase data sync (compared on implementation, API compatibility, service impact, and data chunk boundary) – CopyTable: API client calls – Cluster Replication: API client calls – completebulkload: HFile based; writes must be paused and the table flushed; data chunk boundary is based on when writes are paused – Export/Import: SequenceFile + KeyValue + MR; set a start/end timestamp; data chunk boundary is based on the previous run. A hedged Export/Import sketch follows this slide. 43 http://hbase.apache.org/book.html#tools
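For the Export/Import approach above, a minimal sketch using the stock HBase MR tools; the table name, output path, and timestamps are placeholders.

  # Export one time range (latest version only) from TMH6 to HDFS
  # usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
  hbase org.apache.hadoop.hbase.mapreduce.Export \
      '<table-name>' /user/SPN-hbase/export/<table-name> 1 <start-timestamp> <end-timestamp>
  # after distcp-ing the export directory to TMH7, load it into the pre-split table there
  hbase org.apache.hadoop.hbase.mapreduce.Import \
      '<table-name>' /user/SPN-hbase/export/<table-name>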
  43. 43. Support tools for HBase sync • Pre-splits generator – Run on TMH6 – Deals with the region merge issue – Generates a pre-splits rowkey file – Create the new HTable on TMH7 with this file 44 gen-htable-presplits.sh /user/SPN-hbase/<table-name>/ <region-size-bytes> <threshold> > /tmp/<table-name>-splits.txt hbase shell create '<table-name>', '<column-family-1>', SPLITS_FILE => '/tmp/<table-name>-splits.txt'
  44. 44. Support tools for HBase sync • RowCount with time range – Supported on both TMH6 & TMH7 – Used for imported-data checks – Not officially supported; we enhanced the stock RowCounter to make our own 45 rowCounter.sh <table-name> --time-range=<start-timestamp>,<end-timestamp> # ... com.trendmicro.spn.hbase.mapreduce.RowCounter$RowCounterMapper$Counters ROWS=10892133 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0
  45. 45. Support tools for HBase sync • Snapshot – On TMH7 – Taken after each pass of the imported-data check – Roll back to the previous snapshot if the data check fails (see the sketch below) 46 hbase shell snapshot '<table-name>', '<table-name>-<start-timestamp>-<end-timestamp>'
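A minimal sketch of the rollback path, assuming the snapshot naming shown above; restore_snapshot requires the table to be disabled first.

  hbase shell
  disable '<table-name>'
  restore_snapshot '<table-name>-<start-timestamp>-<end-timestamp>'
  enable '<table-name>'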
  46. 46. Support tools for HBase sync • DateTime <-> Timestamp 47 # get current java timestamp (long) date +%s%N | cut -b1-13 # get current hour java timestamp (long) date --date="$(date +'%Y%m%d %H:00:00')" +%s%N | cut -b1-13 # get current hour -1 java timestamp (long) date --date="$(date --date='1 hour ago' +'%Y%m%d %H:00:00')" +%s%N | cut -b1-13 # timestamp to date (epoch seconds must be 10 digits, from left to right) date -d '@1436336202'
  47. 47. Zero downtime migration for HBase (old IDC staging: hbase-tmh6, Hadoop-tmh6, ServiceA, ServiceB; new IDC staging: hbase-tmh7, Hadoop-tmh7) 1. Confirm KV timestamp with ServiceB 2. Export data to HDFS with timestamp 3. Generate splits file 4. distcp data to TMH7 5. Create HTable with splits 6. Import data to HTable 7. Verify data by rowcount w/ timestamp 8. Create snapshot 9, 11. Sync data through steps 2~8 (skip 3, 5) 10. ServiceB staging test starts 12. Grant ‘RW’ on HTable to ServiceB 13. Install ServiceB in new IDC 14. Start ServiceB in new IDC 15. Done (steps 2~8 are sketched below) 48
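Steps 2~8 strung together as a hedged shell sketch; the <...> placeholders, NameNode addresses, and /tmp paths are illustrative, and the splits file is assumed to be copied to wherever the TMH7 HBase shell runs.

  # 2. Export the agreed time range on TMH6
  hbase org.apache.hadoop.hbase.mapreduce.Export '<table-name>' /tmp/export/<table-name> 1 <start-ts> <end-ts>
  # 3. Generate the pre-splits file from the TMH6 table
  gen-htable-presplits.sh /user/SPN-hbase/<table-name>/ <region-size-bytes> <threshold> > /tmp/<table-name>-splits.txt
  # 4. Copy the exported data to the TMH7 cluster
  hadoop distcp hdfs://tmh6-nn:8020/tmp/export/<table-name> hdfs://tmh7-nn:8020/tmp/export/<table-name>
  # 5. Create the pre-split HTable on TMH7
  hbase shell
  create '<table-name>', '<column-family-1>', SPLITS_FILE => '/tmp/<table-name>-splits.txt'
  # 6. Import the exported data into the new HTable
  hbase org.apache.hadoop.hbase.mapreduce.Import '<table-name>' /tmp/export/<table-name>
  # 7. Verify the imported rows for the same time range
  rowCounter.sh <table-name> --time-range=<start-ts>,<end-ts>
  # 8. Snapshot the verified state
  hbase shell
  snapshot '<table-name>', '<table-name>-<start-ts>-<end-ts>'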
  48. 48. Need services’ cooperation • There will still be a small data gap – It may be a few minutes • Is it sensitive to services? – If it is not • Wait for our final data sync – If it is • Services need to direct their writes to both clusters 49 Data sync to HTable -> service starts up and runs -> final data sync to HTable (data gap in between)
  49. 49. Wrap up 50
  50. 50. Wrap up • Analyze access patterns – Batch ? Real time ? Streaming ? – Cold data ? Hot data ? • Keep it simple! – Use native utils as far as you can • Rehearsal ! Rehearsal ! Rehearsal ! • Communicate with your users closely 51
  51. 51. 52 One day… “How did your migration go?” “I finished my migration!” vs. “I migrated, and now I’m finished…” (a pun in the original Chinese). Listen to this talk and you’ll be blessed!
  52. 52. Backups
  53. 53. What items need to be taken care of • CPU – Use more cores • One MR task process uses 1 CPU core • Single-core clock rates do not increase much – Do the math to compare CPU cores between old and new: (cores-per-old-machine * amount-of-machines * increase-percent) / cores-per-new-machine = amount-of-new-machines e.g. from 8-core machines to a 24-core machine spec, with 1.5X higher capacity: (8 * 10 * 150%) / 24 = 120 / 24 =~ 5 P.S. Consider enabling hyper-threading1: the core count doubles, but 1/3 of the doubled cores should be kept for the OS. 1. Hortonworks, Corp., Apache Hadoop Cluster Configuration Guide, 2013 Apr., p. 15.
  54. 54. What items need to be taken care of • Memory – Total memory much higher than in our old cluster – Consider next-gen computing frameworks 57 ((per-slot-gigabytes * total-slots + hbase-heap-gigabytes) * 120%-for-OS-mem * increase-percent) / mem-per-new-machine = amount-of-new-machines e.g. 8 slots with 2GB each per old machine: (((2GB * 80 + 8GB) * 120%) * 300%) / 192GB = (168GB * 120%) * 300% / 192GB =~ 4 (round up)
  55. 55. What items need to be taken care of • Disk – 2~3X storage capacity to hold our BIG data – Hot-swapping support – One disk/partition versus 2~3 processes (MR tasks): total-cores / (disks-per-new-machine * amount-of-new-machines) = amount-of-processes-per-disk e.g. with 120 total cores: 120 / (12 * 5) =~ 2 (the three sizing formulas are worked out below) • Network – Network topology changed (as shown earlier) – 10Gb NICs for Hadoop nodes 58
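The three sizing formulas above, plugged into quick command-line arithmetic with the example numbers from these slides (10 old machines, 80 slots, an 8GB HBase heap, 120 total cores):

  # CPU: (cores-per-old-machine * machines * growth) / cores-per-new-machine
  echo 'scale=1; (8 * 10 * 1.5) / 24' | bc              # =~ 5 new machines
  # Memory: ((GB-per-slot * slots + hbase-heap-GB) * 1.2 OS overhead * 3x growth) / GB-per-new-machine
  echo 'scale=1; ((2 * 80 + 8) * 1.2 * 3) / 192' | bc   # =~ 3.1, round up to 4 new machines
  # Disk: total-cores / (disks-per-new-machine * new-machines) = MR task processes per disk
  echo 'scale=1; 120 / (12 * 5)' | bc                   # =~ 2 processes per disk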
  56. 56. What items need to be taken care of • Rack – Power consumption & cooling – One rack can support 15 of our Hadoop nodes, instead of 20 – Ask your HW vendor for a PoC!! • Transactional workload (heavy IO load) • Computation workload (100% CPU load) • Memory-intensive workload (full memory usage) • New Hadoop TMH7 – Build the new one first -> migrate -> phase out the old one
  57. 57. Need services’ cooperation • Services need to port their code to TMH7 • We released a Dev Env. (all-in-one Hadoop) for services to test in advance (a hypothetical usage sketch follows) – VMWare image (OVF) – Vagrant box – Docker image • A Jira project for users to submit issues, if any 60
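A hypothetical sketch of consuming such an all-in-one environment; the box and image names are placeholders, not the artifacts we actually released.

  # Vagrant box (name and URL are placeholders)
  vagrant box add tmh7-all-in-one <box-url>
  vagrant init tmh7-all-in-one && vagrant up && vagrant ssh
  # or the Docker image (name is a placeholder)
  docker run -d --name tmh7-dev <tmh7-all-in-one-image>
  docker exec -it tmh7-dev hadoop version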
