
Upgrading from HDP2.1 to HDP2.5

upgrade hadoop



  1. Upgrading from HDP2.1 to HDP2.5 (including podcast CM)
     2017/03/03 @wyukawa
     HadoopSCR #hadoopreading
     https://www.eventbrite.com/e/hadoop-22-tickets-31987821435
  2. About me
     • Data Engineer at LINE for more than 4 years
     • This is the third Hadoop upgrade operation at LINE
       – https://www.slideshare.net/wyukawa/upgrading-fromhdp21tohdp24-59994044 describes the second one
  3. 2014/6-2017/1
     • Machines
       – 40 servers
       – CPU: 24 processors
       – Memory: 64GB
       – HDD: 3.6TB x 12
       – Network: 1Gbps
       – Hardware maintenance deadline was 2017/6
     • HDP2.1 (Ambari 1.6.0)
       – Hadoop 2.4.0
         • NameNode HA
         • Ambari 1.6 didn't support ResourceManager HA
       – Hive 0.13.0
         • MapReduce (not Tez)
  4. Architecture as of 2017/1 (diagram): Hadoop and Hive on HDP2.1, Azkaban 3.1.0, Presto 0.163, Prestogres, Cognos, Netezza, InfiniDB, Pentaho, Saiku, and ETL with Python 2.7.11
  5. 2017/1-
     • Machines
       – 40 servers
       – CPU: 40 processors
       – Memory: 256GB
       – HDD: 6.1TB x 12
       – Network: 10Gbps
     • HDP2.5.3 (Ambari 2.4.2)
       – Hadoop 2.7.3
         • NameNode HA
         • ResourceManager HA
       – Hive 1.2.1
         • MapReduce
         • Tez
  6. How to upgrade
     • Set up a new Hadoop cluster on the new machines
     • Blue-green deployment, switched over all at once
       – http://aws.typepad.com/sajp/2015/12/what-is-blue-green-deployment.html
     • Migrate data with DistCp (-m 20 -bandwidth 125)
       – Copied 500TB (the first copy took about 3 days)
     • No parallel execution on both Hadoop clusters
     • See http://d.hatena.ne.jp/wyukawa/20170131/1485854288 for details
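The DistCp step above could look roughly like the following sketch; the cluster hostnames and paths are hypothetical, while the -m and -bandwidth values are the ones from the slide.

```shell
# Sketch of the DistCp invocation described above (hosts/paths hypothetical).
# -m 20 caps the job at 20 concurrent map tasks; -bandwidth 125 limits each
# map to ~125 MB/s, so the total copy stays within the network budget.
hadoop distcp \
  -m 20 \
  -bandwidth 125 \
  -update \
  hdfs://old-cluster:8020/user/hive/warehouse \
  hdfs://new-cluster:8020/user/hive/warehouse
```

With -update, rerunning the same command after the ~3-day first copy only transfers files that changed in the meantime, which is what makes the all-at-once switchover practical.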
  7. DistCp with HDFS Snapshot
     • http://qiita.com/bwtakacy/items/fa63cdcdfc05e4043c69 is a good article
     • The -update -diff options don't support webhdfs://orig/...
       – Edit hdfs-site.xml on the destination cluster and use hdfs://orig/... instead
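A minimal sketch of the snapshot-diff workflow, assuming a source path /data on the old cluster (names s1/s2 and the hostname are hypothetical). Note that -diff requires both clusters to be plain hdfs:// DistributedFileSystems, which is why webhdfs:// does not work here.

```shell
# Run from the destination cluster; the createSnapshot commands marked
# "on the source" would be executed on the old cluster.

# 0) Enable snapshots on both /data directories (once, as an admin)
hdfs dfsadmin -allowSnapshot /data

# 1) Initial full copy, then matching snapshots named s1 on both sides
hdfs dfs -createSnapshot /data s1          # on the source, before the copy
hadoop distcp hdfs://orig:8020/data/.snapshot/s1 /data
hdfs dfs -createSnapshot /data s1          # on the destination, after the copy

# 2) Later, snapshot the source again to capture new writes
hdfs dfs -createSnapshot /data s2          # on the source

# 3) Copy only the s1 -> s2 delta instead of rescanning everything
hadoop distcp -update -diff s1 s2 hdfs://orig:8020/data /data
```

The -diff option requires that the destination still matches its own s1 snapshot (no local modifications), so writes on the new cluster should be held off until the final sync.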
  8. Migrate Hive schema
     • Use the "show create table" command
     • Use the "msck repair table" command to add partitions
       – But it didn't work for tables with too many partitions (for example, 4,000)
     • Use webhdfs://... in external table locations
       – hdfs://... can't be used
       – but Presto returns empty results when you select from such tables
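The first two bullets can be sketched as the following CLI sequence; the database and table names are hypothetical.

```shell
# Sketch of the Hive schema migration (mydb.mytable is hypothetical).

# 1) Dump the DDL from the old cluster's Hive
hive -e 'SHOW CREATE TABLE mydb.mytable;' > mytable.ddl

# 2) Recreate the table on the new cluster
hive -f mytable.ddl

# 3) Register the copied partition directories in the new metastore
hive -e 'MSCK REPAIR TABLE mydb.mytable;'
```

As the slide notes, MSCK REPAIR failed on tables with thousands of partitions; one common workaround is adding partitions in batches with ALTER TABLE ... ADD PARTITION instead.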
  9. HDFS/YARN/Hive/Sqoop settings
     • dfs.datanode.failed.volumes.tolerated=1
     • fs.trash.interval=4320
     • NameNode heap: 64GB
     • yarn.nodemanager.resource.memory-mb: 100GB
     • yarn.scheduler.maximum-allocation-mb: 100GB
     • Use DominantResourceCalculator
       – https://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/
     • hive.server2.authentication=NOSASL
     • hive.server2.enable.doAs=false
     • hive.auto.convert.join=false
     • hive.support.sql11.reserved.keywords=false
     • org.apache.sqoop.splitter.allow_text_splitter=true
     • Sometimes use Tez
       – https://community.hortonworks.com/questions/24953/solution-for-hive-runtime-error-while-processing-r.html
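The YARN memory settings and the DominantResourceCalculator bullet translate into config fragments along these lines (a sketch; 100GB is written as 102400 MB, and in practice these would be set through Ambari rather than by hand):

```xml
<!-- yarn-site.xml: memory available to containers per NodeManager,
     and the largest single container a job may request -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>102400</value>
</property>

<!-- capacity-scheduler.xml: schedule on CPU as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

With the default calculator YARN only considers memory; DominantResourceCalculator also accounts for vcores, which matters once CPU-heavy Tez and MapReduce jobs share the cluster.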
  10. My feeling
     • If you upgrade Hadoop with many batch jobs (for example, more than 100 Azkaban flows), many errors will occur the next day
       – Highly recommended to upgrade in the first half of the week; we upgraded on a Tuesday
       – Share the work of fixing batch errors across the team
         • If you handle such jobs alone, you will be overwhelmed
  11. Podcast
     • https://itunes.apple.com/jp/podcast/wyukawas-podcast/id1152456701
     • http://wyukawa.tumblr.com/
