Migrating your clusters and workloads from Hadoop 2 to Hadoop 3

The Hadoop community announced Hadoop 3.0 GA in December 2017 and 3.1 in April 2018, loaded with features and improvements. One of the biggest challenges for any new major release of a software platform is compatibility, and the Apache Hadoop community has focused on ensuring wire and binary compatibility for Hadoop 2 clients and workloads.

Admins face many challenges when upgrading to a major release of Hadoop, and users running workloads on Hadoop 2 should be able to run or migrate those workloads to Hadoop 3 seamlessly. This session dives deep into the upgrade process and previews migration strategies in detail, covering what works and what might not. The talk presents the motivation for upgrading to Hadoop 3, a cluster upgrade guide for admins, and a workload migration guide for users.

Speakers
Suma Shivaprasad, Hortonworks, Staff Engineer
Rohith Sharma, Hortonworks, Senior Software Engineer



  1. © Hortonworks Inc. 2011–2018. All rights reserved. Migrating your clusters and workloads from Hadoop 2 to Hadoop 3. Suma Shivaprasad, Staff Engineer; Rohith Sharma K S, Senior Software Engineer
  2. Speaker info. Suma Shivaprasad: Apache Hadoop Contributor, Apache Atlas PMC, Staff Engineer @ Hortonworks. Rohith Sharma K S: Apache Hadoop PMC, Sr. Software Engineer @ Hortonworks
  3. Agenda: Why upgrade to Apache Hadoop 3.x? • Things to consider before upgrade • Upgrade process • Workload migration • Other aspects • Summary
  4. Why upgrade to Apache Hadoop 3.x?
  5. Motivation: a major release with lots of features and improvements! HDFS: Federation GA • Erasure Coding, with significant cost savings in storage (reduction of storage overhead from 200% to 50%) • Intra-DataNode Disk Balancer. YARN: scheduler improvements • new resource types (GPUs, FPGAs) • fast and global scheduling • containerization (Docker) • long-running services rehash • new UI2 • Timeline Service v2
  6. [Architecture diagram] Hadoop 3 platform: container runtimes (Docker / Linux / default) and platform services (storage, service discovery, resource management) hosting workloads such as HBase, MR, Tez, Hive/Pig, Hive on LLAP, Spark, web apps, and deep learning apps, on-premises and in the cloud
  7. Things to consider before upgrade
  8. Upgrades involve many things: upgrade mechanism (recommendation for 3.x: Express or Rolling?) • compatibility • source & target versions • tooling • cluster environment • configuration changes • script changes • classpath changes
  9. Upgrade mechanism: Express vs. Rolling. Express upgrades: "stop the world", with cluster downtime but less stringent prerequisites; the process upgrades masters and workers in one shot. Rolling upgrades: preserve cluster operation and minimize service impact and downtime, but can take longer to complete; the process upgrades masters and workers in batches
  10. Recommendation for 3.x: Express or Rolling? This is a major version upgrade, with challenges and issues in supporting rolling upgrades. Why rolling upgrades can't be done: HDFS-13596 (change in edit log format) • HADOOP-15502 (incompatible MetricsPlugin API change) • HDFS-6440 (incompatible changes in the image transfer protocol). Recommended: "Express Upgrade" from Hadoop 2 to 3
  11. Compatibility. Wire compatibility: preserves compatibility with Hadoop 2 clients; Distcp/WebHDFS compatibility preserved. API compatibility: not fully! Dependency version bumps • removal of deprecated APIs and tools • shell script rewrite, rework of Hadoop tools scripts • incompatible bug fixes!
  12. Source & target versions. Upgrades were tested from the Hadoop 2 base version, Apache Hadoop 2.8.4, to the Hadoop 3 base version, Apache Hadoop 3.1.x. Why the 2.8.4 release? Most production deployments are close to 2.8.x. What should users of 2.6.x and 2.7.x do? Upgrade at least to Hadoop 2.8.4 before migrating to Hadoop 3!
  13. Tooling. Fresh install: fully automated via Apache Ambari, or manual installation of RPMs/tarballs. Upgrade: fully automated via Apache Ambari 2.7, or manual upgrade
  14. Cluster environment. Java: >= Java 8 (Java 7 reached EOL in April 2015, and many libraries support only Java 8). Shell: >= Bash v3; POSIX shell is NOT supported. Docker, if you want containerized apps in 3.x: >= 1.12.5, plus a corresponding stable OS
  15. Configuration changes: Hadoop env files. hadoop-env.sh is the common placeholder. hdfs-env.sh: HDFS_* replaces HADOOP_*. yarn-env.sh: YARN_* replaces HADOOP_*. Precedence rule: yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults
  16. Configuration changes: Hadoop env files, contd. Daemon heap size (HADOOP-10950): HADOOP_HEAPSIZE is deprecated, replaced with HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN • heap sizes support units, with MB as the default (e.g. HADOOP_HEAPSIZE_MAX=4096 or HADOOP_HEAPSIZE_MAX=4g) • auto-tuning based on the memory size of the host
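The new unit handling can be sketched in bash. This is an illustrative re-implementation, not Hadoop's actual launcher code; the `to_jvm_opt` helper is hypothetical:

```shell
# Hypothetical helper mirroring how HADOOP_HEAPSIZE_MAX/MIN values map to
# JVM flags: a trailing k/m/g unit suffix is passed through as-is,
# while a bare number is treated as MB (the default unit).
to_jvm_opt() {
  flag="$1"; size="$2"
  case "$size" in
    *[gGmMkK]) echo "-X${flag}${size}" ;;   # explicit unit suffix
    *)         echo "-X${flag}${size}m" ;;  # default unit: MB
  esac
}

to_jvm_opt mx 4096   # -Xmx4096m  (HADOOP_HEAPSIZE_MAX=4096)
to_jvm_opt mx 4g     # -Xmx4g     (HADOOP_HEAPSIZE_MAX=4g)
to_jvm_opt ms 512    # -Xms512m   (HADOOP_HEAPSIZE_MIN=512)
```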
  17. Configuration changes: YARN modified defaults (RM max completed applications in state store/memory):
      yarn.resourcemanager.max-completed-applications: previous 10000, current 1000
      yarn.resourcemanager.state-store.max-completed-applications: previous 10000, current 1000
  18. Configuration changes: HDFS. Change in default daemon ports (HDFS-9427):
      NameNode:           50470, 50070 → 9871, 9870
      DataNode:           50020, 50010, 50475, 50075 → 9867, 9866, 9865, 9864
      Secondary NameNode: 50091, 50090 → 9869, 9868
      KMS:                16000 → 9600
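Any client that hard-codes the old ports needs updating. For example, a WebHDFS call against the NameNode web UI now targets 9870 instead of 50070 (the hostname here is a placeholder, and the commands assume a running cluster):

```shell
# Hadoop 2: NameNode HTTP was on 50070
# curl "http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS"

# Hadoop 3: NameNode HTTP is on 9870
curl "http://namenode.example.com:9870/webhdfs/v1/tmp?op=LISTSTATUS"
```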
  19. Script changes: starting/stopping Hadoop daemons. Daemon scripts: *-daemon.sh is deprecated; use the bin/hdfs or bin/yarn commands with the --daemon option (e.g. bin/hdfs --daemon start/stop/status namenode, or bin/yarn --daemon start/stop/status resourcemanager). Debuggability: the scripts support --debug, showing the construction of the env, Java options, and classpath. Logs/pids: created as hadoop-yarn* instead of yarn-yarn*; log4j settings in the *-daemon.sh scripts have been removed, so set them via *_OPTS in *-env.sh instead (e.g. YARN_RESOURCEMANAGER_OPTS in yarn-env.sh)
  20. Classpath changes: classpath isolation now! Users should rebuild their applications with the shaded hadoop-client jars. Hadoop dependencies used to leak into the application's classpath (Guava, protobuf, jackson, jetty, ...). Shaded jars are now available and isolate downstream clients from third-party dependencies (HADOOP-11804): hadoop-client-api for compile-time dependencies, hadoop-client-runtime for runtime third-party dependencies, and hadoop-minicluster for test-scope dependencies. HDFS-6200: the hadoop-hdfs jar contained both the HDFS server and the HDFS client; clients should depend on hadoop-hdfs-client instead to isolate themselves from server-side dependencies. There are no YARN/MR shaded jars
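For a Maven build, switching to the shaded clients might look like the following fragment (a sketch; the 3.1.1 version number is illustrative):

```xml
<!-- Compile against the shaded client API only -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.1.1</version>
</dependency>
<!-- Shaded third-party dependencies, needed at runtime only -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.1.1</version>
  <scope>runtime</scope>
</dependency>
<!-- Mini-cluster for test-scope use -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>3.1.1</version>
  <scope>test</scope>
</dependency>
```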
  21. Upgrade process
  22. Hadoop pre-upgrade steps.
      HDFS: run fsck and fix any errors (hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log) • checkpoint metadata (hdfs dfsadmin -safemode enter; hdfs dfsadmin -saveNamespace) • back up the checkpoint files under ${dfs.namenode.name.dir}/current • get cluster DataNode reports (hdfs dfsadmin -report > dfs-old-report-1.log) • capture the namespace (hdfs dfs -ls -R / > dfs-old-lsr-1.log) • finalize any previous upgrade (hdfs dfsadmin -finalizeUpgrade).
      YARN: stop all YARN queues • stop, or wait for, running applications to complete.
      Stack: back up configuration files • stop users/services using YARN/HDFS • back up other metadata (Hive Metastore, Oozie, etc.)
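The HDFS steps above could be scripted roughly as follows, run as the HDFS superuser against the active NameNode. This is a sketch, not a tested runbook: the NameNode metadata directory and the backup destination are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 1. Check filesystem health and keep the report for later comparison
hdfs fsck / -files -blocks -locations > dfs-old-fsck.1.log

# 2. Checkpoint metadata
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace

# 3. Back up the checkpoint files (both paths are placeholders:
#    use your ${dfs.namenode.name.dir} and a safe backup location)
cp -r /hadoop/hdfs/namenode/current /backup/nn-current-pre-upgrade

# 4. Record DataNode reports and the full namespace
hdfs dfsadmin -report > dfs-old-report-1.log
hdfs dfs -ls -R / > dfs-old-lsr-1.log

# 5. Finalize any previous upgrade before starting this one
hdfs dfsadmin -finalizeUpgrade
```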
  23. Upgrade steps: stop services → install new packages → link to new versions → configuration updates → additional HDFS upgrade steps → start services. See https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_command-line-upgrade/content/start-hadoop-core-25.html
  24. Upgrade validation.
      HDFS: run HDFS service checks • verify the NameNode gets out of safe mode (hdfs dfsadmin -safemode wait) • check filesystem health • compare with the previous state (node list, full namespace) • let the cluster run production workloads for a while • when ready to discard the backup, finalize the HDFS upgrade (hdfs dfsadmin -upgrade finalize/query).
      YARN: run YARN service checks • submit test applications (MR, Tez, ...)
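A hedged sketch of the validation flow, assuming the pre-upgrade capture files from the earlier steps are still on hand (the examples jar path varies by installation):

```shell
# Wait for the NameNode to leave safe mode
hdfs dfsadmin -safemode wait

# Compare post-upgrade state against the pre-upgrade captures
hdfs dfsadmin -report > dfs-new-report-1.log
hdfs dfs -ls -R / > dfs-new-lsr-1.log
diff dfs-old-lsr-1.log dfs-new-lsr-1.log || echo "namespace differs: investigate"

# Submit a trivial MR job to validate YARN (jar path varies by install)
yarn jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

# Only after the cluster has run production workloads for a while:
hdfs dfsadmin -upgrade query      # check upgrade status
# hdfs dfsadmin -upgrade finalize # point of no return: discards the backup
```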
  25. Enable new features: Erasure Coding (https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html) • YARN UI2 (https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html) • ATSv2, with the new Timeline Reader daemon (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html) • YARN DNS for service discovery of YARN services (http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn-service/RegistryDNS.html) • HDFS Federation (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html)
  26. Migrating workloads
  27. MapReduce (1/2). Compatibility: full binary compatibility of MapReduce APIs • deprecated hadoop-streaming IO formats removed (HADOOP-10485): XMLRecordInput/Output, CSVRecordInput. Configuration: the yarn.app.mapreduce.client.job.max-retries default changed from 0 to 3, protecting clients from transient failures
  28. MapReduce (2/2): task heap management (MAPREDUCE-5785). Heap size no longer needs to be specified in both the task configuration and the Java options:
      mapreduce.map.memory.mb | mapreduce.map.java.opts | Xmx behaviour
      configured: 2048 MB     | configured: 1638 MB     | no change: 1638 MB
      configured: 2048 MB     | not configured          | derived from mapreduce.map.memory.mb: 1638 MB
      not configured          | configured: 1638 MB     | memory.mb inferred automatically from the Xmx in mapreduce.map.java.opts: 1638 MB
      not configured          | not configured          | default: 1024 MB
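The inference rules in the table can be sketched as a small bash function. This is an illustrative model, not Hadoop's code; `infer_heap` is hypothetical, and the 0.8 container-to-heap ratio is an assumption mirroring the common mapreduce.job.heap.memory-mb.ratio default:

```shell
# Hypothetical model of MAPREDUCE-5785: given mapreduce.map.memory.mb and the
# Xmx from mapreduce.map.java.opts (empty string = not configured), report the
# effective Xmx. Assumes a 0.8 container-to-heap ratio when deriving.
infer_heap() {
  memory_mb="$1"; xmx_mb="$2"
  if [ -n "$xmx_mb" ]; then
    echo "Xmx=${xmx_mb}MB"                      # explicit Xmx always wins
  elif [ -n "$memory_mb" ]; then
    echo "Xmx=$(( memory_mb * 8 / 10 ))MB"      # derived from container size
  else
    echo "Xmx=1024MB"                           # neither configured: default
  fi
}

infer_heap 2048 1638   # Xmx=1638MB (no change)
infer_heap 2048 ""     # Xmx=1638MB (derived from memory.mb)
infer_heap "" ""       # Xmx=1024MB (default)
```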
  29. Hive on Tez. Hive 3.0.0 is the Hive version that supports Hadoop 3 (HIVE-16531); it does NOT support rolling upgrades, since the ACID table format changes are not compatible with 2.x. Tez support for Hadoop 3 is planned for release 0.10.0: TEZ-3923 (move master to Hadoop 3+ and create a separate 0.9.x line) and TEZ-3252 ([Umbrella] enable support for Hadoop 3.x)
  30. Spark: ongoing efforts in the community to build and validate Spark with Hadoop 3. SPARK-23534 is the umbrella JIRA to build/test with Hadoop 3
  31. Apache HBase: HBase 2.0 supports Hadoop 3, but does NOT support rolling upgrades across major versions (1.x to 2.x). Refer to the upgrade documentation for details: https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/upgrading.adoc#upgrade2.0
  32. Apache Slider applications. Apache Slider is retiring from the Apache Incubator and is superseded by YARN Services; port your Slider apps to YARN Services. Benefits: easier to manage and deploy, with a single "yarnfile" to configure a YARN service • container placement scheduling such as affinity and anti-affinity (YARN-6592) • rolling upgrades for containers and services (YARN-7512, YARN-4726) • a Services UI in YARN UI2, improving debuggability and log access
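A minimal "yarnfile" for a YARN service might look like this (a hedged sketch: the exact fields are defined by the YARN Services API spec, and the service name and launch command here are placeholders):

```json
{
  "name": "sleeper-service",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```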
  33. Hive on LLAP: now runs as a YARN Service application instead of a Slider app. The version that supports LLAP as a YARN service is not released yet; it is planned for Hive 4.0.0/3.1.0. Refer to https://hortonworks.com/blog/apache-hive-llap-as-a-yarn-service
  34. Pig/Oozie: support for Hadoop 3 is in progress in the community. Pig: planned for release 0.18.0 (PIG-5253, Pig Hadoop 3 support). Oozie: planned for release 5.1.0 (OOZIE-2973, make sure Oozie works with Hadoop 3)
  35. Other aspects
  36. Other aspects: validations in progress include performance testing, scale testing for HDFS/YARN, and OS compatibility
  37. Summary. Hadoop 3 is an eagerly awaited release with lots of new features and optimizations! 3.1.1 will be released soon with bug fixes identified since 3.1.0. Express Upgrades are recommended: a bit of work for admins, while user workloads should mostly run as-is. Community effort: HADOOP-15501 tracks upgrade efforts to Hadoop 3.x (wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts). Volunteers are needed for validating workload upgrades on Hadoop 3!
  38. Questions?
  39. Thank you

Editor's Notes

  • YARN
    YARN scheduler improvements
    Distributed scheduling significantly improves cluster throughput
    Fine grained scheduling according to resource types - GPUs, FPGAs
    Support for Long running Services and Docker
    Revamped UI
    ATS v2 - More scalable and based on Hbase

    HDFS
    HDFS Federation
    HDFS Intra-DataNode Disk Balancer
    Erasure Coding
    Significant cost savings in storage
    Reduction of overhead from 200% to 50%

  • TODO- Animation

  • Major version upgrade - there are some challenges and issues in supporting Rolling Upgrades
    Why rolling upgrades cannot be done?
    HDFS Rolling Upgrade has issues with NN restarts due to change in edit log format HDFS-13596
    Hadoop daemons configured with a custom MetricsPlugin sink fail to start after upgrade HADOOP-15502 (API In-compatibility)
    HDFS-6440
    DataNode layout change prevents downgrade.
    Recommend admins to ‘Express Upgrade’ clusters from Hadoop 2 to 3
    Community effort for upgrade issues is being tracked here - HADOOP-15501
  • Wire compatibility
    Preserves compatibility with Hadoop 2 clients
    Distcp/WebHDFS compatibility preserved
    API compatibility
    Not fully!
    Dependency version bumps
    Removal of deprecated APIs and tools
    Shell script rewrite, rework of Hadoop tools scripts
    Incompatible bug fixes
  • Upgrade has been tested/validated from
    Apache Hadoop 2.8.4 to Hadoop 3.1.0 in our test environments


    Ongoing effort in community to release Hadoop 3.1.1 with a lot of fixes.
    Recommend upgrading lower Hadoop 2 versions to at least Hadoop 2.8.4 before migrating to Hadoop 3
  • Fresh Install
    Fully automated via Apache Ambari
    Manual installation of RPMs/Tarballs
    Upgrade
    Fully automated via Apache Ambari
    Manual upgrade
  • [10:46 PM] Rohith Sharma KS:
    -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager
    WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
    WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
    WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
    WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
    WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
    WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
    WARNING: Use of this script to start YARN daemons is deprecated.
    WARNING: Attempting to execute replacement "yarn --daemon start" instead.
    WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR.
    WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
    WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
    WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER.
    [10:47 PM] Rohith Sharma KS:
    -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    WARNING: Use of this script to start HDFS daemons is deprecated.
    WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR.
    WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR.
    WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS.
    ERROR: You must be a privileged user in order to run a secure service.
    [10:48 PM] Rohith Sharma KS:
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    WARNING: Use of this script to stop HDFS daemons is deprecated.
    WARNING: Attempting to execute replacement "hdfs --daemon stop" instead.
    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
    WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  • HDFS
    Backup Configuration files
    Create a list of all the DataNodes in the cluster.
    hdfs dfsadmin -report > dfs-old-report-1.log
    Save Namespace
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    Backup the checkpoint files located
    in ${dfs.namenode.name.dir}/current
    Finalize any prior HDFS upgrade
    hdfs dfsadmin -finalizeUpgrade
    Create a fsimage for rollback
    hdfs dfsadmin -rollingUpgrade prepare



    YARN
    Stop all YARN queues
    Stop/Wait for Running applications to finish
    NOTE: YARN supports rolling upgrade!
  • Heap size no longer needs to be specified in task configuration and Java options.