SlideShare a Scribd company logo

Migrating your clusters and workloads from Hadoop 2 to Hadoop 3

The Hadoop community announced Hadoop 3.0 GA in December, 2017 and 3.1 around April, 2018 loaded with a lot of features and improvements. One of the biggest challenges for any new major release of a software platform is its compatibility. Apache Hadoop community has focused on ensuring wire and binary compatibility for Hadoop 2 clients and workloads. There are many challenges to be addressed by admins while upgrading to a major release of Hadoop. Users running workloads on Hadoop 2 should be able to seamlessly run or migrate their workloads onto Hadoop 3. This session will be deep diving into upgrade aspects in detail and provide a detailed preview of migration strategies with information on what works and what might not work. This talk would focus on the motivation for upgrading to Hadoop 3 and provide a cluster upgrade guide for admins and workload migration guide for users of Hadoop. Speaker Suma Shivaprasad, Hortonworks, Staff Engineer Rohith Sharma, Hortonworks, Senior Software Engineer

1 of 39
Download to read offline
1 © Hortonworks Inc. 2011–2018. All rights reserved
Migrating your clusters and workloads
from Hadoop 2 to Hadoop 3
Suma Shivaprasad - Staff Engineer
Rohith Sharma K S - Senior Software Engineer
2 © Hortonworks Inc. 2011–2018. All rights reserved
Speaker Info
Suma Shivaprasad
 Apache Hadoop Contributor
 Apache Atlas PMC
 Staff Engineer @ Hortonworks
Rohith Sharma K S
 Apache Hadoop PMC
 Sr.Software Engineer @ Hortonworks
3 © Hortonworks Inc. 2011–2018. All rights reserved
• Why upgrade to Apache Hadoop 3.x?
• Things to consider before upgrade
• Upgrade process
• Workload migration
• Other aspects
• Summary
Agenda
4 © Hortonworks Inc. 2011–2018. All rights reserved
Why upgrade to Apache
Hadoop 3.x?
5 © Hortonworks Inc. 2011–2018. All rights reserved
Major release with lot of features and improvements!
Motivation
• Federation GA
• Erasure Coding
• Significant cost savings in storage
• Reduction of overhead from 200%
to 50%
• Intra-DataNode Disk Balancer
HDFS
• Scheduler Improvements
• New Resource types - GPUs, FPGAs
• Fast and Global scheduling
• Containerization - Docker
• Long running Services rehash
• New UI2
• Timeline Server v2
YARN
6 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop-3
Container Runtimes (Docker / Linux / Default)
Platform Services
Storage
Service
Discovery
Holiday Web App
HBase
HTTP
MR Tez
Hive / Pig
Hive on
LLAPSpark
Resource
Management
Deep Learning App
On-Premises Cloud

Recommended

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 

More Related Content

What's hot

HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Improving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVMImproving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVMHolden Karau
 
How to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large ClustersHow to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large ClustersDatabricks
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Timothy Spann
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 

What's hot (20)

HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 
Improving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVMImproving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVM
 
How to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large ClustersHow to Performance-Tune Apache Spark Applications in Large Clusters
How to Performance-Tune Apache Spark Applications in Large Clusters
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 

Similar to Migrating your clusters and workloads from Hadoop 2 to Hadoop 3

Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterDataWorks Summit
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDataWorks Summit
 
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterDataWorks Summit
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storySunil Govindan
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleHortonworks
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith SharmaNewton Alex
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 

Similar to Migrating your clusters and workloads from Hadoop 2 to Hadoop 3 (20)

Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop ClusterEnterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Python For Kids - Sách Lập trình cho trẻ em
Python For Kids - Sách Lập trình cho trẻ emPython For Kids - Sách Lập trình cho trẻ em
Python For Kids - Sách Lập trình cho trẻ emNho Vĩnh
 
KUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionKUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionNeo4j
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeJosh Gellers
 
Utilising Energy Modelling for LCSF and PSDS Funding Applications
Utilising Energy Modelling for LCSF and PSDS Funding ApplicationsUtilising Energy Modelling for LCSF and PSDS Funding Applications
Utilising Energy Modelling for LCSF and PSDS Funding ApplicationsIES VE
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoProduct School
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31shyamraj55
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerSaiLinnThu2
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxNeo4j
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...Neo4j
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfMostafa Higazy
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Product School
 
AI for Educators - Integrating AI in the Classrooms
AI for Educators - Integrating AI in the ClassroomsAI for Educators - Integrating AI in the Classrooms
AI for Educators - Integrating AI in the ClassroomsPremsankar Chakkingal
 
Huntly presentation deck design for Behance
Huntly presentation deck design for BehanceHuntly presentation deck design for Behance
Huntly presentation deck design for Behancewhalesdesign
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Jay Zhao
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Product School
 
Establishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry developmentEstablishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry developmentThorsten Huelsmann
 

Recently uploaded (20)

Python For Kids - Sách Lập trình cho trẻ em
Python For Kids - Sách Lập trình cho trẻ emPython For Kids - Sách Lập trình cho trẻ em
Python For Kids - Sách Lập trình cho trẻ em
 
KUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionKUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ion
 
Artificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human JusticeArtificial Intelligence, Design, and More-than-Human Justice
Artificial Intelligence, Design, and More-than-Human Justice
 
Utilising Energy Modelling for LCSF and PSDS Funding Applications
Utilising Energy Modelling for LCSF and PSDS Funding ApplicationsUtilising Energy Modelling for LCSF and PSDS Funding Applications
Utilising Energy Modelling for LCSF and PSDS Funding Applications
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
Unleash the Solace Pub Sub connector | Banaglore MuleSoft Meetup #31
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
 
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptxGraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
GraphSummit London Feb 2024 - ABK - Neo4j Product Vision and Roadmap.pptx
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
Roundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdfRoundtable_-_API_Research__Testing_Tools.pdf
Roundtable_-_API_Research__Testing_Tools.pdf
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
AI for Educators - Integrating AI in the Classrooms
AI for Educators - Integrating AI in the ClassroomsAI for Educators - Integrating AI in the Classrooms
AI for Educators - Integrating AI in the Classrooms
 
Huntly presentation deck design for Behance
Huntly presentation deck design for BehanceHuntly presentation deck design for Behance
Huntly presentation deck design for Behance
 
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
Leonis Insights: The State of AI (7 trends for 2023 and 7 predictions for 2024)
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
Establishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry developmentEstablishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry development
 

Migrating your clusters and workloads from Hadoop 2 to Hadoop 3

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Migrating your clusters and workloads from Hadoop 2 to Hadoop 3 Suma Shivaprasad - Staff Engineer Rohith Sharma K S - Senior Software Engineer
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Speaker Info Suma Shivaprasad  Apache Hadoop Contributor  Apache Atlas PMC  Staff Engineer @ Hortonworks Rohith Sharma K S  Apache Hadoop PMC  Sr.Software Engineer @ Hortonworks
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved • Why upgrade to Apache Hadoop 3.x? • Things to consider before upgrade • Upgrade process • Workload migration • Other aspects • Summary Agenda
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved Why upgrade to Apache Hadoop 3.x?
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved Major release with lot of features and improvements! Motivation • Federation GA • Erasure Coding • Significant cost savings in storage • Reduction of overhead from 200% to 50% • Intra-DataNode Disk Balancer HDFS • Scheduler Improvements • New Resource types - GPUs, FPGAs • Fast and Global scheduling • Containerization - Docker • Long running Services rehash • New UI2 • Timeline Server v2 YARN
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop-3 Container Runtimes (Docker / Linux / Default) Platform Services Storage Service Discovery Holiday Web App HBase HTTP MR Tez Hive / Pig Hive on LLAPSpark Resource Management Deep Learning App On-Premises Cloud
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved Things to consider before upgrade
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved Upgrades involve many things • Upgrade mechanism • Recommendation for 3.x - Express or Rolling ? • Compatibility • Source & Target versions • Tooling • Cluster Environment • Configuration changes • Script changes • Classpath changes
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade mechanism: Express/Rolling Upgrades • “Stop the world” Upgrades • Cluster downtime • Less stringent prerequisites • Process • Upgrade masters and workers in one shot Express Upgrades • Preserve cluster operation • Minimizes Service impact and downtime • Can take longer to complete • Process • Upgrades masters and workers in batches Rolling Upgrades
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Recommendation for 3.x - Express or Rolling ? ● Major version upgrade ● Challenges and issues in supporting Rolling Upgrades ● Why rolling upgrades can't be done? ● HDFS-13596 ● Change in edit log format ● HADOOP-15502 ● MetricsPlugin API In-compatibility change ● HDFS-6440 ● Incompatible changes in image transfer protocol ● Recommended ● ‘Express Upgrade’ from Hadoop 2 to 3
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved Compatibility • Wire compatibility o Preserves compatibility with Hadoop 2 clients o Distcp/WebHDFS compatibility preserved • API compatibility Not fully! o Dependency version bumps o Removal of deprecated APIs and tools o Shell script rewrite, rework of Hadoop tools scripts o Incompatible bug fixes!
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved Source & Target versions ● Upgrades Tested with • Why 2.8.4 release? ● Most of production deployments are close to 2.8.x ● What should users of 2.6.x and 2.7.x do? ● Recommend upgrading at least to Hadoop 2.8.4 before migrating to Hadoop 3! Hadoop 2 Base version Hadoop 3 Base version Apache Hadoop 2.8.4 Apache Hadoop 3.1.x
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Tooling ● Fresh Install ● Fully automated via Apache Ambari ● Manual installation of RPMs/Tar balls ● Upgrade ● Fully automated via Apache Ambari 2.7 ● Manual upgrade
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Cluster Environment • >= Java 8 • Java 7 EOL in April 2015 • Lot of libraries support only Java 8 Java • >= Bash V3 • POSIX shell NOT supported Shell • If you want to use containerized apps in 3.x • >= 1.12.5 • Also corresponding stable OS Docker
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop Env files • Common placeholder • Precedence rule • yarn/hdfs-env.sh > hadoop-env.sh > hard-coded defaults hadoop-env.sh • HDFS_* replaces HADOOP_* • Precedence rule • hdfs-env.sh > hadoop-env.sh > hard-coded defaults hdfs-env.sh • YARN_* replaces HADOOP_* • Precedence rule • yarn-env.sh > hadoop-env.sh > hard-coded defaults yarn-env.sh
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: Hadoop Env files Contd.. Daemon Heap Size HADOOP-10950 • Deprecated • HADOOP_HEAPSIZE • Replaced with • HADOOP_HEAPSIZE_MAX and HADOOP_HEAPSIZE_MIN • Units support in heap size • Default unit is MB • Ex: HADOOP_HEAPSIZE_MAX=4096 • Ex: HADOOP_HEAPSIZE_MAX=4g • Auto-tuning • Based on memory size of the host
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Configuration changes: YARN Modified Defaults • RM Max Completed Applications in State Store/Memory Configuration Previous Current yarn.resourcemanager.max-completed- applications 10000 1000 yarn.resourcemanager.state-store.max- completed-applications 10000 1000
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Configurations Changes: HDFS Service Previous Current Port NameNode 50470 50070 9871 9870 DataNode 50020 50010 50475 50075 9867 9866 9865 9864 Secondary NameNode 50091 50090 9869 9868 KMS 16000 9600 Change in Default Daemon Ports (HDFS-9427)
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Script changes: Starting/Stopping Hadoop Daemons Daemon scripts • *-daemon.sh deprecated • Use bin/hdfs or bin/yarn commands with --daemon option • Ex: bin/hdfs --daemon start/stop/status namenode • Ex: bin/yarn --daemon start/stop/status resourcemanager Debuggability • Scripts support –debug • Construction of env • Java options and classpath Logs/Pid • Created as hadoop-yarn* instead of yarn-yarn* • Log4j settings in the *-daemon.sh have been removed. Instead set via *_OPT in*-env.sh • Eg: YARN_RESOURCEMANAGER_OPTS in yarn-env.sh
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Classpath Changes Classpath isolation now! Users should rebuild their applications with shaded hadoop-client jars ● Hadoop Dependencies leaked to application’s classpath - Guava, protobuf,jackson,jetty... ● Shaded jars available - isolates downstream clients from any third party dependencies HADOOP-11804 ○ hadoop-client-api For compile time dependencies ○ hadoop-client-runtime For runtime third-party dependencies ○ hadoop-minicluster For test scope dependencies ● HDFS-6200 hadoop-hdfs jar contained both the hdfs server and the hdfs client. ○ Clients should instead depend on hadoop-hdfs-client instead to isolate themselves from server-side dependencies ● No YARN/MR shaded jars
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade process
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved YARN • Stop all YARN queues • Stop/Wait for Running applications to complete Hadoop Pre-Upgrade Steps HDFS • Run fsck and fix any errors • hdfs fsck / -files –blocks –locations > dfs-old- fsck.1.log • Checkpoint Metadata • hdfs dfsadmin -safemode enter • hdfs dfsadmin -saveNamespace • Backup checkpoint files • ${dfs.namenode.name.dir}/current • Get Cluster DataNode reports • hdfs dfsadmin -report > dfs-old-report-1.log • Capture Namespace • hdfs dfs –ls –R / > dfs-old-lsr-1.log • Finalize previous upgrade • hdfs dfsadmin –finalizeUpgrade STACK • Backup Configuration files • Stop users/services using YARN/HDFS • Other metadata backup – Hive MetaStore, Oozie etc
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade Steps Configuration Updates Additional HDFS Upgrade Steps https://docs.hortonworks.com/HDPDocuments/HDP2/HDP- 2.6.3/bk_command-line-upgrade/content/start-hadoop-core- 25.html Install new packages Link to new versions Start ServicesStop Services
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved Upgrade Validation • Run HDFS Service checks • Verify NameNode gets out of Safe Mode hdfs dfsadmin -safeMode wait • FileSystem Health • Compare with Previous State • Node list • Full NameSpace • Let Cluster run production workloads for a while • When ready to discard backup, finalize HDFS upgrade hdfs dfsadmin –upgrade finalize/query HDFS • Run YARN Service checks • Submit test applications – MR, TEZ, … YARN
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Enable New features • Erasure Coding • https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop- hdfs/HDFSErasureCoding.html • YARN UI2 • https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnUI2.html • ATSv2 • New Daemon – Timeline Reader • https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html • YARN DNS • Service Discovery of YARN Services • http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/yarn- service/RegistryDNS.html • HDFS Federation • https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 26. 26 © Hortonworks Inc. 2011–2018. All rights reserved Migrating workloads
  • 27. 27 © Hortonworks Inc. 2011–2018. All rights reserved ● Full Binary compatibility of mapreduce APIs ● hadoop-streaming related deprecated IO Formats removed HADOOP-10485 ○ XMLRecordInput/Output ○ CSVRecordInput Compatibility Configuration MapReduce (1/2) ● yarn.app.mapreduce.client.job.max- retries o Default changed from 0 to 3 o Protects clients from failures that are transient.
  • 28. 28 © Hortonworks Inc. 2011–2018. All rights reserved MapReduce (2/2) - Task Heap Management MAPREDUCE-5785 mapreduce.map.memory.mb mapreduce.map.java.opts Xmx Behaviour Configured 2048MB Configured 1638 MB No Change 1638MB Configured 2048 MB Not Configured Derived from mapreduce.map.memory.mb 1638MB Not Configured Configure 1638 MB Automatically inferred from Xmx in mapreduce.map.java.opts. 1638MB Not Configured Not Configured Default : 1024 MB Heap size no longer needs to be specified in task configuration and Java options.
  • 29. 29 © Hortonworks Inc. 2011–2018. All rights reserved Hive on Tez ● Hive 3.0.0 Hive version supporting Hadoop 3 HIVE-16531 ● Does NOT support rolling upgrades ● Acid table format changes are not compatible with 2.x ● Tez version support for Hadoop 3 o Planned for release 0.10.0 ● TEZ-3923 Move master to Hadoop 3+ and create separate 0.9.x line ● TEZ-3252 - [Umbrella] Enable support for Hadoop-3.x
  • 30. 30 © Hortonworks Inc. 2011–2018. All rights reserved Spark Ongoing efforts in community to build/validate Spark with Hadoop 3 ○ SPARK-23534 Umbrella jira to Build/test with Hadoop 3
  • 31. 31 © Hortonworks Inc. 2011–2018. All rights reserved Apache HBase ● HBase 2.0 supports Hadoop 3 ● Does NOT support Rolling Upgrades in major version upgrades (1.x to 2.x) ● Refer Upgrade documentation for further details https://github.com/apache/hbase/blob/master/src/main/asciidoc/_chapters/upgrading .adoc#upgrade2.0
  • 32. 32 © Hortonworks Inc. 2011–2018. All rights reserved • Apache Slider is retiring from Apache Incubator • Superseded by YARN Services. • Port your Slider apps to Yarn Services • Benefits of Yarn Services • Easier to manage and deploy • Single “yarnfile” to configure a Yarn Service • Supports container placement scheduling such as affinity and anti-affinity YARN-6592 • Rolling upgrades for containers and service YARN-7512 and YARN-4726. • Services UI in YARN UI2 improving debuggability and log access. Apache Slider Applications • YarnUI2 Integration • Rolling Upgrades for containers/Se rvices • Affinity • Anti-Affinity • Single “yarnfile” for services Easier to Manage Placement UIUpgrades
  • 33. 33 © Hortonworks Inc. 2011–2018. All rights reserved Hive on LLAP ● Now runs as a Yarn Service Application instead of a Slider App ● Version that supports LLAP as a YARN service is not released yet. ● Planned for release Hive-4.0.0/3.1.0 ● Refer https://hortonworks.com/blog/apache-hive-llap-as-a-yarn-service
  • 34. 34 © Hortonworks Inc. 2011–2018. All rights reserved PIG/Oozie Support for Hadoop 3 In-Progress in the community ● PIG ○ Planned for release – 0.18.0 ○ PIG-5253 Pig Hadoop 3 support ● OOZIE ○ Planned for release – 5.1.0 ○ OOZIE-2973 Make sure Oozie works with Hadoop 3
  • 35. 35 © Hortonworks Inc. 2011–2018. All rights reserved Other Aspects
  • 36. 36 © Hortonworks Inc. 2011–2018. All rights reserved Other Aspects Validations In-progress ● Performance testing ● Scale testing for HDFS/YARN ● OSes compatibility
  • 37. 37 © Hortonworks Inc. 2011–2018. All rights reserved Summary • Hadoop 3 • Eagerly awaited release with lots of new features and optimizations ! • 3.1.1 will be released soon with some bug fixes identified since 3.1.0 • Express Upgrades are recommended • Admins • A bit of work • Users • Should work mostly as-is • Community effort • HADOOP-15501 Upgrade efforts to Hadoop 3.x • Wiki - https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.x+to+3.x+Upgrade+Efforts • Volunteers needed for validating workload upgrades on Hadoop 3 !
  • 38. 38 © Hortonworks Inc. 2011–2018. All rights reserved Questions?
  • 39. 39 © Hortonworks Inc. 2011–2018. All rights reserved Thank you

Editor's Notes

  1. YARN Yarn scheduler Improvements Improves cluster throughput Distributed scheduling significantly Fine grained scheduling according to resource types - GPUs, FPGAs Support for Long running Services and Docker Revamped UI ATS v2 - More scalable and based on Hbase HDFS HDFS Federation HDFS Intra-DataNode Disk Balancer Erasure Coding Significant cost savings in storage - savings in storage cost Reduction of overhead from 200% to 50%
  2. TODO- Animation
  3. YARN Yarn scheduler Improvements Improves cluster throughput Distributed scheduling significantly Fine grained scheduling according to resource types - GPUs, FPGAs Support for Long running Services and Docker Revamped UI ATS v2 - More scalable and based on Hbase HDFS HDFS Federation HDFS Intra-DataNode Disk Balancer Erasure Coding Significant cost savings in storage - savings in storage cost Reduction of overhead from 200% to 50%
  4. Major version upgrade - there are some challenges and issues in supporting Rolling Upgrades Why rolling upgrades cannot be done? HDFS Rolling Upgrade has issues with NN restarts due to change in edit log format HDFS-13596 Hadoop daemons configured with a custom MetricsPlugin sink fail to start after upgrade HADOOP-15502 (API In-compatibility) HDFS-6440 DataNode layout change prevents downgrade. Recommend admins to ‘Express Upgrade’ clusters from Hadoop 2 to 3 Community effort for upgrade issues is being tracked here - HADOOP-15501
  5. Wire compatibility Preserves compatibility with Hadoop 2 clients Distcp/WebHDFS compatibility preserved API compatibility Not fully! Dependency version bumps Removal of deprecated APIs and tools Shell script rewrite, rework of Hadoop tools scripts Incompatible bug fixes
  6. Upgrade has been tested/validated from Apache Hadoop 2.8.4 to Hadoop 3.1.0 in our test environments Ongoing effort in community to release Hadoop 3.1.1 with a lot of fixes. Recommend upgrading lower Hadoop 2 versions to at least Hadoop 2.8.4 before migrating to Hadoop 3
  7. Fresh Install Fully automated via Apache Ambari Manual installation of RPMs/Tarballs Upgrade Fully automated via Apache Ambari Manual upgrade
  8. Fresh Install Fully automated via Apache Ambari Manual installation of RPMs/Tarballs Upgrade Fully automated via Apache Ambari Manual upgrade
  9. Fresh Install Fully automated via Apache Ambari Manual installation of RPMs/Tarballs Upgrade Fully automated via Apache Ambari Manual upgrade
  10. [10:46 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. WARNING: Use of this script to start YARN daemons is deprecated. WARNING: Attempting to execute replacement "yarn --daemon start" instead. WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. [10:47 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to start HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon start" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR. WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR. WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS. ERROR: You must be a privileged user in order to run a secure service. [10:48 PM] Rohith Sharma KS: WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to stop HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon stop" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  11. [10:46 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. WARNING: Use of this script to start YARN daemons is deprecated. WARNING: Attempting to execute replacement "yarn --daemon start" instead. WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. [10:47 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to start HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon start" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR. WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR. WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS. ERROR: You must be a privileged user in order to run a secure service. [10:48 PM] Rohith Sharma KS: WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to stop HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon stop" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  12. [10:46 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/yarn-daemon.sh --config /usr/apache/conf start nodemanager WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. WARNING: Use of this script to start YARN daemons is deprecated. WARNING: Attempting to execute replacement "yarn --daemon start" instead. WARNING: YARN_CONF_DIR has been replaced by HADOOP_CONF_DIR. Using value of YARN_CONF_DIR. WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR. WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR. WARNING: YARN_ROOT_LOGGER has been replaced by HADOOP_ROOT_LOGGER. Using value of YARN_ROOT_LOGGER. [10:47 PM] Rohith Sharma KS: -bash-4.2$ export HADOOP_LIBEXEC_DIR=/usr/apache/hadoop/libexec;/usr/apache/hadoop/sbin/hadoop-daemon.sh --config /usr/apache/conf start datanode WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to start HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon start" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_SECURE_DN_PID_DIR has been replaced by HADOOP_SECURE_PID_DIR. Using value of HADOOP_SECURE_DN_PID_DIR. WARNING: HADOOP_SECURE_DN_LOG_DIR has been replaced by HADOOP_SECURE_LOG_DIR. Using value of HADOOP_SECURE_DN_LOG_DIR. WARNING: HADOOP_DATANODE_OPTS has been replaced by HDFS_DATANODE_OPTS. Using value of HADOOP_DATANODE_OPTS. ERROR: You must be a privileged user in order to run a secure service. [10:48 PM] Rohith Sharma KS: WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: Use of this script to stop HDFS daemons is deprecated. WARNING: Attempting to execute replacement "hdfs --daemon stop" instead. WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER. WARNING: HADOOP_NAMENODE_OPTS has been replaced by HDFS_NAMENODE_OPTS. Using value of HADOOP_NAMENODE_OPTS.
  13. HDFS Backup Configuration files Create a list of all the DataNodes in the cluster. hdfs dfsadmin -report > dfs-old-report-1.log Save Namespace hdfs dfsadmin -safemode enter hdfs dfsadmin -saveNamespace Backup the checkpoint files located in ${dfs.namenode.name.dir}/current Finalize any prior HDFS upgrade hdfs dfsadmin -finalizeUpgrade Create a fsimage for rollback hdfs dfsadmin -rollingUpgrade prepare YARN Stop all YARN queues Stop/Wait for Running applications to finish NOTE: YARN supports rolling upgrade!
  14. YARN Yarn scheduler Improvements Improves cluster throughput Distributed scheduling significantly Fine grained scheduling according to resource types - GPUs, FPGAs Support for Long running Services and Docker Revamped UI ATS v2 - More scalable and based on Hbase HDFS HDFS Federation HDFS Intra-DataNode Disk Balancer Erasure Coding Significant cost savings in storage - savings in storage cost Reduction of overhead from 200% to 50%
  15. Heap size no longer needs to be specified in task configuration and Java options.