SlideShare a Scribd company logo
Introduction To YARN
Adam Kawa, Spotify
          th
The 9 Meeting of Warsaw Hadoop User Group




2/23/13
About Me
Data Engineer at Spotify, Sweden
Hadoop Instructor at Compendium (Cloudera Training Partner)
+2.5 year of experience in Hadoop




2/23/13
“Classical” Apache Hadoop Cluster
Can consist of one, five, hundred or 4000 nodes




2/23/13
Hadoop Golden Era
One of the most exciting open-source projects on the planet
Becomes the standard for large-scale processing
Successfully deployed by hundreds/thousands of companies
Like a salutary virus
... But it has some little drawbacks



2/23/13
HDFS Main Limitations

Single NameNode that
   keeps all metadata in RAM
   performs all metadata operations
   becomes a single point of failure (SPOF)
(not the topic of this presentation… )

2/23/13
“Classical” MapReduce Limitations
Limited scalability
Poor resource utilization
Lack of support for the alternative frameworks
Lack of wire-compatible protocols




2/23/13
Limited Scalability
Scales to only
    ~4,000-node cluster
    ~40,000 concurrent tasks
Smaller and less-powerful clusters must be created




2/23/13
Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
Poor Cluster Resources Utilization




                       Hard-coded values. TaskTrackers
                       must be restarted after a change.



2/23/13
Poor Cluster Resources Utilization




   Map tasks are waiting for the slots   Hard-coded values. TaskTrackers
   which are                             must be restarted after a change.
   NOT currently used by Reduce tasks

2/23/13
Resources Needs That Varies Over Time
                             How to
                             hard-code the
                             number of map
                             and reduce slots
                             efficiently?




2/23/13
(Secondary) Benefits Of Better Utilization
The same calculation on more efficient Hadoop cluster
    Less data center space
    Less silicon waste
    Less power utilizations
    Less carbon emissions
    =
    Less disastrous environmental consequences
2/23/13
Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
“Classical” MapReduce Daemons




2/23/13
JobTracker Responsibilities
Manages the computational resources (map and reduce slots)
Schedules all user jobs
   Schedules all tasks that belongs to a job
   Monitors tasks executions
   Restarts failed and speculatively runs slow tasks
   Calculates job counters totals

2/23/13
Simple Observation
JobTracker has lots tasks...




2/23/13
JobTracker Redesign Idea
Reduce responsibilities of JobTracker
   Separate cluster resource management from job
   coordination
   Use slaves (many of them!) to manage jobs life-cycle
Scale to (at least) 10K nodes, 10K jobs, 100K tasks



2/23/13
Disappearance Of The Centralized JobTracker




2/23/13
YARN – Yet Another Resource Negotiator




2/23/13
Resource Manager and Application Master




2/23/13
YARN Applications
MRv2 (simply MRv1 rewritten to run on top of YARN)
   No need to rewrite existing MapReduce jobs
Distributed shell
Spark, Apache S4 (a real-time processing)
Hamster (MPI on Hadoop)
Apache Giraph (a graph processing)
Apache HAMA (matrix, graph and network algorithms)
   ... and http://wiki.apache.org/hadoop/PoweredByYarn
2/23/13
NodeManager
More flexible and efficient than TaskTracker
Executes any computation that makes sense to ApplicationMaster
   Not only map or reduce tasks
Containers with variable resource sizes (e.g. RAM, CPU, network, disk)
   No hard-coded split into map and reduce slots




2/23/13
NodeManager Containers
NodeManager creates a container for each task
Container contains variable resources sizes
   e.g. 2GB RAM, 1 CPU, 1 disk
Number of created containers is limited
   By total sum of resources available on NodeManager



2/23/13
Application Submission In YARN




2/23/13
Resource Manager Components




2/23/13
Uberization
Running the small jobs in the same JVM as AplicationMaster

mapreduce.job.ubertask.enable                     true (false is default)
mapreduce.job.ubertask.maxmaps                    9*
mapreduce.job.ubertask.maxreduces 1*
mapreduce.job.ubertask.maxbytes                   dfs.block.size*


* Users may override these values, but only downward

2/23/13
Interesting Differences
Task progress, status, counters are passed directly to the
  Application Master
Shuffle handlers (long-running auxiliary services in Node
  Managers) serve map outputs to reduce tasks
YARN does not support JVM reuse for map/reduce tasks




2/23/13
Fault-Tollerance
Failure of the running tasks or NodeManagers is similar as in MRv1
Applications can be retried several times
      yarn.resourcemanager.am.max-retries         1
      yarn.app.mapreduce.am.job.recovery.enable   false
Resource Manager can start Application Master in a new container
      yarn.app.mapreduce.am.job.recovery.enable   false



2/23/13
Resource Manager Failure
Two interesting features
   RM restarts without the need to re-run currently
   running/submitted applications [YARN-128]
   ZK-based High Availability (HA) for RM [YARN-149] (still in
   progress)




2/23/13
Wire-Compatible Protocol
Client and cluster may use different versions
More manageable upgrades
   Rolling upgrades without disrupting the service
   Active and standby NameNodes upgraded independently
Protocol buffers chosen for serialization (instead of Writables)



2/23/13
YARN Maturity
Still considered as both production and not production-ready
   The code is already promoted to the trunk
   Yahoo! runs it on 2000 and 6000-node clusters
      Using Apache Hadoop 2.0.2 (Alpha)
Anyway, it will replace MRv1 sooner than later



2/23/13
How To Quickly Setup The YARN Cluster




2/23/13
Basic YARN Configuration Parameters
yarn.nodemanager.resource.memory-mb        8192
yarn.nodemanager.resource.cpu-cores        8


yarn.scheduler.minimum-allocation-mb       1024
yarn.scheduler.maximum-allocation-mb       8192
yarn.scheduler.minimum-allocation-vcores   1
yarn.scheduler.maximum-allocation-vcores   32



2/23/13
Basic MR Apps Configuration Parameters
mapreduce.map.cpu.vcores     1
mapreduce.reduce.cpu.vcores 1


mapreduce.map.memory.mb      1536
mapreduce.map.java.opts      -Xmx1024M
mapreduce.reduce.memory.mb   3072
mapreduce.reduce.java.opts   -Xmx2560M
mapreduce.task.io.sort.mb    512

2/23/13
Apache Whirr Configuration File




2/23/13
Running And Destroying Cluster
$ whirr launch-cluster --config hadoop-yarn.properties
$ whirr destroy-cluster --config hadoop-yarn.properties




2/23/13
It Might Be A Demo
But what about running
production applications
on the real cluster
consisting of hundreds of nodes?


Join Spotify!
jobs@spotify.com
2/23/13
2/23/13
Image source: http://allthingsd.com/files/2012/07/10Questions.jpeg
Thank You!




2/23/13

More Related Content

What's hot

Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
YounesCharfaoui
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
DataWorks Summit
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
Ryan Blue
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
Cloudera, Inc.
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
Lynn Langit
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Kevin Weil
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Presto
PrestoPresto
Presto
Knoldus Inc.
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
Viet-Trung TRAN
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
Knoldus Inc.
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Stanley Wang
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 

What's hot (20)

Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop
HadoopHadoop
Hadoop
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Presto
PrestoPresto
Presto
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 

Viewers also liked

Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
Hortonworks
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Hortonworks
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
Adam Kawa
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
Adam Kawa
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
Adam Kawa
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Adam Kawa
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
Adam Kawa
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
Chicago Hadoop Users Group
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
StampedeCon
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
Apache Apex
 
Multi risk factor model
Multi risk factor model Multi risk factor model
Multi risk factor model
Stefan Duprey
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Vijay Srinivas Agneeswaran, Ph.D
 
Yarn
YarnYarn
Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0
Jabir Ahmed
 
Hadoop YARN overview
Hadoop YARN overviewHadoop YARN overview
Hadoop YARN overview
Arnon Rotem-Gal-Oz
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
Tudor Lapusan
 

Viewers also liked (20)

Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Introduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUGIntroduction To Elastic MapReduce at WHUG
Introduction To Elastic MapReduce at WHUG
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Multi risk factor model
Multi risk factor model Multi risk factor model
Multi risk factor model
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Yarn
YarnYarn
Yarn
 
Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0Hadoop Migration from 0.20.2 to 2.0
Hadoop Migration from 0.20.2 to 2.0
 
Hadoop YARN overview
Hadoop YARN overviewHadoop YARN overview
Hadoop YARN overview
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 

Similar to Apache Hadoop YARN

E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
RojaT4
 
H04502048051
H04502048051H04502048051
H04502048051
ijceronline
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMR
ABC Talks
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data Management
Chris Bunch
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
Cloudera, Inc.
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
Urvashi Kataria
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Adam Muise
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
Edureka!
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
M Baddar
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Impetus Technologies
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Yahoo Developer Network
 
YARN (2).pptx
YARN (2).pptxYARN (2).pptx
YARN (2).pptx
Raja Ram Dutta
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
Ted Dunning
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch training
Nandan Kumar
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
EMC
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
MapR Technologies
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
Leila panahi
 
Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaper
David Luan
 

Similar to Apache Hadoop YARN (20)

E031201032036
E031201032036E031201032036
E031201032036
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
H04502048051
H04502048051H04502048051
H04502048051
 
Map Reduce along with Amazon EMR
Map Reduce along with Amazon EMRMap Reduce along with Amazon EMR
Map Reduce along with Amazon EMR
 
Presentation on Large Scale Data Management
Presentation on Large Scale Data ManagementPresentation on Large Scale Data Management
Presentation on Large Scale Data Management
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
YARN (2).pptx
YARN (2).pptxYARN (2).pptx
YARN (2).pptx
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Hadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch trainingHadoop 2.0 yarn arch training
Hadoop 2.0 yarn arch training
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaper
 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 

Apache Hadoop YARN

  • 1. Introduction To YARN Adam Kawa, Spotify th The 9 Meeting of Warsaw Hadoop User Group 2/23/13
  • 2. About Me Data Engineer at Spotify, Sweden Hadoop Instructor at Compendium (Cloudera Training Partner) +2.5 year of experience in Hadoop 2/23/13
  • 3. “Classical” Apache Hadoop Cluster Can consist of one, five, hundred or 4000 nodes 2/23/13
  • 4. Hadoop Golden Era One of the most exciting open-source projects on the planet Becomes the standard for large-scale processing Successfully deployed by hundreds/thousands of companies Like a salutary virus ... But it has some little drawbacks 2/23/13
  • 5. HDFS Main Limitations Single NameNode that keeps all metadata in RAM performs all metadata operations becomes a single point of failure (SPOF) (not the topic of this presentation… ) 2/23/13
  • 6. “Classical” MapReduce Limitations Limited scalability Poor resource utilization Lack of support for the alternative frameworks Lack of wire-compatible protocols 2/23/13
  • 7. Limited Scalability Scales to only ~4,000-node cluster ~40,000 concurrent tasks Smaller and less-powerful clusters must be created 2/23/13 Image source: http://www.flickr.com/photos/rsms/660332848/sizes/l/in/set-72157607427138760/
  • 8. Poor Cluster Resources Utilization Hard-coded values. TaskTrackers must be restarted after a change. 2/23/13
  • 9. Poor Cluster Resources Utilization Map tasks are waiting for the slots Hard-coded values. TaskTrackers which are must be restarted after a change. NOT currently used by Reduce tasks 2/23/13
  • 10. Resources Needs That Varies Over Time How to hard-code the number of map and reduce slots efficiently? 2/23/13
  • 11. (Secondary) Benefits Of Better Utilization The same calculation on more efficient Hadoop cluster Less data center space Less silicon waste Less power utilizations Less carbon emissions = Less disastrous environmental consequences 2/23/13 Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  • 13. JobTracker Responsibilities Manages the computational resources (map and reduce slots) Schedules all user jobs Schedules all tasks that belongs to a job Monitors tasks executions Restarts failed and speculatively runs slow tasks Calculates job counters totals 2/23/13
  • 14. Simple Observation JobTracker has lots tasks... 2/23/13
  • 15. JobTracker Redesign Idea Reduce responsibilities of JobTracker Separate cluster resource management from job coordination Use slaves (many of them!) to manage jobs life-cycle Scale to (at least) 10K nodes, 10K jobs, 100K tasks 2/23/13
  • 16. Disappearance Of The Centralized JobTracker 2/23/13
  • 17. YARN – Yet Another Resource Negotiator 2/23/13
  • 18. Resource Manager and Application Master 2/23/13
  • 19. YARN Applications MRv2 (simply MRv1 rewritten to run on top of YARN) No need to rewrite existing MapReduce jobs Distributed shell Spark, Apache S4 (a real-time processing) Hamster (MPI on Hadoop) Apache Giraph (a graph processing) Apache HAMA (matrix, graph and network algorithms) ... and http://wiki.apache.org/hadoop/PoweredByYarn 2/23/13
  • 20. NodeManager More flexible and efficient than TaskTracker Executes any computation that makes sense to ApplicationMaster Not only map or reduce tasks Containers with variable resource sizes (e.g. RAM, CPU, network, disk) No hard-coded split into map and reduce slots 2/23/13
  • 21. NodeManager Containers NodeManager creates a container for each task Container contains variable resources sizes e.g. 2GB RAM, 1 CPU, 1 disk Number of created containers is limited By total sum of resources available on NodeManager 2/23/13
  • 24. Uberization Running the small jobs in the same JVM as AplicationMaster mapreduce.job.ubertask.enable true (false is default) mapreduce.job.ubertask.maxmaps 9* mapreduce.job.ubertask.maxreduces 1* mapreduce.job.ubertask.maxbytes dfs.block.size* * Users may override these values, but only downward 2/23/13
  • 25. Interesting Differences Task progress, status, counters are passed directly to the Application Master Shuffle handlers (long-running auxiliary services in Node Managers) serve map outputs to reduce tasks YARN does not support JVM reuse for map/reduce tasks 2/23/13
  • 26. Fault-Tollerance Failure of the running tasks or NodeManagers is similar as in MRv1 Applications can be retried several times yarn.resourcemanager.am.max-retries 1 yarn.app.mapreduce.am.job.recovery.enable false Resource Manager can start Application Master in a new container yarn.app.mapreduce.am.job.recovery.enable false 2/23/13
  • 27. Resource Manager Failure Two interesting features RM restarts without the need to re-run currently running/submitted applications [YARN-128] ZK-based High Availability (HA) for RM [YARN-149] (still in progress) 2/23/13
  • 28. Wire-Compatible Protocol Client and cluster may use different versions More manageable upgrades Rolling upgrades without disrupting the service Active and standby NameNodes upgraded independently Protocol buffers chosen for serialization (instead of Writables) 2/23/13
  • 29. YARN Maturity Still considered as both production and not production-ready The code is already promoted to the trunk Yahoo! runs it on 2000 and 6000-node clusters Using Apache Hadoop 2.0.2 (Alpha) Anyway, it will replace MRv1 sooner than later 2/23/13
  • 30. How To Quickly Setup The YARN Cluster 2/23/13
  • 31. Basic YARN Configuration Parameters yarn.nodemanager.resource.memory-mb 8192 yarn.nodemanager.resource.cpu-cores 8 yarn.scheduler.minimum-allocation-mb 1024 yarn.scheduler.maximum-allocation-mb 8192 yarn.scheduler.minimum-allocation-vcores 1 yarn.scheduler.maximum-allocation-vcores 32 2/23/13
  • 32. Basic MR Apps Configuration Parameters mapreduce.map.cpu.vcores 1 mapreduce.reduce.cpu.vcores 1 mapreduce.map.memory.mb 1536 mapreduce.map.java.opts -Xmx1024M mapreduce.reduce.memory.mb 3072 mapreduce.reduce.java.opts -Xmx2560M mapreduce.task.io.sort.mb 512 2/23/13
  • 34. Running And Destroying Cluster $ whirr launch-cluster --config hadoop-yarn.properties $ whirr destroy-cluster --config hadoop-yarn.properties 2/23/13
  • 35. It Might Be A Demo But what about running production applications on the real cluster consisting of hundreds of nodes? Join Spotify! jobs@spotify.com 2/23/13