Yarns about YARN 
Migrating to MapReduce v2 
Kathleen Ting, kate@cloudera.com 
Strata Hadoop Barcelona, 21 November 2014
© 2014 Cloudera, Inc. All rights reserved. 2 
$ whoami 
Kathleen Ting 
• Joined Cloudera in 2011 
• Former customer operations engineer
• Technical account manager 
• Apache Sqoop Cookbook co-author
Cloudera and Apache Hadoop 
• Apache Hadoop is an open source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware.
• Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop.
• Distributes CDH, Cloudera's distribution of Apache Hadoop.
• Teaches, consults, and supports customers building applications on the Hadoop stack.
• The world-wide Cloudera Customer Operations Engineering team has closed tens of thousands of support incidents over six years.
Outline 
• MR2 motivation 
• Upgrading MR1 to MR2 
• YARN upgrade pitfalls 
• YARN applications
MR2 motivation
[Figure: the Hadoop stack. Each machine runs Hadoop daemons in a JVM on Linux, on top of disk, CPU, and memory. Components shown: File Storage: HDFS; Random Access Storage: HBase; Resource Management: YARN; Batch processing: MapReduce; In-memory processing: Spark; Interactive SQL: Impala; Search: Solr; Languages/APIs: Hive, Pig, Crunch, Kite, Mahout; Coordination: ZooKeeper; Security: Sentry; Event ingest: Flume, Kafka; DB Import/Export: Sqoop; User interface: HUE; Other systems: httpd, sas, custom apps, etc.]
MapReduce v1 
[Figure: MapReduce v1 architecture. Clients talk to a single JobTracker; TaskTrackers run the Map and Reduce tasks.]
Source: Karthik Kambatla, http://www.meetup.com/LA-HUG/events/184628072/
MapReduce v2 
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
YARN motivation 
MR1 vs. MR2
• Scalability
  MR1: JobTracker tracks all jobs and tasks; maxes out at 4k nodes, 40k tasks
  MR2: Tracking is split between ResourceManager and ApplicationMaster; scales up to 10k nodes, 100k tasks
• Availability
  MR1: JT HA
  MR2: RM HA, and HA on a per-application basis
• Utilization
  MR1: Fixed-size slots for map and reduce
  MR2: Allocates only as many resources as needed; allows cluster utilization > 70%
• Multi-tenancy
  MR1: N/A
  MR2: Cluster resource management system; data locality and lowered operational costs from sharing resources between frameworks
Upgrading MR1 to MR2 
MR1 to MR2 functionality mapping 
• Completely revamped architecture in MR2 on YARN 
• While most translate, some configurations don’t
MR1 vs. MR2
• MR1: TaskTracker
  MR2: NodeManager (launches containers, slave)
• MR1: JobTracker
  MR2: ResourceManager (scheduler, master), ApplicationMaster (manages an MR job), JobHistoryServer (stores completed jobs, http://rmhost:19888/jobhistory)
• MR1: Slots
  MR2: Containers (fungible)
MR1 to MR2 memory allocation mapping 
Memory allocations
• MR1:
  mapred.child.java.opts
  mapred.child.ulimit
  3 mappers :: 2 reducers ratio
• MR2:
  Memory slots = yarn.nodemanager.resource.memory-mb ÷ mapreduce.[map|reduce].memory.mb
  CPU slots = yarn.nodemanager.resource.cpu-vcores ÷ mapreduce.[map|reduce].cpu.vcores
  Set yarn.nodemanager.resource.cpu-vcores = # of physical cores (prevents swapping or OOM)
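The two slot formulas can be sketched in a few lines of Python. This is only an illustration: the node and task sizes below are hypothetical values, not figures from the talk, and the sketch ignores any rounding the scheduler applies to container requests.

```python
def slots_per_node(node_total: int, per_task: int) -> int:
    """Whole number of tasks that fit: floor(node_total / per_task)."""
    if per_task <= 0:
        raise ValueError("per-task request must be positive")
    return node_total // per_task


# Hypothetical NodeManager capacity (assumed values, not from the slides).
node_memory_mb = 49152  # yarn.nodemanager.resource.memory-mb
node_vcores = 12        # yarn.nodemanager.resource.cpu-vcores

# Hypothetical per-map-task requests.
map_memory_mb = 2048    # mapreduce.map.memory.mb
map_vcores = 1          # mapreduce.map.cpu.vcores

memory_slots = slots_per_node(node_memory_mb, map_memory_mb)
cpu_slots = slots_per_node(node_vcores, map_vcores)
print(f"memory slots: {memory_slots}, CPU slots: {cpu_slots}")
```

With these assumed numbers the node has more memory slots than CPU slots, so CPU would be the scarcer resource for map tasks.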
MR1 to MR2 task mapping 
Tasks launched per node
• MR1:
  mapred.tasktracker.map.tasks.maximum
  mapred.tasktracker.reduce.tasks.maximum
  total tasks < total # of cores
  Note: this makes for an underutilized cluster
• MR2:
  minimum (memory slots, CPU slots)
  Note: no distinction between map and reduce tasks
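A minimal sketch of the MR2 rule, assuming hypothetical node and task sizes (none of these numbers come from the slides): the number of containers a node can run is whichever of the memory-slot and CPU-slot counts is smaller, and the same rule applies to map and reduce tasks alike.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_mb: int
    vcores: int


def containers_per_node(node: Resources, task: Resources) -> int:
    """MR2 rule from the slide: minimum(memory slots, CPU slots)."""
    memory_slots = node.memory_mb // task.memory_mb
    cpu_slots = node.vcores // task.vcores
    return min(memory_slots, cpu_slots)


# Hypothetical sizes chosen for illustration only.
node = Resources(memory_mb=49152, vcores=12)
small_task = Resources(memory_mb=2048, vcores=1)  # 24 memory slots, 12 CPU slots
large_task = Resources(memory_mb=8192, vcores=1)  # 6 memory slots, 12 CPU slots

print(containers_per_node(node, small_task))  # prints 12 (CPU is the limit)
print(containers_per_node(node, large_task))  # prints 6 (memory is the limit)
```

Because containers are fungible, the limiting resource can change from one job to the next, which is what fixed MR1 map and reduce slots could not express.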
YARN compatibility 
Migration path: binary support?
• MR1 (CDH4) to MR2 (CDH5): ✔
• MR1 (CDH4) to MR1 (CDH5): ✔
• MR2 (CDH4) to MR1/MR2 (CDH5): ✖
CDH has complete binary compatibility and source compatibility for almost all programs.
• Migrating to MR2 on YARN
  • For Operators: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
  • For Users: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-users/
• Getting MR2 Up to Speed
  • http://blog.cloudera.com/blog/2014/02/getting-mapreduce-2-up-to-speed/
• Avoiding YARN Gotchas
  • http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
YARN upgrade pitfalls
The Law of Cluster Inertia 
A cluster in a good state stays in a good state, 
and 
a cluster in a bad state stays in a bad state, 
unless 
acted upon by an external force.
An upgrade is an external force 
Upgrades 
• Linux 
• Hadoop 
• Java 
Misconfiguration 
• Memory Mismanagement 
• Thread Mismanagement 
• Disk Mismanagement
Tips on mitigating upgrades (from Eric Sammer’s Hadoop Operations)
• Order matters: Understand the precedence of configuration value overrides.
• KISS: Get the cluster working with the minimal viable set before you add on security and performance parameters.
• Check units: Double-check the unit of each parameter you set.
• Audit: Frequently confirm whether parameters changed in meaning or scope.
• Cloudera Manager: Use a configuration management and deployment system to ensure that all files are up-to-date on all hosts.
• Hello World: Before and after making configuration changes, run benchmarks to evaluate the performance and resource impact.
Frozen jobs after upgrade to YARN 
Symptom:
After upgrading to YARN, applications were waiting 7+ hours to launch
Evidence:
• Initially thought it was a GC issue
• RM had frozen, and without the RM no tasks or containers can be launched, rendering the cluster useless
Solution:
• Fixed by YARN-713 (CDH5.1.1/Hadoop 2.4.0), YARN-2313 (CDH5.2/Hadoop 2.6.0), YARN-2359 (CDH5.1.1/Hadoop 2.6.0)
• Continuous scheduling reduces the allocation time of containers when there is capacity available
Workaround:
• Known JDK7 issue (sorting algorithm)
  • -Djava.util.Arrays.useLegacyMergeSort=true
• Disable Fair Scheduler Continuous Scheduling (reduces RM load by coupling container scheduling decisions with NM heartbeats)
• Configure RM HA to recover apps running on the cluster when RM died
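As a rough sketch of the continuous-scheduling workaround, the Python below renders a Hadoop-style configuration fragment. The property name yarn.scheduler.fair.continuous-scheduling-enabled is not in the slides; it is the Fair Scheduler setting as I understand it, so check it against the documentation for your release. The helper function is hypothetical and only formats XML; it does not touch a cluster.

```python
import xml.etree.ElementTree as ET


def hadoop_properties_xml(properties: dict) -> str:
    """Render name/value pairs as a Hadoop <configuration> fragment."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


# Assumed property name for disabling Fair Scheduler continuous scheduling.
workaround = {"yarn.scheduler.fair.continuous-scheduling-enabled": "false"}
fragment = hadoop_properties_xml(workaround)
print(fragment)

# The JDK7 sorting workaround from the slide is a JVM flag for the RM process,
# not a YARN property, so it is kept separate here.
rm_jvm_flag = "-Djava.util.Arrays.useLegacyMergeSort=true"
print(rm_jvm_flag)
```

Keeping the scheduler property and the JVM flag apart mirrors where each one lives: the first in the YARN site configuration, the second in the ResourceManager's Java options.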
Killing of tasks after upgrade to YARN 
Symptom:
After upgrading to YARN, tasks were being killed
Evidence:
Even with sufficient physical memory available, a RHEL6 bug was causing the JVM to take up to 100x as much virtual memory as physical memory
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
Application application_1392853856445_0900 failed 2 times due to AM Container for appattempt_1392853856445_0900_000002 exited with exitCode: 143 due to: Container [pid=27408,containerID=container_1392853856445_0900_02_000001] is running beyond virtual memory limits. Current usage: 337.6 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Solution:
• Turn off the virtual memory check, because YARN kills tasks that appear to use too much virtual memory even when physical memory is available
• Set yarn.nodemanager.vmem-check-enabled = false (fixed in CDH5.0.0)
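The numbers in the log message can be reproduced with a small sketch of the virtual memory check. It assumes the virtual limit is the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, with a ratio of 2.1; that ratio is not stated in the slides but matches the "2.1 GB" limit reported for a 1 GB container. The function names are hypothetical.

```python
def virtual_memory_limit_mb(container_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory allowed for a container of the given physical size."""
    return container_mb * vmem_pmem_ratio


def exceeds_vmem_limit(vmem_used_mb: float, container_mb: float,
                       vmem_pmem_ratio: float = 2.1,
                       check_enabled: bool = True) -> bool:
    """True if the container would be killed by the virtual memory check."""
    if not check_enabled:  # yarn.nodemanager.vmem-check-enabled = false
        return False
    return vmem_used_mb > virtual_memory_limit_mb(container_mb, vmem_pmem_ratio)


# Values taken from the log message on the evidence slide.
container_mb = 1024          # 1 GB physical allocation
pmem_used_mb = 337.6         # well under the physical limit
vmem_used_mb = 2.2 * 1024    # 2.2 GB virtual memory in use

print(pmem_used_mb <= container_mb)                        # physical usage is fine
print(exceeds_vmem_limit(vmem_used_mb, container_mb))      # check on: over the limit
print(exceeds_vmem_limit(vmem_used_mb, container_mb,
                         check_enabled=False))             # check off: not killed
```

The sketch shows why the container was killed despite using about a third of its physical memory: only the virtual figure crossed its limit.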
YARN applications
Configuring YARN for Spark 
• Designed for interactive queries and iterative algorithms
• In-memory caching, DAG engine, and APIs
• Set yarn.scheduler.maximum-allocation-mb as high as 64G on a machine with 192GB of memory
• Won’t run with small (< 1 GB) containers because it starts a fixed number of executors on the cluster
http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
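To see why the maximum allocation matters for Spark, here is a hedged sketch of a fit check. It assumes each executor asks YARN for one container sized at its heap plus some off-heap overhead, and that a request larger than yarn.scheduler.maximum-allocation-mb cannot be granted. The overhead figure and executor sizes are hypothetical; only the 64G ceiling comes from the slide.

```python
def executor_container_mb(executor_memory_mb: int, overhead_mb: int) -> int:
    """Container size requested for one executor: heap plus off-heap overhead."""
    return executor_memory_mb + overhead_mb


def fits_in_yarn(executor_memory_mb: int, overhead_mb: int,
                 max_allocation_mb: int) -> bool:
    """True if a single executor request is within the scheduler's maximum."""
    return executor_container_mb(executor_memory_mb, overhead_mb) <= max_allocation_mb


max_allocation_mb = 64 * 1024  # yarn.scheduler.maximum-allocation-mb = 64G (from the slide)
overhead_mb = 2048             # hypothetical overhead, for illustration only

print(fits_in_yarn(32 * 1024, overhead_mb, max_allocation_mb))  # 32G heap fits
print(fits_in_yarn(64 * 1024, overhead_mb, max_allocation_mb))  # 64G heap plus overhead does not
```

The second case is the easy one to miss: an executor heap equal to the maximum allocation leaves no room for the overhead.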
YARN applications 
• Llama (Low Latency Application MAster)
  • Reserves memory in YARN for short-lived processes (e.g. Impala)
  • Registers one long-lived AM per YARN pool
  • Caches resources allocated by YARN for a short time, so that they can be quickly re-allocated to Impala queries
  • The long-term solution is to run Impala on YARN, but the current recommendation is to set up admission control
• Apache Slider (incubating) née Hoya
  • Runs long-lived persistent services on YARN (e.g. HBase)
  • Not currently recommended as it doesn’t provide IO isolation
Conclusion
YARN performance 
• Improved cluster utilization
  • Can run more jobs in smaller clusters
  • Run in uber mode for smaller jobs (reduces AM overhead)
• Dynamic resource sharing between frameworks
  • One framework can use the entire cluster
• Tom White’s Hadoop: The Definitive Guide 4th Ed (to be released)
  • Chapter 4 is on YARN
YARN vs Mesos: Resource Manager’s role 
• YARN: Asks for resources; evolved into a resource manager; written in Java; locality aware
• Mesos: Offers resources; evolved into managing Hadoop; written in C++; more customizable
Shameless plug
• Cloudera Live: A full demo cluster + tutorial and sample data/queries in the cloud, with free access for two weeks (plus, clusters in Tableau and Zoomdata flavors!)
  cloudera.com/live
• Cloudera Community: Ask technical questions/get technical answers from Cloudera Software Engineers and other users
  cloudera.com/community
• In Europe!: Account Executives, Solutions Architects, Sales Engineers, Trainers
  cloudera.com/careers
@kate_ting 
Office hours 
@Table C in 10 min

More Related Content

What's hot

Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
DataWorks Summit
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
DataWorks Summit
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
Uwe Printz
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hortonworks
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
Hortonworks
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
David Kaiser
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
DataWorks Summit
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
Vigen Sahakyan
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
Vinod Kumar Vavilapalli
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
Uwe Printz
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Vinod Kumar Vavilapalli
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Simplilearn
 
Yarn
YarnYarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
hdhappy001
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
DataWorks Summit
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
DataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
Uwe Printz
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
DataWorks Summit
 

What's hot (20)

Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with YarnScale 12 x   Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
 
Hadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduceHadoop 2 - Going beyond MapReduce
Hadoop 2 - Going beyond MapReduce
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
 
Yarn
YarnYarn
Yarn
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 

Viewers also liked

Spark tuning2016may11bida
Spark tuning2016may11bidaSpark tuning2016may11bida
Spark tuning2016may11bida
Anya Bida
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
Infochimps, a CSC Big Data Business
 
Hw09 Cloudera Desktop In Detail
Hw09   Cloudera Desktop In DetailHw09   Cloudera Desktop In Detail
Hw09 Cloudera Desktop In Detail
Cloudera, Inc.
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
blynnbuckley
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introduction
Phate334
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN Apps
Cloudera, Inc.
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
rxu
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
Cloudera, Inc.
 
Tuning Java for Big Data
Tuning Java for Big DataTuning Java for Big Data
Tuning Java for Big Data
Scott Seighman
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
David Yahalom
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
Niranjan Pandey
 
Hadoop & Cloudera Workshop
Hadoop & Cloudera WorkshopHadoop & Cloudera Workshop
Hadoop & Cloudera Workshop
Serkan Sakınmaz
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
Hadoop User Group
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
Richard McDougall
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Cloudera, Inc.
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
Spark Summit
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
IMC Institute
 
Yarn & textile sculpture
Yarn & textile sculptureYarn & textile sculpture
Yarn & textile sculpture
Tamsyn King
 
TYPES OF YARNS & APPLICATION& PROPERTIES
TYPES OF YARNS & APPLICATION& PROPERTIESTYPES OF YARNS & APPLICATION& PROPERTIES
TYPES OF YARNS & APPLICATION& PROPERTIES
Tina Dhingra
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
IBM
 

Viewers also liked (20)

Spark tuning2016may11bida
Spark tuning2016may11bidaSpark tuning2016may11bida
Spark tuning2016may11bida
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
 
Hw09 Cloudera Desktop In Detail
Hw09   Cloudera Desktop In DetailHw09   Cloudera Desktop In Detail
Hw09 Cloudera Desktop In Detail
 
The Future of Data
The Future of DataThe Future of Data
The Future of Data
 
Cloudera introduction
Cloudera introductionCloudera introduction
Cloudera introduction
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN Apps
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Unlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator OptimizerUnlock Hadoop Success with Cloudera Navigator Optimizer
Unlock Hadoop Success with Cloudera Navigator Optimizer
 
Tuning Java for Big Data
Tuning Java for Big DataTuning Java for Big Data
Tuning Java for Big Data
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera Hadoop
 
Hadoop administration using cloudera student lab guidebook
Hadoop administration using cloudera   student lab guidebookHadoop administration using cloudera   student lab guidebook
Hadoop administration using cloudera student lab guidebook
 
Hadoop & Cloudera Workshop
Hadoop & Cloudera WorkshopHadoop & Cloudera Workshop
Hadoop & Cloudera Workshop
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
 
Yarn & textile sculpture
Yarn & textile sculptureYarn & textile sculpture
Yarn & textile sculpture
 
TYPES OF YARNS & APPLICATION& PROPERTIES
TYPES OF YARNS & APPLICATION& PROPERTIESTYPES OF YARNS & APPLICATION& PROPERTIES
TYPES OF YARNS & APPLICATION& PROPERTIES
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 

Similar to Yarns About Yarn

Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
Wei-Chiu Chuang
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
 
YARN
YARNYARN
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Cloudera, Inc.
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
天青 王
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
BigDataEverywhere
 
MapR Unique features
MapR Unique featuresMapR Unique features
MapR Unique features
Vishwas Tengse
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Cloudera, Inc.
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
Kathleen Ting
 
HugNov14
HugNov14HugNov14
HugNov14
Adam Faris
 
Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache Spark
Jeremy Beard
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
Cloudera, Inc.
 
Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0
Big Data Joe™ Rossi
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
bigdatagurus_meetup
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
Szehon Ho
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
markgrover
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
hadooparchbook
 

Similar to Yarns About Yarn (20)

Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
YARN
YARNYARN
YARN
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
MapR Unique features
MapR Unique featuresMapR Unique features
MapR Unique features
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
HugNov14
HugNov14HugNov14
HugNov14
 
Building Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache SparkBuilding Efficient Pipelines in Apache Spark
Building Efficient Pipelines in Apache Spark
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0
 
Apache Tez -- A modern processing engine
Apache Tez -- A modern processing engineApache Tez -- A modern processing engine
Apache Tez -- A modern processing engine
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Fraud Detection using Hadoop
Fraud Detection using HadoopFraud Detection using Hadoop
Fraud Detection using Hadoop
 

Yarns About Yarn

  • 1. Yarns about YARN: Migrating to MapReduce v2. Kathleen Ting, kate@cloudera.com. Strata Hadoop Barcelona, 21 November 2014
  • 2. © 2014 Cloudera, Inc. All rights reserved. 2 $ whoami Kathleen Ting • Joined Cloudera in 2011 • Former customer operations engineer • Technical account manager • Apache Sqoop Cookbook co-author
  • 3. Cloudera and Apache Hadoop • Apache Hadoop is an open source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware. • Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop. • Distributes CDH, its Hadoop distribution. • Teaches, consults, and supports customers building applications on the Hadoop stack. • The world-wide Cloudera Customer Operations Engineering team has closed tens of thousands of support incidents over six years.
  • 4. Outline • MR2 motivation • Upgrading MR1 to MR2 • YARN upgrade pitfalls • YARN applications
  • 5. MR2 motivation
  • 6. Hadoop stack (diagram). User interface: HUE. Languages/APIs: Hive, Pig, Crunch, Kite, Mahout. Search: Solr. In-memory processing: Spark. Interactive SQL: Impala. Batch processing: MapReduce. Resource management: YARN. Random access storage: HBase. File storage: HDFS. Coordination: ZooKeeper. Security: Sentry. Event ingest: Flume, Kafka. DB import/export: Sqoop. Other systems: httpd, sas, custom apps, etc. Each machine runs Hadoop daemons on a JVM on Linux, over local disk, CPU, and memory.
  • 8. MapReduce v1 (diagram): clients submit jobs to a single JobTracker, which runs map and reduce tasks on the TaskTrackers. Source: Karthik Kambatla, http://www.meetup.com/LA-HUG/events/184628072/
  • 9. MapReduce v2. Source: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
  • 10. YARN motivation (MR1 vs. MR2)
      • Scalability. MR1: the JobTracker tracks all jobs and tasks; maxes out at 4k nodes, 40k tasks. MR2: tracking is split between the ResourceManager and the ApplicationMasters; scales up to 10k nodes, 100k tasks.
      • Availability. MR1: JT HA. MR2: RM HA, plus HA on a per-application basis.
      • Utilization. MR1: fixed-size slots for map and reduce. MR2: allocates only as many resources as needed, which allows cluster utilization > 70%.
      • Multi-tenancy. MR1: N/A. MR2: cluster resource management system; data locality and lowered operational costs from sharing resources between frameworks.
  • 11. Upgrading MR1 to MR2
  • 12. MR1 to MR2 functionality mapping • Completely revamped architecture in MR2 on YARN • Most configurations translate, but some don’t
  • 13. MR1 to MR2 functionality mapping
      • TaskTracker (MR1) → NodeManager (launches containers; slave)
      • JobTracker (MR1) → JobHistoryServer (stores completed jobs, http://rmhost:19888/jobhistory), ResourceManager (scheduler; master), and ApplicationMaster (manages an MR job)
      • Slots (MR1) → Containers (fungible)
  • 14. MR1 to MR2 memory allocation mapping
      • Memory allocations, MR1: mapred.child.java.opts, mapred.child.ulimit; 3 mappers :: 2 reducers ratio
      • Memory allocations, MR2: memory slots = yarn.nodemanager.resource.memory-mb ÷ mapreduce.[map|reduce].memory.mb
      • Memory allocations, MR2: CPU slots = yarn.nodemanager.resource.cpu-vcores ÷ mapreduce.[map|reduce].cpu.vcores
      • Set yarn.nodemanager.resource.cpu-vcores = # of physical cores (prevents swapping or OOM)
  • 15. MR1 to MR2 task mapping
      • Tasks launched per node, MR1: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, with total tasks < total # of cores. Note that this makes for an underutilized cluster.
      • Tasks launched per node, MR2: minimum(memory slots, CPU slots). Note that map and reduce tasks are not distinguished.
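The slot arithmetic on slides 14 and 15 can be sketched in a few lines of Python. This is only an illustration: the node and task sizes below are hypothetical values chosen to make the division easy to follow, not recommendations, and the function names are invented for this example.

```python
# Sketch of the per-node container arithmetic from slides 14 and 15.
# All numeric values below are hypothetical examples, not recommendations.


def slots(node_capacity: int, per_task_request: int) -> int:
    """Whole number of task containers that fit into one node resource."""
    return node_capacity // per_task_request


def containers_per_node(node_memory_mb: int, node_vcores: int,
                        task_memory_mb: int, task_vcores: int) -> int:
    """minimum(memory slots, CPU slots): the scarcer resource sets the limit."""
    memory_slots = slots(node_memory_mb, task_memory_mb)
    cpu_slots = slots(node_vcores, task_vcores)
    return min(memory_slots, cpu_slots)


# Hypothetical NodeManager offering 48 GB and 12 vcores to YARN
# (yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores).
node_memory_mb = 48 * 1024
node_vcores = 12

# Hypothetical per-task request
# (mapreduce.map.memory.mb and mapreduce.map.cpu.vcores).
task_memory_mb = 2048
task_vcores = 1

# 24 memory slots but only 12 CPU slots, so the node runs 12 tasks at once.
print(containers_per_node(node_memory_mb, node_vcores, task_memory_mb, task_vcores))
```

In this example the CPU side is the bottleneck; doubling the per-task memory request to 4096 MB would bring both limits to 12 and leave no memory idle.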
  • 16. YARN compatibility
      • Binary support by migration path: MR1 (CDH4) to MR2 (CDH5) ✔; MR1 (CDH4) to MR1 (CDH5) ✔; MR2 (CDH4) to MR1/MR2 (CDH5) ✖
      • CDH has complete binary compatibility and source compatibility for almost all programs.
      • Migrating to MR2 on YARN, for operators: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
      • Migrating to MR2 on YARN, for users: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-users/
      • Getting MR2 Up to Speed: http://blog.cloudera.com/blog/2014/02/getting-mapreduce-2-up-to-speed/
      • Avoiding YARN Gotchas: http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
  • 17. YARN upgrade pitfalls
  • 18. The Law of Cluster Inertia: A cluster in a good state stays in a good state, and a cluster in a bad state stays in a bad state, unless acted upon by an external force.
  • 20. An upgrade is an external force • Upgrades: Linux, Hadoop, Java • Misconfiguration: memory mismanagement, thread mismanagement, disk mismanagement
  • 21. Tips on mitigating upgrades (from Eric Sammer’s Hadoop Operations) • Order matters: Understand the precedence of configuration value overrides. • KISS: Get the cluster working with the minimal viable set before you add on security and performance parameters. • Check units: Double-check the unit of each parameter you set.
  • 22. Tips on mitigating upgrades (from Eric Sammer’s Hadoop Operations) • Audit: Frequently confirm whether parameters changed in meaning or scope. • Cloudera Manager: Use a configuration management and deployment system to ensure that all files are up-to-date on all hosts. • Hello World: Before and after making configuration changes, run benchmarks to evaluate the performance and resource impact.
  • 23. Frozen jobs after upgrade to YARN. Symptom: After upgrading to YARN, applications were waiting 7+ hours to launch
  • 24. Frozen jobs after upgrade to YARN. Evidence: • Initially thought to be a GC issue • The RM had frozen; without the RM, no tasks or containers can be launched, rendering the cluster useless
  • 25. Frozen jobs after upgrade to YARN
      • Solution: fixed by YARN-713 (CDH5.1.1/Hadoop 2.4.0), YARN-2313 (CDH5.2/Hadoop 2.6.0), YARN-2359 (CDH5.1.1/Hadoop 2.6.0)
      • Continuous scheduling reduces the allocation time of containers when there is capacity available
      • Workaround for the known JDK7 issue (sorting algorithm): -Djava.util.Arrays.useLegacyMergeSort=true
      • Workaround: disable Fair Scheduler Continuous Scheduling (reduces RM load by coupling container scheduling decisions with NM heartbeats)
      • Workaround: configure RM HA to recover apps running on the cluster when the RM dies
  • 26. Killing of tasks after upgrade to YARN. Symptom: After upgrading to YARN, tasks were being killed
  • 27. Killing of tasks after upgrade to YARN. Evidence: Even with sufficient physical memory available, a RHEL6 bug was causing the JVM to take up to 100x as much virtual memory as physical memory. http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
      Application application_1392853856445_0900 failed 2 times due to AM Container for appattempt_1392853856445_0900_000002 exited with exitCode: 143 due to: Container [pid=27408,containerID=container_1392853856445_0900_02_000001] is running beyond virtual memory limits. Current usage: 337.6 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
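The diagnostic on slide 27 already contains the numbers needed to see why the container was killed. The following Python sketch pulls them out with a regular expression; the pattern is written against this one message only and is an assumption about its shape, not a parser for NodeManager logs in general. The virtual limit works out to 2.1 times the physical limit, which is consistent with the usual default of yarn.nodemanager.vmem-pmem-ratio.

```python
import re

# Diagnostic text from slide 27, with the line-wrap breaks repaired.
MESSAGE = (
    "Container [pid=27408,containerID=container_1392853856445_0900_02_000001] "
    "is running beyond virtual memory limits. Current usage: 337.6 MB of 1 GB "
    "physical memory used; 2.2 GB of 2.1 GB virtual memory used. "
    "Killing container."
)

# Conversion factors to megabytes for the units that occur in the message.
MB_PER_UNIT = {"MB": 1.0, "GB": 1024.0}

# One "<number> <unit>" amount; the full pattern contains four of them.
AMOUNT = r"([\d.]+) (MB|GB)"
USAGE_RE = re.compile(
    AMOUNT + " of " + AMOUNT + " physical memory used; "
    + AMOUNT + " of " + AMOUNT + " virtual memory used"
)


def parse_usage(text):
    """Return used/limit figures in MB, or None if the text does not match."""
    match = USAGE_RE.search(text)
    if match is None:
        return None
    groups = match.groups()
    # Groups arrive as (value, unit) pairs in the order they appear.
    values = [float(groups[i]) * MB_PER_UNIT[groups[i + 1]] for i in range(0, 8, 2)]
    keys = ["physical_used_mb", "physical_limit_mb",
            "virtual_used_mb", "virtual_limit_mb"]
    return dict(zip(keys, values))


usage = parse_usage(MESSAGE)
print(usage)
print("virtual limit / physical limit:",
      usage["virtual_limit_mb"] / usage["physical_limit_mb"])
print("over virtual limit:", usage["virtual_used_mb"] > usage["virtual_limit_mb"])
```

The container used about a third of its physical allocation, so the kill was triggered purely by the virtual memory check.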
  • 28. Killing of tasks after upgrade to YARN. Solution: • Turn off virtual memory limits, because YARN can kill tasks that appear to use too much virtual memory even when physical memory is sufficient • Set yarn.nodemanager.vmem-check-enabled = false (fixed in CDH5.0.0)
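Hadoop configuration files express each setting as a property element with a name and a value. As a small sketch, the Python below renders the setting from slide 28 in that form using only the standard library. The helper name render_property is invented for this example, and where the snippet ends up (typically yarn-site.xml on the NodeManager hosts, or the equivalent field in a management tool) is an assumption about the deployment.

```python
import xml.etree.ElementTree as ET


def render_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# The property name and value come from slide 28.
snippet = render_property("yarn.nodemanager.vmem-check-enabled", "false")
print(snippet)
```

Generating the element rather than typing it by hand keeps the name and value properly escaped, which matters once values contain characters such as & or <.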
  • 29. YARN applications
  • 30. Configuring YARN for Spark • Designed for interactive queries and iterative algorithms • In-memory caching, DAG engine, and APIs • Set yarn.scheduler.maximum-allocation-mb as high as 64 GB on a machine with 192 GB of memory • Won’t run with small (< 1 GB) containers because it starts a fixed number of executors on the cluster http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
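To see how the yarn.scheduler.maximum-allocation-mb ceiling from slide 30 interacts with executor sizing, here is a minimal Python sketch. It assumes that each Spark executor container request is the executor heap plus a fixed off-heap overhead; the 384 MB overhead and the executor sizes are placeholder values for illustration, and the function names are invented here.

```python
# Sketch: does a Spark executor request fit under the YARN per-container cap?
# Assumption: container request = executor heap + a fixed off-heap overhead.
# The overhead figure and executor sizes are hypothetical placeholders.


def container_request_mb(executor_memory_mb: int, overhead_mb: int) -> int:
    """Total memory asked of YARN for one executor container."""
    return executor_memory_mb + overhead_mb


def fits(executor_memory_mb: int, overhead_mb: int, max_allocation_mb: int) -> bool:
    """True if the request does not exceed yarn.scheduler.maximum-allocation-mb."""
    return container_request_mb(executor_memory_mb, overhead_mb) <= max_allocation_mb


MAX_ALLOCATION_MB = 64 * 1024  # the 64 GB ceiling mentioned on slide 30
OVERHEAD_MB = 384              # hypothetical overhead, for illustration only

for executor_gb in (60, 64):
    ok = fits(executor_gb * 1024, OVERHEAD_MB, MAX_ALLOCATION_MB)
    print(f"{executor_gb} GB executor fits under the cap: {ok}")
```

Under these assumptions a 64 GB executor heap does not fit under a 64 GB cap, because the overhead pushes the request past the limit, so the heap has to be sized a little below the ceiling.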
  • 31. YARN applications • Llama (Low Latency Application MAster) • Reserves memory in YARN for short-lived processes (e.g. Impala) • Registers one long-lived AM per YARN pool • Caches resources allocated by YARN for a short time, so that they can be quickly re-allocated to Impala queries • The long-term solution is to run Impala on YARN; for now, setting up admission control is recommended
  • 32. YARN applications • Apache Slider (incubating), née Hoya • Runs long-lived persistent services on YARN (e.g. HBase) • Not currently recommended, as it doesn’t provide IO isolation
  • 33. Conclusion
  • 34. YARN performance • Improved cluster utilization • Can run more jobs in smaller clusters • Run in uber mode for smaller jobs (reduces AM overhead) • Dynamic resource sharing between frameworks • One framework can use the entire cluster • Tom White’s Hadoop: The Definitive Guide 4th Ed (to be released) • Chapter 4 is on YARN
  • 35. YARN vs Mesos: Resource Manager’s role
      • YARN: asks for resources. Mesos: offers resources.
      • YARN: evolved into a resource manager. Mesos: evolved into managing Hadoop.
      • YARN: written in Java. Mesos: written in C++.
      • YARN: locality aware. Mesos: more customizable.
  • 36. Shameless plug
      • Cloudera Live: A full demo cluster + tutorial and sample data/queries in the cloud, with free access for two weeks (plus, clusters in Tableau and Zoomdata flavors!) cloudera.com/live
      • Cloudera Community: Ask technical questions/get technical answers from Cloudera Software Engineers and other users. cloudera.com/community
      • In Europe!: Account Executives, Solutions Architects, Sales Engineers, Trainers. cloudera.com/careers
  • 37. @kate_ting Office hours @Table C in 10 min