SlideShare a Scribd company logo
Yarns about YARN:
Migrating to MapReduce v2	
  
Kathleen	
  Ting	
  |	
  @kate_0ng	
  	
  
Technical	
  Account	
  Manager,	
  Cloudera	
  |	
  Sqoop	
  PMC	
  Member	
  	
  
Big	
  Data	
  Camp	
  LA	
  
June	
  14,	
  2014	
  
Who Am I?
•  Started 3 yr ago as 1st Cloudera Support Eng
•  Now manages Cloudera’s 2 largest customers
•  Sqoop Committer, PMC Member
•  Co-Author of the Apache Sqoop Cookbook
•  MRv1 misconfig talk viewed 20k on slideshare
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
MR2 Motivation
•  Higher cluster utilization
•  Recommended MRv1 run only at 70% cap
•  Resources not used can be consumed by another
•  Lower operational costs
•  One cluster running MR, Spark, Impala, etc
•  Don’t need to transfer data between clusters
•  Not restricted to < 5k cluster
MRv2 Architecture
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
10	
  
MapReduce is Central to Hadoop
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
What are Misconfigurations?
•  Issues requiring change to Hadoop or to OS config files
•  Comprises 35% of Cloudera Support Tickets
•  e.g. resource-allocation: memory, file-handles, disk-space
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
1. Task Out Of Memory Error (MRv1)
FATAL org.apache.hadoop.mapred.TaskTracker:
Error running child : java.lang.OutOfMemoryError:
Java heap space
at org.apache.hadoop.mapred.MapTask
$MapOutputBuffer.<init>
•  What does it mean?
o  Memory leak in task code
•  What causes this?
o  MR task heap sizes will not fit
1. Task Out Of Memory Error (MRv1)
o  MRv1 TaskTracker:
o  mapred.child.ulimit > 2*mapred.child.java.opts
o  0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts
o  MRv1 DataNode:
o  Use short pathnames for dfs.data.dir names
o  e.g. /data/1, /data/2, /data/3
o  Increase DN heap
o  MRv2:
o  Manual tuning of io.sort.record.percent not needed
o  Tune mapreduce.map|reduce.memory.mb
o  mapred.child.ulimit = yarn.nodemanager.vmem-pmem-ratio
o  Moot if yarn.nodemanager.vmem-check-enabled is disabled
2. JobTracker Out of Memory Error
ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProg
ress.java:122)
•  What does it mean?
o  Total JT memory usage > allocated RAM
•  What causes this?
o  Tasks too small
o  Too much job history
2. JobTracker Out of Memory Error
•  How can it be resolved?
o  sudo	
  -­‐u	
  mapreduce	
  jmap	
  -­‐histo:live	
  <pid>	
  	
  
o  histogram	
  of	
  what	
  objects	
  the	
  JVM	
  has	
  allocated	
  
o  Increase JT heap
o  Don’t co-locate JT and NN
o  mapred.job.tracker.handler.count = ln(#TT)*20
o  mapred.jobtracker.completeuserjobs.maximum = 5
o  mapred.job.tracker.retiredjobs.cache.size = 100
o  mapred.jobtracker.retirejob.interval = 3600000
o  YARN has Uber AMs (run in single JVM)
o  One AM per MR job
o  Not restricted to keeping 5 jobs in memory
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
Fetch Failures
3. Too Many Fetch-Failures
MR1: INFO org.apache.hadoop.mapred.JobInProgress:
Too many fetch-failures for output of task
MR2: ERROR
org.apache.hadoop.mapred.ShuffleHandler: Shuffle
error:
java.io.IOException: Broken pipe
•  What does it mean?
o  Reducer fetch operations fail to retrieve mapper outputs
o  Too many could blacklist the TT
•  What causes this?
o  DNS issues
o  Not enough http threads on the mapper side
o  Not enough connections
3. Too Many Fetch-Failures
MR1:
o  mapred.reduce.slowstart.completed.maps = 0.80
o  Unblocks other reducers to run while a big job waits on mappers
o  tasktracker.http.threads = 80
o  Increases threads used to serve map output to reducers
o  mapred.reduce.parallel.copies = SQRT(Nodes), floor of 10
o  Allows reducers to fetch map output in parallel
MR2:
o  Set ShuffleHandler configs:
o  yarn.nodemanager.aux-services = mapreduce_shuffle
o  yarn.nodemanager.aux-services.mapreduce_shuffle.class =
org.apache.hadoop.mapred.ShuffleHandler
o  tasktracker.http.threads N/A
o  max # of threads is based on # of processors on machine
o  Uses Netty, allowing up to twice as many threads as there are processors
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
4. Federation: Just (Don’t) Do It
= spreads FS metadata across NNs
= is stable (but ViewFS isn’t)
= is meant for 1k+ nodes
≠ multi-tenancy
≠ horizontally scale namespaces
è NN HA + YARN
è RPC QoS
Agenda
•  MapReduce Example
•  MR2 Motivation
•  Support Ticket Categorization
•  What are Misconfigurations?
•  Memory Misconfigurations
•  Thread Misconfigurations
•  Federation Misconfigurations
•  YARN Memory Misconfigurations
5. Optimizing YARN Virtual Memory Usage
Problem:	
  	
  
Current usage: 337.6 MB of 1 GB physical
memory used; 2.2 GB of 2.1 GB virtual memory
used. Killing container.
	
  
Solution:	
  
•  Set yarn.nodemanager.vmem-check-enabled = false
•  Determine AM container size:
yarn.app.mapreduce.am.resource.cpu-vcores
yarn.app.mapreduce.am.resource.mb
•  Sizing the AM: 1024mb (-Xmx768m)
•  Can be smaller because only storing one job unless using Uber
6. CPU Isolation in YARN Containers
•  mapreduce.map.cpu.vcores
mapreduce.reduce.cpu.vcores
(per-job config)
•  yarn.nodemanager.resource.cpu-vcores
(slave service side resource config)
•  yarn.scheduler.minimum-allocation-vcores
yarn.scheduler.maximum-allocation-vcores
(scheduler allocation control configs)
•  yarn.nodemanager.linux-container-
executor.resources-handler.class
(turn on cgroups in NM)
7. Understanding YARN Virtual Memory	
  
Situation:
yarn.nodemanager.resource.cpu-vcores
> actual cores
yarn.nodemanager.resource.memory-mb
> RAM
Effect:
- Exceeding cores = sharing existing cores, slower
-  Exceeding RAM = swapping, OOM
Bonus: Fair Scheduler Errors
ERROR
org.apache.hadoop.yarn.server.resourc
emanager.scheduler.fair.FairScheduler
: Request for appInfo of unknown
attemptappattempt_1395214170909_0059_
000001
	
  
Harmless	
  message	
  fixed	
  in	
  YARN-­‐1785	
  
YARN Resources
•  Migrating to MR2 on YARN:
•  For Operators:
http://blog.cloudera.com/blog/2013/11/migrating-to-
mapreduce-2-on-yarn-for-operators/
•  For Users:
http://blog.cloudera.com/blog/2013/11/migrating-to-
mapreduce-2-on-yarn-for-users/
•  http://blog.cloudera.com/blog/2014/04/apache-hadoop-
yarn-avoiding-6-time-consuming-gotchas/
•  Getting MR2 Up to Speed:
•  http://blog.cloudera.com/blog/2014/02/getting-
mapreduce-2-up-to-speed/
Takeaways
•  Want to DIY?
•  Take Cloudera’s Admin Training - now with 4x the labs
•  Get it right the first time with monitoring tools.
•  "Yep - we were able to download/install/configure/
setup a Cloudera Manager cluster from scratch in
minutes :)”
•  Want misconfig updates?
•  Follow @kate_ting

More Related Content

What's hot

Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
Manaranjan Pradhan
 
Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability
Omid Vahdaty
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Allen Wittenauer
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
Hanborq Inc.
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationYahoo Developer Network
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Abdelhamide EL ARIB
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
Ilya Ganelin
 
Webinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksWebinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin Tasks
Edureka!
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Cloudera, Inc.
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
Kyong-Ha Lee
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationDataWorks Summit
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
Alex Moundalexis
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
Ted Dunning
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performanceDataWorks Summit
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
Altoros
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
Maksud Ibrahimov
 

What's hot (20)

Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
ha_module5
ha_module5ha_module5
ha_module5
 
Introduction to hadoop high availability
Introduction to hadoop high availability Introduction to hadoop high availability
Introduction to hadoop high availability
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
Hanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduceHanborq Optimizations on Hadoop MapReduce
Hanborq Optimizations on Hadoop MapReduce
 
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your ApplicationHadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
Hadoop Summit 2010 Tuning Hadoop To Deliver Performance To Your Application
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Webinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin TasksWebinar: Top 5 Hadoop Admin Tasks
Webinar: Top 5 Hadoop Admin Tasks
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
Spark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud IbrahimovSpark performance tuning - Maksud Ibrahimov
Spark performance tuning - Maksud Ibrahimov
 

Viewers also liked

Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Data Con LA
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
Data Con LA
 
Summit v4 dave wolcott
Summit v4 dave wolcottSummit v4 dave wolcott
Summit v4 dave wolcott
Data Con LA
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jha
Data Con LA
 
Ag big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalAg big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopal
Data Con LA
 
Big datacamp june14_alex_liu
Big datacamp june14_alex_liuBig datacamp june14_alex_liu
Big datacamp june14_alex_liu
Data Con LA
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Data Con LA
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
Data Con LA
 
140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh
Data Con LA
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
Data Con LA
 
Kiji cassandra la june 2014 - v02 clint-kelly
Kiji cassandra la   june 2014 - v02 clint-kellyKiji cassandra la   june 2014 - v02 clint-kelly
Kiji cassandra la june 2014 - v02 clint-kelly
Data Con LA
 
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Data Con LA
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
Data Con LA
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Data Con LA
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
Data Con LA
 
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Data Con LA
 
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Data Con LA
 

Viewers also liked (20)

Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
Summit v4 dave wolcott
Summit v4 dave wolcottSummit v4 dave wolcott
Summit v4 dave wolcott
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jha
 
Ag big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalAg big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopal
 
Big datacamp june14_alex_liu
Big datacamp june14_alex_liuBig datacamp june14_alex_liu
Big datacamp june14_alex_liu
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
 
140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 
Kiji cassandra la june 2014 - v02 clint-kelly
Kiji cassandra la   june 2014 - v02 clint-kellyKiji cassandra la   june 2014 - v02 clint-kelly
Kiji cassandra la june 2014 - v02 clint-kelly
 
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
 
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Introduction to Kafka - Je...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 

Similar to Yarn cloudera-kathleenting061414 kate-ting

Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Cloudera, Inc.
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learning
ojavajava
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
hadooparchbook
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
Ravindra Bandara
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Cloudera, Inc.
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Spark Summit
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
Jason Hubbard
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
markgrover
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
Insight Technology, Inc.
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
Adarsh Pannu
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
Subhas Kumar Ghosh
 
Yarn
YarnYarn
Yarn
Yu Xia
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Review of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsReview of Calculation Paradigm and its Components
Review of Calculation Paradigm and its Components
Namuk Park
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
Omid Vahdaty
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
ssuser30e7d2
 

Similar to Yarn cloudera-kathleenting061414 kate-ting (20)

Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Yarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine LearningYarn Resource Management Using Machine Learning
Yarn Resource Management Using Machine Learning
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted MalaskaTop 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska
 
Spark Tips & Tricks
Spark Tips & TricksSpark Tips & Tricks
Spark Tips & Tricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Yarn
YarnYarn
Yarn
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Review of Calculation Paradigm and its Components
Review of Calculation Paradigm and its ComponentsReview of Calculation Paradigm and its Components
Review of Calculation Paradigm and its Components
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 

More from Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
Data Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
Data Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
Data Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
Data Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
Data Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
Data Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
Data Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
Data Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA
 

More from Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Recently uploaded

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 

Recently uploaded (20)

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 

Yarn cloudera-kathleenting061414 kate-ting

  • 1. Yarns about YARN: Migrating to MapReduce v2   Kathleen  Ting  |  @kate_0ng     Technical  Account  Manager,  Cloudera  |  Sqoop  PMC  Member     Big  Data  Camp  LA   June  14,  2014  
  • 2. Who Am I? •  Started 3 yr ago as 1st Cloudera Support Eng •  Now manages Cloudera’s 2 largest customers •  Sqoop Committer, PMC Member •  Co-Author of the Apache Sqoop Cookbook •  MRv1 misconfig talk viewed 20k on slideshare
  • 3. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 4. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 5.
  • 6. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 7. MR2 Motivation •  Higher cluster utilization •  Recommended MRv1 run only at 70% cap •  Resources not used can be consumed by another •  Lower operational costs •  One cluster running MR, Spark, Impala, etc •  Don’t need to transfer data between clusters •  Not restricted to < 5k cluster
  • 9. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 10. 10  
  • 11. MapReduce is Central to Hadoop
  • 12. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 13. What are Misconfigurations? •  Issues requiring change to Hadoop or to OS config files •  Comprises 35% of Cloudera Support Tickets •  e.g. resource-allocation: memory, file-handles, disk-space
  • 14. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 15. 1. Task Out Of Memory Error (MRv1) FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.MapTask $MapOutputBuffer.<init> •  What does it mean? o  Memory leak in task code •  What causes this? o  MR task heap sizes will not fit
  • 16. 1. Task Out Of Memory Error (MRv1) o  MRv1 TaskTracker: o  mapred.child.ulimit > 2*mapred.child.java.opts o  0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts o  MRv1 DataNode: o  Use short pathnames for dfs.data.dir names o  e.g. /data/1, /data/2, /data/3 o  Increase DN heap o  MRv2: o  Manual tuning of io.sort.record.percent not needed o  Tune mapreduce.map|reduce.memory.mb o  mapred.child.ulimit = yarn.nodemanager.vmem-pmem-ratio o  Moot if yarn.nodemanager.vmem-check-enabled is disabled
  • 17.
  • 18. 2. JobTracker Out of Memory Error ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProg ress.java:122) •  What does it mean? o  Total JT memory usage > allocated RAM •  What causes this? o  Tasks too small o  Too much job history
  • 19. 2. JobTracker Out of Memory Error •  How can it be resolved? o  sudo  -­‐u  mapreduce  jmap  -­‐histo:live  <pid>     o  histogram  of  what  objects  the  JVM  has  allocated   o  Increase JT heap o  Don’t co-locate JT and NN o  mapred.job.tracker.handler.count = ln(#TT)*20 o  mapred.jobtracker.completeuserjobs.maximum = 5 o  mapred.job.tracker.retiredjobs.cache.size = 100 o  mapred.jobtracker.retirejob.interval = 3600000 o  YARN has Uber AMs (run in single JVM) o  One AM per MR job o  Not restricted to keeping 5 jobs in memory
  • 20. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 22.
  • 23. 3. Too Many Fetch-Failures MR1: INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task MR2: ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error: java.io.IOException: Broken pipe •  What does it mean? o  Reducer fetch operations fail to retrieve mapper outputs o  Too many could blacklist the TT •  What causes this? o  DNS issues o  Not enough http threads on the mapper side o  Not enough connections
  • 24. 3. Too Many Fetch-Failures MR1: o  mapred.reduce.slowstart.completed.maps = 0.80 o  Unblocks other reducers to run while a big job waits on mappers o  tasktracker.http.threads = 80 o  Increases threads used to serve map output to reducers o  mapred.reduce.parallel.copies = SQRT(Nodes), floor of 10 o  Allows reducers to fetch map output in parallel MR2: o  Set ShuffleHandler configs: o  yarn.nodemanager.aux-services = mapreduce_shuffle o  yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler o  tasktracker.http.threads N/A o  max # of threads is based on # of processors on machine o  Uses Netty, allowing up to twice as many threads as there are processors
  • 25. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 26. 4. Federation: Just (Don’t) Do It = spreads FS metadata across NNs = is stable (but ViewFS isn’t) = is meant for 1k+ nodes ≠ multi-tenancy ≠ horizontally scale namespaces è NN HA + YARN è RPC QoS
  • 27. Agenda •  MapReduce Example •  MR2 Motivation •  Support Ticket Categorization •  What are Misconfigurations? •  Memory Misconfigurations •  Thread Misconfigurations •  Federation Misconfigurations •  YARN Memory Misconfigurations
  • 28. 5. Optimizing YARN Virtual Memory Usage Problem:     Current usage: 337.6 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.   Solution:   •  Set yarn.nodemanager.vmem-check-enabled = false •  Determine AM container size: yarn.app.mapreduce.am.resource.cpu-vcores yarn.app.mapreduce.am.resource.mb •  Sizing the AM: 1024mb (-Xmx768m) •  Can be smaller because only storing one job unless using Uber
  • 29. 6. CPU Isolation in YARN Containers •  mapreduce.map.cpu.vcores mapreduce.reduce.cpu.vcores (per-job config) •  yarn.nodemanager.resource.cpu-vcores (slave service side resource config) •  yarn.scheduler.minimum-allocation-vcores yarn.scheduler.maximum-allocation-vcores (scheduler allocation control configs) •  yarn.nodemanager.linux-container- executor.resources-handler.class (turn on cgroups in NM)
  • 30. 7. Understanding YARN Virtual Memory   Situation: yarn.nodemanager.resource.cpu-vcores > actual cores yarn.nodemanager.resource.memory-mb > RAM Effect: - Exceeding cores = sharing existing cores, slower -  Exceeding RAM = swapping, OOM
  • 31. Bonus: Fair Scheduler Errors ERROR org.apache.hadoop.yarn.server.resourc emanager.scheduler.fair.FairScheduler : Request for appInfo of unknown attemptappattempt_1395214170909_0059_ 000001   Harmless  message  fixed  in  YARN-­‐1785  
  • 32. YARN Resources •  Migrating to MR2 on YARN: •  For Operators: http://blog.cloudera.com/blog/2013/11/migrating-to- mapreduce-2-on-yarn-for-operators/ •  For Users: http://blog.cloudera.com/blog/2013/11/migrating-to- mapreduce-2-on-yarn-for-users/ •  http://blog.cloudera.com/blog/2014/04/apache-hadoop- yarn-avoiding-6-time-consuming-gotchas/ •  Getting MR2 Up to Speed: •  http://blog.cloudera.com/blog/2014/02/getting- mapreduce-2-up-to-speed/
  • 33. Takeaways •  Want to DIY? •  Take Cloudera’s Admin Training - now with 4x the labs •  Get it right the first time with monitoring tools. •  "Yep - we were able to download/install/configure/ setup a Cloudera Manager cluster from scratch in minutes :)” •  Want misconfig updates? •  Follow @kate_ting