This talk draws on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult problem.
It is now possible to infer which application/job did what in HDFS Files created can be tracked down to the MR or Tez job and the specific task attempt that created them. Using simple string manipulation and aggregations, you can file jobs inducing high loads against the Namenode.
Tracking what YARN maps to what application type and instance is now much easier. It could made more easier if “mr_attempt_1464484887407_0007_m_000000_0” pointed to an oozie worklow instead of the MR job Who killed my application and how (command-line, webservice)?
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster slow?, Bikas Saha, Software Engineer, Hortonworks
Why is my Hadoop* job slow?
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive,
HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper,
Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the
Apache Software Foundation.