Why is my Hadoop* job slow?

Rajesh Balamohan
@rajeshbalamohan
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive,
HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper,
Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the
Apache Software Foundation.

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis

 Metrics as high level pointers
 Ambari Metrics System
 Ambari Grafana Integration
 HBase, HDFS, YARN Dashboards
 Metrics based alerting

Metrics as high level pointers
 Machine level metrics like CPU load
 Application level metrics like HDFS counters
 Metrics at point of time
 Metrics anomalies along a time series
 Correlated anomalies
 The problem: you need to know what to look for
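
As a rough illustration of "anomalies along a time series" and "correlated anomalies", the sketch below flags samples that deviate sharply from a trailing window and then pairs up anomalies from two metrics that occur close together. The window size, threshold, and slack values are assumptions chosen for the example, not settings from any metrics system.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def correlated(anomalies_a, anomalies_b, slack=1):
    """Anomalies in series A that have an anomaly in series B
    within `slack` samples of them."""
    return [i for i in anomalies_a
            if any(abs(i - j) <= slack for j in anomalies_b)]


# Hypothetical CPU load samples with one spike.
cpu_load = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 9.0, 1.0, 1.1]
cpu_anomalies = find_anomalies(cpu_load)
```

A spike that lines up with an anomaly in an application-level metric is a stronger pointer than either one alone, which is the idea behind looking for correlated anomalies.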

Ambari Metrics Service - Motivation
 Limited Ganglia capabilities
 OpenTSDB – GPL license and needs a Hadoop cluster
 Need service-level aggregation as well as time-based aggregation
 Alerts based on metrics system
 Ability to scale past 1,000 nodes
 Ability to perform analytics based on a use case
 Allow fine-grained control over aspects such as retention, collection intervals, and aggregation
 Pluggable and Extensible
First version released with Ambari 2.0.0

Ambari Grafana Integration
 Open source dashboard builder integrated with AMS.
 Available from Ambari-2.2.2
 Pre-defined host-level and service-level (HDFS, HBase, YARN, etc.) dashboards.
 Added to Ambari through API after upgrade

HBase Dashboard

HDFS Dashboard

YARN Dashboard

Metrics based Alerting
 Top N support to quickly identify potential offenders
 Alerting based on time series
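
The Top N idea can be sketched in a few lines: given the latest value of one metric per host, pick the hosts with the highest values. The host names and values here are hypothetical, and this is not how Ambari implements the feature, only an illustration of the ranking step.

```python
import heapq


def top_n(latest_by_host, n=3):
    """Return the n (host, value) pairs with the largest metric value."""
    return heapq.nlargest(n, latest_by_host.items(), key=lambda kv: kv[1])


# Hypothetical per-host CPU load readings.
cpu_load = {"host1": 0.4, "host2": 7.9, "host3": 1.2, "host4": 5.5}
offenders = top_n(cpu_load, n=2)
```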

Agenda

 HDFS, YARN Audit logs
 Caller Context
 YARN Application Timeline Service
 Lineage tracking of operations across workloads
 Ambari Log Search

HDFS Audit Logs and Caller Context
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create
src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995
dst=null perm=root:hdfs:rw-r--r-- proto=rpc
callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create
src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097
callerContext=mr_attempt_1464484887407_0011_m_000097_0
FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create
src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095
callerContext=mr_attempt_1464484887407_0011_m_000095_0
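
Audit entries like these lend themselves to simple aggregation. The sketch below assumes each audit record sits on a single line of `key=value` fields (the slide wraps them for display) and counts `create` calls per user and per calling framework. Deriving the framework from the text before `attempt` is an assumption based only on the sample lines above.

```python
import re
from collections import Counter

# key=value pairs; values run up to the next whitespace.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit(line):
    """Turn one audit line into a dict of its key=value fields."""
    return dict(FIELD.findall(line))


def framework(caller_context):
    """Text before 'attempt' in the caller context, e.g. 'tez_ta' or 'mr'."""
    return caller_context.split("attempt", 1)[0].rstrip("_:") or "unknown"


def creates_by_user_and_framework(lines):
    """Count cmd=create entries per (user, framework) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_audit(line)
        if fields.get("cmd") == "create":
            key = (fields.get("ugi"), framework(fields.get("callerContext", "")))
            counts[key] += 1
    return counts


sample = [
    "FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 "
    "cmd=create src=/tmp/in2/_temporary/1/_temporary/"
    "attempt_1464484887407_0011_m_000097_0/part-m-00097 "
    "callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 "
    "cmd=create src=/tmp/in2/_temporary/1/_temporary/"
    "attempt_1464484887407_0011_m_000095_0/part-m-00095 "
    "callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]
counts = creates_by_user_and_framework(sample)
```

Grouping by caller context rather than by IP or user alone is what makes it possible to attribute NameNode load to a specific job or task attempt.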

ResourceManager Audit Logs and Caller Context
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application
Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001
CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application
CALLERCONTEXT=CLI
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application
CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application
CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
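
One way to correlate the two audit logs is to map a task attempt ID found in a caller context back to its YARN application ID, since both share the cluster timestamp and the application sequence number. The helper below is a minimal sketch of that mapping. It returns `None` for contexts that carry no attempt ID, such as `CLI`.

```python
import re

# attempt_<cluster timestamp>_<application sequence>_...
ATTEMPT = re.compile(r"attempt_(\d+)_(\d+)_")


def application_id(caller_context):
    """Derive application_<ts>_<seq> from an attempt ID, if one is present."""
    match = ATTEMPT.search(caller_context)
    if match is None:
        return None
    return f"application_{match.group(1)}_{match.group(2)}"


tez_app = application_id("tez_ta:attempt_1464484887407_0009_1_00_009995_0")
mr_app = application_id("mr_attempt_1464484887407_0007_m_000000_0")
```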

YARN Application Timeline Service
 YARN service for fine grained application level tracing
 Enables complex metadata to be recorded as the YARN app makes progress
 Allows retrieval of this timeline data based on filters
 Can be used to drive limited online analytics and extensive post-hoc analysis

Lineage Tracking using YARN Timeline
 Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1
dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2",
callerType: "HIVE_QUERY_ID",
context: "HIVE",
description: "select user, count(visit_id) as visits from users group by user order by visits" }
 Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2
hiveContext: { callerId: "workflow_abcd",
callerType: "OOZIE_ID",
context: "OOZIE",
description: "Daily ETL Summary Job" }
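
The two lookups above form a chain: each entity's context names the type and ID of its caller, which is the next entity to fetch. The sketch below walks that chain. `fetch_context` is a hypothetical callback that would, in practice, query the Timeline endpoint and extract the context object. The full response layout is not shown on the slide, so it is stubbed here with an in-memory dictionary.

```python
def lineage(entity_type, entity_id, fetch_context, max_depth=10):
    """Follow callerType/callerId links upward, returning the chain of
    (type, id) pairs starting from the given entity."""
    chain = [(entity_type, entity_id)]
    for _ in range(max_depth):
        context = fetch_context(entity_type, entity_id)
        if not context:
            break  # No recorded caller: top of the lineage.
        entity_type, entity_id = context["callerType"], context["callerId"]
        chain.append((entity_type, entity_id))
    return chain


# Stand-in for Timeline lookups, using the IDs from the example above.
QUERY_ID = "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"
store = {
    ("TEZ_DAG_ID", "dag_1464484887407_0013_1"):
        {"callerType": "HIVE_QUERY_ID", "callerId": QUERY_ID},
    ("HIVE_QUERY_ID", QUERY_ID):
        {"callerType": "OOZIE_ID", "callerId": "workflow_abcd"},
}

chain = lineage("TEZ_DAG_ID", "dag_1464484887407_0013_1",
                lambda t, i: store.get((t, i)))
```

Here the walk goes from the Tez DAG to the Hive query to the Oozie workflow, and stops when no further caller is recorded.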

Ambari Log Search

Agenda

 Use Big Data methods to solve Big Data problems
 Apache Zeppelin as analytical tool
 Hive/Tez/YARN notebook for analysis

Zeppelin for Ad-hoc Analytics

YARN Analyzer

Tez Analyzer

Tez Swimlane View

Tez UI to Download Timeline Data

Enable Task Level Debug Logs in Tez
 Enable debug logs for specific class
 tez.task.log.level="INFO;org.apache.hadoop.hive.ql.io.orc=DEBUG;"
 For specific task in specific vertex
– hive --hiveconf tez.task-specific.launch.cmd-opts.list="Map 1[0]" --hiveconf tez.task-specific.log.level="INFO;org.apache=DEBUG;"
– Adds DEBUG logs for Task 0 in Map 1.
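
When scripting reruns of a slow query, the task-specific options can be assembled programmatically. The helper below is a small sketch that builds the two `--hiveconf` arguments shown above as an argument list, so no shell quoting is needed. The function name and defaults are illustrative only.

```python
def tez_debug_args(vertex, task_index, log_level="INFO;org.apache=DEBUG;"):
    """Build --hiveconf arguments that raise log verbosity for one task
    of one vertex, e.g. task 0 of "Map 1"."""
    return [
        "--hiveconf",
        f"tez.task-specific.launch.cmd-opts.list={vertex}[{task_index}]",
        "--hiveconf",
        f"tez.task-specific.log.level={log_level}",
    ]


# Arguments to append after "hive" for Task 0 in Map 1.
args = tez_debug_args("Map 1", 0)
```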

Swimlanes
 TEZ-1332

Tez Analyzer

Thank You
