Why is my Hadoop* job slow?
Bikas Saha (@bikassaha) and Hitesh Shah
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Metrics and Monitoring
Logging and Correlation
Tracing and Analysis
Metrics and Monitoring
• Metrics as high-level pointers
• Ambari Metrics System
• Ambari Grafana integration
• HBase, HDFS, and YARN dashboards
• Metrics-based alerting
Metrics as high-level pointers
• Machine-level metrics, such as CPU load
• Application-level metrics, such as HDFS counters
• Metrics at a point in time
• Metric anomalies along a time series
• Correlated anomalies
• The catch: you need to know what to look for
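The difference between a point-in-time reading, an anomaly along a time series, and a correlated anomaly can be sketched in a few lines. This is an illustration only and not something shown in the talk: the function names, the trailing-window z-score rule, the window of 5, the threshold of 3.0, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev

def anomalies(series, window=5, threshold=3.0):
    """Return indexes whose value deviates from the mean of the preceding
    `window` points by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it rather than divide.
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

def correlated(first, second):
    """Indexes that were flagged in both series."""
    return sorted(set(first) & set(second))

# Made-up samples: a machine-level metric and an application-level metric.
cpu_load = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 6.0, 1.0, 1.1]
rpc_queue = [10, 11, 10, 12, 11, 10, 95, 12, 11]

both = correlated(anomalies(cpu_load), anomalies(rpc_queue))
print(both)  # index 6 is flagged in both series
```

A single high reading says little on its own; the same spike appearing at the same time in two otherwise unrelated metrics is a stronger pointer to where to look next.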
Ambari Metrics Service - Motivation
• Ganglia's capabilities are limited
• OpenTSDB: GPL license, and it needs a Hadoop cluster
• Need service-level aggregation as well as time-based aggregation
• Alerts based on the metrics system
• Ability to scale past 1,000 nodes
• Ability to perform analytics based on a use case
• Fine-grained control over aspects such as retention, collection intervals, and aggregation
• Pluggable and extensible
First version released with Ambari 2.0.0
Ambari Grafana Integration
• Open-source dashboard builder integrated with AMS
• Available from Ambari 2.2.2
• Predefined host-level and service-level (HDFS, HBase, YARN, etc.) dashboards
• Added to Ambari through the API after an upgrade
HBase Dashboard
HDFS Dashboard
YARN Dashboard
Metrics-based Alerting
• Top-N support to quickly identify potential offenders
• Alerting based on time series
Logging and Correlation
• HDFS and YARN audit logs
• Caller context
• YARN Application Timeline Service
• Lineage tracking of operations across workloads
• Ambari Log Search
HDFS Audit Logs and Caller Context
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.32 cmd=create src=/tmp/in/_temporary/1/_temporary/attempt_14644848874070_0009_m_009995_0/part-m-09995 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0
FSNamesystem.audit: allowed=true ugi=userA (auth:SIMPLE) ip=/172.22.68.33 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000097_0/part-m-00097 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000097_0
FSNamesystem.audit: allowed=true ugi=userB (auth:SIMPLE) ip=/172.22.68.34 cmd=create src=/tmp/in2/_temporary/1/_temporary/attempt_1464484887407_0011_m_000095_0/part-m-00095 dst=null perm=root:hdfs:rw-r--r-- proto=rpc callerContext=mr_attempt_1464484887407_0011_m_000095_0
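The audit entries above carry enough information to attribute NameNode operations to the application that issued them. The sketch below is one possible way to do that with plain string handling: it extracts `callerContext`, derives a YARN application id from the task attempt id, and counts operations per application. The function names are hypothetical, the sample lines are abbreviated copies of the entries above, and the attempt-to-application mapping is an assumption based on the id layout visible in these examples.

```python
import re
from collections import Counter

CALLER_RE = re.compile(r"callerContext=(\S+)")
# Assumption: attempt ids embed "<clusterTimestamp>_<sequence>", which matches
# the suffix of the owning application id.
APP_RE = re.compile(r"attempt_(\d+_\d+)_")

def app_of(caller_context):
    """Map a task-attempt caller context to an application id, or None."""
    match = APP_RE.search(caller_context)
    return f"application_{match.group(1)}" if match else None

def ops_per_app(lines):
    """Count audit entries per application, skipping lines without a usable context."""
    counts = Counter()
    for line in lines:
        match = CALLER_RE.search(line)
        if not match:
            continue
        app = app_of(match.group(1))
        if app:
            counts[app] += 1
    return counts

# Abbreviated versions of the three audit entries shown on the slide.
SAMPLE_LINES = [
    "FSNamesystem.audit: allowed=true ugi=userA cmd=create "
    "callerContext=tez_ta:attempt_1464484887407_0009_1_00_009995_0",
    "FSNamesystem.audit: allowed=true ugi=userA cmd=create "
    "callerContext=mr_attempt_1464484887407_0011_m_000097_0",
    "FSNamesystem.audit: allowed=true ugi=userB cmd=create "
    "callerContext=mr_attempt_1464484887407_0011_m_000095_0",
]

for app, count in ops_per_app(SAMPLE_LINES).most_common():
    print(app, count)
```

On a real audit log the same counter, sorted with `most_common()`, would surface the applications issuing the most NameNode calls.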
ResourceManager Audit Logs and Caller Context
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.32 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0001 CALLERCONTEXT=PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e
resourcemanager.RMAuditLogger: USER=userA IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0009 CALLERCONTEXT=CLI
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.34 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0008 CALLERCONTEXT=mr_attempt_1464484887407_0007_m_000000_0
resourcemanager.RMAuditLogger: USER=userB IP=172.22.68.30 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1464484887407_0012 CALLERCONTEXT=HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b
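The `CALLERCONTEXT` field makes it possible to tell what kind of client submitted each application. As a rough sketch, the code below pulls `USER`, `APPID`, and `CALLERCONTEXT` out of each entry and labels the submitter. The function names are hypothetical, and the prefix rules (`PIG-`, `CLI`, `mr_attempt_`, `HIVE_SSN_ID:`) are inferred only from the four entries above, so other frameworks or versions may use different formats.

```python
import re

FIELD_RE = re.compile(r"\b(USER|APPID|CALLERCONTEXT)=(\S+)")

def classify(caller_context):
    """Label a caller context using prefixes seen in the sample entries (assumed)."""
    if caller_context.startswith("mr_attempt_"):
        return "MR task"
    if caller_context.startswith("HIVE_SSN_ID:"):
        return "Hive session"
    if caller_context.startswith("PIG-"):
        return "Pig"
    if caller_context == "CLI":
        return "CLI"
    return "other"

def submissions(lines):
    """Return (app id, user, submitter kind) for each entry with a caller context."""
    rows = []
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if "APPID" in fields and "CALLERCONTEXT" in fields:
            rows.append((fields["APPID"], fields.get("USER"),
                         classify(fields["CALLERCONTEXT"])))
    return rows

# The four entries from the slide, with the IP field omitted for brevity.
PREFIX = ("resourcemanager.RMAuditLogger: USER={user} OPERATION=Submit Application "
          "Request TARGET=ClientRMService RESULT=SUCCESS APPID={app} CALLERCONTEXT={ctx}")
SAMPLE_LINES = [
    PREFIX.format(user="userA", app="application_1464484887407_0001",
                  ctx="PIG-pigSmoke.sh-8a052588-0013-4e39-83b1-ebad699d8e2e"),
    PREFIX.format(user="userA", app="application_1464484887407_0009", ctx="CLI"),
    PREFIX.format(user="userB", app="application_1464484887407_0008",
                  ctx="mr_attempt_1464484887407_0007_m_000000_0"),
    PREFIX.format(user="userB", app="application_1464484887407_0012",
                  ctx="HIVE_SSN_ID:f3aadf99-9e36-494b-84a1-99b685ac344b"),
]

for row in submissions(SAMPLE_LINES):
    print(row)
```

The third entry is the interesting one: an application submitted from inside an MR task attempt, which is the pattern a launcher job produces.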
YARN Application Timeline Service
• YARN service for fine-grained, application-level tracing
• Lets complex metadata be recorded as the YARN app makes progress
• Allows retrieval of this timeline data based on filters
• Can drive limited online analytics and extensive post-hoc analysis
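Filter-based retrieval can be illustrated by building a query URL for the Timeline Server's v1 REST endpoint. This is a minimal sketch under stated assumptions: the helper name and the host `timeline-host` are hypothetical, and whether an entity type indexes a `user` primary filter depends on the framework that wrote the entities. The `/ws/v1/timeline/{entityType}` path and the `primaryFilter` and `limit` query parameters come from the Timeline Server v1 REST API.

```python
from urllib.parse import urlencode

def timeline_query_url(base, entity_type, primary_filter=None, limit=None):
    """Build a Timeline Server v1 query URL for one entity type.

    primary_filter is an optional (name, value) pair; limit caps the
    number of entities returned.
    """
    params = {}
    if primary_filter is not None:
        name, value = primary_filter
        params["primaryFilter"] = f"{name}:{value}"
    if limit is not None:
        params["limit"] = limit
    url = f"{base}/ws/v1/timeline/{entity_type}"
    return f"{url}?{urlencode(params)}" if params else url

# Hypothetical host; the "user" filter name is an assumption.
url = timeline_query_url("http://timeline-host:8188", "TEZ_DAG_ID",
                         primary_filter=("user", "userA"), limit=10)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON describing the matching entities.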
Lineage Tracking using YARN Timeline
• Timeline:8188/ws/v1/timeline/TEZ_DAG_ID/dag_1464484887407_0013_1
  dagContext: { callerId: "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2",
                callerType: "HIVE_QUERY_ID",
                context: "HIVE",
                description: "select user, count(visit_id) as visits from users group by user order by visits" }
• Timeline:8188/ws/v1/timeline/HIVE_QUERY_ID/root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2
  hiveContext: { callerId: "workflow_abcd",
                 callerType: "OOZIE_ID",
                 context: "OOZIE",
                 description: "Daily ETL Summary Job" }
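The two lookups above form a chain: each entity's context names its caller by `callerType` and `callerId`, and that pair identifies the next entity to look up. The sketch below walks such a chain until no further caller is recorded. It is illustrative only: `lineage` and `fetch_from_slide` are hypothetical names, and the lookup is served from an in-memory copy of the slide's data rather than a live Timeline Server, whose real responses wrap these fields in a larger entity document.

```python
QUERY_ID = "root_20160529021115_006f8007-5840-4c64-9970-c1b506f68db2"

# Caller contexts copied from the slide, keyed by (entity type, entity id).
SLIDE_DATA = {
    ("TEZ_DAG_ID", "dag_1464484887407_0013_1"): {
        "callerId": QUERY_ID, "callerType": "HIVE_QUERY_ID", "context": "HIVE",
    },
    ("HIVE_QUERY_ID", QUERY_ID): {
        "callerId": "workflow_abcd", "callerType": "OOZIE_ID", "context": "OOZIE",
    },
}

def fetch_from_slide(entity_type, entity_id):
    """Stand-in for a Timeline Server lookup; returns the caller context or None."""
    return SLIDE_DATA.get((entity_type, entity_id))

def lineage(fetch, entity_type, entity_id, max_hops=10):
    """Follow callerType/callerId links upward and return the chain of entities."""
    chain = [(entity_type, entity_id)]
    for _ in range(max_hops):  # bounded, in case the data contains a cycle
        context = fetch(entity_type, entity_id)
        if not context:
            break
        entity_type, entity_id = context["callerType"], context["callerId"]
        chain.append((entity_type, entity_id))
    return chain

for hop in lineage(fetch_from_slide, "TEZ_DAG_ID", "dag_1464484887407_0013_1"):
    print(hop)
```

Passing the lookup in as a function keeps the traversal independent of how the contexts are obtained, so the same loop could be pointed at an HTTP-backed fetcher.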
Ambari Log Search
Tracing and Analysis
• Use Big Data methods to solve Big Data problems
• Apache Zeppelin as the analytical tool
• Hive/Tez/YARN notebooks for analysis
Zeppelin for Ad-hoc Analytics
YARN Analyzer
Tez Analyzer
Thank You

Editor's Notes

  1. It is now possible to infer which application or job did what in HDFS. Files can be traced back to the MR or Tez job, and to the specific task attempt, that created them. Using simple string manipulation and aggregations, you can find jobs inducing high load on the NameNode.
  2. Tracking which YARN application maps to which application type and instance is now much easier. It could be made easier still if "mr_attempt_1464484887407_0007_m_000000_0" pointed to an Oozie workflow instead of the MR job. A related question: who killed my application, and how (command line or web service)?