1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Operations - Past, Present, and Future
Room III
Wednesday, April 18
2:50 PM - 3:30 PM
Introduction
⬢ Who are we?
⬢ Logging - Ambari Log Search
⬢ Metrics - Anomaly Detection
⬢ Upgrades - Patch Upgrade
⬢ Core - Management Packs, Multi-Version/Instance
⬢ Recommendations - SmartSense
Today’s Speakers
Oliver Szabo Paul Codding
Logging
Oliver Szabo
Istvan Tobias
Ambari Log Search
Logging - Feature Intro
Problem Statement: Hadoop services produce a lot of logs. How can we make it easier
for operators to find the key logs they are looking for in a hundred- or thousand-node
cluster...especially if the issue occurred a week ago?
Challenge: How do we build a scalable logging infrastructure that is aware of all of our
components, their dependencies, and can parse all of the logs from 30+ projects?
Key Requirements:
- Provide a turnkey solution to logging for all of our HDP products that a
- Make it easy and fun to use
Logging - Technical Approach
⬢ Log aggregation, analysis, and visualization for Ambari-managed services
⬢ Store Logs + Search Logs + Centralized
⬢ Basic components: Solr + ZooKeeper + Log Feeder + Log Search
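Log Feeder has to turn raw lines from 30+ projects into searchable fields before they reach Solr. As a rough illustration only, here is a minimal Python sketch that parses one common log4j-style Hadoop line layout into a record. The regular expression, the field names, and the `parse_line` helper are assumptions for illustration; this is not Log Feeder's code or its configuration format, and real services use many different layouts.

```python
import re
from typing import Optional

# Hypothetical pattern for ONE log4j-style layout, e.g.:
# 2018-04-18 14:50:01,123 INFO  namenode.NameNode (NameNode.java:main(1234)) - STARTUP_MSG
# Assumption: other services need other patterns.
LOG4J_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)\s+"
    r"\((?P<file>[^:]+):(?P<method>[^(]+)\((?P<line>\d+)\)\)\s+-\s+"
    r"(?P<message>.*)$"
)


def parse_line(line: str, component: str) -> Optional[dict]:
    """Turn one raw log line into a structured record, or None if it does not match."""
    match = LOG4J_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    # Tag the record with its source component so searches can filter on it.
    record["component"] = component
    return record
```

Lines that do not match return `None`; a real shipper would try several patterns or keep the raw line rather than drop it.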
Logging - Architecture
Logging - Architecture (from Ambari 2.7+)
Logging - Pluggable components
⬢ Config API: ZooKeeper by default
⬢ Log Shipper: LogFeeder
⬢ Log Collection: Solr by default
– Log Search server acts as a proxy (other authentication mechanisms can be added)
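The pluggable split above can be pictured as a few interfaces with swappable backends. The sketch below is hypothetical: `ConfigStore`, `LogStore`, `LogShipper`, and the in-memory implementations are illustrative names standing in for ZooKeeper, Solr, and Log Feeder, not Ambari classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ConfigStore(ABC):
    """Holds per-service shipper input configs (ZooKeeper by default per the slide)."""

    @abstractmethod
    def get_input_config(self, service: str) -> dict:
        ...


class LogStore(ABC):
    """Indexes and searches shipped log records (Solr by default per the slide)."""

    @abstractmethod
    def index(self, records: List[dict]) -> None:
        ...

    @abstractmethod
    def search(self, **filters: str) -> List[dict]:
        ...


class InMemoryConfigStore(ConfigStore):
    """Stand-in backend for illustration."""

    def __init__(self, configs: Dict[str, dict]):
        self._configs = configs

    def get_input_config(self, service: str) -> dict:
        return self._configs[service]


class InMemoryLogStore(LogStore):
    """Stand-in backend: exact-match filtering over a list."""

    def __init__(self) -> None:
        self._records: List[dict] = []

    def index(self, records: List[dict]) -> None:
        self._records.extend(records)

    def search(self, **filters: str) -> List[dict]:
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


class LogShipper:
    """Plays the Log Feeder role: read config, tag lines, push to the log store."""

    def __init__(self, config_store: ConfigStore, log_store: LogStore):
        self.config_store = config_store
        self.log_store = log_store

    def ship(self, service: str, lines: List[str]) -> int:
        config = self.config_store.get_input_config(service)
        records = [{"component": config["component"], "message": line} for line in lines]
        self.log_store.index(records)
        return len(records)
```

Swapping a backend means writing another `ConfigStore` or `LogStore` subclass; the shipper does not change.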
Logging - Notifications (Ambari 3.0)
⬢ Do something with the logs!
⬢ Send notifications (like email with reports)
⬢ Notifications based on policies
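A minimal sketch of policy-based notifications, assuming a policy is simply "notify when a component logs at least N lines at a given level in the evaluated batch". `Policy`, `evaluate`, and the message format are hypothetical; the slide does not describe how Ambari 3.0 policies are defined.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Policy:
    """Hypothetical policy: alert when a component logs too many lines at a level."""
    component: str
    level: str
    threshold: int
    recipient: str


def evaluate(policies: Iterable[Policy], records: Iterable[dict]) -> List[str]:
    # Count records per (component, level) over the batch being evaluated.
    counts = Counter((r["component"], r["level"]) for r in records)
    notifications = []
    for p in policies:
        seen = counts[(p.component, p.level)]
        if seen >= p.threshold:
            notifications.append(f"To {p.recipient}: {p.component} logged {seen} {p.level} lines")
    return notifications
```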
Ambari Log Search Demo
Logging - Timelines
⬢ Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
⬢ Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
⬢ Ambari 2.7.0.0 (1H 2018): HDP 3.0 GA
⬢ Ambari 3.0.0.0 (2H 2018): HDP 3.1 GA
⬢ Log Search milestones: Tech Preview, GA
Metrics
Sid Wagle
Aravindan Vijayan
Ambari Metrics System - Anomaly Detection
Anomaly Detection - Feature Intro
Problem Statement: Hadoop services produce a lot of metrics, and an operator can
only stare at a wall of graphs for so long...how do we ensure that we only bring
attention to those metrics that are messed up?
Challenge: How do we build a system that can detect both point and trend anomalies
in multiple different time series data streams?
Key Requirements:
- Provide a way to “watch” key component metrics and detect when there is a non-normal change in those metrics
- Do all of this without creating too many false positives
Anomaly Detection - Technical Approach
Types of Anomalies:
⬢ Point in Time Anomaly
⬢ Trend Anomaly
⬢ Correlation Anomaly
Set of 3 independent subsystems
Every configured metric will be processed by every subsystem
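Since every configured metric goes through every subsystem independently, the fan-out can be sketched as a simple loop. The `run_all` helper and the detector signature (a function from a series to the indexes it considers anomalous) are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical detector signature: one metric's series in, anomalous indexes out.
Detector = Callable[[Sequence[float]], List[int]]


def run_all(
    metrics: Dict[str, Sequence[float]],
    subsystems: Dict[str, Detector],
) -> Dict[str, Dict[str, List[int]]]:
    """Feed every configured metric to every subsystem independently."""
    results: Dict[str, Dict[str, List[int]]] = {}
    for metric_name, series in metrics.items():
        # Each subsystem sees the same series and keeps its own results.
        results[metric_name] = {name: detect(series) for name, detect in subsystems.items()}
    return results
```

Because the subsystems share no state, they could also run in parallel; the sequential loop just keeps the sketch short.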
Point In Time Subsystem
⬢ EMA & Tukey’s
Trend Anomaly Subsystem
⬢ KS Test & Historical Standard Deviation
Correlation Anomaly Subsystem
⬢ Application of Isolation Forest
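The slide names the techniques but not how they are wired together. Below is a minimal sketch under stated assumptions: a point-in-time check that compares each value with the exponential moving average (EMA) of the values before it and flags residuals outside Tukey's fences, and a trend check that runs an asymptotic two-sample Kolmogorov-Smirnov (KS) test between a historical window and a recent window. The smoothing factor `alpha`, the fence multiplier `k`, and the significance level are hypothetical defaults; Isolation Forest is left out.

```python
import math
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def ema_residuals(series: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Difference between each point and the EMA of the points before it."""
    ema = series[0]
    residuals = [0.0]
    for value in series[1:]:
        residuals.append(value - ema)          # forecast error for this point
        ema = alpha * value + (1 - alpha) * ema
    return residuals


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indexes of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def point_anomalies(series: Sequence[float], alpha: float = 0.3, k: float = 1.5) -> List[int]:
    return tukey_outliers(ema_residuals(series, alpha), k)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over all sample points."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)     # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)     # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def trend_changed(history: Sequence[float], recent: Sequence[float],
                  significance: float = 0.05) -> bool:
    """Reject 'same distribution' when D exceeds the asymptotic critical value."""
    n, m = len(history), len(recent)
    c = math.sqrt(-math.log(significance / 2) / 2)
    return ks_statistic(history, recent) > c * math.sqrt((n + m) / (n * m))
```

One design caveat: on a perfectly flat history the interquartile range is zero, so every non-zero residual after a spike is flagged, including the decay back to normal. A production detector would need a minimum spread or a longer window to keep false positives down.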
What can we do when a system event of interest occurs?
Event: HBase Region Server crashes
⬢ We can provide a dynamic Grafana dashboard which has
– Snapshot of the metrics & anomalies in the RS profile
– Snapshot of the metrics in related and correlated profiles
– Highlight RS profile anomalies that occurred in the last N minutes
⬢ Can be used as a launching pad for exploring the crash.
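Highlighting the RS profile anomalies from the last N minutes is essentially a time-window filter. A minimal sketch, assuming anomalies are already stored with a metric name, a component, and a timestamp; the `Anomaly` type, `recent_anomalies`, and the component names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Anomaly:
    metric: str
    component: str      # hypothetical component key, e.g. "HBASE_REGIONSERVER"
    timestamp: datetime


def recent_anomalies(anomalies: List[Anomaly], component: str,
                     event_time: datetime, minutes: int) -> List[Anomaly]:
    """Anomalies for one component profile in the N minutes leading up to an event."""
    window_start = event_time - timedelta(minutes=minutes)
    hits = [a for a in anomalies
            if a.component == component and window_start <= a.timestamp <= event_time]
    # Oldest first, so the dashboard reads as a timeline toward the crash.
    return sorted(hits, key=lambda a: a.timestamp)
```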
Anomaly Detection - Timelines
Upgrades
Jonathan Hurley
Nate Cole
Ambari Patch Upgrade
Patch Upgrade - Feature Intro
Problem Statement: Bugs happen, and Hortonworks support is there to help
customers get a fix for those bugs...but how do we make it easy for customers to apply
those fixes and not have to upgrade everything?
Challenge: How do we build a system that can allow customers to easily apply, revert,
and track applied patches?
Key Requirements:
- Make applying a patch feel just like any other upgrade...but upgrade only the
affected services
- Allow users to revert patches
Patch Upgrade - Technical Approach
A Problem With Ambari's Architecture…
⬢ Ambari only dealt with stacks and services as a whole.
⬢ A cluster was always bound to a single repository at a time.
⬢ Upgrades could be accomplished, but would require every service to participate
– For larger stack changes, this was fine. However, this also meant that applying simple
patches for bug fixes touched the entire cluster.
Diagram: HDFS, YARN, and ZooKeeper move together on every upgrade
HDP 2.6 (2.6.0.0-1234) → Upgrade → HDP 2.6 (2.6.1.0-5555) → Upgrade → HDP 2.6 (2.6.2.0-7890)
⬢ Allowing the cluster to be associated with multiple stacks would not only be a
monumental change, it would also require a testing matrix that could not
possibly be supported.
⬢ However, allowing individual services and components to be associated with
different repositories was possible.
– The only restriction is that each repository is still a part of the same major stack version.
Diagram: individual services within HDP 2.6 upgraded independently
HDFS (2.6.0.0-1234), YARN (2.6.0.0-1234), ZooKeeper (2.6.0.0-1234)
→ Upgrade → HDFS (2.6.0.0-1234), YARN (2.6.0.0-1234), ZooKeeper (2.6.1.0-1111)
→ Upgrade → HDFS (2.6.2.5-9999), YARN (2.6.0.0-1234), ZooKeeper (2.6.1.0-1111)
→ Upgrade → HDFS (2.6.3.0-0001), YARN (2.6.3.0-0001), ZooKeeper (2.6.1.0-1111)
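Per-service versioning can be modelled as a map from service to repository version, where an upgrade only touches the services it lists. This sketch assumes "same major stack version" means the first two version fields match (for example `2.6`); `stack_line` and `apply_upgrade` are hypothetical helpers, not Ambari code.

```python
from typing import Dict


def stack_line(version: str) -> str:
    """'2.6.3.0-0001' -> '2.6' (assumed: the stack version a repository belongs to)."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def apply_upgrade(current: Dict[str, str], patch: Dict[str, str]) -> Dict[str, str]:
    """Return a new service -> version map; only services named in the patch change."""
    for service, version in patch.items():
        if service not in current:
            raise KeyError(f"{service} is not installed")
        # Every repository must stay within the cluster's stack version.
        if stack_line(version) != stack_line(current[service]):
            raise ValueError(
                f"{service} {version} is outside stack {stack_line(current[service])}")
    updated = dict(current)   # leave the caller's map untouched
    updated.update(patch)
    return updated
```

For example, applying `{"ZOOKEEPER": "2.6.1.0-1111"}` to a cluster where every service is at `2.6.0.0-1234` changes only ZooKeeper, while a `3.0.0.0` version is rejected.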
Version Definition File (VDF) - A single XML file that describes the contents of a
repository, such as services, versions, repository URLs, etc.
⬢ Ambari needed a way to determine if a repository only contained a subset of
services. This would allow the software to understand what kind of upgrade was
being performed.
<repository-version>
  <release>
    <type>PATCH</type>
    <stack-id>HDP-2.6</stack-id>
    <version>2.6.3.0</version>
    <build>235</build>
    <compatible-with>2.6.3.\d+</compatible-with>
    <release-notes>http://example.com</release-notes>
    <display>HDP-2.6.3.0-235</display>
  </release>
  <manifest>
    <service id="STORM-110" name="STORM" version="1.1.0"/>
  </manifest>
  <available-services>
    <service idref="STORM-110"/>
  </available-services>
  <repository-info>
    <os family="redhat6">
      <package-version>2_6_3_0_*</package-version>
      <repo>
        <baseurl>http://repo.ambari.apache.org/hdp/centos6/HDP-2.6.3.0-235</baseurl>
      </repo>
    </os>
  </repository-info>
</repository-version>
⬢ Defines the repository as a PATCH
repository
⬢ Contains only STORM
⬢ Specifies version 2.6.3.0-235
⬢ Contains the URL for packages
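To show how a tool could tell a PATCH repository from a full one, here is a minimal sketch that reads the highlighted VDF fields with Python's standard `xml.etree.ElementTree`. The `VdfSummary` type and the helper names are hypothetical, and the sample is trimmed to the fields discussed here, wrapped in a `<repository-version>` root element.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class VdfSummary(NamedTuple):
    release_type: str
    display: str
    services: List[str]
    base_urls: List[str]


def summarize_vdf(xml_text: str) -> VdfSummary:
    root = ET.fromstring(xml_text)
    # Map manifest ids (e.g. "STORM-110") to service names (e.g. "STORM").
    manifest = {s.get("id"): s.get("name") for s in root.iterfind("manifest/service")}
    available = [manifest[s.get("idref")] for s in root.iterfind("available-services/service")]
    return VdfSummary(
        release_type=root.findtext("release/type", default=""),
        display=root.findtext("release/display", default=""),
        services=available,
        base_urls=[e.text for e in root.iterfind("repository-info/os/repo/baseurl")],
    )


def is_patch(summary: VdfSummary) -> bool:
    return summary.release_type == "PATCH"


SAMPLE = """<repository-version>
  <release><type>PATCH</type><display>HDP-2.6.3.0-235</display></release>
  <manifest><service id="STORM-110" name="STORM" version="1.1.0"/></manifest>
  <available-services><service idref="STORM-110"/></available-services>
  <repository-info><os family="redhat6"><repo>
    <baseurl>http://repo.ambari.apache.org/hdp/centos6/HDP-2.6.3.0-235</baseurl>
  </repo></os></repository-info>
</repository-version>"""
```

Calling `summarize_vdf(SAMPLE)` reports a PATCH release whose only service is STORM.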
⬢ Ambari is now able to track versions of services and components independently
⬢ Depending on the command being generated, Ambari can determine the correct
version information to send to the agents
– Writing out configuration values which contain versioned paths
– Starting the correct version of a service
⬢ During distribution of a PATCH repository, Ambari can also target specific hosts
which only contain the services from the VDF
– Provides faster installations which prevent RPM bloat on hosts which are not involved
– Allows for a much faster upgrade since only specific services are restarted
⬢ A larger problem still remained … HADOOP!
– Jobs launched from one host assumed that the versions of components remained constant
across the cluster
– Some components assume that dependent components will match their specific versions
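Host targeting for a PATCH distribution reduces to a membership check: a host is included if any of its components belongs to a service listed in the VDF. A minimal sketch; the host-to-component and component-to-service maps are assumed inputs, and `hosts_for_patch` and the component names are illustrative.

```python
from typing import Dict, List, Set


def hosts_for_patch(
    host_components: Dict[str, Set[str]],
    component_service: Dict[str, str],
    patch_services: Set[str],
) -> List[str]:
    """Hosts running at least one component of a service included in the patch VDF."""
    targets = []
    for host, components in sorted(host_components.items()):
        if any(component_service[c] in patch_services for c in components):
            targets.append(host)
    return targets
```

Hosts that run none of the patched services receive no packages, which is what keeps unrelated hosts free of extra RPMs.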
Register Version
Install Version
Upgrade & Revert
Patch Upgrade - Timelines
⬢ Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
⬢ Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
⬢ Ambari 2.7.0.0 (Summer 2018): HDP 3.0 GA
Core
Jayush Luniya
Madhuvanthi Radhakrishnan
Swapan Shridhar
Scott Duan
Ambari Management Packs
Core - Feature Intro
Problem Statement: When we started HDP, we had one product with 8 services; now
we have over 30, spread across multiple products...To handle these new
complexities, Ambari needs to change.
Challenge: How do we build a system that allows multiple Hortonworks products
and third-party products to work together seamlessly...while keeping upgrade,
patch, and lifecycle operations completely automated?
Key Requirements:
- Allow Ambari to mix and match multiple products and services on the same
clusters
- Make it easy for partners and users to create their own software that’s managed
by Ambari
Core - Technical Approach
Ambari 2.x: Stacks
● Stack definitions shipped with Ambari
● Ambari 2.x MPacks used to “shim” services into an existing stack
● Only stack services can participate in upgrades (not stack extensions, or services
added as MPacks)
● Hard 1:1 relationships
Ambari 3.x: MPacks
● Stack definitions externalized into MPacks
● MPacks are stand-alone stacks
● MPack services get full Ambari upgrade automation
● Flexible relationships
Diagram: MPack Repository with HDP 3.1.0, BigSQL 5.0.2, and HDF 3.1.0 management packs
A Module Definition is:
● Built as a tarball
● Equivalent to Service Definition
● Also contains module.json
● hdfs-3.0.0.0-b123-definition.tar.gz
An Ambari 3.x Management Pack is:
● Built as a tarball
● Containing module definitions
● Equivalent to Stack Definition
● Also contains mpack.json
● hdpcore-1.0.0-b22-definition.tar.gz
Module RPMs:
● Actual install bits for the module
● hdfs_3_0_0_0_b123.rpm
MPack Meta RPM:
● Meta RPM to install all module RPMs
● hdp_3_1_0_b22.rpm
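The example file names above suggest a naming pattern: definition tarballs keep the dotted version, while RPM names replace dots with underscores. The sketch below simply encodes that pattern as inferred from these examples; it is not a documented Ambari convention. Note that the MPack meta RPM (`hdp_3_1_0_b22.rpm`) does not share the name or version of its definition tarball (`hdpcore-1.0.0-b22`), so the two must be passed in separately.

```python
def definition_tarball(name: str, version: str, build: str) -> str:
    """e.g. ('hdfs', '3.0.0.0', 'b123') -> 'hdfs-3.0.0.0-b123-definition.tar.gz'"""
    return f"{name}-{version}-{build}-definition.tar.gz"


def package_rpm(name: str, version: str, build: str) -> str:
    """e.g. ('hdfs', '3.0.0.0', 'b123') -> 'hdfs_3_0_0_0_b123.rpm' (dots become underscores)"""
    return f"{name}_{version.replace('.', '_')}_{build}.rpm"
```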
An MPack Repository:
● Holds references and metadata for
MPacks
● Ambari supports multiple MPack
repositories
● Allows operators to search and
discover management packs
● Stores compatibility between
management packs
● Provides recommendations for
MPack bundles
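A minimal in-memory model of what an MPack repository is described as storing: searchable metadata and recorded compatibility between management packs. `MPackRepository` and its methods are hypothetical, as is any compatibility data fed into it; the slide does not describe the repository's format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

MPackId = Tuple[str, str]  # (name, version), e.g. ("HDP", "3.1.0")


@dataclass
class MPackRepository:
    """Hypothetical model of an MPack repository's metadata."""
    mpacks: Dict[MPackId, str] = field(default_factory=dict)              # id -> description
    compatible: Set[FrozenSet[MPackId]] = field(default_factory=set)    # known-good pairs

    def add(self, name: str, version: str, description: str) -> None:
        self.mpacks[(name, version)] = description

    def mark_compatible(self, a: MPackId, b: MPackId) -> None:
        self.compatible.add(frozenset((a, b)))

    def search(self, text: str) -> List[MPackId]:
        """Case-insensitive match on MPack name or description."""
        text = text.lower()
        return sorted(i for i, desc in self.mpacks.items()
                      if text in i[0].lower() or text in desc.lower())

    def is_valid_bundle(self, bundle: List[MPackId]) -> bool:
        """Valid only if every pair of MPacks in the bundle is recorded as compatible."""
        return all(frozenset((a, b)) in self.compatible
                   for i, a in enumerate(bundle) for b in bundle[i + 1:])
```

Treating a bundle as valid only when every pair is known to be compatible is a deliberately strict choice; a real repository might instead record version ranges.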
Core - Timelines
⬢ Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
⬢ Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
⬢ Ambari 2.7.0.0 (1H 2018): HDP 3.0 GA
⬢ Ambari 3.0.0.0 (2H 2018): HDP 3.1 GA
Recommendations
Sheetal Dolas
Beau Plath
Aditya Pathak
Cabir Zounaidou
Better Configuration & Performance
Recommendations - Feature Intro
Problem Statement: Hadoop services have many configurations, and our users' use
cases change and morph over time. How can we make sure that customers'
configurations stay optimal as their use of the cluster changes?
Challenge: Ambari’s stack advisor logic is shipped with each version of Ambari. How
can we provide fresh advice to customers running software shipped two years ago?
Key Requirements:
- Provide a way to continuously analyze a customer's cluster and make
recommendations using up-to-date best practices
- Provide an easy way for customers to review and apply those recommendations in
Ambari
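Reviewing before applying is essentially a diff between the current and the recommended configuration, followed by applying only what the operator accepts. A minimal sketch; the nested `{config_type: {key: value}}` layout and the helper names are assumptions, and any property values used with it are illustrative rather than actual SmartSense recommendations.

```python
from typing import Dict, List, NamedTuple, Optional

Config = Dict[str, Dict[str, str]]  # assumed layout: config_type -> {key: value}


class Change(NamedTuple):
    config_type: str
    key: str
    current: Optional[str]     # None when the property is not set today
    recommended: str


def diff_recommendations(current: Config, recommended: Config) -> List[Change]:
    """List every property whose recommended value differs from what is set today."""
    changes = []
    for config_type, props in sorted(recommended.items()):
        existing = current.get(config_type, {})
        for key, value in sorted(props.items()):
            if existing.get(key) != value:
                changes.append(Change(config_type, key, existing.get(key), value))
    return changes


def apply_accepted(current: Config, accepted: List[Change]) -> Config:
    """Return a new config with only the operator-accepted changes applied."""
    updated = {t: dict(p) for t, p in current.items()}
    for c in accepted:
        updated.setdefault(c.config_type, {})[c.key] = c.recommended
    return updated
```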
Recommendations - Technical Approach
Challenge 1: Collecting configuration, metrics, logs for all components and all services
Challenge 2: Anonymizing, encrypting, and sending those logs to Hortonworks on a
scheduled basis
Challenge 3: Making recommendations based on the input diagnostics
Challenge 4: Making it easy for customers to apply these recommendations in Ambari
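For Challenge 2, one common anonymization approach (not necessarily the one SmartSense uses) is to replace hostnames and IP addresses with keyed hashes: the same host maps to the same token across files, but the original value is hard to recover without the key. A minimal sketch; the token format, the `anonymize` helper, and the IPv4-only pattern are assumptions, and encryption and scheduled sending are out of scope.

```python
import hashlib
import hmac
import re
from typing import Iterable

# Assumption: only IPv4 addresses are matched; IPv6 would need another pattern.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _token(secret: bytes, value: str, prefix: str) -> str:
    # Keyed hash (HMAC-SHA256), truncated for readability.
    digest = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()[:12]
    return f"{prefix}-{digest}"


def anonymize(text: str, hostnames: Iterable[str], secret: bytes) -> str:
    # Replace longer hostnames first so "nn1" cannot clobber "nn1.example.com".
    for host in sorted(hostnames, key=len, reverse=True):
        text = text.replace(host, _token(secret, host, "host"))
    return IPV4.sub(lambda m: _token(secret, m.group(0), "ip"), text)
```

Using a keyed hash rather than a plain hash matters because hostnames and private IPs are low-entropy; without a key, anyone could hash candidate values and match the tokens.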
Recommendations - Collection
Diagram: Ambari Server; Ambari Agents on six worker nodes; bundle
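The collection step can be pictured as gathering per-host files into a single compressed bundle. This sketch uses Python's standard `tarfile` and adds a small manifest; `build_bundle` and the bundle layout (`<host>/<path>`, `manifest.json`) are hypothetical choices, not the SmartSense bundle format.

```python
import io
import json
import tarfile
from typing import Dict


def build_bundle(per_host: Dict[str, Dict[str, str]]) -> bytes:
    """Pack {host: {relative_path: content}} into one in-memory .tar.gz bundle."""
    manifest = {host: sorted(files) for host, files in per_host.items()}
    entries = {"manifest.json": json.dumps(manifest, sort_keys=True)}
    for host, files in per_host.items():
        for path, content in files.items():
            entries[f"{host}/{path}"] = content     # one directory per host

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in sorted(entries.items()):
            data = content.encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```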
Recommendations - Sending
Diagram: Ambari Server; Ambari Agents on six worker nodes; bundle; Gateway; Landing Zone
Recommendations - Recommend
Diagram: Ambari Server; Ambari Agents on six worker nodes; bundle; Gateway; Landing Zone; SmartSense Analytics
Recommendations - Apply
Diagram: Ambari Server; Ambari Agents on six worker nodes; bundle; Gateway; Landing Zone; SmartSense Analytics
Recommendations - Timelines
⬢ Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
⬢ Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
⬢ Ambari 2.7.0.0 (Summer 2018): HDP 3.0 GA
Summary
⬢ Logging - Ambari Log Search
⬢ Metrics - Anomaly Detection
⬢ Upgrades - Patch Upgrade
⬢ Core - Management Packs, Multi-Version/Instance
⬢ Recommendations - SmartSense
Releases: Ambari 2.6.0.0 (Oct 2017), Ambari 2.6.1.0 (Jan 2018), Ambari 2.7.0.0 (1H 2018), Ambari 3.0.0.0 (2H 2018)
Available Now: Patch Upgrade, Recommendations
Coming Soon: Log Search, Anomaly Detection, Management Packs
Questions?
Thank you for attending!

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Hadoop Operations - Past, Present, and Future

  • 1. Hadoop Operations - Past, Present, and Future. Room III, Wednesday, April 18, 2:50 PM - 3:30 PM. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. Introduction ⬢ Who are we? ⬢ Logging - Ambari Log Search ⬢ Metrics - Anomaly Detection ⬢ Upgrades - Patch Upgrade ⬢ Core - Management Packs, Multi-Version/Instance ⬢ Recommendations - SmartSense
  • 3. Today’s Speakers: Oliver Szabo, Paul Codding
  • 4. Logging: Ambari Log Search (Oliver Szabo, Istvan Tobias)
  • 5. Logging - Feature Intro
    Problem Statement: Hadoop services produce a lot of logs. How can we make it easier for operators to find the key logs they are looking for in a hundred- or thousand-node cluster, especially if the issue occurred a week ago?
    Challenge: How do we build a scalable logging infrastructure that is aware of all of our components and their dependencies, and that can parse the logs from 30+ projects?
    Key Requirements:
    - Provide a turnkey logging solution for all of our HDP products that a
    - Make it easy and fun to use
  • 6. Logging - Technical Approach ⬢ Log aggregation, analysis, and visualization for Ambari-managed services ⬢ Centralized storage and search of logs ⬢ Basic components: Solr + ZooKeeper + Log Feeder + Log Search
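To make the Log Feeder's role more concrete, here is a minimal Python sketch of turning raw service log lines into structured documents that a search index such as Solr could store. It assumes a log4j-style layout (`timestamp LEVEL [thread] logger: message`); the regular expression, the `LogRecord` type, and the document field names are hypothetical illustrations, not Log Feeder's actual parser configuration or Log Search's Solr schema.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# Assumed log4j-style layout (hypothetical), e.g.
# "2018-04-18 14:50:01,123 ERROR [main] org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?:\[(?P<thread>[^\]]*)\]\s+)?"
    r"(?P<logger>[\w.$]+):\s?(?P<message>.*)$"
)


@dataclass
class LogRecord:
    component: str
    host: str
    timestamp: datetime
    level: str
    thread: Optional[str]
    logger: str
    message: str

    def to_document(self) -> Dict[str, str]:
        # Hypothetical field names for a search-index document.
        return {
            "type": self.component,
            "host": self.host,
            "logtime": self.timestamp.isoformat(),
            "level": self.level,
            "thread_name": self.thread or "",
            "logger_name": self.logger,
            "log_message": self.message,
        }


def parse_line(line: str, component: str, host: str) -> Optional[LogRecord]:
    """Parse one line; return None if the line does not start a new log event."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogRecord(
        component=component,
        host=host,
        timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
        level=m["level"],
        thread=m["thread"],
        logger=m["logger"],
        message=m["message"],
    )


def parse_stream(lines: Iterable[str], component: str, host: str) -> List[LogRecord]:
    """Group lines into records; unparseable lines (e.g. stack-trace frames)
    are appended to the message of the preceding record."""
    records: List[LogRecord] = []
    for line in lines:
        record = parse_line(line, component, host)
        if record is not None:
            records.append(record)
        elif records:
            records[-1].message += "\n" + line.rstrip("\n")
        # Continuation lines seen before any record are dropped.
    return records
```

Folding continuation lines into the previous event keeps a multi-line Java stack trace searchable as one log entry rather than dozens of fragments.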
  • 7. Logging - Architecture
  • 8. Logging - Architecture
  • 9. Logging - Architecture
  • 10. Logging - Architecture
  • 11. Logging - Architecture
  • 12. Logging - Architecture
  • 13. Logging - Architecture (from Ambari 2.7+)
  • 14. Logging - Pluggable components ⬢ Config API: ZooKeeper by default ⬢ Log Shipper: Log Feeder ⬢ Log Collection: Solr by default – the Log Search server acts as a proxy (other authentication mechanisms can be added)
  • 15. Logging - Notifications (Ambari 3.0) ⬢ Do something with the logs! ⬢ Send notifications (e.g. email reports) ⬢ Notifications based on policies
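The slide only names "notifications based on policies"; the following is a minimal sketch of what one such policy could look like, assuming a policy means "at least N events from a component at or above a log level within a time window". The `Policy` and `LogEvent` types, the level ordering, and the report text are all hypothetical and do not describe the actual Ambari 3.0 design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple

# Assumed severity order, lowest to highest.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


class LogEvent(NamedTuple):
    timestamp: datetime
    component: str
    level: str


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: alert when at least `threshold` events of
    `component` at or above `min_level` occur within `window`."""
    name: str
    component: str
    min_level: str
    threshold: int
    window: timedelta


def matching_count(policy: Policy, events: Iterable[LogEvent], now: datetime) -> int:
    floor = LEVELS.index(policy.min_level)
    start = now - policy.window
    return sum(
        1
        for e in events
        if e.component == policy.component
        and LEVELS.index(e.level) >= floor
        and start <= e.timestamp <= now
    )


def notifications(policies: Iterable[Policy], events: List[LogEvent], now: datetime) -> List[str]:
    """Return one report line per policy whose threshold is reached."""
    reports = []
    for policy in policies:
        count = matching_count(policy, events, now)
        if count >= policy.threshold:
            minutes = int(policy.window.total_seconds() // 60)
            reports.append(
                f"{policy.name}: {count} {policy.component} events at "
                f"{policy.min_level} or above in the last {minutes} min"
            )
    return reports
```

The report strings could then be handed to whatever delivery channel is configured (for example, an email report); that step is outside this sketch.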
  • 16. Ambari Log Search Demo
  • 17.–21. (demo screenshots, no text)
  • 22. Logging - Timelines
    - Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
    - Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
    - Ambari 2.7.0.0 (1H 2018): HDP 3.0 GA
    - Ambari 3.0.0.0 (2H 2018): HDP 3.1 GA
    Milestones marked on the timeline: Log Search Tech Preview, Log Search GA
  • 23. Metrics: Ambari Metrics System - Anomaly Detection (Sid Wagle, Aravindan Vijayan)
  • 24. Anomaly Detection - Feature Intro
    Problem Statement: Hadoop services produce a lot of metrics, and an operator can only stare at a wall of graphs for so long. How do we make sure we draw attention only to the metrics that are misbehaving?
    Challenge: How do we build a system that can detect both point and trend anomalies across many different time-series data streams?
    Key Requirements:
    - Provide a way to “watch” key component metrics and detect when those metrics change abnormally
    - Do all of this without producing too many false positives
  • 25. Anomaly Detection - Technical Approach. Types of anomalies: ⬢ Point-in-time anomaly ⬢ Trend anomaly ⬢ Correlation anomaly
  • 26. Anomaly Detection - Technical Approach. A set of three independent subsystems; every configured metric is processed by every subsystem.
  • 27. Anomaly Detection - Technical Approach
    Point-in-Time Anomaly Subsystem ⬢ EMA (exponential moving average) & Tukey's
    Trend Anomaly Subsystem ⬢ Kolmogorov-Smirnov (KS) test & historical standard deviation
    Correlation Anomaly Subsystem ⬢ Application of Isolation Forest
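The point-in-time and trend subsystems name well-known techniques, so here is a small, dependency-free Python sketch of each, under stated assumptions: point anomalies are flagged when a value's residual against the EMA of the preceding values falls outside Tukey's fences (Q1 - k·IQR, Q3 + k·IQR), and a trend anomaly is flagged when the two-sample KS statistic between a historical window and a recent window exceeds the standard asymptotic critical value. The parameter defaults (`alpha=0.3`, `k=1.5`, significance 0.05) are illustrative choices, not Ambari Metrics settings. The historical-standard-deviation check and the Isolation Forest subsystem are not sketched.

```python
import math
from typing import List, Sequence


def ema_residuals(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Residual of each point (from the second on) against the EMA of the points before it."""
    ema = values[0]
    residuals = []
    for x in values[1:]:
        residuals.append(x - ema)
        ema = alpha * x + (1 - alpha) * ema
    return residuals


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between closest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def point_anomalies(values: Sequence[float], alpha: float = 0.3, k: float = 1.5) -> List[int]:
    """Indices of points whose EMA residual falls outside Tukey's fences."""
    residuals = ema_residuals(values, alpha)
    s = sorted(residuals)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # residuals[i] belongs to values[i + 1]
    return [i + 1 for i, r in enumerate(residuals) if r < low or r > high]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def trend_anomaly(history: Sequence[float], recent: Sequence[float], alpha: float = 0.05) -> bool:
    """Flag a trend change when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(history), len(recent)
    c = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(history, recent) > c * math.sqrt((n + m) / (n * m))
```

For example, `point_anomalies([10, 11, 10, 12, 11, 10, 11, 50, 11, 10, 12, 11])` flags only index 7. Note that the spike also drags the EMA upward, so the point right after it gets a large negative residual that lands just inside the fence here; a production detector might skip EMA updates for flagged points to avoid that echo.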
  • 28. Anomaly Detection - Technical Approach
    What can we do when a system event of interest occurs?
    Event: an HBase RegionServer (RS) crashes
    ⬢ We can provide a dynamic Grafana dashboard that contains
      – a snapshot of the metrics and anomalies in the RS profile
      – a snapshot of the metrics in related and correlated profiles
      – highlighted RS profile anomalies that occurred in the last N minutes
    ⬢ The dashboard can serve as a launching pad for investigating the crash.
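Selecting what such a dashboard should highlight boils down to a filter over detected anomalies. Below is a minimal sketch assuming each anomaly carries a profile name, a metric name, and a timestamp, and that "related profiles" are supplied as a simple mapping; the `Anomaly` type, the profile and metric names, and the 30-minute default for N are hypothetical, not the Ambari Metrics data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass(frozen=True)
class Anomaly:
    profile: str  # hypothetical profile name, e.g. "HBASE_REGIONSERVER"
    metric: str
    timestamp: datetime


def anomalies_for_event(
    anomalies: List[Anomaly],
    event_profile: str,
    event_time: datetime,
    related: Dict[str, Set[str]],
    window_minutes: int = 30,
) -> List[Anomaly]:
    """Anomalies from the event's profile and its related profiles in the last N minutes,
    with the event's own profile first and newest anomalies first within each group."""
    start = event_time - timedelta(minutes=window_minutes)
    profiles = {event_profile} | related.get(event_profile, set())
    selected = [
        a for a in anomalies
        if a.profile in profiles and start <= a.timestamp <= event_time
    ]
    selected.sort(key=lambda a: a.timestamp, reverse=True)
    # Python's sort is stable, so this keeps newest-first order within each group.
    selected.sort(key=lambda a: a.profile != event_profile)
    return selected
```

Putting the crashed component's own anomalies first mirrors the slide's emphasis on the RS profile, with related profiles as supporting context.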
  • 29. Anomaly Detection - Technical Approach
  • 30. Anomaly Detection - Timelines
  • 31. Upgrades: Ambari Patch Upgrade (Jonathan Hurley, Nate Cole)
  • 32. Patch Upgrade - Feature Intro
    Problem Statement: Bugs happen, and Hortonworks support is there to help customers get fixes for them. But how do we make it easy for customers to apply those fixes without having to upgrade everything?
    Challenge: How do we build a system that lets customers easily apply, revert, and track patches?
    Key Requirements:
    - Make applying a patch feel just like any other upgrade, but upgrade only the affected services
    - Allow users to revert patches
  • 33. Patch Upgrade - Technical Approach
    A problem with Ambari's architecture…
    ⬢ Ambari only dealt with stacks and services as a whole.
    ⬢ A cluster was always bound to a single repository at a time.
    ⬢ Upgrades were possible, but required every service to participate
      – For larger stack changes this was fine. However, it also meant that applying a simple bug-fix patch touched the entire cluster.
    Diagram: HDFS, YARN, and ZooKeeper always move together: HDP 2.6 (2.6.0.0-1234) → Upgrade → HDP 2.6 (2.6.1.0-5555) → Upgrade → HDP 2.6 (2.6.2.0-7890)
  • 34. Patch Upgrade - Technical Approach
    ⬢ Allowing a cluster to be associated with multiple stacks would not only be a monumental change, it would also require a test matrix that could not possibly be supported.
    ⬢ However, allowing individual services and components to be associated with different repositories was possible.
      – The only restriction is that each repository must still be part of the same major stack version.
    Diagram (all on HDP 2.6):
      HDFS 2.6.0.0-1234, YARN 2.6.0.0-1234, ZooKeeper 2.6.0.0-1234
      → Upgrade → HDFS 2.6.0.0-1234, YARN 2.6.0.0-1234, ZooKeeper 2.6.1.0-1111
      → Upgrade → HDFS 2.6.2.5-9999, YARN 2.6.0.0-1234, ZooKeeper 2.6.1.0-1111
      → Upgrade → HDFS 2.6.3.0-0001, YARN 2.6.3.0-0001, ZooKeeper 2.6.1.0-1111
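The per-service versioning rule can be modelled as a map from service to repository version, where an upgrade moves only the named services and is rejected if the new version leaves the major stack version. This is a minimal sketch of that rule only; `stack_major` and `apply_upgrade` are hypothetical helpers, and treating the first two numeric parts of the version as the stack version is an assumption based on the HDP 2.6 examples above.

```python
from typing import Dict, Iterable


def stack_major(version: str) -> str:
    """'2.6.2.5-9999' -> '2.6' (first two numeric parts, build suffix dropped)."""
    return ".".join(version.split("-", 1)[0].split(".")[:2])


def apply_upgrade(
    current: Dict[str, str],
    services: Iterable[str],
    new_version: str,
    stack_version: str,
) -> Dict[str, str]:
    """Return a new service->version map where only `services` move to `new_version`.

    Rejects versions outside the cluster's major stack version (e.g. HDP 2.6)."""
    if stack_major(new_version) != stack_version:
        raise ValueError(f"{new_version} is not part of stack {stack_version}")
    updated = dict(current)
    for service in services:
        if service not in updated:
            raise KeyError(f"{service} is not installed")
        updated[service] = new_version
    return updated
```

Replaying the diagram's three upgrades with this function ends with HDFS and YARN on 2.6.3.0-0001 while ZooKeeper stays on 2.6.1.0-1111.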
  • 35. Patch Upgrade - Technical Approach
    Version Definition File (VDF): a single XML file that describes the contents of a repository, such as services, versions, and repository URLs.
    ⬢ Ambari needed a way to determine whether a repository contained only a subset of services. This lets the software understand what kind of upgrade is being performed.

        <release>
          <type>PATCH</type>
          <stack-id>HDP-2.6</stack-id>
          <version>2.6.3.0</version>
          <build>235</build>
          <compatible-with>2.6.3.\d+</compatible-with>
          <release-notes>http://example.com</release-notes>
          <display>HDP-2.6.3.0-235</display>
        </release>
        <manifest>
          <service id="STORM-110" name="STORM" version="1.1.0"/>
        </manifest>
        <available-services>
          <service idref="STORM-110"/>
        </available-services>
        <repository-info>
          <os family="redhat6">
            <package-version>2_6_3_0_*</package-version>
            <repo>
              <baseurl>http://repo.ambari.apache.org/hdp/centos6/HDP-2.6.3.0-235</baseurl>
              <!-- remainder of the file is not shown on the slide -->
            </repo>
          </os>
        </repository-info>

    ⬢ Defines the repository as a PATCH repository ⬢ Contains only STORM ⬢ Specifies version 2.6.3.0-235 ⬢ Contains the URL for packages
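Reading the VDF excerpt above programmatically shows how a tool can tell a PATCH repository from a full one and which services it touches. This is a Python sketch using the standard library's `xml.etree.ElementTree`; wrapping the fragment in a `<repository-version>` root element, reading `compatible-with` as a regular expression, and matching it against the installed version without its build number are assumptions made for the example, not a description of Ambari's own VDF handling.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List

# The slide's fragment, wrapped in an assumed root element so it parses.
VDF = r"""<repository-version>
  <release>
    <type>PATCH</type>
    <stack-id>HDP-2.6</stack-id>
    <version>2.6.3.0</version>
    <build>235</build>
    <compatible-with>2.6.3.\d+</compatible-with>
    <release-notes>http://example.com</release-notes>
    <display>HDP-2.6.3.0-235</display>
  </release>
  <manifest>
    <service id="STORM-110" name="STORM" version="1.1.0"/>
  </manifest>
  <available-services>
    <service idref="STORM-110"/>
  </available-services>
  <repository-info>
    <os family="redhat6">
      <package-version>2_6_3_0_*</package-version>
      <repo>
        <baseurl>http://repo.ambari.apache.org/hdp/centos6/HDP-2.6.3.0-235</baseurl>
      </repo>
    </os>
  </repository-info>
</repository-version>"""


@dataclass
class VersionDefinition:
    release_type: str
    stack_id: str
    version: str
    build: str
    compatible_with: str
    services: List[str]
    base_urls: Dict[str, List[str]]

    @property
    def full_version(self) -> str:
        return f"{self.version}-{self.build}"

    def is_patch(self) -> bool:
        return self.release_type == "PATCH"

    def applies_to(self, installed_version: str) -> bool:
        # Assumption: compatible-with is a regex matched against the base version.
        return re.fullmatch(self.compatible_with, installed_version) is not None


def parse_vdf(text: str) -> VersionDefinition:
    root = ET.fromstring(text)
    release = root.find("release")
    # Map manifest ids (e.g. "STORM-110") to service names (e.g. "STORM").
    names = {s.get("id"): s.get("name") for s in root.findall("manifest/service")}
    services = [names[s.get("idref")] for s in root.findall("available-services/service")]
    base_urls = {
        os_el.get("family"): [b.text.strip() for b in os_el.findall("repo/baseurl")]
        for os_el in root.findall("repository-info/os")
    }
    return VersionDefinition(
        release_type=release.findtext("type"),
        stack_id=release.findtext("stack-id"),
        version=release.findtext("version"),
        build=release.findtext("build"),
        compatible_with=release.findtext("compatible-with"),
        services=services,
        base_urls=base_urls,
    )
```

Because `available-services` lists only STORM, a caller can scope the upgrade to Storm instead of treating the repository as a full stack release.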
  • 36. Patch Upgrade - Technical Approach
    ⬢ Ambari can now track versions of services and components independently
    ⬢ Depending on the command being generated, Ambari determines the correct version information to send to the agents
      – Writing out configuration values that contain versioned paths
      – Starting the correct version of a service
    ⬢ When distributing a PATCH repository, Ambari can also target only the hosts that contain the services from the VDF
      – Faster installations that avoid RPM bloat on hosts that are not involved
      – Much faster upgrades, since only specific services are restarted
    ⬢ A larger problem still remained… HADOOP!
      – Jobs launched from one host assumed that component versions were the same across the cluster
      – Some components assume that dependent components match their specific versions
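Two of the mechanisms above, choosing which hosts receive a PATCH repository and resolving a versioned path per component, can be sketched in a few lines. The host/component/service mappings are hypothetical inputs, and the `/usr/hdp/<version>/<subdir>` layout is used only as an illustrative versioned-path convention; neither function is Ambari's actual code.

```python
from typing import Dict, Iterable, List, Set


def hosts_for_patch(
    host_components: Dict[str, Set[str]],
    component_service: Dict[str, str],
    patch_services: Iterable[str],
) -> List[str]:
    """Hosts running at least one component of a service contained in the patch."""
    wanted = set(patch_services)
    return sorted(
        host
        for host, components in host_components.items()
        if any(component_service.get(c) in wanted for c in components)
    )


def versioned_path(component_versions: Dict[str, str], component: str, subdir: str) -> str:
    """Resolve an install path from the component's own version (illustrative layout)."""
    return f"/usr/hdp/{component_versions[component]}/{subdir}"
```

With a Storm-only patch, only hosts running Nimbus or Supervisor are selected, and each component's paths are derived from its own version rather than a single cluster-wide one.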
  • 37. Register Version
  • 38. Install Version
  • 39. Upgrade & Revert
  • 40. Patch Upgrade - Timelines
    - Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
    - Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
    - Ambari 2.7.0.0 (Summer 2018): HDP 3.0 GA
  • 41. Core: Ambari Management Packs (Jayush Luniya, Madhuvanthi Radhakrishnan, Swapan Shridhar, Scott Duan)
  • 42. Core - Feature Intro
    Problem Statement: When we started HDP we had one product with 8 services; now we have over 30 spread across multiple products. For Ambari to handle this new complexity, changes are necessary.
    Challenge: How do we build a system that allows multiple Hortonworks products and third-party products to work together seamlessly, with upgrade, patch, and lifecycle operations fully automated?
    Key Requirements:
    - Allow Ambari to mix and match multiple products and services on the same cluster
    - Make it easy for partners and users to create their own software that is managed by Ambari
  • 43. Core - Technical Approach
    Ambari 2.x Stacks:
      ● Stack definitions shipped with Ambari
      ● Ambari 2.x MPacks used to “shim” services into an existing stack
      ● Only stack services can participate in upgrades (not stack extensions or services added as MPacks)
      ● Hard 1:1 relationships
    Ambari 3.x MPacks:
      ● Stack definitions externalized into MPacks
      ● MPacks are stand-alone stacks
      ● MPack services get full Ambari upgrade automation
      ● Flexible relationships
  • 44. Core - Technical Approach
    Diagram: an MPack Repository containing HDP 3.1.0, BigSQL 5.0.2, and HDF 3.1.0
    A Module Definition is:
      ● Built as a tarball
      ● Equivalent to a Service Definition
      ● Also contains module.json
      ● e.g. hdfs-3.0.0.0-b123-definition.tar.gz
    An Ambari 3.x Management Pack is:
      ● Built as a tarball
      ● Contains module definitions
      ● Equivalent to a Stack Definition
      ● Also contains mpack.json
      ● e.g. hdpcore-1.0.0-b22-definition.tar.gz
    Module RPMs:
      ● Actual install bits for the module
      ● e.g. hdfs_3_0_0_0_b123.rpm
    MPack Meta RPM:
      ● Meta RPM that installs all module RPMs
      ● e.g. hdp_3_1_0_b22.rpm
    An MPack Repository:
      ● Holds references and metadata for MPacks
      ● Ambari supports multiple MPack repositories
      ● Allows operators to search and discover management packs
      ● Stores compatibility between management packs
      ● Provides recommendations for MPack bundles
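The example file names on this slide suggest a naming convention linking a module definition tarball to its module RPM. The following Python sketch infers that convention from just the two tarball examples and the one RPM example shown; the regular expression and helper names are hypothetical and should not be read as the official MPack naming specification. The meta RPM (`hdp_3_1_0_b22.rpm`) uses a different name and version than its MPack tarball, so it is deliberately not derived here.

```python
import re
from typing import NamedTuple

# Pattern inferred from the slide's examples; an assumption, not a spec.
DEFINITION_RE = re.compile(
    r"^(?P<name>[a-z][a-z0-9]*)-(?P<version>\d+(?:\.\d+)*)-(?P<build>b\d+)-definition\.tar\.gz$"
)


class Artifact(NamedTuple):
    name: str
    version: str
    build: str


def parse_definition(filename: str) -> Artifact:
    """Split a definition tarball name into name, version, and build."""
    m = DEFINITION_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized definition tarball: {filename}")
    return Artifact(m["name"], m["version"], m["build"])


def module_rpm_name(artifact: Artifact) -> str:
    """hdfs / 3.0.0.0 / b123 -> hdfs_3_0_0_0_b123.rpm (dots become underscores)."""
    return f"{artifact.name}_{artifact.version.replace('.', '_')}_{artifact.build}.rpm"
```

Replacing dots with underscores matches the style of the VDF's `package-version` value (`2_6_3_0_*`) shown earlier in the deck.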
  • 45. 45 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Core - Technical Approach
  • 46. Core - Technical Approach
  • 47. Core - Timelines
    ● Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
    ● Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
    ● Ambari 2.7.0.0 (1H 2018): HDP 3.0 GA
    ● Ambari 3.0.0.0 (2H 2018): HDP 3.1 GA
  • 48. Recommendations
    Sheetal Dolas, Beau Plath, Aditya Pathak, Cabir Zounaidou
    Better Configuration & Performance
  • 49. Recommendations - Feature Intro
    Problem Statement: Hadoop services have many configuration settings, and our users' use cases change and evolve over time. How can we make sure that customers' configurations stay optimal as their use of the cluster changes?
    Challenge: Ambari's stack advisor logic is shipped with each version of Ambari. How can we provide fresh advice to customers running software that shipped two years ago?
    Key Requirements:
    - Provide a way to continuously analyze a customer's cluster and make recommendations using up-to-date best practices
    - Provide an easy way for customers to review and apply those recommendations in Ambari
  • 50. Recommendations - Technical Approach
    Challenge 1: Collecting configuration, metrics, and logs for all components and all services
    Challenge 2: Anonymizing, encrypting, and sending the collected data to Hortonworks on a scheduled basis
    Challenge 3: Making recommendations based on the input diagnostics
    Challenge 4: Making it easy for customers to apply these recommendations in Ambari
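Challenge 2 names anonymization but the slide does not say how SmartSense performs it. One common technique is keyed hashing: each hostname or IP address is replaced by an HMAC-SHA256 token, so the same host maps to the same token across every file in a bundle without revealing the real name. This is a minimal sketch under that assumption; `pseudonym`, `anonymize`, and the token format are hypothetical, and encryption and transport are not covered.

```python
import hashlib
import hmac
import re

# Matches dotted-quad IPv4 addresses (it does not validate that each octet is <= 255).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonym(value: str, key: bytes, prefix: str) -> str:
    """Keyed, deterministic replacement token: the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def anonymize(text: str, hostnames: list[str], key: bytes) -> str:
    """Replace known cluster hostnames and any IPv4 addresses with pseudonyms."""
    # Replace longer names first so "nn1.example.com" is handled before a bare "nn1".
    for host in sorted(hostnames, key=len, reverse=True):
        text = text.replace(host, pseudonym(host, key, "host"))
    # Then replace every IPv4 address that appears anywhere in the text.
    return IPV4_RE.sub(lambda m: pseudonym(m.group(0), key, "ip"), text)
```

Because the tokens are keyed, an analyst who sees the anonymized bundle can still tell that two log lines refer to the same host, but cannot recover the hostname by hashing candidate names unless they also hold the key.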
  • 51. Recommendations - Collection
    [Diagram: Ambari Server and Ambari Agents on worker nodes producing a bundle]
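The collection step gathers files from every worker node into a single bundle. The sketch below shows one way to assemble such a bundle in memory: a gzipped tarball with one directory per host plus a small manifest. The layout, the `manifest.json` entry, and `build_bundle` are hypothetical illustrations, not the SmartSense bundle format.

```python
import io
import json
import tarfile


def build_bundle(node_files: dict[str, dict[str, str]]) -> bytes:
    """Pack per-node diagnostic files into one gzipped tarball with one directory per host.

    node_files maps a hostname to {file name: file content}, e.g. the configs or
    log excerpts an agent collected on that worker node (hypothetical layout).
    """
    # A manifest listing the hosts lets the receiving side check the bundle is complete.
    entries = {"manifest.json": json.dumps({"nodes": sorted(node_files)}).encode("utf-8")}
    for host, files in node_files.items():
        for name, content in files.items():
            entries[f"{host}/{name}"] = content.encode("utf-8")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, data in sorted(entries.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Building the archive in memory keeps the sketch self-contained; a real collector handling large logs would more likely stream to a file on disk.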
  • 52. Recommendations - Sending
    [Diagram: Ambari Server, Ambari Agents on worker nodes, and bundle, with a gateway and a landing zone]
  • 53. Recommendations - Recommend
    [Diagram: Ambari Server, Ambari Agents on worker nodes, bundle, gateway, landing zone, and SmartSense Analytics]
  • 54. Recommendations - Apply
    [Diagram: Ambari Server, Ambari Agents on worker nodes, bundle, gateway, landing zone, and SmartSense Analytics]
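The slides do not show how a recommendation is applied. Assuming it ends up as a new configuration version submitted through Ambari's REST API (a `PUT` to the cluster resource with a `desired_config` body), the sketch below computes which recommended properties actually change and builds that request body. `changed_properties` and `desired_config_payload` are hypothetical helpers, and the payload shape should be checked against the Ambari REST API documentation for the version in use; no request is sent here.

```python
import time
from typing import Optional


def changed_properties(current: dict[str, str],
                       recommended: dict[str, str]) -> dict[str, tuple[Optional[str], str]]:
    """Return {property: (current value or None, recommended value)} for every recommendation that differs."""
    return {
        key: (current.get(key), value)
        for key, value in recommended.items()
        if current.get(key) != value
    }


def desired_config_payload(config_type: str, current: dict[str, str],
                           recommended: dict[str, str], tag: Optional[str] = None) -> dict:
    """Build a request body for a new version of one config type (e.g. "hdfs-site").

    A new config version carries the full property set, so the recommendations are
    merged over the current properties rather than sent on their own.
    """
    merged = {**current, **recommended}
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                # Default tag: a millisecond timestamp, so each submitted version is unique.
                "tag": tag or f"version{int(time.time() * 1000)}",
                "properties": merged,
            }
        }
    }
```

Showing `changed_properties` to the operator before building the payload matches the requirement on slide 49 that customers can review recommendations before applying them.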
  • 55. Recommendations - Timelines
    ● Ambari 2.6.0.0 (Oct 2017): HDP 2.6.3 GA
    ● Ambari 2.6.1.0 (Jan 2018): HDP 2.6.4 GA
    ● Ambari 2.7.0.0 (Summer 2018): HDP 3.0 GA
  • 56. Summary
  • 57. Summary
    ⬢ Logging - Ambari Log Search
    ⬢ Metrics - Anomaly Detection
    ⬢ Upgrades - Patch Upgrade
    ⬢ Core - Management Packs, Multi-Version/Instance
    ⬢ Recommendations - SmartSense
    Release timeline: Ambari 2.6.0.0 (Oct 2017), Ambari 2.6.1.0 (Jan 2018), Ambari 2.7.0.0 (1H 2018), Ambari 3.0.0.0 (2H 2018)
    Available Now: Patch Upgrade, Recommendations
    Coming Soon: Log Search, Anomaly Detection, Management Packs
  • 58. Questions? Thank you for attending!

Editor's Notes

  1. We’ll spend just a few minutes introducing ourselves
  2. Point-in-time anomaly: one or a few extreme (outlier) values in a single metric series. For example, if the CPU load on a steady-state system suddenly spikes, that is a point-in-time anomaly. The goal is to detect and notify on these anomalies in near real time (within about 1-2 minutes).
     Trend anomaly: an unusual change in the trend or distribution of a single metric series relative to its historical behavior. For example, NameNode heap usage may usually be high at a particular hour on weekdays and fall on weekends. If heap usage rises during the weekend in the current week, that is probably an anomaly.
     Correlation anomaly: each metric on its own may be within acceptable levels, but the combination of N metrics together represents an anomalous state. For example, if ResourceManager client requests per second increase but operations completed per second do not increase (or fall), that is an anomaly; it could indicate that queues are filling up in YARN.
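The three anomaly classes in this note can be illustrated with simple heuristics: a z-score against a trailing window (point in time), a z-score against the same time slot in earlier weeks (trend, as in the NameNode heap example), and a growth-ratio rule over two paired metrics (correlation, as in the ResourceManager example). This is a minimal sketch; the function names, the 3-standard-deviation threshold, and the 20% growth threshold are hypothetical, and these are not necessarily the models Ambari's anomaly detection uses.

```python
import statistics
from typing import Sequence


def _zscore(value: float, history: Sequence[float]) -> float:
    """Distance of value from the history mean, in population standard deviations."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat history: any deviation at all is treated as extreme.
        return 0.0 if value == mean else float("inf")
    return abs(value - mean) / stdev


def point_in_time_anomalies(series: Sequence[float], window: int = 30,
                            threshold: float = 3.0) -> list[int]:
    """Indices whose value is more than `threshold` standard deviations from the trailing window."""
    return [i for i in range(window, len(series))
            if _zscore(series[i], series[i - window:i]) > threshold]


def trend_anomaly(current: float, same_slot_history: Sequence[float],
                  threshold: float = 3.0) -> bool:
    """Compare a value with the same time slot in earlier periods, e.g. Saturday 14:00 in past weeks."""
    return _zscore(current, same_slot_history) > threshold


def correlation_anomaly(requests_per_sec: Sequence[float], completed_per_sec: Sequence[float],
                        min_increase: float = 0.2) -> bool:
    """Requests grew by at least `min_increase` while completed operations stayed flat or fell."""
    request_growth = requests_per_sec[-1] / requests_per_sec[0]
    completion_growth = completed_per_sec[-1] / completed_per_sec[0]
    return request_growth >= 1 + min_increase and completion_growth <= 1.0
```

For instance, `trend_anomaly(90.0, [40.0, 42.0, 38.0, 41.0])` flags a weekend heap reading far above the same slot in the previous four weekends, while `correlation_anomaly([100.0, 130.0], [95.0, 90.0])` flags requests rising 30% while completions fall.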