Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. The Ambari engineering team will talk about improvements being made to the automation, metrics, logging, upgrade, and other core frameworks within Ambari as the project is being re-imagined.
Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
As part of this talk, the engineering team will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari community has changed the product to handle them. The future is fast approaching, and with it come new on-premises and cloud deployment architectures. See how Apache Ambari is being re-imagined to handle these new challenges.
Speakers
Paul Codding, Product Management Director, Hortonworks
Oliver Szabo, Senior Software Engineer, Hortonworks
We’ll spend just a few minutes introducing ourselves.
Point in Time Anomaly
Represents a single or a few extreme (outlier) values in a single metric series.
For example, on a steady-state system, a sudden spike in CPU load would be a point-in-time anomaly.
The goal is to capture these kinds of anomalies and send notifications about them in real time (within ~1-2 minutes).
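To make the idea concrete, here is a minimal sketch of point-in-time detection using a trailing-window z-score. This is an illustration only, not the method Ambari uses: the function name `point_in_time_anomalies`, the `window`, `threshold`, and `min_sigma` parameters, and the sample CPU load values are all assumptions made for this example.

```python
from statistics import mean, stdev


def point_in_time_anomalies(values, window=10, threshold=3.0, min_sigma=1e-6):
    """Return indices of points that are outliers relative to recent history.

    A point is flagged when it lies more than `threshold` standard
    deviations away from the mean of the `window` points before it.
    All parameter names and defaults are hypothetical.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a deviation")

    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        # Floor sigma so a perfectly flat history does not divide by zero.
        sigma = max(stdev(history), min_sigma)
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative CPU load samples on a steady-state system with one spike.
cpu_load = [0.5, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5, 4.0, 0.5, 0.6]
print(point_in_time_anomalies(cpu_load))  # -> [11]
```

Because the check only needs the most recent `window` samples, a detector of this shape could run on each new data point, which fits the 1-2 minute notification goal.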
Trend Anomaly
Represents an unusual change in the trend or distribution of a single metric series, with respect to its historical behavior.
For example, suppose the NameNode heap usage is usually high for a particular hour on weekdays and falls on weekends.
If heap usage rises during the weekend in the current week, it is probably an anomaly.
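One simple way to express this comparison against historical behavior is to group past observations by time slot (day of week and hour) and check the current value against the history for the same slot. The sketch below assumes that representation. The function name `trend_anomalies`, the slot keys, the thresholds, and the heap usage percentages are hypothetical and are not taken from Ambari.

```python
from statistics import mean, stdev


def trend_anomalies(history, current, threshold=3.0, min_sigma=1e-6):
    """Return the time slots whose current value departs from their own history.

    history: dict mapping a slot, e.g. ("Sat", 14), to a list of past values
    current: dict mapping a slot to the value observed this week
    Slots with fewer than two historical values are skipped.
    """
    flagged = []
    for slot, value in current.items():
        past = history.get(slot, [])
        if len(past) < 2:
            continue  # not enough history to estimate a spread
        mu = mean(past)
        sigma = max(stdev(past), min_sigma)
        if abs(value - mu) / sigma > threshold:
            flagged.append(slot)
    return sorted(flagged)


# Illustrative heap usage (%) at 14:00 over the previous four weeks.
history = {
    ("Mon", 14): [80, 82, 79, 81],  # usually high on weekdays
    ("Sat", 14): [30, 32, 31, 29],  # usually low on weekends
}
current = {("Mon", 14): 80, ("Sat", 14): 78}
print(trend_anomalies(history, current))  # -> [('Sat', 14)]
```

Note that a heap usage of 78% would not be flagged on a Monday. It is anomalous only relative to what is normal for a Saturday, which is what separates a trend anomaly from a point-in-time one.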
Correlation Anomaly
Represents a case where each metric by itself could be operating within acceptable levels.
However, the combination of N metrics together represents an anomalous state.
For example, if ResourceManager client requests per second increase but the number of operations completed per second does not increase (or falls), that is an anomaly.
It could indicate a problem with queues filling up in YARN.
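The ResourceManager example can be sketched as a check on two metric series at once: compare the most recent window with the one before it, and flag the case where requests rose noticeably while completions did not follow. This is a minimal illustration under stated assumptions, not Ambari's implementation. The function name `correlation_anomaly`, the `window`, `rise`, and `tolerance` parameters, and the sample rates are all hypothetical.

```python
from statistics import mean


def correlation_anomaly(requests, completed, window=5, rise=0.2, tolerance=0.05):
    """Return True when requests rise but completions fail to keep up.

    Compares the mean of the last `window` samples with the mean of the
    `window` samples before them. Flags the pair when requests grew by
    more than `rise` (as a fraction) while completions grew by less
    than `tolerance`, which includes the case where they fell.
    """
    if len(requests) != len(completed):
        raise ValueError("both series must have the same length")
    if len(requests) < 2 * window:
        return False  # not enough data for two windows

    prev_req = mean(requests[-2 * window:-window])
    curr_req = mean(requests[-window:])
    prev_done = mean(completed[-2 * window:-window])
    curr_done = mean(completed[-window:])
    if prev_req <= 0 or prev_done <= 0:
        return False  # relative change is undefined

    req_change = (curr_req - prev_req) / prev_req
    done_change = (curr_done - prev_done) / prev_done
    return req_change > rise and done_change < tolerance


# Requests per second jump by 50% while completed operations drift down.
requests = [100] * 5 + [150] * 5
completed = [95] * 5 + [90] * 5
print(correlation_anomaly(requests, completed))  # -> True
```

Neither series would necessarily trip a single-metric check here, since 150 requests per second and 90 completions per second may each be within normal range. Only the divergence between the two is flagged.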