Modernized Monitoring for Clusters and Clouds of All Types
Ian Lumb
Product Marketing Manager
Bright Topics Webinar
April 15, 2015
RECORDING
Key Takeaways
■ Modernized monitoring
■ Monitoring HPC and Hadoop clusters
■ Monitoring public and private clouds
■ Monitoring with alerts and health checks
■ Customized monitoring, including how to incorporate your own monitors
The Five Essential Strategies
1. Plan to manage the impact of software complexity
2. Plan for scalable growth
3. Plan to manage heterogeneous hardware/software solutions
4. Be ready for the Cloud
5. Have an answer for the Hadoop question
http://insidehpc.com/2014/05/five-essential-strategies-successful-hpc-clusters/
http://insidehpc.com/2014/11/monitoring-hpc-clusters-modernized/
The problem with “Toolkits”
■ Toolkits: “A patchwork of disparate tools”
• Tools typically used: Ganglia, Nagios, CFEngine, SystemImager, Puppet, Chef, Cobbler, Hobbit, Big Brother, Zabbix, etc.
• Scripts
■ Issues with the “toolkit” approach:
• Scripts poorly documented and hard to maintain
• Tools not designed to work together
• Each tool has its own user interface (CLI/GUI)
• Each tool has its own agent and database
  - Hidden assumptions and biases regarding sampling, and more
• Tools rarely designed for scale and high performance
• Accelerators and coprocessors often not supported
■ Making a collection of unrelated tools work together:
• Requires a lot of expertise and scripting
• Rarely leads to an easy-to-use, scalable solution
The problem with “Meta-Toolkits”
■ Meta-toolkits are likely to obfuscate:
• Assumptions and biases involved in sampling and processing
  - Was interpolation or extrapolation required?
• Scalability limitations
• Existing capabilities within a specific toolkit
  - User beware the lowest-common-denominator (LCD) effect!
• The ongoing burden of management and maintenance
http://insidehpc.com/2014/11/monitoring-hpc-clusters-modernized/
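To make the sampling concern concrete, here is a minimal Python sketch (not taken from any particular toolkit) of what happens when a metric series is aligned onto a common time grid, as a meta-toolkit must do when its underlying tools sample at different intervals. The function name `resample`, the linear interpolation, and the "hold the nearest value" extrapolation rule are illustrative assumptions. The point is that every aligned value carries a provenance tag, which a meta-toolkit may silently discard.

```python
from bisect import bisect_left
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def resample(samples: List[Sample], grid: List[float]) -> List[Tuple[float, float, str]]:
    """Map time-sorted samples onto a common grid (hypothetical helper).

    Returns (timestamp, value, provenance), where provenance is
    'observed', 'interpolated', or 'extrapolated'.
    """
    if not samples:
        raise ValueError("need at least one sample")
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    out = []
    for t in grid:
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            # A real measurement exists at this grid point.
            out.append((t, values[i], "observed"))
        elif 0 < i < len(times):
            # Between two samples: linear interpolation is an assumption.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            frac = (t - t0) / (t1 - t0)
            out.append((t, v0 + frac * (v1 - v0), "interpolated"))
        else:
            # Outside the sampled range: hold the nearest value (one choice of many).
            v = values[0] if i == 0 else values[-1]
            out.append((t, v, "extrapolated"))
    return out
```

For example, a series sampled every 60 s and shown on a 30 s grid yields values at 30 s and 90 s that were never measured, and a value at 150 s that lies beyond the last sample; only the provenance column reveals this.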
Pressing concerns, real implications
■ Significant toolkit legacy in HPC
• Use of meta-toolkits escalating
■ Hadoop deployments rediscovering the toolkit legacy
• Hadoop monitoring + { Nagios || Ganglia || ??? }
• Apache Ambari: an evolving meta-toolkit
■ OpenStack on track to rediscover the toolkit legacy as well
‘Modernized’ monitoring with meta-toolkits?
http://www.hpcwire.com/2014/09/18/modernizing-hpc-cluster-monitoring/
“Those who cannot remember the past are condemned to repeat it.”
George Santayana, The Life of Reason, Vol. 1, 1905
Hadoop Users: Stop Settling for the Santayana Effect TODAY!
https://www.linkedin.com/pulse/hadoop-users-stop-settling-santayana-effect-today-ian-lumb
http://docs.openstack.org/admin-guide-cloud/content/figures/2/figures/openstack-arch-havana-logical-v1.jpg
OpenStack Architecture (Havana)
A Better Solution
■ Bright Cluster Manager takes a much more fundamental, integrated approach
• Designed and written from the ground up
• Single cluster management agent provides all functionality
• Single, central database for configuration and monitoring data
• Single UI for ALL cluster management functionality
■ This makes Bright Cluster Manager:
• Extremely easy to use
• Extremely scalable
• Secure & reliable
• Complete
• Flexible
• Maintainable
Figure: Bright Cluster Architecture (Monitoring). Components shown: CMDaemon; the head node; compute nodes node001, node002, and node003, each with a BMC; flows labelled "raw data", "consolidated data", and "metrics"; consumers: Cluster Management GUI, Cluster Management Shell, Web-Based User Portal, and Third-Party Applications.
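The architecture distinguishes raw data from consolidated data. One common way to consolidate is to roll raw samples up into fixed-width time windows. The sketch below illustrates that generic technique under our own assumptions (the `consolidate` name, the window size, and keeping min/avg/max per window). It does not describe how CMDaemon actually consolidates data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Consolidated:
    start: float    # window start time in seconds
    count: int      # number of raw samples in the window
    minimum: float
    average: float
    maximum: float


def consolidate(raw: List[Tuple[float, float]], window: float) -> List[Consolidated]:
    """Group raw (timestamp, value) samples into fixed-width windows (hypothetical helper)."""
    if window <= 0:
        raise ValueError("window must be positive")
    buckets: Dict[int, List[float]] = {}
    for t, v in raw:
        # Integer window index; samples need not arrive in order.
        buckets.setdefault(int(t // window), []).append(v)
    result = []
    for k in sorted(buckets):
        vals = buckets[k]
        result.append(Consolidated(
            start=k * window,
            count=len(vals),
            minimum=min(vals),
            average=sum(vals) / len(vals),
            maximum=max(vals),
        ))
    return result
```

Keeping the minimum and maximum alongside the average is a deliberate choice here: an average alone would hide short spikes, which is exactly the kind of undisclosed processing assumption the earlier slides warn about.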
Native Metrics for Clusters & Clouds
■ Over 160 relating to HPC
• From bare metal to workload managers to apps
  - Includes accelerators and coprocessors
• On premises and in the public cloud
■ Over 400 relating to Hadoop
• From distributions, HDFS, and YARN to data-platform apps
■ Almost 90 relating to OpenStack
• Tenant-specific, plus the private cloud as a whole
■ Over 60 relating to Ceph
http://www.brightcomputing.com/Linux-Cluster-Monitoring
Monitoring++
■ Proactive alert-based monitoring
• Define thresholds for any metric
• Associate actions with thresholds
  - Actions execute when thresholds are exceeded
■ Health checks
• Invasive plus dynamic diagnostics
Cluster monitoring vs. health checking: What’s the difference?
http://info.brightcomputing.com/blog/cluster-monitoring-vs.-health-checking-whats-the-difference
http://www.brightcomputing.com/Linux-Cluster-Health
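This slide separates two mechanisms: thresholds that trigger actions when a sampled metric crosses a limit, and health checks that actively probe a node and return a verdict. Below is a minimal Python sketch of both. The metric names (`cpu_load`, `free_mem_gb`), the `scratch_space_check` probe, and the three-valued PASS/FAIL/UNKNOWN result are hypothetical and chosen for illustration; this is not Bright Cluster Manager's configuration interface.

```python
import shutil
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class HealthResult(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"  # the check could not reach a verdict


@dataclass
class Threshold:
    metric: str
    limit: float
    above: bool  # True: fire when value > limit; False: fire when value < limit
    action: Callable[[str, float], None]

    def crossed(self, value: float) -> bool:
        return value > self.limit if self.above else value < self.limit


def evaluate(thresholds: List[Threshold], sample: Dict[str, float]) -> List[str]:
    """Run the action of every threshold crossed by the current sample."""
    fired = []
    for th in thresholds:
        value = sample.get(th.metric)
        if value is not None and th.crossed(value):
            th.action(th.metric, value)
            fired.append(th.metric)
    return fired


def scratch_space_check(path: str = "/tmp", minimum: float = 0.10) -> HealthResult:
    """Hypothetical health check: actively probe a filesystem for free space."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return HealthResult.UNKNOWN
    if usage.total == 0:
        return HealthResult.UNKNOWN
    return HealthResult.PASS if usage.free / usage.total >= minimum else HealthResult.FAIL
```

A threshold reacts passively to metrics that are already being sampled, whereas the health check performs its own probe and reports a verdict. That difference is why the two are presented as complementary.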
Q & A
Ian Lumb, ian.lumb@brightcomputing.com
http://www.brightcomputing.com/

More Related Content

Similar to Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and Clouds of All Types

Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Renato Bonomini
 
Operating a Highly Available Cloud Service
Operating a Highly Available Cloud ServiceOperating a Highly Available Cloud Service
Operating a Highly Available Cloud Service
Depankar Neogi
 
Top 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To FocusTop 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To Focus
devopsjourney
 
Humana digitally transforming health and well-being with Pivotal cloud foundr...
Humana digitally transforming health and well-being with Pivotal cloud foundr...Humana digitally transforming health and well-being with Pivotal cloud foundr...
Humana digitally transforming health and well-being with Pivotal cloud foundr...
Dynatrace
 
Taking agile development to enterprise scale in a mixed tool environment with...
Taking agile development to enterprise scale in a mixed tool environment with...Taking agile development to enterprise scale in a mixed tool environment with...
Taking agile development to enterprise scale in a mixed tool environment with...
IBM Rational software
 
Hp discover 2012 managing the virtualization explosion
Hp discover 2012   managing the virtualization explosionHp discover 2012   managing the virtualization explosion
Hp discover 2012 managing the virtualization explosion
Stefan Bergstein
 
A DevOps adoption playbook- achieving business value at scale
A DevOps adoption playbook- achieving business value at scaleA DevOps adoption playbook- achieving business value at scale
A DevOps adoption playbook- achieving business value at scale
Sanjeev Sharma
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Subbu Rama
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
Yafang Chang
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
VMware Tanzu
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Matt Stine
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Hellmar Becker
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
Decision Science Community
 
ML-Ops: Philosophy, Best-Practices and Tools
ML-Ops:Philosophy, Best-Practices and ToolsML-Ops:Philosophy, Best-Practices and Tools
ML-Ops: Philosophy, Best-Practices and Tools
Jorge Davila-Chacon
 
Evolution of Drupal and the Drupal community
Evolution of Drupal and the Drupal communityEvolution of Drupal and the Drupal community
Evolution of Drupal and the Drupal community
Angela Byron
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
inside-BigData.com
 
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
VMworld
 
What HPC can learn from DevOps?
What HPC can learn from DevOps?What HPC can learn from DevOps?
What HPC can learn from DevOps?
Walid Shaari
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus Webinar
Impetus Technologies
 
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
IBM Systems UKI
 

Similar to Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Cluster and Clouds of All Types (20)

Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
 
Operating a Highly Available Cloud Service
Operating a Highly Available Cloud ServiceOperating a Highly Available Cloud Service
Operating a Highly Available Cloud Service
 
Top 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To FocusTop 10 DevOps Areas Need To Focus
Top 10 DevOps Areas Need To Focus
 
Humana digitally transforming health and well-being with Pivotal cloud foundr...
Humana digitally transforming health and well-being with Pivotal cloud foundr...Humana digitally transforming health and well-being with Pivotal cloud foundr...
Humana digitally transforming health and well-being with Pivotal cloud foundr...
 
Taking agile development to enterprise scale in a mixed tool environment with...
Taking agile development to enterprise scale in a mixed tool environment with...Taking agile development to enterprise scale in a mixed tool environment with...
Taking agile development to enterprise scale in a mixed tool environment with...
 
Hp discover 2012 managing the virtualization explosion
Hp discover 2012   managing the virtualization explosionHp discover 2012   managing the virtualization explosion
Hp discover 2012 managing the virtualization explosion
 
A DevOps adoption playbook- achieving business value at scale
A DevOps adoption playbook- achieving business value at scaleA DevOps adoption playbook- achieving business value at scale
A DevOps adoption playbook- achieving business value at scale
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
 
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic RelationshipCloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
Cloud Foundry and Microservices: A Mutualistic Symbiotic Relationship
 
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking EcosystemAutomate Hadoop Cluster Deployment in a Banking Ecosystem
Automate Hadoop Cluster Deployment in a Banking Ecosystem
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
 
ML-Ops: Philosophy, Best-Practices and Tools
ML-Ops:Philosophy, Best-Practices and ToolsML-Ops:Philosophy, Best-Practices and Tools
ML-Ops: Philosophy, Best-Practices and Tools
 
Evolution of Drupal and the Drupal community
Evolution of Drupal and the Drupal communityEvolution of Drupal and the Drupal community
Evolution of Drupal and the Drupal community
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
VMworld 2013: Building the Management Stack for Your Software Defined Data Ce...
 
What HPC can learn from DevOps?
What HPC can learn from DevOps?What HPC can learn from DevOps?
What HPC can learn from DevOps?
 
Does Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus WebinarDoes Big Data Spell Big Costs- Impetus Webinar
Does Big Data Spell Big Costs- Impetus Webinar
 
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
PureApp Hybrid Cloud - Mark Willemse ING Presentation 11th September 2014
 

More from Ian Lumb

Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and AdvisoriesTowards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Ian Lumb
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Ian Lumb
 
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Ian Lumb
 
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Ian Lumb
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Ian Lumb
 
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro ServiceDrilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Ian Lumb
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Ian Lumb
 
Docker 101 - all about Docker containers
Docker 101 - all about Docker containers Docker 101 - all about Docker containers
Docker 101 - all about Docker containers
Ian Lumb
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?
Ian Lumb
 
VoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache SparkVoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache Spark
Ian Lumb
 
Utilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster ManagerUtilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster Manager
Ian Lumb
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Ian Lumb
 
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Ian Lumb
 

More from Ian Lumb (13)

Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and AdvisoriesTowards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
Towards Deep Learning from Twitter for Improved Tsunami Alerts and Advisories
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
 
Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0Managing Containerized HPC and AI Workloads on TSUBAME3.0
Managing Containerized HPC and AI Workloads on TSUBAME3.0
 
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
Univa Unicloud - High Volume Workloads: How Smart Companies are Harnessing th...
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
 
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro ServiceDrilling Deep with Machine Learning as an Enterprise Enabled Micro Service
Drilling Deep with Machine Learning as an Enterprise Enabled Micro Service
 
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...Machine Learning for Big Data Analytics:  Scaling In with Containers while Sc...
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc...
 
Docker 101 - all about Docker containers
Docker 101 - all about Docker containers Docker 101 - all about Docker containers
Docker 101 - all about Docker containers
 
High Performance Computing in the Cloud?
High Performance Computing in the Cloud?High Performance Computing in the Cloud?
High Performance Computing in the Cloud?
 
VoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache SparkVoDcast Slides: The Rise in Popularity of Apache Spark
VoDcast Slides: The Rise in Popularity of Apache Spark
 
Utilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster ManagerUtilizing Public AND Private Clouds with Bright Cluster Manager
Utilizing Public AND Private Clouds with Bright Cluster Manager
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero DowntimeHow to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
 
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for P...
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf

Bright Topics Webinar April 15, 2015 - Modernized Monitoring for Clusters and Clouds of All Types

  • 1. Modernized Monitoring for Clusters and Clouds of All Types
      Ian Lumb, Product Marketing Manager
      Bright Topics Webinar, April 15, 2015
      RECORDING
  • 2. Key Takeaways
      ▪ Modernized monitoring
      ▪ Monitoring HPC and Hadoop clusters
      ▪ Monitoring public and private clouds
      ▪ Monitoring with alerts and health checks
      ▪ Customized monitoring, including how to incorporate your own monitors
  • 3. The Five Essential Strategies
      1. Plan to manage the impact of software complexity
      2. Plan for scalable growth
      3. Plan to manage heterogeneous hardware/software solutions
      4. Be ready for the Cloud
      5. Have an answer for the Hadoop question
      http://insidehpc.com/2014/05/five-essential-strategies-successful-hpc-clusters/
  • 5. The problem with “Toolkits”
      ▪ Toolkits: “A patchwork of disparate tools”
          • Tools typically used: Ganglia, Nagios, CFEngine, SystemImager, Puppet, Chef, Cobbler, Hobbit, Big Brother, Zabbix, etc.
          • Scripts
      ▪ Issues with the “toolkit” approach:
          • Scripts poorly documented and hard to maintain
          • Tools not designed to work together
          • Each tool has its own user interface (CLI/GUI)
          • Each tool has its own agent and database
              ◦ Hidden assumptions and biases re: sampling and more
          • Tools rarely designed for scale & high performance
          • Accelerators and coprocessors often not supported
      ▪ Making a collection of unrelated tools work together
          • Requires a lot of expertise and scripting
          • Rarely leads to a really easy-to-use and scalable solution
  • 6. The problem with “Meta-Toolkits”
      ▪ Meta-Toolkits likely to obfuscate
          • Assumptions and biases involved in sampling and processing
              ◦ Was interpolation or extrapolation required?
          • Scalability limitations
          • Existing capabilities within a specific toolkit
              ◦ User beware the lowest-common-denominator (LCD) effect!
          • The ongoing burden of management and maintenance
      http://insidehpc.com/2014/11/monitoring-hpc-clusters-modernized/
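The sampling concern on slide 6 can be made concrete with a small, hypothetical Python sketch. The per-second readings, the 10-second collection interval, and the helper names `sample` and `reconstruct` are illustrative assumptions, not taken from any monitoring toolkit. A 2-second load spike falls between coarse samples, and linear interpolation (with hold-last-value extrapolation after the final sample) rebuilds a series that never shows it.

```python
from bisect import bisect_right

def sample(series, step):
    """Keep every `step`-th reading, as a coarse-grained collector might."""
    return [(t, v) for t, v in enumerate(series) if t % step == 0]

def reconstruct(points, length):
    """Rebuild a per-second series from sparse (t, v) samples.

    Assumes the first sample is at t=0. Between samples the value is
    linearly interpolated; after the last sample it is extrapolated by
    holding the last value constant.
    """
    times = [t for t, _ in points]
    rebuilt = []
    for t in range(length):
        i = bisect_right(times, t) - 1      # last sample taken at or before t
        if i == len(points) - 1:            # beyond the last sample: extrapolate
            rebuilt.append(float(points[i][1]))
            continue
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        rebuilt.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return rebuilt

# Hypothetical per-second CPU load: flat at 10% with a 2-second spike to 95%.
raw = [10.0] * 60
raw[17] = raw[18] = 95.0

points = sample(raw, step=10)               # readings at t = 0, 10, ..., 50
rebuilt = reconstruct(points, len(raw))
print(max(raw), max(rebuilt))               # 95.0 10.0
```

Nothing in the rebuilt series is "wrong" at the sampled instants; the spike is lost purely through the sampling interval and the interpolation choice, which is exactly the kind of assumption a meta-toolkit can hide.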
  • 7. Pressing concerns, real implications
      ▪ Significant toolkit legacy in HPC
          • Use of meta-toolkits escalating
      ▪ Hadoop deployments rediscovering toolkit legacy
          • Hadoop monitoring + { Nagios || Ganglia || ??? }
          • Apache Ambari an evolving meta-toolkit
      ‘Modernized’ monitoring with meta-toolkits?
      http://www.hpcwire.com/2014/09/18/modernizing-hpc-cluster-monitoring/
      “Those who cannot remember the past are condemned to repeat it.” (George Santayana, The Life of Reason, Vol. 1, 1905)
      Hadoop Users: Stop Settling for the Santayana Effect TODAY!
      https://www.linkedin.com/pulse/hadoop-users-stop-settling-santayana-effect-today-ian-lumb
  • 11. Pressing concerns, real implications
      ▪ Significant toolkit legacy in HPC
          • Use of meta-toolkits escalating
      ▪ Hadoop deployments rediscovering toolkit legacy
          • Hadoop monitoring + { Nagios || Ganglia || ??? }
          • Apache Ambari an evolving meta-toolkit
      ▪ OpenStack on track to also rediscover the toolkit legacy
      ‘Modernized’ monitoring with meta-toolkits?
      http://www.hpcwire.com/2014/09/18/modernizing-hpc-cluster-monitoring/
      “Those who cannot remember the past are condemned to repeat it.” (George Santayana, The Life of Reason, Vol. 1, 1905)
      Hadoop Users: Stop Settling for the Santayana Effect TODAY!
      https://www.linkedin.com/pulse/hadoop-users-stop-settling-santayana-effect-today-ian-lumb
  • 14. A Better Solution
      ▪ Bright Cluster Manager takes a much more fundamental & integrated approach
          • Designed and written from the ground up
          • Single cluster management agent provides all functionality
          • Single, central database for configuration and monitoring data
          • Single UI for ALL cluster management functionality
      ▪ Which makes Bright Cluster Manager …
          • Extremely easy to use
          • Extremely scalable
          • Secure & reliable
          • Complete
          • Flexible
          • Maintainable
  • 15. Bright Cluster Architecture: Monitoring
      [Diagram labels: CMDaemon; head node; node001, node002, node003; BMC (one per node); data; raw data; consolidated data; metrics; Cluster Management GUI; Cluster Management Shell; Web-Based User Portal; Third-Party Applications]
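The consolidation step in the slide 15 diagram (raw data in, consolidated data out, held in the single central database listed on slide 14) can be approximated with a generic time-bucketing sketch. This is not CMDaemon's implementation or API: the sample tuples, the metric name `load1`, the 120-second interval, and the function `consolidate` are all hypothetical, and a plain dict stands in for the central database.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw samples: (node, metric, timestamp in seconds, value).
RAW_SAMPLES = [
    ("node001", "load1", 0, 0.5),
    ("node001", "load1", 60, 1.5),
    ("node001", "load1", 130, 2.0),
    ("node002", "load1", 10, 3.0),
    ("node002", "load1", 70, 5.0),
]

def consolidate(samples, interval):
    """Group raw samples into fixed-width time buckets per (node, metric)
    and keep only min/avg/max for each bucket."""
    buckets = defaultdict(list)
    for node, metric, ts, value in samples:
        start = ts - ts % interval  # start of the bucket this sample falls in
        buckets[(node, metric, start)].append(value)
    return {
        key: {"min": min(vals), "avg": mean(vals), "max": max(vals)}
        for key, vals in sorted(buckets.items())
    }

# A plain dict stands in for the single central monitoring database.
central_db = consolidate(RAW_SAMPLES, interval=120)
for (node, metric, start), stats in central_db.items():
    print(node, metric, start, stats)
```

Keeping min and max alongside the average is one way to let short spikes survive consolidation; storing only the average would reintroduce the sampling bias that slide 6 warns about.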
  • 16. Native Metrics for Clusters & Clouds
      ▪ Over 160 relating to HPC
          • From bare metal to workload managers to apps
              ◦ Includes accelerators and coprocessors
          • On-the-ground and in-the-public-cloud
      ▪ Over 400 relating to Hadoop
          • From distros, HDFS & YARN to data-platform apps
      ▪ Almost 90 relating to OpenStack
          • Tenant-specific plus private cloud as-a-whole
      ▪ Over 60 relating to Ceph
      http://www.brightcomputing.com/Linux-Cluster-Monitoring
  • 19. Monitoring++
      ▪ Proactive alert-based monitoring
          • Define thresholds for any metric
          • Associate actions with thresholds
              ◦ Actions execute when thresholds are exceeded
      ▪ Health checks
          • Invasive plus dynamic diagnostics
      Cluster monitoring vs. health checking: What’s the difference?
      http://info.brightcomputing.com/blog/cluster-monitoring-vs.-health-checking-whats-the-difference
      http://www.brightcomputing.com/Linux-Cluster-Health
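The two mechanisms on slide 19 differ in kind: a threshold alert reacts to metric values that are already being collected, while a health check actively runs a test. The minimal Python sketch below illustrates both under stated assumptions; the `Threshold` rule type, the `notify` action, the `evaluate` loop, and the scratch-directory write test in `scratch_writable_check` are hypothetical illustrations, not Bright Cluster Manager's configuration model.

```python
import operator
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Threshold:
    """Hypothetical alert rule: run `action` when `compare(value, bound)` is true."""
    metric: str
    compare: Callable[[float, float], bool]   # e.g. operator.gt for "above"
    bound: float
    action: Callable[[str, str, float], str]  # called with (node, metric, value)

def notify(node: str, metric: str, value: float) -> str:
    """Hypothetical action: format an alert message (a real one might send it)."""
    return f"ALERT {node}: {metric}={value}"

def evaluate(rules: List[Threshold], readings: Dict[str, Dict[str, float]]) -> List[str]:
    """Check each node's latest readings against every rule; collect action results."""
    fired = []
    for node, metrics in readings.items():
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is not None and rule.compare(value, rule.bound):
                fired.append(rule.action(node, rule.metric, value))
    return fired

def scratch_writable_check(directory: str) -> str:
    """Hypothetical health check: actively write and delete a file in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"ok")
            handle.flush()
        return "PASS"
    except OSError:
        return "FAIL"

rules = [Threshold("temperature", operator.gt, 80.0, notify)]
readings = {"node001": {"temperature": 85.0}, "node002": {"temperature": 60.0}}
alerts = evaluate(rules, readings)
print(alerts)                                    # ['ALERT node001: temperature=85.0']
print(scratch_writable_check(tempfile.gettempdir()))
```

Keeping actions as plain callables means one rule type can log, notify, or run a script without changing the evaluation loop; the health check, by contrast, produces its own evidence rather than waiting for a metric to cross a bound.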
  • 24. Q & A Ian Lumb, ian.lumb@brightcomputing.com http://www.brightcomputing.com/