SlideShare a Scribd company logo
1 of 18
Download to read offline
Monitoring uptime on the
Nectar Research Cloud
TPAC
Uni Melb
NCI
Core
Services
Pawsey
QCIF
eRSA
Intersect Monash
Nectar architecture
Load BalancerLoad Balancer
Tier 1
Service API
Tier 1
Service API
Tier 1
Service API
Tier 2
Service API
Message
Queue
Message
Queue
Message
Queue
Database
Tier 1
Service API
Tier 1
Service
Engine
Tier 1
Service API
Tier 2
Service
Engine
Load BalancerDashboard
Nectar core services
Test everything
● APIs and dashboard are running
● Services are working correctly
● Existing resources are happy (e.g. instances, networks)
● New resources can be created successfully
● Across all sites
Control plane hosts
Nagios
● Ping
● SSH
● NTP
● Filesystem
● Uptime
● Puppet
Ganglia metrics
● CPU
● Memory
● Network
● Disk I/O
Control plane services
● Service ports and processes
● HTTP endpoint
● API process
● Oslo middleware healthcheck
● Consistent /healthcheck URL for all services
● Called by load balancers
● More complex tests
● Request token from Keystone
● Check glance for image
Environment
Canary instance in each AZ
● Ping
● DHCP
● Metadata
Not an exhaustive test, but a good indicator
Instance boot test
Exercise the whole stack with Tempest
● Fetch a token
● Create a keypair
● Create security groups and rules
● Create instance
● Ping instance
● SSH to instance
● Destroy/clean up all resources
Tempest
Instance boot test
Instance boot for each AZ
● Tiny CirrOS image for speed
● Help identify site specific issues
Instance boot for each flavour
● Enough capacity for large flavours?
● Scheduler working properly
● Can be problematic with cells v1
Tempest
Tempest
● OpenStack integration testing suite
● Jobs launched by Jenkins with custom wrapper script
● Result pushed (passive) to Nagios via NRDP
● Lots more can be done here (e.g testing more services)
Tempest
Jenkins
Nagios
Status dashboard
Alerts
Nagios
● Configured by Puppet
● Notifications delivered by email and Slack
● Site specific alerts sent to site ops team
Analysing logs
ELK
● ElasticSearch, Logstash and Kibana
● Service and LB access logs sent to central syslog server
● Pretty dashboards
● Great for diagnosing issues
Kibana graphs
Tying it all together
● Define nagios_host and nagios_service resources
in Puppet
● Nagios configuration built by Naginator from
PuppetDB
● Deploy Ganglia
● Custom scripts to extract data from Nagios for
dashboard and reports
Thanks
Andy Botting
Nectar Core Services
andrew.botting@unimelb.edu.au

More Related Content

What's hot

Monitoring of OpenNebula installations
Monitoring of OpenNebula installationsMonitoring of OpenNebula installations
Monitoring of OpenNebula installationsNETWAYS
 
Integration testing for salt states using aws ec2 container service
Integration testing for salt states using aws ec2 container serviceIntegration testing for salt states using aws ec2 container service
Integration testing for salt states using aws ec2 container serviceSaltStack
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniMonal Daxini
 
From nothing to Prometheus : one year after
From nothing to Prometheus : one year afterFrom nothing to Prometheus : one year after
From nothing to Prometheus : one year afterAntoine Leroyer
 
Fluentd - CNCF Paris
Fluentd - CNCF ParisFluentd - CNCF Paris
Fluentd - CNCF ParisHorgix
 
OSMC 2013 | Zabbix: A Practical Demo by Rihards Olups
OSMC 2013 | Zabbix: A Practical Demo by Rihards OlupsOSMC 2013 | Zabbix: A Practical Demo by Rihards Olups
OSMC 2013 | Zabbix: A Practical Demo by Rihards OlupsNETWAYS
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecPeter Bakas
 
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...Tokuhiro Matsuno
 
Nginx monitoring with graphite
Nginx monitoring with graphiteNginx monitoring with graphite
Nginx monitoring with graphitedamaex17
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Monal Daxini
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Kafka Summit SF 2017 - Running Kafka as a Service at Scale
Kafka Summit SF 2017 - Running Kafka as a Service at ScaleKafka Summit SF 2017 - Running Kafka as a Service at Scale
Kafka Summit SF 2017 - Running Kafka as a Service at Scaleconfluent
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet
 
gRPC: The Story of Microservices at Square
gRPC: The Story of Microservices at SquaregRPC: The Story of Microservices at Square
gRPC: The Story of Microservices at SquareApigee | Google Cloud
 
Prezo at-mesos con2015-final
Prezo at-mesos con2015-finalPrezo at-mesos con2015-final
Prezo at-mesos con2015-finalSharma Podila
 
Writing Rust Command Line Applications
Writing Rust Command Line ApplicationsWriting Rust Command Line Applications
Writing Rust Command Line ApplicationsAll Things Open
 
Airflow Clustering and High Availability
Airflow Clustering and High AvailabilityAirflow Clustering and High Availability
Airflow Clustering and High AvailabilityRobert Sanders
 
Smart Testing: Catching More Bugs with Less Code Through Topology Shuffler
Smart Testing: Catching More Bugs with Less Code Through Topology ShufflerSmart Testing: Catching More Bugs with Less Code Through Topology Shuffler
Smart Testing: Catching More Bugs with Less Code Through Topology ShufflerOPNFV
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.Renzo Tomà
 
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...PROIDEA
 

What's hot (20)

Monitoring of OpenNebula installations
Monitoring of OpenNebula installationsMonitoring of OpenNebula installations
Monitoring of OpenNebula installations
 
Integration testing for salt states using aws ec2 container service
Integration testing for salt states using aws ec2 container serviceIntegration testing for salt states using aws ec2 container service
Integration testing for salt states using aws ec2 container service
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
From nothing to Prometheus : one year after
From nothing to Prometheus : one year afterFrom nothing to Prometheus : one year after
From nothing to Prometheus : one year after
 
Fluentd - CNCF Paris
Fluentd - CNCF ParisFluentd - CNCF Paris
Fluentd - CNCF Paris
 
OSMC 2013 | Zabbix: A Practical Demo by Rihards Olups
OSMC 2013 | Zabbix: A Practical Demo by Rihards OlupsOSMC 2013 | Zabbix: A Practical Demo by Rihards Olups
OSMC 2013 | Zabbix: A Practical Demo by Rihards Olups
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
promgen - prometheus managemnet tool / simpleclient_java hacks @ Prometheus c...
 
Nginx monitoring with graphite
Nginx monitoring with graphiteNginx monitoring with graphite
Nginx monitoring with graphite
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Kafka Summit SF 2017 - Running Kafka as a Service at Scale
Kafka Summit SF 2017 - Running Kafka as a Service at ScaleKafka Summit SF 2017 - Running Kafka as a Service at Scale
Kafka Summit SF 2017 - Running Kafka as a Service at Scale
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
 
gRPC: The Story of Microservices at Square
gRPC: The Story of Microservices at SquaregRPC: The Story of Microservices at Square
gRPC: The Story of Microservices at Square
 
Prezo at-mesos con2015-final
Prezo at-mesos con2015-finalPrezo at-mesos con2015-final
Prezo at-mesos con2015-final
 
Writing Rust Command Line Applications
Writing Rust Command Line ApplicationsWriting Rust Command Line Applications
Writing Rust Command Line Applications
 
Airflow Clustering and High Availability
Airflow Clustering and High AvailabilityAirflow Clustering and High Availability
Airflow Clustering and High Availability
 
Smart Testing: Catching More Bugs with Less Code Through Topology Shuffler
Smart Testing: Catching More Bugs with Less Code Through Topology ShufflerSmart Testing: Catching More Bugs with Less Code Through Topology Shuffler
Smart Testing: Catching More Bugs with Less Code Through Topology Shuffler
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
DOD 2016 - Kamil Szczygieł - Patching 100 OpenStack Compute Nodes with Zero-d...
 

Similar to Monitor Nectar Research Cloud uptime

Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemonsaspyker
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Ceph Community
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209mffiedler
 
All of the thing about Postman
All of the thing about PostmanAll of the thing about Postman
All of the thing about PostmanAlihossein shahabi
 
Micro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleMicro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleBamdad Dashtban
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Serverless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with FissionServerless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with FissionNATS
 
Build cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleBuild cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleJirayut Nimsaeng
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovValeriia Maliarenko
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Martin Spier
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
Fuel's current use cases, architecture and next steps
Fuel's current use cases, architecture and next stepsFuel's current use cases, architecture and next steps
Fuel's current use cases, architecture and next stepsOpen-IT
 
TryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsAnne Gentle
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpNathan Handler
 

Similar to Monitor Nectar Research Cloud uptime (20)

Netty training
Netty trainingNetty training
Netty training
 
Netty training
Netty trainingNetty training
Netty training
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and DaemonsQConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons
 
Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt Using Ceph in OStack.de - Ceph Day Frankfurt
Using Ceph in OStack.de - Ceph Day Frankfurt
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
All of the thing about Postman
All of the thing about PostmanAll of the thing about Postman
All of the thing about Postman
 
Micro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleMicro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and Ansible
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Serverless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with FissionServerless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with Fission
 
Build cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack AnsibleBuild cloud like Rackspace with OpenStack Ansible
Build cloud like Rackspace with OpenStack Ansible
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
 
reBuy on Kubernetes
reBuy on KubernetesreBuy on Kubernetes
reBuy on Kubernetes
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Fuel's current use cases, architecture and next steps
Fuel's current use cases, architecture and next stepsFuel's current use cases, architecture and next steps
Fuel's current use cases, architecture and next steps
 
TryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and AdminsTryStack: A Sandbox for OpenStack Users and Admins
TryStack: A Sandbox for OpenStack Users and Admins
 
PaaSTA: Running applications at Yelp
PaaSTA: Running applications at YelpPaaSTA: Running applications at Yelp
PaaSTA: Running applications at Yelp
 

More from OpenStack

Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, AptiraSwinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, AptiraOpenStack
 
Related OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera SoftwareRelated OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera SoftwareOpenStack
 
Supercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPCSupercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPCOpenStack
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudOpenStack
 
Simplifying the Move to OpenStack
Simplifying the Move to OpenStackSimplifying the Move to OpenStack
Simplifying the Move to OpenStackOpenStack
 
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatHyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatOpenStack
 
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, OracleMigrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, OracleOpenStack
 
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...OpenStack
 
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, VeritasEnabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, VeritasOpenStack
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEOpenStack
 
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus NetworksOpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus NetworksOpenStack
 
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...OpenStack
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...OpenStack
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack
 
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...OpenStack
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...OpenStack
 
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected JourneyTraditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected JourneyOpenStack
 
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash UniversityBuilding a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash UniversityOpenStack
 
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStackContainers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStackOpenStack
 

More from OpenStack (20)

Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, AptiraSwinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
Swinburne University of Technology - Shunde Zhang & Kieran Spear, Aptira
 
Related OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera SoftwareRelated OSS Projects - Peter Rowe, Flexera Software
Related OSS Projects - Peter Rowe, Flexera Software
 
Supercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPCSupercomputing by API: Connecting Modern Web Apps to HPC
Supercomputing by API: Connecting Modern Web Apps to HPC
 
Federation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research CloudFederation and Interoperability in the Nectar Research Cloud
Federation and Interoperability in the Nectar Research Cloud
 
Simplifying the Move to OpenStack
Simplifying the Move to OpenStackSimplifying the Move to OpenStack
Simplifying the Move to OpenStack
 
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red HatHyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
 
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, OracleMigrating your infrastructure to OpenStack - Avi Miller, Oracle
Migrating your infrastructure to OpenStack - Avi Miller, Oracle
 
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
A glimpse into an industry Cloud using Open Source Technologies - Adrian Koh,...
 
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, VeritasEnabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
Enabling OpenStack for Enterprise - Tarso Dos Santos, Veritas
 
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSEUnderstanding blue store, Ceph's new storage backend - Tim Serong, SUSE
Understanding blue store, Ceph's new storage backend - Tim Serong, SUSE
 
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus NetworksOpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
OpenStack Networks the Web-Scale Way - Scott Laffer, Cumulus Networks
 
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
Diving in the desert: A quick overview into OpenStack Sahara capabilities - A...
 
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
Building a GPU-enabled OpenStack Cloud for HPC - Blair Bethwaite, Monash Univ...
 
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
OpenStack and Red Hat: How we learned to adapt with our customers in a maturi...
 
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
Meshing OpenStack and Bare Metal Networks with EVPN - David Iles, Mellanox Te...
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
Ironically, Infrastructure Doesn't Matter - Quinton Anderson, Commonwealth Ba...
 
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected JourneyTraditional Enterprise to OpenStack Cloud - An Unexpected Journey
Traditional Enterprise to OpenStack Cloud - An Unexpected Journey
 
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash UniversityBuilding a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
Building a GPU-enabled OpenStack Cloud for HPC - Lance Wilson, Monash University
 
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStackContainers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
Containers and OpenStack: Marc Van Hoof, Kumulus: Containers and OpenStack
 

Recently uploaded

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Monitor Nectar Research Cloud uptime

  • 1. Monitoring uptime on the Nectar Research Cloud
  • 3. Load BalancerLoad Balancer Tier 1 Service API Tier 1 Service API Tier 1 Service API Tier 2 Service API Message Queue Message Queue Message Queue Database Tier 1 Service API Tier 1 Service Engine Tier 1 Service API Tier 2 Service Engine Load BalancerDashboard Nectar core services
  • 4. Test everything ● APIs and dashboard are running ● Services are working correctly ● Existing resources are happy (e.g. instances, networks) ● New resources can be created successfully ● Across all sites
  • 5. Control plane hosts Nagios ● Ping ● SSH ● NTP ● Filesystem ● Uptime ● Puppet Ganglia metrics ● CPU ● Memory ● Network ● Disk I/O
  • 6. Control plane services ● Service ports and processes ● HTTP endpoint ● API process ● Oslo middleware healthcheck ● Consistent /healthcheck URL for all services ● Called by load balancers ● More complex tests ● Request token from Keystone ● Check glance for image
  • 7. Environment Canary instance in each AZ ● Ping ● DHCP ● Metadata Not an exhaustive test, but a good indicator
  • 8. Instance boot test Exercise the whole stack with Tempest ● Fetch a token ● Create a keypair ● Create security groups and rules ● Create instance ● Ping instance ● SSH to instance ● Destroy/clean up all resources Tempest
  • 9. Instance boot test Instance boot for each AZ ● Tiny CirrOS image for speed ● Help identify site specific issues Instance boot for each flavour ● Enough capacity for large flavours? ● Scheduler working properly ● Can be problematic with cells v1 Tempest
  • 10. Tempest ● OpenStack integration testing suite ● Jobs launched by Jenkins with custom wrapper script ● Result pushed (passive) to Nagios via NRDP ● Lots more can be done here (e.g testing more services) Tempest
  • 14. Alerts Nagios ● Configured by Puppet ● Notifications delivered by email and Slack ● Site specific alerts sent to site ops team
  • 15. Analysing logs ELK ● ElasticSearch, Logstash and Kibana ● Service and LB access logs sent to central syslog server ● Pretty dashboards ● Great for diagnosing issues
  • 17. Tying it all together ● Define nagios_host and nagios_service resources in Puppet ● Nagios configuration built by Naginator from PuppetDB ● Deploy Ganglia ● Custom scripts to extract data from Nagios for dashboard and reports
  • 18. Thanks Andy Botting Nectar Core Services andrew.botting@unimelb.edu.au