Monasca
Monitoring/Logging-as-a-Service (at-scale)
Speaker
Roland Hochmuth
Hewlett Packard Enterprise
Fort Collins, Colorado, USA
Agenda
• Describe how to build a highly scalable monitoring and logging as a
service platform
• Architectural and design principles
• Scale, HA
• Provide an overview of Monasca
• Features
• API
• Demo
What is Monitoring-as-a-Service?
• A Monitoring or Logging solution deployed as Software-as-a-Service
• E.g. CloudWatch, Datadog, New Relic, Librato, Loggly and many others
• First-class, preferably RESTful HTTP API
• Authentication
• Multi-tenancy
• Provides self-provisioning to users/tenants of the service
• Designed to be highly reliable and operate at scale
• Historically run by an operations team doing web services
What is OpenStack?
• OpenStack is a cloud operating system that controls large pools of
compute, storage, and networking resources
• Open-source alternative to AWS, Microsoft Azure, Google Cloud and
other cloud services
• Deployed in both public and private clouds
What is Monasca?
• Open-source Monitoring/Logging-as-a-Service platform for OpenStack
• Authentication currently via OpenStack Identity Service (Keystone)
• Microservices message-bus based architecture
• First-class RESTful API
• Push-based metrics
• Consolidates Operational Monitoring, Monitoring-as-a-Service, Metering &
Billing and more
• Designed for elastic cloud environments/deployments
• High-availability / clustering built-in
• Horizontally scalable, with a vertically tiered (four-layer) architecture
• Capable of long-term data retention to address metering, SLA, capacity
planning, trend analysis, post-hoc RCA, and other use cases
• Extensible and Composable
The Log
• The Log: What every software engineer should know about real-time data's
unifying abstraction
• https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: An append-only, totally-ordered sequence of records ordered by time
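As a rough illustration of the log abstraction, the sketch below models an append-only sequence in memory. The class name `AppendOnlyLog` and its methods are hypothetical and are not part of Kafka or Monasca; the point is only that records get a monotonically increasing offset and that readers track their own position.

```python
import time
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Illustrative in-memory log: records are only ever appended."""
    _records: List[tuple] = field(default_factory=list)

    def append(self, payload: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        offset = len(self._records)
        self._records.append((offset, time.time(), payload))
        return offset

    def read_from(self, offset: int) -> List[tuple]:
        """Return every record at or after the given offset, in order."""
        return self._records[offset:]


log = AppendOnlyLog()
log.append({"name": "cpu.user_perc", "value": 12.5})
log.append({"name": "cpu.user_perc", "value": 90.1})

# Two independent readers keep their own offsets and see the same order.
reader_a = [payload["value"] for _, _, payload in log.read_from(0)]
reader_b = [payload["value"] for _, _, payload in log.read_from(1)]
```

Because nothing is ever modified in place, a reader that falls behind can simply resume from its last offset.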
From To
Monitoring Architecture
Kafka
• A performant, distributed, durable, publish/subscribe messaging and stream
processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a "consumer group" as a consumer
• Microservices "subscribe" to topics and consume metrics/logs and events
• Each consumer group receives its own copy of every message
• Messages are load-balanced across all consumers in a consumer group
• Can add/remove micro-services to handle load or mitigate problems
• As micro-services expand/contract the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications,
grouping notifications and other areas
• Always accept data, never drop data, true elasticity
• Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
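The load-balancing and re-balancing behaviour of a consumer group can be pictured with a simplified model. This sketch does not use a Kafka client and does not reproduce Kafka's actual assignment strategies; the function `assign_partitions` and the consumer names are assumptions made for illustration.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the consumers in one group."""
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(6))

# Three consumers in one group share the topic's six partitions.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])

# One consumer leaves; its partitions are redistributed to the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-3"])
```

Every partition always has exactly one owner within the group, which is why adding or removing a microservice instance changes the share of load each instance handles.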
CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally:
1. Command side ordering the system to update state
2. Query side that gets information without changing state
• Advantages
• Decouples the read/write load. Allows each to be scaled independently
• Read store can be optimized for the query pattern of the application
• Reference
• Event sourcing, CQRS, stream processing and Apache Kafka
• https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
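A minimal sketch of the command/query split, assuming a plain Python list stands in for the message bus. The class names `MetricCommands` and `MetricQueries` are hypothetical and do not correspond to Monasca components.

```python
from collections import defaultdict
from typing import Dict, List


class MetricCommands:
    """Command side: accepts writes and appends them to an event stream."""

    def __init__(self, events: List[dict]) -> None:
        self._events = events

    def post_metric(self, name: str, value: float) -> None:
        self._events.append({"name": name, "value": value})


class MetricQueries:
    """Query side: a read model built from events, never written to directly."""

    def __init__(self) -> None:
        self._by_name: Dict[str, List[float]] = defaultdict(list)

    def apply(self, event: dict) -> None:
        self._by_name[event["name"]].append(event["value"])

    def average(self, name: str) -> float:
        values = self._by_name[name]
        return sum(values) / len(values)


events: List[dict] = []  # stands in for a topic on the message bus
commands = MetricCommands(events)
queries = MetricQueries()

commands.post_metric("cpu.user_perc", 80.0)
commands.post_metric("cpu.user_perc", 90.0)

# A separate consumer feeds the read model from the event stream.
for event in events:
    queries.apply(event)

avg = queries.average("cpu.user_perc")
```

The write path only appends, and the read path only answers questions, so each side can be scaled or re-implemented without touching the other.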
Microservices
• Microservices are small, autonomous, decoupled services that are
deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other, and be
deployed by themselves without requiring consumers to change
• Benefits:
• Resilience
• Scale
• Ease of deployment
• Organizational Alignment
• Optimized for Change/Replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes
or greater
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other
components/microservices
• Note, Monasca can also be deployed in a single-node, non-clustered configuration
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0,
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional flexible description
• Name (string)
• Dimensions: Dictionary of user-defined (key, value)
pairs that are used to uniquely identify a metric
• Timestamp in milliseconds, and a numeric value
• Value meta: Optional dictionary of user-defined (key, value)
pairs that can be used to describe a measurement
• Normally used for errors and messages
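The following sketch builds a metric body in the shape shown on this slide and illustrates that the name plus dimensions identify a metric, while value_meta only annotates a single measurement. The helper names `build_metric` and `metric_identity` are hypothetical, and the use of wall-clock time for the timestamp is an assumption.

```python
import json
import time
from typing import Dict, Optional


def build_metric(name: str, dimensions: Dict[str, str], value: float,
                 value_meta: Optional[Dict[str, str]] = None) -> dict:
    """Assemble a metric body with name, dimensions, timestamp and value."""
    metric = {
        "name": name,
        "dimensions": dimensions,
        "timestamp": int(time.time() * 1000),  # milliseconds
        "value": value,
    }
    if value_meta:
        metric["value_meta"] = value_meta
    return metric


def metric_identity(metric: dict) -> tuple:
    """Name plus sorted dimensions identify a metric; value_meta does not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


ok = build_metric("http_status", {"service": "compute", "cluster": "c1"}, 0.0)
failed = build_metric(
    "http_status",
    {"cluster": "c1", "service": "compute"},
    1.0,
    value_meta={"status_code": "500", "msg": "Internal server error"},
)

body = json.dumps(failed)  # what would be sent as the POST body
same_series = metric_identity(ok) == metric_identity(failed)
```

Both measurements belong to the same series even though only one of them carries value_meta.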
Push vs Pull
• Monitoring-as-a-Service
• Can't always pull due to firewalls and network issues
• Low-latency: sub-second latency difficult for pull model
• Doesn't require service discovery and registration
• As entities are deployed, they can start sending metrics without having to be
discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is
unreachable.
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
• Note, auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification:
• https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries timeseries DB for measurements and statistics
• Queries Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in-memory and does batch writes to the TSDB, based on
batch size or time, to optimize write performance
• At-least-once message delivery semantics:
• No metrics or alarm state transition events are lost
• The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition event
• Note, duplicates are possible
• HA/fault-tolerance:
• Multiple persisters run simultaneously and balance load
• If a persister fails, the load is automatically re-balanced across the remaining
persisters.
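The ordering that gives at-least-once delivery (write the batch first, advance the offset second) can be sketched as follows. This is not the Monasca persister code: the class `BatchingPersister`, its parameters, and the in-memory store are assumptions, and the time-based flush mentioned above is omitted for brevity.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, dict]  # (offset, metric)


class BatchingPersister:
    """Hypothetical persister loop: buffer, write a batch, then commit."""

    def __init__(self, write_batch: Callable[[List[dict]], None], batch_size: int) -> None:
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._buffer: List[Message] = []
        self.committed_offset = -1  # last offset known to be stored

    def handle(self, message: Message) -> None:
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # If the write raises, the offset is not advanced, so the same
        # messages would be consumed again later (duplicates are possible).
        self._write_batch([metric for _, metric in self._buffer])
        self.committed_offset = self._buffer[-1][0]
        self._buffer.clear()


stored: List[dict] = []  # stands in for the time series database
persister = BatchingPersister(write_batch=stored.extend, batch_size=2)

for offset in range(3):
    persister.handle((offset, {"name": "cpu.user_perc", "value": float(offset)}))
```

After the loop, two metrics are stored and committed, and the third waits in the buffer until the next flush.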
Time Series Databases
• Used for storing:
• Metrics
• Alarm state history
• Two databases supported:
1. Vertica
• Enterprise class, proprietary, closed-source, clustered, HA, analytics database
• Excels at time-series
2. InfluxDB
• Open-source single-node time-series DB
• Clustering is closed-source
• Note, can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all "transactional" data for Monasca such as
• Alarm Definitions
• Alarms
• Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time stream processing, clustered and highly available
threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the
alarm definition
• Evaluates whether metrics exceed threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
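To make the idea of simple and compound expressions concrete, here is a small sketch that evaluates an expression such as `avg(cpu.user_perc) > 85 or avg(disk.read_ops) > 1000` over one window of measurements. It is not the Storm-based engine; the helper `sub_expression`, the function table, and the sample data are assumptions.

```python
import operator
from statistics import mean
from typing import Callable, Dict, List

FUNCTIONS: Dict[str, Callable[[List[float]], float]] = {
    "avg": mean, "min": min, "max": max, "sum": sum, "count": len,
}
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}


def sub_expression(function: str, op: str, threshold: float) -> Callable[[List[float]], bool]:
    """Return a predicate over the measurements in one evaluation window."""
    def evaluate(measurements: List[float]) -> bool:
        return OPERATORS[op](FUNCTIONS[function](measurements), threshold)
    return evaluate


# Two simple sub-expressions combined into a compound one with "or".
cpu_high = sub_expression("avg", ">", 85)
disk_busy = sub_expression("avg", ">", 1000)

window = {"cpu.user_perc": [80.0, 92.0, 95.0], "disk.read_ops": [300.0, 450.0]}
in_alarm = cpu_high(window["cpu.user_perc"]) or disk_busy(window["disk.read_ops"])
```

Here the CPU average is 89, so the compound expression is true even though the disk sub-expression is not.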
Notification Engine
• Consumes "alarm state transition events" from Kafka produced by the
Threshold Engine
• Evaluates whether notifications should be sent based on actions specified
in the alarm definition.
• OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
• Dynamic plugins supported
• Supports both "one-shot" and "periodic" notifications
• If sending to the notification address fails, then notification is published to
retry topic in Kafka, and retried later
• Grouping notifications: In progress
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca
micro-services
• Well-defined schema is published at:
• https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
• Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
• Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
• Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
• Name and dimensions to filter by
• Start_time and end_time
• Offset and limit
• merge_metrics: allow multiple metrics to be combined into a single list
of measurements.
• group_by: list of columns to group the metrics to be returned. Allows
multiple unique metrics to be returned in a single query.
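A sketch of how a client might assemble a measurements query from these parameters. The endpoint host is hypothetical, the helper `measurements_url` is not part of any Monasca client, and the `key:value,key:value` encoding of dimensions is an assumption that should be checked against the API specification.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def measurements_url(base_url: str, name: str, start_time: str,
                     dimensions: Optional[Dict[str, str]] = None,
                     end_time: Optional[str] = None,
                     limit: Optional[int] = None,
                     merge_metrics: bool = False) -> str:
    """Build a GET URL for the measurements resource."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: key1:value1,key2:value2
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items()))
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = str(limit)
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base_url}/v2.0/metrics/measurements?{urlencode(params)}"


url = measurements_url(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    name="cpu.user_perc",
    start_time="2016-11-01T00:00:00Z",
    dimensions={"service": "monitoring", "hostname": "devstack"},
    limit=100,
    merge_metrics=True,
)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlparse(url).query)
```

A real request would also carry a Keystone auth token, which is left out here.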
Statistics
GET /v2.0/metrics/statistics
• Query parameters
• Name and dimensions to filter by
• Start_time and end_time
• Statistics: avg, min, max, sum and count
• Period: The time period to aggregate measurements by
• Offset, limit
• merge_metrics: allow multiple metrics to be combined into a single list
of statistics
• group_by: list of columns to group the metrics to be returned. Allows
multiple unique metrics to be returned in a single query.
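The effect of the period parameter can be illustrated by bucketing measurements client-side. This is only a sketch of the aggregation idea, not how the API or the time series database computes statistics; the function `statistics_by_period` and the sample data are assumptions.

```python
from typing import Dict, List, Tuple


def statistics_by_period(measurements: List[Tuple[int, float]],
                         period: int) -> Dict[int, Dict[str, float]]:
    """Group (timestamp_seconds, value) pairs into fixed periods and aggregate."""
    buckets: Dict[int, List[float]] = {}
    for timestamp, value in measurements:
        start = timestamp - (timestamp % period)  # start of the period
        buckets.setdefault(start, []).append(value)
    return {
        start: {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


measurements = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (120, 5.0)]
per_minute = statistics_by_period(measurements, period=60)
```

With a 60-second period the five measurements collapse into three rows, one per minute.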
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
• Dimensions
• Offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• List the dimension names
• Query parameters
• Metric name
• Offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• List the dimension values
• Query parameters
• Metric name
• Dimension name
• Offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and
dynamically create alarms based on matching metric names and
dimensions
• One alarm definition can result in zero or more alarms.
• Simple grammar for creating compound alarm expressions:
• avg(cpu.user_perc{}) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management, alarm_lifecycle_state and link.
List Alarms
GET /v2.0/alarms
Query parameters:
• metric_name - Name of metric to filter by
• metric_dimensions
• State: OK, ALARM or UNDETERMINED.
• Severity: One or more severities to filter by, separated with |,
ex. severity=LOW|MEDIUM
• state_updated_start_time : The start time in ISO 8601 combined date and
time format in UTC.
• Offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms created by the Threshold Engine based on matching alarm
definitions.
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and
lifecycle.
• By default, three states: OK, ALARM and UNDETERMINED
• UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms, two states: OK and ALARM
• Used for systems where metrics are sporadic. E.g. metrics are created when errors occur in log
files, and no metrics are sent when there aren't any errors.
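The difference between the default and the deterministic behaviour can be sketched as a small function. The function `alarm_state` is hypothetical, the use of an average against a threshold is just one example expression, and treating "no measurements" as OK for deterministic alarms is an assumption based on the description above.

```python
from typing import List


def alarm_state(measurements: List[float], threshold: float,
                deterministic: bool = False) -> str:
    """Return OK, ALARM or UNDETERMINED for one evaluation window."""
    if not measurements:
        # No data: deterministic alarms are assumed to stay OK,
        # all others become UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(measurements) / len(measurements)
    return "ALARM" if average > threshold else "OK"


silent_default = alarm_state([], threshold=85)
silent_deterministic = alarm_state([], threshold=85, deterministic=True)
busy = alarm_state([90.0, 95.0], threshold=85)
```

For a metric that is only emitted when a log error occurs, the deterministic variant avoids a stream of UNDETERMINED transitions during quiet periods.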
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or
UNDETERMINED state, and their severities, grouped by
metrics dimension, such as OpenStack service, state and
severity.
• Used for summary dashboards
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query Parameters:
• Dimensions to filter on
• Start/end timestamp
• Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions.
Example:
POST /v2.0/notification-methods {
"name":"Name of notification method",
"type":"EMAIL",
"address":"john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
• MySQL, Kafka, and many others
• Application metrics
• Built-in Statsd daemon
• Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
• HTTP status checks and response times
• System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the
size of the http request body (number of metrics) sent to the
Monasca API.
• The Agent requests an auth token from the Keystone Identity service,
which is supplied on all requests.
• The Monasca Agent and API cache auth tokens in memory to reduce
the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent
will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics
to be POSTed to the metrics endpoint
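The buffering behaviour of the forwarder can be sketched as below. This is not the Monasca Agent implementation: the class `Forwarder`, the bounded buffer that drops the oldest metrics when full, and the fake `send` function are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


class Forwarder:
    """Hypothetical forwarder: batch metrics and keep them if sending fails."""

    def __init__(self, send: Callable[[List[dict]], None], max_buffered: int = 1000) -> None:
        self._send = send
        # Assumption: when the buffer is full, the oldest metrics are dropped.
        self._pending: Deque[dict] = deque(maxlen=max_buffered)

    def submit(self, metric: dict) -> None:
        self._pending.append(metric)

    def flush(self) -> bool:
        """Send everything buffered in one request; return True on success."""
        if not self._pending:
            return True
        try:
            self._send(list(self._pending))
        except ConnectionError:
            return False  # keep the metrics for the next attempt
        self._pending.clear()
        return True


sent: List[List[dict]] = []
api = {"up": False}  # simulated connectivity to the API


def send(batch: List[dict]) -> None:
    if not api["up"]:
        raise ConnectionError("API unreachable")
    sent.append(batch)


forwarder = Forwarder(send)
forwarder.submit({"name": "cpu.user_perc", "value": 10.0})
first_try = forwarder.flush()   # fails, the metric stays buffered
forwarder.submit({"name": "cpu.user_perc", "value": 11.0})
api["up"] = True
second_try = forwarder.flush()  # both metrics go out in one request
```

Batching several metrics into one request body is also what keeps the number of HTTP round trips low during normal operation.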
Grafana/Monasca Integration
• Datasource: A datasource that can be added to the Grafana
dashboard to enable Monasca
• https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
• https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4.
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single http request
• Global / local / mixed dimensions
• Similar to dimensions in metrics.
• JSON only
• Specification
• https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries not done via API, but via Tenantized version of Kibana
• https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    { "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log" } },
    { "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log" } }
  ]
}
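One way to read the mixed global/local dimensions in this model is that each log entry inherits the request-level dimensions and may override or extend them. The sketch below applies that reading; the function `resolve_log_dimensions` is hypothetical, and the rule that local values win over global ones is an assumption to verify against the Log API specification.

```python
from typing import List


def resolve_log_dimensions(request: dict) -> List[dict]:
    """Merge request-level (global) dimensions into each log entry."""
    global_dimensions = request.get("dimensions", {})
    resolved = []
    for entry in request["logs"]:
        # Assumption: local dimensions take precedence over global ones.
        dimensions = {**global_dimensions, **entry.get("dimensions", {})}
        resolved.append({"message": entry["message"], "dimensions": dimensions})
    return resolved


request = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

resolved = resolve_log_dimensions(request)
```

Under this reading, msg1 ends up tagged with the compute service while msg2 keeps the global monitoring service and only adds its own path.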
Log Agents
• Logstash
• https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
• https://github.com/python-beaver/python-beaver/pull/406
• Logspout: Under Investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin:
• https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: In the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics.
• Currently based on Apache Spark Streaming.
• Use Cases:
• Object Storage Disk Capacity
• Object Storage Capacity
• Compute Host Capacity
• VM Capacity
• More to come
• Metrics are aggregated and published every hour.
• Currently in deployment in HPE Helion OpenStack 4.0.
• OpenStack project/repo
• https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc).
• Features include:
• Algorithmic flow definition, enabling sharing of complex algorithmic recipes
• Thin orchestration layer that instantiates an execution environment.
• Focused on:
• Anomaly detection
• Reducing alert fatigue via alarm clustering (unsupervised machine learning).
• Example algorithms: One Class SVM and LiNGAM.
• Status: Under Development
• OpenStack project/repo
• https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications:
• Monasca and Grafana are currently deployed in a production private cloud
• Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboard
• 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec
• FIWARE Lab:
• http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack/
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
• Supported and tested up to 65K metrics/sec ingest rates.
• Fujitsu:
• FUJITSU Software ServerView Cloud Monitoring Manager.
• NEC:
• Planning to include Monasca in "Cloud Solution Menus" solution.
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215,370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent Plugins
• Docker plugin
• cAdvisor plugin
• Kubernetes plugin: Monitors both Kubernetes control plane and containers
• Prometheus client plugin: Scrapes apps
• Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers, such as Kubernetes
• Grouping notifications

Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 


OSMC 2016 | Monasca: Monitoring-as-a-Service (at-Scale) by Roland Hochmuth

  • 2. Speaker Roland Hochmuth Hewlett Packard Enterprise Fort Collins, Colorado, USA
  • 3. Agenda • Describe how to build a highly scalable monitoring and logging as a service platform • Architectural and design principles • Scale, HA • Provide an overview of Monasca • Features • API • Demo
  • 4. What is Monitoring-as-a-Service? • A Monitoring or Logging solution deployed as Software-as-a-Service • E.g. CloudWatch, Datadog, New Relic, Librato, Loggly and many others • First-class, preferably RESTful HTTP API • Authentication • Multi-tenancy • Provides self-provisioning to users/tenants of the service • Designed to be highly reliable and operate at scale • Historically run by an operations team doing web services
  • 5. What is OpenStack? • OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources • Open-source alternative to AWS, Microsoft Azure, Google Cloud and other cloud services • Deployed in both public and private clouds
  • 6. What is Monasca? • Open-source Monitoring/Logging-as-a-Service platform for OpenStack • Authentication currently via OpenStack Identity Service (Keystone) • Microservices message-bus based architecture • First-class RESTful API • Push-based metrics • Consolidates Operational Monitoring, Monitoring-as-a-Service, Metering & Billing and more • Designed for elastic cloud environments/deployments • High-availability / clustering built-in • Horizontally scalable and vertically 4 tiered/layered architecture • Capable of long-term data retention to address metering, SLA, capacity planning, trend analysis, post-hoc RCA, and other use cases • Extensible and Composable
  • 7. The Log • The Log: What every software engineer should know about real-time data's unifying abstraction • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying • Log: An append-only, totally-ordered sequence of records ordered by time
  • 9. Kafka • A performant, distributed, durable, publish/subscribe messaging and stream processing system • Metrics, logs and events are published to topics in Kafka • Microservices register in a "consumer group" as a consumer • Microservices "subscribe" to topics and consume metrics/logs and events • Messages are replicated per consumer group • Messages are load-balanced across all consumers in a consumer group • Can add/remove microservices to handle load or mitigate problems • As microservices expand/contract the partitions are automatically re-balanced • At-least-once semantic guarantees on message delivery • Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas • Always accept data, never drop data, true elasticity • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
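The re-balancing behaviour described on the Kafka slide can be illustrated with a minimal sketch. It is a pure-Python simulation of round-robin partition assignment within one consumer group, not Kafka's actual assignor; the function name `assign_partitions` and the consumer names are hypothetical.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of one group.

    Illustration only: real Kafka clients negotiate the assignment
    through the group coordinator.
    """
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = defaultdict(list)
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return dict(assignment)


partitions = list(range(6))
# Three hypothetical persister instances share the topic's partitions.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])
# One instance is removed; its partitions move to the remaining members.
after = assign_partitions(partitions, ["persister-1", "persister-3"])
print(before)
print(after)
```

Every partition is still owned by exactly one consumer after the group shrinks, which is what lets microservices be added or removed without losing messages.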
  • 10. CQRS • Command Query Responsibility Segregation (CQRS) • CQRS involves splitting an application into two parts internally: 1. Command side ordering the system to update state 2. Query side that gets information without changing state • Advantages • Decouples the read/write load. Allows each to be scaled independently • Read store can be optimized for the query pattern of the application • Reference • Event sourcing, CQRS, stream processing and Apache Kafka • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
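As a rough illustration of the command/query split, the sketch below keeps the write path (appending to a log) separate from a read model that is built by replaying that log. The class and method names are hypothetical and the in-memory list only stands in for a message bus.

```python
class CommandSide:
    """Write path: only appends events describing state changes."""

    def __init__(self, log):
        self.log = log

    def record_measurement(self, name, value):
        self.log.append({"name": name, "value": value})


class QuerySide:
    """Read path: a view derived from the log, optimized for lookups."""

    def __init__(self):
        self.latest = {}
        self.position = 0  # index of the next unread log entry

    def catch_up(self, log):
        for event in log[self.position:]:
            self.latest[event["name"]] = event["value"]
        self.position = len(log)

    def get_latest(self, name):
        return self.latest.get(name)


log = []
commands = CommandSide(log)
queries = QuerySide()
commands.record_measurement("cpu.user_perc", 42.0)
commands.record_measurement("cpu.user_perc", 57.5)
queries.catch_up(log)
print(queries.get_latest("cpu.user_perc"))
```

Because the query side only reads the log, additional read models can be added or scaled without touching the write path.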
  • 11. Microservices • Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application • Communication between services occurs via a network • Services need to be able to change independently of each other, and be deployed by themselves without requiring consumers to change • Benefits: • Resilience • Scale • Ease of deployment • Organizational Alignment • Optimized for Change/Replaceability
  • 14. Deployment Models (HA/Scale) • Many ways to deploy Monasca • Typically deployed in a clustered/HA configuration using three nodes or greater • If any node or microservice fails, the cluster remains operational • Partitions in Kafka are redistributed among the remaining components • Preferably, the database is run on a separate layer from the other components/microservices • Note, Monasca can also be deployed on a single node in a non-clustered configuration • Has also been containerized and run in Kubernetes
  • 15. Metrics Model POST /v2.0/metrics { "name": "http_status", "dimensions": { "url": "http://host.domain.com:1234/service", "cluster": "c1", "control_plane": "ccp", "service": "compute" }, "timestamp": 0, /* milliseconds */ "value": 1.0, "value_meta": { "status_code": 500, "msg": "Internal server error" } } • Simple, concise, multi-dimensional flexible description • Name (string) • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric • Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement • Normally used for errors and messages
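A minimal sketch of building the request body shown on the metrics model slide. The helper `build_metric` and its checks are hypothetical, not part of any Monasca client library; only the field names come from the slide.

```python
import json
import time


def build_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Return a dict shaped like the metric on the slide (hypothetical helper)."""
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # The slide notes that the timestamp is in milliseconds.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        # Optional: describes this one measurement, e.g. an error message.
        metric["value_meta"] = dict(value_meta)
    return metric


metric = build_metric(
    "http_status",
    {"url": "http://host.domain.com:1234/service", "cluster": "c1",
     "control_plane": "ccp", "service": "compute"},
    1.0,
    value_meta={"status_code": 500, "msg": "Internal server error"},
    timestamp_ms=0,
)
body = json.dumps(metric)  # would be sent as the body of POST /v2.0/metrics
print(body)
```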
  • 16. Push vs Pull • Monitoring-as-a-Service • Can't always pull due to firewalls and network issues • Low-latency: sub-second latency difficult for pull model • Doesn't require service discovery and registration • As entities are deployed, they can start sending metrics without having to be discovered or registered • Events • Temporary caching/buffering of metrics/events while service unreachable.
  • 17. Monasca API • Primary point for pushing metrics and handling queries • Authenticates all requests against the Keystone identity service • Note, auth tokens are cached to reduce the load on Keystone • Resources: Metrics, Alarm Definitions, Alarms and Notification Methods • API Specification: • https://github.com/openstack/monasca-api/tree/master/docs • Horizontally scalable • Publishes metrics to Kafka • Queries timeseries DB for measurements and statistics • Queries Config DB for alarms, alarm definitions and notification methods
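The token caching mentioned on the API slide could look roughly like the sketch below. The `TokenCache` class, its time-to-live and the `validate` callable (standing in for a call to Keystone) are all assumptions made for illustration, not the actual Monasca or Keystone middleware.

```python
import time


class TokenCache:
    """Cache token validation results for a fixed time-to-live."""

    def __init__(self, validate, ttl_seconds=300, clock=time.monotonic):
        self.validate = validate  # hypothetical stand-in for a Keystone lookup
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # token -> (expiry, tenant)

    def lookup(self, token):
        now = self.clock()
        entry = self.entries.get(token)
        if entry is not None and entry[0] > now:
            return entry[1]  # served from cache, no identity-service call
        tenant = self.validate(token)
        self.entries[token] = (now + self.ttl_seconds, tenant)
        return tenant


calls = []


def fake_validate(token):
    calls.append(token)
    return "tenant-a"


now = [0.0]
cache = TokenCache(fake_validate, ttl_seconds=300, clock=lambda: now[0])
cache.lookup("abc")
cache.lookup("abc")   # second request hits the cache
now[0] = 301.0
cache.lookup("abc")   # entry expired, validated again
print(len(calls))
```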
  • 18. Persister • Consumes both metrics and alarm state transition events from Kafka • Stores temporarily in-memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance • At-least-once message delivery semantics: • No metrics or alarm state transition events are lost • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event • Note, duplicates are possible • HA/fault-tolerance: • Multiple persisters run simultaneously and balance load • If a persister fails, the load is automatically re-balanced across the remaining persisters.
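The batching and offset handling described on the Persister slide can be sketched as follows. This is a simplified in-memory model: `BatchingPersister` and `write_batch` are hypothetical names, the time-based flush is omitted, and `committed_offset` only stands in for a Kafka consumer offset commit.

```python
class BatchingPersister:
    """Buffer records and advance the offset only after a successful write."""

    def __init__(self, write_batch, batch_size=3):
        self.write_batch = write_batch  # stores a list of records; may raise
        self.batch_size = batch_size
        self.buffer = []                # (offset, record) pairs not yet stored
        self.committed_offset = -1      # last offset known to be stored

    def consume(self, offset, record):
        self.buffer.append((offset, record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # If this raises, the offset is not advanced, so the same messages
        # are read again after a restart: at-least-once, duplicates possible.
        self.write_batch([record for _, record in self.buffer])
        self.committed_offset = self.buffer[-1][0]
        self.buffer.clear()


stored = []
persister = BatchingPersister(stored.extend, batch_size=2)
for offset, value in enumerate([0.5, 0.7, 0.9]):
    persister.consume(offset, {"name": "cpu.user_perc", "value": value})
print(len(stored), persister.committed_offset, len(persister.buffer))
```

Committing before the write would give at-most-once behaviour instead, at the cost of losing a batch when the database write fails.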
  • 19. Time Series Databases • Used for storing: • Metrics • Alarm state history • Two databases supported: 1. Vertica • Enterprise class, proprietary, closed-source, clustered, HA, analytics database • Excels at time-series 2. InfluxDB • Open-source single-node time-series DB • Clustering is closed-source • Note, can replicate to multiple instances of InfluxDB using Kafka • Investigating support for additional databases
  • 20. Config Database • Stores all "transactional" data for Monasca such as • Alarm Definitions • Alarms • Notification Methods • MySQL and Postgres supported • Typically deployed in a clustered or HA configuration
  • 21. Threshold Engine • Near real-time stream processing, clustered and highly available threshold engine • Based on Apache Storm • Consumes metrics from Kafka • Creates alarms based on metrics that match patterns specified in the alarm definition • Evaluates whether metrics exceed threshold • Publishes alarm state transition events to Kafka • Supports both simple and compound alarm expressions
  • 22. Notification Engine • Consumes "alarm state transition events" from Kafka produced by the Threshold Engine • Evaluates whether notifications should be sent based on actions specified in the alarm definition. • OK, ALARM and UNDETERMINED actions • Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA • Dynamic plugins supported • Supports both "one-shot" and "periodic" notifications • If sending to the notification address fails, then notification is published to retry topic in Kafka, and retried later • Grouping notifications: In progress
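The retry behaviour on the Notification Engine slide can be sketched with in-memory queues standing in for Kafka topics. The function `process_notifications`, the `retry_count` field and the retry limit are assumptions for illustration, not the engine's actual message schema.

```python
from collections import deque


def process_notifications(pending, send, retry_topic, max_retries=3):
    """Send each pending notification; failed ones go to the retry queue."""
    sent = []
    while pending:
        notification = pending.popleft()
        try:
            send(notification)
            sent.append(notification)
        except ConnectionError:
            attempts = notification.get("retry_count", 0) + 1
            if attempts <= max_retries:
                # Re-published so that it can be retried later.
                retry_topic.append({**notification, "retry_count": attempts})
    return sent


def flaky_send(notification):
    # Hypothetical transport: one address is unreachable.
    if notification["address"] == "down.example":
        raise ConnectionError("unreachable")


pending = deque([
    {"type": "WEBHOOK", "address": "ok.example", "state": "ALARM"},
    {"type": "WEBHOOK", "address": "down.example", "state": "ALARM"},
])
retry_topic = deque()
sent = process_notifications(pending, flaky_send, retry_topic)
print(len(sent), list(retry_topic))
```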
  • 23. Kafka Message Schema • JSON messages published/consumed to/from Kafka by Monasca micro-services • Well-defined schema is published at: • https://wiki.openstack.org/wiki/Monasca/Message_Schema
  • 24. Metrics • Create, query and get statistics for metrics • GET, POST /v2.0/metrics • GET /v2.0/metrics/names • Returns the unique metric names • GET /v2.0/metrics/dimensions/names • Returns the unique dimension names • GET /v2.0/metrics/dimensions/names/values • Returns the unique dimension values
  • 25. Measurements GET /v2.0/metrics/measurements • Returns a list of measurements • Query parameters • Name and dimensions to filter by • Start_time and end_time • Offset and limit • merge_metrics: allow multiple metrics to be combined into a single list of measurements. • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query.
  • 26. Statistics GET /v2.0/metrics/statistics • Query parameters • Name and dimensions to filter by • Start_time and end_time • Statistics: avg, min, max, sum and count • Period: The time period to aggregate measurements by • Offset, limit • merge_metrics: allow multiple metrics to be combined into a single list of statistics • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query.
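To make the `period` and `statistics` parameters concrete, the sketch below aggregates a list of measurements into fixed periods. It is a local illustration of the idea, not the API's implementation: the function name, the row layout and the choice to align buckets to multiples of the period are assumptions.

```python
from collections import defaultdict


def statistics(measurements, period, functions=("avg", "min", "max", "sum", "count")):
    """Aggregate (timestamp_seconds, value) pairs into buckets of `period` seconds."""
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[int(timestamp // period) * period].append(value)
    rows = []
    for start in sorted(buckets):
        values = buckets[start]
        computed = {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        rows.append([start] + [computed[name] for name in functions])
    return rows


measurements = [(0, 10.0), (30, 20.0), (60, 40.0)]
rows = statistics(measurements, period=60)
print(rows)
```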
  • 27. Metrics Names GET /v2.0/metrics/names • Returns a list of the unique metric names • Query parameters • Dimensions • Offset, limit
  • 28. Metric Dimension Names GET /v2.0/metrics/dimensions/names • List the dimension names • Query parameters • Metric name • Offset, limit
  • 29. Metric Dimension Values GET /v2.0/metrics/dimensions/names/values • List the dimension values • Query parameters • Metric name • Dimension name • Offset, limit
  • 30. Alarm Definitions POST, GET /v2.0/alarm-definitions • Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions • One alarm definition can result in zero or more alarms. • Simple grammar for creating compound alarm expressions: • avg(cpu.user_perc{}) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000 • Alarm states (OK, ALARM and UNDETERMINED) • Actions associated with alarms for state transitions • User assigned severity (LOW, MEDIUM, HIGH, CRITICAL) • Thresholds can be dynamically adjusted via PATCH • Minimal lifecycle management, alarm_lifecycle_state and link.
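The alarm expression grammar on the slide can be explored with a small sketch. It handles only the shape shown there (`function(metric{dimensions}, period) operator threshold`, joined by `or`) using a regular expression; it is not Monasca's parser, and the helper names are hypothetical.

```python
import operator
import re

SUB_EXPR = re.compile(
    r"(?P<func>avg|min|max|sum|count)\("
    r"(?P<metric>[\w.]+)\{(?P<dims>[^}]*)\}"
    r"(?:,\s*(?P<period>\d+))?\)\s*"
    r"(?P<op>>=|<=|>|<)\s*(?P<threshold>[\d.]+)"
)
FUNCS = {"avg": lambda v: sum(v) / len(v), "min": min, "max": max,
         "sum": sum, "count": len}
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def parse_sub_expression(text):
    match = SUB_EXPR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {text!r}")
    dims = dict(
        tuple(part.strip() for part in pair.split("=", 1))
        for pair in match["dims"].split(",") if pair.strip()
    )
    return {
        "func": match["func"], "metric": match["metric"], "dimensions": dims,
        # None means the expression did not state a period.
        "period": int(match["period"]) if match["period"] else None,
        "op": match["op"], "threshold": float(match["threshold"]),
    }


def evaluate(expression, values_by_metric):
    """True if any 'or'-joined sub-expression exceeds its threshold."""
    results = []
    for text in expression.split(" or "):
        sub = parse_sub_expression(text)
        aggregate = FUNCS[sub["func"]](values_by_metric[sub["metric"]])
        results.append(OPS[sub["op"]](aggregate, sub["threshold"]))
    return any(results)


expression = "avg(cpu.user_perc{}) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000"
values = {"cpu.user_perc": [80.0, 95.0, 90.0], "disk.read_ops": [10.0, 20.0]}
print(evaluate(expression, values))
```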
  • 31. List Alarms GET /v2.0/alarms Query parameters: • metric_name - Name of metric to filter by • metric_dimensions • State: OK, ALARM or UNDETERMINED. • Severity: One or more severities to filter by, separated with |, ex. severity=LOW|MEDIUM • state_updated_start_time : The start time in ISO 8601 combined date and time format in UTC. • Offset, limit • sort_by
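A query against the alarm list endpoint could be assembled as in the sketch below. The base URL is a placeholder and `list_alarms_url` is a hypothetical helper; only the path and parameter names come from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def list_alarms_url(base_url, severity=None, **filters):
    """Build a GET /v2.0/alarms URL from the given filters (hypothetical helper)."""
    params = {key: value for key, value in filters.items() if value is not None}
    if severity:
        # Several severities are separated with "|", as on the slide.
        params["severity"] = "|".join(severity)
    return f"{base_url}/v2.0/alarms?{urlencode(params)}"


url = list_alarms_url(
    "http://monasca.example:8070",  # placeholder endpoint
    metric_name="cpu.user_perc",
    state="ALARM",
    severity=["LOW", "MEDIUM"],
)
print(url)
print(parse_qs(urlsplit(url).query))
```

`urlencode` percent-encodes the `|` separator, so the server receives `LOW|MEDIUM` after decoding.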
  • 32. Alarms GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id} • Alarms created by the Threshold Engine based on matching alarm definitions. • When new nodes or components are deployed, alarms are automatically created • Alarms are resources within Monasca. They have a resource ID and lifecycle. • By default, three states: OK, ALARM and UNDETERMINED • UNDETERMINED state occurs when metrics are no longer being received • Deterministic alarms, two states: OK and ALARM • Used for systems where metrics are sporadic, e.g. metrics are created when errors occur in log files, and no metrics are created when there are no errors.
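The difference between default and deterministic alarms can be sketched as a small state function. This is a simplification under stated assumptions: a single `max(...) > threshold` check stands in for the alarm expression, and the event field names are hypothetical rather than the published message schema.

```python
def next_state(values, threshold, deterministic=False):
    """Return the alarm state for the values received in the current period."""
    if not values:
        # No metrics received: default alarms become UNDETERMINED,
        # deterministic alarms only ever report OK or ALARM.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if max(values) > threshold else "OK"


def transition_event(alarm_id, old_state, new_state):
    """Return an event only when the state actually changes."""
    if old_state == new_state:
        return None
    return {"alarm_id": alarm_id, "old_state": old_state, "new_state": new_state}


print(next_state([], 0))                       # default alarm, no data
print(next_state([], 0, deterministic=True))   # deterministic alarm, no data
print(transition_event("a1", "OK", next_state([3.0], 0, deterministic=True)))
```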
  • 33. Alarm Counts GET /v2.0/alarms/count • Query the total number of alarms in the OK, ALARM or UNDETERMINED state, and their severities, grouped by metrics dimension, such as OpenStack service, state and severity. • Used for summary dashboards
  • 35. Alarm History GET /v2.0/alarms/state-history • Lists the alarm state history for alarms • Query Parameters: • Dimensions to filter on • Start/end timestamp • Offset, limit GET /v2.0/alarms/{alarm-id}/state-history • Lists the alarm state history for a specific alarm
  • 36. Notification Methods POST, GET, DELETE /v2.0/notification-methods Notification methods are associated with Actions in alarm definitions. Example: POST /v2.0/notification-methods { "name":"Name of notification method", "type":"EMAIL", "address":"john.doe@hp.com" }
  • 37. Monasca Agent • System metrics (cpu, memory, network, filesystem, …) • Service metrics • MySQL, Kafka, and many others • Application metrics • Built-in Statsd daemon • Python monasca-statsd library: Adds support for dimensions • VM system metrics • Open vSwitch metrics • Active checks • HTTP status checks and response times • System up/down checks (ping and ssh) • Runs any Nagios plugin or check_mk • Extensible/Pluggable: Additional services can be easily added
  • 38. Agent details • The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API. • The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests. • The Monasca Agent and API cache auth tokens in-memory to reduce the round-trip authorization requests to Keystone • If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored • Metrics are submitted using an “agent” role, which only allows metrics to be POST’d to the metrics endpoint
  • 39. Grafana/Monasca Integration • Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca • https://github.com/openstack/monasca-grafana-datasource • Keystone authentication • https://github.com/twc-openstack/grafana • Support for Alerting will be added in Grafana 4.
  • 42. Logging API • POST /v3.0/logs • Batch log messages in a single HTTP request • Global / local / mixed dimensions • Similar to dimensions in metrics. • JSON only • Specification • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md • Queries not done via API, but via Tenantized version of Kibana • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • 43. Log Model • { "dimensions": { "hostname":"devstack", "service":"monitoring", "component":"monasca-api" }, "logs":[ { "message":"msg1", "dimensions": { "service":"compute", "component":"nova-api", "path":"/var/log/mysql.log" } }, { "message":"msg2", "dimensions": { "path":"/var/log/monasca/monasca-api.log" } } ] }
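The global, local and mixed dimensions in the log model can be illustrated by expanding a batch into individual log entries. The sketch assumes that a log's own (local) dimensions take precedence over the request-wide (global) ones when keys collide; the slide does not state the precedence, and `expand_logs` is a hypothetical helper.

```python
def expand_logs(request_body):
    """Combine global and per-log dimensions for each log in a batch."""
    global_dimensions = request_body.get("dimensions", {})
    return [
        {
            "message": log["message"],
            # Assumption: local dimensions override global ones with the same key.
            "dimensions": {**global_dimensions, **log.get("dimensions", {})},
        }
        for log in request_body["logs"]
    ]


request_body = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
expanded = expand_logs(request_body)
for entry in expanded:
    print(entry)
```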
  • 44. Log Agents • Logstash • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1 • Beaver • https://github.com/python-beaver/python-beaver/pull/406 • Logspout: Under Investigation
  • 45. Kibana Integration • Keystone authentication support for Kibana • Authentication plugin: • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone • Note: In progress of moving to official OpenStack repo
  • 48. Monasca Transform • A new micro-service in Monasca that aggregates and transforms metrics. • Currently based on Apache Spark Streaming. • Use Cases: • Object Storage Disk Capacity • Object Storage Capacity • Compute Host Capacity • VM Capacity • More to come • Metrics are aggregated and published every hour. • Currently in deployment in HPE Helion OpenStack 4.0. • OpenStack project/repo • https://github.com/openstack/monasca-transform
  • 49. Monasca Analytics • A framework that adds data science tools (parsers, algorithms, etc). • Features include: • Algorithmic flow definition, enabling sharing of complex algorithmic recipes • Thin orchestration layer that instantiates an execution environment. • Focused on: • Anomaly detection • Reducing alert fatigue via alarm clustering (unsupervised machine learning). • Example algorithms: One Class SVM and LiNGAM. • Status: Under Development • OpenStack project/repo • https://github.com/openstack/monasca-analytics
  • 50. Distributions & Deployments • Charter Communications: • Monasca and Grafana are currently deployed in a production private cloud • Monitoring-as-a-Service use cases supported with Grafana as the Visualization Dashboard • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec • FIWARE Lab: • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack/ • Hewlett Packard Enterprise: Cloud System, Helion OpenStack • Supported and tested up to 65K metrics/sec ingest rates. • Fujitsu: • FUJITSU Software ServerView Cloud Monitoring Manager. • NEC: • Planning to include Monasca in "Cloud Solution Menus" solution. • Others
  • 51. Statistics: Mitaka/Newton Release • Organizations: 31 • Contributors: 97 • Commits: 1075 • Reviews: 4080 • Lines of code: 215,370
  • 52. Ecosystem • Hewlett Packard Enterprise • Fujitsu • Charter Communications • NEC • Cisco • Cloudbase Solutions • SUSE • SolidFire • SAP • Cray Inc. • FIWARE Lab • Mirantis • Broadcom
  • 53. Containers and Kubernetes • New Monasca Agent Plugins • Docker plugin • cAdvisor plugin • Kubernetes plugin: Monitors both Kubernetes control plane and containers • Prometheus client plugin: Scrapes apps • Mesos plugin • Containerization of Monasca • Heapster Monasca data sink
  • 54. Next Steps • Containerizing Monasca • Monitoring containers and container managers, such as Kubernetes • Grouping notifications