Monitor everything from physical hardware to application functionality

The IT industry is a diverse and dynamic world where applications and functions may be spread out across, and move between, a multitude of providers and technologies such as Amazon AWS, Rackspace, KVM, volatile containers, and your traditional internal IT infrastructure with physical servers.
Monitoring all of these might require one monitoring tool per platform, or at least a few, to seamlessly blend metrics, events and logs and get true observability of your environment. OP5 intends to address this with Project Omega, designed from the ground up using cloud-native technologies and packaged in a container environment to run on premises or as SaaS, scaling horizontally with Kubernetes.
Initially the focus is on monitoring OpenStack with the Monasca project, developing the agent in and for the community and providing patches and reviews since the Queens release of OpenStack. The stack uses modern REST APIs, a time series database for metrics and message queues based on Kafka, and is being prepared for real-time analysis using Apache Storm.

  1. 1. Monitor everything from physical hardware to application functionality. Welcome to our lavish smorgasbord offering within IT Monitoring. OP5 is the market leader of IT monitoring throughout the Nordic region and in over 50 countries around the world.
  2. 2. Nicolas Seyvet: passionate software developer at OP5 AB. Particular interests are coding, cloud, software engineering and architecture, distributed and scalable systems.
  3. 3. The IT Monitoring Software Solution. From Sweden. For a Global Market. Based on Open Source. OP5 is a Swedish company founded in 2004. The vision was to develop an IT monitoring software solution based on the Open Source project Nagios that would offer an unprecedented user experience: a solution that would be easy to implement, intuitive to work with and provide unparalleled scalability to support clients and their ever-changing business needs. Today, OP5 has grown into an international company with a presence in over 60 countries. Thousands of IT professionals across the world rely daily on solutions from OP5 to monitor their business-critical IT services.
  4. 4. The OP5 product
     Monitor is Nagios. Based on:
     - Checks
     - Plugins
     - BUT static infrastructure
  5. 5. Monitor everything in the data center?
     Infrastructure:
     - Increased number of devices
     - Virtual
     Applications:
     - On-demand deployments (cloud)
     - Ephemeral/moving processes
     - Distributed
     The three Vs of Big Data:
     - Volume
     - Velocity
     - Variety
     Dynamic, complex environment, outpacing humans.
     Average DC -> ~20 000 servers
  6. 6. Monitoring One simple dimension: Dynamicity
  7. 7. Time series
     An event source produces metrics/events over time: multiple series of (timestamp, value) pairs.
     <series name>: (t0, v0) (t1, v1) (t2, v2) (t3, v3) …
  8. 8. Not all sources are created equal
     Lifetimes (over time): long lived, medium lived, ephemeral.
     Sources: physical infrastructure, virtual infrastructure, application layer.
  9. 9. An example
     Let's assume 20 000 servers with 4 micro-services per server:
     → 20 000 + 4 x 20 000 = 100 000 instances
     Assume 100 metrics per instance:
     → 10 000 000 active time series
     Out of which:
     → 2 000 000 are long lived, 8 000 000 are ephemeral
     Add dynamicity and elasticity → 0.001%/s replacement rate:
     → 0.001% * 8 000 000 = 80 new time series/s, ~6 900 000 new time series per day
     Then, add the virtual infrastructure, failures in the DC, new racks, etc.
  10. 10. Monitoring Monasca
  11. 11. Monasca
      Monasca is an open-source, multi-tenant, massively scalable, fault-tolerant monitoring-as-a-service solution.
      Main features:
      - An event-driven architecture.
      - A set of REST APIs for high-speed event processing and querying.
      - A real-time streaming engine (alarms and transformations).
      - An agent (collector) with plugins.
      - A push-based system.
      Part of (but not limited to) the OpenStack family.
  12. 12. OpenStack Key Features
      OpenStack began in 2010 as a joint project between NASA and Rackspace. Open source software for creating private and public clouds (Infrastructure as a Service). Control large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via RESTful APIs.
  13. 13. OpenStack Open Source projects: Monasca (monitoring)
  14. 14. Stack What is Monasca?
  15. 15. The clients
      - Users, Horizon Dashboard, Grafana Dashboard: GET/POST to the Monasca API (query, create/define alarms and notifications)
      - Monasca Agent: pushes to the Monasca API
      - Keystone: authentication/authorization → multi-tenancy
  16. 16. The core
      Monasca API, publish/subscribe, Data/Event Bus.
      Kafka is an open-source, massively scalable pub-sub message queue:
      - horizontally scalable
      - fault-tolerant
      - high throughput (>100K to millions of events/s)
      - at-least-once guarantee
  17. 17. The backend
      Components around the Data/Event Bus (publish/subscribe): Monasca API, Configuration, Persister (TSDB, Logs/Events), Streaming Engine (Threshold, Transform, Anomaly), Notification Engine.
      Threshold engine: what to monitor in real time (alarms).
      Transform engine: from raw to smart data.
  18. 18. The Monasca stack
      Clients: Users, Horizon Dashboard, Grafana Dashboard (GET/POST), Monasca Agent (push); Auth. via Keystone.
      Core: Monasca API, Data/Event Bus (publish/subscribe).
      Backend: Configuration, Persister (TSDB, Logs/Events), Streaming Engine (Threshold, Transform, Anomaly), Notification Engine.
  19. 19. Stack Two benefits: Extensibility and “what?”
  20. 20. Easy to extend
      Event-driven architecture: My Function/App subscribes to the Data/Event Bus (publish/subscribe) alongside the Persister, Streaming Engine, Notification Engine, ...
  21. 21. Highest level: What to alarm on?
      Domain Specific Language (DSL):
      <expression> ::= <sub_expression> [(and | or) <sub_expression>]*
      Where a sub-expression:
      <sub_expression> ::= <function> '(' <metric> [',' period] ')' <operator> threshold_value ['times' periods]
      function: min, max, sum, avg, count, last
      Example:
      avg(disk.space_used_perc{hostname=compute_node_1}) >= 99 and count(log.error{hostname=compute_node_1,component=kafka},deterministic) >= 1
  22. 22. Stack In conclusion
  23. 23. To sum up:
      - Built for self-healing and elasticity (horizontal scalability)
      - Can handle billions of time series at high throughput
      - Multi-tenant
      - Extensible
      - DSL to monitor what matters
      - Can combine different sources (metrics/events/logs)
      Built on top of Kubernetes, runs on AWS, OpenStack and VMware.
      OP5 Monasca:
      $ # Deploy in one line
      $ helm install op5_monasca
  24. 24. Questions?
      Nicolas Seyvet, Backend Engineer. Twitter: @NicolasSeyvet
      OP5 HQ, Norgegatan 2, SE-164 32 Kista, Sweden. Call us: +46 (0)8 58 83 01 00
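
The back-of-the-envelope estimate on slide 9 can be reproduced in a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the replacement rate is an assumption. The slide's figures of 80 new time series per second and roughly 6.9 million per day correspond to a rate of 0.001% per second (0.01% per second would give 800 per second), so that is the value used here.

```python
from fractions import Fraction


def estimate_series(servers, services_per_server, metrics_per_instance, churn_per_second):
    """Rough time series estimate; all names here are illustrative only."""
    # Each server and each micro-service counts as one monitored instance.
    instances = servers + services_per_server * servers
    active = instances * metrics_per_instance
    # Series attached to physical servers are treated as long lived,
    # series attached to micro-services as ephemeral.
    long_lived = servers * metrics_per_instance
    ephemeral = active - long_lived
    new_per_second = ephemeral * churn_per_second
    return {
        "instances": instances,
        "active": active,
        "long_lived": long_lived,
        "ephemeral": ephemeral,
        "new_per_second": new_per_second,
        "new_per_day": new_per_second * 86_400,  # seconds in a day
    }


# Assumed churn: 0.001% of the ephemeral series replaced every second.
estimate = estimate_series(20_000, 4, 100, Fraction(1, 100_000))
for key, value in estimate.items():
    print(f"{key}: {int(value):,}")
```

Exact fractions are used for the rate so that the result is not blurred by floating-point rounding; changing the last argument shows how quickly the daily number of new series grows with churn.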
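
To make the alarm DSL from slide 21 more concrete, the following sketch evaluates such an expression against a handful of in-memory measurements. It is not Monasca's threshold engine: the regular expression, the `evaluate` helpers and the sample data are all hypothetical, the optional `period` argument and the `times` clause are accepted by the grammar on the slide but not handled here, and `and` is assumed to bind tighter than `or`.

```python
import operator
import re

FUNCTIONS = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "count": len,
    "last": lambda values: values[-1],
}
OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

# function '(' metric ['{' dimensions '}'] [',' extra argument] ')' operator threshold
SUB_EXPR = re.compile(
    r"\s*(?P<func>\w+)\(\s*(?P<metric>[\w.]+)(?:\{(?P<dims>[^}]*)\})?"
    r"(?:\s*,\s*[^)]*)?\)\s*(?P<op>>=|<=|>|<)\s*(?P<threshold>-?\d+(?:\.\d+)?)\s*"
)


def parse_dims(text):
    """Turn 'a=b,c=d' into {'a': 'b', 'c': 'd'}."""
    if not text:
        return {}
    return {k.strip(): v.strip() for k, v in (p.split("=", 1) for p in text.split(","))}


def evaluate_sub(sub_expression, series):
    match = SUB_EXPR.fullmatch(sub_expression)
    if match is None or match["func"] not in FUNCTIONS:
        raise ValueError(f"cannot parse sub-expression: {sub_expression!r}")
    wanted = parse_dims(match["dims"])
    # Collect values from every series with the right name and matching dimensions.
    values = [
        value
        for s in series
        if s["name"] == match["metric"] and wanted.items() <= s["dimensions"].items()
        for value in s["values"]
    ]
    if not values and match["func"] != "count":
        return False  # nothing measured, so nothing to alarm on in this sketch
    result = FUNCTIONS[match["func"]](values)
    return OPERATORS[match["op"]](result, float(match["threshold"]))


def evaluate(expression, series):
    # 'or' separates clauses; every sub-expression joined by 'and' must hold.
    return any(
        all(evaluate_sub(sub, series) for sub in re.split(r"\s+and\s+", clause))
        for clause in re.split(r"\s+or\s+", expression.strip())
    )


# Made-up sample measurements.
series = [
    {"name": "disk.space_used_perc", "dimensions": {"hostname": "compute_node_1"},
     "values": [99.0, 99.5, 100.0]},
    {"name": "log.error", "dimensions": {"hostname": "compute_node_1", "component": "kafka"},
     "values": [1]},
]
ALARM = ("avg(disk.space_used_perc{hostname=compute_node_1}) >= 99 and "
         "count(log.error{hostname=compute_node_1,component=kafka},deterministic) >= 1")
alarm_triggered = evaluate(ALARM, series)
print(alarm_triggered)
```

The example expression is the one from the slide; with the sample data both sub-expressions hold, so the alarm fires.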
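
The extensibility argument from slides 16 to 20 (everything talks through a publish/subscribe bus, so a new function is just another subscriber) can be illustrated with a tiny in-memory stand-in. This is a toy, not Kafka and not Monasca code: the `EventBus` class, the topic names `metrics` and `alarms`, the fixed threshold rule and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a publish/subscribe bus (not Kafka)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber of the topic receives every event.
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


bus = EventBus()
stored = []                  # plays the role of the persister's time series database
alarms = []                  # plays the role of the notification engine's outbox
per_host = defaultdict(int)  # state of the new "My Function/App"


def threshold_engine(event):
    # Hypothetical fixed rule; publishes an alarm event back onto the bus.
    if event["name"] == "disk.space_used_perc" and event["value"] >= 99:
        bus.publish("alarms", {"alarm": "disk almost full", "hostname": event["hostname"]})


def count_per_host(event):
    per_host[event["hostname"]] += 1


bus.subscribe("metrics", stored.append)
bus.subscribe("metrics", threshold_engine)
bus.subscribe("alarms", alarms.append)
# The extension: one more subscriber, no change to the existing components.
bus.subscribe("metrics", count_per_host)

bus.publish("metrics", {"name": "disk.space_used_perc", "hostname": "compute_node_1", "value": 99.5})
bus.publish("metrics", {"name": "cpu.idle_perc", "hostname": "compute_node_1", "value": 80.0})
print(len(stored), alarms, dict(per_host))
```

A real deployment would get durability, partitioning and at-least-once delivery from Kafka; the point of the sketch is only that producers never need to know who consumes their events.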