Monasca (monasca.io) is a turn-key, open-source OpenStack Monitoring-as-a-Service platform that supports authentication and multi-tenancy via the OpenStack Keystone Identity Service. Monasca is a highly scalable, performant, and fault-tolerant Monitoring-as-a-Service solution that supports push-based streaming metrics, health/status, alarming/thresholding, and notifications. Logging-as-a-Service is under development, and the goal is to provide a comprehensive, integrated monitoring solution for OpenStack clouds that covers metrics, events, and logs.
3. Agenda
• Describe how to build a highly scalable monitoring and logging as a
service platform
• Architectural and design principles
• Scale, HA
• Provide an overview of Monasca
• Features
• API
• Demo
4. What is Monitoring-as-a-Service?
• A Monitoring or Logging solution deployed as Software-as-a-Service
• E.g. CloudWatch, Datadog, New Relic, Librato, Loggly and many others
• First-class, preferably RESTful HTTP API
• Authentication
• Multi-tenancy
• Provides self-provisioning to users/tenants of the service
• Designed to be highly reliable and operate at scale
• Historically run by an operations team doing web services
5. What is OpenStack?
• OpenStack is a cloud operating system that controls large pools of
compute, storage, and networking resources
• Open-source alternative to AWS, Microsoft Azure, Google Cloud and
other cloud services
• Deployed in both public and private clouds
6. What is Monasca?
• Open-source Monitoring/Logging-as-a-Service platform for OpenStack
• Authentication currently via OpenStack Identity Service (Keystone)
• Microservices message-bus based architecture
• First-class RESTful API
• Push-based metrics
• Consolidates Operational Monitoring, Monitoring-as-a-Service, Metering &
Billing and more
• Designed for elastic cloud environments/deployments
• High-availability / clustering built-in
• Horizontally scalable, with a four-tiered/layered architecture
• Capable of long-term data retention to address metering, SLA, capacity
planning, trend analysis, post-hoc RCA, and other use cases
• Extensible and Composable
7. The Log
• The Log: What every software engineer should know about real-time data's
unifying abstraction
• https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: An append-only, totally-ordered sequence of records ordered by time
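The log abstraction described above can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration (not Monasca or Kafka code): an append-only, totally-ordered sequence where each record gets an offset, and consumers resume reading from an offset.

```python
class Log:
    """A minimal append-only, totally-ordered sequence of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record; return its offset (its position in the total order)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return all records from the given offset onward."""
        return self._records[offset:]


log = Log()
log.append({"metric": "cpu", "value": 0.3})   # offset 0
log.append({"metric": "cpu", "value": 0.9})   # offset 1

# A consumer that has already processed offset 0 resumes at offset 1:
print(log.read(1))  # [{'metric': 'cpu', 'value': 0.9}]
```

Real log systems (Kafka, BookKeeper) add partitioning, replication, and durable storage on top of exactly this contract.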
9. Kafka
• A performant, distributed, durable, publish/subscribe messaging and stream
processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a "consumer group" as a consumer
• Microservices "subscribe" to topics and consume metrics/logs and events
• Messages are delivered to each consumer group independently (each group gets its own copy of the stream)
• Messages are load-balanced across all consumers in a consumer group
• Can add/remove micro-services to handle load or mitigate problems
• As micro-services expand/contract the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications,
grouping notifications and other areas
• Always accept data, never drop data, true elasticity
• Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
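The load-balancing and rebalancing behaviour described above can be illustrated with a small pure-Python sketch (not Monasca code). Round-robin assignment is used here as an approximation of Kafka's built-in partition assignors:

```python
# Illustrative sketch: how the partitions of a topic are spread across the
# consumers in a consumer group, and how the assignment is rebalanced when
# a consumer (e.g. a failed microservice) leaves the group.

def assign_partitions(num_partitions, consumers):
    """Round-robin partition assignment (an approximation of Kafka's
    range/round-robin assignors)."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# Three persisters consuming a six-partition metrics topic:
print(assign_partitions(6, ["persister-1", "persister-2", "persister-3"]))
# One persister fails; the remaining two pick up its partitions:
print(assign_partitions(6, ["persister-1", "persister-2"]))
```

Each partition is consumed by exactly one member of a group, which is why adding consumers spreads the load and removing one triggers a rebalance rather than data loss.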
10. CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally:
1. Command side ordering the system to update state
2. Query side that gets information without changing state
• Advantages
• Decouples the read/write load. Allows each to be scaled independently
• Read store can be optimized for the query pattern of the application
• Reference
• Event sourcing, CQRS, stream processing and Apache Kafka
• https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
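The split between the two sides can be sketched in a few lines of Python (illustrative only, not Monasca code): the command side appends state-changing events to an append-only log (the role Kafka plays in Monasca), and the query side folds that log into a read model optimized for queries.

```python
# Minimal CQRS sketch: commands append events; queries read a projection.

event_log = []        # command side: append-only, totally ordered
alarm_states = {}     # query side: read model derived from the log

def handle_command(alarm_id, new_state):
    """Command side: record the state transition; never mutate reads."""
    event_log.append({"alarm_id": alarm_id, "state": new_state})

def project():
    """Query side: fold the event log into a current-state read model."""
    for event in event_log:
        alarm_states[event["alarm_id"]] = event["state"]

handle_command("a1", "ALARM")
handle_command("a1", "OK")
project()
print(alarm_states["a1"])  # read model reflects the latest event
```

Because the two sides only share the log, each can be scaled and stored independently, which is the advantage listed above.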
11. Microservices
• Microservices are small, autonomous, decoupled services that are
deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other, and be
deployed by themselves without requiring consumers to change
• Benefits:
• Resilience
• Scale
• Ease of deployment
• Organizational Alignment
• Optimized for Change/Replaceability
14. Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes
or greater
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other
components/microservices
• Note, Monasca can also be deployed in a single-node, non-clustered configuration
• Has also been containerized and run in Kubernetes
15. Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
• Name (string)
• Dimensions: dictionary of user-defined (key, value)
pairs that are used to uniquely identify a metric
• Value meta: optional dictionary of user-defined (key, value)
pairs that can be used to describe a measurement
• Normally used for errors and messages
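The model above can be built programmatically. A minimal sketch in Python, assuming a current timestamp; actually sending it requires a Keystone auth token in the X-Auth-Token header (covered later), so the HTTP call is only shown as a comment:

```python
import json
import time

# Build the POST /v2.0/metrics body shown on this slide.
metric = {
    "name": "http_status",
    "dimensions": {
        "url": "http://host.domain.com:1234/service",
        "cluster": "c1",
        "control_plane": "ccp",
        "service": "compute",
    },
    "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
    "value": 1.0,
    "value_meta": {"status_code": 500, "msg": "Internal server error"},
}
body = json.dumps(metric)
# e.g. requests.post(api_url + "/v2.0/metrics", data=body,
#                    headers={"X-Auth-Token": token,
#                             "Content-Type": "application/json"})
print(body)
```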
16. Push vs Pull
• Monitoring-as-a-Service
• Can't always pull due to firewalls and network issues
• Low-latency: sub-second latency difficult for pull model
• Doesn't require service discovery and registration
• As entities are deployed, they can start sending metrics without having to be
discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is
unreachable.
17. Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
• Note, auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification:
• https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries timeseries DB for measurements and statistics
• Queries Config DB for alarms, alarm definitions and notification methods
18. Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in-memory and does batch writes to the TSDB, based on
batch size or time, to optimize write performance
• At-least once message delivery semantics:
• No metrics or alarm state transition events are lost
• The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition event
• Note, duplicates are possible
• HA/fault-tolerance:
• Multiple persisters run simultaneously and balance load
• If a persister fails, the load is automatically re-balanced across the remaining
persisters.
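The batch-by-size-or-time logic above can be sketched as follows (an illustration, not the persister's actual code; the flush callback stands in for the TSDB write, after which the real persister commits its Kafka offset):

```python
import time

class BatchingPersister:
    """Buffer metrics in memory; flush to the TSDB when either the batch
    size or the maximum batch age is reached (at-least-once delivery:
    the Kafka offset is only committed after a successful flush)."""

    def __init__(self, max_batch, max_seconds, flush_fn):
        self.max_batch = max_batch
        self.max_seconds = max_seconds
        self.flush_fn = flush_fn          # writes one batch to the TSDB
        self.buffer = []
        self.last_flush = time.monotonic()

    def consume(self, metric):
        self.buffer.append(metric)
        age = time.monotonic() - self.last_flush
        if len(self.buffer) >= self.max_batch or age >= self.max_seconds:
            self.flush_fn(self.buffer)    # on success: commit Kafka offset
            self.buffer = []
            self.last_flush = time.monotonic()

flushed = []
p = BatchingPersister(max_batch=3, max_seconds=60, flush_fn=flushed.append)
for i in range(7):
    p.consume({"name": "cpu.user_perc", "value": float(i)})
print(len(flushed), len(p.buffer))  # prints "2 1": two batches flushed, one metric buffered
```

If the process dies before a flush, the uncommitted offset means those metrics are re-consumed on restart, which is why duplicates are possible but loss is not.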
19. Time Series Databases
• Used for storing:
• Metrics
• Alarm state history
• Two databases supported:
1. Vertica
• Enterprise class, proprietary, closed-source, clustered, HA, analytics database
• Excels at time-series
2. InfluxDB
• Open-source single-node time-series DB
• Clustering is closed-source
• Note, can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
20. Config Database
• Stores all "transactional" data for Monasca such as
• Alarm Definitions
• Alarms
• Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
21. Threshold Engine
• Near real-time stream processing, clustered and highly available
threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the
alarm definition
• Evaluates whether metrics exceed threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
22. Notification Engine
• Consumes "alarm state transition events" from Kafka produced by the
Threshold Engine
• Evaluates whether notifications should be sent based on actions specified
in the alarm definition.
• OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
• Dynamic plugins supported
• Supports both "one-shot" and "periodic" notifications
• If sending to the notification address fails, then notification is published to
retry topic in Kafka, and retried later
• Grouping notifications: In progress
23. Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca
micro-services
• Well-defined schema is published at:
• https://wiki.openstack.org/wiki/Monasca/Message_Schema
24. Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names:
• Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
• Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
• Returns the unique dimension values
25. Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
• Name and dimensions to filter by
• Start_time and end_time
• Offset and limit
• merge_metrics: allow multiple metrics to be combined into a single list
of measurements.
• group_by: list of columns to group the metrics to be returned. Allows
multiple unique metrics to be returned in a single query.
26. Statistics
GET /v2.0/metrics/statistics
• Query parameters
• Name and dimensions to filter by
• Start_time and end_time
• Statistics: avg, min, max, sum and count
• Period: The time period to aggregate measurements by
• Offset, limit
• merge_metrics: allow multiple metrics to be combined into a single list
of statistics
• group_by: list of columns to group the metrics to be returned. Allows
multiple unique metrics to be returned in a single query.
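The query parameters above combine into a single URL. A sketch, using illustrative values (a real request also needs the X-Auth-Token header):

```python
from urllib.parse import urlencode

# Build a GET /v2.0/metrics/statistics query: average and max CPU usage
# per hour over one day, for one host.
params = {
    "name": "cpu.user_perc",
    "dimensions": "hostname:host1",
    "start_time": "2017-01-01T00:00:00Z",
    "end_time": "2017-01-02T00:00:00Z",
    "statistics": "avg,max",
    "period": 3600,  # aggregate measurements into one-hour buckets
}
url = "/v2.0/metrics/statistics?" + urlencode(params)
print(url)
```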
28. Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• List the dimension names
• Query parameters
• Metric name
• Offset, limit
29. Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• List the dimension values
• Query parameters
• Metric name
• Dimension name
• Offset, limit
30. Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and
dynamically create alarms based on matching metric names and
dimensions
• One alarm definition can result in zero or more alarms.
• Simple grammar for creating compound alarm expressions:
• avg(cpu.user_perc{}) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management, alarm_lifecycle_state and link.
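Putting the pieces together, an alarm-definition body using the compound expression from the slide might look like this (a sketch; the notification-method IDs are placeholders, and `match_by` controls which dimension values produce distinct alarms):

```python
import json

# Build a POST /v2.0/alarm-definitions body with a compound expression.
alarm_definition = {
    "name": "High CPU or disk read rate",
    "expression": "avg(cpu.user_perc{}) > 85 or "
                  "avg(disk.read_ops{device=vda}, 120) > 1000",
    "match_by": ["hostname"],   # one alarm per unique hostname dimension
    "severity": "HIGH",
    "alarm_actions": ["<notification-method-id>"],
    "ok_actions": ["<notification-method-id>"],
}
body = json.dumps(alarm_definition)
# POST this body to /v2.0/alarm-definitions with an X-Auth-Token header
print(body)
```

Because alarms are created dynamically from matching metrics, this one definition yields a separate alarm for every hostname that reports `cpu.user_perc` or `disk.read_ops` metrics.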
31. List Alarms
GET /v2.0/alarms
Query parameters:
• metric_name - Name of metric to filter by
• metric_dimensions
• State: OK, ALARM or UNDETERMINED.
• Severity: One or more severities to filter by, separated with |,
ex. severity=LOW|MEDIUM
• state_updated_start_time : The start time in ISO 8601 combined date and
time format in UTC.
• Offset, limit
• sort_by
32. Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms created by the Threshold Engine based on matching alarm
definitions.
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and
lifecycle.
• By default, three states: OK, ALARM and UNDETERMINED
• UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms, two states: OK and ALARM
• Used for systems where metrics are sporadic, e.g. metrics are created when
errors occur in log files, and no metrics are sent when there are no errors.
33. Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or
UNDETERMINED state, and their severities, grouped by
metrics dimension, such as OpenStack service, state and
severity.
• Used for summary dashboards
35. Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query Parameters:
• Dimensions to filter on
• Start/end timestamp
• Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
36. Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions.
Example:
POST /v2.0/notification-methods {
"name":"Name of notification method",
"type":"EMAIL",
"address":"john.doe@hp.com"
}
37. Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
• MySQL, Kafka, and many others
• Application metrics
• Built-in Statsd daemon
• Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
• HTTP status checks and response times
• System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added
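Conceptually, the monasca-statsd extension adds a dimensions dictionary on top of a standard statsd metric. The sketch below is purely illustrative: the line format with appended dimensions is an approximation for explanation, not the actual monasca-statsd wire protocol or library API.

```python
# Illustrative only: a plain statsd counter line with dimensions appended,
# to show what dimension support adds over standard statsd.

def statsd_line(name, value, metric_type="c", dimensions=None):
    """Format an approximate statsd-style line; dimensions are sorted so
    the same metric always serializes identically."""
    line = f"{name}:{value}|{metric_type}"
    if dimensions:
        tags = ",".join(f"{k}:{v}" for k, v in sorted(dimensions.items()))
        line += f"|#{{{tags}}}"
    return line

print(statsd_line("http.requests", 1,
                  dimensions={"service": "compute", "cluster": "c1"}))
# prints "http.requests:1|c|#{cluster:c1,service:compute}"
```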
38. Agent details
• The Agent Forwarder buffers metrics for a short time to increase the
size of the http request body (number of metrics) sent to the
Monasca API.
• The Agent requests an auth token from the Keystone Identity service,
which is supplied on all requests.
• The Monasca Agent and API cache auth tokens in-memory to reduce
the round-trip authorization requests to Keystone
• If network connectivity between the Agent and the API is lost, the Agent
will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics
to be POST'd to the metrics endpoint
39. Grafana/Monasca Integration
• Datasource: A datasource that can be added to the Grafana
dashboard to enable Monasca
• https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
• https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4.
42. Logging API
• POST /v3.0/logs
• Batch log messages in a single http request
• Global / local / mixed dimensions
• Similar to dimensions in metrics.
• JSON only
• Specification
• https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API, but via a tenant-aware (tenantized) version of Kibana
• https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
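A batched POST /v3.0/logs body combines global and local dimensions as described above. A sketch with illustrative values (field names follow the linked spec; treat the exact shape as an assumption, and note the request needs an X-Auth-Token header):

```python
import json

# Build a batched POST /v3.0/logs body: global dimensions apply to every
# log entry in the batch, local dimensions to a single entry.
payload = {
    "dimensions": {"hostname": "host1", "service": "compute"},  # global
    "logs": [
        {"message": "Instance spawned"},
        {"message": "Disk read error",
         "dimensions": {"device": "vda"}},                       # local
    ],
}
body = json.dumps(payload)
print(body)
```

Batching many log messages into one HTTP request is what keeps the per-message overhead low, mirroring what the Agent Forwarder does for metrics.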
45. Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin:
• https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: In progress of moving to official OpenStack repo
48. Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics.
• Currently based on Apache Spark Streaming.
• Use Cases:
• Object Storage Disk Capacity
• Object Storage Capacity
• Compute Host Capacity
• VM Capacity
• More to come
• Metrics are aggregated and published every hour.
• Currently in deployment in HPE Helion OpenStack 4.0.
• OpenStack project/repo
• https://github.com/openstack/monasca-transform
49. Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc).
• Features include:
• Algorithmic flow definition, enabling sharing of complex algorithmic recipes
• Thin orchestration layer that instantiates an execution environment.
• Focused on:
• Anomaly detection
• Reducing alert fatigue via alarm clustering (unsupervised machine learning).
• Example algorithms: One Class SVM and LiNGAM.
• Status: Under Development
• OpenStack project/repo
• https://github.com/openstack/monasca-analytics
50. Distributions & Deployments
• Charter Communications:
• Monasca and Grafana are currently deployed in a production private cloud
• Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboard
• 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec
• FIWARE Lab:
• http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack/
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
• Supported and tested up to 65K metrics/sec ingest rates.
• Fujitsu:
• FUJITSU Software ServerView Cloud Monitoring Manager.
• NEC:
• Planning to include Monasca in "Cloud Solution Menus" solution.
• Others