Three Pillars with Zero Answers
A New Observability Scorecard
March 5, 2020
● Katia Bazzi
● Software Engineer @ LightStep
● katia@lightstep.com
● lightstep.com
Part I
A Critique
Observing microservices is hard
Google and Facebook solved this (right???)
They used Metrics, Logging, and Distributed Tracing…
So we should, too.
The Conventional Wisdom
The Three Pillars of Observability
- Metrics
- Logging
- Distributed Tracing
Metrics!
Logging!
Tracing!
Fatal Flaws
A word nobody knew in 2015…
Dimensions (aka “tags”) can explain variance in timeseries data (aka “metrics”) …
… but cardinality is where the costs bite.
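To make the cardinality point concrete, here is a minimal sketch (Python; the tag names and value counts are invented for illustration) of how each added dimension multiplies the number of distinct timeseries a metrics backend has to store:

```python
# Illustrative only: tag names and value counts are made up, not from the talk.
# Each added dimension multiplies the number of distinct timeseries a metrics
# backend must store (and often bill for) per metric name.
from math import prod

tag_value_counts = {
    "service":     50,       # microservices emitting the metric
    "endpoint":    40,       # routes per service
    "status_code": 10,
    "region":      5,
    "customer_id": 100_000,  # the "high-cardinality" tag that breaks the budget
}

distinct_timeseries = prod(tag_value_counts.values())
print(f"{distinct_timeseries:,} distinct timeseries for one metric name")
# -> 10,000,000,000
```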
Logging Data Volume: a reality check
transaction rate
x all microservices
x cost of net+storage
x weeks of retention
-----------------------
way too much $$$$
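A back-of-envelope version of that multiplication, using assumed rates and prices rather than figures from the talk:

```python
# Back-of-envelope only: every number below is an assumption, not a quote.
transactions_per_sec  = 2_000   # peak transaction rate
services_per_txn      = 20      # microservices each transaction touches
log_bytes_per_service = 1_000   # ~1 KB of log lines per service hop
retention_weeks       = 4
usd_per_gb_month      = 0.50    # blended network + storage cost (assumed)

seconds_per_week = 60 * 60 * 24 * 7
bytes_per_week   = (transactions_per_sec * seconds_per_week
                    * services_per_txn * log_bytes_per_service)
retained_tb      = bytes_per_week * retention_weeks / 1e12
monthly_usd      = retained_tb * 1_000 * usd_per_gb_month  # TB -> GB

print(f"~{retained_tb:,.0f} TB retained, roughly ${monthly_usd:,.0f}/month")
# -> ~97 TB retained, roughly $48,000/month, before any indexing or replication
```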
The Life of Transaction Data: Dapper
Stage                        | Overhead affects…          | Retained
Instrumentation executed     | App                        | 100.00%
Buffered within app process  | App                        |   0.10%
Flushed out of process       | App                        |   0.10%
Centralized regionally       | Regional network + storage |   0.10%
Centralized globally         | WAN + storage              |   0.01%
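The 0.1% comes from Dapper-style head-based sampling: the keep/drop decision is made once at the root of a trace and propagated downstream, so almost all span data is discarded in-process. A rough illustrative sketch, not Dapper's actual code:

```python
# Illustrative sketch of Dapper-style head-based sampling, not Dapper's code.
# The keep/drop decision is made once at the root of the trace and propagated,
# so ~99.9% of span data is discarded in-process before it costs anything
# beyond the instrumentation itself.
import random

SAMPLE_RATE = 1 / 1024  # on the order of Dapper's default for busy services

def start_trace() -> dict:
    return {
        "trace_id": random.getrandbits(64),
        "sampled": random.random() < SAMPLE_RATE,  # decided once, at the root
    }

def record_span(ctx: dict, name: str, duration_ms: float) -> None:
    if not ctx["sampled"]:
        return  # dropped in-process: no buffering, flushing, or network cost
    flush_to_collector(ctx["trace_id"], name, duration_ms)

def flush_to_collector(trace_id: int, name: str, duration_ms: float) -> None:
    # stand-in for the real out-of-process reporting path
    print(f"trace {trace_id:016x}: {name} took {duration_ms:.1f} ms")

ctx = start_trace()
record_span(ctx, "GET /checkout", 212.0)  # usually a no-op; ~1 in 1024 prints
```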
                                        | Logs | Metrics | Dist. Traces
TCO scales gracefully                   |  –   |    ✓    |      ✓
Accounts for all data (i.e., unsampled) |  ✓   |    ✓    |      –
Immune to cardinality                   |  ✓   |    –    |      ✓
Fatal Flaws: A Review
Data vs UI: Metrics
Data vs UI: Logs
Data vs UI: Traces
Metrics, Logs, and Traces are
Just Data,
… not a feature or use case.
Part II
A New Scorecard
for Observability
Mental Model: “Goals” and “Activities”
Goals: how our services perform in the eyes of their consumers
Activities: what we (as operators) actually do to further our goals
“SLI” = “Service Level Indicator”
TL;DR: An SLI is an indicator of health that a service’s consumers would care about.
… not an indicator of its inner workings.
Quick Vocab Refresher: SLIs
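A concrete, hypothetical example of an SLI in code; the endpoint, threshold, and window are invented for illustration:

```python
# Hypothetical example: an SLI for a checkout endpoint, defined as the fraction
# of requests in the window that succeeded and finished under 500 ms.
# Endpoint, threshold, and window length are invented for illustration.
def checkout_latency_sli(requests) -> float:
    """requests: iterable of (latency_ms, http_status) for, say, a 5-minute window."""
    requests = list(requests)
    if not requests:
        return 1.0
    good = sum(1 for latency_ms, status in requests
               if status < 500 and latency_ms < 500)
    return good / len(requests)

print(checkout_latency_sli([(120, 200), (480, 200), (800, 200), (95, 503)]))
# -> 0.5: two of four requests were "good" in the consumer's eyes.
# Note: GC pause time or queue depth are NOT SLIs; consumers only feel
# those indirectly, through latency and errors.
```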
Observability: Two Fundamental Goals
- Gradually improving an SLI (days, weeks, months…)
- Rapidly restoring an SLI (NOW!!!!)
Reminder: “SLI” = “Service Level Indicator”
1. Detection: measuring SLIs precisely
2. Refinement: reducing the search space for plausible explanations
Observability: Two Fundamental Activities
An interlude about stats frequency
Specificity:
- Cost of cardinality ($ per tag value)
- Stack support (mobile/web platforms, managed services, “black-box OSS infra” like Kafka/Cassandra)
Fidelity:
- Correct stats!!! (global p95, p99)
- High stats frequency (stats sampling frequency, in seconds)
Freshness (lag from real-time, in seconds)
Scorecard: Detection
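The fidelity and freshness rows are easy to motivate with a small example (made-up latencies): a short spike that is obvious at per-second resolution nearly vanishes in 60-second rollups.

```python
# Made-up latencies: a 5-second incident inside 5 minutes of per-second data.
import numpy as np

latency_ms = np.full(300, 100.0)   # 5 minutes of per-second measurements
latency_ms[120:125] = 2_000.0      # a 5-second spike to 2000 ms

per_second_peak = latency_ms.max()                        # 2000 ms, obvious
per_minute_avgs = latency_ms.reshape(5, 60).mean(axis=1)  # 60-second rollups
print(per_second_peak, per_minute_avgs.max())
# -> 2000.0 vs ~258 ms: at 60-second stats frequency the incident is smeared
#    into a modest bump, and you also wait up to a minute to see it at all.
```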
# of things your users actually care about
# of microservices
# of failure modes
Must reduce the search space!
Why “Refinement”?
The Refinement Process
Discover Variance
Explain Variance
Deploy Fix
Histograms vs “p99”
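This is the “correct stats” point from the Detection scorecard: per-host p95/p99 values cannot simply be averaged into a global percentile. An illustrative sketch with synthetic latencies:

```python
# Synthetic latencies, illustrative only. Two hosts serve the same endpoint
# with very different distributions; averaging their per-host p99s is not
# the global p99.
import numpy as np

rng = np.random.default_rng(0)
host_a = rng.exponential(scale=50.0,  size=100_000)   # fast host, p99 ~ 230 ms
host_b = rng.exponential(scale=500.0, size=100_000)   # slow host, p99 ~ 2300 ms

averaged_p99 = (np.percentile(host_a, 99) + np.percentile(host_b, 99)) / 2
global_p99   = np.percentile(np.concatenate([host_a, host_b]), 99)

print(f"average of per-host p99s: {averaged_p99:7.0f} ms")
print(f"true global p99:          {global_p99:7.0f} ms")
# Here the averaged figure materially understates the true tail (~1270 vs
# ~1960 ms). Only a mergeable representation (histograms, sketches, or the
# raw events) yields correct global percentiles.
```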
Scorecard: Refinement
Identifying Variance:
- Cardinality ($ per tag value)
- Robust stats (histograms; see previous slide)
- Retention horizons for plausible queries (time duration)
Explaining variance:
- Correct stats!!! (global p95, p99)
- “Suppress the messengers” of microservice failures
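A toy sketch of “explaining variance” with tags; the data, tag names, and release versions are hypothetical. Group latencies by each dimension and see which one separates slow requests from fast ones (real systems compare full distributions, not a single summary statistic):

```python
# Hypothetical data and tag names, for illustration only.
from collections import defaultdict
from statistics import median

# (latency_ms, tags) pairs; in practice these come from traces or events.
requests = [
    (42,  {"release": "v1.7.2", "region": "us-east"}),
    (39,  {"release": "v1.7.2", "region": "us-east"}),
    (45,  {"release": "v1.7.2", "region": "eu-west"}),
    (41,  {"release": "v1.7.2", "region": "eu-west"}),
    (910, {"release": "v1.7.3", "region": "us-east"}),
    (930, {"release": "v1.7.3", "region": "us-east"}),
    (870, {"release": "v1.7.3", "region": "eu-west"}),
    (905, {"release": "v1.7.3", "region": "eu-west"}),
]

def latency_by_tag(requests, tag):
    """Median latency per distinct value of the given tag."""
    groups = defaultdict(list)
    for latency_ms, tags in requests:
        groups[tags[tag]].append(latency_ms)
    return {value: median(lats) for value, lats in groups.items()}

print(latency_by_tag(requests, "release"))
# {'v1.7.2': 41.5, 'v1.7.3': 907.5}  -> the "release" tag explains the variance
print(latency_by_tag(requests, "region"))
# {'us-east': 476.0, 'eu-west': 457.5} -> regions look alike; not the culprit
```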
Wrapping up…
(first, a hint at my perspective)
A fun game!
Design your own observability system:
High-throughput
High-cardinality
Lengthy retention window
Unsampled
Choose three.
(“Observability Whack-a-Mole”)
The Life of Trace Data: Dapper
Stage                        | Overhead affects…          | Retained
Instrumentation executed     | App                        | 100.00%
Buffered within app process  | App                        |   0.10%
Flushed out of process       | App                        |   0.10%
Centralized regionally       | Regional network + storage |   0.10%
Centralized globally         | WAN + storage              |   0.01%
(Review)
The Life of Trace Data: Other Approaches
Stage                        | Overhead affects…          | Retained
Instrumentation executed     | App                        | 100.00%
Buffered within app process  | App                        | 100.00%
Flushed out of process       | App                        | 100.00%
Centralized regionally       | Regional network + storage | 100.00%
Centralized globally         | WAN + storage              | “fancy”
Refinement
- Identifying variance: cardinality cost, correct stats, hi-fi histograms, retention horizons
- “Suppress the messengers”
Detection
- Specificity: cardinality cost, stack coverage
- Fidelity: correct stats, high stats frequency
- Freshness: ≤ 5 seconds
An Observability Scorecard
● Automatic deployment and regression detection
● System and service diagrams
● Real-time and historical root cause analysis
● Correlations
● Custom alerting
● Easy Setup with no vendor lock-in
● No cardinality limitations, really
LightStep: Observability with context
Get Started Today
lightstep.com/play
