Nobody denies the importance of observability in modern production software: with microservices adding scale, concurrency, and frequent deploys, it’s getting harder and harder to answer even basic questions about application behavior. The conventional wisdom holds that metrics, logging, and tracing are “the three pillars” of observability, yet organizations check all three boxes and still find themselves grasping at straws during emergencies. The problem is that metrics, logs, and traces are just data: if what we need is a car, all we’re talking about is the fuel. We will continue to disappoint ourselves until we reframe observability around two fundamental activities: (1) detection and (2) refinement.
In this session, we’ll summarize the contemporary observability dogma, then present a new observability scorecard for objectively reasoning about and assessing observability solutions for modern distributed systems.
3. Observing microservices is hard
Google and Facebook solved this (right???)
They used Metrics, Logging, and Distributed Tracing…
So we should, too.
The Conventional Wisdom
4. The Three Pillars of Observability
- Metrics
- Logging
- Distributed Tracing
10. A word nobody knew in 2015…
Dimensions (aka “tags”) can explain variance in timeseries data (aka “metrics”)…
… but cardinality puts a hard ceiling on how many tags a metrics system can afford (see the sketch below).
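To make the cardinality problem concrete, here is a minimal sketch (my own illustration, with made-up tag counts, not numbers from the deck): every unique combination of tag values becomes its own timeseries, so the series count grows as the *product* of per-tag cardinalities.

```python
# Back-of-envelope sketch of cardinality explosion in a metrics system.
# All tag counts below are hypothetical.
from math import prod

tag_cardinalities = {
    "service": 200,         # microservices
    "endpoint": 50,         # routes per service
    "status_code": 10,
    "customer_id": 10_000,  # the dimension that actually explains variance
}

# Each unique tag-value combination is a distinct timeseries:
series = prod(tag_cardinalities.values())
print(f"{series:,} distinct timeseries for ONE metric")  # 1,000,000,000
```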
11. Logging Data Volume: a reality check
   transaction rate
× all microservices
× cost of net+storage
× weeks of retention
-----------------------
=  way too much $$$$
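A back-of-the-envelope version of that multiplication; every number below is an assumption for illustration, not a figure from the deck.

```python
# Hypothetical inputs for the logging-volume reality check:
tx_per_sec        = 2_000   # transaction rate
microservices     = 100     # services each transaction touches
log_bytes_per_hop = 1_000   # ~1 KB of log data per service hop
weeks_retained    = 4
cost_per_gb       = 0.05    # $ for network + storage, combined

seconds  = weeks_retained * 7 * 24 * 3600
total_gb = tx_per_sec * microservices * log_bytes_per_hop * seconds / 1e9
print(f"{total_gb:,.0f} GB retained -> ${total_gb * cost_per_gb:,.0f}")
# -> 483,840 GB retained -> $24,192 ... and that's one modest month.
```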
12. The Life of Transaction Data: Dapper
Stage                         Overhead affects…            Retained
Instrumentation Executed      App                           100.00%
Buffered within app process   App                             0.10%
Flushed out of process        App                             0.10%
Centralized regionally        Regional network + storage      0.10%
Centralized globally          WAN + storage                   0.01%
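The 100% → 0.1% drop happens at the head of the pipeline. Here is a minimal sketch of Dapper-style head-based sampling (my illustration, assuming the slide’s roughly 1-in-1,000 rate; not Dapper’s actual code): the keep/drop decision is made once, at the root of the trace, before anything interesting is known about it.

```python
import random

SAMPLE_RATE = 0.001  # ~1 in 1,000, per the slide

def start_trace(trace_id: int) -> bool:
    """Decide at instrumentation time whether this trace is retained.
    Note: the decision ignores what the transaction will turn out to do."""
    return random.random() < SAMPLE_RATE

kept = sum(start_trace(i) for i in range(1_000_000))
print(f"kept {kept} of 1,000,000 traces (~0.1%)")
# Everything dropped here is unavailable later, no matter how
# interesting the transaction turns out to be.
```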
13.                                       Logs   Metrics   Dist. Traces
TCO scales gracefully                       –       ✓           ✓
Accounts for all data (i.e., unsampled)     ✓       ✓           –
Immune to cardinality                       ✓       –           ✓
Fatal Flaws: A Review
19. “SLI” = “Service Level Indicator”
TL;DR: An SLI is an indicator of health that a service’s consumers would care about…
… not an indicator of its inner workings.
Observability: Quick Vocab Refresher
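For concreteness, a tiny hypothetical SLI computation (the endpoint, threshold, and data are all assumptions, not from the talk): the fraction of requests that were “good” from the consumer’s point of view.

```python
# Hypothetical request log for a checkout endpoint: (http_status, latency_ms)
requests = [
    (200, 120), (200, 480), (500, 95), (200, 650), (200, 200),
]

# SLI: share of requests that succeeded AND returned within 500 ms.
good = sum(1 for status, ms in requests if status < 500 and ms <= 500)
sli = good / len(requests)
print(f"SLI = {sli:.0%} of requests were good")  # 60%
# Note what is *not* here: CPU, queue depth, GC pauses. Those are
# inner workings, not things the service's consumers care about.
```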
20. Observability: Two Fundamental Goals
- Gradually improving an SLI (days, weeks, months…)
- Rapidly restoring an SLI (NOW!!!!)
Reminder: “SLI” = “Service Level Indicator”
21. 1. Detection: perfect SLI capture
2. Refinement: reduce the search space
Observability: Two Fundamental Activities
23. Specificity:
- Arbitrary dimensionality and cardinality
- Any layer of the stack, including mobile+web!
Fidelity:
- Correct stats!!! (e.g., true percentiles, not averages of percentiles; see the sketch below)
- High stats frequency (i.e., “beware smoothing”!)
Freshness: ≤ 5 second lag
Scorecard >> Detection
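On “correct stats,” a sketch of the classic failure mode (my example, assuming pre-aggregated per-minute percentiles): averaging p99s badly understates a latency burst, while the true percentile over the raw distribution (or a mergeable histogram) does not.

```python
import random
random.seed(0)

# 55 quiet minutes (~100 ms median) plus a 5-minute burst (~1100 ms),
# 1,000 latency samples per minute. All numbers are hypothetical.
minutes = [[random.lognormvariate(4.6, 0.3) for _ in range(1000)]
           for _ in range(55)]
minutes += [[random.lognormvariate(7.0, 0.3) for _ in range(1000)]
            for _ in range(5)]

def p99(samples):
    """99th percentile taken from raw samples."""
    return sorted(samples)[int(0.99 * len(samples)) - 1]

avg_of_p99s = sum(p99(m) for m in minutes) / len(minutes)
true_p99 = p99([x for m in minutes for x in m])
print(f"average of per-minute p99s: {avg_of_p99s:7.0f} ms")
print(f"true hourly p99:            {true_p99:7.0f} ms")  # far higher
```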
24. # of things your users actually care about
# of microservices
# of failure modes
Must reduce the search space!
Scorecard >> Refinement
30. The Life of Transaction Data: Dapper
Stage                         Overhead affects…            Retained
Instrumentation Executed      App                           100.00%
Buffered within app process   App                             0.10%
Flushed out of process        App                             0.10%
Centralized regionally        Regional network + storage      0.10%
Centralized globally          WAN + storage                   0.01%
(Review)
31. The Life of Transaction Data: Dapper → LightStep
Stage                         Overhead affects…            Retained
Instrumentation Executed      App                           100.00%
Buffered within app process   App                           100.00%
Flushed out of process        App                           100.00%
Centralized regionally        Regional network + storage    100.00%
Centralized globally          WAN + storage                 on-demand
32. Detection
- Specificity: unlimited cardinality, across the entire stack
- Fidelity: correct stats, high stats frequency
- Freshness: ≤ 5 seconds

Refinement
- Identifying variance: unlimited cardinality, hi-fi histograms, data retention (sketch below)
- “Suppress the messengers”
An Observability Scorecard
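As one concrete (and entirely hypothetical) instance of “identifying variance” during refinement: given retained trace data, score each tag dimension by how well it separates slow requests from fast ones, then drill into the winner instead of searching every microservice and failure mode.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical retained traces: (latency_ms, tags)
traces = [
    (1200, {"region": "us-east", "customer": "acme", "release": "v2"}),
    (1100, {"region": "us-west", "customer": "acme", "release": "v2"}),
    (90,   {"region": "us-east", "customer": "zeta", "release": "v2"}),
    (110,  {"region": "us-west", "customer": "zeta", "release": "v1"}),
    (95,   {"region": "us-east", "customer": "beta", "release": "v1"}),
]

def spread(dim):
    """Range of mean latency across this tag's values: a crude
    'does this dimension explain the variance?' score."""
    by_value = defaultdict(list)
    for latency, tags in traces:
        by_value[tags[dim]].append(latency)
    means = [mean(v) for v in by_value.values()]
    return max(means) - min(means)

for dim in ("region", "customer", "release"):
    print(dim, round(spread(dim)))
# 'customer' shows by far the largest spread: acme is the slow cohort,
# so that tag value is where the investigation should go next.
```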
37. Ideal Refinement: Real-time
Must be able to test and eliminate hypotheses quickly
- Actual data must be ≤10s fresh
- UI / API latency must be very low