The pervasiveness of cloud and containers has led to systems that are much more distributed and dynamic in nature. Highly elastic microservice and serverless architectures mean containers spin up on demand and scale to zero when that demand goes away. In this world, servers are very much cattle, not pets. This shift has exposed deficiencies in some of the tools and practices we used in the world of servers-as-pets. In particular, it raises the question of how we monitor and debug these types of systems at scale. And with the rise of DevOps and the product mindset, making data-driven decisions is becoming increasingly important for agile development teams.
In this talk, we discuss a new approach to system monitoring and data collection: the observability pipeline. For organizations that are heavily siloed, this approach can help empower teams when it comes to operating their software. The observability pipeline provides a layer of abstraction that lets you get operational data, such as logs and metrics, everywhere it needs to be without impacting developers or the core system. Unlocking this data can also be a huge win for the business, with uses such as auditability, business analytics, and pricing. Lastly, it allows you to change backing data systems easily or to test several of them in parallel. Given the amount of data and the number of tools modern systems demand, we'll see how the observability pipeline becomes just as essential to the operation of a service as the CI/CD pipeline.
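To make the idea concrete, here is a minimal Python sketch of the routing core of an observability pipeline, assuming a simple in-process model: producers hand every event to a single pipeline object, and routes decide which sinks receive it. The `Event`, `Route`, and `ObservabilityPipeline` names are hypothetical and do not correspond to any particular product; a real pipeline (for example one built on Fluentd or the OpenTelemetry Collector) would add buffering, retries, and network transport.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    kind: str                    # e.g. "log" or "metric"
    payload: Dict[str, object]   # arbitrary structured fields


# A sink is anything that accepts an event: a log store, a metrics backend,
# an analytics warehouse, an audit archive, ...
Sink = Callable[[Event], None]


@dataclass
class Route:
    match: Callable[[Event], bool]  # decides whether this route wants the event
    sink: Sink


@dataclass
class ObservabilityPipeline:
    routes: List[Route] = field(default_factory=list)

    def add_route(self, match: Callable[[Event], bool], sink: Sink) -> None:
        self.routes.append(Route(match, sink))

    def emit(self, event: Event) -> int:
        """Fan the event out to every matching sink; return how many received it."""
        delivered = 0
        for route in self.routes:
            if route.match(event):
                route.sink(event)
                delivered += 1
        return delivered
```

For example, `pipeline.add_route(lambda e: e.kind == "log", log_store.append)` plus a second route matching `e.payload.get("audit") is True` sends an audited login event to both stores and an ordinary log line to one. Because producers only ever call `emit`, swapping a backend or trialing two in parallel means changing routes, not application code.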
This document provides an overview of service mesh and the Istio observability tool Kiali. It begins with an introduction to service mesh and what problems it addresses in microservices architectures. Istio is presented as an open source service mesh that provides traffic management, observability, and policy enforcement for microservices. Kiali is specifically discussed as a tool for visualizing the topology and traffic flow of services in an Istio mesh. The rest of the document provides an agenda and then a live demo of Kiali's features using the Bookinfo sample application on Istio.
This document discusses concepts related to observability, including Prometheus, the ELK stack, OpenTracing, and VictoriaMetrics. It provides examples of setting up Prometheus and Grafana to monitor metrics from applications instrumented with exporters. It also demonstrates setting up Filebeat, Logstash, and Elasticsearch (the ELK stack) to collect logs and ship them to Elasticsearch. Additionally, it shows how to implement OpenTracing in a Java application and visualize the traces using Jaeger. Finally, it outlines an exercise to build a microservices e-commerce application that incorporates logging, metrics, and tracing using the discussed tools.
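Tracing only works across services if each hop passes the trace context along. As an illustrative sketch (not the OpenTracing Java API the document uses), the Python below creates span contexts and propagates them through the W3C Trace Context `traceparent` header, which is the format OpenTelemetry propagates by default; other tracers may use different header formats. The `SpanContext`, `new_root_context`, `child_of`, `inject`, and `extract` names are hypothetical helpers, and only the version `00` header is handled.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# version 00, 16-byte trace ID, 8-byte parent span ID, 1-byte flags; all lowercase hex
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in one end-to-end request
    span_id: str   # unique to a single span (one unit of work in one service)
    sampled: bool


def new_root_context() -> SpanContext:
    """Start a new trace, e.g. at the edge service that first receives a request."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)


def child_of(parent: SpanContext) -> SpanContext:
    """A child span keeps the trace ID but gets a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext) -> Dict[str, str]:
    """Serialize the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"}


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; return None if absent or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid under W3C Trace Context
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

In a service, `extract` runs on the incoming request, `child_of` creates the local span, and `inject` adds the header to each downstream call, so that a tracing backend such as Jaeger can group every reported span sharing a trace ID into one trace.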
Improve monitoring and observability for Kubernetes with OSS tools (Nilesh Gule)
Slide deck from the ASEAN Cloud Summit meetup on 27 January 2022. The session covered the following topics:
1 - Centralized Logging with Elasticsearch, Fluent Bit and Kibana
2 - Monitoring and Alerting with Prometheus and Grafana
3 - Exception aggregation with Sentry
The live demo showcased these aspects using Azure Kubernetes Service (AKS).
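Centralized logging with Fluent Bit works best when applications write one structured record per line, because the collector can then parse fields instead of scraping free text. The sketch below assumes a Python service writing to stdout, where a Fluent Bit deployment on Kubernetes would typically pick the lines up from the container log files; it uses only the standard `logging` and `json` modules. The `JsonFormatter` class, the `make_logger` helper, and the field names are illustrative choices, not part of Fluent Bit or Elasticsearch.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),  # applies %-style args to the message
        }
        # Fields passed via extra={"fields": {...}} become top-level JSON keys.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger whose only handler writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]  # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False        # avoid duplicate lines via the root logger
    return logger
```

For example, `make_logger("checkout").info("order placed", extra={"fields": {"order_id": "A-17"}})` writes one JSON line with `order_id` as a top-level key, which Elasticsearch can then index as its own field.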
Observability refers to the ability to infer the internal state of a system from its external outputs. It is a property of the system, not an action like monitoring. For a system to be observable, it must externalize its state through logs, metrics, and events. Improving observability involves monitoring all components of an application, from the front end to backend services to infrastructure. Common metrics include requests processed, errors encountered, and response times for applications, as well as CPU usage, disk I/O, and network traffic for infrastructure. Observability extends monitoring by helping us understand why a system is not working, not just whether it is working.
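The application metrics listed above (requests processed, errors encountered, response times) are often exposed in the Prometheus text exposition format, which is one common way a service externalizes its state. This is a minimal, dependency-free Python sketch under stated assumptions: the metric names, the bucket boundaries, and counting 5xx status codes as errors are illustrative choices, and a production service would normally use an official Prometheus client library instead.

```python
import bisect
from collections import Counter
from typing import Iterable


class RequestMetrics:
    """Requests, errors, and latency for one service, rendered as Prometheus text."""

    def __init__(self, buckets: Iterable[float] = (0.05, 0.1, 0.25, 0.5, 1.0)):
        self.buckets = sorted(buckets)                # histogram upper bounds in seconds
        self.requests = Counter()                     # request count per status code
        self.bucket_counts = [0] * len(self.buckets)  # per-bucket (non-cumulative) counts
        self.latency_sum = 0.0
        self.latency_count = 0

    def observe(self, status: int, seconds: float) -> None:
        """Record one finished request."""
        self.requests[str(status)] += 1
        self.latency_sum += seconds
        self.latency_count += 1
        # Index of the first bound >= seconds; values above every bound
        # are only reflected in the +Inf bucket.
        i = bisect.bisect_left(self.buckets, seconds)
        if i < len(self.buckets):
            self.bucket_counts[i] += 1

    def errors(self) -> int:
        # Assumption for this sketch: any 5xx response counts as an error.
        return sum(n for code, n in self.requests.items() if code.startswith("5"))

    def render(self) -> str:
        lines = [
            "# HELP http_requests_total Total HTTP requests by status code.",
            "# TYPE http_requests_total counter",
        ]
        for code in sorted(self.requests):
            lines.append(f'http_requests_total{{status="{code}"}} {self.requests[code]}')
        lines += [
            "# HELP http_request_duration_seconds Request latency in seconds.",
            "# TYPE http_request_duration_seconds histogram",
        ]
        cumulative = 0
        for bound, n in zip(self.buckets, self.bucket_counts):
            cumulative += n  # each bucket counts every observation <= its bound
            lines.append(f'http_request_duration_seconds_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'http_request_duration_seconds_bucket{{le="+Inf"}} {self.latency_count}')
        lines.append(f"http_request_duration_seconds_sum {self.latency_sum}")
        lines.append(f"http_request_duration_seconds_count {self.latency_count}")
        return "\n".join(lines) + "\n"
```

Histogram buckets are cumulative in this format, which is why the `+Inf` bucket always equals the total request count; Prometheus can then derive error rates and latency percentiles from these series at query time.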
Observability for Modern Applications (CON306-R1) - AWS re:Invent 2018 (Amazon Web Services)
In modern, microservices-based applications, it’s critical to have end-to-end observability of each microservice and the communications between them in order to quickly identify and debug issues. In this session, we cover the techniques and tools to achieve consistent, full-application observability, including monitoring, tracing, logging, and service mesh.
- What are an Internal Developer Portal (IDP) and Platform Engineering?
- What is Backstage?
- How Backstage can help developers build a developer portal that makes their jobs easier
Jirayut Nimsaeng
Founder & CEO
Opsta (Thailand) Co., Ltd.
YouTube recording: https://youtu.be/u_nLbgWDwsA?t=850
Dev Mountain Tech Festival @ Chiang Mai
November 12, 2022
This presentation was given at the "Vienna DevOps & Security Meetup" in 2021.
It discusses the state of monitoring, what OpenTelemetry is, and why you should care about it.
Concepts and basics are presented through a full example that extracts traces, metrics, and logs.
Demo: https://github.com/secustor/opentelemetry-meetup
What is observability and how is it different from traditional monitoring? How do we effectively monitor and debug complex, elastic microservice architectures? In this interactive discussion, we’ll answer these questions. We’ll also introduce the idea of an “observability pipeline” as a way to empower teams following DevOps practices. Lastly, we’ll demo cloud-native observability tools that fit this “observability pipeline” model, including Fluentd, OpenTracing, and Jaeger.
Observability has emerged as one of the hottest topics on the DevOps landscape. Organizations seek to improve visibility into their cloud infrastructure and applications and to identify production issues that may negatively impact customer experience.
➡️ But what are some of the best practices for scaling observability for modern applications?
➡️ What challenges are cloud platforms facing?
Explore how to overcome the challenges and unlock speed, observability, and automation across your DevOps lifecycle.
Observability in Java: Getting Started with OpenTelemetry (DevOps.com)
Our software is more complex than ever: applications must be reliable, predictable, and easy to use to meet modern expectations. As developers, this means our responsibilities have grown while the things we can control have stayed the same. In order to better understand our systems and create truly modern software, we need observability.
This workshop will walk through what observability means for Java developers and how to achieve it in our systems with the least amount of work using the open source observability project OpenTelemetry.
OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs. It aims to enable effective observability by making high-quality, portable telemetry ubiquitous and vendor-agnostic. The OpenTelemetry Collector is an independent process that acts as a "universal agent" to collect, process, and export telemetry data in a highly performant and stable manner, supporting multiple types of telemetry through customizable pipelines consisting of receivers, processors, and exporters.
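The Collector's pipeline model of receivers, processors, and exporters can be illustrated with a small in-process Python analogue. This is not the Collector's API (the real Collector is a Go binary configured in YAML under `service.pipelines`); it only mirrors the data flow: every receiver feeds one batch, processors run in the order listed, and each exporter gets the processed result. The `Pipeline`, `drop_attr_equals`, and `add_attr` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]                      # one span, metric point, or log line
Receiver = Callable[[], Iterable[Record]]       # yields incoming telemetry
Processor = Callable[[List[Record]], List[Record]]
Exporter = Callable[[List[Record]], None]       # ships a batch to a backend


@dataclass
class Pipeline:
    receivers: List[Receiver]
    processors: List[Processor] = field(default_factory=list)
    exporters: List[Exporter] = field(default_factory=list)

    def run_once(self) -> List[Record]:
        # 1. Receive: merge the output of every receiver into one batch.
        batch: List[Record] = [rec for receive in self.receivers for rec in receive()]
        # 2. Process: processors run in order, each seeing the previous output.
        for process in self.processors:
            batch = process(batch)
        # 3. Export: every exporter gets its own list (a shallow copy) of the batch.
        for export in self.exporters:
            export(list(batch))
        return batch


def drop_attr_equals(key: str, value: object) -> Processor:
    """Processor that filters out records whose `key` equals `value`."""
    def process(batch: List[Record]) -> List[Record]:
        return [r for r in batch if r.get(key) != value]
    return process


def add_attr(key: str, value: object) -> Processor:
    """Processor that stamps every record with an extra attribute."""
    def process(batch: List[Record]) -> List[Record]:
        return [{**r, key: value} for r in batch]
    return process
```

Keeping processors as an ordered list matters: filtering health-check spans before enriching them means the enrichment step never touches data that will be dropped anyway, which is the same reasoning behind ordering processors in a real Collector pipeline.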
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou... (Splunk)
With the acceleration of customer and business demands, site reliability engineers and IT Ops analysts now require operational visibility into their entire architecture, something that traditional APM tools, dev logging tools, and SRE tools aren’t equipped to provide. Observability enables you to inspect and understand your IT stack on premises and in the cloud(s). It’s no longer only about whether your system works (monitoring), but about being able to ask why it is not working (observability). This presentation will outline key steps to take to move from monitoring to observability.
This document provides an overview of OpenTelemetry for operators. It discusses some of the limitations of current observability platforms and how OpenTelemetry addresses these issues. It introduces the OpenTelemetry project, which combines distributed tracing, metrics, and logging APIs. It describes the OpenTelemetry Collector, which receives, processes, and exports telemetry data. It provides examples of Collector configuration and of running it in production. It also discusses some innovations in the observability space from vendors like Dynatrace, New Relic, Splunk SignalFx, and others.
The monolith to cloud-native, microservices evolution has driven a shift from monitoring to observability. OpenTelemetry, a merger of the OpenTracing and OpenCensus projects, is enabling Observability 2.0. This talk gives an overview of the OpenTelemetry project and then outlines some production-proven architectures for improving the observability of your applications and systems.
Combining logs, metrics, and traces for unified observability (Elasticsearch)
Learn how Elasticsearch efficiently combines data in a single store and how Kibana is used to analyze it. Plus, see how recent developments help identify, troubleshoot, and resolve operational issues faster.
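Keeping all signals in one store pays off when they share a correlation key such as a trace ID, so that a slow trace can be joined directly to the log lines it produced. The sketch below illustrates that join with Python's built-in `sqlite3` purely as a stand-in; it is not how Elasticsearch or Kibana store or query data, and the `spans`/`logs` schema is a hypothetical simplification.

```python
import sqlite3
from typing import List, Tuple


def build_store() -> sqlite3.Connection:
    """Create an in-memory store holding spans and logs side by side."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE spans (trace_id TEXT, span_id TEXT, name TEXT, duration_ms REAL);
        CREATE TABLE logs  (trace_id TEXT, level TEXT, message TEXT);
        """
    )
    return db


def logs_for_slowest_trace(db: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Find the trace containing the slowest span, then return its logs in insertion order."""
    row = db.execute(
        "SELECT trace_id FROM spans ORDER BY duration_ms DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return []
    return db.execute(
        "SELECT level, message FROM logs WHERE trace_id = ? ORDER BY rowid",
        (row[0],),
    ).fetchall()
```

The point of the design is that the trace ID is written into both kinds of record at emit time; once that is done, moving from "this request was slow" to "here is what it logged" is a single lookup rather than a manual search across separate tools.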
Free GitOps Workshop + Intro to Kubernetes & GitOps (Weaveworks)
Follow along in this free workshop and experience GitOps!
AGENDA:
Welcome - Tamao Nakahara, Head of DX (Weaveworks)
Introduction to Kubernetes & GitOps - Mark Emeis, Principal Engineer (Weaveworks)
Weave GitOps Overview - Tamao Nakahara
Free GitOps Workshop - David Harris, Product Manager (Weaveworks)
If you're new to Kubernetes and GitOps, we'll give you a brief introduction to both and how GitOps is the natural evolution of Kubernetes.
Weave GitOps Core is a continuous delivery product to run apps in any Kubernetes cluster. It is free and open source, and you can get started today!
https://www.weave.works/product/gitops-core
If you’re stuck, also come talk to us at our Slack channel! #weave-gitops http://bit.ly/WeaveGitOpsSlack (If you need to invite yourself to the Slack, visit https://slack.weave.works/)
George Kobar, a community advocate for Capgemini, shared information on observability at a Meetup for New Application Development. The document defines observability as the combination of monitoring, metrics, and logging. It presents a typical observability stack that collects data from various sources to provide visibility for development, operations, and business teams through tools that analyze application performance, uptime, logs, metrics, and business KPIs. The stack advocates an elastic approach to storing all operational data together in Elasticsearch for unified access and analysis.
This document provides an overview of Docker concepts including containers, images, Dockerfiles, and the Docker architecture. It defines key Docker terms like images, containers, and registries. It explains how Docker uses Linux kernel features like namespaces and control groups to isolate containers. It demonstrates how to run a simple Docker container and view its logs. It also describes the anatomy of a Dockerfile and common Dockerfile instructions such as FROM, RUN, COPY, and ENV. Finally, it illustrates how Docker works by interacting with the Docker daemon, the client, and the Docker Hub registry to build, run, and distribute container images.
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way (Databricks)
At Nielsen Identity, we use Apache Spark to process tens of terabytes of data, running on AWS EMR. We started at a point where Spark was not even supported out of the box by EMR, and today we’re spinning up clusters with thousands of nodes on a daily basis, orchestrated by Airflow. A few months ago, we embarked on a journey to evaluate Kubernetes as our Spark infrastructure, mainly to reduce operational costs and improve stability (as we rely heavily on Spot Instances for our clusters). To achieve those goals, we combined the open-source GCP Spark-on-K8s operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) with a native Airflow integration we developed and recently contributed back to the Airflow project (https://issues.apache.org/jira/browse/AIRFLOW-6542). Finally, we were able to migrate our existing Airflow DAGs, with minimal changes, from AWS EMR to K8s.
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. An order-of-magnitude increase in the number of moving parts and in the rate of change of the application requires us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Elastic Observability is helping organizations drive their mean time to resolution toward zero with end-to-end visibility in a single platform. Hear about the latest features and capabilities at all layers — from ingest to insight — and get a glimpse into where we are headed.
This document summarizes a presentation about observability using Splunk. It includes an agenda introducing observability and why Splunk for observability. It discusses the need for modernization initiatives in companies and the thousands of changes required. It presents that Splunk provides end-to-end visibility across metrics, traces and logs to detect, troubleshoot and optimize systems. It shares a customer case study of Accenture using Splunk observability in their hybrid cloud environment. Finally, it concludes that observability with Splunk can drive results like reduced downtime and faster innovation.
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash (Amazon Web Services)
Version 7 of the Elastic Stack adds powerful new features to the popular open source platform for search, logging, and analytics. Come hear directly from Elastic engineers and architecture team members on powerful new additions like GIS functionality and frozen-tier search. Plus, hear about the full range of orchestration options for getting the most out of your deployments, however and wherever you choose to run them. This session is sponsored by Elastic.
This presentation starts with an introduction to the rationale behind automated deployments in Continuous Delivery and DevOps. Then, I compare agent-based architectures, such as Chef and Puppet, with the agentless architecture of the server orchestration engine Ansible. The presentation concludes with an automated deployment of Dynatrace into a simulated production environment.
My contribution to the "Grafana & Friends" Meetup.
This presentation covers the context of the observability landscape, the basics of OpenTelemetry and its signals, and an outlook on what to expect next.
Understand your system like never before with OpenTelemetry, Grafana, and Pro... (LibbySchulze)
This document discusses using OpenTelemetry, Grafana, and Promscale to gain insights into distributed systems. It summarizes OpenTelemetry for instrumentation, Promscale as an observability backend built on TimescaleDB that allows analyzing metrics, traces and business data together, and demonstrates this using a lightweight microservices demo that generates absurd passwords. The demo can be run locally and visualized in Grafana.
Distributed systems are not strictly an engineering problem. It’s far too easy to treat them as a backend development concern, but the reality is that there are implications at every point in the stack. Often the trade-offs we make lower in the stack in order to buy responsiveness bubble up to the top, so much so that they rarely fail to impact the application in some way.
Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.
Tyler Treat looks at distributed systems through the lens of user experience, observing how architecture, design patterns, and business problems all coalesce into UX. Tyler also shares system design anti-patterns and alternative patterns for building reliable and scalable systems with respect to business outcomes.
Topics include:
- The “truth” can be prohibitively expensive: When does strong consistency make sense, and when does it not? How do we reconcile this with application UX?
- Failure as an inevitability: If we can’t build perfect systems, what is “good enough”?
- Dealing with partial knowledge: Systems usually operate in the real world (e.g., an inventory application for a widget warehouse). How do we design for the “disconnect” between the real world and the system?
Traditional Operations isn’t going away, it’s just retooling. The move from on-premises to cloud means Ops, in the classical sense, is largely being outsourced to cloud providers. What’s left is a thin but crucial slice between cloud providers and the products built by development teams, encompassing infrastructure and deployment automation, configuration management, log management, and monitoring and instrumentation, all through the lens of self-service.
Join me as I share my vision for the future of Operations as an organizational competency and how it relates to DevOps. We will discuss where industry practices are headed while sharing some real-world stories—the good and the bad—of applying these practices at Workiva. The intended outcome of this talk is to leave listeners with a better understanding of what an effective modern engineering organization looks like, including patterns and best practices, and the path to reaching it. The end goal is an organization which delivers value to customers reliably, efficiently, and continuously.
Ops is dead, long live Ops!
Observability has emerged as one of the hottest topics on the DevOps landscape. Organizations seek to improve visibility into their cloud infrastructure and applications and identify production issues that may negatively impact #customerexperience.
➡️ But what are some of the best practices for scaling observability for modernapplications?
➡️ What challenges are #cloudplatforms facing?
Explore how to overcome the challenges and unlock speed, observability, and automation across your DevOps lifecycle.
Observability in Java: Getting Started with OpenTelemetryDevOps.com
Our software is more complex than ever: applications must be reliable, predictable, and easy to use to meet modern expectations. As developers, this means our responsibilities have grown while the things we can control have stayed the same. In order to better understand our systems and create truly modern software, we need observability.
This workshop will walk through what observability means for Java developers and how to achieve it in our systems with the least amount of work using the open source observability project OpenTelemetry.
OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs. It aims to enable effective observability by making high-quality, portable telemetry ubiquitous and vendor-agnostic. The OpenTelemetry Collector is an independent process that acts as a "universal agent" to collect, process, and export telemetry data in a highly performant and stable manner, supporting multiple types of telemetry through customizable pipelines consisting of receivers, processors, and exporters.
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...Splunk
With the acceleration of customer and business demands, site reliability engineers and IT Ops analysts now require operational visibility into their entire architecture, something that traditional APM tools, dev logging tools, and SRE tools aren’t equipped to provide. Observability enables you to inspect and understand your IT stack on premises and in the cloud(s); It’s no longer about whether your system works (monitoring), but being able to task why it is not working? (Observability). This presentation will outline key steps to take to move from monitoring to observability.
This document provides an overview of OpenTelemetry for operators. It discusses some of the limitations of current observability platforms and how OpenTelemetry addresses these issues. It introduces the OpenTelemetry project which combines distributed tracing, metrics, and logging APIs. It describes the OpenTelemetry Collector which receives, processes, and exports telemetry data. It provides examples of Collector configuration and running it in production. It also discusses some innovations in the observability space from vendors like Dynatrace, New Relic, Splunk SignalFX, and others.
The monolith to cloud-native, microservices evolution has driven a shift from monitoring to observability. OpenTelemetry, a merger of the OpenTracing and OpenCensus projects, is enabling Observability 2.0. This talk gives an overview of the OpenTelemetry project and then outlines some production-proven architectures for improving the observability of your applications and systems.
Combining logs, metrics, and traces for unified observabilityElasticsearch
Learn how Elasticsearch efficiently combines data in a single store and how Kibana is used to analyze it. Plus, see how recent developments help identify, troubleshoot, and resolve operational issues faster.
Free GitOps Workshop + Intro to Kubernetes & GitOpsWeaveworks
Follow along in this free workshop and experience GitOps!
AGENDA:
Welcome - Tamao Nakahara, Head of DX (Weaveworks)
Introduction to Kubernetes & GitOps - Mark Emeis, Principal Engineer (Weaveworks)
Weave Gitops Overview - Tamao Nakahara
Free Gitops Workshop - David Harris, Product Manager (Weaveworks)
If you're new to Kubernetes and GitOps, we'll give you a brief introduction to both and how GitOps is the natural evolution of Kubernetes.
Weave GitOps Core is a continuous delivery product to run apps in any Kubernetes. It is free and open source, and you can get started today!
https://www.weave.works/product/gitops-core
If you’re stuck, also come talk to us at our Slack channel! #weave-gitops http://bit.ly/WeaveGitOpsSlack (If you need to invite yourself to the Slack, visit https://slack.weave.works/)
George Kobar, a community advocate for Capgemini, shared information on observability at a Meetup for New Application Development. The document defines observability as the combination of monitoring, metrics, and logging. It presents a typical observability stack that collects data from various sources to provide visibility for development, operations, and business teams through tools that analyze application performance, uptime, logs, metrics, and business KPIs. The stack advocates an elastic approach to storing all operational data together in Elasticsearch for unified access and analysis.
This document provides an overview of Docker concepts including containers, images, Dockerfiles, and the Docker architecture. It defines key Docker terms like images, containers, and registries. It explains how Docker utilizes Linux kernel features like namespaces and control groups to isolate containers. It demonstrates how to run a simple Docker container and view logs. It also describes the anatomy of a Dockerfile and common Dockerfile instructions like FROM, RUN, COPY, ENV etc. Finally, it illustrates how Docker works by interacting with the Docker daemon, client and Docker Hub registry to build, run and distribute container images.
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayDatabricks
At Nielsen Identity, we use Apache Spark to process 10’s of TBs of data, running on AWS EMR. We started at a point where Spark was not even supported out-of-the-box by EMR, and today we’re spinning-up clusters with 1000’s of nodes on a daily basis, orchestrated by Airflow. A few months ago, we embarked on a journey to evaluate the option of using Kubernetes as our Spark infrastructure, mainly to reduce operational costs and improve stability (as we heavily rely on Spot Instances for our clusters). To allow us to achieve those goals, we combined the open-sourced GCP Spark-on-K8s operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) with a native Airflow integration we developed and recently contributed back to the Airflow project (https://issues.apache.org/jira/browse/AIRFLOW-6542). Finally, we were able to migrate our existing Airflow DAGs, with minimal changes, from AWS EMR to K8s.
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. Order-of-magnitude increase in the number of moving parts and rate of change of the application require us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Elastic Observability is helping organizations drive their mean time to resolution toward zero with end-to-end visibility in a single platform. Hear about the latest features and capabilities at all layers — from ingest to insight — and get a glimpse into where we are headed.
This document summarizes a presentation about observability using Splunk. It includes an agenda introducing observability and why Splunk for observability. It discusses the need for modernization initiatives in companies and the thousands of changes required. It presents that Splunk provides end-to-end visibility across metrics, traces and logs to detect, troubleshoot and optimize systems. It shares a customer case study of Accenture using Splunk observability in their hybrid cloud environment. Finally, it concludes that observability with Splunk can drive results like reduced downtime and faster innovation.
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashAmazon Web Services
Version 7 of the Elastic Stack adds powerful new features to the popular open source platform for search, logging, and analytics. Come hear directly from Elastic engineers and architecture team members on powerful new additions like GIS functionality and frozen-tier search. Plus, hear about the full range of orchestration options for getting the most out of your deployments, however and wherever you choose to run them. This session is sponsored by Elastic.
This presentation starts with an introduction to the rationale behind automated deployments in Continuous Delivery and DevOps. Then, I compare agent-based architectures, such as Chef and Puppet with the agentless architecture of the server orchestration engine Ansible. The presentation concludes with an automated deployment of Dynatrace into a simulated production environment.
My contribution to the "Grafana & Friends" Meetup.
This presentation goes into the context in the Observability landscape, the basics of OpenTelemetry with its signals and lookout what to expect next.
Understand your system like never before with OpenTelemetry, Grafana, and Pro...LibbySchulze
This document discusses using OpenTelemetry, Grafana, and Promscale to gain insights into distributed systems. It summarizes OpenTelemetry for instrumentation, Promscale as an observability backend built on TimescaleDB that allows analyzing metrics, traces and business data together, and demonstrates this using a lightweight microservices demo that generates absurd passwords. The demo can be run locally and visualized in Grafana.
Distributed systems are not strictly an engineering problem. It’s far too easy to assume a backend development concern, but the reality is there are implications at every point in the stack. Often the trade-offs we make lower in the stack in order to buy responsiveness bubble up to the top—so much, in fact, that it rarely doesn’t impact the application in some way.
Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.
Tyler Treat looks at distributed systems through the lens of user experience, observing how architecture, design patterns, and business problems all coalesce into UX. Tyler also shares system design anti-patterns and alternative patterns for building reliable and scalable systems with respect to business outcomes.
Topics include:
- The “truth” can be prohibitively expensive: When does strong consistency make sense, and when does it not? How do we reconcile this with application UX?
- Failure as an inevitability: If we can’t build perfect systems, what is “good enough”?
- Dealing with partial knowledge: Systems usually operate in the real world (e.g., an inventory application for a widget warehouse). How do we design for the “disconnect” between the real world and the system?
Traditional Operations isn’t going away, it’s just retooling. The move from on-premise to cloud means Ops, in the classical sense, is largely being outsourced to cloud providers. What’s left is a thin but crucial slice between cloud providers and the products built by development teams, encompassing infrastructure and deployment automation, configuration management, log management, and monitoring and instrumentation—all through the lens of self-service.
Join me as I share my vision for the future of Operations as an organizational competency and how it relates to DevOps. We will discuss where industry practices are headed while sharing some real-world stories—the good and the bad—of applying these practices at Workiva. The intended outcome of this talk is to leave listeners with a better understanding of what an effective modern engineering organization looks like, including patterns and best practices, and the path to reaching it. The end goal is an organization which delivers value to customers reliably, efficiently, and continuously.
Ops is dead, long live Ops!
This document appears to be slides from a presentation on concurrency in Ruby applications. The slides discuss different concurrency models including blocking threads, callbacks, reactors, and fibers. They explore when concurrency is useful based on factors like context switching costs. Linear and mixed data dependencies are presented as examples to illustrate different concurrency interfaces and implementations using threads or asynchronous callbacks.
User & Device Identity for Microservices @ Netflix Scale (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S9tOgy.
Satyajit Thadeshwar provides useful insights on how Netflix implemented a secure, token-agnostic, identity solution that works with services operating at a massive scale. He shares some of the lessons learned from this process, both from architectural diagrams and code. Filmed at qconsf.com.
Satyajit Thadeshwar is an engineer on the Product Edge Access Services team at Netflix, where he works on some of the most critical services focusing on user and device authentication. He has more than a decade of experience building fault-tolerant and highly scalable, distributed systems.
Free The Enterprise With Ruby & Master Your Own Domain (Ken Collins)
On the heels of Luis Lavena's RailsConf talk "Infiltrating Ruby Onto The Enterprise Death Star Using Guerilla Tactics" comes a local and frank talk about the current state of Open Source Software (OSS) participation from Windows developers. Learn what OSS is, what motivates its contributors, and how OSS can make you a stronger developer. Be prepared to fall in love with writing software again!
We will start off with a 101 introduction to both the Ruby programming language and the Ruby on Rails web application framework. You will learn about ActiveRecord, a powerful ORM that maps rich objects to your databases, and the latest components to use it with SQL Server. As a Rails core contributor and author of the SQL Server stack, I will give you a modern insight into both that will allow you to leverage your legacy data with Ruby.
Lastly, I will review the bleeding edge tools being actively created for Windows developers to ease the transition to Ruby, Rails and OSS from a POSIX driven world. Many things have changed. It is time to learn and perform some occupational maintenance.
Cilium: Application-Aware Microservices via BPF (Cynthia Thomas)
Intro to Cilium Microservices Security with Kubernetes Integration
Open Source Cilium website: cilium.io
GH: github.com/cilium/cilium
Join our Slack! cilium.herokuapp.com
Follow us on Twitter!
@ciliumproject
@_techcet_
Cameron Dutro introduces Kuby, which is an ActiveDeployment tool for Rails applications that packages and deploys apps into a Kubernetes cluster. Kuby aims to make deployment easy with minimal configuration, while also supporting major cloud providers and being native to Rails. It handles tasks like provisioning databases and acquiring SSL certificates automatically. The talk outlines the history of deployment methods and why Kubernetes provides an extensible platform. Kuby builds on concepts from tools like Capistrano but abstracts more details by treating servers collectively in a Kubernetes cluster.
10 ways to shoot yourself in the foot with kubernetes, #9 will surprise you! ... (Laurent Bernaille)
Kubernetes is a very powerful and complicated system, and many users don’t understand the underlying systems. Come learn how your users can abuse container runtimes, overwhelm your control plane, and cause outages - it’s actually quite easy!
In the last year, we have containerized hundreds of applications and deployed them in large-scale clusters (more than 1,000 nodes). The journey was eventful and we learned a lot along the way. We’ll share stories of our ten favorite Kubernetes foot guns, including the dangers of cargo culting, rolling updates gone wrong, the pitfalls of initContainers, and nightmarish DaemonSet upgrades. The talk will present the solutions we adopted to avoid or work around some of these problems, and will finally show several improvements we plan to deploy in the future.
Similar to the KubeCon talk of the same title, with a few new incidents.
Tools, Tips and Techniques for Developing Real-time Apps. Phil Leggetter (Future Insights)
FOWA London 2015
It's 2015 and we've all got real-time data coursing through our apps; the life-blood of their instantly updating, interactive and engaging user experiences. We're also all much more aware of development best practices and how tooling can assist this process. Many of these practices can also be applied when building realtime apps, but there are some tools and techniques that are more prevalent, and some that are unique, when working with real-time frameworks and data. In this talk I'll cover the tools, tips and techniques - from client to server - that I've found valuable when developing realtime apps.
The document discusses the pros and cons of microservices architecture. It notes that microservices can improve developer productivity by allowing code to be separated into smaller, independent services that are more manageable. However, microservices also introduce more complexity, as each service requires its own deployment and operational tasks. The document cautions that a monolithic architecture may be better unless a system is too complex to manage as a single application. It provides examples of challenges that can arise with microservices like distributed transactions, eventual consistency, and increased dependency on tools for service discovery, load balancing, and orchestration.
Elasticsearch: breakfast briefing, March 13, 2014 (ALTER WAY)
Elasticsearch is a very powerful open source search engine based on Apache Lucene. It can index millions of documents and search and analyze them in real time. Elasticsearch tools are already used by leading companies such as FourSquare, GitHub, OpenDataSoft, and Dailymotion.
Alter Way and Elasticsearch invite you to discover the Elasticsearch suite, finally available in version 1.0 and ready for production!
Eric Lubow gave a presentation on how SimpleReach fixed problems with their MongoDB implementation. They implemented a sharded replica set architecture across availability zones for high availability and speed. They improved data accuracy by separating databases and enforcing consistent access patterns. SimpleReach also implemented a controlled data flow using NSQ to batch and route data between MongoDB, Cassandra, Vertica, and other tools for analytics and real-time usage. Their architecture provides redundancy, minimal downtime for changes, and monitors performance using tools like Nagios, Statsd and Cloudwatch.
The New Frontier: Optimizing Big Data Exploration (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and Cirro
Live Webcast on February 11, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=0ec1fa381886313cc06d841015c65898
As information ecosystems continue to expand, businesses are searching for ways to combine traditional analytics with a new source of insight: Big Data. But with data flooding in from all kinds of sources, fast access and performance at scale can easily become an issue. One effective approach for solving this challenge is data federation, a method that involves taking the analytical processing to the data, allowing streamlined access to multiple data sources without the expensive ETL overhead or building of semantic layers.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains how the prevalence of distributed data calls for a new approach to Big Data. He will be briefed by Mark Theissen of Cirro, who will tout his company’s Data Hub, a data federation solution that provides a single point of access to all enterprise data assets without excessive data movements, preprocessing or staging. He will discuss how data federation differs from virtualization and ETL approaches, and demonstrate how a Cirro deployment solves the analytics challenge of integrating data silos across the data center – and the cloud – using the BI tools you already have on your desktop for real-time distributed analytics.
Visit InsideAnalysis.com for more information.
What do you do when you must monitor the whole infrastructure of the biggest European hosting and cloud provider? How do you choose a tool when the most widely used ones fail to scale to your needs? How do you build a metrics platform to unify, reconcile, and replace years of fragmented, partial legacy solutions? In this talk we will relate our experience building and maintaining OVH Metrics, the platform used to monitor all OVH infrastructure. We needed to go where most monitoring solutions hadn’t gone before: the platform had to operate at the scale of the biggest European hosting and cloud provider.
Rocking the microservice world with Helidon-LAOUCTour2023.pdf (Alberto Salazar)
In the banking industry, a lot of business logic still runs the old-fashioned way, in monolithic enterprise applications. This talk shows, starting from zero, how you can use the latest Java version and Helidon to move your application forward to Oracle Cloud.
Philip Lombardi discusses Datawire's experience using Spinnaker for continuous deployment of microservices. While Spinnaker allows for custom deployment workflows and works as promised, Datawire encountered issues with Spinnaker's complex UI, difficulty reconfiguring and upgrading, and slow developer experience. Lombardi concludes that Spinnaker may be overkill for small teams and its deployment, UI, and configuration need improvement for broader adoption.
Cybereason - behind the HackingTeam infection server (Amit Serper)
In July 2015, the Italian cybersecurity solutions vendor "HackingTeam" was breached and more than 400 gigabytes of its most sensitive data leaked to the internet. Security researchers Amit Serper and Alex Frazer from Cybereason were among the first to study the data dump and publish information about it. The research was quoted on several tech news sites, such as Ars Technica. It was also published in Hebrew in the DigitalWhisper e-zine, in English as an e-book on the Cybereason blog, and in free public lectures given in Tel Aviv by the researchers themselves. The following slide deck is from that lecture.
Upgrading Made Easy: Moving to InfluxDB 2.x or InfluxDB Cloud with Cribl LogS... (InfluxData)
Many organizations agree that migrating workloads to the cloud or to a newer version of existing tooling can result in cost savings and flexibility. A well-designed observability pipeline is often the key to a quick and painless transition, leading to positive impacts on cost optimization, data visibility, and performance. Cribl’s LogStream product helps teams implement such an observability pipeline.
In this hands-on technical discussion, the audience will learn how to leverage Cribl LogStream to successfully upgrade from InfluxDB 1.x to InfluxDB 2.x or move to InfluxDB Cloud. Join us as we walk through the pros and cons of workload migration, share architecture best practices, and give a live demo on how to combine Cribl LogStream with the latest version of InfluxDB.
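The migration described above relies on an observability pipeline sitting between data sources and backends. The following Python sketch shows the general shape under stated assumptions: events are plain dicts, processors may transform or drop them, and each route fans a copy out to several sinks, which is what lets an InfluxDB 1.x and a 2.x backend receive the same data during a cutover. None of these names (`Pipeline`, `Route`, the sinks) are Cribl LogStream or InfluxDB APIs; the backends are hypothetical stand-ins (plain lists).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]                       # one log line or metric, as a dict
Processor = Callable[[Event], Optional[Event]]  # return None to drop the event
Sink = Callable[[Event], None]                  # hypothetical backend writer


@dataclass
class Route:
    name: str
    matches: Callable[[Event], bool]  # which events this route accepts
    sinks: List[Sink]                 # every sink on the route receives a copy


@dataclass
class Pipeline:
    processors: List[Processor] = field(default_factory=list)
    routes: List[Route] = field(default_factory=list)

    def ingest(self, event: Event) -> None:
        # Run processors in order; each may reshape the event or drop it.
        for process in self.processors:
            result = process(dict(event))
            if result is None:
                return
            event = result
        # Fan out to every matching route so old and new backends can receive
        # the same data in parallel during a migration.
        for route in self.routes:
            if route.matches(event):
                for sink in route.sinks:
                    sink(dict(event))


if __name__ == "__main__":
    def drop_debug(event: Event) -> Optional[Event]:
        return None if event.get("level") == "debug" else event

    def redact_email(event: Event) -> Optional[Event]:
        event.pop("user_email", None)  # keep PII out of downstream systems
        return event

    influx_v1: List[Event] = []  # stand-ins for the 1.x and 2.x backends
    influx_v2: List[Event] = []
    pipeline = Pipeline(
        processors=[drop_debug, redact_email],
        routes=[Route("metrics", lambda e: e.get("type") == "metric",
                      [influx_v1.append, influx_v2.append])],
    )
    pipeline.ingest({"type": "metric", "name": "cpu", "value": 0.7,
                     "user_email": "a@example.com"})
    pipeline.ingest({"type": "metric", "name": "noise", "level": "debug"})
    print(influx_v1 == influx_v2, influx_v1)
```

Because producers only talk to the pipeline, swapping or adding a backend is a routing change rather than a change to every service.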
A talk about how and why we migrated from a monolithic application to microservices at Seedtag, and how we solved the complexities we found along the way.
Building a Distributed Message Log from Scratch - SCaLE 16x (Tyler Treat)
Apache Kafka has shown that the log is a powerful abstraction for data-intensive applications. It can play a key role in managing data and distributing it across the enterprise efficiently. Vital to any data plane is not just performance, but availability and scalability. In this session, we examine what a distributed log is, how it works, and how it can achieve these goals. Specifically, we'll discuss lessons learned while building NATS Streaming, a reliable messaging layer built on NATS that provides similar semantics. We'll cover core components like leader election, data replication, log persistence, and message delivery. Come learn about distributed systems!
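To make the "log" abstraction concrete, here is a minimal single-node sketch in Python; it is not NATS Streaming's implementation. Records are appended with a 4-byte length prefix, fsynced, and addressed by a monotonically increasing offset, so a consumer can replay from whatever offset it last processed. Leader election and replication, which the talk covers, are deliberately left out, and `CommitLog` with its methods are hypothetical names.

```python
import os
import struct
import tempfile
from typing import Iterator, List, Tuple

_HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix per record


class CommitLog:
    """Append-only, single-node log; each record is addressed by its offset."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._positions: List[int] = []  # offset -> byte position in the file
        self._size = 0                   # bytes written so far
        # Start a fresh segment (recovering an existing one is out of scope).
        self._file = open(path, "wb")

    def append(self, record: bytes) -> int:
        """Persist a record and return the offset assigned to it."""
        self._file.write(_HEADER.pack(len(record)) + record)
        self._file.flush()
        os.fsync(self._file.fileno())  # durable before the write is acknowledged
        self._positions.append(self._size)
        self._size += _HEADER.size + len(record)
        return len(self._positions) - 1

    def read_from(self, offset: int) -> Iterator[Tuple[int, bytes]]:
        """Yield (offset, record) pairs from `offset`; consumers track their own position."""
        with open(self._path, "rb") as f:
            for off in range(offset, len(self._positions)):
                f.seek(self._positions[off])
                (length,) = _HEADER.unpack(f.read(_HEADER.size))
                yield off, f.read(length)

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = CommitLog(os.path.join(d, "segment.log"))
        for msg in (b"a", b"bb", b"ccc"):
            log.append(msg)
        print(list(log.read_from(1)))  # [(1, b'bb'), (2, b'ccc')]
        log.close()
```

Because reads never mutate the log, any number of consumers can read independently, and a crashed consumer simply resumes from its stored offset.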
This document contains the transcript from a presentation titled "So You Wanna Go Fast?" by Tyler Treat. Some of the key topics discussed include measuring performance using tools like pprof, how different language features in Go like channels, interfaces, and memory management can impact performance, and techniques for writing concurrent and multi-core friendly code in Go like using read-write mutexes. The overall message is that performance depends greatly on the specific situation and trade-offs must be considered between concurrency, memory usage, and execution speed. Measuring first is emphasized to guide any optimizations.
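The read-write mutex technique mentioned above (Go's `sync.RWMutex`) lets many readers proceed at once while a writer gets exclusive access. Python's standard library has no direct equivalent, so the following is a minimal reader-preferring sketch built on `threading.Condition`. It illustrates the access pattern only: in CPython the GIL limits CPU parallelism, and this variant can starve writers under a steady stream of readers. `RWLock` and its methods are hypothetical names.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class RWLock:
    """Reader-preferring readers-writer lock: many readers OR one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0      # number of readers currently holding the lock
        self._writing = False  # True while a writer holds the lock

    @contextmanager
    def read(self) -> Iterator[None]:
        with self._cond:
            while self._writing:          # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:    # last reader out wakes waiting writers
                    self._cond.notify_all()

    @contextmanager
    def write(self) -> Iterator[None]:
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


if __name__ == "__main__":
    lock = RWLock()
    config = {"timeout_ms": 250}  # read-mostly shared state

    def reader() -> None:
        with lock.read():         # many readers may hold this at once
            _ = config["timeout_ms"]

    def writer() -> None:
        with lock.write():        # exclusive: waits for readers to drain
            config["timeout_ms"] = 500

    threads = [threading.Thread(target=reader) for _ in range(8)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(config)
```

As the talk stresses, whether this beats a plain mutex depends on the read/write ratio, so measure before adopting it.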
This document summarizes a talk given by Tyler Treat about using simple solutions for complex distributed systems problems. Some key points:
- Distributed systems are inherently asynchronous and unreliable, but many try to build them as if they are synchronous.
- Exactly-once delivery guarantees are expensive and, at scale, impossible. Replayable and idempotent delivery are better alternatives.
- NATS is a simple, high performance, and highly available messaging system that embraces asynchronous communication.
- Workiva uses NATS as a messaging backplane between microservices for pub/sub, RPC, and load balancing. Running a local NATS daemon per VM improves performance.
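The "replayable and idempotent delivery" point can be sketched in a few lines of Python. This assumes every message carries a unique `id` field; that field, the `IdempotentConsumer` class, and the in-memory ID set are illustrative choices, not anything from NATS or Workiva's code. The broker may redeliver (at-least-once), and the consumer makes redelivery harmless by skipping IDs it has already applied.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Applies each message at most once, even if the broker redelivers it."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # IDs already applied (unbounded, in memory)

    def on_message(self, msg: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        msg_id = msg["id"]
        if msg_id in self._seen:
            return False
        self._handler(msg)
        self._seen.add(msg_id)  # recorded only after the handler returns normally
        return True


if __name__ == "__main__":
    balances: Dict[str, int] = {"acct-1": 0}

    def credit(msg: dict) -> None:
        balances[msg["account"]] += msg["amount"]

    consumer = IdempotentConsumer(credit)
    deposit = {"id": "m-42", "account": "acct-1", "amount": 100}
    consumer.on_message(deposit)
    consumer.on_message(deposit)  # redelivery, e.g. after a lost ack
    print(balances)  # {'acct-1': 100}
```

In a real service the set of seen IDs would need to be bounded (for example, by a time window) or persisted together with the handler's side effects; an in-memory set only protects a single process.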
Probabilistic algorithms for fun and pseudorandom profit (Tyler Treat)
There's an increasing demand for real-time data ingestion and processing. Systems like Apache Kafka, Samza, and Storm have become popular for this reason. This type of high-volume, online data processing presents an interesting set of new challenges, namely, how do we drink from the firehose without getting drenched? Explore some of the fundamental primitives used in stream processing and, specifically, how we can use probabilistic methods to solve the problem.
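One classic probabilistic primitive for streams is the Bloom filter, which answers "have I seen this key before?" in fixed memory at the cost of occasional false positives (never false negatives). The following is a minimal Python sketch; whether the talk uses this exact variant is an assumption. The sizing formulas are the standard ones, and the double-hashing scheme (deriving all k indexes from one SHA-256 digest) is one common choice among several.

```python
import hashlib
import math
from typing import Iterable


class BloomFilter:
    """Set-membership sketch: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item: str) -> Iterable[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids short cycles
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))


if __name__ == "__main__":
    seen = BloomFilter(capacity=10_000, error_rate=0.01)
    for event_id in ["evt-1", "evt-2", "evt-1"]:
        if event_id in seen:
            print("probably a duplicate:", event_id)
        else:
            seen.add(event_id)
    print(f"{seen.m} bits, {seen.k} hash functions")
```

For 10,000 keys at a 1% error rate this needs roughly 96,000 bits (about 12 KB), regardless of how long the keys are, which is the point when drinking from the firehose.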
The Economics of Scale: Promises and Perils of Going Distributed (Tyler Treat)
What does it take to scale a system? We'll learn how going distributed can pay dividends in areas like availability and fault tolerance by examining a real-world case study. However, we will also look at the inherent pitfalls. When it comes to distributed systems, for every promise there is a peril.
From Mainframe to Microservice: An Introduction to Distributed Systems (Tyler Treat)
An introductory overview of distributed systems—what they are and why they're difficult to build. We explore fundamental ideas and practical concepts in distributed programming. What is the CAP theorem? What is distributed consensus? What are CRDTs? We also look at options for solving the split-brain problem while considering the trade-off of high availability as well as options for scaling shared data.
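CRDTs sidestep coordination for certain data types by making concurrent updates mergeable, so both sides of a partition can keep accepting writes. As an illustration, here is a minimal Python sketch of the simplest one, a grow-only counter (G-Counter). The class and method names are illustrative, not from the talk.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)  # writes accepted on one side of a partition
    b.increment(2)  # ...and on the other
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())  # 5 5
```

The trade-off is expressiveness: a G-Counter cannot decrement, and richer types (PN-Counters, OR-Sets) pay for more operations with more metadata.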
38. @tyler_treat
[Diagram: regional deployments (North America, Asia Pacific), each running BI servers and microservices]
39.
[Diagram: four regional deployments, each running BI servers and microservices, behind a CDN]
40.
[Diagram: the regional deployments and CDN on top of an infrastructure layer (load balancers, orchestrators, DNS, configuration, ...)]
41.
[Diagram: regional deployments, CDN, and infrastructure layer (load balancers, orchestrators, DNS, configuration, ...), plus a CI/CD pipeline of repos, builders, artifacts, and deployers]
44.
[Diagram: same components as slide 41: regional deployments, CDN, CI/CD pipeline, and infrastructure layer]
45.
[Diagram: regional deployments, CDN, CI/CD pipeline, and infrastructure layer, with a “DevOps” label added]
50.
[Diagram: same components and “DevOps” label as slide 45]
55.
[Diagram: same components and “DevOps” label as slide 45]
57.
[Diagram: same components and “DevOps” label as slide 45]
85.
[Matrix with axes “Data Available” and “Understanding”]
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Known Unknowns
• Things we are aware of but don’t understand
• “The system exceeded its memory limit and crashed, causing an outage”
86.
[Same matrix as slide 85, adding a third quadrant]
Unknown Knowns
• Things we understand but are not aware of
• “We implemented an orchestrator to ensure the system is always running”
87.
[Same matrix, adding the fourth quadrant]
Unknown Unknowns
• Things we are neither aware of nor understand
• “Instances churn because the orchestrator restarts the process when it approaches its memory limit, causing sporadic failures and slowdowns”
88.
[All four quadrants (Known Knowns, Known Unknowns, Unknown Knowns, Unknown Unknowns), with the label FACTS added]
89.
[All four quadrants, now labeled FACTS and HYPOTHESES]
90. @tyler_treat
Data Available
Understanding
Unknown Knowns
• Things we understand but are not
aware of
• “We implemented an orchestrator to
ensure the system is always running”
Known Knowns
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Unknown Unknowns
• Things we are neither aware of nor
understand
• “Instances churn because the
orchestrator restarts the process when
it approaches its memory limit, causing
sporadic failures and slowdowns”
Known Unknowns
• Things we are aware of but don’t
understand
• “The system exceeded its memory limit
and crashed, causing an outage”
ASSUMPTIONS FACTS
HYPOTHESES
91. @tyler_treat
Quadrant chart (axes: Data Available, Understanding)
Known Knowns: FACTS
• Things we are aware of and understand
• “The system has a 1GB memory limit”
Known Unknowns: HYPOTHESES
• Things we are aware of but don’t understand
• “The system exceeded its memory limit and crashed, causing an outage”
Unknown Knowns: ASSUMPTIONS
• Things we understand but are not aware of
• “We implemented an orchestrator to ensure the system is always running”
Unknown Unknowns: DISCOVERIES
• Things we are not aware of and don’t understand
• “Instances churn because the orchestrator restarts the process when it approaches its memory limit, causing sporadic failures and slowdowns”
92. @tyler_treat
Monitoring: Known Unknowns (HYPOTHESES)
Observability: Unknown Unknowns (DISCOVERIES)
93. @tyler_treat
Testing: Known Unknowns (HYPOTHESES)
Exploring: Unknown Unknowns (DISCOVERIES)
96. @tyler_treat
Some challenges…
Observability data: application logs, system logs, audit logs, application metrics, distributed traces, events
- Locked up inside a single vendor’s solution
- Not readily available across the enterprise (or in some cases, too readily available)
- Many tools and products needed for different data and use cases
- Tool and data needs vary from team to team
- Ever-changing landscape of tools, products, and services
- Sheer volume of data can be overwhelming
114. [Diagram: a row of System instances, each bundling its own Sumo Logic Collector, Universal Analytics Client, S3 Client, …, and New Relic APM Agent]
118. [Diagram: the same row of System instances with their Sumo Logic Collector, Universal Analytics Client, S3 Client, …, and New Relic APM Agent, now with a Honeytail Agent added for each System]
158. @tyler_treat
5. Data Router
We need a component to consume data from the pipeline, perform filtering, and write it to the appropriate backends.
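A minimal in-memory sketch of the Data Router's job (consume, filter, write), assuming events arrive as Python dicts and that each backend is modeled by a hypothetical `Sink` carrying its own routing predicate. In practice the input would be a consumer on the streaming data system and each `write` would go through a vendor SDK or API; both are stubbed out here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]  # one structured event consumed from the pipeline


@dataclass
class Sink:
    """Hypothetical stand-in for one backend (log search, metrics store, archive, ...)."""

    name: str
    accepts: Callable[[Event], bool]  # filtering/routing rule for this backend
    written: List[Event] = field(default_factory=list)

    def write(self, event: Event) -> None:
        # A real sink would call the vendor's SDK or HTTP API here.
        self.written.append(event)


def route(events: Iterable[Event], sinks: List[Sink]) -> int:
    """Consume events, filter them per sink, and write each to every sink that wants it.

    Returns how many events matched no sink and were dropped.
    """
    dropped = 0
    for event in events:
        matched = False
        for sink in sinks:
            if sink.accepts(event):
                sink.write(event)
                matched = True
        if not matched:
            dropped += 1
    return dropped


if __name__ == "__main__":
    sinks = [
        Sink("logs", lambda e: e.get("type") == "log"),
        Sink("metrics", lambda e: e.get("type") == "metric"),
        Sink("audit-archive", lambda e: e.get("type") == "log" and bool(e.get("audit"))),
    ]
    stream = [
        {"type": "log", "msg": "user login", "audit": True},
        {"type": "metric", "name": "requests", "value": 1},
        {"type": "log", "msg": "cache miss"},
        {"type": "trace", "span_id": "abc"},  # no sink is interested in traces yet
    ]
    dropped = route(stream, sinks)
    for sink in sinks:
        print(sink.name, len(sink.written))
    print("dropped", dropped)
```

Run as a script, this prints `logs 2`, `metrics 1`, `audit-archive 1`, and `dropped 1`. Keeping the routing rule next to each sink means adding a new backend is a configuration change in the router, not a change to the producing systems.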
159. @tyler_treat
The Data Router may perform transformations and processing of data, but heavy processing (e.g., alerting or aggregations) should be the responsibility of a backend system.
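To make that split concrete, here is a sketch of the kind of cheap, stateless, per-event transformation that can reasonably live in the router. It assumes events are dicts; the field names in `SENSITIVE_FIELDS` and the `level`/`env` keys are hypothetical. Anything that needs state across many events, such as aggregates or alert conditions, is deliberately left out.

```python
from typing import Dict, Optional

Event = Dict[str, object]

# Hypothetical field names that must never leave the router unredacted.
SENSITIVE_FIELDS = {"password", "credit_card"}


def transform(event: Event, env: str) -> Optional[Event]:
    """Cheap, stateless, per-event processing suitable for a Data Router.

    Drops debug-level events, redacts sensitive fields, and tags the event with
    the environment it came from. Returns None when the event should be dropped.
    Nothing here keeps state across events: aggregations and alert evaluation
    are left to the backends.
    """
    if event.get("level") == "debug":
        return None
    # Build a new dict so the caller's event is never mutated.
    out = {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v) for k, v in event.items()}
    out["env"] = env
    return out


if __name__ == "__main__":
    print(transform({"level": "info", "msg": "login", "password": "hunter2"}, "prod"))
    print(transform({"level": "debug", "msg": "cache probe"}, "prod"))
```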
171. @tyler_treat
Evolving to an Observability Pipeline
• Adopt structured logging
• Move log/data collection out of process
• Use a centralized logging system
• Introduce a streaming data solution
• Start adding data consumers
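A sketch of the first two steps, assuming a Python service and using only the standard `logging` module: every record is emitted as one JSON object per line on stdout, and the `fields` key passed through `extra` is a convention invented for this example. Shipping those lines into a centralized logging system is left to a collector running outside the process (for example, the container runtime's log driver or a sidecar), so the service itself embeds no vendor agent.

```python
import json
import logging
import sys


class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": record.created,
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (convention assumed here).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JSONFormatter())
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.propagate = False     # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("order placed", extra={"fields": {"order_id": "o-123", "amount_cents": 4999}})
```

Because the output is already structured, a downstream consumer can filter on `order_id` or `level` without parsing free-form text.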
182. @tyler_treat
Benefits
• The pattern can be adopted incrementally, with quick wins along the way
• Maps better to elastic and serverless architectures
• Empowers teams in siloed organizations and unlocks data for other parts of the business
• Enables teams to use the tools best suited to their needs
• Decoupling makes it easier to change tools or evaluate them side by side
• Minimizes impact on developers and the core system
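Because systems only write to the pipeline, trialing a new backend becomes a routing change rather than a change to every service. A sketch of that idea, assuming backends are plain Python callables: the hypothetical `Tee` below duplicates each event to the current backend and to one or more candidates, isolating candidate failures.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Writer = Callable[[Event], None]  # anything that can persist one event


class Tee:
    """Write every event to the primary backend and, best effort, to candidate backends.

    Errors from a candidate are counted rather than raised, so trialing a new tool
    cannot interrupt delivery to the tool already in use.
    """

    def __init__(self, primary: Writer, candidates: List[Writer]) -> None:
        self.primary = primary
        self.candidates = candidates
        self.candidate_errors = [0] * len(candidates)

    def write(self, event: Event) -> None:
        self.primary(event)  # failures here still propagate to the caller
        for i, candidate in enumerate(self.candidates):
            try:
                candidate(event)
            except Exception:
                self.candidate_errors[i] += 1


if __name__ == "__main__":
    current: List[Event] = []
    trial: List[Event] = []

    def flaky_trial(event: Event) -> None:
        # Hypothetical candidate backend that rejects odd-numbered events.
        if event["n"] % 2:
            raise RuntimeError("rejected")
        trial.append(event)

    tee = Tee(current.append, [flaky_trial])
    for n in range(4):
        tee.write({"n": n})
    print(len(current), len(trial), tee.candidate_errors)
```

Comparing what each backend received over the same window gives a like-for-like evaluation; removing the old backend later is just a matter of promoting a candidate to `primary`.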
184. @tyler_treat
Downsides
• Moving away from an agent-based model means we have to handle data routing ourselves
• Many of the Data Router components might need to be custom-built using various vendor SDKs or client libraries (assuming the vendors provide APIs)
• This also means we might lose some of the value-add features of certain agents
• Unclear how well this maps to pull-based models (e.g., Prometheus)