Running Kubernetes at scale is challenging and you can often end up in situations where you have to debug complex and unexpected issues. This requires understanding in detail how the different components work and interact with each other. Over the last 3 years, Datadog migrated most of its workloads to Kubernetes and now manages dozens of clusters consisting of thousands of nodes each. During this journey, engineers have debugged complex issues with root causes that were sometimes very surprising. In this talk Laurent and Tabitha will share some of these stories, including a favorite: how a complex interaction between familiar Kubernetes components allowed an OOM-killer invocation to trigger the deletion of a namespace.
DNS is one of the Kubernetes core systems and can quickly become a source of issues when you’re running clusters at scale. For over a year at Datadog, we’ve run Kubernetes clusters with thousands of nodes that host workloads generating tens of thousands of DNS queries per second. It wasn’t easy to build an architecture able to handle this load, and we’ve had our share of problems along the way.
This talk starts with a presentation of how Kubernetes DNS works. It then dives into the challenges we’ve faced, which span a variety of topics related to load, connection tracking, upstream servers, rolling updates, resolver implementations, and performance. We then show how our DNS architecture evolved over time to address or mitigate these problems. Finally, we share our solutions for detecting these problems before they happen—and identifying misbehaving clients.
The Kubernetes audit logs are a rich source of information: all of the calls made to the API server are stored, along with additional metadata such as usernames, timings, and source IPs. They help to answer questions such as “What is overloading my control plane?” or “Which sequence of events led to this problematic situation?”. These questions are hard to answer otherwise—especially in large clusters. At Datadog, we have been running clusters with 1000+ nodes for more than a year and during that time, the audit logs have proved invaluable.
In this presentation, we will first introduce the audit logs, explain how they are configured, and review the type of data they store. Finally, we will describe in detail several scenarios where they have helped us to diagnose complex problems.
Kube-proxy enables access to Kubernetes services (virtual IPs backed by pods) by configuring client-side load-balancing on nodes. The first implementation relied on a userspace proxy which was not very performant. The second implementation used iptables and is still the one used in most Kubernetes clusters. Recently, the community introduced an alternative based on IPVS. This talk will start with a description of the different modes and how they work. It will then focus on the IPVS implementation, the improvements it brings, the issues we encountered and how we fixed them as well as the remaining challenges and how they could be addressed. Finally, the talk will present alternative solutions based on eBPF such as Cilium.
10 Ways to Shoot Yourself in the Foot with Kubernetes, #9 Will Surprise You! - Laurent Bernaille
Kubernetes is a very powerful and complicated system, and many users don’t understand the underlying systems. Come learn how your users can abuse container runtimes, overwhelm your control plane, and cause outages - it’s actually quite easy!
In the last year, we have containerized hundreds of applications and deployed them in large-scale clusters (more than 1000 nodes). The journey was eventful and we learned a lot along the way. We’ll share stories of our ten favorite Kubernetes foot guns, including the dangers of cargo culting, rolling updates gone wrong, the pitfalls of initContainers, and nightmarish daemonset upgrades. The talk will present solutions we adopted to avoid or work around some of these problems, and will finally show several improvements we plan to deploy in the future.
Similar to the KubeCon talk with the same title, with a few new incidents.
Running large Kubernetes clusters is challenging. This talk focuses on how you can optimize your network setup in clusters with 1000-2000 nodes. It discusses standard ingress solutions, their drawbacks, and possible alternatives.
This presentation describes the challenges we faced building, scaling, and operating a Kubernetes cluster of more than 1000 nodes to host the Datadog applications.
Ara Pulido, Datadog -
Container technologies, although not new, have increased their popularity in the past few years, with container orchestrators allowing companies around the world to adopt these technologies to help them ship and scale microservices with precision and velocity. Kubernetes is currently the most popular container orchestration platform, and while many organizations are migrating their workloads to it, Kubernetes is still relatively immature. New corner cases, errors, and quirks are regularly discovered as users push the boundaries of size and scale. When Datadog adopted Kubernetes we discovered some of these boundaries the hard way, and we continuously challenge and modify our infrastructure decisions in order to fit our use case. Join me in this talk for our story on what we learned while we scaled our Kubernetes clusters, the contributions to Kubernetes we made along the way, and how you can apply those learnings when growing your Kubernetes clusters from a handful to hundreds or thousands of nodes.
Working with Ansible and AWS together. Provisioning servers, setting up Cloudwatch alarms automatically, setting up Route53 records and a simple Autoscaling workflow.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014 - Amazon Web Services
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Jessica Gadling is a Software Engineer at OpenDNS. She gave a talk and demo at OpenLate (http://www.meetup.com/OpenLate/) on October 21st, 2014 on why Docker was chosen as a central component in OpenDNS's internal PaaS Quadra.
Running Kubernetes in Production: A Million Ways to Crash Your Cluster (Container Camp UK 2018) - Henning Jacobs
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 80+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.
https://2018.container.camp/uk/schedule/running-kubernetes-in-production-a-million-ways-to-crash-your-cluster/
The EFK Stack is a combination of the open-source tools Elasticsearch, Fluentd, and Kibana, providing a solution for collecting, storing, analyzing, and visualizing large volumes of data quickly and in real time. It is a stack commonly used for log collection in container environments.
This talk introduces the Elastic Stack and explains how to install the EFK Stack.
Programmable network connectivity and network overlay technologies like Docker libnetwork, Weave Net, and Calico are essential tools for DevOps engineers using orchestration tools to manage and deploy Docker containers in production. Because network troubleshooting and optimization falls within the jurisdiction of DevOps, it’s vital that DevOps engineers understand exactly how network overlays work. Participants will learn the fundamentals of container networking, see practical examples of common network overlays, and receive guidance on effectively using and tuning network overlays.
High Performance Erlang - Pitfalls and Solutions - Yinghai Lu
Presented at Erlang Factory 2016, San Francisco, CA.
Erlang is widely used for building concurrent applications. However, when we push the performance of our Erlang-based application to handle millions of concurrent clients, some Erlang scalability issues begin to show and some conventional Erlang programming paradigms no longer hold. We would like to share some of these issues and how we address them. In addition, we share some of our experience on how to profile an Erlang application to identify bottlenecks.
We will take a deep look at some of the basic mechanisms of Erlang and show how they behave under high load and parallelism, which includes message delivery, process management and shared data structures such as maps and ETS tables. We will demonstrate their limitations and propose techniques to alleviate the issues.
We will also share profiling techniques on how to find those bottlenecks in Erlang applications across different levels. We will share techniques for writing highly performant Erlang applications.
In this meetup, Liran Cohen, Cloud platform & DevOps Team Leader, will talk about some of Kubernetes key concepts. We will learn about the architecture of the system; the different resources available in the system; the problems it’s trying to solve, and the model that it uses to manage containerized application deployments.
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
Kubernetes @ Squarespace (SRE Portland Meetup October 2017) - Kevin Lynch
In this presentation I talk about our motivation for converting our microservices to run on Kubernetes. I discuss many of the technical challenges we encountered along the way, including networking issues, Java issues, monitoring and alerting, and managing all of our resources!
Kubernetes @ Squarespace: Kubernetes in the Datacenter - Kevin Lynch
This talk was presented at SRE NYC Meetup on August 16, 2017 at Squarespace HQ.
https://www.youtube.com/watch?v=UJ1QAKprVr4
As the engineering teams at Squarespace grow, we have been building more and more microservices. However, this has added operational strain as we try to shoehorn a growing, complex dynamic environment into our static data center infrastructure. We needed to rethink how we handle deployments, dependency management, resource allocation, monitoring, and alerting. Docker containerization and Kubernetes orchestration help us tackle many of these problems, but the journey has been challenging. In this talk, we’ll discuss the challenges of running Kubernetes in a datacenter and how we switched to a more SLA-focused alert structure, rather than per-instance health, with Prometheus and AlertManager.
Kubermatic: How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime - Tobias Schneck
Have you ever thought about migrating your Kubernetes clusters to Google Cloud to get your services closer to your customers? Yes? We too! Join us on an interactive journey to discover the main challenges of live migration at scale of etcds, traffic routing, and application workloads from your on-premise platform to GCP. The talk will discuss the current state of the technical concept, known problems, and insights from the already proven migration steps for stateless workloads.
As part of the journey, we'll see the differences between migrating one or one hundred clusters with productive workloads: What parts can be automated? What steps may need to be manual? Let's see what an automated solution could look like in the future and which steps are still missing.
How to Migrate 100 Clusters from On-Prem to Google Cloud Without Downtime - Loodse
Have you ever thought about migrating your Kubernetes clusters to Google Cloud to get your services closer to your customers? Yes? Us too! Join us on an interactive journey to discover the main challenges of live migration at scale of etcds, traffic routing, and application workloads from your on-premise platform to GCP. The talk will discuss the current state of the technical concept, known problems, and insights from the already proven migration steps for stateless workloads.
As part of the journey, we'll see
- The differences between migrating one or one hundred clusters with productive workloads
- What parts can be automated?
- What steps may need to be done manually?
We are using Elasticsearch to power the search feature of our public frontend, serving 10k queries per hour across 8 markets in SEA.
Here we are sharing our experiences of running Elasticsearch on Kubernetes, presenting our general setup, configuration tweaks and possible pitfalls.
Running large Kubernetes clusters is challenging. At large scales, practitioners need to adapt and tune both their architectures and component configurations in specialized ways.
Our organisation has been running large scale Kubernetes clusters (up to 2000 nodes, and growing) for more than a year, and we have learned several lessons the hard way. This talk will dive into complex runtime and networking issues that occur when running Kubernetes in production at scale. We will provide examples of how to improve the architecture of clusters to increase scalability and performance, both on the control plane and the data plane. Further, tools from the greater ecosystem will be examined, as they are rarely tested within the context of very large clusters.
Finally, the talk will also discuss the mutually beneficial relationship we built with the larger Kubernetes community by providing feedback on the tools and contributing both fixes and improvements upstream.
Containers are everywhere these days. Many of us are containerizing our applications to take advantage of the ease of a single artifact, but what can we do to make deploying these containers to a fleet of servers easier? Kubernetes is arguably the most popular container orchestration system to date. Kubernetes was born out of a decade of research at Google and has seen success; by itself as a fantastic way to orchestrate containers across multiple machines and as a component in other platforms.
This talk will begin with the anatomy and setup of a Kubernetes cluster. We'll demonstrate (live) taking a container containing a simple web service and launch our application into a small Kubernetes cluster. Next we'll perform a rolling update to deploy a new container version with zero downtime. Also, we'll check out some cool debugging features Kubernetes provides over the course of our demo.
DevOps Days Boston 2017: Real-world Kubernetes for DevOps - Ambassador Labs
DevOps Days Boston 2017
Microservices is an increasingly popular approach to building cloud-native applications. Dozens of new technologies that streamline adopting microservices development such as Docker, Kubernetes, and Envoy have been released over the past few years. But how do you actually use these technologies together to develop, deploy, and run microservices?
In this presentation, we’ll cover the nuances of deploying containerized applications on Kubernetes, including creating a Kubernetes manifest, debugging and logging, and how to build an automated continuous deployment pipeline. Then, we’ll do a brief tour of some of the advanced concepts related to microservices, including service mesh, canary deployments, resilience, and security.
Debugging is an essential part of Linux kernel development. In user space we have the support of the kernel and many debugging tools; tracking down a kernel bug, by contrast, can be very difficult if you don't know the proper methodologies. This talk will cover some techniques to understand how the kernel works, and to hunt down and fix kernel bugs, in order to become a better kernel developer.
Using eBPF to Measure the k8s Cluster Health - ScyllaDB
As a k8s cluster-admin, your app teams expect the cluster to be available to deploy services at any time without problems. While there is no shortage of metrics in k8s, it is important to have the right metrics to alert on issues and to give you enough data to react to potential availability problems. Prometheus has become a standard and sheds light on the inner behaviour of Kubernetes clusters and workloads. Many KPIs (CPU, IO, network, etc.) that are precise in an on-premise environment become less precise when we move to a cloud environment. eBPF is the perfect technology to fill that gap, as it gives us information down to the kernel level. In 2018 Cloudflare shared an open-source project to expose custom eBPF metrics in Prometheus. Join this session and learn: what eBPF is, what types of metrics we can collect, and how to expose those metrics in a K8s environment. The session delivers a step-by-step guide on how to take advantage of the eBPF exporter.
Running large Kubernetes clusters is difficult. Datadog has been running large-scale Kubernetes clusters (thousands of nodes) for more than a year and has learned several lessons the hard way.
Laurent Bernaille examines the challenges Datadog faced during this journey. He dives into problems that arise when you run large clusters—and, crucially, how to address them—by providing detailed examples based on Datadog’s experience across different cloud providers. You’ll explore complex runtime and networking issues: at scale you discover complex issues in low-level components that are very rare but happen regularly when you have a large number of nodes.
Additionally, Laurent provides examples of how to improve the architecture of clusters to increase scalability and performance, both on the control plane and the data plane (communication between pods and ingress traffic). If scale can be hard on the control plane, it’s even harder on tools from the ecosystem, which have rarely been tested on very large clusters. He explains several examples of the tools Datadog uses and how it had to improve them to handle its scale. And you’ll leave with practical advice on how to build a good relationship with the community and start contributing back.
Disenchantment: Netflix Titus, Its Feisty Team, and Daemons - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Gmuwlg.
Andrew Spyker talks about Netflix's feisty team’s work across container runtimes, scheduling & control plane, and cloud infrastructure integration. He also talks about the demons they’ve found on this journey covering operability, security, reliability and performance. Filmed at qconsf.com.
Andrew Spyker worked to mature the technology base of Netflix Container Cloud (Project Titus) within the development team. Recently, he moved into a product management role collaborating with supporting Netflix infrastructure dependencies as well as supporting new container cloud usage scenarios including user on-boarding, feature prioritization/delivery and relationship management.
KubeCon EU 2016: A Practical Guide to Container Scheduling - KubeAcademy
Containers are at the forefront of a new wave of technology innovation but the methods for scheduling and managing them are still new to most developers. In this talk we'll look at the kind of problems that container scheduling solves and at how maximising efficiency and maximising QoS don't have to be exclusive goals. We'll take a behind-the-scenes look at the Kubernetes scheduler: How does it prioritize? What about node selection and external dependencies? How do you schedule based on your own specific needs? How does it scale, and what’s in it both for developers already using containers and for those that aren't? We’ll use a combination of slides, code, and demos to answer all these questions and hopefully all of yours.
Sched Link: http://sched.co/6BZa
CERN, the European Organization for Nuclear Research, has for several years been running a large OpenStack cloud that helps thousands of scientists analyze the data from the LHC.
In 2012, early in the design phase of the CERN Cloud, we decided to use Nova Cells to enable the infrastructure to scale to thousands of nodes. Now, with more than 280K cores spread across 70 cells hosted in two data centres, we were faced with the challenge of migrating to Nova Cells V2, required in the Pike release.
In this presentation, we will describe how Nova Cells allowed CERN to scale to thousands of nodes, its advantages, and how we mitigated the implementation issues of Nova Cells V1. Next, we will cover how we upgraded Nova from Newton with Cells V1 to Pike with Cells V2. We will explain the steps that we followed and the issues that we faced during the upgrade. Finally, we will report our experience with Cells V2 at scale, its caveats, and how we are mitigating them.
What can I expect to learn?
This presentation describes how CERN migrated from Cells V1 to Cells V2 when upgrading from the Newton to the Pike release.
You will learn the procedures followed by CERN in order to migrate Cells V1 to Cells V2 in a large production environment.
The issues found during the upgrade and how we mitigated them will be discussed.
Also, we will present how Cells V2 behaves in a large-scale deployment with several thousand nodes across 70 cells.
Database as a Service (DBaaS) on Kubernetes - ObjectRocket
Learn about ObjectRocket's adventures in Kubernetes. We'll cover why we chose Kubernetes for our DBaaS platform, the challenges we faced, and how we overcame them. A presentation for DevWeek Austin 2018.
Free GitOps Workshop + Intro to Kubernetes & GitOps - Weaveworks
Follow along in this free workshop and experience GitOps!
AGENDA:
Welcome - Tamao Nakahara, Head of DX (Weaveworks)
Introduction to Kubernetes & GitOps - Mark Emeis, Principal Engineer (Weaveworks)
Weave Gitops Overview - Tamao Nakahara
Free Gitops Workshop - David Harris, Product Manager (Weaveworks)
If you're new to Kubernetes and GitOps, we'll give you a brief introduction to both and how GitOps is the natural evolution of Kubernetes.
Weave GitOps Core is a continuous delivery product to run apps in any Kubernetes. It is free and open source, and you can get started today!
https://www.weave.works/product/gitops-core
If you’re stuck, also come talk to us at our Slack channel! #weave-gitops http://bit.ly/WeaveGitOpsSlack (If you need to invite yourself to the Slack, visit https://slack.weave.works/)
Similar to How the OOM Killer Deleted My Namespace
The Docker network overlay driver relies on several technologies: network namespaces, VXLAN, Netlink and a distributed key-value store. This talk will present each of these mechanisms one by one along with their userland tools and show hands-on how they interact together when setting up an overlay to connect containers.
The talk will continue with a demo showing how to build your own simple overlay using these technologies.
The Docker network overlay driver relies on several technologies: network namespaces, VXLAN, Netlink and a distributed key-value store. This talk will present each of these mechanisms one by one along with their userland tools and show hands-on how they interact together when setting up an overlay to connect containers. The talk will continue with a demo showing how to build your own simple overlay using these technologies. Finally, it will show how we can dynamically distribute IP and MAC information to every host in the overlay using BGP EVPN.
Serverless architectures are promising and will play an important role in the coming years, but the ecosystem around serverless is still pretty young. We have been operating Lambda-based applications for about a year and faced several challenges. In this presentation we share these challenges and propose some solutions to work around them.
Most tools to recognize the application associated with network connections use well-known signatures as basis for their classification. This approach is very effective in enterprise and campus networks to pinpoint forbidden applications (peer to peer, for instance) or security threats. However, it is easy to use encryption to evade these mechanisms. In particular, Secure Sockets Layer (SSL) libraries such as OpenSSL are widely available and can easily be used to encrypt any type of traffic. In this paper, we propose a method to detect applications in SSL encrypted connections. Our method uses only the size of the first few packets of an SSL connection to recognize the application, which enables an early classification. We test our method on packet traces collected on two campus networks and on manually-encrypted traces. Our results show that we are able to recognize the application in an SSL connection with more than 85% accuracy.
The automatic detection of applications associated with network traffic is an essential step for network security and traffic engineering. Unfortunately, simple port-based classification methods are not always efficient and systematic analysis of packet payloads is too slow. Most recent research proposals use flow statistics to classify traffic flows once they are finished, which limits their applicability to online classification. In this paper, we evaluate the feasibility of application identification at the beginning of a TCP connection. Based on an analysis of packet traces collected on eight different networks, we find that it is possible to distinguish the behavior of an application from the observation of the size and the direction of the first few packets of the TCP connection. We apply three techniques to cluster TCP connections: K-Means, Gaussian Mixture Model and spectral clustering. Resulting clusters are used together with assignment and labeling heuristics to design classifiers. We evaluate these classifiers on different packet traces. Our results show that the first four packets of a TCP connection are sufficient to classify known applications with an accuracy over 90% and to identify new applications as unknown with a probability of 60%.
2. @lbernail
Datadog
Over 400 integrations
Over 1,500 employees
Over 12,000 customers
Runs on millions of hosts
Trillions of data points per day
10,000s of hosts in our infra
Tens of k8s clusters with 100-4000 nodes
Multi-cloud
Very fast growth
3. @lbernail
Context
● We have shared a few stories already
○ Kubernetes the Very Hard Way
○ 10 Ways to Shoot Yourself in the Foot with Kubernetes
○ Kubernetes DNS horror stories
● The scale of our Kubernetes infra keeps growing
○ New scalability issues
○ But also, much more complex problems
6. @lbernail
Feb 17 14:30: Namespace and workloads deleted
Investigation
DeleteCollection calls by resource (audit logs)
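The per-resource breakdown above comes straight from the audit logs. As a rough illustration of how such a count can be produced offline, here is a minimal Go sketch (assuming audit events are written as JSON lines to a local file; the log path is hypothetical and the event struct only keeps the fields needed here):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// Minimal subset of a Kubernetes audit event (audit.k8s.io).
type auditEvent struct {
	Verb      string `json:"verb"`
	ObjectRef struct {
		Resource  string `json:"resource"`
		Namespace string `json:"namespace"`
	} `json:"objectRef"`
}

func main() {
	f, err := os.Open("/var/log/kubernetes/audit.log") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	counts := map[string]int{}
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // audit entries can be large
	for scanner.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue
		}
		// Audit verbs are lowercase: get, list, delete, deletecollection, ...
		if ev.Verb == "deletecollection" {
			counts[ev.ObjectRef.Namespace+"/"+ev.ObjectRef.Resource]++
		}
	}
	for k, n := range counts {
		fmt.Printf("%-50s %d\n", k, n)
	}
}
```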
7. @lbernail
Audit logs
● Delete calls for all resources in the namespace
● No delete call for the namespace resource
Why was the namespace deleted??
Investigation
8. @lbernail
Deletion call, 4d before
Audit logs for the namespace
Feb 13th (resource deletion happened on Feb 17th)
Successful namespace delete call, 4 days before workloads were deleted
17. @lbernail
Namespace Controller
=> If discovery fails, namespace content is not deleted
Kubernetes 1.14+: Delete as many resources as possible
Kubernetes 1.16+: Add Conditions to namespaces to surface errors
18. @lbernail
Namespace Controller logs
Controller-manager logs contained many occurrences of:
unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1:
the server is currently unable to handle the request
No context, so it is hard to link with the incident, but:
● This is the error from the Discovery call
● We have a likely cause: metrics.k8s.io/v1beta1
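The namespace controller hits this error through API discovery. A minimal client-go sketch of the same call (assuming a kubeconfig at the default location): ServerGroupsAndResources returns partial results plus an aggregated error when a single aggregated API such as metrics.k8s.io/v1beta1 is unavailable, which is exactly what blocks the controller.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The namespace controller relies on this discovery step to enumerate
	// every resource type it must delete from a terminating namespace.
	_, resources, err := dc.ServerGroupsAndResources()
	if discovery.IsGroupDiscoveryFailedError(err) {
		// A single broken APIService (e.g. metrics.k8s.io/v1beta1) lands here.
		fmt.Println("partial discovery failure:", err)
	} else if err != nil {
		panic(err)
	}
	fmt.Println("resource lists discovered:", len(resources))
}
```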
19. @lbernail
metrics-server
● Additional controller providing node/pod metrics
● Registers an ApiService
○ Additional Kubernetes APIs
○ Managed by the external controller
The APIService declares the additional APIs (group and version) and where the apiserver should proxy calls for these APIs
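For reference, the registration looks roughly like the stock metrics-server APIService. A sketch using the kube-aggregator Go types (the object name, group/version and the kube-system service reference follow the upstream metrics-server manifest, not necessarily this cluster's exact setup):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	apiregistrationv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
)

func main() {
	apiService := &apiregistrationv1.APIService{
		// The name encodes the version and group served by the extension.
		ObjectMeta: metav1.ObjectMeta{Name: "v1beta1.metrics.k8s.io"},
		Spec: apiregistrationv1.APIServiceSpec{
			Group:   "metrics.k8s.io", // additional API group
			Version: "v1beta1",        // and version
			// Where the apiserver should proxy calls for this API.
			Service: &apiregistrationv1.ServiceReference{
				Namespace: "kube-system",
				Name:      "metrics-server",
			},
			// The stock manifest skips TLS verification of the backend;
			// hardened setups provide a CABundle instead.
			InsecureSkipTLSVerify: true,
			GroupPriorityMinimum:  100,
			VersionPriority:       100,
		},
	}
	fmt.Println(apiService.Name)
}
```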
20. @lbernail
Events so far
● Feb 13th: namespace deletion call
● Feb 13-17th
○ Namespace controller can't discover metrics APIs
○ Namespace content is not deleted
○ At each sync, controller retries and fails
● Feb 17th
○ ?
○ Namespace controller is unblocked
○ Everything in the namespace is deleted
21. @lbernail
Events so far
● Feb 13th: error leads to namespace deletion call
● Feb 13-17th
○ Namespace controller can't discover metrics APIs
○ Namespace content is not deleted
○ At each sync, controller retries and fails
● Feb 17th
○ OOM-killer kills metrics-server and it is restarted
○ Metrics-server is now working fine
○ Namespace controller is unblocked
○ Everything in the namespace is deleted
25. @lbernail
Metrics-server deployment
"old" nodes
no PodShareProcessNamespace
Subset of nodes using an old image without PodShareProcessNamespace
● metrics-server: fine until it was rescheduled on an old node
● At this point, server certificate is no longer reloaded
recent nodes
metrics-server
26. @lbernail
Full chain of events
● Before Feb 13th
○ metrics-server scheduled on a "bad" node
○ controller discovery calls fail
● Feb 13th: error leads to namespace deletion call
● Feb 13-17th
○ Namespace controller can't discover metrics APIs
○ Namespace content is not deleted
○ At each sync, controller retries and fails
● Feb 17th
○ OOM-killer kills metrics-server and it is restarted
○ Metrics-server is now working fine (certificate up to date)
○ Namespace controller is unblocked
○ Everything in the namespace is deleted
27. @lbernail
Key take-away
● Apiservice extensions are great but can impact your cluster
○ If the API is not used, they can be down for days without impact
○ But some controllers need to discover all apis and will fail
● In addition, Apiservice extensions security is hard
○ If you are curious, go and see
29. @lbernail
Context
● We build images for our Kubernetes nodes
● A small change pushed
● Nodes become NotReady after a few days
● Change completely unrelated to kubelet / runtime
● We can't promote to Prod
30. @lbernail
Investigation
Conditions:
Type Status Reason Message
---- ------ ------ -------
OutOfDisk False KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False KubeletHasSufficientPID kubelet has sufficient PID available
Ready False KubeletNotReady container runtime is down
Node description
31. @lbernail
Runtime is down?
$ crictl pods
POD ID STATE NAME NAMESPACE
681bb3aec7fba Ready local-volume-provisioner-jtwzf local-volume-provisioner
53f122f351da2 Ready datadog-agent-qctt2-8ljgp datadog-agent
c1edb4f19713c Ready localusers-z26n9 prod-platform
01cf5cafd6bc9 Ready node-monitoring-ws2xf node-monitoring
39f01bdaade86 Ready kube2iam-zzdt6 kube2iam
3fcf520680a63 Ready kube-proxy-sr7tn kube-system
5ecd0966634f1 Ready node-local-dns-m2zbp coredns
Looks completely fine
Let's look at containerd
32. @lbernail
Kubelet logs
remote_runtime.go:434] Status from runtime service failed: rpc error: code = DeadlineExceeded desc =
context deadline exceeded
kubelet.go:2130] Container runtime sanity check failed: rpc error: code = DeadlineExceeded desc =
context deadline exceeded
kubelet.go:1803] skipping pod synchronization - [container runtime is down]
"Status from runtime service failed"
➢ Is there a CRI call for this?
Something is definitely wrong
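Yes: the kubelet periodically calls the CRI Status RPC, and that is the call timing out here. A sketch of issuing the same RPC directly against containerd's CRI endpoint (the socket path and the v1alpha2 CRI API version are assumptions matching clusters of that era):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

func main() {
	// Dial containerd's socket (assumed default path) with a unix dialer.
	conn, err := grpc.Dial("/run/containerd/containerd.sock",
		grpc.WithInsecure(),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// This is the call behind "Status from runtime service failed".
	resp, err := client.Status(ctx, &runtimeapi.StatusRequest{})
	if err != nil {
		panic(err) // e.g. DeadlineExceeded when the CRI plugin is stuck
	}
	for _, cond := range resp.Status.Conditions {
		fmt.Printf("%s=%v reason=%q\n", cond.Type, cond.Status, cond.Reason)
	}
}
```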
35. @lbernail
CNI status
● Really not much here right?
● But we have calls that hang and a lock
● Could this be related?
36. @lbernail
Containerd goroutine dump
containerd[737]: goroutine 107298383 [semacquire, 2 minutes]:
containerd[737]: goroutine 107290532 [semacquire, 4 minutes]:
containerd[737]: goroutine 107282673 [semacquire, 6 minutes]:
containerd[737]: goroutine 107274360 [semacquire, 8 minutes]:
containerd[737]: goroutine 107266554 [semacquire, 10 minutes]:
Blocked goroutines?
The runtime error from the kubelet also happens every 2 minutes
containerd[737]: goroutine 107298383 [semacquire, 2 minutes]:
...
containerd[737]: .../containerd/vendor/github.com/containerd/go-cni.(*libcni).Status()
containerd[737]: .../containerd/vendor/github.com/containerd/go-cni/cni.go:126
containerd[737]: .../containerd/vendor/github.com/containerd/cri/pkg/server.(*criService).Status(...)
containerd[737]: .../containerd/containerd/vendor/github.com/containerd/cri/pkg/server/status.go:45
Goroutine details match exactly the codepath we looked at
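Such dumps can typically be obtained by sending SIGUSR1 to the containerd process, which writes all goroutine stacks to its log. The underlying Go pattern looks roughly like this generic sketch (not containerd's actual code):

```go
package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGUSR1)
	go func() {
		for range sig {
			// debug=2 prints one stack per goroutine, including wait reasons
			// and durations such as "[semacquire, 2 minutes]".
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()

	select {} // block forever; send SIGUSR1 to this process to get a dump
}
```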
37. @lbernail
Seems CNI related
Let's bypass containerd/kubelet
$ ip netns add debug
$ CNI_PATH=/opt/cni/bin cnitool add pod-network /var/run/netns/debug
Works great and we even have connectivity
$ ip netns exec debug ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=111 time=2.41 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=111 time=1.70 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=111 time=1.76 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=111 time=1.72 ms
38. @lbernail
What about Delete?
$ CNI_PATH=/opt/cni/bin cnitool del pod-network /var/run/netns/debug
^C
● hangs indefinitely
● and we can even find the original process still running
ps aux | grep cni
root 367 0.0 0.1 410280 8004 ? Sl 15:37 0:00
/opt/cni/bin/cni-ipvlan-vpc-k8s-unnumbered-ptp
39. @lbernail
CNI plugin
● On this cluster we use the Lyft CNI plugin
● The delete call fails for cni-ipvlan-vpc-k8s-unnumbered-ptp
● After a few tests, we tracked the problem to this call
● Weird, this is in the netlink library!
41. @lbernail
Summary
● Packer configured to use the latest Ubuntu 18.04
● The default kernel changed from 4.15 to 5.4
● Netlink library had a bug for kernels 4.20+
● Bug only happened on pod deletion
42. @lbernail
Status
● Containerd
○ Status calls trigger a config file reload, which acquires a write lock on the config RWMutex
○ CNI Add/Del calls acquire a RLock on the mutex
○ If CNI Add/Del hangs, Status calls block
○ Does not impact containerd 1.4+ (separate fsnotify.Watcher to reload config)
● cni-ipvlan-vpc-k8s ("Lyft" CNI plugin)
○ Bug in Netlink library with kernel 4.20+
○ Does not impact cni-ipvlan-vpc-k8s 0.6.2+ (different function used)
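A minimal generic sketch of the locking pattern described above (not the actual go-cni code): a CNI Del that hangs while holding the read lock blocks the write lock taken by the Status/reload path, and every later caller queues behind it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	// CNI Del call: takes the read lock, then hangs (like the stuck netlink call).
	go func() {
		mu.RLock()
		defer mu.RUnlock()
		time.Sleep(time.Hour) // never returns in practice
	}()
	time.Sleep(100 * time.Millisecond)

	// Status / config-reload path: needs the write lock, so it blocks behind
	// the stuck reader and the kubelet's CRI Status call times out.
	done := make(chan struct{})
	go func() {
		mu.Lock()
		defer mu.Unlock()
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("status call completed")
	case <-time.After(2 * time.Second):
		fmt.Println("status call blocked (DeadlineExceeded in the real system)")
	}
}
```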
43. @lbernail
Key take-away
Nodes are not abstract compute resources
● Kernel
● Distribution
● Hardware
Error messages can be misleading
● For CNI errors, runtime usually reports a clear error
● Here we had nothing but a timeout
44. How a single app can overwhelm the control plane
48. @lbernail
What we know
● Cluster size has not significantly changed
● The control plane has not been updated
● So it is very likely related to api calls
50. @lbernail
Why are list calls expensive?
Understanding Apiserver caching
[Diagram: the apiserver maintains an in-memory cache of etcd data, kept up to date through a List + Watch on etcd]
51. @lbernail
Why are list calls expensive?
What happens for GET/LIST calls?
[Diagram: a client GET/LIST with RV=X goes to the apiserver handler and storage layer, which can serve it from the cache]
● Resources have versions (ResourceVersion, RV)
● For GET/LIST with RV=X
If cachedVersion >= X, return cachedVersion
else wait up to 3s, if cachedVersion>=X return cachedVersion
else Error: "Too large resource version"
➢ RV=X: "at least as fresh as X"
➢ RV=0: current version in cache
52. @lbernail
Why are list calls expensive?
What about GET/LIST without a resourceVersion?
[Diagram: a client GET/LIST with RV="" goes through the apiserver handler and storage layer all the way to etcd]
● If RV is not set, apiserver performs a Quorum read against etcd (for consistency)
● This is the behavior of
○ kubectl get
○ client.CoreV1().Pods("").List() with default options (client-go)
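In client-go terms, the difference is just the ResourceVersion field of ListOptions. A minimal sketch (assuming a recent client-go where List takes a context; older releases omit it, and the kubeconfig path is the default one):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// RV="" (the default): quorum read, every pod is fetched from etcd.
	expensive, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// RV="0": served from the apiserver watch cache, no etcd read.
	cheap, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(expensive.Items), len(cheap.Items))
}
```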
53. @lbernail
Illustration
● Test on a large cluster with more than 30k pods
● Using table view ("application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json")
○ Only ~25MB of data to minimize transfer time (full JSON: ~1GB)
time curl 'https://cluster.dog/api/v1/pods'
real 0m4.631s
time curl 'https://cluster.dog/api/v1/pods?resourceVersion=0'
real 0m1.816s
54. @lbernail
What about label filters?
time curl 'https://cluster.dog/api/v1/pods?labelSelector=app=A'
real 0m3.658s
time curl 'https://cluster.dog/api/v1/pods?labelSelector=app=A&resourceVersion=0'
real 0m0.079s
● Call with RV="" is slightly faster (less data to send)
● Call with RV=0 is much much faster
> Filtering is performed on cached data
● When RV="", all pods are still retrieved from etcd and then filtered on apiservers
55. @lbernail
Why not filter in etcd?
etcd key structure
/registry/{resource type}/{namespace}/{resource name}
So we can ask etcd for:
● a specific resource
● all resources of a type in a namespace
● all resources of a type
● no other filtering / indexing
curl 'https://cluster.dog/api/v1/pods?labelSelector=app=A'
real 0m3.658s <== get all pods (30k) in etcd, filter on apiserver
curl 'https://cluster.dog/api/v1/namespaces/datadog/pods?labelSelector=app=A'
real 0m0.188s <== get pods from datadog namespace (1000) in etcd, filter on apiserver
curl 'https://cluster.dog/api/v1/namespaces/datadog/pods?labelSelector=app=A&resourceVersion=0'
real 0m0.058s <== get pods from datadog namespace in apiserver cache, filter
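At the etcd level, the only available query shape is a key-prefix read. A sketch with the etcd v3 client (the endpoint, TLS setup and exact import path vary with the etcd release; values are protobuf-encoded, so only keys are printed):

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"}, // TLS config omitted
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// All pods in the "datadog" namespace: everything under one key prefix.
	// There is no way to ask etcd for "pods with label app=A".
	resp, err := cli.Get(context.Background(), "/registry/pods/datadog/",
		clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
}
```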
56. @lbernail
Informers instead of List
How do informers work?
[Diagram: the Informer performs an initial LIST with RV=0 against the apiserver cache, then a WATCH from the returned RV=X to keep its local cache in sync]
● Informers are much better because
○ They maintain a local cache, updated on changes
○ They start with a LIST using RV=0 to get data from the cache
● Edge case
○ Disconnections: can trigger a new LIST with RV="" (to avoid going back in time)
○ Kubernetes 1.20 will have EfficientWatchResumption (alpha)
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1904-efficient-watch-resumption
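A minimal informer sketch with client-go (assuming in-cluster configuration): the factory performs the initial LIST with RV=0 against the apiserver cache and then keeps a local copy up to date through the WATCH, so later reads never touch the apiserver.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	// Wait for the initial LIST (RV=0, served from the apiserver cache).
	cache.WaitForCacheSync(stop, podInformer.Informer().HasSynced)

	// Subsequent "lists" are served from local memory, not the apiserver.
	pods, err := podInformer.Lister().Pods("datadog").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Println("pods in the datadog namespace:", len(pods))
}
```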
57. @lbernail
Summary
● LIST calls go to etcd by default and can have a huge impact
● LIST calls with label filters still retrieve everything from etcd
● Avoid LIST, use Informers
● kubectl get uses LIST with RV=""
● kubectl get could allow setting RV=0
○ Much faster: better user experience
○ Much better for etcd and apiservers
○ Trade-off: small inconsistency window
Many, many thanks to Wojtek-t for his help getting all this right
58. @lbernail
Back to the incident
● We know the problem comes from LIST calls
● What application is issuing these calls?
61. @lbernail
Nodegroup controller?
● An in-house controller to manage pools of nodes
● Used extensively for 2 years
● But a recent upgrade deployed deletion protection:
○ Check if pods are running on pool of nodes
○ Deny nodegroup deletion if it is the case
62. @lbernail
How did it work?
On nodegroup delete
1. List all nodes in this nodegroup based on labels
=> Some groups have 100+ nodes
2. List all pods on each node filtering on bound node
=> Lists all pods (30k) for each node
=> Performed in parallel for all nodes in the group
=> The bigger the nodegroup, the worse the impact
All LIST calls here retrieve full node/pod list from etcd
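One way to avoid these expensive per-node LIST calls is to keep a single pod informer with an index on the bound node and answer the question from local memory. A sketch (not the actual controller code; the index name and node name are illustrative, and in-cluster configuration is assumed):

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

const byNode = "byNode" // arbitrary index name

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()
	// Index pods by the node they are bound to (must be added before Start).
	if err := podInformer.AddIndexers(cache.Indexers{
		byNode: func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			return []string{pod.Spec.NodeName}, nil
		},
	}); err != nil {
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.HasSynced)

	// "Which pods run on this node?" answered locally, no LIST against etcd.
	objs, err := podInformer.GetIndexer().ByIndex(byNode, "node-123")
	if err != nil {
		panic(err)
	}
	fmt.Println("pods on node-123:", len(objs))
}
```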
63. @lbernail
What we learned
● List calls are very dangerous
○ The volume of data can be very large
○ Filtering happens on the apiserver
○ Use Informers (whenever possible)
● Audit logs are extremely useful
○ Who did what when?
○ Which users are responsible for processing time
65. @lbernail
Conclusion
● Apiservice extensions are powerful but can harm your cluster
● Nodes are not abstract compute resources
● For large (1000+ nodes) clusters
○ Understanding Apiserver caching is important
○ Client interactions with Apiservers are critical
○ Avoid LIST calls