Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Google has been running everything in containers for the past 15 years, but how do we orchestrate and manage all those containers? We've built and released the open source Kubernetes (http://kubernetes.io), which is based on years of running containers internally at Google. Join us for an introduction to containers and Kubernetes, followed by a hands-on workshop building and deploying your own Kubernetes cluster with multiple front end, database and caching instances.
Docker containers help solve the issue of process-level reproducibility by packaging up your apps and execution environments into a number of containers. But once you have a lot of containers running, you'll need to coordinate them across a cluster of machines while keeping them healthy and making sure they can find each other. This can quickly turn into an unmanageable mess! Wouldn't it be helpful if you could declare what wanted, and then have the cluster assign the resources to get it done and to recover from failures and scale on demand? Kubernetes is here to help!
Key takeaways
- Gentle introduction into containers: why and how
- Learn how Google manages applications using containers
- Intro to Kubernetes: managing applications and services
- Build and deploy your own multi-tier application using Kubernetes
In this session, we’ll review how previous efforts, including Netfilter, Berkley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
Kubernetes와 Kubernetes on OpenStack 환경의 비교와 그 구축방법에 대해서 알아봅니다.
1. 클라우드 동향
2. Kubernetes vs Kubernetes on OpenStack
3. Kubernetes on OpenStack 구축 방벙
4. Kubernetes on OpenStack 운영 방법
Kubernetes Architecture - beyond a black box - Part 1Hao H. Zhang
This is part 1 of my Kubernetes architecture deep-dive slide series.
I have been working with Kubernetes for more than a year, from v1.3.6 to v1.6.7, and I am a CNCF certified Kubernetes administrator. Before I move on to something else, I would like to summarize and share my knowledges and take-aways about Kubernetes, from a software engineer perspective.
This set of slides is a humble dig into one level below your running application in production, revealing how different components of Kubernetes work together to orchestrate containers and present your applications to the rest of the world.
The slides contains 80+ external links to Kubernetes documentations, blog posts, Github issues, discussions, design proposals, pull requests, papers, source code files I went through when I was working with Kubernetes - which I think are valuable for people to understand how Kubernetes works, Kubernetes design philosophies and why these design came into places.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Google has been running everything in containers for the past 15 years, but how do we orchestrate and manage all those containers? We've built and released the open source Kubernetes (http://kubernetes.io), which is based on years of running containers internally at Google. Join us for an introduction to containers and Kubernetes, followed by a hands-on workshop building and deploying your own Kubernetes cluster with multiple front end, database and caching instances.
Docker containers help solve the issue of process-level reproducibility by packaging up your apps and execution environments into a number of containers. But once you have a lot of containers running, you'll need to coordinate them across a cluster of machines while keeping them healthy and making sure they can find each other. This can quickly turn into an unmanageable mess! Wouldn't it be helpful if you could declare what wanted, and then have the cluster assign the resources to get it done and to recover from failures and scale on demand? Kubernetes is here to help!
Key takeaways
- Gentle introduction into containers: why and how
- Learn how Google manages applications using containers
- Intro to Kubernetes: managing applications and services
- Build and deploy your own multi-tier application using Kubernetes
In this session, we’ll review how previous efforts, including Netfilter, Berkley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
Kubernetes와 Kubernetes on OpenStack 환경의 비교와 그 구축방법에 대해서 알아봅니다.
1. 클라우드 동향
2. Kubernetes vs Kubernetes on OpenStack
3. Kubernetes on OpenStack 구축 방벙
4. Kubernetes on OpenStack 운영 방법
Kubernetes Architecture - beyond a black box - Part 1Hao H. Zhang
This is part 1 of my Kubernetes architecture deep-dive slide series.
I have been working with Kubernetes for more than a year, from v1.3.6 to v1.6.7, and I am a CNCF certified Kubernetes administrator. Before I move on to something else, I would like to summarize and share my knowledges and take-aways about Kubernetes, from a software engineer perspective.
This set of slides is a humble dig into one level below your running application in production, revealing how different components of Kubernetes work together to orchestrate containers and present your applications to the rest of the world.
The slides contains 80+ external links to Kubernetes documentations, blog posts, Github issues, discussions, design proposals, pull requests, papers, source code files I went through when I was working with Kubernetes - which I think are valuable for people to understand how Kubernetes works, Kubernetes design philosophies and why these design came into places.
An Architectural Deep Dive With Kubernetes And Containers Powerpoint Presenta...SlideTeam
Introducing An Architectural Deep Dive With Kubernetes And Containers PowerPoint Presentation Slides. Present the need for the containers in an organization with the help of a readily available PPT slideshow. Discuss container architecture, use cases details to make your presentation elaborative. Showcase the features, architecture, installation roadmap, and the 30-60-90 day plan in Kubernetes with the help of modern-designed PPT infographics. Familiarize your viewers with the various components of Kubernetes with the help of content-ready Kubernetes Docker PPT visuals. Make full use of high-quality icons to make your presentation attention-grabbing and meaningful. Compare and contrast Kubernetes with docker swarm based on various parameters with the help of this attention-grabbing PPT slideshow. Elaborate on Kubelet, Kubectl, and Kubeadm with the help of labeled diagrams. Showcase the networking model of Kubernetes, security measures, and the development process with this easy-to-use docker Architecture PowerPoint template. Therefore, hit the download button now to grab this amazing presentation. https://bit.ly/3vtLeFb
- Archeology: before and without Kubernetes
- Deployment: kube-up, DCOS, GKE
- Core Architecture: the apiserver, the kubelet and the scheduler
- Compute Model: the pod, the service and the controller
OpenStack 운영을 통해 얻은 교훈을 공유합니다.
목차
1. TOAST 클라우드 지금의 모습
2. OpenStack 선택의 이유
3. 구성의 어려움과 극복 사례
4. 활용 사례
5. 풀어야 할 문제들
대상
- TOAST 클라우드를 사용하고 싶은 분
- WMI를 처음 들어보시는 분
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-RegionJi-Woong Choi
OpenStack Ceph & Neutron에 대한 설명을 담고 있습니다.
1. OpenStack
2. How to create instance
3. Ceph
- Ceph
- OpenStack with Ceph
4. Neutron
- Neutron
- How neutron works
5. OpenStack HA- controller- l3 agent
6. OpenStack multi-region
Dev and test environments require the frequent and repeatable deployment of the CloudStack setup. This can be time-consuming and prone to errors. In this presentation, Kaloyan shows how StorPool uses Ansible for automatic deployment and setting up complete CloudStack clouds.
Kaloyan Kotlarski is a system administrator in StorPools' support team. He's been in the company for two years. He's responsible for building CI/CD automation and helping clients integrate StorPool Storage in their cloud deployments.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
Deep dive in container service discoveryDocker, Inc.
Service discovery and traffic load-balancing in the container ecosystem relies on different technologies, such as IPVS and iptables, and container orchestrators use different approaches. This talk will present in details how Docker Swarm and Kubernetes achieve this. The talk will continue with a demo showing how applications that are not managed by Kubernetes can take advantage of its native load-balancing. Finally, it will compare these approaches to service-mesh solutions.
Kubernetes currently has two load balancing mode: userspace and IPTables. They both have limitation on scalability and performance. We introduced IPVS as third kube-proxy mode which scales kubernetes load balancer to support 50,000 services. Beyond that, control plane needs to be optimized in order to deploy 50,000 services. We will introduce alternative solutions and our prototypes with detailed performance data.
Cilium - API-aware Networking and Security for Containers based on BPFThomas Graf
Cilium is open source software for providing and transparently securing network connectivity and loadbalancing between application workloads such as application containers or processes. Cilium operates at Layer 3/4 to provide traditional networking and security services as well as Layer 7 to protect and secure use of modern application protocols such as HTTP, gRPC and Kafka. Cilium is integrated into common orchestration frameworks such as Kubernetes and Mesos.
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
This talk will provide an introduction to injection options of Envoy and then deep dive into ongoing Linux kernel work that enables injecting Envoy while introducing as little latency as possible.
The servicemesh and the sidecar proxy model are on a steep trajectory to redefine many networking and security use cases. This talk explains and demos a new socket redirect Linux kernel technology that allows running Envoy with similar performance as if the sidecar was linked to the application using a UNIX domain socket. The talk will also give an outlook on how Envoy can use the recently merged kernel TLS functionality to gain access to the clear text payload transparently for end to end encrypted applications without requiring to decrypt and re-encrypt any data to further reduce the overhead and latency.
OVN (Open Virtual Network) を用いる事により、OVS (Open vSwitch)が動作する複数のサーバー(Hypervisor/Chassis)を横断する仮想ネットワークを構築する事ができます。
本スライドはOVNを用いた論理ネットワークの構成と設定サンプルのメモとなります。
Using OVN, you can build logical network among multiple servers (Hypervisor/Chassis) running OVS (Open vSwitch).
This slide is describes HOW TO example of OVN configuration to create 2 logical switch connecting 4 VMs running on 2 chassis.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyHenning Jacobs
Talk given at JAX DevOps London on 2019-05-15
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
An Architectural Deep Dive With Kubernetes And Containers Powerpoint Presenta...SlideTeam
Introducing An Architectural Deep Dive With Kubernetes And Containers PowerPoint Presentation Slides. Present the need for the containers in an organization with the help of a readily available PPT slideshow. Discuss container architecture, use cases details to make your presentation elaborative. Showcase the features, architecture, installation roadmap, and the 30-60-90 day plan in Kubernetes with the help of modern-designed PPT infographics. Familiarize your viewers with the various components of Kubernetes with the help of content-ready Kubernetes Docker PPT visuals. Make full use of high-quality icons to make your presentation attention-grabbing and meaningful. Compare and contrast Kubernetes with docker swarm based on various parameters with the help of this attention-grabbing PPT slideshow. Elaborate on Kubelet, Kubectl, and Kubeadm with the help of labeled diagrams. Showcase the networking model of Kubernetes, security measures, and the development process with this easy-to-use docker Architecture PowerPoint template. Therefore, hit the download button now to grab this amazing presentation. https://bit.ly/3vtLeFb
- Archeology: before and without Kubernetes
- Deployment: kube-up, DCOS, GKE
- Core Architecture: the apiserver, the kubelet and the scheduler
- Compute Model: the pod, the service and the controller
OpenStack 운영을 통해 얻은 교훈을 공유합니다.
목차
1. TOAST 클라우드 지금의 모습
2. OpenStack 선택의 이유
3. 구성의 어려움과 극복 사례
4. 활용 사례
5. 풀어야 할 문제들
대상
- TOAST 클라우드를 사용하고 싶은 분
- WMI를 처음 들어보시는 분
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-RegionJi-Woong Choi
OpenStack Ceph & Neutron에 대한 설명을 담고 있습니다.
1. OpenStack
2. How to create instance
3. Ceph
- Ceph
- OpenStack with Ceph
4. Neutron
- Neutron
- How neutron works
5. OpenStack HA- controller- l3 agent
6. OpenStack multi-region
Dev and test environments require the frequent and repeatable deployment of the CloudStack setup. This can be time-consuming and prone to errors. In this presentation, Kaloyan shows how StorPool uses Ansible for automatic deployment and setting up complete CloudStack clouds.
Kaloyan Kotlarski is a system administrator in StorPools' support team. He's been in the company for two years. He's responsible for building CI/CD automation and helping clients integrate StorPool Storage in their cloud deployments.
-----------------------------------------
CloudStack Collaboration Conference 2022 took place on 14th-16th November in Sofia, Bulgaria and virtually. The day saw a hybrid get-together of the global CloudStack community hosting 370 attendees. The event hosted 43 sessions from leading CloudStack experts, users and skilful engineers from the open-source world, which included: technical talks, user stories, new features and integrations presentations and more.
Deep dive in container service discoveryDocker, Inc.
Service discovery and traffic load-balancing in the container ecosystem relies on different technologies, such as IPVS and iptables, and container orchestrators use different approaches. This talk will present in details how Docker Swarm and Kubernetes achieve this. The talk will continue with a demo showing how applications that are not managed by Kubernetes can take advantage of its native load-balancing. Finally, it will compare these approaches to service-mesh solutions.
Kubernetes currently has two load balancing mode: userspace and IPTables. They both have limitation on scalability and performance. We introduced IPVS as third kube-proxy mode which scales kubernetes load balancer to support 50,000 services. Beyond that, control plane needs to be optimized in order to deploy 50,000 services. We will introduce alternative solutions and our prototypes with detailed performance data.
Cilium - API-aware Networking and Security for Containers based on BPFThomas Graf
Cilium is open source software for providing and transparently securing network connectivity and loadbalancing between application workloads such as application containers or processes. Cilium operates at Layer 3/4 to provide traditional networking and security services as well as Layer 7 to protect and secure use of modern application protocols such as HTTP, gRPC and Kafka. Cilium is integrated into common orchestration frameworks such as Kubernetes and Mesos.
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
This talk will provide an introduction to injection options of Envoy and then deep dive into ongoing Linux kernel work that enables injecting Envoy while introducing as little latency as possible.
The servicemesh and the sidecar proxy model are on a steep trajectory to redefine many networking and security use cases. This talk explains and demos a new socket redirect Linux kernel technology that allows running Envoy with similar performance as if the sidecar was linked to the application using a UNIX domain socket. The talk will also give an outlook on how Envoy can use the recently merged kernel TLS functionality to gain access to the clear text payload transparently for end to end encrypted applications without requiring to decrypt and re-encrypt any data to further reduce the overhead and latency.
OVN (Open Virtual Network) を用いる事により、OVS (Open vSwitch)が動作する複数のサーバー(Hypervisor/Chassis)を横断する仮想ネットワークを構築する事ができます。
本スライドはOVNを用いた論理ネットワークの構成と設定サンプルのメモとなります。
Using OVN, you can build logical network among multiple servers (Hypervisor/Chassis) running OVS (Open vSwitch).
This slide is describes HOW TO example of OVN configuration to create 2 logical switch connecting 4 VMs running on 2 chassis.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyHenning Jacobs
Talk given at JAX DevOps London on 2019-05-15
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
The Next Frontier in Open Source Java Compilers: Just-In-Time Compilation as a Service
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application.
JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances.
We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
Devoxx France 2018 : Mes Applications en Production sur KubernetesMichaël Morello
Retour d'expérience sur la mise en production d'applications ( Java mais pas seulement ) sur Kubernetes à Devoxx France 2018
La vidéo avec la démo est disponible en ligne ici : https://www.youtube.com/watch?v=cqqLeS9mUyU
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
This is an updated version of my JITServer talk that I will present at Open Source Summit North America in May 2023
The Next Frontier in Open Source Java Compilers: Just-In-Time Compilation as a Service
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application.
JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances.
We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
Container Performance Analysis Brendan Gregg, NetflixDocker, Inc.
Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using resource controls including cgroups. The interaction between containers can also be examined, and noisy neighbors either identified of exonerated. Performance tooling can also need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show how to successfully analyze performance in a Docker container environment, and navigate differences encountered.
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
The Open Street Map project provides invaluable data that keeps driving users toward the PostGIS and PostgreSQL stacks. Loading today’s full Planet data set takes a 120GB XML file and unrolls it into over a terabyte of database data. Crunchy’s benchmark labs have followed the expansion of that Planet data over the last six database releases, as the re-ignition of the CPU wars combined with parallel execution features landing in the database. We’ll take a look at that data evolution, which server configurations worked, and which metrics techniques still matter in the all SSD era.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Are you a Java developer wondering what it means to have your application running in the cloud. This session will provide a peek into how the JVM is adapting to running in the cloud and what Java developers need to be aware to ensure they get the most of running in the cloud.
The session will pick an example spring application and tune it stage by stage at the end of which we have an application that is fully optimized and takes advantage of every aspect of the running in a cloud
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application. JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances. We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
Similar to Ensuring Kubernetes Cost Efficiency across (many) Clusters - DevOps Gathering 2019 (20)
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHenning Jacobs
Many clusters, many problems? Having many clusters has benefits: reduced blast radius, less vertical scaling of cluster components, and a natural trust boundary. In this session, Zalando shows its approach for running 140+ clusters on AWS, how it does continuous delivery for its cluster infrastructure, and how it created open-source tooling to manage cost efficiency and improve developer experience. The company openly shares its failures and the learnings collected during three years of Kubernetes in production.
AWS re:Invent session OPN211 on 2019-12-05
Why I love Kubernetes Failure Stories and you should too - GOTO BerlinHenning Jacobs
Talk held on 2019-10-24 at GOTO Berlin:
Everybody loves failure stories, but maybe for the wrong reasons: Schadenfreude and Internet comment threads are the dark side; continuous improvement through blameless postmortems, sharing incidents, and documenting learnings is what motivated me to compile the list of Kubernetes Failure Stories. Kubernetes gives us a infrastructure platform to talk in the same "language" and foster collaboration across organizations. In this talk, I will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. I will highlight why Kubernetes makes sense despite its perceived complexity. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences.
https://gotober.com/2019/sessions/1129/why-i-love-kubernetes-failure-stories-and-you-should-too
Why Kubernetes? Cloud Native and Developer Experience at Zalando - Enterprise...Henning Jacobs
Kubernetes hat sich als defacto Standard für Cloud Native Plattformen etabliert. Doch warum? Welche Vorteile und Fallstricke gibt es in der Praxis? Henning Jacobs zeigt am Beispiel von Zalando wie Kubernetes als Infrastruktur für 1200+ Entwickler dient, welche Aspekte Kubernetes trotz seiner Komplexität einzigartig machen, und was dies für die Developer Experience bedeutet.
Why Kubernetes? Cloud Native and Developer Experience at Zalando - OWL Tech &...Henning Jacobs
Talk held on 2019-09-26 in Paderborn:
Die Keynote:
Warum Kubernetes? Cloud Native und Developer Experience bei Zalando
Kubernetes hat sich als defacto Standard for Cloud Native Plattformen durchgesetzt. Warum? Welche Vorteile und Fallstricke gibt es in der Praxis?
Henning Jacobs zeigt am Beispiel von Zalando wie Kubernetes als Infrastruktur für 1200+ Entwickler dient, welche Aspekte Kubernetes trotz seiner Komplexität einzigartig machen, und was das für die Developer.
Experience bedeutet.
Henning Jacobs ist der Head of Developer Productivity bei Zalando und damit verantwortlich für die Developer Experience von mehr als 200 Zalando Delivery Teams.
Das Kubernetes eine hervorragende Plattform für den Erfahrungsaustausch darstellt, zeigt Henning mit seiner Liste von Kubernetes Failure Stories.
https://teuto.net/owl-tech-innovation-day/
While Go is the language-of-choice in the cloud-native world, Python has a huge community and makes it really easy to extend Kubernetes in only a few lines of code.
This talk shows examples on how to use Python to query the Kubernetes API, how to write simple controllers in only 10 lines of Python, how to build complete web UIs, and how to test everything with py.test and Kind.
Some of the open-source projects which will be covered: pykube-ng, Kubernetes Web View, kube-janitor, and Kopf (Kubernetes Operator Pythonic Framework).
Talk held in Prague on 2019-09-05:
https://www.meetup.com/Cloud-Native-Prague/events/263802447/
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ...Henning Jacobs
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences.
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Henning Jacobs
While the adoption of DevOps makes teams move faster with reduced dependency on central operations, it can constrain teams who lack the skills to self-manage the full application and infrastructure stack. The way to overcome this challenge is creating an internal platform and treating it as a world-class product offering. “Applying product management to internal platforms means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience”, via ThoughtWorks Technology Radar. In this talk, we will walk you through how Zalando adopted a customer-first mindset with regards to its developer tooling. We will show the effect on developer satisfaction when internal platforms are given the same respect as external product offerings. We will tell our story on how we moved from a classical infrastructure team to a product mindset with strong focus on building a world-class developer experience. We will share both our learnings and challenges going through this transition, and the impact it has on the daily life of our customers (developers).
Why we don’t use the Term DevOps: the Journey to a Product Mindset - Destinat...Henning Jacobs
While the adoption of DevOps makes teams move faster with reduced dependency on central operations, it can constrain teams who lack the skills to self-manage the full application and infrastructure stack.
The way to overcome this challenge is creating an internal platform and treating it as a world-class product offering. “Applying product management to internal platforms means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience”, via ThoughtWorks Technology Radar.
In this talk, Henning Jacobs will walk you through how Zalando adopted a customer-first mindset with regards to its developer tooling. He will show the effect on developer satisfaction when internal platforms are given the same respect as external product offerings. Henning will furthermore tell his story about how Zalando moved from a classical infrastructure team to a product mindset with strong focus on building a world-class developer experience. Henning shares both their learnings and challenges going through this transition, and the impact it has on the daily life of Zalando’s customers (developers).
This talk was given in Aarhus on 4th of June 2019.
Kubernetes Failure Stories - KubeCon Europe BarcelonaHenning Jacobs
Talk given on 2019-05-21 at KubeCon Barcelona: https://kccnceu19.sched.com/event/MPcM/kubernetes-failure-stories-and-how-to-crash-your-clusters-henning-jacobs-zalando-se
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Our failure stories will be sourced from recent and past incidents, so the talk will be up-to-date with our latest experiences.
Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience's unknown unknowns about running Kubernetes in production.
Developer Experience at Zalando - Handelsblatt Strategisches IT-Management 2019Henning Jacobs
Talk given at 25. Handelsblatt Jahrestagung Strategisches IT-Management in Munich on 2019-01-23. Original title (German): "Developer Experience bei Zalando: Entwicklerproduktivität steigern mit Cloud Native Infrastruktur"
- Wie macht man mehr als 1100 Entwickler glücklich und effektiv?
- Entwickler als Kunde: Produktmanagement für Plattformteams
- You build it – you run it: Self-Service-Infrastruktur mit Kubernetes und AWS
- Der Weg vom klassischen Infrastrukturteam zu Developer Productivity als Abteilung
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - DevO...Henning Jacobs
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base.
We will walk you through our horror stories of operating 80+ clusters and share the insights we gained from incidents, failures, user reports and general observations.
Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well.
This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.
Running Kubernetes in Production: A Million Ways to Crash Your Cluster - Cont...Henning Jacobs
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 80+ clusters and share the insights we gained from incidents, failures, user reports and general observations. Most of our learnings apply to other Kubernetes infrastructures (EKS, GKE, ..) as well. This talk strives to reduce the audience’s unknown unknowns about running Kubernetes in production.
https://2018.container.camp/uk/schedule/running-kubernetes-in-production-a-million-ways-to-crash-your-cluster/
Connexion is an open source API first REST framework for Python, built on top of Flask and based on OpenAPI/Swagger, targeted for microservice development. Connexion automagically handles request routing, oauth2 security, request validation and response serialization based on an OpenAPI 2.0 Specification file in YAML, so you don’t have to care about boilerplate anymore.
Because it is based on Flask it supports everything that Flask does, including deployment options and extensions.
At Zalando we’ve adopted “API First” as one of our key engineering principles, to ensure our API are robust, consistent, general and
abstracted from specific implementation and use cases. But when we tried to implement this principle for the first time we were faced with the lack of a python framework to achieve it in a easy fashion - there were several frameworks that produce a swagger definition from the
implementation but none that do it the other way around - so we decided to fill that gap.
Henning will show how to get started with OpenAPI+Connexion, present some real-world use cases and deployment options such as Kubernetes.
Developer Journey at Zalando - Idea to Production with Containers in the Clou...Henning Jacobs
Talk held on R-ETAIL:CODE in London on 2018-03-15.
- The history of how DevOps evolved at Zalando: from on-premise data centers to autonomous teams, microservices and cluster management in the cloud
- How the developer experience looks like for the application lifecycle from idea to production and what our vision for the future is
- Challenges and learnings from our past experiences: why architecture principles and constraints are important to lead 200+ engineering teams
Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - C...Henning Jacobs
Bootstrapping a Kubernetes cluster is easy, rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we are presenting our approach to Kubernetes provisioning on AWS, operations and developer experience for our growing Zalando Technology department. We will highlight in the context of Kubernetes: AWS service integrations, our IAM/OAuth infrastructure, cluster autoscaling, continuous delivery and general developer experience. The talk will cover our most important learnings and we will openly share failure stories.
Talk given at Container Days HH (https://containerdays.io/) on 2017-06-20.
From AWS/STUPS to Kubernetes on AWS @Zalando - Berlin Kubernetes MeetupHenning Jacobs
This talk will highlight our challenges while migrating from our STUPS infrastructure (Docker on EC2, Cloud Formation) to Kubernetes on AWS.
Talk was held at Berlin Kubernetes Meetup on 2017-05-18: https://www.meetup.com/Berlin-Kubernetes-Meetup/events/239313998/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
2. 2
ZALANDO AT A GLANCE
~ 5.4billion EUR
revenue 2018
> 250
million
visits
per
month
> 15.000
employees in
Europe
> 79%
of visits via
mobile devices
> 26
million
active customers
> 300.000
product choices
~ 2.000
brands
17
countries
18. 18
REQUESTS: CPU SHARES
kubectl run --requests=cpu=10m/5m ..sha512()..
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod5d5..0d/cpu.shares
10 // relative share of CPU time
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod6e0..0d/cpu.shares
5 // relative share of CPU time
cat /sys/fs/cgroup/cpuacct/kubepods/burstable/pod5d5..0d/cpuacct.usage
/sys/fs/cgroup/cpuacct/kubepods/burstable/pod6e0..0d/cpuacct.usage
13432815283 // total CPU time in nanoseconds
7528759332 // total CPU time in nanoseconds
19. 19
LIMITS: COMPRESSIBLE RESOURCES
Can be taken away quickly,
"only" cause slowness
CPU Throttling
200m CPU limit
⇒ container can use 0.2s of CPU time per second
22. 22
MEMORY LIMITS: OUT OF MEMORY
kubectl get pod
NAME READY STATUS RESTARTS AGE
kube-ops-view-7bc-tcwkt 0/1 CrashLoopBackOff 3 2m
kubectl describe pod kube-ops-view-7bc-tcwkt
...
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
23. 23
QUALITY OF SERVICE (QOS)
Guaranteed: all containers have limits == requests
Burstable: some containers have limits > requests
BestEffort: no requests/limits set
kubectl describe pod …
Limits:
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
QoS Class: Burstable
24. 24
OVERCOMMIT
Limits > Requests ⇒ Burstable QoS ⇒ Overcommit
For CPU: fine, running into completely fair scheduling
For memory: fine, as long as demand < node capacity
https://code.fb.com/production-engineering/oomd/
Might run into unpredictable OOM
situations when demand reaches node's
memory capacity (Kernel OOM Killer)
25. 25
LIMITS: CGROUPS
docker run --cpus 1 -m 200m --rm -it busybox
cat /sys/fs/cgroup/cpu/docker/8ab25..1c/cpu.{shares,cfs_*}
1024 // cpu.shares (default value)
100000 // cpu.cfs_period_us (100ms period length)
100000 // cpu.cfs_quota_us (total CPU time in µs consumable per period)
cat /sys/fs/cgroup/memory/docker/8ab25..1c/memory.limit_in_bytes
209715200
32. 32
OVERLY AGGRESSIVE CFS: EXPERIMENT #1
CPU Period: 100ms
CPU Quota: None
Burn 5ms and sleep 100ms
⇒ Quota disabled
⇒ No Throttling expected!
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
33. 33
EXPERIMENT #1: NO QUOTA, NO THROTTLING
2018/11/03 13:04:02 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 6ms
2018/11/03 13:04:03 [1] burn took 5ms, real time so far: 510ms, cpu time so far: 11ms
2018/11/03 13:04:03 [2] burn took 5ms, real time so far: 1015ms, cpu time so far: 17ms
2018/11/03 13:04:04 [3] burn took 5ms, real time so far: 1520ms, cpu time so far: 23ms
2018/11/03 13:04:04 [4] burn took 5ms, real time so far: 2025ms, cpu time so far: 29ms
2018/11/03 13:04:05 [5] burn took 5ms, real time so far: 2530ms, cpu time so far: 35ms
2018/11/03 13:04:05 [6] burn took 5ms, real time so far: 3036ms, cpu time so far: 40ms
2018/11/03 13:04:06 [7] burn took 5ms, real time so far: 3541ms, cpu time so far: 46ms
2018/11/03 13:04:06 [8] burn took 5ms, real time so far: 4046ms, cpu time so far: 52ms
2018/11/03 13:04:07 [9] burn took 5ms, real time so far: 4551ms, cpu time so far: 58ms
34. 34
OVERLY AGGRESSIVE CFS: EXPERIMENT #2
CPU Period: 100ms
CPU Quota: 20ms
Burn 5ms and sleep 500ms
⇒ No 100ms intervals where possibly 20ms is burned
⇒ No Throttling expected!
35. 35
EXPERIMENT #2: OVERLY AGGRESSIVE CFS
2018/11/03 13:05:05 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 5ms
2018/11/03 13:05:06 [1] burn took 99ms, real time so far: 690ms, cpu time so far: 9ms
2018/11/03 13:05:06 [2] burn took 99ms, real time so far: 1290ms, cpu time so far: 14ms
2018/11/03 13:05:07 [3] burn took 99ms, real time so far: 1890ms, cpu time so far: 18ms
2018/11/03 13:05:07 [4] burn took 5ms, real time so far: 2395ms, cpu time so far: 24ms
2018/11/03 13:05:08 [5] burn took 94ms, real time so far: 2990ms, cpu time so far: 27ms
2018/11/03 13:05:09 [6] burn took 99ms, real time so far: 3590ms, cpu time so far: 32ms
2018/11/03 13:05:09 [7] burn took 5ms, real time so far: 4095ms, cpu time so far: 37ms
2018/11/03 13:05:10 [8] burn took 5ms, real time so far: 4600ms, cpu time so far: 43ms
2018/11/03 13:05:10 [9] burn took 5ms, real time so far: 5105ms, cpu time so far: 49ms
36. 36
OVERLY AGGRESSIVE CFS: EXPERIMENT #3
CPU Period: 10ms
CPU Quota: 2ms
Burn 5ms and sleep 100ms
⇒ Same 20% CPU (200m) limit, but smaller period
⇒ Throttling expected!
37. 37
SMALLER CPU PERIOD ⇒ BETTER LATENCY
2018/11/03 16:31:07 [0] burn took 18ms, real time so far: 18ms, cpu time so far: 6ms
2018/11/03 16:31:07 [1] burn took 9ms, real time so far: 128ms, cpu time so far: 8ms
2018/11/03 16:31:07 [2] burn took 9ms, real time so far: 238ms, cpu time so far: 13ms
2018/11/03 16:31:07 [3] burn took 5ms, real time so far: 343ms, cpu time so far: 18ms
2018/11/03 16:31:07 [4] burn took 30ms, real time so far: 488ms, cpu time so far: 24ms
2018/11/03 16:31:07 [5] burn took 19ms, real time so far: 608ms, cpu time so far: 29ms
2018/11/03 16:31:07 [6] burn took 9ms, real time so far: 718ms, cpu time so far: 34ms
2018/11/03 16:31:08 [7] burn took 5ms, real time so far: 824ms, cpu time so far: 40ms
2018/11/03 16:31:08 [8] burn took 5ms, real time so far: 943ms, cpu time so far: 45ms
2018/11/03 16:31:08 [9] burn took 9ms, real time so far: 1068ms, cpu time so far: 48ms
43. 43
CLUSTER AUTOSCALER
Simulates the Kubernetes scheduler internally to find out..
• ..if any of the pods wouldn’t fit on existing nodes
⇒ upscale is needed
• ..if it’s possible to fit some of the pods on existing nodes
⇒ downscale is needed
⇒ Cluster size is determined by resource requests
(+ constraints)
github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
44. 44
AUTOSCALING BUFFER
• Cluster Autoscaler only triggers on Pending Pods
• Node provisioning is slow
⇒ Reserve extra capacity via low priority Pods
"Autoscaling Buffer Pods"
59. 59
VERTICAL POD AUTOSCALER (VPA)
"Some 2/3 of the (Google) Borg
users use autopilot."
- Tim Hockin
VPA: Set resource requests
automatically based on usage.
67. Example: Getting started
with Zalenium & UI Tests
Example: Step by step guide to the first UI test with Zalenium running in the
Continuous Delivery Platform. I was always afraid of UI tests because it looked too
difficult to get started, Zalenium solved this problem for me.
69. 69
KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
70. 70
JANITOR TTL ANNOTATION
# let's try out nginx, but only for 1 hour
kubectl run nginx --image=nginx
kubectl annotate deploy nginx janitor/ttl=1h
github.com/hjacobs/kube-janitor
73. 73
SPOT ASG / LAUNCH TEMPLATE
Not upstream in cluster-autoscaler (yet)
74. 74
CLUSTER OVERHEAD: CONTROL PLANE
● GKE cluster: free
● EKS cluster: $146/month
● Zalando prod cluster: $635/month
(etcd nodes + master nodes + ELB)
Potential: fewer etcd nodes, no HA, shared control plane.
75. 75
WHAT WORKED FOR US
● Disable CPU CFS Quota in all clusters
● Prevent memory overcommit
● Kubernetes Resource Report
● Downscaling during off-hours
● EC2 Spot