Talk given at JAX DevOps London on 2019-05-15
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
Kubernetes is a container orchestration platform that provides a mechanism to manage the resources of containers in the cluster. That mechanism is known as "Requests and Limits".
Requests and limits play a key role not only in resource management but also in applications stability, capacity planning, scheduling the resources (i.e., on which node the pod will be running).
In this session we will cover:
- A quick review of Containers, Docker, and Kubernetes.
- Containers resource management in Kubernetes.
- Containers resource types in Kubernetes.
- 3 different ways to set requests and limits.
- The difference between capacity and allocatable resources.
- Tips and recap.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
Introduction to the Incremental Cooperative Protocol of KafkaGuozhang Wang
Anyone who has used Kafka consumer groups or operated a Kafka Streams application is likely familiar with the rebalancing protocol, which is used to (re)distribute partitions among the consumers of a group whenever there is a change in membership or in the topics subscribed to. The current protocol takes the safest possible approach of pausing all work and revoking ownership of all partitions so that a new assignment can be made. This “stop-the-world” approach can be frustrating especially when the mapping of partitions to the consumer that owns them barely changes. In KIP-429 we introduce incremental cooperative rebalancing for the consumer client, a new rebalancing protocol that allows consumers to retain ownership and continue fetching for their owned partitions while a rebalance is in progress. This proposal trades extra rebalances for the ability to revoke only those partitions which are to be migrated to another consumer for overall workload balance.
If you're running your container workloads on AWS EKS orchestration platform and you are trying to dynamically provision workload resources based on the current load, you might find yourself in a position where limitations and rules of node group scaling might feel a bit too rigid. This talk will focus on an interesting node lifecycle management solution from AWSlabs called Karpenter, which is an alternative approach to probably the most frequently used Cluster Autoscaler. Is this a better and more efficient way of allocating worker node resources? Would that get you around some of the node group constraints? The project has reached GA stage and still has some interesting challenges to solve on their roadmap. We will look into what the current release has to offer and how it is dealing with this challenge of improving efficient dynamic workload provisioning.
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentHostedbyConfluent
Joins in Kafka Streams and ksqlDB are a killer-feature for data processing and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data and it is important to understand how they impact the join result.
In this talk we want to deep dive on the different types of joins, with a focus of their temporal aspect. Furthermore, we relate the individual join operators to the overall ""time engine"" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge on temporal join semantics, we provide best practices, tip and tricks to ""bend"" time, and configuration advice to get the desired join results. Last, we give an overview of recent, and an outlook to future, development that improves joins even further.
Kubernetes is a container orchestration platform that provides a mechanism to manage the resources of containers in the cluster. That mechanism is known as "Requests and Limits".
Requests and limits play a key role not only in resource management but also in applications stability, capacity planning, scheduling the resources (i.e., on which node the pod will be running).
In this session we will cover:
- A quick review of Containers, Docker, and Kubernetes.
- Containers resource management in Kubernetes.
- Containers resource types in Kubernetes.
- 3 different ways to set requests and limits.
- The difference between capacity and allocatable resources.
- Tips and recap.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
Introduction to the Incremental Cooperative Protocol of KafkaGuozhang Wang
Anyone who has used Kafka consumer groups or operated a Kafka Streams application is likely familiar with the rebalancing protocol, which is used to (re)distribute partitions among the consumers of a group whenever there is a change in membership or in the topics subscribed to. The current protocol takes the safest possible approach of pausing all work and revoking ownership of all partitions so that a new assignment can be made. This “stop-the-world” approach can be frustrating especially when the mapping of partitions to the consumer that owns them barely changes. In KIP-429 we introduce incremental cooperative rebalancing for the consumer client, a new rebalancing protocol that allows consumers to retain ownership and continue fetching for their owned partitions while a rebalance is in progress. This proposal trades extra rebalances for the ability to revoke only those partitions which are to be migrated to another consumer for overall workload balance.
If you're running your container workloads on AWS EKS orchestration platform and you are trying to dynamically provision workload resources based on the current load, you might find yourself in a position where limitations and rules of node group scaling might feel a bit too rigid. This talk will focus on an interesting node lifecycle management solution from AWSlabs called Karpenter, which is an alternative approach to probably the most frequently used Cluster Autoscaler. Is this a better and more efficient way of allocating worker node resources? Would that get you around some of the node group constraints? The project has reached GA stage and still has some interesting challenges to solve on their roadmap. We will look into what the current release has to offer and how it is dealing with this challenge of improving efficient dynamic workload provisioning.
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentHostedbyConfluent
Joins in Kafka Streams and ksqlDB are a killer-feature for data processing and basic join semantics are well understood. However, in a streaming world records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data and it is important to understand how they impact the join result.
In this talk we want to deep dive on the different types of joins, with a focus of their temporal aspect. Furthermore, we relate the individual join operators to the overall ""time engine"" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge on temporal join semantics, we provide best practices, tip and tricks to ""bend"" time, and configuration advice to get the desired join results. Last, we give an overview of recent, and an outlook to future, development that improves joins even further.
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxRomanKhavronenko
VictoriaMetrics and Grafana Mimir are time series databases with support of mostly the same protocols and APIs. However, they have different architectures and components, which makes the comparison more complicated. In the talk, we'll go through the details of the benchmark where I compared both solutions. We'll see how VictoriaMetrics and Mimir are dealing with identical workloads and how efficient they’re with using the allocated resources.
The talk will cover design and architectural details, weak and strong points, trade-offs, and maintenance complexity of both solutions.
A look at some of the ways available to deploy Postgres in a Kubernetes cloud environment, either in small scale using simple configurations, or in larger scale using tools such as Helm charts and the Crunchy PostgreSQL Operator. A short introduction to Kubernetes will be given to explain the concepts involved, followed by examples from each deployment method and observations on the key differences.
Josh Berkus
You've heard that PostgreSQL is the highest-performance transactional open source database, but you're not seeing it on YOUR server. In fact, your PostgreSQL application is kind of poky. What should you do? While doing advanced performance engineering for really high-end systems takes years to learn, you can learn the basics to solve performance issues for 80% of PostgreSQL installations in less than an hour. In this session, you will learn: -- The parts of database application performance -- The performance setup procedure -- Basic troubleshooting tools -- The 13 postgresql.conf settings you need to know -- Where to look for more information.
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
Together with my colleagues at Red Hat Storage Team, i am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
In an environment where cloud-scaling applications is becoming more and more important, client-server architectures paradigms, as shown by memcached, are back with vengeance. In this talk, Galder will talk about Hot Rod, Infinispan's new client/server binary protocol, explaining the key differences compared to memcached's binary protocol, such as the possibility of receiving cluster topology changes. Audience of this talk will learn of the importance of Hot Rod in 'cloud-scale' application server clustering, where stateless application server instances could use Infinispan Hot Rod clients to retrieve state from an elastic farm of Infinispan Hot Rod servers, improving capabilities to run application server instances as a PaaS. The talk will finish with a brief demo of a cluster of Infinispan Hot Rod servers running on EC2 being accessed from a non-Java client. The audience is expected to have an intermediate understanding of client-server software architectures and cloud deployments.
GOTO Berlin - Battle of the Circuit Breakers: Resilience4J vs IstioNicolas Fränkel
Kubernetes in general, and Istio in particular, have changed a lot the way we look at Ops-related constraints: monitoring, load-balancing, health checks, etc. Before those products became available, there were already available solutions to handle those constraints.
Among them is Resilience4J, a Java library. From the site: "Resilience4j is a fault tolerance library designed for Java8 and functional programming." In particular, Resilience4J provides an implementation of the Circuit Breaker pattern, which prevents a network or service failure from cascading to other services. But now Istio also provides the same capability.
In this talk, we will have a look at how Istio and Resilience4J implement the Circuit Breaker pattern, and what pros/cons each of them has.
After this talk, you’ll be able to decide which one is the best fit in your context.
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
어떻게 하면 배포 프로세스를 빠르게 개선할 수 있을까요?
git branch를 푸시하고 개별 테스트 서버를 만드려면 어떻게 해야 할까요?
쿠버네티스와 GitOps, Argo CD를 이용한 배포 방법을 소개 합니다.
Open Infrastructure & Cloud Native Days Korea 2019 발표자료
원본 슬라이드 다운로드 - http://bit.ly/subicura-gitops
Cloud-Native PostgreSQL is a Kubernetes Operator for Postgres written by EDB entirely from scratch in the Go language and relying exclusively on the Kubernetes API.
This webinar covered:
- About DevOps & Cloud Native
- Overview of Cloud Native Postgres
- Storage for Postgres workloads in Kubernetes
- Start Using Cloud-Native Postgres
- Demo
In the session, we will go through the introduction to Karpenter, emphasizing on the importance of autoscaling in kubernetes. In addition, we will walk through a short demo that depicts the working of the Karpenter.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxRomanKhavronenko
VictoriaMetrics and Grafana Mimir are time series databases with support of mostly the same protocols and APIs. However, they have different architectures and components, which makes the comparison more complicated. In the talk, we'll go through the details of the benchmark where I compared both solutions. We'll see how VictoriaMetrics and Mimir are dealing with identical workloads and how efficient they’re with using the allocated resources.
The talk will cover design and architectural details, weak and strong points, trade-offs, and maintenance complexity of both solutions.
A look at some of the ways available to deploy Postgres in a Kubernetes cloud environment, either in small scale using simple configurations, or in larger scale using tools such as Helm charts and the Crunchy PostgreSQL Operator. A short introduction to Kubernetes will be given to explain the concepts involved, followed by examples from each deployment method and observations on the key differences.
Josh Berkus
You've heard that PostgreSQL is the highest-performance transactional open source database, but you're not seeing it on YOUR server. In fact, your PostgreSQL application is kind of poky. What should you do? While doing advanced performance engineering for really high-end systems takes years to learn, you can learn the basics to solve performance issues for 80% of PostgreSQL installations in less than an hour. In this session, you will learn: -- The parts of database application performance -- The performance setup procedure -- Basic troubleshooting tools -- The 13 postgresql.conf settings you need to know -- Where to look for more information.
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
Together with my colleagues at Red Hat Storage Team, i am very proud to have worked on this reference architecture for Ceph Object Storage.
If you are building Ceph object storage at scale, this document is for you.
In an environment where cloud-scaling applications is becoming more and more important, client-server architectures paradigms, as shown by memcached, are back with vengeance. In this talk, Galder will talk about Hot Rod, Infinispan's new client/server binary protocol, explaining the key differences compared to memcached's binary protocol, such as the possibility of receiving cluster topology changes. Audience of this talk will learn of the importance of Hot Rod in 'cloud-scale' application server clustering, where stateless application server instances could use Infinispan Hot Rod clients to retrieve state from an elastic farm of Infinispan Hot Rod servers, improving capabilities to run application server instances as a PaaS. The talk will finish with a brief demo of a cluster of Infinispan Hot Rod servers running on EC2 being accessed from a non-Java client. The audience is expected to have an intermediate understanding of client-server software architectures and cloud deployments.
GOTO Berlin - Battle of the Circuit Breakers: Resilience4J vs IstioNicolas Fränkel
Kubernetes in general, and Istio in particular, have changed a lot the way we look at Ops-related constraints: monitoring, load-balancing, health checks, etc. Before those products became available, there were already available solutions to handle those constraints.
Among them is Resilience4J, a Java library. From the site: "Resilience4j is a fault tolerance library designed for Java8 and functional programming." In particular, Resilience4J provides an implementation of the Circuit Breaker pattern, which prevents a network or service failure from cascading to other services. But now Istio also provides the same capability.
In this talk, we will have a look at how Istio and Resilience4J implement the Circuit Breaker pattern, and what pros/cons each of them has.
After this talk, you’ll be able to decide which one is the best fit in your context.
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
어떻게 하면 배포 프로세스를 빠르게 개선할 수 있을까요?
git branch를 푸시하고 개별 테스트 서버를 만드려면 어떻게 해야 할까요?
쿠버네티스와 GitOps, Argo CD를 이용한 배포 방법을 소개 합니다.
Open Infrastructure & Cloud Native Days Korea 2019 발표자료
원본 슬라이드 다운로드 - http://bit.ly/subicura-gitops
Cloud-Native PostgreSQL is a Kubernetes Operator for Postgres written by EDB entirely from scratch in the Go language and relying exclusively on the Kubernetes API.
This webinar covered:
- About DevOps & Cloud Native
- Overview of Cloud Native Postgres
- Storage for Postgres workloads in Kubernetes
- Start Using Cloud-Native Postgres
- Demo
In the session, we will go through the introduction to Karpenter, emphasizing on the importance of autoscaling in kubernetes. In addition, we will walk through a short demo that depicts the working of the Karpenter.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
The Next Frontier in Open Source Java Compilers: Just-In-Time Compilation as a Service
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application.
JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances.
We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Devoxx France 2018 : Mes Applications en Production sur KubernetesMichaël Morello
Retour d'expérience sur la mise en production d'applications ( Java mais pas seulement ) sur Kubernetes à Devoxx France 2018
La vidéo avec la démo est disponible en ligne ici : https://www.youtube.com/watch?v=cqqLeS9mUyU
This is an updated version of my JITServer talk that I will present at Open Source Summit North America in May 2023
The Next Frontier in Open Source Java Compilers: Just-In-Time Compilation as a Service
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application.
JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances.
We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
The Open Street Map project provides invaluable data that keeps driving users toward the PostGIS and PostgreSQL stacks. Loading today’s full Planet data set takes a 120GB XML file and unrolls it into over a terabyte of database data. Crunchy’s benchmark labs have followed the expansion of that Planet data over the last six database releases, as the re-ignition of the CPU wars combined with parallel execution features landing in the database. We’ll take a look at that data evolution, which server configurations worked, and which metrics techniques still matter in the all SSD era.
Container Performance Analysis Brendan Gregg, NetflixDocker, Inc.
Containers pose interesting challenges for performance monitoring and analysis, requiring new analysis methodologies and tooling. Resource-oriented analysis, as is common with systems performance tools and GUIs, must now account for both hardware limits and soft limits, as implemented using resource controls including cgroups. The interaction between containers can also be examined, and noisy neighbors either identified of exonerated. Performance tooling can also need special usage or workarounds to function properly from within a container or on the host, to deal with different privilege levels and name spaces. At Netflix, we're using containers for some microservices, and care very much about analyzing and tuning our containers to be as fast and efficient as possible. This talk will show how to successfully analyze performance in a Docker container environment, and navigate differences encountered.
Are you a Java developer wondering what it means to have your application running in the cloud. This session will provide a peek into how the JVM is adapting to running in the cloud and what Java developers need to be aware to ensure they get the most of running in the cloud.
The session will pick an example spring application and tune it stage by stage at the end of which we have an application that is fully optimized and takes advantage of every aspect of the running in a cloud
JCON Online 2021, International Java Community Conference, 07.10.21, Moritz Kammerer (@Moritz Kammerer, Expert Software Engineer at QAware).
== Please download slides in case they are blurred! ===
In his talk we have had a look at how Microservices can be developed with Micronaut. In our slides you can find out if it kept its promise.
POLYTEDA LLC a provider of semiconductor design software and PV-services, announced the general availability of PowerDRC/LVS version 2.0.1. This release is dedicated to delivering further significant improvements for multi-CPU mode and some new LVS functionality. From now XOR operation supports multi-CPU mode to dramatically increase performance
For Java developers, the Just-In-Time (JIT) compiler is key to improved performance. However, in a container world, the performance gains are often negated due to CPU and memory consumption constraints. To help solve this issue, the Eclipse OpenJ9 JVM provides JITServer technology, which separates the JIT compiler from the application. JITServer allows the user to employ much smaller containers enabling a higher density of applications, resulting in cost savings for end-users and/or cloud providers. Because the CPU and memory surges due to JIT compilation are eliminated, the user has a much easier task of provisioning resources for his/her application. Additional advantages include: faster ramp-up time, better control over resources devoted to compilation, increased reliability (JIT compiler bugs no longer crash the application) and amortization of compilation costs across many application instances. We will dig into JITServer technology, showing the challenges of implementation, detailing its strengths and weaknesses and illustrating its performance characteristics. For the cloud audience we will show how it can be deployed in containers, demonstrate its advantages compared to a traditional JIT compilation technique and offer practical recommendations about when to use this technology.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Similar to Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency (20)
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
3. 3
ZALANDO AT A GLANCE
~ 5.4billion EUR
revenue 2018
> 250
million
visits
per
month
> 15.000
employees in
Europe
> 79%
of visits via
mobile devices
> 26
million
active customers
> 300.000
product choices
~ 2.000
brands
17
countries
19. 19
REQUESTS: CPU SHARES
kubectl run --requests=cpu=10m/5m ..sha512()..
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod5d5..0d/cpu.shares
10 // relative share of CPU time
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod6e0..0d/cpu.shares
5 // relative share of CPU time
cat /sys/fs/cgroup/cpuacct/kubepods/burstable/pod5d5..0d/cpuacct.usage
/sys/fs/cgroup/cpuacct/kubepods/burstable/pod6e0..0d/cpuacct.usage
13432815283 // total CPU time in nanoseconds
7528759332 // total CPU time in nanoseconds
20. 20
LIMITS: COMPRESSIBLE RESOURCES
Can be taken away quickly,
"only" cause slowness
CPU Throttling
200m CPU limit
⇒ container can use 0.2s of CPU time per second
23. 23
MEMORY LIMITS: OUT OF MEMORY
kubectl get pod
NAME READY STATUS RESTARTS AGE
kube-ops-view-7bc-tcwkt 0/1 CrashLoopBackOff 3 2m
kubectl describe pod kube-ops-view-7bc-tcwkt
...
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
24. 24
QUALITY OF SERVICE (QOS)
Guaranteed: all containers have limits == requests
Burstable: some containers have limits > requests
BestEffort: no requests/limits set
kubectl describe pod …
Limits:
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
QoS Class: Burstable
25. 25
OVERCOMMIT
Limits > Requests ⇒ Burstable QoS ⇒ Overcommit
For CPU: fine, running into completely fair scheduling
For memory: fine, as long as demand < node capacity
https://code.fb.com/production-engineering/oomd/
Might run into unpredictable OOM
situations when demand reaches node's
memory capacity (Kernel OOM Killer)
26. 26
LIMITS: CGROUPS
docker run --cpus 1 -m 200m --rm -it busybox
cat /sys/fs/cgroup/cpu/docker/8ab25..1c/cpu.{shares,cfs_*}
1024 // cpu.shares (default value)
100000 // cpu.cfs_period_us (100ms period length)
100000 // cpu.cfs_quota_us (total CPU time in µs consumable per period)
cat /sys/fs/cgroup/memory/docker/8ab25..1c/memory.limit_in_bytes
209715200
33. 33
OVERLY AGGRESSIVE CFS: EXPERIMENT #1
CPU Period: 100ms
CPU Quota: None
Burn 5ms and sleep 100ms
⇒ Quota disabled
⇒ No Throttling expected!
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
34. 34
EXPERIMENT #1: NO QUOTA, NO THROTTLING
2018/11/03 13:04:02 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 6ms
2018/11/03 13:04:03 [1] burn took 5ms, real time so far: 510ms, cpu time so far: 11ms
2018/11/03 13:04:03 [2] burn took 5ms, real time so far: 1015ms, cpu time so far: 17ms
2018/11/03 13:04:04 [3] burn took 5ms, real time so far: 1520ms, cpu time so far: 23ms
2018/11/03 13:04:04 [4] burn took 5ms, real time so far: 2025ms, cpu time so far: 29ms
2018/11/03 13:04:05 [5] burn took 5ms, real time so far: 2530ms, cpu time so far: 35ms
2018/11/03 13:04:05 [6] burn took 5ms, real time so far: 3036ms, cpu time so far: 40ms
2018/11/03 13:04:06 [7] burn took 5ms, real time so far: 3541ms, cpu time so far: 46ms
2018/11/03 13:04:06 [8] burn took 5ms, real time so far: 4046ms, cpu time so far: 52ms
2018/11/03 13:04:07 [9] burn took 5ms, real time so far: 4551ms, cpu time so far: 58ms
35. 35
OVERLY AGGRESSIVE CFS: EXPERIMENT #2
CPU Period: 100ms
CPU Quota: 20ms
Burn 5ms and sleep 500ms
⇒ No 100ms intervals where possibly 20ms is burned
⇒ No Throttling expected!
36. 36
EXPERIMENT #2: OVERLY AGGRESSIVE CFS
2018/11/03 13:05:05 [0] burn took 5ms, real time so far: 5ms, cpu time so far: 5ms
2018/11/03 13:05:06 [1] burn took 99ms, real time so far: 690ms, cpu time so far: 9ms
2018/11/03 13:05:06 [2] burn took 99ms, real time so far: 1290ms, cpu time so far: 14ms
2018/11/03 13:05:07 [3] burn took 99ms, real time so far: 1890ms, cpu time so far: 18ms
2018/11/03 13:05:07 [4] burn took 5ms, real time so far: 2395ms, cpu time so far: 24ms
2018/11/03 13:05:08 [5] burn took 94ms, real time so far: 2990ms, cpu time so far: 27ms
2018/11/03 13:05:09 [6] burn took 99ms, real time so far: 3590ms, cpu time so far: 32ms
2018/11/03 13:05:09 [7] burn took 5ms, real time so far: 4095ms, cpu time so far: 37ms
2018/11/03 13:05:10 [8] burn took 5ms, real time so far: 4600ms, cpu time so far: 43ms
2018/11/03 13:05:10 [9] burn took 5ms, real time so far: 5105ms, cpu time so far: 49ms
37. 37
OVERLY AGGRESSIVE CFS: EXPERIMENT #3
CPU Period: 10ms
CPU Quota: 2ms
Burn 5ms and sleep 100ms
⇒ Same 20% CPU (200m) limit, but smaller period
⇒ Throttling expected!
38. 38
SMALLER CPU PERIOD ⇒ BETTER LATENCY
2018/11/03 16:31:07 [0] burn took 18ms, real time so far: 18ms, cpu time so far: 6ms
2018/11/03 16:31:07 [1] burn took 9ms, real time so far: 128ms, cpu time so far: 8ms
2018/11/03 16:31:07 [2] burn took 9ms, real time so far: 238ms, cpu time so far: 13ms
2018/11/03 16:31:07 [3] burn took 5ms, real time so far: 343ms, cpu time so far: 18ms
2018/11/03 16:31:07 [4] burn took 30ms, real time so far: 488ms, cpu time so far: 24ms
2018/11/03 16:31:07 [5] burn took 19ms, real time so far: 608ms, cpu time so far: 29ms
2018/11/03 16:31:07 [6] burn took 9ms, real time so far: 718ms, cpu time so far: 34ms
2018/11/03 16:31:08 [7] burn took 5ms, real time so far: 824ms, cpu time so far: 40ms
2018/11/03 16:31:08 [8] burn took 5ms, real time so far: 943ms, cpu time so far: 45ms
2018/11/03 16:31:08 [9] burn took 9ms, real time so far: 1068ms, cpu time so far: 48ms
45. 45
CLUSTER AUTOSCALER
Simulates the Kubernetes scheduler internally to find out..
• ..if any of the pods wouldn’t fit on existing nodes
⇒ upscale is needed
• ..if it’s possible to fit some of the pods on existing nodes
⇒ downscale is needed
⇒ Cluster size is determined by resource requests
(+ constraints)
github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
46. 46
AUTOSCALING BUFFER
• Cluster Autoscaler only triggers on Pending Pods
• Node provisioning is slow
⇒ Reserve extra capacity via low priority Pods
"Autoscaling Buffer Pods"
62. 62
VERTICAL POD AUTOSCALER (VPA)
"Some 2/3 of the (Google) Borg
users use autopilot."
- Tim Hockin
VPA: Set resource requests
automatically based on usage.
71. Example: Getting started
with Zalenium & UI Tests
Example: Step by step guide to the first UI test with Zalenium running in the
Continuous Delivery Platform. I was always afraid of UI tests because it looked too
difficult to get started, Zalenium solved this problem for me.
73. 73
KUBERNETES JANITOR
● TTL and expiry date annotations, e.g.
○ set time-to-live for your test deployment
● Custom rules, e.g.
○ delete everything without "app" label after 7 days
github.com/hjacobs/kube-janitor
74. 74
JANITOR TTL ANNOTATION
# let's try out nginx, but only for 1 hour
kubectl run nginx --image=nginx
kubectl annotate deploy nginx janitor/ttl=1h
github.com/hjacobs/kube-janitor
77. 77
SPOT ASG / LAUNCH TEMPLATE
Not upstream in cluster-autoscaler (yet)
78. 78
CLUSTER OVERHEAD: CONTROL PLANE
● GKE cluster: free
● EKS cluster: $146/month
● Zalando prod cluster: $635/month
(etcd nodes + master nodes + ELB)
Potential: fewer etcd nodes, no HA, shared control plane.
79. 79
WHAT WORKED FOR US
● Disable CPU CFS Quota in all clusters
● Prevent memory overcommit
● Kubernetes Resource Report
● Downscaling during off-hours
● EC2 Spot