Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Come learn about the algorithms that power all modern container orchestration platforms, and walk away with actionable steps to keep your highly available services highly available.
Join Laura Frank and Stephen Day as they explain and examine technical concepts behind container orchestration systems, like distributed consensus, object models, and node topology. These concepts build the foundation of every modern orchestration system, and each technical explanation will be illustrated using Docker’s SwarmKit as a real-world example. Gain a deeper understanding of how orchestration systems like SwarmKit work in practice, and walk away with more insights into your production applications.
Scalable and Available Services with Docker and KubernetesLaura Frank Tacho
Breaking down your monolith into microservices — or starting fresh with a microservice architecture on a new project — is only half the battle. The next step is making your containerized microservice application fault tolerant and highly available, but how do you get there? How do you go from your local development machine to a production-grade cluster? See how container orchestration platforms like Kubernetes and Swarm can be essential assets for your production workloads, and learn how to replace bottlenecks and single points of failure with highly available, scalable microservice deployments.
"Performance Analysis Methodologies", USENIX/LISA12, San Diego, 2012
Performance analysis methodologies provide guidance, save time, and can find issues that are otherwise overlooked. Example issues include hardware bus saturation, lock contention, recoverable device errors, kernel scheduling issues, and unnecessary workloads. The talk will focus on the USE Method: a simple strategy for all staff for performing a complete check of system performance health, identifying common bottlenecks and errors. Other analysis methods discussed include workload characterization, drill-down analysis, and latency analysis, with example applications from enterprise and cloud computing. Don’t just reach for tools—use a method!
From USENIX LISA 2010, San Jose.
Visualizations that include heat maps can be an effective way to present performance data: I/O latency, resource utilization, and more. Patterns can emerge that would be difficult to notice from columns of numbers or line graphs, which are revealing previously unknown behavior. These visualizations are used in a product as a replacement for traditional metrics such as %CPU and are allowing end users to identify more issues much more easily (and some issues are becoming nearly impossible to identify with tools such as vmstat(1)). This talk covers what has been learned, crazy heat map discoveries, and thoughts for future applications beyond performance analysis.
Using SLOs for Continuous Performance Optimizations of Your k8s WorkloadsScyllaDB
Moving to k8s doesn’t prevent anyone from bad architectural decisions leading to performance degradations, scalability issues or violating your SLOs in production. In fact – building smaller services running in pods connected through service meshes are even more vulnerable to bad architectural or implementation choices.
To avoid any bad deployments, the CNCF project Keptn provides automated SLO-based Performance Analysis as part of your CD process. Keptn automatically detects architectural and deployment changes that have a negative impact to performance and scalability. It uses SLOs (Service Level Objectives) to ensure your services always meet your objectives. The Keptn team has also put out SLO best practices to identify well known performance patterns that have been identified over the years analyzing hundreds of distributed software architectures deployed on k8s.
Join this session and learn what these patterns are and how Keptn helps you prevent them from entering production
Join Laura Frank and Stephen Day as they explain and examine technical concepts behind container orchestration systems, like distributed consensus, object models, and node topology. These concepts build the foundation of every modern orchestration system, and each technical explanation will be illustrated using Docker’s SwarmKit as a real-world example. Gain a deeper understanding of how orchestration systems like SwarmKit work in practice, and walk away with more insights into your production applications.
Scalable and Available Services with Docker and KubernetesLaura Frank Tacho
Breaking down your monolith into microservices — or starting fresh with a microservice architecture on a new project — is only half the battle. The next step is making your containerized microservice application fault tolerant and highly available, but how do you get there? How do you go from your local development machine to a production-grade cluster? See how container orchestration platforms like Kubernetes and Swarm can be essential assets for your production workloads, and learn how to replace bottlenecks and single points of failure with highly available, scalable microservice deployments.
"Performance Analysis Methodologies", USENIX/LISA12, San Diego, 2012
Performance analysis methodologies provide guidance, save time, and can find issues that are otherwise overlooked. Example issues include hardware bus saturation, lock contention, recoverable device errors, kernel scheduling issues, and unnecessary workloads. The talk will focus on the USE Method: a simple strategy for all staff for performing a complete check of system performance health, identifying common bottlenecks and errors. Other analysis methods discussed include workload characterization, drill-down analysis, and latency analysis, with example applications from enterprise and cloud computing. Don’t just reach for tools—use a method!
From USENIX LISA 2010, San Jose.
Visualizations that include heat maps can be an effective way to present performance data: I/O latency, resource utilization, and more. Patterns can emerge that would be difficult to notice from columns of numbers or line graphs, which are revealing previously unknown behavior. These visualizations are used in a product as a replacement for traditional metrics such as %CPU and are allowing end users to identify more issues much more easily (and some issues are becoming nearly impossible to identify with tools such as vmstat(1)). This talk covers what has been learned, crazy heat map discoveries, and thoughts for future applications beyond performance analysis.
Using SLOs for Continuous Performance Optimizations of Your k8s WorkloadsScyllaDB
Moving to k8s doesn’t prevent anyone from bad architectural decisions leading to performance degradations, scalability issues or violating your SLOs in production. In fact – building smaller services running in pods connected through service meshes are even more vulnerable to bad architectural or implementation choices.
To avoid any bad deployments, the CNCF project Keptn provides automated SLO-based Performance Analysis as part of your CD process. Keptn automatically detects architectural and deployment changes that have a negative impact to performance and scalability. It uses SLOs (Service Level Objectives) to ensure your services always meet your objectives. The Keptn team has also put out SLO best practices to identify well known performance patterns that have been identified over the years analyzing hundreds of distributed software architectures deployed on k8s.
Join this session and learn what these patterns are and how Keptn helps you prevent them from entering production
"Containers wrap up software with all its dependencies in packages that can be executed anywhere. This can be specially useful in HPC environments where, often, getting the right combination of software tools to build applications is a daunting task. However, typical container solutions such as Docker are not a perfect fit for HPC environments. Instead, Shifter is a better fit as it has been built from the ground up with HPC in mind. In this talk, we show you what Shifter is and how to leverage from the current Docker environment to run your applications with Shifter."
Watch the video presentation: http://wp.me/p3RLHQ-f81
See more talks in the Switzerland HPC Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Docker, Inc.
At Docker, we are striving to enable the extensibility of Docker via "Plugins" and make them available for developers and enterprises alike. Come attend this talk to understand what it takes to build, ship, store and run plugins. We will deep dive into plugin lifecycle management on a single engine and across a swarm cluster. We will also demonstrate how you can integrate plugins from other enterprises or developers into your ecosystem. There will be fun demos accompanying this talk! This will be session will be beneficial to you if you: 1) Are an ops team member trying to integrate Docker with your favorite storage or network vendor 2) Are Interested in extending or customizing Docker; or 3) Want to become a Docker partner, and want to make the technology integration seamless.
SREcon 2016 Performance Checklists for SREsBrendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux PerformanceTools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...DigitalOcean
Watch this Tech Talk: https://do.co/video_snormore
An engineering-led talk that covers the challenges DigitalOcean encountered with a Droplet-based architecture and why we pivoted to using Kubernetes. Steven Normore, Engineering Manager at DigitalOcean, shares the benefits that a Kubernetes-based architecture provides to customers, and shares guidance on what to keep in mind as you build your business on DigitalOcean.
About the Presenter
Steven Normore is an Engineering Manager at DigitalOcean. He builds and operates systems for production web applications. He has an education in computer science and mathematics with a focus on combinatorics and distributed systems.
New to DigitalOcean? Get US $100 in credit when you sign up: https://do.co/deploytoday
To learn more about DigitalOcean: https://www.digitalocean.com/
Follow us on Twitter: https://twitter.com/digitalocean
Like us on Facebook: https://www.facebook.com/DigitalOcean
Follow us on Instagram: https://www.instagram.com/thedigitalocean/
We're hiring: http://do.co/careers
Monitorama 2015 talk by Brendan Gregg, Netflix. With our large and ever-changing cloud environment, it can be vital to debug instance-level performance quickly. There are many instance monitoring solutions, but few come close to meeting our requirements, so we've been building our own and open sourcing them. In this talk, I will discuss our real-world requirements for instance-level analysis and monitoring: not just the metrics and features we desire, but the methodologies we'd like to apply. I will also cover the new and novel solutions we have been developing ourselves to meet these needs and desires, which include use of advanced Linux performance technologies (eg, ftrace, perf_events), and on-demand self-service analysis (Vector).
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
We open-sourced LinuxKit in April 2017 at DockerCon in Austin. In this session, we'll take a detailed look at some advanced topics of LinuxKit ranging from the general read-only filesystem setup, multi-arch image support for x86_64 and arm64, custom network configuration, and kernel debugging and testing.
Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling this, regularly tested and exercised by the simian army. But, when this environment automatically works around systemic issues that aren’t root-caused, they can grow over time. This talk describes the challenge of not just handling failures of scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root cause in an ever changing environment.
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
Docker Tips And Tricks at the Docker Beijing MeetupJérôme Petazzoni
This talk was presented in October at the Docker Beijing Meetup, in the VMware offices.
It presents some of the latest features of Docker, discusses orchestration possibilities with Docker, then gives a briefing about the performance of containers; and finally shows how to use volumes to decouple components in your applications.
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
Ftrace’s most powerful feature is the function tracer (and function graph tracer which is built from it). But to have this enabled on production systems, it had to have its overhead be negligible when disabled. As the function tracer uses gcc’s profiling mechanism, which adds a call to “mcount” (or more recently fentry, don’t worry if you don’t know what this is, it will all be explained) at the start of almost all functions, it had to do something about the overhead that causes. The solution was to turn those calls into “nops” (an instruction that the CPU simply ignores). But this was no easy feat. It took a lot to come up with a solution (and also turning a few network cards into bricks). This talk will explain the history of how ftrace came about implementing the function tracer, and brought with it the possibility of static branches and soon static calls!
Steven Rostedt
Using eBPF to Measure the k8s Cluster HealthScyllaDB
As a k8s cluster-admin your app teams have a certain expectation of your cluster to be available to deploy services at any time without problems. While there is no shortage on metrics in k8s its important to have the right metrics to alert on issues and giving you enough data to react to potential availability issues. Prometheus has become a standard and sheds light on the inner behaviour of Kubernetes clusters and workloads. Lots of KPIs (CPU, IO, network. Etc) in our On-Premise environment are less precise when we start to work in a Cloud environment. Ebpf is the perfect technology that fulfills that requirement as it gives us information down to the kernel level. In 2018 Cloudflare shared an opensource project to expose custom ebpf metrics in Prometheus. Join this session and learn about: • What is ebpf? • What type of metrics we can collect? • How to expose those metrics in a K8s environment. This session will try to deliver a step-by-step guide on how to take advantage of the ebpf exporter.
Scalability and optimization are constant
concerns for the developer and operations
manager. The Performance Zone focuses on
all things performance, covering everything
from database optimization to garbage
collection, tool and technique comparisons,
and tweaks to keep your code as effcient
as possible.
Management and Automation of MongoDB Clusters - SlidesSeveralnines
Use MongoDB at Any Scale
As you scale, one of the challenges is optimizing your clusters and mitigating operational risk. Proper preparation can result in significant savings and reduced downtime.
This session covers:
* Deployment of dev/test/production environments across private data centers or public clouds
* What to monitor in production environments
* Management automation with ClusterControl from Severalnines
* How ClusterControl works with TokuMX
The session will give you the tools to more effectively manage your cluster, immediately. The presentation will include code samples and a live Q&A session.
This webinar is being delivered jointly by Severalnines & Tokutek. Severalnines provides automation and management tools to reduce the complexity of working with highly available database clusters. Tokutek provides high-performance and scalability for MongoDB, MySQL and MariaDB.
"Containers wrap up software with all its dependencies in packages that can be executed anywhere. This can be specially useful in HPC environments where, often, getting the right combination of software tools to build applications is a daunting task. However, typical container solutions such as Docker are not a perfect fit for HPC environments. Instead, Shifter is a better fit as it has been built from the ground up with HPC in mind. In this talk, we show you what Shifter is and how to leverage from the current Docker environment to run your applications with Shifter."
Watch the video presentation: http://wp.me/p3RLHQ-f81
See more talks in the Switzerland HPC Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Docker, Inc.
At Docker, we are striving to enable the extensibility of Docker via "Plugins" and make them available for developers and enterprises alike. Come attend this talk to understand what it takes to build, ship, store and run plugins. We will deep dive into plugin lifecycle management on a single engine and across a swarm cluster. We will also demonstrate how you can integrate plugins from other enterprises or developers into your ecosystem. There will be fun demos accompanying this talk! This will be session will be beneficial to you if you: 1) Are an ops team member trying to integrate Docker with your favorite storage or network vendor 2) Are Interested in extending or customizing Docker; or 3) Want to become a Docker partner, and want to make the technology integration seamless.
SREcon 2016 Performance Checklists for SREsBrendan Gregg
Talk from SREcon2016 by Brendan Gregg. Video: https://www.usenix.org/conference/srecon16/program/presentation/gregg . "There's limited time for performance analysis in the emergency room. When there is a performance-related site outage, the SRE team must analyze and solve complex performance issues as quickly as possible, and under pressure. Many performance tools and techniques are designed for a different environment: an engineer analyzing their system over the course of hours or days, and given time to try dozens of tools: profilers, tracers, monitoring tools, benchmarks, as well as different tunings and configurations. But when Netflix is down, minutes matter, and there's little time for such traditional systems analysis. As with aviation emergencies, short checklists and quick procedures can be applied by the on-call SRE staff to help solve performance issues as quickly as possible.
In this talk, I'll cover a checklist for Linux performance analysis in 60 seconds, as well as other methodology-derived checklists and procedures for cloud computing, with examples of performance issues for context. Whether you are solving crises in the SRE war room, or just have limited time for performance engineering, these checklists and approaches should help you find some quick performance wins. Safe flying."
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux PerformanceTools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...DigitalOcean
Watch this Tech Talk: https://do.co/video_snormore
An engineering-led talk that covers the challenges DigitalOcean encountered with a Droplet-based architecture and why we pivoted to using Kubernetes. Steven Normore, Engineering Manager at DigitalOcean, shares the benefits that a Kubernetes-based architecture provides to customers, and shares guidance on what to keep in mind as you build your business on DigitalOcean.
About the Presenter
Steven Normore is an Engineering Manager at DigitalOcean. He builds and operates systems for production web applications. He has an education in computer science and mathematics with a focus on combinatorics and distributed systems.
New to DigitalOcean? Get US $100 in credit when you sign up: https://do.co/deploytoday
To learn more about DigitalOcean: https://www.digitalocean.com/
Follow us on Twitter: https://twitter.com/digitalocean
Like us on Facebook: https://www.facebook.com/DigitalOcean
Follow us on Instagram: https://www.instagram.com/thedigitalocean/
We're hiring: http://do.co/careers
Monitorama 2015 talk by Brendan Gregg, Netflix. With our large and ever-changing cloud environment, it can be vital to debug instance-level performance quickly. There are many instance monitoring solutions, but few come close to meeting our requirements, so we've been building our own and open sourcing them. In this talk, I will discuss our real-world requirements for instance-level analysis and monitoring: not just the metrics and features we desire, but the methodologies we'd like to apply. I will also cover the new and novel solutions we have been developing ourselves to meet these needs and desires, which include use of advanced Linux performance technologies (eg, ftrace, perf_events), and on-demand self-service analysis (Vector).
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
We open-sourced LinuxKit in April 2017 at DockerCon in Austin. In this session, we'll take a detailed look at some advanced topics of LinuxKit ranging from the general read-only filesystem setup, multi-arch image support for x86_64 and arm64, custom network configuration, and kernel debugging and testing.
Surge 2014: From Clouds to Roots: root cause performance analysis at Netflix. Brendan Gregg.
At Netflix, high scale and fast deployment rule. The possibilities for failure are endless, and the environment excels at handling this, regularly tested and exercised by the simian army. But, when this environment automatically works around systemic issues that aren’t root-caused, they can grow over time. This talk describes the challenge of not just handling failures of scale on the Netflix cloud, but also new approaches and tools for quickly diagnosing their root cause in an ever changing environment.
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
Docker Tips And Tricks at the Docker Beijing MeetupJérôme Petazzoni
This talk was presented in October at the Docker Beijing Meetup, in the VMware offices.
It presents some of the latest features of Docker, discusses orchestration possibilities with Docker, then gives a briefing about the performance of containers; and finally shows how to use volumes to decouple components in your applications.
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
Ftrace’s most powerful feature is the function tracer (and function graph tracer which is built from it). But to have this enabled on production systems, it had to have its overhead be negligible when disabled. As the function tracer uses gcc’s profiling mechanism, which adds a call to “mcount” (or more recently fentry, don’t worry if you don’t know what this is, it will all be explained) at the start of almost all functions, it had to do something about the overhead that causes. The solution was to turn those calls into “nops” (an instruction that the CPU simply ignores). But this was no easy feat. It took a lot to come up with a solution (and also turning a few network cards into bricks). This talk will explain the history of how ftrace came about implementing the function tracer, and brought with it the possibility of static branches and soon static calls!
Steven Rostedt
Using eBPF to Measure the k8s Cluster HealthScyllaDB
As a k8s cluster-admin your app teams have a certain expectation of your cluster to be available to deploy services at any time without problems. While there is no shortage on metrics in k8s its important to have the right metrics to alert on issues and giving you enough data to react to potential availability issues. Prometheus has become a standard and sheds light on the inner behaviour of Kubernetes clusters and workloads. Lots of KPIs (CPU, IO, network. Etc) in our On-Premise environment are less precise when we start to work in a Cloud environment. Ebpf is the perfect technology that fulfills that requirement as it gives us information down to the kernel level. In 2018 Cloudflare shared an opensource project to expose custom ebpf metrics in Prometheus. Join this session and learn about: • What is ebpf? • What type of metrics we can collect? • How to expose those metrics in a K8s environment. This session will try to deliver a step-by-step guide on how to take advantage of the ebpf exporter.
Scalability and optimization are constant
concerns for the developer and operations
manager. The Performance Zone focuses on
all things performance, covering everything
from database optimization to garbage
collection, tool and technique comparisons,
and tweaks to keep your code as effcient
as possible.
Management and Automation of MongoDB Clusters - SlidesSeveralnines
Use MongoDB at Any Scale
As you scale, one of the challenges is optimizing your clusters and mitigating operational risk. Proper preparation can result in significant savings and reduced downtime.
This session covers:
* Deployment of dev/test/production environments across private data centers or public clouds
* What to monitor in production environments
* Management automation with ClusterControl from Severalnines
* How ClusterControl works with TokuMX
The session will give you the tools to more effectively manage your cluster, immediately. The presentation will include code samples and a live Q&A session.
This webinar is being delivered jointly by Severalnines & Tokutek. Severalnines provides automation and management tools to reduce the complexity of working with highly available database clusters. Tokutek provides high-performance and scalability for MongoDB, MySQL and MariaDB.
Architecting for Failure in a Containerized WorldTom Faulhaber
My presentation from QConSF 2016 outlining some design strategies for building robust applications in the docker and container ecosystem. Video coming soon, I hope.
Pragmatic Programmer
Estimating (提升专业素养)
DRY rule - Repeat or Reuse (提炼经验)
Control Structures & Complexity (简洁就是美)
Table Driven (善于运用算法)
Design for CHANGE (完善设计、争取主动)
Refactoring (追求卓越,勇于改进)
Automation (一切都要自动化)
Resource
Who is working for you (找个巨人的肩膀)
Accessories for you
Process and Methods
Process enhances confidence
Thought Disorder(最容易欺骗的人是自己)
Conceptual Blockbusting
Career anchors
How to define the VALUE (价值不等式)
Professionalism
Incident Management in the Age of DevOps and SRE Rundeck
Keynote presentation at DevOps Con Munich, December 3, 2019, presented by Damon Edwards, co-founder of Rundeck.
Responding to incidents has always been the core job of Operations. With the rise of DevOps and SRE, how Operations work gets done — and who is doing the work — is changing. This talk will look at how high-performing organizations are applying DevOps and SRE practices to shorten incidents and reduce escalations.
See a Demo of Rundeck Enterprise :
https://www.rundeck.com/see-demo
--or--
Download Rundeck Open Source here:
https://rundeck.com/open-source
Connect:
Stack Overflow community: https://stackoverflow.com/questions/tagged/rundeck
Github: https://github.com/rundeck/rundeck/issues
Twitter: https://twitter.com/Rundeck
Facebook: https://www.facebook.com/RundeckInc/
LinkedIn: www.linkedin.com › company › rundeck-inc
This presentation is focused on the architecture, scalability concerns, performance bottlenecks, operational characteristics and lessons learned while designing and implementing Yammer distributed real-time search system. Yammer is an enterprise social network SaaS offering with over 100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system we developed scales well up to 1B messages and serves a foundation of knowledge base analysis services Yammer is developing.
Real-time Search at Yammer - By Aleksandrovsky Borislucenerevolution
See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011
This talk will be focused on the architecture, scalability concerns, performance bottlenecks,
operational characteristics and lessons learned while designing and implementing Yammer
distributed real-time search system. Yammer is an enterprise social network SaaS offering with over
100,000 networks (including 85% of the Fortune 100) and nearly 2 million users. The search system
we developed scales well up to 1B messages and serves a foundation of knowledge base analysis
services Yammer is developing.
Presentation on the architecture, scalability concerns, performance bottlenecks, operational characteristics and lessons learned while designing and implementing Yammer distributed real-time search system.
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Brian Brazil
Prometheus is a next-generation monitoring system with a time series database at it's core. Once you have a time series database, what do you do with it though? This talk will look at getting data in, and more importantly how to use the data you collect productively.
Contact us at prometheus@robustperception.io
With projects springing up at a dizzying pace, it’s hard not to feel like you’re falling behind the curve, or that you’re somehow not doing cloud native development correctly. Breaking out of this “container shame spiral” can be tough. It starts with being better equipped to evaluate projects and trends, and make better decisions about when is the right time to start adopting a new project. Otherwise, you’re all bound to stay stuck in a spiral of reacting to every new announcement. Guiding you through the last few years of container fervor, Laura will help you understand the current landscape and emerging trends, and introduce you to a framework that will help you make sense of all the rapid innovation happening around you.
Wouldn't it be great for a new developer on your team to have their dev environment totally set up on their first day? What about having your CI tests running in the background while you work on new features? What about having the confidence that your dev environment mirrors testing and prod? Containers enable this to become reality, along with other great benefits like keeping dependencies nice and tidy and making packaged code easier to share. Come learn about the ways containers can help you build and ship software easily.
Interested in learning how to set up a Kubernetes cluster and use automation to test and deploy an app?
During this presentation, Laura Frank will take a deep dive into CI/CD best practices with Kubernetes and Amazon EKS. You will be introduced to AmazonEKS, Amazon’s Kubernetes service and CloudBees CodeShip, a flexible continuous integration (CI)/continuous delivery(CD) tool that runs your builds in the cloud. Designed with developers in mind, both EKS and CodeShip when used together reduce the complexity of running an app with Kubernetes.
Attend this webinar to learn:
- An overview of Amazon EKS
- How to set up your own CI/CD pipeline
- How to leverage CI/CD best practices with Kubernetes
Building Efficient Parallel Testing Platforms with DockerLaura Frank Tacho
We often use containers to maintain parity across development, testing, and production environments, but we can also use containerization to significantly reduce time needed for testing by spinning up multiple instances of fully isolated testing environments and executing tests in parallel. This strategy also helps you maximize the utilization of infrastructure resources. The enhanced toolset provided by Docker makes this process simple and unobtrusive, and you’ll see how Docker Engine, Registry, and Compose can work together to make your tests fast.
Fast and efficient software testing is easy with Docker. We often
use containers to maintain parity across development, testing, and production environments, but we can also use containerization to significantly reduce time needed for testing by spinning up multiple instances of fully isolated testing environments and executing tests in parallel. This strategy also helps you maximize the utilization of infrastructure resources. The enhanced toolset provided by Docker makes this process simple and unobtrusive, and you’ll see how Docker Engine, Registry, and Compose can work together to make your tests fast.
DockerCon EU 2015 in Barcelona
Practical tips for using Docker to run tests during development, CI/CD, and different strategies for speeding up the run of your test suite by using parallel pipelines in containers.
The jet tool used to demonstrate parallel testing is available here: https://codeship.com/documentation/docker/installation/
We often optimize our software for performance, but what also optimizing our development teams for happiness? Take a look at how the tools you choose for your development team can impact developer happiness, and learn how to keep your teams happier and more productive.
*The graph on slide 3 is fabricated data, because studies also show that people are more likely to believe statements accompanied by scientific data.*
A brief introduction to containerization, Docker, and getting started with your first containerized Rails application. Source code can be found at https://github.com/rheinwein/rails-demo-apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Essentials of Automations: Optimizing FME Workflows with Parameters
Everything You Thought You Already Knew About Orchestration
1. Everything You Thought
You Already Knew About
Orchestration
Laura Frank
Director of Engineering,
Codeship
2.
3. Managing Distributed State with Raft
Quorum 101
Leader Election
Log Replication
Service Scheduling
Failure Recovery
Agenda
bonus debugging tips!
4. They’re trying to get a collection of nodes to behave
like a single node.
• How does the system maintain state?
• How does work get scheduled?
The Big Problem(s)
What are tools like Swarm and Kubernetes trying to do?
7. Quorum
The minimum number of votes that a consensus
group needs in order to be allowed to perform
an operation.
Without quorum, your system can’t do work.
8. Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
9. Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
10. Having two managers instead of
one actually doubles your
chances of losing quorum.
11. Pay attention to datacenter topology when placing
managers.
Quorum With Multiple Regions
Manager Nodes Distribution across 3 Regions
3 1-1-1
5 1-2-2
7 3-2-2
9 3-3-3
magically works with
Docker for AWS
15. Orchestration systems typically use a key/value
store backed by a consensus algorithm
In a lot of cases, that algorithm is Raft!
Raft is used everywhere…
…that etcd is used
17. In most cases, you don’t want to run
work on your manager nodes
docker node update --availability drain <NODE>
Participating in a Raft consensus group is work, too.
Make your manager nodes unavailable for tasks:
*I will run work on managers for educational purposes
21. The log is the source of truth for
your application.
22. In the context of distributed computing (and this
talk), a log is an append-only, time-based record
of data.
2 10 30 25 5 12first entry append entry here!
This log is for computers, not humans.
23. 2 10 30 25 5 12
Server
12
Client
12
In simple systems, the log is pretty straightforward.
24. In a manager group, that log entry can only “become
truth” once it is confirmed from the majority of
followers (quorum!)
Client
12
Manager
follower
Manager
follower
Manager
leader
31. Scheduling constraints
Restrict services to specific nodes, such as specific
architectures, security levels, or types
docker service create
--constraint 'node.labels.type==web' my-app
32. New in 17.04.0-ce
Topology-aware scheduling!!1!
Implements a spread strategy over nodes that belong to a
certain category.
Unlike --constraint, this is a “soft” preference
—placement-pref ‘spread=node.labels.dc’
33.
34. Swarm will not rebalance
healthy tasks when a new
node comes online
35. Debugging Tip
Add a manager to your
Swarm running with
--availability drain
and in Engine debug mode
38. • Bring the downed nodes back online (derp)
Regain quorum
• On a healthy manager, run
docker swarm init --force-new-cluster
This will create a new cluster with one healthy manager
• You need to promote new managers
40. • Bring up a new manager and stop Docker
• sudo rm -rf /var/lib/docker/swarm
• Copy backup to /var/lib/docker/swarm
• Start Docker
• docker swarm init (--force-new-cluster)
Restore from a backup in 5 easy steps!
41. • In general, users shouldn’t be allowed to modify IP
addresses of nodes
• Restoring from a backup == old IP address for node1
• Workaround is to use elastic IPs with ability to reassign
But wait, there’s a bug… or a feature