The document summarizes a talk on container performance analysis. It discusses identifying bottlenecks at the host, container, and kernel level using various Linux performance tools. It also provides an overview of how containers work in Linux using namespaces and control groups (cgroups). Specifically, it demonstrates analyzing resource usage and limitations for containers using tools like docker stats, systemd-cgtop, and investigating namespaces.
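As a rough illustration of the kind of per-container resource check that tools like docker stats and systemd-cgtop perform, the sketch below computes memory utilization from the text of two cgroup files. It assumes the cgroup v2 file layout (`memory.current` and `memory.max`), which the summary does not specify; the function names and sample values are hypothetical.

```python
from typing import Optional


def parse_cgroup_limit(raw: str) -> Optional[int]:
    """Parse a cgroup v2 limit value; the literal 'max' means no limit."""
    value = raw.strip()
    return None if value == "max" else int(value)


def memory_utilization(current_raw: str, max_raw: str) -> Optional[float]:
    """Return usage as a fraction of the limit, or None when no limit is set."""
    limit = parse_cgroup_limit(max_raw)
    if limit is None or limit == 0:
        return None
    return int(current_raw.strip()) / limit


if __name__ == "__main__":
    # Sample file contents (hypothetical values, not taken from the talk).
    print(memory_utilization("268435456\n", "536870912\n"))
    print(memory_utilization("268435456\n", "max\n"))
```

On a real host the two strings would be read from the container's cgroup directory; the exact path depends on the container runtime and cgroup driver, so it is left out here.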
Comparing Next-Generation Container Image Building Tools - Akihiro Suda
http://sched.co/EaYe
Until recently, running `docker build` against a Dockerfile had been the only way to build container images.
However, many open source tools are now being proposed as successors or alternatives to `docker build`:
- BuildKit (Moby Project / Docker)
- img (Jessica Frazelle / Microsoft)
- Buildah (Project Atomic / Red Hat)
- umoci & Orca (SUSE)
- Bazel (Google)
- OpenShift S2I (Red Hat)
Akihiro Suda compares these new tools' advantages and disadvantages.
His evaluation criteria include, but are not limited to:
- Performance (Cache efficiency, Concurrency, Distributed Execution)
- Secret management, e.g. SSH and AWS keys
- Support for non-Dockerfile
- Non-root execution
- UI & UX
- Governance of the community
He also proposes a unified interface for using these tools with Kubernetes in a vendor-neutral way.
- Understanding Time Series
- What's the Fundamental Problem
- Prometheus Solution (v1.x)
- New Design of Prometheus (v2.x)
- Data Compression Algorithm
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon - Jérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, butts-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences from (and similarities to) existing container systems.
Everyone has heard about Kubernetes. Everyone wants to use this tool. However, sometimes we forget about security, which is essential throughout the container lifecycle.
Therefore, our journey with Kubernetes security should begin in the build stage, when the code we write becomes a container image.
Kubernetes provides innate security advantages, and together with solid container protection, it will be invincible.
During the session, we will review all those features and highlight which ones are mandatory to use. We will also discuss the main vulnerabilities that may lead to a compromise of your system.
Contacts:
LinkedIn - https://www.linkedin.com/in/vshynkar/
GitHub - https://github.com/sqerison
-------------------------------------------------------------------------------------
Materials from the video:
Example policies and Dockerfiles:
https://gist.github.com/sqerison/43365e30ee62298d9757deeab7643a90
The repo with the helm chart used in a demo:
https://github.com/sqerison/argo-rollouts-demo
Tools shown in the last section:
https://github.com/armosec/kubescape
https://github.com/aquasecurity/kube-bench
https://github.com/controlplaneio/kubectl-kubesec
https://github.com/Shopify/kubeaudit#installation
https://github.com/eldadru/ksniff
Further learning.
A book released by CISA (Cybersecurity and Infrastructure Security Agency):
https://media.defense.gov/2021/Aug/03/2002820425/-1/-1/1/CTR_KUBERNETES%20HARDENING%20GUIDANCE.PDF
O'Reilly Kubernetes Security:
https://kubernetes-security.info/
O'Reilly Container Security:
https://info.aquasec.com/container-security-book
Thanks for watching!
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa... - ScaleGrid.io
Compare top PostgreSQL high availability frameworks - PostgreSQL Automatic Failover (PAF), Replication Manager (repmgr) and Patroni to improve your app uptime. ScaleGrid blog - https://scalegrid.io/blog/whats-the-best-postgresql-high-availability-framework-paf-vs-repmgr-vs-patroni-infographic/
The BPF (Berkeley Packet Filter) mechanism was first introduced in Linux in 1997, in version 2.1.75. It has seen a number of extensions over the years. Recently, in versions 3.15 to 3.19, it received a major overhaul that drastically expanded its applicability. This talk will cover how the instruction set looks today and why, along with its architecture, capabilities, interface, and just-in-time compilers. We will also talk about how it is being used in different areas of the kernel, such as tracing and networking, and about future plans.
Velocity 2017 Performance analysis superpowers with Linux eBPF - Brendan Gregg
Talk for Velocity 2017 by Brendan Gregg: Performance analysis superpowers with Linux eBPF.
"Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will investigate this new technology, which sooner or later will be available to everyone who uses Linux. The talk will dive deep on these new tracing, observability, and debugging capabilities. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures, be it the unlikely hardware failure of all data centers or the more likely, all-too-human user error. Acknowledging the importance of good backup procedures, the Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency for your cluster.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on nodes based on their requests and are optionally limited in how much of each resource they can consume. Understanding and optimizing resource requests and limits is crucial both for reducing resource "slack" and for ensuring application performance and low latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reduce the impact on latency-critical applications. All tools shown are open source and can be applied to most Kubernetes deployments.
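To make the idea of resource "slack" concrete, here is a minimal sketch that converts Kubernetes quantity strings to numbers and computes the requested-but-unused amount. It reads slack as request minus observed usage, which is an assumption about the talk's definition; the helper names are hypothetical and only a subset of the quantity suffixes is handled.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    # Check two-letter binary suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def slack(requested: float, used: float) -> float:
    """Requested-but-unused capacity; clamped at zero when usage exceeds the request."""
    return max(requested - used, 0.0)


if __name__ == "__main__":
    # Hypothetical pod: requests 500m CPU, observed usage of 0.125 cores.
    print(slack(parse_cpu("500m"), 0.125))
    print(parse_memory("128Mi"))
```

Summing this value over all pods in a cluster gives a rough picture of how much scheduled capacity sits idle.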
Kubernetes is one of the biggest players when it comes to container orchestration. Since Kubernetes 1.7, RBAC (Role Based Access Control) is the default for the authorisation of actions in your cluster. There are many other components, like Pod Security Policies, Network Policies, and Admission Controllers, that allow you to secure your Kubernetes cluster.
In this talk I will show you how these things can work together and which problems these components try to solve. I will also give you an overview of how other tools like Vault can fit into the Kubernetes ecosystem to make your platform more secure.
Event: DevFest Karlsruhe, 09.12.2017
Speaker: Johannes M. Scheuermann
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
More tech articles: https://www.inovex.de/blog/
The rise of Layer 7, microservices, and the proxy war with Envoy, NGINX, and ... - Ambassador Labs
Modern cloud applications today are built as distributed microservices. These microservices talk to each other over L7 protocols: HTTP, gRPC, Redis, Kafka, and more. In this world, L7 proxies have assumed a crucial role in managing and observing L7 protocols. In this talk, I’ll discuss the evolution of service architectures, the role L7 proxies play in this world, and how there is now a battle raging between Envoy Proxy, HAProxy, and NGINX. I’ll wrap by talking about why we chose Envoy Proxy as the anchor of our Ambassador API Gateway and show how that has enabled a number of new capabilities.
Recording here: https://www.youtube.com/watch?v=5W4n9K3PIVg
Since Docker was open sourced in 2013, the community and adoption around Docker containers has grown to over 6 billion downloads and over 1000 contributors. Learn about why this is, and why you should start using containers for your own applications.
USENIX LISA2021 talk by Brendan Gregg (https://www.youtube.com/watch?v=_5Z2AU7QTH4). This talk is a deep dive that describes how BPF (eBPF) works internally on Linux, and dissects some modern performance observability tools. Details covered include the kernel BPF implementation: the verifier, JIT compilation, and the BPF execution environment; the BPF instruction set; different event sources; and how BPF is used by user space, using bpftrace programs as an example. This includes showing how bpftrace is compiled to LLVM IR and then BPF bytecode, and how per-event data and aggregated map data are fetched from the kernel.
Kubernetes Clusters Security with Amazon EKS (CON338-R1) - AWS re:Invent 2018 - Amazon Web Services
In this session, we discuss best practices for securing your Kubernetes deployments on AWS. We cover how to use AWS IAM with Kubernetes role-based access control (RBAC) for new or existing Kubernetes deployments, and we dive deep into how Amazon EKS implements secure cluster configuration by default.
** Kubernetes Certification Training: https://www.edureka.co/kubernetes-certification **
This Edureka tutorial on "Kubernetes Architecture" gives you an introduction to the popular DevOps tool Kubernetes and takes a deep dive into the Kubernetes architecture and how it works. The following topics are covered in this training session:
1. What is Kubernetes
2. Features of Kubernetes
3. Kubernetes Architecture and Its Components
4. Components of Master Node and Worker Node
5. ETCD
6. Network Setup Requirements
DevOps Tutorial Blog Series: https://goo.gl/P0zAfF
A comprehensive walkthrough of how to manage infrastructure-as-code using Terraform. This presentation includes an introduction to Terraform, a discussion of how to manage Terraform state, how to use Terraform modules, an overview of best practices (e.g. isolation, versioning, loops, if-statements), and a list of gotchas to look out for.
For a written and more in-depth version of this presentation, check out the "Comprehensive Guide to Terraform" blog post series: https://blog.gruntwork.io/a-comprehensive-guide-to-terraform-b3d32832baca
Linux Performance Analysis: New Tools and Old Secrets - Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Secure Substrate: Least Privilege Container Deployment - Docker, Inc.
Riyaz Faizullabhoy - Security Engineer, Docker
Diogo Mónica - Security Lead, Docker
The popularity of containers has driven the need for distributed systems that can provide a substrate for container deployments. These systems need the ability to provision and manage resources, place workloads, and adapt in the presence of failures. In particular, container orchestrators make it easy for anyone to manage their container workloads using their cloud-based or on-premise infrastructure. Unfortunately, most of these systems have not been architected with security in mind. Compromise of a less-privileged node can allow an attacker to escalate privileges to either gain control of the whole system, or to access resources it shouldn't have access to. In this talk, we will go over how Docker has been working to build secure blocks that allow you to run a least privilege infrastructure - where any participant of the system only has access to the resources that are strictly necessary for its legitimate purpose. No more, no less.
Escape From Your VMs with Image2Docker Jeff Nickoloff, All in Geek Consulting... - Docker, Inc.
Migrating apps out of Virtual Machines is difficult, especially distributed apps with multiple components, and even more so when the components run on different operating systems. But with the Docker platform and the Image2Docker tools - which extract Linux and Windows apps from existing VMs into containers - it's easy. In this session we'll take a PHP front-end application running in a Linux VM, which connects to a .NET Web Service running in a Windows VM, and convert the whole stack to Docker automatically. Then we'll run the app on a hybrid Docker Datacenter cluster, where we can manage the Windows and Linux components from a single pane of glass.
Cilium - Network and Application Security with BPF and XDP Thomas Graf, Cova... - Docker, Inc.
This talk will start with a deep dive and hands-on examples of BPF, possibly the most promising low-level technology to address challenges in application and network security, tracing, and visibility. We will discuss how BPF evolved from a simple bytecode language to filter raw sockets for tcpdump to a JITable virtual machine capable of universally extending and instrumenting both the Linux kernel and user space applications. The introduction is followed by a concrete example of how the Cilium open source project applies BPF to solve networking, security and load balancing for highly distributed applications. We will discuss and demonstrate how Cilium, with the help of BPF, can be combined with distributed system orchestration such as Docker to simplify security, operations, and troubleshooting of distributed applications.
What Have Namespaces Done for you Lately? Liz Rice, Aqua Security - Docker, Inc.
Containers are made with namespacing and cgroups, but what does that really mean? In this talk we'll write a container from scratch in Go, using bare system calls, and explore how the different namespaces affect the container's view of the world and the resources it has access to.
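The talk itself builds a container in Go; the sketch below is not that code. It is a small Python illustration of one way to see which namespaces separate a container from its host, by comparing the symlink targets found under `/proc/<pid>/ns/` (for example `pid:[4026531836]`). The function names and the sample inode numbers are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# A namespace link target looks like "net:[4026531992]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace kind, inode number)."""
    match = NS_LINK.match(target.strip())
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def differing_namespaces(host_links: Iterable[str],
                         container_links: Iterable[str]) -> List[str]:
    """Return the namespace kinds whose inode differs between two processes."""
    host = dict(parse_ns_link(t) for t in host_links)
    container = dict(parse_ns_link(t) for t in container_links)
    return sorted(k for k in host if k in container and host[k] != container[k])


if __name__ == "__main__":
    # Made-up link targets; on Linux they could come from os.readlink().
    host = ["pid:[4026531836]", "net:[4026531992]", "uts:[4026531838]"]
    container = ["pid:[4026532201]", "net:[4026531992]", "uts:[4026532199]"]
    print(differing_namespaces(host, container))
```

Two processes that share an inode number for a given kind are in the same namespace of that kind, so an empty result means the second process has no isolation from the first for the kinds compared.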
Taking Docker from Local to Production at Intuit JanJaap Lahpor, Intuit and H... - Docker, Inc.
In this talk, we will share how a small team at Intuit moved Docker from local to production, serving real and critical workloads. We will share how we addressed the organizational challenges of running Docker at large enterprises by building a business case for a pilot project to prove the value of containers and their real-world application. Next, we will share how we solved the technical challenges that present themselves when taking Docker from local to production in a corporate data center. We will share the blueprint for the business case and the associated pilot, which was laser-focused on running stateless back-end services throughout the entire SDLC. Finally, we will highlight our crawl-walk-run approach that allowed us to make inexpensive mistakes before investing in the right areas as our Docker knowledge increased. We will share the major technical issues we encountered, how we overcame them and the lessons we learned.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
Historically, sharing a Linux server entailed all kinds of untenable compromises. In addition to the security concerns, there was simply no good way to keep one application from hogging resources and messing with the others. The classic “noisy neighbor” problem made shared systems the bargain-basement slums of the Internet, suitable only for small or throwaway projects.
Serious use-cases traditionally demanded dedicated systems. Over the past decade virtualization (in conjunction with Moore’s law) has democratized the availability of what amount to dedicated systems, and the result is hundreds of thousands of websites and applications deployed into VPS or cloud instances. It’s a step in the right direction, but still has glaring flaws.
Most of these websites are just piles of code sitting on a server somewhere. How did that code get there? How can it be scaled? Secured? Maintained? It’s anybody’s guess. There simply isn’t enough SysAdmin talent in the world to meet the demands of managing all these apps with anything close to best practices without a better model.
Containers are a whole new ballgame. Unlike VMs, you skip the overhead of running an entire OS for every application environment. There’s also no need to provision a whole new machine to have a place to deploy, meaning you can spin up or scale your application with orders of magnitude more speed and accuracy.
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance analysis, for both customers and operators of the cloud. Apart from monitoring a scaling environment, issues within a system can be complicated when tenants are competing for the same resources, and are invisible to each other. Other factors include rapidly changing production code and wildly unpredictable traffic surges. For performance analysis in the Joyent public cloud, we use a variety of tools including Dynamic Tracing, which allows us to create custom tools and metrics and to explore new concepts. In this presentation I'll discuss a collection of these tools and the metrics that they measure. While these are DTrace-based, the focus of the talk is on which metrics are proving useful for analyzing real cloud issues.
Talk for PerconaLive 2016 by Brendan Gregg. Video: https://www.youtube.com/watch?v=CbmEDXq7es0 . "Systems performance provides a different perspective for analysis and tuning, and can help you find performance wins for your databases, applications, and the kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes six important areas of Linux systems performance in 50 minutes: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc), overviews of complex areas including profiling (perf_events), static tracing (tracepoints), and dynamic tracing (kprobes, uprobes), and much advice about what is and isn't important to learn. This talk is aimed at everyone: DBAs, developers, operations, etc, and in any environment running Linux, bare-metal or the cloud."
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance ...Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and...PROIDEA
YouTube: https://www.youtube.com/watch?v=1HBP6LkKwLc&list=PLnKL6-WWWE_VtIMfNLW3N3RGuCUcQkDMl&index=13
The high level of automation for the container and microservice lifecycle makes the monitoring of Kubernetes or Swarm more challenging than in more traditional, more static deployments. Any static setup to monitor specific application containers does not work because orchestration tools like Kubernetes or Swarm make their own decisions according to the defined deployment rules. In this talk you will learn how DevOps can cope with challenges in Monitoring and Log Management on Docker Swarm and Kubernetes. We will start with the basics of container monitoring and logging, including APIs and tools, followed by an overview of the key metrics of both platforms. We will speak about cluster-wide deployments for monitoring and log management solutions and how to discover services for log collection and monitoring, tagging of logs and metrics. Finally, we will share insights derived from monitoring a 4700 node Swarm cluster, as part of the Swarm3k project.
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
Containers and other forms of dynamic infrastructure can prove challenging to monitor. How do you define normal when your infrastructure is intentionally in motion and changes from minute to minute? Join us as we discuss proven strategies for monitoring your containerized infrastructure on AWS and ECS.
Advanced cgroups and namespaces
This talk picks up where we left off in the previous cgroups and namespaces talk and dives in even deeper!
Agenda:
* cgroups v2 design (cgroup v2 began to be merged in the current kernel, 4.4)
* cgroups v2 examples (migrating tasks, enabling and disabling controllers, and more).
* comparison between cgroup v2 unified hierarchy and cgroup v1 legacy hierarchy.
* PIDs controller (from kernel 4.3)
* cgroup namespaces (not merged yet)
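The cgroup v2 operations named in the agenda (migrating tasks, enabling and disabling controllers) all come down to writing short strings into control files. The following is a minimal sketch, not material from the talk: the helper names are hypothetical, and the directory passed in is assumed to be a cgroup v2 directory (typically under /sys/fs/cgroup, which requires root).

```python
from pathlib import Path


def enable_controllers(parent: Path, controllers: list[str]) -> str:
    """Enable controllers for the children of `parent` (cgroup v2)."""
    # cgroup v2 expects "+name" to enable a controller and "-name" to disable it.
    request = " ".join(f"+{name}" for name in controllers)
    (parent / "cgroup.subtree_control").write_text(request)
    return request


def migrate_task(cgroup: Path, pid: int) -> None:
    """Move process `pid` into `cgroup` by writing its PID to cgroup.procs."""
    (cgroup / "cgroup.procs").write_text(str(pid))


def available_controllers(cgroup: Path) -> list[str]:
    """Return the controllers listed in cgroup.controllers (space-separated)."""
    return (cgroup / "cgroup.controllers").read_text().split()
```

On a real system the kernel creates these files when the cgroup directory is created; the helpers only write to them. Pointing the functions at a scratch directory is a safe way to see what would be written.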
Containerize Your Game Server for the Best Multiplayer Experience Docker, Inc.
Raymond Arifianto, AccelByte and Mark Mandel, Google -
We have been deploying containerized microservices for our Game Backend Services for a while. Now we are tackling the challenge of scaling up fleets of dedicated game servers in multiple regions, multiple data centers and multiple providers - some on bare metal, some in the cloud. So we leverage Docker containerization to deploy game servers to achieve Portability, Fast Deployment and Predictability, enabling us to scale up to thousands of servers, on demand, without breaking a sweat.
How to Improve Your Image Builds Using Advance Docker BuildDocker, Inc.
Nicholas Dille, Haufe-Lexware + Docker Captain -
Docker continues to be the standard tool for building container images. For more than a year, Docker has shipped with BuildKit as an alternative image builder, providing advanced features for secret and cache management. These features help to make image builds faster and more secure. In this session, Docker Captain Nicholas Dille will teach you how to use BuildKit features to your advantage.
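As an illustration of the secret and cache features mentioned above, the sketch below holds a Dockerfile that uses BuildKit's `RUN --mount=type=cache` and `RUN --mount=type=secret`, plus a helper that assembles the matching `docker build --secret` command line. It is not taken from the session: the base image, the secret id `api_token`, and the helper name `build_command` are assumptions chosen for the example.

```python
DOCKERFILE = """\
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# Cache mount: pip's download cache is reused across builds but never stored in a layer.
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
# Secret mount: the file exists at /run/secrets/<id> only while this step runs.
RUN --mount=type=secret,id=api_token test -s /run/secrets/api_token
"""


def build_command(tag: str, secret_id: str, secret_path: str) -> list[str]:
    """Assemble a `docker build` invocation that passes one file-based secret."""
    return [
        "docker", "build",
        "--secret", f"id={secret_id},src={secret_path}",
        "-t", tag,
        ".",
    ]
```

On older Docker releases BuildKit has to be switched on with the environment variable `DOCKER_BUILDKIT=1` before these mounts are accepted. The resulting list can be handed to `subprocess.run` from the directory that contains the Dockerfile.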
Build & Deploy Multi-Container Applications to AWSDocker, Inc.
Lukonde Mwila, Entelect -
As the cloud-native approach to development and deployment becomes more prevalent, it's an exciting time for software engineers to be equipped on how to dockerize multi-container applications and deploy them to the cloud.
In this talk, Lukonde Mwila, Software Engineer at Entelect, will cover the following topics:
- Docker Compose
- Containerizing an Nginx Server
- Containerizing a React App
- Containerizing a Node.js App
- Containerizing a MongoDB App
- Running Multi-Container App Locally
- Creating a CI/CD Pipeline
- Adding a build stage to test containers and push images to Docker Hub
- Deploying Multi-Container App to AWS Elastic Beanstalk
Lukonde will start by giving an overview of how Docker Compose works and how it makes it straightforward to start up multiple Docker containers at the same time and automatically connect them together with some form of networking.
After that, Lukonde will take a hands-on approach to containerizing an Nginx server, a React app, a Node.js app and a MongoDB instance to demonstrate the power of Docker Compose. He'll demonstrate the use of two Dockerfiles for an application, one production grade and the other for local development and running tests. Lastly, he'll demonstrate creating a CI/CD pipeline in AWS to build and test our Docker images before pushing them to Docker Hub or AWS ECR, and finally deploying our multi-container application to AWS Elastic Beanstalk.
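To make the four-service layout concrete, here is a rough sketch of what such a Compose definition could contain, written as a Python dictionary and dumped as JSON (Compose files are normally YAML; JSON is used here only to avoid a third-party dependency). The service names, build paths, image tags and the `MONGO_URL` variable are assumptions for illustration, not the speaker's actual configuration.

```python
import json


def compose_definition() -> dict:
    """Return a Compose-style definition for Nginx, a React client, a Node.js API and MongoDB."""
    return {
        "services": {
            "nginx": {"image": "nginx:stable", "ports": ["80:80"],
                      "depends_on": ["client", "api"]},
            "client": {"build": "./client"},  # hypothetical React app directory
            "api": {"build": "./api",         # hypothetical Node.js app directory
                    "environment": {"MONGO_URL": "mongodb://mongo:27017/app"},
                    "depends_on": ["mongo"]},
            "mongo": {"image": "mongo:7", "volumes": ["mongo-data:/data/db"]},
        },
        "volumes": {"mongo-data": {}},
    }


def startup_order(services: dict) -> list[str]:
    """Order services so that each one comes after everything in its depends_on list."""
    order: list[str] = []

    def visit(name: str) -> None:
        if name in order:
            return
        for dependency in services[name].get("depends_on", []):
            visit(dependency)
        order.append(name)

    for name in services:
        visit(name)
    return order


if __name__ == "__main__":
    definition = compose_definition()
    print(json.dumps(definition, indent=2))
    print(startup_order(definition["services"]))
```

Inside the Compose network each service is reachable by its service name, which is why the API can address the database as `mongo`. The ordering helper assumes there are no circular dependencies and only mirrors the idea behind `depends_on`.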
Securing Your Containerized Applications with NGINXDocker, Inc.
Kevin Jones, NGINX -
NGINX is one of the most popular images on Docker Hub and has been at the forefront of the web since the early 2000s. In this talk we will discuss how and why NGINX's lightweight and powerful architecture makes it a very popular choice for securing containerized applications as a sidecar reverse proxy within containers. We will highlight important aspects of application security that NGINX can help with, such as TLS, HTTP, AuthN, AuthZ and traffic control.
How To Build and Run Node Apps with Docker and ComposeDocker, Inc.
Kathleen Juell, DigitalOcean -
Containers are an essential part of today's microservice ecosystem, as they allow developers and operators to maintain standards of reliability and reproducibility in fast-paced deployment scenarios. And while there are best practices that extend across stacks in containerized environments, there are also things that make each stack distinct, starting with the application image itself.
This talk will dive into some of these particularities, both at the image and service level, while also covering general best practices for building and running Node applications with database backends using Docker and Compose.
Jessica Deen, Microsoft -
Helm 3 is here; let's go hands-on! In this demo-fueled session, I'll walk you through the differences between Helm 2 and Helm 3. I'll offer tips for a successful rollout or upgrade, go over how to easily use charts created for Helm 2 with Helm 3 (without changing your syntax), and review opportunities where you can participate in the project's future.
Distributed Deep Learning with Docker at SalesforceDocker, Inc.
Jeff Hajewski, Salesforce -
There is a wealth of information on building deep learning models with PyTorch or TensorFlow. Anyone interested in building a deep learning model is only a quick search away from a number of clear and well written tutorials that will take them from zero knowledge to having a working image classifier. But what happens when you need to deploy these models in a production setting? At Salesforce, we use TensorFlow models to help us provide customers with insights into their data, and we do this as close to real-time as possible. Designing these systems in a scalable manner requires overcoming a number of design challenges, but the core component is Docker. Docker enables us to design highly scalable systems by allowing us to focus on service interactions, rather than how our services will interact with the hardware. Docker is also at the core of our test infrastructure, allowing developers and data scientists to build and test the system in an end to end manner on their local machines. While some of this may sound complex, the core message is simplicity - Docker allows us to focus on the aspects of the system that matter, greatly simplifying our lives.
The First 10M Pulls: Building The Official Curl Image for Docker HubDocker, Inc.
James Fuller, webcomposite s.r.o. -
Curl is the venerable (yet very modern) 'swiss army knife' command line tool and library for transferring data with URLs. Recently we (the Curl team) decided to build a release for Docker Hub. This talk will outline our current development workflow with respect to the docker image and provide insights on what it takes to build a docker image for mass public consumption. We are also keen to learn from users and other developers how we might improve and enhance the official curl docker image.
Fabian Stäber, Instana -
In recent years, we saw a great paradigm shift in software engineering away from static monolithic applications towards dynamic distributed horizontally scalable architectures. Docker is one of the key technologies enabling this development. This shift poses a lot of new challenges for application monitoring, ranging from practical issues (need for automation) to technical challenges (Docker networking) to organizational topics (blurring line between software engineers and operations) to fundamental questions (define what is an application). In this talk we show how Docker changed the way we do monitoring, how modern application monitoring systems work, and what future developments we expect.
COVID-19 in Italy: How Docker is Helping the Biggest Italian IT Company Conti...Docker, Inc.
Clemente Biondo, Engineering Ingegneria Informatica -
When the COVID-19 pandemic started, Engineering Ingegneria Informatica Group (1.25 billion euros in revenue, 65 offices around the world, 12,000 employees) was forced to put its digital transformation to the test in order to maintain operational continuity. In this session, Clemente Biondo, the Tech Lead of the Information Systems Department, will share how his company is reacting to this unforeseeable scenario and how Docker-driven digital transformation paved the way for work to continue remotely. Clemente will discuss lessons learned in moving from colocated teams, manual approaches, email-based business processes, and a monolithic application to a mature DevOps culture characterized by a distributed autonomous workforce and a continuous deployment process that deploys backward-compatible Docker containerized microservices into hybrid multi-cloud datacenters an average of twice a day with zero downtime. He will detail how they use Docker to unify dev, test and production environments, and as an efficient and automated mechanism for deploying applications. Lastly, Clemente shares how, in our darkest hour, he and others are working to shine their brightest light.
Chris Lauer, NOAA Space Weather Prediction Center -
This is the story of how adopting a containerized workflow changed the way our small software team works at NOAA’s Space Weather Prediction Center. Our old architecture, a big ball of mud shared-database integration, just wasn’t cutting it - it was killing our agility. Over the past two years, our small team has adopted a microservice style architecture, using Docker with docker-compose and environment files as our deployment strategy for all new development. We’ve discovered the joys of using containers for identical dev, staging, and production environments. We work closely with scientists: much of the code we’re running has complicated and conflicting library dependencies. Docker captures these beautifully - we’ve even had some success teaching our scientists to use it! I’ll share what we’ve learned, some of the persistent challenges we face, and one place we really got it wrong. This talk builds off of a popular hallway track from DockerCon 2019.
Become a Docker Power User With Microsoft Visual Studio CodeDocker, Inc.
Brian Christner, 56k + Docker Captain -
In this session, we will unlock the full potential of using Microsoft Visual Studio Code (VS Code) and Docker Desktop to turn you into a Docker Power User. When we expand and utilize the VS Code Docker plugin, we can take our projects and Docker skills to the next level. In addition to using VS Code, we streamline our Docker Desktop development workflow with less context switching and built-in shortcuts. You will learn how to bootstrap new projects, quickly write Dockerfiles utilizing templates, build, run, and interact with containers all from VS Code.
How to Use Mirroring and Caching to Optimize your Container RegistryDocker, Inc.
Brandon Mitchell, Boxboat + Docker Captain -
How do you make your builds more performant? This talk looks at options to configure caching and mirroring of images that you need to save on bandwidth costs and to keep running even if something goes down upstream.
Monolithic to Microservices + Docker = SDLC on Steroids!Docker, Inc.
Ashish Sharma, SS&C Eze -
SS&C Eze provides various products in the stock market domain. We spent the last couple of years building Eclipse, an investment suite born in the cloud. The journey so far has been very interesting. The very first version of the product was a bunch of monolithic Windows services deployed using the Octopus tool. We successfully managed to bring all the monolith's problems to the cloud and created a nightmare for ourselves. We then started applying microservices architecture principles and breaking the monolith into small services. Very soon we realized that we needed a better packaging/deployment tool. Docker looked like a magical solution to our problem. Since its adoption, it has not only solved the deployment problem for us but has also made a deep impact on different aspects of the SDLC. It allowed us to use heterogeneous technology stacks, simplified development environment setup, simplified our testing strategy, improved our speed of delivery, and made our developers more productive. In this talk I would like to share our experience of using Docker and its positive impact on our SDLC.
Ara Pulido, Datadog -
Container technologies, although not new, have increased their popularity in the past few years, with container orchestrators allowing companies around the world to adopt these technologies to help them ship and scale microservices with precision and velocity. Kubernetes is currently the most popular container orchestration platform, and while many organizations are migrating their workloads to it, Kubernetes is still relatively immature. New corner cases, errors, and quirks are regularly discovered as users push the boundaries of size and scale. When Datadog adopted Kubernetes we discovered some of these boundaries the hard way, and we continuously challenge and modify our infrastructure decisions in order to fit our use case. Join me in this talk for our story on what we learned while we scaled our Kubernetes clusters, the contributions to Kubernetes we made along the way, and how you can apply those learnings when growing your Kubernetes clusters from a handful to hundreds or thousands of nodes.
Andy Clemenko, StackRox -
One underutilized, and amazing, thing about the Docker image scheme is labels. Labels are a built-in way to document all aspects of the image itself. Think about all the information that the tags inside your clothing carry. If you care to look, you can find out everything about the garment. All that information can be very valuable. Now think about how we can leverage labels to carry similar information. We can even use labels to contain Docker Compose or Kubernetes YAML. We can also include labels in the CI/CD process, making things more secure and smoother. Come find out some fun techniques for leveraging labels to do some fun and amazing things.
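One way a CI/CD job might attach such metadata is to generate `LABEL` instructions from values it already knows, such as the source repository and commit. The sketch below assumes the standard `org.opencontainers.image.*` keys; the helper name `label_instructions` and the example values are hypothetical, and this is not necessarily the approach shown in the talk.

```python
def label_instructions(labels: dict[str, str]) -> str:
    """Render a mapping of label keys to values as Dockerfile LABEL lines."""
    lines = []
    for key in sorted(labels):
        # Escape backslashes and double quotes so the value stays inside its quotes.
        value = labels[key].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'LABEL {key}="{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(label_instructions({
        "org.opencontainers.image.source": "https://example.com/repo.git",
        "org.opencontainers.image.revision": "abc123",
    }))
```

Once the image is built, the labels can be read back with `docker inspect --format '{{json .Config.Labels}}' <image>`.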
Using Docker Hub at Scale to Support Micro Focus' Delivery and Deployment ModelDocker, Inc.
Patrick Deloulay, Micro Focus -
Micro Focus started their digital transformation 3 years ago, moving the entire portfolio into hundreds of container images. Leveraging Docker Hub as our primary registry service, we will cover how we ended up building a simple but secure push/pull model to publish and deliver our premium assets to our customers and partners to both meet the high agility of our DevOps teams while greatly simplifying the deployment of our applications.
From Fortran on the Desktop to Kubernetes in the Cloud: A Windows Migration S...Docker, Inc.
Elton Stoneman, Docker Captain + Container Consultant and Trainer
How do you provide a SaaS offering when your product is a 10-year-old Fortran app, currently built to run on Windows 10? With Docker and Kubernetes of course - and you can do it in a week (... to prototype level at least).
In this session I'll walk through the processes and practicalities of taking an older Windows app, making it run in containers with Kubernetes, and then building a simple API wrapper to host the whole stack as a cloud-based SaaS product.
There's a lot of technology here from a real world case study, and I'll focus on:
- running Windows apps in Docker containers
- building a .NET Core API which can run in Linux or Windows containers
- running the stack in Kubernetes with Docker Desktop locally and AKS in the cloud
- configuring AKS workloads in Azure to burst out to Azure Container Instances
And there's a core theme to this session: Docker and Kubernetes are complex technologies, but they're the key to modern development. If you invest time learning them, they make projects like this simple, portable, fast and fun.
Developing with Docker for the Arm ArchitectureDocker, Inc.
This virtual meetup introduces the concepts and best practices of using Docker containers for software development for the Arm architecture across a variety of hardware systems. Using Docker Desktop on Windows or Mac, Amazon Web Services (AWS) A1 instances, and embedded Linux, we will demonstrate the latest Docker features to build, share, and run multi-architecture images with transparent support for Arm.
The Metaverse and AI: how can decision-makers harness the Metaverse for their...Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
2. Identify bottlenecks:
1. In the host vs container, using system metrics
2. In application code on containers, using CPU flame graphs
3. Deeper in the kernel, using tracing tools
Focus of this talk is how containers work in Linux (will demo on 4.9)
I will include some Docker specifics, and start with a Netflix summary (Titus)
Take Aways
4. Titus
• Cloud runtime platform for container jobs
• Scheduling
• Service & batch job management
• Advanced resource management across elastic shared resource pool
• Container Execution
• Docker and AWS EC2 Integration
• Adds VPC, security groups, EC2 metadata, IAM roles, S3 logs, …
• Integration with Netflix infrastructure
• In depth: http://techblog.netflix.com/2017/04/the-evolution-of-container-usage-at.html
Service
Job Management
Resource Management & Optimization
Container Execution Integration
Batch
5. Current Titus Scale
• Deployed across multiple AWS accounts & three regions
• Over 2,500 instances (Mostly M4.4xls & R3.8xls)
• Over a week period launched over 1,000,000 containers
6. Titus Use Cases
• Service
• Stream Processing (Flink)
• UI Services (Node.JS single core)
• Internal dashboards
• Batch
• Algorithm training, personalization & recommendations
• Adhoc reporting
• Continuous integration builds
• Queued worker model
• Media encoding
7. Container Performance @Netflix
• Ability to scale and balance workloads with EC2 and Titus
• Can already solve many perf issues
• Performance needs:
• Application analysis: using CPU flame graphs with containers
• Host tuning: file system, networking, sysctl's, …
• Container analysis and tuning: cgroups, GPUs, …
• Capacity planning: reduce over provisioning
12. cgroup v1
cpu,cpuacct:
• cap CPU usage (hard limit). e.g. 1.5 CPUs.
• CPU shares. e.g. 100 shares.
• usage statistics (cpuacct)
memory:
• limit and kmem limit (maximum bytes)
• OOM control: enable/disable
• usage statistics
blkio (block I/O):
• weights (like shares)
• IOPS/tput caps per storage device
• statistics
Docker:
• CPU: --cpus (1.13), --cpu-shares
• Memory: --memory, --kernel-memory, --oom-kill-disable
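As a rough illustration of the CPU cap, the sketch below converts a cap expressed in CPUs (e.g. 1.5) into a CFS quota and back. It assumes the cap is enforced through the cgroup v1 files cpu.cfs_quota_us and cpu.cfs_period_us, and that the period is 100,000 µs; both the helper names and the default period are assumptions for illustration, not something the slides state.

```python
# Sketch: relate a CPU cap (in CPUs) to CFS bandwidth settings.
# Assumption: cap = cpu.cfs_quota_us / cpu.cfs_period_us, period of 100 ms.

def cfs_quota_us(cpus, period_us=100_000):
    """Quota (microseconds per period) that corresponds to a cap of `cpus` CPUs."""
    return int(round(cpus * period_us))

def cap_in_cpus(quota_us, period_us):
    """CPU cap implied by quota/period; a negative quota means no cap is set."""
    if quota_us < 0:
        return None
    return quota_us / period_us
```

For example, a 1.5 CPU cap with a 100 ms period corresponds to a quota of 150,000 µs per period.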
13. CPU Shares
Container's CPU limit = 100% x (container's shares / total busy shares)
This lets a container use other tenants' idle CPU (aka "bursting"), when available.
Container's minimum CPU limit = 100% x (container's shares / total allocated shares)
Can make analysis tricky. Why did perf regress? Less bursting available?
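The share-based limits on this slide can be worked through with a small sketch. The tenant names and share values below are hypothetical; the function simply divides a container's shares by the total shares of either all tenants (minimum limit) or only the currently busy tenants (burst limit).

```python
# Sketch: CPU limit implied by cgroup CPU shares.
# `shares` maps a (hypothetical) tenant name to its allocated shares.

def share_limit_percent(name, shares, busy=None):
    """Percent of host CPU available to `name`.

    busy=None  -> minimum limit: divide by total allocated shares.
    busy=[...] -> burst limit: divide by the shares of busy tenants only
                  (the list must include `name` itself).
    """
    tenants = shares.keys() if busy is None else busy
    total = sum(shares[t] for t in tenants)
    return 100.0 * shares[name] / total

shares = {"a": 100, "b": 200, "c": 100}
minimum = share_limit_percent("a", shares)                 # all tenants counted
bursting = share_limit_percent("a", shares, ["a", "c"])    # tenant "b" is idle
```

With these numbers, container "a" is guaranteed 25% of the CPU but can reach 50% while "b" is idle, which is why a regression may simply mean less bursting was available.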
14. • Major rewrite has been happening: cgroups v2
• Supports nested groups, better organization and consistency
• Some already merged, some not yet (e.g. CPU)
• See docs/talks by maintainer Tejun Heo (Facebook)
• References:
• https://www.kernel.org/doc/Documentation/cgroup-v2.txt
• https://lwn.net/Articles/679786/
cgroup v2
15. File systems
• Containers may be set up with aufs/overlay on top of another FS
• See "in practice" pages and their performance sections from
https://docs.docker.com/engine/userguide/storagedriver/
Networking
• With Docker, can be bridge, host, or overlay networks
• Overlay networks have come with significant performance cost
Container OS Configuration
16. Performance analysis with containers:
• One kernel
• Two perspectives
• Namespaces
• cgroups
Methodologies:
• USE Method
• Workload characterization
• Checklists
• Event tracing
Analysis Strategy
17. USE Method
For every resource, check:
1. Utilization
2. Saturation
3. Errors
For example, CPUs:
• Utilization: time busy
• Saturation: run queue length or latency
• Errors: ECC errors, etc.
Can be applied to hardware resources and software resources (cgroups)
[Diagram: a resource and its utilization (%)]
19. • PIDs in host don't match those seen in containers
• Symbol files aren't where tools expect them
• The kernel currently doesn't have a container ID
Host Analysis Challenges
20. I'll demo CLI tools. It's the lowest common denominator.
You may usually use GUIs (like we do). They source the same metrics.
CLI Tool Disclaimer
21. 3.1. Host Physical Resources
A refresher of basics... Not container specific.
This will, however, solve many issues!
Containers are often not the problem.
23. Host Perf Analysis in 60s
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
1. uptime                load averages
2. dmesg | tail          kernel errors
3. vmstat 1              overall stats by time
4. mpstat -P ALL 1       CPU balance
5. pidstat 1             process usage
6. iostat -xz 1          disk I/O
7. free -m               memory usage
8. sar -n DEV 1          network I/O
9. sar -n TCP,ETCP 1     TCP stats
10. top                  check overview
24. USE Method: Host Resources
• CPU
  • Utilization: mpstat -P ALL 1, sum non-idle fields
  • Saturation: vmstat 1, "r"
  • Errors: perf
• Memory Capacity
  • Utilization: free -m, "used"/"total"
  • Saturation: vmstat 1, "si"+"so"; dmesg | grep killed
  • Errors: dmesg
• Storage I/O
  • Utilization: iostat -xz 1, "%util"
  • Saturation: iostat -xnz 1, "avgqu-sz" > 1
  • Errors: /sys/…/ioerr_cnt; smartctl
• Network
  • Utilization: nicstat, "%Util"
  • Saturation: ifconfig, "overruns"; netstat -s "retrans…"
  • Errors: ifconfig, "errors"
These should be in your monitoring GUI. Can do other resources too (busses, ...)
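The CPU utilization check (sum the non-idle fields) can also be computed directly from two samples of the aggregate "cpu" line in /proc/stat. This is a minimal sketch, not how mpstat is implemented; it assumes a modern kernel that reports at least the first eight fields, and it treats iowait as idle time, which is a choice made here for illustration.

```python
# Sketch: host CPU utilization from two /proc/stat samples.
# Field order: user nice system idle iowait irq softirq steal guest guest_nice

def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def cpu_utilization(before_text, after_text):
    """Fraction of CPU time spent non-idle between the two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])          # guest time is already counted in user/nice
    idle = deltas[3] + deltas[4]     # assumption: count iowait as idle
    if total <= 0:
        return 0.0
    return (total - idle) / total
```

On a live host, the two inputs would be the contents of /proc/stat read about a second apart.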
25. Event Tracing: e.g. iosnoop
Disk I/O events with latency (from perf-tools; also in bcc/BPF as biosnoop)
# ./iosnoop
Tracing block I/O... Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1809 W 202,1 17039968 4096 1.32
supervise 1809 W 202,1 17039976 4096 1.30
tar 14794 RM 202,1 8457608 4096 7.53
tar 14794 RM 202,1 8470336 4096 14.90
tar 14794 RM 202,1 8470368 4096 0.27
tar 14794 RM 202,1 8470784 4096 7.74
tar 14794 RM 202,1 8470360 4096 0.25
tar 14794 RM 202,1 8469968 4096 0.24
tar 14794 RM 202,1 8470240 4096 0.24
tar 14794 RM 202,1 8470392 4096 0.23
26. Event Tracing: e.g. zfsslower
• This is from our production Titus system (Docker).
• File system latency is a better pain indicator than disk latency.
• zfsslower (and btrfs*, etc) are in bcc/BPF. Can exonerate FS/disks.
# /usr/share/bcc/tools/zfsslower 1
Tracing ZFS operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
23:44:40 java 31386 O 0 0 8.02 solrFeatures.txt
23:44:53 java 31386 W 8190 1812222 36.24 solrFeatures.txt
23:44:59 java 31386 W 8192 1826302 20.28 solrFeatures.txt
23:44:59 java 31386 W 8191 1826846 28.15 solrFeatures.txt
23:45:00 java 31386 W 8192 1831015 32.17 solrFeatures.txt
23:45:15 java 31386 O 0 0 27.44 solrFeatures.txt
23:45:56 dockerd 3599 S 0 0 1.03 .tmp-a66ce9aad…
23:46:16 java 31386 W 31 0 36.28 solrFeatures.txt
32. docker stats
# docker stats
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
353426a09db1 526.81% 4.061 GiB / 8.5 GiB 47.78% 0 B / 0 B 2.818 MB / 0 B 247
6bf166a66e08 303.82% 3.448 GiB / 8.5 GiB 40.57% 0 B / 0 B 2.032 MB / 0 B 267
58dcf8aed0a7 41.01% 1.322 GiB / 2.5 GiB 52.89% 0 B / 0 B 0 B / 0 B 229
61061566ffe5 85.92% 220.9 MiB / 3.023 GiB 7.14% 0 B / 0 B 43.4 MB / 0 B 61
bdc721460293 2.69% 1.204 GiB / 3.906 GiB 30.82% 0 B / 0 B 4.35 MB / 0 B 66
6c80ed61ae63 477.45% 557.7 MiB / 8 GiB 6.81% 0 B / 0 B 9.257 MB / 0 B 19
337292fb5b64 89.05% 766.2 MiB / 8 GiB 9.35% 0 B / 0 B 5.493 MB / 0 B 19
b652ede9a605 173.50% 689.2 MiB / 8 GiB 8.41% 0 B / 0 B 6.48 MB / 0 B 19
d7cd2599291f 504.28% 673.2 MiB / 8 GiB 8.22% 0 B / 0 B 12.58 MB / 0 B 19
05bf9f3e0d13 314.46% 711.6 MiB / 8 GiB 8.69% 0 B / 0 B 7.942 MB / 0 B 19
09082f005755 142.04% 693.9 MiB / 8 GiB 8.47% 0 B / 0 B 8.081 MB / 0 B 19
bd45a3e1ce16 190.26% 538.3 MiB / 8 GiB 6.57% 0 B / 0 B 10.6 MB / 0 B 19
[...]
A "top" for containers. Resource utilization. Workload characterization.
Loris Degioanni demoed a similar sysdigcloud view yesterday (needs the sysdig kernel agent)
33. top
# top
top - 22:46:53 up 36 days, 59 min, 1 user, load average: 5.77, 5.61, 5.63
Tasks: 1067 total, 1 running, 1046 sleeping, 0 stopped, 20 zombie
%Cpu(s): 34.8 us, 1.8 sy, 0.0 ni, 61.3 id, 0.0 wa, 0.0 hi, 1.9 si, 0.1 st
KiB Mem : 65958552 total, 12418448 free, 49247988 used, 4292116 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 13101316 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28321 root 20 0 33.126g 0.023t 37564 S 621.1 38.2 35184:09 java
97712 root 20 0 11.445g 2.333g 37084 S 3.1 3.7 404:27.90 java
98306 root 20 0 12.149g 3.060g 36996 S 2.0 4.9 194:21.10 java
96511 root 20 0 15.567g 6.313g 37112 S 1.7 10.0 168:07.44 java
5283 root 20 0 1643676 100092 94184 S 1.0 0.2 401:36.16 mesos-slave
2079 root 20 0 9512 132 12 S 0.7 0.0 220:07.75 rngd
5272 titusag+ 20 0 10.473g 1.611g 23488 S 0.7 2.6 1934:44 java
[…]
In the host, top shows all processes. Currently doesn't show a container ID.
… remember, there is no container ID in the kernel yet.
34. htop
CGROUP PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
:pids:/docker/ 28321 root 20 0 33.1G 24.0G 37564 S 524. 38.2 672h /apps/java
:pids:/docker/ 9982 root 20 0 33.1G 24.0G 37564 S 44.4 38.2 17h00:41 /apps/java
:pids:/docker/ 9985 root 20 0 33.1G 24.0G 37564 R 41.9 38.2 16h44:51 /apps/java
:pids:/docker/ 9979 root 20 0 33.1G 24.0G 37564 S 41.2 38.2 17h01:35 /apps/java
:pids:/docker/ 9980 root 20 0 33.1G 24.0G 37564 S 39.3 38.2 16h59:17 /apps/java
:pids:/docker/ 9981 root 20 0 33.1G 24.0G 37564 S 39.3 38.2 17h01:32 /apps/java
:pids:/docker/ 9984 root 20 0 33.1G 24.0G 37564 S 37.3 38.2 16h49:03 /apps/java
:pids:/docker/ 9983 root 20 0 33.1G 24.0G 37564 R 35.4 38.2 16h54:31 /apps/java
:pids:/docker/ 9986 root 20 0 33.1G 24.0G 37564 S 35.4 38.2 17h05:30 /apps/java
:name=systemd:/user.slice/user-0.slice/session-c31.scope? 74066 root 20 0 27620
:pids:/docker/ 9998 root 20 0 33.1G 24.0G 37564 R 28.3 38.2 11h38:03 /apps/java
:pids:/docker/ 10001 root 20 0 33.1G 24.0G 37564 S 27.7 38.2 11h38:59 /apps/java
:name=systemd:/system.slice/daemontools.service? 5272 titusagen 20 0 10.5G 1650M 23
:pids:/docker/ 10002 root 20 0 33.1G 24.0G 37564 S 25.1 38.2 11h40:37 /apps/java
htop can add a CGROUP field, but, can truncate important info:
Can fix, but that would be Docker + cgroup-v1 specific. Still need a kernel CID.
35. Host PID -> Container ID
# grep 28321 /sys/fs/cgroup/cpu,cpuacct/docker/*/tasks | cut -d/ -f7
dcf3a506de453107715362f6c9ba9056fcfc6e769d28fc4a1c72bbaff4a24834
… who does that (CPU busy) PID 28321 belong to?
• Only works for Docker, and that cgroup v1 layout. Some Linux commands:
# ls -l /proc/27992/ns/*
lrwxrwxrwx 1 root root 0 Apr 13 20:49 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 root root 0 Apr 13 20:49 ipc -> ipc:[4026533354]
lrwxrwxrwx 1 root root 0 Apr 13 20:49 mnt -> mnt:[4026533352]
[…]
# cat /proc/27992/cgroup
11:freezer:/docker/dcf3a506de453107715362f6c9ba9056fcfc6e769d28fc4a1c72bbaff4a24834
10:blkio:/docker/dcf3a506de453107715362f6c9ba9056fcfc6e769d28fc4a1c72bbaff4a24834
9:perf_event:/docker/dcf3a506de453107715362f6c9ba9056fcfc6e769d28fc4a1c72bbaff4a24834
[…]
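The same lookup can be sketched in Python by parsing the text of /proc/PID/cgroup. Like the grep above, this only works for Docker with this cgroup v1 layout (paths of the form /docker/<container ID>); the function name is made up for this example.

```python
# Sketch: map a host PID's /proc/PID/cgroup contents to a Docker container ID.
# Assumes the Docker + cgroup v1 layout shown above: "N:controller:/docker/<id>".

def container_id_from_cgroup(cgroup_text):
    """Return the Docker container ID found in the cgroup paths, or None."""
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)       # hierarchy ID, controllers, path
        if len(parts) != 3:
            continue
        segments = parts[2].strip("/").split("/")
        if "docker" in segments:
            idx = segments.index("docker")
            if idx + 1 < len(segments):
                return segments[idx + 1]
    return None
```

On a host, the input would come from reading /proc/28321/cgroup; a process outside any Docker container yields None.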
36. nsenter Wrapping
# nsenter -t 28321 -u hostname
titus-1392192-worker-14-16
… what hostname is PID 28321 running on?
• Can namespace enter:
• -m: mount -u: uts -i: ipc -n: net -p: pid -U: user
• Bypasses cgroup limits, and seccomp profile (allowing syscalls)
• For Docker, you can enter the container more completely with: docker exec -it CID command
• Handy nsenter one-liners:
• nsenter -t PID -u hostname       (container hostname)
• nsenter -t PID -n netstat -i     (container netstat)
• nsenter -t PID -m -p df -h       (container file system usage)
• nsenter -t PID -p top            (container top)
37. nsenter: Host -> Container top
# nsenter -t 28321 -m -p top
top - 18:16:13 up 36 days, 20:28, 0 users, load average: 5.66, 5.29, 5.28
Tasks: 6 total, 1 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 30.5 us, 1.7 sy, 0.0 ni, 65.9 id, 0.0 wa, 0.0 hi, 1.8 si, 0.1 st
KiB Mem: 65958552 total, 54664124 used, 11294428 free, 164232 buffers
KiB Swap: 0 total, 0 used, 0 free. 1592372 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
301 root 20 0 33.127g 0.023t 37564 S 537.3 38.2 40269:41 java
1 root 20 0 21404 2236 1812 S 0.0 0.0 4:15.11 bash
87888 root 20 0 21464 1720 1348 R 0.0 0.0 0:00.00 top
… Given PID 28321, running top for its container by entering its namespaces:
Note that it is PID 301 in the container. Can also see this using:
# grep NSpid /proc/28321/status
NSpid: 28321 301
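A small sketch of the same NSpid lookup in Python, for use in scripts that need to translate host PIDs to container PIDs. It parses the text of /proc/PID/status; the first value is the PID in the outermost (host) namespace and the last is the PID in the innermost namespace. The function name is hypothetical.

```python
# Sketch: read the NSpid line from /proc/PID/status text.

def nspid(status_text):
    """Return the PIDs of a task in each nested PID namespace, outermost first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(v) for v in line.split()[1:]]
    return []   # older kernels do not report NSpid

pids = nspid("Name:\tjava\nNSpid:\t28321\t301\n")
host_pid, container_pid = pids[0], pids[-1]
```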
38. perf: CPU Profiling
# perf record -F 49 -a -g -- sleep 30
# perf script
Failed to open /lib/x86_64-linux-gnu/libc-2.19.so, continuing without symbols
Failed to open /tmp/perf-28321.map, continuing without symbols
Can run system-wide (-a), match a pid (-p), or cgroup (-G, if it works)
• Current symbol translation gotchas (up to 4.10-ish):
• perf can't find /tmp/perf-PID.map files in the host, and the PID is different
• perf can't find container binaries under host paths (what /usr/bin/java?)
• Can copy files to the host, map PIDs, then run perf script/report:
• http://blog.alicegoldfuss.com/making-flamegraphs-with-containerized-java/
• http://batey.info/docker-jvm-flamegraphs.html
• Can nsenter (-m -u -i -n -p) a "power" shell, and then run "perf -p PID"
• perf should be fixed to be namespace aware (like bcc was, PR#1051)
39. • See previous slide for getting perf symbols to work
• From the host, can study all containers, as well as container overheads
CPU Flame Graphs
git clone --depth 1 https://github.com/brendangregg/FlameGraph
cd FlameGraph
perf record -F 49 -a -g -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
Flame graph annotations:
• Java, missing stacks (need -XX:+PreserveFramePointer)
• Kernel TCP/IP stack
• Look in areas like this to find and quantify overhead (cgroup throttles, FS layers, networking, etc). It's likely small and hard to find.
40. /sys/fs/cgroup (raw)
# cd /sys/fs/cgroup/cpu,cpuacct/docker/02a7cf65f82e3f3e75283944caa4462e82f8f6ff5a7c9a...
# ls
cgroup.clone_children cpuacct.usage_all cpuacct.usage_sys cpu.shares
cgroup.procs cpuacct.usage_percpu cpuacct.usage_user cpu.stat
cpuacct.stat cpuacct.usage_percpu_sys cpu.cfs_period_us notify_on_release
cpuacct.usage cpuacct.usage_percpu_user cpu.cfs_quota_us tasks
# cat cpuacct.usage
1615816262506
# cat cpu.stat
nr_periods 507
nr_throttled 74
throttled_time 3816445175
The best source for per-cgroup metrics. e.g. CPU:
• https://www.kernel.org/doc/Documentation/cgroup-v1/, ../scheduler/sched-bwc.txt
• https://blog.docker.com/2013/10/gathering-lxc-docker-containers-metrics/
Note: grep cgroup /proc/mounts to check where these are mounted
These metrics should be included in performance monitoring GUIs
throttled_time: total time throttled (nanoseconds). Saturation metric.
average throttle time = throttled_time / nr_throttled
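As a sketch of how a monitoring agent might use these counters, the following parses cpu.stat text and computes the average throttle time from the formula above. The sample values are the ones shown in this slide's output; the function names are invented for the example.

```python
# Sketch: parse cgroup v1 cpu.stat and derive the average throttle time.

def parse_cpu_stat(text):
    """Turn 'key value' lines from cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats

def avg_throttle_ms(stats):
    """throttled_time / nr_throttled, converted from nanoseconds to milliseconds."""
    if stats.get("nr_throttled", 0) == 0:
        return 0.0
    return stats["throttled_time"] / stats["nr_throttled"] / 1e6

sample = "nr_periods 507\nnr_throttled 74\nthrottled_time 3816445175\n"
stats = parse_cpu_stat(sample)
average_ms = avg_throttle_ms(stats)   # roughly 51.6 ms per throttle event
```

Because the counters are cumulative, a collector would normally sample them periodically and report the deltas.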
43. A metric collector used by monitoring GUIs
https://github.com/intelsdi-x/snap
Has a Docker plugin to read cgroup stats
There's also a collectd plugin:
https://github.com/bobrik/collectd-docker
Intel snap
45. Game Scenario 1
Container user claims they have a CPU performance issue
• Container has a CPU cap and CPU shares configured
• There is idle CPU on the host
• Other tenants are CPU busy
• /sys/fs/cgroup/.../cpu.stat -> throttled_time is increasing
• /proc/PID/status nonvoluntary_ctxt_switches is increasing
• Container CPU usage equals its cap (clue: this is not really a clue)
46. Game Scenario 2
Container user claims they have a CPU performance issue
• Container has a CPU cap and CPU shares configured
• There is no idle CPU on the host
• Other tenants are CPU busy
• /sys/fs/cgroup/.../cpu.stat -> throttled_time is not increasing
• /proc/PID/status nonvoluntary_ctxt_switches is increasing
47. Game Scenario 3
Container user claims they have a CPU performance issue
• Container has CPU shares configured
• There is no idle CPU on the host
• Other tenants are CPU busy
• /sys/fs/cgroup/.../cpu.stat -> throttled_time is not increasing
• /proc/PID/status nonvoluntary_ctxt_switches is not increasing much
Experiments to confirm conclusion?
48. Methodology: Reverse Diagnosis
Enumerate possible outcomes, and work backwards to the metrics needed for diagnosis.
e.g. CPU performance outcomes:
A. physical CPU throttled
B. cap throttled
C. shares throttled (assumes physical CPU limited as well)
D. not throttled
Game answers: 1. B, 2. C, 3. D
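One way to picture the reverse diagnosis is as a small decision function over the metrics used in the game scenarios. This is only a sketch: the ordering of the checks is an interpretation of the three scenarios, and the rule for outcome A (physical CPU throttled when the host has no idle CPU and no other tenants are busy) is an assumption that the scenarios do not exercise.

```python
# Sketch: work backwards from metrics to one of the outcomes A-D.
# The branch order and the rule for outcome "A" are assumptions.

def diagnose_cpu(throttled_time_increasing,
                 nonvoluntary_switches_increasing,
                 host_has_idle_cpu,
                 other_tenants_busy):
    if throttled_time_increasing:
        return "B"   # cap throttled
    if not nonvoluntary_switches_increasing:
        return "D"   # not throttled
    if host_has_idle_cpu:
        return "D"   # not throttled (worth digging further)
    if other_tenants_busy:
        return "C"   # shares throttled
    return "A"       # physical CPU throttled (assumed rule)

scenario_1 = diagnose_cpu(True, True, True, True)
scenario_2 = diagnose_cpu(False, True, False, True)
scenario_3 = diagnose_cpu(False, False, False, True)
```

Each input maps to a metric from the scenarios: cpu.stat throttled_time, nonvoluntary_ctxt_switches in /proc/PID/status, host idle CPU, and whether other tenants are CPU busy.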
51. • Some resource metrics are for the container, some for the host. Confusing!
• May lack system capabilities or syscalls to run profilers and tracers
Guest Analysis Challenges
52. CPU
container# uptime
20:17:19 up 45 days, 21:21, 0 users, load average: 5.08, 3.69, 2.22
container# mpstat 1
Linux 4.9.0 (02a7cf65f82e) 04/14/17 _x86_64_ (8 CPU)
20:17:26 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
20:17:27 all 51.00 0.00 12.28 0.00 0.00 0.00 0.00 0.00 0.00 36.72
20:17:28 all 50.88 0.00 12.31 0.00 0.00 0.00 0.00 0.00 0.00 36.81
^C
Average: all 50.94 0.00 12.30 0.00 0.00 0.00 0.00 0.00 0.00 36.76
container# pidstat 1
Linux 4.9.0 (02a7cf65f82e) 04/14/17 _x86_64_ (8 CPU)
20:17:33 UID PID %usr %system %guest %CPU CPU Command
20:17:34 UID PID %usr %system %guest %CPU CPU Command
20:17:35 UID PID %usr %system %guest %CPU CPU Command
[...]
Can see host's CPU devices, but only container (pid namespace) processes:
Annotations:
• load!
• busy CPUs
• but this container is running nothing (we saw CPU usage from neighbors)
53. Memory
container# free -m
total used free shared buff/cache available
Mem: 15040 1019 8381 153 5639 14155
Swap: 0 0 0
container# perl -e '$a = "A" x 1_000_000_000'
Killed
Can see host's memory: free reports host memory (this container is --memory=1g).
The perl one-liner tries to consume ~2 Gbytes and is killed.
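An application that wants to size itself correctly could compare the host total with its cgroup limit instead of trusting free. The sketch below assumes cgroup v1, where the limit is exposed in a file named memory.limit_in_bytes under the memory controller; where that file is mounted inside a given container is not covered here, and the sample numbers are illustrative.

```python
# Sketch: memory actually available to a container = min(host total, cgroup limit).
# Inputs are the text of /proc/meminfo and of memory.limit_in_bytes (cgroup v1).

def effective_memory_bytes(meminfo_text, cgroup_limit_text):
    host_total = None
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            host_total = int(line.split()[1]) * 1024   # value is reported in kB
            break
    if host_total is None:
        raise ValueError("MemTotal not found")
    # An unlimited cgroup reports a very large number, so min() picks the host total.
    return min(host_total, int(cgroup_limit_text.strip()))

meminfo = "MemTotal:       15400960 kB\nMemFree:         8582144 kB\n"
limited = effective_memory_bytes(meminfo, "1073741824\n")            # --memory=1g
unlimited = effective_memory_bytes(meminfo, "9223372036854771712\n")
```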
56. This confuses apps too: trying to bind on all CPUs, or using 25% of memory
• Including the JDK, which is unaware of container limits, covered yesterday by Fabiane Nardon
We could add a "metrics" namespace so the container only sees itself
• Or enhance existing namespaces to do this
If you add a metrics namespace, please consider adding an option for:
• /proc/host/stats: maps to host's /proc/stat, for CPU stats
• /proc/host/diskstats: maps to host's /proc/diskstats, for disk stats
As those host metrics can be useful, to identify/exonerate neighbor issues
Metrics Namespace
57. perf: CPU Profiling
container# ./perf record -F 99 -a -g -- sleep 10
perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected error 1 (Operation not permitted)
perf_event_open(..., 0) failed unexpectedly with error 1 (Operation not permitted)
Error: You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
The current value is 2:
-1: Allow use of (almost) all events by all users
>= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
Needs capabilities to run from a container:
Helpful message
Although, after setting perf_event_paranoid to -1, it prints the same error...
58. perf & Container Debugging
host# strace -fp 26450
[...]
[pid 27426] perf_event_open(0x2bfe498, -1, 0, -1, 0) = -1 EPERM (Operation not permitted)
[pid 27426] perf_event_open(0x2bfe498, -1, 0, -1, 0) = -1 EPERM (Operation not permitted)
[pid 27426] perf_event_open(0x2bfc1a8, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EPERM (Operation not permitted)
Debugging using strace from the host (as ptrace() is also blocked):
bash PID, from which I then ran perf
Many different ways to debug this.
https://docs.docker.com/engine/security/seccomp/#significant-syscalls-blocked-by-the-default-profile:
…
…
59. perf, cont.
• Can enable perf_event_open() with: docker run --cap-add sys_admin
• Also need (for kernel symbols): echo 0 > /proc/sys/kernel/kptr_restrict
• perf then "works", and you can make flame graphs. But it sees all CPUs!?
• perf needs to be "container aware", and only see the container's tasks.
patch pending: https://lkml.org/lkml/2017/1/12/308
• Currently easier to run perf from the host (or secure "monitoring" container)
• Via a secure monitoring agent,
e.g. Netflix Vector -> CPU Flame Graph
• See earlier slides for steps
60. Advanced Analysis
… a few more examples
(iosnoop, zfsslower, and btrfsdist shown earlier)
5. Tracing
62. ftrace: Overlay FS Function Calls
# funccount '*ovl*'
Tracing "*ovl*"... Ctrl-C to end.
^C
FUNC COUNT
ovl_cache_free 3
ovl_xattr_get 3
[...]
ovl_fill_merge 339
ovl_path_real 617
ovl_path_upper 777
ovl_update_time 777
ovl_permission 1408
ovl_d_real 1434
ovl_override_creds 1804
Ending tracing...
Using ftrace via my perf-tools to count function calls in-kernel context:
Each can be a target for further study with kprobes
63. ftrace: Overlay FS Function Tracing
# kprobe -s 'p:ovl_fill_merge ctx=%di name=+0(%si):string'
Tracing kprobe ovl_fill_merge. Ctrl-C to end.
bash-16633 [000] d... 14390771.218973: ovl_fill_merge: (ovl_fill_merge+0x0/0x1f0
[overlay]) ctx=0xffffc90042477db0 name="iostat"
bash-16633 [000] d... 14390771.218981: <stack trace>
=> ovl_fill_merge
=> ext4_readdir
=> iterate_dir
=> ovl_dir_read_merged
=> ovl_iterate
=> iterate_dir
=> SyS_getdents
=> do_syscall_64
=> return_from_SYSCALL_64
[…]
Using kprobe (perf-tools) to trace ovl_fill_merge() args and stack trace
Good for debugging, although dumping all events can cost too much overhead. ftrace has some solutions to this, BPF has more…
68. BPF: Namespace-ing Tools
Walking from the task_struct to the PID namespace ID:
task_struct->nsproxy->pid_ns_for_children->ns.inum
• This is unstable, and could break between kernel versions. If it becomes a problem, we'll add a bpf_get_current_pidns()
• Does need a *task, or bpf_get_current_task() (added in 4.8)
• Can also pull out cgroups, but gets trickier…
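For comparison, the namespace ID can also be read from user space without BPF: the /proc/PID/ns/* symlinks shown earlier have targets of the form type:[inode]. The sketch below parses that form. It is a user-space alternative, not the in-kernel walk described above, and it typically (not necessarily always) reports the same number, since it reads the task's own PID namespace rather than pid_ns_for_children.

```python
# Sketch: get a namespace ID from a /proc/PID/ns/* symlink target.
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")

def parse_ns_link(target):
    """Split a link target like 'ipc:[4026533354]' into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group(1), int(match.group(2))

def pid_namespace_inum(pid="self"):
    """PID namespace ID of a process, read from /proc (Linux only)."""
    return parse_ns_link(os.readlink("/proc/%s/ns/pid" % pid))[1]
```

Two processes are in the same PID namespace when pid_namespace_inum() returns the same value for both.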
70. Docker Analysis & Debugging
If needed, dockerd can also be analyzed using:
• go execution tracer
• GODEBUG with gctrace and schedtrace
• gdb and Go runtime support
• perf profiling
• bcc/BPF and uprobes
Each has pros/cons. bcc/BPF can trace user & kernel events.
71. BPF: dockerd Go Function Counting
# funccount '/usr/bin/dockerd:*docker*get*'
Tracing 463 functions for "/usr/bin/dockerd:*docker*get*"... Hit Ctrl-C to end.
^C
FUNC COUNT
github.com/docker/docker/daemon.(*statsCollector).getSystemCPUUsage 3
github.com/docker/docker/daemon.(*Daemon).getNetworkSandboxID 3
github.com/docker/docker/daemon.(*Daemon).getNetworkStats 3
github.com/docker/docker/daemon.(*statsCollector).getSystemCPUUsage.func1 3
github.com/docker/docker/pkg/ioutils.getBuffer 6
github.com/docker/docker/vendor/golang.org/x/net/trace.getBucket 9
github.com/docker/docker/vendor/golang.org/x/net/trace.getFamily 9
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).getTransport 10
github.com/docker/docker/vendor/github.com/golang/protobuf/proto.getbase 20
github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*http2Client).getStream 30
Detaching...
# objdump -tTj .text /usr/bin/dockerd | wc -l
35859
Counting dockerd Go calls in-kernel using BPF that match "*docker*get":
35,859 functions can be traced!
Uses uprobes, and needs newer kernels. Warning: will cost overhead at high function rates.
72. BPF: dockerd Go Stack Tracing
# stackcount 'p:/usr/bin/dockerd:*/ioutils.getBuffer'
Tracing 1 functions for "p:/usr/bin/dockerd:*/ioutils.getBuffer"... Hit Ctrl-C to end.
^C
github.com/docker/docker/pkg/ioutils.getBuffer
github.com/docker/docker/pkg/broadcaster.(*Unbuffered).Write
bufio.(*Reader).writeBuf
bufio.(*Reader).WriteTo
io.copyBuffer
io.Copy
github.com/docker/docker/pkg/pools.Copy
github.com/docker/docker/container/stream.(*Config).CopyToPipe.func1.1
runtime.goexit
dockerd [18176]
110
Detaching...
Counting stack traces that led to this ioutils.getBuffer() call:
means this stack was seen 110 times
Can also trace function arguments, and latency (with some work)
http://www.brendangregg.com/blog/2017-01-31/golang-bcc-bpf-function-tracing.html
73. Identify bottlenecks:
1. In the host vs container, using system metrics
2. In application code on containers, using CPU flame graphs
3. Deeper in the kernel, using tracing tools
Summary