The thredUP team shares key learnings from its post-migration processes: which technologies and solutions worked best, and where time went into troubleshooting and improvement. In particular, the talk focuses on the development and staging experience, user authentication, cloud-native CI pipelines, application telemetry, and service mesh. It also covers Kubernetes security hardening, autoscaling, and how new services are created within the infrastructure.
While Go is the language of choice in the cloud-native world, Python has a huge community and makes it really easy to extend Kubernetes in only a few lines of code.
This talk shows examples of how to use Python to query the Kubernetes API, how to write simple controllers in only 10 lines of Python, how to build complete web UIs, and how to test everything with pytest and kind.
Some of the open-source projects which will be covered: pykube-ng, Kubernetes Web View, kube-janitor, and Kopf (Kubernetes Operator Pythonic Framework).
Talk held in Prague on 2019-09-05:
https://www.meetup.com/Cloud-Native-Prague/events/263802447/
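The "controller in 10 lines of Python" idea boils down to a reconcile loop: compare desired state against observed state and emit the actions that close the gap. A minimal stdlib sketch of that loop follows; a real controller would watch the API server through a client such as pykube-ng, and the pod names here are hypothetical stand-ins.

```python
# Sketch of a tiny reconcile loop in the spirit of a "10-line controller".
# A real controller would use a Kubernetes client (e.g. pykube-ng) to watch
# the API server; the resource names below are made up for illustration.

def reconcile(desired, observed):
    """Return the actions needed to move `observed` toward `desired`."""
    to_create = sorted(set(desired) - set(observed))
    to_delete = sorted(set(observed) - set(desired))
    return ([("create", name) for name in to_create]
            + [("delete", name) for name in to_delete])

desired_pods = {"web-1", "web-2", "web-3"}
observed_pods = {"web-1", "web-4"}
actions = reconcile(desired_pods, observed_pods)
print(actions)  # create web-2 and web-3, delete web-4
```

Running the loop repeatedly against fresh observations is what makes the pattern self-correcting: any drift shows up as a new diff on the next pass.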
This document discusses previewing an application across different terminals and operating systems by using a common preview image. It also discusses load testing the application using Vegeta and monitoring CPU usage with Kubernetes horizontal pod autoscaling and the Google Cloud Monitoring API.
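The horizontal pod autoscaler's core rule is documented as desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured bounds. A small sketch of that arithmetic (the CPU numbers and bounds are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                         min_replicas=1, max_replicas=10):
    # Core HPA rule: desired = ceil(current * currentMetric / targetMetric),
    # clamped to the configured min/max replica bounds.
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

# Under a load-test spike, average CPU jumps to 180% of the request
# against a 60% target: 3 pods -> ceil(3 * 180 / 60) = 9 pods.
print(hpa_desired_replicas(3, 180, 60))  # 9
```

This is why a sustained Vegeta load run drives replica counts up in proportion to how far CPU usage overshoots the target.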
Kubernetes Failure Stories, or: How to Crash Your Cluster - ContainerDays EU ... - Henning Jacobs
Bootstrapping a Kubernetes cluster is easy; rolling it out to nearly 200 engineering teams and operating it at scale is a challenge. In this talk, we present our approach to Kubernetes provisioning on AWS, operations, and developer experience for our growing Zalando developer base. We will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports, and general observations. Our failure stories are sourced from recent and past incidents, so the talk is up-to-date with our latest experiences.
Advanced Task Scheduling with Amazon ECS (June 2017) - Julien SIMON
This document provides an overview of advanced task scheduling capabilities with Amazon ECS. It discusses the ECS placement engine which gives developers more control over task placement through constraints and strategies. Constraints allow targeting specific instance types, availability zones or custom attributes. Strategies like spread, binpack, and affinity can distribute tasks across instances. The document demonstrates how to use these features to optimize task placement and provides examples of companies using ECS for production workloads.
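The spread/binpack distinction can be captured in a few lines: binpack favors the fullest instance that still fits (dense packing, fewer hosts), spread favors the emptiest (even distribution). This sketch uses made-up instance names and capacities; real ECS evaluates placement constraints first, then applies strategies like these.

```python
# Illustrative sketch of two ECS-style placement strategies.
# Instance names and CPU capacities are hypothetical.

def place_binpack(instances, task_cpu):
    """Pick the instance with the LEAST remaining CPU that still fits."""
    fitting = [i for i in instances if i["free_cpu"] >= task_cpu]
    return min(fitting, key=lambda i: i["free_cpu"])["name"] if fitting else None

def place_spread(instances, task_cpu):
    """Pick the instance with the MOST remaining CPU (even distribution)."""
    fitting = [i for i in instances if i["free_cpu"] >= task_cpu]
    return max(fitting, key=lambda i: i["free_cpu"])["name"] if fitting else None

fleet = [{"name": "i-a", "free_cpu": 512},
         {"name": "i-b", "free_cpu": 2048},
         {"name": "i-c", "free_cpu": 256}]
print(place_binpack(fleet, 256))  # i-c: densest packing
print(place_spread(fleet, 256))   # i-b: most headroom
```

Binpack minimizes the number of instances in use (good for cost with Spot or scale-in), while spread reduces blast radius when an instance fails.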
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent - Henning Jacobs
Many clusters, many problems? Having many clusters has benefits: reduced blast radius, less vertical scaling of cluster components, and a natural trust boundary. In this session, Zalando shows its approach for running 140+ clusters on AWS, how it does continuous delivery for its cluster infrastructure, and how it created open-source tooling to manage cost efficiency and improve developer experience. The company openly shares its failures and the learnings collected during three years of Kubernetes in production.
AWS re:Invent session OPN211 on 2019-12-05
KubeCon EU 2016: Kubernetes and the Potential for Higher Level Interfaces - KubeAcademy
Kubernetes provides rock-solid APIs for building and running your distributed systems. Pods, Services and ReplicationControllers provide trustworthy and scalable abstractions that make solving real-world infrastructure problems simpler. But that doesn’t mean interacting with those low-level primitives will be the only option for developers and operators.
Sched Link: http://sched.co/67dA
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... - Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
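Resource "slack" is simply the gap between what pods request and what they actually use. A minimal sketch of that computation follows; the numbers are illustrative, and Zalando's actual tooling (e.g. kube-resource-report) derives them from live metrics.

```python
# Minimal sketch of the "slack" idea: requested-but-unused CPU.
# Pod names and millicore figures are hypothetical.

def cpu_slack(pods):
    """Total requested CPU (millicores) minus usage, never negative per pod."""
    return sum(max(0, p["request_m"] - p["usage_m"]) for p in pods)

pods = [
    {"name": "api",    "request_m": 1000, "usage_m": 150},
    {"name": "worker", "request_m": 500,  "usage_m": 480},
    {"name": "cron",   "request_m": 200,  "usage_m": 20},
]
print(cpu_slack(pods))  # 850 + 20 + 180 = 1050 millicores of slack
```

Summed across a cluster, slack translates directly into nodes that are paid for but idle, which is why shrinking it is the first lever for cost-efficiency.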
KubeCon EU 2016: Getting the Jobs Done With Kubernetes - KubeAcademy
When you hear words such as Kubernetes or OpenShift, you immediately start thinking about long-running processes you can easily scale at will. However, Kubernetes includes a lesser-known feature which allows you to run pretty much anything, from simple tasks up to highly complicated ones.
During this presentation, the author of the Job resource in Kubernetes will guide you through several techniques for performing anything ranging from simple Pi calculations to rendering a movie. No matter if you're a data scientist running large-scale calculations across several data centers or a hobby programmer running simple day-to-day tasks, this presentation will teach you how to efficiently use Kubernetes Jobs on their own or as the building blocks of something bigger.
This presentation will feature a number of live demos to help illustrate the various ways that you can put Jobs to work. Don’t miss out on learning about one of the coolest features of Kubernetes!
Sched Link: http://sched.co/6BUw
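The key Job knobs are `completions` (how many successful runs are needed) and `parallelism` (how many pods may run at once). A hedged sketch building such a manifest as a Python dict, using the classic Pi-calculation example; the names and image are placeholders:

```python
# Sketch of a batch/v1 Job manifest built as a plain dict: a fixed number
# of completions processed with bounded parallelism. Name and image are
# illustrative placeholders.

def make_job(name, image, command, completions, parallelism):
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "completions": completions,
            "parallelism": parallelism,
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image,
                                    "command": command}],
                    "restartPolicy": "Never",
                },
            },
        },
    }

job = make_job("pi", "perl:5.34",
               ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"],
               completions=5, parallelism=2)
print(job["spec"]["parallelism"])  # at most 2 pods run concurrently
```

With `completions: 5` and `parallelism: 2`, the controller keeps two pods running until five have succeeded, which is the basic building block for the larger work-queue patterns the talk covers.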
Why I love Kubernetes Failure Stories and you should too - GOTO Berlin - Henning Jacobs
Talk held on 2019-10-24 at GOTO Berlin:
Everybody loves failure stories, but maybe for the wrong reasons: Schadenfreude and Internet comment threads are the dark side; continuous improvement through blameless postmortems, sharing incidents, and documenting learnings is what motivated me to compile the list of Kubernetes Failure Stories. Kubernetes gives us an infrastructure platform to talk in the same "language" and foster collaboration across organizations. In this talk, I will walk you through our horror stories of operating 100+ clusters and share the insights we gained from incidents, failures, user reports, and general observations. I will highlight why Kubernetes makes sense despite its perceived complexity. Our failure stories are sourced from recent and past incidents, so the talk is up-to-date with our latest experiences.
https://gotober.com/2019/sessions/1129/why-i-love-kubernetes-failure-stories-and-you-should-too
BlaBlaCar is moving its infrastructure to be fully containerized using rkt and CoreOS. Key tools developed include dgr for building and running containers and ggn for managing services on fleet clusters. Service discovery is handled by go-nerve and go-synapse which monitor services in Zookeeper. The infrastructure aims for standardization, simplicity and removing unique configurations ("snowflakes"). Over 300 servers across multiple data centers are now managed this way.
Vikram Hosakote gave a presentation on using the Bullseye code coverage tool to generate code coverage numbers in Cisco NXOS. Bullseye is used to capture coverage of C and C++ code and provide a ratio of tested vs total lines of code. The presentation covered building a Bullseye NXOS image, running tests to generate coverage files, processing the files on a Linux server, and viewing coverage reports in Bullseye's GUI or merged across devices. Automation ideas and integration with eARMS testing were also discussed.
KubeCon EU 2016: Using Traffic Control to Test Apps in Kubernetes - KubeAcademy
Testing applications is important, as shown by the rise of continuous integration and automated testing. In this talk, I will focus on one area of testing that is difficult to automate: poor network connectivity. Developers usually work within reliable networking conditions so they might not notice issues that arise in other networking conditions. I will give examples of software that would benefit from test scenarios with varying connectivity. I will explain how traffic control on Linux can help to simulate various network connectivity. Finally, I will run a demo showing how an application running in Kubernetes behaves when changing network parameters.
Sched Link: http://sched.co/6Bb3
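Linux traffic control simulates degraded networks through the `netem` qdisc, e.g. adding latency and packet loss to an interface. A small sketch that assembles such a command (interface name and values are examples; actually applying the qdisc requires root on a Linux host):

```python
# Sketch of the tc/netem command used to simulate poor connectivity:
# `tc qdisc add dev <if> root netem delay <ms> loss <pct>`.
# Interface and parameter values are illustrative.

def netem_cmd(dev, delay_ms=None, loss_pct=None):
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return " ".join(cmd)

print(netem_cmd("eth0", delay_ms=100, loss_pct=1))
# tc qdisc add dev eth0 root netem delay 100ms loss 1%
```

Deleting the qdisc afterwards (`tc qdisc del dev eth0 root`) restores normal connectivity, so such commands can bracket an automated test run.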
This document summarizes Jean-Frederic Clere's presentation on moving a Tomcat cluster to the cloud. It discusses session replication in Tomcat clusters and challenges in the cloud like lack of multicast. It introduces solutions like KUBEPing and DNSPing that enable peer discovery through the Kubernetes API and DNS lookups. The presentation demonstrates these solutions in Katacoda tutorials and shows an operator that automates deployment. It aims to make Tomcat highly available in cloud environments like Kubernetes.
KubeCon EU 2016: A Practical Guide to Container Scheduling - KubeAcademy
Containers are at the forefront of a new wave of technology innovation, but the methods for scheduling and managing them are still new to most developers. In this talk we'll look at the kind of problems that container scheduling solves and at how maximising efficiency and maximising QoS don't have to be exclusive goals. We'll take a behind-the-scenes look at the Kubernetes scheduler: How does it prioritize? What about node selection and external dependencies? How do you schedule based on your own specific needs? How does it scale, and what's in it both for developers already using containers and for those that aren't? We'll use a combination of slides, code, and demos to answer all these questions and hopefully all of yours.
Sched Link: http://sched.co/6BZa
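The scheduler's two-phase shape (filter infeasible nodes, then score the rest) can be shown in miniature. This toy version uses simplified node data and a single "most free CPU" score rather than the real predicates and priorities:

```python
# Toy two-phase scheduler: filter nodes that cannot run the pod
# (capacity + label selector), then score the survivors. Node data and
# the scoring rule are simplified illustrations.

def schedule(pod, nodes):
    feasible = [n for n in nodes
                if n["free_cpu_m"] >= pod["cpu_m"]
                and pod.get("selector", {}).items() <= n["labels"].items()]
    if not feasible:
        return None
    # Prefer the least-loaded feasible node (a "least requested" style score).
    return max(feasible, key=lambda n: n["free_cpu_m"])["name"]

nodes = [
    {"name": "node-1", "free_cpu_m": 300,  "labels": {"disk": "ssd"}},
    {"name": "node-2", "free_cpu_m": 1500, "labels": {"disk": "ssd"}},
    {"name": "node-3", "free_cpu_m": 4000, "labels": {"disk": "hdd"}},
]
pod = {"cpu_m": 500, "selector": {"disk": "ssd"}}
print(schedule(pod, nodes))  # node-2: node-1 too small, node-3 filtered out
```

Custom scheduling needs usually come down to swapping in different filters (constraints) or a different scoring function (preferences), which is exactly the split the real scheduler exposes.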
Leveraging the Power of containerd Events - Evan Hazlett - Docker, Inc.
containerd provides the low-level functionality that enables the Docker Engine to run containers. containerd events provide a simple, yet powerful mechanism to integrate with virtually any other system with minimal effort. This talk will cover what containerd events are and how to use them for integration with systems ranging from monitoring and logging to container networking using CNI (Container Network Interface) plugins.
This document provides an overview and agenda for a presentation on Nomad, an open source cluster scheduler created by HashiCorp. The presentation will cover Nomad fundamentals including architecture, job configuration, and scheduling. It will also demonstrate Nomad's ability to schedule a million containers across thousands of hosts on Google Cloud Platform.
KubeCon EU 2016: Creating an Advanced Load Balancing Solution for Kubernetes ... - KubeAcademy
Load balancing is an important part of any resilient web application. Kubernetes supports a few options for external load balancing, but they are limited in features. After a brief discussion of those options and the features they lack, we’ll show how to build an advanced load balancing solution for Kubernetes on top of NGINX, utilizing Kubernetes features including Ingress, Annotations, and ConfigMap. We’ll conclude with a demo of how to use NGINX and NGINX Plus to expose services to the Internet.
Sched Link: http://sched.co/6Bc9
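The Ingress-plus-Annotations combination looks roughly like the manifest below, built here as a Python dict. Note this uses the current `networking.k8s.io/v1` schema rather than the `extensions/v1beta1` form from the 2016 talk, and the host, names, and annotation value are illustrative:

```python
# Hedged sketch of an Ingress resource: host/path routing to a Service,
# plus an NGINX-ingress-specific annotation tweaking proxy behavior.
# Host, names, and the annotation value are placeholders.

ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {
        "name": "webapp",
        "annotations": {"nginx.ingress.kubernetes.io/proxy-body-size": "8m"},
    },
    "spec": {
        "rules": [{
            "host": "app.example.com",
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": "webapp",
                                        "port": {"number": 80}}},
            }]},
        }],
    },
}
print(ingress["spec"]["rules"][0]["host"])
```

Annotations carry the NGINX-specific features the core Ingress spec lacks, while a ConfigMap holds controller-wide settings; that layering is what makes the approach in the talk extensible.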
Managing GCP Projects with Terraform (devfest Pisa 2018) - Giovanni Toraldo
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. It uses a declarative configuration file to describe infrastructure and allows incremental changes through a plan and apply process. The document provides an overview of Terraform and demonstrates how to set up a Google Cloud Platform project and deploy a virtual machine instance on GCP using Terraform. It also shows how to output the instance's IP address, upgrade the instance's machine type, attach additional disks, and manage multiple instances with disks using variables and counts.
This talk introduces the basic concept of load balancing, common load-balancing implementations, and the details of the Kubernetes Service. Finally, it demonstrates how to modify the Linux iptables kernel module to perform layer-7 load balancing for Kubernetes.
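For context on the Service internals the talk covers: kube-proxy's iptables mode spreads connections across N endpoints with the `statistic` match, choosing endpoint i with probability 1/(N-i) so the overall split is uniform. A sketch generating rules in that shape (chain names mimic the `KUBE-SVC-*`/`KUBE-SEP-*` convention but are hypothetical here):

```python
# Sketch of kube-proxy-style iptables rules: endpoint i of N is jumped to
# with probability 1/(N-i); the final rule is unconditional. Overall each
# endpoint receives 1/N of new connections. Chain names are illustrative.

def service_rules(svc_chain, endpoint_chains):
    rules, n = [], len(endpoint_chains)
    for i, sep in enumerate(endpoint_chains):
        rule = f"-A {svc_chain} "
        if i < n - 1:  # all but the last rule roll the dice
            rule += (f"-m statistic --mode random "
                     f"--probability {1 / (n - i):.5f} ")
        rule += f"-j {sep}"
        rules.append(rule)
    return rules

for r in service_rules("KUBE-SVC-EXAMPLE",
                       ["KUBE-SEP-A", "KUBE-SEP-B", "KUBE-SEP-C"]):
    print(r)
```

Because these rules only see layer-3/4 information, anything layer-7 (routing on HTTP paths or headers) requires going beyond stock iptables, which is the gap the talk's kernel-module modification addresses.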
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014 - Amazon Web Services
Tuning your EC2 web server will help you to improve application server throughput and cost-efficiency as well as reduce request latency. In this session we will walk through tactics to identify bottlenecks using tools such as CloudWatch in order to drive the appropriate allocation of EC2 and EBS resources. In addition, we will also be reviewing some performance optimizations and best practices for popular web servers such as Nginx and Apache in order to take advantage of the latest EC2 capabilities.
Helm is a package manager for Kubernetes that makes it easier to deploy and manage Kubernetes applications. It allows you to define, install and upgrade Kubernetes applications known as charts. Helm uses templates to define the characteristics of Kubernetes resources and allows parameterization of things like container images, resource requests and limits. The Helm client interacts with Tiller, the server-side component installed in the Kubernetes cluster, to install and manage releases of charts.
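The templating idea is the heart of Helm: a chart ships resource templates with placeholders, and a values file fills them in per release. This is not Helm itself, just a sketch of the mechanism; Helm uses Go templates (`{{ .Values.image.tag }}`), mimicked here with `str.format`-style keys and made-up values:

```python
# Sketch of Helm-style templating: a resource template with placeholders,
# rendered against a values dictionary. Keys and values are illustrative.

TEMPLATE = """\
image: {image_repo}:{image_tag}
resources:
  requests:
    cpu: {cpu_request}
"""

def render(template, values):
    return template.format(**values)

values = {"image_repo": "nginx", "image_tag": "1.25", "cpu_request": "100m"}
print(render(TEMPLATE, values))
```

Swapping the values dictionary per environment (dev vs. prod) while keeping one template is what makes a chart reusable across installs and upgrades.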
The document describes a presentation about Nomad, an open-source cluster scheduler and workload orchestrator made by HashiCorp. It outlines the steps to deploy and configure a Nomad cluster across multiple datacenters and regions, including initializing a Nomad cluster, creating and running a sample job, extending the cluster to a new datacenter in France, and updating the job configuration. It also demonstrates monitoring the status of the Consul and Nomad services and sample applications running on the cluster.
The document discusses OpenShift security context constraints (SCCs) and how to configure them to allow running a WordPress container. It begins with an overview of SCCs and their purpose in OpenShift for controlling permissions for pods. It then describes issues running the WordPress container under the default "restricted" SCC due to permission errors. The document explores editing the "restricted" SCC, removing capabilities and user restrictions to address the errors. Alternatively, it notes that the more permissive "anyuid" SCC is the standard way to allow the WordPress container to run successfully.
15 kubernetes failure points you should watch - Sysdig
Jorge Salamero discusses 15 failure points to monitor in Kubernetes:
1) Application metrics like connections, response time, and errors
2) Node availability and resource usage of CPU, memory, and disk
3) Ensuring deployments are running the desired number of instances and not experiencing glitches
4) Monitoring pod status, restarts, and the health of the Kubernetes API server and services like KubeDNS
5) Validating Kubernetes configuration changes with tools that monitor deployment commands
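Check 3) above is easy to express in code: compare desired versus available replicas per deployment. A sketch over a hypothetical snapshot rather than a live API call:

```python
# Sketch of a replica-health check: flag deployments whose available
# replica count is below the desired count. Snapshot data is made up.

def unhealthy_deployments(deployments):
    return [d["name"] for d in deployments
            if d["available"] < d["desired"]]

snapshot = [
    {"name": "frontend", "desired": 4, "available": 4},
    {"name": "checkout", "desired": 3, "available": 1},
    {"name": "search",   "desired": 2, "available": 2},
]
print(unhealthy_deployments(snapshot))  # ['checkout']
```

In practice the same comparison, fed by the API server and paired with restart counts, covers several of the pod- and deployment-level failure points in the list.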
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus - OpenStack Korea Community
This document discusses using Prometheus for open infrastructure and cloud monitoring. It introduces Prometheus as a time-series database and monitoring tool. Key features covered include metrics collection, service discovery, graphing, and alerting. The architecture of Prometheus is explained, including scraping metrics directly or via exporters. A demo of Prometheus and Grafana is proposed to monitor Kubernetes clusters and visualize CPU usage. Alerting configuration and routes in Prometheus and Alertmanager are also summarized.
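What Prometheus actually scrapes is plain text in its exposition format: `# HELP`/`# TYPE` comments followed by `name{labels} value` sample lines. A sketch that renders it by hand (real exporters would use a client library; the metric name and values are illustrative):

```python
# Sketch of the Prometheus text exposition format that exporters emit
# and the server scrapes. Metric name, labels, and values are made up.

def render_metric(name, help_text, mtype, samples):
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

print(render_metric(
    "node_cpu_usage_ratio", "CPU usage per node.", "gauge",
    [({"node": "worker-1"}, 0.42), ({"node": "worker-2"}, 0.13)],
))
```

Because the format is this simple, almost anything can become an exporter: serve such text over HTTP and point a scrape job (discovered statically or via service discovery) at it.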
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud - Jung-Hong Kim
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes services expose these units to enable dynamic load balancing while maintaining session affinity. It also provides self-healing capabilities by restarting containers that fail, replacing them, and killing containers that don't respond to their health check.
Cloud-native .NET Microservices mit Kubernetes - QAware GmbH
Mario-Leander Reimer presented on building cloud-native .NET microservices with Kubernetes. He discussed key principles of cloud native applications including designing for distribution, performance, automation, resiliency and elasticity. He also covered containerization with Docker, composing services with Kubernetes and common concepts like deployments, services and probes. Reimer provided examples of Dockerfiles, Kubernetes definitions and using tools like Steeltoe and docker-compose to develop cloud native applications.
This document discusses Docker container networking and publishing applications securely with Docker Enterprise. It provides an overview of key Kubernetes networking concepts like pods, services, ingress and network policies. It then details how Docker Enterprise integrates with Calico for container networking and policy-driven security. The integration provides connectivity between pods and services out of the box. It also allows enforcing network policies and zero-trust security through Calico's policy engine. The document concludes with demos of publishing sample applications using Docker Swarm services and Kubernetes ingress resources.
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on... – Patrick Chanezon
This document provides an overview of developing and deploying Java applications on Azure using Docker. It discusses using Docker to build Java applications, running containers, and deploying stacks. It also covers Docker Enterprise Edition, including subscriptions, certifications, and security features. Finally, it demonstrates using Docker on Azure, such as with Azure Container Service, and shows examples of building, running, and deploying Java applications with Docker.
Capacity planning is a difficult challenge faced by most companies. If you have too few machines, you will not have enough compute resources available to deal with heavy loads. On the other hand, if you have too many machines, you are wasting money. This is why companies have started investing in automatically scaling services and infrastructure to minimize the amount of wasted money and resources.
In this talk, Nathan will describe how Yelp is using PaaSTA, a PaaS built on top of open-source tools including Docker, Mesos, Marathon, and Chronos, to automatically and gracefully scale services and the underlying cluster. He will go into detail about how this functionality was implemented and the design decisions that were made while architecting the system. He will also provide a brief comparison of how this approach differs from existing solutions.
Scaling Docker Containers using Kubernetes and Azure Container Service – Ben Hall
This document discusses scaling Docker containers using Kubernetes and Azure Container Service. It begins with an introduction to containers and Docker, including how containers improve dependency and configuration management. It then demonstrates building and deploying containerized applications using Docker and discusses how to optimize Docker images. Finally, it introduces Kubernetes as a tool for orchestrating containers at scale and provides an example of deploying a containerized application on Kubernetes in Azure.
Drupaljam 2017 - Deploying Drupal 8 onto Hosted Kubernetes in Google Cloud – Dropsolid
In this presentation I explain, using video examples, how Kubernetes works and how it can be used to host your Drupal 7 or 8 site. There are obviously also gotchas, and I'd like to warn you not to use this in production until you've verified it.
This document provides an overview of cloud native applications and the cloud native stack. It discusses key concepts like microservices, containerization, composition using Docker and Docker Compose, and orchestration using Kubernetes. It provides examples of building a simple microservices application with these technologies and deploying it on Kubernetes. Overall it serves as a guide to developing and deploying cloud native applications.
A hitchhiker's guide to the cloud native stack – QAware GmbH
Container Days 2017, Hamburg: talk by Mario-Leander Reimer (@LeanderReimer, chief technologist at QAware).
Abstract: Cloud giants like Google, Twitter and Netflix have open-sourced the core building blocks of their infrastructure. The result of many years of cloud experience is now freely available, and anyone can build their own cloud-native applications – applications that run reliably in the cloud and scale almost arbitrarily. The individual building blocks are growing together into one big whole: the cloud native stack.
In this session we introduce the most important concepts and key technologies, then bring a Spring Cloud based sample application step by step onto Kubernetes and DC/OS, discussing several viable architectural alternatives along the way.
Developer Experience Cloud Native - Become Efficient and Achieve Parity – Michael Hofmann
Efficient cloud development means more than just deploying services to the cloud quickly. Frictionless development and debugging of services directly in the cloud also increases efficiency. Beyond that, the development environment should be as close to identical to the production environment as possible – something point 10 of the twelve-factor app list has long recommended: "Dev/prod parity".
This session presents a selection of open-source tools that help a Java developer achieve the following goals: fast, synchronized deployment (Skaffold), development and debugging in a Kubernetes pod (OpenLiberty with Ksync, Quarkus live coding), and extending the Kubernetes perimeter for local development (telepresence or Bridge to Kubernetes). The accompanying demos show how easy these tools are to use.
Running MongoDB Enterprise on Kubernetes – Ariel Jatib
Video : https://www.youtube.com/watch?v=vmIOCYZRZu4&t=2908s
Slides from Jason Mimick's presentation at the June 2018 Chicago Kubernetes Meetup – video here: https://youtu.be/vmIOCYZRZu4?t=48m28s
Real World Lessons on the Pain Points of Node.JS Applications – Ben Hall
This document provides lessons learned from real world experiences with Node.js applications. It discusses the importance of upgrading to newer Node.js versions for security and features. It also emphasizes the importance of error handling, using promises for async flow control, and scaling applications using Docker containers. Debugging and monitoring Node.js applications for performance is also covered.
Presentation from DockerCon EU '17 about how Aurea achieved over 50% cost reduction using Docker and about two major technical obstacles we had when dockerizing legacy applications.
Since announcing Openshift version 4, deploying a single OpenShift cluster has become pretty simple. However, simple does not mean scalable, especially when you need to deploy tens, hundreds or even thousands of clusters. For example, a cellular company deploying OpenShift on Edge at the base of each of their cell towers. It would be very difficult to try and manage this using the default deployment tool.
Zero Touch Provisioning (ZTP), along with GitOps methodologies, can be leveraged to automate OpenShift deployment in parallel to multiple sites, without human intervention.
ZTP is a component of Open Cluster Management (OCM), an operator that enables a single OCP cluster to manage a fleet of clusters. This functionality uses declarative APIs to enable the configuration of a vast number of OpenShift clusters. ZTP integrates multiple open-source projects: OCM, Hive, Assisted Installer and Metal³.
In this session, you will learn about ZTP architecture and its components. We will discuss the installation flow and how the components interact with each other. We will learn about the possibility of installing in an air-gapped environment (disconnected from the Internet) and finally demonstrate how to install a Single Node Openshift on bare metal using only a few manifests.
"Look Ma, no hands! Zero Touch Provisioning for OpenShift" DevConf.US 2021 – Freddy Rolland
TensorFlow can be installed and run in a distributed environment using Docker. The document discusses setting up TensorFlow workers and parameter servers in Docker containers using a Docker compose file. It demonstrates building Docker images for each role, and configuring the containers to communicate over gRPC. A Jupyter server container is also built to host notebooks. The distributed TensorFlow environment is deployed locally for demonstration purposes. Future directions discussed include running the distributed setup on a native cluster using tools like Docker Swarm or RancherOS, and testing TensorFlow with GPU support in Docker.
Docker for Mac & local developer environment optimization – Radek Baczynski
Docker can be used to optimize a local development environment by providing the same environment as production. Issues with performance on Docker for Mac can be addressed through techniques like using delegated volume mounts, removing xdebug, and using a solution like mutagen that syncs files without mounted volumes for faster performance. Mutagen provides near native performance, easy setup and monitoring, and works with any dockerized application.
1. THE FOLLOWING CONTAINS CONFIDENTIAL INFORMATION.
DO NOT DISTRIBUTE WITHOUT PERMISSION.
Kubernetes Navigation Stories
DevOpsStage 2019
Roman Chepurnyi – Director of Infrastructure Engineering at thredUP / Senior Engineering Manager at Hotwire
Oleksii Asiutin – Staff Software Engineer at thredUP / Senior Software Engineer at Toptal
12. 12
AWS-IAM-Authenticator – kubeconfig generation
[Diagram: engineers authenticate with their own IAM identities (e.g. john-smith for a dev, lara-jones for a dev lead); a kubeconfig generation service maps IAM user groups to per-cluster access (dev, stage, prod) and emits the matching kubeconfig for each role.]
22. 22
Local Development
macbook: Thredup $ git clone git@github.com:thredup/node-proxy.git
Cloning into 'node-proxy'...
...
macbook: Thredup $ cd node-proxy/
macbook: node-proxy (master) $ npm install
added 6 packages from 8 contributors and audited 6 packages in 0.595s
found 0 vulnerabilities
macbook: node-proxy (master) $ npm test
> proxy@1.0.0 test ~/Thredup/node-proxy
...
macbook: node-proxy (master) $ npm start
> proxy@1.0.0 start
> node server.js
23. 23
Local Development with Docker
macbook: Thredup $ docker run -it -v ${PWD}:/app -w /app -p 3000:3000 \
    node:12-alpine sh
/app $ apk add --no-cache mysql-dev
/app $ npm install
/app $ npm test
/app $ npm start
> proxy@1.0.0 start
> node server.js
24. 24
Local Development with Docker Compose
version: "3.7"
services:
  web:
    image: node:12-alpine
    working_dir: /app
    volumes:
      - ./:/app
    ports:
      - "3000:3000"
    environment:
      REDIS_HOST: "redis"
  mysql:
    image: ...
    ...
  redis:
    image: ...
25. 25
Local Development with Docker Compose
macbook: Thredup $ docker-compose up -d
…
macbook: Thredup $ docker-compose exec web sh
/ $ npm install
/ $ npm test
/ $ npm start
> proxy@1.0.0 start
> node server.js
26. 26
Local Development with Docker Compose
And then you need another service as a dependency ;-)
...and another one
…
docker-compose.yaml ~ 330 lines
MySQL DB ~25Gb
30. 30
Horizontal Pod Autoscaling (HPA)
● Do not over-provision
● Be ready for traffic spikes
metrics:
- type: External
  external:
    metricName: trace.rack.request.hits
    metricSelector:
      matchLabels:
        env: production
        service: some-service
    targetAverageValue: 10
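Behind a spec like this, the HPA controller applies a documented proportional rule: the desired replica count is the current count scaled by the ratio of the observed metric to its target. A minimal sketch of that arithmetic (the numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    # Core HPA rule: desired = ceil(current * observed / target)
    return math.ceil(current_replicas * current_value / target_value)

# 4 pods averaging 25 request hits each against a target of 10 -> scale to 10 pods
print(desired_replicas(4, 25, 10))  # 10
# Traffic drops to 5 hits per pod -> scale down to 2 pods
print(desired_replicas(4, 5, 10))  # 2
```

This is why both goals on the slide hold at once: at the target value the ratio is 1 and nothing scales, while a traffic spike grows the ratio and the replica count with it.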
34. 34
Spot instances and AZRebalance
● spot termination works https://github.com/mumoshu/kube-spot-termination-notice-handler
● except when the instance is terminated by AZRebalance (Availability Zone rebalancing)
Terminating EC2 instance: i-0e685dc2a84b65f63
Cause: At 2019-07-18T06:09:59Z instances were launched to balance instances in
zones us-east-1a us-east-1e with other zones resulting in more than desired number of
instances in the group. At 2019-07-18T06:11:30Z an instance was taken out of service
in response to a difference between desired and actual capacity, shrinking the
capacity from 4 to 3. At 2019-07-18T06:11:30Z instance i-0e685dc2a84b65f63 was
selected for termination.
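The spot termination notice handler only sees the two-minute spot interruption warning; terminations triggered by the ASG's AZRebalance process, like the activity above, bypass it entirely. One mitigation is to suspend that process on the group (`aws autoscaling suspend-processes --auto-scaling-group-name <asg> --scaling-processes AZRebalance`); another is at least to detect and alert on such activities. A sketch of the detection half, matching the activity text shown above (the helper name is our own invention):

```python
import re

def is_az_rebalance(activity_cause: str) -> bool:
    # AZRebalance activities describe instances "launched to balance" zones;
    # spot interruptions and plain scale-downs do not use this wording.
    return bool(re.search(r"launched to balance instances in zones", activity_cause))

cause = ("At 2019-07-18T06:09:59Z instances were launched to balance instances "
         "in zones us-east-1a us-east-1e with other zones resulting in more than "
         "desired number of instances in the group.")
print(is_az_rebalance(cause))  # True
print(is_az_rebalance("a user request explicitly set group desired capacity"))  # False
```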
[Roman] Let me introduce Oleksii, a staff engineer at thredUP. Oleksii is an infrastructure enthusiast and a co-organizer of the monthly DevOps digest on dou.ua; he likes sports cars and runs an Instagram account dedicated to cooking.
[Olek] Thank you, Roman. Roman is the Director of our distributed Infrastructure team. I'd say Roman is a leader: he manages us in a way that lets us bring innovation to our company's platform.
Before thredUP, Roman worked at Hotwire, one of the biggest hotel discount aggregators.
He lives in California.
Roman is as confident navigating Kubernetes as he is navigating a sailing boat in San Francisco Bay on weekends. Great to have Roman at the helm; I know it personally.
Switching to case studies. Think about how to do it.
[Olek] In: access Mid: danger of shared root key Out: granular permissions
Okay, that was a brief introduction; now come the navigation stories themselves. As Roman told us, one day you wake up and realize you've migrated your infra to k8s, and yeah, it's cool. But during the migration you cut corners, and now it's probably time to review and fill some gaps.
Lots of us have been in this situation: Hey, Infra team, I need access to a Kubernetes cluster.
Really? What are you going to do there? When we created our k8s clusters, we used a shared admin certificate inside the team.
And in the early stages we also gave it to engineers who asked for access. Okay, here it is, but please use it carefully.
Aha. And then, you know... Guys, checkout is down, where is our checkout service? Guys? Oh, I might have deleted it on prod instead of dev, ouch.
So we need to organize users into groups and give them granular permissions per cluster.
[Olek] In: granular access Mid: certs, openid - no Out: aws-auth - yes, review
For authorization we use RBAC; it's the de facto standard for k8s now. With it we can create user groups and separate their permissions.
We reviewed multiple authentication mechanisms for users. We started with a shared root certificate, as I said before, and realized it would be hard to create a separate certificate for each user (mainly because k8s does not support a certificate revocation policy).
After that we reviewed the OpenID Connect mechanism. It works fine and it's good, but the downside for us was that our single sign-on provider did not provide user group support with OpenID, so it's possible to authorize a user but you cannot get their groups, and we need those for our ACLs.
Finally we settled on the tool whose name nowadays is aws-iam-authenticator. Back when we implemented auth in k8s, its name was heptio-authenticator. Nowadays it's the default auth method in AWS EKS, and GCE and Azure have similar tools for their platforms. Let's briefly review how it works.
[Olek]
Our kubectl auth config uses tokens, which are generated by the client-side aws-iam-authenticator binary. The token is generated based on your AWS credentials and contains a cluster name and a role. For simplicity, let's assume a role represents a user group here. So if you're a cluster admin you specify the admin role; if you're a read-only user you have a different one.
Then you send your API token to the k8s server. The server has a webhook configured to talk to a daemonset, which checks whether the user is allowed to use the role from the token. If everything is okay, the user is successfully authenticated and the proper user groups are assigned for the session.
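Conceptually, the server side then resolves the verified IAM role into Kubernetes groups, much like the aws-auth mapping in EKS. A toy sketch of that resolution step (the role ARNs and group names are invented; this is not the actual aws-iam-authenticator code):

```python
# Invented mapping from verified IAM role ARN to Kubernetes groups;
# RBAC rules then key off these group names.
ROLE_TO_GROUPS = {
    "arn:aws:iam::123456789012:role/k8s-admin": ["system:masters"],
    "arn:aws:iam::123456789012:role/k8s-readonly": ["view-only"],
}

def authenticate(verified_role_arn: str) -> dict:
    groups = ROLE_TO_GROUPS.get(verified_role_arn)
    if groups is None:
        raise PermissionError(f"role {verified_role_arn} is not mapped to any group")
    return {"authenticated": True, "groups": groups}

print(authenticate("arn:aws:iam::123456789012:role/k8s-readonly"))
```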
[Olek]
So basically, for every cluster a user needs the proper IAM role ARN. For example, on the prod cluster a user may have read-only permissions, while on the development cluster the same user has the admin role. And maintaining all of this in their local kubeconfig is not something our engineers should have to care about.
[Olek] So we created a service which generates a kubeconfig based on the user's AWS IAM credentials. Now a user executes a one-liner shell script in a terminal, and from then on the engineer has a cron job installed which generates or re-generates the kubeconfig periodically. Why did we implement this as a cron job? From time to time we update our group hierarchy, adding or removing users from groups, and with a cron job these changes are rolled out to users' machines automatically.
[CONCLUSION] What did our engineers get from it? Everyone has kubectl with a kubeconfig fully managed by the infrastructure team. And the infra team has visibility and control in terms of identity and access management. In that way we applied IAM best practices to k8s auth management.
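A kubeconfig emitted by such a generation service might look like the following sketch, which wires kubectl's exec credential plugin to aws-iam-authenticator with the role the user is entitled to per cluster (the cluster names, server URLs and role ARNs here are invented for illustration):

```python
def kubeconfig_for(clusters: dict) -> dict:
    """clusters maps cluster name -> (API server URL, IAM role ARN)."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": n, "cluster": {"server": s}}
                     for n, (s, _) in clusters.items()],
        "users": [{"name": n, "user": {"exec": {
                      "apiVersion": "client.authentication.k8s.io/v1beta1",
                      "command": "aws-iam-authenticator",
                      # the token is generated at kubectl invocation time
                      "args": ["token", "-i", n, "-r", r],
                  }}} for n, (_, r) in clusters.items()],
        "contexts": [{"name": n, "context": {"cluster": n, "user": n}}
                     for n in clusters],
    }

cfg = kubeconfig_for({
    "dev": ("https://dev-api.example.internal",
            "arn:aws:iam::123456789012:role/k8s-dev-admin"),
})
print(cfg["users"][0]["user"]["exec"]["args"])
```

Serialized to YAML, this is exactly the file the cron job would regenerate whenever the group hierarchy changes.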
[Olek]
So here is our secrets-management evolution path. It looks a little strange at first glance, but let me explain why.
[Olek] We set up HashiCorp Vault; we love it, it's super cool and gives you everything you need: secrets management, a good security level, infra perks.
Here is how we work with it: we have an init container which grabs all the necessary secrets and puts them into a shared volume; then the main service container reads them from the volume and initializes its env vars with the secret values. There is even an open-source project for the init container, called Daytona.
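The shared-volume handoff described above can be sketched in a few lines of Python (this stands in for Daytona, it is not Daytona's actual code): the init container renders secrets into a KEY=VALUE file on the shared volume, and the service loads it into its environment before starting:

```python
import os
import tempfile

def write_env_file(path: str, secrets: dict) -> None:
    # Init container's job: fetch secrets (e.g. from Vault) and render them
    # to a file on the volume shared with the service container.
    with open(path, "w") as f:
        for key, value in secrets.items():
            f.write(f"{key}={value}\n")

def load_env_file(path: str) -> None:
    # Service container's job: hydrate env vars from the shared file.
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("=")
            os.environ[key] = value

shared = os.path.join(tempfile.mkdtemp(), "secrets.env")
write_env_file(shared, {"DB_PASSWORD": "s3cr3t"})  # init container side
load_env_file(shared)                              # service side
print(os.environ["DB_PASSWORD"])  # s3cr3t
```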
We have Vault set up in our clusters and it can be used by our engineers, but in fact it didn't get much traction. Maybe it's because engineers didn't have enough time to dive into it; maybe it's because our guides were not that good. We succeeded in setting it up, but we failed at spreading it and getting our colleagues to use it. Our engineers did not add secrets to Vault and did not use it. So we started to investigate further.
[Olek] And we settled on the SOPS project, which stands for Secrets OPerationS. It's a simple and flexible tool for managing secrets. What it does is encrypt and decrypt text files, with support for the YAML, JSON and .env formats. It supports the AWS, GCE and Azure key management systems as well as plain PGP encryption.
Here is an example of a YAML file containing database credentials for a service.
[Olek] And here is how this file looks after encryption: we get key-by-key encryption instead of encryption of the whole file.
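That key-by-key behavior is easy to illustrate with a toy model (base64 stands in for the real KMS- or PGP-wrapped data-key encryption, so this is emphatically not how SOPS computes ciphertexts): the keys and structure stay readable for review, only the values change:

```python
import base64

def encrypt_values(doc: dict) -> dict:
    # Encrypt each value, leave every key in plaintext.
    return {k: "ENC[" + base64.b64encode(v.encode()).decode() + "]"
            for k, v in doc.items()}

def decrypt_values(doc: dict) -> dict:
    # Strip the ENC[...] wrapper and reverse the transformation.
    return {k: base64.b64decode(v[4:-1]).decode() for k, v in doc.items()}

plain = {"DB_USER": "service", "DB_PASSWORD": "hunter2"}
cipher = encrypt_values(plain)
print(sorted(cipher))                   # keys stay visible: ['DB_PASSWORD', 'DB_USER']
print(decrypt_values(cipher) == plain)  # True
```

With the real tool, `sops -e secrets.yaml` and `sops -d secrets.yaml` perform the equivalent transformation using a data key wrapped by KMS or PGP; the visible keys are what make encrypted files diffable and reviewable in git.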
[Olek] We deploy our services with the Helm package manager. For a Helm release we specify both unencrypted values, carrying the generic release configuration, and SOPS-encrypted values, which are used to create secrets. You can see an example of a Helm template for secret creation.
[Olek] So to deploy a service, all you need to do is decrypt your GitHub-stored secrets first and then run a Helm release.
This solution got good adoption among our engineers and turned out to be more popular than Vault. It might also be simpler; SOPS turned out to be more developer-friendly at thredUP.
That way we moved from fault-intolerant, de-synchronized and unmanageable manual secret creation to a fully predictable and monitored secret-management solution, filling one more migration gap.
All Helm charts are available. We can use them to run an on-demand staging setup.
Advantages: 1) always up to date with the latest code and data, 2) scalable.
[Olek] Okay, so we just told you how we manage dynamic stagings so engineers can present the results of their work to coworkers. But where do engineers spend most of their working hours? It's local development: writing code on your laptop, running the tests, debugging.
[Olek] And when we talk about local development, the most basic workflow is just to clone a git repo, install the dependencies and run the service (let's assume it's a web application). Here is an example of doing it that way with Node.js. BUT it's not that simple in the real world, right?
[Olek] When you install a service, it might have native extensions among its dependencies, and in that case you might need to install specific libraries on your machine. That's okay if there is a good guide on how to do it, and if those libraries don't conflict with another service's libraries, and another's, and, since we have this trendy microservices architecture, yet another service's libraries. It becomes cumbersome to set up on a local machine, and... it's good we have such a thing as Docker. So you create a Docker container from a Node.js image, mapping in your codebase and the ports you work with, install all the necessary libraries and do the same stuff you did locally.
And everything is fine, you are good to go. Not really.
[Olek] So we moved from literally operating-system-native development to containerized development; what's next? It's probably convenient to use docker-compose to set up service dependencies. Usually that's a database, a caching layer, queues, workers.
[Olek] Then you run it, it works, and it's convenient to use locally.
[Olek] Until your docker-compose file becomes 300+ lines long and your local database is 25 GB heavy.
[Olek] Why is it hard and inconvenient? Because you have to keep your local env up to date, and because it consumes a lot of resources (we do have powerful laptops, but even they have problems with resource consumption from time to time). And if you have issues with some service, it's hard to