SlideShare a Scribd company logo
1 of 37
Download to read offline
Monitoring Kubernetes Across Data
Center and Cloud
Specifically Tectonic and Google Container Engine using Datadog
Presenters:
Ilan Rabinovitch, Director of Technical Community, Datadog
Aleks Saul, Customer-Facing Engineer, CoreOS
Aparna Sinha, Senior Product Manager, Google
Google Cloud Platform
Kubernetes at a glance
Open source production-grade container scheduling and management
● Top 0.01% of all GitHub projects: 950+ contributors & 35,000+ commits
Run Anywhere: multi-cloud, on-prem, bare-metal, OpenStack etc
Broad industry adoption
Commercial Enterprise Support
Kubernetes at a glance
Google Cloud Platform
Kubernetes provides container-centric infrastructure
Once specific containers are no longer bound to specific machines/VMs,
host-centric infrastructure no longer works
• Scheduling: Decide where my containers should run
• Lifecycle and health: Keep my containers running despite failures
• Scaling: Make sets of containers bigger or smaller
• Naming and discovery: Find where my containers are now
• Load balancing: Distribute traffic across a set of containers
• Storage volumes: Provide data to containers
• Logging and monitoring: Track what’s happening with my containers
• Debugging and introspection: Enter or attach to containers
• Identity and authorization: Control who can do things to my containers
Google Cloud Platform
Kubernetes offers choice and flexibility for Hybrid Cloud
Setting up and managing a cluster
• Choose a cloud: GCP, AWS, Azure, Rackspace, on-premises, ...
• Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ...
• Provision machines: create VMs, install Docker, ...
• Configure networking: IP ranges for Pods, Services, SDN, firewalls, ...
• Start cluster services: DNS, logging, monitoring, …
• Start and configure Kubernetes
• Manage nodes: kernel upgrades, OS updates, hardware failures, …
GKE is Google hosted and managed Kubernetes
• Directly uses upstream open source
• Rolls out within 3-5 business days of the latest open source release
• Alpha features also now available through ‘alpha clusters’
Google Cloud Platform
Google Container Engine (GKE)
“It delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency”
~ Philips (Hue Lights)
“Made our engineers more productive and helped us do more work with less staff”
~ CCP Games (EVE Online)
Google Cloud Platform
How Monitoring Works in Google Container Engine
Master
Storage BackendHeapster
Kubelet
cAdvisor
Node
Kubelet
cAdvisor
Node
Google Cloud Platform
Google Container Engine Monitoring Server
Metrics used for self repair, and exposed to end users via Stackdriver
Primary job is to ensure that each Kubernetes master is available
● Implements the repair logic for when a cluster is non-responsive
● Automatically resizes master machines as the number of nodes grows
Also collects metrics for each cluster
● Number of resources (nodes, pods, services, namespaces, etc)
● CPU usage, limit, utilization ratio; Memory usage and limit; Page faults;
Disk usage and limit; Uptime
● Uses number of nodes for report billing status
Google Cloud Platform
Pluggable interface for cloud monitoring
Run Influx and Grafana in the cluster
● alternative to Google Cloud Monitoring
Plug in your own!
● e.g., Prometheus, Datadog etc.
Kube State metrics: (node status, node capacity, replica state, etc)
Prometheus
Google Cloud Platform
Kube State Metrics
● Generates metrics about the state of
Kubernetes logical objects
(node status, node capacity, replica state, etc)
● Deployed alongside your other
applications as a kubernetes service.
● Exposes metrics via HTTP API or
Prometheus format
Google Cloud Platform
We focus on delivering the capabilities required by enterprise organizations
to run and manage kubernetes at scale...
● Cluster installers (for AWS and bare metal, to start).
● Management software to upgrade, backup, rollback, scale up and down the cluster.
● Console UI that surfaces management functionality, cluster information, and compute
usage to the user and includes add on services (Quay, identity and authentication).
Extending Kubernetes for the Enterprise
Google Cloud Platform
Tectonic Extends
Upstream Kubernetes
● Container orchestration
● Horizontal scale
● High availability
● Service discovery & load balancer
● Installer
● Management console
● Painless updates
● Cluster scaling
● Disaster recovery
● Alerts and logging
● Security (integrated)
● Container registry (Quay)
● Integration across environments
Extending Kubernetes for the Enterprise
Security Mgmt
Kubernetes
CoreOS Linux
Cloud Integration
Container Registry
Storage & Compute
apps/container/microservices
Google Cloud Platform
Tectonic
Kubernetes Security
● Clair: container vulnerability
scanning
● KMS integration
● LDAP integration
● RBAC integration
Extending Kubernetes for the Enterprise
Mgmt
Kubernetes
CoreOS Linux
Cloud Integration
Container Registry
Storage & Compute
apps/container/microservices
Security
•SaaS based infrastructure and application monitoring
•Focus on modern environments
•Cloud, Containers, Microservices
•Dynamic configuration models
•Processing nearly a trillion data points per day
•Intelligent Alerting and Insightful Dashboards
•Anomaly and Outlier Detection
Datadog Overview
Collecting data is cheap;
not having it when you
need it can be expensive
Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches,
Queues and more...
Monitor Everything
Datadog
● Deployed as a DaemonSet. One
instance per node.
● Collects metrics and events from:
○ container engine (eg Docker)
○ Kubernetes Heapster
○ kube-state-metrics
○ Deployed Applications
○ Google Monitoring APIs
● Exposes statsd end point for custom
metrics.
● Metrics are automatically tagged by
PODs, Labels, etc
Operational Complexity Increases with..
• Number of things to measure
• Velocity of change
How much we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
50~ metrics per application
Operational Complexity
100
instances
500
containers
Operational Complexity: Scale
160
metrics per host
800
metrics per host
Assuming 5 containers per host
Operational Complexity: Scale
100
instances
80,000
metrics
Assuming 5 containers per host
How much we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
50~ metrics per application
N containers
• 150*N metrics
Metrics Overload!
Operational Complexity Increases with..
• Number of things to measure
• Velocity of change
Source: Datadog
Operational Complexity Increases with..
• Number of things to measure
• Velocity of change
Monitoring Questions
• Where is a given container running?
• What is the overall capacity of my cluster?
• What port(s) are my applications running on?
• What’s the total throughput of my application?
• What’s its response time per tag? (app, version, data
center)
• What’s the distribution of 5xx error per
container? What about by data center?
Host Centric
Service Centric
Query Based Monitoring
“What’s the average throughput of application:nginx per
version ?”
“Alert me when one of my pod from replication controller:foo is
not behaving like the others?”
“Show me rate of HTTP 500 responses from nginx”
“… grouped by data center … running my app version 2….”
Service Discovery
Docker API Kubernetes
Monitoring Agent
Container
A O A O
Containers List &
Metadata
Additional Metadata
(Tags, etc)
Config Backends
Integration Configurations
Host Level
Metrics
Q&A
You can also follow us on Twitter:
@datadoghq
@googlecloud
@tectonicstack

More Related Content

What's hot

Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developersDatadog
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogC4Media
 
Virtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan RadhakrishnanVirtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan RadhakrishnanDatadog
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using DatadogMukta Aphale
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleDatadog
 
Serverless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersServerless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersAll Things Open
 
Provisioning Datadog with Terraform
Provisioning Datadog with TerraformProvisioning Datadog with Terraform
Provisioning Datadog with TerraformMatt Spurlin
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayentaaspyker
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryAysylu Greenberg
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containersaspyker
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixCodemotion Tel Aviv
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend
 
Herding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes PublicHerding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes Publicaspyker
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
Deploying and Operating KSQL
Deploying and Operating KSQLDeploying and Operating KSQL
Deploying and Operating KSQLconfluent
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Sourceaspyker
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafkaconfluent
 

What's hot (20)

Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Datadog- Monitoring In Motion
Datadog- Monitoring In Motion Datadog- Monitoring In Motion
Datadog- Monitoring In Motion
 
Virtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan RadhakrishnanVirtualization at Gilt - Rangarajan Radhakrishnan
Virtualization at Gilt - Rangarajan Radhakrishnan
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at Scale
 
Serverless Swift for Mobile Developers
Serverless Swift for Mobile DevelopersServerless Swift for Mobile Developers
Serverless Swift for Mobile Developers
 
Provisioning Datadog with Terraform
Provisioning Datadog with TerraformProvisioning Datadog with Terraform
Provisioning Datadog with Terraform
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, KayentaNetflixOSS Meetup S6E2 - Spinnaker, Kayenta
NetflixOSS Meetup S6E2 - Spinnaker, Kayenta
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theory
 
NetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & ContainersNetflixOSS Meetup S6E1 - Titus & Containers
NetflixOSS Meetup S6E1 - Titus & Containers
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
The Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, WixThe Art of Decomposing Monoliths - Kfir Bloch, Wix
The Art of Decomposing Monoliths - Kfir Bloch, Wix
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data Platform
 
Herding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes PublicHerding Kats - Netflix’s Journey to Kubernetes Public
Herding Kats - Netflix’s Journey to Kubernetes Public
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Deploying and Operating KSQL
Deploying and Operating KSQLDeploying and Operating KSQL
Deploying and Operating KSQL
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
 

Viewers also liked

Troubleshooting Kubernetes
Troubleshooting KubernetesTroubleshooting Kubernetes
Troubleshooting KubernetesSysdig
 
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago
 
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...CloudCamp Chicago
 
Cashing in on logging and exception data
Cashing in on logging and exception dataCashing in on logging and exception data
Cashing in on logging and exception dataStackify
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoringRohit Jnagal
 
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Adrian Cockcroft
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSylvain Kalache
 
Managing your SaltStack Minions with Foreman
Managing your SaltStack Minions with ForemanManaging your SaltStack Minions with Foreman
Managing your SaltStack Minions with ForemanStephen Benjamin
 
Edge 2016 Session 1886 Building your own docker container cloud on ibm power...
Edge 2016 Session 1886  Building your own docker container cloud on ibm power...Edge 2016 Session 1886  Building your own docker container cloud on ibm power...
Edge 2016 Session 1886 Building your own docker container cloud on ibm power...Yong Feng
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights Gunnar Peipman
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016Matthew Broberg
 
Sysdig Monitorama Slides
Sysdig Monitorama SlidesSysdig Monitorama Slides
Sysdig Monitorama SlidesLoris Degioanni
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...Amazon Web Services
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Intel SoC as a Platform to Connect Sensor Data to AWS
Intel SoC as a Platform to Connect Sensor Data to AWSIntel SoC as a Platform to Connect Sensor Data to AWS
Intel SoC as a Platform to Connect Sensor Data to AWSAmazon Web Services
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceLN Renganarayana
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...Brian Grant
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to RootsBrendan Gregg
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 

Viewers also liked (20)

Troubleshooting Kubernetes
Troubleshooting KubernetesTroubleshooting Kubernetes
Troubleshooting Kubernetes
 
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
 
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...
CloudCamp Chicago lightning talk "Connecting Vehicles on Google Cloud Platfor...
 
Cashing in on logging and exception data
Cashing in on logging and exception dataCashing in on logging and exception data
Cashing in on logging and exception data
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoring
 
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
Monitorama - Please, no more Minutes, Milliseconds, Monoliths or Monitoring T...
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
 
Managing your SaltStack Minions with Foreman
Managing your SaltStack Minions with ForemanManaging your SaltStack Minions with Foreman
Managing your SaltStack Minions with Foreman
 
Edge 2016 Session 1886 Building your own docker container cloud on ibm power...
Edge 2016 Session 1886  Building your own docker container cloud on ibm power...Edge 2016 Session 1886  Building your own docker container cloud on ibm power...
Edge 2016 Session 1886 Building your own docker container cloud on ibm power...
 
Data Logging and Telemetry
Data Logging and TelemetryData Logging and Telemetry
Data Logging and Telemetry
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016
 
Sysdig Monitorama Slides
Sysdig Monitorama SlidesSysdig Monitorama Slides
Sysdig Monitorama Slides
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Intel SoC as a Platform to Connect Sensor Data to AWS
Intel SoC as a Platform to Connect Sensor Data to AWSIntel SoC as a Platform to Connect Sensor Data to AWS
Intel SoC as a Platform to Connect Sensor Data to AWS
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a Service
 
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
WSO2Con US 2015 Kubernetes: a platform for automating deployment, scaling, an...
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 

Similar to Monitoring kubernetes across data center and cloud

Intel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalIntel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalDeepak Mane
 
GCCP JSCOE Session 2
GCCP JSCOE Session 2GCCP JSCOE Session 2
GCCP JSCOE Session 2GDSC
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...tdc-globalcode
 
Mete Atamel
Mete AtamelMete Atamel
Mete AtamelCodeFest
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsKublr
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesCodemotion Tel Aviv
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerDavinder Kohli
 
Kubernetes: від знайомства до використання у CI/CD
Kubernetes: від знайомства до використання у CI/CDKubernetes: від знайомства до використання у CI/CD
Kubernetes: від знайомства до використання у CI/CDStfalcon Meetups
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for BeginnersDigitalOcean
 
Kubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationKubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationinovex GmbH
 
Episode 1: Building Kubernetes-as-a-Service
Episode 1: Building Kubernetes-as-a-ServiceEpisode 1: Building Kubernetes-as-a-Service
Episode 1: Building Kubernetes-as-a-ServiceMesosphere Inc.
 
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKSMigrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKSWeaveworks
 
Monitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusMonitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusChandresh Pancholi
 
Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh CodeOps Technologies LLP
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMwareVMUG IT
 
oci-container-engine-oke-100.pdf
oci-container-engine-oke-100.pdfoci-container-engine-oke-100.pdf
oci-container-engine-oke-100.pdfNandiniSinghal16
 
Accelerate Application Innovation Journey with Azure Kubernetes Service
Accelerate Application Innovation Journey with Azure Kubernetes Service Accelerate Application Innovation Journey with Azure Kubernetes Service
Accelerate Application Innovation Journey with Azure Kubernetes Service WinWire Technologies Inc
 

Similar to Monitoring kubernetes across data center and cloud (20)

Intel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-finalIntel open stack-summit-session-nov13-final
Intel open stack-summit-session-nov13-final
 
GCCP JSCOE Session 2
GCCP JSCOE Session 2GCCP JSCOE Session 2
GCCP JSCOE Session 2
 
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
TDC2017 | São Paulo - Trilha Cloud Computing How we figured out we had a SRE ...
 
Mete Atamel
Mete AtamelMete Atamel
Mete Atamel
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container Operations
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with Kubernetes
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, Docker
 
OpenStack Havana Release
OpenStack Havana ReleaseOpenStack Havana Release
OpenStack Havana Release
 
Kubernetes intro
Kubernetes introKubernetes intro
Kubernetes intro
 
Kubernetes: від знайомства до використання у CI/CD
Kubernetes: від знайомства до використання у CI/CDKubernetes: від знайомства до використання у CI/CD
Kubernetes: від знайомства до використання у CI/CD
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 
Kubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestrationKubernetes – An open platform for container orchestration
Kubernetes – An open platform for container orchestration
 
Episode 1: Building Kubernetes-as-a-Service
Episode 1: Building Kubernetes-as-a-ServiceEpisode 1: Building Kubernetes-as-a-Service
Episode 1: Building Kubernetes-as-a-Service
 
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKSMigrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
 
Monitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheusMonitoring on Kubernetes using prometheus
Monitoring on Kubernetes using prometheus
 
Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh Monitoring on Kubernetes using Prometheus - Chandresh
Monitoring on Kubernetes using Prometheus - Chandresh
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integration
 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
 
oci-container-engine-oke-100.pdf
oci-container-engine-oke-100.pdfoci-container-engine-oke-100.pdf
oci-container-engine-oke-100.pdf
 
Accelerate Application Innovation Journey with Azure Kubernetes Service
Accelerate Application Innovation Journey with Azure Kubernetes Service Accelerate Application Innovation Journey with Azure Kubernetes Service
Accelerate Application Innovation Journey with Azure Kubernetes Service
 

More from Datadog

What it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderWhat it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderDatadog
 
Datadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog
 
Dataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDatadog
 
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog Datadog
 
Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Datadog
 
Treating Infrastructure as Garbage
Treating Infrastructure as GarbageTreating Infrastructure as Garbage
Treating Infrastructure as GarbageDatadog
 
Big (IT) data
Big (IT) dataBig (IT) data
Big (IT) dataDatadog
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analyticsDatadog
 
Customer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportCustomer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportDatadog
 
I <3 graphs in 20 slides
I <3 graphs in 20 slidesI <3 graphs in 20 slides
I <3 graphs in 20 slidesDatadog
 
Effective monitoring with StatsD
Effective monitoring with StatsDEffective monitoring with StatsD
Effective monitoring with StatsDDatadog
 
Alerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painAlerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painDatadog
 
Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoringDatadog
 
Fact-Based Monitoring
Fact-Based MonitoringFact-Based Monitoring
Fact-Based MonitoringDatadog
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toDatadog
 
What’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerWhat’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerDatadog
 
I Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcI Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcDatadog
 
Why Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob TerhaarWhy Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob TerhaarDatadog
 
Welcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex LesserWelcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex LesserDatadog
 
Cosa Nostra - Tom Santero
Cosa Nostra - Tom SanteroCosa Nostra - Tom Santero
Cosa Nostra - Tom SanteroDatadog
 

More from Datadog (20)

What it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderWhat it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service Provider
 
Datadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog + VictorOps Webinar
Datadog + VictorOps Webinar
 
Dataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDataday Texas 2016 - Datadog
Dataday Texas 2016 - Datadog
 
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
 
Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015
 
Treating Infrastructure as Garbage
Treating Infrastructure as GarbageTreating Infrastructure as Garbage
Treating Infrastructure as Garbage
 
Big (IT) data
Big (IT) dataBig (IT) data
Big (IT) data
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analytics
 
Customer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportCustomer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer support
 
I <3 graphs in 20 slides
I <3 graphs in 20 slidesI <3 graphs in 20 slides
I <3 graphs in 20 slides
 
Effective monitoring with StatsD
Effective monitoring with StatsDEffective monitoring with StatsD
Effective monitoring with StatsD
 
Alerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painAlerting: more signal, less noise, less pain
Alerting: more signal, less noise, less pain
 
Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoring
 
Fact-Based Monitoring
Fact-Based MonitoringFact-Based Monitoring
Fact-Based Monitoring
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-to
 
What’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerWhat’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike Fiedler
 
I Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcI Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-Quôc
 
Why Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob TerhaarWhy Puppet Sucks - Rob Terhaar
Why Puppet Sucks - Rob Terhaar
 
Welcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex LesserWelcome to a Computing Revolution - Alex Lesser
Welcome to a Computing Revolution - Alex Lesser
 
Cosa Nostra - Tom Santero
Cosa Nostra - Tom SanteroCosa Nostra - Tom Santero
Cosa Nostra - Tom Santero
 

Recently uploaded

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Monitoring kubernetes across data center and cloud

  • 1. Monitoring Kubernetes Across Data Center and Cloud Specifically Tectonic and Google Container Engine using Datadog Presenters: Ilan Rabinovitch, Director of Technical Community, Datadog Aleks Saul, Customer-Facing Engineer, CoreOS Aparna Sinha, Senior Product Manager, Google
  • 2. Google Cloud Platform Kubernetes at a glance Open source production-grade container scheduling and management ● Top 0.01% of all GitHub projects: 950+ contributors & 35,000+ commits Run Anywhere: multi-cloud, on-prem, bare-metal, OpenStack etc Broad industry adoption Commercial Enterprise Support Kubernetes at a glance
  • 3. Google Cloud Platform Kubernetes provides container-centric infrastructure Once specific containers are no longer bound to specific machines/VMs, host-centric infrastructure no longer works • Scheduling: Decide where my containers should run • Lifecycle and health: Keep my containers running despite failures • Scaling: Make sets of containers bigger or smaller • Naming and discovery: Find where my containers are now • Load balancing: Distribute traffic across a set of containers • Storage volumes: Provide data to containers • Logging and monitoring: Track what’s happening with my containers • Debugging and introspection: Enter or attach to containers • Identity and authorization: Control who can do things to my containers
  • 4. Google Cloud Platform Kubernetes offers choice and flexibility for Hybrid Cloud Setting up and managing a cluster • Choose a cloud: GCP, AWS, Azure, Rackspace, on-premises, ... • Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ... • Provision machines: create VMs, install Docker, ... • Configure networking: IP ranges for Pods, Services, SDN, firewalls, ... • Start cluster services: DNS, logging, monitoring, … • Start and configure Kubernetes • Manage nodes: kernel upgrades, OS updates, hardware failures, … GKE is Google hosted and managed Kubernetes • Directly uses upstream open source • Rolls out within 3-5 business days of the latest open source release • Alpha features also now available through ‘alpha clusters’
  • 5. Google Cloud Platform Google Container Engine (GKE) “It delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency” ~ Philips (Hue Lights) “Made our engineers more productive and helped us do more work with less staff” ~ CCP Games (EVE Online)
  • 6. Google Cloud Platform How Monitoring Works in Google Container Engine Master Storage BackendHeapster Kubelet cAdvisor Node Kubelet cAdvisor Node
  • 7. Google Cloud Platform Google Container Engine Monitoring Server Metrics used for self repair, and exposed to end users via Stackdriver Primary job is to ensure that each Kubernetes master is available ● Implements the repair logic for when a cluster is non-responsive ● Automatically resizes master machines as the number of nodes grows Also collects metrics for each cluster ● Number of resources (nodes, pods, services, namespaces, etc) ● CPU usage, limit, utilization ratio; Memory usage and limit; Page faults; Disk usage and limit; Uptime ● Uses number of nodes for report billing status
  • 8. Google Cloud Platform Pluggable interface for cloud monitoring Run Influx and Grafana in the cluster ● alternative to Google Cloud Monitoring Plug in your own! ● e.g., Prometheus, Datadog etc. Kube State metrics: (node status, node capacity, replica state, etc) Prometheus
  • 9. Google Cloud Platform Kube State Metrics ● Generates metrics about the state of Kubernetes logical objects (node status, node capacity, replica state, etc) ● Deployed alongside your other applications as a kubernetes service. ● Exposes metrics via HTTP API or Prometheus format
  • 10. Google Cloud Platform We focus on delivering the capabilities required by enterprise organizations to run and manage kubernetes at scale... ● Cluster installers (for AWS and bare metal, to start). ● Management software to upgrade, backup, rollback, scale up and down the cluster. ● Console UI that surfaces management functionality, cluster information, and compute usage to the user and includes add on services (Quay, identity and authentication). Extending Kubernetes for the Enterprise
  • 11. Google Cloud Platform Tectonic Extends Upstream Kubernetes ● Container orchestration ● Horizontal scale ● High availability ● Service discovery & load balancer ● Installer ● Management console ● Painless updates ● Cluster scaling ● Disaster recovery ● Alerts and logging ● Security (integrated) ● Container registry (Quay) ● Integration across environments Extending Kubernetes for the Enterprise Security Mgmt Kubernetes CoreOS Linux Cloud Integration Container Registry Storage & Compute apps/container/microservices
  • 12. Google Cloud Platform Tectonic Kubernetes Security ● Clair: container vulnerability scanning ● KMS integration ● LDAP integration ● RBAC integration Extending Kubernetes for the Enterprise Mgmt Kubernetes CoreOS Linux Cloud Integration Container Registry Storage & Compute apps/container/microservices Security
  • 13. •SaaS based infrastructure and application monitoring •Focus on modern environments •Cloud, Containers, Microservices •Dynamic configuration models •Processing nearly a trillion data points per day •Intelligent Alerting and Insightful Dashboards •Anomaly and Outlier Detection Datadog Overview
  • 14. Collecting data is cheap; not having it when you need it can be expensive
  • 15. Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more... Monitor Everything
  • 16. Datadog ● Deployed as a DaemonSet. One instance per node. ● Collects metrics and events from: ○ container engine (eg Docker) ○ Kubernetes Heapster ○ kube-state-metrics ○ Deployed Applications ○ Google Monitoring APIs ● Exposes statsd end point for custom metrics. ● Metrics are automatically tagged by PODs, Labels, etc
  • 17.
  • 18. Operational Complexity Increases with.. • Number of things to measure • Velocity of change
  • 19. How much we measure? 1 instance • 10 metrics from cloud providers 1 operating system (e.g., Linux) • 100 metrics 50~ metrics per application
  • 20.
  • 22. Operational Complexity: Scale 160 metrics per host 800 metrics per host Assuming 5 containers per host
  • 24.
  • 25. How much we measure? 1 instance • 10 metrics from cloud providers 1 operating system (e.g., Linux) • 100 metrics 50~ metrics per application N containers • 150*N metrics Metrics Overload!
  • 26. Operational Complexity Increases with.. • Number of things to measure • Velocity of change
  • 28. Operational Complexity Increases with.. • Number of things to measure • Velocity of change
  • 29. Monitoring Questions • Where is a given container running? • What is the overall capacity of my cluster? • What port(s) are my applications running on? • What’s the total throughput of my application? • What’s its response time per tag? (app, version, data center) • What’s the distribution of 5xx error per container? What about by data center?
  • 32.
  • 33.
  • 34. Query Based Monitoring “What’s the average throughput of application:nginx per version ?” “Alert me when one of my pod from replication controller:foo is not behaving like the others?” “Show me rate of HTTP 500 responses from nginx” “… grouped by data center … running my app version 2….”
  • 35. Service Discovery Docker API Kubernetes Monitoring Agent Container A O A O Containers List & Metadata Additional Metadata (Tags, etc) Config Backends Integration Configurations Host Level Metrics
  • 36.
  • 37. Q&A You can also follow us on Twitter: @datadoghq @googlecloud @tectonicstack