© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.S U M M I T
Deep Dive in Cloud Monitoring
with Amazon EKS and Prometheus
Pahud Hsieh
Specialist SA, Serverless
Amazon Web Services
Kakashi Liu
Infra Lead
UmboCV
Amazon EKS in the Past Year
● Started in us-east-1 and us-west-2
● Released VPC CNI 1.0
● HIPAA support
● Released AMI build scripts on GitHub
● Released VPC CNI 1.1
● Enabled GPU Support
● Support API Aggregation
● Support HPA
● Support eu-west-1
● CLI support for writing the kubeconfig
● Support for Admission Controllers
Amazon EKS in the Past Year
● Released VPC CNI 1.2
● Allow for additional VPC CIDR ranges
● Support for us-east-2
● Official support for ALB Ingress
● Container Marketplace
● CloudMap Integration
● Support for AWS App Mesh
● Support for eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1
● Support for ap-northeast-2
● Added the SLA
Immediately after that
● Achieved ISO and PCI compliance
● Support for ap-south-1, eu-west-2, eu-west-3
● Released VPC CNI 1.3
● Added a new quickstart
● Allowed private API Endpoints
● Launched an App Mesh controller at GA
● Public Preview for Windows nodes
● Deep Learning container launch
● Added 1.2 with a new cluster update API
● Released CSI Drivers for FSx and EFS
● Control plane logs
● Public Preview of A1 instances
● Released a Machine Learning Benchmark tool
CloudWatch Container Insights (preview)
Dimensions for Kubernetes
• Clusters
• Nodes
• Services
• Namespaces
• Pods
Pod Metrics
• pod_cpu_reserved_capacity
• pod_cpu_utilization
• pod_cpu_utilization_over_pod_limit
• pod_memory_reserved_capacity
• pod_memory_utilization
• pod_memory_utilization_over_pod_limit
• pod_network_rx_bytes
• pod_network_tx_bytes
Other Metrics
• cluster_failed_node_count
• cluster_node_count
• namespace_number_of_running_pods
• node_cpu_limit
• node_cpu_reserved_capacity
• node_cpu_usage_total
• node_cpu_utilization
• node_filesystem_utilization
• node_memory_limit
• node_memory_reserved_capacity
• node_memory_utilization
• node_memory_working_set
• node_network_total_bytes
• node_number_of_running_containers
• node_number_of_running_pods
• service_number_of_running_pods
Reference - https://amzn.to/2HFtHDt
Threshold and Alarm Actions
Amazon EKS and Prometheus
Prometheus
Why Prometheus?
Community
Number of integrations
Ease of use
Why not Prometheus?
Manage it yourself
Complexity in large setups
Possibility: Hybrid Approach
Use Prometheus to collect metrics that are exposed on /metrics endpoints
Send a subset of critical metrics to Amazon CloudWatch or a third-party solution.
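A minimal sketch of the hybrid idea, assuming samples have already been scraped into plain dictionaries. The allowlist contents and the sample data are hypothetical, and no AWS call is made: the output dictionaries only mirror the MetricName / Dimensions / Value shape of a CloudWatch metric datum.

```python
# Hypothetical allowlist of metrics considered critical enough to forward.
CRITICAL_METRICS = {"http_requests_total", "event_total"}


def select_critical(samples, allowlist=CRITICAL_METRICS):
    """Keep only samples whose metric name is in the allowlist."""
    return [s for s in samples if s["name"] in allowlist]


def to_cloudwatch_datum(sample):
    """Shape one sample like a CloudWatch metric datum (no API call is made)."""
    return {
        "MetricName": sample["name"],
        "Dimensions": [
            {"Name": key, "Value": value}
            for key, value in sorted(sample["labels"].items())
        ],
        "Value": sample["value"],
    }


# Hypothetical scrape result: one critical metric, one runtime metric.
scraped = [
    {"name": "http_requests_total", "labels": {"code": "200"}, "value": 10.0},
    {"name": "go_goroutines", "labels": {}, "value": 42.0},
]
forwarded = [to_cloudwatch_datum(s) for s in select_critical(scraped)]
print(forwarded)
```

Everything else stays in Prometheus; only the allowlisted subset would be handed to whichever client sends data to the second system.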
Hello!
I am Kakashi
- Infra Lead @Umbo CV
- Co-organizer @Golang Taipei Gathering
Traditional Solutions
Umbo Light
Agenda
Why monitoring
Umbo CV Monitoring pipeline
Prometheus: Why and What
Prometheus with EKS
Use cases
Why monitoring
Why monitoring
Alerting
Long-term trends
Umbo CV Monitoring pipeline
Monitoring types
Infrastructure
Application
Application monitoring
[Diagram: containers and exporters on EC2 expose metrics at /metrics; a metrics store collects them and drives alerting. Stages: Expose Metrics, Collect, Alert]
Prometheus: Why and What
● Graduated CNCF project.
● Can handle multi-dimensional metrics.
● Performance: can ingest millions of samples per second.
● Powerful query language: PromQL.
● Built-in alerting tool and service discovery mechanism.
Prometheus metrics
[Diagram: user requests reach services on EC2, each exposing /metrics]
http_requests_total{code="200", path="/api/user"} 10
Format: metric_name{labels} value
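As a rough illustration of the name / labels / value structure, the sketch below splits one sample line into its three parts. It is a simplification that assumes quoted label values and ignores escaped quotes, timestamps, and comment lines; `parse_sample` is a hypothetical helper, not part of any Prometheus client library.

```python
import re

# name{label="value", ...} number   (the label block is optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(
    'http_requests_total{code="200", path="/api/user"} 10'
)
print(name, labels, value)
```

Each distinct combination of label values identifies its own time series, which is what makes the metrics multi-dimensional.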
PromQL example
Total requests / second
sum(rate(http_requests_total[5m]))
Total 5xx requests / second
sum(rate(http_requests_total{code=~"5.*"}[5m]))
Current percentage of errors across all instances
100 * sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m]))
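To make the arithmetic behind these queries concrete, here is a small sketch that computes the same three quantities from raw counter samples. It is deliberately simplified: `simple_rate` is a hypothetical helper that takes the increase between the first and last sample in the window, and it ignores the counter-reset handling and extrapolation that PromQL's rate() performs. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplified: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples over a 5-minute (300 s) window, keyed by status code.
window = {
    "200": [(0, 1000), (300, 3700)],
    "404": [(0, 50), (300, 200)],
    "500": [(0, 10), (300, 160)],
}

# Total requests per second, summed over every series.
total_rate = sum(simple_rate(s) for s in window.values())
# 5xx requests per second, summed over series whose code starts with "5".
error_rate = sum(simple_rate(s) for code, s in window.items() if code.startswith("5"))
# Share of 5xx requests, expressed as a percentage.
error_percent = 100 * error_rate / total_rate

print(f"{total_rate:.2f} req/s, {error_rate:.2f} 5xx/s, {error_percent:.1f}% errors")
```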
Alerting rule
alert: Percentage_Of_Errors_Is_High
expr: 100 * sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 5
for: 5m
labels:
  severity: critical
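The `for: 5m` clause means the condition has to stay true for five minutes before the alert fires. A simplified sketch of that behaviour, assuming one rule evaluation per minute (the evaluation interval and the sample readings are assumptions, and `alert_states` is a hypothetical helper):

```python
def alert_states(condition_by_tick, for_ticks):
    """Return 'inactive', 'pending', or 'firing' for each evaluation tick.

    The alert becomes pending on the first true evaluation and fires once
    `for_ticks` intervals have passed with the condition still true; any
    false evaluation resets it.
    """
    states, held = [], 0
    for condition in condition_by_tick:
        held = held + 1 if condition else 0
        if held == 0:
            states.append("inactive")
        elif held > for_ticks:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Hypothetical error percentages, one reading per minute.
error_percent = [6, 7, 3, 6, 6, 6, 6, 6, 6]
states = alert_states([p > 5 for p in error_percent], for_ticks=5)
print(states)
```

The dip to 3% in the third minute resets the timer, so a short spike never reaches the firing state.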
Prometheus with EKS
Prometheus ❤ EKS
● A monitoring system is critical.
● Running Prometheus on Kubernetes makes high availability easier to achieve.
● Prometheus Operator makes it even easier
○ Automated management and upgrades of Prometheus.
○ Native k8s configuration.
Install Prometheus on EKS with Helm
1. Install the Prometheus Operator chart
$ helm install --name prom --namespace monitoring stable/prometheus-operator
2. Verify
$ kubectl --namespace monitoring get pods
NAME                                                   READY   STATUS    RESTARTS   AGE
alertmanager-prom-op-alertmanager-0                    2/2     Running   0          1m
prometheus-prom-op-prometheus-0                        3/3     Running   1          1m
prom-op-grafana-5c59ddfb9d-zqfqt                       2/2     Running   0          2m
prom-op-kube-state-metrics-76786cc9b4-8q4bj            1/1     Running   0          2m
prom-op-prometheus-node-exporter-6jclc                 1/1     Running   0          2m
prom-op-prometheus-node-exporter-bxr49                 1/1     Running   0          2m
prom-op-prometheus-operato-operator-6cbf5d5cfd-z6fz4   1/1     Running   0          2m
Prometheus Operator CRD
● Prometheus & Alertmanager
○ Define the Prometheus and Alertmanager deployments.
● ServiceMonitor
○ Specifies how metrics from k8s services are scraped.
● PrometheusRule
○ Defines Prometheus alerting and recording rules to be loaded by a Prometheus instance.
EKS cluster monitoring
EKS application monitoring through ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-servicemonitor
spec:
  selector:
    matchLabels:
      app: api-server
[Diagram: two Services, one labeled app: api-server and one labeled app: api-server2]
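The `matchLabels` selector picks every Service whose labels contain all of the listed key/value pairs. A minimal sketch of that matching rule follows; the Service names `api` and `api2` are hypothetical stand-ins for the two labeled Services on the slide.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical Services and their labels.
services = {
    "api": {"app": "api-server"},
    "api2": {"app": "api-server2"},
}
selector = {"app": "api-server"}

selected = [name for name, labels in services.items() if matches(selector, labels)]
print(selected)
```

Only the Service labeled `app: api-server` is selected. Matching is exact equality on the value, so `api-server2` does not qualify.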
Alerting by PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
spec:
  groups:
    - name: api.rules
      rules:
        - alert: Percentage_Of_Errors_Is_High
          expr: 100 * sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) > 5
          for: 5m
          labels:
            severity: critical
Dashboard for EKS cluster
Monitoring camera detection pipeline
[Diagram: Media Server, CV Detection, API Server]
Monitoring camera detection pipeline
[Diagram: Media Server, CV Detection, API Server; metrics tracked: # of frames, # of cv requests, # of events]
Service discovery
[Diagram: Media Server, CV Detection, API Server]
Scraping through EC2 service discovery
Service discovery
[Diagram: Media Server, CV Detection, API Server, each being scraped]
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: eu-east-1
        access_key: <ACCESS_KEY_HERE>
        secret_key: <SECRET_KEY_HERE>
        port: 9273
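With EC2 service discovery, each discovered instance becomes a scrape target addressed by its private IP and the configured port. The sketch below only illustrates that address construction; it makes no AWS call, the instance records are hypothetical, and skipping instances without a private IP is an assumption of this sketch.

```python
def scrape_targets(instances, port=9273):
    """Build 'ip:port' scrape addresses from discovered instance records."""
    return [
        f"{instance['private_ip']}:{port}"
        for instance in instances
        if instance.get("private_ip")  # assumed: skip instances with no address
    ]


# Hypothetical discovery result for the three pipeline services.
discovered = [
    {"name": "media-server", "private_ip": "10.0.1.12"},
    {"name": "cv-detection", "private_ip": "10.0.2.34"},
    {"name": "api-server", "private_ip": None},
]
targets = scrape_targets(discovered)
print(targets)
```

New instances show up as targets on the next discovery refresh without any edit to the scrape configuration.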
Application metrics
[Diagram: Media Server, CV Detection, API Server]
# of frames: ms_frames_total{env="production", service="ms", cameraId="ID-123456"} 1000
# of cv requests: cvreqest_total{env="production", service="cv", cameraId="ID-123456"} 300
# of events: event_total{env="production", service="cv", cameraId="ID-123456"} 5
Dashboard
Alerting
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
spec:
  groups:
    - name: camera.rules
      rules:
        - alert: FpsLow
          annotations:
            message: "{{ $labels.cameraId }} fps is lower than 2fps"
          expr: sum by (cameraId) (rate(ms_frames_total{env="production", cameraId=~".+"}[10m])) < 2
          for: 30m
          labels:
            severity: critical
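One way to read the FpsLow check is as a per-camera frame rate compared against 2 frames per second. The sketch below is a simplified version working on raw counter values at the start and end of a 10-minute window; `low_fps_cameras` is a hypothetical helper, and the second camera ID and all counts are made up.

```python
def low_fps_cameras(frame_counts, window_seconds, threshold=2.0):
    """Return {camera_id: fps} for cameras whose average fps is below threshold.

    frame_counts maps camera id -> (counter at window start, counter at window end).
    """
    low = {}
    for camera_id, (start, end) in frame_counts.items():
        fps = (end - start) / window_seconds
        if fps < threshold:
            low[camera_id] = fps
    return low


# Hypothetical ms_frames_total readings over a 10-minute (600 s) window.
counts = {
    "ID-123456": (1000, 1600),  # 600 frames in 600 s
    "ID-654321": (500, 9500),   # 9000 frames in 600 s
}
print(low_fps_cameras(counts, window_seconds=600))
```

Keeping the camera ID as the grouping key is what lets the alert message name the affected camera.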
Thank you!
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Deep Dive in Cloud Monitoring with Amazon EKS and Prometheus

  • 1. Deep Dive in Cloud Monitoring with Amazon EKS and Prometheus. Pahud Hsieh (Specialist SA, Serverless, Amazon Web Services); Kakashi Liu (Infra Lead, UmboCV). SUMMIT. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2.
  • 3. Amazon EKS in the Past Year ● Started in us-east-1 and us-west-2 ● Released VPC CNI 1.0 ● HIPAA support ● Released AMI build scripts on GitHub ● Released VPC CNI 1.1 ● Enabled GPU support ● Support for API Aggregation ● Support for HPA ● Support for eu-west-1 ● CLI support for writing the kubeconfig ● Support for Admission Controllers
  • 4. Amazon EKS in the Past Year ● Released VPC CNI 1.2 ● Allowed additional VPC CIDR ranges ● Support for us-east-2 ● Official support for ALB Ingress ● Container Marketplace ● CloudMap integration ● Support for AWS App Mesh ● Support for eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1 ● Support for ap-northeast-2 ● Added the SLA
  • 5. Immediately after that ● Achieved ISO and PCI compliance ● Support for ap-south-1, eu-west-2, eu-west-3 ● Released VPC CNI 1.3 ● Added a new quickstart ● Allowed private API endpoints ● Launched an App Mesh controller at GA ● Public preview for Windows nodes ● Deep Learning container launch ● Added 1.2 with a new cluster update API ● Released CSI drivers for FSx and EFS ● Control plane logs ● Public preview of A1 instances ● Released a Machine Learning Benchmark tool
  • 6.
  • 7. CloudWatch Container Insights (preview)
  • 8. Dimensions for Kubernetes • Clusters • Nodes • Services • Namespaces • Pods
  • 9. Pod Metrics • pod_cpu_reserved_capacity • pod_cpu_utilization • pod_cpu_utilization_over_pod_limit • pod_memory_reserved_capacity • pod_memory_utilization • pod_memory_utilization_over_pod_limit • pod_network_rx_bytes • pod_network_tx_bytes
  • 10. Other Metrics • cluster_failed_node_count • cluster_node_count • namespace_number_of_running_pods • node_cpu_limit • node_cpu_reserved_capacity • node_cpu_usage_total • node_cpu_utilization • node_filesystem_utilization • node_memory_limit • node_memory_reserved_capacity • node_memory_utilization • node_memory_working_set • node_network_total_bytes • node_number_of_running_containers • node_number_of_running_pods • service_number_of_running_pods. Reference: https://amzn.to/2HFtHDt
  • 12. Amazon EKS and Prometheus. Why Prometheus? ● Community ● Number of integrations ● Ease of use. Why not Prometheus? ● You manage it yourself ● Complexity in large setups. Possibility: a hybrid approach. Use Prometheus to collect metrics that are exposed on /metrics endpoints, and send a subset of critical metrics to Amazon CloudWatch or a third-party solution.
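The hybrid approach on this slide can be sketched roughly as follows. This is a minimal illustration, not the speakers' implementation: the allowlist, the sample tuples, and the function name `to_cloudwatch` are all hypothetical. The dictionaries follow the `MetricName` / `Dimensions` / `Value` shape that CloudWatch's PutMetricData call accepts, but nothing is sent anywhere here.

```python
# Hypothetical allowlist of metrics considered critical enough to forward.
CRITICAL = {"http_requests_total"}


def to_cloudwatch(samples, allowlist):
    """Keep only allowlisted samples and reshape them as CloudWatch metric data.

    samples: iterable of (metric_name, labels_dict, value) tuples, as they
    might look after scraping a /metrics endpoint.
    """
    data = []
    for name, labels, value in samples:
        if name not in allowlist:
            continue  # everything else stays in Prometheus only
        data.append({
            "MetricName": name,
            # Prometheus labels map naturally onto CloudWatch dimensions.
            "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(labels.items())],
            "Value": float(value),
        })
    return data


# Hypothetical scraped samples.
scraped = [
    ("http_requests_total", {"code": "200", "path": "/api/user"}, 10),
    ("go_goroutines", {}, 42),
]
payload = to_cloudwatch(scraped, CRITICAL)
print(payload)
```

In a real setup the resulting list could be passed as the `MetricData` argument of a PutMetricData request together with a namespace of your choosing; that step is left out so the sketch runs without AWS credentials.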
  • 13.
  • 14. Hello! I am kakashi - Infra Lead @Umbo CV - Co-organizer @Golang Taipei Gathering
  • 15.
  • 16. Traditional Solutions vs. Umbo Light
  • 17. Agenda ● Why monitoring ● Umbo CV monitoring pipeline ● Prometheus: Why and What ● Prometheus with EKS ● Use cases
  • 18. Why monitoring
  • 19. Why monitoring ● Alerting ● Long-term trends
  • 20. Umbo CV Monitoring pipeline
  • 21. Monitoring types ● Infrastructure ● Application
  • 22. Application monitoring (diagram: containers and exporters on EC2 instances expose metrics on /metrics; a metrics store collects them and raises alerts)
  • 23. Prometheus: Why and What ● A graduated CNCF project. ● Handles multi-dimensional metrics. ● Performance: can ingest millions of samples per second. ● Powerful query language: PromQL. ● Built-in alerting tool and service discovery mechanism.
  • 24. Prometheus metrics (diagram: user requests reach EC2 instances, each exposing /metrics). Example sample: http_requests_total{code="200", path="/api/user"} 10, where http_requests_total is the metric name, code and path are the labels, and 10 is the value.
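To make the metric name / labels / value anatomy concrete, here is a small sketch that renders one sample as a line of the Prometheus text exposition format. The helper `format_sample` is a hypothetical name for illustration, and it skips the escaping of backslashes, quotes, and newlines that a real client library performs on label values.

```python
def format_sample(name, labels, value):
    """Render one sample as a line of the Prometheus text exposition format."""
    if labels:
        # Label values are always double-quoted strings in the text format.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


line = format_sample("http_requests_total", {"code": "200", "path": "/api/user"}, 10)
print(line)  # http_requests_total{code="200",path="/api/user"} 10
```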
  • 25. PromQL example
        Total requests / second:
          sum(rate(http_requests_total[5m]))
        Total 5xx requests / second:
          sum(rate(http_requests_total{code=~"5.*"}[5m]))
        Current fraction of errors across all instances (multiply by 100 for a percentage):
          sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m]))
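The arithmetic behind these queries can be sketched in a few lines. This is a deliberately simplified model, assuming two hypothetical counter series sampled at the start and end of a 5-minute window; the real `rate()` function also handles counter resets and extrapolates to the window boundaries, which this sketch does not.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples.

    Simplification: ignores counter resets and the window extrapolation
    that PromQL's rate() performs.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples over a 300-second window, keyed by status code.
series = {
    "200": [(0, 1000), (300, 3850)],
    "500": [(0, 10), (300, 160)],
}

# sum(rate(http_requests_total[5m]))
total = sum(simple_rate(s) for s in series.values())
# sum(rate(http_requests_total{code=~"5.*"}[5m]))
errors = sum(simple_rate(s) for code, s in series.items() if code.startswith("5"))
# Fraction of requests that failed; multiply by 100 for a percentage.
ratio = errors / total
print(total, errors, ratio)
```

Note that the division yields a fraction between 0 and 1, so a threshold meant as "5 percent" has to be written as 0.05, or the expression multiplied by 100.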
  • 26. Alerting rule
        alert: Percentage_Of_Errors_Is_High
        expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) * 100 > 5
        for: 5m
        labels:
          severity: critical
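The `for: 5m` clause means the expression must stay true for five minutes before the alert fires; until then the alert is only pending. A minimal sketch of that state logic, assuming a hypothetical list of evaluation results taken every 60 seconds:

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each evaluation.

    evaluations: list of (timestamp_seconds, condition_is_true) pairs.
    An alert is "pending" while the condition holds but has not yet held
    for `for_seconds`, "firing" once it has, and "inactive" otherwise.
    """
    states = []
    active_since = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Hypothetical evaluation results at a 60-second interval.
evals = [(0, False), (60, True), (120, True), (180, False),
         (240, True), (300, True), (360, True), (420, True),
         (480, True), (540, True)]
states = alert_states(evals, for_seconds=300)
print(states)
```

The brief error spike at 60 to 120 seconds never fires because it clears before five minutes pass; only the sustained condition starting at 240 seconds reaches the firing state.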
  • 27. Prometheus with EKS
  • 28. Prometheus ❤ EKS ● The monitoring system is critical. ● Running Prometheus on Kubernetes makes high availability (HA) easy to achieve. ● The Prometheus Operator makes it even easier ○ Automated management and upgrades of Prometheus. ○ Native k8s configuration.
  • 29. Install Prometheus on EKS with Helm
        1. Install the Prometheus Operator chart
          $ helm install --name prom --namespace monitoring stable/prometheus-operator
        2. Verify
          $ kubectl --namespace monitoring get pods
          NAME                                                  READY  STATUS   RESTARTS  AGE
          alertmanager-prom-op-alertmanager-0                   2/2    Running  0         1m
          prometheus-prom-op-prometheus-0                       3/3    Running  1         1m
          prom-op-grafana-5c59ddfb9d-zqfqt                      2/2    Running  0         2m
          prom-op-kube-state-metrics-76786cc9b4-8q4bj           1/1    Running  0         2m
          prom-op-prometheus-node-exporter-6jclc                1/1    Running  0         2m
          prom-op-prometheus-node-exporter-bxr49                1/1    Running  0         2m
          prom-op-prometheus-operato-operator-6cbf5d5cfd-z6fz4  1/1    Running  0         2m
  • 30. Prometheus Operator CRDs ● Prometheus & AlertManager ○ Define the Prometheus and AlertManager deployments. ● ServiceMonitor ○ Specifies how metrics from k8s services are scraped. ● PrometheusRule ○ Defines Prometheus alerting and recording rules that a Prometheus instance can load.
  • 31. EKS cluster monitoring
  • 32. EKS application monitoring through ServiceMonitor
        apiVersion: monitoring.coreos.com/v1
        kind: ServiceMonitor
        metadata:
          name: api-servicemonitor
        spec:
          selector:
            matchLabels:
              app: api-server
        (diagram: two services, one labeled app: api-server and one labeled app: api-server2)
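The `matchLabels` selector picks services by exact label equality, which is why only one of the two services in the diagram is scraped. A rough sketch of that matching rule, with hypothetical service names `api` and `api2` standing in for the two labeled services:

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in labels.

    This mirrors matchLabels semantics: exact equality on each listed key,
    with no prefix or substring matching.
    """
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical service names; the labels are the ones shown on the slide.
services = {
    "api": {"app": "api-server"},
    "api2": {"app": "api-server2"},
}
selector = {"app": "api-server"}

selected = [name for name, labels in services.items() if matches(selector, labels)]
print(selected)  # ['api']
```

Because matching is exact, `api-server2` is not selected even though its label value starts with `api-server`.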
  • 33. Alerting by PrometheusRule
        apiVersion: monitoring.coreos.com/v1
        kind: PrometheusRule
        spec:
          groups:
          - name: api.rules
            rules:
            - alert: Percentage_Of_Errors_Is_High
              expr: sum(rate(http_requests_total{code=~"5.*"}[5m])) / sum(rate(http_requests_total[5m])) * 100 > 5
              for: 5m
              labels:
                severity: critical
  • 34. Dashboard for EKS cluster
  • 35.
  • 36. Monitoring camera detection pipeline: Media Server → CV Detection → API Server
  • 37. Monitoring camera detection pipeline: Media Server (# of frames) → CV Detection (# of cv requests) → API Server (# of events)
  • 38. Service discovery: Media Server, CV Detection, and API Server are scraped through EC2 service discovery
  • 39. Service discovery: scraping Media Server, CV Detection, and API Server
        global:
          scrape_interval: 1s
          evaluation_interval: 1s
        scrape_configs:
          - job_name: 'node'
            ec2_sd_configs:
              - region: eu-east-1
                access_key: <ACCESS_KEY_HERE>
                secret_key: <SECRET_KEY_HERE>
                port: 9273
  • 40. Application metrics (Media Server, CV Detection, API Server)
        # of frames:
          ms_frames_total{env="production", service="ms", cameraId="ID-123456"} 1000
        # of cv requests:
          cvreqest_total{env="production", service="cv", cameraId="ID-123456"} 300
        # of events:
          event_total{env="production", service="cv", cameraId="ID-123456"} 5
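Counters like these keep one running total per unique label combination, so each camera gets its own time series. The sketch below illustrates the idea with the standard library only; the class `LabeledCounter` and the second camera ID are hypothetical, and a production service would normally use an official Prometheus client library instead.

```python
from collections import defaultdict


class LabeledCounter:
    """A monotonically increasing counter with one value per label set."""

    def __init__(self, name):
        self.name = name
        self._values = defaultdict(float)

    def inc(self, amount=1, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        # Sort the labels so the same label set always maps to the same key.
        self._values[tuple(sorted(labels.items()))] += amount

    def expose(self):
        """Render all series in the Prometheus text exposition format."""
        lines = []
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


frames = LabeledCounter("ms_frames_total")
for _ in range(3):  # three frames from the camera shown on the slide
    frames.inc(env="production", service="ms", cameraId="ID-123456")
# A second, hypothetical camera produces its own series.
frames.inc(5, env="production", service="ms", cameraId="ID-654321")
print(frames.expose())
```

Each distinct `cameraId` adds a series, so label cardinality grows with the number of cameras; that trade-off is worth keeping in mind when choosing labels.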
  • 41. Dashboard
  • 42. Alerting
        apiVersion: monitoring.coreos.com/v1
        kind: PrometheusRule
        spec:
          groups:
          - name: camera.rules
            rules:
            - alert: FpsLow
              annotations:
                message: "{{ $labels.cameraId }} fps is lower than 2fps"
              expr: sum by (cameraId) (rate(ms_frames_total{env="production", cameraId=~".+"}[10m])) < 2
              for: 30m
              labels:
                severity: critical
  • 43. Thank you!