Monitoring Tools Showdown - AWS CloudWatch
- 2. 2018/05/26 @ DevOps Taiwan
Monitoring Tools: AWS CloudWatch
Rick Hwang @ 91APP
2018/05/26
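- 3.
Jack-of-all-trades @ 91APP
We are short of good people; come and talk to us!
Open roles: Dev + SRE + IT
Cloud / AWS / GCP
DevOps / SRE
Distributed Systems
Business management
Rick Hwang
https://www.gtcafe.com
Music: guitar, keyboards, arranging
Philosophy, science fiction, Jin Yong
Chit-chat and talking nonsense
Done it all
FB - SRE Taiwan volunteer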
- 7.
Considerations for Monitoring Tools
Goal strategy:
● Feedback and Actions
● Observability and Monitoring
● Who Needs the Metrics?
● Latency: Realtime or Batch
● Cost Efficiency
Execution strategy:
● Software Engineering
● Services, NOT Servers
● Event-driven
● Programmable
● Configurable
- 8.
Actions pipeline (data pipeline): Collect → Store → Analyze
Observability: observing, i.e. measuring and quantifying (analogy: a weather bureau)
Monitoring: governing and controlling (analogy: a government)
- 9.
Who looks at which metrics?
● Audiences: Boss, Managers, Developers, Administrators, Network, Security
● Business (EC, IoT, Banking): Login / Logon, Shopping Cart, User Sessions, Device Sessions, Inventory / Stock, eDM / SMS, Push, Shipping QTY, GA, GMV, ROI, Tracking
● Application Servers / Services: Tomcat / IIS, Nginx / HAProxy, RDBMS / NoSQL, JVM Heap Size, Node.js, Task Queue, SQS / Kafka, Cache / CDN, HTTP Requests, HTTP 4XXs / 5XXs, LB Latency
● System / Virtual Machine: CPU Utilization, Disk I/O, Disk IOPS / Throughput, Network I/O, Memory Utilization, Disk Usage, CPU Credit, System Check, Instance Check
● Network / Infrastructure / Security: Traffic Flow, Network ACL, Firewall, AD/DC, LDAP, IAM, AAA, DNS, SSL
- 10.
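Metrics the whole company looks at: standardize them as much as possible!
Metrics On-Call needs to see: structure them as much as possible!
System resources watched by on-duty staff: automate them as much as possible!
Security and Infra, take note: GDPR / APT are scary!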
- 13.
Why AWS CloudWatch
● Serverless Monitoring System
● Event-driven → Lambda
● Managed Storage
● Programmable and Automatable
● Realtime and Backup
● CloudWatch covers "Basic Monitoring" needs
● No need to monitor the monitoring system
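The "Event-driven → Lambda" point can be made concrete with a small sketch. It assumes a CloudWatch alarm publishes to an SNS topic that invokes a Lambda function, and that the SNS message body is the alarm's JSON notification with `AlarmName`, `NewStateValue` and `NewStateReason` fields. The function names `summarize_alarm` and `handler` are hypothetical, and the print statement stands in for whatever action a real handler would take.

```python
import json


def summarize_alarm(event):
    """Extract alarm details from an SNS-wrapped CloudWatch alarm event."""
    summaries = []
    for record in event.get("Records", []):
        # SNS delivers the alarm notification as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        summaries.append({
            "alarm": message["AlarmName"],
            "state": message["NewStateValue"],
            "reason": message.get("NewStateReason", ""),
        })
    return summaries


def handler(event, context):
    """Lambda entry point: act only on alarms that entered the ALARM state."""
    firing = [s for s in summarize_alarm(event) if s["state"] == "ALARM"]
    for s in firing:
        # Placeholder action (assumption): a real handler might restart a
        # service, scale a group, or notify the on-call engineer.
        print(f"ALARM {s['alarm']}: {s['reason']}")
    return {"handled": len(firing)}
```

Because the handler only reads the event it is given, it can be exercised locally with a hand-built event before being deployed.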
- 14.
EC2 Instances → Log Shipper → Logs
Log Groups: Log Stream A, Log Stream B, Log Stream C ... Log Stream N
Log source: /var/log/app/*.log
Filters: [ts, hostname, scope=NGX, tcp_all, tcp_time_wait, tcp_established, ...]
2017-06-11T08:45:01 app1 NGX 47 0 47 0 0 0
2017-06-11T08:45:01 app2 NGX 52 0 52 0 0 0
2017-06-11T08:46:01 app1 NGX 53 0 52 0 0 0
2017-06-11T08:46:01 app2 NGX 52 0 51 0 0 0
2017-06-11T08:47:01 app1 NGX 53 0 53 0 0 0
2017-06-11T08:47:01 app2 NGX 53 0 53 0 0 0
2017-06-11T08:48:01 app1 NGX 59 0 59 0 0 0
2017-06-11T08:48:01 app2 NGX 52 0 51 0 0 0
2017-06-11T08:49:01 app1 NGX 48 0 48 0 0 0
Filters → Metrics → Alarms, Dashboard
Alarms → SNS Topics → Push → Lambda
Export → S3
Streaming → Amazon ES, Lambda
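To show what the filter pattern on this slide does to each log line, here is a local sketch in Python. It is not how CloudWatch Logs implements metric filters; it only mimics the idea of naming space-delimited columns and keeping lines where the scope is NGX. The helper names `parse_line` and `max_by_host` are hypothetical, the numeric columns are assumed to be integers, and the columns hidden behind "..." in the pattern are left unnamed.

```python
FIELDS = ["ts", "hostname", "scope", "tcp_all", "tcp_time_wait", "tcp_established"]


def parse_line(line, scope="NGX"):
    """Split one space-delimited log line into named fields, or return None."""
    tokens = line.split()
    if len(tokens) < len(FIELDS):
        return None
    record = dict(zip(FIELDS, tokens))
    if record["scope"] != scope:
        return None  # mirrors the scope=NGX condition in the filter pattern
    for name in FIELDS[3:]:
        record[name] = int(record[name])
    # Columns behind "..." in the pattern are kept but stay unnamed.
    record["rest"] = [int(t) for t in tokens[len(FIELDS):]]
    return record


def max_by_host(lines, field):
    """Return the highest value of `field` seen for each hostname."""
    peaks = {}
    for line in lines:
        record = parse_line(line)
        if record is not None:
            host = record["hostname"]
            peaks[host] = max(peaks.get(host, 0), record[field])
    return peaks


if __name__ == "__main__":
    sample = [
        "2017-06-11T08:45:01 app1 NGX 47 0 47 0 0 0",
        "2017-06-11T08:48:01 app1 NGX 59 0 59 0 0 0",
        "2017-06-11T08:48:01 app2 NGX 52 0 51 0 0 0",
    ]
    print(max_by_host(sample, "tcp_established"))
```

Once a column has a name, it can be turned into a metric value, which is what lets alarms and dashboards be driven straight from the log stream.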
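- 16.
Complementary solutions (scenarios)
● Analysis: ELK
○ Scenario: Partial Realtime (specific cases)
○ Architecture: complex and high-end
○ Frequency: real time
○ Cost: very expensive (don't ask, it's scary!)
○ Use: aggregation, colorful charts
● Log backup: Kinesis Firehose + S3 + Glacier
○ Scenario: Hourly, Daily (general case)
○ Architecture: sharded processing, performance, ETL
○ Frequency: every minute, quarter hour, hour, or day
○ Cost: a steady trickle (but still a cost)
○ Use: Auditing, Compliance
● Athena: the AWS counterpart to BigQuery
○ Scenario: On-Demand (general case)
○ Architecture: just use it
○ Use: reports, analysis
○ Frequency: weekly, monthly, quarterly, yearly
○ Cost: very low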
- 17.
Actions pipeline: Collect → Store → Analyze
Tools shown across the pipeline: CW Agent, CW Logs, Kinesis, Lambda, S3, Elasticsearch, Athena, Grafana, Kibana, Dashboard
Observability: observing, i.e. measuring and quantifying
Monitoring: governing and controlling
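- 21.
Which metrics? For whom? To do what?
Boss
Managers
Developers
Administrators
Network
Security
Metrics the whole company looks at: standardize them as much as possible!
Metrics On-Call needs to see: structure them as much as possible!
System resources watched by on-duty staff: automate them as much as possible!
Security and Infra, take note: GDPR / APT are scary!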
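- 22.
Why not choose other monitoring tools?
● We don't want to build and maintain our own machines
● However well it is built, a monitoring system is still just a cost
● A monitoring system is not a Big Data or AI project
● We don't want to maintain a Storage Service
Related talk: Ops as Code using Serverless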
- 24.
(Best?) Practice
● Make good use of SaaS, such as AWS CloudWatch or GCP Stackdriver
● Plan for deployment: make settings Configurable and support cross-region deployment
● Use a structured log format (csv or json) so that logs can be queried and automated
● Design Health Checks
● Use a Big Data Solution for Log Query needs, such as AWS Athena or GCP BigQuery
● Send logs through a Shipper (awslogs, statsd, collectd, fluentd, telegraf ... ) and at the same time to
○ S3 for backup, to meet audit requirements
● High-volume Log Streaming data needs a Queue to help
○ AWS Kinesis Firehose
○ GCP Pub/Sub
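The structured-log recommendation can be sketched with Python's standard `logging` module. This is one possible approach, not the speaker's setup: the `JsonFormatter` class, the `build_logger` helper, the `fields` key passed through `extra`, and the field names in the output are all assumptions chosen for illustration.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra key/value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, ensure_ascii=False)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("app1")
    log.info("tcp stats", extra={"fields": {"scope": "NGX", "tcp_established": 47}})
```

With one JSON object per line, a shipper can forward the file unchanged and query tools such as Athena or BigQuery can address fields by name instead of by column position.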
- 26.
Considerations for Monitoring Tools
Goal strategy:
● Feedback and Actions
● Observability and Monitoring
● Who Needs the Metrics?
● Latency: Realtime or Batch
● Cost Efficiency
Execution strategy:
● Software Engineering
● Services, NOT Servers
● Event-driven
● Programmable
● Configurable
- 28.
Further Reading
● A Brief Look at System Monitoring and Using CloudWatch
● What is monitoring?
● Monitoring vs Observability
● Ops as Code using Serverless
● TED: How Great Leaders Inspire Action - by Simon Sinek