Monitoring the Cloud
Gang Tao
What to Monitor?
• Object -> App, Service, Device, Infrastructure

• Health Status

• Metric (performance)
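One way to picture the three items above is as a small data model: a monitored object of some kind, carrying a health status and a list of metric samples. This is only an illustrative sketch. The names `Kind`, `Health`, `Metric`, and `MonitoredObject`, and the specific health levels, are assumptions made for the example and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Kind(Enum):
    """The object categories listed on the slide."""
    APP = "app"
    SERVICE = "service"
    DEVICE = "device"
    INFRASTRUCTURE = "infrastructure"


class Health(Enum):
    """Hypothetical health levels; real tools define their own."""
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


@dataclass
class Metric:
    """A single performance sample, e.g. latency or CPU usage."""
    name: str
    value: float
    unit: str = ""


@dataclass
class MonitoredObject:
    """Something we monitor: what it is, how healthy it is, how it performs."""
    name: str
    kind: Kind
    health: Health = Health.UNKNOWN
    metrics: List[Metric] = field(default_factory=list)

    def record(self, name: str, value: float, unit: str = "") -> None:
        """Append one metric sample to this object."""
        self.metrics.append(Metric(name, value, unit))


# Example: a service that is healthy and reports one latency sample.
api = MonitoredObject("checkout-api", Kind.SERVICE, Health.OK)
api.record("latency_ms", 42.0, "ms")
```

Keeping health status separate from metrics reflects the slide's distinction: status answers "is it up?", while metrics answer "how well is it performing?".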
Why Is Cloud Monitoring Different?
Changing World
Dynamic Cloud
Different orgs and people use different tools
Great-Grandfathers of Monitoring
IBM Tivoli

HP OpenView
Network Monitor
Nagios
Application Performance Monitoring (APM)
AppDynamics
Cisco announced its intent to acquire AppDynamics for $3.7 billion
New Relic
• Key host health metrics (CPU, memory, load, disk, network), collected every five seconds

• Disk resource utilization and time-to-full metrics at a glance

• Full search across all your hosts, making it easy to find vulnerable packages, or anything else, in seconds

• A real-time feed of all changes as they happen across all your hosts, including config changes

• Docker support, including the ability to track container performance by image, version, and other labels

• Correlation of your metrics and events to provide better context for your hosts' performance
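The "time to full" metric in the list above can be approximated with simple arithmetic. The sketch below uses linear extrapolation from two disk-usage samples. This is an assumption chosen for illustration, not a description of how New Relic computes the value, and the function name `time_to_full` and its parameters are hypothetical.

```python
from typing import Optional


def time_to_full(used_then: float, used_now: float,
                 capacity: float, interval_seconds: float) -> Optional[float]:
    """Estimate seconds until a disk fills, by linear extrapolation.

    used_then / used_now: usage at the start and end of the interval
    capacity: total disk size, in the same unit as the usage values
    interval_seconds: time elapsed between the two samples
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    growth_per_second = (used_now - used_then) / interval_seconds
    if growth_per_second <= 0:
        # Usage is flat or shrinking, so there is no projected fill time.
        return None
    return (capacity - used_now) / growth_per_second


# Example: usage grew from 400 GB to 410 GB in one hour on a 500 GB disk.
# 90 GB remain at 10 GB/hour, so the estimate is about 9 hours.
seconds_left = time_to_full(400.0, 410.0, 500.0, 3600.0)
```

Two samples make for a noisy estimate. A production system would more likely fit a trend over many samples, but the idea is the same: remaining capacity divided by growth rate.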
New Relic AWS Monitoring
Metrics DB
Prometheus
Service Discovery
Consul
Consul Sample Architecture
Why not Splunk?
Some Requirements
• Flexible/Extensible

• Scalable

• Adapts quickly to change

• Lightweight

• Easy to integrate with automation tools
