Álvaro Lobato
Observability team lead
March 2019
Logging, Metrics, and APM: The Operations Trifecta
Logs
Metrics
APM
Benefits of Logs + Metrics + APM in one stack
Unified Dashboards
Same UI for KPI summaries and root cause analysis
Unified Alerting
Trigger off any operational data to provide unified SLA monitoring
Unified Machine Learning
Correlate multiple data sources for more intelligent anomaly detection
Operational gains
Single technology for operational data saves on administrative costs
Elastic Stack for logs
Metrics vs Logs
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
For each event, print out what happened.
Logs are chronological records of events
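Turning raw lines like the three above into fields is the parsing step that logging modules automate. As a minimal sketch, assuming the lines follow the Apache Common Log Format, a regular expression can split each event into typed fields. The pattern and the `parse_clf` helper are illustrative, not part of any Elastic API.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

LINES = [
    '64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291',
    '64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352',
    '64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253',
]


def parse_clf(line):
    """Turn one access-log line into a structured event (dict), or None if it doesn't match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert strings to real types so events can be filtered and aggregated.
    event["timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


if __name__ == "__main__":
    for line in LINES:
        e = parse_clf(line)
        print(e["timestamp"].isoformat(), e["method"], e["path"], e["status"], e["bytes"])
```

Once each line is a set of typed fields, questions like "how many 404s per hour" become queries instead of text searches.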
Making logging more turnkey with ‘modules’
• Turnkey experience for specific data types
• Data to dashboard in just one step
• Automated parsing and enrichment
• Default dashboards, alerts, ML jobs
Logging modules
System
• Linux / MacOS
• Windows Events
Containers
• Docker
• Kubernetes
Databases
• MySQL
• PostgreSQL
Queues
• Kafka
• Redis
Web servers
• Apache
• Nginx
Audit data
• Filesystem
• System calls
Winlogbeat · Filebeat · Auditbeat
Infrastructure · Applications
Log File Import
Automatic Structure Discovery
Ad-hoc log search and visualization
Kibana Discover, Visualize, Dashboard
Elastic Stack for metrics
Metrics vs Logs
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
For each event, print out what happened.
Logs are chronological records of events
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
Every x minutes, measure the CPU load and print it out, and annotate with meta-data.
Metrics are periodic measurements of numeric KPIs
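Because every metric row has the same shape, it parses into fixed numeric columns that are cheap to aggregate. The slide does not label the columns; this sketch assumes they follow the `sar -u` layout (%user, %nice, %system, %iowait, %steal, %idle), which is consistent with the six values in each row adding up to roughly 100, followed by host, container, and region metadata. `CpuSample`, `parse_sample`, and `busy_pct` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

# ASSUMPTION: numeric columns follow the `sar -u` layout; the slide does not name them.
# The last three columns are the metadata annotations (host, container, region).
CPU_FIELDS = ("user", "nice", "system", "iowait", "steal", "idle")

RAW_ROWS = [
    "07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA",
    "07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB",
    "07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC",
]


@dataclass
class CpuSample:
    timestamp: datetime
    cpu: str    # "all" = aggregated over every CPU
    pct: dict   # CPU_FIELDS -> percentage
    host: str
    container: str
    region: str


def parse_sample(row):
    """Split one whitespace-separated row into a typed sample."""
    parts = row.split()
    ts = datetime.strptime(f"{parts[0]} {parts[1]}", "%d/%b/%Y %H:%M:%S")
    values = [float(v) for v in parts[3:9]]
    host, container, region = parts[9:12]
    return CpuSample(ts, parts[2], dict(zip(CPU_FIELDS, values)), host, container, region)


def busy_pct(sample):
    """Share of CPU time that was not idle, rounded to two decimals."""
    return round(100.0 - sample.pct["idle"], 2)


SAMPLES = [parse_sample(r) for r in RAW_ROWS]

if __name__ == "__main__":
    for s in SAMPLES:
        print(s.timestamp.isoformat(), s.host, s.region, busy_pct(s))
    print("mean %user:", round(mean(s.pct["user"] for s in SAMPLES), 2))
```

The metadata columns are what make the numbers sliceable: the same average can be grouped by host, container, or region without re-reading any text.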
Elasticsearch for search and numerical analytics
• Inverted index for full-text search
• Columnar store for structured data
• BKD trees for numerical operations
• Rollups save space
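The list names the data structures without showing them. As a toy sketch only, here are the two ideas at either end of the list: an inverted index maps each term to the documents that contain it, and a rollup replaces raw samples with per-interval summaries. Lucene's real postings lists, BKD trees, and doc values are far more elaborate, and `rollup` is a generic illustration, not Elasticsearch's rollup API; all names here are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean


def tokenize(text):
    """Lowercase alphanumeric terms; a stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def rollup(samples, bucket_seconds):
    """Collapse (epoch_seconds, value) samples into per-bucket min/max/avg/count."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {"min": min(vs), "max": max(vs), "avg": mean(vs), "count": len(vs)}
        for start, vs in sorted(buckets.items())
    }


DOCS = {
    1: "GET /mailman/listinfo/hsdivision",
    2: "POST /twiki/bin/view/TWiki/WikiSyntax",
    3: "GET /twiki/bin/view/Main/DCCAndPostFix",
}

if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(sorted(search_all(index, "twiki view")))
    print(rollup([(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0)], 60))
```

A search touches only the postings for its terms rather than scanning every document, and a rollup keeps a few numbers per interval instead of every raw sample, which is where the space saving comes from.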
Elastic Stack as a Metrics Solution
Metrics modules
System
• Linux
• MacOS
• Windows
• Perfmon
Infrastructure
Cloud
• AWS
• GCP
• Azure
• DigitalOcean
• Alibaba
Containers
• Docker
• Kubernetes
Virtualization
• vSphere
Packetbeat · Metricbeat
Network
• Netflow
• Packets
• TLS Envelope
Storage
• Ceph
Heartbeat
Applications
Datastores
• MySQL
• PostgreSQL
• MongoDB
• Couchbase
• Aerospike
• Graphite
Web servers
• Apache
• Nginx
Other
• HAProxy
• Zookeeper
Queues
• Kafka
• Redis
• RabbitMQ
Caches
• Memcached
Uptime
• Heartbeat
Custom apps
• JMX/Jolokia
• PHP-FPM
• Golang
Packetbeat · Metricbeat · Heartbeat
Visualizing time series data
Time Series Visual Builder
Visualizing time series data
Annotations
Elastic Stack for APM
What is APM?
Example
08:32:10 Request "/api/checkout"
08:32:11 Response "/api/checkout 500 ERROR"
What is APM?
Example
08:32:10 Request "/api/products/top"
08:32:17 Response "/api/products/top 200 OK"
7 seconds - zZzzZZz
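Both examples come down to recording each request's duration and outcome. A minimal sketch of that idea, assuming a hypothetical in-memory `RECORDED` list in place of a real agent's reporting pipeline: a decorator times a handler and records whether it succeeded. Real Elastic APM agents instrument frameworks automatically; `traced` is a made-up helper, not their API.

```python
import functools
import time

# Hypothetical stand-in for an agent's reporting queue; a real agent would
# batch these events and send them to apm-server.
RECORDED = []


def traced(name):
    """Record name, duration, and outcome of every call, like an APM transaction."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "failure"
                raise  # the caller still sees the error; we only observe it
            finally:
                RECORDED.append({
                    "name": name,
                    "duration_ms": (time.perf_counter() - start) * 1000.0,
                    "outcome": outcome,
                })
        return wrapper
    return decorator


@traced("/api/products/top")
def top_products():
    time.sleep(0.05)  # stand-in for a slow backend call
    return ["widget", "gadget"]


@traced("/api/checkout")
def checkout():
    raise RuntimeError("payment backend unavailable")  # stand-in for the 500 error


if __name__ == "__main__":
    top_products()
    try:
        checkout()
    except RuntimeError:
        pass
    for event in RECORDED:
        print(f'{event["name"]}: {event["outcome"]} in {event["duration_ms"]:.1f} ms')
```

The `finally` clause ensures a record is written on both the success and the error path, so slow requests and failed requests show up in the same data.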
How does APM work?
• Agents: run in the browser and in each web server
• Data processor: apm-server
• Data storage: elasticsearch
• UI: kibana
Elastic APM
APM adds end-user experience and application-level monitoring to the stack
• Focuses on search experience on top of APM data
• ‘Just another index’ in Elastic Stack
Language support
● Python
● Node.js
● Ruby
● RUM (Real User Monitoring, JavaScript in the browser)
● Java
● Go
● .NET (in dev)
APM is another index in Elasticsearch
Need another visualization? Build a dashboard; no need to wait for your vendor.
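Because APM data is just documents in Elasticsearch, a custom view is one search request away. The sketch below builds an ordinary Query DSL body that lists the endpoints with the highest average transaction duration over the last hour. The `apm-*` index pattern and the field names `processor.event`, `transaction.name`, and `transaction.duration.us` are assumptions about the APM index schema; check them against your own mapping before relying on them.

```python
import json


def slowest_endpoints_query(size=5):
    """Query DSL body for `GET apm-*/_search`: top endpoints by average duration."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    # ASSUMPTION: transactions are tagged with processor.event = "transaction"
                    {"term": {"processor.event": "transaction"}},
                    {"range": {"@timestamp": {"gte": "now-1h"}}},
                ]
            }
        },
        "aggs": {
            "by_endpoint": {
                "terms": {
                    "field": "transaction.name",
                    "size": size,
                    # Order buckets by the sub-aggregation below, slowest first.
                    "order": {"avg_duration_us": "desc"},
                },
                "aggs": {
                    "avg_duration_us": {"avg": {"field": "transaction.duration.us"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(slowest_endpoints_query(), indent=2))
```

The same body could back a Kibana visualization or an alert condition; nothing about it is specific to APM beyond the field names.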
Distributed Tracing: single transaction
• Transaction: from the incoming HTTP request to the response
• Spans: three operations recorded inside the transaction
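A transaction covers one request from arrival to response, and spans record the operations inside it. A minimal data-model sketch, using hypothetical span names and timings for the 7-second `/api/products/top` example, shows how spans explain where the time went. The classes are illustrative, not Elastic's document schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    start_ms: float      # offset from the start of the transaction
    duration_ms: float


@dataclass
class Transaction:
    name: str
    duration_ms: float
    spans: List[Span] = field(default_factory=list)

    def span_time_ms(self) -> float:
        return sum(s.duration_ms for s in self.spans)

    def self_time_ms(self) -> float:
        """Time not covered by any span; assumes the spans do not overlap."""
        return self.duration_ms - self.span_time_ms()

    def slowest_span(self) -> Optional[Span]:
        return max(self.spans, key=lambda s: s.duration_ms, default=None)


# Hypothetical breakdown of the slow "/api/products/top" request (7 s total).
slow_request = Transaction(
    name="/api/products/top",
    duration_ms=7000,
    spans=[
        Span("cache lookup", start_ms=5, duration_ms=3),
        Span("SELECT top products", start_ms=10, duration_ms=6500),
        Span("render JSON", start_ms=6600, duration_ms=200),
    ],
)

if __name__ == "__main__":
    print(slow_request.slowest_span().name, slow_request.self_time_ms())
```

With spans attached, "the request took 7 seconds" becomes "the database query took 6.5 of those seconds".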
Distributed tracing example
Trace A
• Transaction 1: three spans
• Transaction 2: one span
• Transaction 3: two spans
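Events reported by different services are stitched into one trace because each carries the same trace ID plus a pointer to its parent. The example does not show how the transactions connect, so this sketch assumes Transactions 2 and 3 were triggered by outgoing calls recorded as spans in Transaction 1. The event shape (`id`, `parent_id`, `trace_id`, `kind`) and the helper names are hypothetical.

```python
from collections import defaultdict


def event(id_, parent_id, kind, name, trace_id="A"):
    return {"id": id_, "parent_id": parent_id, "kind": kind, "name": name, "trace_id": trace_id}


# Hypothetical flat events, as three services might report them for Trace A.
EVENTS = [
    event("tx1", None, "transaction", "Transaction 1"),
    event("s1.1", "tx1", "span", "Span 1.1"),
    event("s1.2", "tx1", "span", "Span 1.2"),
    event("s1.3", "tx1", "span", "Span 1.3"),
    event("tx2", "s1.2", "transaction", "Transaction 2"),  # ASSUMPTION: called via Span 1.2
    event("s2.1", "tx2", "span", "Span 2.1"),
    event("tx3", "s1.3", "transaction", "Transaction 3"),  # ASSUMPTION: called via Span 1.3
    event("s3.1", "tx3", "span", "Span 3.1"),
    event("s3.2", "tx3", "span", "Span 3.2"),
    event("txB", None, "transaction", "Unrelated request", trace_id="B"),
]


def build_tree(events, trace_id):
    """Group one trace's events by parent; events without a parent are roots."""
    roots, children = [], defaultdict(list)
    for e in events:
        if e["trace_id"] != trace_id:
            continue  # belongs to a different trace
        if e["parent_id"] is None:
            roots.append(e)
        else:
            children[e["parent_id"]].append(e)
    return roots, children


def render(roots, children):
    """Indented, depth-first listing of the trace."""
    lines = []

    def walk(node, depth):
        lines.append("  " * depth + f'{node["kind"]}: {node["name"]}')
        for child in children.get(node["id"], []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(*build_tree(EVENTS, "A"))))
```

Filtering on the trace ID first is what lets events from unrelated requests coexist in the same index without polluting the view.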
Distributed Tracing
Trace and map across multiple services
• See the end-to-end view and navigate to individual transactions
• Based on the notion of an end-to-end Trace ID across services
• Investigating compatibility with the OpenTracing API and aligning with the W3C Trace Context spec
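The end-to-end trace ID has to travel between services in request headers. A sketch of the W3C Trace Context `traceparent` header (version `00`, formatted as `version-traceid-parentid-flags` in lowercase hex) shows the mechanics: each service keeps the incoming trace ID and substitutes its own parent ID before calling downstream. Only version `00` is handled, and the function names are illustrative, not any agent's API.

```python
import re
import secrets
from typing import Optional

# version-traceid-parentid-flags, lowercase hex; this sketch accepts version 00 only.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _nonzero_hex(nbytes: int) -> str:
    """Random lowercase hex ID; the spec treats an all-zero ID as invalid."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system."""
    flags = "01" if sampled else "00"
    return f"00-{_nonzero_hex(16)}-{_nonzero_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    if version != "00" or int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags,
            "sampled": bool(int(flags, 16) & 0x01)}


def child_traceparent(incoming: str) -> str:
    """Header for a downstream call: same trace ID, this service's own span ID."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        return new_traceparent()  # invalid or missing context: start a new trace
    return f"00-{parsed['trace_id']}-{_nonzero_hex(8)}-{parsed['flags']}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(incoming))
    print(child_traceparent(incoming))
```

Because only the parent ID changes hop by hop, any service can look up the shared trace ID and the whole request path can be reassembled afterwards.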
DEMO
12:00 in London room
APM Group Discussion
Come to Speaker AMA!
Questions?
