Tanya Bragin
Sept 2018
Logging, Metrics, and APM: The Operations Trifecta
Logs
Metrics
APM
Benefits of Logs + Metrics + APM in one stack
Unified Dashboards
Same UI for KPI summaries and root cause analysis
Unified Alerting
Trigger off any operational data to provide unified SLA monitoring
Unified Machine Learning
Correlate multiple data sources for more intelligent anomaly detection
Operational gains
Single technology for operational data saves on administrative costs
Elastic Stack for logs
Metrics vs Logs
64.242.88.10 - - [07/Mar/2017:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] "POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352
64.242.88.10 - - [07/Mar/2017:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
For each event, print out what happened.
Logs are chronological records of events
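As a rough illustration of what "automated parsing" of such events involves, here is a minimal Python sketch that turns one access-log line into a structured event. The regular expression assumes the Common Log Format shown in the sample lines; the function name `parse_access_log` and the field names are hypothetical, not part of any Elastic module.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [timestamp] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_log(line):
    """Return a dict describing one log event, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert the raw strings into typed values that are easier to query.
    event["timestamp"] = datetime.strptime(event.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event

sample = ('64.242.88.10 - - [07/Mar/2017:16:11:58 -0800] '
          '"POST /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 404 7352')
event = parse_access_log(sample)
print(event["method"], event["path"], event["status"])
```

Once a line is structured like this, questions such as "how many 404s per hour?" become simple filters and aggregations rather than text searches.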
Making logging more turnkey with ‘modules’
• Turnkey experience for specific data types
• Data to dashboard in just one step
• Automated parsing and enrichment
• Default dashboards, alerts, ML jobs
Logging modules
System
• Linux / MacOS
• Windows Events
Containers
• Docker
• Kubernetes
Databases
• MySQL
• PostgreSQL
Queues
• Kafka
• Redis
Web servers
• Apache
• Nginx
Audit data
• Filesystem
• System calls
WINLOGBEAT FILEBEAT AUDITBEAT
Infrastructure Applications
Ad-hoc log search and visualization
Kibana Discover, Visualize, Dashboard
Hot/Warm architectures in EC / ECE
• One-click hot-warm deployments
• Shipped in EC in Aug 2018
• ECE support coming!
Elastic Stack for metrics
Metrics vs Logs
07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA
07/Mar/2017 16:20:00 all 2.56 0.00 0.69 1.05 0.04 95.66 server2 containerY regionB
07/Mar/2017 16:30:00 all 2.64 0.00 0.65 1.15 0.05 95.50 server2 containerZ regionC
Every x minutes, measure the CPU load, print it out, and annotate it with metadata.
Metrics are periodic measurements of numeric KPIs
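To make the "numeric measurement plus metadata" shape concrete, here is a small Python sketch that splits one of the sample metric lines into a document. The slide does not label the six numeric columns; the names in `CPU_FIELDS` are an assumption based on the sar-style layout, and `parse_metric` is a hypothetical helper.

```python
from datetime import datetime

# Assumed column names (sar-style CPU report); the slide does not label them.
CPU_FIELDS = ["user", "nice", "system", "iowait", "steal", "idle"]

def parse_metric(line):
    """Split one whitespace-separated metric line into values and metadata."""
    parts = line.split()
    date, time, cpu = parts[0], parts[1], parts[2]
    values = [float(v) for v in parts[3:9]]
    host, container, region = parts[9:12]
    return {
        "timestamp": datetime.strptime(f"{date} {time}", "%d/%b/%Y %H:%M:%S"),
        "cpu": cpu,
        "pct": dict(zip(CPU_FIELDS, values)),
        # Metadata is what lets you slice the same KPI by host, container, or region.
        "meta": {"host": host, "container": container, "region": region},
    }

doc = parse_metric(
    "07/Mar/2017 16:10:00 all 2.58 0.00 0.70 1.12 0.05 95.55 server1 containerX regionA"
)
print(doc["pct"]["idle"], doc["meta"]["host"])
```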
Evolution of Elasticsearch into Metrics Store
Elasticsearch for search and numerical analytics
• Inverted index for full-text search
• Columnar store for structured data
• BKD trees for numerical operations
• Rollups
Elasticsearch beginnings
Circa 2010
• Elasticsearch primarily used for application search
• Lucene data structure: Inverted index
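For readers unfamiliar with the term, an inverted index maps each term to the set of documents that contain it. The Python sketch below is a toy illustration of that idea only; it is not how Lucene implements it, and the sample documents and function names are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical product catalogue for an application-search use case.
docs = {1: "red running shoes", 2: "blue running jacket", 3: "red wool jacket"}
index = build_inverted_index(docs)
print(sorted(search(index, "red jacket")))
```

Looking up a term is a dictionary access rather than a scan over every document, which is why this structure suits full-text search.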
Elasticsearch evolving to support analytics
~ 2010 to 2014
• Elasticsearch 1.0 evolves to support a columnar store (built on top of Lucene “doc values”)
• Structured string and numerical data can be stored there for fast retrieval and summarization / analytics
https://www.elastic.co/blog/elasticsearch-as-a-column-store
Elasticsearch storage efficiencies
2016
• Elasticsearch 5.0 adds more data structures for efficiently storing and querying numbers (BKD Trees)
• These structures become the default storage for numerical and geospatial data in Elasticsearch
https://www.elastic.co/blog/searching-numb3rs-in-5.0
Figure panels: 1-Dimension, 2-Dimensions
Elasticsearch storage efficiencies
2017
• Elasticsearch 6.0 improves Lucene sparse values storage efficiency (a 41.5% reduction in Metricbeat index size)
https://www.elastic.co/blog/minimize-index-storage-size-elasticsearch-6-0
Rollup support for long-term retention
Added in Elasticsearch 6.3
https://www.elastic.co/blog/data-rollups-in-elasticsearch-you-know-for-saving-space
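The idea behind a rollup can be sketched in a few lines of plain Python: replace many raw samples with one summary per time bucket. This is a conceptual illustration only and does not use the Elasticsearch rollup API; the sample values, the hourly interval, and the `rollup_hourly` name are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def rollup_hourly(samples):
    """Summarize (timestamp, host, value) samples into one record per hour and host."""
    buckets = defaultdict(list)
    for ts, host, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, host)].append(value)
    # Keep only the aggregates needed for long-term trends; raw samples can be dropped.
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for key, values in buckets.items()
    }

# Illustrative samples, not real measurements.
samples = [
    (datetime(2017, 3, 7, 16, 10), "server1", 2.58),
    (datetime(2017, 3, 7, 16, 20), "server1", 2.56),
    (datetime(2017, 3, 7, 16, 30), "server1", 2.64),
    (datetime(2017, 3, 7, 17, 0), "server1", 3.00),
]
summary = rollup_hourly(samples)
for (hour, host), stats in sorted(summary.items()):
    print(hour, host, stats)
```

The trade-off is the usual one: storage shrinks with the bucket size, but detail finer than the bucket is no longer recoverable.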
DEMO
Elastic Stack as a Metrics Solution
Metrics modules
System
• Linux
• MacOS
• Windows
• Perfmon
Infrastructure
Cloud
• AWS
• GCP
• Azure
• DigitalOcean
• Alibaba
Containers
• Docker
• Kubernetes
Virtualization
• vSphere
PACKETBEAT METRICBEAT
Network
• Netflow
• Packets
• TLS Envelope
Storage
• Ceph
LOGSTASH HEARTBEAT
Applications
Datastores
• MySQL
• PostgreSQL
• MongoDB
• Couchbase
• Aerospike
• Graphite
Web servers
• Apache
• Nginx
Other
• HAProxy
• Zookeeper
Queues
• Kafka
• Redis
• RabbitMQ
Caches
• Memcached
Uptime
• Heartbeat
Custom apps
• JMX/Jolokia
• PHP-FPM
• Golang
Metrics modules
PACKETBEAT METRICBEAT LOGSTASH HEARTBEAT
Roadmap: New operational data sources
New Beats, Logstash inputs and modules
• Cloud Monitoring (Azure, Amazon, GCP, …)
• Security Analytics (Bro, Suricata, Sysmon, …)
Default actions for existing modules
• Machine Learning jobs for Docker/Kubernetes
• Default alerts for top 5 modules
Agentless Shippers
• Deploy as functions
• Ship data without needing to tend to infrastructure
Elastic Common Schema
Correlation between logs, metrics, and APM
Benefits
• Correlate data from different sources
• Ability to re-use analysis content
• Ability to re-use Elastic-provided content
Status
• Version 0.1 published: github.com/elastic/ecs
• Working with internal groups to validate
• Community feedback welcome!
Visualizing time series data
Time Series Visual Builder
Visualizing time series data
Annotations
Elastic Stack for APM
What is APM?
Example
08:32:10 Request "/api/checkout"
08:32:11 Response "/api/checkout 500 ERROR"
What is APM?
Example
08:32:10 Request "/api/products/top"
08:32:17 Response "/api/products/top 200 OK"
7 seconds - zZzzZZz
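The two examples boil down to pairing a request with its response and looking at the status and the elapsed time. A minimal Python sketch of that check follows; the 2-second threshold and the function names are hypothetical, and a real APM agent measures this inside the application rather than from timestamps like these.

```python
from datetime import datetime

SLOW_THRESHOLD_SECONDS = 2.0  # hypothetical threshold for "too slow"

def response_time_seconds(request_ts, response_ts):
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(response_ts, fmt) - datetime.strptime(request_ts, fmt)
    return delta.total_seconds()

def classify(request_ts, response_ts, status):
    """Label a request as 'error', 'slow', or 'ok'."""
    if status >= 500:
        return "error"
    if response_time_seconds(request_ts, response_ts) > SLOW_THRESHOLD_SECONDS:
        return "slow"
    return "ok"

checkout = classify("08:32:10", "08:32:11", 500)      # /api/checkout
top_products = classify("08:32:10", "08:32:17", 200)  # /api/products/top
print(checkout, top_products)
```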
How does APM work?
Architecture diagram: agents in browsers and web servers send data to apm-server (data processor), which stores it in elasticsearch (data storage); kibana provides the UI.
Elastic APM
APM adds end-user experience and application-level monitoring to the stack
• Focuses on search experience on top of APM data
• ‘Just another index’ in Elastic Stack
Language support
● Python
● Node.js
● Ruby (Beta)
● RUM (Beta)
● Java (Beta)
● Go (Beta)
Curated UI for APM
Combine custom workflow with freedom of search
Roadmap: Distributed Tracing
Trace and map across multiple services
• See the end-to-end view and navigate to individual transactions
• Based on the notion of an end-to-end Trace ID across services
• Investigating compatibility with OpenTracing API and aligning with W3C trace context spec
Distributed tracing
Single transaction
Diagram: one Transaction, running from HTTP request to response, containing three Spans.
Distributed tracing
Distributed tracing example
Trace A
• Transaction 1: Span, Span, Span
• Transaction 2: Span
• Transaction 3: Span, Span
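The trace, transaction, and span hierarchy can be modelled with a shared trace ID that ties together transactions recorded by different services. The Python sketch below mirrors the Trace A example (three transactions with three, one, and two spans); the class layout, service names, span names, and durations are illustrative assumptions, not the Elastic APM data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Span:
    name: str
    duration_ms: float

@dataclass
class Transaction:
    service: str
    trace_id: str  # the same ID is propagated to every service in the call chain
    spans: List[Span] = field(default_factory=list)

def group_by_trace(transactions: List[Transaction]) -> Dict[str, List[Transaction]]:
    """Collect transactions from all services under their shared trace ID."""
    traces: Dict[str, List[Transaction]] = {}
    for tx in transactions:
        traces.setdefault(tx.trace_id, []).append(tx)
    return traces

# Hypothetical services and spans.
transactions = [
    Transaction("frontend", "trace-a",
                [Span("render", 5), Span("GET /api", 40), Span("cache", 2)]),
    Transaction("api", "trace-a", [Span("SELECT", 30)]),
    Transaction("payments", "trace-a", [Span("auth", 8), Span("charge", 12)]),
    Transaction("api", "trace-b", []),
]
traces = group_by_trace(transactions)
span_counts = [len(tx.spans) for tx in traces["trace-a"]]
print(span_counts)
```

Grouping by trace ID is what gives the end-to-end view; each transaction still stands alone for drilling into a single service.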
APM is another index in Elasticsearch
Need another visualization? Build a dashboard, no need to wait for your vendor
DEMO
What now?
Try it yourself!
Come to Speaker AMA!
Questions?
