TELEMETRY
KIM CHRISTENSEN
Senior Developer, Scrum master and Cloud TechLead at SimCorp
1. WHAT IS TELEMETRY?
“The science or process of collecting information about objects that are far away and sending the information somewhere electronically”
- Cambridge Dictionary
TELEMETRY
Enables you to answer questions like
◦ What features are used?
◦ How many requests are processed?
◦ What is the CPU and memory load?
◦ What is the queue length?
LOGGING IS NOT TELEMETRY
But they can be correlated
Logging is
◦ For diagnosing errors and code flows
◦ Used for in-depth investigations
Telemetry is
◦ Raw data
◦ Easy to process
◦ Foundation for alerting
WHY NOT USE LOGGING AS TELEMETRY?
◦ Logs use much more space than telemetry data
▫ Prometheus: Average 1.37 bytes per data point
▫ InfluxDB: Average 3 bytes per data point
◦ Logs must be post-processed before the values can be used
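As a rough illustration of both points, the sketch below pulls a latency value back out of a log line and compares the size of that line with the same measurement held as a raw sample. The log format, the `orders` service name and the 16-byte sample layout are assumptions made for this example; time-series databases compress samples well below this uncompressed size.

```python
import re
import struct

# Hypothetical log line; the format is an assumption for illustration.
log_line = "2024-01-01T12:00:00Z INFO orders request handled in 42 ms"

# Post-processing: the number has to be parsed back out of free text.
match = re.search(r"handled in (\d+) ms", log_line)
latency_ms = int(match.group(1))

# The same measurement as a raw sample: an 8-byte integer timestamp
# (Unix seconds) and an 8-byte float value, before any compression.
sample = struct.pack("<qd", 1704110400, float(latency_ms))

log_bytes = len(log_line.encode("utf-8"))
sample_bytes = len(sample)
print(f"log line: {log_bytes} bytes, raw sample: {sample_bytes} bytes")
```

Even without compression the numeric sample is several times smaller than the text line, and it needs no parsing before it can be graphed or alerted on.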
2. WHY COLLECT TELEMETRY?
WHY?
◦ Application performance
▫ Which parts are slow?
▫ Memory leaks
▫ Monitor external dependencies
◦ Business performance
▫ How do new features affect customers?
▫ Which parts of the system are used?
◦ Visibility
▫ Base decisions on facts
▫ Create reports on data
SAMPLE
3. HOW TO COLLECT TELEMETRY?
TYPES OF TELEMETRY
White-box
● How many times an action has been performed
● Processing speed
Gray-box
● Dynamic instrumentation of code
● HTTP server metrics
Black-box
● CPU load
● Memory load
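White-box telemetry is recorded by the application itself. A minimal sketch of the two white-box examples above (how often an action runs and how long it takes) could look like this; the `record_action` helper, the in-memory dictionaries and the `place_order` function are hypothetical names invented for the example, not part of any library.

```python
import time
from collections import defaultdict

# Hypothetical in-process registries; a real application would hand
# these values to a metrics library instead.
action_counts = defaultdict(int)
action_seconds = defaultdict(float)


def record_action(name, func, *args, **kwargs):
    """Run func and record one invocation plus its elapsed time."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        action_counts[name] += 1
        action_seconds[name] += time.perf_counter() - start


def place_order(quantity):
    """Stand-in for real business logic."""
    return quantity * 2


first_result = record_action("place_order", place_order, 3)
record_action("place_order", place_order, 5)
print(action_counts["place_order"], action_seconds["place_order"])
```

Black-box values such as CPU and memory load would instead be read from outside the application, for example by an agent on the host.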
WAYS TO COLLECT DATA
Pull
Push
WAYS TO COLLECT DATA - PUSH
◦ Pros
▫ Allows for real-time data
▫ Data can be sent before shutting down
▫ No external access to app is needed
◦ Cons
▫ All apps need to know the location of the storage
▫ Storage cannot easily be moved
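A minimal sketch of the push model, assuming a StatsD-style counter line (`name:value|c`) sent over UDP. The collector here is only a local socket standing in for real storage, and `push_counter` and the metric name are made up for the example. Note that the application has to be given the collector address, which is the main drawback listed above.

```python
import socket


def push_counter(sock, address, name, value):
    """Send one counter increment as a StatsD-style line over UDP."""
    payload = f"{name}:{value}|c".encode("ascii")
    sock.sendto(payload, address)
    return payload


# Stand-in collector bound to a free local port (assumption: real
# storage would run elsewhere and its address would come from config).
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))
collector.settimeout(2.0)
collector_address = collector.getsockname()

# The application side: it must know where the collector lives.
app_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
push_counter(app_socket, collector_address, "requests_processed", 1)

received, _sender = collector.recvfrom(1024)
print(received.decode("ascii"))

app_socket.close()
collector.close()
```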
WAYS TO COLLECT DATA - PULL
◦ Pros
▫ Apps don’t know the location of the
storage
▫ Storage can easily be moved
◦ Cons
▫ Cannot provide real-time data
▫ Data can be lost when shutting down
▫ External access to app is needed
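A minimal sketch of the pull model using only the Python standard library: the application exposes its current values on an HTTP endpoint and the collector fetches them on its own schedule. The `/metrics` path, the plain `name value` text format and the metric names are assumptions for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical current values held by the application.
METRICS = {"requests_processed": 7, "queue_length": 3}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        lines = [f"{name} {value}\n" for name, value in sorted(METRICS.items())]
        body = "".join(lines).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Application side: expose the endpoint; it knows nothing about storage.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Collector side: it needs network access to the app in order to scrape it.
connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/metrics")
scraped = connection.getresponse().read().decode("utf-8")
connection.close()

server.shutdown()
server.server_close()
print(scraped)
```

Anything that changes between two scrapes, or after the last scrape before shutdown, is never seen by the collector, which matches the cons listed above.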
AGGREGATION TYPES
Common aggregation types:
◦ Counter
▫ Incremental values
◦ Gauge
▫ Current value
◦ Histogram
▫ Group values in buckets
◦ Timer
▫ Measures time, usually as a combination of the types above
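The four types can be sketched in a few lines each. This is a simplified illustration, not the API of any of the tools named in this deck; the class and method names are assumptions, and the timer is shown as one possible combination of a histogram and a counter.

```python
import bisect
import time


class Counter:
    """A value that only goes up, e.g. requests processed."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount


class Gauge:
    """The current value of something, e.g. queue length."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; the last bucket is the overflow."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1


class Timer:
    """Measures elapsed time and feeds a histogram and a counter."""

    def __init__(self, histogram, counter):
        self.histogram = histogram
        self.counter = counter

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.histogram.observe(time.perf_counter() - self._start)
        self.counter.increment()
        return False


latencies = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latencies.observe(seconds)

calls = Counter()
durations = Histogram([0.1, 0.5, 1.0])
with Timer(durations, calls):
    sum(range(1000))

print(latencies.bucket_counts, calls.value)
```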
WHY AGGREGATE?
◦ Pros
▫ Less memory needed
▫ Uses less bandwidth
▫ Often raw values aren’t needed
◦ Cons
▫ Requires more computing
▫ Precision loss
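The trade-off can be seen in a small sketch with made-up latency values. Six raw measurements are reduced to four numbers, which is enough to recover the mean but not the median or the individual outlier.

```python
import statistics

# Hypothetical raw latency measurements in milliseconds.
raw_latencies_ms = [12, 15, 11, 240, 14, 13]

# Aggregated form: four numbers regardless of how many raw values arrive.
summary = {
    "count": len(raw_latencies_ms),
    "sum": sum(raw_latencies_ms),
    "min": min(raw_latencies_ms),
    "max": max(raw_latencies_ms),
}

# The mean survives aggregation.
mean_from_summary = summary["sum"] / summary["count"]

# The median needs the raw values; the summary alone cannot produce it.
median_from_raw = statistics.median(raw_latencies_ms)

print(summary, round(mean_from_summary, 1), median_from_raw)
```

Here the mean is pulled up by a single slow request while the median stays low, a distinction that is lost once only the summary is kept.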
4. WHAT TELEMETRY IS NOT
◦ Profiling
▫ Profiling is used during development
◦ Debugging
▫ Debugging is done during development
▫ Only in extreme cases is it done in production
◦ Logging
▫ Provides textual context
▫ Can be correlated with telemetry data
5. USING THE DATA
TELEMETRY IS REQUIRED FOR ALERTING AND TRENDING
ALERTING
Telemetry is the foundation of alerting
◦ What alerts do you need?
◦ When should the alert be raised?
◦ Is it a spike?
Data resolution needs to be fairly high
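One way to tell a spike from a sustained problem is to raise the alert only when the threshold is exceeded for several consecutive samples. The sketch below assumes that rule; the function name, the threshold and the sample values are made up for illustration. It also shows why resolution matters: with too few samples a short breach cannot be separated from a long one.

```python
def should_alert(samples, threshold, min_consecutive):
    """Return True if threshold is exceeded min_consecutive times in a row."""
    consecutive = 0
    for value in samples:
        if value > threshold:
            consecutive += 1
            if consecutive >= min_consecutive:
                return True
        else:
            consecutive = 0
    return False


# Hypothetical CPU load percentages, one sample per interval.
spike = [20, 95, 22, 21]
sustained = [20, 91, 93, 96, 40]

spike_alert = should_alert(spike, threshold=90, min_consecutive=3)
sustained_alert = should_alert(sustained, threshold=90, min_consecutive=3)
print(spike_alert, sustained_alert)
```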
TRENDING
◦ Has the performance degraded since the last version?
◦ Have users changed behaviour?
Data resolution doesn’t need to be high
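Because trending tolerates low resolution, older data can be downsampled before it is kept long term. A minimal sketch, assuming samples are `(timestamp_in_seconds, value)` pairs and that averaging per time bucket is acceptable; the `downsample` helper and the values are invented for the example.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-size time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical latency samples in milliseconds over two hours.
samples = [(0, 100.0), (600, 110.0), (1800, 120.0), (3600, 200.0), (5400, 220.0)]
hourly = downsample(samples, bucket_seconds=3600)
print(hourly)
```

Two hourly averages are enough to see that latency roughly doubled, even though the individual samples are gone.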
WHAT TO COLLECT?
Could be
◦ Version
◦ Machine ID
◦ User ID
◦ Memory and CPU consumption
◦ Customer ID
◦ Request path
◦ Latency
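Put together, one collected data point might carry the fields above as a small record. The sketch below is only one possible shape; the class name, field names, units and values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequestSample:
    """One telemetry data point for a handled request (hypothetical shape)."""

    version: str
    machine_id: str
    user_id: str
    customer_id: str
    request_path: str
    latency_ms: float
    memory_mb: float
    cpu_percent: float


sample = RequestSample(
    version="1.4.2",
    machine_id="web-01",
    user_id="u-17",
    customer_id="c-3",
    request_path="/orders",
    latency_ms=42.0,
    memory_mb=512.0,
    cpu_percent=37.5,
)

# Serialise for sending to whichever storage is in use.
encoded = json.dumps(asdict(sample), sort_keys=True)
print(encoded)
```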
WHAT TOOLS TO USE?
A lot of tools exist
◦ Prometheus (OSS)
◦ TICK stack (InfluxData, OSS)
◦ Okanshi (OSS)
◦ AppMetrics (OSS)
◦ AppDynamics
◦ Graphite (OSS)
◦ Grafana (OSS)
◦ and more...
SITE RELIABILITY BOOKS
Thanks!
ANY QUESTIONS?
You can find me at
@Xharze
kimworking@gmail.com
Telemetry - what and why
