How to improve database observability?
@Charles_JUDITH
About me
● Senior Site Reliability Engineer at Criteo
● Working on monitoring topics for a few years
● Currently providing the (open source) database service
at Criteo
● Previously worked on the observability stack at Criteo
● @Charles_JUDITH on Twitter
Agenda
1. Context
2. First iteration
3. Second iteration
4. Next steps
5. Resources
Context
Context
● Feedback from my experience at Criteo
● MariaDB/MySQL setup on multiple data centers
● Bare metal servers
Context
● Hidden issues (backup, usage, …)
● Incident resolution was based on vague information
● No alerting
● Dashboard with metrics
Goal
● Alerting
● No more hidden issues
● Dashboards for everyone
● An observable platform!
● The DBA team shouldn’t be a “blocker” for the users!
What is observability?
« OBSERVABILITY IS A MEASURE OF HOW WELL INTERNAL STATES OF A SYSTEM CAN BE INFERRED FROM KNOWLEDGE OF ITS EXTERNAL OUTPUTS. »
SOURCE: WIKIPEDIA
What I think about observability
● It’s not only about the tools
● It’s not a fancy name to say “monitoring”
● It’s more about “transparency”
Why does a system need to be observable?
Why does a system need to be observable?
● Is it working as expected by the users?
● To answer basic questions about your service/platform
● Increase the visibility for you and your users/customers
● Long-term trend analysis
● “If you can’t measure it, you can’t manage it”
Observability is fundamental for reliability
Analogy to Maslow’s hierarchy of needs
The observability effects
The observability effects
● Giving superpowers
● It’s like a roller coaster
● You need to be patient
Let’s go!
Metrics
How to start?
USE method
● USE was introduced by @brendangregg
● Utilization: disk, CPU usage, …
● Saturation: disk I/O
● Errors: network interface errors
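A minimal sketch of USE for a disk on Linux. It assumes the /proc/diskstats layout documented in the kernel's iostats file: major, minor, device name, then the counters. Field 10 is milliseconds spent doing I/O and field 11 is weighted milliseconds. Comparing two samples gives utilization (busy time over wall time) and saturation (average queue depth). This is the same arithmetic iostat uses for %util and average queue size. /proc/diskstats has no error counters, so the "E" of USE has to come from other sources such as kernel logs or SMART data.

```python
"""USE-style disk utilization and saturation from two /proc/diskstats samples.

Sketch only: assumes the Linux /proc/diskstats layout (major, minor, name,
then the per-device counters described in the kernel's iostats docs).
"""
from dataclasses import dataclass


@dataclass
class DiskSample:
    io_ticks_ms: int           # field 10: ms spent doing I/Os
    weighted_io_ticks_ms: int  # field 11: weighted ms spent doing I/Os


def parse_diskstats(text: str, device: str) -> DiskSample:
    """Extract the two counters we need for one device."""
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) >= 14 and tokens[2] == device:
            # tokens[0:3] are major, minor, name; field N is tokens[N + 2].
            return DiskSample(int(tokens[12]), int(tokens[13]))
    raise KeyError(f"device {device!r} not found")


def use_metrics(before: DiskSample, after: DiskSample, interval_ms: float) -> dict:
    """Utilization = busy time / wall time; saturation = average queue depth."""
    busy = after.io_ticks_ms - before.io_ticks_ms
    weighted = after.weighted_io_ticks_ms - before.weighted_io_ticks_ms
    return {
        "utilization_pct": 100.0 * busy / interval_ms,
        "avg_queue_depth": weighted / interval_ms,
    }
```

Usage: read /proc/diskstats twice, one second apart, and call `use_metrics(before, after, 1000.0)`. On SSD or NVMe devices that serve many I/Os in parallel, utilization can hit 100% before the device is really saturated. The queue depth is often the more telling of the two numbers.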
The four golden signals
● Introduced in the Google SRE book
● Latency: response time, queue/wait time
● Traffic: A measure of how much demand is being placed on the service
● Errors: The rate of requests that fail
● Saturation: How “full” the service is
RED method
● RED was introduced by @tom_wilkie
● (Request) Rate - the number of requests, per second, your services are serving.
● (Request) Errors - the number of failed requests per second.
● (Request) Duration - distributions of the amount of time each request takes.
● Subset of “The Four Golden Signals”
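A minimal sketch of how the three RED numbers can be derived from one time window of requests. The `Request` record is hypothetical; in practice these values come from a client library, a proxy or an exporter. Duration is reported as percentiles (nearest-rank method here) rather than an average, because the RED method describes it as a distribution.

```python
"""RED metrics (Rate, Errors, Duration) for one time window of requests.

Sketch only: the Request record is hypothetical; real numbers would come
from a client library, a proxy, or an exporter.
"""
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    failed: bool


def percentile(sorted_values: list, p: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_metrics(requests: list, window_s: float) -> dict:
    """Rate and Errors are per-second counts; Duration is a distribution."""
    if not requests:
        return {"rate_per_s": 0.0, "errors_per_s": 0.0, "p50_ms": None, "p99_ms": None}
    durations = sorted(r.duration_ms for r in requests)
    return {
        "rate_per_s": len(requests) / window_s,
        "errors_per_s": sum(r.failed for r in requests) / window_s,
        "p50_ms": percentile(durations, 50),
        "p99_ms": percentile(durations, 99),
    }
```

Keeping p99 next to p50 keeps slow outliers visible. An average would smooth them away.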
The seven golden signals
● CELT + USE introduced by @xaprb
● Concurrency: number of simultaneous requests
● Error rate
● Latency: response time
● Throughput: queries per second (QPS)
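A minimal sketch of computing the CELT part from two snapshots taken a known number of seconds apart. It assumes these inputs:

● `Questions` (a cumulative counter) and `Threads_running` (a gauge) from SHOW GLOBAL STATUS
● COUNT_STAR, SUM_ERRORS and SUM_TIMER_WAIT, summed over performance_schema.events_statements_summary_global_by_event_name

Performance Schema timers are in picoseconds. Performance Schema must be enabled, which is not the default on MariaDB. The lowercase dictionary keys are hypothetical names for those values.

```python
"""CELT signals (Concurrency, Error rate, Latency, Throughput) for MySQL/MariaDB.

Sketch only, assuming two snapshots taken `interval_s` seconds apart of:
  * SHOW GLOBAL STATUS: 'Questions' (cumulative) and 'Threads_running' (gauge)
  * performance_schema.events_statements_summary_global_by_event_name, summed
    over all rows: COUNT_STAR, SUM_ERRORS and SUM_TIMER_WAIT (picoseconds).
The lowercase keys are hypothetical names for those summed columns.
"""

PICOSECONDS_PER_MS = 1_000_000_000


def celt(before: dict, after: dict, interval_s: float) -> dict:
    statements = after["count_star"] - before["count_star"]
    errors = after["sum_errors"] - before["sum_errors"]
    wait_ps = after["sum_timer_wait"] - before["sum_timer_wait"]
    return {
        # Concurrency is a gauge: take the latest value, not a delta.
        "concurrency": after["Threads_running"],
        "error_rate": errors / statements if statements else 0.0,
        # Mean latency only; percentiles need digest- or histogram-level data.
        "avg_latency_ms": wait_ps / statements / PICOSECONDS_PER_MS if statements else 0.0,
        "throughput_qps": (after["Questions"] - before["Questions"]) / interval_s,
    }
```

The USE half of the seven signals comes from system metrics, as in the disk example for the USE method.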
CASE method
● CASE was introduced by @gphat
● Context-heavy
● Actionable
● Symptom-based
● Evaluated
Preferred approach
● The seven golden signals
● Good for measuring service quality
● System and application metrics are valuable in our case
How to collect the metrics?
● collectd
● Node exporter
● MySQLD exporter
● Python MySQL plugin for collectd
● A few others
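Node exporter and MySQLD exporter publish their metrics over HTTP in the Prometheus text format. To show what that output looks like, here is a deliberately tiny parser for unlabelled samples such as `mysql_up` or `mysql_global_status_threads_running`. It is a minimal sketch: it ignores labels, timestamps and escaping. For real use, the official `prometheus_client` library ships a parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
"""Read a few unlabelled values from an exporter's Prometheus text output.

Sketch only: ignores labelled samples, timestamps and escaping.
"""


def parse_unlabelled(text: str) -> dict:
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip blank lines, HELP/TYPE comments and labelled samples
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples
```

Usage: `parse_unlabelled(urllib.request.urlopen("http://localhost:9104/metrics").read().decode())`. Port 9104 is the usual MySQLD exporter default; check your own deployment.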
What to do with all these metrics?
● Pick some useful “indicators” like:
○ thread usage
○ service status
○ backup status, duration, size
○ replication lag
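A minimal sketch of turning these indicators into threshold alerts. All field names and default thresholds are hypothetical examples, not recommendations. The backup age is assumed to come from your own backup tooling. Each message includes the value and the threshold, so whoever is paged gets some context along with the alert.

```python
"""Threshold alerts on database indicators.

Sketch only: field names and default thresholds are hypothetical examples.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Indicators:
    service_up: bool
    threads_connected: int              # SHOW GLOBAL STATUS
    max_connections: int                # SHOW GLOBAL VARIABLES
    is_replica: bool
    replication_lag_s: Optional[float]  # Seconds_Behind_Master; None if NULL
    last_backup_age_h: float            # hypothetical: from backup tooling


def evaluate(ind: Indicators, max_thread_ratio: float = 0.8,
             max_lag_s: float = 300, max_backup_age_h: float = 26) -> list:
    alerts = []
    if not ind.service_up:
        alerts.append("service down")
    ratio = ind.threads_connected / ind.max_connections
    if ratio >= max_thread_ratio:
        alerts.append(f"thread usage {ratio:.0%} of max_connections (threshold {max_thread_ratio:.0%})")
    if ind.is_replica:
        if ind.replication_lag_s is None:
            alerts.append("replication not running (Seconds_Behind_Master is NULL)")
        elif ind.replication_lag_s > max_lag_s:
            alerts.append(f"replication lag {ind.replication_lag_s:.0f}s (threshold {max_lag_s:.0f}s)")
    if ind.last_backup_age_h > max_backup_age_h:
        alerts.append(f"last backup {ind.last_backup_age_h:.1f}h old (threshold {max_backup_age_h:.0f}h)")
    return alerts
```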
How to show/use those metrics?
Global overview
InnoDB metrics
Simple user view
USE dashboard
Disk partition full with tmp_table
Max connections reached
Database cleanup and OPTIMIZE TABLE
DATABASES EXPOSE LOTS OF METRICS ABOUT
THEIR STATUS, BUT MUCH LESS ABOUT THE
DETAILS OF THEIR WORKLOAD.
Current status
● We have system and database metrics
● Alerting
● Dashboards with metrics easily available for everyone
“WE THINK OUR DATABASE IS SLOW?”
“Last week we noticed that the database was slow.”
Logs
Logs
● Log all the SQL queries (general log)
● Install an agent to ship those logs with “custom fields”
● Make the logs available for our users
Logs
● Log all the SQL queries (general log)
● Install an agent to ship those logs with “custom fields”
● Configure MySQL/MariaDB to log the slow queries
● Use Rsyslog with a custom template!
● Make the logs available for our users
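One reason shipping slow queries is tricky is that each slow log entry spans several lines. It starts with `#` header lines, then an optional `SET timestamp=...;` line, then the statement, which may itself be multi-line. The slow log is enabled with the `slow_query_log` and `long_query_time` system variables. Below is a minimal sketch that reads only the `# Query_time:` header and the statement text, then folds each entry into one JSON event. The custom fields (here a hypothetical `cluster`) stand in for metadata a shipping agent or an rsyslog template could attach.

```python
"""Slow-log entries as JSON events with extra "custom fields".

Sketch only: a simplified parser for the MySQL/MariaDB slow query log that
reads the `# Query_time:` header and the statement and ignores other headers.
"""
import json
import re

HEADER_RE = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_log(log: str, **custom_fields) -> list:
    events, current = [], None
    for line in log.splitlines():
        match = HEADER_RE.match(line)
        if match:
            g = match.groupdict()
            current = {
                "query_time_s": float(g["query_time"]),
                "lock_time_s": float(g["lock_time"]),
                "rows_sent": int(g["rows_sent"]),
                "rows_examined": int(g["rows_examined"]),
                **custom_fields,
                "statement": "",
            }
            events.append(current)
        elif current is not None and not line.startswith(("#", "SET timestamp=")):
            # Multi-line statements are folded into a single line.
            current["statement"] = f'{current["statement"]} {line.strip()}'.strip()
    return events


def to_json_lines(events: list) -> str:
    """One JSON object per line, a format most log shippers can ingest."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)
```

Emitting one self-contained event per query makes the logs searchable by duration or rows examined without reassembling lines downstream.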
Happy customers
Benefits
● The DBA is not a blocker for the developers
● Better visibility on the database service
● Happy customers/developers/users
Conclusions
● Visibility and transparency
● Effective monitoring
● Shipping slow queries is not easy
● In that case, metrics and logs are a good combo, but we want more!
Next steps
● Continue to improve the SQL logging
● Make better use of sys_schema
● Metrics per database
● Publish the SLA
● Open source our probe for MySQL/MariaDB
Resources
https://github.com/CharlesJUDITH/database-observability-toolkit
Thank you!

OSMC 2019 | How to improve database Observability by Charles Judith
