EH Monitoring System
Engineering Team
Minh Nguyen & Luong Vo
Before we start
- How do we answer questions like these?
+ Why is the system so slow?
+ Is everything working fine?
+ What is the main bottleneck in our system?
+ What happened at 10:00 AM this morning that made a lot of customers complain?
+ How long, on average, does a user wait before receiving a notification?
+ etc.
In short, we successfully built a system.
BUT WE HAVE NO IDEA HOW IT PERFORMS.
Observability
- Programmatically and continuously capture the state of a running system
- Analyze the captured data to extract the information the observer is interested in
- Detect abnormal behavior, notify the people responsible, and automatically take action to resolve the situation
- Archive the data in convenient forms that support future investigation and analysis
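The "detect and notify" step is often implemented as a threshold rule that must hold for a while before it fires, which is the idea behind the `for` clause in Prometheus alerting rules (alerts move through the states inactive, pending and firing). The following is a minimal, hypothetical Python sketch of that state machine; `ThresholdAlert`, the threshold and the sample values are illustrative assumptions, not part of the EH system.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Hypothetical alert: fires once `value > threshold` has held for
    `for_evaluations` consecutive evaluations (cf. the Prometheus `for` clause)."""

    threshold: float
    for_evaluations: int
    _breaches: int = field(default=0, init=False)

    def evaluate(self, value: float) -> str:
        """Return the alert state after one evaluation: inactive, pending or firing."""
        if value <= self.threshold:
            self._breaches = 0  # condition cleared, so reset the pending counter
            return "inactive"
        self._breaches += 1
        if self._breaches >= self.for_evaluations:
            return "firing"
        return "pending"


if __name__ == "__main__":
    # Hypothetical p99 latency samples in seconds, one per evaluation interval.
    alert = ThresholdAlert(threshold=0.5, for_evaluations=3)
    print([alert.evaluate(v) for v in [0.2, 0.7, 0.8, 0.9, 0.3]])
    # ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

Requiring several consecutive breaches trades a little detection delay for far fewer alerts on one-off spikes. In a Prometheus setup the rule is evaluated server-side and notification is handled by Alertmanager; this sketch only shows the state transitions.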
Pillars of Observability
Log Management
Distributed Tracing
Metrics Monitoring
Error Tracking
Pillars of Observability
Metrics Monitoring
We need a solution that offers
- Detailed statistics, both real-time and aggregated, about our microservices
- Alerting when usage peaks or incidents happen
- An easy way to instrument our microservices
- Support for a variety of metric types (counter, gauge, histogram, ...)
- Two-way integration with Kubernetes
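The metric types listed above behave differently: a counter only goes up, a gauge can go up and down, and a histogram sorts observations into buckets while also tracking their sum and count. Below is a minimal pure-Python sketch of those semantics, loosely modelled on Prometheus (cumulative `le` buckets plus `_sum` and `_count`); the class names and bucket bounds are illustrative assumptions, and a real service would use a client library instead. It also shows how a question like "average waiting time" can be answered from a histogram as sum divided by count.

```python
import bisect
import math


class Counter:
    """Monotonically increasing value, e.g. total notifications sent."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Value that can go up and down, e.g. jobs currently in a queue."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations into buckets and tracks their sum and count."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds) + [math.inf]  # implicit +Inf bucket
        self.bucket_counts = [0] * len(self.bounds)  # per-bucket, not cumulative
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bound >= value, i.e. the smallest bucket with le >= value.
        self.bucket_counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> list[tuple[float, int]]:
        """(upper bound, observations <= bound) pairs, as Prometheus exposes them."""
        pairs, running = [], 0
        for bound, n in zip(self.bounds, self.bucket_counts):
            running += n
            pairs.append((bound, running))
        return pairs

    def mean(self) -> float:
        return self.sum / self.count if self.count else 0.0


if __name__ == "__main__":
    wait_seconds = Histogram([0.5, 1.0, 2.0])  # hypothetical notification wait times
    for seconds in [0.25, 0.75, 1.0, 4.0]:
        wait_seconds.observe(seconds)
    print(wait_seconds.cumulative())  # [(0.5, 1), (1.0, 3), (2.0, 3), (inf, 4)]
    print(wait_seconds.mean())        # 1.5
```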
Demo time
Prometheus and Grafana
- Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
- Grafana is an open-source dashboard tool for data visualization.
- Together, they are the approach we selected to collect and display monitoring data.
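Prometheus collects metrics as plain text in its exposition format: `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` sample per line. The sketch below renders one metric family in that shape using only the standard library; the metric name `notifications_sent_total` and its `channel` label are hypothetical. Label values escape backslash, double quote and newline, and HELP text escapes backslash and newline, as the format requires.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_family(name: str, metric_type: str, help_text: str,
                  samples: dict[tuple[tuple[str, str], ...], float]) -> str:
    """Render one metric family; each sample key is a tuple of (label, value) pairs."""
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} {metric_type}"]
    for labels, value in sorted(samples.items()):
        if labels:
            body = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels)
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_family(
        "notifications_sent_total", "counter", "Notifications sent, by channel.",
        {(("channel", "email"),): 42, (("channel", "push"),): 7},
    ), end="")
    # # HELP notifications_sent_total Notifications sent, by channel.
    # # TYPE notifications_sent_total counter
    # notifications_sent_total{channel="email"} 42
    # notifications_sent_total{channel="push"} 7
```

Grafana does not read this text directly; it queries Prometheus, which stores the scraped samples as time series.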
Push Model
Diagram: the applications on Node 1 and Node 2 each send their metrics to the metrics collector on Node 3 via POST /metrics.
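In the push model each application sends its own metrics to the collector. A minimal stdlib-only sketch follows: a toy collector that keeps whatever is POSTed to `/metrics`, and a `push_metrics` helper for the application side. The `X-Instance` header used to tell senders apart, the in-memory `RECEIVED` store and the random local port are assumptions for illustration only; in a Prometheus setup, pushed metrics usually go to a dedicated component such as the Pushgateway rather than to Prometheus itself.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Latest payload per sending instance (hypothetical in-memory store).
RECEIVED: dict[str, str] = {}
LOCK = threading.Lock()


class CollectorHandler(BaseHTTPRequestHandler):
    """Toy collector: accepts POST /metrics and keeps the latest body per sender."""

    def do_POST(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        instance = self.headers.get("X-Instance", self.client_address[0])
        with LOCK:
            RECEIVED[instance] = body
        self.send_response(204)  # stored; nothing to return
        self.end_headers()

    def log_message(self, format, *args) -> None:  # keep the demo quiet
        pass


def push_metrics(collector_url: str, instance: str, payload: str) -> int:
    """Application side: POST the current metrics snapshot to the collector."""
    request = urllib.request.Request(
        collector_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4", "X-Instance": instance},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def start_collector() -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", 0), CollectorHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    collector = start_collector()
    url = f"http://127.0.0.1:{collector.server_address[1]}/metrics"
    push_metrics(url, "node-1", "jobs_processed_total 3\n")
    push_metrics(url, "node-2", "jobs_processed_total 5\n")
    print(RECEIVED)
    collector.shutdown()
    collector.server_close()
```

The trade-off is that every application must know where the collector lives, and a silent application is indistinguishable from a dead one.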
Pull Model
Diagram: the metrics collector on Node 3 scrapes the applications on Node 1 and Node 2 via GET /metrics.
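In the pull model the roles are reversed: each application exposes `GET /metrics`, and the collector scrapes it on its own schedule, which is how Prometheus collects data. The stdlib-only sketch below shows both halves; the `app_requests_total` counter and the single `scrape` call are hypothetical stand-ins for a client library and Prometheus's scrape loop. The parser only understands the unlabelled `name value` lines this toy app emits (no labels, no timestamps).

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application state reported by the /metrics endpoint.
REQUESTS_TOTAL = 0
LOCK = threading.Lock()


def handle_business_request() -> None:
    """Stand-in for real application work; it just bumps a counter."""
    global REQUESTS_TOTAL
    with LOCK:
        REQUESTS_TOTAL += 1


class MetricsHandler(BaseHTTPRequestHandler):
    """Application side: serve the current metrics on GET /metrics."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        with LOCK:
            body = (
                "# TYPE app_requests_total counter\n"
                f"app_requests_total {REQUESTS_TOTAL}\n"
            ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:  # keep the demo quiet
        pass


def scrape(target_url: str) -> dict[str, float]:
    """Collector side: GET the target and parse simple `name value` samples."""
    with urllib.request.urlopen(target_url, timeout=5) as response:
        text = response.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return samples


def start_app() -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    app = start_app()
    for _ in range(3):
        handle_business_request()
    print(scrape(f"http://127.0.0.1:{app.server_address[1]}/metrics"))
    app.shutdown()
    app.server_close()
```

Because the collector initiates every request, a failed scrape is itself a signal that the target is down, and applications do not need to know the collector's address.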
Pull Model and Sidecar Model
Diagram: on Node 1 and Node 2, the application runs alongside a metric server, and the two share the /tmp/monitoring directory; the metrics collector on Node 3 scrapes each metric server via GET /metrics.
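In the sidecar variant, the application only writes its numbers into a shared directory (`/tmp/monitoring` in the diagram), and a separate metric server on the same node reads that directory whenever the collector scrapes it. One plausible motivation, common with forking Ruby app servers, is that each worker process has its own memory, so a shared directory lets one server add up all workers' numbers; the official Prometheus Ruby client's `DirectFileStore` follows a similar file-per-process idea. The sketch below is a hypothetical illustration: the JSON file layout, the `worker_id` naming and the functions are assumptions, and the HTTP layer that would serve `render()` on `GET /metrics` is omitted.

```python
import json
import os
import tempfile
from pathlib import Path


def write_snapshot(directory: Path, worker_id: str, counters: dict[str, float]) -> None:
    """Application side: atomically publish one worker's counters as a JSON file.
    A real app might use os.getpid() as worker_id."""
    directory.mkdir(parents=True, exist_ok=True)
    final = directory / f"metrics_{worker_id}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(counters), encoding="utf-8")
    os.replace(tmp, final)  # atomic rename: the sidecar never sees a half-written file


def aggregate(directory: Path) -> dict[str, float]:
    """Sidecar side: sum counters across every worker's snapshot file."""
    totals: dict[str, float] = {}
    for path in sorted(directory.glob("metrics_*.json")):
        for name, value in json.loads(path.read_text(encoding="utf-8")).items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


def render(totals: dict[str, float]) -> str:
    """Format aggregated counters as `name value` lines for GET /metrics."""
    return "".join(f"{name} {value}\n" for name, value in sorted(totals.items()))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:  # stands in for /tmp/monitoring
        shared = Path(tmp)
        write_snapshot(shared, "worker-1", {"jobs_processed_total": 3})
        write_snapshot(shared, "worker-2", {"jobs_processed_total": 5, "jobs_failed_total": 1})
        print(render(aggregate(shared)), end="")
        # jobs_failed_total 1.0
        # jobs_processed_total 8.0
```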
EhMonitoring gem
- This gem makes it easy to monitor your service.
- It abstracts away much of the infrastructure layer through a set of helpers.
- Built-in native support for gRPC and Kafka, with Sidekiq support coming soon.
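The gem's actual API is not shown in these slides, so the following is only a hypothetical Python illustration of the kind of helper described: a decorator that wraps a handler (for example a gRPC method or a Kafka consumer callback) and records a call count by outcome plus total duration, so service code never touches metrics directly. `instrumented`, `CALLS`, `DURATION_SECONDS` and `send_notification` are invented names.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical in-process metric stores.
CALLS: defaultdict[tuple[str, str], int] = defaultdict(int)  # (handler, outcome) -> count
DURATION_SECONDS: defaultdict[str, float] = defaultdict(float)  # handler -> total seconds


def instrumented(name: str) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Wrap a handler so every call is counted by outcome and timed."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            start = time.perf_counter()
            outcome = "error"  # overwritten only if the handler returns normally
            try:
                result = func(*args, **kwargs)
                outcome = "ok"
                return result
            finally:
                CALLS[(name, outcome)] += 1
                DURATION_SECONDS[name] += time.perf_counter() - start

        return wrapper

    return decorator


@instrumented("send_notification")
def send_notification(user_id: int) -> str:
    """Hypothetical business handler."""
    if user_id < 0:
        raise ValueError("invalid user id")
    return f"sent to {user_id}"


if __name__ == "__main__":
    send_notification(1)
    try:
        send_notification(-1)
    except ValueError:
        pass
    print(dict(CALLS))  # {('send_notification', 'ok'): 1, ('send_notification', 'error'): 1}
```

The exception still propagates to the caller; the wrapper only observes it, which keeps instrumentation from changing the service's behavior.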
Service owners are responsible for their children: each team monitors the services it owns.
What’s next?
- Support other common libraries, such as Sidekiq
- Apply EhMonitoring to all services
- Drop Instana and build our own tracing system
Reference
https://github.com/Thinkei/feature-flag-api/pull/81 - Add metrics to feature flag API
https://docs.google.com/document/d/1-wjTM600u5Q68ImhHHA2DTtlh8wX5mc9Xv5EEawFNFI/edit - Employment Hero microservices documents
https://github.com/Thinkei/eh-monitoring - EH monitoring gem
http://monitor.staging.ehrocks.com/ - Our monitoring page
The End
