3 types of monitoring for 2020
T. Alexander Lystad
Chief Cloud Architect
Visma Enterprise PD
Two major reasons for monitoring
● Reliability
○ Preventing, detecting and resolving incidents
● Continuous Delivery
○ Building the right thing
Monitoring as part of development
● Refinement
○ Who do you expect will use the feature?
○ How do you expect the feature will be used?
○ Performance requirements/expectations?
○ Technical dependencies?
○ What monitoring do we need?
Monitoring as part of development
● Implementation
● Monitoring
○ 1st/2nd test env
■ Functional testing
● Errors?
■ Performance testing
● As expected?
○ Production
■ Validate expectations
■ Learn
What monitoring do we need?
What should we monitor (and alert on)?
1. General availability/health: Can we process requests?
2. Performance and errors: Quickly and successfully?
3. Analytics: Are we achieving our goals? Are the customers achieving theirs?
General availability/health
Can we process requests?
Availability
● “99.8%”
● Traditional definition
○ Server/OS availability
○ Network availability: Users can reach the cloud service
● Customer definition
○ Functional availability: The cloud service “works”
● Our definition
○ Users can reach the cloud service, and critical components and dependencies are healthy
● How can we monitor this?
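To make a figure like "99.8%" concrete, it can help to convert it into a downtime budget. A minimal sketch, assuming a 30-day period (the period length and the function name are assumptions, not from the slides):

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime budget in minutes for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # 99.8% over a 30-day period leaves roughly 86 minutes of downtime.
    print(f"{allowed_downtime_minutes(99.8):.1f} minutes")
```

Whichever definition of availability is chosen decides what counts against this budget.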
Heartbeat monitoring
● Checking availability/health frequently
1) Synthetic monitoring
Network availability confirmed
1) Synthetic monitoring - Configuration
1) Synthetic monitoring - Results
1) Synthetic monitoring - Summary
● Quick and easy to set up and use
● 5 lines of Python will be required if you need to authenticate
● Only checks one webpage; doesn’t reflect health of the whole system
● Fragile; just looks for HTTP 200 (unless you use more scripting)
● Can only run every 5 minutes
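The "just looks for HTTP 200" fragility can be reduced by also checking the page content. The following is a generic Python sketch of that idea, not the scripting API of any monitoring product; the function names and the expected-text approach are assumptions for illustration:

```python
import http.client
import urllib.request


def page_looks_healthy(status: int, body: str, expected_text: str) -> bool:
    """Pass only if the status is 200 AND the page contains the expected content."""
    return status == 200 and expected_text in body


def check_url(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Fetch the URL once; any network or HTTP error counts as a failed check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return page_looks_healthy(response.status, body, expected_text)
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False
```

Even with a content check, this still exercises a single page, so it remains a superficial view of system health.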
2) Smarter heartbeat monitoring
● “Users can reach the cloud service, and critical components and dependencies are healthy”
● What are critical components and dependencies?
○ Database? → Critical!
○ Authorization service? → Critical!
○ Background processing job → ?
○ Zip code lookup service → Not critical
2) Smarter heartbeat monitoring
● How do we know if they are healthy?
○ Database
■ Connect
■ SELECT 1
■ SELECT id FROM table LIMIT 1
Suitable for heartbeat
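A database check along these lines could look like the sketch below. It uses Python's built-in sqlite3 module only so that the example is runnable as-is; the real driver, the connection factory, and the function name are assumptions:

```python
import sqlite3


def database_is_healthy(connect) -> bool:
    """Connect and run a trivial query; any database error means 'not healthy'."""
    try:
        connection = connect()
        try:
            row = connection.execute("SELECT 1").fetchone()
            return row is not None and row[0] == 1
        finally:
            connection.close()
    except sqlite3.Error:
        return False


if __name__ == "__main__":
    # In-memory database as a stand-in for the real one.
    print(database_is_healthy(lambda: sqlite3.connect(":memory:")))
```

Swapping `SELECT 1` for a query against a real table would additionally confirm that the schema is reachable, at a slightly higher cost per heartbeat.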
2) Smarter heartbeat monitoring - Endpoint
http://myservice.com/heartbeats/availability
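An endpoint like this can be sketched with Python's standard library. The check names, the placeholder checks, the port, and the choice of HTTP 503 for "NOT OK" are all assumptions made for illustration:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# Hypothetical checks: each returns True when the dependency is healthy.
CRITICAL_CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "authorization_service": lambda: True,
}


def heartbeat_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (HTTP status, body): 200 only if every critical check passes."""
    failed = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            healthy = False  # a crashing check counts as unhealthy
        if not healthy:
            failed.append(name)
    if failed:
        return 503, "NOT OK: " + ", ".join(failed)
    return 200, "OK"


class HeartbeatHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/heartbeats/availability":
            self.send_error(404)
            return
        status, body = heartbeat_response(CRITICAL_CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), HeartbeatHandler).serve_forever()
```

Non-critical dependencies, such as the zip code lookup service, are deliberately left out of `CRITICAL_CHECKS` so that their failures do not flip the heartbeat.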
2) Smarter heartbeat monitoring
● How do we know if they are healthy?
○ Authorization service
■ Make synchronous request?
■ Log and check last successful call, ping only if necessary
○ Background processing job (e.g. calculating wagerun, generating report)
■ Log and check last successful run, trigger test payload if necessary
■ If we expect test payload to process fast, wait for it before returning OK
■ If not, return OK optimistically, then NOT OK on later calls if test payload has timed out
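The "log and check last successful call, ping only if necessary" idea can be sketched as follows. The class name, the `ping` callable, and the maximum age are hypothetical; the test-payload variant for background jobs is not shown:

```python
import time
from typing import Callable, Optional


class LastSuccessCheck:
    """Treat a dependency as healthy if it succeeded recently; ping only when stale."""

    def __init__(self, ping: Callable[[], bool], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ping = ping
        self._max_age = max_age_seconds
        self._clock = clock
        self._last_success: Optional[float] = None

    def record_success(self) -> None:
        """Call this from regular application code after each successful call."""
        self._last_success = self._clock()

    def is_healthy(self) -> bool:
        now = self._clock()
        if self._last_success is not None and now - self._last_success <= self._max_age:
            return True  # recent real traffic succeeded, no ping needed
        if self._ping():
            self._last_success = now
            return True
        return False
```

This keeps the heartbeat cheap under normal traffic, since the dependency is only contacted when no recent success has been recorded.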
2) Heartbeat monitoring - Architecture
2) Heartbeat monitoring - AWS Health Check
Now we can use this metric in dashboards and alerts!
PDAvailabilityDashboard
Availability alerting!
But we will look at alerting later
Heartbeat monitoring - Summary
● Synthetic monitoring with AppDynamics
○ Maximum frequency is every 5 minutes
○ From 3 locations
○ $123 per year
○ Quick and easy to get started
○ Superficial health assessment (network avail.)
● Heartbeat monitoring with AWS + AppDynamics
○ Maximum frequency is every 10 seconds
○ From 8 locations
○ $81 per year
○ Some design and implementation effort
○ Holistic health assessment (functional avail.)
Takeaways
1. Define availability for your service (may change over time!)
2. Implement holistic heartbeat monitoring (starting simple is OK)
3. Configure alerts (incident detection)
4. Configure dashboards (for reporting/analysis/improvement)
2) Performance and errors
Quickly and successfully?
What’s (most) important?
Business Transactions
● Examples for Visma.net HRM Employee Management
○ Registering a new employee
○ Saving changes to an employee
○ Getting data for an employee
● Examples for Visma.net HRM Payroll
○ Calculating a wagerun (for an organization)
○ Generating a bank payment file for a wagerun
● Defined by URL pattern, or method in application code (doesn’t have to be web-based or user-facing)
○ POST /api/employees
○ PUT /api/employees/<id>
○ GET /api/employees/<id>
○ WageRunManager.RunForOrg
○ GenerateWageRunPayslips.HandleEvent
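As an illustration of defining business transactions by URL pattern, the sketch below maps an HTTP method and path to a transaction name. The pairing of names with patterns is an assumption based on the order of the examples, and the regular expressions are hypothetical:

```python
import re
from typing import Optional

# Names come from the examples above; the pairing and regexes are assumptions.
BUSINESS_TRANSACTIONS = [
    ("Registering a new employee", "POST", re.compile(r"^/api/employees$")),
    ("Saving changes to an employee", "PUT", re.compile(r"^/api/employees/[^/]+$")),
    ("Getting data for an employee", "GET", re.compile(r"^/api/employees/[^/]+$")),
]


def business_transaction(method: str, path: str) -> Optional[str]:
    """Return the matching business transaction name, or None if untracked."""
    for name, expected_method, pattern in BUSINESS_TRANSACTIONS:
        if method.upper() == expected_method and pattern.match(path):
            return name
    return None


if __name__ == "__main__":
    print(business_transaction("PUT", "/api/employees/42"))
```

Requests that match no pattern return `None`, which keeps the tracked set limited to the transactions that matter most.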
Example App
New feature: Rejecting claim
Performance and errors
Refinement v1
● Today, claims can only be deleted by managers
● Managers and Payroll Administrators should be able to reject a claim, which sends it back to the employee
● Monitoring
○ No changes to availability monitoring
○ Add monitoring for performance and errors
Example App
Feature dashboard v1
Alerting - Config
Alerting - Notification options
1. Email
2. HTTP (e.g. Slack, OpsGenie, …)
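For the HTTP option, a notification is typically a JSON POST to a webhook URL. A minimal sketch, assuming a `{"text": ...}` body of the kind Slack incoming webhooks accept; other tools expect different payloads, and the URL below is a placeholder:

```python
import json
import urllib.request


def build_alert_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for an incoming-webhook style endpoint."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_alert_request(
        "https://hooks.example.invalid/alerts",  # placeholder URL
        "Error rate for 'Rejecting claim' is above threshold",
    )
    # Sending is one more call: urllib.request.urlopen(request, timeout=10)
    print(request.get_method(), request.full_url)
```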
No alerts!
Can we still find something to improve?
Example App - Performance Problem
Performance problem fixed!
But… remember this?
Example App - Error Problem
Error fixed!
Quickly and successfully
Takeaways
1. Identify critical business transactions
1.1. Start small, then keep adding more continuously!
2. Configure alerts (for anomaly detection)
2.1. Consider response time and error rate
2.2. Don’t send a critical alert unless human action is required
2.3. Discussing alerts in Slack 💕
3. Configure dashboards (for reporting/analysis/improvement)
3.1. Look at “top 10 lists” to identify possible quick wins
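The alerting takeaways can be sketched as a small rule that considers both response time and error rate, and reserves "critical" for cases assumed to need human action. All thresholds and names here are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TransactionStats:
    name: str
    avg_response_ms: float
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def alert_level(stats: TransactionStats,
                max_response_ms: float = 1000.0,
                warn_error_rate: float = 0.01,
                critical_error_rate: float = 0.05) -> str:
    """Return 'critical', 'warning' or 'ok' for one business transaction."""
    if stats.error_rate >= critical_error_rate:
        return "critical"  # assumed to require human action
    if stats.error_rate >= warn_error_rate or stats.avg_response_ms > max_response_ms:
        return "warning"  # worth a look, but not worth paging anyone
    return "ok"
```

Only the "critical" level would be routed to a paging tool; warnings could go to a chat channel for discussion.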
3) Analytics
Are we achieving our goals? Are customers achieving theirs?
Identify goals and relevant metrics
● Visma-oriented
○ Goal: Become the leader in the Danish market
■ Metric: Number of payslips generated per month (for DK customers)
○ Goal: Increase cross-sales
■ Metric: Number of customers who activate the invoicing module
● Customer-oriented
○ Goal: Schools want to enable efficient communication with parents
■ Metric: Messages sent, by user role
■ Metric: Inbox size, by user role
○ Goal: Enterprises want an efficient expense management process
■ Metric: Rejected expenses, by reason
■ Metric: Rejected expenses, by industry
New feature: Rejecting claim
Analytics
Refinement v2
● Today, claims can only be deleted by managers
● Managers and Payroll Administrators should be able to reject a claim, which sends it back to the employee
● Assumptions
○ ~60% of rejections will be by managers, ~40% will be by Payroll Administrators
○ Rejections by managers will often be done on a mobile device, while PAs use PCs
○ Most common reason for rejection will be incorrect or insufficient documentation
● Monitoring
○ No changes to availability monitoring
○ Add monitoring for performance and errors
○ Add analytics: Rejections by role, rejections by device, rejections by reason
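Analytics like "rejections by role" boil down to counting events along one dimension and comparing the shares with the assumptions. A minimal sketch, where the event shape and field names are assumptions and the sample events are made up:

```python
from collections import Counter
from typing import Dict, List


def share_by(events: List[Dict[str, str]], dimension: str) -> Dict[str, float]:
    """Return each value's share of all events for one dimension (e.g. 'role')."""
    counts = Counter(event[dimension] for event in events)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up rejection events, for illustration only.
    events = [
        {"role": "Manager", "device": "mobile", "reason": "INSUFFICIENT_DOCS"},
        {"role": "Manager", "device": "mobile", "reason": "INCORRECT_ACCOUNT"},
        {"role": "Administrator", "device": "pc", "reason": "INSUFFICIENT_DOCS"},
    ]
    for dimension in ("role", "device", "reason"):
        print(dimension, share_by(events, dimension))
```

Comparing the "role" shares with the assumed ~60/40 split shows whether the assumption from refinement holds in production.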
Feature dashboard v2
[Dashboard: rejections by reason (INSUFFICIENT_DOCS, INCORRECT_ACCOUNT, UNAUTHORIZED_SPEND; counts shown: 450, 331, 123) and rejections by role (Administrator, Manager)]
New requirement: Currency!
Any changes in monitoring?
Refinement
● Claims must have a new mandatory currency field
● Assumptions
○ 95% of claims will use NOK, SEK, DKK, EUR, USD
○ Currency support will not affect how many claims are created/approved/rejected/paid
● Monitoring
○ Changes to availability monitoring?
■ Yes, we depend on a 3rd party for exchange rates (but maybe not as part of the main heartbeat?)
○ Add performance and/or error monitoring?
■ Payment errors by currency could be interesting
○ Add analytics
■ New claims by currency
■ Approved claims by currency
■ Rejected claims by currency
■ Paid claims by currency
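The "95% of claims" assumption can be checked directly against production data once claims by currency are recorded. A small sketch, where the function name and the input format (one currency code per claim) are assumptions:

```python
from collections import Counter
from typing import Iterable

# The five currencies named in the assumption above.
EXPECTED_CURRENCIES = {"NOK", "SEK", "DKK", "EUR", "USD"}


def expected_currency_share(claim_currencies: Iterable[str]) -> float:
    """Return the fraction of claims that use one of the expected currencies."""
    counts = Counter(claim_currencies)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    expected = sum(n for currency, n in counts.items() if currency in EXPECTED_CURRENCIES)
    return expected / total
```

A share well below 0.95 would suggest that the assumption was wrong and the supported currency list needs another look.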
Feature dashboard v3
[Dashboard: rejections by role (Manager, Administrator) and by reason (INSUFFICIENT_DOCS, OTHER_REASON, INCORRECT_CURRENCY 3975; other counts shown: 450, 331)]
Takeaways
1. Identify Visma-oriented and customer-oriented goals as part of development
2. Monitor those goals to achieve them
3. Identify relevant assumptions as part of refinement
4. Monitor those assumptions, and use data to decide what to do next
Rounding off
Wake up!
Focus on capabilities
● SaaS Compliance Requirements + ArchTech Maturity Index
● Ability to monitor, and alert on
○ general availability and service health over time
○ performance of backend transactions
○ backend transaction errors
○ end-to-end performance of loading web pages and route changes
○ frontend errors
○ business metrics (number of users, number of certain actions, use of functionality, etc.)
Thank you!
Respect
Reliability
Innovation
Competence
Team spirit


3 types of monitoring for 2020

  • 1. 3 types of monitoring for 2020 T. Alexander Lystad Chief Cloud Architect Visma Enterprise PD
  • 2. Two major reasons for monitoring ● Reliability ○ Preventing, detecting and resolving incidents ● Continuous Delivery ○ Building the right thing
  • 3. Monitoring as part of development ● Refinement ○ Who do you expect will use the feature? ○ How do you expect the feature will be used? ○ Performance requirements/expectations? ○ Technical dependencies? ○ What monitoring do we need? Monitoring Monitoring Monitoring
  • 4. Monitoring as part of development ● Implementation ● Monitoring ○ 1st/2nd test env ■ Functional testing ● Errors? ■ Performance testing ● As expected? ○ Production ■ Validate expectations ■ Learn Monitoring Monitoring Monitoring
  • 5.
  • 7. What should we monitor (and alert on)? 1. General availability/health 2. Performance and errors 3. Analytics 1. Can we process requests? 2. Quickly and successfully? 3. Are we achieving our goals? Are the customers achieving theirs?
  • 9. Availability ● “99.8%” ● Traditional definition ○ Server/OS availability ○ Network availability: Users can reach the cloud service ● Customer definition ○ Functional availability: The cloud service “works” ● Our definition ○ Users can reach the cloud service, and critical components and dependencies are healthy ● How can we monitor this?
  • 10. Heartbeat monitoring ● Checking availability/health frequently
  • 12. 1) Synthetic monitoring - Configuration
  • 13. 1) Synthetic monitoring - Configuration
  • 14. 1) Synthetic monitoring - Configuration
  • 15. 1) Synthetic monitoring - Configuration
  • 16. 1) Synthetic monitoring - Configuration
  • 17. 1) Synthetic monitoring - Configuration
  • 21. 1) Synthetic monitoring - Summary ● Quick and easy to set up and use ● 5 lines of Python will be required if you need to authenticate ● Only checks one webpage; doesn’t reflect health of the whole system ● Fragile; just looks for HTTP 200 (unless you use more scripting) ● Can only run every 5 minutes
  • 22. 2) Smarter heartbeat monitoring ● “Users can reach the cloud service, and critical components and dependencies are healthy” ● What are critical components and dependencies? ○ Database? → Critical! ○ Authorization service? → Critical! ○ Background processing job → ? ○ Zip code lookup service → Not critical
  • 23. 2) Smarter heartbeat monitoring ● How do we know if they are healthy? ○ Database ■ Connect ■ SELECT 1 ■ SELECT id FROM table LIMIT 1 Suitable for heartbeat
  • 24. 2) Smarter heartbeat monitoring - Endpoint http://myservice.com/heartbeats/availability
  • 25. 2) Smarter heartbeat monitoring ● How do we know if they are healthy? ○ Authorization service ■ Make synchronous request? ■ Log and check last successful call, ping only if necessary ○ Background processing job (e.g. calculating wagerun, generating report) ■ Log and check last successful run, trigger test payload if necessary ■ If we expect test payload to process fast, wait for it before returning OK ■ If not, return OK optimistically, then NOT OK on later calls if test payload has timed out
  • 26. 2) Heartbeat monitoring - Architecture
  • 27. 2) Heartbeat monitoring - Architecture
  • 28. 2) Heartbeat monitoring - AWS Health Check
  • 29. 2) Heartbeat monitoring - AWS Health Check
  • 30. 2) Heartbeat monitoring - Architecture
  • 31. 2) Heartbeat monitoring - Architecture
  • 32.
  • 34. Now we can use this metric in dashboards and alerts!
  • 36.
  • 37. Availability alerting! But we will look at alerting later
  • 38. Synthetic monitoring with AppDynamics Heartbeat monitoring with AWS + AppDynamics Maximum frequency is every 5 minutes Maximum frequency is every 10 seconds From 3 locations From 8 locations $123 per year $81 per year Quick and easy to get started Some design and implementation effort Superficial health assessment (network avail.) Holistic health assessment (functional avail.) Heartbeat monitoring - Summary
  • 39. Takeaways 1. Define availability for your service (may change over time!) 2. Implement holistic heartbeat monitoring (starting simple is OK) 3. Configure alerts (incident detection) 4. Configure dashboards (for reporting/analysis/improvement)
  • 40. 2) Performance and errors Quickly and successfully?
  • 43. Business Transactions ● Examples for Visma.net HRM Employee Management ○ Registering a new employee ○ Saving changes to an employee ○ Getting data for an employee ● Examples for Visma.net HRM Payroll ○ Calculating a wagerun (for an organization) ○ Generating a bank payment file for a wagerun ● Defined by URL pattern, or method in application code (doesn’t have to be web-based or user-facing) ○ POST /api/employees ○ PUT /api/employees/<id> ○ GET /api/employees/<id> ○ WageRunManager.RunForOrg ○ GenerateWageRunPayslips.HandleEvent
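Defining business transactions by URL pattern can be illustrated with a small rule table. This is a sketch, not how AppDynamics configures transactions: the rule structure and `classify_request` are hypothetical, and the pairing of each URL pattern with a transaction name is an assumption based on the order in which the slide lists them.

```python
import re

# (HTTP method, path pattern, business transaction name); pairing is assumed.
TRANSACTION_RULES = [
    ("POST", re.compile(r"^/api/employees$"), "Registering a new employee"),
    ("PUT", re.compile(r"^/api/employees/[^/]+$"), "Saving changes to an employee"),
    ("GET", re.compile(r"^/api/employees/[^/]+$"), "Getting data for an employee"),
]


def classify_request(method: str, path: str):
    """Return the business transaction name for a request, or None if unmatched."""
    for rule_method, pattern, name in TRANSACTION_RULES:
        if method.upper() == rule_method and pattern.match(path):
            return name
    return None


if __name__ == "__main__":
    print(classify_request("GET", "/api/employees/42"))
```

Requests that match no rule return `None`, which keeps the monitored set limited to the transactions that were deliberately identified as critical.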
  • 45. New feature: Rejecting claim Performance and errors
  • 46. Refinement v1 ● Today, claims can only be deleted by managers ● Managers and Payroll Administrators should be able to reject a claim, which sends it back to the employee ● Monitoring ○ No changes to availability monitoring ○ Add monitoring for performance and errors
  • 53. Alerting - Notification options 1. Email 2. HTTP (e.g. Slack, OpsGenie, …)
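For the HTTP option, a notification is typically a JSON POST to a webhook URL. The sketch below builds such a request with the standard library; `build_webhook_request` is a hypothetical helper and the URL is a placeholder. The `{"text": ...}` body is the shape Slack incoming webhooks accept; other services such as OpsGenie expect their own payloads.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying an alert message."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is a separate step, for example:
#   urllib.request.urlopen(build_webhook_request(url, "Error rate is high"), timeout=5)
```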
  • 54. No alerts! Can we still find something to improve?
  • 55. Example App - Performance Problem
  • 56. Example App - Performance Problem
  • 57. Example App - Performance Problem
  • 58. Example App - Performance Problem
  • 59. Example App - Performance Problem
  • 61. Example App - Error Problem
  • 62. Example App - Error Problem
  • 63. Example App - Error Problem
  • 64. Error fixed! Quickly and successfully
  • 65. Takeaways 1. Identify critical business transactions 1.1. Start small, then keep expanding continuously! 2. Configure alerts (for anomaly detection) 2.1. Consider response time and error rate 2.2. Don’t send a critical alert unless human action is required 2.3. Discussing alerts in Slack 💕 3. Configure dashboards (for reporting/analysis/improvement) 3.1. Look at “top 10 lists” to identify possible quick wins
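The alerting takeaways (consider response time and error rate, reserve critical alerts for cases that need a human) can be sketched with simple static thresholds. Note that this swaps in fixed thresholds for the anomaly detection the slides refer to, and that every name and number here is an assumption: `Sample`, `evaluate`, the 95th percentile, and the "twice the threshold means critical" rule are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    duration_ms: float
    is_error: bool


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def evaluate(samples: List[Sample], max_p95_ms: float, max_error_rate: float,
             min_samples: int = 20) -> Optional[str]:
    """Return None, "warning" or "critical" for one business transaction."""
    if len(samples) < min_samples:
        return None  # too little traffic to judge; avoid noisy alerts
    error_rate = sum(s.is_error for s in samples) / len(samples)
    p95 = percentile([s.duration_ms for s in samples], 95)
    # Assumed rule: far beyond the threshold is treated as needing a human.
    if error_rate > 2 * max_error_rate or p95 > 2 * max_p95_ms:
        return "critical"
    if error_rate > max_error_rate or p95 > max_p95_ms:
        return "warning"
    return None
```

Warnings could go to a chat channel for discussion, while only the critical level pages someone.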
  • 66. 3) Analytics Are we achieving our goals? Are customers achieving theirs?
  • 67. Identify goals and relevant metrics ● Visma-oriented ○ Goal: Become the leader in the Danish market ■ Metric: Number of payslips generated per month (for DK customers) ○ Goal: Increase cross-sales ■ Metric: Number of customers who activate the invoicing module ● Customer-oriented ○ Goal: Schools want to enable efficient communication with parents ■ Metric: Messages sent, by user role ■ Metric: Inbox size, by user role ○ Goal: Enterprises want an efficient expense management process ■ Metric: Rejected expenses, by reason ■ Metric: Rejected expenses, by industry
  • 68. New feature: Rejecting claim Analytics
  • 69. Refinement v2 ● Today, claims can only be deleted by managers ● Managers and Payroll Administrators should be able to reject a claim, which sends it back to the employee ● Assumptions ○ ~60% of rejections will be by managers, ~40% will be by Payroll Administrators ○ Rejections by managers will often be done on a mobile device, while PAs use PCs ○ Most common reason for rejection will be incorrect or insufficient documentation ● Monitoring ○ No changes to availability monitoring ○ Add monitoring for performance and errors ○ Add analytics: Rejections by role, rejections by device, rejections by reason
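Checking the assumptions above against production data comes down to counting events by a dimension and comparing shares. A minimal sketch follows; the event field names (`role`, `device`, `reason`), the helper names and the tolerance are hypothetical.

```python
from collections import Counter


def share_by(events, key):
    """Return {value: share of events} for one dimension, e.g. role or device."""
    counts = Counter(event[key] for event in events)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}


def assumption_holds(observed_share, expected_share, tolerance=0.10):
    """True if the observed share is within the tolerance of what we expected."""
    return abs(observed_share - expected_share) <= tolerance


if __name__ == "__main__":
    rejections = (
        [{"role": "manager", "device": "mobile"}] * 6
        + [{"role": "payroll_admin", "device": "pc"}] * 4
    )
    by_role = share_by(rejections, "role")
    # The refinement assumed roughly 60% of rejections come from managers.
    print(by_role, assumption_holds(by_role.get("manager", 0.0), 0.60))
```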
  • 72. New requirement: Currency! Any changes in monitoring?
  • 73. Refinement ● Claims must have a new mandatory currency field ● Assumptions ○ 95% of claims will use NOK, SEK, DKK, EUR, USD ○ Currency support will not affect how many claims are created/approved/rejected/paid ● Monitoring ○ Changes to availability monitoring? ■ Yes, we depend on 3rd party for exchange rates (but maybe not main heartbeat?) ○ Add performance and/or error monitoring? ■ Payment errors by currency could be interesting ○ Add analytics ■ New claims by currency ■ Approved claims by currency ■ Rejected claims by currency ■ Paid claims by currency
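The "95% of claims will use NOK, SEK, DKK, EUR, USD" assumption can be validated with a one-pass calculation over new claims. This is a sketch only; the function name and the idea of feeding it a plain list of currency codes are assumptions.

```python
EXPECTED_CURRENCIES = frozenset({"NOK", "SEK", "DKK", "EUR", "USD"})


def expected_currency_share(claim_currencies, expected=EXPECTED_CURRENCIES):
    """Share of claims whose currency is in the expected set, or None if no claims."""
    if not claim_currencies:
        return None
    hits = sum(1 for code in claim_currencies if code.upper() in expected)
    return hits / len(claim_currencies)


if __name__ == "__main__":
    claims = ["NOK"] * 15 + ["EUR"] * 4 + ["GBP"]
    print(expected_currency_share(claims))
```

If the observed share falls well below the assumed 95%, that is a signal to revisit which currencies get first-class support.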
  • 77. Takeaways 1. Identify Visma-oriented and customer-oriented goals as part of development 2. Monitor those goals to achieve them 3. Identify relevant assumptions as part of refinement 4. Monitor those assumptions, and use data to decide what to do next
  • 79. Focus on capabilities ● SaaS Compliance Requirements + ArchTech Maturity Index ● Ability to monitor, and alert on ○ general availability and service health over time ○ performance of backend transactions ○ backend transaction errors ○ end-to-end performance of loading web pages and route changes ○ frontend errors ○ business metrics (number of users, number of certain actions, use of functionality, etc.)