Practical solutions to detecting bugs
Karl Norling @karlnorling
I’m Karl Norling
• Swedish, moved to the U.S. ~13 years ago
• Spend most of my time in Brooklyn with my family
• Lead software engineer at Quartet
Quartet
• Healthcare technology company focused on improving Behavioral Healthcare
• 1 in 4 Americans experienced a mental disorder in the last year; most were moderate to severe
• Behavioral Health side is often ignored, leading to poor outcomes (medication non-adherence & ER visits)
• Quartet delivers scalable behavioral health integration for our partners, leading to better patient care and cost savings
• Quartet’s product is a marriage of our three pillars: Data & Analytics, Collaborative Platform, and Engagement and Support
Engineering at Quartet
• Work with highly sensitive and regulated information that demands high reliability (PHI/HIPAA)
• Develop for very distinct users with very different challenges (BHP, PCP, Patients, Quartet Users)
• Need to deliver a robust solution
How do you know something is wrong?
Catching errors
How do I know that something is wrong?
Identify critical "this should never happen, but if" cases and log them:
    if (user.isAuthenticated) {
      …
    } else {
      // Should never happen: record it so it can be searched and alerted on.
      this.logger.log('warn', 'Not authorized user trying to access ..', {
        user,
      });
    }
Wrap code in try/catch statements:
    try {
      const patient = new Patient(metrics, logger);
      patient.hydrate(rawPatient);
    } catch (error) {
      // Log the failure with enough context to find the affected record.
      this.logger.log('error', 'Failed to hydrate patient', {
        patientId: rawPatient.id,
        error: error.message,
      });
    }
Measure events with metrics:
    const res = request.authenticate(email, pw);

    // Emit one metric per outcome so trends can be tracked over time.
    if (res.status === 200) {
      this.metrics('user_login_success');
    } else if (res.status === 403) {
      this.metrics('user_login_unauthorized');
    } else if (res.status >= 500) {
      this.metrics('user_login_error');
    }
Measure everything
• Environment alerts: CPU usage, disk space, etc.
• External reporting: a customer or employee reports an issue.
Create alerts
Create search queries in your logging software (e.g., Kibana, Sumo Logic, Splunk) that alert on a specific log message, level, or threshold.
At Quartet we’re using ElastAlert.
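A log-based alert of this kind comes down to counting matching entries inside a time window. The sketch below shows the idea in plain JavaScript; it is only an illustration, not how any of the tools above are configured. The entry shape ({ level, message, timestamp }) and the rule fields are assumptions made for this example.

```javascript
// Hypothetical rule: fire when at least `threshold` entries with the given
// `level` whose message contains `text` occurred in the last `windowMs`.
function shouldAlert(entries, rule, now) {
  const matches = entries.filter((entry) =>
    entry.level === rule.level &&
    entry.message.includes(rule.text) &&
    entry.timestamp <= now &&
    now - entry.timestamp <= rule.windowMs
  );
  return matches.length >= rule.threshold;
}
```

In practice the logging tool evaluates a rule like this on a schedule; the function only makes the "message, level, or threshold" criteria concrete.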
Metrics are good for detecting trends.
Example: If we haven’t had any logins in the last 24 hours, it’s time to investigate.
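The login example can be written as a simple "absence of events" check. This is a minimal sketch, assuming the login metric is available as an array of epoch-millisecond timestamps; the function name and input shape are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true when no login was recorded in the 24 hours before `now`.
// `loginTimestamps` is an assumed array of epoch-millisecond values.
function noRecentLogins(loginTimestamps, now) {
  return !loginTimestamps.some((t) => t <= now && now - t < DAY_MS);
}
```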
There should be a way for employees and customers to report issues, either from the website or via an email address.
Example: An employee using an internal tool cannot change the shipping address for an order.
Organize alerts
Add tags to log messages. Search queries are then easier to group, delegate, and report on.
Define a naming convention for your tags: prefix them with either functional areas or team names.
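One way to enforce such a convention is to build tags through a single helper and group log entries by prefix. The sketch below assumes a hypothetical "team.area" format and entries that carry a `tags` array; neither is taken from the talk.

```javascript
// Hypothetical convention: "<team>.<area>", lowercase, e.g. "appdev.login".
const TAG_PATTERN = /^[a-z]+\.[a-z_]+$/;

// Builds a tag and rejects anything that breaks the convention.
function makeTag(team, area) {
  const tag = `${team}.${area}`.toLowerCase();
  if (!TAG_PATTERN.test(tag)) {
    throw new Error(`Tag does not follow the convention: ${tag}`);
  }
  return tag;
}

// Groups log entries by the team prefix of their first tag.
function groupByTeam(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const team = entry.tags[0].split('.')[0];
    if (!groups.has(team)) groups.set(team, []);
    groups.get(team).push(entry);
  }
  return groups;
}
```

With a consistent prefix, a saved search per team is just a prefix match on the tag.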

Alerts should create tickets.
When an alert is triggered, a ticket should be generated in whichever tool is used to track work (e.g., JIRA).
Tickets should be created within the project associated with the team that owns the service.
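Routing a ticket to the owning team's project can be driven by the alert's tag. The following is a sketch under stated assumptions: the team names, project keys, and alert shape are hypothetical, and the function only builds a payload rather than calling any real ticketing API.

```javascript
// Hypothetical mapping from team prefix to ticket project key.
const PROJECT_BY_TEAM = new Map([
  ['appdev', 'APP'],
  ['core', 'CORE'],
  ['infra', 'INFRA'],
]);

// Builds the ticket an alert would create; no external system is contacted.
function ticketForAlert(alert) {
  const team = alert.tag.split('.')[0];
  const project = PROJECT_BY_TEAM.get(team);
  if (!project) {
    throw new Error(`No project configured for team: ${team}`);
  }
  return {
    project,
    summary: `[${alert.level}] ${alert.message}`,
    labels: [alert.tag],
  };
}
```

Failing loudly on an unmapped team keeps alerts from landing in a project nobody watches.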
Communicate
Choose the right tool for communicating the alert to the person on call (e.g., Slack, HipChat, email, JIRA).
At Quartet we’re using Slack.
Make sure the tool can be configured to send alerts via different channels depending on the alert, so the correct team or on-call person sees it.
At Quartet we’re using PagerDuty.
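Channel selection can be expressed as an ordered list of rules where the first match wins. This is an illustrative sketch only: the rule conditions and channel names are made up for the example and do not reflect any particular tool's configuration.

```javascript
// Hypothetical routing table, evaluated top to bottom; the last rule is a catch-all.
const ROUTES = [
  { matches: (a) => a.level === 'error' && a.tag.startsWith('infra.'), channel: 'page:infra-on-call' },
  { matches: (a) => a.level === 'error', channel: 'page:appdev-on-call' },
  { matches: () => true, channel: 'chat:#alerts' },
];

// Returns the channel of the first rule that matches the alert.
function channelFor(alert) {
  return ROUTES.find((route) => route.matches(alert)).channel;
}
```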
What happens next?
On-call acknowledges the issue
Who is on-call
On-call is the employee who responds to alerts. Other terms might be red-hot, on-duty, etc.
On-call acknowledges the issue
An on-call schedule should be created, rotating weekly (depending on the number of employees).
You may also have a secondary on-call, in case the primary is unavailable (e.g., on the subway ride home).
At Quartet we have one for app devs, core, and infra.
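A weekly rotation with a secondary can be computed from a list of names and a start date. The sketch below assumes the secondary is simply the next person in the list, which is one possible policy and not necessarily the one used at Quartet; all names and parameters are hypothetical.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Picks primary and secondary on-call for the week containing `now`.
// `rotationStart` is the epoch-millisecond start of the first rotation week.
function onCallFor(engineers, rotationStart, now) {
  if (engineers.length < 2 || now < rotationStart) {
    throw new Error('Need at least two engineers and a start date in the past');
  }
  const week = Math.floor((now - rotationStart) / WEEK_MS);
  return {
    primary: engineers[week % engineers.length],
    // Assumption: next week's primary serves as this week's secondary.
    secondary: engineers[(week + 1) % engineers.length],
  };
}
```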
The primary on-call receives the alert and acknowledges it through the same channel within a defined time window.
If the time expires, the issue is escalated to the secondary.
Make sure to set a time window that makes sense for your organization.
At Quartet, we use 15 minutes.
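The acknowledge-or-escalate rule is easy to state as code. This is a minimal sketch using the 15-minute window mentioned above; the alert record shape ({ triggeredAt, acknowledgedAt }) and return values are assumptions for illustration, and paging tools implement this as an escalation policy rather than application code.

```javascript
const ACK_TIMEOUT_MS = 15 * 60 * 1000; // 15 minutes, as in the text

// Decides who should be notified for an alert at time `now`.
// `acknowledgedAt` is null or undefined until someone acknowledges.
function notifyTarget(alert, now) {
  if (alert.acknowledgedAt != null) return 'none';
  return now - alert.triggeredAt >= ACK_TIMEOUT_MS ? 'secondary' : 'primary';
}
```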
How to respond to the issue
Alerts need to be investigated to determine how urgently they need to be addressed.
For critical issues, the on-call should be empowered to reach out to and involve the owner of the code causing the issue, even after hours.
At Quartet, alerts create tickets automatically. For all issues, the on-call makes sure tickets are assigned to the right team.
You need to define a process for marking issues resolved that makes sense for your organizational model.
It’s helpful if the reported issue links to a handbook. The handbook should contain steps for how to investigate and possibly resolve the issue.
If an employee or customer filed the issue, there needs to be an established process for communicating back to them, e.g., via internal email or by involving customer service.
Depending on the SLA, this needs to happen within a defined timeframe.
On-call best practices
Establish process and guidelines
At Quartet, we have a doc that details our process and best practices.
New employees shadow the primary on-calls for a month before they get added to the rotation.
Example guidelines
• Each engineering team member will have the PagerDuty app installed on their phone.
• If your PagerDuty schedule overlaps with planned vacation, arrange a schedule override in PagerDuty.
• Each team is responsible for creating PagerDuty alerts for the services that they are responsible for.
Ensure information is maintained
To ensure continuity from one on-call to the next, there should be a hand-off meeting.
The outgoing on-call is responsible for walking the next on-call through the weekly report for the previous week.
Evolve
How did we get here?
Stop the noise: if an error happens over and over again, dedupe it and investigate why.
Downgrade: is the error actually an error, or should we measure it via a metric?
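Deduplication can be as simple as collapsing alerts that share a fingerprint and keeping a count. The sketch below assumes the fingerprint is level plus message, which is a hypothetical choice; a real setup might key on tag or stack trace instead.

```javascript
// Collapses repeated alerts into one entry per fingerprint, with a count.
// Assumption: level + message is enough to identify "the same" alert.
function dedupe(alerts) {
  const byKey = new Map();
  for (const alert of alerts) {
    const key = `${alert.level}|${alert.message}`;
    const seen = byKey.get(key);
    if (seen) {
      seen.count += 1;
    } else {
      byKey.set(key, { level: alert.level, message: alert.message, count: 1 });
    }
  }
  return [...byKey.values()];
}
```

The count is what tells you which noisy alert to investigate or downgrade first.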
On-call
The on-call dedicates 100% of their time to investigating bugs. This makes sense for where we are now, shipping a lot of code; more code naturally generates more bugs.
Tools
Tools ❤ Process
Tooling will not solve your issues; you need a process for how to use the tools.

“If you only have a hammer, everything looks like a nail”
Thanks
