🤖
HOW AI HELPS OBSERVE
DECENTRALISED SYSTEMS
Dominic Wellington | @dwellington
FULL DISCLOSURE
I work for a vendor (Moogsoft)
…but this is not a product pitch
We are hiring!
We are living in a different world
from the one our systems and processes
were designed for
OLD WORLD
Static Environment
• Relatively small number of devices
• Slow rate of growth
• Low frequency of change (deployments)
Manageable Alert Volumes
• Problem is extracting enough information
• Relatively easy to understand
NEW WORLD
Fast-growing, fast-changing environment
• More and more devices
• More and more frequent releases
• More and more automation
Massive Alert Volumes
• From monitoring to observability
• Increasing specialisation
WE SPEND MORE TIME MANAGING IT
THAN USING IT
–Justin Trudeau, Prime Minister of Canada, Davos WEF 2018
“The pace of change has never been this fast,
and it will never be this slow again.”
COMPLEXITY
• Compute
• Network
• Storage
• Bare metal
• Hypervisor
• Private cloud
• Public cloud
• Hybrid cloud
• Virtual private cloud
• Software-defined networking
• Software-defined data center
• Software-defined everything
• Containers
• Serverless
• IaaS
• PaaS
• SaaS
• DevOps
Why every 9
costs 10 times more
than the last one
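The slide's claim is about cost, which a snippet cannot measure. What can be shown is the other side of the trade: each additional nine leaves one tenth of the previous downtime budget. A minimal sketch, assuming a 365-day year; the function name is made up for illustration.

```python
# Illustrative only: yearly downtime budget for N nines of availability.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for `nines` nines (2 -> 99%, 3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        availability = 100 * (1 - 10 ** -n)
        budget = downtime_minutes_per_year(n)
        print(f"{n} nines ({availability:.3f}%): {budget:,.1f} min/year")
```

Going from three nines to four shrinks the budget from roughly 526 minutes a year to roughly 53, so every incident has to be found and fixed about ten times faster.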
LIVING ON THE EDGE
• What happens on the network edge is more & more important
• But! The edge is really far away
• Unreliable connectivity, limited bandwidth, constant flux
• There’s always something going wrong somewhere
• One device or a region? One production line or a factory?
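"There's always something going wrong somewhere" can be made concrete with a little arithmetic. A rough sketch, assuming faults are independent and that every device has the same small chance of being faulty at a given moment; both the independence and the 0.1% figure are assumptions for illustration, not measurements.

```python
def p_any_fault(n_devices: int, p_fault: float) -> float:
    """Probability that at least one of n independent devices is faulty."""
    return 1.0 - (1.0 - p_fault) ** n_devices


if __name__ == "__main__":
    # Hypothetical per-device fault probability of 0.1%.
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} devices: {p_any_fault(n, 0.001):.4f}")
```

With a handful of devices a fault is an event. With thousands it is the steady state, which is why the useful question becomes whether it is one device or a region.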
Single faults
no longer cause impacts
Fault tolerance
does not mean
Zero Incidents
Theory ☁
Practice
WE NEED TO CHANGE MONITORING
BECAUSE SYSTEMS HAVE CHANGED
A MAZE OF TWISTY SERVICES, ALL ALIKE
#OpsLife
Booking software outages:
Passengers across world unable
to board planes
System outage:
Customers unable to use ATMs
to withdraw cash
4-hour outage:
Co-workers & teammates
unable to communicate
Worldwide outage on New Year’s Eve:
Family members unable to exchange
New Year greetings
🏦✈
📱💬
QUICK, AN ALERT!
📟
“Let’s have a good old-fashioned blamestorm”
THE STATISTICS SAY IT ALL
74% of incidents
detected by
end users
before Support
is aware
>62% of the time
the Application
is not the cause
of the Incident
>36%
Incident Tickets
escalated
>32% Tickets
reassigned
across silos
😱
From an informal attendee survey at
SREcon 18
🤔
SO HOW DO WE FIX MONITORING?
SOLUTION:
BUT WHAT TO MONITOR?
MONITORING
🔍
• Periodic polling
• Filtered
• Late addition
• Incident-driven
HIDDEN ASSUMPTIONS
• Information is expensive and valuable
• Faults are easy to detect (counterexample: Byzantine faults)
• All failure conditions are knowable
DASHBOARDS 🤮
• The internal health of the system
is irrelevant
• Individual requests are what
users care about
• Every dashboard is an artefact of
a past failure
OBSERVABILITY
👁
• Continuous stream
• High-cardinality
• Built in to infrastructure & apps
• Insight-driven
REALISATIONS
• Information is cheap, only valuable if queried
• User experience is not an afterthought
• …in fact it’s a key diagnostic information source
(just don’t treat your users as canaries) 🐤
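One way to picture "only valuable if queried": keep request-level events with their high-cardinality fields, and decide how to slice them when the question comes up. A minimal sketch; the events, field names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical request events; every field name and value here is made up.
events = [
    {"user_id": "u1", "endpoint": "/checkout", "region": "eu", "status": 200},
    {"user_id": "u2", "endpoint": "/checkout", "region": "eu", "status": 500},
    {"user_id": "u2", "endpoint": "/cart", "region": "eu", "status": 500},
    {"user_id": "u3", "endpoint": "/checkout", "region": "us", "status": 200},
]

# Monitoring-style view: one pre-aggregated number, chosen in advance.
errors_total = sum(1 for e in events if e["status"] >= 500)


def error_breakdown(events, field):
    """Count failed requests grouped by any field, chosen at query time."""
    return Counter(e[field] for e in events if e["status"] >= 500)


if __name__ == "__main__":
    print("errors_total:", errors_total)
    print("by endpoint:", dict(error_breakdown(events, "endpoint")))
    print("by user_id:", dict(error_breakdown(events, "user_id")))
```

The counter says two requests failed. The events say both failures belong to one user, across two endpoints, which is a question nobody had to anticipate when the data was collected.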
INCIDENT-DRIVEN
—
RESOURCE
CONSUMPTION
INSIGHT-DRIVEN
—
ACTIONABLE
UNDERSTANDING
HOW TO FIND ACTIONABLE INSIGHTS?
PUT EVERYTHING IN A DATA LAKE!
Objects in rear view mirror
may be less relevant than they appear
–Donald Rumsfeld
“There are known knowns; there are things we know we know.
We also know there are known unknowns; that is to say we
know there are some things we do not know.
But there are also unknown unknowns –
the ones we don't know we don't know.
It is the latter category that tend to be the difficult ones.”
MONITORING AS IT IS
* slaps roof of NOC *
this bad boy can fit so many monitoring tools in it
🤷
MONITORING AS IT SHOULD BE
🤖
😕 AI? MACHINE LEARNING? 🤔
• Stanford definition: “Machine learning is the science of getting
computers to act without being explicitly programmed.”
• AI in IT Ops: bring interesting information to the attention of human
operators – without having to define it beforehand
AI
Machine learning
Deep learning
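"Without having to define it beforehand" can be illustrated with something far simpler than machine learning: a rolling statistical baseline that flags values which look unusual compared with recent history, with no fixed alert threshold configured per metric. This is a sketch of that idea only, not any product's algorithm; the window size and z-score cut-off are arbitrary assumptions.

```python
from statistics import mean, stdev


def unusual_points(values, window=5, z_threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all is unusual.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    latencies = [100, 102, 99, 101, 100, 103, 98, 400, 101, 100]
    print(unusual_points(latencies))
```

The baseline is learned from the data itself, so the same code works for a metric that normally sits at 100 and one that normally sits at 10,000. A known weakness of this naive version is that a spike inflates the window that follows it.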
WHERE TO USE AI IN IT OPS?
• Ingestion: reduce noise and false alarms
• Correlation: identify related events across domains, avoid duplication
of effort and missed signals
• Collaboration: intelligent teaming, root cause analysis, knowledge
capture
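The ingestion and correlation stages above can be sketched in their most basic, non-AI form: collapse repeated alerts, then group what happens close together in time across sources. Real systems use much richer signals; the alert tuples, field names and the 60-second window here are assumptions for illustration.

```python
def deduplicate(alerts):
    """Collapse repeated (source, message) pairs, keeping first-seen time and a count."""
    seen = {}
    for ts, source, message in alerts:
        key = (source, message)
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {"first_seen": ts, "source": source,
                         "message": message, "count": 1}
    return list(seen.values())


def correlate_by_time(events, window_seconds=60):
    """Group events that start within window_seconds of the previous event."""
    groups = []
    for event in sorted(events, key=lambda e: e["first_seen"]):
        if groups and event["first_seen"] - groups[-1][-1]["first_seen"] <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


if __name__ == "__main__":
    # Hypothetical alerts: (timestamp in seconds, source, message).
    alerts = [
        (0, "db1", "disk full"),
        (5, "db1", "disk full"),
        (20, "app3", "db timeout"),
        (30, "lb1", "5xx rate high"),
        (900, "sw7", "link flap"),
    ]
    for group in correlate_by_time(deduplicate(alerts)):
        print([(e["source"], e["message"], e["count"]) for e in group])
```

Five raw alerts become two candidate incidents: a database, application and load balancer problem that probably share a cause, and an unrelated switch event. One team gets one notification instead of three teams each chasing their own.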
TEACHING THE MACHINE
• Inputs matter: choose the right feature vectors
• Regression problems: predict a value on a continuous distribution
• Classification vs clustering: assign to a known set of categories vs discover the categories from the data
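To make "feature vectors" and "clustering" concrete, here is a toy sketch: each alert message becomes a bag-of-words vector, and messages are grouped by cosine similarity without any labels. Classification would instead need examples already tagged with the right category. The greedy single-pass grouping and the 0.5 threshold are simplifications chosen for brevity, not a recommended algorithm.

```python
import math
from collections import Counter


def features(message):
    """Turn an alert message into a bag-of-words feature vector."""
    return Counter(message.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(messages, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, member_messages)
    for msg in messages:
        vec = features(msg)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((vec, [msg]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical alert messages.
    messages = [
        "disk usage high on db1",
        "disk usage high on db2",
        "link down on switch7",
        "link down on switch9",
    ]
    for group in cluster(messages):
        print(group)
```

The choice of inputs does most of the work: with whole-word features the host names differ but the rest of the message matches, so the two disk alerts land together and the two link alerts land together.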
AIOps
A New Framework for IT Ops
• Proactive insight
• Intelligent notification
• Intelligent collaboration
• Workflow automation
• Causal analysis
• Decision support
Ref: Innovation Insight for Algorithmic IT Operations Platforms
IN PRACTICE:
• This is IT Ops: speed matters, work in real time
• You don’t know what you need to know
• AI is a tool, not magic
• Process is how you make sure it works for users
🧠
⚡
WHY ARE WE NOT DOING THIS
ALREADY?
The Greek triad:
• Fear
• Honour
• Interest
These tools and
processes are incredible
force multipliers
WHY YOU SHOULD START RIGHT AWAY
🤗
THANK YOU!
Dominic Wellington | @dwellington
