© 2018 SPLUNK INC.
Still haven’t got on top of IT outages?
Accept failure, learn from failure and get rid of failure to protect your business
Dr. Siyka Andreeva | IT Operations Analytics Specialist
April 2019
Forward Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other
brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Agenda
Why You Need to Stop Being Reactive
Data and Machine Learning: How to Get to a Predictive IT
Case Study with CMC Markets
High Availability is everywhere!
How many 9’s do you have?
100%
100%
100%
99,999%
Because we live in a (theoretical) SLA world
But surrounded by storms, human errors and trolls
Serial Compound Availability
App Service (99,95%) -> SQL (99,95%)
The service fails when:
• App service is down
• SQL is down
• Both are down
The overall “service” availability is lower: 99,90%
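The arithmetic behind the serial figure can be sketched as follows. This is a minimal illustration, assuming component failures are independent; the function names are hypothetical and not from the presentation. A chain needs every component up, so availabilities multiply. Redundant replicas only fail when all of them fail.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(*replicas: float) -> float:
    """Availability of redundant replicas where any single one being up is enough."""
    return 1.0 - prod(1.0 - a for a in replicas)


# Figures from the slide: App Service and SQL at 99,95% each (0.9995).
app_service = 0.9995
sql = 0.9995

# Assumption: the two components fail independently of each other.
chain = serial_availability(app_service, sql)
print(f"App Service -> SQL: {chain:.2%}")
```

The serial result rounds to the 99,90% quoted on the slide. The parallel helper is included only to show the general formula for redundant copies.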
Serial and parallel Availability
Traffic Manager -> A: App Service (99,95%) -> SQL (99,95%)
Traffic Manager -> B: App Service (99,95%) -> SQL (99,95%)
Diagram label: 99,99%
Failure modes:
• App service (A) is down
• SQL (A) is down
• Both App/SQL (A) are down
• App service (B) is down
• SQL (B) is down
• Both App/SQL (B) are down
• Traffic Manager is down
• Combination of above
Overall SLA is: 99,98%
You still have a SPOF
And yet there are more outages than ever
25% (2017) -> 31% (2018) suffered an outage or period of “severe service degradation” over the past 12 months
Source: Uptime Institute 2018 (8th annual Data Center Survey)
48% if on-prem DC
80% could have been prevented
Leading causes: human errors, power outages, network, configuration issues
More outages than ever + higher cost / incident
Customer Satisfaction
Brand Reputation
Line of Revenue
$105,302: the mean business cost of an IT incident*
*According to “Damage Control: The Impact of Critical IT Incidents”
Predict and Prevent Operational Issues with AI
[Chart: cost of impact ($) for the stages Existing Events, Proactive (add logs and metrics) and Effective. Chart labels: Reactively Alerted, Splunk ML Alert, Automated Resolution, with MTTR marked three times.]
Predict and Prevent Operational Issues with AI
[Chart: cost of impact ($) over time for the stages Existing Events, Proactive (add logs and metrics), Effective and Predictive. Chart labels: Reactively Alerted, Splunk ML Alert, Automated Resolution, Return to Business, with MTTR marked three times. Predictive stage: predict 30 minutes in advance, NEGATIVE MTTR!!]
Multiple Data Sources
Online Services, Networks, Security, Call Detail Records, Web Services, Telecoms, Web Clickstreams, Tracing, Online Shopping Cart, Smartphones and Devices, Custom Applications, Energy Meters, Storage, Public Cloud, Private Cloud, Containers, On-Premises, Servers, GPS Location, RFID, Packaged Applications, Databases, Messaging, Firewall
Logs, Wired, DB, Mobile, IoT, API, Metrics, APM, Traces
Data lake + Machine Learning
The right teams are automatically alerted to incidents so they can take action
Teams are notified of potential issues BEFORE they turn red
Runbooks are automated for known issues
Alerts are correlated across the stack, prioritized and presented by Service Impact
How to find a needle in multiple haystacks?
(choose your tool)
Network?
Database?
Middleware?
Hardware?
Wrong command?
Connection?
Apache?
VM?
Mainframe?
Load balancer?
Wrong code released?
Collect ALL data
• Collect from all silos
• Data in original raw format
• Add open-source apps to ingest data on the fly
• Schema on the fly
• Dynamic thresholding
• Real-time correlation
Clustering & aggregation
• Real-time event clustering/correlation
• Reduce alert noise
• Behavioural analytics
• Deduplication
Add context
• Measure / report on indicators that matter
• Add service / business context
• Add actionable information to detection
Anomaly detection
• Catch issues that thresholds cannot
• Reduce event clutter
• Deviation from past behaviour
• Deviation from peers
• Unusual change in features
Assisted deep dive investigation
• Root cause analysis
• Powerful, easy-to-use search and investigation language
Predictive Analytics
• Predict service health
• Predict events
• Trend forecasting
• Detect influencing entities
• Early warning of failure
70% to 90% reduction in investigation time
15% to 45% reduction in high priority incidents
67% to 82% reduction in business impact
How We’re Getting There
Richard Bailey
CMC Markets
Introduction
• Not a blueprint
• Organic / agile
• Our challenges
• Multiple use cases
• Process
• What we collect
• DIY anomaly detection
• Predicting the predictable
• Essential housekeeping
What Does CMC Markets Do?
• Online Retail Financial Trading
• Spreadbets & CFDs
• Leveraged Products
• Short-term Positions
• Automated Trading
• Worldwide Product Base
Specific Monitoring Situation
(That may not apply to all Splunk Customers)
• Short, sharp, unpredictable load -> Highly granular stats (e.g. per second, per CPU)
• Sub-second performance targets -> Care about short pauses
• External SLAs -> Financial penalties
• Regulatory environment -> Fast, fair, transparent, evidenced
• In-house development -> Can change logging
Base Splunk: 1TB/day - On-Prem - 2-Site Clustered - All-Flash Storage
Enterprise Security
Use cases:
• Log Management
• Application Performance Monitoring
• Monitoring (everything)
• IT Ops
• Security (incl. SIEM)
• Business Ops
• Perf Testing
• Surveillance
• Capacity Mgmt
• SLA Reporting
• Alert Generation
Dashboards
We have distinct types of dashboard
General (e.g. Splunk’s MC)
• Full Picture
• Multi-use
• Peace-of-mind
Alert Response
• Support Specific Alert
• Reduce MTTR
• Self-explanatory
• Support runbook
• Encapsulates expertise
• + Alert Tuning
Live
• Rare (prefer Alerts)
• Maximize Info
• Not self-explanatory
• Human correlation
Business
• Operational
• Their only route to data
Process
Culture of Closed-Loop and Continuous Improvement
[Diagram: closed loop of Service Monitoring -> Restore Service -> Investigate -> Post-Incident Review. Also labelled: Machine Learning, Incidents, Alerts, Anomalies, Predictions, Insights, Noise Reduction, Improvements, Lessons learnt, Solutions.]
• Could we have prevented this?
• Could we have seen this coming?
• Could we have got to root cause faster?
• Did we have all the data/insights we needed?
• Can we eliminate any noise?
• Did we need to write SPL?
• Runbooks
• Dashboards
• Aim: No SPL
Monitoring Services
Service Internals: Application Logs, In-Memory Counters, JMX, GC Logs, Monitoring API
Load: EUM -> CDN -> TM -> Logs & Upstream Services
Performance: Logs -> TM -> CDN -> EUM & Downstream Services
State
Infrastructure: Storage, NW, Messaging, DBs
Resource Utilisation: CPU, IO, Mem, NW
Correlation
Anomaly Detection
The Goals
• Detect effects of changes
• Early Warning
…but still value in post-incident info
• Must handle incidents
- today’s slowdown must not become tomorrow’s normal
- yet responsive to intended service changes
- but not ignore long-term gradual degradation
• Control: adjust sensitivity, reduce false alarms
• Handle hot/cold nodes and rolling restarts
• Relatable (black box vs plain sight)
• Traceable (back to real figures)
• Actionable (deal with both incidents and false alarms)
Anomaly Detection
Our typical pattern
Events
-> SI: per-minute KPI summary, in bulk (Time, Operation, Instance, avg value)
-> SI: daily baseline per KPI (Operation, [time of day], all instances; median KPI; key percentiles -> range; typically 3w)
Express difference between current KPI value and baseline median, as a multiple of a percentile range
…for this operation
…for this time of day
Trigger on the value of the multiple (e.g. 2x)
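The scoring step described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say which percentiles form the range, so the interquartile range (25th to 75th percentile) is used here, and the function names and sample values are hypothetical. The history stands for one operation at one time of day over the baseline window.

```python
from statistics import median, quantiles


def build_baseline(history: list[float]) -> tuple[float, float]:
    """Return (median, typical range) for one operation and one time-of-day slot."""
    # Assumption: the "percentile range" is the 25th-to-75th percentile spread.
    q1, _, q3 = quantiles(history, n=4, method="inclusive")
    return median(history), q3 - q1


def anomaly_multiple(current: float, baseline_median: float, typical_range: float) -> float:
    """Signed difference from the baseline median, as a multiple of the typical range."""
    if typical_range <= 0:
        return 0.0  # a flat baseline gives no usable range; treated as "no signal" here
    return (current - baseline_median) / typical_range


def is_anomalous(multiple: float, threshold: float = 2.0) -> bool:
    """Trigger on the size of the multiple (the slide's example threshold is 2x)."""
    return abs(multiple) >= threshold


# Hypothetical per-minute KPI averages for the same minute of day; 250 is a past incident.
history = [100, 102, 98, 101, 99, 103, 97, 100, 250, 100]
base_median, base_range = build_baseline(history)
score = anomaly_multiple(106, base_median, base_range)
print(base_median, base_range, score, is_anomalous(score))
```

Because the median and the percentile range barely move when one past value is extreme, the old incident in the sample history does not widen the range much. The threshold multiple stays a separate, adjustable setting.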
Anomaly Detection
Visualise the time-based baseline
Levers
Building in the control we need
[Diagram: Events -> per-minute KPI summary (Time, Operation, Instance, avg value) -> daily baseline per KPI (Operation, [time of day], all instances; median KPI; key percentiles -> range; typically 3w) -> difference between current KPI value and baseline median, as a multiple of the typical range, for this operation and this time of day -> trigger on the value of the multiple (e.g. 2x).]
Don’t let today’s anomaly be tomorrow’s baseline
The range is not the threshold
Use range to eliminate outliers and the multiple to control the threshold
• No Data Cleansing
• Backtest
• Dashboard Support
• 2 week trial
As short as possible, to get a decent spread of data
Predicting EOD License
Will we bust the Splunk license today?
• Usage varies over day
• Simple extrapolation would not work
• We run close to the license limit, so we need accuracy
• Has to handle earlier incidents (high usage)
• Uses a typical day as baseline, which
• Must be recent
• Must recalibrate after incident
EOD License: Baseline
Derive baseline, using percentiles again to remove outliers
• Looks at last 9 days
• Use cumulative volume (streamstats)
• Find EOD volume (eventstats)
• 2 will be weekends
• Up to 3 more days could be trading holidays
• Allow up to 2 days to have incidents
• Don’t want to blend days
• Use 3rd biggest day (exactperc72(EOD))
• Could have >1 day with same EOD
• Could be smarter but good enough
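The baseline selection above can be sketched outside SPL. This is a rough Python illustration rather than the speaker's search: it builds a cumulative curve per day (the role streamstats plays on the slide), takes the last point as the EOD volume, and picks the third-biggest day directly instead of reproducing the exactperc72 percentile call. The day labels and volumes are hypothetical, with three buckets per day to keep the sample short.

```python
from itertools import accumulate


def cumulative_curve(bucket_volumes: list[float]) -> list[float]:
    """Running total of license usage through the day; the last value is the EOD volume."""
    return list(accumulate(bucket_volumes))


def pick_baseline_day(curves: dict[str, list[float]]) -> str:
    """Return the day with the third-largest EOD volume, skipping up to two incident days."""
    if len(curves) < 3:
        raise ValueError("need at least three days of history")
    ranked = sorted(curves, key=lambda day: curves[day][-1], reverse=True)
    return ranked[2]


# Hypothetical volumes for the last 9 days (GB per bucket).
volumes = {
    "day1": [300, 300, 300],
    "day2": [500, 500, 400],  # incident: unusually high usage
    "day3": [320, 310, 320],
    "day4": [450, 450, 400],  # incident: unusually high usage
    "day5": [330, 330, 320],
    "day6": [100, 100, 100],  # weekend
    "day7": [90, 100, 90],    # weekend
    "day8": [310, 300, 310],
    "day9": [300, 310, 300],
}
curves = {day: cumulative_curve(v) for day, v in volumes.items()}
baseline_day = pick_baseline_day(curves)
print(baseline_day, curves[baseline_day])
```

Picking one whole day, rather than averaging several, keeps the baseline curve a real day's shape, which matches the "don't want to blend days" point.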
Predicting EOD License
Predicting EOD based on sensible assumptions
[Diagram comparing Today with the Baseline day. Labels: Used, Rolling 60m (shown twice), Same time of day, Rest of day, and “?” for the unknown part.]
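The slide does not spell out the formula, so the following is one possible reading of the diagram, not the speaker's confirmed method. It assumes the rest of today will look like the rest of the baseline day, scaled by how today's rolling 60-minute rate compares with the baseline day's rate at the same time of day. All names and numbers are hypothetical.

```python
def predict_eod(
    used_today: float,
    today_last_60m: float,
    baseline_used_same_time: float,
    baseline_last_60m: float,
    baseline_eod: float,
) -> float:
    """Estimate today's end-of-day volume from the baseline day's remaining volume."""
    baseline_rest_of_day = baseline_eod - baseline_used_same_time
    # Assumption: scale the baseline's remaining volume by the ratio of recent rates.
    if baseline_last_60m > 0:
        scale = today_last_60m / baseline_last_60m
    else:
        scale = 1.0  # no baseline traffic in the window; fall back to the unscaled baseline
    return used_today + baseline_rest_of_day * scale


# Hypothetical figures in GB.
license_limit = 1000.0
predicted = predict_eod(
    used_today=400.0,
    today_last_60m=60.0,
    baseline_used_same_time=350.0,
    baseline_last_60m=50.0,
    baseline_eod=900.0,
)
will_bust = predicted > license_limit
print(round(predicted, 1), will_bust)
```

Using the volume already consumed as the starting point means an earlier incident today is counted once, while the rolling 60 minutes reflects only the current rate.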
Dashboard Support
Getting to root cause
Avoiding SPL
• Recent History
• Breakdown by sourcetype
• Breakdown by index
• Historical context
• Comparison Day
• Biggest increases by sourcetype
• Biggest increases by index
• Comparison by time - sourcetype
• Comparison by time - index
• Latest prediction
• Trendline
Trajectory-based Disk Alert
Why wait until a static alert fires?
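The transcript does not show how the trajectory is computed. As a minimal sketch, assuming a straight-line fit over recent samples is good enough, the idea is to estimate the growth rate and alert on the projected time until the disk is full rather than on a fixed percentage. The function name and sample readings are hypothetical.

```python
from typing import Optional


def hours_until_full(
    samples: list[tuple[float, float]], capacity_pct: float = 100.0
) -> Optional[float]:
    """Project hours until the disk is full from (hour, used %) samples.

    Returns None when usage is flat or shrinking, or when there is too little data.
    """
    if len(samples) < 2:
        return None
    mean_t = sum(t for t, _ in samples) / len(samples)
    mean_u = sum(u for _, u in samples) / len(samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        return None
    # Least-squares slope: percentage points of disk consumed per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / variance
    if slope <= 0:
        return None
    latest_used = samples[-1][1]
    return (capacity_pct - latest_used) / slope


# Hypothetical readings: the disk grows by one percentage point per hour.
readings = [(0.0, 70.0), (1.0, 71.0), (2.0, 72.0), (3.0, 73.0)]
remaining = hours_until_full(readings)
print(remaining)
```

An alert could then fire when the projected time drops below the time the team needs to react, even if a static threshold such as 90% has not been reached.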
Live Dashboard
Fed by Splunk (but not built in Splunk)
• Combines static and dynamic
• Services are grouped
• Static view shows RAG
• Dynamic list shows details
• Middle column is single-site
• + News and Changes
Behind the Scenes
Housekeeping tasks we do to keep this on the road
• Build summary indexes for speed & retention
• Handle late-arriving data
• Detect increases/decreases in index volume
• Detect when events stop (not trivial)
• Check assumptions made in searches
• Manage lookups
• Handle alert exclusions
• Handle clock changes
• Dashboard/report curation
• Manage scheduled report load
• Treat as code. Test it. KISS. Manage changes
• It’s not rocket science!
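One of the housekeeping tasks above, detecting when events stop, can be sketched as follows. The speaker's actual approach is not shown, so this is a hypothetical illustration. It assumes a maintained list of expected sources with a maximum tolerated gap for each, because a source that has sent nothing at all cannot appear in a search over recent events.

```python
def silent_sources(
    expected_max_gap: dict[str, float],
    last_seen: dict[str, float],
    now: float,
) -> list[str]:
    """Return expected sources whose latest event is older than their allowed gap.

    Times are epoch seconds; gaps are in seconds. Sources never seen are also reported.
    """
    silent = []
    for source, max_gap in expected_max_gap.items():
        seen = last_seen.get(source)
        if seen is None or now - seen > max_gap:
            silent.append(source)
    return sorted(silent)


# Hypothetical sources: each has its own tolerance because some are naturally quiet.
expected = {"pricing_app": 60.0, "gc_logs": 300.0, "batch_reports": 86400.0}
latest = {"pricing_app": 1_000_000.0, "gc_logs": 999_000.0}
stopped = silent_sources(expected, latest, now=1_000_030.0)
print(stopped)
```

Per-source tolerances matter because a nightly batch feed and a per-second pricing feed have very different normal gaps.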
Thank you

.conf Go Zurich 2022 - Platform SessionSplunk
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security SessionSplunk
 

Still Suffering from IT Outages? Accept Failure, Learn from Failure and Get Rid of Failure to Protect your Business

  • 1. © 2018 SPLUNK INC. Still haven’t got on top of IT outages? Accept failure, learn from failure and get rid of failure to protect your business Dr. Siyka Andreeva | IT Operations Analytics Specialist April 2019
  • 2. Forward Looking Statements: During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
  • 3. Agenda: Why You Need to Stop Being Reactive; Data and Machine Learning: How to Get to a Predictive IT; Case Study with CMC Markets
  • 4. High availability is everywhere! How many 9s do you have? 100%, 100%, 100%, 99.999%
  • 5. Because we live in a (theoretical) SLA world, but surrounded by storms, human errors and trolls. Serial compound availability: App Service (99.95%) and SQL (99.95%) in series. The service is affected if • App Service is down • SQL is down • Both are down, so the overall "service" availability is lower: 99.90%. Serial and parallel availability: a Traffic Manager (99.99%) in front of two App Service + SQL stacks, A and B (99.95% per component). Failure scenarios: • App Service (A) is down • SQL (A) is down • Both App Service and SQL (A) are down • App Service (B) is down • SQL (B) is down • Both App Service and SQL (B) are down • Traffic Manager is down • Combination of the above. You still have a SPOF. Overall SLA is: 99.98%
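The availability arithmetic on this slide can be sketched in a few lines of Python. This is an illustration only: it assumes component failures are independent, that the 99.99% figure belongs to the Traffic Manager, and the function names `serial` and `parallel` are hypothetical helpers, not part of any Splunk or cloud API.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """One redundant path is enough: 1 minus the chance that all paths are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


app_service = 0.9995
sql = 0.9995
traffic_manager = 0.9999  # assumption: the 99.99% on the slide is the Traffic Manager

stack = serial(app_service, sql)              # one App Service + SQL chain
redundant = parallel(stack, stack)            # stacks A and B side by side
overall = serial(traffic_manager, redundant)  # the Traffic Manager is the remaining SPOF

print(f"single stack:     {stack:.4%}")
print(f"two stacks (A/B): {redundant:.4%}")
print(f"overall:          {overall:.4%}")
```

The single-stack result rounds to the 99.90% shown on the slide. Under these assumptions the overall figure comes out just under 99.99%, slightly different from the slide's 99.98%, so the slide may be using different inputs or rounding. Either way the overall figure cannot exceed the availability of the single component in front.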
  • 6. And yet there are more outages than ever: 25% (2017) and 31% (2018) suffered an outage or period of "severe service degradation" over the past 12 months; 48% if on-prem DC. 80% could have been prevented. Leading causes: human errors, power outages, network, configuration issues. Source: Uptime Institute 2018 (8th annual Data Center Survey)
  • 7. More outages than ever + higher cost per incident. Customer Satisfaction; Brand Reputation; Line of Revenue. $105,302: the mean business cost of an IT incident*. *According to "Damage Control: The Impact of Critical IT Incidents"
  • 8. Predict and Prevent Operational Issues with AI. [Chart labels: $ Impact; Cost of Impact; Existing Events; Reactively Alerted; Effective; Automated Resolution; Proactive (add logs and metrics); Splunk ML Alert; MTTR]
  • 9. Predict and Prevent Operational Issues with AI. [Chart labels: $ Impact; Cost of Impact; Existing Events; Reactively Alerted; Effective; Automated Resolution; Proactive (add logs and metrics); Splunk ML Alert; Predictive; Predict 30 Minutes in Advance; NEGATIVE MTTR!!; MTTR; Time; Return to Business]
  • 10. Multiple Data Sources + Machine Learning. Data sources: Online Services, Networks, Security, Call Detail Records, Web Services, Telecoms, Web Clickstreams, Tracing, Online Shopping Cart, Smartphones and Devices, Custom Applications, Energy Meters, Storage, Public Cloud, Private Cloud, Containers, On-Premises Servers, GPS Location, RFID, Packaged Applications, Databases, Messaging, Firewall Logs, Wired, DB, Mobile, IoT, API, Metrics, Data lake, APM, Traces. • The right teams are automatically alerted of the incidents to take action • Teams are notified of the potential issues BEFORE they turn red • Automate runbooks for known issues • Alerts correlated across the stack, prioritized and presented by service impact
  • 11. How to find a needle in multiple haystacks? (choose your tool) Network? Database? Middleware? Hardware? Wrong command? Connection? Apache? VM? Mainframe? Load balancer? Wrong code released? Collect ALL data: • Collect from all silos • Data in original raw format • Add open source apps to ingest data on the fly • Schema on the fly • Dynamic thresholding • Real-time correlation. Clustering & aggregation: • Real-time event clustering/correlation • Reduce alert noise • Behavioural analytics • Deduplication. Add context: • Measure / report on the indicators that matter • Add service / business context • Add actionable information to detection (diagram labels: Sales, sso, Claims). Anomaly detection: • Catch issues that thresholds cannot • Reduce event clutter • Deviation from past behaviour • Deviation from peers • Unusual change in features. Assisted deep-dive investigation: • Root cause analysis • Powerful and easy-to-use search and investigation language. Predictive analytics: • Predict service health • Predict events • Trend forecasting • Detect influencing entities • Early warning of failure. 70% to 90% reduction in investigation time; 15% to 45% reduction in high-priority incidents; 67% to 82% reduction in business impact
  • 12. How We’re Getting There. Richard Bailey, CMC Markets
  • 13. Introduction • Not a blueprint • Organic / agile • Our challenges • Multiple use cases • Process • What we collect • DIY anomaly detection • Predicting the predictable • Essential housekeeping
  • 14. What Does CMC Markets Do? • Online Retail Financial Trading • Spreadbets & CFDs • Leveraged Products • Short-term Positions • Automated Trading • Worldwide Product Base
  • 15.
  • 16. Specific Monitoring Situation (that may not apply to all Splunk customers) • Short, sharp, unpredictable load -> Highly granular stats (e.g. per sec, CPU) • Sub-second performance targets -> Care about short pauses • External SLAs -> Financial penalties • Regulatory environment -> Fast, fair, transparent, evidenced • In-house development -> Can change logging
  • 17. @ Base Splunk: 1TB/day, On-Prem, 2-Site Clustered, All-Flash Storage. Enterprise Security; Log Management; Application Performance Monitoring; Monitoring (everything); IT Ops; Security (incl. SIEM); Business Ops; Perf Testing; Surveillance; Capacity Mgmt; SLA Reporting; Alert Generation
  • 18. Dashboards: We have distinct types of dashboard. General (e.g. Splunk’s MC): full picture; multi-use; peace of mind. Alert Response: support a specific alert; reduce MTTR; self-explanatory; support the runbook; encapsulates expertise; + alert tuning. Live: rare (prefer alerts); maximize info; not self-explanatory; human correlation. Business: operational; their only route to data
  • 19. Process: Culture of Closed-Loop and Continuous Improvement. [Diagram labels: Service Monitoring; Restore Service; Investigate; Post-Incident Review; Machine Learning; Incidents; Alerts; Anomalies; Predictions; Insights; Noise Reduction; Improvements; Lessons learnt; Solutions] • Could we have prevented this? • Could we have seen this coming? • Could we have got to root cause faster? • Did we have all the data/insights we needed? • Can we eliminate any noise? • Did we need to write SPL? • Runbooks • Dashboards • Aim: No SPL
  • 20. Monitoring Services. Service Internals: Application Logs, In-Memory Counters, JMX, GC Logs, Monitoring API. Load: EUM -> CDN -> TM -> Logs & Upstream Services. Performance: Logs -> TM -> CDN -> EUM & Downstream Services. State. Infrastructure: Storage, NW, Messaging, DBs. Resource Utilisation: CPU, IO, Mem, NW. Correlation
  • 21. Anomaly Detection: The Goals • Detect effects of changes • Early warning, but still value in post-incident info • Must handle incidents: today’s slowdown must not become tomorrow’s normal, yet stay responsive to intended service changes, but not ignore long-term gradual degradation • Control: adjust sensitivity, reduce false alarms • Handle hot/cold nodes and rolling restarts • Relatable (black box vs plain sight) • Traceable (back to real figures) • Actionable (deal with both incidents and false alarms)
  • 22. Anomaly Detection: Our typical pattern. Events -> (SI) per-minute KPI summary: time, operation, instance, avg value (…in bulk) -> (SI) daily baseline (KPI): operation, [time of day], all instances, median KPI, key percentiles -> range. Typically: 3w. Express the difference between the current KPI value and the baseline median, for this operation and for this time of day, as a multiple of a percentile range. Trigger on the value of the multiple (e.g. 2x)
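The pattern on this slide, scoring the current KPI value by its distance from the baseline median in units of a percentile range, can be sketched as follows. This is a plain-Python illustration rather than the presenter's SPL: the slide does not say which percentiles form the range, so the 25th and 75th used here are an assumption, and `baseline` and `anomaly_multiple` are hypothetical names.

```python
from statistics import median, quantiles


def baseline(history: list[float], low_pct: int = 25, high_pct: int = 75) -> tuple[float, float]:
    """Return (median, percentile range) for one operation at one time of day.

    `history` holds the KPI values seen at this time of day over the baseline
    window (the slide suggests about three weeks, across all instances).
    The 25th/75th percentile choice is an assumption for illustration.
    """
    cuts = quantiles(history, n=100, method="inclusive")  # cuts[i] is percentile i + 1
    return median(history), cuts[high_pct - 1] - cuts[low_pct - 1]


def anomaly_multiple(current: float, med: float, rng: float) -> float:
    """Distance from the baseline median, as a multiple of the percentile range."""
    if rng == 0:
        return 0.0 if current == med else float("inf")
    return (current - med) / rng


# Hypothetical response times (ms) for one operation at one time of day.
history = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99]
med, rng = baseline(history)

THRESHOLD = 2.0  # "trigger on the value of the multiple (e.g. 2x)"
for current in (101, 106):
    score = anomaly_multiple(current, med, rng)
    print(current, score, "ANOMALY" if abs(score) >= THRESHOLD else "ok")
```

Because the score stays in real units (a multiple of an observed range around an observed median), an alert can be traced back to the figures that produced it, and sensitivity is tuned by changing only the threshold multiple.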
  • 23. Anomaly Detection: Visualise the time-based baseline
  • 24. Levers: Building in the control we need. Events -> per-minute KPI summary: time, operation, instance, avg value -> daily baseline (KPI): operation, [time of day], all instances, median KPI, key percentiles -> range. Typically: 3w (as short as possible, to get a decent spread of data). Express the difference between the current KPI value and the baseline median, for this operation and for this time of day, as a multiple of the typical range. Trigger on the value of the multiple (e.g. 2x). Don’t let today’s anomaly be tomorrow’s baseline. The range is not the threshold: use the range to eliminate outliers and the multiple to control the threshold • No Data Cleansing • Backtest • Dashboard Support • 2 week trial
  • 25. Predicting EOD License: Will we bust the Splunk license today? • Usage varies over day • Simple extrapolation would not work • We run license close so need accuracy • Has to handle earlier incidents (high usage) • Uses a typical day as baseline, which must be recent and must recalibrate after incident
  • 26. EOD License: Baseline. Derive baseline, using percentiles again to remove outliers • Looks at last 9 days • Use cumulative volume (streamstats) • Find EOD volume (eventstats) • 2 will be weekends • Up to 3 more days could be trading holidays • Allow up to 2 days to have incidents • Don’t want to blend days • Use 3rd biggest day (exactperc72(EOD)) • Could have >1 day with same EOD • Could be smarter but good enough
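The baseline selection described on this slide can be sketched in Python as below. The presenter does this in SPL with `streamstats`, `eventstats` and `exactperc72(EOD)`. This sketch swaps in a direct "third biggest" ranking instead of a percentile, and the day labels, volumes and function names are all made up for illustration.

```python
from itertools import accumulate


def cumulative(per_interval_volumes: list[float]) -> list[float]:
    """Running total over the day (the role streamstats plays on the slide)."""
    return list(accumulate(per_interval_volumes))


def pick_baseline_day(days: dict[str, list[float]], rank: int = 3) -> str:
    """Return the label of the day with the rank-th largest end-of-day volume.

    With 9 days, the two biggest days are allowed to be incident days, while
    weekends and trading holidays fall towards the bottom of the ranking.
    Ties keep their original order, so one whole day is picked, never a blend.
    """
    eod = {day: sum(volumes) for day, volumes in days.items()}
    ordered = sorted(eod, key=lambda day: eod[day], reverse=True)
    return ordered[rank - 1]


# Hypothetical per-interval license usage (GB) for the last 9 days.
days = {
    "day1": [400, 500], "day2": [450, 500], "day3": [200, 200],  # day3: weekend
    "day4": [200, 220], "day5": [700, 800], "day6": [480, 500],  # day4: weekend, day5: incident
    "day7": [600, 800], "day8": [440, 500], "day9": [460, 500],  # day7: incident
}

chosen = pick_baseline_day(days)
baseline_curve = cumulative(days[chosen])
print(chosen, baseline_curve)
```

Picking one whole day, rather than averaging several, follows the "don't want to blend days" bullet: the baseline keeps the shape of a real trading day.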
  • 27. Predicting EOD License: Predicting EOD based on sensible assumptions. [Diagram labels: Baseline day; Used Today; Same time of day; Rolling 60m (shown for both days); Rest of day; ?]
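The diagram on this slide survives only as labels, so the following is one plausible reading, not the presenter's confirmed method: take what has been used today, then add the baseline day's rest-of-day volume, scaled by how today's rolling 60 minutes compares with the baseline day's rolling 60 minutes at the same time of day. The function name `predict_eod` and the per-minute cumulative inputs are assumptions.

```python
def predict_eod(today_cum: list[float], baseline_cum: list[float], window: int = 60) -> float:
    """Predict end-of-day volume from per-minute cumulative volumes.

    today_cum:    cumulative volume for each minute of today so far
    baseline_cum: cumulative volume for every minute of the baseline day
    window:       rolling window in minutes (60 on the slide)
    """
    if not today_cum or len(today_cum) > len(baseline_cum):
        raise ValueError("today must be non-empty and no longer than the baseline day")
    now = len(today_cum) - 1
    start = max(now - window, 0)
    # Volume added over the last `window` minutes, today and on the baseline day.
    today_recent = today_cum[now] - today_cum[start]
    base_recent = baseline_cum[now] - baseline_cum[start]
    # If the baseline saw no volume in the window, fall back to a 1:1 ratio.
    ratio = today_recent / base_recent if base_recent > 0 else 1.0
    rest_of_day = baseline_cum[-1] - baseline_cum[now]
    return today_cum[now] + ratio * rest_of_day


# Hypothetical data: the baseline day ingests 1 unit per minute all day,
# while today has run at 2 units per minute for the first 600 minutes.
baseline_cum = [float(i + 1) for i in range(1440)]
today_cum = [2.0 * (i + 1) for i in range(600)]

prediction = predict_eod(today_cum, baseline_cum)
print(prediction)
```

Scaling the baseline's remaining shape, instead of extrapolating today's average rate, respects the point made two slides earlier that usage varies over the day and simple extrapolation would not work.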
  • 28. Dashboard Support: Getting to root cause, avoiding SPL • Recent History • Breakdown by sourcetype • Breakdown by index • Historical context • Comparison Day • Biggest increases by sourcetype • Biggest increases by index • Comparison by time - sourcetype • Comparison by time - index • Latest prediction • Trendline
  • 29. Trajectory-based Disk Alert: Why wait until a static alert fires?
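The slide gives no detail of how the trajectory is computed, so the following is a generic sketch of the idea rather than the presenter's implementation: fit a straight line to recent disk-usage samples and alert when the projected time to full drops below a chosen horizon. The least-squares fit, the 24-hour horizon and the name `hours_until_full` are all assumptions.

```python
from typing import Optional


def hours_until_full(samples: list[tuple[float, float]], capacity: float = 100.0) -> Optional[float]:
    """Project when usage reaches capacity from (hour, used_percent) samples.

    Fits a least-squares line to get the growth rate, then measures the
    remaining headroom from the most recent sample. Returns None when the
    trend is flat or shrinking, or when there is too little data to fit.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return (capacity - last_used) / slope


# Hypothetical hourly samples: usage is only 76%, but growing 2 points per hour.
samples = [(0.0, 70.0), (1.0, 72.0), (2.0, 74.0), (3.0, 76.0)]
eta = hours_until_full(samples)

ALERT_HORIZON_HOURS = 24.0  # assumed policy: alert if full within a day
if eta is not None and eta <= ALERT_HORIZON_HOURS:
    print(f"disk projected full in {eta:.1f} hours")
```

A static threshold at, say, 90% would stay silent on these samples, while the trajectory already shows the disk filling within the horizon.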
  • 30. Live Dashboard: Fed by Splunk (but not built in Splunk) • Combines static and dynamic • Services are grouped • Static view shows RAG • Dynamic list shows details • Middle column is single-site • + News and Changes
  • 31. Behind the Scenes: Housekeeping tasks we do to keep this on the road • Build summary indexes for speed & retention • Handle late-arriving data • Detect increases/decreases in index volume • Detect when events stop (not trivial) • Check assumptions made in searches • Manage lookups • Handle alert exclusions • Handle clock changes • Dashboard/report curation • Manage scheduled report load • Treat as code. Test it. KISS. Manage changes • It’s not rocket science!
  • 32. Thank you