© 2017 SPLUNK INC.
ITOA Roundtable
30.11.2017
Agenda
▶ 09:00 – Breakfast
▶ 09:30 – Niklaus Seiler to kick off
▶ 09:35 – Round table introductions
▶ 10:00 – Presentation of ML and AIOps
▶ 10:30 – Break
▶ 10:45 – Round table discussion
▶ 12:00 – Lunch
Five things you will learn today
1. What is Machine Learning
2. Why ML is critical for today’s IT
3. The challenges you will need to overcome
4. Some real examples of ML use cases
5. How to get started doing ML
Splunk turns machine data into answers
Splunk’s trusted analytics platform empowers people to dive into their machine data so they can find answers quickly and see opportunities in real time.
What does it mean?
• Machine Learning
• Deep Learning
• Artificial Intelligence
• Neural Networks
• Unsupervised
• Supervised
Machine learning used in our everyday lives
• Facial Recognition
• Product Recommendations
• Natural Language Processing
• Self-Driving Cars
AI, Deep Learning, and Machine Learning
AI
• Intelligent Agents
• No Human Involvement
• Sentient Machines
Deep Learning
• Neural Networks
• TensorFlow
• Data sets are large and unknowable
Machine Learning (Splunk ML offerings today)
• Guided Data Driven Decisions
• Augmenting Human Reasoning
• Operational Intelligence
Three Types Of Machine Learning
Supervised Learning:
Unsupervised Learning:
Reinforcement Learning:
Typical ML use cases in Splunk
Anomaly detection
• Deviation from past behavior
• Deviation from peers (aka Multivariate AD or Cohesive AD)
• Unusual change in features
• ITSI MAD Anomaly Detection
Predictive Analytics
• Predict Service Outages
• Predicting Churn
• Predicting Events
• Trend Forecasting
• Detecting influencing entities
• Early warning of failure – predictive maintenance
Clustering
• Identify peer groups
• Event Correlation
• Reduce alert noise
• ITSI Event Analytics
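To make "deviation from past behavior" concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm used by ITSI or the ML Toolkit: it flags a value when it lies more than a chosen number of standard deviations from the mean of a trailing window. The function name, the window size, the limit, and the sample data are all assumptions made for this example.

```python
from statistics import mean, stdev

def deviations_from_past(values, window=12, z_limit=3.0):
    """Return indices of values that deviate strongly from the trailing window.

    Hypothetical helper for illustration; window and z_limit are arbitrary.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# Made-up response-time samples (ms) with a single spike at index 15.
samples = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
           101, 99, 100, 102, 98, 180, 101, 99]
print(deviations_from_past(samples))
```

A trailing window keeps the baseline local, so a slow drift in the metric shifts the baseline instead of raising repeated alerts.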
Custom Machine Learning – Success Formula
Domain Expertise (IT, Security, …)
• Identify use cases
• Drive decisions
• Set business/ops priorities
Splunk Expertise
• SPL
• Data prep
Data Science Expertise
• Statistics/math background
• Algorithm selection
• Model building
Splunk ML Toolkit facilitates and simplifies via examples & guidance
Result: Operational success
Splunk Machine Learning Toolkit
Extends Splunk platform functions and provides a guided modeling environment
▶ Assistants: Guided model building, testing and deployment for common objectives
▶ Showcases: Interactive examples for typical IT, security, business and IoT use cases
▶ Algorithms: 25+ standard algorithms included with the toolkit
▶ ML Commands: New SPL commands to fit, test and operationalize models
▶ Python for Scientific Computing Library: Access to 300+ open source algorithms
Build custom analytics for any use case
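The "fit, test and operationalize" cycle can be sketched in plain Python. This is an analogy only and does not reproduce the toolkit's SPL syntax. The helper names (`fit`, `apply`, `mean_abs_error`) and the data (concurrent users versus CPU %) are hypothetical, and a simple least-squares line stands in for whichever algorithm would be chosen in practice.

```python
from statistics import mean

def fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def apply(model, xs):
    """Score new inputs with a previously fitted model."""
    return [model["slope"] * x + model["intercept"] for x in xs]

def mean_abs_error(model, xs, ys):
    """Test step: average absolute error on data the model has not seen."""
    predictions = apply(model, xs)
    return mean(abs(p - y) for p, y in zip(predictions, ys))

# Made-up data: concurrent users and observed CPU utilisation (%).
users = [10, 20, 30, 40, 50, 60, 70, 80]
cpu = [12, 22, 31, 42, 52, 61, 72, 82]

model = fit(users[:6], cpu[:6])                 # fit on the first six samples
error = mean_abs_error(model, users[6:], cpu[6:])  # test on the held-out two
forecast = apply(model, [100])                  # operationalize: score new input
print(model, error, forecast)
```

Holding back some samples for the test step is what shows whether the model generalises before it is put into production.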
Demo
Humans and machines
Human Skills
• Creativity
• Design
• Empathy
• Intuition
• Sales
• Marketing
• Business Acumen
Machine Skills
• Analyzing high-velocity data
• Analyzing big data sets
• Complex data sets
• Predicting future values
• Detecting anomalies
• Complex patterns
How long do you spend driving a car?
How much of that time is productive?
What’s the value?
How much time do we spend doing this in IT?
• Searching lots of data
• Creating static rules
• Monitoring
• Troubleshooting
• Root Cause Analysis
• Capacity Planning
• War Rooms
• Forecasting
• …
Current challenges that need to be addressed
• Lower customer satisfaction
• High cost of IT operation
• Inefficient use of resources
• Lost revenue
Supporting figures
• Most of the IT budget is spent operating datacenters
• $426 billion in downtime per year
• 300% increase in number of IT events over the last 5 years
• Drop in customer satisfaction largely due to issues with digital channels
What is stopping us from doing ML for IT operations?
And yet we said that machines are good at:
• Analysing large sets of data
• Fast-flowing data
• Complex data
• Spotting complex patterns
• Reacting in real time
Collecting and analysing this data has never been so easy
MACHINE DATA
Infrastructure Layer
• Network: Packet, Payload, Traffic, Utilization, Perf
• Storage: Utilization, Capacity, Performance
• Server: Performance, Usage, Dependency
Application Layer
• User Experience: Usage, Response Time, Failed Interactions
• Byte Code Instrumentation: Usage, Experience, Performance, Quality
• Business Performance: Corporate Data, Intake, Output, Throughput
Splunk Approach:
▶ Single repository for ALL data
▶ Data in original raw format
▶ Machine learning
▶ Simplified architecture
▶ Fewer resources to manage
▶ Collaborative approach
How Splunk helps with Machine Learning
1. Data Collection
2. Data Cleansing & Preparation
3. Analysis & Feature Extraction
4. Create Models
5. Refine Models and Algorithms
6. Put into Production
Tools: Splunk, Splunk ML Toolkit, Third party, Splunk Apps with ML
AIOps
Common technologies and data sources in use today
Album: Honeymoon
Common characteristic: Water
Album: Honeymoon
Common characteristic: Car Noise/Junk
Album: Safari
Common characteristic: Wildlife
Demo
What we just saw…
▶ Adaptive Thresholds: Manage and maintain KPI thresholds by dynamically adapting to changing operational patterns
▶ Anomaly Detection: Catch issues that thresholds can’t by baselining normal operations and alerting on anomalous conditions
▶ Event Correlation: Reduce event clutter, false positives and rules maintenance by auto-grouping related events
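The idea behind adaptive thresholds can be sketched in a few lines of Python. This is not ITSI's implementation, only an illustration under simple assumptions: the KPI has a daily pattern, so each hour of the day gets its own upper limit, computed as the historical mean plus `k` standard deviations. The function names, the value of `k`, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def adaptive_thresholds(history, k=3.0):
    """Build one upper threshold per hour of day.

    history: iterable of (hour_of_day, value) pairs from past observations.
    Hours with fewer than two samples are skipped (no spread to estimate).
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {
        hour: mean(vals) + k * stdev(vals)
        for hour, vals in by_hour.items()
        if len(vals) >= 2
    }

def is_anomalous(thresholds, hour, value):
    """True when value exceeds the learned limit for that hour."""
    limit = thresholds.get(hour)
    return limit is not None and value > limit

# Made-up KPI history: quiet at 03:00, busy at 14:00.
history = [(3, 10), (3, 12), (3, 11), (3, 9),
           (14, 200), (14, 210), (14, 190), (14, 205)]
thresholds = adaptive_thresholds(history)

print(is_anomalous(thresholds, 3, 60))   # 60 is unusual at night
print(is_anomalous(thresholds, 14, 60))  # 60 is well within the daytime range
```

A single static rule would have to sit above the daytime peak and would therefore miss the night-time spike; per-hour limits catch it without anyone maintaining the rule by hand.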
Predict Service Health Score and Prevent Outages with Machine Learning
▶ Modelling on ITSI Services and KPIs
▶ Predictive Analytics for future Service Health Scores
▶ Proactive alerting integrated into the ITSI notable events framework
▶ Identify leading indicators for possible service degradation
https://www.splunk.com/blog/2017/08/28/itsi-and-sophisticated-machine-learning.html
Machine Learning on Events at the World Bank
“Thanks to the integrated machine learning in Splunk ITSI, we now have a reduced number of events to process, and the streamlined event analytics framework allows us to process events eight minutes more quickly.”
Laurent Amouroux, Technical Director, Econocom Infrastructure Management Services
• 15% increased SLA performance
• 60% reduction in number of events
• 10x reduction in number of system performance events through machine learning
Outlier detected for faulty cell tower light
The tower light turns on only at night, controlled by Canadian air traffic control
This aligns directly with a company’s top priorities
▶ Drive revenue: 95% passengers through security >5 mins → spending more time shopping
▶ Lower cost: High inventory waste and food going stale → immediate sales insight
▶ Reduce risk: Hard to get threat insights → real-time security response
▶ Customer experience: Negative impact of outage at peak times → improved business & customer insight
Where do I start?
Do not go big bang – start simple and prove value very quickly
▶ Better manage the alerts generated by your IT systems with ML
▶ Apply adaptive thresholds with ML to remove time spent configuring static rules
▶ Build a model to predict service outages
Outcomes:
• Faster detection of IT incidents
• IT teams more productive
• Prediction of failures
Open Discussion
Thank you.

Splunk ITOA Roundtable - Zurich: 30th November 2017
