SplunkLive! Zurich 2018: Event Analytics
1. ▶ Splunk Usergroup Zürich
▶ Regular Splunk User get-togethers
▶ Frequent Splunk Ninja Presentations (D/E)
▶ Meetings throughout all major German-speaking
cities (not only Zurich)
▶ Official language: German
▶ Not a sales thing
▶ Kick-off soon
▶ Join now:
▶ https://usergroups.splunk.com/group/splunk-user-group-zurich.html
Splunk Usergroup Zurich
http://bit.do/SPLUGZ
2. Predictive, Proactive, and Collaborative
ML with IT Service Intelligence
Not a Science Project – Creating Actionable
Events Through Analytics
Hans-Henning Gehrts, ITOA & AIOps Subject Matter Expert,
ITIL Expert
8.5.2018
7. ▶ Key Performance
Indicators (KPIs)
• Defined metrics that
are used to evaluate
the overall status of
the service
▶ Leading Indicators
• Drivers of a result
▶ Lagging Indicators
• Outcome of a result
Indicators
They Matter!
Example
Scenario
DB Runs Out of Space
KPI Storage Value = 100%
KPI User Response Time Value = 2000+ secs
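The scenario above can be sketched in a few lines of Python. This is only an illustration: the reading that storage usage is the leading indicator and user response time the lagging one is an interpretation of the slide, and the `Kpi` class, the KPI names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    kind: str         # "leading" or "lagging", as labelled by the service experts
    value: float
    threshold: float  # hypothetical alert threshold, not taken from the slides

    def breached(self) -> bool:
        return self.value >= self.threshold


def early_warnings(kpis):
    """Return the names of breached leading KPIs (the drivers of a result)."""
    return [k.name for k in kpis if k.kind == "leading" and k.breached()]


# Values from the example scenario; thresholds are assumptions.
kpis = [
    Kpi("storage_used_pct", "leading", 100.0, 90.0),
    Kpi("user_response_time_secs", "lagging", 2000.0, 5.0),
]
warnings = early_warnings(kpis)
```

Watching the leading KPI gives a chance to act before the lagging KPI (the outcome users actually feel) degrades.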
8. KPIs: So What?
Understanding Leading vs Lagging KPIs From the Service Experts is Critical
Use Case Data Needed
Set Business Priorities
Measure the Right KPIs
Drive Decisions
10. Where Does Data Come From?
Service Intelligence
Machine Data Human Data
Synthetic APM
Application Code
Code
Deployment
Network
Change Records
Server Changes
Byte Code
Instrumentation
ML Adaptive
Thresholding
Server
Storage
Service Intelligence
Application Layer
Infrastructure Layer
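The "ML Adaptive Thresholding" box in the diagram above refers to thresholds that follow a KPI's normal pattern instead of a single static limit. A minimal sketch of the general idea, assuming a per-hour-of-day baseline of mean plus `k` standard deviations; this is not ITSI's actual algorithm, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def adaptive_thresholds(history, k=2.0):
    """history: iterable of (hour_of_day, value).

    Returns {hour_of_day: upper threshold}, where each threshold is the
    historical mean for that hour plus k population standard deviations.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: mean(v) + k * pstdev(v) for h, v in by_hour.items()}


def is_abnormal(hour, value, thresholds):
    """True if the value exceeds the learned threshold for that hour."""
    return value > thresholds[hour]


# Hypothetical request counts: busy at 09:00, quiet at 03:00.
history = [(9, 100.0), (9, 120.0), (9, 110.0), (3, 10.0), (3, 12.0), (3, 14.0)]
thresholds = adaptive_thresholds(history)
```

With this baseline, a value of 50 is unremarkable at 09:00 but abnormal at 03:00, which a single static threshold could not express.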
11. Use Splunk ITSI
▶ Provide flexible dependency service mapping for interactions at scale
▶ Leverage a platform to build out KPIs that ensure repeatability for consistency and allow aggregation and per-entity KPI values
▶ Ease the burden of cleaning data
What Do I Do With It Now?
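To make "aggregation and per-entity KPI values" concrete, here is a small sketch in plain Python. It assumes the KPI is a simple average; the entity names and samples are hypothetical, and this does not reproduce how ITSI computes KPIs.

```python
from collections import defaultdict
from statistics import mean


def kpi_values(samples):
    """samples: iterable of (entity, value).

    Returns (per_entity, aggregate): the mean value per entity and the
    mean across those per-entity values.
    """
    by_entity = defaultdict(list)
    for entity, value in samples:
        by_entity[entity].append(value)
    per_entity = {e: mean(v) for e, v in by_entity.items()}
    aggregate = mean(list(per_entity.values()))
    return per_entity, aggregate


# Hypothetical CPU utilisation samples from two web servers.
samples = [("web01", 40.0), ("web01", 60.0), ("web02", 90.0)]
per_entity, aggregate = kpi_values(samples)
```

The aggregate value (70.0 here) describes the service as a whole, while the per-entity values show that `web02` is the host driving it.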
13. Custom Machine Learning Success Formula
Domain Expertise
(IT, Security)
▶ ID Use Cases
▶ Set Business &
Operations Priorities
▶ What KPIs Matter
▶ Drive Decisions
Data Science
Expertise
▶ Math/Stats Background
▶ Algorithm Selection
▶ Model Building
▶ Splunk ML Toolkit
Splunk
Expertise
▶ SPL
▶ Data Prep
▶ Operational
Success
14. Overview of ML at Splunk
CORE PLATFORM
SEARCH
PACKAGED PREMIUM
SOLUTIONS
MACHINE LEARNING
TOOLKIT
Platform for Machine Data
16. ▶ Anomaly Detection
• Deviation from past behavior
• Deviation from peers (aka Multivariate AD or Cohesive AD)
• Unusual change in features
• ITSI MAD Anomaly Detection
▶ Predictive Analytics
• Predict Service Health Score
• Predicting churn
• Predicting events
• Trend forecasting
• Detecting influencing entities
• Early warning of failure – predictive maintenance
▶ Clustering
• Identify peer groups
• Event correlation
• Reduce alert noise
• ITSI Event Analytics
Splunk Customers Have ML Problems
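The two anomaly types named on this slide, deviation from past behavior and deviation from peers, differ only in the reference set a value is compared against. A minimal sketch using a z-score test, chosen here purely for illustration; it is not the algorithm behind ITSI MAD, and the limit and sample values are assumptions.

```python
from statistics import mean, pstdev


def zscore(value, reference):
    """Number of population standard deviations between value and the reference mean."""
    sigma = pstdev(reference)
    if sigma == 0:
        return 0.0 if value == mean(reference) else float("inf")
    return (value - mean(reference)) / sigma


def is_anomalous(value, reference, limit=3.0):
    """True if value lies more than `limit` standard deviations from the reference."""
    return abs(zscore(value, reference)) > limit


# Deviation from past behavior: compare a host with its own history.
own_history = [10.0, 11.0, 9.0, 10.0, 10.0]
past_anomaly = is_anomalous(20.0, own_history)

# Deviation from peers: compare a host with similar hosts at the same moment.
peer_values = [50.0, 52.0, 48.0, 51.0]
peer_anomaly = is_anomalous(49.0, peer_values)
```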
17. Splunk Machine Learning Toolkit
▶ Algorithms
• 25+ standard algorithms available prepackaged
• 300+ open source algorithms
▶ New commands to fit, test and operationalize models
▶ Assistants
• Guide model building, testing and deployment for common objectives
▶ Showcase
• Interactive examples for 25+ use cases
▶ MLlib Integration
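The "fit, test and operationalize" workflow can be illustrated outside Splunk with a tiny least-squares model in plain Python. This is a sketch of the general fit-then-apply pattern only; it is not the toolkit's implementation, and the function names, model format, and data are hypothetical.

```python
def fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def apply(model, xs):
    """Apply a previously fitted model to new inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Fit once on historical data, then reuse the saved model on new data.
model = fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predictions = apply(model, [5.0])
```

Separating fitting from applying is what lets a model be built once and then run repeatedly against incoming data.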
19. ▶ Notable Events are key
• We created a notable event for the Web Store Service, called webstore_health_alert
• Let's see how Splunk can cluster this with other events to show other Notable Events that are related to it
▶ Smart Mode is actually Smart
• It uses clustering to group events (remember unsupervised ML at the beginning) and spot the anomalies
IT Service Intelligence Event Analytics
20. ▶ Take 1000s or 100s of 1000s
of alerts and connect them
▶ Use ML prediction to improve
correlation
▶ Boil the events down to a
reasonable count of
Actionable Events
The Diamonds in the Rough
Clustering Events into Actionable Alerts
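The idea of boiling many alerts down to a few actionable events can be sketched with a simple rule: group alerts for the same service that arrive close together in time. This rule-based grouping is a stand-in chosen for illustration; it is not the ML clustering used by ITSI Smart Mode, and the alert tuples and the `max_gap_secs` parameter are hypothetical.

```python
def group_alerts(alerts, max_gap_secs=300):
    """alerts: list of (timestamp_secs, service, message).

    An alert joins the latest group for its service if it arrives within
    max_gap_secs of that group's most recent alert; otherwise it starts
    a new group.
    """
    groups = []
    latest_group = {}  # service -> index of its most recent group
    for ts, service, message in sorted(alerts):
        idx = latest_group.get(service)
        if idx is not None and ts - groups[idx][-1][0] <= max_gap_secs:
            groups[idx].append((ts, service, message))
        else:
            groups.append([(ts, service, message)])
            latest_group[service] = len(groups) - 1
    return groups


alerts = [
    (0, "webstore", "cpu high"),
    (60, "webstore", "latency high"),
    (90, "db", "disk full"),
    (1000, "webstore", "latency high"),
]
groups = group_alerts(alerts)
```

Four raw alerts collapse into three groups here; with thousands of alerts the reduction is what turns noise into a reviewable list.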
21. Let me show it to you
Predictive Analytics in Real-time