Machine Learning and Analytics Breakout Session

Copyright © 2015 Splunk Inc.
Operationalizing Machine Learning
Pierre Brunel
Senior Sales Engineer
Splunk, Inc.

Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events
or the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results
could differ materially. For important factors that may cause actual results to differ from those contained
in our forward-looking statements, please review our filings with the SEC. The forward-looking
statements made in this presentation are being made as of the time and date of its live presentation.
If reviewed after its live presentation, this presentation may not contain current or accurate information.
We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to
change at any time without notice. It is for informational purposes only and shall not be incorporated
into any contract or other commitment. Splunk undertakes no obligation either to develop the features
or functionality described or to include any such feature or functionality in a future release.

Why do we need ML?

[Diagram: a timeline running from T – a few days to T + a few days. Historical data (DB, HDFS, NoSQL, Splunk, etc.) and real-time data feed statistical models (machine learning).]
Why is this so challenging using traditional methods?
• DATA IS STILL IN MOTION, still in a BUSINESS PROCESS.
• Enrich real-time MACHINE DATA with structured HISTORICAL DATA
• Make decisions IN REAL TIME using ALL THE DATA
Splunk
Security Operations Center
Network Operations Center
Business Operations Center

What is ML?

ML 101: What is it?
• Machine Learning (ML) is a process for generalizing from examples
– Examples = example or “training” data
– Generalizing = building “statistical models” to capture correlations
– Process = ML is never done, you must keep validating & refitting models
• Simple ML workflow:
– Explore data
– FIT models based on data
– APPLY models in production
– Keep validating models
“All models are wrong, but some are useful.”
- George Box

3 Types of Machine Learning
1. Supervised Learning: generalizing from labeled data
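A minimal sketch of supervised learning, assuming a nearest-centroid classifier (a technique chosen here for brevity, not one named in the slides). The metrics, labels, and values are hypothetical. The model averages the labeled examples of each class, then labels a new point by the closest average.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Average the labeled examples of each class into one centroid."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the new point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical labeled data: (cpu_percent, error_rate) with a known status.
points = [(20, 0.1), (25, 0.2), (90, 3.0), (95, 4.0)]
labels = ["healthy", "healthy", "failing", "failing"]
centroids = fit_centroids(points, labels)
```

Calling `predict(centroids, (30, 0.3))` places an unseen server in whichever class its metrics sit closest to.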

2. Unsupervised Learning: generalizing from unlabeled data
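A minimal sketch of unsupervised learning, assuming one-dimensional k-means (Lloyd's algorithm) with caller-supplied starting centroids. The response times are hypothetical. No labels are given; the algorithm groups values purely by proximity.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values with Lloyd's algorithm from given start centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: server response times in milliseconds.
response_ms = [100, 110, 105, 900, 950, 920]
centroids, clusters = kmeans_1d(response_ms, centroids=[100, 950])
```

The two resulting clusters separate fast responses from slow ones without anyone having said which is which.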

3. Reinforcement Learning: generalizing from rewards in time
Examples: Leitner system, recommender systems
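A minimal sketch of the Leitner flashcard system, assuming three boxes and the common rule that a correct answer promotes a card one box while a wrong answer sends it back to box 1. The card names are hypothetical. Each answer acts as the reward signal that changes how the card is scheduled next.

```python
def review(boxes, card, correct, last_box=3):
    """Promote a card one box on a correct answer, else reset it to box 1."""
    current = boxes[card]
    boxes[card] = min(current + 1, last_box) if correct else 1
    return boxes[card]


# Hypothetical deck: card name mapped to its current box number.
boxes = {"stats": 1, "eventstats": 2}
```

Cards in higher boxes are reviewed less often, so repeated correct answers gradually reduce the effort spent on material already learned.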

ML Use Cases

IT Ops: Predictive Maintenance
1. Get resource usage data (CPU, latency, outage reports)
2. Explore data, and fit predictive models on past / real-time data
3. Apply & validate models until predictions are accurate
4. Forecast resource saturation, demand & usage
5. Surface incidents to IT Ops, who INVESTIGATES & ACTS
Problem: Network outages and truck rolls cause big time & money expense
Solution: Build predictive model to forecast outage scenarios, act pre-emptively & learn
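Step 4 above can be sketched as a straight-line trend fitted to daily usage samples and extrapolated to a capacity limit. This is a minimal illustration under assumptions: the samples are evenly spaced, growth is roughly linear, and the function names and data are hypothetical.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_saturation(samples, capacity=100.0):
    """Days after the last sample until the trend reaches capacity.

    Returns None when there are too few samples or usage is not growing.
    """
    if len(samples) < 2:
        return None
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing_day = (capacity - intercept) / slope
    return max(0.0, crossing_day - (len(samples) - 1))


# Hypothetical daily disk usage in percent.
disk_usage = [60, 62, 64, 66, 68]
remaining = days_until_saturation(disk_usage)
```

A forecast like this is what would be surfaced to IT Ops ahead of the outage, so the team can act before the limit is reached.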

Security: Find Insider Threats
Problem: Security breaches cause big time & money expense
Solution: Build predictive model to forecast threat scenarios, act pre-emptively & learn
1. Get security data (data transfers, authentication, incidents)
4. Forecast abnormal behavior, risk scores & notable events
5. Surface incidents to Security Ops, who INVESTIGATES & ACTS
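One simple way to turn abnormal behavior into risk scores is a z-score against a historical baseline. The sketch below assumes that approach (the slides do not specify one); the user names, transfer volumes, and threshold of 3 are hypothetical.

```python
from statistics import mean, stdev


def risk_scores(baseline, observed):
    """Score each observed value by its z-score against the baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")
    return {user: (value - mu) / sigma for user, value in observed.items()}


def notable_events(scores, threshold=3.0):
    """Return the users whose score exceeds the threshold, sorted by name."""
    return sorted(user for user, score in scores.items() if score > threshold)


# Hypothetical daily data transfer volumes in MB.
baseline_mb = [100, 110, 90, 105, 95]
today_mb = {"alice": 104, "bob": 900}
scores = risk_scores(baseline_mb, today_mb)
flagged = notable_events(scores)
```

The flagged list is the kind of output that would be handed to Security Ops for investigation rather than acted on automatically.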

Business Analytics: Predict Customer Churn
Problem: Customer churn causes big time & money expense
Solution: Build predictive model to forecast possible churn, act pre-emptively & learn
1. Get customer data (set-top boxes, web logs, transaction history)
4. Identify customers likely to churn
5. Surface incidents to Business Ops, who INVESTIGATES & ACTS

Summary: The ML Process
Problem: <Stuff in the world> causes big time & money expense
Solution: Build predictive model to forecast <possible incidents>, act pre-emptively & learn
1. Get all data relevant to the problem
4. Forecast KPIs & metrics associated with the use case
5. Surface incidents to X Ops, who INVESTIGATES & ACTS
Operationalize

How do we do ML with Splunk?

1. Get Data & Find Decision-Makers!
[Diagram. Data sources: GPS / cellular devices, networks, Hadoop, servers, applications, online shopping carts, clickstreams. Structured data sources: CRM, ERP, HR, billing, product, finance, data warehouse. Access paths: ODBC, SDK, API, DB Connect, look-ups. Uses: ad hoc search, monitor and alert, reports / analyze, custom dashboards. Audiences: analysts, business users, IT users.]

2. Explore Data, Build Searches & Dashboards
• Start with the Exploratory Data Analysis phase
– “80% of data science is sourcing, cleaning, and preparing the data”
• For each data source, build “data diagnostic” dashboard
– What’s interesting? Throw up some basic charts.
– What’s relevant for this use case?
– Any anomalies? Are thresholds useful?
• Mix data streams & compute aggregates
– Compute KPIs & statistics w/ stats, eventstats, etc.
– Enrich data streams with useful structured data
– stats count by X Y – where X,Y from different sources
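The idea behind `stats count by X Y` with fields from different sources can be sketched outside Splunk as a lookup followed by a grouped count. This is an illustration only, not how Splunk implements it; the events, the CRM lookup, and the field names are hypothetical.

```python
from collections import Counter

# Hypothetical machine data: web log events.
events = [
    {"customer_id": "c1", "status": 200},
    {"customer_id": "c2", "status": 500},
    {"customer_id": "c1", "status": 500},
    {"customer_id": "c3", "status": 500},
]

# Hypothetical structured data: customer tier from a CRM system.
crm_lookup = {"c1": "gold", "c2": "silver", "c3": "gold"}


def count_by_tier_and_status(events, lookup):
    """Enrich each event with its tier, then count by (tier, status)."""
    counts = Counter()
    for event in events:
        tier = lookup.get(event["customer_id"], "unknown")
        counts[(tier, event["status"])] += 1
    return counts


counts = count_by_tier_and_status(events, crm_lookup)
```

Here `tier` comes from the structured source and `status` from the machine data, which is the mix of X and Y the bullet describes.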

3. Get the ML Toolkit & Showcase App
• Get the App: http://tiny.cc/splunkmlapp
• Leverages Python for Scientific Computing (PSC) add-on:
– Open-source Python data science ecosystem
– NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Hard Drive Failure, Server Power consumption,
Server Response Time, Application Usage
• Standard algorithms out of the box:
– Supervised: Logistic Regression, SVM, Linear Regression, Random Forest
– Unsupervised: KMeans, DBSCAN, Spectral Clustering
• Implement one of 300+ algorithms by editing Python scripts

4. Fit, Apply & Validate Models
• ML SPL – New grammar for doing ML in Splunk
• fit – fit models based on training data
– [training data] | fit LinearRegression costly_KPI
from feature1 feature2 feature3 into my_model
• apply – apply models on testing and production data
– [testing/production data] | apply my_model
• Validate Your Model (The Hard Part)
– Why hard? Because statistics is hard! Also: model error ≠ real world risk.
– Analyze residuals, mean-square error, goodness of fit, cross-validate, etc.
– Take Splunk’s Analytics & Data Science Education course
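The fit, apply, and validate steps can be sketched in plain Python to show what the validation metrics measure. This is not the ML Toolkit's implementation: it assumes a single feature instead of the three in the SPL example, keeps named models in a hypothetical in-memory dictionary, and uses made-up data.

```python
from statistics import mean

models = {}  # hypothetical stand-in for models saved with "into my_model"


def fit(name, xs, ys):
    """Least-squares fit of y = slope * x + intercept, stored under name."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    models[name] = (slope, my - slope * mx)
    return models[name]


def apply(name, xs):
    """Predict with a previously fitted model."""
    slope, intercept = models[name]
    return [slope * x + intercept for x in xs]


def residuals(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]


def mean_squared_error(actual, predicted):
    return mean(r ** 2 for r in residuals(actual, predicted))


def r_squared(actual, predicted):
    """Goodness of fit: share of variance explained by the model."""
    avg = mean(actual)
    ss_res = sum(r ** 2 for r in residuals(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def cross_validate(xs, ys, folds=3):
    """Hold out every folds-th point in turn; return the error per fold."""
    errors = []
    for k in range(folds):
        held_out = set(range(k, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        test_x = [x for i, x in enumerate(xs) if i in held_out]
        test_y = [y for i, y in enumerate(ys) if i in held_out]
        fit("cv_fold", train_x, train_y)
        errors.append(mean_squared_error(test_y, apply("cv_fold", test_x)))
    return errors


# Hypothetical training data.
feature1 = [1, 2, 3, 4, 5, 6]
costly_kpi = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
fit("my_model", feature1, costly_kpi)
predicted = apply("my_model", feature1)
```

Cross-validation matters because error measured on the training data alone tends to look better than error on data the model has never seen.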

5. Operationalize Your Models
• Remember the ML Process:
1. Get data
2. Explore data & fit models
3. Apply & validate models
4. Forecast KPIs
5. Surface incidents to Ops team
• Then operationalize: feed back Ops analysis to data inputs, repeat
• Lots of hard work & stats, but lots of value will come out.
Operationalize

Show me the ML!

Sneak Peek Recap: What’s new in GA
• New Algorithms (Random Forest, Lasso, Kernel PCA, and more…)
• More use cases to explore
• Support added for Search Head Clustering
• Removed 50k limit on model fitting
• Sampling for training/test data
• Guided ML via an ML Assistant, aka Model / Query Builder
• Install on 6.4 Search Head

Next Steps with Splunk ML
• Reach out to your Tech Team! We can help architect ML workflows.
• Lots of ML commands in Core Splunk (predict, anomalydetection, stats)
• ML Toolkit & Showcase – available and free, ready to use
• Splunk ITSI: Applied ML for ITOA use cases
– Manage 1000s of KPIs & alerts
– Adaptive Thresholding & Anomaly Detection
• Splunk UBA: Applied ML for Security
– Unsupervised learning of Users & Entities
– Surfaces Anomalies & Threats
• ML Customer Advisory Program:
– Connect with Product & Engineering teams - mlprogram@splunk.com

Northern Cal Tech Talks!
Monthly WebEx Sessions
• Ted Talk style presentation
• Q&A Chat forum
So what’s next on the agenda?
• March 23rd @ 10AM PST - Building &
Deploying Apps.
• April 20th @ 10AM PST - Top 5 most useful
search commands.
See more at:
http://live.splunk.com/NorCalTechTalks

SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security
Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control
Room & Clinic, and MORE!
The 7th Annual Splunk Worldwide Users’ Conference
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!

Thank you!
