Splunk is a powerful platform for understanding your data. The preview of the Machine Learning Toolkit and Showcase App extends Splunk with a rich suite of advanced analytics and machine learning algorithms, which are exposed via an API and demonstrated in a showcase. In this session, we'll present an overview of the app architecture and API and then show you how to use Splunk to easily perform a wide variety of tasks, including outlier detection, predictive analytics, event clustering, and anomaly detection. We’ll use real data to explore these techniques and explain the intuition behind the analytics.
ML Toolkit & Showcase – DIY ML
• Splunk-supported framework for building ML apps
– Get it for free: https://splunkbase.splunk.com/app/2890/
• Leverages the Python for Scientific Computing (PSC) add-on:
– Get it for free: refer to Splunkbase for your OS version
– https://splunkbase.splunk.com/app/2881/ to /2884/
– Open-source Python data science ecosystem
– NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power Consumption, Application Usage, Customer Churn & more
Standard algorithms out of the box:
Clustering: DBSCAN, KMeans, Birch, SpectralClustering
Regression: LinearRegression, RandomForestRegressor, ElasticNet, Ridge, Lasso
Classification: LogisticRegression, RandomForestClassifier, SVM, Naïve Bayes (GaussianNB, BernoulliNB)
Transformation: PCA, KernelPCA, TFIDF Vectorizer, StandardScaler
Text Analytics: TF-IDF
Feature Extraction: FieldSelector (e.g. Univariate, ANOVA, K-best, etc.)
Implement one of 300+ algorithms by editing Python scripts
3. Fit, Apply & Validate Models
• ML SPL – New grammar for doing ML in Splunk
• fit – fit models based on training data
– [training data] | fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model
• apply – apply models on testing and production data
– [testing/production data] | apply my_model
• Validate Your Model (The Hard Part)
– Why hard? Because statistics is hard! Also: model error ≠ real world risk.
– Analyze residuals, mean-square error, goodness of fit, cross-validate, etc.
– Take Splunk’s Analytics & Data Science Education course
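The fit, apply and validate steps above can be sketched outside Splunk in plain Python. This is only an illustration under stated assumptions, not the toolkit's implementation: it uses a single feature and ordinary least squares, and the helper names (fit_linear, apply_linear, validate) and all sample numbers are hypothetical.

```python
# Minimal sketch of fit -> apply -> validate with one feature.
# Not the ML Toolkit's code; function names and data are invented.

def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


def apply_linear(model, xs):
    """Apply a fitted model to new feature values."""
    return [model["slope"] * x + model["intercept"] for x in xs]


def validate(actual, predicted):
    """Return residuals, mean squared error and R^2 on held-out data."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / len(residuals)
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"residuals": residuals, "mse": mse, "r2": r2}


# Hypothetical training data (feature -> KPI) and held-out test data.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.1, 4.9, 7.2, 8.8]
test_x = [5.0, 6.0]
test_y = [11.0, 13.1]

my_model = fit_linear(train_x, train_y)          # like "| fit ... into my_model"
predictions = apply_linear(my_model, test_x)     # like "| apply my_model"
report = validate(test_y, predictions)           # residuals, MSE, goodness of fit
print(my_model, report["mse"], report["r2"])
```

Validating on data the model never saw during fitting is the point of the split here: error measured on the training rows alone tends to look better than the model deserves.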
4. Predict & Act
• Forecast KPIs & predict notable events
– When will my system have a critical error?
– In which service or process?
– What’s the probable root cause?
• How will people act on predictions?
– Is this a Sev 1/2/3 event? Who responds?
– Deliver via Notable Events or dashboard?
– Human response or automated response?
• How do you improve the models?
– Iterate, add more data, extract more features
– Keep track of true/false positives
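Keeping track of true and false positives can be as simple as tallying each prediction against what actually happened. The sketch below is one possible way to do that in plain Python; the outcome pairs are invented sample data and the helper names are hypothetical.

```python
# Sketch: tally prediction outcomes to monitor model quality over time.
# Each pair is (predicted_event, event_actually_happened); data is invented.

def confusion_counts(pairs):
    """Count true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for predicted, actual in pairs:
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts):
    """Precision: share of alerts that were real. Recall: share of real events caught."""
    flagged = counts["tp"] + counts["fp"]
    real = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / real if real else 0.0
    return precision, recall


outcomes = [
    (True, True), (True, False), (False, True),
    (True, True), (False, False), (False, False),
]
counts = confusion_counts(outcomes)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Low precision means responders are chasing false alarms; low recall means real events are being missed. Which of the two matters more depends on how the prediction is acted upon.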
Getting started
• Prerequisite: you must be running Splunk 6.4.x
• Download and install the free ML Toolkit & Showcase!
– https://splunkbase.splunk.com/app/2890/
– https://splunkbase.splunk.com/app/2881/ to /2884/
• Speak to your local SE to discuss ways you could use ML
• Join our local User Group – we’ll be running ML workshops!
– http://www.meetup.com/splunk-melbourne/
• Contact me! (aphillips@splunk.com)