Advanced Outlier Detection and Noise Reduction
with Splunk & MLTK
Presented by: Urwah Haq @ San Francisco Splunk User Group
August 10th, 2021
Slide 2
DI Confidential
18th Dec 2019
Agenda
1. Common ways of finding outliers
• Review of some math terminology
• Review of the outlier blog post and what it covers
• Re-introduce moving averages & the foreach command
2. Using the ‘density function’ in MLTK
• An example of an ML algorithm to detect outliers
3. Combining multiple methods (1 + 2)
• Ensemble learning (combining multiple ML methods)
4. T-tests & clustering – What are they and how to use them?
Slide 3
ML/Splunk Terminology Refresher
Statistics Terms:
• Mean/Average – Central value in a set of data
• Standard Deviation – Measure of the spread of data (the higher the standard deviation, the further the points lie from the mean)
• Time Series Data/Events – Data that is collected/ingested in Splunk over intervals of time
ML Terms:
• Outliers – Legitimate data points that deviate far from the norm
• Anomalies – An action that seems out of order with the rest of the data
• Outliers vs Anomalies – For our purposes in Splunk, any deviation in data such as mb_out from firewall data or cpu/mem/network utilization can be considered an ‘Outlier’. Anything involving user actions, such as Urwah installing 10+ Splunkbase applications on a Sunday, is considered an ‘Anomaly’
[Diagram labels: Anomalies; Outliers; Relational anomalies; + Others]
Slide 4
1 – What is an Outlier?
• A point away from the body of data points
• A data point different from the rest of the points
• In Splunk, one of the most common ways to find outliers is to set boundaries
• If a data point deviates beyond these boundaries, tag it as an outlier
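The boundary idea can be sketched outside Splunk in a few lines of Python. This is only an illustration: the function name `tag_outliers`, the sample counts, and the boundary values 10 and 100 are assumptions chosen for the example, not values from the talk.

```python
def tag_outliers(values, lower, upper):
    """Pair each value with True when it falls outside [lower, upper]."""
    return [(v, v < lower or v > upper) for v in values]


# Hypothetical activity counts and boundaries, chosen only for illustration.
counts = [40, 55, 47, 300, 52, 3]
tagged = tag_outliers(counts, lower=10, upper=100)

# Keep only the points that crossed a boundary.
outliers = [value for value, is_outlier in tagged if is_outlier]
print(outliers)
```

In Splunk the same tagging would typically be an eval with an if() comparison against the boundary fields.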
Slide 5
1 – Types of Outlier Detection (No ML)
Blog: https://discoveredintelligence.ca/quick-guide-to-outlier-detection-in-splunk/
1. Static Threshold
a) If value > X (fixed threshold), then the value is an outlier
2. Moving Threshold
a) If value > X (moving average or moving value), then the value is an outlier
b) Can use commands such as ‘trendline sma/ema’ or ‘streamstats window=N’
c) We can get creative with this
index=main user=* sourcetype=WinEventLog
| timechart count by user
| eval threshold=100
Static Threshold
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=100
Moving Threshold
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| eval limit=0
| rename user_OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval average=round(total/distinct_values,2)
| eval average=if(distinct_values=1 AND average >50,round(average/5),average)
| table _time average user_*
Slide 6
1 – How basic moving average works
Moving Threshold
a) A moving threshold is not limited to the average of the past X points; it can be a lot more
b) Basic search for the moving average of the past 5 data points
| inputlookup user_usage.csv
| table _time *
| addtotals fieldname=total
| eval threshold=1200
| table _time total
| trendline sma5(total) as 5_moving_average
Here is what a simple average looks like with window=2:
_time   User_a  User_b  User_C  User_D  User_E  Average                Moving Average
9:00    0       0       10      15      5       (0+0+10+15+5)/5 = 6
9:15    0       0       0       5       5       (0+0+0+5+5)/5 = 2      4
9:30    1       2       3       4       5       3                      2.5
9:45    0       1       5       4       5       3                      3
10:00   1       3       0       5       2       2.2                    2.6
10:15   1       0       4       6       3       2.8
10:30   1       0       0       7       0       1.6
5 Active Users
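To check the arithmetic of a window=2 moving average by hand, the per-row averages and their moving average can be recomputed in Python from the per-user counts. This is a sketch, not the Splunk implementation: the helper name `moving_average` is an assumption, and the first row is left empty because there is no earlier row to average with (Splunk's `streamstats` would instead average whatever rows are available).

```python
# Per-user hit counts per time bucket (User_a .. User_E).
rows = {
    "9:00": [0, 0, 10, 15, 5],
    "9:15": [0, 0, 0, 5, 5],
    "9:30": [1, 2, 3, 4, 5],
    "9:45": [0, 1, 5, 4, 5],
    "10:00": [1, 3, 0, 5, 2],
    "10:15": [1, 0, 4, 6, 3],
    "10:30": [1, 0, 0, 7, 0],
}


def moving_average(series, window):
    """Simple moving average; None until a full window is available."""
    result = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            result.append(round(sum(chunk) / window, 2))
    return result


# Basic method: every user counts toward the average, active or not.
averages = [round(sum(hits) / len(hits), 2) for hits in rows.values()]
moving = moving_average(averages, window=2)

for time, avg, mov in zip(rows, averages, moving):
    print(time, avg, mov)
```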
Using trendline
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| table _time total
| trendline sma5(total) as 5_moving_average
Using streamstats
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| table _time total
| trendline sma5(total) as 5_moving_average
| streamstats window=5 avg(total) as streamstats_moving_average
Using streamstats & autoregress
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| table _time total
| streamstats window=5 avg(total) as streamstats_moving_average
| autoregress streamstats_moving_average as previous_moving_average
Slide 7
1 – Using Foreach Function to adjust moving average
• Use the ‘foreach’ command with conditions, e.g. ONLY use ‘active’ users with hits>0 to calculate the average
_time   User_a  User_b  User_C  User_D  User_E  New Average           New Moving Average
9:00    0       0       10      15      5       (10+15+5)/3 = 10
9:15    0       0       0       5       5       (5+5)/2 = 5           7.5
9:30    1       2       3       4       5       3                     4
9:45    0       1       5       4       5       3.75                  3.375
10:00   1       3       0       5       2       2.75                  3.25
10:15   1       0       4       6       3       3.5
10:30   1       0       0       7       0       4
3 Active Users
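The "active users only" rule can be verified with a short Python sketch that divides each row's total by the number of users with hits above a minimum, rather than by all users. The helper name `active_average` and its `min_hits` parameter are assumptions for illustration; they play the role of the `distinct_values` counter built by `foreach` in the search.

```python
# Same per-user hit counts per time bucket (User_a .. User_E).
rows = {
    "9:00": [0, 0, 10, 15, 5],
    "9:15": [0, 0, 0, 5, 5],
    "9:30": [1, 2, 3, 4, 5],
    "9:45": [0, 1, 5, 4, 5],
    "10:00": [1, 3, 0, 5, 2],
    "10:15": [1, 0, 4, 6, 3],
    "10:30": [1, 0, 0, 7, 0],
}


def active_average(hits, min_hits=0):
    """Average over users whose hits exceed min_hits; None if nobody is active."""
    active = [h for h in hits if h > min_hits]
    if not active:
        return None
    return sum(active) / len(active)


new_averages = [active_average(hits) for hits in rows.values()]

for time, avg in zip(rows, new_averages):
    print(time, avg)
```

Returning None when no user is active avoids a division by zero; in SPL the equivalent eval would yield a null value for that row.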
Using Foreach function
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| rename user_OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >100,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| eval old_average=round(total/11,2)
| table _time new_average old_average
Slide 8
1 – Using Foreach vs Aggregate Moving Average
_time   User_a  User_b  User_C  User_D  User_E
9:00    0       0       10      15      5
9:15    0       0       0       5       5
9:30    1       2       3       4       5
9:45    0       1       5       4       5
10:00   1       3       0       5       2
10:15   1       0       4       6       3
10:30   1       0       0       7       0
Basic method
• Designed such that a user with 0 activity will count as an ‘active user’
• Simple to implement
• Better to use for total aggregates
• Results in more ‘outliers’ due to static or moving bound
Using Foreach method
• Only users with activity will be counted as ‘active users’
• More complicated to set up
• Better to use when you have a limited number of users/IPs or entities
• Gives a more accurate picture of a user/IP that is more active than normal
Slide 9
2 – Introducing the ‘Density Function’
• What is the ‘Density Function’ within MLTK?
• It is another tool for anomaly detection that you can use on top of the previous methods
• It is better to use at an aggregate level (e.g. span=15/30/60min)
• It works by fitting your values to mathematical probability distributions to calculate the probability of them happening
• Similar to “| anomalydetection method=histogram [field_name]”
[Histogram: all user activity counts by activity bin; labeled bins include 0-100, 500-600, 600-700, 1100-1200]
• Activity between 500-700 is usually the most common in a day and has the highest probability of happening
• Activities in the rarely populated bins (e.g. 1100-1200) have the lowest probability of occurring → more likely to be outliers
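The histogram intuition can be sketched in Python: bucket the totals into bins, estimate each bin's probability from its share of the observations, and treat bins under a cutoff as likely outliers. This is a simplified stand-in, not the DensityFunction algorithm itself (which fits probability distributions); the bin width of 100, the 0.15 cutoff, the sample totals, and the names `bin_label` and `rare_bins` are all assumptions for illustration.

```python
from collections import Counter


def bin_label(value, width=100):
    """Label the bin a value falls into, e.g. 520 -> '500-600'."""
    low = (value // width) * width
    return f"{low}-{low + width}"


def rare_bins(values, width=100, max_probability=0.15):
    """Return {bin: probability} for bins at or below max_probability."""
    counts = Counter(bin_label(v, width) for v in values)
    total = len(values)
    return {
        label: count / total
        for label, count in counts.items()
        if count / total <= max_probability
    }


# Hypothetical total activity per time bucket.
totals = [520, 560, 610, 640, 580, 690, 530, 650, 1150, 40]
rare = rare_bins(totals)
print(rare)
```

With this sample, the 500-600 and 600-700 bins each hold 40% of the observations, so only the sparsely populated bins are returned.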
DensityFunction - https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Algorithms#DensityFunction
AnomalyDetection - https://docs.splunk.com/Documentation/SplunkCloud/8.2.2104/SearchReference/Anomalydetection
DensityFunction
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| fields _time total
| bin total start=1 end=5
| stats count by total
DensityFunction Example
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| fields _time total
| fit DensityFunction total
Overlay - Overlay line using visual formatting options
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| fields _time total
| bin total start=1 end=5
| stats count by total
| eval overlay=count
Slide 10
2 – Using the Density Function
• Where it works well
• Data that is continuous, with little to no gaps
• Aggregate-level data, e.g. total activity
• Entity-level data (users/IPs) that has few or no gaps (fit DensityFunction <Field> by “User” into Model_Name)
Slide 11
3 – Combining Density Function with Moving Averages
• Using Density Function at Aggregate Level
• Use foreach moving average method
Slide 12
3 – Combining Searches
• Using Density Function at Aggregate Level
… | fields _time Total
| fit DensityFunction Total show_density=true into my_usergroup_model
Output Fields: _time, isOutlier (Aggregate)
• Use foreach moving average method
… | foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| table _time * new_average
| foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
Output Fields: _time, isOutlier_user1, isOutlier_user2, isOutlier_user3, …
Reference Outlier (Aggregate) in the user-level outlier search using 1 of 3 options:
1 – Lookup
2 – Summary Index
3 – Inline Search
| inputlookup user_usage.csv
| addtotals
| fields _time Total
| fit DensityFunction Total show_density=true into my_usergroup_model
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total user_*
| rename user_OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| table _time * new_average
| foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
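The per-user flagging step, where a user is marked as an outlier when their hits exceed twice the active-user average, can be sketched in Python as follows. The function name `flag_users`, the `factor` parameter, and the sample row are assumptions made for the example; only the "greater than 2 × new_average" rule comes from the search.

```python
def flag_users(row, factor=2.0):
    """Return isOutlier_<user> flags (1/0) for one time bucket.

    A user is flagged when their hits exceed factor * the average
    over active users (hits > 0) in the same bucket.
    """
    active = [hits for hits in row.values() if hits > 0]
    if not active:
        return {}
    new_average = sum(active) / len(active)
    return {
        f"isOutlier_{user}": int(hits > factor * new_average)
        for user, hits in row.items()
    }


# Hypothetical hits for one time bucket.
row = {"user_a": 0, "user_b": 0, "user_c": 10, "user_d": 50, "user_e": 5}
flags = flag_users(row)
print(flags)
```

Here the active-user average is 65/3 (about 21.7), so only user_d, with 50 hits, crosses the 2× line.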
Slide 13
How do I make the most of all the outlier methods?
Aggregate Level
• Apply Density Function or any other technique to find a time frame that was an outlier
• Save results in a lookup or summary index for reference
Entity (user/IP) Level
• Use a user-level outlier technique to find a user who was an outlier at a certain time
• Reference that time against the aggregate level
(Optional) Regional Level
• Reference regional outliers using _time or time buckets as the common field with the aggregate level & user level
Advantages of combining multiple styles of outlier detection at different data levels
• Verification of true outliers vs a simple static value
• Less noisy alerting
• Alert only when all 2 or 3 levels of outliers are met
• Validate whether a rise/fall at the aggregate level was driven by one or more users. If it was one user, that user is a confirmed outlier
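The "alert only when every level agrees" idea can be sketched in Python by joining aggregate-level and user-level flags on the time bucket. The function name `confirmed_outliers` and the sample flags are assumptions; in Splunk the join would go through a lookup, a summary index, or an inline search.

```python
def confirmed_outliers(aggregate_flags, user_flags):
    """Return {time: [users]} for buckets flagged at BOTH levels.

    aggregate_flags: {time: bool} from the aggregate-level search.
    user_flags: {time: {user: bool}} from the user-level search.
    """
    alerts = {}
    for time, is_aggregate_outlier in aggregate_flags.items():
        if not is_aggregate_outlier:
            continue
        users = sorted(
            user for user, flagged in user_flags.get(time, {}).items() if flagged
        )
        if users:
            alerts[time] = users
    return alerts


# Hypothetical flags from the two searches, keyed by time bucket.
aggregate = {"9:00": True, "9:15": False, "9:30": True}
per_user = {
    "9:00": {"user_a": False, "user_d": True},
    "9:15": {"user_b": True},
    "9:30": {"user_a": False},
}
alerts = confirmed_outliers(aggregate, per_user)
print(alerts)
```

The 9:15 bucket is dropped because only the user level fired, and 9:30 is dropped because no user stands out, which is what keeps the alerting less noisy.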
Slide 14
More Advanced Ensemble Techniques
Aggregate Level
Available ML Techniques
• Density Function to find the rarest time buckets with the highest values as outliers
• Regression to find the loudest time buckets
• Classification to find times with the highest probability of being outliers
• Statespace algorithm & anomaly detection algorithm
Available Non-ML Techniques
• Static thresholds
• Moving average thresholds
Entity Level
Available ML Techniques
• Density Function to find the rarest time buckets with the highest values as outliers
• Classification to find entities with the highest probability of going above thresholds
• Statespace algorithm & anomaly detection algorithm
Available Non-ML Techniques
• Static thresholds
• Moving average thresholds
• Foreach and activity-based averages
• Better outliers
• Less mundane alerting
• Statespace algorithm
& anomaly detection
algorithm
Slide 15
4 – Increasing Outlier Function Accuracy
1. Find entities/users/IPs that form a large percentage of your overall activity and remove them
• This can be measured by using the correlation or t-test function from MLTK
2. Group similar sets of entities/users/IPs using the clustering command in MLTK
• Analyze each cluster individually. The cluster command
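As a rough illustration of step 1, the sketch below computes each entity's share of overall activity and separates out the ones above a cutoff. It uses a plain share-of-total calculation as a simpler stand-in for the correlation or t-test measurement mentioned above; the function name `dominant_entities`, the 50% cutoff, and the sample totals are assumptions.

```python
def dominant_entities(totals, max_share=0.5):
    """Return {entity: share} for entities above max_share of all activity."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        name: round(hits / grand_total, 3)
        for name, hits in totals.items()
        if hits / grand_total > max_share
    }


# Hypothetical total hits per entity over the analysis window.
totals = {"user_a": 900, "user_b": 40, "user_c": 60}
dominant = dominant_entities(totals)

# Entities left for the outlier searches once dominant ones are removed.
remaining = {name: hits for name, hits in totals.items() if name not in dominant}
print(dominant, remaining)
```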
Slide 16
Thank you
| inputlookup query.csv
| fit TFIDF query stop_words=english analyzer=word token_pattern="\w{3,20}" max_features=200
| fit KMeans query* k=3
| fields user query cluster cluster_distance
Slide 17
Scoring Function to determine similarity
Scoring function
| score <test_name> <fields>…
https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Scorecommand#T-test_.281_sample.29
Available tests:
• T-test(s):
1. Test if two IPs/users from different groups/domains have the same average activity (T-test, 2 independent samples)
2. Test if a single user/IP is equal to an average from the group (T-test, 1 sample)
3. Test if two IPs/users from the same group/domain have the same average activity (T-test, 2 related samples)
• Energy Distance: The closer this value is to 0, the more similar two fields are in terms of gain/loss over time (mathematically, they have similar cumulative distribution functions)
• Kolmogorov-Smirnov (KS): Test if a field follows the same distribution as another field
• Kwiatkowski-Phillips-Schmidt-Shin (KPSS): Test if a field's trend is stationary – little or no gain/loss
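To make the one-sample T-test concrete, the statistic it is built on can be computed by hand in Python: the gap between the sample mean and the reference mean, divided by the standard error. This sketch only computes the t statistic (no p-value), and the function name `one_sample_t` and the sample values are assumptions; the `score` command reports more than this.

```python
import math
import statistics


def one_sample_t(values, popmean):
    """t = (sample mean - popmean) / (sample stdev / sqrt(n))."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    if stdev == 0:
        raise ValueError("all values are identical; t is undefined")
    return (statistics.mean(values) - popmean) / (stdev / math.sqrt(n))


# Hypothetical hits for one user, compared against a group average of 100.
near_group = [98, 102, 101, 99, 100]
above_group = [110, 112, 108, 111, 109]

t_near = one_sample_t(near_group, popmean=100)
t_above = one_sample_t(above_group, popmean=100)
print(t_near, t_above)
```

A t value near 0 means the user looks like the group average; a large absolute value means the user's mean is far from it relative to the user's own variability.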
T-test examples:
One-sample T-test (ttest_1samp)
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_ITOps
| score ttest_1samp user_ITOps popmean=100 alpha=0.1
Two independent samples T-test (ttest_ind)
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_ind user_HR1 against user_HR2 user_ITOps
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_ind user_HR1 against user_HR1 user_ITOps
Two related samples T-test (ttest_rel)
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_rel user_HR1 against user_HR1 user_ITOps
Energy Distance
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail
| score energy_distance user_Webmail against user_RemoteAccess
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail
| score energy_distance user_HR1 against user_HR1
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_RemoteAccess user_Webmail
| fit CorrelationMatrix method=kendall user_Webmail user_RemoteAccess
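The behaviour of energy distance, where a field compared against itself scores 0 and the score grows as the distributions drift apart, can be reproduced with a small Python sketch. It uses one common definition, sqrt(2·E|X−Y| − E|X−X'| − E|Y−Y'|), estimated over all pairs; whether MLTK computes it in exactly this way is an assumption, as are the helper names and sample values.

```python
import math
from itertools import product


def mean_abs_diff(xs, ys):
    """Average |x - y| over every pair drawn from xs and ys."""
    return sum(abs(x - y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """sqrt(2*E|X-Y| - E|X-X'| - E|Y-Y'|), estimated from the samples."""
    squared = (
        2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)
    )
    return math.sqrt(max(squared, 0.0))


# Hypothetical hit counts for two fields.
webmail = [1, 2, 3]
remote_access = [11, 12, 13]

same = energy_distance(webmail, webmail)
shifted = energy_distance(webmail, remote_access)
print(same, shifted)
```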
Slide 18
Streamstats - Explanation
Window=2

More Related Content

What's hot

Threat hunting 101 by Sandeep Singh
Threat hunting 101 by Sandeep SinghThreat hunting 101 by Sandeep Singh
Threat hunting 101 by Sandeep Singh
OWASP Delhi
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
Splunk
 
Nmap and metasploitable
Nmap and metasploitableNmap and metasploitable
Nmap and metasploitable
Mohammed Akbar Shariff
 
Hacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning TechniquesHacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning Techniques
amiable_indian
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
Splunk
 
Threat Hunting Workshop
Threat Hunting WorkshopThreat Hunting Workshop
Threat Hunting Workshop
Splunk
 
Metasploit
MetasploitMetasploit
Metasploit
Lalith Sai
 
SplunkLive! Data Models 101
SplunkLive! Data Models 101SplunkLive! Data Models 101
SplunkLive! Data Models 101
Splunk
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat Intelligence
Dhruv Majumdar
 
Bug Bounty #Defconlucknow2016
Bug Bounty #Defconlucknow2016Bug Bounty #Defconlucknow2016
Bug Bounty #Defconlucknow2016
Shubham Gupta
 
XXE Exposed: SQLi, XSS, XXE and XEE against Web Services
XXE Exposed: SQLi, XSS, XXE and XEE against Web ServicesXXE Exposed: SQLi, XSS, XXE and XEE against Web Services
XXE Exposed: SQLi, XSS, XXE and XEE against Web Services
Abraham Aranguren
 
Exploring Frameworks of Splunk Enterprise Security
Exploring Frameworks of Splunk Enterprise SecurityExploring Frameworks of Splunk Enterprise Security
Exploring Frameworks of Splunk Enterprise Security
Splunk
 
Cyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down IntrudersCyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down Intruders
Infosec
 
Addressing the cyber kill chain
Addressing the cyber kill chainAddressing the cyber kill chain
Addressing the cyber kill chain
Symantec Brasil
 
Splunk Enterprise Security
Splunk Enterprise SecuritySplunk Enterprise Security
Splunk Enterprise Security
Splunk
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
Splunk
 
Threat hunting - Every day is hunting season
Threat hunting - Every day is hunting seasonThreat hunting - Every day is hunting season
Threat hunting - Every day is hunting season
Ben Boyd
 
Network security - Defense in Depth
Network security - Defense in DepthNetwork security - Defense in Depth
Network security - Defense in Depth
Dilum Bandara
 
The top 10 windows logs event id's used v1.0
The top 10 windows logs event id's used v1.0The top 10 windows logs event id's used v1.0
The top 10 windows logs event id's used v1.0
Michael Gough
 
Attacking thru HTTP Host header
Attacking thru HTTP Host headerAttacking thru HTTP Host header
Attacking thru HTTP Host header
Sergey Belov
 

What's hot (20)

Threat hunting 101 by Sandeep Singh
Threat hunting 101 by Sandeep SinghThreat hunting 101 by Sandeep Singh
Threat hunting 101 by Sandeep Singh
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
 
Nmap and metasploitable
Nmap and metasploitableNmap and metasploitable
Nmap and metasploitable
 
Hacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning TechniquesHacking With Nmap - Scanning Techniques
Hacking With Nmap - Scanning Techniques
 
Threat Hunting with Splunk
Threat Hunting with SplunkThreat Hunting with Splunk
Threat Hunting with Splunk
 
Threat Hunting Workshop
Threat Hunting WorkshopThreat Hunting Workshop
Threat Hunting Workshop
 
Metasploit
MetasploitMetasploit
Metasploit
 
SplunkLive! Data Models 101
SplunkLive! Data Models 101SplunkLive! Data Models 101
SplunkLive! Data Models 101
 
Effective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat IntelligenceEffective Threat Hunting with Tactical Threat Intelligence
Effective Threat Hunting with Tactical Threat Intelligence
 
Bug Bounty #Defconlucknow2016
Bug Bounty #Defconlucknow2016Bug Bounty #Defconlucknow2016
Bug Bounty #Defconlucknow2016
 
XXE Exposed: SQLi, XSS, XXE and XEE against Web Services
XXE Exposed: SQLi, XSS, XXE and XEE against Web ServicesXXE Exposed: SQLi, XSS, XXE and XEE against Web Services
XXE Exposed: SQLi, XSS, XXE and XEE against Web Services
 
Exploring Frameworks of Splunk Enterprise Security
Exploring Frameworks of Splunk Enterprise SecurityExploring Frameworks of Splunk Enterprise Security
Exploring Frameworks of Splunk Enterprise Security
 
Cyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down IntrudersCyber Threat Hunting: Identify and Hunt Down Intruders
Cyber Threat Hunting: Identify and Hunt Down Intruders
 
Addressing the cyber kill chain
Addressing the cyber kill chainAddressing the cyber kill chain
Addressing the cyber kill chain
 
Splunk Enterprise Security
Splunk Enterprise SecuritySplunk Enterprise Security
Splunk Enterprise Security
 
Threat Hunting
Threat HuntingThreat Hunting
Threat Hunting
 
Threat hunting - Every day is hunting season
Threat hunting - Every day is hunting seasonThreat hunting - Every day is hunting season
Threat hunting - Every day is hunting season
 
Network security - Defense in Depth
Network security - Defense in DepthNetwork security - Defense in Depth
Network security - Defense in Depth
 
The top 10 windows logs event id's used v1.0
The top 10 windows logs event id's used v1.0The top 10 windows logs event id's used v1.0
The top 10 windows logs event id's used v1.0
 
Attacking thru HTTP Host header
Attacking thru HTTP Host headerAttacking thru HTTP Host header
Attacking thru HTTP Host header
 

Similar to Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, 2021

Prog1-L2.pptx
Prog1-L2.pptxProg1-L2.pptx
Prog1-L2.pptx
valerie5142000
 
Object Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of ExamsObject Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of Exams
MuhammadTalha436
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
Unity Technologies Japan K.K.
 
Project for Student Result System
Project for Student Result SystemProject for Student Result System
Project for Student Result System
KuMaR AnAnD
 
lec4.ppt
lec4.pptlec4.ppt
lec4.ppt
NanoSana
 
Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...
eSAT Journals
 
Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...
eSAT Publishing House
 
Ch02 primitive-data-definite-loops
Ch02 primitive-data-definite-loopsCh02 primitive-data-definite-loops
Ch02 primitive-data-definite-loops
James Brotsos
 
Measuring User Experience
Measuring User ExperienceMeasuring User Experience
Measuring User Experience
Tenia Wahyuningrum
 
Object Oriented Programming using C++: Ch06 Objects and Classes.pptx
Object Oriented Programming using C++: Ch06 Objects and Classes.pptxObject Oriented Programming using C++: Ch06 Objects and Classes.pptx
Object Oriented Programming using C++: Ch06 Objects and Classes.pptx
RashidFaridChishti
 
MuleSoft Meetup Warsaw Group DataWeave 2.0
MuleSoft Meetup Warsaw Group DataWeave 2.0MuleSoft Meetup Warsaw Group DataWeave 2.0
MuleSoft Meetup Warsaw Group DataWeave 2.0
Patryk Bandurski
 
Design-Principles.ppt
Design-Principles.pptDesign-Principles.ppt
Design-Principles.ppt
nazimsattar
 
Soft performance - measuring
Soft performance - measuringSoft performance - measuring
Soft performance - measuring
Dimiter Simov
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
Lawrence Bernstein
 
cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02
PRIYANKA MEHTA
 
Bca winter 2013 2nd sem
Bca winter 2013 2nd semBca winter 2013 2nd sem
Bca winter 2013 2nd sem
smumbahelp
 
C++ Memory Management
C++ Memory ManagementC++ Memory Management
C++ Memory Management
Anil Bapat
 
Industrial egineering
Industrial egineeringIndustrial egineering
Industrial egineering
Rajeev Sharan
 
how to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projecthow to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept project
Zenodia Charpy
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
Shenglin Du
 

Similar to Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, 2021 (20)

Prog1-L2.pptx
Prog1-L2.pptxProg1-L2.pptx
Prog1-L2.pptx
 
Object Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of ExamsObject Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of Exams
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
 
Project for Student Result System
Project for Student Result SystemProject for Student Result System
Project for Student Result System
 
lec4.ppt
lec4.pptlec4.ppt
lec4.ppt
 
Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...
 
Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...Optimization of workload prediction based on map reduce frame work in a cloud...
Optimization of workload prediction based on map reduce frame work in a cloud...
 
Ch02 primitive-data-definite-loops
Ch02 primitive-data-definite-loopsCh02 primitive-data-definite-loops
Ch02 primitive-data-definite-loops
 
Measuring User Experience
Measuring User ExperienceMeasuring User Experience
Measuring User Experience
 
Object Oriented Programming using C++: Ch06 Objects and Classes.pptx
Object Oriented Programming using C++: Ch06 Objects and Classes.pptxObject Oriented Programming using C++: Ch06 Objects and Classes.pptx
Object Oriented Programming using C++: Ch06 Objects and Classes.pptx
 
MuleSoft Meetup Warsaw Group DataWeave 2.0
MuleSoft Meetup Warsaw Group DataWeave 2.0MuleSoft Meetup Warsaw Group DataWeave 2.0
MuleSoft Meetup Warsaw Group DataWeave 2.0
 
Design-Principles.ppt
Design-Principles.pptDesign-Principles.ppt
Design-Principles.ppt
 
Soft performance - measuring
Soft performance - measuringSoft performance - measuring
Soft performance - measuring
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02
 
Bca winter 2013 2nd sem
Bca winter 2013 2nd semBca winter 2013 2nd sem
Bca winter 2013 2nd sem
 
C++ Memory Management
C++ Memory ManagementC++ Memory Management
C++ Memory Management
 
Industrial egineering
Industrial egineeringIndustrial egineering
Industrial egineering
 
how to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projecthow to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept project
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
 

More from Becky Burwell

SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
Becky Burwell
 
SFBA Splunk Usergroup meeting December 14, 2023
SFBA Splunk Usergroup meeting December 14, 2023SFBA Splunk Usergroup meeting December 14, 2023
SFBA Splunk Usergroup meeting December 14, 2023
Becky Burwell
 
SFBA_SUG_2023-08-02.pdf
SFBA_SUG_2023-08-02.pdfSFBA_SUG_2023-08-02.pdf
SFBA_SUG_2023-08-02.pdf
Becky Burwell
 
SFBA Splunk Usergroup meeting May 3, 2023
SFBA Splunk Usergroup meeting May 3, 2023SFBA Splunk Usergroup meeting May 3, 2023
SFBA Splunk Usergroup meeting May 3, 2023
Becky Burwell
 
SFBA Splunk User Group Meeting February 2023
SFBA Splunk User Group Meeting February 2023SFBA Splunk User Group Meeting February 2023
SFBA Splunk User Group Meeting February 2023
Becky Burwell
 
SFBA Splunk Usergroup meeting December 2022
SFBA Splunk Usergroup meeting December 2022SFBA Splunk Usergroup meeting December 2022
SFBA Splunk Usergroup meeting December 2022
Becky Burwell
 
SFBA Usergroup meeting November 2, 2022
SFBA Usergroup meeting November 2, 2022SFBA Usergroup meeting November 2, 2022
SFBA Usergroup meeting November 2, 2022
Becky Burwell
 
SF Bay Area Splunk User Group Meeting October 5, 2022
SF Bay Area Splunk User Group Meeting October 5, 2022SF Bay Area Splunk User Group Meeting October 5, 2022
SF Bay Area Splunk User Group Meeting October 5, 2022
Becky Burwell
 
SFBA Splunk User Group Meeting August 10, 2022
SFBA Splunk User Group Meeting August 10, 2022SFBA Splunk User Group Meeting August 10, 2022
SFBA Splunk User Group Meeting August 10, 2022
Becky Burwell
 
SFBA Splunk Usergroup meeting July 13, 2022
SFBA Splunk Usergroup meeting July 13, 2022SFBA Splunk Usergroup meeting July 13, 2022
SFBA Splunk Usergroup meeting July 13, 2022
Becky Burwell
 
designing-resilient-cloud-native-splunk-arch-in-aws-austin-rose.pdf
designing-resilient-cloud-native-splunk-arch-in-aws-austin-rose.pdfdesigning-resilient-cloud-native-splunk-arch-in-aws-austin-rose.pdf
designing-resilient-cloud-native-splunk-arch-in-aws-austin-rose.pdf
Becky Burwell
 
Splunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonSplunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilson
Becky Burwell
 
Getting Started with Splunk Observability September 8, 2021
Getting Started with Splunk Observability September 8, 2021Getting Started with Splunk Observability September 8, 2021
Getting Started with Splunk Observability September 8, 2021
Becky Burwell
 


Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, 2021

  • 1. Advanced Outlier Detection and Noise Reduction with Splunk & MLTK. Presented by Urwah Haq @ San Francisco Splunk User Group, August 10th, 2021
  • 2. Slide 2 DI Confidential 18th Dec 2019
    Agenda
    1. Common ways of finding outliers
       • Review of some math terminology
       • Review of the outlier blog and what it covers
       • Re-introduce moving average & foreach function
    2. Using the ‘density function’ in MLTK
       • An example of an ML algorithm to detect outliers
    3. Combining multiple methods (1 + 2)
       • Ensemble learning (combining multiple ML methods)
    4. T-tests & clustering: what are they and how to use them?
  • 3. ML/Splunk Terminology Refresher
    Statistics terms:
    • Mean/Average – The central value in a set of data
    • Standard Deviation – A measure of the spread of the data (the higher the stdev, the larger the differences between the points)
    • Time Series Data/Events – Data that is collected/ingested in Splunk over intervals of time
    ML terms:
    • Outliers – Legitimate data points that deviate far from the norm
    • Anomalies – An action that seems out of order with the rest of the data
    • Outliers vs Anomalies – For our purposes in Splunk, any deviation in data such as mb_out from firewall data or cpu/mem/network utilization can be considered an ‘Outlier’. Anything involving user actions, such as Urwah installing 10+ Splunkbase applications on a Sunday, is considered an ‘Anomaly’
    [Diagram labels: Anomalies; Outliers; Relational anomalies + Others]
  • 4. 1 – What is an Outlier
    • A point away from the body of data points
    • A data point different from the rest of the points
    • In Splunk, one of the most common ways to find outliers is to set boundaries
    • If a data point deviates beyond these boundaries, tag it as an outlier
  • 5. 1 – Types of Outlier Detection (no ML)
    Blog: https://discoveredintelligence.ca/quick-guide-to-outlier-detection-in-splunk/
    1. Static threshold
       a) If value > X (fixed threshold), then value is an outlier
    2. Moving threshold
       a) If value > X (moving average or moving value), then value is an outlier
       b) Can use functions such as ‘trendline sma/ema’ or ‘streamstats window=N’
       c) We can get creative with this
    Example searches:
    index=main user=* sourcetype=WinEventLog | timechart count by user | eval threshold=100
    Static Threshold
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=100
    Moving Threshold
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | eval limit=0 | rename OTHER as u_OTHER | eval distinct_values=0 | foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)] | eval average=round(total/distinct_values,2) | eval average=if(distinct_values=1 AND average >50,round(average/5),average) | table _time average user_*
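The difference between a static and a moving threshold can be sketched outside Splunk. The following is a minimal Python illustration, not the presenter's method: the function names, the sample counts, and the "2 x trailing average" rule are all assumptions chosen for the example.

```python
from collections import deque


def static_outliers(values, threshold):
    """Flag each value that exceeds a fixed threshold."""
    return [v > threshold for v in values]


def moving_outliers(values, window, multiplier=2.0):
    """Flag each value that exceeds `multiplier` times the mean of the
    previous `window` values (the current value is not part of its own baseline)."""
    history = deque(maxlen=window)
    flags = []
    for v in values:
        if len(history) == window:
            baseline = sum(history) / window
            flags.append(v > multiplier * baseline)
        else:
            flags.append(False)  # not enough history yet to judge
        history.append(v)
    return flags


# Hypothetical event counts per time bucket: the level shifts up midway.
counts = [40, 50, 45, 120, 130, 125, 300]

static_flags = static_outliers(counts, threshold=100)
moving_flags = moving_outliers(counts, window=3)

print(static_flags)  # every bucket after the level shift is flagged
print(moving_flags)  # only the two sudden jumps are flagged
```

With these sample values the static threshold keeps firing once the baseline has shifted, while the moving threshold adapts and flags only the jumps, which is the motivation for option 2 on the slide.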
  • 6. 1 – How basic moving average works
    Moving threshold
    a) A moving threshold is not just the average of the past X points; it can be a lot more
    b) Basic search for the moving average of the past 5 data points:
    | inputlookup user_usage.csv | table _time * | eval threshold=1200 | addtotals fieldname=total | table _time total | trendline sma5(total) as 5_moving_average
    Here is what a simple average looks like with window=2:
    _time   User_a  User_b  User_C  User_D  User_E  Average                       Moving Average
    9:00    0       0       10      15      5       (0 + 0 + 10 + 15 + 5)/5 = 6
    9:15    0       0       0       5       5       (0 + 0 + 0 + 5 + 5)/5 = 2     4
    9:30    1       2       3       4       5       3                             2.5
    9:45    0       1       5       4       5       3                             3
    10:00   1       3       0       5       2       2.2                           2.6
    10:15   1       0       4       6       3       2.8
    10:30   1       0       0       7       0       1.6
    Slide annotation: 5 Active Users
    Using trendline
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | table _time total | trendline sma5(total) as 5_moving_average
    Using streamstats
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | table _time total | trendline sma5(total) as 5_moving_average | streamstats window=5 avg(total) as streamstats_moving_average
  • 7. Using streamstats & autoregress
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | table _time total | streamstats window=5 avg(total) as streamstats_moving_average | autoregress streamstats_moving_average as previous_moving_average
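The window=2 arithmetic from the table can be reproduced with a short Python sketch. The helper names `moving_average` and `lag` are hypothetical; `lag` only mimics the idea of carrying the previous row's value forward, as the autoregress search does. This sketch leaves the moving average empty until the window is full, so check how your chosen SPL command treats partial windows before comparing numbers.

```python
def moving_average(values, window):
    """Trailing average over the last `window` values; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(round(sum(chunk) / window, 2))
    return out


def lag(values, p=1):
    """Shift a series down by `p` rows so each row sees the value from `p` rows earlier."""
    return [None] * p + values[:-p]


# Per-user hits for each 15-minute bucket (User_a .. User_E), as in the slide's table.
rows = [
    [0, 0, 10, 15, 5],  # 9:00
    [0, 0, 0, 5, 5],    # 9:15
    [1, 2, 3, 4, 5],    # 9:30
    [0, 1, 5, 4, 5],    # 9:45
    [1, 3, 0, 5, 2],    # 10:00
    [1, 0, 4, 6, 3],    # 10:15
    [1, 0, 0, 7, 0],    # 10:30
]

averages = [round(sum(r) / len(r), 2) for r in rows]   # every user counts, active or not
moving = moving_average(averages, window=2)
previous_moving = lag(moving)

print(averages)         # [6.0, 2.0, 3.0, 3.0, 2.2, 2.8, 1.6]
print(moving)           # [None, 4.0, 2.5, 3.0, 2.6, 2.5, 2.2]
print(previous_moving)  # [None, None, 4.0, 2.5, 3.0, 2.6, 2.5]
```

Comparing each row's value against `previous_moving` rather than `moving` keeps the current point out of its own baseline.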
  • 8. 1 – Using Foreach Function to adjust moving average
    • Use the ‘foreach’ function with conditions, e.g. use ONLY ‘active’ users (hits > 0) to calculate the average
    _time   User_a  User_b  User_C  User_D  User_E  New Average           New Moving Average
    9:00    0       0       10      15      5       (10 + 15 + 5)/3 = 10
    9:15    0       0       0       5       5       (5 + 5)/2 = 5         7.5
    9:30    1       2       3       4       5       3                     4
    9:45    0       1       5       4       5       3.75                  3.38
    10:00   1       3       0       5       2       2.75                  3.25
    10:15   1       0       4       6       3       3.5
    10:30   1       0       0       7       0       4
    Slide annotation: 3 Active Users
    Using Foreach function
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | rename OTHER as u_OTHER | eval distinct_values=0 | foreach user_* [ eval distinct_values=if(<<FIELD>> >100,distinct_values+1,distinct_values)] | eval new_average=round(total/distinct_values,2) | eval old_average=round(total/11,2) | table _time new_average old_average
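The "active users only" average can be checked with a small Python sketch. It assumes the rule stated on the slide (a user is active when hits > 0); the function name `active_average` and the `min_hits` parameter are hypothetical, and the rows are the per-user hits from the slide's table.

```python
def active_average(row, min_hits=0):
    """Average over users whose hits exceed `min_hits`; None when nobody is active."""
    active = [v for v in row if v > min_hits]
    if not active:
        return None  # avoid dividing by zero when no user is active
    return round(sum(active) / len(active), 2)


# Per-user hits for each 15-minute bucket (User_a .. User_E).
rows = [
    [0, 0, 10, 15, 5],  # 9:00
    [0, 0, 0, 5, 5],    # 9:15
    [1, 2, 3, 4, 5],    # 9:30
    [0, 1, 5, 4, 5],    # 9:45
    [1, 3, 0, 5, 2],    # 10:00
    [1, 0, 4, 6, 3],    # 10:15
    [1, 0, 0, 7, 0],    # 10:30
]

new_averages = [active_average(r) for r in rows]
old_averages = [round(sum(r) / len(r), 2) for r in rows]

print(new_averages)  # [10.0, 5.0, 3.0, 3.75, 2.75, 3.5, 4.0]
print(old_averages)  # [6.0, 2.0, 3.0, 3.0, 2.2, 2.8, 1.6]
```

The active-only average is never lower than the all-users average, so a threshold built on it is harder to cross during quiet periods when most users are idle.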
  • 9. 1 – Using Foreach vs Aggregate Moving Average
    _time   User_a  User_b  User_C  User_D  User_E
    9:00    0       0       10      15      5
    9:15    0       0       0       5       5
    9:30    1       2       3       4       5
    9:45    0       1       5       4       5
    10:00   1       3       0       5       2
    10:15   1       0       4       6       3
    10:30   1       0       0       7       0
    Basic method
    • Designed such that a user with 0 activity will count as an ‘active user’
    • Simple to implement
    • Better to use for total aggregates
    • Results in more ‘outliers’ due to static or moving bound
    Using Foreach method
    • Only users with activity will be counted as ‘active users’
    • More complicated to set up
    • Better to use when you have a limited number of users/IPs or entities
    • Gives a more accurate picture of a user/IP that is more active than normal
    Using Foreach function
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | rename OTHER as u_OTHER | eval distinct_values=0 | foreach user_* [ eval distinct_values=if(<<FIELD>> >100,distinct_values+1,distinct_values)] | eval new_average=round(total/distinct_values,2) | eval old_average=round(total/11,2) | table _time new_average old_average
  • 10. 2 – Introducing the ‘Density Function’
    • What is the ‘Density Function’ within MLTK?
    • It is another tool to use for anomaly detection, on top of the previous methods
    • It is better to use at an aggregate level (e.g. span=15/30/60min)
    • It works by plotting your values against mathematical distributions to calculate the probability of them happening
    • Similar to “| anomalydetection method=histogram [field_name]”
    [Figure: histogram of all user activity counts by activity bin (0-100 … 500-600, 600-700 … 1100-1200). Activity between 500-700 is usually the most common in a day and has the highest probability of happening. Activities in the outermost bins have the lowest probability of occurring and are more likely to be outliers.]
    DensityFunction - https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Algorithms#DensityFunction
    AnomalyDetection - https://docs.splunk.com/Documentation/SplunkCloud/8.2.2104/SearchReference/Anomalydetection
    DensityFunction
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | fields _time total | bin total start=1 end=5 | stats count by total
    DensityFunction Example
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | fields _time total | fit DensityFunction total
  • 11. Overlay – overlay line using visual formatting options
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | fields _time total | bin total start=1 end=5 | stats count by total | eval overlay=count
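The binning idea behind the histogram view can be sketched in Python. This is an empirical-histogram analogue only: MLTK's DensityFunction fits probability distributions to the data, which this sketch does not attempt. The function names, the bin width, the probability cut-off, and the sample totals are all assumptions made for illustration.

```python
from collections import Counter


def bin_probabilities(values, bin_width):
    """Map each bin's lower edge to the share of values that fall into that bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    return {start: count / n for start, count in bins.items()}


def rare_values(values, bin_width, min_probability):
    """Return the values that sit in bins whose empirical probability is below the cut-off."""
    probs = bin_probabilities(values, bin_width)
    return [v for v in values if probs[(v // bin_width) * bin_width] < min_probability]


# Hypothetical total activity per time bucket: most buckets land between 500 and 700.
totals = [520, 610, 640, 580, 690, 630, 560, 50, 1150, 605]

probabilities = bin_probabilities(totals, bin_width=100)
outliers = rare_values(totals, bin_width=100, min_probability=0.2)

print(probabilities)  # {500: 0.3, 600: 0.5, 0: 0.1, 1100: 0.1}
print(outliers)       # [50, 1150]
```

Values in sparsely populated bins get a low probability and are the candidates for outliers, on both the low and the high side.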
  • 12. 2 – Using the Density Function
    • Where it works well:
      • Data that is continuous, with little to no gaps
      • Aggregate-level data, e.g. total activity
      • Entity-level data (users/IPs) that has few or no gaps (fit DensityFunction <Field> by “User” into Model_Name)
  • 13. 3 – Combining Density Function with Moving Averages
    • Using Density Function at aggregate level
    • Use foreach moving average method
  • 14. 3 – Combining Searches
    • Using Density Function at aggregate level
      …..| fields _time Total| fit DensityFunction Total show_density=true into my_usergroup_model
      Output fields: _time, isOutlier (Aggregate)
    • Use foreach moving average method
      …. | foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)] | eval new_average=round(total/distinct_values,2) | table _time * new_average | foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
      Output fields: _time, isOutlier_user1, isOutlier_user2, isOutlier_user3, …
    Reference Outlier (Aggregate) in the user-level outlier search using 1 of 3 options:
    1 – Lookup
    2 – Summary Index
    3 – Inline Search
    Searches:
    | inputlookup user_usage.csv | addtotals | fields _time Total | fit DensityFunction Total show_density=true into my_usergroup_model
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time * | eval threshold=1200 | addtotals fieldname=total | rename OTHER as u_OTHER | eval distinct_values=0 | foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)] | eval new_average=round(total/distinct_values,2) | table _time * new_average | foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
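The per-user rule in the second search (flag a user when their hits exceed twice the active-user average) can be sketched in Python for a single time bucket. The function name and the sample row are hypothetical; the `isOutlier_` prefix mirrors the field names produced by the foreach search.

```python
def user_outlier_flags(row, multiplier=2.0):
    """row maps user field name -> hits for one time bucket.

    Returns a dict of isOutlier_<user> -> 1/0, flagging users whose hits exceed
    `multiplier` times the average over active users (hits > 0)."""
    active = [hits for hits in row.values() if hits > 0]
    if not active:
        return {f"isOutlier_{user}": 0 for user in row}
    new_average = sum(active) / len(active)
    return {
        f"isOutlier_{user}": int(hits > multiplier * new_average)
        for user, hits in row.items()
    }


# Hypothetical bucket: three active users, average 20, so the cut-off is 40.
bucket = {"user_a": 0, "user_b": 0, "user_c": 10, "user_d": 45, "user_e": 5}
flags = user_outlier_flags(bucket)

print(flags)
```

Only `user_d` crosses the cut-off here. Because the cut-off is relative to the other active users in the same bucket, the rule flags the user who is unusually loud compared with their peers rather than against a fixed number.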
  • 15. How do I make the most of all the outlier methods?
    Aggregate level
    • Apply Density Function or any other technique to find a time frame that was an outlier
    • Save results in a lookup or summary index for reference
    Entity (user/IP) level
    • Use a user-level outlier technique to find a user who was an outlier at a certain time
    • Reference that time with the aggregate level
    (Optional) Regional level
    • Reference regional outliers using _time or time buckets as the common field with the aggregate level & user level
    Advantages of combining multiple styles of outlier detection at different data levels
    • Verification of true outliers vs a simple static value
    • Less noisy for alerting
    • Alert only when all 2 or 3 levels of outliers are met
    • Validate whether a rise/fall at the aggregate level was contributed by one or more users. If by one user, that is a confirmed outlier
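The "alert only when all levels agree" idea amounts to joining the aggregate-level and user-level flags on the time bucket. The following Python sketch assumes both sets of flags have already been computed and saved (for example in a lookup or summary index); the function name, data shapes, and sample values are hypothetical.

```python
def confirmed_alerts(aggregate_flags, user_flags):
    """Join aggregate-level and user-level outlier flags on the time bucket.

    aggregate_flags: dict of time -> 0/1
    user_flags:      dict of time -> dict of user -> 0/1
    Returns (time, [users]) pairs where both levels report an outlier."""
    alerts = []
    for bucket in sorted(aggregate_flags):
        if not aggregate_flags[bucket]:
            continue  # aggregate level is normal: stay quiet
        users = sorted(u for u, flag in user_flags.get(bucket, {}).items() if flag)
        if users:
            alerts.append((bucket, users))
    return alerts


aggregate = {"09:00": 0, "09:15": 1, "09:30": 1}
per_user = {
    "09:00": {"user_a": 1, "user_b": 0},  # user outlier, but aggregate is normal
    "09:15": {"user_a": 0, "user_b": 1},  # both levels agree
    "09:30": {"user_a": 0, "user_b": 0},  # aggregate outlier with no single culprit
}

alerts = confirmed_alerts(aggregate, per_user)
print(alerts)  # [('09:15', ['user_b'])]
```

Only the 09:15 bucket alerts, which is how combining levels reduces noise compared with alerting on either level alone.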
  • 16. More Advanced Ensemble Techniques
    Aggregate level
    Available ML techniques
    • Density Function to find the rarest time buckets with the highest values as outliers
    • Regression to find the loudest time buckets
    • Classification to find times with the highest probability of being outliers
    • Statespace algorithm & anomaly detection algorithm
    Available non-ML techniques
    • Static thresholds
    • Moving average thresholds
    Entity level
    Available ML techniques
    • Density Function to find the rarest time buckets with the highest values as outliers
    • Classification to find entities with the highest probability of going above thresholds
    • Statespace algorithm & anomaly detection algorithm
    Available non-ML techniques
    • Static thresholds
    • Moving average thresholds
    • Foreach and activity-based averages
    Combined
    • Better outliers
    • Less mundane alerting
  • 17. 4 – Increasing Outlier Function Accuracy
    1. Find entities/users/IPs that form a large percentage of your overall activity and remove them
       • This can be measured by using the correlation OR t-test function from MLTK
    2. Group similar sets of entities/users/IPs using the clustering command in MLTK
       • Analyze each cluster individually. The cluster command
  • 18. Thank you
    | inputlookup query.csv | fit TFIDF query stop_words=english analyzer=word token_pattern="w{3,20}" max_features=200 | fit KMeans query* k=3 | fields user query cluster cluster_distance
  • 19. Scoring Function to determine similarity
    Scoring function: | score <test_name> <fields>…
    https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Scorecommand#T-test_.281_sample.29
    Available tests:
    • T-test(s):
      1. Test if two IPs/users from different groups/domains have an identical pattern (T-test, 2 independent samples)
      2. Test if a single user/IP is equal to an average from the group (T-test, 1 sample)
      3. Test if two IPs/users from the same group/domain have an identical pattern (T-test, 2 related samples)
    • Energy Distance: The closer this value is to 0, the more similar the two fields are in terms of gain/loss over time (mathematically, they have similar cumulative distribution functions)
    • Kolmogorov-Smirnov (KS): Test if a field is statistically identical to another field
    • Kwiatkowski-Phillips-Schmidt-Shin: Test if the trend of the field(s) is stationary (little or no gain/loss)
    T-test examples:
    1: | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_ITOps | score ttest_1samp user_ITOps popmean=100 alpha=0.1
    2: | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps | score ttest_ind user_HR1 against user_HR2 user_ITOps
  • 20. T-test examples (continued):
    2: | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps | score ttest_ind user_HR1 against user_HR1 user_ITOps
    3: | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps | score ttest_rel user_HR1 against user_HR1 user_ITOps
    Energy Distance:
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail | score energy_distance user_Webmail against user_RemoteAccess
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail | score energy_distance user_HR1 against user_HR1
    | inputlookup app_usage.csv | rename * as user_* | rename user__time as _time | table _time user_RemoteAccess user_Webmail | fit CorrelationMatrix method=kendall user_Webmail user_RemoteAccess
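For intuition about the one-sample test, the t statistic itself can be computed by hand. This Python sketch assumes the textbook definition (sample mean minus the hypothesised mean, divided by the standard error) and covers only the statistic, not the p-value or the alpha comparison that the score command reports. The function name and the sample values are hypothetical; popmean=100 echoes the first search.

```python
import math
import statistics


def one_sample_t(sample, popmean):
    """t statistic for H0: the sample comes from a population with mean `popmean`."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)           # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - popmean) / standard_error


# Hypothetical hits per bucket for a single user.
hits = [102, 106, 104, 108, 100, 104]

t_stat = one_sample_t(hits, popmean=100)
degrees_of_freedom = len(hits) - 1

print(round(t_stat, 4), degrees_of_freedom)  # 3.4641 5
```

A t statistic far from 0 suggests the user's average differs from the hypothesised group average. Whether it is significant depends on the degrees of freedom and the chosen alpha.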
  • 21. Streamstats – Explanation: Window=2