Data Mining with Splunk


Data Mining with Splunk

  • 1. Data Mining and Exploration David Carasso, Office of CTO, Chief Mind
  • 2. AGENDA What is data mining? What’s the plan of attack? What type of events do I have? How do I mine fields? How do I detect anomalous events? Why do I need to visualize my data?
  • 3. What is Data Mining? 3
  • 4. Is this data mining? This is an orange 4
  • 5. What is Data Mining? Extracting implicit, previously unknown, and potentially useful information from data. 5
  • 6. Better 6
  • 7. Data Preparation Understanding Data Exploration Data Mining 7
  • 8. What’s the plan of attack? 8
  • 9. Preparing the data. You've been thrown data you aren't familiar with…
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
      Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
      Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
      Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
      Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
      ....
      Eventtypes          Fields   Transactions   Anomalies
      (closed sessions)   (pid)    (open-close)   (unexpected address)
  • 10. Is Understanding Linear? No. [Diagram labels: Event Groups, Events, Reports, Anomalies, Fields]
  • 11. What type of events do I have? 11
  • 12. Given Some Unknown Data
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10696]: session opened for user root by (uid=0)
      Mar 7 12:40:01 willLaptop crond(pam_unix)[10695]: session closed for user root
      Mar 7 12:40:02 willLaptop crond(pam_unix)[10696]: session closed for user root
      Mar 7 12:44:47 willLaptop gconfd (root-10750): starting (version 2.10.0), pid 10750 user 'root'
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only config...
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readwrite:/root/.gconf"…
      Mar 7 12:44:47 willLaptop gconfd (root-10750): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration ...
      Mar 7 12:45:01 willLaptop crond(pam_unix)[10754]: session opened for user root by (uid=0)
      Mar 7 12:45:02 willLaptop crond(pam_unix)[10754]: session closed for user root
      ....
  • 13. Find Broad Categories of Events Group Events by Content, Format, and Time 13
  • 14. Group Events by Content Cluster events with similar values. Show 3 examples from each cluster, from the most common cluster to the least: …| cluster labelonly=t showcount=t | dedup 3 cluster_label sortby -cluster_count, cluster_label, _time 14
  • 15. Events By Content
      count  label  _raw
      -----  -----  ----
      1339   3      Mar 7 11:05:01 willLaptop crond(pam_unix)[6785]: session opened for user root by…
      1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1769]: session opened for user root by …
      1339   3      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session opened for user root by …
      1324   2      Mar 7 11:05:02 willLaptop crond(pam_unix)[6785]: session closed for user root
      1324   2      Mar 7 11:10:01 willLaptop crond(pam_unix)[1766]: session closed for user root
      1324   2      Mar 7 11:10:02 willLaptop crond(pam_unix)[1769]: session closed for user root
      136    13     Mar 7 20:05:08 willLaptop kernel: SELinux: initialized (dev selinuxfs, type selinuxfs)…
      136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev usbfs, type usbfs), uses …
      136    13     Mar 7 20:05:09 willLaptop kernel: SELinux: initialized (dev sysfs, type sysfs), uses …
  • 16. Group by $%#! Format. Cluster events by first 7 punctuation chars: …| rex field=punct "(?<smallpunct>.{7})" | eventstats count by smallpunct | sort -count, smallpunct | dedup 3 smallpunct
  • 17. Events by Format
      count  smallpunct  raw
      -----  ----------  ---
      637    __::__(     Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
      637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
      637    __::__(     Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by …
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
      367    __::__:     Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
      57     __::__[     Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2
      57     __::__[     Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum 10
      57     __::__[     Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s
  • 18. Group by Time Look for bursts of events • Turn on computer • Load a web page • Detects speeding car • Print document • Scan security badge 18
  • 19. Group by Time Bursts
      … | transaction maxpause=2s | search eventcount>1
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session opened for user root by (uid=0)
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by (uid=0)
      Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root
      Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root
      Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67
      Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50
      Mar 10 15:30:25 willLaptop dhclient: bound to 10.1.1.194 -- renewal in 5788 seconds.
      Mar 10 16:45:01 willLaptop crond(pam_unix)[9553]: session opened for user root by (uid=0)
      Mar 10 16:45:02 willLaptop crond(pam_unix)[9553]: session closed for user root
  • 20. Multiple Sources (not really correct) 20
  • 21. Now what? 1. ✓ group your data 2. tell splunk! 21
  • 22. Telling Splunk (about your groups of events) Add eventtypes and tags Huh? 22
  • 23. SURPRISE TANGENT! What is an eventtype? 23
  • 24. Eventtype A dynamic “tag” added to events, if they would match the search that defines the eventtype. 24
  • 25. Eventtype:
      Name: “closed_root”
      Definition: “session closed” root
      Event: … session closed for user root … => eventtype=closed_root
  • 27. Independent searches will return events tagged with previous eventtypes that help classify events. 27
  • 28. Create reports on the classifications you’ve made Ok, it wasn’t a tangent. 28
  • 29. How do I mine fields? 29
  • 30. Fields Correlation Discover correlations to remove uninteresting fields and narrow in on promising reports. haiku 30
  • 31. Fields Correlation Haiku Discover patterns in fields with a correlation: co-occurring fields. indulgence 31
  • 32. Splunkd.log Sample File 09-05-2012 15:34:11.886 -0700 INFO ExecProcessor - Ran script: python /opt/splunk/etc/apps/... 09-05-2012 15:34:02.467 -0700 ERROR TcpOutputProc - Can't find or illegal IP address or ... 09-05-2012 15:32:03.397 -0700 INFO ProcessTracker - Process ran long; type=SplunkOptimize ... 09-05-2012 15:30:20.016 -0700 WARN DispatchCommand - The system is approaching the maximum ... fascinating 32
  • 33. Field Correlation
      … | correlate
      RowField    C     CN    Component  Context  L     ...
      ---------   ----  ----  ---------  -------  ----
      C           1.00  1.00  0.00       0.00     1.00
      CN          1.00  1.00  0.00       0.00     1.00
      Component   0.00  0.00  1.00       0.06     0.00
      Context     0.00  0.00  0.06       1.00     0.00
      L           1.00  1.00  0.00       0.00     1.00
      Log_Level   0.00  0.00  1.00       0.06     0.00
      …
  • 34. Field Associations automatically deduce correlations and implications of field values: …| associate Log_Level Component 34
  • 35. Field Association Summary
      Ref_Key    Ref_Value                 Target_Key  Support  Uncond Entropy  Cond Entropy  Increase  Top_Conditional_Value
      ---------  ------------------------  ----------  -------  --------------  ------------  --------  ------------------------
      Component  DatabaseDirectoryManager  Log_Level   34.67%   1.182           0.000         1.182201  WARN (62.25% -> 100.00%)
      Component  HotDBManager              Log_Level   38.25%   1.182           0.000         1.182201  INFO (33.15% -> 100.00%)
      Component  SavedSplunker             Log_Level   394.31%  1.182           0.000         1.182201  WARN (62.25% -> 100.00%)
      Component  databasePartitionPolicy   Log_Level   95.50%   1.182           0.417         0.765017  INFO (33.15% -> 91.57%)
      Component  loader                    Log_Level   79.17%   1.182           0.050         1.131883  INFO (33.15% -> 99.44%)
      Component  timeinvertedIndex         Log_Level   44.28%   1.182           0.000         1.182201  INFO (33.15% -> 100.00%)
  • 36. Top Fields by Fields. Most common Log_Level by Component:
      ... | top Log_Level by Component
      Component                 Log_Level  count  percent
      ------------------------  ---------  -----  ----------
      AdminManager              WARN       1      100.000000
      DatabaseDirectoryManager  WARN       153    100.000000
      DateParserVerbose         WARN       262    100.000000
      DedupProcessor            ERROR      1      100.000000
      DeploymentClient          DEBUG      60     85.714286
      DeploymentClient          WARN       5      7.142857
  • 37. How do I detect anomalous events?
  • 38. Types of Anomalies Anomalies you know about Anomalies you don’t know about 38
  • 39. Handling Known Anomalies. Easy. Define a search for the anomalous condition and make an alert to detect it.
      ip=10.* NOT domain=mycompany.com
      … | stats perc99(spent) → 500ms. Alert on “spent>500”
  • 40. Finding Unknown Anomalies Look for Abnormal • Single-Field Values • Multi-Field Values • Contexts • Visual Inspections… 40
  • 41. Anomalies by Single Field Values Identify anomalous values in a given field either by frequency of occurrence or number of standard deviations from the mean. … | anomalousvalue action=summary pthresh=0.02 | search isNum=YES 41
  • 42. Anomalies by Single Field Values 42
  • 43. Anomalous by Many Values Look for small clusters – by content, format, and time – to find anomalies. For example… …| cluster …| sort cluster_count 43
  • 44. Smallest Clusters by Content
      count  label  uri
      1      7      /img/skins/default/bolt.png
      1      37     /en-US/search/inspector?sid=1345075042.125&namespace=search
      1      45     /services/admin/summarization?count=10
      1      53     /services/pdfgen/is_available?viewId=index_status_health&...
      1      57     /static/splunkrc_cmds.xml
  • 45. Small Clusters: Bursts of One. Find bursts of just a single event, where a pause of 2 seconds occurred around it.
      … | transaction maxpause=2s | search eventcount = 1
      Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126…
      Mar 10 16:46:27 willLaptop ntpd[2544]: synchronized to LOCAL(0), stratum…
      Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567…
  • 46. Burst of One Same idea, different data source: splunk [11:58:08] "POST /services/search/jobs/export HTTP/1.1" 200 201630 … [11:12:51] "POST /services/search/jobs/export HTTP/1.1" 200 459441 … [10:00:58] "GET /servicesNS/nobody/SplunkDeploymentMonitor/backfill/… 46
  • 47. Anomalous by Context Identify values not expected by the context of other events. … | anomalies field=file labelonly=true maxvalues=10 47
  • 48. Anomalous by Context
      Unexpectedness  file
      0.00            shelper
      0.16            shelper
      0.00            1345502591.356
      0.00            1345502591.356
      0.00            1345074401.191
      0.00            1345074031.153
      time
      0.03            1345074328.186
      0.00            1345502591.356
      0.35            conf-dm_backfill
      0.00            1345074309.185
      0.00            1345502591.356
  • 49. Surprise Eventtype: Part Deux! Classified major categories of your data with eventtypes? -- just search for things that don’t match those eventtypes 49
  • 50. 50
  • 51. Once you can describe anomalous behavior as a search… 51
  • 52. 52
  • 53. Other mining commands • kmeans: Performs k-means clustering on selected fields. • outlier: Removes outlying numerical values. • af (analyze fields): Analyzes numerical fields for their ability to predict another discrete field. • fieldsummary: Generates summary information for fields. • shape: Produces a symbolic 'shape' attribute describing the shape of a numeric multivalued field.
  • 54. Why do I need to visualize my data? 54
  • 55. Data Mining by Visualization Visualization can capture nuances in the data that numerical or linguistic summaries cannot easily capture. 55
  • 56. These data points are radically different. *Source: Anscombe’s Quartet (Anscombe 1973) 56
  • 57. Why visualize? Because they all have the exact same • average (7.50) • standard deviation (2.03) • least-squares fit (3 + 0.5x). Do not just rely on numerical summarization. 57
  • 58. But I already have charts! You don’t graph enough. Data Exploration Don’t decide ahead of time what graphs you want Regularly do out-of-the-box scenarios with graphs 58
  • 59. Data Exploration Variations: • Subsets of Events (paying customers vs lookers) • Fields by Fields (including eventtypes and tags) • Ignored fields • Min/max/avg/count • Compare to other time windows • Transactions
  • 60. Visual Arrangement. Sorting data, changing scales (linear/log), and setting min/max can make a huge difference when looking at the same data.
  • 61. Visual Considerations Pick representations that make obvious the distinctions you need to care about. 61
  • 62. Summary 62
  • 63. Summary • Discovery is an iterative process. • Group events by content, format, and time, and define classifications with eventtypes and tags • Focus on promising fields with correlations • Discover unknown anomalies with small clusters. • Visualize your data, from a dozen angles. 63
  • 64. But wait! 64
  • 65. More to come: Predictive Analytics … | forecast foo 65
  • 66. The End. Mine the Gap. [ASCII-art banner] Golf clapping at #datamining
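
The two grouping ideas from the slides (group by format, group by time bursts) can be sketched outside Splunk. The Python below is a minimal illustration under stated assumptions, not how Splunk implements `punct` or `transaction`: `punct_prefix` and `bursts` are hypothetical helper names, the punctuation signature is only a rough stand-in for Splunk's punct field, and the year 2012 is assumed because syslog timestamps carry none. The sample lines are copied from the slides.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Sample syslog lines copied from the slides.
EVENTS = [
    "Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session opened for user root by (uid=0)",
    "Mar 10 16:50:01 willLaptop crond(pam_unix)[9639]: session opened for user root by (uid=0)",
    "Mar 10 16:50:01 willLaptop crond(pam_unix)[9638]: session closed for user root",
    "Mar 10 16:50:02 willLaptop crond(pam_unix)[9639]: session closed for user root",
    "Mar 10 15:30:25 willLaptop dhclient: DHCPREQUEST on eth0 to 10.1.1.50 port 67",
    "Mar 10 15:30:25 willLaptop dhclient: DHCPACK from 10.1.1.50",
    "Mar 10 16:46:32 willLaptop ntpd[2544]: synchronized to 138.23.180.126, stratum 2",
    "Mar 10 16:42:09 willLaptop ntpd[2544]: time reset -0.236567 s",
]


def punct_prefix(raw, length=7):
    """Rough stand-in for a punctuation signature: drop letters and digits,
    turn each whitespace character into '_', keep the first `length` chars."""
    signature = re.sub(r"[A-Za-z0-9]", "", raw)
    signature = re.sub(r"\s", "_", signature)
    return signature[:length]


def parse_time(raw):
    """Parse the leading syslog timestamp; the year 2012 is an assumption."""
    return datetime.strptime("2012 " + raw[:15], "%Y %b %d %H:%M:%S")


def bursts(events, maxpause=timedelta(seconds=2)):
    """Sort events by time and start a new group whenever the gap to the
    previous event exceeds maxpause."""
    groups = []
    for raw in sorted(events, key=parse_time):
        if groups and parse_time(raw) - parse_time(groups[-1][-1]) <= maxpause:
            groups[-1].append(raw)
        else:
            groups.append([raw])
    return groups


# Group by format: count events per punctuation prefix.
by_format = Counter(punct_prefix(raw) for raw in EVENTS)
for prefix, count in by_format.most_common():
    print(count, prefix)

# Group by time: print the size of each burst.
for group in bursts(EVENTS):
    print(len(group), group[0][:15])
```

On these eight lines the crond, dhclient, and ntpd events fall into three punctuation groups, and the two ntpd events come out as bursts of one, which is the same "small cluster" signal the anomaly slides look for.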

Editor's Notes

  1. ----- Meeting Notes (9/7/12 14:21) ----- [ASK AUDIENCE -- WHAT IS DATA MINING?]
  2. No. Explicit. Learning nothing new. Not significant in meaning. I’m explicitly telling you what it is. You’re not mining it. By looking at the data, you’re not learning anything new by me saying this is an orange. And frankly it’s not useful.
  3. Regularities, patterns, anomalies that are interesting, meaning not obvious, explicit inferences, and at the same time not coincidental or noisy inferences.
  4. Yellow is Soda. Blue is Pop. Red is Coke.
  5. Before we can really mine a bunch of text for valuable information, we need to do some prep work. We need to understand our data: the dimensions, the sets of values. In Splunk terms, create fields, eventtypes, transactions, etc. By adding fields, you’re mining out dimensions; by adding eventtypes, you’re mining classes; by adding transactions, you’re mining correlations; etc. BUT… Prepping the data for mining is a data mining task of sorts in itself, and the line between understanding your data and mining is really non-existent. This before-work is sometimes called Data Exploration.
  6. The more knowledge you can add to Splunk about your data, the more options you’ll have to analyze it. There may be data cleaning involved.
  7. You can go from groups of events to understanding events to understanding fields to understanding normality/anomalies to generating reports. But the truth is, this is an iterative process. Each step tells you more about something else. (Un)fortunately, this presentation is linear.
  8. Raw values, like raw text.
  9. Make eventtypes for “session opened”, “session closed”, “linux initialized”. Tag them. Then mine out questions like “how long is the average session?”, “how much churn is there?”, etc.
  10. Consider linecount as well.
  11. Make eventtypes or tags for cron jobs, ntpd, dhclient. Then mine out questions like “who is running what jobs?” and “which are the most common?”
  12. One of the most useful ways to see how your individual events relate to each other is to look for pauses in your events, as real physical events often happen in bursts. For example, there are bursts of log activity: when you shut down a computer; when you access a web page, which has many images; when a car factory robot detects the next car; when you turn on a printer and it connects to your computer; when you scan your security badge.
  13. Make transactions for sessions opening and closing. Find unclosed transactions. How often, how many, by whom?
  14. No reason to limit correlations to a particular data source. Splunk can easily correlate them together in one search. The search shown isn’t correct, in that the dedup is removing important consecutive events, but it was useful for showing small correlated events across sources.
  15. If Facebook had eventtypes, you’d define any picture that has any of your family members but no co-workers as a “family” pix that you could then have a virtual photo album for. Any pix with a family member outside the Bay Area would be a “family vacation” pix. When you search your data, you’re essentially weeding out all unwanted events; the results of your search are events that share common characteristics, and you can give them a collective name or “event type”. The names of your event types are added as values into an eventtype field. This means that you can search for, and report on, these groups of events the same way you search for any field. The following example takes you through the steps to save a search as an eventtype and then search for that field. If you run frequent searches to investigate SSH and firewall activities, such as sshd logins or firewall denies, you can save these searches as an event type. Also, if you see error messages that are cryptic, you can save them as an event type with a more descriptive name.
  20. Why? Reduce the number of fields you should focus on to those with the most value for analysis and graphing.
  22. A 1.0 means two fields always co-occur. For example, Component and Log_Level always co-occur in splunkd.log. You can filter out fields to make this table more manageable.
  23. ----- Meeting Notes (9/4/12 11:49) ----- give splunkd example output first to show log
  24. This shows that before we know the component is SavedSplunker, the odds of a WARN Log_Level are 62.25%; afterwards, the odds are 100%. Before we know the component is loader, the odds of an INFO Log_Level are 33.15%; afterwards, 99.44%.
  25. What are anomalies/outliers? The set of data points that are considerably different. Applications: network intrusion detection, fault detection, credit card fraud detection, telecommunication fraud detection. Build a profile of the “normal” behavior (patterns, stats) to detect anomalies. Very often you want to find “problems” in your IT data, but you don’t know what to look for. If you know what to look for, by all means, look.
  26. Very often you want to find “problems” in your IT data, but you don’t know what to look for. If you know what to look for, by all means, look. … | eventstats perc99(spent) as bigspender | where spent > bigspender
  27. Very often you want to find anomalies/problems in your IT data, but you don’t know what to look for. Single Value: the ‘port’ value is highly irregular. Many Values: many values look different than others. Anomalous: many values were unexpected by context. Everything applies to transactions as well. Look for anomalies.
  28. Identifies values in the data that are anomalous either by frequency of occurrence or number of standard deviations from the mean. Make searches to find these anomalous values and create alerts.
  29. catNormFreq = the average frequency of non-anomalous values. isNum means all values of the field were numerical. Basically we assume a normal distribution, but if we find that it ends up causing too many values to be anomalous, we don't use it.
  30. Earlier we looked for large clusters to get a broad understanding of the events. We grouped by content, format, and time. Now, just flip it. Make searches to find these anomalous values and create alerts.
  31. Same for form (looking for unusual punctuation) or especially long pauses between events (10 seconds?). Make searches to find these anomalous values and create alerts.
  32. These slow events are often important and indicate longer tasks.
  33. Make eventtypes or tags for these slow, important events. Who runs them most? Are they a problem? Why is someone exporting, or backfilling their data? Make an alert when it happens.
  34. Experimental search command that uses compression and a window of the last N events to see if a new event compresses well with past events, or if it looks unexpected. Make searches to find these anomalous values and create alerts.
  35. Make searches to find these anomalous values and create alerts.
  36. One of the most obvious and important methods of discovering what your data is saying is to simply graph your data. Humans have a well-developed ability to analyze large amounts of data presented visually, detecting general patterns and trends, as well as outliers and unusual patterns.
  37. What data points are outliers? What inferences would you make? Radically different.
  38. Limitations of statistical approaches: they usually test a single attribute; distributions aren’t known; for many dimensions, it is hard to estimate the true distribution. Do not just rely on numerical summarization, or you won’t see what’s going on.
  39. Same for transactions of events, and classes of events (eventtypes) and field-values (tags)
  40. Eventually you’ll tweak out little nuggets of knowledge. Over time, what is the average duration users spend on my website by language of country, compared to last month? How does the time on the website correlate with the time of day, or browser? Does the max delay for each server vary over time by language? Same for transactions of events, and classes of events (eventtypes) and field-values (tags).
  41. So reduce the number of dimensions down to 2 or 3 for visualization, and limit the data shown.
  42. Heat map vs much more useful chart
  43. Discovery: Each step tells you more about everything else.
  44. Predicting foo and getting better and better at it, and towards the right edge you can see it's predicting values that haven't happened yet.
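
The notes describe single-field anomalies as values that stand out "either by frequency of occurrence or number of standard deviations from the mean". A minimal Python sketch of both checks follows. It is not the algorithm behind `anomalousvalue`: the function names `rare_values` and `zscore_outliers` are hypothetical, the `pthresh` parameter here is a plain relative-frequency cutoff that only borrows its name from the slide's search, the 3-sigma limit is an assumption, and the `ports` and `spent` samples are invented for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def rare_values(values, pthresh=0.02):
    """Return values whose relative frequency is below pthresh."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < pthresh)


def zscore_outliers(values, max_sigma=3.0):
    """Return numeric values more than max_sigma standard deviations
    away from the mean of the whole sample."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > max_sigma]


# Invented sample data (not from the slides).
ports = [80] * 60 + [443] * 39 + [31337]          # one port seen only once
spent = [100, 110, 90, 105, 95] * 4 + [5000]      # one very slow request

print(rare_values(ports))      # ports seen in under 2% of events
print(zscore_outliers(spent))  # durations far from the mean
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a fixed sigma limit can hide outliers; the percentile approach in note 26 (`perc99`) is less sensitive to that.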