This document provides an overview of advanced outlier detection and noise reduction techniques using Splunk and the Machine Learning Toolkit (MLTK). It discusses common ways to detect outliers including static thresholds, moving averages, density functions, and combining multiple methods. Ensemble learning and clustering algorithms are also introduced as ways to increase outlier detection accuracy.
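One of the methods mentioned above, moving-average-based outlier detection, can be sketched in a few lines of plain Python. This is an illustrative sketch of the general technique, not Splunk MLTK code; the window size and the 3-sigma threshold are assumptions chosen for the example.

```python
import statistics

def moving_average_outliers(series, window=5, k=3.0):
    """Flag points that deviate from the trailing moving average
    by more than k standard deviations of the trailing window."""
    outliers = []
    for i in range(window, len(series)):
        win = series[i - window:i]
        mean = statistics.mean(win)
        stdev = statistics.pstdev(win)
        if stdev and abs(series[i] - mean) > k * stdev:
            outliers.append(i)
    return outliers

data = [10, 11, 10, 12, 11, 10, 11, 95, 10, 12]
print(moving_average_outliers(data))  # [7] -- the spike at index 7
```

In Splunk this same idea is typically expressed with streaming statistics over a time window; combining it with a static threshold or a density-based method, as the document suggests, reduces false positives from any single technique.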
This document outlines a presentation on threat hunting with Splunk. The presenter is Ken Westin, a security strategist at Splunk with over 20 years of experience in technology and security. The agenda includes an overview of threat hunting basics and data sources, examining the cyber kill chain through a hands-on attack scenario using Splunk, and advanced threat hunting techniques including machine learning. Log-in credentials are provided for access to hands-on demo environments related to the presentation.
The document discusses threat hunting techniques using Splunk, including an overview of threat hunting basics, data sources for threat hunting, and Lockheed Martin's Cyber Kill Chain model. It provides examples of using endpoint data to hunt for threats across the kill chain by analyzing processes, communications, and file artifacts in a demo dataset. Advanced techniques discussed include hunting for SQL injection attacks and lateral movement.
The document is a presentation on threat hunting with Splunk. It discusses threat hunting basics and data sources, the cyber kill chain model, and conducting a hands-on attack scenario investigation using Splunk. It also covers advanced threat hunting techniques and tools, applying machine learning and data science to security, and increasing an organization's threat hunting maturity. The presentation includes examples of using Splunk to investigate a hypothetical attack spanning multiple stages of the cyber kill chain using various security data sources.
This document provides an overview of a presentation on security monitoring and analytics using Splunk. The presentation covers using Splunk Enterprise for security operations like alert management and incident response. It also covers using Splunk User Behavior Analytics to detect anomalies and threats using machine learning. The presentation highlights new features in Splunk Enterprise Security 4.1 like prioritizing investigations and expanded threat intelligence, and new features in Splunk UBA 2.2 like enhanced security analytics and custom threat modeling. It demonstrates integrating UBA results into the Splunk Enterprise Security workflow for faster investigation of advanced threats.
This document provides an overview of offensive open-source intelligence (OSINT) techniques. It defines OSINT and discusses the differences between offensive and defensive OSINT approaches. Offensive OSINT focuses on gathering as much public information as possible to facilitate an attack against a target. The document outlines the OSINT process and details specific techniques for harvesting data from public sources, including scraping websites, using APIs, searching social media, analyzing images and metadata, and researching infrastructure components like IP addresses, domains, and software versions. The goal of offensive OSINT is to discover valuable information like employee emails, usernames, relationships, locations and technical vulnerabilities to enable attacks like phishing, social engineering, and infiltration.
Your adversaries continue to attack and break into companies. You can no longer rely on alerts from point solutions alone to secure your network. To identify and mitigate these advanced threats, analysts must become proactive in identifying not just indicators, but attack patterns and behavior. In this workshop we walk through a hands-on exercise with a real-world attack scenario, illustrating how advanced correlations across multiple data sources and machine learning can enhance security analysts' ability to detect and quickly mitigate advanced attacks.
On your marks, get set, GO!
Take a more in-depth look at the automation and orchestration journey and the future of SOAR.
Watch the SOCtails video here: https://www.youtube.com/watch?v=YzsGQzqaDYw&t=2s
Find out how to threat hunt commonly found web shells in your infrastructure using the powerful Splunk querying language. Discover queries to hunt for various aspects of web shells and other malicious artifacts.
This document outlines an agenda for a presentation on threat hunting with Splunk. The presentation will cover threat hunting basics, data sources for threat hunting including Sysmon endpoint data, applying the cyber kill chain framework, and a hands-on demo of investigating an attack scenario across various Splunk data sources like endpoint, network, email, and threat intelligence. Credentials are provided for accessing the demo environment. An overview of Sysmon endpoint event data and using it to map processes and network connections is also given.
The document discusses different nmap scanning techniques including SYN scans, FIN scans, ACK scans, and window scans. It provides pros and cons of each technique. It then details a mission to penetrate SCO's firewall and discern open ports on a target system using different scan types. Another mission works to locate webservers on the Playboy network offering free images, optimizing the scan by getting timing information and scanning faster without DNS lookups. Several IP addresses with port 80 open are identified.
This document provides an overview of threat hunting using Splunk. It begins with an introduction to threat hunting and why it is important. The presentation then discusses key building blocks for driving threat hunting maturity, including search and visualization, data enrichment, ingesting data sources, and applying machine learning. It provides examples of internal data sources that can be used for hunting like IP addresses, network artifacts, DNS, and endpoint data. The presentation demonstrates hunting using the Microsoft Sysmon endpoint agent, walking through an example attack scenario matching the Cyber Kill Chain framework. It shows how to investigate a potential compromise by searching across web, DNS, proxy, firewall, and endpoint data in Splunk to trace suspicious activity back to a specific user.
The document outlines a presentation on threat hunting with Splunk. It provides an agenda that includes an overview of threat hunting basics and data sources, a demonstration of using Sysmon endpoint data to investigate an attack scenario according to the cyber kill chain framework, and a discussion of applying machine learning and data science to security. It also includes credentials for logging into the demo environment and notes that hands-on participation is part of the session.
The document provides an overview of the Metasploit framework. It describes Metasploit as an open-source penetration testing software that contains exploits, payloads, and other tools to help identify vulnerabilities. Key points covered include Metasploit's architecture and modules for scanning, exploitation, and post-exploitation. Examples of tasks that can be performed include port scanning, vulnerability assessment, exploiting known issues, and gaining access to systems using payloads and meterpreter sessions. The document warns that Metasploit should only be used for legitimate security testing and cautions about the potential risks if misused.
This document provides an overview of data models in Splunk:
- A data model maps raw machine data onto a hierarchical structure to encapsulate domain knowledge and enable non-technical users to interact with data via pivot reports.
- There are three root object types: events, searches, and transactions. Objects have constraints, attributes, and inherit properties from parent objects.
- Data models are built using the UI or REST API. Pivot reports leverage data models by generating optimized search strings from the model.
- Data model acceleration improves performance of pivot reports by pre-computing searches on disk. Only the first event object and descendants are accelerated by default.
Effective Threat Hunting with Tactical Threat Intelligence (Dhruv Majumdar)
How to set up a Threat Hunting Team for Active Defense utilizing Cyber Threat Intelligence and how CTI can help a company grow and improve its security posture.
The document discusses bug bounty hunting. It introduces Shubham Gupta and Yash Pandya who are security consultants and top bug hunters. It outlines the agenda which includes an introduction to bug bounty programs, reasons for bug hunting, how to find bugs, quick tips, proofs of concept, pros and cons, and a Q&A. It provides a brief history of bug bounty programs and notes that now anyone can participate from home. It discusses types of bugs and tools used for hunting. Quick tips include using Google dorks, testing for information disclosure vulnerabilities, and completing challenges to improve skills. Examples are provided of unique bugs found like SVG XSS and an IDOR issue found in Google.
XXE Exposed: SQLi, XSS, XXE and XEE against Web Services (Abraham Aranguren)
XXE Exposed Webinar Slides:
Brief coverage of SQLi and XSS against web services, leading into XXE and XEE attacks and their mitigation. Heavily inspired by the "Practical Web Defense" (PWD) style of pwnage + fixing (https://www.elearnsecurity.com/PWD)
Full recording here:
NOTE: (~20 minute) XXE + XEE Demo Recording starts at minute 25
https://www.elearnsecurity.com/collateral/webinar/xxe-exposed/
Exploring Frameworks of Splunk Enterprise Security (Splunk)
This document discusses Splunk Enterprise Security and its frameworks for addressing security operations challenges. It provides an overview of Splunk's security portfolio and how it can help with issues like slow investigations, limited data ingestion, and inflexible deployments faced by legacy SIEMs. Key frameworks covered include the Notable Events framework for streamlining incident management across the entire lifecycle from detection to remediation. It also discusses the Asset and Identity framework for automatically enriching incidents with relevant context to help with rapid qualification and situational awareness.
Cyber Threat Hunting: Identify and Hunt Down Intruders (Infosec)
View webinar: "Cyber Threat Hunting: Identify and Hunt Down Intruders": https://www2.infosecinstitute.com/l/12882/2018-11-29/b9gwfd
View companion webinar:
"Red Team Operations: Attack and Think Like a Criminal": https://www2.infosecinstitute.com/l/12882/2018-11-29/b9gw5q
Are you red team, blue team — or both? Get an inside look at the offensive and defensive sides of information security in our webinar series.
Senior Security Researcher and InfoSec Instructor Jeremy Martin discusses what it takes to be a modern-day threat hunter during our webinar, Cyber Threat Hunting: Identify and Hunt Down Intruders.
The webinar covers:
- The job duties of a Cyber Threat Hunting professional
- Frameworks and strategies for Cyber Threat Hunting
- How to get started and progress your defensive security career
- And questions from live viewers!
Learn about InfoSec Institute's Cyber Threat Hunting course here: https://www.infosecinstitute.com/courses/cyber-threat-hunting/
A military concept now applied to cybersecurity, the "cyber kill chain" was developed by Lockheed Martin in 2011. It describes the phases an adversary follows to target an organization. There are seven well-defined phases, and an attack is considered successful if/when all of them have been completed.
(DOCUMENT IN ENGLISH)
This document discusses Splunk Enterprise Security and its frameworks for analyzing security data. It provides an overview of Splunk's security portfolio and how it addresses challenges with legacy SIEM solutions. Key frameworks covered include Notable Events for streamlining incident management, Asset and Identity for enriching incidents with contextual data, Risk Analysis for prioritizing incidents based on quantitative risk scores, and Threat Intelligence for detecting indicators of compromise in machine data. Interactive dashboards and incident review interfaces are highlighted as ways to investigate threats and monitor the security posture.
The document is a presentation on threat hunting with Splunk. It discusses threat hunting basics, data sources for threat hunting, knowing your endpoint, and using the cyber kill chain framework. It outlines an agenda that includes a hands-on walkthrough of an attack scenario using Splunk's core capabilities. It also discusses advanced threat hunting techniques and tools, enterprise security walkthroughs, and applying machine learning and data science to security.
Threat hunting - Every day is hunting season (Ben Boyd)
Breakout Presentation by Ben Boyd during the 2018 Nebraska Cybersecurity Conference.
Introduction to Threat Hunting and helpful steps for building a Threat Hunting Program of any size, from small to massive.
The document provides an overview of network security threats and countermeasures. It discusses various types of threats like viruses, denial of service attacks, and spoofing. It recommends a defense-in-depth approach using multiple layers of security like firewalls, intrusion detection systems, antivirus software, and encryption. Specific security measures are examined, including network monitoring, access control, and securing servers and applications.
The top 10 windows logs event id's used v1.0 (Michael Gough)
How to catch malicious activity on Windows systems using properly configured audit logging, including the top 10 events (and more) that you must have enabled, configured, and alerting.
LOG-MD
MalwareArchaeology.com
This document discusses exploiting vulnerabilities related to HTTP host header tampering. It notes that tampering with the host header can lead to issues like password reset poisoning, cache poisoning, and cross-site scripting. It provides examples of how normal host header usage can be tampered with, including by spoofing the header to direct traffic to malicious sites. The document also lists some potential victims of host header attacks, like Drupal, Django and Joomla, and recommends developers check settings to restrict allowed hosts. It proposes methods for bruteforcing subdomains and host headers to find vulnerabilities.
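The recommended mitigation above, restricting allowed hosts, can be sketched as a small server-side check. This is an illustrative sketch (the allowlist and function name are assumptions, not from the document), and it mirrors the idea behind settings like Django's ALLOWED_HOSTS:

```python
# Minimal sketch of server-side Host header validation.
# Hostnames here are illustrative; does not handle IPv6 literals.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_host_allowed(host_header: str) -> bool:
    # Strip an optional port ("example.com:8080" -> "example.com")
    # and normalize case before comparing against the allowlist.
    host = host_header.split(":", 1)[0].strip().lower()
    return host in ALLOWED_HOSTS

print(is_host_allowed("www.example.com"))    # True
print(is_host_allowed("evil.attacker.net"))  # False: spoofed header rejected
print(is_host_allowed("EXAMPLE.COM:8080"))   # True: case and port normalized
```

Rejecting unrecognized Host values up front closes off the password-reset-poisoning and cache-poisoning vectors the document describes, since those rely on the application echoing an attacker-controlled host back into links or caches.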
The document discusses various C++ programming concepts including:
- The cin statement is used to read input from the keyboard and store values in variables. It is often used with cout to display prompts.
- Variables must be declared with valid names using letters, digits, and underscores. Keywords like int and float cannot be used as names.
- Different data types like int, float, and char are used to store different kinds of data. Variables of the specified types need to be declared before use.
- Arithmetic operators like +, -, *, /, and % are used to perform calculations in expressions and assignments. Parentheses can be used to alter operator precedence.
Object Oriented Programming Short Notes for Preparation of Exams (MuhammadTalha436)
The document appears to be lecture notes on object-oriented programming using C++. It covers key concepts like classes, objects, encapsulation, inheritance, and polymorphism. It also provides examples of input/output statements, arithmetic operators, assignment operators, and relational operators in C++ code. The document is divided into multiple chapters with topics like classes, inheritance, templates, and exceptions.
This document describes a student result system project created in the C programming language. It allows users to perform operations such as adding student records, viewing all records, searching records by roll number, calculating average marks, and sorting records by marks or roll number. The key algorithms used are merge sort for sorting and linear search for searching and insertion. The source code implements functions for the main menu, record insertion, display, sorting, searching, and average calculation. UML diagrams show the design of the student record class and the interaction between functions.
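The merge sort over student records described above can be sketched briefly. The original project is in C; this is a Python sketch of the same algorithm, with record fields ("roll", "marks") assumed for illustration:

```python
def merge_sort(records, key):
    """Stable merge sort over a list of record dicts, ordered by `key`."""
    if len(records) <= 1:
        return records
    mid = len(records) // 2
    left = merge_sort(records[:mid], key)
    right = merge_sort(records[mid:], key)
    # Merge the two sorted halves, taking the smaller head each time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] <= right[j][key]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

students = [{"roll": 3, "marks": 72}, {"roll": 1, "marks": 88}, {"roll": 2, "marks": 55}]
by_marks = merge_sort(students, "marks")
print([s["roll"] for s in by_marks])  # [2, 3, 1] -- ascending by marks
```

Merge sort is a reasonable choice here because it is O(n log n) in the worst case and stable, so sorting by marks preserves the existing roll-number order among equal marks.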
This document discusses various methods for software cost estimation, including expert judgement techniques like the Delphi method, model-based techniques like COCOMO and Function Points, and dynamic models like Putnam and Parr that consider staffing levels and schedule over time. Static models estimate effort as a function of size factors alone while dynamic models also incorporate time-based elements. Both approaches rely at least partly on expert judgement and may not capture all project costs.
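The "static, size-based" style of estimation mentioned above can be illustrated with basic COCOMO, which computes effort from size alone. A minimal sketch, using Boehm's published organic-mode coefficients (the 32 KLOC input is an assumed example project):

```python
def cocomo_basic(kloc, a=2.4, b=1.05, c=2.5, d=0.38):
    """Basic COCOMO, organic mode: effort from size (KLOC) alone."""
    effort = a * kloc ** b       # effort in person-months
    duration = c * effort ** d   # development time in calendar months
    return effort, duration

effort, duration = cocomo_basic(32)  # a hypothetical 32 KLOC project
print(f"effort={effort:.1f} person-months, duration={duration:.1f} months")
```

Dynamic models such as Putnam's, by contrast, treat effort and schedule as linked through a time-based staffing curve rather than deriving both from size in one shot, which is the distinction the document draws between the two approaches.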
Optimization of workload prediction based on map reduce frame work in a cloud... (eSAT Journals)
Abstract: Nowadays cloud computing is an emerging technology, used to access resources anytime and anywhere through the internet. Hadoop is an open-source cloud computing environment that implements the Google MapReduce framework, designed for distributed processing of large datasets across large clusters of computers. This paper analyzes the workload of jobs run in cluster mode using Hadoop; MapReduce is the programming model in Hadoop used for handling that workload. Based on job analysis statistics, the future workload of the cluster is predicted for potential performance optimization using a genetic algorithm. Key Words: Cloud computing, Hadoop framework, MapReduce analysis, workload
Optimization of workload prediction based on map reduce frame work in a cloud...eSAT Publishing House
This document summarizes a research paper that proposes optimizing workload prediction in Hadoop clusters using MapReduce and genetic algorithms. It describes collecting job history data from Hadoop, analyzing workload patterns, and using genetic algorithms to predict future workloads and optimize performance. The implementation analyzes a sample Hadoop trace log to calculate error rates for workload predictions. The goal is to integrate workload prediction into multi-node Hadoop clusters for real-time optimization.
This document provides an overview of primitive data types, expressions, and definite loops (for loops) in Java. It discusses Java's primitive types like int, double, char, and boolean. It covers arithmetic operators, precedence rules, and mixing numeric types. It also introduces variables, declarations, assignments, and using variables in expressions. Finally, it explains the syntax of for loops using initialization, a test condition, and an update to repeat a block of code a specified number of times.
The document summarizes a MuleSoft meetup that took place in Warsaw, Poland on January 23rd, 2019. The meetup agenda included introductions, an introduction to DataWeave 2.0 focusing on map, filter and reduce functions, examples and use cases of these functions, and a Q&A session. The speaker was introduced and details were provided on organizing future meetups and providing feedback.
Here are some key usability principles that seem important for your project based on the information provided:
- Learnability: Since this is a class project, learnability principles like predictability, familiarity, and consistency are important to help users quickly understand how to use the system. The design should leverage existing concepts and have consistent behaviors.
- Flexibility: Allowing for multiple ways of completing tasks and customization supports different user needs and preferences. Incorporating options like alternative dialog flows and customization can improve flexibility.
- Robustness: Principles like recoverability, error prevention, and responsiveness are important to ensure the system is robust. The design should minimize potential for errors, support undo/redo, handle exceptions
Overview of the basic metrics for measuring the usability dimensions of effectiveness, efficiency, and satisfaction. Discussed metrics are task time, orientation, effort, errors, learnability, and usability. Some specific methods are presented and examples are provided.
The slides are from 19 Nov 2015, my talk at ISTA 2015 https://istacon.org/Home/Session/538e4223-a158-45a1-8d99-f6dfc018367b
This document discusses various techniques for estimating software project costs, schedules, and sizes. It covers function point analysis, lines of code estimation, productivity models like COCOMO, and probabilistic techniques like PERT estimation. Key approaches mentioned include analogies, decomposition, mathematical models, mean schedule dates, and probability distributions.
This document analyzes a cloud workload dataset from Google to characterize usage patterns. The key steps are:
1) The data is preprocessed and important attributes like CPU/memory usage are analyzed.
2) Clustering algorithms are used to classify users based on resource estimation ratios and tasks based on attributes.
3) Time series analysis via DTW is performed on tasks to identify patterns, and tasks are clustered.
4) For target high estimation ratio users, resource usage is predicted based on matching task patterns and allocated dynamically with a threshold to allow for spikes. This approach aims to reallocate unused resources to other users.
The document provides information about fully solved assignments for the winter 2013 semester in the BCA program. It lists the subject code and name as BCA2030 - Object Oriented Programming - C++. It provides 6 questions related to the subject and asks students to send their semester and specialization details to the provided email ID or call the given phone number to get the solved assignments. It provides answers to the 6 questions related to topics like objects and classes, friend functions, constructors vs destructors, operator overloading, virtual functions and polymorphism, and exception handling models.
The summary highlights that the document discusses getting fully solved winter 2013 semester assignments for the BCA program's subject on Object Oriented Programming - C
The document discusses C++ memory management and smart pointers. It provides an overview of common memory issues with pointers, the new and delete operators, overloading new and delete, and memory pools. It then discusses different types of smart pointers like scoped pointers and shared pointers, which implement reference counting to prevent memory leaks and dangling pointers while allowing multiple pointers to the same data.
Work measurement involves determining the time it should take to complete tasks through various techniques. Standard times are set based on how long a trained worker would take and are used for planning workloads, scheduling tasks, costing labor, and calculating productivity. These times are set by qualified observers using appropriate methods depending on factors like the task length, required precision, and cycle time. Common methods include predetermined motion time systems, timing tasks, estimating, and activity sampling to determine time percentages without continuous observation.
how to build a Length of Stay model for a ProofOfConcept projectZenodia Charpy
walk through end to end and in detail how a machine learning process on Healthcare related model works ( here i picked LengthOfStay probelm) as a touch point to start the discussion, the scope is set to POC
The document discusses database capacity planning and analysis. It covers collecting and analyzing resource utilization data, developing mathematical models to predict performance, using queueing theory and response time analysis. Steps outlined include determining goals, gathering workload data, characterizing and modeling data, validating forecasts, and conducting case studies on deletion performance and evaluating MySQL capacity. The overall aim is to accurately measure current capacity, predict future growth, maintain balanced performance, and identify risks.
Similar to Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, 2021 (20)
SFBA Splunk Usergroup meeting December 14, 2023Becky Burwell
The summary provides an overview of the key topics and announcements from the Splunk User Group meeting:
1. The meeting will start at 11:10 am PST with a welcome and announcements before speakers present.
2. Upcoming meeting dates and locations for 2023 are provided, including a virtual meeting in March 2023.
3. The presentation will cover writing documentation for Splunk, including administrator documentation, user documentation, and documenting known issues. Tips are provided about iterating on documentation.
The document discusses a Splunk User Group meeting where the CISO of Los Angeles discussed the importance of automation and intelligence to act on threats. It then provides an overview of threat intelligence and how Recorded Future collects and organizes data from various sources to understand the threat landscape. Finally, it describes how the Recorded Future integration with Splunk can help accelerate security workflows like investigation, automation, and strategic planning.
SFBA Splunk User Group Meeting February 2023Becky Burwell
This presentation provides an overview of Splunk apps and how to build Splunk addons. It discusses the different types of Splunk apps and addons, such as modular inputs, parsing configurations, and custom search commands. It also covers ways to build addons using the UCC framework or Addon Builder, as well as how to package and vet apps using CLI commands, APIs, and the packaging toolkit. Resources for learning app development are also provided.
SFBA Splunk Usergroup meeting December 2022Becky Burwell
This presentation discusses Splunk Ideas, a program that allows users to submit enhancement requests for Splunk products. It provides metrics on the number of ideas submitted, voted on, and implemented. The presentation outlines the lifecycle of an idea from submission to implementation. It also discusses upcoming improvements to Splunk Ideas including customer champions, newsletters, and better response rates.
SF Bay Area Splunk User Group Meeting October 5, 2022Becky Burwell
Andrew D'Auria, the Director of Sales Engineering at Anvilogic, gave a presentation on modernizing threat detection engineering. He discussed problems with the current detection engineering process, including that it is slow, results in noisy alerts, and lacks coordination across tools. D'Auria proposed using Anvilogic's platform to build detections based on MITRE ATT&CK techniques and scenarios, correlate events of interest without code, and measure detection program effectiveness to improve security operations. He provided examples of how Anvilogic helped a financial client improve detections and reduce alerts.
SFBA Splunk User Group Meeting August 10, 2022Becky Burwell
The document summarizes the agenda and presentations for the August SF Bay Area Splunk User Group meeting. Ryan O'Connor gave a presentation on Dashboard Studio and the Splunk UI. He discussed why to build with Dashboard Studio, how to quickly customize dashboards, reduce searches, and tips for building with Dashboard Studio. Rinita Datta then presented on driving customer success through self-service resources like the Adoption Boards, signing up for tech talks and newsletters, and finding guidance on Splunk Lantern.
Getting Started with Splunk Observability September 8, 2021Becky Burwell
This document provides an introduction to getting started with Splunk Observability, including setting up a Splunk Observability trial, installing integrations for Windows, Linux, and GCP, and collecting events and metrics from cloud and observability systems. It also references a workshop for further guidance and discusses plans to get the Gateway installation working and collecting more data.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Advanced Outlier Detection and Noise Reduction with Splunk & MLTK August 11, 2021
1. Advanced Outlier Detection and Noise Reduction with Splunk & MLTK
Presented by: Urwah Haq
August 10th, 2021
Presented by Urwah Haq @ San Francisco Splunk User Group
2. Slide 2
DI Confidential
18th Dec 2019
Agenda
1. Common Ways of finding outliers
• Review of some math terminology
• Review of the outlier detection blog and what it covers
• Re-introduce moving average & foreach function
2. Using the ‘density function’ in MLTK
• An example of ML algorithm to detect outliers
3. Combining Multiple methods 1+2
• Ensemble Learning (combining multiple ML methods)
4. T-Tests & Clustering – What are they and how to use them?
2
3. Slide 3
DI Confidential
18th Dec 2019
ML/Splunk Terminology Refresher
Statistics Terms:
• Mean/Average – the central value in a set of data
• Standard Deviation – a measure of the spread of the data (the higher the stdev, the larger the differences between the points)
• Time Series Data/Events – data that is collected/ingested in Splunk over intervals of time
ML Terms:
• Outliers – legitimate data points that deviate far from the norm
• Anomalies – an action that may seem out of order with the rest of the data
• Outliers vs Anomalies – for our purposes in Splunk, any deviation in data such as mb_out from firewall data or cpu/mem/network utilization can be considered an 'Outlier'. Anything involving user actions, such as Urwah installing 10+ Splunkbase applications on a Sunday, is considered an 'Anomaly'.
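The mean and standard deviation above combine into the simplest outlier test of all: flag values that sit more than N standard deviations from the mean. A minimal Python sketch (the sample numbers and the 2-sigma cutoff are illustrative assumptions, not from the slides):

```python
from statistics import mean, stdev

def zscore_outliers(values, n_sigma=2.0):
    """Flag values more than n_sigma standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > n_sigma * sigma]

# Hypothetical mb_out readings from firewall data, with one extreme spike.
# Note the spike itself inflates sigma, which is why a 2-sigma cutoff is
# used here rather than the more common 3.
mb_out = [10, 12, 11, 9, 13, 10, 11, 95]
print(zscore_outliers(mb_out))  # -> [95]
```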
[Diagram: anomaly types – Outliers, Relational anomalies, and others, shown as subsets of Anomalies]
4. 1 - What is an Outlier
• A point away from the body of data points
• A data point different from the rest of the points
• In Splunk, one of the most common ways to find outliers is to set boundaries
• If a data point deviates beyond these boundaries, tag it as an outlier
5. 1- Types of Outlier detection (NO ML)
Blog: https://discoveredintelligence.ca/quick-guide-to-outlier-detection-in-splunk/
1. Static Threshold
a) If value > X (fixed threshold), then the value is an outlier
2. Moving Threshold
a) If value > X (moving average or moving value), then the value is an outlier
b) Can use functions such as 'trendline sma/ema' or 'streamstats window=N'
c) We can get creative with this
Static Threshold
index=main user=* sourcetype=WinEventLog
| timechart count by user
| eval threshold=100

Moving Threshold
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=100

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| eval limit=0
| rename OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval average=round(total/distinct_values,2)
| eval average=if(distinct_values=1 AND average >50,round(average/5),average)
| table _time average user_*
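Outside of SPL, the two threshold rules above can be sketched in a few lines of Python (the series, the window of 3, and the 1.5x factor are invented for illustration):

```python
from statistics import mean

# Each entry is (time_bucket, value), standing in for a timechart row
series = [("9:00", 100), ("9:15", 110), ("9:30", 105), ("9:45", 400), ("10:00", 108)]

def static_outliers(series, threshold):
    """Rule 1: flag any value above a fixed threshold."""
    return [(t, v) for t, v in series if v > threshold]

def moving_outliers(series, window=3, factor=1.5):
    """Rule 2: flag values above factor * moving average of the previous `window` points."""
    flagged = []
    for i in range(window, len(series)):
        t, v = series[i]
        baseline = mean(v_prev for _, v_prev in series[i - window:i])
        if v > factor * baseline:
            flagged.append((t, v))
    return flagged

print(static_outliers(series, 300))  # -> [('9:45', 400)]
print(moving_outliers(series))       # -> [('9:45', 400)]
```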
6. 1 – How basic moving average works
Moving Thresholding
a) A moving threshold is not just the average of the past X data points; it can be much more
b) Basic search for a moving average of the past 5 data points:
| inputlookup user_usage.csv
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| table _time total
| trendline sma5(total) as 5_moving_average
Here is what a simple average looks like with window=2:
_time   User_a  User_b  User_C  User_D  User_E  Average              Moving Average
9:00    0       0       10      15      5       (0+0+10+15+5)/5 = 6  -
9:15    0       0       0       5       5       (0+0+0+5+5)/5 = 2    4
9:30    1       2       3       4       5       3                    2.5
9:45    0       1       5       4       5       3                    3
10:00   1       3       0       5       2       2.2                  2.1
10:15   1       0       4       6       3       2.8                  -
10:30   1       0       0       7       0       1.6                  -
(5 active users)
Using trendline
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| table _time total
| trendline sma5(total) as 5_moving_average

Using streamstats
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| table _time total
| trendline sma5(total) as 5_moving_average
| streamstats window=5 avg(total) as streamstats_moving_average

Using streamstats & autoregress
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| table _time total
| streamstats window=5 avg(total) as streamstats_moving_average
| autoregress streamstats_moving_average as previous_moving_average
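For readers outside Splunk, the same calculation can be sketched in Python. This analogue of `streamstats window=5 avg(total)` (sample totals invented) averages the values seen so far, capped at a window of 5:

```python
from collections import deque
from statistics import mean

def moving_average(values, window=5):
    """Average of the values seen so far, capped at `window` points --
    the same quantity as `streamstats window=5 avg(total)` (which, unlike
    `trendline sma5`, also emits the shorter partial windows at the start)."""
    buf = deque(maxlen=window)  # automatically drops the oldest value
    out = []
    for v in values:
        buf.append(v)
        out.append(round(mean(buf), 2))
    return out

totals = [30, 10, 15, 15, 11, 14, 8]  # invented per-interval totals
print(moving_average(totals))  # -> [30, 20, 18.33, 17.5, 16.2, 13, 12.6]
```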
8. 1 – Using Foreach Function to adjust moving average
• Use the 'foreach' function with conditions, e.g. use ONLY 'active' users with hits > 0 to calculate the average
_time   User_a  User_b  User_C  User_D  User_E  New Average        New Moving Average
9:00    0       0       10      15      5       (10+15+5)/3 = 10   -
9:15    0       0       0       5       5       (5+5)/2 = 5        10
9:30    1       2       3       4       5       3                  6.5
9:45    0       1       5       4       5       3                  3
10:00   1       3       0       5       2       2.2                2.65
10:15   1       0       4       6       3       2.8                -
10:30   1       0       0       7       0       1.6                -
(3 active users at 9:00)
Using Foreach function
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| rename OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >100,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| eval old_average=round(total/11,2)
| table _time new_average old_average
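The foreach trick above boils down to dividing by the count of active users rather than by all user columns. A Python sketch using the 9:00 row from the slide:

```python
def active_user_average(row, min_hits=0):
    """Average activity over only the 'active' users (count > min_hits),
    mirroring the foreach/distinct_values trick in the search above."""
    active = [v for v in row.values() if v > min_hits]
    return round(sum(active) / len(active), 2) if active else 0.0

# The 9:00 row from the slide: only 3 of the 5 users are active
row = {"User_a": 0, "User_b": 0, "User_C": 10, "User_D": 15, "User_E": 5}
old_average = round(sum(row.values()) / len(row), 2)  # averages over all 5 users
print(old_average, active_user_average(row))  # -> 6.0 10.0
```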
9. 1 – Using Foreach vs Aggregate Moving Average
_time   User_a  User_b  User_C  User_D  User_E
9:00    0       0       10      15      5
9:15    0       0       0       5       5
9:30    1       2       3       4       5
9:45    0       1       5       4       5
10:00   1       3       0       5       2
10:15   1       0       4       6       3
10:30   1       0       0       7       0
Basic method
• Designed such that a user with 0 activity will still count as an 'active user'
• Simple to implement
• Better to use for total aggregates
• Results in more 'outliers' due to a static or moving bound

Using Foreach method
• Only users with activity will be counted as 'active users'
• More complicated to set up
• Better to use when you have a limited number of users/IPs or entities
• Gives a more accurate picture of a user/IP that is more active than normal
10. 2 - Introducing the 'Density Function'
• What is the ‘Density Function’ within MLTK?
• It is another tool, on top of the previous methods, for detecting anomalies
• It works best at an aggregate level (e.g. span=15/30/60min)
• It works by fitting your values against mathematical distributions to calculate the probability of each value occurring
• Similar to "| anomalydetection method=histogram [field_name]"
[Histogram of all user activity counts by activity bin (0-100 … 500-600, 600-700, … 1100-1200): activity between 500-700 is usually the most common in a day and has the highest probability of occurring; activity in the extreme bins (e.g. 1100-1200) has the lowest probability of occurring and is more likely to be an outlier.]
DensityFunction - https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Algorithms#DensityFunction
AnomalyDetection - https://docs.splunk.com/Documentation/SplunkCloud/8.2.2104/SearchReference/Anomalydetection
DensityFunction
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| fields _time total
| bin total start=1 end=5
| stats count by total

DensityFunction Example
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| fields _time total
| fit DensityFunction total
11. Overlay – Overlay line using visual formatting options
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| fields _time total
| bin total start=1 end=5
| stats count by total
| eval overlay=count
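Conceptually, the density approach reduces to estimating how probable each value is and flagging the improbable ones. A toy Python sketch using a histogram (the bin width, 10% cutoff, and sample data are illustrative assumptions; MLTK's DensityFunction fits actual probability distributions):

```python
from collections import Counter

def histogram_outlier_values(values, bin_width=100, min_prob=0.1):
    """Bin the values, estimate each bin's probability as its share of all
    observations, and flag values falling into bins rarer than min_prob.
    (DensityFunction fits real distributions; this histogram version is
    closer in spirit to `anomalydetection method=histogram`.)"""
    bins = Counter(v // bin_width for v in values)
    n = len(values)
    rare = {b for b, count in bins.items() if count / n < min_prob}
    return [v for v in values if v // bin_width in rare]

# Invented activity counts: most fall in the 500-700 range, one in 1100-1200
activity = [550, 580, 610, 640, 590, 560, 620, 630, 600, 570,
            555, 615, 575, 605, 585, 595, 625, 565, 635, 1150]
print(histogram_outlier_values(activity))  # -> [1150]
```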
12. 2 – Using the Density Function
• Where it works well
• Data that is continuous, with little to no gaps
• For aggregate-level data, e.g. total activity
• For entity-level data (users/IPs) with few or no gaps (fit DensityFunction <Field> by "User" into Model_Name)
13. 3 – Combining Density Function with Moving Averages
• Using Density Function at aggregate level
• Use foreach moving average method
14. 3 – Combining Searches
• Using Density Function at aggregate level:
…..| fields _time Total| fit DensityFunction Total show_density=true into my_usergroup_model
• Use foreach moving average method:
…. | foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| table _time * new_average
| foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
Output fields (aggregate): _time, isOutlier
Output fields (user-level): _time, isOutlier_user1, isOutlier_user2, isOutlier_user3, …

Reference the aggregate outlier in the user-level outlier search from 1 of 3 options:
1 – Lookup
2 – Summary Index
3 – Inline Search
| inputlookup user_usage.csv
| addtotals
| fields _time Total
| fit DensityFunction Total show_density=true into my_usergroup_model

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time *
| eval threshold=1200
| addtotals fieldname=total
| rename OTHER as u_OTHER
| eval distinct_values=0
| foreach user_* [ eval distinct_values=if(<<FIELD>> >0,distinct_values+1,distinct_values)]
| eval new_average=round(total/distinct_values,2)
| table _time * new_average
| foreach user_* [ eval isOutlier_<<FIELD>>=if(<<FIELD>> > 2*new_average,1,0)]
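The noise-reduction payoff of combining the two searches can be sketched as a simple join on the time bucket: alert only when a user-level outlier coincides with an aggregate-level outlier (the time buckets and user names below are illustrative):

```python
def combined_alerts(aggregate_outlier_times, user_outliers):
    """Alert only when a user-level outlier and an aggregate-level outlier
    fall in the same time bucket -- joining the isOutlier and
    isOutlier_user* results on _time, as described above."""
    agg = set(aggregate_outlier_times)
    return [(t, u) for t, u in user_outliers if t in agg]

# Illustrative results from the two searches above
aggregate_outlier_times = ["9:15", "10:00"]
user_outliers = [("9:15", "user_HR1"), ("9:30", "user_ERP"), ("10:00", "user_ITOps")]
print(combined_alerts(aggregate_outlier_times, user_outliers))
# -> [('9:15', 'user_HR1'), ('10:00', 'user_ITOps')]
```

The "9:30" user-level hit is suppressed because nothing unusual happened at the aggregate level then, which is exactly the noise reduction described on the next slide.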
15. How do I make the most use of all the outlier methods?
Aggregate level
• Apply Density Function or any other technique to find a time frame that was an outlier
• Save the results in a lookup or summary index for reference

Entity (user/IP) level
• Use a user-level outlier technique to find a user who was an outlier at a certain time
• Reference that time against the aggregate level

(Optional) Regional level
• Reference regional outliers using _time or time buckets as the common field with the aggregate level & user level

Advantages of combining multiple styles of outlier detection at different data levels:
• Verification of true outliers vs a simple static value
• Less noisy alerting
• Alert only when all 2 or 3 levels of outliers are met
• Validate whether a rise/fall at the aggregate level was contributed by one or more users. If it was one user, that is a confirmed outlier.
16. More Advanced Ensemble Techniques
Aggregate Level
Available ML techniques:
• Density Function to find the rarest time buckets with the highest values as outliers
• Regression to find the loudest time buckets
• Classification to find times with the highest probability of being outliers
• Statespace algorithm & anomaly detection algorithm
Available non-ML techniques:
• Static thresholds
• Moving average thresholds

Entity Level
Available ML techniques:
• Density Function to find the rarest time buckets with the highest values as outliers
• Classification to find entities with the highest probability of going above thresholds
• Statespace algorithm & anomaly detection algorithm
Available non-ML techniques:
• Static thresholds
• Moving average thresholds
• Foreach and activity-based averages

Result: better outliers and less mundane alerting.
17. 4 – Increasing Outlier Function Accuracy
1. Find entities/users/IPs that form a large percentage of your overall activity and remove them
• This can be measured using the correlation or t-test functions from MLTK
2. Group similar sets of entities/users/IPs using the clustering commands in MLTK
• Analyze each cluster individually
18. Thank you
| inputlookup query.csv
| fit TFIDF query stop_words=english analyzer=word token_pattern="\w{3,20}" max_features=200
| fit KMeans query* k=3
| fields user query cluster cluster_distance
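Under the hood, `fit KMeans ... k=3` partitions points by repeatedly assigning each one to its nearest centroid and re-centering. A minimal 1-D sketch (the activity counts are invented, and the real search clusters the multi-dimensional TFIDF feature vectors, not raw counts):

```python
def kmeans_1d(points, k=3, iters=20):
    """Minimal 1-D k-means: assign each point to its nearest centroid, move
    each centroid to the mean of its cluster, repeat. Initial centroids are
    spread deterministically across the sorted value range."""
    pts = sorted(points)
    centroids = [float(pts[i * (len(pts) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # re-center; keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Invented per-user activity counts forming three obvious bands
activity = [5, 7, 6, 8, 100, 110, 95, 105, 500, 520, 480]
print(kmeans_1d(activity, k=3))  # -> [6.5, 102.5, 500.0]
```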
19. Scoring Function to determine similarity
Scoring function
| score <test_name> <fields>…
https://docs.splunk.com/Documentation/MLApp/5.2.1/User/Scorecommand#T-test_.281_sample.29
Available tests:
• T-test(s):
1. Test if two IPs/users from different groups/domains have identical patterns (t-test, 2 independent samples)
2. Test if a single user/IP is equal to the average of a group (t-test, 1 sample)
3. Test if two IPs/users from the same group/domain have identical patterns (t-test, 2 related samples)
• Energy Distance: the closer this value is to 0, the more similar two fields are in terms of gain/loss over time (mathematically, they have similar cumulative distribution functions)
• Kolmogorov-Smirnov (KS): test if one field is statistically identical to another field
• Kwiatkowski-Phillips-Schmidt-Shin (KPSS): test if a field's trend is stationary – no or little gain/loss
T-test examples:

1. One sample:
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_ITOps
| score ttest_1samp user_ITOps popmean=100 alpha=0.1

2. Two independent samples:
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_ind user_HR1 against user_HR2 user_ITOps

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_ind user_HR1 against user_HR1 user_ITOps

3. Two related samples:
| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps
| score ttest_rel user_HR1 against user_HR1 user_ITOps
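The one-sample t-test behind `score ttest_1samp` compares a field's mean to a hypothesized population mean. A Python sketch of the t statistic itself (the hourly counts are invented; the p-value that the score command also reports additionally requires the t-distribution CDF, e.g. from scipy):

```python
from math import sqrt
from statistics import mean, stdev

def ttest_1samp_statistic(sample, popmean):
    """t statistic for a one-sample t-test:
        t = (sample_mean - popmean) / (s / sqrt(n))
    This is the quantity behind `| score ttest_1samp <field> popmean=...`;
    a small |t| means the sample mean is consistent with popmean."""
    n = len(sample)
    return (mean(sample) - popmean) / (stdev(sample) / sqrt(n))

# Invented hourly counts for user_ITOps, tested against popmean=100
user_itops = [98, 103, 101, 97, 105, 99, 102, 100]
t = ttest_1samp_statistic(user_itops, popmean=100)
print(round(t, 3))  # -> 0.662
```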
Energy Distance examples:

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail
| score energy_distance user_Webmail against user_RemoteAccess

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_HR1 user_HR2 user_ERP user_CRM user_ITOps user_RemoteAccess user_Webmail
| score energy_distance user_HR1 against user_HR1

Correlation example:

| inputlookup app_usage.csv
| rename * as user_*
| rename user__time as _time
| table _time user_RemoteAccess user_Webmail
| fit CorrelationMatrix method=kendall user_Webmail user_RemoteAccess