Your adversaries continue to attack and get into companies. You can no longer rely on alerts from point solutions alone to secure your network. To identify and mitigate these advanced threats, analysts must become proactive in identifying not just indicators, but attack patterns and behavior. In this workshop we will walk through a hands-on exercise with a real world attack scenario. The workshop will illustrate how advanced correlations from multiple data sources and machine learning can enhance security analysts capability to detect and quickly mitigate advanced attacks.
My INSURER PTE LTD - Insurtech Innovation Award 2024
Threat Hunting Workshop: Investigating a Zeus APT Attack
1. 1
> René Agüero raguero@splunk.com
• > 1 year at Splunk – Security Specialist
• Based in Manhattan
• 18 years in security – MCSE NT4.0
• CISSP, MSBA – Information Assurance (forensics, auditing and
security)
• Offensive Security
• Exploitation – Metasploit, Web attacks
• Rapid7 SE Director
$whoami
2. Agenda
• Threat Huting Basics
• Threat Hunting Data Sources
• Sysmon Endpoint Data
• Cyber Kill Chain
• Walkthrough of Attack Scenario Using Core Splunk (hands on)
• Enterprise Security Walkthrough
• Applying Machine Learning and Data Science to Security
3. Log In Credentials
January, February & March https://od-threathunting-01.splunkoxygen.com
April, May & June https://od-threathunting-02.splunkoxygen.com
July and August https://od-threathunting-03.splunkoxygen.com
September and October https://od-threathunting-04.splunkoxygen.com
November and December https://od-threathunting-05.splunkoxygen.com
User: hunter
Pass: pr3dator
Birth Month
5. Am I in the right place?
Some familiarity with…
● CSIRT/SOC Operations
● General understanding of Threat Intelligence
● General understanding of DNS, Proxy, and Endpoint types of data
5
6. This is a hands-on session.
The overview slides are important for building your
“hunt” methodology
10 minutes - Seriously.
8. SANS Threat Hunting Maturity
8
Ad Hoc
Search
Statistical
Analysis
Visualization
Techniques
Aggregation Machine Learning/
Data Science
85% 55% 50% 48% 32%
Source: SANS IR & Threat Hunting Summit 2016
9. Hunting Tools: Internal Data
9
• IP Addresses: threat intelligence, blacklist, whitelist, reputation monitoring
Tools: Firewalls, proxies, Splunk Stream, Bro, IDS
• Network Artifacts and Patterns: network flow, packet capture, active network connections, historic network connections, ports
and services
Tools: Splunk Stream, Bro IDS, FPC, Netflow
• DNS: activity, queries and responses, zone transfer activity
Tools: Splunk Stream, Bro IDS, OpenDNS
• Endpoint – Host Artifacts and Patterns: users, processes, services, drivers, files, registry, hardware, memory, disk activity, file
monitoring: hash values, integrity checking and alerts, creation or deletion
Tools: Windows/Linux, Carbon Black, Tanium, Tripwire, Active Directory
• Vulnerability Management Data
Tools: Tripwire IP360, Qualys, Nessus
• User Behavior Analytics: TTPs, user monitoring, time of day location, HR watchlist
Splunk UBA, (All of the above)
10. Endpoint: Microsoft Sysmon Primer
10
● TA Available on the App Store
● Great Blog Post to get you started
● Increases the fidelity of Microsoft
Logging
Blog Post:
http://blogs.splunk.com/2014/11/24/monitoring-network-traffic-with-sysmon-and-splunk/
11. User: hunter
Pass: pr3dator
January, February & March https://od-threathunting-01.splunkoxygen.com
April, May & June https://od-threathunting-02.splunkoxygen.com
July and August https://od-threathunting-03.splunkoxygen.com
September and October https://od-threathunting-04.splunkoxygen.com
November and December https://od-threathunting-05.splunkoxygen.com
12. Sysmon Event Tags
12
Maps Network Comm to process_id
Process_id creation and mapping to parentprocess_id
16. Demo Story - Kill Chain Framework
Successful brute force
– download sensitive
pdf document
Weaponize the pdf file
with Zeus Malware
Convincing email
sent with
weaponized pdf
Vulnerable pdf reader
exploited by malware.
Dropper created on machine
Dropper retrieves
and installs the
malware
Persistence via regular
outbound comm
Data Exfiltration
Source: Lockheed Martin
19. APT Transaction Flow Across Data Sources
19
http (proxy) session
to
command & control
server
Remote control
Steal data
Persist in company
Rent as botnet
Proxy
Conduct
Business
Create additional
environment
Gain Access
to systemTransaction
Threat
Intelligence
Endpoint
Network
Email, Proxy,
DNS, and Web
Data Sources
.pdf
.pdf executes & unpacks malware
overwriting and running “allowed” programs
Svchost.exe
(malware)
Calc.exe
(dropper)
Attacker hacks website
Steals .pdf files
Web
Portal.pdf
Attacker creates
malware, embed in .pdf,
emails
to the target
MAIL
Read email, open attachment
Our Investigation begins by
detecting high risk
communications through the
proxy, at the endpoint, and
even a DNS call.
20. To begin our
investigation, we will
start with a quick search
to familiarize ourselves
with the data sources.
In this demo
environment, we have a
variety of security
relevant data including…
Web
DNS
Proxy
Firewall
Endpoint
Email
21. Take a look at the
endpoint data source.
We are using the
Microsoft Sysmon TA.
We have endpoint
visibility into all network
communication and can
map each connection
back to a process.
}
We also have detailed
info on each process and
can map it back to the
user and parent process.}
Lets get our day started by looking
using threat intel to prioritize our
efforts and focus on communication
with known high risk entities.
22. We have multiple source
IPs communicating to
high risk entities
identified by these 2
threat sources.
We are seeing high risk
communication from
multiple data sources.
We see multiple threat intel related
events across multiple source types
associated with the IP Address of
Chris Gilbert. Let’s take closer look
at the IP Address.
We can now see the owner of the system
(Chris Gilbert) and that it isn’t a PII or PCI
related asset, so there are no immediate
business implications that would require
informing agencies or external customers
within a certain timeframe.
This dashboard is based on event
data that contains a threat intel
based indicator match( IP Address,
domain, etc.). The data is further
enriched with CMDB based
Asset/identity information.
23. We are now looking at only threat
intel related activity for the IP
Address associated with Chris
Gilbert and see activity spanning
endpoint, proxy, and DNS data
sources.
These trend lines tell a very
interesting visual story. It appears
that the asset makes a DNS query
involving a threat intel related
domain or IP Address.
ScrollDown
Scroll down the dashboard to
examine these threat intel events
associated with the IP Address.
We then see threat intel related
endpoint and proxy events
occurring periodically and likely
communicating with a known Zeus
botnet based on the threat intel
source (zeus_c2s).
24. It’s worth mentioning that at this point
you could create a ticket to have
someone re-image the machine to
prevent further damage as we continue
our investigation within Splunk.
Within the same dashboard, we have
access to very high fidelity endpoint
data that allows an analyst to continue
the investigation in a very efficient
manner. It is important to note that
near real-time access to this type of
endpoint data is not not common within
the traditional SOC.
The initial goal of the investigation is
to determine whether this
communication is malicious or a
potential false positive. Expand the
endpoint event to continue the
investigation.
Proxy related threat intel matches are
important for helping us to prioritize our
efforts toward initiating an
investigation. Further investigation into
the endpoint is often very time
consuming and often involves multiple
internal hand-offs to other teams or
needing to access additional systems.
This encrypted proxy traffic is concerning
because of the large amount of data
(~1.5MB) being transferred which is
common when data is being exfiltrated.
25. Exfiltration of data is a serious
concern and outbound
communication to external entity
that has a known threat intel
indicator, especially when it is
encrypted as in this case.
Lets continue the investigation.
Another clue. We also see that
svchost.exe should be located in a
Windows system directory but this is
being run in the user space. Not
good.
We immediately see the outbound
communication with 115.29.46.99 via
https is associated with the svchost.exe
process on the windows endpoint. The
process id is 4768. There is a great deal
more information from the endpoint as
you scroll down such as the user ID that
started the process and the associated
CMDB enrichment information.
26. We have a workflow action that will
link us to a Process Explorer
dashboard and populate it with the
process id extracted from the event
(4768).
27. This is a standard Windows app, but
not in its usual directory, telling us
that the malware has again spoofed
a common file name.
We also can see that the parent
process that created this
suspicuous svchost.exe process is
called calc.exe.
This has brought us to the Process
Explorer dashboard which lets us
view Windows Sysmon endpoint
data.
Suspected Malware
Lets continue the investigation by
examining the parent process as this
is almost certainly a genuine threat
and we are now working toward a
root cause.
This is very consistent with Zeus
behavior. The initial exploitation
generally creates a downloader or
dropper that will then download the
Zeus malware. It seems like calc.exe
may be that downloader/dropper.
Suspected Downloader/Dropper
This process calls itself “svchost.exe,”
a common Windows process, but the
path is not the normal path for
svchost.exe.
…which is a common trait of
malware attempting to evade
detection. We also see it making a
DNS query (port 53) then
communicating via port 443.
28. The Parent Process of our suspected
downloader/dropper is the legitimate PDF
Reader program. This will likely turn out to
be the vulnerable app that was exploited
in this attack.
Suspected Downloader/Dropper
Suspected Vulnerable AppWe have very quickly moved from
threat intel related network and
endpoint activity to the likely
exploitation of a vulnerable app.
Click on the parent process to keep
investigating.
29. We can see that the PDF
Reader process has no
identified parent and is the
root of the infection.
ScrollDown
Scroll down the dashboard to
examine activity related to the PDF
reader process.
30. Chris opened 2nd_qtr_2014_report.pdf
which was an attachment to an email!
We have our root cause! Chris opened a
weaponized .pdf file which contained the Zeus
malware. It appears to have been delivered via
email and we have access to our email logs as one
of our important data sources. Lets copy the
filename 2nd_qtr_2014_report.pdf and search a
bit further to determine the scope of this
compromise.
31. Lets dig a little further into
2nd_qtr_2014_report.pdf to determine the scope
of this compromise.
32. Lets search though multiple data sources to
quickly get a sense for who else may have
have been exposed to this file.
We will come back to the web
activity that contains reference to
the pdf file but lets first look at the
email event to determine the scope
of this apparent phishing attack.
33. We have access to the email
body and can see why this was
such a convincing attack. The
sender apparently had access to
sensitive insider knowledge and
hinted at quarterly results.
There is our attachment.
Hold On! That’s not our
Domain Name! The spelling is
close but it’s missing a “t”. The
attacker likely registered a
domain name that is very close
to the company domain hoping
Chris would not notice.
This looks to be a very
targeted spear phishing
attack as it was sent to
only one employee (Chris).
34. Root Cause Recap
34
Data Sources
.pdf executes & unpacks malware
overwriting and running “allowed” programs
http (proxy) session
to
command & control
server
Remote control
Steal data
Persist in company
Rent as botnet
Proxy
Conduct
Business
Create additional
environment
Gain Access
to systemTransaction
Threat
Intelligence
Endpoint
Network
Email, Proxy,
DNS, and Web
.pdf
Svchost.exe
(malware)
Calc.exe
(dropper)
Attacker hacks website
Steals .pdf files
Web
Portal.pdf
Attacker creates
malware, embed in .pdf,
emails
to the target
MAIL
Read email, open attachment
We utilized threat intel to detect
communication with known high risk
indicators and kick off our investigation
then worked backward through the kill
chain toward a root cause.
Key to this investigative process is the
ability to associate network
communications with endpoint process
data.
This high value and very relevant ability to
work a malware related investigation
through to root cause translates into a very
streamlined investigative process compared
to the legacy SIEM based approach.
35. 35
Lets revisit the search for additional
information on the 2nd_qtr_2014-
_report.pdf file.
We understand that the file was delivered
via email and opened at the endpoint. Why
do we see a reference to the file in the
access_combined (web server) logs?
Select the access_combined
sourcetype to investigate
further.
36. 36
The results show 54.211.114.134 has
accessed this file from the web portal
of buttergames.com.
There is also a known threat intel
association with the source IP
Address downloading (HTTP GET)
the file.
37. 37
Select the IP Address, left-click, then
select “New search”. We would like to
understand what else this IP Address
has accessed in the environment.
38. 38
That’s an abnormally large
number of requests sourced
from a single IP Address in a
~90 minute window.
This looks like a scripted
action given the constant
high rate of requests over
the below window.
ScrollDown
Scroll down the dashboard to
examine other interesting fields to
further investigate.
Notice the Googlebot
useragent string which is
another attempt to avoid
raising attention..
39. 39
The requests from 52.211.114.134 are
dominated by requests to the login page
(wp-login.php). It’s clearly not possible to
attempt a login this many times in a short
period of time – this is clearly a scripted
brute force attack.
After successfully gaining access to our
website, the attacker downloaded the
pdf file, weaponized it with the zeus
malware, then delivered it to Chris
Gilbert as a phishing email.
The attacker is also accessing admin
pages which may be an attempt to
establish persistence via a backdoor into
the web site.
40. Kill Chain Analysis Across Data Sources
40
http (proxy) session
to
command & control
server
Remote control
Steal data
Persist in company
Rent as botnet
Proxy
Conduct
Business
Create additional
environment
Gain Access
to systemTransaction
Threat
Intelligence
Endpoint
Network
Email, Proxy,
DNS, and Web
Data Sources
.pdf
.pdf executes & unpacks malware
overwriting and running “allowed” programs
Svchost.exe
(malware)
Calc.exe
(dropper)
Attacker hacks website
Steals .pdf files
Web
Portal.pdf
Attacker creates
malware, embed in .pdf,
emails
to the target
MAIL
Read email, open attachment
We continued the investigation
by pivoting into the endpoint
data source and used a
workflow action to determine
which process on the endpoint
was responsible for the
outbound communication.
We Began by reviewing
threat intel related events
for a particular IP address
and observed DNS, Proxy,
and Endpoint events for a
user in Sales.
Investigation complete! Lets get this
turned over to Incident Reponse team.
We traced the svchost.exe
Zeus malware back to it’s
parent process ID which was
the calc.exe
downloader/dropper.
Once our root cause analysis
was complete, we shifted out
focus into the web logs to
determine that the sensitive pdf
file was obtained via a brute
force attack against the
company website.
We were able to see which
file was opened by the
vulnerable app and
determined that the
malicious file was delivered
to the user via email.
A quick search into the mail
logs revealed the details
behind the phishing attack
and revealed that the scope
of the compromise was
limited to just the one user.
We traced calc.exe back to
the vulnerable application
PDF Reader.
53. Data from asset framework
Configurable Swimlanes
Darker=more events
All happened around same timeChange to
“Today” if needed
Asset Investigator, enter
“192.168.56.102”
61. ML Toolkit & Showcase
• Splunk Supported framework for building ML Apps
– Get it for free: http://tiny.cc/splunkmlapp
• Leverages Python for Scientific Computing (PSC) add-on:
– Open-source Python data science ecosystem
– NumPy, SciPy, scitkit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power
Consumption, Application Usage, Customer Churn & more
• Standard algorithms out of the box:
– Supervised: Logistic Regression, SVM, Linear Regression, Random Forest, etc.
– Unsupervised: KMeans, DBSCAN, Spectral Clustering, PCA, KernelPCA, etc.
• Implement one of 300+ algorithms by editing Python scripts
65. Splunk UBA Use Cases
ACCOUNT TAKEOVER
• Privileged account compromise
• Data exfiltration
LATERAL MOVEMENT
• Pass-the-hash kill chain
• Privilege escalation
SUSPICIOUS ACTIVITY
• Misuse of credentials
• Geo-location anomalies
MALWARE ATTACKS
• Hidden malware activity
BOTNET, COMMAND & CONTROL
• Malware beaconing
• Data leakage
USER & ENTITY BEHAVIOR ANALYTICS
• Suspicious behavior by accounts or
devices
EXTERNAL THREATSINSIDER THREATS
66. Splunk User Behavior Analytics (UBA)
• ~100% of breaches involve valid credentials (Mandiant Report)
• Need to understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
– Behavior Baselining & Modeling
– Anomaly Detection (30+ models)
– Advanced Threat Detection
• E.g., Data Exfil Threat:
– “Saw this strange login & data transferfor user kwestin
at 3am in China…”
– Surface threat to SOC Analysts
Now unfortunately, you do need a modern laptop with a modern browser to participate. You can probably get away with a Surface or something like that, but iPads, old browsers, and especially IBM PCjr’s will not work. (don’t laugh – I actually had one of those.)
We are going to get hands on, and I want to make sure we have enough time to go through the exercises. But I need to frame up why this is important first, so bear with me.
-Why did we decide to do a hands on session
-compelling presentations, especially from Splunk customers but….
-we want to get our hands on the software and experience it for ourselves
Great Blog Post covering the Microsoft Sysmon TA (download, config, install, usage):
http://blogs.splunk.com/2014/11/24/monitoring-network-traffic-with-sysmon-and-splunk/
The App TS is located here:
https://apps.splunk.com/apps/#/search/sysmon/page/1
Given a general timeframe and a network 5-tuple (src_ip dest_ip, src_port. Dest_port, and protocol) we can query our mountain of endpoint data VERY QUICKLY to figure out who/what was responsible for the communication.
Armed with an understanding of the general methodologies that the attacker utilizes in modern Advanced Persistent Threats (APT), we can better equip ourselves to defend and disrupt this type of attack. The above phases are described by the Lockheed Martin Cyber Kill Chain. Our goal is to make certain that we choose our data sources wisely such that we have covered across all of the phases to give us the best chance or defending ourselves. We overlay both CMDB and Threat Intelligence based lookups to enrich that data to act as both context to help prioritize our efforts and act as a detection mechanism in this example.
Keep in mind, this exercise is focused on investigations (hunting) and not on detection. Threat Intelligence will likely not look as clean as it does in this lab and will usually contain a good deal of false positives depending on the fidelity of the threat intelligence source.
Here is the scenario that we will be walking through in this lab.
So to find all of the “known” threats – that’s pretty simple. We’re going to grab all of the data from all of the things in your environment that are regularly updated with knowledge about known threats. All the traditional security stuff. IDS/IPS. Anti-malware defenses. DLP, Vulnerability scans. SIEM technology. And of course we will collect firewall data and auth data.
But what about unknown threats? How do we find those? Well we need to look at a much bigger set of data, and then find the unusual patterns in that data. So we want to look at things like threat intel, email, web, desktops – the first four items on the top line are what we’ll focus on in this hands-on exercise. But note that Splunk can collect a whole lot more data that we believe is extremely security relevant – especially when it comes to detecting those unknown threats.
Cymru is pronounced (“cum-ree”)
This should look familiar to you. What we’re doing here is giving a starting point for any Security Analyst to understand at a high level what’s going on in the environment. A single pane of glass, if you will, for all security data.
Everything we are seeing here is customizable – the panels, the indicators, via standard Splunk functionality.
Most of the data on this dashboard is centered on Notable Events. Notable Events are a concept unique to Splunk with ES – there’s an entire Notable Event framework that allows us to perform simple or complex correlations, and then create events by analyzing disparate events from disparate sources.
Notable Events in ES are categorized into various high-level security domains: access, audit, identity, network, and threat. We’ll see those categories throughout the app.
You can see Splunk Sparklines here – these little green lines. These are great for detecting quick trends in the security events – a continuous line means something constant, which could be a heartbeat or a scripted attack. A spike could be a single attack or maybe just someone fat-fingering their password a few times.
We’ll drill into some of these incidents in a few minutes, but let’s continue on with our tour. How does all this data get into Splunk?
This should look familiar to you. What we’re doing here is giving a starting point for any Security Analyst to understand at a high level what’s going on in the environment. A single pane of glass, if you will, for all security data.
Everything we are seeing here is customizable – the panels, the indicators, via standard Splunk functionality.
Most of the data on this dashboard is centered on Notable Events. Notable Events are a concept unique to Splunk with ES – there’s an entire Notable Event framework that allows us to perform simple or complex correlations, and then create events by analyzing disparate events from disparate sources.
Notable Events in ES are categorized into various high-level security domains: access, audit, identity, network, and threat. We’ll see those categories throughout the app.
You can see Splunk Sparklines here – these little green lines. These are great for detecting quick trends in the security events – a continuous line means something constant, which could be a heartbeat or a scripted attack. A spike could be a single attack or maybe just someone fat-fingering their password a few times.
We’ll drill into some of these incidents in a few minutes, but let’s continue on with our tour. How does all this data get into Splunk?
Here we have a simple dashboard showing us all sorts of detail about recent malware activity in the environment. Like Security Posture, this is high level information, but more granular about a certain security domain (Malware, which is under Endpoint). We have these “centers” throughout ES for things like Access, Traffic, Intrusions, Updates, Vulnerabilities, and many other security-relevant areas, and you can investigate them later.
For now, let’s drill into two of the “top infections” to see CIM at work. Looking at this dashboard we can’t tell that we actually have at least two different endpoint protection systems feeding data into Splunk: Sophos, Trend Micro, and Symantec Endpoint Protection. Splunk normalizes the data on search time, according to CIM, to create this (and the other) dashboards.
Click on Mal/Packer, and you’ll see that this infection was detected by Sophos. The raw logs are literally a click away:
The main reason why this risk framework is important is that it gets you away from writing specific rules for specific threats or assets. You don’t need 1,000 correlation rules anymore – you simply can elevate risk scores on whatever object you want, based on the behavior you’re seeing in the environment. So the idea here is, a correlation rule fires, and then a risk modifier takes effect and changes the risk score based on cumulative scoring of whatever else has happened to that user, or system, or other object.
On the dashboard, we can define filters to find a particular system or user or timeframe.
Note the natural language descriptions (in the screenshot they are medium and low). We track how your overall risk scoring is doing over time, and constantly re-calculate the baseline. Got a lot of activity going on that isn’t “normal” for that timeframe and you might see things going from “increasing minimally” to “extremely increasing” – all based on what the historical norm is.
We can of course see which objects have the highest risk and which correlation rules are contributing the most to the highest risk.
On the dashboard we can see that we’re using the power of Splunk search to match artifacts in our incoming data against IoC’s we find in our threat feeds. Splunk de-duplicates the threat feeds so that if an artifact shows up in multiple feeds you don’t get duplicate notifications.
We can filter the display by threat_group, which is essentially the source of the IoCs. This could be something commercial like ThreatStream or ThreatConnect or Norse, something open-source like Sans or iblocklist, or something from your ISAC that is delivered over a TAXII feed in STIX format.
The threat collection shows that we can use various IoCs to match up against artifacts in our data – IP addresses, domain names, URLs, filenames, certificate common names and organizations, email addresses, registry keys – as long as it can be defined in your incoming feed or locally, you can use it as an IoC.
You can see the most active threat sources, and if you scroll down, you can see the most recent matches against your threat feeds.
How are these configured? Let’s go to the configuration, and see.
In version 3.1 of Enterprise Security we introduced a full Risk Analysis framework. This is unique because we allow you to assign an arbitrary risk number, that means something to you, based on a notable event. You can assign risk to a user, or to a system, or to some other object that you see in the environment – perhaps a particular piece of malware is considered risky to you so you elevate the risk on the malware “object” itself.
Let’s bring up the Risk Analysis page associated with Advanced Threat:
Rounding out the Threat Intelligence capabilities are the Threat Artifacts browser, which allows us to search through all of the artifacts stored in ES:
We don’t have time to go through each and every one of the advanced threat capabilities in the ES app. However, let’s just see that up here under Advanced Threat we have some very interesting capabilities:
Some of the most useful ones are the Protocol Intelligence that leverages wire data from things like Splunk Stream, Netflow, and Bro. Also the Access Anomalies and User Activity, which are very useful to detect possible insider threat. And the New Domain Analysis, which analyzes traffic patterns and DNS queries to domains, and then tells you if you have devices communicating with recently registered garbage domains (that are often associated with DGA). Again – this is something you can go through on your own time.
Asset Investigator shows us, at the top, all of the things we know about this asset from sources such as CMDBs or Active Directory. It also has multiple “swimlanes” that visually show you what’s been going on with the asset:
We can see Threat List, Exec File, IDS, and Notable Events associated with this asset, most of those happening right around the same time (this was likely the time of infection).
More beaker than bunson
Supervised learning is where you have existing LABELS in the data to help you out.
Example: If you’re training a model for CUSTOMER CHURN, historically you know which customers stayed and which left. You can build a model to correlate historical churn with other features in the data. Then you can PREDICT churn for each customer based on everything they’re doing in real-time and have done in the past.
Unsupervised learning is where you have NO LABELS to help you out. You have to figure out patterns
Example: If you’re trying to do BEHAVIORAL ANALYTICS, you might just have a big confusing pile of IT & Security data to wade through. Unsupervised learning is the art & science of finding PATTERNS, BASELINES and ANOMALIES in the data. Once you understand all this (that’s hard!) you can try to predict possible INCIDENTS and THREATS.
Good ML involves FEEDBACK loops. Best bet is to incorporate INCIDENT RESPONSE data and learn from what analysts have done in the past.
Free app! Toolkit & PSC are both free. Go to ML App link above, and click Documentation. Links for all distros ()
Q: Why standalone SH?
A: Don’t want ML exploration & production to bring down other Splunk workloads
Can use standalone 6.4 SH with older version SH cluster & indexers.
Re: 100% of breaches involve valid credentials:
"Mandiant is the leader in incident response. They are the best of the best. They're brought in to deal with the largest, highest profile, most damaging breaches. When you read a news headline about a large organization being compromised, there is a great chance that Mandiant is working behind the scenes to eradicate the attackers from the environment. In a recent yearly report-when they looked across all the very damaging attacks they responded to-they noticed that valid credentials were used at some point in every single one of them. Why do we care? Well we care because it means that we cannot use simple techniques like counting failed logins to detect an attack. In fact based on this stat, there may not even be any failed logins to count! Instead we need to be much smarter. For example how can we look at 1000 successful logins and determine which of them was the malicious one? We do that through behavior analytics, baseline/outlier, ML, etc."