This document discusses mining cyber threat intelligence (CTI) to improve cybersecurity risk mitigation. It describes how CTI can be extracted from unstructured data sources using machine learning techniques and converted into a structured format. It also outlines challenges including dealing with the large and heterogeneous volumes of CTI data, exploring the deep and dark web, and ensuring legal compliance and privacy. The overall goal is to develop an advanced CTI platform that can accurately model attack strategies and determine effective countermeasures.
1. Advanced Cyber-Threat Intelligence, Detection and
Mitigation Platform for a Trusted Internet of Things
Mining for cyber-threat intelligence to
improve cyber-security risk mitigation
Panel on Cyber-security Intelligence
2019 Community of Users Workshop
Nicholas Kolokotronis
Department of Informatics and Telecommunications
University of Peloponnese • nkolok@uop.gr
Cyber-threat intelligence
▪ From unstructured (textual), high-volume data to:
o Vulnerabilities/exploits
o Links to CVE/other VDB IDs
o Threat actors’ TTPs
o Specific products/platforms
o Popularity, price, …
o CVSS => measurable
▪ CTI needs to comply with legal requirements
Cyber-defense goals
▪ Accurate modelling of the attack strategies
▪ Determine the attackers’ capabilities
o constrained resources (budget, tools, etc.)
▪ The attackers’ goals vary depending on the target
o access level, degrade QoS, …
▪ Define the defender’s available actions
o possible counter-measures
o highlight parameters
▪ Cyber-defense needs to minimize the attack surface
Dynamic risk analysis: attack models
Example: exploitation probability
▪ Needs to be measurable
o Estimated from CVSS metrics
o $P(e_i) = 2 \times AV \times AC \times Au$
▪ Likewise for an attack’s attempt probability
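The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming that AV, AC and Au are the CVSS v2 base metrics (Access Vector, Access Complexity, Authentication) with the numeric weights defined in the CVSS v2 specification; the function and dictionary names are hypothetical.

```python
# CVSS v2 base metric weights (values from the CVSS v2 specification).
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}      # Local, Adjacent, Network
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}   # High, Medium, Low
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}     # Multiple, Single, None


def exploitation_probability(av: str, ac: str, au: str) -> float:
    """Return P(e_i) = 2 * AV * AC * Au for the given CVSS v2 metric letters."""
    return 2 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]


if __name__ == "__main__":
    # Remotely exploitable, low complexity, no authentication required.
    print(exploitation_probability("N", "L", "N"))
    # Local access, high complexity, multiple authentications required.
    print(exploitation_probability("L", "H", "M"))
```

With these weights the product never exceeds 1, so the result can be read directly as a probability.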
ML – from CTI to structured TTPs
▪ Conversion of CTIs to a semi-structured format (JSON, XML)
▪ Filtering for specific information (TTPs, exploits) has these benefits:
o More easily processed in an automated way
o Only condensed information will be available
o Reports will still be readable
▪ A known format for attack patterns is STIX v2.1
▪ The conversion of CTIs into actionable information can be achieved using ML techniques
Data pre-processing
1. Need a crawler that gathers all pages from the web
o CTI vendors (e.g. Symantec)
o Forums, blogs, etc.
2. Sanitize content and keep all textual information as articles
o Remove HTML tags, images, etc.
3. Automated decision on the CTI value of each article
o articles without CTI value are dropped
Classifier needed with a number of features, like:
▪ Word count (CTIs with elaborate TTPs tend to be longer)
▪ Security action word density (security-correlated verbs)
▪ Security target word density (security-correlated nouns)
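The three classifier features named on this slide can be sketched as follows. This is only an illustration: the two word lists are small hypothetical stand-ins for the security-correlated verbs and nouns, and no stemming is applied, so inflected forms such as "exploits" are not counted.

```python
import re

# Hypothetical vocabularies; in practice these would be much larger.
SECURITY_VERBS = {"exploit", "inject", "overwrite", "execute", "steal"}
SECURITY_NOUNS = {"malware", "payload", "credential", "process", "vulnerability"}


def extract_features(article: str) -> dict:
    """Compute word count and security action/target word densities."""
    words = re.findall(r"[a-z]+", article.lower())
    total = len(words)
    if total == 0:
        return {"word_count": 0, "action_density": 0.0, "target_density": 0.0}
    actions = sum(word in SECURITY_VERBS for word in words)
    targets = sum(word in SECURITY_NOUNS for word in words)
    return {
        "word_count": total,
        "action_density": actions / total,
        "target_density": targets / total,
    }


if __name__ == "__main__":
    print(extract_features("Attackers exploit the vulnerability to inject a payload"))
```

The resulting feature dictionary could be fed to any standard classifier that decides whether an article is kept or dropped.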
[CT] CTI crawling and classification
▪ Crawling components used in Cyber-Trust
[CT] CTI crawling and classification
▪ Clear/Deep/Forum web crawling in Cyber-Trust
o Implement topic-specific crawling on publicly available websites
▶︎ focus on Deep/Dark web sites that don’t require authentication
o The Model Builder is responsible for creating the classification model; it needs a set of positive and negative URLs.
o The Seed Finder identifies the initial seed URLs to crawl based on a user-defined query, e.g. “IoT vulnerabilities”
o The crawled websites go through the Article/Forum Parser, which extracts the useful text from each one
▶︎ internally, forums are structured differently from regular websites
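As a rough idea of what the text-extraction step of an article parser might look like, the sketch below strips markup with Python's standard `html.parser` module. It is not the Cyber-Trust parser; the class name and the set of skipped tags are assumptions, and forum pages would need their own structure-aware handling.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect visible text and ignore markup, scripts and styles."""

    SKIPPED_TAGS = {"script", "style"}  # assumed set of non-content tags

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<h1>IoT flaw</h1><img src='x.png'><p>Routers are <b>exposed</b> online</p>"
    print(extract_text(page))
```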
Data pre-processing
▪ Security-correlated verbs/nouns are extracted from the CVE, CAPEC, and CWE repositories using NLP techniques
o Used on each article to find all OVS (Object, Verb, Subject) triplets; these are candidate threat actions
▪ CTI reports contain strings that an NLP parser may not understand, such as IoCs
o To remedy this, we temporarily substitute these strings using regular expressions (RegEx)
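The temporary IoC substitution could look roughly like the sketch below. The two patterns (IPv4 addresses and MD5/SHA-1/SHA-256 hashes) and the placeholder naming scheme are assumptions chosen for illustration; the original slide's example is not available in this transcript.

```python
import re

# Hypothetical IoC patterns and placeholder labels.
IOC_PATTERNS = [
    ("ipaddress", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("filehash", re.compile(
        r"\b[a-fA-F0-9]{64}\b|\b[a-fA-F0-9]{40}\b|\b[a-fA-F0-9]{32}\b")),
]
PLACEHOLDER = re.compile(r"\b(?:ipaddress|filehash)_\d+\b")


def mask_iocs(text: str):
    """Replace IoCs with word-like placeholders; return text and mapping."""
    mapping = {}
    for label, pattern in IOC_PATTERNS:
        def substitute(match, label=label):
            token = f"{label}_{len(mapping)}"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(substitute, text)
    return text, mapping


def unmask_iocs(text: str, mapping: dict) -> str:
    """Restore the original IoC strings after NLP parsing."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    sentence = ("The dropper contacts 192.168.10.5 and writes "
                "d41d8cd98f00b204e9800998ecf8427e to disk.")
    masked, mapping = mask_iocs(sentence)
    print(masked)
    print(unmask_iocs(masked, mapping))
```

Keeping the mapping makes the substitution reversible, so the original indicators can be restored once the parser has produced its triplets.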
TTP specific ontology
▪ An ontology built from the TTPs provided by MITRE’s ATT&CK and CAPEC repositories
Class name        | Class description                       | Example
Kill chain phase  | Phase information, e.g. name or order   | Control or 5
Tactic            | Description of how to achieve a phase   | Privilege escalation
Technique         | Description of how to achieve a tactic  | DLL injection
Threat action     | Verb associated with malicious action   | Overwrite, Terminate
Object            | The action’s target                     | File, Process
Pre-condition     | Action prerequisites that have to hold  | User access
Intent            | Goal/subgoal of an action               | Run malicious code
Towards threat actions
▪ Compute the similarity of each candidate action to all records in the ontology
▪ Compare the Information Retrieval (IR) score against a threshold
▪ Vocabulary based on synonyms (e.g. from WordNet) or custom-built
▪ The best-scoring class is assigned to the threat action
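The matching steps above can be illustrated with a small sketch. It uses a plain TF-IDF cosine score as a stand-in for the IR scoring function, which the slide does not specify; the ontology records, the synonym table and the threshold value are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical ontology records: class name -> descriptive terms.
ONTOLOGY = {
    "DLL injection": ["inject", "dll", "process"],
    "Input capture": ["log", "keystroke", "credential"],
    "File deletion": ["delete", "file"],
}
# Hypothetical synonym vocabulary (could be derived from WordNet).
SYNONYMS = {"record": "log", "remove": "delete", "erase": "delete"}


def normalize(terms):
    return [SYNONYMS.get(t.lower(), t.lower()) for t in terms]


def idf_weights(records):
    n = len(records)
    df = Counter(t for terms in records.values() for t in set(terms))
    return {t: math.log(1 + n / count) for t, count in df.items()}


def vectorize(terms, idf):
    # Terms unknown to the ontology get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(terms).items()}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(candidate_terms, threshold=0.5):
    """Return (best class, score), or (None, score) if below the threshold."""
    idf = idf_weights(ONTOLOGY)
    query = vectorize(normalize(candidate_terms), idf)
    best_name, best_score = None, 0.0
    for name, terms in ONTOLOGY.items():
        score = cosine(query, vectorize(normalize(terms), idf))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    print(classify(["record", "keystroke"]))
    print(classify(["open", "door"]))
```

Candidates that score below the threshold against every record are left unassigned rather than forced into the nearest class.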
[CT] CTI classification
▪ Topic vocabulary in Cyber-Trust
o XML docs are converted into text via the XML Data Retriever
o The Normalizer drops symbols, converts to lowercase, etc.
o Collected tags that are multi-word terms are given to the Multi-Word Expression Tokenizer
▶︎“exploit kits” => “exploit-kits”
o Word2Vec computes the similarity between terms
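The normalization and multi-word tokenization steps can be sketched in plain Python as below. The tag list is hypothetical and the functions are illustrative stand-ins for the Normalizer and Multi-Word Expression Tokenizer components; the Word2Vec step is not shown, since it requires a trained model.

```python
import re

# Hypothetical multi-word tags collected from the corpus.
MULTI_WORD_TAGS = [("exploit", "kits"), ("denial", "of", "service")]


def normalize(text: str) -> list:
    """Lowercase the text, drop symbols, and split into tokens."""
    cleaned = re.sub(r"[^a-z0-9\s-]", " ", text.lower())
    return cleaned.split()


def merge_multi_word(tokens, expressions, separator="-"):
    """Join known multi-word expressions into single tokens."""
    ordered = sorted(expressions, key=len, reverse=True)  # longest match first
    merged = []
    i = 0
    while i < len(tokens):
        for expr in ordered:
            if tuple(tokens[i:i + len(expr)]) == expr:
                merged.append(separator.join(expr))
                i += len(expr)
                break
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    tokens = normalize("New Exploit Kits enable Denial of Service!")
    print(merge_multi_word(tokens, MULTI_WORD_TAGS))
```

Merging a phrase into one token means a downstream embedding model learns a single vector for "exploit-kits" instead of separate vectors for "exploit" and "kits".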
[CT] CTI classification
▪ Example top terms in Cyber-Trust collection for tag ddos
CTI sharing: using STIX
▪ Structured language for any CTI
o supports a wide range of use cases
o can focus on relevant aspects
▪ High level of recognition by CSIRTs and LEAs
▪ Combined with TAXII 2.0
o OSS implementations
▪ Supported by MISP
Attack pattern SDO
{
"type" : "attack-pattern",
"id" : "attack-pattern--xyz…",
"created" : "2017-06-08T08:17:27.000Z",
"modified" : "2017-06-08T08:17:27.000Z",
"name" : "Input Capture",
"description" : "Adversary logs keystrokes to obtain credentials",
"kill_chain_phases" : "Maintain",
"external_references" :
[ {
"source_name" : "ATT&CK",
"external_id" : "T1056"
} ]
}
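The attack pattern SDO above is shown in simplified form. As a hedged sketch, the function below builds a dictionary closer to the shape required by STIX 2.1, where `kill_chain_phases` is a list of objects with `kill_chain_name` and `phase_name`. The function name and the default `kill_chain_name` value are assumptions; only the Python standard library is used.

```python
import json
import uuid
from datetime import datetime, timezone


def make_attack_pattern(name, description, phase_name, external_id,
                        kill_chain_name="example-kill-chain"):
    """Build an attack-pattern SDO as a plain dictionary.

    kill_chain_name defaults to a placeholder (an assumption); use the
    name of the kill chain actually followed by the data set.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        "kill_chain_phases": [
            {"kill_chain_name": kill_chain_name, "phase_name": phase_name}
        ],
        "external_references": [
            {"source_name": "ATT&CK", "external_id": external_id}
        ],
    }


if __name__ == "__main__":
    sdo = make_attack_pattern(
        name="Input Capture",
        description="Adversary logs keystrokes to obtain credentials",
        phase_name="maintain",
        external_id="T1056",
    )
    print(json.dumps(sdo, indent=2))
```

Building the object as a dictionary and serializing it with `json.dumps` keeps the sketch dependency-free; a dedicated STIX library would additionally validate the fields.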
CTI sources’ quality aspects
▪ Conflicting data may exist among sources
▪ Several techniques can be used to assess the credibility of a source
o Using special-purpose ranking engines (e.g. SimilarWeb)
▶︎ A combination of metrics (page views, unique site users, web traffic, etc.)
▶︎ Include some Dark Web sites
o Number of users (useful for Dark Web sites)
o Number of posts per day
o Number of CVEs per day
▶︎ More than 3/4 of vulnerabilities are publicly reported online about 7 days before they appear in the NVD
▶︎ Mainly concerns the Dark Web, paste sites, and cyber-criminal forums
Use of CTI in Cyber-Trust
[Figure labels: CTI sharing; dark web; deep web; clear web]
Conclusions - challenges
▪ ML can be used to convert CTI into structured and actionable formats
▪ Technical challenges in coping with the heterogeneity and volume of cyber-threat data
o Need for (semi-)automated means of processing
o Focused and topic-based crawling can improve performance
o Deep/dark web exploration presents additional challenges
o Big data management and NoSQL stores for efficiency
▪ Legal compliance and privacy-preserving data mining?