Building a cyber threat intelligence knowledge management system using Grakn
Knowledge of cyber threats is a key focus in many areas of cybersecurity. Adapting intrusion detection systems, building relevant red team scenarios, guiding incident response activities, and providing more effective risk assessment through better knowledge of threat agents: all of these require a deep understanding of the relevant cyber threats and their associated human and technical elements. During this talk, we will describe how we use Grakn's hyper-relational data model, logical inference, and other core features to build an application (openCTI) that allows organizations to manage their cyber threat intelligence knowledge and technical observables. We will walk through the data model and the implementation of nested relations, and give you an overview of how you can create powerful applications using Grakn.
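To make the terms "hyper-relational" and "nested relations" concrete, here is a minimal sketch in plain Python of a relation that itself plays a role in another relation. All entity, relation, and role names below are invented for illustration; they are not the actual openCTI or Grakn schema.

```python
# Minimal sketch of the hyper-relational idea: a relation can itself
# be a role player in another relation. Names are illustrative only.

def entity(etype, name):
    return {"type": etype, "name": name}

def relation(rtype, **role_players):
    # A relation maps role names to players; a player may itself be
    # another relation, which is what "nested relations" refers to.
    return {"type": rtype, "roles": role_players}

apt28 = entity("intrusion-set", "APT28")
xagent = entity("malware", "X-Agent")

# First-order relation: the intrusion set uses the malware.
uses = relation("uses", user=apt28, usage=xagent)

# Nested relation: a report attributes that "uses" relation itself,
# i.e. the relation plays the "information" role in another relation.
report = entity("report", "Vendor write-up")
attribution = relation("attributed-information",
                       information=uses, source=report)

assert attribution["roles"]["information"]["type"] == "uses"
```

In a hyper-relational database this nesting is expressed directly in the schema rather than simulated with join tables, which is what makes modelling threat-intelligence provenance (who reported which relationship) natural.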
Don't forget to check out the openCTI project here: www.opencti.io
Cyber threat intelligence: maturity and metrics, by Mark Arena
From the SANS Cyber Threat Intelligence Summit 2016. What are the characteristics of a mature cyber threat intelligence program, and how do you develop meaningful metrics? Traditionally, intelligence has been about providing decision support to executives, whilst the field of cyber threat intelligence supports both this customer and network defenders, who have different requirements. Using the intelligence cycle, this talk will help attendees understand how to identify what a mature intelligence program looks like and the steps to take their program to the next level.
Threat hunting - Every day is hunting season, by Ben Boyd
Breakout Presentation by Ben Boyd during the 2018 Nebraska Cybersecurity Conference.
Introduction to Threat Hunting and helpful steps for building a Threat Hunting Program of any size, from small to massive.
"Cyberhunting" actively looks for signs of compromise within an organization and seeks to control and minimize the overall damage. This rare but essential breed of enterprise cyber defenders gives proactive security a whole new meaning.
Check out the accompanying webinar: http://www.hosting.com/resources/webinars/?commid=228353
Cyber Threat Intelligence is a process in which information from different sources is collected and then analyzed to identify and detect threats against an environment. The information collected could be evidence-based knowledge that provides the context, mechanisms, indicators, or implications of an existing threat against an environment, and/or knowledge about an upcoming threat that could potentially affect it. Credit: Marlabs Inc
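That collect-then-analyze loop can be sketched minimally in code: gather observables from your environment and match them against indicators already known from intelligence feeds. The indicator values and feed descriptions below are invented for illustration.

```python
# Toy version of the CTI loop described above: match freshly collected
# observables against known-bad indicators to produce detections with
# context. All values here are illustrative.

known_indicators = {
    "198.51.100.7": "C2 server reported by feed A",
    "evil.example.com": "Phishing domain reported by feed B",
}

def analyse(observables):
    """Return (observable, context) pairs for every match."""
    return [(o, known_indicators[o]) for o in observables
            if o in known_indicators]

collected = ["10.0.0.5", "evil.example.com", "198.51.100.7"]
hits = analyse(collected)
# Each hit carries the evidence-based context the definition mentions.
```

Real platforms add scoring, aging, and source reliability on top of this matching step, but the core shape (collect, enrich with context, detect) is the same.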
Effective Threat Hunting with Tactical Threat Intelligence, by Dhruv Majumdar
How to set up a Threat Hunting Team for Active Defense utilizing Cyber Threat Intelligence and how CTI can help a company grow and improve its security posture.
How to Hunt for Lateral Movement on Your Network, by Sqrrl
Once inside your network, most cyber-attacks go sideways. They progressively move deeper into the network, laterally compromising other systems as they search for key assets and data. Would you spot this lateral movement on your enterprise network?
In this training session, we review the various techniques attackers use to spread through a network, which data sets you can use to reliably find them, and how data science techniques can be used to help automate the detection of lateral movement.
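One simple heuristic of the kind this session covers is to flag authentication events where a user logs on to a host they never touched during a baseline period. The field names and events below are invented for illustration, not taken from any specific product.

```python
# Baseline-deviation sketch for lateral movement: build a per-user set
# of hosts seen during a "normal" window, then flag logons to hosts
# outside that set. Data is illustrative only.

from collections import defaultdict

baseline = [  # (user, host) pairs from the training window
    ("alice", "ws-01"), ("alice", "files-01"),
    ("bob", "ws-02"),
]

new_events = [
    ("alice", "files-01"),   # seen before: fine
    ("bob", "dc-01"),        # bob never touched the domain controller
    ("bob", "files-01"),     # nor the file server
]

seen = defaultdict(set)
for user, host in baseline:
    seen[user].add(host)

anomalies = [(u, h) for u, h in new_events if h not in seen[u]]
```

A sudden burst of such first-time logons, especially toward servers and domain controllers, is exactly the "sideways" progression the paragraph above describes; real detections would add whitelisting and time-decay to keep false positives down.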
Your adversaries continue to attack and get into companies. You can no longer rely on alerts from point solutions alone to secure your network. To identify and mitigate these advanced threats, analysts must become proactive in identifying not just indicators, but attack patterns and behavior. In this workshop we will walk through a hands-on exercise with a real-world attack scenario, illustrating how advanced correlations from multiple data sources and machine learning can enhance security analysts' ability to detect and quickly mitigate advanced attacks.
Recently, NTT published the Global Threat Intelligence Report 2016 (GTIR). This year’s report focused both on the changes in threat trends and on how security organizations around the world can use the kill chain to help defend the enterprise.
Turning threat intelligence data from multiple sources into actionable, contextual information is a challenge faced by many organizations today. The Global Threat Intelligence Platform provides increased efficiency, reduces risks and focuses on global coverage with accurate and up-to-date threat intelligence.
This presentation was given at Carnegie Mellon University by Kenji Takahashi, VP of Product Management, Security at NTT Innovation Institute.
In our webinar "What is Threat Hunting and why do you need it?" we discussed the following key points:
1. What Threat hunting is.
2. Why it is becoming so popular and what kinds of attacks are making it necessary.
3. What the challenges are.
4. Threat Hunting and Investigation services for attacks.
5. Case studies.
Find out more on https://www.pandasecurity.com/business/adaptive-defense/?utm_source=slideshare&utm_medium=social&utm_content=SM_EN_WEB_adaptive_defense&track=180715
Advanced Persistent Threats (APTs) are a serious concern as they represent a threat to an organization’s intellectual property, financial assets and reputation. In some cases, these threats target critical infrastructure and government institutions, thereby threatening the country’s national security itself.
MITRE’s ATT&CK is a community-driven knowledge base and model for cyber adversary behavior, reflecting the various phases of an adversary’s life cycle and the platforms they are known to target. By scoping the wide breadth of the MITRE ATT&CK matrix to focus initially on the techniques used by the threat actors you specifically care about, you can help defenders create the most useful and impactful detections first. Once you start emulating the appropriate threat actors, you can practice your defenses in a scenario that’s more realistic and applicable without the need for an actual intrusion. The speakers provide a process and a case study of APT3 - a China-based threat group - showing how to go from finding threat intelligence, sifting through it for actionable techniques, creating emulation plans, and discovering how to emulate different techniques, to actually operating on a network. They also provide a beginning "cheat sheet" for this actor, giving red and blue teams a starting point for executing these techniques in their own environment without the need to build their own tooling.
Cyber Threat Intelligence and Incident Response, by Sandeep Singh (OWASP Delhi)
The broad list of topics includes (but is not limited to):
- What is Threat Intelligence?
- Types of Threat Intelligence
- Intelligence Lifecycle
- Threat Intelligence - Classification & Vendor Landscape
- Threat Intelligence Standards (STIX, TAXII, etc.)
- Open Source Threat Intel Tools
- Incident Response
- Role of Threat Intel in Incident Response
- Bonus Agenda
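Since the talk covers threat intelligence standards such as STIX and TAXII, here is what a minimal STIX 2.1 Indicator object looks like, written out as a plain Python dict. The property names follow the OASIS STIX 2.1 specification; the identifier, timestamps, and IP value are made up for the example.

```python
import json

# A minimal STIX 2.1 Indicator, hand-written to show the shape of the
# standard mentioned above. Values are illustrative; see the OASIS
# STIX 2.1 specification for the full list of required properties.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": "indicator--3f214f8a-0000-4000-8000-000000000001",
    "created": "2019-08-01T00:00:00.000Z",
    "modified": "2019-08-01T00:00:00.000Z",
    "name": "Known C2 IP address",
    "pattern": "[ipv4-addr:value = '198.51.100.7']",
    "pattern_type": "stix",
    "valid_from": "2019-08-01T00:00:00.000Z",
}

serialized = json.dumps(indicator, indent=2)
```

TAXII is the companion transport protocol: a TAXII server hands out collections of JSON objects exactly like this one, which is why tooling that speaks STIX can interoperate across vendors.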
MITRE ATT&CKcon 2018: Hunters ATT&CKing with the Data, Roberto Rodriguez, Spe... (MITRE - ATT&CKcon)
With the development of the MITRE ATT&CK framework and its categorization of adversary activity during the attack cycle, understanding what to hunt for has become easier and more efficient than ever. However, organizations are still struggling to understand how they can prioritize the development of hunt hypotheses, assess their current security posture, and develop the right analytics with the help of ATT&CK. Even though there are several ways to utilize ATT&CK to accomplish those goals, only a few focus primarily on the data that is currently being collected to drive the success of a hunt program.
This presentation shows how organizations can benefit from mapping their current visibility from a data perspective to the ATT&CK framework. It focuses on how to identify, document, standardize and model current available data to enhance a hunt program. It presents an updated ThreatHunter-Playbook, a Kibana ATT&CK dashboard, a new project named Open Source Security Events Metadata known as OSSEM and expands on the “data sources” section already provided by ATT&CK on most of the documented adversarial techniques.
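The data-source mapping idea can be sketched briefly: list which data sources each ATT&CK technique needs, compare against what your telemetry actually collects, and score your visibility. The technique-to-data-source pairs below are illustrative only, not copied from the ATT&CK knowledge base.

```python
# Sketch of visibility scoring: which techniques could your current
# telemetry even see? Mappings below are illustrative, not official
# ATT&CK data-source assignments.

required = {
    "T1021 Remote Services": {"authentication logs", "network flow"},
    "T1059 Command and Scripting Interpreter": {"process creation"},
    "T1003 OS Credential Dumping": {"process creation", "process access"},
}

collected = {"process creation", "authentication logs"}

def coverage(required, collected):
    # A technique is visible only if ALL its data sources are collected.
    visible = [t for t, srcs in required.items() if srcs <= collected]
    return visible, len(visible) / len(required)

visible, score = coverage(required, collected)
```

Even this toy score makes the presentation's point: before writing analytics, knowing that two of three techniques are invisible to your current data tells you exactly which collection gaps to close first.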
Threat Hunting Procedures and Measurement Matrice, by Vishal Kumar
This document provides the basics of Cyber Threat Hunting and answers questions such as: What is threat hunting? Why is threat hunting important? How can it be started?
Threat Modeling is a structured activity for identifying and managing threats to objects such as applications.
Threat Modeling, also called Architectural Risk Analysis, is an essential step in the development of your application. Without it, your protection is a shot in the dark.
Similar to Building a Cyber Threat Intelligence Knowledge Management System (Paris, August 2019):
Building Biomedical Knowledge Graphs for In-Silico Drug Discovery, by Vaticle
The rapid development and spread of analytical tools in the biomedical sciences has produced a variety of information about all sorts of biological components and their functions. Though important individually, their biological characteristics need to be understood in relation to the interactions they have with other biological components, which requires the integration of vast amounts of complex, semantically rich, heterogeneous data.
Traditional systems are inadequate at accurately modelling and handling data at this scale and complexity, making solutions that speed up the integration and querying of such data a necessity.
In this talk, we present various approaches being used in organisations to build biomedical computational pipelines to address these problems using tools such as Machine Learning and TypeDB. In particular, we discuss how to create an accurate and scalable semantic representation of molecular level biomedical data by presenting examples from drug discovery, precision medicine and competitive intelligence.
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle, dedicated to building a strongly-typed database for intelligent systems. He works directly with TypeDB's open source and enterprise users so they can fulfil their potential with TypeDB and change the world. He focuses mainly on life sciences, cyber security, finance, and robotics.
Loading a lot of data into a graph database is not a trivial exercise. TypeDB Loader (formerly known as GraMi) was developed to allow large-scale data import into TypeDB, a strongly-typed database. Recent improvements have immensely simplified the configuration interface to allow for easier data importing, while maintaining features and the promise of loading huge amounts of data into TypeDB as fast as possible.
Natural Language Interface to Knowledge Graph, by Vaticle
Natural language interfaces (NLI) offer end-users an easy and convenient way to query ontology-based knowledge graphs. They automatically generate database queries based on their natural language inputs, avoiding the need for the end user to learn different query languages. NLIs can be used with REST APIs to facilitate and enrich the interactions with knowledge graphs, in domains such as interactive root cause analysis (RCA), dynamic dashboard generation, and Online Transactional Processing (OLTP).
In this talk, you'll learn about a natural language interface built with a TypeDB server running on Raspberry Pi4. This application offers a conversational bot assistant with Cisco Webex for an efficient and flexible way to facilitate human-machine interactions. In particular, this talk will demonstrate how natural language inputs are translated into TypeQL queries using Abstract Syntax Trees that represent the syntactic structure discovered during the Named Entity Recognition (NER) analysis of the textual inputs provided by Rasa 2.X running on an Intel Celeron J3455 miniPC.
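The translation step this abstract describes can be illustrated with a toy example: assume NER has already tagged the entities in the user's utterance, then fill a TypeQL-style query template from them. The entity labels, the template, and the query syntax are simplified illustrations, not the actual system's implementation.

```python
# Toy sketch of NL-to-query translation: pretend an NER stage (as Rasa
# provides) has already tagged the entities, then render a query
# template. Labels and template are invented for illustration.

def to_query(ner_entities):
    ent_type = ner_entities["entity_type"]
    attr, value = ner_entities["filter"]
    return f'match $x isa {ent_type}, has {attr} "{value}"; get $x;'

# Utterance: "show me machines located in building B"
tagged = {"entity_type": "machine", "filter": ("location", "building B")}
query = to_query(tagged)
```

A real NLI builds an abstract syntax tree first so that nested filters and conjunctions compose correctly, but the endpoint is the same: end users never see the query language.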
A Data Modelling Framework to Unify Cyber Security Knowledge, by Vaticle
Cyber security companies collect massive amounts of heterogeneous data coming from a huge number of sources. These describe hundreds of different data types, such as vulnerabilities, observables, incidents, and malware. While this data is highly complex (with many types of relations, type hierarchies, and rules), its structure doesn't change significantly between organisations. However, without a publicly available data model, organisations end up modelling the same data in different ways: in other words, reinventing the wheel and wasting their resources. This modelling complexity makes scaling cyber security applications extremely difficult.
That's why efforts are underway to provide ready-made solutions for typical cyber security use cases, with the flexibility to expand for the specific requirements of individual setups. The combination of those efforts has created a lot of inter-related knowledge silos (e.g. CVE, CAPEC, CWE, CVSS, CACAO, MITRE, VERIS, STIX, MAEC). To unify these silos, various ontologies have been proposed by researchers, with different levels of granularity - from specific use cases like defence exercises, to more comprehensive cases like the UCO project.
During this talk, you’ll learn about the OmnibusCyber Project, an open-source, ready-made solution that aggregates cyber security knowledge silos, based on TypeDB. TypeDB’s framework offers the expressivity, safety, and inference properties required to implement a knowledge graph without the complexity associated with the OWL/RDF semantic frameworks.
Unifying Space Mission Knowledge with NLP & Knowledge Graph, by Vaticle
Synopsis
The number of space missions being designed and launched worldwide is growing exponentially. Information on these missions, such as their objectives, orbit, or payload, is disseminated across various documents and datasets. Facilitating access to this information is key to accelerating the design of future missions, enabling experts to link an application to a mission, and following various stakeholders' activities.
This presentation introduces recent research done at the ESA to combine the latest Language Models with Knowledge Graphs, unifying our knowledge on space missions. Language Models such as GPT-3 and BERT are trained to understand the patterns of human (natural) language. These models have revolutionised the field of NLP, the branch of AI enabling machines to understand human language in all its complexity. In this work, key information on a mission is parsed from documents with the GPT-3 model, and the parsed data is then migrated to a TypeDB Knowledge Graph to be easily queried. Although this work focuses on an application in the space sector, the method can be transferred to other engineering fields.
Presenters
Dr. Audrey Berquand is a Research Fellow at the ESA. Her research aims at enhancing space mission design and knowledge management with text mining, NLP, and Knowledge Graphs. She was awarded her PhD in 2021 from the University of Strathclyde (Scotland) for her thesis on “Text Mining and Natural Language Processing for the Early Stages of Space Mission Design”. Audrey has a background in space systems engineering; she holds an MSc in Aerospace Engineering from the Royal Institute of Technology KTH (Sweden) and a diplôme d'ingénieur from the EPF Graduate School of Engineering (France). Before diving into the world of AI, she spent 3 years at ESA being involved in the early design phases of future Earth Observation missions.
Ana Victória Ladeira works with Knowledge Management at the ESA, using automated methods to exploit the information contained in the piles and piles of documents that ESA generates every day. With a Masters degree in Data Science from Maastricht University, Ana is particularly excited about how NLP methods can help large organizations connect different documents and highlight the bigger picture over a big universe of data sources, as well as using Knowledge Graphs to help connect people to the expertise and information they need.
Talk Summary:
State-of-the-art AI approaches can struggle to create solutions which provide accurate results that stand the test of time. They are also plagued by problems such as bias and a lack of explainability. Causal AI addresses these key problems and is at the center of the Geminos Causeway platform, which is built on TypeDB.
This webinar will give you an introduction to why causal AI is so important, and how you can start to use it to drive more value for your organisation.
Speaker: Stuart Frost
Stu is the CEO and founder of Geminos, whose focus is on building AI-driven solutions for mid-sized Smart Manufacturing and Logistics companies that are frustrated by their inability to digitalize their operations at a sensible cost. Stu has 30 years' experience in founding and leading successful data management and analytics startups, starting at 26 when he founded SELECT Software Tools and led the company to a NASDAQ IPO in 1996. He then founded DATAllegro in 2003, which was acquired by Microsoft.
Building a Cyber Threat Intelligence Knowledge Graph - Vaticle
Knowledge of cyber threats is a key focus in cyber security. In this talk, we present TypeDB CTI, an open-source threat intelligence platform to store and manage such knowledge. It enables cyber threat intelligence (CTI) professionals to bring their disparate CTI information together into one platform, making it easier to manage the data and discover new insights about cyber threats.
We will describe how we use TypeDB to represent STIX 2.1, the most widely used language and serialization format for exchanging cyber threat intelligence. We cover how we leverage TypeDB's modelling constructs, such as type hierarchies, nested relations, hyper-relations, unique attributes, and logical inference, to build this threat intelligence platform.
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cyber security and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
Knowledge Graphs for Supply Chain Operations - Vaticle
Agility in supply chain operations has never been so important, especially in today's nonlinear and complex world. That is why companies with supply chains need knowledge graphs.
So how do enterprises unleash the power of their own supply chain data to make smarter decisions? This is where bops comes into play. Bops activates supply chain data from existing operating systems (ERPs, POS, OMS, etc.), simplifying how operators optimize working capital in every decision.
In this session, bops will showcase a few use cases that portray the power of a knowledge graph to represent a supply chain network composed of an end to end product flow driven by actions among plants, customers and suppliers.
Supply chain operations visibility:
- Story of a Product and an SKU: from raw material to finished goods, track & trace and bill-of-materials deviations
- Story of a Supplier – risk assessments – “the most influential supplier”
- Story of a Process – anomaly detection – “what went wrong?”
Join us for a lively discussion to learn how using knowledge graphs is already helping supply chain companies to better collect, unify, and activate their data.
Speaker: Jorge Risquez
Jorge is the Co-founder and CEO of bops, a headless supply chain intelligence platform helping manufacturers and distributors source, make, and deliver their products, and unlock working capital. Previously, Jorge spent a decade as a Supply Chain Consultant for Deloitte, where he worked with Fortune 500 companies such as Tyson and Cargill. In his spare time, he enjoys going for a run in Central Park and spending time with family and friends.
Building a Distributed Database with Raft - Vaticle
Applications running in production have much higher requirements: not only do they need to be correct, they also need to be "always-on", handle a much bigger user load, and be secure.
Meet TypeDB Cluster, the TypeDB database for production scale, built using the Raft consensus algorithm. Join us for a walk through the underlying architecture and the value it brings to developers running an application at scale.
Speaker: Ganeshwara Henanda
Ganesh leads the development of TypeDB Cluster while also managing other aspects such as infrastructure and project management. His day-to-day work involves building concurrent and distributed algorithms such as Raft and the actor model.
He graduated with an MSc in Grid Computing from the University of Amsterdam, and has built several large-scale distributed and real-time systems throughout his career.
Enabling the Computational Future of Biology - Vaticle
Computational biology has revolutionised biomedicine, and the volume of data it generates is growing exponentially. This requires tools that enable computational and non-computational biologists to collaborate and derive meaningful insights. However, traditional systems cannot accurately model and handle data at this scale and complexity.
In this talk, we discuss how TypeDB enables biologists to build a deeper understanding of life, and increase the probability of groundbreaking discoveries, across the life sciences.
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cybersecurity and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
Build your skills and learn how TypeDB's native inference engine works.
Good for:
- Beginners to TypeDB and TypeQL
- Those who have been using TypeDB and want a refresher on inference in TypeDB
- Experienced software engineers
- Those who want to better represent their domain in a model that allows for logical reasoning at the database level
Description:
TypeDB is capable of reasoning over data via pre-defined rules. TypeQL rules look for a given pattern in the database and, when it is found, infer the given queryable fact. The inference provided by rules is performed at query (run) time. Rules not only allow us to shorten and simplify commonly-used queries, but also enable knowledge discovery and the implementation of business logic at the database level.
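As a minimal sketch of the pattern-then-infer mechanic described above, here is a rule making a hypothetical location-hierarchy relation transitive (the type names are illustrative and not tied to any dataset from the session):

```typeql
define
place sub entity,
    plays location-hierarchy:subordinate,
    plays location-hierarchy:superior;
location-hierarchy sub relation,
    relates subordinate,
    relates superior;

# If x is inside y, and y is inside z, then x is inside z.
rule transitive-location:
when {
    (subordinate: $x, superior: $y) isa location-hierarchy;
    (subordinate: $y, superior: $z) isa location-hierarchy;
} then {
    (subordinate: $x, superior: $z) isa location-hierarchy;
};
```

Once the rule is defined, any match query over location-hierarchy can return the inferred transitive relations at query time, without those facts ever being written to storage.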
Takeaways:
- Understanding of fundamental components of TypeDB's inference engine and how to write rules for your domain
- Write at least 1 rule for your use case
- Utilise the rule you wrote in a query
Tomás Sabat:
Tomás is the Chief Operating Officer at Vaticle, dedicated to building a strongly-typed database for intelligent systems. He works directly with TypeDB's open source and enterprise users so they can fulfil their potential with TypeDB and change the world. He focuses mainly on life sciences, cyber security, finance and robotics.
Join the TypeDB community to learn how we think about data modelling, and how TypeDB's expressivity allows you to model your domain based on logical and object-oriented programming principles.
Good for:
- Engineers, scientists, and technical executives
- Those in a technical field working with complex datasets, and building intelligent systems
- Anyone curious to learn about the expressive power of TypeDB's data model
Description:
We open this training with an exploration of what a schema looks like in TypeDB, starting by clarifying the motivation for the conceptual model in TypeDB and its relationship to the Enhanced Entity-Relationship model.
Then we break things down a bit more philosophically, delving into what it means to represent data in TypeDB, and how TypeDB allows you to think at a higher level, as opposed to thinking in join tables, columns, documents, vertices, edges, and properties.
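To make the entity/relation/attribute vocabulary concrete, here is a small hypothetical TypeQL schema sketch (the type names are illustrative, not from any dataset used in the training):

```typeql
define
full-name sub attribute, value string;
company-name sub attribute, value string;
start-date sub attribute, value datetime;

person sub entity,
    owns full-name,
    plays employment:employee;
company sub entity,
    owns company-name,
    plays employment:employer;

# A relation is a first-class type: it relates roles and can own attributes.
employment sub relation,
    relates employee,
    relates employer,
    owns start-date;
```

Note how the model is stated in domain terms (person, company, employment) rather than in terms of join tables or edges.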
Takeaways:
- Be able to articulate why TypeDB's data model is so beneficial for complex data, and why we use it to build intelligent systems
- Write a TypeDB schema in TypeQL
- Practice modelling one of your own domains
Tomás Sabat:
Tomás is the Chief Operating Officer at Vaticle, dedicated to building a strongly-typed database for intelligent systems. He works directly with TypeDB's open source and enterprise users so they can fulfil their potential with TypeDB and change the world. He focuses mainly on life sciences, cyber security, finance and robotics.
Using SQL to query relational databases is easy. As a declarative language, it makes it straightforward to write queries and build powerful applications. However, relational databases struggle when working with complex data; in SQL, the challenges arise especially in modelling and querying that data.
For example, the large number of necessary JOINs forces us to write long and verbose queries. Such queries are difficult to write and prone to mistakes.
TypeQL is the query language used in TypeDB: just as SQL is the standard query language of relational databases, TypeQL is TypeDB's. It is a declarative language, and allows us to model, query and reason over our data.
In this talk, we will look at how TypeQL compares to SQL. Why and when should you use TypeQL over SQL? How do we do outer/inner joins in TypeQL? We'll look at the common concepts, but mostly talk about the differences between the two.
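As a rough flavour of the difference, assuming a hypothetical model with person and company entities linked by an employment relation (none of these names come from the talk itself):

```typeql
# The SQL equivalent needs explicit joins over foreign keys, e.g.:
#   SELECT p.full_name
#   FROM persons p
#   JOIN employments e ON e.person_id = p.id
#   JOIN companies c ON c.id = e.company_id
#   WHERE c.name = 'Vaticle';
# In TypeQL the relation is traversed directly, with no join keys:
match
    $p isa person, has full-name $n;
    $c isa company, has name "Vaticle";
    (employee: $p, employer: $c) isa employment;
get $n;
```

The join logic lives in the schema's employment relation rather than being restated in every query.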
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cybersecurity and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
TypeDB Academy: Getting Started with Schema Design - Vaticle
In this TypeDB Academy, we start by gaining an understanding of the fundamental components of TypeDB's type system and what makes it unique. We will see how we can download, install, and run TypeDB, and learn to perform basic database operations.
We'll then explore what a schema looks like in TypeDB, starting with clarifying the motivation for schema, the conceptual schema of TypeDB, and its relationship to the Enhanced Entity-Relationship model.
Good for:
- Beginners to TypeDB and TypeQL
- Those who have been using TypeDB and want a refresher on schema and TypeQL
- Experienced database administrators and software engineers
Takeaways:
- Understanding of fundamental components of TypeDB
- How to download, install, and run TypeDB on your computer
- Be able to articulate why schema is so beneficial when using TypeDB, why we use one, and how it enables a more expressive model
- Write a TypeDB schema in TypeQL
Comparing Semantic Web Technologies to TypeDB - Vaticle
Semantic Web technologies enable us to represent and query very complex and heterogeneous datasets. We can add semantics and reason over large bodies of data on the web. However, despite the wealth of educational material available, these technologies have failed to achieve mass adoption outside academia.
TypeDB works at a higher level of abstraction and enables developers to be more productive when working with complex data. TypeDB is easier to learn, reducing the barrier to entry and enabling more developers to access semantic technologies. Instead of using a myriad of standards and technologies, we just use one language - TypeQL.
In this talk we will:
- look at how TypeQL compares to Semantic Web standards, specifically RDF, SPARQL, RDFS, OWL and SHACL.
- cover questions such as, how do we represent hyper-relations in TypeDB? How does one use rdfs:domain and rdfs:range in TypeDB? And how do the modelling philosophies compare?
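As an illustration of the hyper-relation question, a ternary relation that RDF's binary triples would have to reify into several statements can be declared directly as one relation in TypeQL (hypothetical type names, for illustration only):

```typeql
define
supplier sub entity, plays contract:provider;
client sub entity, plays contract:customer;
product sub entity, plays contract:good;

# One n-ary relation instead of a reified node plus three triples.
contract sub relation,
    relates provider,
    relates customer,
    relates good;
```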
Speaker: Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cyber security and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
How might we utilise an actor-based execution model to build a powerful yet elegant reasoning engine?
Actors form an asynchronous, inherently parallel framework that is the basis of some of the most computationally heavy systems in the world. By leveraging this in an event-driven model, we can build an execution engine that makes efficient use of all available hardware resources to answer your reasoning queries.
We'll visit the key ideas behind actors, and then walk through how we break reasoning into neat, actor-sized building blocks. As we do this, it will become clear how our marriage of reasoning and actors naturally produces a scalable and elegant execution engine. By examining the problem of reasoning from an actor-based lens, we'll be able to better understand the complexities of reasoning and visualise bottlenecks and optimisations.
Intro to TypeDB and TypeQL | A strongly-typed database - Vaticle
TypeDB is a strongly-typed database. It provides a rich and logical type system which breaks down complex problems into meaningful and logical systems, using TypeQL as its query language.
TypeDB allows you to model your domain based on logical and object-oriented principles. Composed of entity, relationship, and attribute types, as well as type hierarchies, roles, and rules, TypeDB allows you to think higher-level, as opposed to join-tables, columns, documents, vertices, and edges.
Types describe the logical structures of your data, allowing TypeDB to validate that your code inserts and queries data correctly. Query validation goes beyond static type-checking, and includes logical validation that rejects meaningless queries. With strict type-checking errors, you have a dataset that you can trust.
Finally, TypeDB encodes your data for logical interpretation by its reasoning engine. It enables type-inference and rule-inference, which create logical abstractions of data. This allows for the discovery of facts and patterns that would otherwise be too hard to find.
With these abstractions, queries in the tens to hundreds of lines in SQL or NoSQL databases can be written in just a few lines in TypeQL – collapsing code complexity by orders of magnitude.
Join Tomás from the Vaticle team where he'll discuss the origins of TypeDB, the impetus for inventing a new query language, TypeQL, and why we are so excited about the future of software and intelligent systems.
Tomás Sabat:
Tomás is the Chief Operating Officer at Vaticle, dedicated to building a strongly-typed database for intelligent systems. He works directly with TypeDB's open source and enterprise users so they can fulfil their potential with TypeDB and change the world. He focuses mainly on life sciences, cyber security, finance and robotics.
Graph Databases vs TypeDB | What you can't do with graphs - Vaticle
Developing with graph databases has a number of challenges, such as the modelling of complex schemas, and maintaining data consistency in your database.
In this talk, we discuss how TypeDB addresses these challenges, as well as how it compares to property graph databases. We’ll look at how to read and write data, how to model complex domains, and TypeDB’s ability to infer new data.
The main differences between TypeDB and graph databases can be summarised as:
1. TypeDB provides a concept-level schema with a type system that fully implements the Entity-Relationship (ER) model. Graph databases, on the other hand, use vertices and edges without integrity constraints imposed in the form of a schema
2. TypeDB contains a built-in inference engine - graph databases don’t provide native inferencing capabilities
3. TypeDB is an abstraction over a graph, and leverages a graph database under the hood to create a higher-level model, while graph databases work at different levels of abstraction
Tomás Sabat
Tomás is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cyber security and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
In this seminar we use TypeDB to open a window on the Pandora Papers, a massive 'data tsunami' based on 11.9 million leaked source documents obtained by the International Consortium of Investigative Journalists (ICIJ).
We will use an automated query builder to get an initial set of results, and then hop from node to node, exploring neighbours and mapping out a suspicious-looking network of offshore shell companies, officers and intermediaries.
Speaker: Jon Thompson
Jon has an MSc in Applied Mathematics and has worked for several years as a Data Scientist in high-throughput biological sequencing. He is the founder of Nodelab, which is on a mission to provide a fully-featured graphical user interface experience for TypeDB.
Heterogeneous data holds significant inherent context. We would like our machine learning models to understand this context, and to utilise this ancillary but critical information to improve the accuracy and versatility of our models.
How can we systematically make use of context in Machine Learning?
We delve in and investigate the knowledge modelling techniques which, applied with the right ML strategies, give us a promising approach for robustly handling heterogeneous data in large knowledge models. We aim to do this in a way that allows us to build any machine learning model, including graph learning models like our KGCN.
Speaker: James Fletcher, Vaticle
James comes from a background in Computer Vision, specialising in automated diagnostics. As Principal Scientist at Vaticle, his mission is to demonstrate to the world how the traditional symbolic approaches to AI built into TypeDB can be combined with present-day research in machine learning.
Building a Cyber Threat Intelligence Knowledge Management System (Paris August 2019)
1. Building a management system for cyber threat intelligence knowledge using OpenCTI
Paris Meetup, 27th August 2019
2. SPEAKERS
@SamuelHassine Samuel Hassine, Head of Cyber Threat Intelligence @ ANSSI (ssi.gouv.fr)
@richardjulien Julien Richard, VP of Engineering @ YOOI (yooi.com)
Co-founders of Luatix (luatix.org): openex.io, opencti.io
3. THE STORY
WHY OPENCTI: WHY WE/YOU NEED A SOFTWARE LIKE THIS
HOW TO START THE PROJECT: LOOKING FOR A STANDARD AND FIRST IMPLEMENTATION
YOU CANNOT DEVELOP ALONE: EVERYTHING HAS A PRICE, PERFECTION DOES NOT EXIST
HOW GRAKN EMPOWERS OPENCTI: HEY SAM, YOUR MODEL LOOKS LIKE A GRAPH, DOESN'T IT?
READY TO USE GRAKN FOR YOUR PROJECT? DECISION CRITERIA, LEARN AND DISCOVER THE POWER
WHAT'S NEXT FOR OPENCTI: IT'S JUST THE BEGINNING, WE WANT TO DO MORE
4. WHY OPENCTI
Provide knowledge about threat actors of interest:
- Intelligence for partner CTI teams
- Indicators and signatures for SOC teams
- Tactics, techniques and procedures for DFIR teams
- Behaviors to help prioritize EDR and IDS development roadmaps
- Red team scenarios for hackers and pentester teams
Daily work of a CTI analyst:
- Investigate adversary behaviors, arsenals and infrastructures
- Pivot on technical elements
- Correlate behaviors and find patterns
< KNOWLEDGE REQUIRES KNOWLEDGE MANAGEMENT >
6. WHY OPENCTI
Knowledge issues to solve, from a strategic level...
- Victimology of an intrusion set or a threat actor over time.
- Tactics and procedures of a campaign targeting a specific sector.
- Reuse of legitimate tools in malicious code families.
- Campaigns targeting an organization or sector over time.
...to an operational one:
- Observables linked to a specific threat and their evolution over time.
- Clusters of malicious artefacts and enrichment (hosters, registrars, etc.).
< NEEDED ANSWERS >
7. WHY OPENCTI
Today in the CTI world, long live unstructured data!
[Diagram] Partners, vendors, OSINT and SIGINT all arrive announcing "We have intel!"; the CTI analyst enriches and investigates this input to produce intelligence.
< CTI ANALYSTS ARE NOT LIBRARIANS >
8. WHY OPENCTI
A complex role in a complex workflow
Cyber threat intelligence is not doomed to be only the main data source of the security detection chain, associated with people who produce reports that may be read.
9. HOW TO START THE PROJECT
Functional needs. According to this context, we need:
- Structured and organized storage of information related to cyber threats
- Unified data space between all levels of information, from operational to strategic
- Traceability of the source of all capitalized information
- Viewing, sharing and correlation features
10. HOW TO START THE PROJECT
From unstructured to structured: think the data model!
STIX2 is the most accurate data model that currently exists to store cyber threat intelligence knowledge.
Storage of TTPs can be done with any framework (ATT&CK, KillChain, NSA, custom, etc.). A connector fully integrates Enterprise ATT&CK and Pre-ATT&CK.
https://oasis-open.github.io/cti-documentation/stix/intro
https://attack.mitre.org
< STIX2 RULES >
11. HOW TO START THE PROJECT
STIX2 in a nutshell
{
  "id": "intrusion-set--bef4c620-0787-42a8-a96d-b7eb6e85917c",
  "type": "intrusion-set",
  "name": "APT28",
  "aliases": [
    "APT28",
    "Sednit",
    "Sofacy",
    "Fancy Bear"
  ],
  "description": "APT28 is a threat group that has been attributed to Russia's Main Intelligence Directorate of the Russian General Staff by a July 2018 U.S. Department of Justice indictment.",
  "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
  "modified": "2019-07-27T00:09:33.254Z",
  "created": "2017-05-31T21:31:48.664Z",
  "object_marking_refs": [
    "marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
  ]
}
STIX2 is composed of entities, embedded relationships and relationships. The JSON above is an Intrusion Set entity; created_by_ref and object_marking_refs are embedded relations, while relations such as "uses" are full relationship objects.
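An embedded relation is just a property holding another object's id, so resolving it is a dictionary lookup. A minimal stdlib sketch (the in-memory `store` and the identity's name are illustrative, not from the slide):

```python
import json

# A trimmed STIX2 intrusion-set, as on the slide; "created_by_ref" is an
# embedded relationship: a plain property holding another object's id.
intrusion_set = json.loads("""
{
  "id": "intrusion-set--bef4c620-0787-42a8-a96d-b7eb6e85917c",
  "type": "intrusion-set",
  "name": "APT28",
  "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5"
}
""")

# Hypothetical in-memory object store, keyed by STIX id.
store = {
    "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5": {
        "type": "identity",
        "name": "Example Org",  # placeholder name, not from the slide
    }
}

def resolve_embedded(obj, ref_field, store):
    """Follow an embedded *_ref property to the object it points at."""
    return store.get(obj[ref_field])

creator = resolve_embedded(intrusion_set, "created_by_ref", store)
print(creator["type"])  # identity
```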
12. HOW TO START THE PROJECT
STIX2 in a nutshell
{
  "id": "malware--af2ad3b7-ab6a-4807-91fd-51bcaff9acbb",
  "type": "malware",
  "name": "USBStealer",
  "description": "USBStealer is malware that has been used by APT28 since at least 2005 to extract information from air-gapped networks.",
  "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
  "modified": "2018-10-17T00:14:20.652Z",
  "created": "2017-05-31T21:33:17.716Z",
  "object_marking_refs": [
    "marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
  ]
}
{
  "id": "relationship--d26b3aeb-972f-471e-ab59-dc1ee2aa532e",
  "type": "relationship",
  "relationship_type": "uses",
  "description": "APT28 uses USBStealer.",
  "source_ref": "intrusion-set--bef4c620-0787-42a8-a96d-b7eb6e85917c",
  "target_ref": "malware--af2ad3b7-ab6a-4807-91fd-51bcaff9acbb",
  "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
  "modified": "2019-07-27T00:09:36.949Z",
  "created": "2017-05-31T21:33:27.041Z",
  "object_marking_refs": [
    "marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168"
  ]
}
The first object is a Malware entity; the second is a Relationship object linking the Intrusion Set to the Malware through its embedded source_ref and target_ref. Both also carry embedded relations (created_by_ref, object_marking_refs).
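Because relationships are objects of their own, building a graph from STIX2 data is a matter of splitting objects into nodes and edges. A sketch, with ids trimmed for readability:

```python
# Minimal sketch: turning STIX2 relationship objects into graph edges.
# Objects are keyed by id; a "relationship" object contributes an edge,
# every other object contributes a node. Ids are trimmed for readability.
objects = [
    {"id": "intrusion-set--bef4", "type": "intrusion-set", "name": "APT28"},
    {"id": "malware--af2a", "type": "malware", "name": "USBStealer"},
    {"id": "relationship--d26b", "type": "relationship",
     "relationship_type": "uses",
     "source_ref": "intrusion-set--bef4", "target_ref": "malware--af2a"},
]

nodes = {o["id"]: o for o in objects if o["type"] != "relationship"}
edges = [(o["source_ref"], o["relationship_type"], o["target_ref"])
         for o in objects if o["type"] == "relationship"]

src, rel, dst = edges[0]
print(f"{nodes[src]['name']} {rel} {nodes[dst]['name']}")  # APT28 uses USBStealer
```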
13. HOW TO START THE PROJECT
Let’s start something with MongoDB
Why
Community work on importing ATT&CK STIX2 data into MongoDB
Support for embedded documents within a document (created_by_ref, object_marking_refs, etc.)
Simple import of JSON documents
Interesting query language and full-text search feature
Proof of concept
How
REST API architecture, backend in PHP / Symfony and frontend in ReactJS
Home-made Symfony library to handle STIX2 objects and relationships in MongoDB
Relationships are documents with an embedded link to the source and target documents
< CODING…. >
https://www.mongodb.com
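The "relationships as documents with embedded links" pattern can be sketched without MongoDB itself; the dicts below are stdlib stand-ins for the collections, with trimmed illustrative ids:

```python
# The proof of concept stored relationships as their own documents with
# embedded links ("source" / "target" ids) pointing at entity documents.
# Stdlib stand-ins for the two MongoDB collections (ids are illustrative):
entities = {
    "intrusion-set--bef4": {"type": "intrusion-set", "name": "APT28"},
    "malware--af2a": {"type": "malware", "name": "USBStealer"},
}
relationships = [
    {"relationship_type": "uses",
     "source": "intrusion-set--bef4", "target": "malware--af2a"},
]

def expand(rel):
    """Application-side join: hydrate the linked entity documents."""
    return {**rel,
            "source": entities[rel["source"]],
            "target": entities[rel["target"]]}

print(expand(relationships[0])["source"]["name"])  # APT28
```

The trade-off this exposes: every traversal is an application-side join, which is exactly what pushed the project toward graph databases on the next slides.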
14. HOW TO START THE PROJECT
Proof of concept
Create a new identity with
the corresponding document
class.
Use Symfony forms to create
entities and store them.
< CALL JULIEN >
15. HOW TO START THE PROJECT
Proof of concept
Does this approach suit you?
Seems really cool, but are you sure about the model and the usage of the database? By the way, what do you think if we try a contract-based API backend using GraphQL (https://graphql.org) and Relay (https://relay.dev), and real-time collaboration powered by the addition of Redis (https://redis.io)?
16. NO ONE SUCCEEDS ALONE
STIX2 looks like a graph model
Search “Graph database” in your favorite search engine:
Neo4j sounds like the #1 choice
< LET’S GIVE IT A TRY >
First try
https://freetaxii.github.io/stix2-object-relationships.html
https://neo4j.com
17. NO ONE SUCCEEDS ALONE
REPORT
Looks like a good idea until “the report use case”.
Do it on top of Neo4j or JanusGraph? I’m too old for this…
< SO WHAT? FORGET SOMETHING >
https://janusgraph.org
{
  "id": "report--bef4c621-0787-42a8-a96d-b7eb6e85917c",
  "type": "report",
  "name": "APT28 is using USBStealer since 2012!",
  "description": "APT28 is using USBStealer in a new campaign.",
  "created_by_ref": "identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5",
  "published": "2019-07-27T00:09:33.254Z",
  "created": "2017-05-31T21:31:48.664Z",
  "object_refs": [
    "intrusion-set--bef4c620-0787-42a8-a96d-b7eb6e85917c",
    "malware--af2ad3b7-ab6a-4807-91fd-51bcaff9acbb",
    "relationship--d26b3aeb-972f-471e-ab59-dc1ee2aa532e"
  ]
}
The report is about APT28 and USBStealer, but it is mostly about the relationship between the two entities!
We need a solution for this kind of nested relations…
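The problem shows up as soon as you resolve object_refs: one of the referenced ids is a relationship, so an edge must be addressable exactly like a node. A sketch with trimmed ids:

```python
# Why "the report use case" breaks a plain property graph: object_refs may
# point at a *relationship*, so an edge must itself be referenceable like a
# node. A sketch of the resolution step (ids trimmed for readability):
store = {
    "intrusion-set--bef4": {"type": "intrusion-set", "name": "APT28"},
    "malware--af2a": {"type": "malware", "name": "USBStealer"},
    "relationship--d26b": {"type": "relationship", "relationship_type": "uses",
                           "source_ref": "intrusion-set--bef4",
                           "target_ref": "malware--af2a"},
}
report = {"type": "report", "object_refs": list(store)}

referenced = [store[ref] for ref in report["object_refs"]]
kinds = sorted(o["type"] for o in referenced)
print(kinds)  # ['intrusion-set', 'malware', 'relationship']
```

In a classic property graph, edges have no identity you can point an object_ref at; hypergraphs (and Grakn's relations) do, which is where the talk goes next.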
18. NO ONE SUCCEEDS ALONE
This is not a graph… this is a hypergraph
AtomSpace
https://github.com/opencog/atomspace
http://hypergraphdb.org
https://www.datachemist.com
https://grakn.ai
< LET’S GIVE GRAKN.AI A TRY… AGAIN >
https://en.wikipedia.org/wiki/Hypergraph
19. NO ONE SUCCEEDS ALONE
Hey Samuel! Let’s go with Grakn!
< A RISK WE SHOULD TAKE >
DUDE… REALLY?
Open source / GitHub
First release in 2016
Active community
Good documentation
Still little-known, but with the feeling that Grakn will solve our future requirements.
Grakn overview
20. OPENCTI ARCHITECTURE
Applications and databases
Frontend: the user interface
API: the entry point for the outside world
Database: main storage
Indexing: storage to speed up lists and search
Messaging system: push data, subscriptions
Workers: consume messages
Connectors: background jobs such as importing, exporting, etc.
21. OPENCTI ARCHITECTURE
GraphQL response:
{
  "reports": {
    "edges": [
      {
        "node": {
          "name": "2019-01-21: APT28 Autoit Zebrocy Progression",
          "published": "2019-01-21T00:00:00Z",
          "createdByRef": {
            "node": {
              "name": "VK-Intel",
              ...
            }
          }
        }
      }, ...
ES query:
ElasticSearch.get(index: "stix_domain_entities", identifier)
Graql query:
match $r isa Report;
$rel(creator: $x, so: $r) isa created_by_ref; $x has name $o;
get $r, $rel, $o; sort $o asc;
offset 0; limit 25;
query ReportsLinesPaginationQuery
($objectId: String ... $orderBy: ReportsOrdering) {
reports(objectId: $objectId, ... orderBy: $orderBy) {
edges {
node {
id
name
object_status
published
createdByRef {
node {
name
id
}
}
markingDefinitions {
edges {
node {
id
definition
}
}
}
}
}
}
}
One GraphQL query is resolved into one Graql query plus several ElasticSearch queries; the GraphQL response is “just” the ordered list of reports by author.
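The resolver pattern behind these queries can be sketched in plain Python: an index answers the ordering and pagination (as ElasticSearch does on the slide), and the main store is then hydrated by id. Store contents and the second author name are illustrative stand-ins:

```python
# Query-then-hydrate, simplified. The index handles sorting/pagination
# (like "sort $o asc; offset 0; limit 25;"); the main storage is then
# fetched by id. Report names and the "ANSSI" author are illustrative.
reports = {  # main storage, keyed by id
    "r1": {"name": "Report A", "createdByRef": "VK-Intel"},
    "r2": {"name": "Report B", "createdByRef": "ANSSI"},
}
index = [  # index entries used only for ordering
    {"id": "r1", "author": "VK-Intel"},
    {"id": "r2", "author": "ANSSI"},
]

def reports_by_author(offset=0, limit=25):
    ordered = sorted(index, key=lambda e: e["author"])[offset:offset + limit]
    return [reports[e["id"]] for e in ordered]  # hydrate from main storage

print([r["name"] for r in reports_by_author()])  # ['Report B', 'Report A']
```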
23. HOW GRAKN EMPOWERS OPENCTI
By enforcing the data model in Grakn, we discovered many useful features which saved us time and spared us new dependencies.
What we loved:
Knowledge schema
Entities, abstract entities and sub entities
Nested relations
Logical inference of relations
Reasoning rules language
Inferred relations computed at runtime
< GRAKN ENABLES NEW OPENCTI FEATURES >
24. HOW GRAKN EMPOWERS OPENCTI
Data model
Stix-Domain sub entity,
abstract,
has internal_id,
has stix_id,
has stix_label,
has created,
has modified,
has revoked,
plays so;

Stix-Domain-Entity sub Stix-Domain,
abstract,
has name,
has description,
has alias;

Intrusion-Set sub Stix-Domain-Entity,
has first_seen,
has last_seen,
has goal,
has sophistication,
has resource_level,
has primary_motivation,
has secondary_motivation,
plays attribution,
plays source,
plays user,
plays origin;
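The inheritance in that schema (abstract super-entities whose attributes flow down to concrete entities) maps directly onto class inheritance. A hedged Python mirror of the hierarchy above; field defaults are illustrative:

```python
# A plain-Python sketch of the Graql hierarchy: Stix-Domain and
# Stix-Domain-Entity are abstract, Intrusion-Set inherits their attributes.
from dataclasses import dataclass, field

@dataclass
class StixDomain:                      # abstract in the Graql schema
    internal_id: str
    stix_id: str
    created: str = ""
    modified: str = ""
    revoked: bool = False

@dataclass
class StixDomainEntity(StixDomain):    # abstract, adds naming attributes
    name: str = ""
    description: str = ""
    alias: list = field(default_factory=list)

@dataclass
class IntrusionSet(StixDomainEntity):  # concrete entity
    first_seen: str = ""
    last_seen: str = ""
    primary_motivation: str = ""

apt28 = IntrusionSet(internal_id="1", stix_id="intrusion-set--bef4",
                     name="APT28")
print(isinstance(apt28, StixDomain))  # True
```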
25. HOW GRAKN EMPOWERS OPENCTI
Nested relations
From a functional point of view, a flat "APT28 localized-in France" relation is not satisfactory; considering the whole knowledge graph, it is not correct. What we need to express is that APT28 targets Energy (Germany), Energy (France) and Industry (France): the localization attaches to the targets relation itself.
27. HOW GRAKN EMPOWERS OPENCTI
Logical inference of relations
Example: APT28 is attributed-to GRU and uses XTunnel, so "GRU uses XTunnel" is an inferred relation.
## USES RULES
AttributionUsesRule sub rule,
when {
(origin: $origin, attribution: $entity) isa attributed-to;
(user: $entity, usage: $object) isa uses;
}, then {
(user: $origin, usage: $object) isa uses;
};
Customizable inference rules directly in the UI are on the roadmap!
This is a very important feature since you can have complex use cases with multiple levels of inference, and use reasoning rules to make your data more meaningful.
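The rule's logic can be replayed in a few lines of plain Python: match the two premise relations, emit the inferred one. A sketch using the APT28 / GRU / XTunnel facts from the slide:

```python
# A stdlib re-implementation of AttributionUsesRule: if $entity is
# attributed-to $origin and $entity uses $object, infer "$origin uses $object".
attributed_to = [("APT28", "GRU")]   # (attribution entity, origin)
uses = {("APT28", "XTunnel")}        # (user, usage) — asserted facts

inferred = {(origin, obj)
            for entity, origin in attributed_to
            for user, obj in uses
            if user == entity}

print(sorted(inferred))  # [('GRU', 'XTunnel')]
```

In Grakn the same derivation happens at query time (inferred relations are computed at runtime, as the previous slide notes), rather than being materialized like this.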
28. HOW GRAKN EMPOWERS OPENCTI
Nested relations with inferences
APT28 targets Energy (localized in Germany and France) and Industry (localized in France). APT28 therefore has 2 inferred targets relations:
targets Germany, because it targets Energy in Germany
targets France, because it targets Energy or Industry in France
LocalizationOfTargetsRule sub rule,
when {
$rel(source: $entity, target: $target) isa targets;
(location: $location, localized: $rel) isa localization;
}, then {
(source: $entity, target: $location) isa targets;
};
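What makes this rule different from the previous one is that a premise attaches to a relation, not an entity: the localization points at the targets relation itself, so relations need identities of their own. A sketch with illustrative relation ids:

```python
# LocalizationOfTargetsRule in plain Python. The twist: the localization
# fact attaches to the *targets relation itself* (a nested relation), so
# relations carry their own ids here ("t1", "t2" are illustrative).
targets = [
    {"id": "t1", "source": "APT28", "target": "Energy"},
    {"id": "t2", "source": "APT28", "target": "Industry"},
]
localization = [  # (location, id of the localized targets relation)
    ("Germany", "t1"),
    ("France", "t1"),
    ("France", "t2"),
]

by_id = {t["id"]: t for t in targets}
inferred = {(by_id[rel_id]["source"], location)
            for location, rel_id in localization}

print(sorted(inferred))  # [('APT28', 'France'), ('APT28', 'Germany')]
```

The output matches the slide: two inferred targets relations, one to Germany and one to France.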
29. Ready for a ride?
https://demo.opencti.io
DEMONSTRATION
30. HOW GRAKN EMPOWERS OPENCTI
Query language
match $intrusionSet isa Intrusion-Set;
{$intrusionSet has name "APT28";} or {$intrusionSet has name "Turla";} or {$intrusionSet has name "FIN6";};
$attackPattern isa Attack-Pattern;
$relations($intrusionSet, $attackPattern) isa uses;
get;
31. GRAKN READY FOR PRODUCTION?
Is Grakn ready for complex production use cases? YES!
But today…
No full indexing
We need data indexing for ordering and filtering. The only way we found for now is to use ElasticSearch.
No database migration
Automatic data migration between major releases in case of a data structure upgrade is a must-have. We did it once and we definitely need a built-in solution.
https://www.elastic.co
32. GRAKN READY FOR PRODUCTION?
Lack of syntactic sugar
Grakn provides all the basic interfaces you need to make it work, but it would be very helpful to have more API around update/delete, a query builder, and simpler drivers for various languages.
Is Grakn ready for complex production use cases? YES!
But today…
Difficult inference explanation
Inference is really interesting for OpenCTI, so we already use it as much as possible. We need a simpler/better answer structure for the inference explanation. You should not hit this kind of problem until some advanced usage.
33. WHAT’S NEXT FOR OPENCTI
Powerful import/export and storage system
Allow users to interact with
nested relations
Medium term
Automatic data completion and
enrichment
Advanced analytics and visualization
34. Implement multiple levels of knowledge in
the same context
Use Grakn capabilities to add
further correlation features
Implement an investigation graph in the UI using
Grakn capabilities
WHAT’S NEXT FOR OPENCTI
Long term
35. WHAT’S NEXT FOR OPENCTI
Graph theory and ML
For investigation purposes
Compute the shortest path:
compute path from V229424, to V446496;
Find the most interesting instances:
compute centrality in [Intrusion-Set, Attack-Pattern, Sector], using degree;
Identify clusters:
compute cluster in [Intrusion-Set, Attack-Pattern, Sector], using connected-component;
For knowledge purposes
Extract named entities and relationships in context using NLP and ML
https://www.microsoft.com/security/blog/2019/08/08/from-unstructured-data-to-actionable-intelligence-using-machine-learning-for-threat-intelligence/
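The three compute queries can be approximated with stdlib Python on a toy knowledge graph (the entities and edges below are illustrative): BFS for the shortest path, neighbor count for degree centrality, and a flood fill for connected components.

```python
from collections import deque, defaultdict

# Toy undirected knowledge graph; names are illustrative.
edges = [("APT28", "XTunnel"), ("APT28", "Energy"), ("Turla", "Energy"),
         ("FIN6", "Retail")]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def shortest_path(src, dst):   # ~ compute path from ..., to ...;
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])

def degree(node):              # ~ compute centrality ..., using degree;
    return len(adj[node])

def components():              # ~ compute cluster ..., using connected-component;
    out, seen = [], set()
    for node in adj:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        out.append(frozenset(comp))
    return out

print(shortest_path("APT28", "Turla"))  # ['APT28', 'Energy', 'Turla']
print(degree("APT28"), len(components()))  # 2 2
```

Grakn runs these as distributed analytics over the whole keyspace; the sketch only shows what each query computes.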
36. Questions?
Thank you for your attention
github.com/OpenCTI-Platform
Released 2 months ago (2019-06-28)
samuel.hassine@luatix.org julien.richard@luatix.org
Join us on Slack
https://slack.luatix.org
Editor's Notes
Speakers introduction
As a CTI team within a CSIRT or a company that has critical assets to protect, we have multiple customers. Because we cannot do anything alone and only with our own data, we have to share our intelligence.
Produce indicators and signatures for Security Operations Center teams.
Provide the tactics, techniques and procedures used by intrusion sets of interest to the Digital Forensics and Incident Response team.
Help developers of Endpoint Detection and Response systems or intrusion detection systems prioritize their roadmaps according to the behaviors we observed in the field or during our investigations.
Finally, to make red team scenarios more realistic, we can provide TTPs and payloads we know about.
For all these activities and productions, the knowledge is the key focus of our CTI team. And knowledge needs knowledge management!
The knowledge produced by the CTI team is focused on adversaries that may target the organization or the CSIRT constituencies. This pyramid is one of the most comprehensive views of the different types of knowledge an organization has regarding its adversary. The higher you go in the pyramid, the more difficult it is for an adversary to change the piece of knowledge you have about him; but the higher you go, the more this knowledge is human-related. Changing one technique, tactic or procedure may be doable, but completely changing his behavior when moving laterally on a targeted information system is quite challenging.
Qualify
Read / analyze / consolidate all sources
Search indicators and patterns in the SOC logs
Investigate
Investigate implants & adversary infrastructures to complete the initial knowledge
Mix open source information with classified/internal and national partners data
Disseminate
Produce reports to assess campaigns, intrusion sets and threat actors
Joint investigations with national partners' capabilities and teams
International relations / private partnerships
Get feedback
Find new victims
Enhance the knowledge, get more indicators, investigate again
Hypergraphs generalise the common notion of graphs by relaxing the definition of edges. An edge in a graph is simply a pair of vertices. Instead, a hyperedge in a hypergraph is a set of vertices. Such sets of vertices can be further structured, following some additional restrictions involved in different possible definitions of hypergraphs.