Proper data modeling is probably the most underrated aspect of security data analysis. Our addiction to logs and string pattern matching as a primary source of knowledge has painted security practitioners into a corner. The data never tells the full story, and the path to discovery is laborious and painful.
We'll discover how graph-based ontologies can help consolidate all relevant information across technical verticals, model expert knowledge, and serve as a single source of knowledge. We'll discuss how semantic reasoning can revolutionize low-level data analysis and reduce 'zombie workflows' by automatically drawing hard logical conclusions the same way a human analyst does.
2. Agenda
Who Am I?
What Is This?
A Bit of Perspective
Semantic Technology
Things, not Strings
Power of Ontologies
Real Intelligence with Reasoning
New Opportunities
3. Who Am I
Anton Goncharov, CISSP
20+ years Information Security Management
12+ years Security Information Management
Semantic Reasoning battle scars
My passion: data tells a story
https://linkedin.com/in/securityservices/
https://twitter.com/ag0x00
4. What’s This About
A. Our current approach to InfoSec Management is broken
B. Semantic Reasoning can help
13. Semantic Technology
Capture and represent knowledge as a network of facts
Automatically make conclusions based on these facts
Fill gaps in information provided by data
Focus on problems of higher order
15. Brief Explanation of OWL and RDF
OWL2 is an ontology description language built on top of the Resource Description Framework (RDF). RDF describes a way of storing data that differs from the traditional table-based model.
RDF data consists of triples, and only triples. Each triple, called a statement, consists of a subject, a predicate, and an object. The subject represents a resource of some kind, the predicate a relation, and the object can be either a literal value or another resource.
OWL2 and the RDF Schema (RDFS) define a set of resources and properties that can be used to develop ontologies for RDF datasets.
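As a minimal sketch of the triple model described above, the plain-Python snippet below stores statements as (subject, predicate, object) tuples and answers wildcard pattern queries. The `ex:` resource names, the `Literal` wrapper, and the `match` helper are illustrative assumptions, not part of RDF or of any library.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A literal value, as opposed to a resource identifier."""
    value: str


class Triple(NamedTuple):
    subject: str                 # always a resource
    predicate: str               # the relation
    obj: Union[str, Literal]     # another resource or a literal value


# Hypothetical statements, for illustration only.
TRIPLES = [
    Triple("ex:MYLAPTOP", "rdf:type", "ex:Host"),
    Triple("ex:ACMEANTON", "rdf:type", "ex:Credential"),
    Triple("ex:ACMEANTON", "ex:existsOn", "ex:MYLAPTOP"),
    Triple("ex:ACMEANTON", "ex:hasFullName", Literal("Anton")),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

Calling `match(TRIPLES, subject="ex:ACMEANTON")` returns every statement about that credential, whether the object is another resource or a literal.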
18. Object Based Processing (OBP)
The same ‘username’ label (e.g. ‘root’) can refer to accounts on multiple hosts
Thinking about everything as objects and relationships between them is how we understand our world
Friend “George” vs an unknown friend who gambles, watches QVC, buys a fancy litter box, does not own a cat
Which one is better described?
19. Defining Uniqueness
Q: So how do we know if two “things” are the same?
A: It depends:
In some cases (e.g. an IP address), the label is enough: 127.0.0.1 is the same address, even though it might be used by every host.
In other cases, like a person’s name, the label is not enough.
Either look for a unique identifier (like an SSN or passport number), or
use a probabilistic approach and leverage known relationships.
For example, there’s probably only one John Smith who works in ‘Marketing’, out of the ‘London’ office, and reports to ‘Bill Baker’.
There are many ‘eth0’ network interfaces, but only one on your host.
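The decision procedure above can be sketched roughly as follows. This is an assumption-laden illustration: the record layout (`kind`, `label`, `uid`, `relations`), the `same_entity` function, and the shared-relationship threshold are all hypothetical, and the threshold is a crude stand-in for a real probabilistic model.

```python
def same_entity(a: dict, b: dict, min_shared: int = 2) -> bool:
    """Decide whether two records describe the same thing."""
    if a["kind"] != b["kind"] or a["label"] != b["label"]:
        return False
    if a["kind"] == "ip_address":
        # For some kinds of things the label alone is the identity.
        return True
    if a.get("uid") and b.get("uid"):
        # A unique identifier settles the question when both sides have one.
        return a["uid"] == b["uid"]
    # Otherwise lean on known relationships: enough shared
    # (predicate, object) pairs are treated as a match.
    shared = set(a.get("relations", ())) & set(b.get("relations", ()))
    return len(shared) >= min_shared


john_hr = {
    "kind": "person", "label": "John Smith",
    "relations": {("works in", "Marketing"), ("office", "London"),
                  ("reports to", "Bill Baker")},
}
john_badge = {
    "kind": "person", "label": "John Smith",
    "relations": {("works in", "Marketing"), ("reports to", "Bill Baker")},
}
john_other = {
    "kind": "person", "label": "John Smith",
    "relations": {("works in", "Finance"), ("office", "London")},
}
```

Here `john_hr` and `john_badge` share two relationships and are treated as one person, while `john_other` shares only the office and is kept separate.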
20. The Power of Language
Diagram: ‘ACMEANTON’ (credential) exists on ‘MYLAPTOP’ (host)
Subject, Predicate, Object = Vertex, Edge, Vertex
“I can use domain account to log in to my laptop.”
21. Event as a subgraph
Diagram: an authentication attempt drawn as a subgraph
Vertices: Attempted Authentication (event), ‘MYLAPTOP’ (host), ‘ACMEANTON’ (credential), ‘PRDSERVER’ (host), ‘ANTON’ (auth token), ’10.0.0.1’ (ip address)
Edge labels: exists on, initiated by, initiated from, uses (twice), attempted at
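A rough sketch of how such an event subgraph could be held as labeled edges and pivoted on. The transcript does not preserve which vertices each edge connects, so every endpoint pairing below is an assumption, as are the `auth-event-1` identifier and the `neighbors` helper.

```python
from collections import defaultdict

# (source, edge label, target). Endpoint pairings are assumed, not taken
# from the slide; "auth-event-1" is a made-up identifier for the event vertex.
EDGES = [
    ("ACMEANTON", "exists on", "MYLAPTOP"),
    ("auth-event-1", "initiated by", "ACMEANTON"),
    ("auth-event-1", "initiated from", "MYLAPTOP"),
    ("auth-event-1", "attempted at", "PRDSERVER"),
    ("auth-event-1", "uses", "ANTON"),
    ("MYLAPTOP", "uses", "10.0.0.1"),
]


def neighbors(edges, vertex):
    """Return {edge label: set of vertices} one hop away, in either direction."""
    found = defaultdict(set)
    for src, label, dst in edges:
        if src == vertex:
            found[label].add(dst)
        elif dst == vertex:
            # Incoming edges are reported under an explicit inverse label.
            found[f"inverse of {label}"].add(src)
    return dict(found)
```

Pivoting on `MYLAPTOP` with `neighbors(EDGES, "MYLAPTOP")` surfaces the credential that exists on it, the event initiated from it, and the address it uses, without any string matching against raw logs.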
23. Why Use Ontology
Federates data in a common language: search across domains
Facilitates reasoning: automate low-level data analysis
Supports “analytic pivoting”: answer questions you didn’t realize you had
Chains attack evidence: find parts of other attacks
24. Some useful concepts
1. Inheritance
2. Reverse edges
3. Axioms (reasoning rules)
Diagram 1 (inheritance): class Person (DoB, DL #); subclass Employee (DoB, DL #, Employee ID)
Diagram 2 (reverse edges): Host belongs to Domain; Domain contains Host
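The three concepts can be illustrated with a tiny forward-chaining loop. This is a sketch under stated assumptions, not how an OWL reasoner is implemented: the `is a` predicate, the two lookup tables, and the `infer` function are hypothetical names, and only two hard-coded rules are applied.

```python
# Inheritance: every Employee is also a Person.
SUBCLASS_OF = {"Employee": "Person"}

# Reverse edges: each predicate maps to its inverse.
INVERSE_OF = {"belongs to": "contains", "contains": "belongs to"}


def infer(facts):
    """Apply the two rules repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        derived = set()
        for subject, predicate, obj in facts:
            # Axiom 1: membership in a subclass implies membership in its parent.
            if predicate == "is a" and obj in SUBCLASS_OF:
                derived.add((subject, "is a", SUBCLASS_OF[obj]))
            # Axiom 2: an edge implies its reverse edge.
            if predicate in INVERSE_OF:
                derived.add((obj, INVERSE_OF[predicate], subject))
        if derived <= facts:
            return facts
        facts |= derived


known = {
    ("anton", "is a", "Employee"),
    ("MYLAPTOP", "belongs to", "ACME"),
}
closure = infer(known)
```

After the loop, `closure` also contains `("anton", "is a", "Person")` and `("ACME", "contains", "MYLAPTOP")`, facts that were never stated in the input data.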
26. OWL Example
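# Standard W3C prefixes. The default ":" namespace is a placeholder, not the ontology's actual IRI.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://example.org/ontology#> .

# A class of things: user accounts.
:UserAccount a owl:Class ;
    rdfs:comment "an individual set of credentials."@en ;
    rdfs:label "User Account"@en ;
    rdfs:subClassOf owl:Thing .

# A property whose value is a literal (a string).
:hasFullName
    a owl:DatatypeProperty ;
    rdfs:comment "an extended name or description, used only for display purposes"@en ;
    rdfs:label "has full name"@en ;
    rdfs:domain :UserAccount ;
    rdfs:range xsd:string .

# A property whose value is another resource (a Group).
:memberOfGroup
    a owl:ObjectProperty ;
    rdfs:domain :UserAccount ;
    rdfs:label "is member of Group"@en ;
    rdfs:range :Group .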
Source: https://github.com/twosixlabs/icas-ontology
27. Existing Ontologies for Cybersecurity
By Mark Philpot:
https://github.com/daedafusion/cyber-ontology
Focuses on intelligence standards like CAPEC, STIX, CVE, etc.
Integrated Cyber Analysis System (ICAS), DARPA funded:
https://github.com/twosixlabs/icas-ontology
Healthy mix of intelligence feeds and internal environment objects
MITRE DFAX:
https://www.sciencedirect.com/science/article/pii/S1742287615000158
Built around CybOX
Focused on digital forensic investigations
32. Opportunities
Contextual Analytics: analyze facts instead of raw data
Clustering: “this host looks and acts a lot like Oracle DB servers”
Outliers: “very unusual permissions for a salesperson compared to the rest of the Sales team”
Similar Subgraphs: “this subgraph is an attack; find other similar subgraphs”
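To make the outlier case concrete, here is one possible sketch that compares each team member's permission set with their peers using Jaccard similarity. The technique, the threshold of 0.4, the `outliers` function, and the sample data are all assumptions chosen for illustration; the slides do not prescribe a method.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def outliers(permissions: dict, threshold: float = 0.4) -> list:
    """Return members whose mean similarity to their peers is below threshold."""
    flagged = []
    for name, perms in permissions.items():
        peers = [p for other, p in permissions.items() if other != name]
        if not peers:
            continue
        mean_similarity = sum(jaccard(perms, p) for p in peers) / len(peers)
        if mean_similarity < threshold:
            flagged.append(name)
    return flagged


# Made-up permission sets for a hypothetical Sales team.
sales_team = {
    "alice": {"crm", "email"},
    "bob": {"crm", "email"},
    "carol": {"crm", "email", "calendar"},
    "dave": {"email", "prod-db-admin", "domain-admin"},
}
```

With this data, `outliers(sales_team)` flags only `dave`, whose permissions overlap little with the rest of the team.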
33. Words of caution
• RDFS struggles with dynamic and temporal facts
A. Stick to property graphs
B. Express states as objects
C. Make Edge a special Vertex class
D. Use GRAKN (https://grakn.ai)
• OWL ontologies can get complicated
• Use Turtle and visual editors (https://en.wikipedia.org/wiki/Ontology_(information_science)#Editors)
• Continuously check grammar and dependencies
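One reading of option B above (express states as objects) is to give a time-bound fact its own object with a validity interval instead of a bare edge. The sketch below assumes that reading; the `AddressAssignment` class, its fields, and the `hosts_for_ip` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AddressAssignment:
    """A state object: this host held this IP address during this interval."""
    host: str
    ip: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the state is still current

    def active_at(self, when: datetime) -> bool:
        # The interval is half-open: valid_from is included, valid_to is not.
        return self.valid_from <= when and (
            self.valid_to is None or when < self.valid_to
        )


def hosts_for_ip(states, ip, when):
    """Return the hosts that held `ip` at the moment `when`."""
    return [s.host for s in states if s.ip == ip and s.active_at(when)]


# Made-up history: the address moves from one host to another.
history = [
    AddressAssignment("MYLAPTOP", "10.0.0.1",
                      datetime(2024, 1, 1), datetime(2024, 1, 10)),
    AddressAssignment("PRDSERVER", "10.0.0.1", datetime(2024, 1, 10)),
]
```

A query such as `hosts_for_ip(history, "10.0.0.1", datetime(2024, 1, 5))` then resolves an address seen in an old event to the host that held it at that time rather than the host that holds it now.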
34. Parting Words
Semantic Web is not just for Google anymore
You’re always dealing with Things
Judicious data modeling
Automate low level analysis
Manage knowledge
Stay in touch
Source: https://gizmodo.com/this-google-dream-bot-inspired-artwork-is-mind-blowing-1761049728