Semantically-Enabled Digital Investigations

by Spyridon Dosis

Outline
• Problem
• Background
• Developed Method
• Demonstration
• Conclusions
2015-05-17 ISACA Dagen 2013

Problem Area
• Complex attacks against
networked systems
• Multiple data sources of possible
evidentiary value
– Volume & Variety
– “looking for a needle in a stack of
needles” – Paul Pillar, CIA CoA
• Analysis of the collected digital
data
– Least formalized process step
– Rely on investigators’ expertise and
experience

Digital Evidence / Investigations
• Reliable digital data that support
hypothesizing about a security
incident
• Sound methods for collecting and
interpreting digital data
• Reconstruct events found to be
criminal (DF)
• Investigate and learn from
information security breaches (IR)

Forensic Tools
• Interpreters between data
abstraction layers
– e.g. Reconstruct raw disk data into
filesystem hierarchy and objects (files,
directories)
• Evidence-centric rather than investigation-centric
design
• Limited tool interoperability
– Manual integration of tool findings
– Multiple (proprietary, undocumented)
data formats/models

A Digital Investigation Example

Semantic Web & Linked Data
Technologies
• “… information is given well-defined
meaning, better enabling computers
and people to work in cooperation” –
(Tim Berners-Lee, 2001)
• Ontology – “explicit and formal
specification of a conceptualization”
– Entities, attributes, relationships
• Metadata – Context-based or domain-specific
annotation of data
• Reasoning and inference of implicit facts

Semantic Web Architecture
• URI/IRI enables global data object
identification
• XML provides a machine-readable,
validatable data encoding scheme
• RDF(S) is a metadata data model and
knowledge representation language
– Subject-Property-Object/Value statements
– Class and Property hierarchies
• OWL 2 is a more expressive KR
language for specifying ontologies
– Restrictions, Equivalence, Cardinality,
Property Chains
• Rule and RDF-query languages
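
The subject-property-object statements and class hierarchies listed above can be illustrated with a minimal sketch in plain Python. It is not tied to any RDF library, and every identifier in the `ex:` namespace is made up for illustration. The loop applies one RDFS-style rule: an instance of a class is also an instance of each of its superclasses.

```python
# Triples are (subject, property, object) tuples; all "ex:" names are hypothetical.
triples = {
    ("ex:file1", "rdf:type", "ex:ExecutableFile"),
    ("ex:ExecutableFile", "rdfs:subClassOf", "ex:File"),
    ("ex:File", "rdfs:subClassOf", "ex:DigitalObject"),
    ("ex:file1", "ex:hasPathName", "/Users/demo/setup.exe"),
}

def infer_types(triples):
    """Apply the subclass rule until no new rdf:type statement appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for cls, q, parent in list(inferred):
                if q == "rdfs:subClassOf" and cls == o:
                    new = (s, "rdf:type", parent)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

closure = infer_types(triples)
types_of_file1 = sorted(o for s, p, o in closure
                        if s == "ex:file1" and p == "rdf:type")
print(types_of_file1)
```

Only the first `rdf:type` statement is asserted; the other two memberships are implicit facts made explicit by the rule.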

Method Overview
• Data Collection
• Semantic Representation
• Ontological Reasoning
• Rule-based Reasoning
• Integrated Query

Domain Ontologies
• Introduced a set of lightweight domain-specific OWL
ontologies
– Storage Media
– Network Traffic
– Windows Firewall Log, WHOIS RIR DB
– Malicious Networks Reputation List
– Malware Detection

Evidence Representation (Graph)

Semantic Representation
• Resource Unique Identification Scheme
• Parsing tools able to process each source type with
respect to the domain ontology
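
A rough sketch of what such a parser might look like, assuming a Windows Firewall log line whose first eight space-separated fields are date, time, action, protocol, source IP, destination IP, source port and destination port. The namespace, the `fw:` property names, the case identifier and the sample log line are all hypothetical and are not taken from the presented tool.

```python
from urllib.parse import quote

BASE = "http://example.org/case"  # hypothetical namespace

def mint_uri(case_id, source, local_key):
    """Build a stable URI from case, evidence source, and a source-local key."""
    parts = (case_id, source, local_key)
    return BASE + "/" + "/".join(quote(str(p), safe="") for p in parts)

def parse_firewall_line(case_id, line_no, line):
    """Turn one log line into triples; only the first eight fields are used."""
    date, time, action, proto, src, dst, sport, dport = line.split()[:8]
    entry = mint_uri(case_id, "fwlog", line_no)
    return [
        (entry, "rdf:type", "fw:LogEntry"),
        (entry, "fw:hasTimestamp", f"{date}T{time}"),
        (entry, "fw:hasAction", action),
        (entry, "fw:hasProtocol", proto),
        (entry, "fw:hasSourceIP", mint_uri(case_id, "ip", src)),
        (entry, "fw:hasSourcePort", sport),
        (entry, "fw:hasDestinationIP", mint_uri(case_id, "ip", dst)),
        (entry, "fw:hasDestinationPort", dport),
    ]

# Made-up sample line using documentation-reserved IP addresses.
line = "2013-05-17 10:15:02 DROP TCP 192.0.2.10 198.51.100.7 49152 445"
triples = parse_firewall_line("case-001", 12, line)
for t in triples:
    print(t)
```

Because the URI of an IP address depends only on the case and the address value, a parser for another source would mint the same URI for the same address, which is one possible way to make later linking straightforward.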

Evidence Integration
• Automated linking among homogeneous and heterogeneous evidence
sources based on key properties & matching rules
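
As an illustration of a matching rule on a key property, the sketch below links HTTP response bodies recovered from a packet capture to files found on disk when their MD5 digests are equal. The link property name `integration:HTTPContentToMediaFile` appears in the sample query later in the deck; the choice of MD5 as the matching key, the resource identifiers and the byte contents are assumptions made for this example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Hypothetical resources: URI -> raw content.
disk_files = {
    "ex:disk/file/42": b"MZ fake payload",
    "ex:disk/file/43": b"hello",
}
http_bodies = {
    "ex:pcap/http/7/body": b"MZ fake payload",
    "ex:pcap/http/9/body": b"no match on disk",
}

def link_by_md5(http_bodies, disk_files):
    """Emit a link triple for every HTTP body whose digest equals a disk file's."""
    by_hash = {}
    for uri, content in disk_files.items():
        by_hash.setdefault(md5_hex(content), []).append(uri)
    links = []
    for body_uri, content in http_bodies.items():
        for file_uri in by_hash.get(md5_hex(content), []):
            links.append((body_uri, "integration:HTTPContentToMediaFile", file_uri))
    return links

links = link_by_md5(http_bodies, disk_files)
print(links)
```

Indexing one side by the key property keeps the linking step roughly linear in the number of resources instead of comparing every pair.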

Evidence Correlation
• Link instances of dissimilar
type across a shared
domain
• Temporal Correlation
– Rules for establishing time
instant & interval relations
among recovered artifacts
• Mereological Correlation
– “partOf” transitivity relations
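
Both kinds of correlation can be sketched in a few lines of Python. The interval function below distinguishes only three coarse relations, which is a simplification chosen for this example, and the `partOf` pairs and timestamps are made up.

```python
from datetime import datetime

def interval_relation(a_start, a_end, b_start, b_end):
    """Coarse relation between two closed time intervals."""
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    return "overlaps"

def part_of_closure(pairs):
    """Transitive closure: if a partOf b and b partOf c, then a partOf c."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical artifacts.
part_of = {
    ("ex:file", "ex:directory"),
    ("ex:directory", "ex:partition"),
    ("ex:partition", "ex:disk"),
}
closure = part_of_closure(part_of)

download = (datetime(2013, 5, 17, 10, 0), datetime(2013, 5, 17, 10, 5))
execution = (datetime(2013, 5, 17, 10, 6), datetime(2013, 5, 17, 10, 10))
relation = interval_relation(*download, *execution)
print(relation, ("ex:file", "ex:disk") in closure)
```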

Semantic Integration & Correlation

Integrated Query
• A purpose-built triplestore (graph database) engine
stores the final dataset
– Up to billions of triples
• SQL-like (SPARQL) queries against the integrated/correlated
evidence set
• Graph pattern matching
techniques
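
To make graph pattern matching concrete, here is a naive matcher in plain Python: terms starting with `?` are variables, and a solution is a binding that satisfies every pattern. A real triplestore uses indexes and query planning, so this is only a sketch of the idea. The property names are borrowed from the sample query in this deck; the data is made up.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)

# Hypothetical evidence graph.
triples = [
    ("ex:file42", "rdf:type", "digitalmedia:File"),
    ("ex:file42", "digitalmedia:hasPathName", "/Users/demo/setup.exe"),
    ("ex:file42", "integration:MediaFileToVTFile", "ex:vt1"),
    ("ex:file43", "rdf:type", "digitalmedia:File"),
    ("ex:file43", "digitalmedia:hasPathName", "/Users/demo/notes.txt"),
]
patterns = [
    ("?file", "rdf:type", "digitalmedia:File"),
    ("?file", "integration:MediaFileToVTFile", "?vtfile"),
    ("?file", "digitalmedia:hasPathName", "?pathName"),
]
results = list(match(patterns, triples))
print(results)
```

Only `ex:file42` is returned, because it is the only file with a link to a malware report resource.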

A PoC Instantiation
• Evidence Manager
• Filtering / Pre-processing
• Semantic Parser
• Inference Engine
• Classification, Inverse &
Transitive Properties
• Rule & Query Engines

Experiment A

Experiment B

Sample Query
• “Is any file resident on the disk malicious and if yes where
has it been downloaded from and which ISP did the IP
belong to?”

Sample Query
SELECT DISTINCT ?pathName ?uri ?ipvalue ?asnumber ?link
WHERE {
  ?file rdf:type digitalmedia:File .
  ?file digitalmedia:hasPathName ?pathName .
  ?file digitalmedia:hasMD5 ?md5 .
  ?httpbody integration:HTTPContentToMediaFile ?file .
  ?file integration:MediaFileToVTFile ?vtfile .
  ?vtfile virustotal:hasAVReport ?report .
  ?report virustotal:hasPermanentLink ?link .
  ?httpresp http:body ?httpbody .
  ?httpreq http:requestURI ?uri .
  ?httpreq http:resp ?httpresp .
  ?http packetcapture:hasHTTPRequest ?httpreq .
  ?http rdf:type packetcapture:HTTP .
  ?tcpflow packetcapture:hasApplicationLayerProtocol ?http .
  ?tcpflow packetcapture:hasDestinationIP ?destip .
  ?destip packetcapture:hasIPValue ?ipvalue .
  ?destip integration:PcapIPToWHOISIpAddr ?whoisip .
  ?whoisip whois:isContainedInRange ?range .
  ?range whois:hasRange ?rangeValue .
  ?range whois:isContainedInAS ?as .
  ?as whois:hasNetName ?netname .
  ?as whois:hasASNumber ?asnumber .
}

Example Hypotheses-Queries
• Have there been any unsuccessful connection attempts
from systems in the same network as the one that hosted
the malicious file?
• Which disk files have been created or accessed shortly after
the malicious file was downloaded?
• Has there been any successful connection between our
system and a known malicious host?
• Which files have been accessed shortly before the host
communicated with any blacklisted network host?
• Which websites have been visited by the user shortly
before the download of the malicious file?
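
Several of these hypotheses reduce to a time-window filter around a reference event. A minimal sketch in Python follows, assuming "shortly after" means within five minutes; that threshold, the paths and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

def accessed_shortly_after(download_time, file_events, window=timedelta(minutes=5)):
    """Return paths whose access time falls in (download_time, download_time + window]."""
    return sorted(
        path for path, ts in file_events
        if download_time < ts <= download_time + window
    )

# Hypothetical (path, access time) pairs recovered from the disk image.
file_events = [
    ("/Users/demo/setup.exe", datetime(2013, 5, 17, 10, 0, 30)),
    ("/Users/demo/AppData/run.dll", datetime(2013, 5, 17, 10, 3, 0)),
    ("/Users/demo/notes.txt", datetime(2013, 5, 17, 9, 40, 0)),
    ("/Users/demo/report.doc", datetime(2013, 5, 17, 11, 0, 0)),
]
download_time = datetime(2013, 5, 17, 10, 0, 0)
hits = accessed_shortly_after(download_time, file_events)
print(hits)
```

The "shortly before" variants are the same filter with the window placed on the other side of the reference event.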

Summary
• Ability to represent and integrate heterogeneous data
• Supports the formulation and execution of complex queries
• Expandable (ontologies, rules, queries)
• Computational complexity depends on the ontology, the rules,
and the amount of data
• Reliance on online data sources may affect the accuracy of
the results

Future Work
• Advanced reasoning capabilities (e.g. detect
anti-forensic inconsistencies)
• Extended analysis techniques (e.g. additional
data sources, user activities)
• Large scale performance evaluation, distributed
architecture
• User-friendly graphical interface for rule/query
formulation and result navigation

Thank you
