A Method for Semantic Integration and Correlation of Digital Evidence
Using a Hypothesis-Based Approach
Semantically-Enabled Digital
Investigations
by Spyridon Dosis
February 2013, Stockholm
Problem Definition
 Sophisticated attacks against highly interconnected
networked systems.
 Multitude, variety and size of data sources with
possible evidentiary value.
 Need for continuous state-of-the-art technical
expertise.
 Evidence-oriented first-generation forensic tools
with poor integration and correlation features.
 Lack of common, standardized data
representation/abstraction formats.
Research Questions and Limitations
 How can the Semantic Web technologies and the Linked Data
initiative be applied to Digital Forensics?
 How can a common ontology-based knowledge representation
layer improve the level of integration of currently disjoint
specialized areas of DF such as storage, network, mobile, live
memory and others?
 How may such a new method improve the efficiency and
capabilities of existing DF investigation models, techniques and
tools?
 Not full coverage of the features and capabilities of the
Semantic Web technologies.
 Simplified complexity for the conducted experiments.
Digital Evidence
 “any digital data that contain reliable information
that supports or refutes a hypothesis about an
incident” – (Carrier & Spafford 2004)
 Continuously increasing scope
 Varying layers of abstraction
 (Schatz 2007) identifies 3 basic properties
 Latency -> Semantic Interpretation
 Fidelity -> Chain of Custody
 Volatility -> Order of Volatility
Digital Investigations
 The set of principles and methods that are followed
during the lifecycle of digital evidence with the goal
of event reconstruction.
 Slight definition variations among different contexts.
 The Event-based Digital Forensic Investigation
Framework (Carrier & Spafford 2004)
 System Preservation, Evidence Searching, Event
Reconstruction
 The Digital Investigation Process (Casey 2004)
 The Hypothesis-based Approach (Carrier 2006)
Semantic Web Technologies
 “… information is given well-defined meaning, better
enabling computers and people to work in
cooperation” – (Tim Berners-Lee 2001)
 Metadata – Annotation of data providing contextual
or domain-specific information about the content
 Ontology – “explicit and formal specification of a
conceptualization” – (Gruber 1993)
 Entities, attributes, interrelationships
 Open world assumption
 Reasoning over data by inferencing implicit
conclusions
Semantic Web Architecture : Part A
adapted from Antoniou & Van Harmelen 2004
• URI/IRI enables unique
identification of a resource under a
global scope.
• XML provides a consistent
machine-consumable data encoding
scheme in an unambiguous scoped
manner.
• XML Schema used for defining the
rules and the ‘tag’ vocabulary that
data must conform to.
• RDF provides a simple but flexible
data model for encoding metadata
• Subject-Predicate-Object
• RDF Schema used for defining RDF
vocabularies
• Class and Property hierarchies
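The Subject-Predicate-Object model and RDF Schema class hierarchies can be illustrated with a minimal sketch. It assumes triples are plain Python tuples and that all `ex:` names are hypothetical; it is not the RDF library used in the reference implementation.

```python
# Triples are (subject, predicate, object) tuples; every ex: name is hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:TcpSession", SUBCLASS_OF, "ex:NetworkEvent"),
    ("ex:NetworkEvent", SUBCLASS_OF, "ex:Event"),
    ("ex:session_1", RDF_TYPE, "ex:TcpSession"),
}


def infer_types(graph):
    """Return the graph plus every rdf:type triple implied by rdfs:subClassOf."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (cls, q, parent) in list(inferred):
                if q == SUBCLASS_OF and cls == o:
                    new_triple = (s, RDF_TYPE, parent)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred


closure = infer_types(triples)
```

After the call, `closure` also states that `ex:session_1` is an `ex:NetworkEvent` and an `ex:Event`, which is the kind of implicit conclusion a class hierarchy makes available.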
Semantic Web Architecture : Part B
adapted from Antoniou & Van Harmelen 2004
• OWL 2 is a computational logic-based
language that enables automated
reasoning for inference and
consistency verification.
• Increased expressivity
• Property Restrictions
• Class and Property Equivalency
• Property Relationships
• Global Cardinality Constraints and
Individual Identity (no unique-
names assumption)
• OWL Dialects for varying levels of
expressiveness and computational
complexity.
• SWRL supports more advanced
reasoning cases.
• SPARQL is an RDF-based query
language and protocol
Previous Work #1
 XML-based Approaches
 Digital Forensics XML (Garfinkel 2009) for describing disk images
and their contents (partitions, files, byte runs).
 EDRM XML for describing electronic document metadata.
 XIRAF for XML-based extraction, storage and querying of evidence
files.
 DEX for including provenance-related metadata.
 Other domain-specific XML approaches for live forensics, network
forensics, vulnerability assessment, logs, malware.
 Support a level of tool interoperability and standardization
 No support for automated reasoning or semantic
integration of data.
Previous Work #2
 RDF-based Approaches
 AFF forensic format uses RDF for including arbitrary metadata
(system or process-related, user-specific ones)
 Strengthening the chain-of-custody by additional RDF metadata
(evidence-access, examiner or artifact-related information) (Giova
2011)
 Ontological Approaches
 FORE (Schatz 2004) comprised of a log parser, a forensic ontology
and a custom rule language for aggregating lower level events into
higher level ones. Later expanded by referencing external ontologies.
 DIALOG conceptualized ‘procedural’ and ‘practical’ aspects of a
digital investigation with practical examples of registry analysis.
Later expanded with additional concepts for encoding forensically
relevant types of data.
 (Saad 2010) applied an ontology in the network forensics area for
modeling network attacks and supporting different types of
reasoning based on collected events
Methodology
 Two main research paradigms in IT (Hevner 2004)
 Behavioural Science
 Design Science
 Outcomes of a design science process can be:
 Constructs
 Models
 Methods
 Instantiations
Design Science Method
adapted from Johannesson & Perjons 2012
• Problem Specification
• Literature Review
• Case studies
• Empirical Observations
• Artifact Outline and
Requirements
• Literature Review
• Case Studies
• Design and Development
• Artifact Demonstration
• Laboratory Experiment
(Simulated cases)
• Artifact Evaluation
• “ex ante evaluation”
• Communication of the artifact
A Semantic Web approach for Digital Investigations
 Information Integration
 Common identifiers
 Different identifiers
A Semantic Web approach for Digital Investigations
 Semi-structured Data Support
 Classification and Inference
 Extensibility
 Provenance
 Named Graphs
 Search
Relation to Digital Investigation Reference Models
• Conceptual Mapping between the Semantic Web
architecture and digital investigation frameworks
• Previous phases are assumed as prerequisites
Evaluation Criteria
 Goal – Question – Metric (GQM) approach
Generic Criteria
Goal: The proposed method should be appropriate for the task at hand.
Questions: What is the relationship of the proposed method to existing digital investigation practices and tools? What are the case context requirements for the method to be applied?
Metric: The ability of the method to handle different types of cases (network-related events, media device examination, etc.), measured by the number of different data types it can process.
Goal: The method should provide good support for decision-making by providing relevant and usable results.
Questions: What types of new knowledge can the method extract, and how useful are they? How can the examiner formulate and evaluate hypotheses about the evidence files and receive informative results?
Metric: The ability of the method to support arbitrary queries and provide answers over the whole body of collected evidence. This can be quantified by the precision and recall information retrieval measures over the query results.
Goal: The method should be cost effective in terms of storage and time needs.
Questions: How does the method accept and store input data, intermediate and final results? What are the storage requirements for such an implementation? How much time is needed to apply the method to the input data, and how can it reduce the time that the investigation process takes?
Metric: Storage size requirements for representing input and output data. Time needed for performing the analysis of data or evaluating user-submitted queries.
Goal: The method should be flexible and scalable.
Questions: Can the method deal with new sources of data, and can it seamlessly integrate new forms of ontologically expressed knowledge and rules? Can the method support large amounts of data, and what problems may such complexity incur?
Metric: The ability of the method to process new data and accept additional ontologies or rules with minor (possibly no) modifications to the existing steps. It can be measured by the amount of configuration or code modification such changes require. The method’s ability to handle large amounts of data can be measured by the input size in relation to the processing time or errors produced (e.g. number of captured network packets, firewall logs, disk image sizes, etc.).
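The precision and recall measures named as a metric for query results can be computed as in the sketch below. The function name and the example identifiers are assumptions for illustration; the slides do not specify how relevance judgments were obtained.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of query results.

    retrieved: items returned by a query
    relevant:  items an examiner judged relevant (ground truth, assumed given)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four results returned, two of three relevant items found.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here `p` is 0.5 (two of four returned items are relevant) and `r` is 2/3 (two of three relevant items were returned).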
Evaluation Criteria
Forensic Criteria
Goal: The method’s results should be reproducible.
Questions: Do the results of the method behave deterministically when applied to the same input data, or are they inconsistent across multiple tests?
Metric: The method’s results (e.g. inferred axioms, query results) should be the same given the same dataset, independently of other factors such as the order in which the evidence files are processed. This can be measured by the number of errors or differing results after multiple applications of the method on the same dataset.
Goal: The method’s possible errors should be minimal and determined.
Questions: Does the method produce accurate results? Can the method accept inconsistent or malformed input data? How does the method deal with incomplete data? Can the method produce results that are ambiguous or inconsistent with the specified ontologies?
Metric: The method’s results can be automatically checked by a reasoning engine for possible inconsistencies between asserted and inferred axioms and the given ontologies. The method’s error rate can be measured by the error messages produced during its lifecycle.
Goal: The method must provide logging capabilities for the inclusion of arbitrary metadata regarding the case, the entities and the evidence objects involved.
Questions: Does the method support the addition of annotation axioms with respect to the asserted or inferred axioms? Does the method allow logging of its various steps as they are applied, along with the results they produce?
Metric: The ability to insert logging information during the method can be measured by its flexibility to accept arbitrary metadata.
Goal: The method should protect the integrity of the collected data.
Questions: Can the method operate on forensic copies of the collected evidence? Does the method use hashing algorithms in order to ensure the consistency and integrity of these forensic copies?
Metric: The method should protect the integrity of the collected data, files and devices throughout its whole lifecycle by being able to work on forensic copies instead of the originals and to verify any hash values that these copies carry as forensic metadata. The ability to perform these checks for different data sources can be considered a metric.
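Verifying a hash value carried as forensic metadata can be sketched with the Python standard library. The function names, the chunk size and the default algorithm are assumptions for illustration; the slides do not state which tool performed this check.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large forensic copies do not need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(path, expected_hex, algorithm="sha256"):
    """Return True if the forensic copy at `path` matches the recorded hash value."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A call such as `verify_copy("image.dd", recorded_hash)` (both names hypothetical) would be run before parsing, and a mismatch would stop the analysis of that evidence file.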
Evaluation Criteria
Semantic Web Related Criteria
Goal: System Heterogeneity – Platform Independence
Questions: Can parts of the method be applied on different systems and the partial results later recombined? Are there any restrictions with respect to the configuration of these analysis systems?
Metric: The ability of the method to be successfully applied in different system configurations can be measured through multiple tests on different systems.
Goal: Implementable with the current Semantic Web Stack
Questions: Can the method’s steps that utilize Semantic Web concepts be implemented with current technology, or are other improvements/extensions needed?
Metric: The method should be able to rely on existing Semantic Web technologies without the need to develop or improve them. Errors produced or modifications needed when implementing the proposed method can be considered a metric of how implementable the method currently is.
Goal: The method and its results should be semantically rich, allowing the description of high-level contexts and events along with their interrelationships.
Questions: Can the method describe arbitrary data? Can the method accept descriptions of high-level, user-defined concepts and associate sets of lower-level events with them? Can the method establish relationships between these higher-level descriptions?
Metric: The method should be able to accept user-defined high-level concepts and associate lower-level events with them using well-defined rules/restrictions. Errors produced, or an inability to define custom events, can be considered a metric of how semantically rich the method is.
Description of the Method
 Design structure of the method
 The Data Collection phase assumes proper acquisition
techniques and possible pre-processing tasks.
 Ontological representation: mapping of evidence to the RDF data
model based on lightweight domain-specific ontologies.
 Automated Reasoning for inferring new axioms (class,
property, inverse property assertion axioms).
 Rule evaluation / integration with rule engines.
 Integrated query against the set of asserted and inferred
axioms.
Ontological Representation of Evidence
 Two types of data
 Case Related Data
 Storage Media Forensic Images, Network Packet Captures,
Firewall Logs
 Supportive Data
 WHOIS domain information, IP geo-location, IP to ASN
mappings, databases of malicious files or hosts
 Lightweight ontologies have been specified with the
Protégé Ontology Editor based on
 PCAP Network Captures, Disk Images, Windows XP Firewall
Logs, WHOIS RIPE Database, VirusTotal, FIRE malicious
networks tracker
Ontologies
 Network Capture
 Protocol stack
reconstruction
 Focused on HTTP
 W3C ERT RDF
vocabulary for HTTP
 Forensic Disk
Image
 DFXML and fiwalk
 Timestamps, hash
values, file type
Ontologies
 Windows XP
Firewall Log
 W3C Extended
Log File Format
 RIPE WHOIS
 RIPE NCC web
interface
 XML/JSON formatted
results
Ontologies
 Malicious Networks
 FIRE project
(Wombat EU FP7)
 Aggregation from
sources like
 Anubis, Wepawet,
SpamCop, PhishTank
 Web interface (Discontinued)
 Malware Detection
 VirusTotal provides a
web interface to a variety
of antimalware engines
 Database search web interface
based on hash values
Semantic Integration of Evidence
 URI Format
 urn://<source_id>/<resource_ID>
 Ontological representation
 Natively supported / Semantic Parsers
 De-duplication
 Single URI resource representation under the same namespace
 owl:sameAs for same resource / differently namespaced URIs
 OWL 2 hasKey
 SWRL rules for integrating individuals in different ontologies
 Realistic (manual) approach
 Integration ontology (IP address, MD5 hash value)
PacketCapture:IPAddress -- PcapIPToFWLogHost --> WindowsXPFirewallLog:Host
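The integration step can be sketched as follows: individuals from two sources get URIs in the `urn://<source_id>/<resource_ID>` format and are linked by a bridging property when their literal values match. The source identifiers, resource names and IP values below are hypothetical, and the case-insensitive string comparison is an assumption about how values are compared.

```python
def make_uri(source_id, resource_id):
    """Build a URI in the urn://<source_id>/<resource_ID> format."""
    return f"urn://{source_id}/{resource_id}"


# Hypothetical individuals: URI -> literal value (documentation-range IP addresses).
pcap_ips = {
    make_uri("capture.pcap", "ip_1"): "192.0.2.10",
    make_uri("capture.pcap", "ip_2"): "198.51.100.7",
}
fw_hosts = {
    make_uri("firewall.log", "host_1"): "192.0.2.10",
}


def bridge(left, right, predicate):
    """Link individuals of two sources whose values are equal, ignoring case."""
    index = {}
    for uri, value in right.items():
        index.setdefault(value.casefold(), []).append(uri)
    return sorted(
        (l_uri, predicate, r_uri)
        for l_uri, value in left.items()
        for r_uri in index.get(value.casefold(), [])
    )


links = bridge(pcap_ips, fw_hosts, "PcapIPToFWLogHost")
```

Only `ip_1` and `host_1` share a value, so `links` holds a single bridging triple between them. The individuals keep their own namespaced URIs, which is the alternative to asserting `owl:sameAs` between them.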
Semantic Correlation of Evidence
 Establishing relations between resources of different
nature.
 Temporal Correlation
 SWRL Temporal Ontology (O’Connor & Das 2011)
 Support for time instants and intervals
 Two approaches
 Modify existing ontologies by
importing the time ontology.
 Specifying existing classes as
subclasses of ‘ExtendedProposition’
in an external ontology.
Semantic Correlation of Evidence
 Temporal Correlation (Cont’d)
 Relations between time intervals
 Allen’s Interval Algebra (Allen 1983)
 Relations between time instants
and intervals
 ‘inside’,’before’,’after’ (Hobbs 2004)
 SWRL builtins
 Mereological Correlation
 ‘partOf’ relations
 Transitivity
 E.g. IP address (partOf) IP range (partOf) AS =>
IP address (partOfAS) AS
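The transitivity of 'partOf' can be illustrated with a small closure computation. The individual names are hypothetical, and the sketch stands in for what a reasoner would derive from a transitive property declaration.

```python
def transitive_closure(pairs):
    """Return all (part, whole) pairs implied by transitivity of partOf."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Hypothetical chain: IP address partOf IP range partOf autonomous system.
part_of = {("ip_1", "range_1"), ("range_1", "as_1")}
closure = transitive_closure(part_of)
```

The closure contains the derived pair `("ip_1", "as_1")`, which corresponds to the inferred relation between an IP address and its AS.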
Integrated Query Formulation and Evaluation
 Two methods of query
preparation
 Precomputing inferred axioms
 Back-propagation
 Two methods of query
evaluation
 Merging ontologies
 Named graphs
(Distributed SPARQL
Endpoints)
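The difference between the two query evaluation methods can be sketched with plain Python sets. Graph names, predicates and values are hypothetical; real named graphs would be queried through SPARQL rather than through these helper functions.

```python
# Each named graph maps a graph name to a set of (subject, predicate, object) triples.
graphs = {
    "urn://capture.pcap": {("ip_1", "hasIPValue", "192.0.2.10")},
    "urn://firewall.log": {("host_1", "hasAddress", "192.0.2.10")},
}


def merge(named_graphs):
    """Merging ontologies: one flat set of triples, source information is lost."""
    merged = set()
    for triples in named_graphs.values():
        merged |= triples
    return merged


def match(triples, pattern):
    """Match one triple pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def match_named(named_graphs, pattern):
    """Named graphs: each match is reported together with its source graph."""
    return sorted(
        (name, t)
        for name, triples in named_graphs.items()
        for t in match(triples, pattern)
    )


flat_hits = match(merge(graphs), (None, None, "192.0.2.10"))
sourced_hits = match_named(graphs, (None, None, "192.0.2.10"))
```

Both approaches find the same two triples, but only `sourced_hits` records which evidence source each triple came from, which is why named graphs also serve provenance.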
A Reference Implementation
 Tools Used
 Java 6
 Protégé 4.1.0
 OWL API 3.2.4
 Pellet 2.3.0
 Protégé OWL API 3.4.8
 Jena 2.6.4
 Jess 7.1p2
 Kraken Pcap API 1.3.0
 Apache HTTP Components, Jsoup, JSON
A Reference Implementation
• Evidence Manager
• Load evidence files
• Semantic Parser
• 6 parsers
• Filtering options (NIST NSRL)
can lead to 40-50% reduction
of an XP image.
• Collector Objects
• Reduce complexity
• Coupled with parsers
• Inference Engine
• Class Assertion
• Inverse Property Assertion
• Integration Ontology
• Investigator-specific
classes/properties
• SWRL Rule Engine
• SPARQL In-memory endpoint
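The known-file filtering option of the semantic parser can be sketched as a hash-set lookup. The record layout and the hash strings are placeholders, not real NSRL entries, and loading the actual NSRL reference data is outside this sketch.

```python
def filter_known(files, known_hashes):
    """Drop file records whose hash appears in a known-file reference set."""
    known = {h.lower() for h in known_hashes}
    return [f for f in files if f["md5"].lower() not in known]


# Placeholder records; the hash values are illustrative strings only.
files = [
    {"name": "notepad.exe", "md5": "HASH_A"},
    {"name": "report.doc", "md5": "HASH_B"},
]
remaining = filter_known(files, {"hash_a"})
```

Only `report.doc` remains, so fewer file individuals have to be represented and reasoned over.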
Experimental Setup
 2x HP Compaq 8000 Elite
 Intel Core 2 Duo E8400 Processor
 4 GB RAM
 Microsoft XP SP 3
 Backtrack 5 R1
 MS11-006
 Vulnerability in Windows Shell Graphics Processing
 Office documents in thumbnail mode
 Analysis Workstation
 Dell XPS 15
 Intel Core i7
 4GB RAM
Experiment A
Experiment B
Results (Experiment A)
CompromisedSystem.xml (Fiwalk output of the system’s disk image)
Original Disk Size 25GB
Original Fiwalk XML output File Size 9,46MB
RDF/XML Serialization File Size 7,08MB
Number of Allocated Files in the Disk 6610
Number of Nodes in the Graph Representation 34012
Number of Edges in the Graph Representation 83032
Network Packet Capture (filtered for the system’s IP address and TCP protocol only)
Original File Size 454KB
RDF/XML Serialization File Size 662KB
Number of TCP sessions 40
Number of Nodes in the Graph Representation 1616
Number of Edges in the Graph Representation 5891
Windows XP Firewall Log of the compromised system
Original File Size 38KB
RDF/XML Serialization File Size 684KB
Number of Log Entries 413
Number of Nodes in the Graph Representation 1344
Number of Edges in the Graph Representation 5866
RIPE NCC WHOIS Database
RDF/XML Serialization File Size 210KB
Number of Queried IP Addresses 37
Number of Nodes in the Graph Representation 137
Number of Edges in the Graph Representation 395
FIRE Malicious Networks Database
RDF/XML Serialization File Size 113KB
Number of Queried Autonomous Systems 5
Number of Nodes in the Graph Representation 384
Number of Edges in the Graph Representation 1083
VirusTotal Anti-Malware Web Service
RDF/XML Serialization File Size 2,45MB
Number of Queried and Indexed by VT Files 2304
Number of Nodes in the Graph Representation 11519
Number of Edges in the Graph Representation 18508
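The storage cost of the RDF/XML serialization can be read from the figures above as a ratio of serialized size to original size. The sketch below only repeats sizes reported for Experiment A; the dictionary keys are labels chosen for illustration.

```python
# (original size, RDF/XML size) in the same unit per pair, as reported above.
SIZES = {
    "fiwalk_xml": (9.46, 7.08),     # MB
    "packet_capture": (454, 662),   # KB
    "firewall_log": (38, 684),      # KB
}


def expansion_factor(original, serialized):
    """Ratio of RDF/XML size to original size; above 1 means the data grew."""
    return serialized / original


factors = {
    name: round(expansion_factor(original, serialized), 2)
    for name, (original, serialized) in SIZES.items()
}
```

The firewall log grows by a factor of 18, the packet capture by about 1.46, and the Fiwalk output shrinks to about 0.75 of its size, so the overhead depends strongly on how compact the original format is.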
Results (Experiment A)
 Reasoning Engine
 72130 inferred axioms (approx. 6.1 MB)
 SWRL Engine
 160 ‘bridging’ properties
 PacketCapture:hasIPValue(?x,?y) ^ WindowsXPFirewallLog:hasAddress(?w,?z) ^
swrlb:stringEqualIgnoreCase(?y,?z) ->
IntegrationOntology:PcapIPToFWLogHost(?x,?w)
 39610 time-related re-mapping properties
 DigitalMedia:File(?x) ^ DigitalMedia:hasFileModificationTime(?x,?y) ^
temporal:ValidInstant(?z) ^ temporal:hasTime(?z,?w) ^
swrlb:stringEqualIgnoreCase(?y,?w) ^
swrlx:makeOWLThing(?filemodificationevent,?x) ->
IntegrationOntology:FileModificationEvent(?filemodificationevent) ^
IntegrationOntology:Event(?filemodificationevent) ^
temporal:hasValidTime(?filemodificationevent,?z)
Results (Experiment B)
CompromisedSystem.xml (Fiwalk output of the system’s disk image)
Original Disk Size 25GB
Original Fiwalk XML output File Size 9,34MB
RDF/XML Serialization File Size 6,44MB
Number of Allocated Files in the Disk 3273
Number of Nodes in the Graph Representation 16330
Number of Edges in the Graph Representation 45039
Network Packet Capture (filtered for the system’s IP address and TCP protocol only)
Original File Size 2,63MB
RDF/XML Serialization File Size 2MB
Number of TCP sessions 57
Number of Nodes in the Graph Representation 5419
Number of Edges in the Graph Representation 21712
Windows XP Firewall Log of the compromised system
Original File Size 46KB
RDF/XML Serialization File Size 784KB
Number of Log Entries 480
Number of Nodes in the Graph Representation 1510
Number of Edges in the Graph Representation 6794
RIPE NCC WHOIS Database
RDF/XML Serialization File Size 38KB
Number of Queried IP Addresses 41
Number of Nodes in the Graph Representation 181
Number of Edges in the Graph Representation 326
FIRE Malicious Networks Database
RDF/XML Serialization File Size 113KB
Number of Queried Autonomous Systems 5
Number of Nodes in the Graph Representation 384
Number of Edges in the Graph Representation 1083
VirusTotal Anti-Malware Web Service
RDF/XML Serialization File Size 54KB
Number of Queried and Indexed by VT Files 2540
Number of Nodes in the Graph Representation 253
Number of Edges in the Graph Representation 386
Results (Experiment B)
 Additional Temporal Rules
 temporalBefore between
 Time Instants
 Time Intervals
 Time Instants and Time Periods
 Time Periods and Time Instants
 temporalStarts
 temporalInside
 1024 ValidInstant individuals
 21 ValidPeriod individuals
 58854 inferred temporal relations
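The temporal relations listed above can be sketched over Python `datetime` values. Representing a period as a `(start, end)` tuple and treating boundaries as strict are assumptions of this sketch, not the definitions used by the SWRL temporal built-ins.

```python
from datetime import datetime


def as_period(t):
    """Treat an instant as a zero-length period so one rule covers all four cases."""
    return t if isinstance(t, tuple) else (t, t)


def before(a, b):
    """a ends strictly before b starts (instants or (start, end) periods)."""
    return as_period(a)[1] < as_period(b)[0]


def starts(p, q):
    """Period p begins together with period q and ends earlier."""
    return p[0] == q[0] and p[1] < q[1]


def inside(instant, period):
    """The instant falls strictly between the start and end of the period."""
    return period[0] < instant < period[1]


# Hypothetical timestamps for illustration.
t1 = datetime(2012, 1, 1, 10, 0)
t2 = datetime(2012, 1, 1, 10, 5)
p = (datetime(2012, 1, 1, 10, 1), datetime(2012, 1, 1, 10, 10))
q = (datetime(2012, 1, 1, 10, 1), datetime(2012, 1, 1, 10, 20))
```

Evaluating such relations pairwise over all instants and periods grows quadratically with their number, which is consistent with the large count of inferred temporal relations reported here.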
Example Hypotheses - Queries
Hypothesis
The investigator hypothesizes that the compromised system may have had network
communications with external IP addresses that belong to autonomous systems that may be
listed as malicious networks.
Query
SELECT ?tcpflow ?destipvalue ?netname ?asnumber ?host_fire
WHERE {
?tcpflow packetcapture:hasDestinationIP ?destip .
?destip packetcapture:hasIPValue ?destipvalue .
?destip integration:PcapIPToWHOISIpAddr ?whoisip .
?whoisip whois:isContainedInRange ?range .
?whoisip integration:WHOISIpAddrToFireIPAddr ?fireip .
?fireip fire:IPbelongsToHost ?host_fire .
?host_fire rdf:type fire:MaliciousHost .
?range whois:hasRange ?rangeValue .
?range whois:isContainedInAS ?as .
?as whois:hasNetName ?netname .
?as whois:hasASNumber ?asnumber .
?as whois:hasRoute ?route
}
Results
tcpflow | destipvalue | netname | asnumber
<urn://bind_tcp_FWed_tcp.pcap#tcpSession_6> | "78.46.173.193"^^<http://www.w3.org/2001/XMLSchema#string> | "HETZNER-AS"^^<http://www.w3.org/2001/XMLSchema#string> | "24940"^^<http://www.w3.org/2001/XMLSchema#string>
<urn://bind_tcp_FWed_tcp.pcap#tcpSession_4> | "78.46.173.193"^^<http://www.w3.org/2001/XMLSchema#string> | "HETZNER-AS"^^<http://www.w3.org/2001/XMLSchema#string> | "24940"^^<http://www.w3.org/2001/XMLSchema#string>
Interpretation
The results of the query support the hypothesis that the compromised system did have
network communications with IP addresses that belong to autonomous systems known to
demonstrate malicious behavior. The query is able to match a graph pattern in the provided
dataset, thus retrieving additional information regarding the specific blacklisted AS.
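How a query matches a graph pattern can be sketched with a tiny matcher over triples. Two predicates of the pattern and the integration property name are taken from the query above; the individuals and IP values are hypothetical, and the matcher is a simplified stand-in for a SPARQL engine.

```python
def match_pattern(triples, patterns, binding=None):
    """Return every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(match_pattern(triples, rest, candidate))
    return results


# Hypothetical data: only flow_1 reaches an IP that is linked to a WHOIS record.
data = [
    ("flow_1", "packetcapture:hasDestinationIP", "ip_1"),
    ("ip_1", "packetcapture:hasIPValue", "192.0.2.7"),
    ("ip_1", "integration:PcapIPToWHOISIpAddr", "whois_ip_1"),
    ("flow_2", "packetcapture:hasDestinationIP", "ip_2"),
    ("ip_2", "packetcapture:hasIPValue", "198.51.100.9"),
]
pattern = [
    ("?tcpflow", "packetcapture:hasDestinationIP", "?destip"),
    ("?destip", "packetcapture:hasIPValue", "?destipvalue"),
    ("?destip", "integration:PcapIPToWHOISIpAddr", "?whoisip"),
]
solutions = match_pattern(data, pattern)
```

Only `flow_1` satisfies all three patterns, so a hypothesis that no such link exists would be refuted by a non-empty `solutions` list.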
Evaluation
 The method can be relevant to a lot of different cases
due to its ability to deal with heterogeneous data.
 Ability to formulate complex and expressive queries
over the integrated data that match closely logical
hypotheses
 Efficient data abstraction and query evaluation,
given axiom pre-inference
 Inverse object properties can considerably improve query
evaluation time
 Evidence-neutral implementation
 Temporal correlation can be computationally demanding
Evaluation
 Reliance on online sources may affect the precision of
the results.
 Ontological consistency of the results given valid
ontologies.
 The implementation can be system-independent.
 Ontologies can be dynamically expanded or new
ones (case-specific) introduced.

Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...AmrAlaaEldin12
 

Similar to Semantically-Enabled Digital Investigations - Research Overview (20)

SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...
SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...
SDOT Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-B...
 
Peng Privette SMM_AMS2014_P695
Peng Privette SMM_AMS2014_P695Peng Privette SMM_AMS2014_P695
Peng Privette SMM_AMS2014_P695
 
A Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And AnalysisA Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And Analysis
 
Paper id 25201431
Paper id 25201431Paper id 25201431
Paper id 25201431
 
Bi4101343346
Bi4101343346Bi4101343346
Bi4101343346
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
ICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptxICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptx
 
Indexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record DeduplicationIndexing based Genetic Programming Approach to Record Deduplication
Indexing based Genetic Programming Approach to Record Deduplication
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
Lspnew (1)
Lspnew (1)Lspnew (1)
Lspnew (1)
 
Extraction and Retrieval of Web based Content in Web Engineering
Extraction and Retrieval of Web based Content in Web EngineeringExtraction and Retrieval of Web based Content in Web Engineering
Extraction and Retrieval of Web based Content in Web Engineering
 
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RSelecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
 
Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)Digital Forensics by William C. Barker (NIST)
Digital Forensics by William C. Barker (NIST)
 
Cloud java titles adrit solutions
Cloud java titles adrit solutionsCloud java titles adrit solutions
Cloud java titles adrit solutions
 
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
 
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...
 
Applying Data Mining Principles in the Extraction of Digital Evidence
Applying Data Mining Principles in the Extraction of Digital EvidenceApplying Data Mining Principles in the Extraction of Digital Evidence
Applying Data Mining Principles in the Extraction of Digital Evidence
 
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
Achieving Semantic Integration of Medical Knowledge for Clinical Decision Sup...
 

Recently uploaded

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Semantically-Enabled Digital Investigations - Research Overview

  • 1. Semantically-Enabled Digital Investigations
      A Method for Semantic Integration and Correlation of Digital Evidence Using a Hypothesis-Based Approach
      by Spyridon Dosis
      February 2013, Stockholm
  • 2. Problem Definition
      • Sophisticated attacks against highly interconnected networked systems.
      • Multitude, variety and size of data sources with possible evidentiary value.
      • Need for continuous state-of-the-art technical expertise.
      • Evidence-oriented first-generation forensic tools with poor integration and correlation features.
      • Lack of common, standardized data representation/abstraction formats.
  • 3. Research Questions and Limitations
      • How can Semantic Web technologies and the Linked Data initiative be applied to Digital Forensics (DF)?
      • How can a common ontology-based knowledge representation layer improve the level of integration of currently disjoint specialized areas of DF such as storage, network, mobile, live memory and others?
      • How may such a new method improve the efficiency and capabilities of existing DF investigation models, techniques and tools?
      • Limitation: the features and capabilities of the Semantic Web technologies are not fully covered.
      • Limitation: the conducted experiments are of simplified complexity.
  • 4. Digital Evidence
      • “any digital data that contain reliable information that supports or refutes a hypothesis about an incident” – (Carrier & Spafford 2004)
      • Continuously increasing scope
      • Varying layers of abstraction
      • (Schatz 2007) identifies 3 basic properties
          • Latency -> Semantic Interpretation
          • Fidelity -> Chain of Custody
          • Volatility -> Order of Volatility
  • 5. Digital Investigations
      • The set of principles and methods that are followed during the lifecycle of digital evidence with the goal of event reconstruction.
      • Slight definition variations among different contexts.
      • The Event-based Digital Forensic Investigation Framework (Carrier & Spafford 2004)
          • System Preservation, Evidence Searching, Event Reconstruction
      • The Digital Investigation Process (Casey 2004)
      • The Hypothesis-based Approach (Carrier 2006)
  • 6. Semantic Web Technologies
      • “… information is given well-defined meaning, better enabling computers and people to work in cooperation” – (Tim Berners-Lee 2001)
      • Metadata – Annotation of data providing contextual or domain-specific information about the content
      • Ontology – “explicit and formal specification of a conceptualization” – (Gruber 1993)
          • Entities, attributes, interrelationships
          • Open world assumption
          • Reasoning over data by inferring implicit conclusions
  • 7. Semantic Web Architecture: Part A (adapted from Antoniou & Van Harmelen 2004)
      • URI/IRI enables unique identification of a resource under a global scope.
      • XML provides a consistent, machine-consumable data encoding scheme in an unambiguous, scoped manner.
      • XML Schema is used for defining the rules and the ‘tag’ vocabulary that data must conform to.
      • RDF provides a simple but flexible data model for encoding metadata.
          • Subject-Predicate-Object
      • RDF Schema is used for defining RDF vocabularies.
          • Class and Property hierarchies
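The Subject-Predicate-Object model named on this slide can be illustrated with a minimal sketch in plain Python. It is not an RDF library; the resource URNs and the `net:` predicate names are hypothetical and only follow the URN pattern used later in the deck.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All identifiers below are hypothetical illustration values.
triples = {
    ("urn://pcap/ip/192.0.2.10", "rdf:type", "net:IPAddress"),
    ("urn://pcap/session/1", "rdf:type", "net:TCPSession"),
    ("urn://pcap/session/1", "net:hasSource", "urn://pcap/ip/192.0.2.10"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

typed = match(triples, p="rdf:type")
sources = match(triples, s="urn://pcap/session/1", p="net:hasSource")
```

Pattern matching with wildcards is also the basic idea behind a SPARQL triple pattern, although a real query engine adds joins, filters and named graphs on top of it.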
  • 8. Semantic Web Architecture: Part B (adapted from Antoniou & Van Harmelen 2004)
      • OWL 2 is a computational logic-based language that enables automated reasoning for inference and consistency verification.
          • Increased expressivity
          • Property Restrictions
          • Class and Property Equivalency
          • Property Relationships
          • Global Cardinality Constraints and Individual Identity (no unique-names assumption)
      • OWL Dialects for varying levels of expressiveness and computational complexity.
      • SWRL supports more advanced reasoning cases.
      • SPARQL is a query language and protocol for RDF data.
  • 9. Previous Work #1
      • XML-based Approaches
          • Digital Forensics XML (Garfinkel 2009) for describing disk images and their contents (partitions, files, byte runs).
          • EDRM XML for describing electronic document metadata.
          • XIRAF for XML-based extraction, storage and querying of evidence files.
          • DEX for including provenance-related metadata.
          • Other domain-specific XML approaches for live forensics, network forensics, vulnerability assessment, logs, malware.
      • They support a level of tool interoperability and standardization.
      • No support for automated reasoning or semantic integration of data.
  • 10. Previous Work #2
      • RDF-based Approaches
          • The AFF forensic format uses RDF for including arbitrary metadata (system or process-related, user-specific ones).
          • Strengthening the chain of custody by additional RDF metadata (evidence-access, examiner or artifact-related information) (Giova 2011).
      • Ontological Approaches
          • FORE (Schatz 2004) comprises a log parser, a forensic ontology and a custom rule language for aggregating lower-level events into higher-level ones. Later expanded by referencing external ontologies.
          • DIALOG conceptualized ‘procedural’ and ‘practical’ aspects of a digital investigation with practical examples of registry analysis. Later expanded with additional concepts for encoding forensically relevant types of data.
          • (Saad 2010) applied an ontology in the network forensics area for modeling network attacks and supporting different types of reasoning based on collected events.
  • 11. Methodology
      • Two main research paradigms in IT (Hevner 2004)
          • Behavioural Science
          • Design Science
      • Outcomes of a design science process can be:
          • Constructs
          • Models
          • Methods
          • Instantiations
  • 12. Design Science Method (adapted from Johannesson & Perjons 2012)
      • Problem Specification
          • Literature Review
          • Case Studies
          • Empirical Observations
      • Artifact Outline and Requirements
          • Literature Review
          • Case Studies
      • Design and Development
      • Artifact Demonstration
          • Laboratory Experiment (Simulated cases)
      • Artifact Evaluation
          • “ex ante evaluation”
      • Communication of the artifact
  • 13. A Semantic Web approach for Digital Investigations
      • Information Integration
          • Common identifiers
          • Different identifiers
  • 14. A Semantic Web approach for Digital Investigations
      • Semi-structured Data Support
      • Classification and Inference
      • Extensibility
      • Provenance
          • Named Graphs
      • Search
  • 15. Relation to Digital Investigation Reference Models
      • Conceptual mapping between the Semantic Web architecture and digital investigation frameworks
      • Previous phases are assumed as prerequisites
  • 16. Evaluation Criteria
      • Goal – Question – Metric (GQM) approach
      Generic Criteria
      Goal: The proposed method should be appropriate for the task at hand.
          Questions: What is the relationship of the proposed method with existing digital investigation practices and tools? What are the case context requirements for the method to be applied?
          Metric: The ability of the method to handle different types of cases (network-related events, media device examination etc.), measured by the number of different data types it can process.
      Goal: The method should provide good support for decision-making by providing relevant and usable results.
          Questions: What types of new knowledge can the method extract and how useful are they? How can the examiner formulate and evaluate hypotheses about the evidence files and receive informative results?
          Metric: The ability of the method to support arbitrary queries and provide answers over the whole body of collected evidence. This can be quantified by the precision and recall information retrieval measures over the query results.
      Goal: The method should be cost-effective in terms of storage and time needs.
          Questions: How does the method accept and store input data, intermediate and final results? What are the storage requirements for such an implementation? How much time is needed for applying the method on the input data and how can it reduce the time that the investigation process takes?
          Metric: Storage size requirements for representing input and output data. Time needed for performing the analysis of data or evaluating user-submitted queries.
      Goal: The method should be flexible and scalable.
          Questions: Can the method deal with new sources of data and seamlessly integrate new forms of ontologically-expressed knowledge and rules? Can the method support large amounts of data and what problems may such complexity incur?
          Metric: The ability of the method to process new data and accept additional ontologies or rules without the need for major (possibly even any) modifications to the existing steps. It can be measured by the amount of configuration or code modifications such changes may require. The method’s ability to handle large amounts of data can be measured by the input size in relation to the processing time or produced errors (e.g. number of captured network packets, firewall logs, disk image sizes etc.)
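The precision and recall measures named as a metric above can be written down as a short sketch. The result sets are hypothetical; the deck does not say how relevance judgements were obtained, so the "relevant" set here is simply assumed to be known.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four resources returned, three are truly relevant,
# and two of the relevant ones were found.
p, r = precision_recall(
    retrieved=["file-a", "file-b", "file-c", "file-d"],
    relevant=["file-a", "file-b", "file-e"],
)
```

In this example precision is 2/4 and recall is 2/3: half of what the query returned was relevant, and one relevant resource was missed.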
  • 17. Evaluation Criteria
      Forensic Criteria
      Goal: The method’s results should be reproducible.
          Questions: Do the results of the method behave in a deterministic manner when applied on the same input data, or are they inconsistent among multiple tests?
          Metric: The method’s results (e.g. inferred axioms, query results) should be the same given the same dataset, independently of other factors like the order of processing the evidence files. This can be measured by the number of errors or different results after multiple applications of the method on the same dataset.
      Goal: The method’s possible errors should be minimal and determined.
          Questions: Does the method produce accurate results? Can the method accept inconsistent or malformed input data? How does the method deal with incomplete data? Can the method produce results that are ambiguous or inconsistent with the specified ontologies?
          Metric: The method’s results can be automatically checked by a reasoning engine for possible inconsistencies between asserted and inferred axioms and the given ontologies. The method’s error rate can be measured by the error messages produced during its lifecycle.
      Goal: The method must provide logging capabilities for the inclusion of arbitrary metadata regarding the case, the entities and the evidence objects involved.
          Questions: Does the method support the addition of annotation axioms with respect to the asserted or inferred axioms? Does the method allow the logging of its various steps as they are applied, along with the results they produce?
          Metric: The ability to insert logging information during the method can be measured by its flexibility to accept arbitrary metadata.
      Goal: The method should protect the integrity of the collected data.
          Questions: Can the method operate on forensic copies of the collected evidence? Does the method use hashing algorithms in order to ensure the consistency and integrity of these forensic copies?
          Metric: The method should protect the integrity of the collected data, files and devices throughout its whole lifecycle by being able to work on forensic copies instead of the originals and to verify any hash values that these copies carry as forensic metadata. The ability to perform these checks for different data sources can be considered as a metric.
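The integrity check described in the last criterion (verifying the hash value a forensic copy carries as metadata) might look like the following sketch. It uses only Python's standard `hashlib`; the function names, the choice of SHA-256 and the in-memory stand-in for an image file are assumptions for illustration, not details taken from the deck.

```python
import hashlib
import io

def stream_digest(stream, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary stream in chunks so large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def verify_copy(stream, expected_hex, algorithm="sha256"):
    """Compare the computed digest with the value recorded at acquisition."""
    return stream_digest(stream, algorithm) == expected_hex.lower()

# Hypothetical stand-in for a forensic image and its recorded hash.
image = io.BytesIO(b"stand-in for forensic image bytes")
recorded = hashlib.sha256(b"stand-in for forensic image bytes").hexdigest()
is_intact = verify_copy(image, recorded)
```

With a real image, the stream would come from `open(path, "rb")`, and the algorithm would match whatever the acquisition tool recorded (for example `"md5"` or `"sha1"`).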
  • 18. Evaluation Criteria
      Semantic Web Related Criteria
      Goal: System Heterogeneity – Platform Independence
          Questions: Can parts of the method be applied in different systems and the partial results later recombined? Are there any restrictions with respect to the configuration of these analysis systems?
          Metric: The ability of the method to be successfully applied in different system configurations can be measured through multiple tests in different systems.
      Goal: Implementable with the current Semantic Web Stack
          Questions: Can the method’s steps that utilize Semantic Web concepts be implemented with current technology, or are other improvements/extensions needed?
          Metric: The method should be able to rely on existing Semantic Web technologies without the need to develop or improve their current status. Errors produced or modifications needed when implementing the proposed method can be considered a metric of how implementable the method currently is.
      Goal: The method and its results should be semantically rich, allowing the description of high-level contexts and events along with their interrelationships.
          Questions: Can the method describe arbitrary data? Can the method accept descriptions of high-level and user-defined concepts and associate sets of lower-level events with them? Can the method establish relationships between these higher-level descriptions?
          Metric: The method should be able to accept user-defined high-level concepts and associate lower-level events with them using well-defined rules/restrictions. Errors produced or the inability to define custom events can be considered as a metric of how semantically rich the method is.
  • 19. Description of the Method
      • Design structure of the method
      • The Data Collection phase assumes proper acquisition techniques and possible pre-processing tasks.
      • Ontological representation maps the data to the RDF data model using lightweight domain-specific ontologies.
      • Automated Reasoning for inferring new axioms (class, property, inverse property assertion axioms).
      • Rule evaluation / integration with rule engines.
      • Integrated query against the set of asserted and inferred axioms.
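One of the inference types listed above, inverse property assertion, can be sketched in a few lines. This is a toy forward-chaining step in plain Python, not the reasoner used in the work; the property names and resources are hypothetical.

```python
def infer_inverse(asserted, inverse_of):
    """For every (s, p, o) whose property p has a declared inverse q,
    derive (o, q, s) unless it is already asserted."""
    inferred = set()
    for s, p, o in asserted:
        if p in inverse_of:
            derived = (o, inverse_of[p], s)
            if derived not in asserted:
                inferred.add(derived)
    return inferred

# Hypothetical asserted axioms and inverse-property declaration.
asserted = {
    ("urn://pcap/session/1", "net:hasSource", "urn://pcap/ip/192.0.2.10"),
}
inverse_of = {"net:hasSource": "net:isSourceOf"}

inferred = infer_inverse(asserted, inverse_of)
# An integrated query runs over asserted and inferred axioms together.
queryable = asserted | inferred
```

Keeping the inferred set separate from the asserted one makes it possible to tell an examiner which statements came from the evidence and which were derived.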
  • 20. Ontological Representation of Evidence
      • Two types of data
          • Case Related Data
              • Storage Media Forensic Images, Network Packet Captures, Firewall Logs
          • Supportive Data
              • WHOIS domain information, IP geo-location, IP to ASN mappings, databases of malicious files or hosts
      • Lightweight ontologies have been specified with the Protégé Ontology Editor based on
          • PCAP Network Captures, Disk Images, Windows XP Firewall Logs, WHOIS RIPE Database, VirusTotal, FIRE malicious networks tracker
  • 21. Ontologies
      • Network Capture
          • Protocol stack reconstruction
          • Focused on HTTP
          • W3C ERT RDF vocabulary for HTTP
      • Forensic Disk Image
          • DFXML and fiwalk
          • Timestamps, hash values, file type
  • 22. Ontologies
      • Windows XP Firewall Log
          • W3C Extended Log File Format
      • RIPE WHOIS
          • RIPE NCC web interface
          • XML/JSON formatted results
  • 23. Ontologies
      • Malicious Networks
          • FIRE project (Wombat EU FP7)
          • Aggregation from sources like Anubis, Wepawet, SpamCop, PhishTank
          • Web interface (Discontinued)
      • Malware Detection
          • VirusTotal provides a web interface to a variety of antimalware engines
          • Database search web interface based on hash values
  • 24. Semantic Integration of Evidence
      • URI Format
          • urn://<source_id>/<resource_ID>
      • Ontological representation
          • Natively supported / Semantic Parsers
      • De-duplication
          • Single URI resource representation under the same namespace
          • owl:sameAs for same resource / differently namespaced URIs
          • OWL 2 hasKey
          • SWRL rules for integrating individuals in different ontologies
      • Realistic (manual) approach
          • Integration ontology (IP address, MD5 hash value)
          • Diagram labels: PacketCapture : IPAddress, WindowsXPFirewallLog : Host, PcapIPToFWLogHost
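The URI scheme and the de-duplication idea on this slide can be sketched as follows. The grouping rule (two individuals that share a key value, such as an IP address, denote the same resource) mimics the intent of OWL 2 hasKey in plain Python; the source identifiers, resource IDs and helper names are hypothetical.

```python
from collections import defaultdict

def mint_urn(source_id, resource_id):
    """Build an identifier following the urn://<source_id>/<resource_ID> pattern."""
    return f"urn://{source_id}/{resource_id}"

def same_as_links(individuals):
    """Given (uri, key_value) pairs, emit owl:sameAs triples that link
    every URI sharing a key value to the first URI seen for that value."""
    by_key = defaultdict(list)
    for uri, key_value in individuals:
        by_key[key_value].append(uri)
    links = []
    for uris in by_key.values():
        first = uris[0]
        for other in uris[1:]:
            links.append((first, "owl:sameAs", other))
    return links

# The same host seen in a packet capture and in a firewall log.
pcap_ip = mint_urn("pcap", "ip-192.0.2.10")
fw_host = mint_urn("fwlog", "host-192.0.2.10")
other_ip = mint_urn("pcap", "ip-192.0.2.77")

links = same_as_links([
    (pcap_ip, "192.0.2.10"),
    (fw_host, "192.0.2.10"),
    (other_ip, "192.0.2.77"),
])
```

Only the two identifiers that share the address 192.0.2.10 end up linked; the third individual has no counterpart and is left alone.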
  • 25. Semantic Correlation of Evidence
      • Establishing relations between resources of different nature.
      • Temporal Correlation
          • SWRL Temporal Ontology (Connor & Das 2011)
          • Support for time instants and intervals
          • Two approaches
              • Modify existing ontologies by importing the time ontology.
              • Specify existing classes as subclasses of ‘ExtendedProposition’ in an external ontology.
  • 26. Semantic Correlation of Evidence
      • Temporal Correlation (Cont’d)
          • Relations between time intervals
              • Allen’s Interval Algebra (Allen 1983)
          • Relations between time instants and intervals
              • ‘inside’, ‘before’, ‘after’ (Hobbs 2004)
          • SWRL builtins
      • Mereological Correlation
          • ‘partOf’ relations
          • Transitivity
          • E.g. IP address (partOf) IP range (partOf) AS => IP address (partOfAS) AS
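Both correlation types on this slide lend themselves to a small sketch: classifying two time intervals into one of Allen's thirteen relations, and closing ‘partOf’ under transitivity. This is plain Python rather than SWRL; intervals are assumed to be (start, end) pairs with start < end, and the address, range and AS number are documentation values chosen for illustration.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

def part_of_closure(edges):
    """Transitive closure of (part, whole) pairs: if x partOf y and
    y partOf z, then x partOf z."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

closure = part_of_closure({
    ("192.0.2.10", "192.0.2.0/24"),
    ("192.0.2.0/24", "AS64500"),
})
```

The derived pair ("192.0.2.10", "AS64500") corresponds to the slide's example of an IP address being part of an AS by way of its IP range.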
  • 27. Integrated Query Formulation and Evaluation
      • Two methods of query preparation
          • Precomputing inferred axioms
          • Back-propagation
      • Two methods of query evaluation
          • Merging ontologies
          • Named graphs (Distributed SPARQL Endpoints)
  • 28. A Reference Implementation  Tools Used  Java 6  Protégé 4.1.0  OWL API 3.2.4  Pellet 2.3.0  Protégé OWL API 3.4.8  Jena 2.6.4  Jess 7.1p2  Kraken Pcap API 1.3.0  Apache HTTP Components, Jsoup, JSON
  • 29. A Reference Implementation • Evidence Manager • Load evidence files • Semantic Parser • 6 parsers • Filtering options (NIST NSRL) can lead to 40-50% reduction of an XP image. • Collector Objects • Reduce complexity • Coupled with parsers • Inference Engine • Class Assertion • Inverse Property Assertion • Integration Ontology • Investigator-specific classes/properties • SWRL Rule Engine • SPARQL In-memory endpoint
  • 30. Experimental Setup  2x HP Compaq 8000 Elite  Intel Core 2 Duo E8400 Processor  4 GB RAM  Microsoft Windows XP SP3  BackTrack 5 R1  MS11-006  Vulnerability in Windows Shell Graphics Processing  Office documents in thumbnail mode  Analysis Workstation  Dell XPS 15  Intel Core i7  4 GB RAM
  • 33. Results (Experiment A)
    CompromisedSystem.xml (Fiwalk output of the system’s disk image)
      Original Disk Size: 25 GB
      Original Fiwalk XML Output File Size: 9.46 MB
      RDF/XML Serialization File Size: 7.08 MB
      Number of Allocated Files in the Disk: 6610
      Number of Nodes in the Graph Representation: 34012
      Number of Edges in the Graph Representation: 83032
    Network Packet Capture (filtered for the system’s IP address and TCP protocol only)
      Original File Size: 454 KB
      RDF/XML Serialization File Size: 662 KB
      Number of TCP Sessions: 40
      Number of Nodes in the Graph Representation: 1616
      Number of Edges in the Graph Representation: 5891
    Windows XP Firewall Log of the compromised system
      Original File Size: 38 KB
      RDF/XML Serialization File Size: 684 KB
      Number of Log Entries: 413
      Number of Nodes in the Graph Representation: 1344
      Number of Edges in the Graph Representation: 5866
    RIPE NCC WHOIS Database
      RDF/XML Serialization File Size: 210 KB
      Number of Queried IP Addresses: 37
      Number of Nodes in the Graph Representation: 137
      Number of Edges in the Graph Representation: 395
    FIRE Malicious Networks Database
      RDF/XML Serialization File Size: 113 KB
      Number of Queried Autonomous Systems: 5
      Number of Nodes in the Graph Representation: 384
      Number of Edges in the Graph Representation: 1083
    VirusTotal Anti-Malware Web Service
      RDF/XML Serialization File Size: 2.45 MB
      Number of Queried and Indexed by VT Files: 2304
      Number of Nodes in the Graph Representation: 11519
      Number of Edges in the Graph Representation: 18508
  • 34. Results (Experiment A)  Reasoning Engine  72130 inferred axioms (approx. 6.1 MB)  SWRL Engine
     160 ‘bridging’ properties
      PacketCapture:hasIPValue(?x,?y) ^ WindowsXPFirewallLog:hasAddress(?w,?z) ^ swrlb:stringEqualIgnoreCase(?y,?z) -> IntegrationOntology:PcapIPToFWLogHost(?x,?w)
     39610 time-related re-mapping properties
      DigitalMedia:File(?x) ^ DigitalMedia:hasFileModificationTime(?x,?y) ^ temporal:ValidInstant(?z) ^ temporal:hasTime(?z,?w) ^ swrlb:stringEqualIgnoreCase(?y,?w) ^ swrlx:makeOWLThing(?filemodificationevent,?x) -> IntegrationOntology:FileModificationEvent(?filemodificationevent) ^ IntegrationOntology:Event(?filemodificationevent) ^ temporal:hasValidTime(?filemodificationevent,?z)
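To make the first ('bridging') rule concrete, the following Python sketch mimics its effect outside a rule engine: it pairs packet-capture IP resources with firewall-log hosts whose address strings match, ignoring case. The dictionaries stand in for the `hasIPValue` and `hasAddress` assertions, and the function name and resource identifiers are hypothetical. The thesis evaluates the rule with a SWRL engine, not with code like this.

```python
def apply_bridging_rule(pcap_ip_values, fwlog_addresses):
    """Emulate the PcapIPToFWLogHost rule as a case-insensitive join.

    pcap_ip_values:  {ip_resource: value}     (PacketCapture:hasIPValue)
    fwlog_addresses: {host_resource: address} (WindowsXPFirewallLog:hasAddress)
    Returns a set of (ip_resource, property, host_resource) triples.
    """
    # Index the firewall-log hosts by lower-cased address, so each
    # packet-capture value needs a single dictionary lookup.
    by_address = {}
    for host, address in fwlog_addresses.items():
        by_address.setdefault(address.lower(), []).append(host)

    return {
        (ip, "IntegrationOntology:PcapIPToFWLogHost", host)
        for ip, value in pcap_ip_values.items()
        for host in by_address.get(value.lower(), [])
    }
```

Lower-casing both sides plays the role of `swrlb:stringEqualIgnoreCase`. It has no effect on dotted IPv4 strings but does for hexadecimal notations such as IPv6.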
  • 35. Results (Experiment B)
    CompromisedSystem.xml (Fiwalk output of the system’s disk image)
      Original Disk Size: 25 GB
      Original Fiwalk XML Output File Size: 9.34 MB
      RDF/XML Serialization File Size: 6.44 MB
      Number of Allocated Files in the Disk: 3273
      Number of Nodes in the Graph Representation: 16330
      Number of Edges in the Graph Representation: 45039
    Network Packet Capture (filtered for the system’s IP address and TCP protocol only)
      Original File Size: 2.63 MB
      RDF/XML Serialization File Size: 2 MB
      Number of TCP Sessions: 57
      Number of Nodes in the Graph Representation: 5419
      Number of Edges in the Graph Representation: 21712
    Windows XP Firewall Log of the compromised system
      Original File Size: 46 KB
      RDF/XML Serialization File Size: 784 KB
      Number of Log Entries: 480
      Number of Nodes in the Graph Representation: 1510
      Number of Edges in the Graph Representation: 6794
    RIPE NCC WHOIS Database
      RDF/XML Serialization File Size: 38 KB
      Number of Queried IP Addresses: 41
      Number of Nodes in the Graph Representation: 181
      Number of Edges in the Graph Representation: 326
    FIRE Malicious Networks Database
      RDF/XML Serialization File Size: 113 KB
      Number of Queried Autonomous Systems: 5
      Number of Nodes in the Graph Representation: 384
      Number of Edges in the Graph Representation: 1083
    VirusTotal Anti-Malware Web Service
      RDF/XML Serialization File Size: 54 KB
      Number of Queried and Indexed by VT Files: 2540
      Number of Nodes in the Graph Representation: 253
      Number of Edges in the Graph Representation: 386
  • 36. Results (Experiment B)  Additional Temporal Rules  temporalBefore between  Time Instants  Time Intervals  Time Instants and Time Periods  Time Periods and Time Instants  temporalStarts  temporalInside  1024 ValidInstant individuals  21 ValidPeriod individuals  58854 inferred temporal relations
  • 37. Example Hypotheses - Queries
    Hypothesis: The investigator hypothesizes that the compromised system may have had network communications with external IP addresses that belong to autonomous systems that may be listed as malicious networks.
    Query:
      SELECT ?tcpflow ?destipvalue ?netname ?asnumber ?host_fire
      WHERE {
        ?tcpflow packetcapture:hasDestinationIP ?destip .
        ?destip packetcapture:hasIPValue ?destipvalue .
        ?destip integration:PcapIPToWHOISIpAddr ?whoisip .
        ?whoisip whois:isContainedInRange ?range .
        ?whoisip integration:WHOISIpAddrToFireIPAddr ?fireip .
        ?fireip fire:IPbelongsToHost ?host_fire .
        ?host_fire rdf:type fire:MaliciousHost .
        ?range whois:hasRange ?rangeValue .
        ?range whois:isContainedInAS ?as .
        ?as whois:hasNetName ?netname .
        ?as whois:hasASNumber ?asnumber .
        ?as whois:hasRoute ?route
      }
    Results:
      tcpflow | destipvalue | netname | asnumber
      <urn://bind_tcp_FWed_tcp.pcap#tcpSession_6> | "78.46.173.193"^^<http://www.w3.org/2001/XMLSchema#string> | "HETZNER-AS"^^<http://www.w3.org/2001/XMLSchema#string> | "24940"^^<http://www.w3.org/2001/XMLSchema#string>
      <urn://bind_tcp_FWed_tcp.pcap#tcpSession_4> | "78.46.173.193"^^<http://www.w3.org/2001/XMLSchema#string> | "HETZNER-AS"^^<http://www.w3.org/2001/XMLSchema#string> | "24940"^^<http://www.w3.org/2001/XMLSchema#string>
    Interpretation: The results of the query support the hypothesis that the compromised system did communicate over the network with IP addresses that belong to autonomous systems known to exhibit malicious behavior. The query matches a graph pattern in the provided dataset and thereby retrieves additional information about the specific blacklisted AS.
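The query above is a conjunction of triple patterns evaluated over the merged evidence graph. As a rough sketch of what "matching a graph pattern" means, the toy matcher below joins patterns over an in-memory list of triples. It is not a SPARQL engine, the pattern list is a shortened version of the slide's query, and all resource names and values in `TRIPLES` are invented for illustration (documentation-range IP addresses and AS number).

```python
def match_pattern(triples, patterns, binding=None):
    """Yield variable bindings satisfying every triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match_pattern(triples, rest, candidate)


# Hypothetical toy evidence graph (not data from the experiments).
TRIPLES = [
    ("pcap:flow1", "packetcapture:hasDestinationIP", "pcap:ip1"),
    ("pcap:flow2", "packetcapture:hasDestinationIP", "pcap:ip2"),
    ("pcap:ip1", "packetcapture:hasIPValue", "192.0.2.10"),
    ("pcap:ip2", "packetcapture:hasIPValue", "198.51.100.4"),
    ("pcap:ip1", "integration:PcapIPToWHOISIpAddr", "whois:ip1"),
    ("whois:ip1", "whois:isContainedInRange", "whois:range1"),
    ("whois:range1", "whois:isContainedInAS", "whois:as1"),
    ("whois:as1", "whois:hasASNumber", "64500"),
]

PATTERNS = [
    ("?tcpflow", "packetcapture:hasDestinationIP", "?destip"),
    ("?destip", "packetcapture:hasIPValue", "?destipvalue"),
    ("?destip", "integration:PcapIPToWHOISIpAddr", "?whoisip"),
    ("?whoisip", "whois:isContainedInRange", "?range"),
    ("?range", "whois:isContainedInAS", "?as"),
    ("?as", "whois:hasASNumber", "?asnumber"),
]
```

Calling `list(match_pattern(TRIPLES, PATTERNS))` yields a binding only for the flow whose destination IP was bridged into the WHOIS data, which mirrors how the integration properties let a single query span several evidence sources.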
  • 38. Evaluation  The method applies to a wide range of cases because it can handle heterogeneous data.  Ability to formulate complex and expressive queries over the integrated data that closely match logical hypotheses  Efficient data abstraction and query evaluation, given axiom pre-inference  Inverse object properties can considerably improve query evaluation time  Evidence-neutral implementation  Temporal correlation can be computationally demanding
  • 39. Evaluation  Reliance on online sources may affect the precision of the results.  Ontological consistency of the results, given valid ontologies.  The implementation can be system-independent.  Ontologies can be dynamically expanded, or new (case-specific) ones introduced.