These slides were presented virtually at IFIP SEC 2020 on 23 September 2020. The full paper is available at: https://link.springer.com/chapter/10.1007/978-3-030-58201-2_26
Abstract:
Ensuring data confidentiality and integrity are key concerns for information security professionals, who typically have to obtain and integrate information from multiple sources to detect unauthorized data modifications and transmissions. The instrumentation that operating systems provide for the monitoring of file system level activity can yield important clues on possible data tampering and exfiltration activity, but the raw data that these tools provide is difficult to interpret, contextualize and query. In this paper, we propose and implement an architecture for file system activity log acquisition, extraction, linking, and storage that leverages semantic techniques to tackle limitations of existing monitoring approaches in terms of integration, contextualization, and cross-platform interoperability. We illustrate the applicability of the proposed approach in both forensic and monitoring scenarios and conduct a performance evaluation in a virtual setting.
Cross-Platform File System Activity Monitoring and Forensics - A Semantic Approach
1. Cross-Platform File System Activity Monitoring and Forensics - A Semantic Approach
Kabul Kurniawan, Andreas Ekelhart, Fajar Ekaputra, Elmar Kiesling
This work was sponsored by the Austrian Science Fund (FWF) and netidee SCIENCE under grant P30437-N31, and the COMET K1 program by the Austrian Research Promotion Agency.
2. Motivation
• Increasing collection of sensitive data
- The number and size of data breaches have been on the rise
- 4.1 billion records in the first half of 2019 (Verizon)
- USD 3.86 million average total cost/incident in 2020 (IBM)
• Sophisticated attack tactics/techniques
- Exfiltration of sensitive data is often difficult to detect
- 280 days on average to identify and contain a breach (IBM)
(External)
Insider
3. Challenges in Log Analysis
• Dispersed ICT asset information
• Dispersed cybersecurity information
• Manually searching log data and comparing related information to understand attack/event chains is a tedious and time-consuming process!
4. Log Data Heterogeneity
• Syntactic heterogeneity
• Semantic heterogeneity
• Inconsistent identifiers
Log sources (figure): Windows Eventlog, Firewall Log, Linux Authlog, Syslog
5. State of The Art
• File Activity Monitoring
- Statistical analysis to identify anomalies (Hu Y., 2011)
- Policy-based OS call provenance for data leakage detection (Awad, 2016)
- OS kernel provenance to detect exfiltration from a database (Daren, 2019)
- Deep learning model to predict insider threats (Bhavsar, 2018)
• File System Ontologies & Semantic Approaches for File Monitoring
- TripFS: file exploration framework based on linked data using the NEPOMUK File Ontology (Schand, 2010), VDB-FilePub (Shen, 2011), Semantic File System (SFS) (Mashwani, 2018)
• Existing Tools & SIEMs
- Commercial tools: e.g. SolarWinds, PA File Insight, STEALTHbits File Activity Monitor, and Decision File Audit
- SIEMs: e.g. LogDNA, Splunk, ElasticSearch
Research Gaps:
• Existing approaches mainly focus on regular expressions, rule-based classification, and statistical log analysis.
• Lack of interoperability, contextualization and linking to cybersecurity information.
• Existing tools provide simple alerting upon suspicious activity.
• Existing SIEMs do not specifically tackle the problem of file activity tracking.
6. Semantic Web Technologies
• Standards-based: SPARQL, RDF, JSON-LD
• Graph-based: flexible querying, flexible schemas, context-rich representation
• Explicit semantics: terminological clarity, reasoning, integration, "machine-readability"
• Decentralization: alignment, linking, federation, reconciliation, sharing
Potential solution for the security domain, addressing the current gaps in file activity monitoring & forensics:
• Flexible schema for unstructured and semi-structured log data (XML, JSON, CSV, etc.)
• Semantic integration of heterogeneous security-related data (Windows log, Linux Audit log, etc.)
• Contextualization and linking to external & internal background knowledge (IT assets, cybersecurity information, etc.)
• Stream reasoning over security-related log data (e.g. for real-time file activity monitoring)
• Standard query language for log analysis & forensics
7. Proposed Approach
Step 1: Conceptual Modelling (RDF/OWL ontology)
• Vocabulary (e.g. low-level Log Ontology, Event Ontology, etc.)
• Background Knowledge (e.g. IT Assets, Cybersecurity Information)
Step 2: Semantic Log Processing
• Log Acquisition and Extraction
• Log Transformation (i.e. RDF Mapping)
• Event Extraction & Linking
Step 3: Semantic Log Analysis and Monitoring
• Event Monitoring via Semantic Continuous Querying over log streams
• Log Analysis and Forensics through SPARQL queries
8. Conceptual Modelling: Ontology Construction
Bottom-up approach:
• Input: low-level information from log sources (e.g. Windows, Linux)
• Log Entry Ontology (e.g. Windows, Linux Log Ontology)
• File Operation/Access Event Ontology
• Output: high-level events
9. Conceptual Modelling: Log Entry & File Access Ontology
Figure panels: Windows Log Ontology, Linux Log Ontology, File Access Ontology
10. System Architecture
11. Semantic Log Processing: Log Acquisition & Extraction
Raw log data (example):
Apr 9 09:37:47 kabul-VirtualBox systemd[1]: Mounted Huge Pages File System.
Extracted log data:
{
"timestamp":"2018-04-09T07:37:47.000Z",
"message":"Mounted Huge Pages File System",
"program":"systemd",
"host":"kabul-VirtualBox",
"pid":"1",
….
}
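The extraction step on this slide can be illustrated with a minimal Python sketch. It is not the extraction tooling used in the paper; the regular expression, the function name `extract_syslog`, and the `year` and `utc_offset_hours` parameters are assumptions introduced here, needed because the raw syslog line carries neither a year nor a time zone.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern for a classic syslog line: "Mon D HH:MM:SS host program[pid]: message"
SYSLOG_PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def extract_syslog(line, year, utc_offset_hours):
    """Turn one raw syslog line into a flat dict like the one on the slide.

    year and utc_offset_hours are assumptions supplied by the caller,
    since the raw line does not contain them.
    """
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a recognised syslog line: {line!r}")
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(
        f"{year} {match['ts']}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=local_tz)
    utc_time = local_time.astimezone(timezone.utc)
    return {
        "timestamp": utc_time.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "message": match["message"].rstrip("."),
        "program": match["program"],
        "host": match["host"],
        "pid": match["pid"],
    }


raw = "Apr 9 09:37:47 kabul-VirtualBox systemd[1]: Mounted Huge Pages File System."
entry = extract_syslog(raw, year=2018, utc_offset_hours=2)
print(entry)
```

With a year of 2018 and an offset of two hours, the sketch yields the same field values as the extracted example above.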
12. Semantic Log Processing: Extracted File Access events in JSON
Event types shown: Created, Modified, Renamed, Copied, Deleted
13. Semantic Log Processing: RDF Mapping
Extracted log data (JSON), example:
{
"timestamp":"2018-04-09T07:37:47.000Z",
"message":"Mounted Huge Pages File System",
"program":"systemd",
"host":"kabul-VirtualBox",
"pid":"1",
….
}
Log data in RDF / JSON-LD:
{
"@context":"http://w3id.org/contexts/syslog.jsonld",
"logMessage":"Mounted Huge Pages File System",
"timestamp":"2018-04-09T07:37:47.000Z",
"hasProcessId":"1",
"hasSeverity":{
"severityName":"notice",
"severityCode":"5"
},
"@type":"http://w3id.org/sepses/vocab/log/sysLog#SysLogEntry",
"hasLogType":"http://example.org/system#syslog",
"@id":"http://example.org/logEntry#logEntry-befd-abc",
"hasProgram":{
"programName":"systemd"
},
"logFilePath":"/var/log/syslog",
"input":{
"type":"log"
},
"originatesFrom":{
"hostName":"kabul-VirtualBox"
}
}
Figure labels: Extracted log data, Standard Mapping Language, Log Vocabularies, Log data in RDF, Enrichment…
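The mapping from extracted JSON to JSON-LD can be sketched in a few lines of Python. This is only an illustration of the idea, not the standard mapping language the slide refers to. The function name `to_jsonld` and the `entry_id` argument are hypothetical, and the context URL is written with the host `w3id.org`, which is an assumption. Only fields visible in the extracted example are mapped; severity, log type and file path are left out because their source fields are elided on the slide.

```python
import json

# Assumed constants, taken from the JSON-LD example on the slide.
SYSLOG_CONTEXT = "http://w3id.org/contexts/syslog.jsonld"
SYSLOG_ENTRY_TYPE = "http://w3id.org/sepses/vocab/log/sysLog#SysLogEntry"


def to_jsonld(entry, entry_id):
    """Map a flat extracted log entry onto the property names of the JSON-LD example."""
    return {
        "@context": SYSLOG_CONTEXT,
        "@id": entry_id,
        "@type": SYSLOG_ENTRY_TYPE,
        "logMessage": entry["message"],
        "timestamp": entry["timestamp"],
        "hasProcessId": entry["pid"],
        "hasProgram": {"programName": entry["program"]},
        "originatesFrom": {"hostName": entry["host"]},
    }


extracted = {
    "timestamp": "2018-04-09T07:37:47.000Z",
    "message": "Mounted Huge Pages File System",
    "program": "systemd",
    "host": "kabul-VirtualBox",
    "pid": "1",
}
document = to_jsonld(extracted, "http://example.org/logEntry#logEntry-befd-abc")
print(json.dumps(document, indent=2))
```

Nesting the program and host as their own objects mirrors the slide's example, where they become separate nodes that other log entries can link to.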
14. Semantic Log Processing: (High-level) Event Extraction
Example SPARQL CONSTRUCT query for event extraction (Rename):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
…
CONSTRUCT {
  ?subject fae:hasFileAccessType sys:Renamed ;
           rdf:type fae:FileAccessEvent ;
           fae:timestamp ?logtimestamp ;
           fae:hasSourceFile ?sourceFile ;
           fae:hasTargetFile ?targetFile ;
           fae:hasSourceHost ?sourceHost ;
           fae:hasTargetHost ?targetHost ;
           fae:hasUser ?user .
  ?sourceFile fae:fileName ?filename .
  ?targetFile fae:fileName ?filename2 .
  ?sourceHost fae:hostName ?hostname .
  ?targetHost fae:hostName ?hostname2 .
  ?user fae:userName ?username .
}
WHERE {
  ?s file:pathName ?filename2 . ?s file:hostName ?hostname2 . ?s file:timestamp ?logtimestamp .
  ?s file:userName ?username . ?s file:eventName ?event2 .
  { SELECT * WHERE {
      ?r file:pathName ?filename . ?r file:hostName ?hostname .
      ?r file:timestamp ?logtimestamp2 . ?r file:eventName ?event
      FILTER regex(str(?event), "moved")
  } }
  FILTER (regex(str(?event2), "created") && ?filename != ?filename2 && ?hostname = ?hostname2)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "event"), "-renamed")) AS ?subject)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "source-file"), "-renamed")) AS ?sourceFile)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "target-file"), "-renamed")) AS ?targetFile)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "source-host"), "-renamed")) AS ?sourceHost)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "target-host"), "-renamed")) AS ?targetHost)
  BIND (URI(CONCAT(REPLACE(str(?s), "LogEntry", "user"), "-renamed")) AS ?user)
}
Result (figure)
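The pairing logic of the rename query can be restated in plain Python to make its conditions easier to follow. This is a sketch under assumptions, not the authors' implementation: log entries are represented as dicts whose keys mirror the `file:` properties in the query, the `id` key stands in for the entry IRI, and the function name `extract_renames` is hypothetical. As in the query, a "moved" entry and a "created" entry are combined when they share a host and differ in path.

```python
def extract_renames(entries):
    """Pair 'moved' and 'created' low-level entries into high-level rename events."""
    moved = [e for e in entries if "moved" in e["eventName"]]
    created = [e for e in entries if "created" in e["eventName"]]
    events = []
    for target in created:
        for source in moved:
            same_host = source["hostName"] == target["hostName"]
            different_path = source["pathName"] != target["pathName"]
            if same_host and different_path:
                events.append({
                    # Mirrors the BIND clause that derives the event IRI.
                    "subject": target["id"].replace("LogEntry", "event") + "-renamed",
                    "hasFileAccessType": "Renamed",
                    "timestamp": target["timestamp"],
                    "sourceFile": source["pathName"],
                    "targetFile": target["pathName"],
                    "sourceHost": source["hostName"],
                    "targetHost": target["hostName"],
                    "user": target["userName"],
                })
    return events


log_entries = [
    {"id": "http://example.org/LogEntry-1", "pathName": "/data/a.txt",
     "hostName": "fileserver", "timestamp": "2020-01-01T10:00:00Z",
     "userName": "alice", "eventName": "moved_from"},
    {"id": "http://example.org/LogEntry-2", "pathName": "/data/b.txt",
     "hostName": "fileserver", "timestamp": "2020-01-01T10:00:00Z",
     "userName": "alice", "eventName": "created"},
]
for event in extract_renames(log_entries):
    print(event)
```

Neither the query nor this sketch compares timestamps, so pairing is only meaningful when the input is restricted to a short time window, as in the streaming setup described in the evaluation.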
15. Semantic Log Processing: File Access Activity Visualization
Figure panels: Generated RDF file access events, Event Graph, File life-cycle visualization
17. Use-Case 1: Log Forensics (File Access History)
Scenario:
Goals:
• Improve situational awareness
• Correlate event sequences
Analyst
18. Use-Case 1: Log Forensics (File Access History)
Query Evaluation:
Result in table:
FileServer (Linux) Workstation (Windows)
Graph Visualization
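The slide's query and result are not reproduced in this transcript, so the following Python sketch only illustrates the general idea of reconstructing a file access history: starting from a file of interest, follow file access events backwards from target to source. The event dict layout, the example file names, and the function name `trace_history` are all assumptions made here. Files are identified by name only, which is a simplification for cross-host tracing.

```python
def trace_history(events, file_name):
    """Return the chain of events that led to file_name, oldest first.

    Events without a source file (e.g. Created, Modified) are recorded
    without changing the file currently being traced.
    """
    history = []
    current = file_name
    # Walk from the newest event to the oldest; ISO 8601 strings sort chronologically.
    for event in sorted(events, key=lambda e: e["timestamp"], reverse=True):
        if event["targetFile"] == current:
            history.append(event)
            if event.get("sourceFile"):
                current = event["sourceFile"]
    history.reverse()
    return history


file_events = [
    {"timestamp": "2020-01-01T10:00:00Z", "accessType": "Created",
     "sourceFile": None, "targetFile": "report.docx", "host": "FileServer"},
    {"timestamp": "2020-01-01T10:05:00Z", "accessType": "Copied",
     "sourceFile": "report.docx", "targetFile": "copy.docx", "host": "Workstation"},
    {"timestamp": "2020-01-01T10:06:00Z", "accessType": "Created",
     "sourceFile": None, "targetFile": "notes.txt", "host": "Workstation"},
    {"timestamp": "2020-01-01T10:10:00Z", "accessType": "Renamed",
     "sourceFile": "copy.docx", "targetFile": "holiday.jpg", "host": "Workstation"},
]
for step in trace_history(file_events, "holiday.jpg"):
    print(step["timestamp"], step["host"], step["accessType"], step["targetFile"])
```

In the semantic setting, the same traversal would be expressed as a SPARQL query over the source and target file links of the file access events.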
19. Use-Case 2: Log Monitoring (Sensitive data on vulnerable hosts)
Query Evaluation:
Result:
Goals:
• Improve situational awareness
• Detect malicious activities
• Reduce false positives
Scenario:
Analyst
Cybersecurity
Knowledge-Base
Internal
Background
Knowledge
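The monitoring scenario combines file activity with internal background knowledge (which files are sensitive) and external cybersecurity knowledge (which hosts are vulnerable). The slide's actual query is not included in this transcript; the sketch below only shows that join in plain Python. The function name `find_sensitive_on_vulnerable`, the data layout, and the vulnerability identifier are placeholders, not real data.

```python
def find_sensitive_on_vulnerable(events, sensitive_files, host_vulnerabilities):
    """Flag events that place a sensitive file on a host with known vulnerabilities."""
    alerts = []
    for event in events:
        vulnerabilities = host_vulnerabilities.get(event["targetHost"], [])
        if event["targetFile"] in sensitive_files and vulnerabilities:
            alerts.append({
                "timestamp": event["timestamp"],
                "file": event["targetFile"],
                "host": event["targetHost"],
                "vulnerabilities": list(vulnerabilities),
            })
    return alerts


# Hypothetical internal background knowledge.
sensitive = {"payroll.xlsx"}
# Hypothetical external knowledge; the identifier is a placeholder, not a real CVE.
vulnerable_hosts = {"workstation-1": ["CVE-PLACEHOLDER-0001"]}

copied = [
    {"timestamp": "2020-01-01T10:00:00Z", "targetFile": "payroll.xlsx",
     "targetHost": "workstation-1"},
    {"timestamp": "2020-01-01T10:01:00Z", "targetFile": "payroll.xlsx",
     "targetHost": "workstation-2"},
    {"timestamp": "2020-01-01T10:02:00Z", "targetFile": "menu.txt",
     "targetHost": "workstation-1"},
]
for alert in find_sensitive_on_vulnerable(copied, sensitive, vulnerable_hosts):
    print(alert)
```

Requiring both conditions before raising an alert is what ties into the goal of reducing false positives: a sensitive file on a host with no known vulnerabilities, or an ordinary file on a vulnerable host, is not reported.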
20. Evaluation
Setup:
• C-Sprite as event extraction engine (3s sliding time-window every 1s)
• Java-based event generator for random file activities (weighted random choices)
• Report the average times over five runs for each experiment
Results:
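Two elements of the setup, weighted random event generation and a 3-second window that slides every second, can be sketched in Python. This is neither the Java generator nor C-Sprite. The weights, the inter-arrival times, and the function names `generate_events` and `sliding_windows` are assumptions chosen for illustration.

```python
import random

EVENT_TYPES = ["created", "modified", "renamed", "copied", "deleted"]
# Hypothetical weights; the weights used in the evaluation are not stated on the slide.
WEIGHTS = [3, 4, 1, 1, 1]


def generate_events(count, seed=None):
    """Generate (timestamp_seconds, event_type) pairs using weighted random choices."""
    rng = random.Random(seed)
    types = rng.choices(EVENT_TYPES, weights=WEIGHTS, k=count)
    events = []
    now = 0.0
    for event_type in types:
        now += rng.uniform(0.0, 1.0)  # assumed inter-arrival time
        events.append((now, event_type))
    return events


def sliding_windows(events, width=3.0, slide=1.0):
    """Yield (window_end, events) for windows (end - width, end] ending every slide seconds."""
    if not events:
        return
    last = events[-1][0]
    k = 1
    # Keep emitting windows until one has ended at or after the last event.
    while (k - 1) * slide < last:
        end = k * slide
        yield end, [e for e in events if end - width < e[0] <= end]
        k += 1


for window_end, window in sliding_windows(generate_events(8, seed=42)):
    print(f"window ending at {window_end:.0f}s: {[t for _, t in window]}")
```

Because the window is wider than the slide, each event appears in up to three consecutive windows, which lets an extraction query pair low-level entries that arrive slightly apart.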
21. Conclusions
• We tackled current challenges in file activity monitoring and analysis (interoperability, contextualization and uniform querying) with Semantic Web technologies.
• We introduced a set of vocabularies.
• We developed a prototype and illustrated how to monitor file system activities,
trace file life-cycles, and enrich them with information to understand their context
(e.g., internal and external background knowledge).
• We demonstrated the applicability of the approach in two scenarios in virtual
environments.
• The results of our evaluation indicate that the approach can effectively extract and
link micro-level operations of multiple operating systems and consolidate them in an
integrated stream of semantically explicit file activity.
22. Future work
• We aim to address the accuracy and scalability limitations of the current approach
• We will investigate the integration of our approach into existing standards (e.g., STIX
and CASE) to increase interoperability for forensic investigation
26. Conceptual Modelling: Background Knowledge
• Internal Knowledge
- System knowledge: captures organization-specific concepts and assets, e.g., hosts, users and network components
- Event knowledge: event definitions and associated extraction patterns, e.g., Authentication (login, logout), File Access (created, copied, removed, etc.)
• External Knowledge
- Cybersecurity information: vulnerability information, weaknesses, indicators of compromise, common attack patterns, e.g., CVE, CVSS, CPE, CWE, CAPEC, etc.
27. Causal Linking of Security Events
28. State of The Art & Research Gaps
• File Activity Monitoring
- Mainly focus on regular expressions, rule-based classification, anomaly detection, and statistical log analysis.
- Focus on data exfiltration by insiders, resulting in a high number of false positives (Hu, 2011)
- Policy-based operating system call provenance (Awad, 2016)
- Lack of interoperability, contextualization and linking to cybersecurity information.
- Limited to specific approaches, e.g. exfiltration from a database (Daren, 2019)
• File System Ontology
- File exploration based on linked data principles using the NEPOMUK File Ontology (NFO)
- NFS (Schand, 2010), VDB-FilePub (Shen, 2011), Semantic File System (SFS) (Mashwani, 2018)
• Semantic Approach for File Access Monitoring & Forensics
- Mostly do not focus on file activity monitoring and life-cycle construction in particular.
- Focus on text processing while file activity is not considered (Amato, 2018)
• Existing Tools & SIEMs
- Existing tools provide simple alerting upon suspicious activity, e.g. SolarWinds Server and Application Monitor, ManageEngine DataSecurity Plus, PA File Insight, STEALTHbits File Activity Monitor, and Decision File Audit.
- Existing SIEMs do not specifically tackle the problem of file activity tracking, e.g. LogDNA, Splunk, ElasticSearch