A short presentation describing the roots and features of the MattockFS forensic file system: a page-cache-friendly, capability-secure, write-once data archive with opportunistic hashing and a local message-bus implementation, geared primarily at digital forensic framework usage.
2. A family tree
● 2002: OCFA Anycast
● 2006: CarvFS
● 2006: Sealed Digital Evidence Bag
● 2008: MinorFS
3. The OCFA Anycast
● Message bus solution
● Based on the workers concept
● Part of the Open Computer Forensics Architecture
● Persistent Priority Queues
● Virtually infinite size queues
4. Issues with the Anycast
● Infinite size queues
● Multiple tools in tool-chain access same data
● Result: Many page-cache misses
5. Spurious reads in OCFA
● Page-cache misses
● Early hashing and Content Addressed Storage
● No locality of data design
● Server-side data entry
6. CarvFS
● Storage requirements traditional file carving
● CarvFS allows for zero-storage carving
● Carved files not copied out but designated
● CarvPath designations
– /mnt/carvfs/mp3/18400+4096_S4096_47912+975.crv
7. Issues with CarvFS
● Read-only access to forensic disk image
● In large cases hundreds of mounted image files
● OCFA hacks
– Bypass CarvFS to write to underlying growing archive
– Inefficient hybrid CarvFS/CAS storage
8. Shared mutable data
● Meta-data
● Evidence data
● Derived evidence data
● Provenance logs
9. Shared mutable data
● Meta-data
● Evidence data
● Derived evidence data
● Provenance logs
● Anti forensics ?
● Forensic process integrity ?
10. Integrity measures
● Trusted immutable data and meta-data
– Lab side sealed digital evidence bags.
● Trusted provenance logging
● Capability based access control
– Sparse capabilities
– Mandatory Access Controls
11. Spurious reads measures
● Opportunistic hashing
● Linux fadvise API
● Integration of message-bus and data-archive
● Priority queues → Sortable sets
● Provide pressure data for throttling purposes
12. MattockFS
● User space file-system for forensic frameworks
● Zero-storage carving
● Hybrid data-archive/message-bus solution
● Helps minimize spurious-read-induced performance degradation
● Provides write-once (meta) data
● Provides safe capability based API
● Provides trusted provenance logs
13. MattockFS: Hooks
● Hooks for forensic frameworks
– Framework based routing functionality
– Pressure data for throttling new-data input
– Queue size for throttling
● Hooks for multi-server setup
– Geared to locality of data
– Migrating CPU intensive jobs
14. MattockFS
Combining a zero-storage-carving-capable, high-integrity data-storage archive with trusted forensic logging, a capability-secure API and a local page-cache-friendly message bus solution.
Editor's Notes
This is a short talk about the MattockFS computer-forensic file-system. MattockFS is a user space file-system that runs on the Linux platform and is geared towards the specific needs of computer-forensic frameworks with respect to data storage and access, and with respect to message-bus facilities.
Before we get into MattockFS itself: you could say MattockFS has four important ancestors, each contributing part of its feature set.
The Open Computer Forensics Architecture had a core component named the Anycast. This was basically what today would be called a message broker or a message bus.
CarvFS introduced the concept of CarvPath designations as a way to use pseudo file-paths to implement a feature known as in-line carving or zero-storage carving, where you could extract files from a disk image without actually copying them out. Then the Sealed Digital Evidence Bag concept and MinorFS introduced system-integrity properties and facilities that, as I will show, are important with respect to the threat of anti-forensics.
Let's start by looking at one of these MattockFS ancestors: the OCFA Anycast. The Anycast was a message bus solution, kind of like RabbitMQ, but geared solely toward the worker concept. It was part of a computer forensic framework named the Open Computer Forensics Architecture, or OCFA. It used the concept of persistent priority queues that for all intents and purposes were infinite in size, a property that we later found out had some drawbacks.
So what issues were there with the Anycast? The problem lies in the very nature of the computer forensic process. In this process, multiple tools in a tool chain are used consecutively on a single piece of data. The infinite-size queues, however, combined with the lack of throttling in the OCFA framework, meant that data would often long since have been evicted from the operating system's page cache by the time the next tool accessed it. This resulted in many page-cache misses and thus many spurious reads from the actual hard disks.
Given the I/O-intensive nature of the computer forensic process, spurious reads can be quite a performance issue for the overall system. Page-cache misses as induced by the Anycast design are one source of such spurious reads, but in OCFA there were more design issues leading to even more spurious reads, either by aggravating the page-cache miss problems or for other reasons. OCFA used early hashing of all data, partially resulting from its primary storage model that was based on Content Addressed Storage, or CAS. This added extra time between the first read and the last read, thus increasing the page-cache issues. Two other spurious-read issues resulted from the lack of a "locality of data" based design and from server-side data entry.
A second ancestor of MattockFS is CarvFS. CarvFS is a user space file-system that allows mounting a single computer forensic disk image and then carving files from unallocated space without actually having to copy the data out. CarvFS introduced the concept of CarvPath designations, like the one you see at the bottom of this slide. These designations are made up primarily of a combination of offset and size that together describe where in the mounted entity the actual data resides.
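To make the designation format concrete, here is a minimal Python sketch — not CarvFS's or MattockFS's actual code — that parses a CarvPath designation like the one on the slide into fragments and materializes the carved entity from an open image file. It assumes a simplified grammar: fragments separated by underscores, "offset+size" for regular fragments, and an "S"-prefixed token for a sparse (zero-filled) section.

```python
# Sketch: parse "18400+4096_S4096_47912+975" into fragments and
# read the carved entity without ever copying it out of the image.
def parse_carvpath(designation):
    fragments = []
    for token in designation.split("_"):
        if token.startswith("S"):
            # Assumed sparse (zero-filled) fragment of the given size.
            fragments.append(("sparse", 0, int(token[1:])))
        else:
            offset, size = token.split("+")
            fragments.append(("data", int(offset), int(size)))
    return fragments

def read_carvpath(image, fragments):
    """Materialize the carved entity from an open disk-image file."""
    chunks = []
    for kind, offset, size in fragments:
        if kind == "sparse":
            chunks.append(b"\x00" * size)
        else:
            image.seek(offset)
            chunks.append(image.read(size))
    return b"".join(chunks)

if __name__ == "__main__":
    print(parse_carvpath("18400+4096_S4096_47912+975"))
```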
While CarvFS was a big step forward for carving out files, in the light of usage within a computer forensic framework there were a number of issues. CarvFS was designed to mount a single disk image; in a large case, hundreds of disk images will be entered. In the case of OCFA this meant that hundreds of disk image files would have to be mounted as user-space file-systems simultaneously, resulting in a prohibitive use of system resources. Another issue was that for adding data in any other way, CAS storage was still needed. A hack was used to combine CarvFS with OCFA that basically had modules bypass CarvFS altogether in order to create one single big archive file. The integration of CarvFS into OCFA never became fully optimal because CAS storage remained such an integral part of its core design.
Modern insights with respect to system integrity force us to look at the use of shared mutable state within computer forensic frameworks. Basically, a forensic framework weaves together a number of tools into suitable tool-chains for different types of data. Each module in such a framework accessing shared mutable data has the theoretic ability to change that data. As none of the modules is expected to be malicious, the historic approach to shared mutable data, also used in OCFA, was to assume this would not be a serious issue.
Enter anti-forensics. Tools used in the tool-chain, just like programs such as web browsers and PDF readers, have vulnerabilities. Remember that the suspect controls what data resides on his disk. There could be exploits targeting specific tools or libraries in the tool-chain, and these exploits could be used to target shared mutable data in our framework, either corrupting the data or possibly even rendering the whole system useless.
With respect to system integrity, experience with OCFA has shown that some modules can simply crash benignly when exposed to certain wrongly identified data files. Given that a random crash can have undefined behavior, there remains a serious risk that even non-anti-forensic crashes might corrupt the system through shared mutable state.
OK, let's start looking at MattockFS now. First, let's look at the system integrity measures that MattockFS introduces. The concept of Sealed Digital Evidence Bags introduced us to the idea that data and meta-data should be sealed. MattockFS uses the fact that it runs under a different user id than the rest of the system to provide trusted write-once immutable data. It also uses this fact to provide a trusted provenance log facility. Data, meta-data and provenance log are protected from tampering by an exploit against one of the tools in the tool-chain. Access to the MattockFS file-system, with respect to anything that might interfere with this, goes through what is called capability-based access control. We implement this in a way we borrowed from the MinorFS least-authority file-system.
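As an illustration of the sparse-capability idea borrowed from MinorFS, here is a minimal Python sketch. The general technique — unguessable HMAC-derived tokens whose possession is the sole proof of authority — is what the slide names; the function names and key handling below are illustrative assumptions, not MattockFS's actual API.

```python
# Sketch of sparse-capability access control in the MinorFS style:
# an unguessable token stands in for the right to access an object.
# SECRET, issue_capability and authorize are illustrative names.
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # known only to the file system

def issue_capability(object_id: str) -> str:
    """Derive an unguessable capability token for an object."""
    return hmac.new(SECRET, object_id.encode(), hashlib.sha256).hexdigest()

def authorize(object_id: str, token: str) -> bool:
    """Verify a presented token without storing per-object state."""
    return hmac.compare_digest(issue_capability(object_id), token)
```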
Now that we have tackled the anti-forensics and integrity concerns, let's look at what MattockFS does to attenuate our spurious-reads problem. First of all, MattockFS has built-in hashing functionality. Whenever data is read from or written to in MattockFS, MattockFS checks whether it can use this data to advance the hash calculation of any CarvPath entity the data is part of. Secondly, MattockFS communicates with the Linux kernel regarding archive data being accessed: it informs the kernel what data it will soon want to access again and what data it won't. This should allow the kernel to be smarter about page-cache handling. In order to allow the message bus and storage system to work together effectively, MattockFS integrates the message bus into the file-system. It provides multiple policies for picking messages that are most beneficial to opportunistic hashing or page-cache pressure. Finally, MattockFS provides data to the module framework that should allow the framework to properly implement throttling.
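Two of these measures can be sketched in a few lines of Python. The fragment below hashes data while it is still resident in the page cache and then uses the Linux fadvise API to tell the kernel the pages may be dropped. It is a simplified per-file illustration of the idea, not MattockFS's implementation, which hashes opportunistically across CarvPath entities; the hash algorithm is chosen for illustration only.

```python
# Sketch: hash bytes we had to read anyway (opportunistic hashing),
# then advise the kernel that the pages may be evicted.
import hashlib
import os

def hash_while_reading(path, chunk=1 << 20):
    digest = hashlib.sha256()  # algorithm chosen for illustration
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            digest.update(data)  # advance the hash while data is cached
            # Tell the kernel these pages will not be needed again.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return digest.hexdigest()
```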
A quick summary of the main features of MattockFS. MattockFS is a user-space file-system for Linux that has facilities for zero-storage carving. It is a hybrid data-archive and message-bus system geared specifically at the needs of the computer forensic process. One of its primary goals is the minimization of spurious hard disk reads that hinder overall system performance. Next to performance, MattockFS provides multiple anti-anti-forensics facilities: write-once access to data and meta-data, trusted provenance logs, and a secure capability-based form of access control for modules communicating with the file-system.
There are some things that MattockFS only facilitates by providing hooks. MattockFS expects the module framework on top of it to implement routing and file-type determination functionality; it provides the routing functionality with a possibility to maintain state. The module framework is also expected to implement some form of throttling, especially for entering new data into the system and, to a lesser extent, for adding new messages into the system. MattockFS provides data about impending page-cache pressure and about how much data and how many messages are queued for the different modules.
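A framework-side throttling loop built on such hooks might look like the following Python sketch. The cache_pressure, queue_size and submit accessors are hypothetical stand-ins for whatever the real MattockFS API exposes, and the thresholds and back-off policy are illustrative.

```python
# Hypothetical framework-side throttling built on MattockFS hooks.
# fs.cache_pressure(), fs.queue_size() and fs.submit() are assumed
# accessors, not MattockFS's real API.
import time

PRESSURE_HIGH = 0.8   # illustrative page-cache pressure threshold
QUEUE_LIMIT = 10_000  # illustrative per-module queue cap

def ingest_with_throttling(fs, module, new_evidence):
    for item in new_evidence:
        # Back off while the page cache is under pressure or the
        # target module's queue has grown too long.
        while (fs.cache_pressure() > PRESSURE_HIGH
               or fs.queue_size(module) > QUEUE_LIMIT):
            time.sleep(1.0)
        fs.submit(module, item)
```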
Finally, MattockFS provides hooks for creating multi-server setups. These are geared toward respecting locality-of-data concerns and migrating only the most CPU-intensive jobs to other servers.
To wrap things up: this, in short, describes what MattockFS can provide. If you have any data-intensive problems that require a message-bus solution, MattockFS is an option you might want to explore. MattockFS is geared primarily at computer forensics, but should also be suitable for non-forensic data-intensive problems that favor a worker approach to concurrency.