A short presentation describing the roots and features of the MattockFS forensic file system: a page-cache-friendly, capability-secure, write-once data archive with opportunistic hashing and a local message-bus implementation, geared primarily at digital forensic framework usage.
2. A family tree
● 2002: OCFA Anycast
● 2006: CarvFS
● 2006: Sealed Digital Evidence Bag
● 2008: MinorFS
3. The OCFA Anycast
● Message bus solution
● Based on the workers concept
● Part of the Open Computer Forensics Architecture
● Persistent Priority Queues
● Virtually infinite size queues
4. Issues with the Anycast
● Infinite size queues
● Multiple tools in tool-chain access same data
● Result: Many page-cache misses
5. Spurious reads in OCFA
● Page-cache misses
● Early hashing and Content Addressed Storage
● No locality of data design
● Server-side data entry
6. CarvFS
● Storage requirements traditional file carving
● CarvFS allows for zero-storage carving
● Carved files not copied out but designated
● CarvPath designations
– /mnt/carvfs/mp3/18400+4096_S4096_47912+975.crv
7. Issues with CarvFS
● Read-only access to forensic disk image
● In large cases hundreds of mounted image files
● OCFA hacks
– Bypass CarvFS to write to underlying growing archive
– Inefficient hybrid CarvFS/CAS storage
8. Shared mutable data
● Meta-data
● Evidence data
● Derived evidence data
● Provenance logs
9. Shared mutable data
● Meta-data
● Evidence data
● Derived evidence data
● Provenance logs
● Anti forensics ?
● Forensic process integrity ?
10. Integrity measures
● Trusted immutable data and meta-data
– Lab side sealed digital evidence bags.
● Trusted provenance logging
● Capability based access control
– Sparse capabilities
– Mandatory Access Controls
11. Spurious reads measures
● Opportunistic hashing
● Linux fadvise API
● Integration of message-bus and data-archive
● Priority queues → Sortable sets
● Provide pressure data for throttling purposes
12. MattockFS
● User space file-system for forensic frameworks
● Zero-storage carving
● Hybrid data-archive/message-bus solution
● Helps minimize spurious-read-induced performance degradation
● Provides write-once (meta) data
● Provides safe capability based API
● Provides trusted provenance logs
13. MattockFS: Hooks
● Hooks for forensic frameworks
– Framework based routing functionality
– Pressure data for throttling new-data input
– Queue size for throttling
● Hooks for multi-server setup
– Geared to locality of data
– Migrating CPU intensive jobs
14. MattockFS
Combining a zero-storage-carving-capable, high-integrity data-storage archive with trusted forensic logging, a capability-secure API and a local page-cache-friendly message bus solution.
Editor's Notes
This is a short talk about the MattockFS computer-forensic file-system. MattockFS is a user space file-system that runs on the Linux platform and is geared towards the specific needs of computer-forensic frameworks with respect to data storage and access, and with respect to message-bus facilities.
Before we get into MattockFS itself: you could say MattockFS has four important ancestors, each contributing part of its feature set.
The Open Computer Forensics Architecture had a core component named the Anycast. This was basically what today would be called a message broker or a message bus.
CarvFS introduced the concept of CarvPath designations as a way to use pseudo file-paths to implement a feature known as in-line carving or zero-storage carving, where you could extract files from a disk image without actually copying them out. Then the Sealed Digital Evidence Bag concept and MinorFS introduced system-integrity properties and facilities that, as I will show, are important with respect to the threat of anti-forensics.
Let's start by looking at one of these MattockFS ancestors: the OCFA Anycast. The Anycast was a message bus solution, kind of like RabbitMQ, but geared solely toward the worker concept. It was part of a computer forensic framework named the Open Computer Forensics Architecture, or OCFA. It used the concept of persistent priority queues that for all intents and purposes were infinite in size, a property that we later found out had some drawbacks.
So what issues were there with the Anycast? The problem lies in the very nature of the computer forensic process. In this process, multiple tools in a tool chain are used consecutively on a single piece of data. The infinite-size queues, however, combined with the lack of throttling in the OCFA framework, meant that data would often long since have been evicted from the operating system's page cache by the time the next tool accessed it. This resulted in many page-cache misses and thus many spurious reads from the actual hard disks.
Given the I/O-intensive nature of the computer forensic process, spurious reads can be quite a performance issue for the overall system. Page-cache misses as induced by the Anycast design are one source of such spurious reads, but in OCFA there were more design issues leading to even more spurious reads, either by aggravating the page-cache miss problems or for other reasons. OCFA used early hashing of all data, partially resulting from its primary storage model that was based on Content Addressed Storage, or CAS. This added extra time between the first read and the last read, thus increasing the page-cache issues. Two other spurious-read issues resulted from the lack of a "locality of data" based design and from server-side data entry.
A second ancestor of MattockFS is CarvFS. CarvFS is a user space file-system that allows mounting a single computer forensic disk image and then carving files from unallocated space without actually having to copy the data out. CarvFS introduced the concept of CarvPath designations, like the one you see at the bottom of this slide. These designations are made up primarily of a combination of offset and size that together describe where in the mounted entity the actual data resides.
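To make the designation format concrete, here is a minimal Python sketch — not CarvFS's or MattockFS's actual code — that parses a CarvPath designation like the one on the slide into fragments and materializes the carved entity from an open image file. It assumes a simplified grammar: fragments separated by underscores, "offset+size" for regular fragments, and an "S"-prefixed token for a sparse (zero-filled) section.

```python
# Sketch: parse "18400+4096_S4096_47912+975" into fragments and
# read the carved entity without ever copying it out of the image.
def parse_carvpath(designation):
    fragments = []
    for token in designation.split("_"):
        if token.startswith("S"):
            # Assumed sparse (zero-filled) fragment of the given size.
            fragments.append(("sparse", 0, int(token[1:])))
        else:
            offset, size = token.split("+")
            fragments.append(("data", int(offset), int(size)))
    return fragments

def read_carvpath(image, fragments):
    """Materialize the carved entity from an open disk-image file."""
    chunks = []
    for kind, offset, size in fragments:
        if kind == "sparse":
            chunks.append(b"\x00" * size)
        else:
            image.seek(offset)
            chunks.append(image.read(size))
    return b"".join(chunks)

if __name__ == "__main__":
    print(parse_carvpath("18400+4096_S4096_47912+975"))
```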
While CarvFS was a big step forward for carving out files, in the light of usage within a computer forensic framework there were a number of issues. CarvFS was designed to mount a single disk image; in a large case, hundreds of disk images will be entered. In the case of OCFA this meant that hundreds of disk image files would have to be mounted as user-space file-systems simultaneously, resulting in a prohibitive use of system resources. Another issue was that for adding data in any other way, CAS storage was still needed. A hack was used to combine CarvFS with OCFA that basically had modules bypass CarvFS altogether in order to create one single big archive file. The integration of CarvFS into OCFA never became fully optimal because CAS storage remained such an integral part of its core design.
Modern insights with respect to system integrity force us to look at the use of shared mutable state within computer forensic frameworks. Basically, a forensic framework weaves together a number of tools into suitable tool-chains for different types of data. Each module in such a framework accessing shared mutable data has the theoretic ability to change that data. As none of the modules is expected to be malicious, the historic approach to shared mutable data, also used in OCFA, was to assume this would not be a serious issue.
Enter anti-forensics. Tools used in the tool-chain, just like programs such as web browsers and PDF readers, have vulnerabilities. Remember that the suspect controls what data resides on his disk. There could be exploits targeting specific tools or libraries in the tool-chain, and these exploits could be used to target shared mutable data in our framework, either corrupting the data or possibly even rendering the whole system useless.
With respect to system integrity, experience with OCFA has shown that some modules can simply crash benignly when exposed to certain wrongly identified data files. Given that a random crash can have undefined behavior, there remains a serious risk that even non-anti-forensic crashes might corrupt the system through shared mutable state.
OK, let's start looking at MattockFS now. First, let's look at the system integrity measures that MattockFS introduces. The concept of Sealed Digital Evidence Bags introduced us to the idea that data and meta-data should be sealed. MattockFS uses the fact that it runs under a different user id than the rest of the system to provide trusted write-once immutable data. It also uses this fact to provide a trusted provenance log facility. Data, meta-data and provenance log are protected from tampering by an exploit against one of the tools in the tool-chain. Access to the MattockFS file-system, with respect to anything that might interfere with this, goes through what is called capability-based access control. We implement this in a way we borrowed from the MinorFS least-authority file-system.
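As an illustration of the sparse-capability idea borrowed from MinorFS, here is a minimal Python sketch. The general technique — unguessable HMAC-derived tokens whose possession is the sole proof of authority — is what the slide names; the function names and key handling below are illustrative assumptions, not MattockFS's actual API.

```python
# Sketch of sparse-capability access control in the MinorFS style:
# an unguessable token stands in for the right to access an object.
# SECRET, issue_capability and authorize are illustrative names.
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # known only to the file system

def issue_capability(object_id: str) -> str:
    """Derive an unguessable capability token for an object."""
    return hmac.new(SECRET, object_id.encode(), hashlib.sha256).hexdigest()

def authorize(object_id: str, token: str) -> bool:
    """Verify a presented token without storing per-object state."""
    return hmac.compare_digest(issue_capability(object_id), token)
```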
Now that we have tackled the anti-forensics and integrity concerns, let's look at what MattockFS does to attenuate our spurious-reads problem. First of all, MattockFS has built-in hashing functionality. Whenever data is read from or written to in MattockFS, MattockFS checks whether it can use this data to advance the hash calculation of any CarvPath entity the data is part of. Secondly, MattockFS communicates with the Linux kernel regarding archive data being accessed: it informs the kernel what data it will soon want to access again and what data it won't. This should allow the kernel to be smarter about page-cache handling. In order to allow the message bus and storage system to work together effectively, MattockFS integrates the message bus into the file-system. It provides multiple policies for picking messages that are most beneficial to opportunistic hashing or page-cache pressure. Finally, MattockFS provides data to the module framework that should allow the framework to properly implement throttling.
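Two of these measures can be sketched in a few lines of Python. The fragment below hashes data while it is still resident in the page cache and then uses the Linux fadvise API to tell the kernel the pages may be dropped. It is a simplified per-file illustration of the idea, not MattockFS's implementation, which hashes opportunistically across CarvPath entities; the hash algorithm is chosen for illustration only.

```python
# Sketch: hash bytes we had to read anyway (opportunistic hashing),
# then advise the kernel that the pages may be evicted.
import hashlib
import os

def hash_while_reading(path, chunk=1 << 20):
    digest = hashlib.sha256()  # algorithm chosen for illustration
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            digest.update(data)  # advance the hash while data is cached
            # Tell the kernel these pages will not be needed again.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return digest.hexdigest()
```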
A quick summary of the main features of MattockFS. MattockFS is a user-space file-system for Linux that has facilities for zero-storage carving. It is a hybrid data-archive and message-bus system geared specifically at the needs of the computer forensic process. One of its primary goals is the minimization of spurious hard disk reads that hinder overall system performance. Next to performance, MattockFS provides multiple anti-anti-forensics facilities: write-once access to data and meta-data, trusted provenance logs, and a secure capability-based form of access control for modules communicating with the file-system.
There are some things that MattockFS only facilitates by providing hooks. MattockFS expects the module framework on top of it to implement routing and file-type determination functionality; it provides the routing functionality with a possibility to maintain state. The module framework is also expected to implement some form of throttling, especially for entering new data into the system and, to a lesser extent, for adding new messages into the system. MattockFS provides data about impending page-cache pressure and about how much data and how many messages are queued for the different modules.
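A framework-side throttling loop built on such hooks might look like the following Python sketch. The cache_pressure, queue_size and submit accessors are hypothetical stand-ins for whatever the real MattockFS API exposes, and the thresholds and back-off policy are illustrative.

```python
# Hypothetical framework-side throttling built on MattockFS hooks.
# fs.cache_pressure(), fs.queue_size() and fs.submit() are assumed
# accessors, not MattockFS's real API.
import time

PRESSURE_HIGH = 0.8   # illustrative page-cache pressure threshold
QUEUE_LIMIT = 10_000  # illustrative per-module queue cap

def ingest_with_throttling(fs, module, new_evidence):
    for item in new_evidence:
        # Back off while the page cache is under pressure or the
        # target module's queue has grown too long.
        while (fs.cache_pressure() > PRESSURE_HIGH
               or fs.queue_size(module) > QUEUE_LIMIT):
            time.sleep(1.0)
        fs.submit(module, item)
```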
Finally, MattockFS provides hooks for creating multi-server setups. These are geared toward respecting locality-of-data concerns and migrating only the most CPU-intensive jobs to other servers.
To wrap things up: this, in short, describes what MattockFS can provide. If you have any data-intensive problems that require a message-bus solution, MattockFS is an option you might want to explore. MattockFS is geared primarily at computer forensics, but should also be suitable for non-forensic data-intensive problems that favor a worker approach to concurrency.