Secure provenance can provide integrity and confidentiality assurances for document histories. It aims to prevent undetectable history rewriting through techniques like cryptographic checksums and signatures. For most real-life workloads, the overhead of recording and verifying provenance information is between 1% and 15%. Provenance is useful for applications like data quality assessment, replication of data derivation processes, attribution of copyright, and providing audit trails.
Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance
1. Fake Picassos, Tampered History, and Digital Forgery: Protecting the Genealogy of Bits with Secure Provenance. Ragib Hasan, NSF Computing Innovation Fellow, Dept. of Computer Science, Johns Hopkins University. [Joint work with Marianne Winslett (UIUC) and Radu Sion (Stony Brook)] Computer Science Seminar, Johns Hopkins University, February 2, 2010
2. Let’s play a game: can you spot the fake Picasso? One is real, worth $101.8 million; the other, listed on eBay, is a fake worth nothing.
3. So, how do art buyers authenticate art? Among other things, they look at provenance records. Provenance, from the Latin provenire, ‘come from’, is defined as “(i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; a record of the ultimate derivation and passage of an item through its various owners” (Oxford English Dictionary). In other words: who owned it, what was done to it, and how it was transferred. Provenance is widely used in art, archives, and archeology, where it is called the fundamental principle of archival science. http://moma.org/collection/provenance/items/644.67.html L'artiste et son modèle (1928), at the Museum of Modern Art
4. Provenance of a book: owner names recorded in sequential order on the left side.
5. Let’s consider the digital world. Our life today has become increasingly dependent on digital data; our most valuable asset is data. Data is generated, processed, and transmitted between different systems and principals, and stored in databases and storage. Am I getting back untampered data? Was this data created and processed by persons I trust? Unlike data processing in the past, digital data can be rapidly copied, modified, and erased. To trust data we receive from others or retrieve from storage, we need to look into the integrity of both the present state and the past history of data.
6. Data history is especially important for mobile data. Example: a celebrity death hoax and the resulting tweet/retweet cycle. Hoaxes propagate fast, so before retweeting, do check the original source.
7. The origin of data is important to measure its quality. In the figure, an unreliable lab’s faulty experiments (Dr. Evil) create a tainted dataset, which is then used by many unsuspecting scientists, including a reliable lab, to produce new data. Dr. Jones got the data from a lab he trusts, but the original data was tainted, so any data derived from it will also be tainted: tainted data only produces more tainted results.
8. Bottom line: history matters. The origins of a data object can affect its quality and usefulness, and the transformation path that a data object took affects its trustworthiness. In other words, data provenance is extremely important for deciding data quality and trustworthiness.
9. What exactly is data provenance? Definitions*: “Description of the origins of data and the process by which it arrived at the database” [Buneman et al.]. “Information describing materials and transformations applied to derive the data” [Lanter]. “Metadata recording the process of experiment workflows, annotations, and notes about experiments” [Greenwood]. “Information that helps determine the derivation history of a data product, starting from its original sources” [Simmhan et al.]. *Simmhan et al. A Survey of Provenance in E-Science. SIGMOD Record, 2005.
11. What was the common theme of all those systems? They were all scientific computing systems, and scientists trust people (more or less). Previous research covers provenance collection, annotation, querying, and workflow, but security issues are not handled. For provenance in untrusted environments, we need integrity, confidentiality, and privacy guarantees. So, we need provenance of provenance, i.e., a model for secure provenance data.
12. “History is not history unless it is the truth.” (Abraham Lincoln, 1856)
13. Problem statement: can we design an efficient system for securing the provenance of documents, while allowing auditing with fine-grained preservation of confidentiality?
14. Previous work on integrity assurances. (Logically) centralized repositories (CVS, Subversion, Git): changes to files are recorded, but this is not applicable to mobile documents. File systems with integrity assurances (SUNDR, PASIS, TCFS): provide local integrity checking, but do not apply to data that traverses systems. System state entanglement (Baker 02): entangle one system’s state with another, so others can serve as witnesses to a system’s state; not applicable to mobile data. Secure audit logs/trails (Schneier and Kelsey 99; LogCrypt, Holt 2004; Peterson et al. 2006): a trusted notary certifies logs, or a trusted third party is given the hash chain seed.
15. Secure provenance means preventing undetectable history rewriting: prevent undetectable modification of history, such that adversaries cannot insert fake events into, or remove genuine events from, a document’s provenance, and no one can deny the history of their own actions. It also allows fine-grained preservation of privacy and confidentiality of actions: users can choose which auditors can see the details of their work, and attributes can be selectively disclosed or hidden without harming the integrity check.
22. What realistic guarantees can we provide against a strong adversary? We can’t expect or force the adversary to play by the books, and access control is not feasible in a loosely coupled distributed environment. So, we can only verify the plausibility of a given history.
23. A plausible history is something that could have happened. Plausible history: the version history that would result if a document were created and subsequently edited and transferred from user to user, with provenance information correctly and indelibly recorded all along the way. A document can have multiple plausible histories.
24. Original scheme: ensure verification of plausible history. Given a document and a provenance chain, we want to verify whether this history could have happened. The forger’s goal: to create a plausible history involving honest users.
25. Analogy: the forger and the Prada. The forger’s goal is to create a plausible history for a fake Prada bag (street price $50), i.e., make it appear to come from real Prada distributors via the real supply chain. The forger’s goal is NOT to replace a real Prada bag’s (price $3,000) plausible history to show it was made elsewhere.
26. Our solution: overview. Provenance is recorded as a chain of entries P1, P2, P3, …, Pn-1, Pn. Each provenance entry Pi carries the principal’s identifying context (user ID, process ID, host, IP, time) and public key, plus four fields: Ui, the identity of the principal (lineage); Wi, the encrypted modification log; Ki, the confidentiality locks for Wi; and Ci, the integrity checksum(s).
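To make the entry layout and the checksum chaining concrete, here is a minimal C sketch of one chain entry. The field names, buffer sizes, and checksum construction are illustrative assumptions based on the legend above, not the actual SPROV on-disk format; the deck’s scheme uses SHA-1 and DSA signatures (slide 41), while this sketch uses SHA-256 via OpenSSL.

```c
/* Illustrative sketch of one provenance-chain entry. Field names,
 * sizes, and hashing are assumptions, not the real SPROV format. */
#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

#define HASH_LEN SHA256_DIGEST_LENGTH

typedef struct {
    char     user_id[64];        /* Ui: identity of the principal */
    char     host[64];           /* context: host name */
    char     ip[16];             /* context: IP address */
    uint64_t timestamp;          /* context: time of modification */
    uint8_t  wlog_enc[256];      /* Wi: encrypted modification log */
    size_t   wlog_len;
    uint8_t  key_locks[128];     /* Ki: confidentiality locks for Wi */
    size_t   locks_len;
    uint8_t  checksum[HASH_LEN]; /* Ci: chained integrity checksum */
} prov_entry;

/* Ci = H(C_{i-1} || Ui || Wi): each checksum binds an entry to its
 * predecessor, so inserting, removing, or reordering entries breaks
 * verification of every later checksum. (The real scheme also signs
 * Ci with the principal's private key.) */
void chain_checksum(const uint8_t prev[HASH_LEN], const prov_entry *e,
                    uint8_t out[HASH_LEN])
{
    SHA256_CTX ctx;
    SHA256_Init(&ctx);
    if (prev)
        SHA256_Update(&ctx, prev, HASH_LEN);  /* link to C_{i-1} */
    SHA256_Update(&ctx, e->user_id, strlen(e->user_id));
    SHA256_Update(&ctx, e->wlog_enc, e->wlog_len);
    SHA256_Final(out, &ctx);
}
```

The chaining is the point: because each Ci depends on Ci-1, a forger cannot splice entries in or out of the middle of the chain without invalidating every checksum that follows.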
28. Only the auditor(s) trusted by the user can see the user’s actions on the document: each entry’s modification log is encrypted, and multiple auditors can each be given access to it. Optimization: use a broadcast encryption tree to reduce the number of required keys.
29. Sidenote: broadcast encryption. Organize the keys as a binary tree with the auditors (A, B, C, D) at the leaves; each principal knows all the keys on the path from its leaf to the root. To trust everyone, use the root key; to trust a subset, use the smallest set of subtree keys covering exactly that subset (e.g., to trust only D, use D’s leaf key; to trust A, C, and D, use A’s leaf key plus the key of the subtree above C and D).
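The sketch below, assuming a complete binary key tree in heap layout (root at node 1, the N auditors at leaf nodes N..2N-1), computes the minimal set of subtree keys covering a chosen set of auditors. The layout and names are hypothetical, purely for illustration.

```c
/* Sketch: pick the minimal set of key-tree nodes whose subtrees cover
 * exactly the trusted auditors. Node 1 is the root; node k's children
 * are 2k and 2k+1. Illustrative only. */
#include <stdio.h>

#define N 4   /* auditors A, B, C, D at leaf nodes 4..7 */

/* Returns 1 if every leaf under `node` is trusted (so the caller can
 * cover the whole subtree with this node's key); otherwise emits the
 * covering node ids for the trusted parts of the two subtrees. */
static int cover(int node, const int trusted[N], int keys[], int *nkeys)
{
    if (node >= N)                        /* leaf */
        return trusted[node - N];
    int l = cover(2 * node,     trusted, keys, nkeys);
    int r = cover(2 * node + 1, trusted, keys, nkeys);
    if (l && r) return 1;                 /* defer to the parent key */
    if (l) keys[(*nkeys)++] = 2 * node;
    if (r) keys[(*nkeys)++] = 2 * node + 1;
    return 0;
}

int main(void)
{
    int trusted[N] = {1, 0, 1, 1};        /* trust A, C, and D */
    int keys[N], nkeys = 0;
    if (cover(1, trusted, keys, &nkeys))
        keys[nkeys++] = 1;                /* everyone trusted: root key */
    /* Prints node 4 (leaf A) and node 3 (subtree covering C and D):
     * the log key is wrapped once per printed node rather than once
     * per trusted auditor. */
    for (int i = 0; i < nkeys; i++)
        printf("wrap log key under node %d's key\n", keys[i]);
    return 0;
}
```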
31. Fine-grained control over confidentiality. A classified document’s provenance chain contains sensitive information, but simply deleting that information when declassifying/releasing a redacted (unclassified) version would break the integrity checks. Instead, each entry separates nonsensitive information from sensitive information and stores a commitment, commit(sensitive info), over the sensitive attributes; the checksum is calculated over the nonsensitive information plus the commitment. A blinded entry (nonsensitive info and the commitment) can then be disclosed to a third party without harming the integrity check.
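Here is a minimal sketch of that commit-then-checksum idea, assuming a salted SHA-256 hash as the commitment scheme (the slide does not specify the paper’s actual commitment construction):

```c
/* Sketch of commit-then-checksum redaction: the entry checksum covers
 * commit(sensitive) rather than the sensitive bytes, so the sensitive
 * field can be withheld without breaking verification. A salted
 * SHA-256 hash stands in for the paper's commitment scheme. */
#include <stdio.h>
#include <string.h>
#include <openssl/rand.h>
#include <openssl/sha.h>

#define H SHA256_DIGEST_LENGTH

/* commit = SHA256(salt || sensitive); the random salt keeps a
 * low-entropy field from being brute-forced out of the commitment. */
static void commit(const unsigned char salt[16], const char *sensitive,
                   unsigned char out[H])
{
    SHA256_CTX c;
    SHA256_Init(&c);
    SHA256_Update(&c, salt, 16);
    SHA256_Update(&c, sensitive, strlen(sensitive));
    SHA256_Final(out, &c);
}

int main(void)
{
    unsigned char salt[16], cm[H], chk[H];
    RAND_bytes(salt, sizeof salt);

    const char *nonsensitive = "edited section 3 on host lab-7";
    commit(salt, "classified: project codenames", cm);

    /* The entry checksum covers (nonsensitive || commitment). A third
     * party given only {nonsensitive, cm} can verify chk; a trusted
     * auditor additionally receives {salt, sensitive} and recomputes
     * cm to open the hidden field. */
    SHA256_CTX c;
    SHA256_Init(&c);
    SHA256_Update(&c, nonsensitive, strlen(nonsensitive));
    SHA256_Update(&c, cm, H);
    SHA256_Final(chk, &c);
    printf("blinded entry checksum: %02x%02x...\n", chk[0], chk[1]);
    return 0;
}
```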
32. We can summarize provenance chains to save space and make audits fast. In a 1:1 chain, each entry has one checksum, calculated from the single previous checksum. In an n:1 chain, each entry has n checksums, each calculated from one previous checksum. This lets us systematically remove entries from the chain while still being able to prove the integrity of the chain.
34. Co-provenance provides stronger guarantees. Problem: we can only tell whether a given provenance chain shows what could have happened, not exactly what really happened. Solution: entangle the chains of related documents. Each entry now contains n checksums, each computed using the checksum of the last entry of the provenance chain of a related document.
36. Implementation options. Secure provenance management can be implemented in the kernel layer (transparent to user apps, but less portable), the file system layer (also transparent to user apps, but still less portable), or the application layer (needs slight modification of apps, but highly portable).
38. SPROV is an application-layer library in C that records all write data and related context information in the provenance entry. It provides the file system APIs from stdio.h and can be added to existing applications with almost no change to code: to add secure provenance, simply relink applications against the sprov library instead of plain stdio.
39. Implementation: how SPROV works. SPROV supplies modified fopen/fwrite/fclose calls, so file I/O triggers the provenance chain handling mechanism: calls to fwrite log the modifications, and calls to fclose write a new provenance entry and compute a new checksum.
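The slide does not show the wrapper code itself; the following is a plausible C sketch of how such stdio interposition can work. The sprov_* names and internal helpers are assumptions for illustration, not the real SPROV API.

```c
/* sprov.h -- an application includes this (and relinks against
 * libsprov) instead of relying on bare stdio. Hypothetical names. */
#include <stdio.h>

FILE  *sprov_fopen(const char *path, const char *mode);
size_t sprov_fwrite(const void *buf, size_t sz, size_t n, FILE *f);
int    sprov_fclose(FILE *f);

/* Hypothetical internal helpers, for illustration only. */
void sprov_log_write(FILE *f, const void *buf, size_t len);
void sprov_seal_entry(FILE *f);

/* Redirect the stdio names so existing application code needs no
 * edits: a rebuild alone adds provenance handling. */
#define fopen(p, m)        sprov_fopen(p, m)
#define fwrite(b, s, n, f) sprov_fwrite(b, s, n, f)
#define fclose(f)          sprov_fclose(f)

/* sprov.c -- wrapper bodies; the parenthesized names bypass the
 * function-like macros above and call the real stdio functions. */
size_t sprov_fwrite(const void *buf, size_t sz, size_t n, FILE *f)
{
    sprov_log_write(f, buf, sz * n); /* log the modification + context */
    return (fwrite)(buf, sz, n, f);  /* then perform the real write */
}

int sprov_fclose(FILE *f)
{
    sprov_seal_entry(f);             /* append a provenance entry and
                                        compute the new chained checksum */
    return (fclose)(f);
}
```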
40. Experiment goals: measure the run-time overhead of secure provenance. Run the benchmarks with and without secure provenance, calculate the % overhead, and observe the effects of different types of workloads.
41. Experimental settings. Crypto settings: 1024-bit DSA signatures, 128-bit AES encryption, SHA-1 for hashes. Experiments: the Postmark benchmark (performance of writes on small files) and a hybrid workload benchmark (performance with a real-life file system distribution and workloads).
44. Hybrid workloads: simulating real file systems. The file size distribution in real file systems follows the log-normal distribution [Bolosky and Douceur 99]. We created a file system with 20,000 files using the log-normal parameters mu = 8.46, sigma = 2.4 (median file size = 4 KB, mean file size = 80 KB). In addition, we included a few large (1 GB+) files.
50. Typical real-life workloads incur 1–13% overhead. INS and RES are read-intensive (80%+ reads), so overheads are very low in both cases. CIFS-corp and CIFS-eng have a 2:1 ratio of reads to writes; overheads are still low (ranging from 2.5% to 12%). EECS has a very high write load (82%+), so the overhead is higher, but still less than 35% for Config-Disk and less than 7% for Config-RD.
52. Application-specific history requires different techniques. History integrity in databases: we developed a solution that leverages WORM storage to provide history integrity for database tuples; it incurs only 1%–3% overhead on TPC-C run time, reduces the window of vulnerability 5-fold, and reduces audit times 100-fold over the previous state-of-the-art solution. Another direction: history integrity in cloud computing.
53. Beyond provenance. Provenance, versions, and snapshots are special cases of a more general property: remembrance. By enabling different objects to recall their past, we can introduce richer functionality, for example: 1.5th-factor authentication, which uses “what you remember” to strengthen any other authentication factor; and flexible security policies, which use knowledge of the past of attributes/variables when evaluating security policies. Ragib Hasan, Radu Sion, and Marianne Winslett, “Remembrance: The Unbearable Sentience of Being Digital”, CIDR 2009.
54. Papers: Ragib Hasan, Radu Sion, and Marianne Winslett, “Introducing Secure Provenance: Problems and Challenges”, ACM StorageSS 2007. Ragib Hasan, Radu Sion, and Marianne Winslett, “The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance”, USENIX FAST 2009 and ACM TOS, Dec 2009 issue. Ragib Hasan, Radu Sion, and Marianne Winslett, “Remembrance: The Unbearable Sentience of Being Digital”, CIDR 2009.
55. Summary: secure provenance is possible at low cost. Yes, we CAN achieve secure provenance with integrity and confidentiality assurances at reasonable overheads: for most real-life workloads, overheads are only between 1% and 15%. More info at http://tinyurl.com/secprov
56. Usage of provenance: data quality (depends on the input data and the actions taken on the data); replication (a recipe for recreating data); attribution (proper attribution of copyright); informational (learn how data was derived); audit trails (proof of due diligence; investigation of problem sources and system intrusions; proof of prior work in patents).