
Auditing and Maintaining Provenance in Software Packages

IPAW 2014

Quan Pham¹, Tanu Malik², Ian Foster¹,²
¹Department of Computer Science and ²Computation Institute,
The University of Chicago,
Chicago, IL 60637, USA
quanpt@cs.uchicago.edu, tanum@ci.uchicago.edu
Presented by Boris Glavic
Illinois Institute of Technology
IPAW14
June 10, 2014
Provenance in Software Packages, June 10, 2014
Outline
1 Introduction
2 Software Pipeline Use Case
3 CDE-SP: Software Provenance in CDE
4 Experiment and Evaluation
5 Related Work
6 Conclusion
Current Solutions for Ensuring Reproducibility and Their Issues
1 Publish source code and data
− GitHub, Figshare, Research Compendia
Pros: (in many cases) easy to accomplish
× Cons: need to recompile and re-execute
2 Publish a software package including source code, data, and environment dependencies
− CDE, RunMyCode.org
Pros: re-execute without installation
× Cons: not easy to combine and merge shared packages
3 Publish a virtual machine image (VMI) that includes the OS, source code, data, and environment
− Cloud BioLinux (NEBC), Swift Appliance (RDCEP)
Pros: no additional modules or components needed to rerun
× Cons: too hard to provision and understand
Reproducibility Problem
Our philosophy:
“... releasing shoddy VMs is easy to do, but it doesn’t help you learn how to do a better job of reproducibility along the way. Releasing software pipelines, however crappy, is on the path towards better reproducibility.”
C. Titus Brown¹
Reproducibility problem: How can we make it easy to combine and merge shared packages while correctly attributing authorship of the software packages?
Neither provisioning VMIs nor publishing only source code and data is needed.
¹ http://ivory.idyll.org/blog/vms-considered-harmful.html
Problem Scope
Use CDE² to capture and create portable software packages
Extend, partially reuse, and combine CDE packages to create new reproducible software pipelines
Attribute authorship of software packages in new software pipelines
CDE has an OVERLAP conflict!
² Guo, P.J., Engler, D.: CDE: Using system call interposition to automatically create portable software packages. USENIX Association, Portland, OR (2011)
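The slide does not show how CDE-SP resolves overlaps. The following Python sketch only illustrates the problem of combining packages while keeping attribution.

It assumes that each package can be summarized as a manifest mapping absolute file paths to content digests. The function `merge_packages`, the manifest format, and the placeholder digests (`"h1"`, `"h2"`, ...) are hypothetical and are not part of CDE or CDE-SP.

A path counts as overlapping when two packages ship it with different content.

```python
from collections import defaultdict


def merge_packages(packages: dict[str, dict[str, str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Merge package manifests and report overlapping paths (hypothetical sketch).

    packages maps a package name (standing in for its author) to a manifest
    {absolute_path: content_digest}. Returns (provenance, conflicts):
      provenance -- {path: sorted names of the packages that contain it}
      conflicts  -- sorted paths that several packages contain with
                    *different* digests, i.e. paths that cannot simply be
                    merged into one directory tree
    """
    provenance = defaultdict(set)
    digests = defaultdict(set)
    for name, manifest in packages.items():
        for path, digest in manifest.items():
            provenance[path].add(name)   # remember every package that ships this path
            digests[path].add(digest)    # remember every distinct version of it
    conflicts = sorted(path for path, seen in digests.items() if len(seen) > 1)
    return {path: sorted(names) for path, names in provenance.items()}, conflicts


if __name__ == "__main__":
    pkg_a = {"/usr/bin/python": "h1", "/usr/lib/libz.so.1": "h2", "/home/a/run.py": "h3"}
    pkg_b = {"/usr/bin/python": "h1", "/usr/lib/libz.so.1": "h9", "/home/b/plot.py": "h4"}
    provenance, conflicts = merge_packages({"A": pkg_a, "B": pkg_b})
    print(conflicts)                       # ['/usr/lib/libz.so.1']
    print(provenance["/usr/bin/python"])   # ['A', 'B']
```

Identical files shared by both packages merge cleanly and are attributed to both. A library captured in two different versions is reported instead of being silently overwritten.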
CDE
Creates a portable software package that runs without installation, configuration, or privileged permissions
Uses an audit mode to create a CDE package
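CDE itself intercepts system calls during an audited run (see the reference above) and copies the files the program touches into the package. The minimal Python sketch below covers only the copying step.

It assumes a POSIX system and a list of accessed absolute paths that has already been collected. The helper `package_accessed_files` is hypothetical, and the package layout (files placed under the package directory at their original absolute paths) is a simplification.

```python
import os
import shutil


def package_accessed_files(accessed_paths: list[str], package_dir: str) -> list[str]:
    """Copy every accessed file into package_dir, mirroring its absolute path.

    accessed_paths: absolute POSIX paths observed while auditing a run
                    (assumed input; collecting the trace is not shown).
    Returns the sorted list of paths that were actually copied.
    """
    copied = []
    for path in sorted(set(accessed_paths)):     # deduplicate repeated accesses
        if not os.path.isfile(path):
            continue  # skip directories and files that no longer exist
        # "/usr/lib/x.so" -> "<package_dir>/usr/lib/x.so"
        dest = os.path.join(package_dir, path.lstrip(os.sep))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)  # copy contents plus metadata (mode, mtime)
        copied.append(path)
    return copied
```

For example, an audited run that read `/usr/lib/libz.so.1` would leave a copy at `<package_dir>/usr/lib/libz.so.1`. Mirroring absolute paths is what lets a packaged program later find its files without installing anything.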
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
 
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
10 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 202410 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 2024
 

Auditing and Maintaining Provenance in Software Packages

  • 1. Auditing and Maintaining Provenance in Software Packages
      Quan Pham (1), Tanu Malik (2), Ian Foster (1, 2)
      (1) Department of Computer Science and (2) Computation Institute, The University of Chicago, Chicago, IL 60637, USA
      quanpt@cs.uchicago.edu, tanum@ci.uchicago.edu
      Presented by Boris Glavic, Illinois Institute of Technology
      IPAW14, June 10th, 2014
  • 2. Outline
      1. Introduction
      2. Software Pipeline Use Case
      3. CDE-SP: Software Provenance in CDE
      4. Experiment and Evaluation
      5. Related Work
      6. Conclusion
  • 3. Current Solutions for Ensuring Reproducibility and Issues
      1. Publish source code and data (GitHub, Figshare, Research Compendia)
         Pros: (in many cases) easy to accomplish
         Cons: need to recompile and re-execute
      2. Publish a software package including source code, data, and environment dependencies (CDE, RunMyCode.org)
         Pros: re-execute without installation
         Cons: not easy to combine and merge shared packages
      3. Publish a virtual machine image (VMI) that includes the OS, source code, data, and environment (Cloud BioLinux (NEBC), Swift Appliance (RDCEP))
         Pros: no additional modules or components needed to rerun
         Problem: too hard to provision and understand
  • 4. Reproducibility Problem
      Our philosophy: "... releasing shoddy VMs is easy to do, but it doesn't help you learn how to do a better job of reproducibility along the way. Releasing software pipelines, however crappy, is on the path towards better reproducibility." (C. Titus Brown [1])
      Reproducibility problem: How can we make it easy to combine and merge shared packages while correctly attributing authorship of software packages?
      There is no need to provision VMIs or to publish only source code and data.
      [1] http://ivory.idyll.org/blog/vms-considered-harmful.html
  • 5. Problem Scope
      - Use CDE [2] to capture and create portable software packages
      - Extend, partially re-use, and combine CDE packages to create new reproducible software pipelines
      - Attribute authorship of software packages in new software pipelines
      CDE has an OVERLAP conflict!
      [2] Guo, P.J., Engler, D.: CDE: using system call interposition to automatically create portable software packages. USENIX Association, Portland, OR (2011)
  • 6. CDE
      - Creates a portable software package without installation, configuration, or elevated privileges
      - Audit mode to create a CDE package
  • 14. CDE - Execution Mode
  • 20. Software Pipelines Contain CDE Packages
      - A software pipeline consists of many individual software modules
      - A software module depends on externally developed libraries
      - A software module is often packaged together with specific versions of libraries
  • 21. RDCEP Use Case
      Alice (A), Bob (B), and Charlie (C) are scientists at the Center for Robust Decision Making on Climate and Energy Policy (RDCEP).
      - A develops data integration methods to produce higher-resolution datasets depicting inferred land use over time.
      - B develops computational models for model-based comparative analysis. B's software environment includes A's software modules to produce high-resolution datasets.
      - C uses A's and B's software modules within data-intensive computing methods to run them in parallel.
      The Center wants to predict future yields of staple agricultural commodities given changes in the climate.
      [Diagram: C's package (merged from B's), B's package (from A's), and A's package. Stages shown: Parallel init, Aggregation, Generate images, Model-based analysis, Parallel summary; and Retrieve data, Aggregation, Generate images, Model-based analysis.]
  • 22. A's Experiment & Package
      A's package:
        cde-root/
          <path to A's files>/
            a-experiment.sh  retrieve-data  aggregation  generate-image  f1  f2  a-output
          <path to common libs>/
            libc.so
      Re-execute A's experiment:
        $ cde-exec a-experiment.sh
        $ cat a-experiment.sh
        ./retrieve-data f1
        ./aggregation f1 f2
        ./generate-image f2 a-output
  • 23. B's Experiment & Package
      B's package:
        cde-root/
          <path to A's files>/ [...]
          <path to B's files>/
            b-experiment.sh  analysis  b-output
          <path to common libs>/
            libc.so
      Re-execute B's experiment:
        $ cde-exec b-experiment.sh
        $ cat b-experiment.sh
        cd <path to A's experiment>
        cde-exec a-experiment.sh
        cd <path to B's files>
        ./analysis <path to A's files>/a-output b-output
  • 24. C's Experiment & Package
      C's package:
        cde-root/
          <path to A's files>/ [...]
          <path to B's files>/ [...]
          <path to C's files>/
            c-experiment.sh  parallel-init  parallel-summary  c-output
          <path to common libs>/
            libc.so
      Re-execute C's experiment:
        $ cde-exec c-experiment.sh
        $ cat c-experiment.sh
        parallel-init <path to A's files>/f4
        cd <path to A's files>
        cde-exec ./aggregation f4 f5
        cde-exec ./generate-image f5 f6
        cd <path to B's files>
        cde-exec ./analysis <path to A's files>/f6 f7
        cd <path to C's files>
        ./parallel-summary <path to B's files>/f7 c-output
  • 25. Dependency Overlap in Multiple cde-root Directories
  • 26. File Overlap of Different Linux Distributions
      Table 1: Ratio of differing files among files having the same path, in 5 popular AMIs (numerator / denominator)
                    RH            SUSE          U12           U13
      Amz     5498 / 23k    3184 / 11k    1203 / 5.4k   1819 / 5.5k
      RH      -             3861 / 12k    1654 / 6.6k   2223 / 6.3k
      SUSE    -             -             1245 / 3.9k   2085 / 6.4k
      U12     -             -             -             8226 / 24k
      For each pair of distributions, the denominator is the number of files having the same path in both, and the numerator is the number of those files whose MD5 checksums differ. Manual pages in the /usr/share/ directory are omitted.
      Amz: Amazon Linux AMI; RH: Red Hat Enterprise Linux 6.4; SUSE: SUSE Linux Enterprise Server 11; U12: Ubuntu Server 12.04.3 LTS; U13: Ubuntu Server 13.10
  • 27. Redirection in Multiple cde-root Directories
  • 28. CDE-SP
      CDE-SP: an enhanced CDE that includes software provenance
      - Describes tools and methods to audit, store, and query provenance
      - Provenance queries:
        - Determine the environment under which a dependency was built
        - Examine the dependencies that must be present
        - Answer whether packages in a pipeline can satisfy a new package
      - Attribute authorship of software packages in a pipeline:
        - Combine and validate authorship from stored provenance
  • 29. CDE-SP Audit
      Objectives
      - Capture additional details of the origins of a library or a binary
      - Use these details for compiling and creating software pipelines
      Methods
      - Create a dependency tree:
        - The system calls of each process are monitored
        - Whenever a process executes a file system call, a dependency of that process is recorded
        - A dependency can be a data file or a shared library
      - Extract information about binaries and required shared libraries:
        - the file, ldd, strings, and objdump UNIX commands
        - uname -a and the function getpwuid(getuid())
  • 32. Storage
      - Store provenance within the package itself
      - Use LevelDB: a fast and lightweight key-value storage library
      - Encode in the key the UNIX process identifier along with the spawn time
      Table 2: LevelDB key-value pairs that store file and process provenance. Words in capitals are arguments.
      Key                       Value        Explanation
      pid.PID1.exec.TIME        PID2         PID1 wasTriggeredBy PID2
      pid.PID.[path, pwd, args] VALUES       Other properties of PID
      io.PID.action.IO.TIME     FILE(PATH)   PID wasGeneratedBy / wasUsedBy FILE(PATH)
      meta.agent                USERNAME     User information
      meta.machine              OSNAME       Operating system distribution
  • 33. Query
      - LevelDB provides a minimal API for querying
      - Simple, lightweight query interface:
        - Input: a program whose dependencies need to be retrieved
        - Output: a GraphViz file displaying file and process dependencies
      - Uses a depth-first search algorithm to create a dependency tree with the input program as its root
      - Exclusion option to remove uninteresting dependencies: /lib/, /usr/lib/, /usr/share/, /etc/
  • 34. Authorship of Software Modules
      - Combine authorship of the contributing packages
      - Validate authorship from the provenance stored in the original package:
        - Generate the subgraph associated with the part of the new package
        - Use subgraph isomorphism (NP-hard) to validate it against the original provenance graph
        - Match provenance nodes of processes with the same binary paths and working directories
        - Match provenance nodes of files with the same path
  • 35. Experiments
      Performance of CDE-SP:
      - Auditing performance overhead
      - Disk storage increase
      - Provenance query runtime
      - Redirection overhead when multiple UUID-based directories are created
      Compare the lightweight virtualization approach of CDE-SP with Kameleon [3], a heavyweight virtualization approach used for reproducibility.
      Experiments were run on an Ubuntu 12.04 LTS workstation with 8 GB of RAM and an 8-core Intel(R) processor clocked at 1600 MHz.
      [3] Emeras, J., Richard, O., Bzeznik, B.: Reconstructing the software environment of an experiment with Kameleon (2011)
  • 36. Performance & Size Overhead
      Pipeline with two applications: Aggregation and Generate Image
      - 2.1% slowdown of CDE-SP vs. 0-30% CDE virtualization overhead [4]
      - LevelDB database size: 236 kB (0.03% package size increase), containing approximately 12,000 key-value pairs
      Table 3: The increase in CDE-SP runtime and disk usage is negligible in comparison with CDE
                      Create Package (s)   Execution (s)   Disk Usage       Provenance Query (s)
      CDE       852.6 ± 2.4          568.8 ± 2.4     732 MB           -
      CDE-SP    870.5 ± 2.5          569.5 ± 1.8     732 MB + 236 kB  0.4 ± 0.03
      [4] Guo, P.J., Engler, D.: CDE: using system call interposition to automatically create portable software packages. USENIX Association, Portland, OR (2011)
  • 37. Redirection Overhead in CDE-SP
      - Piped the output of Aggregation to the input of Generate Image
      - 3 output files of the Aggregation package were moved to the Generate Image package
      - 2 cross-package execve() system calls
      - Less than 1% slowdown of CDE-SP
  • 38. Kameleon
      - Use the Kameleon engine to make a bare-bones VM appliance:
        - Self-written YAML-formatted recipes
        - Self-written macrosteps and microsteps
      - Kameleon can create virtual machine appliances in different formats for different Linux distributions:
        - Generates bash scripts to create an initial virtual image of a Linux distribution
        - Populates the image with more Linux packages
        - Populates it with the content of a CDE-SP package
  • 39. CDE-SP vs. Kameleon
      [Figure 1: Overhead when using CDE with a Kameleon VM appliance. Bar chart of time in seconds (axis 0 to 1600) for Kameleon and CDE-SP.]
  • 40. Related Work
      - Research Objects: package scientific workflows with auxiliary information about the workflows, including provenance information and metadata such as the authors and the version
      - CDE and Sumatra can capture an execution environment in a lightweight fashion
      - SystemTap, being a kernel-based tracing mechanism, performs better than ptrace but needs to run at a higher privilege level
      - Provenance-to-Use (PTU) and ReproZip include provenance in self-contained software packages
  • 41. Conclusion
      - CDE does not encapsulate provenance of associated dependencies in a software package
      - The lack of information about the origins of dependencies in a software package creates issues when constructing software pipelines from packages
      - CDE-SP can include software provenance as part of a software package
      - CDE-SP can use software package provenance to build software pipelines
      - CDE-SP can maintain provenance when used to construct software pipelines
  • 42. Acknowledgments
      - Neil Best at The University of Chicago
      - Joshua Elliott at Columbia University
      - Justin Wozniak at Argonne National Laboratory
      - Allison Brizius at the RDCEP Center
      - NSF grants SES-0951576 and GEO-1343816
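Table 1 (slide 26) counts, for a pair of distributions, how many files share a path (denominator) and how many of those have different MD5 checksums (numerator). The following is a minimal Python sketch of that measurement, not the authors' script. It assumes each AMI's root file system is available as a local directory, skips symbolic links, and reads "manual pages in /usr/share/" as the usr/share/man/ subtree. All three choices are our assumptions.

```python
import hashlib
import os

# Assumption: "manual pages in /usr/share/" means the usr/share/man/ subtree.
EXCLUDED_PREFIXES = ("usr/share/man/",)


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regular_files(root: str) -> set:
    """Relative paths of regular (non-symlink) files under root, minus excluded subtrees."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel.startswith(EXCLUDED_PREFIXES) or os.path.islink(full):
                continue
            if os.path.isfile(full):
                found.add(rel)
    return found


def differing_over_common(root_a: str, root_b: str) -> tuple:
    """Return (numerator, denominator) as defined for Table 1."""
    common = regular_files(root_a) & regular_files(root_b)
    differing = sum(
        1
        for rel in common
        if md5_of(os.path.join(root_a, rel)) != md5_of(os.path.join(root_b, rel))
    )
    return differing, len(common)
```

For example, `differing_over_common("/mnt/amz", "/mnt/rh")` would give one cell of the table, where both mount points are hypothetical. The same comparison is one way to spot the path overlap conflicts that arise when several cde-root directories are merged (slide 25).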
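Slides 29 and 32 describe recording each process's file and exec events, plus user and machine metadata, as LevelDB key-value pairs (Table 2). The following is a minimal Python sketch of that key layout. Its main purpose is to show why putting the PID first in the key helps. LevelDB keeps keys in sorted order, so all records for one process can be read with a single prefix scan.

To keep the sketch self-contained, it uses a small sorted in-memory store (`SortedKV`) as a stand-in for LevelDB. Several details are our assumptions, not part of CDE-SP:

- the helper names;
- zero-padding TIME so that key order matches time order;
- attaching the `path`/`pwd`/`args` properties through a separate call;
- using `read`/`write` as the IO values.

```python
import bisect
import os
import pwd


class SortedKV:
    """In-memory stand-in for LevelDB: keys are kept sorted so prefix scans are cheap."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._vals = {}  # key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: str):
        return self._vals.get(key)

    def scan(self, prefix: str):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def ts(t: int) -> str:
    # Zero-padded so lexicographic key order matches numeric time order (assumption).
    return f"{t:012d}"


def record_exec(db, pid1, pid2, t):
    """pid.PID1.exec.TIME -> PID2, read as "PID1 wasTriggeredBy PID2" (Table 2)."""
    db.put(f"pid.{pid1}.exec.{ts(t)}", str(pid2))


def record_props(db, pid, path, cwd, args):
    """pid.PID.[path, pwd, args] -> other properties of PID."""
    db.put(f"pid.{pid}.path", path)
    db.put(f"pid.{pid}.pwd", cwd)
    db.put(f"pid.{pid}.args", " ".join(args))


def record_io(db, pid, io, t, file_path):
    """io.PID.action.IO.TIME -> FILE(PATH); io values like "read"/"write" are assumed."""
    db.put(f"io.{pid}.action.{io}.{ts(t)}", file_path)


def record_meta(db):
    """meta.agent from getpwuid(getuid()); meta.machine from a subset of uname -a fields."""
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:  # uid without a passwd entry, e.g. in some containers
        user = str(os.getuid())
    u = os.uname()
    db.put("meta.agent", user)
    db.put("meta.machine", f"{u.sysname} {u.release} {u.version} {u.machine}")
```

With a real LevelDB binding, `put` and the prefix scan would map onto the library's own write and iterator calls. The key strings would stay the same.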
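The query on slide 33 takes a program and walks its dependencies depth-first. It drops system paths and writes a GraphViz file. The following is a minimal Python sketch of that traversal. It assumes the dependencies have already been read out of the store into a plain dict (node -> list of dependencies). The function and variable names are illustrative only.

An edge to an already-visited node is still emitted, but that node is not expanded again. The output is therefore a DAG rather than a strict tree whenever two processes share a file.

```python
# Defaults mirror the exclusion list on slide 33.
DEFAULT_EXCLUDES = ("/lib/", "/usr/lib/", "/usr/share/", "/etc/")


def quote(name: str) -> str:
    """Quote a node name for GraphViz DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def dependency_dot(graph: dict, root: str, excludes=DEFAULT_EXCLUDES) -> str:
    """Iterative depth-first walk from root over graph (node -> dependencies), emitting DOT."""
    lines = ["digraph deps {"]
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, []):
            if dep.startswith(excludes):
                continue  # drop uninteresting system dependencies
            lines.append(f"  {quote(node)} -> {quote(dep)};")
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned text as `deps.dot` and running `dot -Tpng deps.dot -o deps.png` renders it with the GraphViz command-line tool.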
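Slide 34 validates authorship by matching a subgraph of the new package against the original provenance graph. Process nodes match on binary path and working directory, and file nodes match on path. General subgraph isomorphism is NP-hard. The following Python sketch does not attempt it. Instead, it assumes those keys are unique within the original graph, so the labels alone fix the mapping. Validation then reduces to a polynomial containment check of nodes and edges.

If a key repeats (for example, the same binary run twice in one directory), the sketch raises an error rather than guessing, since a real isomorphism search would be needed. The node and edge encodings are hypothetical.

```python
def node_key(node: dict) -> tuple:
    """Matching rule from slide 34: processes by (binary path, working dir), files by path."""
    if node["kind"] == "process":
        return ("process", node["path"], node["pwd"])
    return ("file", node["path"])


def validate_subgraph(sub_nodes, sub_edges, orig_nodes, orig_edges) -> bool:
    """True if the extracted subgraph embeds in the original provenance graph.

    *_nodes: dict node_id -> node dict; *_edges: iterable of (src_id, dst_id).
    Assumes node keys are unique in the original graph, so labels fix the mapping.
    """
    by_key = {}
    for nid, node in orig_nodes.items():
        key = node_key(node)
        if key in by_key:
            raise ValueError(f"ambiguous key {key}; a real isomorphism search is needed")
        by_key[key] = nid
    mapping = {}
    for nid, node in sub_nodes.items():
        target = by_key.get(node_key(node))
        if target is None:
            return False  # node has no counterpart in the original package
        mapping[nid] = target
    if len(set(mapping.values())) != len(mapping):
        return False  # two subgraph nodes would map to the same original node
    edges = set(orig_edges)
    return all((mapping[s], mapping[d]) in edges for s, d in sub_edges)
```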
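The overhead figures on slide 36 follow directly from the mean values in Table 3. The following short Python check recomputes them, ignoring the ± spreads. The 2.1% slowdown corresponds to package creation, that is, the audited run. The execution-time difference and the size ratio use the same formula. Treating 1 MB as 1000 kB is our assumption; using 1024 also rounds to about 0.03%.

```python
# Mean values from Table 3 (seconds); sizes from slide 36.
CREATE_CDE, CREATE_CDESP = 852.6, 870.5
EXEC_CDE, EXEC_CDESP = 568.8, 569.5
PACKAGE_MB, LEVELDB_KB = 732, 236


def pct_increase(base: float, new: float) -> float:
    """Relative increase of new over base, in percent."""
    return 100.0 * (new - base) / base


create_overhead = pct_increase(CREATE_CDE, CREATE_CDESP)  # about 2.1%
exec_overhead = pct_increase(EXEC_CDE, EXEC_CDESP)  # about 0.12%
size_increase = 100.0 * LEVELDB_KB / (PACKAGE_MB * 1000)  # about 0.03% (1 MB = 1000 kB assumed)

print(f"create: {create_overhead:.1f}%  exec: {exec_overhead:.2f}%  size: {size_increase:.3f}%")
```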