SlideShare a Scribd company logo
1 of 20
1IEEE eScience 2017
sciunits:sciunits:
Reusable Research ObjectsReusable Research Objects
School of Computing, College of Computing and Digital MediaSchool of Computing, College of Computing and Digital Media
Dai Hai Ton That, Gabe Fils,Dai Hai Ton That, Gabe Fils,
Zhihao Yuan, Tanu MalikZhihao Yuan, Tanu Malik
Presented by Gabe FilsPresented by Gabe Fils
2IEEE eScience 2017
Problem SpaceProblem Space
No easily creatable, readily reusable, efficiently versioned,No easily creatable, readily reusable, efficiently versioned,
discrete unit of computation existsdiscrete unit of computation exists
Virtualization Distributed
Version Control
Research Object
portable
self-contained
repeatable collaborative
versioned
documented
3IEEE eScience 2017
Introducing: TheIntroducing: The sciunitsciunit
 Captures application executionsCaptures application executions
 Repeats executionsRepeats executions
 Reproduces executions, changing input argsReproduces executions, changing input args
 Versioned executions stored as oneVersioned executions stored as one sciunitsciunit
 Uses provenance for self-documentationUses provenance for self-documentation
AA reusablereusable research object.research object.
4IEEE eScience 2017
Sample Applications: FIE, VICSample Applications: FIE, VIC
FIE Application WorkflowFIE Application Workflow
 City of Chicago Food InspectionsCity of Chicago Food Inspections
Evaluation ModelEvaluation Model
 Four applicationsFour applications
 Two languagesTwo languages
 130 files130 files
 1580 dependencies1580 dependencies
 908 MB908 MB
https://github.com/chicago/food-inspections-https://github.com/chicago/food-inspections-
evaluationevaluation
 Variable Infiltration CapacityVariable Infiltration Capacity
 Four applicationsFour applications
 Five languagesFive languages
 7 GB7 GB https://vic.readthedocs.iohttps://vic.readthedocs.io
5IEEE eScience 2017
sciunitsciunit ArchitectureArchitecture
https://bitbucket.org/geotrust/sciunit-clihttps://bitbucket.org/geotrust/sciunit-cli
sciunitsciunit clientclient sciunitsciunit serverserver
sciunitsciunit
 LightweightLightweight
 VersionedVersioned
 Self-containedSelf-contained
 StoresStores sciunitssciunits
 VisualizesVisualizes sciunitssciunits
 Linux CLI AppLinux CLI App
 C / PythonC / Python
 Minimal DependenciesMinimal Dependencies
6IEEE eScience 2017
ClientClient packagepackage CommandCommand
Initialize or retrieve aInitialize or retrieve a sciunitsciunit::
Run FIE, capture into package:Run FIE, capture into package:
7IEEE eScience 2017
Packaging DetailsPackaging Details
Package In Storage (133 MB):Package In Storage (133 MB):
22ndnd
Version Of Package (133 MB):Version Of Package (133 MB):
FIE pkg-root DirFIE pkg-root Dir
1) Attach to process1) Attach to process
2) Intercept system calls2) Intercept system calls
3) Copy files / executables3) Copy files / executables
4) Log system calls4) Log system calls
8IEEE eScience 2017
ClientClient repeatrepeat CommandCommand
original calloriginal call
61021 exec61021 exec
“/usr/bin/python”“/usr/bin/python”
replaced callreplaced call
61021 exec61021 exec
“/home/user1/pkgroot/usr/bin/python”“/home/user1/pkgroot/usr/bin/python”
Repeat a package:Repeat a package: 1) Attach to process1) Attach to process
2) Replace system call args2) Replace system call args
9IEEE eScience 2017
Versioning SolutionVersioning Solution
80G File, Fixed 4K Chunks:80G File, Fixed 4K Chunks:
Same File, 1 Byte Inserted At Start:Same File, 1 Byte Inserted At Start:
 When to store?When to store?
 During packagingDuring packaging
 After packagingAfter packaging
 How to store?How to store?
 Line-based diffsLine-based diffs
 Fixed-size chunksFixed-size chunks
 Content-definedContent-defined
10IEEE eScience 2017
Rabin HashRabin Hash
 Hash of subset of file bytes (Hash of subset of file bytes (RH(BRH(B11,, BB22, …, … BBnn))))
 Fixed-size sliding windowFixed-size sliding window nn
 Hash at any positionHash at any position ii ((RH(XRH(X(i,n)(i,n)))))
 Deduplicate chunkDeduplicate chunk
11IEEE eScience 2017
Storage And RetrievalStorage And Retrieval
Deduplicated Container StorageDeduplicated Container Storage
Store package:Store package:
1) Archive package-root1) Archive package-root
2) CDC on archive2) CDC on archive
3) Store manifest3) Store manifest
Retrieve package:Retrieve package:
1) Retrieve manifest1) Retrieve manifest
2) Concatenate chunks2) Concatenate chunks
3) Extract archive3) Extract archive
12IEEE eScience 2017
Detailed VisualizationDetailed Visualization
Part Of A Normal (Verbose) Provenance LogPart Of A Normal (Verbose) Provenance Log
Small Section Of Graph Built From Normal Provenance LogSmall Section Of Graph Built From Normal Provenance Log
13IEEE eScience 2017
Summarization: Group By SimilaritySummarization: Group By Similarity
 Group vertices byGroup vertices by type / connectionstype / connections
 Effect: group subprocesses, group files in directoryEffect: group subprocesses, group files in directory
Similarity RuleSimilarity Rule
Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v)Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v)
Similarity AppliedSimilarity AppliedFull GraphFull Graph
14IEEE eScience 2017
Summarization: PackSummarization: Pack
 Find min-connected nodes, pack into hubsFind min-connected nodes, pack into hubs
Packability RulesPackability Rules
1) Type(u) = file, {1) Type(u) = file, { !e | e E ( e=(u,v) e=(v,u) ) }∃ ∈ ∧ ∨!e | e E ( e=(u,v) e=(v,u) ) }∃ ∈ ∧ ∨
2) Type(u) = process, { !e | e E e=(u,v) }∃ ∈ ∧2) Type(u) = process, { !e | e E e=(u,v) }∃ ∈ ∧
3) Type(u) = file, { !(e∃3) Type(u) = file, { !(e∃ 11,e,e22) | ( x V, v≠x) ( e∃ ∈ ∧) | ( x V, v≠x) ( e∃ ∈ ∧ 11=(u,v) E, e∈=(u,v) E, e∈ 22=(x,u) E ) }∈=(x,u) E ) }∈
Packability AppliedPackability Applied
Similarity AppliedSimilarity Applied
15IEEE eScience 2017
Summarization: AnnotateSummarization: Annotate
 Higher precedence to process nodesHigher precedence to process nodes
 File with n > 1 edges → n annotationsFile with n > 1 edges → n annotations
Annotation AppliedAnnotation AppliedPackability AppliedPackability Applied
16IEEE eScience 2017
Package / Repeat PerformancePackage / Repeat Performance
 Added ptrace system callsAdded ptrace system calls
 I/O-intensive apps: VICI/O-intensive apps: VIC
 Non-I/O-intensive apps: FIENon-I/O-intensive apps: FIE
Package/Repeat RuntimesPackage/Repeat Runtimes
1) Run app normally1) Run app normally
2) Run with2) Run with packagepackage
3) Run with3) Run with repeatrepeat
17IEEE eScience 2017
Versioning PerformanceVersioning Performance
Commit/Reconstruct TimesCommit/Reconstruct Times
Storage SizesStorage Sizes
Package/Repeat RuntimesPackage/Repeat Runtimes
1) Size of several versions1) Size of several versions
2) Size after deduplication2) Size after deduplication
3) CDC / concatenation time3) CDC / concatenation time
18IEEE eScience 2017
Results From Provenance SummarizationResults From Provenance Summarization
Reduction Of EdgesReduction Of EdgesReduction Of File NodesReduction Of File Nodes
Reduction Of Process NodesReduction Of Process Nodes
1) Full FIE graph1) Full FIE graph
2) All techniques applied2) All techniques applied
3) Dynamic expansion3) Dynamic expansion
19IEEE eScience 2017
Conclusion And Current WorkConclusion And Current Work
 Graph summarization testingGraph summarization testing
 Database applicationsDatabase applications
 Exact partial repeatabilityExact partial repeatability
 Apps with network-operationsApps with network-operations
 Parallel HPC applicationsParallel HPC applications
 Emerging reusable object formatsEmerging reusable object formats
sciunitsciunit is a portable, self-contained, and inherentlyis a portable, self-contained, and inherently
understandable versioned unit of computation.understandable versioned unit of computation.
20IEEE eScience 2017
Links And AcknowledgementsLinks And Acknowledgements
sciunitsciunit::
 https://sciunit.runhttps://sciunit.run
sciunitsciunit paper:paper:
 https://arxiv.orghttps://arxiv.org
 Search for “Search for “sciunit”sciunit”
National Science Foundation grants ICER-1639759,National Science Foundation grants ICER-1639759,
ICER-1661918, ICER-1440327, ICER-1343816ICER-1661918, ICER-1440327, ICER-1343816

More Related Content

Similar to Sciunits: Resuable Research Object

Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Research Data Alliance
 
PyData Meetup Presentation in Natal April 2024
PyData Meetup Presentation in Natal April 2024PyData Meetup Presentation in Natal April 2024
PyData Meetup Presentation in Natal April 2024MarcelRibeiroDantas
 
AI For Software Engineering: Two Industrial Experience Reports
AI For Software Engineering: Two Industrial Experience ReportsAI For Software Engineering: Two Industrial Experience Reports
AI For Software Engineering: Two Industrial Experience ReportsUniversity of Antwerp
 
Towards a Foundational API for Resilient Distributed Systems Design
Towards a Foundational API for Resilient Distributed Systems DesignTowards a Foundational API for Resilient Distributed Systems Design
Towards a Foundational API for Resilient Distributed Systems DesignDanilo Pianini
 
Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Felix Z. Hoffmann
 
Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesThomas Zimmermann
 
Beyond the GFLOPS
Beyond the GFLOPSBeyond the GFLOPS
Beyond the GFLOPSSlide_N
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RAvjinder (Avi) Kaler
 
Modern javascript localization with c-3po and the good old gettext
Modern javascript localization with c-3po and the good old gettextModern javascript localization with c-3po and the good old gettext
Modern javascript localization with c-3po and the good old gettextAlexander Mostovenko
 
The Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolThe Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolIvo Jimenez
 
Kqueue : Generic Event notification
Kqueue : Generic Event notificationKqueue : Generic Event notification
Kqueue : Generic Event notificationMahendra M
 
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat Pôle Systematic Paris-Region
 
An Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANAn Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANTom Mens
 

Similar to Sciunits: Resuable Research Object (20)

Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...
 
R sharing 101
R sharing 101R sharing 101
R sharing 101
 
PyData Meetup Presentation in Natal April 2024
PyData Meetup Presentation in Natal April 2024PyData Meetup Presentation in Natal April 2024
PyData Meetup Presentation in Natal April 2024
 
AI For Software Engineering: Two Industrial Experience Reports
AI For Software Engineering: Two Industrial Experience ReportsAI For Software Engineering: Two Industrial Experience Reports
AI For Software Engineering: Two Industrial Experience Reports
 
Ab initio training Ab-initio Architecture
Ab initio training Ab-initio ArchitectureAb initio training Ab-initio Architecture
Ab initio training Ab-initio Architecture
 
Towards a Foundational API for Resilient Distributed Systems Design
Towards a Foundational API for Resilient Distributed Systems DesignTowards a Foundational API for Resilient Distributed Systems Design
Towards a Foundational API for Resilient Distributed Systems Design
 
Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?
 
Handout3o
Handout3oHandout3o
Handout3o
 
Changes and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development ActivitiesChanges and Bugs: Mining and Predicting Development Activities
Changes and Bugs: Mining and Predicting Development Activities
 
Beyond the GFLOPS
Beyond the GFLOPSBeyond the GFLOPS
Beyond the GFLOPS
 
Slimfast
SlimfastSlimfast
Slimfast
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using R
 
Icsm08a.ppt
Icsm08a.pptIcsm08a.ppt
Icsm08a.ppt
 
Modern javascript localization with c-3po and the good old gettext
Modern javascript localization with c-3po and the good old gettextModern javascript localization with c-3po and the good old gettext
Modern javascript localization with c-3po and the good old gettext
 
The Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI toolThe Popper Experimentation Protocol and CLI tool
The Popper Experimentation Protocol and CLI tool
 
Kqueue : Generic Event notification
Kqueue : Generic Event notificationKqueue : Generic Event notification
Kqueue : Generic Event notification
 
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
PyParis 2017 / Writing a C Python extension in 2017, Jean-Baptiste Aviat
 
An Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRANAn Empirical Study of Identical Function Clones in CRAN
An Empirical Study of Identical Function Clones in CRAN
 
Machine Learning and Deep Software Variability
Machine Learning and Deep Software VariabilityMachine Learning and Deep Software Variability
Machine Learning and Deep Software Variability
 
biopython, doctest and makefiles
biopython, doctest and makefilesbiopython, doctest and makefiles
biopython, doctest and makefiles
 

More from Tanu Malik

Auditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesAuditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesTanu Malik
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusTanu Malik
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationTanu Malik
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsTanu Malik
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015Tanu Malik
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesTanu Malik
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityTanu Malik
 
Scientometrics
ScientometricsScientometrics
ScientometricsTanu Malik
 
EarthCube DDMA AGU
EarthCube DDMA AGUEarthCube DDMA AGU
EarthCube DDMA AGUTanu Malik
 
SOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science ObjectsSOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science ObjectsTanu Malik
 

More from Tanu Malik (10)

Auditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesAuditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software Packages
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with Globus
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database Virtualization
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging Services
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
 
Scientometrics
ScientometricsScientometrics
Scientometrics
 
EarthCube DDMA AGU
EarthCube DDMA AGUEarthCube DDMA AGU
EarthCube DDMA AGU
 
SOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science ObjectsSOLE: Linking Research Papers with Science Objects
SOLE: Linking Research Papers with Science Objects
 

Recently uploaded

jll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjaytendertech
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontangsiskavia95
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...mikehavy0
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...BabaJohn3
 
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...Obat Aborsi 088980685493 Jual Obat Aborsi
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxkusamee0
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Voces Mineras
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive Future
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive FutureFuel Efficiency Forecast: Predictive Analytics for a Greener Automotive Future
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive FutureBoston Institute of Analytics
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesBoston Institute of Analytics
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
Rolex Watch - Design Decision Analysis.
Rolex Watch -  Design Decision Analysis.Rolex Watch -  Design Decision Analysis.
Rolex Watch - Design Decision Analysis.zeddstock
 
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...rightmanforbloodline
 

Recently uploaded (20)

jll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdf
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...
Jual Obat Aborsi Lhokseumawe ( Asli No.1 ) 088980685493 Obat Penggugur Kandun...
 
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptxChapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
Chapter 1 - Introduction to Data Mining Concepts and Techniques.pptx
 
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
Las implicancias del memorándum de entendimiento entre Codelco y SQM según la...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive Future
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive FutureFuel Efficiency Forecast: Predictive Analytics for a Greener Automotive Future
Fuel Efficiency Forecast: Predictive Analytics for a Greener Automotive Future
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Rolex Watch - Design Decision Analysis.
Rolex Watch -  Design Decision Analysis.Rolex Watch -  Design Decision Analysis.
Rolex Watch - Design Decision Analysis.
 
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...Solution manual for managerial accounting 8th edition by john wild ken shaw b...
Solution manual for managerial accounting 8th edition by john wild ken shaw b...
 

Sciunits: Resuable Research Object

  • 1. 1IEEE eScience 2017 sciunits:sciunits: Reusable Research ObjectsReusable Research Objects School of Computing, College of Computing and Digital MediaSchool of Computing, College of Computing and Digital Media Dai Hai Ton That, Gabe Fils,Dai Hai Ton That, Gabe Fils, Zhihao Yuan, Tanu MalikZhihao Yuan, Tanu Malik Presented by Gabe FilsPresented by Gabe Fils
  • 2. 2IEEE eScience 2017 Problem SpaceProblem Space No easily creatable, readily reusable, efficiently versioned,No easily creatable, readily reusable, efficiently versioned, discrete unit of computation existsdiscrete unit of computation exists Virtualization Distributed Version Control Research Object portable self-contained repeatable collaborative versioned documented
  • 3. 3IEEE eScience 2017 Introducing: TheIntroducing: The sciunitsciunit  Captures application executionsCaptures application executions  Repeats executionsRepeats executions  Reproduces executions, changing input argsReproduces executions, changing input args  Versioned executions stored as oneVersioned executions stored as one sciunitsciunit  Uses provenance for self-documentationUses provenance for self-documentation AA reusablereusable research object.research object.
  • 4. 4IEEE eScience 2017 Sample Applications: FIE, VICSample Applications: FIE, VIC FIE Application WorkflowFIE Application Workflow  City of Chicago Food InspectionsCity of Chicago Food Inspections Evaluation ModelEvaluation Model  Four applicationsFour applications  Two languagesTwo languages  130 files130 files  1580 dependencies1580 dependencies  908 MB908 MB https://github.com/chicago/food-inspections-https://github.com/chicago/food-inspections- evaluationevaluation  Variable Infiltration CapacityVariable Infiltration Capacity  Four applicationsFour applications  Five languagesFive languages  7 GB7 GB https://vic.readthedocs.iohttps://vic.readthedocs.io
  • 5. 5IEEE eScience 2017 sciunitsciunit ArchitectureArchitecture https://bitbucket.org/geotrust/sciunit-clihttps://bitbucket.org/geotrust/sciunit-cli sciunitsciunit clientclient sciunitsciunit serverserver sciunitsciunit  LightweightLightweight  VersionedVersioned  Self-containedSelf-contained  StoresStores sciunitssciunits  VisualizesVisualizes sciunitssciunits  Linux CLI AppLinux CLI App  C / PythonC / Python  Minimal DependenciesMinimal Dependencies
  • 6. 6IEEE eScience 2017 ClientClient packagepackage CommandCommand Initialize or retrieve aInitialize or retrieve a sciunitsciunit:: Run FIE, capture into package:Run FIE, capture into package:
  • 7. 7IEEE eScience 2017 Packaging DetailsPackaging Details Package In Storage (133 MB):Package In Storage (133 MB): 22ndnd Version Of Package (133 MB):Version Of Package (133 MB): FIE pkg-root DirFIE pkg-root Dir 1) Attach to process1) Attach to process 2) Intercept system calls2) Intercept system calls 3) Copy files / executables3) Copy files / executables 4) Log system calls4) Log system calls
  • 8. 8IEEE eScience 2017 ClientClient repeatrepeat CommandCommand original calloriginal call 61021 exec61021 exec “/usr/bin/python”“/usr/bin/python” replaced callreplaced call 61021 exec61021 exec “/home/user1/pkgroot/usr/bin/python”“/home/user1/pkgroot/usr/bin/python” Repeat a package:Repeat a package: 1) Attach to process1) Attach to process 2) Replace system call args2) Replace system call args
  • 9. 9IEEE eScience 2017 Versioning SolutionVersioning Solution 80G File, Fixed 4K Chunks:80G File, Fixed 4K Chunks: Same File, 1 Byte Inserted At Start:Same File, 1 Byte Inserted At Start:  When to store?When to store?  During packagingDuring packaging  After packagingAfter packaging  How to store?How to store?  Line-based diffsLine-based diffs  Fixed-size chunksFixed-size chunks  Content-definedContent-defined
  • 10. 10IEEE eScience 2017 Rabin HashRabin Hash  Hash of subset of file bytes (Hash of subset of file bytes (RH(BRH(B11,, BB22, …, … BBnn))))  Fixed-size sliding windowFixed-size sliding window nn  Hash at any positionHash at any position ii ((RH(XRH(X(i,n)(i,n)))))  Deduplicate chunkDeduplicate chunk
  • 11. 11IEEE eScience 2017 Storage And RetrievalStorage And Retrieval Deduplicated Container StorageDeduplicated Container Storage Store package:Store package: 1) Archive package-root1) Archive package-root 2) CDC on archive2) CDC on archive 3) Store manifest3) Store manifest Retrieve package:Retrieve package: 1) Retrieve manifest1) Retrieve manifest 2) Concatenate chunks2) Concatenate chunks 3) Extract archive3) Extract archive
  • 12. 12IEEE eScience 2017 Detailed VisualizationDetailed Visualization Part Of A Normal (Verbose) Provenance LogPart Of A Normal (Verbose) Provenance Log Small Section Of Graph Built From Normal Provenance LogSmall Section Of Graph Built From Normal Provenance Log
  • 13. 13IEEE eScience 2017 Summarization: Group By SimilaritySummarization: Group By Similarity  Group vertices byGroup vertices by type / connectionstype / connections  Effect: group subprocesses, group files in directoryEffect: group subprocesses, group files in directory Similarity RuleSimilarity Rule Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v)Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v) Similarity AppliedSimilarity AppliedFull GraphFull Graph
  • 14. 14IEEE eScience 2017 Summarization: PackSummarization: Pack  Find min-connected nodes, pack into hubsFind min-connected nodes, pack into hubs Packability RulesPackability Rules 1) Type(u) = file, {1) Type(u) = file, { !e | e E ( e=(u,v) e=(v,u) ) }∃ ∈ ∧ ∨!e | e E ( e=(u,v) e=(v,u) ) }∃ ∈ ∧ ∨ 2) Type(u) = process, { !e | e E e=(u,v) }∃ ∈ ∧2) Type(u) = process, { !e | e E e=(u,v) }∃ ∈ ∧ 3) Type(u) = file, { !(e∃3) Type(u) = file, { !(e∃ 11,e,e22) | ( x V, v≠x) ( e∃ ∈ ∧) | ( x V, v≠x) ( e∃ ∈ ∧ 11=(u,v) E, e∈=(u,v) E, e∈ 22=(x,u) E ) }∈=(x,u) E ) }∈ Packability AppliedPackability Applied Similarity AppliedSimilarity Applied
  • 15. 15IEEE eScience 2017 Summarization: AnnotateSummarization: Annotate  Higher precedence to process nodesHigher precedence to process nodes  File with n > 1 edges → n annotationsFile with n > 1 edges → n annotations Annotation AppliedAnnotation AppliedPackability AppliedPackability Applied
  • 16. 16IEEE eScience 2017 Package / Repeat PerformancePackage / Repeat Performance  Added ptrace system callsAdded ptrace system calls  I/O-intensive apps: VICI/O-intensive apps: VIC  Non-I/O-intensive apps: FIENon-I/O-intensive apps: FIE Package/Repeat RuntimesPackage/Repeat Runtimes 1) Run app normally1) Run app normally 2) Run with2) Run with packagepackage 3) Run with3) Run with repeatrepeat
  • 17. 17IEEE eScience 2017 Versioning PerformanceVersioning Performance Commit/Reconstruct TimesCommit/Reconstruct Times Storage SizesStorage Sizes Package/Repeat RuntimesPackage/Repeat Runtimes 1) Size of several versions1) Size of several versions 2) Size after deduplication2) Size after deduplication 3) CDC / concatenation time3) CDC / concatenation time
  • 18. 18IEEE eScience 2017 Results From Provenance SummarizationResults From Provenance Summarization Reduction Of EdgesReduction Of EdgesReduction Of File NodesReduction Of File Nodes Reduction Of Process NodesReduction Of Process Nodes 1) Full FIE graph1) Full FIE graph 2) All techniques applied2) All techniques applied 3) Dynamic expansion3) Dynamic expansion
  • 19. 19IEEE eScience 2017 Conclusion And Current WorkConclusion And Current Work  Graph summarization testingGraph summarization testing  Database applicationsDatabase applications  Exact partial repeatabilityExact partial repeatability  Apps with network-operationsApps with network-operations  Parallel HPC applicationsParallel HPC applications  Emerging reusable object formatsEmerging reusable object formats sciunitsciunit is a portable, self-contained, and inherentlyis a portable, self-contained, and inherently understandable versioned unit of computation.understandable versioned unit of computation.
  • 20. 20IEEE eScience 2017 Links And AcknowledgementsLinks And Acknowledgements sciunitsciunit::  https://sciunit.runhttps://sciunit.run sciunitsciunit paper:paper:  https://arxiv.orghttps://arxiv.org  Search for “Search for “sciunit”sciunit” National Science Foundation grants ICER-1639759,National Science Foundation grants ICER-1639759, ICER-1661918, ICER-1440327, ICER-1343816ICER-1661918, ICER-1440327, ICER-1343816