SlideShare a Scribd company logo
Enter Data Version Control
Problem description
Problem description
Problem description
Index
● Explicit data and process dependencies
● Data and model caching
● Visualize metrics across model and data versions
● “One click” pipeline reproducibility
● 🍻 🍕
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
$ dvc add data/raw
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
$ dvc run 
--file prepare_data.dvc 
--deps prepare_data.ipynb 
--deps data/raw 
--outs data/prepared 
papermill 
prepare_data.ipynb 
prepare_data_out.ipynb
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
$ dvc run 
--file extract_features.dvc 
--deps extract_features.ipynb 
--deps data/prepared 
--outs data/features 
papermill 
extract_features.ipynb 
extract_features_out.ipynb
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
model
Test
model
metrics
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
model
Test
model
metrics
$ dvc run 
--file select_model.dvc 
--deps select_model.ipynb 
--deps model.py 
--deps data/features 
--outs model 
papermill 
select_model.ipynb 
select_model_out.ipynb
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
Explicit data and process dependencies
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
$ dvc run 
--file test_model.dvc 
--deps test_model.ipynb 
--deps data/features 
--deps model 
--deps model.py 
--metrics test_metrics.json 
papermill 
test_model.ipynb 
test_model_out.ipynb
Explicit data and process dependencies
Explicit data and process dependencies
$ dvc pipeline show --ascii select_model.dvc
$ dvc pipeline show --ascii --outs select_model.dvc
Data and model caching
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
Data and model caching
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
CHANGE HERE
Data and model caching
raw
Prepare
data
prepared
train
prepared
test
Extract
features
features
train
features
test
Select
model
Test
model
model metrics
CHANGE HERE
$ dvc repro test_model.dvc
Data and model caching
$ dvc metrics show -T
Visualize metrics across model and data versions
“One click” pipeline reproducibility
$ git clone git@gitlab.com:adsmurai/training/dvc-meetup.git
$ git pull --tags
$ dvc pull --all-tags --all-branches
$ dvc checkout
$ dvc repro test_model.dvc
Thank you
DVC meetup

More Related Content

Similar to DVC meetup

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
PAPIs.io
 
DDS_UI_WFs_13012022.pptx
DDS_UI_WFs_13012022.pptxDDS_UI_WFs_13012022.pptx
DDS_UI_WFs_13012022.pptx
SatishreddyMandadi
 
Unit Test your Views
Unit Test your ViewsUnit Test your Views
Unit Test your Views
Jorge Ortiz
 
Pa Project And Best Practice 2
Pa Project And Best Practice 2Pa Project And Best Practice 2
Pa Project And Best Practice 2
alice yang
 
Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning
Mikhail Rozhkov
 
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification SuccessA00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
PalakMazumdar1
 
Citrix AppDNA Management Overview v7.6
Citrix AppDNA Management Overview v7.6Citrix AppDNA Management Overview v7.6
Citrix AppDNA Management Overview v7.6
Kerry Dirks MCPS MS
 
Bdd test automation analysis
Bdd test automation analysisBdd test automation analysis
Bdd test automation analysis
ssuser2e8d4b
 
Data science workflows: from notebooks to production
Data science workflows: from notebooks to productionData science workflows: from notebooks to production
Data science workflows: from notebooks to production
Marissa Saunders
 
Netserv Software Testing
Netserv Software TestingNetserv Software Testing
Netserv Software Testing
sthicks14
 
Neotys PAC 2018 - Bruno Da Silva
Neotys PAC 2018 - Bruno Da SilvaNeotys PAC 2018 - Bruno Da Silva
Neotys PAC 2018 - Bruno Da Silva
Neotys_Partner
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
Arnab Biswas
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic Stack
Rochelle Sonnenberg
 
Loading Huge Amounts of Data
Loading Huge Amounts of DataLoading Huge Amounts of Data
Loading Huge Amounts of Data
Vaticle
 
DataOps - Production ML
DataOps - Production MLDataOps - Production ML
DataOps - Production ML
Al Zindiq
 
Strategy-driven Test Generation with Open Source Frameworks
Strategy-driven Test Generation with Open Source FrameworksStrategy-driven Test Generation with Open Source Frameworks
Strategy-driven Test Generation with Open Source Frameworks
Dimitry Polivaev
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
Valeriia Maliarenko
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML Platform
Ran Romano
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
SQUADEX
 
Integration testing - A&BP CC
Integration testing - A&BP CCIntegration testing - A&BP CC
Integration testing - A&BP CC
JWORKS powered by Ordina
 

Similar to DVC meetup (20)

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
DDS_UI_WFs_13012022.pptx
DDS_UI_WFs_13012022.pptxDDS_UI_WFs_13012022.pptx
DDS_UI_WFs_13012022.pptx
 
Unit Test your Views
Unit Test your ViewsUnit Test your Views
Unit Test your Views
 
Pa Project And Best Practice 2
Pa Project And Best Practice 2Pa Project And Best Practice 2
Pa Project And Best Practice 2
 
Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning Reproducibility and experiments management in Machine Learning
Reproducibility and experiments management in Machine Learning
 
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification SuccessA00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
 
Citrix AppDNA Management Overview v7.6
Citrix AppDNA Management Overview v7.6Citrix AppDNA Management Overview v7.6
Citrix AppDNA Management Overview v7.6
 
Bdd test automation analysis
Bdd test automation analysisBdd test automation analysis
Bdd test automation analysis
 
Data science workflows: from notebooks to production
Data science workflows: from notebooks to productionData science workflows: from notebooks to production
Data science workflows: from notebooks to production
 
Netserv Software Testing
Netserv Software TestingNetserv Software Testing
Netserv Software Testing
 
Neotys PAC 2018 - Bruno Da Silva
Neotys PAC 2018 - Bruno Da SilvaNeotys PAC 2018 - Bruno Da Silva
Neotys PAC 2018 - Bruno Da Silva
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic Stack
 
Loading Huge Amounts of Data
Loading Huge Amounts of DataLoading Huge Amounts of Data
Loading Huge Amounts of Data
 
DataOps - Production ML
DataOps - Production MLDataOps - Production ML
DataOps - Production ML
 
Strategy-driven Test Generation with Open Source Frameworks
Strategy-driven Test Generation with Open Source FrameworksStrategy-driven Test Generation with Open Source Frameworks
Strategy-driven Test Generation with Open Source Frameworks
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 
Wix's ML Platform
Wix's ML PlatformWix's ML Platform
Wix's ML Platform
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
 
Integration testing - A&BP CC
Integration testing - A&BP CCIntegration testing - A&BP CC
Integration testing - A&BP CC
 

Recently uploaded

NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
Scintica Instrumentation
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
Sciences of Europe
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
RDhivya6
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
MaheshaNanjegowda
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 

Recently uploaded (20)

NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 

DVC meetup