Organizing Machine Learning Projects
Repository Organization
Hao-Wen Dong
If you…
֎ have a hard time managing a bunch of experiments
֎ always forget the exact configuration used in a specific experiment
֎ are annoyed at having to copy all the code just to test a new architecture
Repository organization (core)
data/            Training data
exp/             Experimental inputs and outputs
scripts/         Shell scripts for training, testing, etc.
src/             Source code
LICENSE.txt      License file
README.md        Readme file
requirement.txt  Requirements file
Source code (src/)
musegan/          Model source code
__init__.py       File to make it a package
inference.py      Script for inference
interpolation.py  Script for interpolation
process_data.py   Script for data preprocessing
train.py          Script for training
Model source code (src/musegan/)
presets/             Preset network architectures
__init__.py          File to make it a package
config.py            System configuration file
data.py              Data loader
default_config.yaml  Default configurations
default_params.yaml  Default parameters
io_utils.py          I/O utilities
losses.py            Loss functions
metrics.py           Metrics
model.py             Main model class
utils.py             Utilities
Define your model
presets/
    __init__.py
    generator/
        __init__.py
        default.py
        ablated.py
    discriminator/
        __init__.py
        default.py
        ablated.py
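One way such a preset tree could be used is to look up an architecture module by name at run time, so that a new file under presets/ becomes selectable without touching other code. The sketch below is only an illustration: the helper name `load_preset` is hypothetical, and it assumes the package is importable as `musegan.presets`.

```python
import importlib


def load_preset(kind, name, package="musegan.presets"):
    """Import and return the preset module `<package>.<kind>.<name>`.

    `kind` is a sub-package such as "generator" or "discriminator";
    `name` is a module inside it such as "default" or "ablated".
    The default package path is an assumption, not taken from the slides.
    """
    module_path = f"{package}.{kind}.{name}"
    try:
        return importlib.import_module(module_path)
    except ModuleNotFoundError as err:
        # Also triggered if the preset itself imports a missing module.
        raise ValueError(f"Cannot load preset: {module_path}") from err
```

With this layout, `load_preset("generator", "ablated")` would return the module defined in presets/generator/ablated.py, provided the package is on the import path.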
data.py: A high-level data API such as tf.data or torch.utils.data is recommended for flexibility
default_params.yaml: Defines the model
default_config.yaml: Defines the settings for training and inference
Both files are copied to the experiment directory when a new experiment is set up
config.py (for developers only)
Defines rarely changed configuration variables, for example:
- logging level and format
- prefetch and buffer sizes for the data loader
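A minimal sketch of what such a system configuration module might contain is given below. Every name and value in it (`LOGLEVEL`, `LOG_FORMAT`, `SHUFFLE_BUFFER_SIZE`, `PREFETCH_SIZE`, `setup_logging`) is an assumption chosen for illustration and is not taken from the slides.

```python
import logging

# Logging level and format (rarely changed). Values are illustrative.
LOGLEVEL = logging.INFO
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"

# Data-loader tuning knobs (rarely changed). Values are illustrative.
SHUFFLE_BUFFER_SIZE = 1000
PREFETCH_SIZE = 1


def setup_logging():
    """Apply the logging settings above to the root logger."""
    logging.basicConfig(level=LOGLEVEL, format=LOG_FORMAT)
```

Keeping these values in one module separates them from the per-experiment YAML files, which users are expected to edit.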
The Python scripts in src/ are meant to be called by the shell scripts in scripts/
exp/: All outputs (logs, samples, and checkpoints) are saved here
Shell scripts (scripts/)
download_data.sh      Download training data
download_models.sh    Download pretrained models
process_data.py       Process the data
rerun_exp.sh          Rerun the experiment
run_exp.sh            Run the experiment
run_inference.sh      Run the inference
run_interpolation.sh  Run the interpolation
run_train.sh          Run the training
setup_exp.sh          Set up the experiment
Experimenting
֎ Create an experiment directory (under exp/)
  - One experiment compares n different settings
  - Examples: default (the default setting), net_archs, streams, binary_neurons
֎ Set up the experiment items (run setup_exp.sh)
  - Run it n times if you intend to compare n different settings in this experiment
  - Each experiment item (i.e., one setting) has its own directory
֎ Modify the configuration and parameters for each experiment item
  - Modify config.yaml and params.yaml in each experiment item directory
֎ Run the experiment (run run_exp.sh for each experiment item)
Set up an experiment (scripts/setup_exp.sh)
֎ Given
  - [exp_name]
  - [exp_note]
֎ Do
  - Create an experiment directory named [exp_name] under exp/
  - Copy the default configuration and parameter files to the experiment directory
  - Write [exp_note] as a text file to the experiment directory
    - Examples: ‘Compare different network architectures’ and ‘Compare different types of binary neurons’
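The steps above can be sketched in Python as follows. This is not the actual setup_exp.sh: the function name, the note file name `exp_note.txt`, the `defaults_dir` default, and the renaming of the copied files to config.yaml and params.yaml are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def setup_exp(exp_name, exp_note, exp_root="exp", defaults_dir="src/musegan"):
    """Create <exp_root>/<exp_name> with default config/params and a note."""
    exp_dir = Path(exp_root) / exp_name
    # Refuse to overwrite an existing experiment directory.
    exp_dir.mkdir(parents=True, exist_ok=False)

    # Copy the defaults; the target names are an assumption based on the
    # config.yaml/params.yaml files mentioned for experiment items.
    defaults = Path(defaults_dir)
    shutil.copyfile(defaults / "default_config.yaml", exp_dir / "config.yaml")
    shutil.copyfile(defaults / "default_params.yaml", exp_dir / "params.yaml")

    # Store the note as plain text (the file name is hypothetical).
    (exp_dir / "exp_note.txt").write_text(exp_note + "\n", encoding="utf-8")
    return exp_dir
```

Failing on an existing directory is a deliberate choice in this sketch, so that an earlier experiment's configuration is never silently replaced.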
Run an experiment (scripts/run_exp.sh)
֎ Given
  - [exp_dir]
  - [gpu_num]
֎ Do
  - Automatically search for config.yaml and params.yaml in [exp_dir]
  - Run the scripts in a specific order. For example, a typical GAN experiment might run
    - run_train.sh
    - run_inference.sh
    - run_interpolation.sh
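A rough Python sketch of this procedure is shown below. It only builds the list of commands rather than executing them. The function name `plan_exp` and the argument order passed to each script (`exp_dir`, then `gpu_num`) are assumptions, since the slides do not specify the scripts' command-line interfaces.

```python
from pathlib import Path

# Steps of a typical GAN experiment, in the order given on the slide.
STEPS = ("run_train.sh", "run_inference.sh", "run_interpolation.sh")


def plan_exp(exp_dir, gpu_num, scripts_dir="scripts"):
    """Return the commands to run for one experiment item, in order.

    Raises FileNotFoundError if config.yaml or params.yaml is missing.
    """
    exp_dir = Path(exp_dir)
    for name in ("config.yaml", "params.yaml"):
        if not (exp_dir / name).is_file():
            raise FileNotFoundError(f"{name} not found in {exp_dir}")
    # The argument order (exp_dir, gpu_num) is assumed for illustration.
    return [
        ["bash", str(Path(scripts_dir) / step), str(exp_dir), str(gpu_num)]
        for step in STEPS
    ]
```

Each returned command could then be passed to `subprocess.run(cmd, check=True)`, so that a failing step stops the sequence before later steps run.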
rerun_exp.sh:
- Remove the outputs
- Keep the configuration and parameter files
- Run the experiment again
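The cleanup part of a rerun might look like the sketch below. It assumes that everything in the experiment directory other than config.yaml, params.yaml, and a note file counts as output. The helper name `clean_exp` and the note file name `exp_note.txt` are hypothetical.

```python
import shutil
from pathlib import Path

# Files kept across reruns; exp_note.txt is a hypothetical note file name.
KEEP = {"config.yaml", "params.yaml", "exp_note.txt"}


def clean_exp(exp_dir):
    """Delete everything in exp_dir except the files named in KEEP.

    Returns the names of the removed entries in sorted order.
    """
    removed = []
    for path in sorted(Path(exp_dir).iterdir()):
        if path.name in KEEP:
            continue
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)  # e.g. directories of logs or checkpoints
        else:
            path.unlink()
        removed.append(path.name)
    return removed
```

After the cleanup, the experiment can be run again with the same configuration and parameter files.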
download_data.sh and download_models.sh: The training data and pretrained models are typically large and should therefore be hosted elsewhere
process_data.py: Processes the downloaded data for training
data/: Should be listed in .gitignore
LICENSE.txt: Required for others to use your code
See https://choosealicense.com/ to choose a suitable open source license
requirement.txt: Lists the dependencies (for reproducibility)
Repository organization (complete)
data/            Training data
docs/            Website contents
exp/             Experimental inputs and outputs
scripts/         Shell scripts for training, testing, etc.
src/             Source code
LICENSE.txt      License file
Pipfile          Dependency files
Pipfile.lock     Dependency files
README.md        Readme file
requirement.txt  Requirements file
Pipfile and Pipfile.lock: pipenv is recommended for packaging
docs/: GitHub Pages is recommended for simplicity
Benefits
֎ Easy to manage lots of experiments
  - Each experiment has its own directory
  - The configuration and parameters used in each experiment are saved
  - The configuration and parameters are loaded from the experiment directory (no need to modify the source code)
֎ Easy to examine new network architectures
  - Simply add a new preset to the presets/ directory
  - No need to modify other source code
Thank you for your attention
See an example project using this template: MuseGAN

More Related Content

Similar to Organizing Machine Learning Projects - Repository Organization

CMake Tutorial
CMake TutorialCMake Tutorial
CMake TutorialFu Haiping
 
Question IYou are going to use the semaphores for process sy.docx
Question IYou are going to use the semaphores for process sy.docxQuestion IYou are going to use the semaphores for process sy.docx
Question IYou are going to use the semaphores for process sy.docxaudeleypearl
 
Effective testing with pytest
Effective testing with pytestEffective testing with pytest
Effective testing with pytestHector Canto
 
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...Michael Lee
 
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...Databricks
 
Developing web apps using Erlang-Web
Developing web apps using Erlang-WebDeveloping web apps using Erlang-Web
Developing web apps using Erlang-Webfanqstefan
 
Unit testing presentation
Unit testing presentationUnit testing presentation
Unit testing presentationArthur Freyman
 
MyTunesbuild.xml Builds, tests, and runs the project M.docx
MyTunesbuild.xml      Builds, tests, and runs the project M.docxMyTunesbuild.xml      Builds, tests, and runs the project M.docx
MyTunesbuild.xml Builds, tests, and runs the project M.docxgilpinleeanna
 
2 second lesson- attributes
2 second lesson- attributes2 second lesson- attributes
2 second lesson- attributesMohammad Alyan
 
Some useful tips with qtp
Some useful tips with qtpSome useful tips with qtp
Some useful tips with qtpSandeep
 
Deployment with ExpressionEngine
Deployment with ExpressionEngineDeployment with ExpressionEngine
Deployment with ExpressionEngineGreen Egg Media
 
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...Amazon Web Services
 
Developing IT infrastructures with Puppet
Developing IT infrastructures with PuppetDeveloping IT infrastructures with Puppet
Developing IT infrastructures with PuppetAlessandro Franceschi
 
Data Storage In Android
Data Storage In Android Data Storage In Android
Data Storage In Android Aakash Ugale
 

Similar to Organizing Machine Learning Projects - Repository Organization (20)

CMake Tutorial
CMake TutorialCMake Tutorial
CMake Tutorial
 
Question IYou are going to use the semaphores for process sy.docx
Question IYou are going to use the semaphores for process sy.docxQuestion IYou are going to use the semaphores for process sy.docx
Question IYou are going to use the semaphores for process sy.docx
 
Effective testing with pytest
Effective testing with pytestEffective testing with pytest
Effective testing with pytest
 
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...
Setup of EDA tools and workstation environment variables in NCTU 307 Lab. wor...
 
Hybrid framework
Hybrid frameworkHybrid framework
Hybrid framework
 
Perl 20tips
Perl 20tipsPerl 20tips
Perl 20tips
 
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...
From HelloWorld to Configurable and Reusable Apache Spark Applications in Sca...
 
Developing web apps using Erlang-Web
Developing web apps using Erlang-WebDeveloping web apps using Erlang-Web
Developing web apps using Erlang-Web
 
Oracle applications 11i dba faq
Oracle applications 11i dba faqOracle applications 11i dba faq
Oracle applications 11i dba faq
 
Unit testing presentation
Unit testing presentationUnit testing presentation
Unit testing presentation
 
MyTunesbuild.xml Builds, tests, and runs the project M.docx
MyTunesbuild.xml      Builds, tests, and runs the project M.docxMyTunesbuild.xml      Builds, tests, and runs the project M.docx
MyTunesbuild.xml Builds, tests, and runs the project M.docx
 
2 second lesson- attributes
2 second lesson- attributes2 second lesson- attributes
2 second lesson- attributes
 
Some useful tips with qtp
Some useful tips with qtpSome useful tips with qtp
Some useful tips with qtp
 
Deployment with ExpressionEngine
Deployment with ExpressionEngineDeployment with ExpressionEngine
Deployment with ExpressionEngine
 
11i Logs
11i Logs11i Logs
11i Logs
 
R sharing 101
R sharing 101R sharing 101
R sharing 101
 
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...
Bring Your Own Apache MXNet and TensorFlow Scripts to Amazon SageMaker (AIM35...
 
Android Data Storagefinal
Android Data StoragefinalAndroid Data Storagefinal
Android Data Storagefinal
 
Developing IT infrastructures with Puppet
Developing IT infrastructures with PuppetDeveloping IT infrastructures with Puppet
Developing IT infrastructures with Puppet
 
Data Storage In Android
Data Storage In Android Data Storage In Android
Data Storage In Android
 

More from Hao-Wen (Herman) Dong

Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...Hao-Wen (Herman) Dong
 
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...Hao-Wen (Herman) Dong
 
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...Hao-Wen (Herman) Dong
 
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GANRecent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GANHao-Wen (Herman) Dong
 
Introduction to Deep Generative Models
Introduction to Deep Generative ModelsIntroduction to Deep Generative Models
Introduction to Deep Generative ModelsHao-Wen (Herman) Dong
 

More from Hao-Wen (Herman) Dong (6)

Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
Training Generative Adversarial Networks with Binary Neurons by End-to-end Ba...
 
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...
Convolutional Generative Adversarial Networks with Binary Neurons for Polypho...
 
What is Critical in GAN Training?
What is Critical in GAN Training?What is Critical in GAN Training?
What is Critical in GAN Training?
 
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic ...
 
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GANRecent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
Recent Progress on Utilizing Tag Information with GANs - StarGAN & TD-GAN
 
Introduction to Deep Generative Models
Introduction to Deep Generative ModelsIntroduction to Deep Generative Models
Introduction to Deep Generative Models
 

Recently uploaded

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Recently uploaded (20)

Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Organizing Machine Learning Projects - Repository Organization

  • 1. Organizing Machine Learning Projects Repository Organization Hao-Wen Dong
  • 2. If you… ֎ have a hard time managing a bunch of experiments ֎ always forgot the exact configuration used in a specific experiment ֎ annoyed at having to copy all the code just to test a new architecture 2
  • 3. Repository organization (core) data/ Training data exp/ Experimental inputs and outputs scripts/ Shell scripts for training, testing, etc. src/ Source code LICENSE.txt License file README.md Readme file requirement.txt Requirements file 3
  • 4. Repository organization (core) data/ Training data exp/ Experimental inputs and outputs scripts/ Shell scripts for training, testing, etc. src/ Source code LICENSE.txt License file README.md Readme file requirement.txt Requirements file 4
  • 5. Source code (src/) musegan/ Model source code __init__.py File to make it a package inference.py Script for inference interpolation.py Script for interpolation process_data.py Script for data preprocessing train.py Script for training 5
  • 6. Source code (src/) musegan/ Model source code __init__.py File to make it a package inference.py Script for inference interpolation.py Script for interpolation process_data.py Script for data preprocessing train.py Script for training 6
  • 7. Model source code (src/musegan/) presets/ Preset network architectures __init__.py File to make it a package config.py System configuration file data.py Data loader default_config.yaml Default configurations default_params.yaml Default parameters io_utils.py I/O utilities losses.py Loss functions metrics.py Metrics model.py Main model class utils.py Utilities 7
  • 8. Model source code (src/musegan/) presets/ Preset network architectures __init__.py File to make it a package config.py System configuration file data.py Data loader default_config.yaml Default configurations default_params.yaml Default parameters io_utils.py I/O utilities losses.py Loss functions metrics.py Metrics model.py Main model class utils.py Utilities 8 Define your model
  • 9. Model source code (src/musegan/) presets/ Preset network architectures __init__.py File to make it a package config.py System configuration file data.py Data loader default_config.yaml Default configurations default_params.yaml Default parameters io_utils.py I/O utilities losses.py Loss functions metrics.py Metrics model.py Main model class utils.py Utilities 9 presets/ __init__.py generator/ __init__.py default.py ablated.py discriminator/ __init__.py default.py ablated.py
  • 10. Model source code (src/musegan/) presets/ Preset network architectures __init__.py File to make it a package config.py System configuration file data.py Data loader default_config.yaml Default configurations default_params.yaml Default parameters io_utils.py I/O utilities losses.py Loss functions metrics.py Metrics model.py Main model class utils.py Utilities 10 Recommend to use high-level APIs for flexibility such as tf.data or torch.utils.data
  • 11. Model source code (src/musegan/)
      Note (default_config.yaml, default_params.yaml):
        - params: Define the model
        - config: Define the settings for training/inference
        - Both will be copied to the experiment directory when setting up a new experiment
  • 12. Model source code (src/musegan/)
      Note (config.py, for developers only): Define rarely changed configuration variables. For example,
        - logging level/format
        - prefetch/buffer size for the data loader
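As an illustration of such a developer-facing configuration module, here is a minimal sketch. All variable names and values are assumptions chosen for the example, not the contents of the actual config.py.

```python
"""Sketch of a system configuration module holding rarely changed settings."""
import logging

# Logging level and format (hypothetical values).
LOG_LEVEL = logging.INFO
LOG_FORMAT = "%(asctime)s %(levelname)-8s %(message)s"

# Data-loader tuning knobs (hypothetical values).
SHUFFLE_BUFFER_SIZE = 1000
PREFETCH_SIZE = 1


def setup_logging() -> None:
    """Apply the logging settings above to the root logger."""
    logging.basicConfig(level=LOG_LEVEL, format=LOG_FORMAT)
```

Keeping these values in one module separates them from the per-experiment YAML files, which are meant to be edited for every experiment item.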
  • 13. Source code (src/)
      Note: Python scripts (to be called by shell scripts)
  • 14. Repository organization (core)
      data/              Training data
      exp/               Experimental inputs and outputs
      scripts/           Shell scripts for training, testing, etc.
      src/               Source code
      LICENSE.txt        License file
      README.md          Readme file
      requirement.txt    Requirements file
      Note (exp/): All the outputs are saved here (logs, samples, and checkpoints)
  • 16. Shell scripts (scripts/)
      download_data.sh        Download training data
      download_models.sh      Download pretrained models
      process_data.py         Process the data
      rerun_exp.sh            Rerun the experiment
      run_exp.sh              Run the experiment
      run_inference.sh        Run the inference
      run_interpolation.sh    Run the interpolation
      run_train.sh            Run the training
      setup_exp.sh            Set up the experiment
  • 17. Experimenting
      ֎ Create an experiment directory (under exp/)
          - One experiment compares n different settings
          - Examples: default (the default setting), net_archs, streams, binary_neurons
      ֎ Set up the experiment items (run setup_exp.sh)
          - Run it n times if you intend to compare n different settings in this experiment
          - Each experiment item (i.e., one setting) has its own directory
      ֎ Modify the configuration and parameters for each experiment item
          - Modify config.yaml and params.yaml in each experiment item directory
      ֎ Run the experiment (run run_exp.sh for each experiment item)
  • 18. Set up an experiment (scripts/setup_exp.sh)
      ֎ Given
          - [exp_name]
          - [exp_note]
      ֎ Do
          - Create an experiment directory named [exp_name] under exp/
          - Copy the default configuration and parameter files to the experiment directory
          - Write [exp_note] as a text file to the experiment directory
              - Examples: ‘Compare different network architectures’ and ‘Compare different types of binary neurons’
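The three setup steps can be sketched in Python as follows. This is an illustration, not the actual setup_exp.sh: the function name, the `exp_root` and `defaults_dir` parameters, the note file name `exp_note.txt`, and the renaming of the default files to config.yaml and params.yaml are all assumptions.

```python
import shutil
from pathlib import Path


def setup_exp(exp_name: str, exp_note: str, exp_root: Path, defaults_dir: Path) -> Path:
    """Create exp_root/exp_name, copy the default YAML files, and save the note."""
    exp_dir = exp_root / exp_name
    # Refuse to overwrite an existing experiment directory.
    exp_dir.mkdir(parents=True, exist_ok=False)
    # Copy the defaults under the names the run step later looks for (assumed renaming).
    shutil.copyfile(defaults_dir / "default_config.yaml", exp_dir / "config.yaml")
    shutil.copyfile(defaults_dir / "default_params.yaml", exp_dir / "params.yaml")
    # The note file name is an assumption; the slide only says "a text file".
    (exp_dir / "exp_note.txt").write_text(exp_note + "\n", encoding="utf-8")
    return exp_dir
```

A call such as `setup_exp("net_archs", "Compare different network architectures", Path("exp"), Path("src/musegan"))` would then leave a self-describing directory that can be edited before running.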
  • 19. Run an experiment (scripts/run_exp.sh)
      ֎ Given
          - [exp_dir]
          - [gpu_num]
      ֎ Do
          - Automatically search for config.yaml and params.yaml in [exp_dir]
          - Run the scripts in a specific order. For example, a typical GAN experiment might run, in order:
              - run_train.sh
              - run_inference.sh
              - run_interpolation.sh
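The run step could look roughly like the sketch below, written in Python for illustration. The function names, the `scripts_dir` parameter, and the argument order passed to each step script are assumptions; only the file names and the step order come from the slide.

```python
import subprocess
from pathlib import Path

# Step order taken from the GAN example on the slide.
STEPS = ("run_train.sh", "run_inference.sh", "run_interpolation.sh")


def build_commands(exp_dir: Path, gpu_num: int, scripts_dir: Path) -> list:
    """Check that config.yaml and params.yaml exist, then build one command per step."""
    for name in ("config.yaml", "params.yaml"):
        path = exp_dir / name
        if not path.is_file():
            raise FileNotFoundError(path)
    # Passing the experiment directory and GPU number as arguments is an assumption.
    return [["bash", str(scripts_dir / step), str(exp_dir), str(gpu_num)] for step in STEPS]


def run_exp(exp_dir: Path, gpu_num: int, scripts_dir: Path) -> None:
    """Run the steps in order, stopping at the first one that fails."""
    for command in build_commands(exp_dir, gpu_num, scripts_dir):
        subprocess.run(command, check=True)
```

Separating command construction from execution makes it possible to inspect what would run before launching a long training job.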
  • 20. Shell scripts (scripts/)
      Note (rerun_exp.sh):
        - Remove the outputs
        - Keep the configuration and parameter files
        - Run the experiment again
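The cleanup half of a rerun can be sketched as below. This is a hypothetical Python helper, not the actual rerun_exp.sh; it assumes the outputs live in the same directory as config.yaml and params.yaml, and it deletes everything else, so any other file worth preserving (such as an experiment note) would have to be added to `keep`.

```python
import shutil
from pathlib import Path

# Input files that survive a rerun (names from the slides).
KEEP = frozenset({"config.yaml", "params.yaml"})


def clean_outputs(exp_dir: Path, keep=KEEP) -> list:
    """Delete every entry in exp_dir whose name is not in `keep`; return the removed names."""
    removed = []
    for entry in sorted(exp_dir.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # output directories such as logs or checkpoints
        else:
            entry.unlink()  # plain files and symlinks
        removed.append(entry.name)
    return removed
```

After this cleanup, running the experiment again would reuse the untouched configuration and parameter files.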
  • 21. Shell scripts (scripts/)
      Note (download_data.sh, download_models.sh): The training data and pretrained models are typically large and should therefore be hosted elsewhere
  • 22. Shell scripts (scripts/)
      Note (process_data.py): Process the downloaded data for training
  • 23. Repository organization (core)
      Note: Should be listed in .gitignore
  • 24. Repository organization (core)
      Note (LICENSE.txt): Required for others to use your code. See https://choosealicense.com/ to choose a suitable open source license.
  • 25. Repository organization (core)
      Note: (for reproducibility)
  • 26. Repository organization (complete)
      data/              Training data
      docs/              Website contents
      exp/               Experimental inputs and outputs
      scripts/           Shell scripts for training, testing, etc.
      src/               Source code
      LICENSE.txt        License file
      Pipfile            Dependency files
      Pipfile.lock       Dependency files
      README.md          Readme file
      requirement.txt    Requirements file
  • 27. Repository organization (complete)
      Note (Pipfile, Pipfile.lock): pipenv is recommended for packaging
  • 28. Repository organization (complete)
      Note (docs/): GitHub Pages is recommended for simplicity
  • 29. Benefits
      ֎ Easy to manage lots of experiments
          - Each experiment has its own directory
          - Configuration and parameters used in each experiment are saved
          - Configuration and parameters are loaded locally (no need to modify the source code)
      ֎ Easy to examine new network architectures
          - Simply add a new preset to the preset directory
          - No need to modify other source code
  • 30. Thank you for your attention. See an example project using this template: MuseGAN