This document describes a recommended structure for organizing machine learning projects. The repository is split into core directories for data, experiments, shell scripts, and source code. The source directory holds the model code, with preset modules for different network architectures, and each experiment saves its own configuration files in its own directory. Shell scripts drive training, inference, and experiment management. This structure aims to make projects easier to manage and reproduce, and to make new architectures easy to test.
2. If you…
֎ have a hard time managing a bunch of experiments
֎ always forget the exact configuration used in a specific experiment
֎ are annoyed at having to copy all the code just to test a new architecture
3. Repository organization (core)
data/ Training data
exp/ Experimental inputs and outputs
scripts/ Shell scripts for training, testing, etc.
src/ Source code
LICENSE.txt License file
README.md Readme file
requirement.txt Requirements file
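The core layout above can be scaffolded in a few lines. A minimal sketch (directory and file names follow the listing; the project name is a placeholder):

```python
from pathlib import Path

def scaffold(root: str) -> None:
    """Create the core repository layout: data/, exp/, scripts/, src/,
    plus the top-level license, readme, and requirements files."""
    base = Path(root)
    for d in ("data", "exp", "scripts", "src"):
        (base / d).mkdir(parents=True, exist_ok=True)
    for f in ("LICENSE.txt", "README.md", "requirement.txt"):
        (base / f).touch()

scaffold("my_project")
```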
5. Source code (src/)
musegan/ Model source code
__init__.py File to make it a package
inference.py Script for inference
interpolation.py Script for interpolation
process_data.py Script for data preprocessing
train.py Script for training
7. Model source code (src/musegan/)
presets/ Preset network architectures
__init__.py File to make it a package
config.py System configuration file
data.py Data loader
default_config.yaml Default configurations
default_params.yaml Default parameters
io_utils.py I/O utilities
losses.py Loss functions
metrics.py Metrics
model.py Main model class
utils.py Utilities
Define your model in model.py, the main model class.
The preset network architectures live under presets/, one module per architecture:
presets/
    __init__.py
    generator/
        __init__.py
        default.py
        ablated.py
    discriminator/
        __init__.py
        default.py
        ablated.py
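One way such presets can be made drop-in is to map a name (e.g., from params.yaml) to a builder function, so adding an architecture means adding one module and one registry entry. A hypothetical sketch (these builders are stand-ins, not the actual MuseGAN code):

```python
# Hypothetical preset registry: each preset module exposes a builder;
# new architectures are added without touching the rest of the code.

def default_generator(latent_dim):
    """Stand-in for presets/generator/default.py."""
    return {"arch": "default", "latent_dim": latent_dim}

def ablated_generator(latent_dim):
    """Stand-in for presets/generator/ablated.py."""
    return {"arch": "ablated", "latent_dim": latent_dim}

GENERATOR_PRESETS = {
    "default": default_generator,
    "ablated": ablated_generator,
}

def build_generator(name, **kwargs):
    """Look up a preset by the name given in the parameter file."""
    return GENERATOR_PRESETS[name](**kwargs)

net = build_generator("ablated", latent_dim=128)
```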
For data.py, the data loader, high-level APIs such as tf.data or torch.utils.data are recommended for flexibility.
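The map-style interface that torch.utils.data.Dataset expects is just __len__ and __getitem__; a DataLoader then handles batching, shuffling, and prefetching. The sketch below illustrates that contract in plain Python (so it runs without torch installed); the class and data are illustrative only:

```python
# Minimal illustration of the map-style dataset interface that
# torch.utils.data.Dataset expects: __len__ and __getitem__.

class ToyDataset:
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

def batches(dataset, batch_size):
    """What a DataLoader does at its core: yield fixed-size batches."""
    n = len(dataset)
    for start in range(0, n, batch_size):
        yield [dataset[i] for i in range(start, min(start + batch_size, n))]

ds = ToyDataset(list(range(10)))
all_batches = list(batches(ds, 4))
```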
default_params.yaml defines the model; default_config.yaml defines the settings for training/inference. Both are copied to the experiment directory when setting up a new experiment, so each experiment keeps its own editable copy.
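The payoff of copying the defaults is that local edits override them without touching the source tree. The merge itself is trivial; shown here with plain dicts (in practice both sides would come from yaml.safe_load, and the parameter names below are made up for illustration):

```python
def load_settings(defaults, overrides):
    """Defaults updated by the experiment's local copy (shallow merge)."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

# Hypothetical contents of default_params.yaml and exp/<name>/params.yaml.
DEFAULT_PARAMS = {"latent_dim": 128, "beat_resolution": 12}
exp_params = {"latent_dim": 64}  # the one value this experiment changes

params = load_settings(DEFAULT_PARAMS, exp_params)
```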
config.py (for developers only) defines rarely-changed configuration variables, for example the logging level/format and the prefetch/buffer size for the data loader.
The Python scripts at the top of src/ (process_data.py, train.py, inference.py, and interpolation.py) are meant to be called by the shell scripts in scripts/.
All the experimental outputs (logs, samples, and checkpoints) are saved under exp/.
16. Shell scripts (scripts/)
download_data.sh Download training data
download_models.sh Download pretrained models
process_data.sh Process the data
rerun_exp.sh Rerun the experiment
run_exp.sh Run the experiment
run_inference.sh Run the inference
run_interpolation.sh Run the interpolation
run_train.sh Run the training
setup_exp.sh Set up the experiment
17. Experimenting
֎ Create an experiment directory (under exp/)
One experiment compares n different settings
Examples: default (the default setting), net_archs, streams, binary_neurons
֎ Set up the experiment items (run setup_exp.sh)
Run it n times if you intend to compare n different settings in this experiment
Each experiment item (i.e., one setting) has its own directory
֎ Modify the configuration and parameters for each experiment item
Edit config.yaml and params.yaml in each experiment item's directory
֎ Run the experiment (run run_exp.sh for each experiment item)
18. Set up an experiment (scripts/setup_exp.sh)
֎ Given
[exp_name]
[exp_note]
֎ Do
Create an experiment directory named [exp_name] under exp/
Copy the default configuration and parameter files to the experiment directory
Write [exp_note] to a text file in the experiment directory
Examples: 'Compare different network architectures' and 'Compare different types of binary neurons'
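The shell script itself is not shown in the slides; as a sketch, the same steps in Python (assuming the default YAML files live in src/musegan/, as in the listing earlier, and hypothetical file names for the copies and the note):

```python
import shutil
from pathlib import Path

def setup_exp(exp_name, exp_note, repo_root="."):
    """Sketch of setup_exp.sh: create exp/<exp_name>/ and seed it
    with the default configuration and parameter files."""
    root = Path(repo_root)
    exp_dir = root / "exp" / exp_name
    exp_dir.mkdir(parents=True, exist_ok=True)
    # Copy the defaults so the experiment owns an editable local copy.
    defaults = root / "src" / "musegan"
    shutil.copy(defaults / "default_config.yaml", exp_dir / "config.yaml")
    shutil.copy(defaults / "default_params.yaml", exp_dir / "params.yaml")
    # Record what this experiment is about.
    (exp_dir / "note.txt").write_text(exp_note)
    return exp_dir
```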
19. Run an experiment (scripts/run_exp.sh)
֎ Given
[exp_dir]
[gpu_num]
֎ Do
Automatically search for config.yaml and params.yaml in [exp_dir]
Run the scripts in a specific order; for example, a typical GAN experiment might look like
run_train.sh
run_inference.sh
run_interpolation.sh
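The same logic can be sketched in Python: verify the experiment directory holds its settings, then build the stage commands in order (each would be handed to subprocess.run; the argument convention shown is an assumption, not taken from the actual script):

```python
from pathlib import Path

# Fixed stage order for a typical GAN experiment, per the slide above.
STAGES = ["run_train.sh", "run_inference.sh", "run_interpolation.sh"]

def plan_exp(exp_dir, gpu_num=0):
    """Sketch of run_exp.sh: return the commands it would execute,
    in order, after checking that the experiment's settings exist."""
    exp_dir = Path(exp_dir)
    for required in ("config.yaml", "params.yaml"):
        if not (exp_dir / required).is_file():
            raise FileNotFoundError(exp_dir / required)
    return [["scripts/" + s, str(exp_dir), str(gpu_num)] for s in STAGES]
```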
rerun_exp.sh removes the outputs but keeps the configuration and parameter files, then runs the experiment again.
The training data and pretrained models are typically large and thus hosted somewhere else; download_data.sh and download_models.sh fetch them.
The data-processing script then processes the downloaded data for training.
The data/ and exp/ directories should be listed in .gitignore.
LICENSE.txt is required for others to use your code; see https://choosealicense.com/ to choose a proper open source license.
requirement.txt lists the dependencies, for reproducibility.
26. Repository organization (complete)
data/ Training data
docs/ Website contents
exp/ Experimental inputs and outputs
scripts/ Shell scripts for training, testing, etc.
src/ Source code
LICENSE.txt License file
Pipfile Dependency files
Pipfile.lock Dependency files
README.md Readme file
requirement.txt Requirements file
Using pipenv for packaging is recommended; it produces the Pipfile and Pipfile.lock dependency files.
docs/ holds the website contents; GitHub Pages is recommended for simplicity.
29. Benefits
֎ Easy to manage lots of experiments
Each experiment has its own directory
Configuration and parameters used in each experiment are saved
Configuration and parameters are loaded locally (no need to modify the source code)
֎ Easy to examine new network architectures
Simply add a new preset to the preset directory
No need to modify other source code
30. Thank you for your attention! See an example project using this template: MuseGAN