Presentation from PyData Munich '19.
You will learn how a few simple steps can keep your work organized around creative iterations, reproducible, and easy to share with anyone. You will see how to track code, metrics, hyperparameters, learning curves, data versions and more.
Bonus point: we will speak with a bot that knows a lot about the experiments.
How to track and organize your experimentation process
1.
2. Jakub Czakon
Senior Data Scientist
@neptune.ml
● Worked at a data science consulting firm, deepsense.ai
● Joined the team that developed an internal tool for tracking and managing experiments
● We spun off this tool as neptune.ml
● Working on the open-source/community side of things: neptune-contrib
4. ● Worked on my machine, when I ran that notebook
● They got 75% on that problem… idk which data version or metric
● I don’t understand this approach... what do you mean this person is long gone… the Confluence page is not really helping
● Mid-work interruptions are not exactly what I like the most
5. ● We are missing tracking/organization standards
● Knowledge is scattered across many tools
● People are not really working together
6.
7.
8. ● Time spent fixing/re-doing >> time spent discovering
● “Bus factor” risk goes way up
● Visit to the “alone in the dark” land
● and...
9.
10. ● Magic numbers -> hyperparameters
● Make sure your notebook works
jupyter nbconvert --to script nb.ipynb
python nb.py
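A minimal sketch of the "magic numbers -> hyperparameters" step. The names and values in PARAMS and the n_updates helper are hypothetical and only illustrate the idea: every tunable value gets a name and lives in one place, so it can be logged and changed without digging through the code.

```python
# Hypothetical hyperparameters: names and values are illustrative only.
PARAMS = {
    "learning_rate": 0.01,
    "n_epochs": 5,
    "batch_size": 32,
}


def n_updates(n_samples, params):
    """Number of gradient updates implied by the hyperparameters."""
    # Ceiling division: a partial final batch still counts as one update.
    batches_per_epoch = -(-n_samples // params["batch_size"])
    return batches_per_epoch * params["n_epochs"]


if __name__ == "__main__":
    print(n_updates(1000, PARAMS))
```

Without the dict, the 32 and the 5 would be bare literals inside the function, invisible to anyone reading the results later.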
11. ● Everything goes into config
● If passed via command line -> automagically goes into config
● If passed via script -> automagically goes into config
Bonus -> hyperparameter optimization for free(ish)
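One possible way to get the "everything goes into config" behaviour with only the standard library. This is a sketch, not the setup used in the talk: DEFAULT_CONFIG and build_config are hypothetical names, and the flags are generated from the config keys.

```python
import argparse

# Defaults defined in the script (hypothetical names and values).
DEFAULT_CONFIG = {"learning_rate": 0.01, "n_epochs": 5, "batch_size": 32}


def build_config(argv=None, overrides=None):
    """Merge defaults, in-script overrides and command-line flags into one dict."""
    config = dict(DEFAULT_CONFIG)
    config.update(overrides or {})  # values passed via script

    parser = argparse.ArgumentParser()
    for name, default in config.items():
        # One flag per config key, parsed with the type of its default value.
        parser.add_argument(f"--{name}", type=type(default), default=default)

    # Values passed via command line win over everything else.
    config.update(vars(parser.parse_args(argv)))
    return config


if __name__ == "__main__":
    print(build_config())
```

Because a run is fully described by one dict, a hyperparameter search reduces to calling the training code in a loop over different dicts, which is where the "for free(ish)" part comes from.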
13. ● Good validation >> insert something
● Always be (c)logging
● The more metrics the better
score = evaluate(model)
exp.log('score', score)
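To make the logging idea concrete without depending on any particular tracker, here is a small stand-in. The Experiment class below is hypothetical and is not the neptune client API; it just appends each logged value to a JSON-lines file. The evaluate function is a toy that takes predictions and targets rather than a model, and returns several metrics at once, in the spirit of "the more metrics the better".

```python
import json
import time


class Experiment:
    """Hypothetical stand-in for a tracker object; not the neptune client API."""

    def __init__(self, path):
        self.path = path  # JSON-lines file that receives one record per call

    def log(self, name, value):
        record = {"time": time.time(), "name": name, "value": value}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")


def evaluate(predictions, targets):
    """Toy metrics: accuracy plus the raw error count."""
    hits = sum(p == t for p, t in zip(predictions, targets))
    return {"accuracy": hits / len(targets), "n_errors": len(targets) - hits}
```

A run would then log every metric it computes, for example by looping over `evaluate(predictions, targets).items()` and calling `exp.log(name, value)` for each pair.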
14. ● Storage is cheap(ish) >> keep old versions
● Log data path
● Log data hash
train = pd.read_csv(TRAIN_PATH)
exp.log('data_path', TRAIN_PATH)
md5 = md5_from_file(TRAIN_PATH)
exp.log('data_version', md5)
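The slide does not show md5_from_file, so the version below is an assumption about what such a helper could look like, built on the standard hashlib module. The chunk_size parameter is an addition for illustration; reading in chunks keeps memory use bounded for large data files.

```python
import hashlib


def md5_from_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is used here only as a fingerprint for "did the data change", not for security; any stable hash would serve the same purpose.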
15. ● You get a better picture, and keep it longer
● Someone may actually be able to understand your problem
● You get a clear picture -> clear head -> better ideas