Why Data Science is not Software Development
and
How experiment management helps bring
science back to data science
kuba@neptune.ml
@NeptuneML
https://medium.com/neptune-ml
Jakub Czakon
Agenda
● Why Data Science is not Software Development
● What is Experiment Management and how to use it
Data Science vs Software Development
Development:
● Feature scoping
● Extensive testing
● Code review and refactoring
● Extensive monitoring
● Exception handling
● Almost entire codebase used
● ...
Science:
● Explore
● Iteratively try ideas
● Communicate/question/analyse results
● Almost entire codebase dropped
● ...
“Render unto Software Development the things that are
Software Development's, and unto Data Science the things
that are Data Science’s”
Data Science Project
Development:
● Data access/preprocessing
● Feature extraction
● Model inference pipelines/REST API
● Exception handling
● Results monitoring
● Model retraining
● Resource provisioning
● ...
Science:
● Data exploration
● Hypothesis testing
● Feature prototyping/development
● Model prototyping/development
● Pipeline comparison
● Results exploration
● Problem understanding
● ...
Data Science Project
Development:
● When you know what to do
● When things will be done more than once
● ...
Science:
● When you don’t yet know
● When things could end up being done just once
● ...
Managing Data Science just like
Software Development is wrong
Infamous quotes:
● “What insights will you be able to derive from this?”
● “How long will the data exploration take?”
● “When can we expect to have a working model?”
● “When will you be able to improve by 20%?”
● “What is the MVP accuracy for this problem?” “100%.”
“What is your current human accuracy?” “I don’t know, maybe 70%.”
Data Science is not just
machine learning
link to Quora
Data Science is
really close to business
[Figure: releases release-0, release-1, release-2 with experiment branches exp-1.0 through exp-2.1.1; labels: "development .git", "science .how?"]
Data science projects are
experimental by design
In data science using just .git
makes keeping track of things .hard
[Figure: releases release-0, release-1, release-2 with experiment branches exp-1.0 through exp-2.1.1, marked with "?"]
To re-run you need more than just code
Experiment management
What is experiment management?
“Experiment management is a process of tracking experiment
metadata, organizing it in a meaningful way, and making it available
to access and collaborate on within your organization.” - me
No framework was hurt during
the production of this talk...
...but hopefully some will be
after
Track
Code:
● Version scripts
● Version notebooks:
○ nbdime, jupytext, neptune-notebooks
● Magic numbers -> hyperparameters
● Make sure your notebook runs top-to-bottom
jupyter nbconvert --to script nb.ipynb; python nb.py
Hyperparameters:
● Everything goes into config
● If passed via command ->
automagically goes to config
● If passed via script ->
automagically goes into config
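One way the "everything goes into config" rule could look in practice is sketched below. This is only an illustration using the standard library: the parameter names, the default values, and the `build_config` helper are assumptions made for the example, and the precedence (in-script overrides win over command-line flags) is one possible choice, not something the talk prescribes.

```python
import argparse

# Hypothetical defaults; the names and values are illustrative only.
DEFAULT_CONFIG = {"learning_rate": 0.01, "n_estimators": 100, "max_depth": 3}


def build_config(argv=None, overrides=None):
    """Merge defaults, command-line flags, and in-script overrides into one dict."""
    parser = argparse.ArgumentParser()
    for name, default in DEFAULT_CONFIG.items():
        # Each default also determines the type used to parse its flag.
        parser.add_argument(f"--{name}", type=type(default), default=None)
    args = parser.parse_args(argv)

    config = dict(DEFAULT_CONFIG)
    # Values passed via the command line go into the config.
    config.update({k: v for k, v in vars(args).items() if v is not None})
    # Values passed from within a script go into the same config.
    config.update(overrides or {})
    return config
```

Because every hyperparameter ends up in a single dictionary, that dictionary can be logged once per experiment, and a search procedure can later propose new values by passing them as `overrides`.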
Bonus: hyperparameter optimization for free
HPO blog post series
Environment:
● Reproducible environment,
preferably in config
● Good options:
○ Docker
○ Conda
○ Makefiles
Metrics:
● Good validation >> insert smth
● Always be logging
● The more metrics the better
● Track training/validation/test
errors to estimate generalization
score = evaluate(model, valid_data)
exp.log('valid_auc', score)
youtube link
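To make "track training/validation/test errors" concrete, here is a minimal sketch. The `Experiment` class is a hypothetical in-memory stand-in for whatever `exp` object a tracking tool provides, and `accuracy` and `log_split_metrics` are helper names invented for this example.

```python
from collections import defaultdict


class Experiment:
    """Hypothetical in-memory stand-in for an experiment tracker's `exp` object."""

    def __init__(self):
        self.channels = defaultdict(list)

    def log(self, name, value):
        # Append rather than overwrite so every logged value is kept.
        self.channels[name].append(value)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_split_metrics(exp, splits):
    """Log one accuracy value per split, plus the train/valid gap if both exist."""
    scores = {}
    for split_name, (y_true, y_pred) in splits.items():
        scores[split_name] = accuracy(y_true, y_pred)
        exp.log(f"{split_name}_accuracy", scores[split_name])
    if "train" in scores and "valid" in scores:
        # A large gap hints that the model does not generalize well.
        exp.log("train_valid_gap", scores["train"] - scores["valid"])
    return scores
```

Logging the gap next to the raw scores keeps the generalization estimate visible alongside the headline metric.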
Data version:
● Storage is cheap(ish) >>
keep old versions
● Log data path
● Log data hash
train = pd.read_csv(TRAIN_PATH)
exp.log('data_path', TRAIN_PATH)
md5 = md5_from_file(TRAIN_PATH)
exp.log('data_version', md5)
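The snippet above calls `md5_from_file` without defining it. A minimal sketch of such a helper, assuming it should return the hex digest of the file's bytes, could use the standard `hashlib` module; the `chunk_size` parameter is an addition for this example.

```python
import hashlib


def md5_from_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for large data files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is used here only as a content fingerprint to tell data versions apart, not for any security purpose.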
Results exploration:
● Confusion matrix heatmap
● Predictions distributions
● Best/worst predictions
● In-train predictions
dist_fig = plot_dist(predictions)
exp.log('figure', dist_fig)
for pred_img in predictions:
    exp.log('image_preds', pred_img)
Organize
Work in creative iterations
time, budget, business_goal = business_specification()
creative_idea = initial_research(business_goal)
while time and budget and not business_goal:
    solution = develop(creative_idea)
    metrics = evaluate(solution, validation_data)
    if metrics > best_metrics:
        best_metrics = metrics
        best_solution = solution
    creative_idea = explore_results(best_solution)
    time.update()
    budget.update()
Why explore results first?
Good:
● Explore current results
● Identify problems
● Prioritize problems
● Implement new idea that solves the problem
Bad:
● Find awesome ideas (Twitter/Medium/Conf)
● Choose the coolest method
● Implement it
● Hope it solves all the problems
How to explore results?
● Internal
○ Diagnostic charts
○ Model comparisons
○ Permutation importance
○ Worst/best predictions
○ Shap values/eli5
● External
○ User/stakeholder feedback
Yellowbrick
scikit-plot
Altair
shap
eli5
Flask
BentoML
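As a small illustration of the "worst/best predictions" item, the sketch below ranks examples by absolute error between the label and the predicted probability. It uses only the standard library; the function name `rank_predictions` and the choice of absolute error as the ranking criterion are assumptions for this example.

```python
def rank_predictions(y_true, y_prob, k=3):
    """Return indices of the k best and the k worst predictions by absolute error."""
    errors = [abs(t - p) for t, p in zip(y_true, y_prob)]
    # Indices sorted from smallest error (best) to largest error (worst).
    order = sorted(range(len(errors)), key=errors.__getitem__)
    return order[:k], order[::-1][:k]
```

The returned indices can then be used to pull the corresponding rows or images for manual inspection.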
Why explore results at all?
● Know where your model fails
● Validate whether your model improved
where you wanted it to
● Formulate your next steps
● Cherry-pick good/bad/funny results
Connecting the dots
time, budget, business_goal = business_specification()
creative_idea = initial_research(business_goal)
while time and budget and not business_goal:
    solution = develop(creative_idea, training_data)
    metrics = evaluate(solution, validation_data)
    if metrics > best_metrics:
        best_metrics = metrics
        best_solution = solution
    creative_idea = explore_results(best_solution)
    time.update()
    budget.update()
● Tag with creative idea
● Log train and valid data versions
● Log metrics
● Log model and valid predictions
● Version results exploration
notebook
● Version code .git
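The loop and the checklist above can be combined into a toy, runnable sketch. Everything here is illustrative: `data_version` and `run_iterations` are hypothetical names, the ideas come from a fixed list rather than from exploring the previous results, and `develop` and `evaluate` are passed in by the caller.

```python
import hashlib


def data_version(rows):
    """Hash the data content so each run records exactly what it used."""
    return hashlib.md5(repr(rows).encode("utf-8")).hexdigest()


def run_iterations(ideas, train, valid, develop, evaluate, budget):
    """Try ideas while the budget lasts; keep one log record per experiment."""
    records, best = [], None
    for idea in ideas:
        if budget <= 0:
            break
        solution = develop(idea, train)
        record = {
            "tag": idea,                           # tag with creative idea
            "train_version": data_version(train),  # log train data version
            "valid_version": data_version(valid),  # log valid data version
            "metric": evaluate(solution, valid),   # log metrics
        }
        records.append(record)
        if best is None or record["metric"] > best["metric"]:
            best = record
        budget -= 1
    return best, records
```

Keeping one record per experiment, with the idea tag and data versions next to the metric, is what makes a later comparison between runs possible.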
Experiments are organized
Exploratory analysis is versioned
Collaborate
Central hub facilitates collaboration
Central Data
Science
project hub
Work is accessible
● Slides link on Twitter @NeptuneML and Linkedin @neptune.ml
● My blog post on experiment management
● Example project with experiment management:
○ Code
○ Parameters
○ Environment
○ Data
○ … more
Materials
Data science collaboration hub
Track | Organize | Collaborate
kuba@neptune.ml
@NeptuneML
https://medium.com/neptune-ml
Jakub Czakon

More Related Content

What's hot

Rapid Software Testing: Strategy
Rapid Software Testing: StrategyRapid Software Testing: Strategy
Rapid Software Testing: StrategyTechWell
 
Rapid Software Testing: Reporting
Rapid Software Testing: ReportingRapid Software Testing: Reporting
Rapid Software Testing: ReportingTechWell
 
Problem solving methodology
Problem solving methodologyProblem solving methodology
Problem solving methodologyByron Mitchell
 
Solving technical challenges
Solving technical challengesSolving technical challenges
Solving technical challengesMoti Margalit
 
7 steps to master problem solving
7 steps to master problem solving7 steps to master problem solving
7 steps to master problem solvingYuri Kaminski
 
Applying good context driven testing in an agile context
Applying good context driven testing in an agile contextApplying good context driven testing in an agile context
Applying good context driven testing in an agile contextMarkus Gärtner
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingTechWell
 
Will Robots Replace Testers?
Will Robots Replace Testers?Will Robots Replace Testers?
Will Robots Replace Testers?TEST Huddle
 
Root cause analysis training for beginners
Root cause analysis training for beginnersRoot cause analysis training for beginners
Root cause analysis training for beginnersBryan Len
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingTechWell
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Debraj GuhaThakurta
 
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead College
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead CollegeMozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead College
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead CollegePatrick John McGee
 
Michael Bolton - Heuristics: Solving Problems Rapidly
Michael Bolton - Heuristics: Solving Problems RapidlyMichael Bolton - Heuristics: Solving Problems Rapidly
Michael Bolton - Heuristics: Solving Problems RapidlyTEST Huddle
 
Problem solving and analytical skills
Problem solving and analytical skillsProblem solving and analytical skills
Problem solving and analytical skillstayyabaways
 
Shrini Kulkarni - Software Metrics - So Simple, Yet So Dangerous
Shrini Kulkarni -  Software Metrics - So Simple, Yet So Dangerous Shrini Kulkarni -  Software Metrics - So Simple, Yet So Dangerous
Shrini Kulkarni - Software Metrics - So Simple, Yet So Dangerous TEST Huddle
 
Santa Barbara Agile: Exploratory Testing Explained and Experienced
Santa Barbara Agile: Exploratory Testing Explained and ExperiencedSanta Barbara Agile: Exploratory Testing Explained and Experienced
Santa Barbara Agile: Exploratory Testing Explained and ExperiencedMaaret Pyhäjärvi
 

What's hot (18)

Rapid Software Testing: Strategy
Rapid Software Testing: StrategyRapid Software Testing: Strategy
Rapid Software Testing: Strategy
 
Rapid Software Testing: Reporting
Rapid Software Testing: ReportingRapid Software Testing: Reporting
Rapid Software Testing: Reporting
 
Problem solving methodology
Problem solving methodologyProblem solving methodology
Problem solving methodology
 
Solving technical challenges
Solving technical challengesSolving technical challenges
Solving technical challenges
 
7 steps to master problem solving
7 steps to master problem solving7 steps to master problem solving
7 steps to master problem solving
 
Applying good context driven testing in an agile context
Applying good context driven testing in an agile contextApplying good context driven testing in an agile context
Applying good context driven testing in an agile context
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
 
Will Robots Replace Testers?
Will Robots Replace Testers?Will Robots Replace Testers?
Will Robots Replace Testers?
 
Root cause analysis training for beginners
Root cause analysis training for beginnersRoot cause analysis training for beginners
Root cause analysis training for beginners
 
A Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software TestingA Rapid Introduction to Rapid Software Testing
A Rapid Introduction to Rapid Software Testing
 
Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017Team Data Science Process Presentation (TDSP), Aug 29, 2017
Team Data Science Process Presentation (TDSP), Aug 29, 2017
 
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead College
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead CollegeMozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead College
Mozilla Open Badges Workshop - Super learners Week @ Irlam & Cadishead College
 
ATD2K16
ATD2K16ATD2K16
ATD2K16
 
Michael Bolton - Heuristics: Solving Problems Rapidly
Michael Bolton - Heuristics: Solving Problems RapidlyMichael Bolton - Heuristics: Solving Problems Rapidly
Michael Bolton - Heuristics: Solving Problems Rapidly
 
Application of analytics
Application of analyticsApplication of analytics
Application of analytics
 
Problem solving and analytical skills
Problem solving and analytical skillsProblem solving and analytical skills
Problem solving and analytical skills
 
Shrini Kulkarni - Software Metrics - So Simple, Yet So Dangerous
Shrini Kulkarni -  Software Metrics - So Simple, Yet So Dangerous Shrini Kulkarni -  Software Metrics - So Simple, Yet So Dangerous
Shrini Kulkarni - Software Metrics - So Simple, Yet So Dangerous
 
Santa Barbara Agile: Exploratory Testing Explained and Experienced
Santa Barbara Agile: Exploratory Testing Explained and ExperiencedSanta Barbara Agile: Exploratory Testing Explained and Experienced
Santa Barbara Agile: Exploratory Testing Explained and Experienced
 

Similar to Data science is not Software Development and how Experiment Management can make things better.

Symposium 2019 : Gestion de projet en Intelligence Artificielle
Symposium 2019 : Gestion de projet en Intelligence ArtificielleSymposium 2019 : Gestion de projet en Intelligence Artificielle
Symposium 2019 : Gestion de projet en Intelligence ArtificiellePMI-Montréal
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceJuuso Parkkinen
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsJakub Czakon
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analyticssunnypatil1778
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...DataScienceConferenc1
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasMerce Crosas
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the tradeFangda Wang
 
Pin the tail on the metric v01 2016 oct
Pin the tail on the metric v01 2016 octPin the tail on the metric v01 2016 oct
Pin the tail on the metric v01 2016 octSteven Martin
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data careerAdwait Bhave
 
10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist10 Tips From A Young Data Scientist
10 Tips From A Young Data ScientistNuno Carneiro
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys HolovatyiDataScienceConferenc1
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceLivePerson
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible researchYannick Wurm
 

Similar to Data science is not Software Development and how Experiment Management can make things better. (20)

Symposium 2019 : Gestion de projet en Intelligence Artificielle
Symposium 2019 : Gestion de projet en Intelligence ArtificielleSymposium 2019 : Gestion de projet en Intelligence Artificielle
Symposium 2019 : Gestion de projet en Intelligence Artificielle
 
How to Prepare for a Career in Data Science
How to Prepare for a Career in Data ScienceHow to Prepare for a Career in Data Science
How to Prepare for a Career in Data Science
 
How to track and organize your machine learning experimentations
How to track and organize your machine learning experimentationsHow to track and organize your machine learning experimentations
How to track and organize your machine learning experimentations
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
data science and business analytics
data science and business analyticsdata science and business analytics
data science and business analytics
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan...
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
Pin the tail on the metric v01 2016 oct
Pin the tail on the metric v01 2016 octPin the tail on the metric v01 2016 oct
Pin the tail on the metric v01 2016 oct
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
 
10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research
 

Recently uploaded

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 

Recently uploaded (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

Data science is not Software Development and how Experiment Management can make things better.

  • 1. Why Data Science is not Software Development and How experiment management helps bring science back to data science kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon
  • 2. ● Why Data Science is not Software Development ● What is Experiment Management and how to use it Agenda
  • 3. Data Science vs Software Development
  • 4. Development Science ● Feature scoping ● Extensive testing ● Code review and refactoring ● Extensive monitoring ● Exception handling ● Almost entire codebase used ● ... ● Explore ● Iteratively try ideas ● Communicate/question/analyse results ● Almost entire codebase dropped ● ... Data Science vs Software Development
  • 5. “Render unto Software development the things that are Software Development's, and unto Data Science the things that are Data Science’s“
  • 6. Development Data Science Project Science ● Data access/preprocessing ● Feature extraction ● Model inference pipelines/Rest API ● Exception handling ● Results monitoring ● Model retraining ● Resource provisioning ● ... ● Data exploration ● Hypothesis testing ● Feature prototyping/development ● Model prototyping/development ● Pipeline comparison ● Results exploration ● Problem understanding ● ...
  • 7. Development Data Science Project Science ● When you know what to do ● When things will be done more than once ● ... ● When you don’t yet know ● When things could end up being done just once ● ...
  • 8. Managing Data Science just like Software Development is wrong link
  • 9. Managing Data Science just like Software Development is wrong. Infamous quotes: ● “What insights will you be able to derive from this?” ● “How long will the data exploration take?” ● “When can we expect to have a working model?” ● “When will you be able to improve by 20%?” ● “What is the MVP accuracy for this problem?” “100%.” “What is your current human accuracy?” “I don’t know, maybe 70%.”
  • 10. Data Science is not just machine learning link to Quora
  • 11. Data Science is really close to business
  • 13. In data science using just .git makes keeping track of things .hard
      [Diagram labels: release-0, release-1, release-2; exp-1.0, exp-1.1, exp-1.1.1, exp-1.1.2, exp-1.1.2.1, exp-1.1.3, exp-2.0, exp-2.1, exp-2.1.0, exp-2.1.1, ?]
  • 14. To re-run you need more than just code
  • 16. What is experiment management? “Experiment management is a process of tracking experiment metadata, organizing it in a meaningful way, and making it available to access and collaborate on within your organization.” - me
  • 17. No framework was hurt during the production of this talk... ...but hopefully some will be after
  • 19. Code: ● Version scripts ● Version notebooks: ○ nbdime, jupytext, neptune-notebooks ● Magic numbers -> hyperparameters ● Make sure your notebook runs top-to-bottom:
      jupyter nbconvert --to script nb.ipynb
      python nb.py; python nb.py
  • 20. Hyperparameters: ● Everything goes into config ● If passed via command -> automagically goes to config ● If passed via script -> automagically goes into config
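One way to read slide 20 is that defaults, command-line flags, and in-script values should all land in a single config object. A minimal sketch of that idea, assuming a plain dict as the config; the parameter names (`lr`, `n_estimators`, `seed`) and the `build_config` helper are hypothetical and not taken from the talk or from any tracking tool:

```python
import argparse
import json

# Hypothetical defaults; the parameter names are illustrative, not from the talk.
DEFAULTS = {"lr": 0.01, "n_estimators": 100, "seed": 42}


def build_config(argv=None, overrides=None):
    """Merge defaults, command-line flags, and in-script overrides into one dict."""
    parser = argparse.ArgumentParser()
    for name, default in DEFAULTS.items():
        # Each default becomes a flag whose type matches the default value.
        parser.add_argument(f"--{name}", type=type(default), default=default)
    config = vars(parser.parse_args(argv))
    # Values set inside the script end up in the same config as the flags.
    config.update(overrides or {})
    return config


# Simulate `python train.py --lr 0.1` plus one value set in the script.
config = build_config(["--lr", "0.1"], overrides={"seed": 7})
print(json.dumps(config, sort_keys=True))
```

Because every source of hyperparameters is funnelled through one dict, that dict can be logged once per experiment without hunting for magic numbers in the code.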
  • 21. Bonus: hyperparameter optimization for free HPO blog post series
  • 22. Environment: ● Reproducible environment, preferably in config ● Good options: ○ Docker ○ Conda ○ Makefiles
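Slide 22 recommends Docker, Conda, or Makefiles. As a lighter-weight complement (not a replacement, and not something the talk itself shows), the running environment can also be recorded from inside the experiment. A rough sketch using only the standard library; `environment_snapshot` is a hypothetical helper name:

```python
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect interpreter, OS, and installed package versions as plain data."""
    packages = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            packages.append(f"{name}=={dist.version}")
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": sorted(packages),
    }


snapshot = environment_snapshot()
print(snapshot["python"], len(snapshot["packages"]), "packages")
```

The resulting dict can be stored next to the metrics, so a surprising result can later be checked against the exact package versions that produced it.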
  • 23. Metrics: ● Good validation >> insert smth ● Always be logging ● The more metrics the better ● Track training/validation/test errors to estimate generalization
      score = evaluate(model, valid_data)
      exp.log('valid_auc', score)
      youtube link
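To make the "track training/validation/test errors" bullet from slide 23 concrete, here is a small sketch. The `Experiment` class is a hypothetical in-memory stand-in that only mimics the `exp.log(name, value)` call seen on the slide; it is not the API of any real tracking library, and the labels and predictions are made up:

```python
from collections import defaultdict


class Experiment:
    """Hypothetical in-memory stand-in for the `exp` object used on the slides."""

    def __init__(self):
        self.channels = defaultdict(list)

    def log(self, name, value):
        # Append rather than overwrite so every metric keeps its full history.
        self.channels[name].append(value)


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the labels."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


exp = Experiment()
# Toy labels and predictions, invented purely for illustration.
splits = {
    "train": ([0, 1, 1, 0], [0, 1, 1, 0]),
    "valid": ([0, 1, 1, 0], [0, 1, 0, 0]),
    "test": ([1, 0, 1, 0], [1, 1, 0, 0]),
}
for split, (y_true, y_pred) in splits.items():
    exp.log(f"{split}_error", error_rate(y_true, y_pred))

# The train/valid gap is a rough signal of how well the model generalizes.
gap = exp.channels["valid_error"][-1] - exp.channels["train_error"][-1]
exp.log("generalization_gap", gap)
print(dict(exp.channels))
```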
  • 24. Data version: ● Storage is cheap(ish) >> keep old versions ● Log data path ● Log data hash
      train = pd.read_csv(TRAIN_PATH)
      exp.log('data_path', TRAIN_PATH)
      md5 = md5_from_file(TRAIN_PATH)
      exp.log('data_version', md5)
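Slide 24 calls `md5_from_file` without defining it. One plausible implementation, assuming it simply hashes the raw bytes of the file (the talk does not say how the original helper works), could look like this; the temporary CSV exists only to keep the example self-contained:

```python
import hashlib
import os
import tempfile


def md5_from_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a throwaway CSV so the example does not need real data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,target\n1,0\n2,1\n")
    train_path = tmp.name

version_before = md5_from_file(train_path)
with open(train_path, "a") as handle:
    handle.write("3,1\n")  # any change to the data changes the hash
version_after = md5_from_file(train_path)
os.remove(train_path)

print(version_before != version_after)
```

Logging the hash alongside the path means a silently overwritten file shows up as a changed `data_version` even when `data_path` stays the same.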
  • 25. Results exploration: ● Confusion matrix heatmap ● Predictions distributions ● Best/worst predictions ● In-train predictions
      dist_fig = plot_dist(predictions)
      exp.log('figure', dist_fig)
      for pred_img in predictions:
          exp.log('image_preds', pred_img)
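Two of the artifacts listed on slide 25, the confusion matrix and the best/worst predictions, can be computed without any plotting library. The sketch below uses invented binary labels and probabilities; `confusion_matrix` and `rank_by_error` are hypothetical helper names, and turning the matrix into a heatmap is left to whatever charting tool is in use:

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def rank_by_error(y_true, y_prob):
    """Sort example indices from best (smallest error) to worst prediction."""
    errors = [abs(t - p) for t, p in zip(y_true, y_prob)]
    return sorted(range(len(errors)), key=errors.__getitem__)


# Toy binary labels and predicted probabilities, invented for illustration.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.8]
y_pred = [int(p >= 0.5) for p in y_prob]

matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
ranking = rank_by_error(y_true, y_prob)
best, worst = ranking[0], ranking[-1]
print(matrix, best, worst)
```

The indices in `ranking` point back at the original examples, which is what makes it possible to pull up the actual inputs behind the worst predictions.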
  • 27. Work in creative iterations
      time, budget, business_goal = business_specification()
      creative_idea = initial_research(business_goal)
      while time and budget and not business_goal:
          solution = develop(creative_idea)
          metrics = evaluate(solution, validation_data)
          if metrics > best_metrics:
              best_metrics = metrics
              best_solution = solution
          creative_idea = explore_results(best_solution)
          time.update()
          budget.update()
  • 28. Why explore results first? Explore current results → Identify problems → Prioritize problems → Implement new idea that solves the problem
  • 29. Why explore results first?
      Good: Explore current results → Identify problems → Prioritize problems → Implement new idea that solves the problem
      Bad: Find awesome ideas (Twitter/Medium/Conf) → Choose the coolest method → Implement it → Hope it solves all the problems
  • 30. How to explore results? ● Internal ○ Diagnostic charts ○ Model comparisons ○ Permutation importance ○ Worst/best predictions ○ Shap values/eli5 ● External ○ User/stakeholder feedback
      Tools: Yellowbrick, scikit-plot, Altair, shap, eli5, Flask, BentoML
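Permutation importance, one of the internal diagnostics on slide 30, measures how much a metric drops when a single feature column is shuffled. The libraries named on the slide provide ready-made versions; the pure-Python sketch below is only meant to show the mechanics, with a made-up model that deliberately ignores its second feature:

```python
import random


def accuracy(model, rows, targets):
    """Share of rows for which the model's prediction matches the target."""
    return sum(model(r) == t for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, targets))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    # Toy model, invented for illustration: it only looks at feature 0.
    return int(row[0] > 0.5)


data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [model(r) for r in rows]

importances = permutation_importance(model, rows, targets)
print(importances)
```

Shuffling the ignored feature leaves accuracy untouched, so its importance is zero, while shuffling the feature the model relies on causes a clear drop.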
  • 31. Why explore results at all? ● Know where your model fails ● Validate whether your model improved where you wanted it to ● Formulate your next steps ● Cherry-pick good/bad/funny results
  • 32. Connecting the dots
      time, budget, business_goal = business_specification()
      creative_idea = initial_research(business_goal)
      while time and budget and not business_goal:
          solution = develop(creative_idea, training_data)
          metrics = evaluate(solution, validation_data)
          if metrics > best_metrics:
              best_metrics = metrics
              best_solution = solution
          creative_idea = explore_results(best_solution)
          time.update()
          budget.update()
      ● Tag with creative idea ● Log train and valid data versions ● Log metrics ● Log model and valid predictions ● Version results exploration notebook ● Version code .git
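The loop on slide 32 is pseudocode. A runnable toy version might look like the sketch below, under several loud assumptions: the "ideas" are just three hypothetical ways of picking a threshold, the budget is a number of experiments, the business goal is a target accuracy, and the experiment log is a plain list. None of these choices come from the talk; they only make the control flow and the "tag with creative idea, log metrics" bullets executable:

```python
import statistics

# Hypothetical "creative ideas": different ways to pick a decision threshold.
STRATEGIES = {"min": min, "mean": statistics.mean, "median": statistics.median}


def develop(idea, training_data):
    """Fit a one-feature threshold classifier using the chosen strategy."""
    threshold = STRATEGIES[idea](training_data)
    return lambda x: int(x >= threshold)


def evaluate(solution, validation_data):
    """Accuracy of the solution on (feature, label) pairs."""
    hits = sum(solution(x) == y for x, y in validation_data)
    return hits / len(validation_data)


def run_iterations(ideas, training_data, validation_data, budget, goal):
    """Try ideas until the budget runs out or the goal metric is reached."""
    log = []  # one record per experiment, tagged with the idea behind it
    best = {"idea": None, "metric": float("-inf")}
    for idea in ideas:
        if budget <= 0 or best["metric"] >= goal:
            break
        solution = develop(idea, training_data)
        metric = evaluate(solution, validation_data)
        log.append({"idea": idea, "valid_accuracy": metric})
        if metric > best["metric"]:
            best = {"idea": idea, "metric": metric}
        budget -= 1
    return best, log


# Toy data, invented for illustration.
training_data = [1, 2, 3, 10]
validation_data = [(1, 0), (2, 0), (3, 1), (4, 1)]

best, log = run_iterations(
    ["min", "mean", "median"], training_data, validation_data, budget=3, goal=1.0
)
print(best, log)
```

In a real project the next idea would come from exploring the results of the best solution so far rather than from a fixed list, which is the part of the slide's loop this sketch leaves out.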
  • 36. Central hub facilitates collaboration Central Data Science project hub
  • 39. Materials ● Slides link on Twitter @NeptuneML and LinkedIn @neptune.ml ● My blog post on experiment management ● Example project with experiment management: ○ Code ○ Parameters ○ Environment ○ Data ○ … more
  • 40. Data science collaboration hub Track | Organize | Collaborate kuba@neptune.ml @NeptuneML https://medium.com/neptune-ml Jakub Czakon