Machine Learning:
Opening the Pandora’s Box
By Dhiana Deva - Machine Learning Engineer at Spotify
QCon São Paulo - May 2019
Agenda
About me
Open the Pandora’s Box
Start with stupid
Aim for creepy
Hit half-way there
About Me
Me @ QCon Rio 2015
Me @ QCon São Paulo 2019
Open the Pandora’s Box!
Introducing Machine Learning
is like opening the Pandora’s Box
Problems
Assumptions, Constraints, Issues, Risks
Constraints
Be aware (not afraid)
of constraints
What decisions can you affect?
What are the system
implications?
What does your ML Infra
support?
Illustration from the book
"Creative People Must be Stopped”
By David A. Owens
Example Constraints
Business Constraints
• Metrics
• Business logic
• Legal needs
Data Constraints
• Volume
• Features
• Labels
Systems Constraints
• Available levers
• Infrastructure support
• Systems implications
• Engineering effort
Addressing Constraints
Investigate, communicate, and address each constraint by either:
• Accepting and working under its boundaries
• Expanding its boundaries
WARNING: Hitting an unexpected critical constraint too late in the process can kill
your ML product!
Assumptions
"You have no idea,
but you pretend you
know."
You might not have enough data to back
your hypothesis.
Historical data is biased by existing
heuristics.
The hypothesis behind your ML product
might be based on a critical assumption.
Assumptions bridge the gap between "Known Unknowns" and "Known Knowns"
Example Assumptions
• Are the metrics sensitive to the levers the ML approach is pulling?
• How do customers behave under changes in the logic?
• Impact analysis assumptions:
- Cost of misclassification
- Benefit of correct classification
- Assumptions for worst case scenario
- Parameters for more optimistic scenarios
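The impact analysis assumptions above can be sketched as a small expected-value calculation. This is only an illustration under stated assumptions, not the speaker's method: the function name `expected_impact`, the confusion-matrix counts and the per-decision benefit and cost are all hypothetical, and a real analysis would likely use different values per error type.

```python
def expected_impact(tp, fp, fn, tn, benefit_correct, cost_wrong):
    """Net impact of a binary classifier under assumed unit economics.

    tp/fp/fn/tn are confusion-matrix counts; benefit_correct and cost_wrong
    are assumed (hypothetical) per-decision values in the same currency.
    """
    correct = tp + tn
    wrong = fp + fn
    return correct * benefit_correct - wrong * cost_wrong


# Hypothetical scenarios: same traffic volume, different assumed error counts.
worst_case = expected_impact(tp=50, fp=30, fn=20, tn=900,
                             benefit_correct=1.0, cost_wrong=10.0)
optimistic = expected_impact(tp=65, fp=10, fn=5, tn=920,
                             benefit_correct=1.0, cost_wrong=10.0)
```

Comparing `worst_case` with `optimistic` shows how sensitive the business case is to the assumed cost of misclassification, which is the parameter early experiments should help pin down.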
Addressing Assumptions
• Experiment early and focus on learning the parameters needed for better
impact analysis and for more sophisticated approaches later.
• Consider reframing the initial problems to be solved, to validate the most critical
assumptions first.
• To be able to move forward with an unbiased approach, collect randomized
data.
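One possible way to collect randomized data is to let a small random share of traffic bypass the existing heuristic and to flag those decisions in the log. The sketch below assumes this setup; `choose_action`, the action names and the 10% exploration rate are hypothetical choices made for illustration.

```python
import random


def choose_action(heuristic_action, actions, explore_rate, rng):
    """Return (action, randomized_flag).

    With probability explore_rate the action is drawn uniformly at random,
    ignoring the heuristic; otherwise the heuristic's choice is kept.
    """
    if rng.random() < explore_rate:
        return rng.choice(actions), True
    return heuristic_action, False


rng = random.Random(42)  # fixed seed so the sketch is reproducible
actions = ["offer_a", "offer_b", "offer_c"]  # hypothetical levers
log = [choose_action("offer_a", actions, explore_rate=0.1, rng=rng)
       for _ in range(10_000)]

# Only the flagged decisions were made independently of the heuristic.
randomized = [action for action, flag in log if flag]
```

Keeping the flag next to each logged decision makes it possible to separate the unbiased slice from data shaped by the heuristic later on.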
Issues
Machine Learning
itself might not be
the issue!
Is there latency introduced?
Did the systems need to be changed,
decoupled or refactored?
Issues from systems implications might
impact your metrics and should not be
attributed to Machine Learning.
You don’t want to compare apples and oranges!
Example Issues
Data
• Instrumentation
• Metrics
System
• Latency
• Bugs
Other
• UX
• CX
A/A Test
Unveiling Issues
Running A/A Tests
• A: existing system, existing heuristic
• A*: new system, existing heuristic
- ML “turned-off”
- Bypassing the ML decision
What to expect?
• A should equal A*:
- Operational metrics
- Business metrics
- CS metrics
• If the two A’s perform differently:
- Trust me, there’s an issue!
- Time to investigate!
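For a rate-style metric, the A versus A* comparison could be checked with a standard two-proportion z-test. This is a minimal sketch, assuming a conversion-like metric and a 0.05 significance threshold; the function name and the counts are hypothetical. A non-significant result does not prove the two systems are identical, only that no discrepancy was detected at this sample size.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: A = existing system, A* = new system with ML bypassed.
z, p_value = two_proportion_z_test(successes_a=1_020, n_a=10_000,
                                   successes_b=985, n_b=10_000)
aa_discrepancy = p_value < 0.05  # assumed significance threshold
```

The same check would be repeated for each operational, business and CS metric of interest.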
Addressing Issues
If the A/A Test analysis reveals a discrepancy:
• Which metric is showing discrepancies?
• What could have caused it?
• What is the impact of this discrepancy?
Decide whether to fix it based on the size of its impact.
A/A/B Test
Run an A/A/B Test if time sensitive!
But only trust the A/B part once you have validated the A/A part!
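The rule of reading the A/B part only after the A/A part checks out can be written down as a small gate. This is a sketch under assumptions: `read_aab_test` is a hypothetical helper, the p-values are taken as already computed elsewhere, and the 0.05 threshold is an illustrative default.

```python
def read_aab_test(p_value_aa, lift_ab, p_value_ab, alpha=0.05):
    """Interpret an A/A/B test: look at A/B only when A/A shows no discrepancy."""
    if p_value_aa < alpha:
        return "investigate: A and A* differ, A/B result not trustworthy yet"
    if p_value_ab < alpha:
        return f"A/A ok, A/B lift of {lift_ab:+.1%} is significant"
    return "A/A ok, no significant A/B difference"
```

For example, `read_aab_test(0.4, 0.05, 0.001)` reports the lift, while `read_aab_test(0.01, 0.05, 0.001)` asks for an investigation first, however good the A/B numbers look.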
Risks
Careful about
"Squeeze Toys"
Optimizing for metric A might
lead to risking metric B.
"If you optimize your business to maximize one
metric, something important happens. Just like
one of those bulging stress-relief squeeze toys,
squeezing it in one place makes it bulge out in
another.”
Quote from the book “Lean Analytics” by Benjamin Yoskovitz
and Alistair Croll
Addressing Risks
Before experimenting
• Simulate worst case scenarios
• Simulate random baseline
PS: The same goes for collecting randomized data.
After experiment
• Calculate experiment costs
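A worst case scenario and a random baseline can be simulated before any traffic is exposed. The sketch below is illustrative only: `simulate_policy`, the lever settings and the per-user rewards are hypothetical assumptions, and real outcomes would come from historical data or a business model.

```python
import random


def simulate_policy(policy, outcomes, n_users, rng):
    """Average reward when `policy(rng)` picks one action per user.

    outcomes maps each action to an assumed (hypothetical) reward per user.
    """
    total = sum(outcomes[policy(rng)] for _ in range(n_users))
    return total / n_users


# Hypothetical per-user reward of each lever setting.
outcomes = {"discount_0": 1.0, "discount_10": 0.8, "discount_30": 0.2}
actions = list(outcomes)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Random baseline: every user gets a uniformly random action.
random_baseline = simulate_policy(lambda r: r.choice(actions), outcomes, 10_000, rng)

# Worst case: every user gets the action with the lowest assumed reward.
worst_action = min(outcomes, key=outcomes.get)
worst_case = simulate_policy(lambda r: worst_action, outcomes, 10_000, rng)
```

The gap between the current average reward and `random_baseline`, multiplied by the number of exposed users, gives a rough estimate of what a randomized experiment might cost.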
Start with stupid!
Illustration from the book
“Feature Engineering for Machine Learning"
by Alice Zheng and Amanda Casari.
Quote from the book "Doing Data Science"
by Cathy O’Neil and Rachel Schutt.
Chapter contributed by Claudia Perlich.
“Doing simple sanity checking to make sure things are what you think they are can
sometimes get you much further in the end than web scraping and a big fancy
machine learning algorithm. It may not seem cool and sexy, but it’s smart and good
practice. People might not invite you to a meetup to talk about it. It may not be
publishable research, but at least it’s legitimate and solid work.”
Iterate!
Illustration from the "Analytics Solutions Unified Method”
ASUM-DM by IBM
Addressing the constraints, assumptions, risks and issues.
Illustration from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
ML Systems are complex systems!
Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
Start with stupid!
Iterate with strategic, proportional investments across the ML stack.
And so on…
Aim for creepy!
What’s the limit of
what’s achievable?
Machine Learning is a powerful
tool, but buy-in and sponsorship
are much needed.
A big vision is vital for Machine
Learning products.
Questions - cheat sheet
• What if you had all the levers that you could possibly pull?
• What if you could optimize all the aspects of the business and user experience?
• What if you broke it down into multiple Machine Learning products?
• What if you had all the data you would like to use?
• What if you had the ideal Machine Learning infrastructure?
• What if you used the ideal Machine Learning model and approach?
• What if you had all monitoring in place to quickly catch any issues?
Vision - cheat sheet
Improve _____ and reduce _____ by _____ the right _____
and _____ with the right _____ and the right _____
Multi-Objective Optimization
Multiple Levers
Multiple ML Products
Hit half-way there!
Good enough is better than perfect!
• You might discover other interesting opportunities for Machine Learning.
• You might discover other interesting opportunities even without Machine
Learning.
• You might discover there’s a third party service for your domain.
• Machine Learning is a part of the solution, not the whole solution.
• Serendipity is good creepy, but algorithmic bias is bad creepy.
Beware of
algorithmic bias.
Check the slides from the tutorial
"Algorithmic Bias in Practice" at
ACM FAT*2019.
Illustration from “AAAI 2017 Spring Symposium Series -
Designing the UX of ML Systems” by Henriette Cramer, Jenn
Thom and XXX
Enjoy the journey!
Have fun!
• Celebrate the invaluable improvements and learnings brought along the journey:
- Data, metrics, instrumentation and experimentation
- Business and domain understanding
- System design and quality
• Get ready for even more exciting next steps!
• Enjoy the journey and don’t forget the bigger picture: customer value!
Re-cap
Open the Pandora’s Box
Start with stupid
Aim for creepy
Hit half-way there
Enjoy the journey!
Obrigada!
dhiana@spotify.com
@dhianadeva on Twitter

 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019

  • 1. Machine Learning: Opening the Pandora’s Box By Dhiana Deva - Machine Learning Engineer at Spotify QCon São Paulo - May 2019
  • 2. Agenda About me Open the Pandora’s Box Start with stupid Aim for creepy Hit half-way there
  • 4. Me @ QCon Rio 2015
  • 5. Me @ QCon São Paulo 2019
  • 8. Introducing Machine Learning is like opening the Pandora’s Box: Problems, Problems, Problems, Problems
  • 9. Introducing Machine Learning is like opening the Pandora’s Box: Problems, Problems, Problems, Problems
  • 10. Introducing Machine Learning is like opening the Pandora’s Box: Assumptions, Constraints, Issues, Risks
  • 12. Be aware (not afraid) of constraints. What decisions can you affect? What are the system implications? What does your ML Infra support? Illustration from the book "Creative People Must Be Stopped" by David A. Owens
  • 13. Example Constraints. Business Constraints: • Metrics • Business logic • Legal needs. Data Constraints: • Volume • Features • Labels. Systems Constraints: • Available levers • Infrastructure support • Systems implications • Engineering effort
  • 14. Addressing Constraints. Investigate, communicate, and address each constraint by either: • Accepting it and working within its boundaries • Expanding its boundaries. WARNING: Hitting an unexpected critical constraint too late in the process can kill your ML product!
  • 16. "You have no idea, but you pretend you know." You might not have enough data to back your hypothesis. Historical data is biased by existing heuristics. The hypothesis behind your ML product might be based on a critical assumption. Assumptions bridge "Known Unknowns" and "Known Knowns". [Figure: known/unknown matrix; labels: KNOWN, UNKNOWN, ASSUMPTIONS]
  • 17. Example Assumptions • Are the metrics sensitive to the levers the ML approach is pulling? • How do customers behave under changes in the logic? • Impact analysis assumptions: - Cost of misclassification - Benefit of correct classification - Assumptions for worst case scenario - Parameters for more optimistic scenarios
  • 18. Addressing Assumptions • Experiment early and focus on learning the parameters needed for better impact analysis and for later, more sophisticated approaches. • Consider reframing the initial problems to be solved, to validate the most critical assumptions first. • To be able to move forward with an unbiased approach, collect randomized data.
  • 20. Machine Learning itself might not be the issue! Is there latency introduced? Did the systems need to be changed, decoupled or refactored? Issues from systems implications might impact your metrics and should not be attributed to Machine Learning. You don’t want to compare apples and oranges!
  • 21. Example Issues. Data: • Instrumentation • Metrics. System: • Latency • Bugs. Other: • UX • CX
  • 23. Unveiling Issues: Running A/A Tests • A: existing system, existing heuristic • A*: new system, existing heuristic - ML “turned off” - Bypassing the ML decision. What to expect? • A should equal A* on: - Operational metrics - Business metrics - CS metrics • If the two A’s perform differently: - Trust me, there’s an issue! - Time to investigate!
  • 24. Addressing Issues In case a discrepancy is found on the A/A Test analysis: • Which metric is showing discrepancies? • What could have caused it? • What is the impact of this discrepancy? Decide whether to fix it based on its impact size
  • 25. A/A/B Test. Run an A/A/B test if time-sensitive! But only trust the A/B part once you have validated the A/A part!
  • 26. Risks
  • 27. Careful about "Squeeze Toys" Optimizing for metric A might lead to risking metric B. "If you optimize your business to maximize one metric, something important happens. Just like one of those bulging stress-relief squeeze toys, squeezing it in one place makes it bulge out in another.” Quote from the book “Lean Analytics” by Benjamin Yoskovitz and Alistair Croll
  • 28. Addressing Risks. Before experimenting: • Simulate worst-case scenarios • Simulate a random baseline. PS: The same goes when collecting randomised data. After the experiment: • Calculate experiment costs
  • 30. Illustration from the book “Feature Engineering for Machine Learning" by Alice Zheng and Amanda Casari.
  • 31. Quote from the book "Doing Data Science" by Cathy O’Neil and Rachel Schutt. Chapter contributed by Claudia Perlich. “Doing simple sanity checking to make sure things are what you think they are can sometimes get you much further in the end than web scraping and a big fancy machine learning algorithm. It may not seem cool and sexy, but it’s smart and good practice. People might not invite you to a meetup to talk about it. It may not be publishable research, but at least it’s legitimate and solid work.”
  • 32. Iterate! Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM
  • 33. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Assumptions Constraints Issues Risks Assumptions Constraints Issues Risks
  • 34. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Assumptions Constraints Issues Risks Assumptions Constraints Risks
  • 35. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Constraints Issues Risks Assumptions Constraints Risks
  • 36. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Issues Risks Assumptions Constraints Risks
  • 37. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Risks Assumptions Constraints Risks
  • 38. Iterate! Addressing the constraints, assumptions, risks and issues. Illustration from the "Analytics Solutions Unified Method” ASUM-DM by IBM Risks Assumptions Constraints
  • 39. Illustration from the paper "Hidden Technical Debt in Machine Learning Systems” by D Sculley et al (Google) - 2015 ML Systems are complex systems!
  • 40. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems” by D Sculley et al (Google) - 2015 Start with stupid!
  • 41. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems” by D Sculley et al (Google) - 2015 Iterate with strategic, proportional investments across the ML stack.
  • 42. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems” by D Sculley et al (Google) - 2015 And so on…
  • 44. What’s the limit of what’s achievable? Machine Learning is a powerful tool, but buy-in and sponsorship are much needed. A big vision is vital for Machine Learning products.
  • 45. Questions - cheat sheet • What if you had all the levers that you could possibly pull? • What if you could optimize all the aspects of the business and user experience? • What if you would break it down to multiple Machine Learning products? • What if you had all the data you would like to use? • What if you had the ideal Machine Learning infrastructure? • What if you would use the ideal Machine Learning model and approach? • What if you had all monitoring in place to quickly catch any issues?
  • 46. Vision - cheat sheet: Improve _____ and reduce _____ by _____ the right _____ and _____ with the right _____ and the right _____. Multi-Objective Optimization, Multiple Levers, Multiple ML Products
  • 48. Good enough is better than perfect! • You might discover other interesting opportunities for Machine Learning. • You might discover other interesting opportunities even without Machine Learning. • You might discover there’s a third party service for your domain. • Machine Learning is part of the solution, not the whole solution. • Serendipity is good creepy, but algorithmic bias is bad creepy.
  • 49. Beware of algorithmic bias. Check the slides from the tutorial "Algorithmic Bias in Practice" at ACM FAT*2019. Illustration from “AAAI 2017 Spring Symposium Series - Designing the UX of ML Systems” by Henriette Cramer, Jenn Thom and XXX
  • 51. Have fun! • Celebrate the invaluable improvements and learnings brought along the journey: - Data, metrics, instrumentation and experimentation - Business and domain understanding - System design and quality • Get ready for even more exciting next steps! • Enjoy the journey and don’t forget the bigger picture: customer value!
  • 53. Open the Pandora’s Box Start with stupid Aim for creepy Hit half-way there Enjoy the journey!
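
The A/A check described in slides 23 to 25 (existing system vs. new system with ML bypassed) could be sketched roughly as below. This is only an illustration under stated assumptions: the talk does not prescribe a statistical test, so the permutation test on the difference in means is a choice made here, and the function names (`aa_permutation_test`, `aa_check`), the `alpha` threshold, and the sample metric values are all hypothetical.

```python
import random


def aa_permutation_test(a, a_star, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means of A and A*.

    `a` and `a_star` are per-request (or per-user) values of one metric,
    e.g. latency, collected from the existing system and from the new
    system with the ML decision bypassed. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(a_star) / len(a_star))
    pooled = list(a) + list(a_star)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under "A equals A*" labels are exchangeable.
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def aa_check(a, a_star, alpha=0.05, **kwargs):
    """Flag a metric for investigation when A and A* differ significantly."""
    diff, p_value = aa_permutation_test(a, a_star, **kwargs)
    return {"abs_diff": diff, "p_value": p_value, "investigate": p_value < alpha}


if __name__ == "__main__":
    # Made-up latency samples (ms); A* carries an artificial +20 ms shift.
    existing = [100 + (i % 5) for i in range(50)]
    new_system_ml_off = [x + 20 for x in existing]
    print(aa_check(existing, new_system_ml_off))
```

In this sketch the same check would be run once per operational, business, and CS metric, and the A/B comparison would only be read after none of them is flagged. Checking many metrics at once raises the chance of a false alarm, so the threshold may need adjusting.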