This deck frames introducing machine learning as opening Pandora's box: it surfaces constraints, assumptions, risks, and issues. It recommends starting with simple approaches, addressing these challenges through iteration, and aiming high with a clear vision while guarding against algorithmic bias. The overall message: enjoy the journey of machine learning and focus on creating customer value.
12. Be aware (not afraid) of constraints
What decisions can you affect?
What are the system implications?
What does your ML infra support?
Illustration from the book "Creative People Must Be Stopped" by David A. Owens
13. Example Constraints
Business Constraints
• Metrics
• Business logic
• Legal needs
Data Constraints
• Volume
• Features
• Labels
Systems Constraints
• Available levers
• Infrastructure support
• Systems implications
• Engineering effort
14. Addressing Constraints
Investigate, communicate, and address each constraint by either:
• Accepting it and working within its boundaries
• Expanding its boundaries
WARNING: Hitting an unexpected critical constraint too late in the process can kill your ML product!
16. "You have no idea, but you pretend you know."
You might not have enough data to back your hypothesis.
Historical data is biased by existing heuristics.
The hypothesis behind your ML product might be based on a critical assumption.
Assumptions bridge between "Known Unknowns" and "Known Knowns".
17. Example Assumptions
• Are the metrics sensitive to the levers the ML approach is pulling?
• How do customers behave under changes in the logic?
• Impact analysis assumptions:
- Cost of misclassification
- Benefit of correct classification
- Assumptions for worst case scenario
- Parameters for more optimistic scenarios
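As a rough illustration of these impact-analysis assumptions, here is a minimal sketch in Python; all figures (volumes, precision/recall, costs, and benefits) are made-up placeholders, not numbers from the deck:

```python
# Hypothetical impact analysis: every number below is an illustrative
# assumption, covering worst-case and more optimistic scenarios.

def expected_impact(n_decisions, precision, recall, base_rate,
                    benefit_tp, cost_fp, cost_fn):
    """Rough expected value of a classifier over n_decisions cases."""
    positives = n_decisions * base_rate
    tp = positives * recall                           # correctly caught positives
    fn = positives - tp                               # missed positives
    fp = tp * (1 - precision) / precision if precision > 0 else 0.0
    return tp * benefit_tp - fp * cost_fp - fn * cost_fn

# Worst-case scenario: pessimistic precision/recall assumptions
worst = expected_impact(100_000, precision=0.60, recall=0.40,
                        base_rate=0.05, benefit_tp=10.0,
                        cost_fp=2.0, cost_fn=5.0)
# A more optimistic scenario
best = expected_impact(100_000, precision=0.85, recall=0.70,
                       base_rate=0.05, benefit_tp=10.0,
                       cost_fp=2.0, cost_fn=5.0)
print(f"worst case: {worst:,.0f}  optimistic: {best:,.0f}")
```

Even a toy calculation like this forces the misclassification costs and benefits to be written down explicitly, which is the point of the exercise.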
18. Addressing Assumptions
• Experiment early, focusing on learning the parameters needed for better impact analysis and for more sophisticated approaches later.
• Consider reframing the initial problems to be solved, so the most critical assumptions are validated first.
• To be able to move forward with an unbiased approach, collect randomized data.
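One common way to collect randomized data free of the existing heuristic's bias is to divert a small fraction of traffic to uniformly random decisions and log which policy acted; the sketch below assumes such an epsilon-style setup, and every name in it is illustrative:

```python
import random

# Sketch of unbiased data collection: with probability epsilon, serve a
# uniformly random decision instead of the existing heuristic, and log
# which policy was used. All names here are illustrative assumptions.

def decide(item, heuristic, actions, epsilon, rng):
    """Return (action, source): mostly the heuristic, sometimes random."""
    if rng.random() < epsilon:
        return rng.choice(actions), "randomized"   # unbiased exploration traffic
    return heuristic(item), "heuristic"            # existing, biased policy

rng = random.Random(0)  # seeded for reproducibility
log = []
for item in range(1000):
    action, source = decide(item, heuristic=lambda i: "show",
                            actions=["show", "hide"], epsilon=0.1, rng=rng)
    log.append((item, action, source))

# Only the "randomized" slice is free of the heuristic's selection bias,
# so train and evaluate unbiased models on that slice.
unbiased = [row for row in log if row[2] == "randomized"]
```

The randomized slice is small by design, which is why the deck suggests simulating its cost before collecting it.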
20. Machine Learning itself might not be the issue!
Is there latency introduced?
Did the systems need to be changed, decoupled, or refactored?
Issues from systems implications might impact your metrics and should not be attributed to Machine Learning.
You don't want to compare apples and oranges!
23. Unveiling Issues
Running A/A Tests
• A: existing system, existing heuristic
• A*: new system, existing heuristic
- ML “turned-off”
- Bypassing the ML decision
What to expect?
• A should equal A*:
- Operational metrics
- Business metrics
- CS metrics
• If the two A's perform differently:
- Trust me, there’s an issue!
- Time to investigate!
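A minimal way to check whether the two A's differ on a conversion-style business metric is a two-proportion z-test; the counts below are invented for illustration:

```python
import math

# Minimal A/A sanity check on a conversion-style metric.
# A  = existing system with the existing heuristic;
# A* = new system with the ML decision bypassed.
# All counts are illustrative, not real data.

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic for the difference of two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(conv_a=5_100, n_a=100_000,   # A: existing system
                     conv_b=4_950, n_b=100_000)   # A*: ML "turned off"
if abs(z) > 1.96:   # ~95% confidence: the two A's differ, investigate!
    print(f"A/A discrepancy detected (z = {z:.2f})")
else:
    print(f"A and A* look consistent (z = {z:.2f})")
```

The same comparison should be run per metric family (operational, business, CS), since a discrepancy in any one of them signals a systems issue.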
24. Addressing Issues
In case a discrepancy is found on the A/A Test analysis:
• Which metric is showing discrepancies?
• What could have caused it?
• What is the impact of this discrepancy?
Decide whether to fix it based on the size of its impact.
25. A/A/B Test
Run an A/A/B Test if time-sensitive!
But only trust the A/B part once you've validated the A/A part!
27. Careful about "Squeeze Toys"
Optimizing for metric A might lead to risking metric B.
"If you optimize your business to maximize one
metric, something important happens. Just like
one of those bulging stress-relief squeeze toys,
squeezing it in one place makes it bulge out in
another.”
Quote from the book “Lean Analytics” by Benjamin Yoskovitz
and Alistair Croll
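One way to keep this squeeze-toy effect in check is to pair the optimized metric with explicit guardrail metrics; the sketch below uses made-up metric names and thresholds:

```python
# Hypothetical guardrail check: ship a change that improves the primary
# metric only if the secondary ("squeeze toy") metrics stay within bounds.
# All metric names and values are illustrative assumptions.

def passes_guardrails(metrics, guardrails):
    """True if every guardrailed metric stays at or above its floor."""
    return all(metrics[name] >= floor for name, floor in guardrails.items())

control = {"conversion": 0.050, "retention_d7": 0.33}
experiment = {"conversion": 0.054, "retention_d7": 0.31}

# Conversion (metric A) improved, but retention (metric B) bulged out.
guardrails = {"retention_d7": control["retention_d7"] * 0.98}  # tolerate a 2% drop
ship = (experiment["conversion"] > control["conversion"]
        and passes_guardrails(experiment, guardrails))
print("ship" if ship else "hold: guardrail breached")
```

Here the experiment wins on conversion but breaches the retention guardrail, so it is held back despite the headline improvement.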
28. Addressing Risks
Before experimenting
• Simulate worst-case scenarios
• Simulate a random baseline
P.S.: The same goes when collecting randomized data.
After experimenting
• Calculate experiment costs
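A worst-case and random-baseline simulation can be as simple as a Monte Carlo loop over assumed rates; every parameter below is an illustrative assumption:

```python
import random

# Back-of-envelope risk simulation before experimenting: what would a
# random baseline (or a worst-case scenario) do to the key metric?
# All rates, benefits, and costs are made-up assumptions.

def simulate_policy(accept_prob, base_rate, benefit, cost, n=100_000, seed=0):
    """Simulated metric if we accept each case with probability accept_prob."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        is_good = rng.random() < base_rate        # ground-truth outcome
        accepted = rng.random() < accept_prob     # policy decision
        if accepted:
            total += benefit if is_good else -cost
    return total

# Random baseline: accept half the cases at an assumed 10% good rate
random_baseline = simulate_policy(accept_prob=0.5, base_rate=0.1,
                                  benefit=10.0, cost=1.0)
# Worst case: accept everything under a pessimistic 1% good rate
worst_case = simulate_policy(accept_prob=1.0, base_rate=0.01,
                             benefit=10.0, cost=1.0)
print(f"random baseline: {random_baseline:,.0f}  worst case: {worst_case:,.0f}")
```

If even the simulated worst case is survivable, the experiment (or the randomized data collection) is safe to run; if not, the exposure needs to be capped first.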
30. Illustration from the book
“Feature Engineering for Machine Learning"
by Alice Zheng and Amanda Casari.
31. Quote from the book "Doing Data Science"
by Cathy O’Neil and Rachel Schutt.
Chapter contributed by Claudia Perlich.
“Doing simple sanity checking to make sure things are what you think they are can
sometimes get you much further in the end than web scraping and a big fancy
machine learning algorithm. It may not seem cool and sexy, but it’s smart and good
practice. People might not invite you to a meetup to talk about it. It may not be
publishable research, but at least it’s legitimate and solid work.”
33. Iterate!
Addressing the constraints, assumptions, risks, and issues.
Illustration from the "Analytics Solutions Unified Method" (ASUM-DM) by IBM
39. Illustration from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
ML Systems are complex systems!
40. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
Start with stupid!
41. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
Iterate with strategic, proportional investments across the ML stack.
42. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
And so on…
44. What's the limit of what's achievable?
Machine Learning is a powerful tool, but buy-in and sponsorship are essential.
A big vision is vital for Machine Learning products.
45. Questions - cheat sheet
• What if you had all the levers that you could possibly pull?
• What if you could optimize all the aspects of the business and user experience?
• What if you would break it down to multiple Machine Learning products?
• What if you had all the data you would like to use?
• What if you had the ideal Machine Learning infrastructure?
• What if you would use the ideal Machine Learning model and approach?
• What if you had all monitoring in place to quickly catch any issues?
46. Vision - cheat sheet
Improve _____ and reduce _____ by _____ the right _____ and _____ with the right _____ and the right _____
Multi-Objective Optimization · Multiple Levers · Multiple ML Products
48. Good enough is better than perfect!
• You might discover other interesting opportunities for Machine Learning.
• You might discover other interesting opportunities even without Machine Learning.
• You might discover there's a third-party service for your domain.
• Machine Learning is part of the solution, not the whole solution.
• Serendipity is good creepy, but algorithmic bias is bad creepy.
49. Beware of algorithmic bias.
Check the slides from the tutorial "Algorithmic Bias in Practice" at ACM FAT* 2019.
Illustration from "AAAI 2017 Spring Symposium Series - Designing the UX of ML Systems" by Henriette Cramer, Jenn Thom and XXX
51. Have fun!
• Celebrate the invaluable improvements and learnings gained along the journey:
- Data, metrics, instrumentation and experimentation
- Business and domain understanding
- System design and quality
• Get ready for even more exciting next steps!
• Enjoy the journey and don’t forget the bigger picture: customer value!