This deck frames introducing machine learning as opening Pandora's box: it surfaces constraints, assumptions, risks, and issues. It recommends starting with simple approaches, addressing these challenges through iteration, and aiming high with a clear vision while guarding against algorithmic bias. The overall message: enjoy the journey of machine learning and focus on creating customer value.
12. Be aware (not afraid) of constraints
What decisions can you affect?
What are the system implications?
What does your ML infra support?
Illustration from the book "Creative People Must Be Stopped" by David A. Owens
13. Example Constraints
Business Constraints
• Metrics
• Business logic
• Legal needs
Data Constraints
• Volume
• Features
• Labels
Systems Constraints
• Available levers
• Infrastructure support
• Systems implications
• Engineering effort
14. Addressing Constraints
Investigate, communicate, and address each constraint by either:
• Accepting it and working within its boundaries
• Expanding its boundaries
WARNING: Hitting an unexpected critical constraint too late in the process can kill your ML product!
16. "You have no idea, but you pretend you know."
You might not have enough data to back your hypothesis.
Historical data is biased by existing heuristics.
The hypothesis behind your ML product might be based on a critical assumption.
Assumptions bridge between "Known Unknowns" and "Known Knowns".
17. Example Assumptions
• Are the metrics sensitive to the levers the ML approach is pulling?
• How do customers behave under changes in the logic?
• Impact analysis assumptions:
- Cost of misclassification
- Benefit of correct classification
- Assumptions for worst case scenario
- Parameters for more optimistic scenarios
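As a rough illustration of these impact-analysis assumptions, here is a minimal sketch in Python; all figures (volumes, precision/recall, costs, and benefits) are made-up placeholders, not numbers from the deck:

```python
# Hypothetical impact analysis: every number below is an illustrative
# assumption, covering worst-case and more optimistic scenarios.

def expected_impact(n_decisions, precision, recall, base_rate,
                    benefit_tp, cost_fp, cost_fn):
    """Rough expected value of a classifier over n_decisions cases."""
    positives = n_decisions * base_rate
    tp = positives * recall                           # correctly caught positives
    fn = positives - tp                               # missed positives
    fp = tp * (1 - precision) / precision if precision > 0 else 0.0
    return tp * benefit_tp - fp * cost_fp - fn * cost_fn

# Worst-case scenario: pessimistic precision/recall assumptions
worst = expected_impact(100_000, precision=0.60, recall=0.40,
                        base_rate=0.05, benefit_tp=10.0,
                        cost_fp=2.0, cost_fn=5.0)
# A more optimistic scenario
best = expected_impact(100_000, precision=0.85, recall=0.70,
                       base_rate=0.05, benefit_tp=10.0,
                       cost_fp=2.0, cost_fn=5.0)
print(f"worst case: {worst:,.0f}  optimistic: {best:,.0f}")
```

Even a toy calculation like this forces the misclassification costs and benefits to be written down explicitly, which is the point of the exercise.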
18. Addressing Assumptions
• Experiment early, focusing on learning the parameters needed for better impact analysis and for more sophisticated approaches later.
• Consider reframing the initial problems to be solved, so the most critical assumptions are validated first.
• To be able to move forward with an unbiased approach, collect randomized data.
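One common way to collect randomized data free of the existing heuristic's bias is to divert a small fraction of traffic to uniformly random decisions and log which policy acted; the sketch below assumes such an epsilon-style setup, and every name in it is illustrative:

```python
import random

# Sketch of unbiased data collection: with probability epsilon, serve a
# uniformly random decision instead of the existing heuristic, and log
# which policy was used. All names here are illustrative assumptions.

def decide(item, heuristic, actions, epsilon, rng):
    """Return (action, source): mostly the heuristic, sometimes random."""
    if rng.random() < epsilon:
        return rng.choice(actions), "randomized"   # unbiased exploration traffic
    return heuristic(item), "heuristic"            # existing, biased policy

rng = random.Random(0)  # seeded for reproducibility
log = []
for item in range(1000):
    action, source = decide(item, heuristic=lambda i: "show",
                            actions=["show", "hide"], epsilon=0.1, rng=rng)
    log.append((item, action, source))

# Only the "randomized" slice is free of the heuristic's selection bias,
# so train and evaluate unbiased models on that slice.
unbiased = [row for row in log if row[2] == "randomized"]
```

The randomized slice is small by design, which is why the deck suggests simulating its cost before collecting it.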
20. Machine Learning itself might not be the issue!
Is there latency introduced?
Did the systems need to be changed, decoupled, or refactored?
Issues from systems implications might impact your metrics and should not be attributed to Machine Learning.
You don't want to compare apples and oranges!
23. Unveiling Issues
Running A/A Tests
• A: existing system, existing heuristic
• A*: new system, existing heuristic
- ML “turned-off”
- Bypassing the ML decision
What to expect?
• A should equal A*:
- Operational metrics
- Business metrics
- CS metrics
• If the two A's perform differently:
- Trust me, there’s an issue!
- Time to investigate!
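A minimal way to check whether the two A's differ on a conversion-style business metric is a two-proportion z-test; the counts below are invented for illustration:

```python
import math

# Minimal A/A sanity check on a conversion-style metric.
# A  = existing system with the existing heuristic;
# A* = new system with the ML decision bypassed.
# All counts are illustrative, not real data.

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z statistic for the difference of two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(conv_a=5_100, n_a=100_000,   # A: existing system
                     conv_b=4_950, n_b=100_000)   # A*: ML "turned off"
if abs(z) > 1.96:   # ~95% confidence: the two A's differ, investigate!
    print(f"A/A discrepancy detected (z = {z:.2f})")
else:
    print(f"A and A* look consistent (z = {z:.2f})")
```

The same comparison should be run per metric family (operational, business, CS), since a discrepancy in any one of them signals a systems issue.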
24. Addressing Issues
In case a discrepancy is found on the A/A Test analysis:
• Which metric is showing discrepancies?
• What could have caused it?
• What is the impact of this discrepancy?
Decide whether to fix it based on the size of its impact.
25. A/A/B Test
Run an A/A/B Test if time-sensitive!
But only trust the A/B part once you've validated the A/A part!
27. Careful about "Squeeze Toys"
Optimizing for metric A might lead to risking metric B.
"If you optimize your business to maximize one
metric, something important happens. Just like
one of those bulging stress-relief squeeze toys,
squeezing it in one place makes it bulge out in
another.”
Quote from the book “Lean Analytics” by Benjamin Yoskovitz
and Alistair Croll
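One way to keep this squeeze-toy effect in check is to pair the optimized metric with explicit guardrail metrics; the sketch below uses made-up metric names and thresholds:

```python
# Hypothetical guardrail check: ship a change that improves the primary
# metric only if the secondary ("squeeze toy") metrics stay within bounds.
# All metric names and values are illustrative assumptions.

def passes_guardrails(metrics, guardrails):
    """True if every guardrailed metric stays at or above its floor."""
    return all(metrics[name] >= floor for name, floor in guardrails.items())

control = {"conversion": 0.050, "retention_d7": 0.33}
experiment = {"conversion": 0.054, "retention_d7": 0.31}

# Conversion (metric A) improved, but retention (metric B) bulged out.
guardrails = {"retention_d7": control["retention_d7"] * 0.98}  # tolerate a 2% drop
ship = (experiment["conversion"] > control["conversion"]
        and passes_guardrails(experiment, guardrails))
print("ship" if ship else "hold: guardrail breached")
```

Here the experiment wins on conversion but breaches the retention guardrail, so it is held back despite the headline improvement.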
28. Addressing Risks
Before experimenting
• Simulate worst-case scenarios
• Simulate a random baseline
P.S.: The same goes when collecting randomized data.
After experimenting
• Calculate experiment costs
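A worst-case and random-baseline simulation can be as simple as a Monte Carlo loop over assumed rates; every parameter below is an illustrative assumption:

```python
import random

# Back-of-envelope risk simulation before experimenting: what would a
# random baseline (or a worst-case scenario) do to the key metric?
# All rates, benefits, and costs are made-up assumptions.

def simulate_policy(accept_prob, base_rate, benefit, cost, n=100_000, seed=0):
    """Simulated metric if we accept each case with probability accept_prob."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        is_good = rng.random() < base_rate        # ground-truth outcome
        accepted = rng.random() < accept_prob     # policy decision
        if accepted:
            total += benefit if is_good else -cost
    return total

# Random baseline: accept half the cases at an assumed 10% good rate
random_baseline = simulate_policy(accept_prob=0.5, base_rate=0.1,
                                  benefit=10.0, cost=1.0)
# Worst case: accept everything under a pessimistic 1% good rate
worst_case = simulate_policy(accept_prob=1.0, base_rate=0.01,
                             benefit=10.0, cost=1.0)
print(f"random baseline: {random_baseline:,.0f}  worst case: {worst_case:,.0f}")
```

If even the simulated worst case is survivable, the experiment (or the randomized data collection) is safe to run; if not, the exposure needs to be capped first.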
30. Illustration from the book
“Feature Engineering for Machine Learning"
by Alice Zheng and Amanda Casari.
31. Quote from the book "Doing Data Science"
by Cathy O’Neil and Rachel Schutt.
Chapter contributed by Claudia Perlich.
“Doing simple sanity checking to make sure things are what you think they are can
sometimes get you much further in the end than web scraping and a big fancy
machine learning algorithm. It may not seem cool and sexy, but it’s smart and good
practice. People might not invite you to a meetup to talk about it. It may not be
publishable research, but at least it’s legitimate and solid work.”
33. Iterate!
Addressing the constraints, assumptions, risks, and issues.
Illustration from the "Analytics Solutions Unified Method" (ASUM-DM) by IBM
39. Illustration from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
ML Systems are complex systems!
40. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
Start with stupid!
41. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
Iterate with strategic, proportional investments across the ML stack.
42. Illustration adapted from the paper "Hidden Technical Debt in Machine Learning Systems”
by D Sculley et al (Google) - 2015
And so on…
44. What's the limit of what's achievable?
Machine Learning is a powerful tool, but buy-in and sponsorship are essential.
A big vision is vital for Machine Learning products.
45. Questions - cheat sheet
• What if you had all the levers that you could possibly pull?
• What if you could optimize all the aspects of the business and user experience?
• What if you would break it down to multiple Machine Learning products?
• What if you had all the data you would like to use?
• What if you had the ideal Machine Learning infrastructure?
• What if you would use the ideal Machine Learning model and approach?
• What if you had all monitoring in place to quickly catch any issues?
46. Vision - cheat sheet
Improve _____ and reduce _____ by _____ the right _____ and _____ with the right _____ and the right _____
Multi-Objective Optimization · Multiple Levers · Multiple ML Products
48. Good enough is better than perfect!
• You might discover other interesting opportunities for Machine Learning.
• You might discover other interesting opportunities even without Machine Learning.
• You might discover there's a third-party service for your domain.
• Machine Learning is part of the solution, not the whole solution.
• Serendipity is good creepy, but algorithmic bias is bad creepy.
49. Beware of algorithmic bias.
Check the slides from the tutorial "Algorithmic Bias in Practice" at ACM FAT* 2019.
Illustration from "AAAI 2017 Spring Symposium Series - Designing the UX of ML Systems" by Henriette Cramer, Jenn Thom and XXX
51. Have fun!
• Celebrate the invaluable improvements and learnings gained along the journey:
- Data, metrics, instrumentation and experimentation
- Business and domain understanding
- System design and quality
• Get ready for even more exciting next steps!
• Enjoy the journey and don’t forget the bigger picture: customer value!