6. @integrateai | @humekathryn
Jeffrey Friedman & Richard Zeckhauser
President Obama reportedly complained that the discussion
offered ‘not more certainty but more confusion,’ and that his
advisors were offering ‘probabilities that disguised uncertainty as
opposed to actually providing you with useful information.’
10. “Over the past decades computers have broadly
automated tasks that programmers could describe with
clear rules and algorithms. Modern machine learning
techniques now allow us to do the same for tasks where
describing the precise rules is much harder.”
Bezos 2017 Letter to Shareholders
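The contrast in the quote can be made concrete: a rule-based classifier has its decision boundary written by hand, while a learned one infers it from labeled examples. A minimal sketch in Python (the spam example, data, and function names are all hypothetical):

```python
# Hand-coded rule: a programmer writes the threshold explicitly.
def rule_based_is_spam(num_links: int) -> bool:
    return num_links > 5  # threshold chosen by a human

# Learned rule: the threshold is inferred from labeled examples.
def learn_threshold(examples):
    """Pick the integer threshold that minimizes training error."""
    best_t, best_errors = 0, len(examples)
    for t in range(0, max(x for x, _ in examples) + 1):
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# (num_links, is_spam) pairs -- synthetic training data
data = [(1, False), (2, False), (3, False), (8, True), (9, True), (12, True)]
threshold = learn_threshold(data)  # the "rule" now comes from the data
```

The same shape scales up: when the precise rules are too hard to write down, the model extracts them from examples instead.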
19. The Raw Ingredients of an AI Product
• Deep understanding of a business problem
• Data, data, data
• Algorithmic capability
• Integrations and interfaces
20. Machine Learning Product Lifecycle
• Design: What problem are we solving?
• Data Generation: How do tools capture information?
• Data Collection: Who is well represented? Where are the blind spots?
• Data Exploration: What does the data look like?
• Data Processing: Is our data ready for use?
• Model Prototyping: Will this work? Should we pivot? Which is best?
• Evaluation/Interpretation: How do we know it’s working?
• Production: Can we scale the model?
• Maintenance: How to update as data changes?
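One way to make the lifecycle concrete is a small data structure that pairs each stage with its guiding question; the stage names and questions come from the slide, while the code around them is only an illustrative sketch:

```python
# Lifecycle stages paired with the guiding question from the slide.
LIFECYCLE = [
    ("Design", "What problem are we solving?"),
    ("Data Generation", "How do tools capture information?"),
    ("Data Collection", "Who is well represented? Where are the blind spots?"),
    ("Data Exploration", "What does the data look like?"),
    ("Data Processing", "Is our data ready for use?"),
    ("Model Prototyping", "Will this work? Should we pivot? Which is best?"),
    ("Evaluation/Interpretation", "How do we know it's working?"),
    ("Production", "Can we scale the model?"),
    ("Maintenance", "How to update as data changes?"),
]

def next_stage(current: str) -> str:
    """Return the stage after `current`; the cycle wraps back to Design."""
    names = [name for name, _ in LIFECYCLE]
    return names[(names.index(current) + 1) % len(names)]
```

The wrap-around in `next_stage` reflects that maintenance feeds back into design: data drift eventually reopens the question of what problem is being solved.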
21. Case Study #1
• Partially automate checklist completion
• 20,000 examples of completed checklists
• First party keeps the data & model
23. Design
What level of certainty does the problem demand?
24. Design
Do you have a motivated subject matter expert?
25. Model Type | Amount of Data | Structure of Data | Subject Matter Expertise | Explainability
Regression | Small ok | Less complex features | Selecting the features | Medium-High
Bayesian | Small ok | Hierarchical features | Defining the priors | Medium-High
Deep Learning | Large only | Hierarchical features | Little (labeling) | Low
Reinforcement Learning | Explore/Exploit | Depends (Deep vs. Classical) | Simulate the environment | Depends (Deep vs. Classical)
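The trade-offs in this table can be encoded as a rough decision heuristic. This is only a sketch of the table's logic, not a recommendation engine; the data-size cutoff and function name are invented for illustration:

```python
def suggest_model_family(n_examples: int, hierarchical_features: bool,
                         sme_available: bool) -> str:
    """Rough heuristic mirroring the model-type trade-off table."""
    if n_examples >= 100_000 and hierarchical_features:
        # Deep learning needs large data but little SME input beyond labeling.
        return "deep learning"
    if hierarchical_features and sme_available:
        # Bayesian models handle small data if an expert can define the priors.
        return "bayesian"
    if sme_available:
        # Regression handles small data if an expert selects the features.
        return "regression"
    return "collect more data or find an expert"
```

In practice the decision also weighs explainability requirements, which the table captures but this sketch omits.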
29. Lessons learned
• Key risk is wasted cost & time
• Seek
  • Clearly defined input-output with a large labeled training set
  • A motivated SME to collaborate with the technical team
• Avoid
  • Tasks that require certainty
  • Tasks with little output variety
• Beware
  • Hype leading to misunderstanding of what’s possible
• Tactics
  • Scope the PoC tightly
  • Try to disprove the hypothesis early to minimize sunk cost
30. Case Study #2
• Increase conversion rates
• Design infrastructure to measure outcomes
• Third party gets the data & owns the model
32. “There is often a difference between the quantity you want to measure and the one you can measure. When these differ, your model will become good at predicting the quantity you measured, not the quantity for which it was meant to be a proxy.”
https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48
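The proxy problem in the quote can be demonstrated numerically: a model optimized against a proxy label can score perfectly on the proxy while being mediocre on the quantity it was meant to stand in for. A synthetic sketch (all data here is invented; the 70% agreement rate is an arbitrary assumption):

```python
import random

random.seed(0)

# The true quantity we care about vs. a noisy proxy we can actually measure.
# Here the proxy agrees with the truth only 70% of the time.
truth = [random.random() < 0.5 for _ in range(1000)]
proxy = [t if random.random() < 0.7 else not t for t in truth]

# A "model" that has learned to reproduce the proxy perfectly.
predictions = list(proxy)

proxy_accuracy = sum(p == q for p, q in zip(predictions, proxy)) / len(proxy)
true_accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
# proxy_accuracy is 1.0, while true_accuracy sits near 0.7:
# the model is only as good as the proxy's relationship to the truth.
```

This is exactly the COMPAS failure mode discussed next: optimizing "who gets convicted" as a proxy for "who will commit a serious crime" bakes the gap between the two into the model.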
33. COMPAS risk assessment tool
• What we think we’re asking: “what is the
probability that this person will commit a
serious crime in the future, as a function of the
sentence you give them now?”
• What we’re actually asking: “who is more likely
to be convicted?”
Design
What problem are we solving?
38. Lessons learned
• Risks around sunk cost, data privacy & security, and ethics
• Seek
  • Problems where a good guess can provide a measurable boost
  • Collaborative and flexible partners
• Beware
  • Understand what your proxies actually optimize for
  • Be prepared to problem-solve with imbalanced data sets
• Be pragmatic but principled
  • Explainability is not an absolute; it’s context-dependent
  • Go the extra mile to evaluate fairness
• Tactics
  • Performance-based fee structure to align incentives
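Problem solving with imbalanced data sets often starts with the metric: on a heavily skewed class split, plain accuracy rewards a model that never predicts the rare class. A small illustration with synthetic labels (the 99/1 split is an invented example):

```python
# Severely imbalanced labels: 990 negatives, 10 positives.
labels = [0] * 990 + [1] * 10

# A useless model that always predicts the majority class.
predictions = [0] * 1000

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Balanced accuracy averages per-class recall, exposing the failure.
recall_pos = sum(p == 1 for p, y in zip(predictions, labels) if y == 1) / 10
recall_neg = sum(p == 0 for p, y in zip(predictions, labels) if y == 0) / 990
balanced_accuracy = (recall_pos + recall_neg) / 2
# accuracy is 0.99, but balanced_accuracy is 0.5 -- no better than chance.
```

Choosing a metric that reflects the minority class is usually the first step before reaching for resampling or class weighting.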