The Uncanny Valley of ML
Dr June Andrews, Delphi Data, Nov 2019
Human Decision Systems
Simple Paradigm Represents:
• Judges setting bail
• Doctors processing images
• DMV clerks renewing licenses
• Muni train drivers stopping & going
• Administrators admitting students
• … you coming to MLConf
Information In, Decision Out, Works Pretty Well
Hype: Technology Will Replace People Overnight
???
Far More Likely Progression of Technology
Machine Learning Will Augment Decision Making with Recommendations
Template Design for ML + Decision Systems
Recommendation: Yes
Feature A is high
Keep the Chain of Responsibility Intact
Integrate as soon as ML Accuracy ≈ Human Accuracy
Pressures for Introducing ML into Decision Systems
• Decreasing cost of ML
• Increasing data from revolutions in sensors, records & infrastructure
• Fewer experts graduating in ‘older fields’
• Increasing number of decisions created by more people
Ideal Result of ML + Human Decisions
As ML Accuracy Approaches Human Accuracy, System Performance Improves
[Figure: Accuracy and Time & Cost (Low to High) vs. ML Accuracy (Terrible, Human, Perfect)]
The Uncanny Valley of ML
As ML Accuracy Approaches Human Accuracy, System Performance Degrades
[Figure: Accuracy and Time & Cost (Low to High) vs. ML Accuracy (Terrible, Human, Perfect)]
Finding the Uncanny Valley of ML
Finding An Uncanny Valley of ML
Use Test Environments To Avoid The Uncanny Valley in Production
1. Create a simple labeling task with ground truth labels
2. Measure human accuracy & speed
3. Add a recommended decision from a ‘Model’
4. Simulate models of different accuracy near human accuracy by perturbing the ground truth labels
5. Assign each person to a simulated model and run test labels for normalization
6. Measure system accuracy & speed as a function of ML accuracy
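The "simulate models by perturbing the ground truth labels" step can be sketched roughly as follows. This is a minimal illustration, not the code used for the talk: the function names, the choice to nudge a wrong label by exactly one, and the sample counts are all assumptions.

```python
import random

def simulate_model_labels(truth, target_accuracy, seed=0):
    """Copy `truth`, making about (1 - target_accuracy) of the labels wrong."""
    rng = random.Random(seed)
    n_wrong = round(len(truth) * (1.0 - target_accuracy))
    wrong_idx = set(rng.sample(range(len(truth)), n_wrong))
    simulated = []
    for i, label in enumerate(truth):
        if i in wrong_idx:
            # Assumption: a wrong count is off by one and never negative.
            delta = rng.choice([-1, 1]) if label > 0 else 1
            simulated.append(label + delta)
        else:
            simulated.append(label)
    return simulated

def accuracy(predicted, truth):
    """Fraction of labels that match the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical ground-truth counts for 100 images.
truth = [1, 2, 0, 3, 1, 2, 4, 1, 0, 2] * 10
for target in (0.85, 0.90, 0.94, 0.98):
    labels = simulate_model_labels(truth, target)
    print(target, accuracy(labels, truth))
```

Each simulated label set can then be shown to a separate group of participants as the ‘Model’ recommendation.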
Easy Street - How many coffee mugs do you see?
Throwback to the first demos of Neural Nets for Computer Vision @ Cornell
• 100+ photos hand labeled with the number of coffee mugs for ground truth
• Label quality is then perturbed to simulate different ML accuracies, with a bias to perturbing the images with many mugs
• …used Amazon Mechanical Turk workers
[Example images: 2 mugs, 1 mug]
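One way to bias the perturbation toward images with many mugs is weighted sampling without replacement. The sketch below is an assumption about how that could be done, not the talk's actual method: the weight `count + 1` and the function name are hypothetical, and the sampling uses the Efraimidis-Spirakis random-key trick.

```python
import random

def pick_images_to_perturb(mug_counts, n_wrong, seed=0):
    """Pick `n_wrong` distinct image indices, favoring images with more mugs."""
    rng = random.Random(seed)
    # Each image gets the key u ** (1 / weight); the largest keys win.
    # Hypothetical weight: mug count + 1, so zero-mug images can still be picked.
    keyed = [
        (rng.random() ** (1.0 / (count + 1)), i)
        for i, count in enumerate(mug_counts)
    ]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:n_wrong])

# Hypothetical counts: half the images have no mugs, half have nine.
counts = [0] * 50 + [9] * 50
print(pick_images_to_perturb(counts, 20))
```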
Ideal Results
[Figure: Accuracy and Time & Cost (Low to High) vs. ML Accuracy (Terrible, Human, Perfect)]
Actual System Behavior
[Figure: System Accuracy (Low to High) vs. ML Accuracy (Terrible, Human (94%), Perfect), shown for the 1 mug and 2 mugs images]
Actual System Behavior
[Figure: Decision Time (Low to High) vs. ML Accuracy (Terrible, Human (94%), Perfect), shown for the 1 mug and 2 mugs images]
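Curves like these come from measuring system accuracy and decision time as a function of ML accuracy. A minimal sketch of that aggregation is below. The record fields (`ml_accuracy`, `answer`, `truth`, `seconds`) are a hypothetical schema, and the rows are made-up placeholders rather than results from the study.

```python
from collections import defaultdict
from statistics import mean

def summarize(responses):
    """Group responses by simulated ML accuracy and average the outcomes."""
    by_acc = defaultdict(list)
    for r in responses:
        by_acc[r["ml_accuracy"]].append(r)
    return {
        acc: {
            "system_accuracy": mean(
                1.0 if r["answer"] == r["truth"] else 0.0 for r in rows
            ),
            "mean_seconds": mean(r["seconds"] for r in rows),
        }
        for acc, rows in sorted(by_acc.items())
    }

# Placeholder rows only, to show the input shape.
responses = [
    {"ml_accuracy": 0.90, "answer": 2, "truth": 2, "seconds": 4.0},
    {"ml_accuracy": 0.90, "answer": 1, "truth": 2, "seconds": 6.0},
    {"ml_accuracy": 0.98, "answer": 1, "truth": 1, "seconds": 3.0},
]
for acc, stats in summarize(responses).items():
    print(acc, stats)
```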
Uncanny Valley for ML in Counting Coffee Mugs?
• Do people want machines to be wrong?
• We trust machines more than we trust ourselves when they are near but not over human accuracy?
• We’re lazy and want to defer decision making (people varied from the ML when it was correct)
The Uncanny Valley of ML in the Judicial System
Pressures for Introducing ML into Court Systems
• Decreasing cost of ML
• Increasing data from records & social media
• Fewer experts graduating in ‘older fields’
• Increasing number of decisions created by more people
ML has a Growing Presence in Courts
Countries are Comparing Notes & Learning How to Use AI in the Courts
• Estonia is building a ‘Robot Judge’ to settle disputes under $8,000 -DailyMail. Broader initiative of e-government. France wants to match Estonia's level by 2022
• 160,000 parking tickets overturned in the UK & US with a chatbot -Guardian on DoNotPay
• Risk Score Print Outs in Cleveland. Includes features like ‘how often are you bored’ -Quartz
• Note - arraignment hearings are often under 5 minutes.
Locally ML is used in the Judicial System for Bail
• In California 49 of 58 counties use a Pretrial Assessment System (yes SF is one) [courts.ca.gov]
• SB 10 signed in 2018 would make it mandatory in October of 2019, but a 2020 referendum contradicting SB 10 has created a temporary pause
Just a sec, does Bail Matter?
• 20% of jail inmates in the US are awaiting trial
• Misdemeanors can take several months for trial, felonies can take years. Average wait time in the Bronx is 642 days for a non-jury trial and 827 days for a jury trial.
• Pretrial detention leads to a 13% increase in plea agreements, a 42% increase in length of sentence and a 41% increase in court fees -Stevenson, The Journal of Law, Economics, and Organization
8th Amendment (Bill of Rights)
‘Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.’
Finding An Uncanny Valley of ML
Unknown System Accuracy, Instead Show Manipulation of a Single Label
1. Take a Real Case
2. Simulate different UI’s & different model deliverables
3. Compare label distribution with actual outcome
Details taken from Machine Bias by ProPublica, 2016
Summary - a high schooler stole a bike for a few blocks and had a High Risk COMPAS Score by Equivant. Bail was set at $1000
400+ Survey Participants From Amazon MTurk
[Figure: Proportion of Responses (0% to 100%) choosing $0 Bail, $1000 Bail or No Release, for each condition: No ML; ML - Low Risk; ML - Medium Risk; ML - High Risk; ML - High Risk Positive Support; ML - High Risk Negative Support]
• Power of Suggestion of High Risk ML (no reason) results in +14% in bail denied
• Power of Suggestion of Low Risk ML (no reason) results in +14% in $0 bail
• High Risk ML with Negative Features results in +40% in denying bail
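Shifts like these can be computed by comparing the response distribution under each ML condition with the ‘No ML’ baseline. The sketch below assumes the figures are percentage-point differences, which the slide does not state. The function names are hypothetical and the response lists are placeholders, not survey data.

```python
from collections import Counter

OPTIONS = ("$0 Bail", "$1000 Bail", "No Release")

def proportions(responses):
    """Share of responses that chose each bail option."""
    counts = Counter(responses)
    total = len(responses)
    return {opt: counts[opt] / total for opt in OPTIONS}

def shift_vs_baseline(baseline, treatment):
    """Percentage-point change per option, treatment minus baseline."""
    base, treat = proportions(baseline), proportions(treatment)
    return {opt: round(100 * (treat[opt] - base[opt]), 1) for opt in OPTIONS}

# Placeholder responses only, to show the input shape.
no_ml = ["$0 Bail"] * 5 + ["$1000 Bail"] * 4 + ["No Release"] * 1
high_risk = ["$0 Bail"] * 3 + ["$1000 Bail"] * 4 + ["No Release"] * 3
print(shift_vs_baseline(no_ml, high_risk))
```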
104 Survey Participants From June’s Network
[Figure: Proportion of Responses (0% to 100%) choosing $0 Bail, $1000 Bail or No Release, for each condition: No ML; ML - Positive Support; ML - Negative Support]
• While an overall more forgiving group, still a +40% increase in denying bail
A Higher Level of Machine Learning Knowledge Does NOT Change the Trend
Fundamental Design Flaw in SB10 & COMPAS Scores
• Need to allow for the ML system to return ‘Uncertain, not enough data’
• The Bureau of Justice Statistics has a warning of ‘Interpret data with caution. Estimate based on 10 or fewer sample cases’ for someone with Brisha’s details
• … also, effectiveness of requiring ML in the California courts is not slated to be measured until 2023, 4 years after release
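A recommendation that is allowed to abstain could look roughly like the sketch below. It is illustrative only: the cutoff mirrors the ‘10 or fewer sample cases’ warning, while the names, the score scale and the 0.5 risk threshold are assumptions and do not describe how COMPAS or any court system works.

```python
from dataclasses import dataclass

# Hypothetical cutoff: abstain when there are 10 or fewer comparable cases.
MIN_COMPARABLE_CASES = 11

@dataclass
class Recommendation:
    label: str
    comparable_cases: int

def recommend(risk_score, comparable_cases, high_risk_threshold=0.5):
    """Return a risk label, or abstain when the supporting data is too thin."""
    if comparable_cases < MIN_COMPARABLE_CASES:
        return Recommendation("Uncertain, not enough data", comparable_cases)
    label = "High Risk" if risk_score >= high_risk_threshold else "Low Risk"
    return Recommendation(label, comparable_cases)

print(recommend(0.9, comparable_cases=10).label)
print(recommend(0.9, comparable_cases=500).label)
```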
Aside: The Uncanny Valley of ML - Why does it exist?
Uncanny Valley of AI
Discovered by Masahiro Mori (1970)
Box office success of movies is potentially related to the Uncanny Valley:
• Final Fantasy
• Polar Express
• Beowulf
• The Incredible Hulk
Uncanny Valley of AI
Why it exists is an open field of research:
• Mismatch between expectations and observations [Tinwell]
• Difficult to classify objects that move between the boundaries of categories [Looser & Wheatley]
• Recognizing a similar cognitive [Gray & Wegner]
• Ambiguity about the presence of threat [McAndrew]
When it exists is also a debate.
2013 Activision Animation
Uncanny Valley of ML
Additional Theories to Consider:
• People excuse biased decisions by blaming the machine
• People want machines to be wrong
• Disagreeing is a different skill set than analyzing
• Providing explanations of reasoning suppresses intuitive decisions
When it exists should also be studied further.
Thought Experiment
Lean on the field of Team Research to Bootstrap Expectations on Integration
• Imagine if the ML system was another person, who wasn’t quite as bright as the first person
• It would take longer, the bright person would question themselves more
• Small Group Communication: A Theoretical Approach has additional details of when groups underperform individuals
Crossing the Chasm - Avoiding the Uncanny Valley
Self-Driving Cars - Headed for Uncanny Valley
People are viewed as backups who would stay behind the wheel and intervene to avoid accidents in unpredictable or computer-confusing instants. Self-driving option should be included as soon as possible for competitive advantage.
[Figure: steering choices Left, No-op, Right]
• Decreasing cost of ML
• Increasing data from revolutions in sensors & records
• Aging population
• Increasing number of drivers, commutes increasing
Best Practice: Bet on the Power of ML
Delaying Release Until Performance Crosses the Valley
[Figure: steering choices Left, No-op, Right]
Volvo changed from targeting Level 2 to Levels {4, 5} after including executives in a simulation of driving a Level 2 car [Wired]
Best Practice: Build Simulators
Identify location and impact of the Valley before building
Nuclear Power Plants, Aviation, Moon Landings … all use simulators to refine product designs before launch
**include actual judges/experts in simulations
Best Practice: Avoid by Redefining Success
• Repurpose - ML designed for 1 system, may work well for another
• Reset expectations
• Relabel Bad Labels
The Uncanny Valley Doesn’t Always Exist
With the first UI, people completely ignored the ML suggestion
You could design a system no one uses to avoid the Valley
[Figure: System Accuracy vs. ML Accuracy (Terrible, Human, Perfect), shown for the 1 mug and 2 mugs images]
Delivering Results in the Uncanny Valley Can Lead to Early Project Termination
• These are critical systems where a 5% drop in accuracy undoes years of research and investments. Mistakes are not treated kindly in these fields.
• Funding is much more tightly controlled and hard to obtain after failed launches.
• Legal modifications may become barriers
Call to Action:
Source a New Field of Research ‘ML Integration’
• HCI, Team Research, Data Science, AI, Psychology, User Research & Application Fields are all trying to understand integrating ML into Human Decision Systems … independently and slowly
• Binding efforts into a single discipline will rapidly increase development and possibly meet demand
* I am not qualified to give this talk … but who is?
Bootstrapping ML Integration
Funding Sources {Military, Accenture, Academia, …?}
Initial Areas of Research:
• How to calculate the speed and accuracy of large distributed human + ML decision systems
• How to safely train and roll out new decision processes to experts
• How to fairly explain an ML decision. Beyond explainable AI.
• Design and run experiments in these systems
[Diagram labels: Machine Learning, DS, AI, Politics, Team Research, Effective ML Integration]
Call to Action: Implement Guardrail Metrics for Vulnerable Members of the Population
• Guardrail Metrics are used to allow ML to optimize as much as it can within a specified business boundary
• Let’s define a boundary in tech to not make systems worse for black girls
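A guardrail check of this kind could be sketched as a release gate: a candidate model ships only if it improves the overall metric and does not make the metric worse for any protected group. Everything below is a hypothetical illustration, including the metric layout, the group names and the numbers.

```python
def passes_guardrail(baseline_by_group, candidate_by_group, protected_groups,
                     tolerance=0.0):
    """True if no protected group's metric drops below baseline - tolerance."""
    for group in protected_groups:
        if candidate_by_group[group] < baseline_by_group[group] - tolerance:
            return False
    return True

def should_ship(baseline, candidate, protected_groups):
    """Ship only when the overall metric improves inside the guardrail."""
    improves = candidate["overall"] > baseline["overall"]
    return improves and passes_guardrail(
        baseline["by_group"], candidate["by_group"], protected_groups
    )

# Made-up accuracy numbers, higher is better.
baseline = {"overall": 0.80, "by_group": {"group_a": 0.78, "group_b": 0.82}}
candidate = {"overall": 0.84, "by_group": {"group_a": 0.74, "group_b": 0.90}}
print(should_ship(baseline, candidate, protected_groups=["group_a"]))
```

In this made-up case the candidate is blocked, because the overall gain comes with a drop for the protected group.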
Thank You.
Slides at /drandrews

The Uncanny Valley of ML

  • 1. The Uncanny Valley of ML Dr June Andrews Delphi Data Nov 2019
  • 2. Human Decision Systems Simple Paradigm Represents: • Judges setting bail • Doctors processing images • DMV clerks renewing licenses • Muni train drivers stopping & going • Administrators admitting students • … you coming to MLConf Information In, Decision Out, Works Pretty Well
  • 3. Hype: Technology Will Replace People Overnight
  • 4. ??? Far More Likely Progression of Technology Machine Learning will Augment Decision Making with Recommendations
  • 5. Template Design for ML + Decision Systems Recommendation: Yes Feature A is high Keep the Chain of Responsibility Intact Integrate as soon as ML Accuracy ≈ Human Accuracy
  • 6. Decreasing cost of ML Pressures for Introducing ML into Decision Systems Increasing data from revolutions in sensors, records & infrastructure Fewer experts graduating in ‘older fields’ Increasing number of decisions created by more people
  • 7. Ideal Result of ML + Human Decisions ML Accuracy Terrible Human Perfect Accuracy Time & Cost As ML Accuracy Approaches Human Accuracy, System Performance Improves Low High
  • 8. The Uncanny Valley of ML As ML Accuracy Approaches Human Accuracy, System Performance Degrades ML Accuracy Terrible Human Perfect Accuracy Time & Cost Low High
  • 10. Finding An Uncanny Valley of ML Use Test Environments To Avoid The Uncanny Valley in Production 1. Create a simple labeling task with ground truth labels 2. Measure human accuracy & speed 3. 1. Add a recommended decision from a ‘Model’ 2. Simulate models of different accuracy near human accuracy by perturbing the ground truth labels 3. Assign each person to a simulated model and run test labels for normalization 4. Measure system accuracy & speed as a function of ML accuracy 5.
  • 11. Easy Street - How many coffee mugs do you see? Throwback to the first demos of Neural Nets for Computer Vision @ Cornell 100+ photos hand labeled with the number of coffee mugs for ground truth Label quality is then perturbed to simulate different ML accuracies, with a bias to perturbing the images with many mugs
  • 12. 100+ photos hand labeled with the number of coffee mugs for ground truth Label quality is then perturbed to simulate different ML accuracies, with a bias to perturbing the images with many mugs …used Amazon Mechanical Turk workers Easy Street - How many coffee mugs do you see? Throwback to the first demos of Neural Nets for Computer Vision @ Cornell
  • 13. 2 mugs 1 mug Ideal Results ML Accuracy Terrible Human Perfect Accuracy Time & Cost Low High
  • 14. Actual System Behavior System Accuracy ML Accuracy Terrible Human (94%) Perfect 2 mugs 1 mug Low High
  • 15. Actual System Behavior Decision Time ML Accuracy Terrible Human (94%) Perfect 2 mugs 1 mug Low High
  • 16. Uncanny Valley for ML in Counting Coffee Mugs? Do people want machines to be wrong? We trust machines more than we trust ourselves when they are near but not over human accuracy? We’re lazy and want to defer decision making (people varied from the ML when it was correct) 2 mugs 1 mug
  • 17. The Uncanny Valley of ML in the Judicial System
  • 18. Decreasing cost of ML Pressures for Introducing ML into Court Systems Increasing data from records & social media Fewer experts graduating in ‘older fields’ Increasing number of decisions created by more people
  • 19. Estonia is building a ‘Robot Judge’ to settle disputes under $8,000 - DailyMail Broader initiative of e-government. France wants to match Estonia's level by 2022 160,000 parking tickets overturned in the UK & US with a chatbot -Guardian on DoNotPay Risk Score Print Outs in Cleveland. Includes features like ‘how often are you bored’ -Quartz Note - arraignment hearings are often under 5 minutes. ML has a Growing Presence in Courts Countries are Comparing Notes & Learning How to Use AI in the Courts
  • 20. Locally ML is used in the Judicial System for Bail In California 49 of 58 counties use a Pretrial Assessment System (yes SF is one) [courts.ca.gov] SB 10 signed in 2018 would make it mandatory in October of 2019, but a 2020 referendum contradicting SB 10 has created a temporary pause
  • 21. Just a sec, does Bail Matter? • 20% of jail inmates in US are awaiting trial • Misdemeanors can take several months for trial, felonies can take years. Average wait time in the Bronx is 642 days for a non-jury trial and 827 days for a jury trial. • Pretrial detention leads to 13% increase in plea agreements, 42% increase in length of sentence and 41% increase in court fees -Stevenson The Journal of Law, Economics, and Organization 8th Amendment (Bill of Rights) ‘Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.’
  • 22. Finding An Uncanny Valley of ML Unknown System Accuracy, Show Manipulation of a Single Label 1. Take a Real Case 2. 1. Simulate different UI’s & different model deliverables 2. Compare label distribution with actual outcome
  • 23. Finding An Uncanny Valley of ML Unknown System Accuracy, Instead Show Manipulation of a Single Label Details Taken from Machine Bias by ProPublica -2016 Summary - high schooler stole a bike for a few blocks, had a High Risk COMPAS Score by Equivant. Bail was set at $1000
  • 24. 400+ Survey Participants From Amazon MTurk No ML ML - Low Risk ML - Medium Risk ML - High Risk ML - High Risk Positive Support ML - High Risk Negative Support Proportion of Responses 0% 33% 67% 100% $0 Bail $1000 Bail No Release Power of Suggestion of High Risk ML (no reason) results in +14% in bail denied Power of Suggestion of Low Risk ML (no Reason) results in +14% in $0 bail High Risk ML with Negative Features results in +40% in denying bail
  • 25. 104 Survey Participants From June’s Network No ML ML - Positive Support ML - Negative Support Proportion of Responses 0% 33% 67% 100% $0 Bail $1000 Bail No Release While overall a more forgiving group, still +40% increase in denying bail A Higher Level of Machine Learning Knowledge Does NOT Change the Trend
  • 26. Fundamental Design Flaw in SB10 & COMPAS Scores • Need to allow for the ML system to return ‘Uncertain, not enough data’ • The Bureau of Justice Statistics has a warning of ‘Interpret data with caution. Estimate based on 10 or fewer sample cases’ for someone with Brisha’s details • … also, effectiveness of requiring ML in the California courts is not slated to be measured until 2023, 4 years after release Aside:
  • 28. Uncanny Valley of AI Discovered by Masahiro Mori (1970) Box office success of movies is potentially related to the Uncanny Valley: • Final Fantasy • Polar Express • Beowulf • The Incredible Hulk
  • 29. Uncanny Valley of AI Why it exists is an open field of research: • Mismatch between expectations and observations [Tinwell] • Difficult to classify objects that move between the boundaries of categories [Looser & Wheatley] • Recognizing a similar cognitive [Gray & Wegner] • Ambiguity about the presence of threat [McAndrew] When it exists is also a debate. 2013 Activision Animation
  • 30. Uncanny Valley of ML Additional Theories to Consider: • People excuse biased decisions on the machine • People want machines to be wrong • Disagreeing is a different skill set than analyzing • Providing explanations of reasoning suppresses intuitive decisions When it exists should also be studied further.
  • 31. Thought Experiment Yes / No Imagine if the ML system was another person, who wasn’t quite as bright as the first person It would take longer, the bright person would question themselves more Small Group Communication: A Theoretical Approach has additional details of when groups underperform individuals Lean on the field of Team Research to Bootstrap Expectations on Integration
  • 33. Self-Driving Cars - Headed for Uncanny Valley People viewed as backups who would stay behind the wheel and intervene to avoid accidents in unpredictable or computer confusing instants. Self-driving option should be included as soon as possible for competitive advantage. Left No-op Right Left Right Decreasing cost of ML Increasing data from revolutions in sensors & records Aging population Increasing number of drivers, commutes increasing
  • 34. Best Practice: Bet on the Power of ML Left No-op Right Left Right Volvo changed from targeting Level 2 to Levels {4, 5} after including executives in a simulation of driving a Level 2 car [Wired] Delaying Release Until Performance Crosses the Valley
  • 35. Best Practice: Build Simulators Nuclear Power Plants, Aviation, Moon Landings … all use simulators to refine product designs before launch **include actual judges/experts in simulations Identify location and impact of the Valley before building
  • 36. Best Practice: Avoid by Redefining Success Repurpose - ML designed for 1 system, may work well for another Reset expectations Relabel Bad Labels
  • 37. The Uncanny Valley Doesn’t Always Exist First UI, people completely ignored the ML suggestion You could design a system no one uses to avoid the Valley System Accuracy ML Accuracy Terrible Human Perfect 2 mugs 1 mug
  • 38. Delivering Results in the Uncanny Valley Can Lead to Early Project Termination • These are critical systems where a 5% drop in accuracy undoes years of research and investments. Mistakes are not treated kindly in these fields. • Funding is much more tightly controlled and hard to obtain after failed launches. • Legal modifications may become barriers
  • 39. Call to Action: Source a New Field of Research ‘ML Integration’ • HCI, Team Research, Data Science, AI, Psychology, User Research & Application Fields are all trying to understand integrating ML into Human Decision Systems … independently and slowly • Binding efforts into a single discipline will rapidly increase development and possibly meet demand * I am not qualified to give this talk … but who is?
  • 40. Bootstrapping ML Integration Funding Sources {Military, Accenture, Academia, …?} Initial Areas of Research: • How to calculate the speed and accuracy of large distributed human + ML decision systems • How to safely train and roll out new decision processes to experts • How to fairly explain an ML decision. Beyond explainable AI. • Design and run experiments in these systems Machine Learning DS AI Politics Team Research Effective ML Integration
  • 41. Call to Action: Implement Guardrail Metrics for Vulnerable Members of the Population • Guardrail Metrics are used to allow ML to optimize as much as it can within a specified business boundary • Let’s define a boundary in tech to not make systems worse for black girls
  • 42. Thank You. Slides at /drandrews
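
Slides 10 to 12 describe simulating "models" of different accuracy by perturbing ground truth labels. The following is a minimal sketch of that idea, not the code used in the talk. The function names, the off-by-one perturbation, and the uniform choice of which items to perturb are all assumptions made for illustration; the talk instead biased perturbation toward images with many mugs.

```python
import random


def simulate_model_labels(ground_truth, target_accuracy, seed=0):
    """Return simulated 'model' recommendations for a counting task.

    Hypothetical helper: a fixed share of the ground truth labels is left
    intact and the rest are perturbed, so the simulated model has
    approximately the requested accuracy.
    """
    rng = random.Random(seed)
    n = len(ground_truth)
    n_wrong = round(n * (1.0 - target_accuracy))
    # Assumption: items to perturb are chosen uniformly at random.
    wrong_idx = set(rng.sample(range(n), n_wrong))

    labels = []
    for i, truth in enumerate(ground_truth):
        if i in wrong_idx:
            # Assumption: a wrong recommendation is off by exactly one mug,
            # and a count can never go below zero.
            delta = rng.choice([-1, 1]) if truth > 0 else 1
            labels.append(truth + delta)
        else:
            labels.append(truth)
    return labels


def accuracy(predicted, ground_truth):
    """Fraction of items where the prediction matches the ground truth."""
    matches = sum(p == t for p, t in zip(predicted, ground_truth))
    return matches / len(ground_truth)


if __name__ == "__main__":
    # Illustrative ground truth: 100 photos with 0 to 3 mugs each.
    truth = [i % 4 for i in range(100)]
    for target in (0.90, 0.94, 0.98):
        simulated = simulate_model_labels(truth, target, seed=1)
        print(target, accuracy(simulated, truth))
```

Each participant would then be shown the recommendations from one simulated accuracy level, and system accuracy and decision time could be compared across levels, as in step 4 of slide 10.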