Margaret Mitchell, Senior Research Scientist, Google, at MLonf Seattle 2017

MLconf
MLconfMLconf
Ishan Misra Larry Zitnick Ross Girshick Josh Lovejoy Hartwig Adam Blaise Agüera y Arcas
CMU FAIR FAIR Google Google Google
The Seen and Unseen Factors
Influencing
Machine Perception
of Images and Language Margaret Mitchell
Google
What do you see?
What do you see?
● Bananas
What do you see?
● Bananas
● Bananas at a store
What do you see?
● Bananas
● Bananas at a store
● Bananas on shelves
What do you see?
● Bananas
● Bananas at a store
● Bananas on shelves
● Bunches of bananas
What do you see?
● Bananas
● Bananas at a store
● Bananas on shelves
● Bunches of bananas
● Bananas with stickers on them
What do you see?
● Bananas
● Bananas at a store
● Bananas on shelves
● Bunches of bananas
● Bananas with stickers on them
● Bunches of bananas with stickers on
them on shelves in a store
What do you see?
● Bananas
● Bananas at a store
● Bananas on shelves
● Bunches of bananas
● Bananas with stickers on them
● Bunches of bananas with stickers on
them on shelves in a store
Yellow Bananas
What do you see?
Green Bananas
Unripe Bananas
What do you see?
Ripe Bananas
Bananas with spots
What do you see?
Ripe Bananas
Bananas with spots
Bananas good for banana
bread
What do you see?
Yellow Bananas
Yellow is prototypical for
bananas
A man and his son are in a terrible
accident and are rushed to the hospital
in critical care.
The doctor looks at the boy and
exclaims "I can't operate on this boy,
he's my son!"
How could this be?
A man and his son are in a terrible
accident and are rushed to the hospital
in critical care.
The doctor looks at the boy and
exclaims "I can't operate on this boy,
he's my son!"
How could this be?
“Female doctor”
A man and his son are in a terrible
accident and are rushed to the hospital
in critical care.
The doctor looks at the boy and
exclaims "I can't operate on this boy,
he's my son!"
How could this be?
“Female doctor”“Doctor”
The majority of test subjects overlooked the
possibility that the doctor is a she - including men,
women, and self-described feminists.
Wapman & Belle, Boston University
Word Frequency in corpus
“spoke” 11,577,917
“laughed” 3,904,519
“murdered” 2,834,529
“inhaled” 984,613
“breathed” 725,034
“hugged” 610,040
“blinked” 390,692
“exhale” 168,985
World learning
from text
Gordon and Van Durme, 2013
Word Frequency in corpus
“spoke” 11,577,917
“laughed” 3,904,519
“murdered” 2,834,529
“inhaled” 984,613
“breathed” 725,034
“hugged” 610,040
“blinked” 390,692
“exhale” 168,985
World learning
from text
Gordon and Van Durme, 2013
Human Reporting Bias
The frequency with which people write about
actions, outcomes, or properties is not a
reflection of real-world frequencies or the degree
to which a property is characteristic of a class of
individuals.
(Gordon and Van Durme, 2013)
Training data are
collected and
annotated
Training data are
collected and
annotated
Model is trained
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
People see
output
Human Biases in Data
Reporting bias
Selection bias
Overgeneralization
Out-group
homogeneity bias
Stereotypical bias
Historical unfairness
Implicit associations
Implicit stereotypes
Prejudice
Group attribution error
Halo effect
Training data are
collected and
annotated
Human Biases in Data
Reporting bias
Selection bias
Overgeneralization
Out-group
homogeneity bias
Stereotypical bias
Historical unfairness
Implicit associations
Implicit stereotypes
Prejudice
Group attribution error
Halo effect
Training data are
collected and
annotated Human Biases in Collection and Annotation
Sampling error
Non-sampling error
Insensitivity to
sample size
Correspondence bias
In-group bias
Bias blind spot
Confirmation bias
Subjective validation
Experimenter’s bias
Choice-supportive
bias
Neglect of probability
Anecdotal fallacy
Illusion of validity
Automation bias
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
People see
output
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
People see
output
Human Bias
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
People see
output
Human Bias
Human Bias
Human Bias
Human Bias
Training data are
collected and
annotated
Model is trained
Media are
filtered, ranked,
aggregated, or
generated
People see
output
Human Bias
Biased data created from process becomes new training data
Human Bias
Human Bias
Human Bias
Bias: A systematic deviation
from full ground truth.
Bias: A systematic deviation
from full ground truth.
Can be used as a signal.
Reporting Bias in Vision and Language
Data data everywhere …
But not many labels to train
Exhaustively annotated data is expensive
dog, chair, pizza, donut dog, chihuahua, brown, chair, table, wall,
space heater, pizza, greasy, donut 1, donut 2,
pizza slice 1, pizza slice 2...
Simple Image Classification
Linear
Sigmoid
Classifier
CNN
Banana
Yellow
Output
Input Image Ground
Truth
“Gold standard” Annotation: Human-biased label yw
∈ {0, 1}
Prediction hw
(yw
|I)
w ∈ {banana,
yellow}
yw
For each w
Factoring in Reporting Bias: Idea
● A human-biased prediction h can be factored into two terms
● A human-biased prediction h can be factored into two terms
○ Visual presence v – Is the concept visually present?
w ∈ {banana,
yellow}
Factoring in Reporting Bias: Idea
● A human-biased prediction h can be factored into two terms
○ Visual presence v – Is the concept visually present?
○ Relevance r – Is the concept relevant for a human?
w ∈ {banana,
yellow}
Factoring in Reporting Bias: Idea
● A human-biased prediction h can be factored into two terms
○ Visual presence v – Is the concept visually present?
○ Relevance r – Is the concept relevant for a human?
Factoring in Reporting Bias: Idea
● A human-biased prediction h can be factored into two terms
○ Visual presence v – Is the concept visually present?
○ Relevance r – Is the concept relevant for a human?
Label Prediction
Visually correct
ground truth
(Unknown)
Available ground
truth
(human-biased)
Is concept present?
Given visual presence,
is concept relevant?
Factoring in Reporting Bias: Idea
End-to-End Approach
SOURCE
Misra, Ishan; Girshick, Ross; Mitchell, Margaret; Zitnick, Larry (2016). “Seeing through the Human
Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels”. Proceedings of CVPR.
“Female doctor”“Male doctor”
Ishan Misra Larry Zitnick Ross Girshick Josh Lovejoy Hartwig Adam Blaise Agüera y Arcas
CMU FAIR FAIR Google Google Google
Thanks!
margarmitchell@gmail.com
Margaret Mitchell
Google
1 of 45

Recommended

Mukund Narasimhan, Engineer, Pinterest at MLconf Seattle 2017 by
Mukund Narasimhan, Engineer, Pinterest at MLconf Seattle 2017Mukund Narasimhan, Engineer, Pinterest at MLconf Seattle 2017
Mukund Narasimhan, Engineer, Pinterest at MLconf Seattle 2017MLconf
2.3K views33 slides
Claudia Perlich, Chief Scientist, Dstillery by
Claudia Perlich, Chief Scientist, Dstillery Claudia Perlich, Chief Scientist, Dstillery
Claudia Perlich, Chief Scientist, Dstillery MLconf
1.4K views57 slides
Yuri M. Brovman, Data Scientist, eBay by
Yuri M. Brovman, Data Scientist, eBayYuri M. Brovman, Data Scientist, eBay
Yuri M. Brovman, Data Scientist, eBayMLconf
1.3K views16 slides
Erik Bernhardsson, CTO, Better Mortgage by
Erik Bernhardsson, CTO, Better MortgageErik Bernhardsson, CTO, Better Mortgage
Erik Bernhardsson, CTO, Better MortgageMLconf
825 views63 slides
Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017 by
Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017 Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017
Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017 MLconf
1.5K views24 slides
Tim Chartier, Chief Academic Officer, Tresata at MLconf ATL 2017 by
Tim Chartier, Chief Academic Officer, Tresata at MLconf ATL 2017Tim Chartier, Chief Academic Officer, Tresata at MLconf ATL 2017
Tim Chartier, Chief Academic Officer, Tresata at MLconf ATL 2017MLconf
242 views45 slides

More Related Content

Viewers also liked

Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference by
Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference
Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference MLconf
677 views7 slides
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017 by
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017MLconf
545 views27 slides
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017 by
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017MLconf
648 views28 slides
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo... by
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
739 views35 slides
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017 by
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017MLconf
458 views15 slides
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg... by
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...MLconf
401 views45 slides

Viewers also liked(19)

Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference by MLconf
Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference
Dr. Bryce Meredig, Chief Science Officer, Citrine at The AI Conference
MLconf677 views
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017 by MLconf
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
Alexandra Johnson, Software Engineer, SigOpt at MLconf ATL 2017
MLconf545 views
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017 by MLconf
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017
Ashrith Barthur, Security Scientist, H2o.ai, at MLconf 2017
MLconf648 views
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo... by MLconf
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
MLconf739 views
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017 by MLconf
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017
Ryan West, Machine Learning Engineer, Nexosis at MLconf ATL 2017
MLconf458 views
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg... by MLconf
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...
Jacob Eisenstein, Assistant Professor, School of Interactive Computing, Georg...
MLconf401 views
Qiaoling Liu, Lead Data Scientist, CareerBuilder at MLconf ATL 2017 by MLconf
Qiaoling Liu, Lead Data Scientist, CareerBuilder at MLconf ATL 2017Qiaoling Liu, Lead Data Scientist, CareerBuilder at MLconf ATL 2017
Qiaoling Liu, Lead Data Scientist, CareerBuilder at MLconf ATL 2017
MLconf616 views
Rahul Mehrotra, Product Manager, Maluuba at The AI Conference 2017 by MLconf
Rahul Mehrotra, Product Manager, Maluuba at The AI Conference 2017Rahul Mehrotra, Product Manager, Maluuba at The AI Conference 2017
Rahul Mehrotra, Product Manager, Maluuba at The AI Conference 2017
MLconf660 views
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT... by MLconf
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf624 views
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017 by MLconf
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017
MLconf497 views
Artemy Malkov, CEO, Data Monsters at The AI Conference 2017 by MLconf
Artemy Malkov, CEO, Data Monsters at The AI Conference 2017 Artemy Malkov, CEO, Data Monsters at The AI Conference 2017
Artemy Malkov, CEO, Data Monsters at The AI Conference 2017
MLconf1.5K views
Will Murphy, VP of Business Development & Co-Founder, Talla at The AI Confere... by MLconf
Will Murphy, VP of Business Development & Co-Founder, Talla at The AI Confere...Will Murphy, VP of Business Development & Co-Founder, Talla at The AI Confere...
Will Murphy, VP of Business Development & Co-Founder, Talla at The AI Confere...
MLconf907 views
LN Renganarayana, Architect, ML Platform and Services and Madhura Dudhgaonkar... by MLconf
LN Renganarayana, Architect, ML Platform and Services and Madhura Dudhgaonkar...LN Renganarayana, Architect, ML Platform and Services and Madhura Dudhgaonkar...
LN Renganarayana, Architect, ML Platform and Services and Madhura Dudhgaonkar...
MLconf1.7K views
Talha Obaid, Email Security, Symantec at MLconf ATL 2017 by MLconf
Talha Obaid, Email Security, Symantec at MLconf ATL 2017Talha Obaid, Email Security, Symantec at MLconf ATL 2017
Talha Obaid, Email Security, Symantec at MLconf ATL 2017
MLconf1.1K views
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017 by MLconf
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
MLconf1.1K views
Jennifer Marsman, Principal Software Development Engineer, Microsoft at MLcon... by MLconf
Jennifer Marsman, Principal Software Development Engineer, Microsoft at MLcon...Jennifer Marsman, Principal Software Development Engineer, Microsoft at MLcon...
Jennifer Marsman, Principal Software Development Engineer, Microsoft at MLcon...
MLconf607 views
Daniel Shank, Data Scientist, Talla at MLconf SF 2017 by MLconf
Daniel Shank, Data Scientist, Talla at MLconf SF 2017Daniel Shank, Data Scientist, Talla at MLconf SF 2017
Daniel Shank, Data Scientist, Talla at MLconf SF 2017
MLconf1.2K views
Jonas Schneider, Head of Engineering for Robotics, OpenAI by MLconf
Jonas Schneider, Head of Engineering for Robotics, OpenAIJonas Schneider, Head of Engineering for Robotics, OpenAI
Jonas Schneider, Head of Engineering for Robotics, OpenAI
MLconf2.1K views
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017 by MLconf
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
MLconf1K views

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments... by
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
945 views15 slides
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding by
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
634 views49 slides
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re... by
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
535 views18 slides
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush by
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
749 views25 slides
Josh Wills - Data Labeling as Religious Experience by
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
627 views22 slides
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai... by
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
614 views60 slides

More from MLconf(20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments... by MLconf
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf945 views
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding by MLconf
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf634 views
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re... by MLconf
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf535 views
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush by MLconf
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf749 views
Josh Wills - Data Labeling as Religious Experience by MLconf
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf627 views
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai... by MLconf
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf614 views
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea... by MLconf
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf954 views
Meghana Ravikumar - Optimized Image Classification on the Cheap by MLconf
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf371 views
Noam Finkelstein - The Importance of Modeling Data Collection by MLconf
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf304 views
June Andrews - The Uncanny Valley of ML by MLconf
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf423 views
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks by MLconf
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf451 views
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D... by MLconf
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf444 views
Vito Ostuni - The Voice: New Challenges in a Zero UI World by MLconf
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf303 views
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection... by MLconf
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf811 views
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip... by MLconf
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf573 views
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o... by MLconf
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf650 views
Neel Sundaresan - Teaching a machine to code by MLconf
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf1K views
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl... by MLconf
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf4K views
Soumith Chintala - Increasing the Impact of AI Through Better Software by MLconf
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf646 views
Roy Lowrance - Predicting Bond Prices: Regime Changes by MLconf
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf426 views

Recently uploaded

Mini-Track: Challenges to Network Automation Adoption by
Mini-Track: Challenges to Network Automation AdoptionMini-Track: Challenges to Network Automation Adoption
Mini-Track: Challenges to Network Automation AdoptionNetwork Automation Forum
12 views27 slides
Democratising digital commerce in India-Report by
Democratising digital commerce in India-ReportDemocratising digital commerce in India-Report
Democratising digital commerce in India-ReportKapil Khandelwal (KK)
15 views161 slides
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...James Anderson
85 views32 slides
Info Session November 2023.pdf by
Info Session November 2023.pdfInfo Session November 2023.pdf
Info Session November 2023.pdfAleksandraKoprivica4
12 views15 slides
Serverless computing with Google Cloud (2023-24) by
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)wesley chun
11 views33 slides
Special_edition_innovator_2023.pdf by
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdfWillDavies22
17 views6 slides

Recently uploaded(20)

GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N... by James Anderson
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
GDG Cloud Southlake 28 Brad Taylor and Shawn Augenstein Old Problems in the N...
James Anderson85 views
Serverless computing with Google Cloud (2023-24) by wesley chun
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)
wesley chun11 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software263 views
Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst478 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker37 views
handbook for web 3 adoption.pdf by Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex22 views
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab19 views
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada136 views
Attacking IoT Devices from a Web Perspective - Linux Day by Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri16 views

Margaret Mitchell, Senior Research Scientist, Google, at MLonf Seattle 2017

  • 1. Ishan Misra Larry Zitnick Ross Girshick Josh Lovejoy Hartwig Adam Blaise Agüera y Arcas CMU FAIR FAIR Google Google Google The Seen and Unseen Factors Influencing Machine Perception of Images and Language Margaret Mitchell Google
  • 2. What do you see?
  • 3. What do you see? ● Bananas
  • 4. What do you see? ● Bananas ● Bananas at a store
  • 5. What do you see? ● Bananas ● Bananas at a store ● Bananas on shelves
  • 6. What do you see? ● Bananas ● Bananas at a store ● Bananas on shelves ● Bunches of bananas
  • 7. What do you see? ● Bananas ● Bananas at a store ● Bananas on shelves ● Bunches of bananas ● Bananas with stickers on them
  • 8. What do you see? ● Bananas ● Bananas at a store ● Bananas on shelves ● Bunches of bananas ● Bananas with stickers on them ● Bunches of bananas with stickers on them on shelves in a store
  • 9. What do you see? ● Bananas ● Bananas at a store ● Bananas on shelves ● Bunches of bananas ● Bananas with stickers on them ● Bunches of bananas with stickers on them on shelves in a store Yellow Bananas
  • 10. What do you see? Green Bananas Unripe Bananas
  • 11. What do you see? Ripe Bananas Bananas with spots
  • 12. What do you see? Ripe Bananas Bananas with spots Bananas good for banana bread
  • 13. What do you see? Yellow Bananas Yellow is prototypical for bananas
  • 14. A man and his son are in a terrible accident and are rushed to the hospital in critical care. The doctor looks at the boy and exclaims "I can't operate on this boy, he's my son!" How could this be?
  • 15. A man and his son are in a terrible accident and are rushed to the hospital in critical care. The doctor looks at the boy and exclaims "I can't operate on this boy, he's my son!" How could this be?
  • 16. “Female doctor” A man and his son are in a terrible accident and are rushed to the hospital in critical care. The doctor looks at the boy and exclaims "I can't operate on this boy, he's my son!" How could this be?
  • 18. The majority of test subjects overlooked the possibility that the doctor is a she - including men, women, and self-described feminists. Wapman & Belle, Boston University
  • 19. Word Frequency in corpus “spoke” 11,577,917 “laughed” 3,904,519 “murdered” 2,834,529 “inhaled” 984,613 “breathed” 725,034 “hugged” 610,040 “blinked” 390,692 “exhale” 168,985 World learning from text Gordon and Van Durme, 2013
  • 20. Word Frequency in corpus “spoke” 11,577,917 “laughed” 3,904,519 “murdered” 2,834,529 “inhaled” 984,613 “breathed” 725,034 “hugged” 610,040 “blinked” 390,692 “exhale” 168,985 World learning from text Gordon and Van Durme, 2013
  • 21. Human Reporting Bias The frequency with which people write about actions, outcomes, or properties is not a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals. (Gordon and Van Durme, 2013)
  • 23. Training data are collected and annotated Model is trained
  • 24. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated
  • 25. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated People see output
  • 26. Human Biases in Data Reporting bias Selection bias Overgeneralization Out-group homogeneity bias Stereotypical bias Historical unfairness Implicit associations Implicit stereotypes Prejudice Group attribution error Halo effect Training data are collected and annotated
  • 27. Human Biases in Data Reporting bias Selection bias Overgeneralization Out-group homogeneity bias Stereotypical bias Historical unfairness Implicit associations Implicit stereotypes Prejudice Group attribution error Halo effect Training data are collected and annotated Human Biases in Collection and Annotation Sampling error Non-sampling error Insensitivity to sample size Correspondence bias In-group bias Bias blind spot Confirmation bias Subjective validation Experimenter’s bias Choice-supportive bias Neglect of probability Anecdotal fallacy Illusion of validity Automation bias
  • 28. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated People see output
  • 29. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated People see output Human Bias
  • 30. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated People see output Human Bias Human Bias Human Bias Human Bias
  • 31. Training data are collected and annotated Model is trained Media are filtered, ranked, aggregated, or generated People see output Human Bias Biased data created from process becomes new training data Human Bias Human Bias Human Bias
  • 32. Bias: A systematic deviation from full ground truth.
  • 33. Bias: A systematic deviation from full ground truth.
  • 34. Can be used as a signal.
  • 35. Reporting Bias in Vision and Language
  • 36. Data data everywhere … But not many labels to train Exhaustively annotated data is expensive dog, chair, pizza, donut dog, chihuahua, brown, chair, table, wall, space heater, pizza, greasy, donut 1, donut 2, pizza slice 1, pizza slice 2...
  • 37. Simple Image Classification Linear Sigmoid Classifier CNN Banana Yellow Output Input Image Ground Truth “Gold standard” Annotation: Human-biased label yw ∈ {0, 1} Prediction hw (yw |I) w ∈ {banana, yellow} yw For each w
  • 38. Factoring in Reporting Bias: Idea ● A human-biased prediction h can be factored into two terms
  • 39. ● A human-biased prediction h can be factored into two terms ○ Visual presence v – Is the concept visually present? w ∈ {banana, yellow} Factoring in Reporting Bias: Idea
  • 40. ● A human-biased prediction h can be factored into two terms ○ Visual presence v – Is the concept visually present? ○ Relevance r – Is the concept relevant for a human? w ∈ {banana, yellow} Factoring in Reporting Bias: Idea
  • 41. ● A human-biased prediction h can be factored into two terms ○ Visual presence v – Is the concept visually present? ○ Relevance r – Is the concept relevant for a human? Factoring in Reporting Bias: Idea
  • 42. ● A human-biased prediction h can be factored into two terms ○ Visual presence v – Is the concept visually present? ○ Relevance r – Is the concept relevant for a human? Label Prediction Visually correct ground truth (Unknown) Available ground truth (human-biased) Is concept present? Given visual presence, is concept relevant? Factoring in Reporting Bias: Idea
  • 43. End-to-End Approach SOURCE Misra, Ishan; Girshick, Ross; Mitchell, Margaret; Zitnick, Larry (2016). “Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels”. Proceedings of CVPR.
  • 45. Ishan Misra Larry Zitnick Ross Girshick Josh Lovejoy Hartwig Adam Blaise Agüera y Arcas CMU FAIR FAIR Google Google Google Thanks! margarmitchell@gmail.com Margaret Mitchell Google