CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning

CATS4ML:
Crowdsourcing Adverse Test Sets for ML
Welcome!
ˈl ɪ k ə r t
cats4ml.humancomputation.com

● Click on the three dots on the lower corner of
your Google Meets screen.
● Select “Change layout.”
● Choose your preferred layout:
○ Sidebar View
○ Spotlight View
○ Tiled View
● Select “Turn on captions” for Closed
Captioning
Your Google Meet View

Fill out our interest form to hear
about events and opportunities
at goo.gle/neurips-booth-form
Google at NeurIPS 2020
Build
for
everyone

ˈl ɪ k ə r t
TAKE HOME MESSAGE
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability,
signiﬁcance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)

ˈl ɪ k ə r t
TAKE HOME MESSAGE
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability ,
signiﬁcance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)

ˈl ɪ k ə r t
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
reactive
data improvement

ˈl ɪ k ə r t
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
to reach here
we need proactive
data improvement

ˈl ɪ k ə r t
Your AI model is as good as
your evaluation data
… but is your evaluation data
missing relevant examples?

ˈl ɪ k ə r t
Your AI model is as good as
your evaluation data
… but is your evaluation data
missing relevant examples?
How can we ﬁnd such
examples, especially if they are
AI blindspots
(i.e. unknown unknowns)? cats4ml.humancomputation.com

ˈl ɪ k ə r t
offers a crowdsourced solution for finding
blindspots of your AI models
inspired by prior research in human computation
Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns” (Attenberg, Ipeirotis, Provost, 2014)
Vulnerability Reward Program (Google)
CATS4ML Challenge

ˈl ɪ k ə r t
offers a crowdsourced red team
for finding blindspots of your AI models
CATS4ML Challenge

ˈl ɪ k ə r t
In this first version of the CATS4ML challenge
participants will discover AI blindspots in the Open Images Dataset
Lipstick?
Airplane?
Car?
Construction worker?
Thanksgiving?
These AI blindspots are real images with visual
patterns that confuse AI models
in ways humans might find meaningful

ˈl ɪ k ə r t
image1 label1
image2 label2
…
challenge
participants
labels comparison
Open
Images
dataset
submission
image 1: label 1: lipstick
image 2: label 2: thanksgiving
...
human-in-the-loop verification
image1-label1
image2-label2
...
image1-label1
image2-label2
...
submission scoring
vs.
The Challenge Process

ˈl ɪ k ə r t

ˈl ɪ k ə r t
The challenge will run until 30 April, 2021

ˈl ɪ k ə r t
Join us now!
● join the data challenge
individually or form a team
● discover interesting AI
blindspots in Open Images
Dataset & contribute these to
the challenge
● be part of this research effort
● winning contributions will be
promoted at next
CrowdCamp 2021

ˈl ɪ k ə r t
The Team
Praveen Paritosh Ka Wong
Lora Aroyo Devi Krishna
ˈl ɪ k ə r t

ˈl ɪ k ə r t
Share your voice about ML data!
We are inviting ML professionals to
participate in a short survey to
learn about your challenges and
needs with data.
Scan to participate now!

CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning

Similar to CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning (20)

More from Lora Aroyo

More from Lora Aroyo (20)

Recently uploaded

Recently uploaded (20)

CATS4ML Data Challenge: Crowdsourcing Adverse Test Sets for Machine Learning