CATS4ML:
Crowdsourcing Adverse Test Sets for ML
Welcome!
ˈl ɪ k ə r t
cats4ml.humancomputation.com
Your Google Meet View
● Click on the three dots in the lower corner of your Google Meet screen.
● Select “Change layout.”
● Choose your preferred layout:
○ Sidebar View
○ Spotlight View
○ Tiled View
● Select “Turn on captions” for closed captioning.
Google at NeurIPS 2020
Fill out our interest form to hear about events and opportunities at goo.gle/neurips-booth-form
Build for everyone
TAKE HOME MESSAGE
● Data is the compass for AI: AI advances where there is data.
● Data quality must be addressed in AI practices, especially in the way we evaluate AI.
● Improving evaluation of AI must consider ways to measure variance and capture bias, to bring us one step closer to data excellence.
● To address variance in AI evaluation, we propose a number of novel metrics for reliability, significance (metrology for AI) and disagreement (CrowdTruth).
● To address bias in AI evaluation, we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML).
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
Before it got better: reactive data improvement.
To reach “It is better!” we need proactive data improvement.
Your AI model is as good as your evaluation data
… but is your evaluation data missing relevant examples?
How can we find such examples, especially if they are AI blindspots (i.e. unknown unknowns)?
CATS4ML Challenge: Crowdsourcing Adverse Test Sets for ML
The challenge offers a crowdsourced solution, in effect a crowdsourced red team, for finding blindspots of your AI models.
Inspired by prior research in human computation:
● Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns” (Attenberg, Ipeirotis, Provost, 2014)
● Vulnerability Reward Program (Google)
In this first version of the CATS4ML challenge, participants will discover AI blindspots in the Open Images Dataset.
Lipstick? Airplane? Car? Construction worker? Thanksgiving?
These AI blindspots are real images with visual patterns that confuse AI models in ways humans might find meaningful.
The Challenge Process
● challenge participants submit image-label pairs (image 1: label 1: lipstick; image 2: label 2: thanksgiving; ...)
● labels comparison against the Open Images dataset
● human-in-the-loop verification of the image-label pairs
● submission scoring (one set of image-label pairs vs. the other)
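The challenge process (submission, labels comparison, human-in-the-loop verification, submission scoring) can be illustrated with a minimal sketch. The slides do not state the actual scoring rule, so everything here is an assumption made for illustration: the `Pair` type, the `score_submission` function, the boolean label format, and the rule itself (one point for each submitted pair where the verified human answer contradicts the machine label).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One submitted image-label pair, e.g. ("image1", "lipstick"). Hypothetical type."""
    image_id: str
    label: str


def score_submission(pairs, machine_labels, human_verdicts):
    """Count pairs where the verified human answer contradicts the machine label.

    This scoring rule is an assumption; the slides do not specify one.
    machine_labels: dict Pair -> bool, whether the dataset's machine label says the label applies
    human_verdicts: dict Pair -> bool, the answer from human-in-the-loop verification
    Pairs missing from either mapping are skipped rather than scored.
    """
    score = 0
    for pair in set(pairs):  # de-duplicate so a repeated pair counts once
        if pair not in machine_labels or pair not in human_verdicts:
            continue
        if machine_labels[pair] != human_verdicts[pair]:
            score += 1
    return score


# Made-up example data, not taken from the Open Images Dataset.
submission = [
    Pair("image1", "lipstick"),
    Pair("image2", "thanksgiving"),
    Pair("image3", "car"),
]
machine_labels = {
    Pair("image1", "lipstick"): True,
    Pair("image2", "thanksgiving"): True,
    Pair("image3", "car"): True,
}
human_verdicts = {
    Pair("image1", "lipstick"): False,     # humans disagree with the machine label
    Pair("image2", "thanksgiving"): True,  # humans agree, so no blindspot found
}

print(score_submission(submission, machine_labels, human_verdicts))  # prints 1
```

Under this assumed rule, only the first pair scores: the second is an agreement, and the third has not been verified by humans yet.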
The challenge will run until 30 April, 2021
Join us now!
● Join the data challenge individually or form a team.
● Discover interesting AI blindspots in the Open Images Dataset and contribute them to the challenge.
● Be part of this research effort.
● Winning contributions will be promoted at CrowdCamp 2021.
The Team
● Praveen Paritosh
● Ka Wong
● Lora Aroyo
● Devi Krishna
Share your voice about ML data!
We are inviting ML professionals to participate in a short survey so that we can learn about your challenges and needs with data.
Scan to participate now!

