Open science can contribute to AI trustworthiness. This talk offers a categorization of scientific data platforms and a framing of AI trustworthiness, with pointers to open science contributions.
Trustworthy AI and Open Science
1. TRUSTWORTHY AI AND OPEN SCIENCE
Beth Plale
Michael A and Laurie Burns McRobbie Professor of Computer Engineering
Beilstein Open Science symposium
October 06, 2021
Luddy School of Informatics, Computing, and Engineering
Data To Insight Center
2. Observations influenced by my role (2017-2020) at the
National Science Foundation working on agency policies
and practice in open science. Views expressed are
entirely my own.
Funding agency perspective on open science: how do
we bring visibility to the products of research (that we
fund)?
3. NSF funds the collection and capture
of research data through projects
ranging from a few hundred thousand
dollars to tens of millions of dollars.
The data are maintained in a
landscape of solutions to meet the
needs of researchers.
4. RESEARCH DATA LANDSCAPE
Specialist repositories
- Organizational resources
Generalist repositories
- Organizational resources
Data Portals
- Low velocity data
- Employs cloud resources
- Employs data-compute proximity for analysis
Observation networks
- High velocity data
- Employs cloud resources
Exemplar systems (in slide order): SAGE, NEON, ARM, HPWREN, UWI, LTER, OOI, NEON, HydroShare, LTER, MGDS, IRIS, ICPSR, QDR, TAIR, MDF, IEDA, PDB, CCDC, DataVerse, Figshare, Dryad, Zenodo, IRs
5. RESEARCH DATA LANDSCAPE
Data timeliness need
Researcher depth of expertise
Expectation for level of curation
Expectation of data longevity
6. RESEARCH DATA LANDSCAPE
Publisher's view of landscape (general public view as well)
Optimization for timeliness of research could suggest lower value over time
7. Generalist-Aided
Deposit: engages generalist curators
Metadata: generalist schema
Reuse potential: moderate-low as metadata is curated but general
Scope: discipline agnostic scope
Discovery: broad name recognition
Specialist-DBMS
Deposit: difficult so DB often read-only
Metadata: data dictionary + DB schema
Reuse potential: high potential as self contained
Scope: subdiscipline scope
Discovery: known within subdiscipline
Specialist-Aided
Deposit: engages specialist curators
Metadata: specialized schema
Reuse potential: high due to specialists
Scope: discipline scope
Discovery: known within discipline
Specialist-Unaided
Deposit: unaided deposit
Metadata: specialized schema
Reuse potential: moderate-high from discipline focus of metadata schema
Scope: discipline scope
Discovery: known within discipline
Generalist-Unaided
Deposit: unaided deposit
Metadata: generalist schema
Reuse potential: low as metadata is minimal
Scope: discipline agnostic scope
Discovery: broad name recognition
i.e., institutional repositories
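The five repository types above can be read as records sharing the same five attributes. The following is a minimal sketch of that reading, not part of the talk: the `RepositoryType` record, its field names, and the `types_with_reuse` helper are assumptions made for illustration, and the reuse levels are shortened to the leading rating given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryType:
    """One column of the slide's comparison (record layout is an assumption)."""
    name: str
    deposit: str
    metadata: str
    reuse_potential: str  # leading rating from the slide, e.g. "high"
    scope: str
    discovery: str


# Attribute values transcribed from the slide text.
REPOSITORY_TYPES = [
    RepositoryType("Generalist-Aided", "engages generalist curators",
                   "generalist schema", "moderate-low",
                   "discipline agnostic", "broad name recognition"),
    RepositoryType("Specialist-DBMS", "difficult so DB often read-only",
                   "data dictionary + DB schema", "high",
                   "subdiscipline", "known within subdiscipline"),
    RepositoryType("Specialist-Aided", "engages specialist curators",
                   "specialized schema", "high",
                   "discipline", "known within discipline"),
    RepositoryType("Specialist-Unaided", "unaided deposit",
                   "specialized schema", "moderate-high",
                   "discipline", "known within discipline"),
    RepositoryType("Generalist-Unaided", "unaided deposit",
                   "generalist schema", "low",
                   "discipline agnostic", "broad name recognition"),
]


def types_with_reuse(levels):
    """Return names of repository types whose reuse rating is in `levels`."""
    return [t.name for t in REPOSITORY_TYPES if t.reuse_potential in levels]


if __name__ == "__main__":
    # Which types does the slide rate as having high reuse potential?
    print(types_with_reuse({"high"}))
```

Encoding the comparison this way makes the pattern on the slide easy to query: the ratings at or above "moderate-high" all belong to specialist types.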
10. FEDERAL RESEARCH DATA SUMMARY
• Observation networks and data portals are a fixed part of the
landscape. They have a different role in open science than do
repositories
• Generalist repositories are easier to use than specialist
repositories
• Specialist repositories have higher reusability
• Generalist repositories have economies of scale
• If specialist repositories can leverage generalist repositories as
back ends it would reduce overall cost
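The last point, a specialist repository using a generalist repository as its back end, can be sketched as a thin front end that enforces a discipline-specific metadata schema and delegates storage. This is an illustrative sketch only: `GeneralistBackend`, `SpecialistRepository`, the identifier format, and the example field names are all hypothetical and do not describe any real repository's API.

```python
class GeneralistBackend:
    """Hypothetical stand-in for a generalist repository's storage service."""

    def __init__(self):
        self._store = {}

    def deposit(self, payload: bytes, metadata: dict) -> str:
        # Assign a simple sequential identifier (an assumption for the sketch).
        identifier = f"gen-{len(self._store) + 1}"
        self._store[identifier] = (payload, dict(metadata))
        return identifier

    def fetch(self, identifier: str):
        return self._store[identifier]


class SpecialistRepository:
    """Front end that adds discipline-specific checks, then delegates storage."""

    def __init__(self, backend: GeneralistBackend, required_fields):
        self.backend = backend
        self.required_fields = set(required_fields)

    def deposit(self, payload: bytes, metadata: dict) -> str:
        # The specialist layer contributes the specialized schema check;
        # the generalist layer contributes storage at scale.
        missing = sorted(f for f in self.required_fields if f not in metadata)
        if missing:
            raise ValueError(f"missing specialized metadata: {missing}")
        return self.backend.deposit(payload, metadata)


if __name__ == "__main__":
    backend = GeneralistBackend()
    repo = SpecialistRepository(backend, {"instrument", "sample_site"})
    ident = repo.deposit(b"raw readings",
                         {"instrument": "sensor-A", "sample_site": "plot-3"})
    print(ident, backend.fetch(ident)[1])
```

The design choice illustrated is the division of labor: curation and schema enforcement stay with the specialist layer, while the shared back end carries the storage cost.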
12. “ON ARTIFICIAL
INTELLIGENCE, TRUST
IS A MUST, NOT A
NICE-TO-HAVE”
Margrethe Vestager, the European
Commission executive vice president
who oversees digital policy for the 27-
nation bloc
13. TRUST ↔ TRUSTWORTHINESS
TRUST
• An individual’s confidence in an
entity
• “I trust this web site”
TRUSTWORTHINESS
• An entity’s state of being
trustworthy or reliable
• An estimate of an object’s
worthiness to receive someone’s
trust
• Trustworthiness is difficult to
accurately quantify
16. INDIANA UNIVERSITY BLOOMINGTON
Broad Categories of AI
AI: Human-Machine Interaction
• Fitness smartwatch, smart hearing aids
• Co-bots, cyber-crews, digital twins
• Integration of smart machines into the human body in the form of computer-brain interfaces or cyborgs
AI: Autonomous and Semi-Autonomous Actors
• Weapon systems
• Robots in deep sea and space exploration
• Self-driving cars
• Bots in financial trade
AI: Big Data / Big Compute
• Deep learning / Machine Learning / Natural Language Processing
• Medical diagnosis, image recognition
17. Broad Categories of AI
Category with most urgency in issues of artificial moral agency
18. Broad Categories of AI
Research needed in policy and technical extensions that lead to greater and more measurable forms of accountability
19. INTERVENTION POINTS: ENHANCED TRUSTWORTHINESS
Developer ethics, development process norms
Societal influence: public pressure, legislation, regulatory oversight
Technological manifestation: verifiable claims, explainability, accountability
AI algorithmic knowledge exhibiting higher levels of trustworthiness
20. Trustworthy AI is AI that is designed, developed, and used in a
manner that is lawful, fair, unbiased, accurate, reliable,
effective, safe, secure, resilient, understandable, and with
processes in place to regularly monitor and evaluate the AI
system’s performance and outcomes
Lynne Parker, Deputy US Chief Technology Officer and Director of the National Artificial Intelligence Initiative Office
21. ML PROCESS
Figure (M. Veale et al., CHI 2018): ML process diagram with the elements Data, Training data, Test data, Feature extraction, Learning algorithm, Trained model, Predict, and New data, annotated with explainability inquiries and dev ops.
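The elements named in the ML process figure can be sketched end to end in a few lines. This is a toy illustration, not the process from Veale et al.: the nearest-centroid learner, the record layout, the sample values, and the use of per-class distances as a response to an explainability inquiry are all assumptions chosen to keep the example small and dependency-free.

```python
def extract_features(record):
    """Feature extraction: here, simply the numeric measurements."""
    return [float(v) for v in record["measurements"]]


def train(training_data):
    """Learning algorithm (nearest centroid). Returns the trained model,
    a mapping from each label to the mean feature vector of its examples."""
    sums, counts = {}, {}
    for record in training_data:
        features = extract_features(record)
        acc = sums.setdefault(record["label"], [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[record["label"]] = counts.get(record["label"], 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(model, record):
    """Predict a label for new data; also return the squared distance to
    every class centroid as a simple answer to an explainability inquiry."""
    features = extract_features(record)
    distances = {label: sum((a - b) ** 2 for a, b in zip(features, centroid))
                 for label, centroid in model.items()}
    return min(distances, key=distances.get), distances


# Data, split into training data and test data (made-up values).
data = [
    {"measurements": [1.0, 1.0], "label": "a"},
    {"measurements": [1.2, 0.8], "label": "a"},
    {"measurements": [5.0, 5.0], "label": "b"},
    {"measurements": [4.8, 5.2], "label": "b"},
    {"measurements": [1.1, 0.9], "label": "a"},
    {"measurements": [5.1, 4.9], "label": "b"},
]
training_data, test_data = data[:4], data[4:]

model = train(training_data)
accuracy = sum(predict(model, r)[0] == r["label"]
               for r in test_data) / len(test_data)

if __name__ == "__main__":
    label, explanation = predict(model, {"measurements": [0.5, 0.5]})
    print(accuracy, label, explanation)
```

Every intermediate value here (the split, the extracted features, the trained model, the per-class distances) is an artifact that could in principle be retained alongside the final prediction.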
22. RESEARCH PRODUCTS
Figure (M. Veale et al., CHI 2018): ML process diagram with the elements Data, Training data, Test data, Feature extraction, Learning algorithm, Trained model, Predict, and New data, annotated with explainability inquiries and dev ops.
23. Open science contributes to trustworthy AI (trusted products)
The research products of AI need to include intermediate results and explainability services
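One way to picture intermediate results as research products is a manifest that fingerprints each artifact so a later reader can check that what was shared is what was used. The sketch below is an assumption about how that might look, not a method from the talk: the function names, the choice of SHA-256 over a canonical JSON encoding, and the example artifact names are all illustrative.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """SHA-256 digest of a canonical JSON encoding of a JSON-serializable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_manifest(products: dict) -> dict:
    """Map each research product name to the fingerprint of its content."""
    return {name: fingerprint(obj) for name, obj in products.items()}


def verify(products: dict, manifest: dict) -> list:
    """Return the names of products that are missing or whose content
    no longer matches the fingerprint recorded in the manifest."""
    return sorted(
        name for name, digest in manifest.items()
        if name not in products or fingerprint(products[name]) != digest
    )


if __name__ == "__main__":
    # Hypothetical products, named after elements of the ML process figure.
    products = {
        "training_data": [[1.0, 1.0, "a"], [5.0, 5.0, "b"]],
        "test_data": [[1.1, 0.9, "a"]],
        "trained_model": {"a": [1.0, 1.0], "b": [5.0, 5.0]},
        "explanations": {"test_0": {"a": 0.02, "b": 32.02}},
    }
    manifest = build_manifest(products)
    print(verify(products, manifest))      # nothing has changed yet

    products["trained_model"]["a"] = [9.9, 9.9]
    print(verify(products, manifest))      # the altered product is reported
```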