Whistler2023_Saige.pptx

@being_saige
Value-Based Machine Learning:
Optimizing for Utility >> Accuracy
Saige Rutherford
Whistler Workshop
February 27th, 2023

Road Map
• Introduction
  • Goals & values of the field, current state of the field, definitions (paradigm shift, optimization, accuracy, utility)
• Accuracy
  • Measuring, optimizing, limitations
• Utility
  • Measuring, optimizing, benefits
• What’s Next?
  • Roadblocks, future directions, take-home messages

[Figure: Artificial Intelligence / Machine Learning; Brain-Behavior Predictive Modeling; Brain-Behavior Prediction; Whistler Workshop on Brain Functional Organization, Connectivity, and Behavior]

Brain-Behavior Predictive Modeling: Goals & Values
• Validity?
• Reliability?
• Explainability?
• Fairness?
• Accountability?
• Usability?
• Impact?
[Figure labels: Goals, Values]

Brain-Behavior Predictive Modeling: My Journey
[Timeline, 2017 to 2023]
• Data Scientist @ UMich
  • Predicting BrainAge or Cognition
• Ph.D. candidate @ Donders Institute
  • Big Data Normative Modeling + Transfer Learning to Clinical Datasets
• Whistler talks: “Developmental Mega Sample”; “Value-Based Machine Learning” (2023)

Brain-Behavior Predictive Modeling: Current Status
• Combine a bunch of data from open datasets.
• Fit a bunch of different algorithms, ranging from simple to super complicated.
• Realize there is little overlap in available phenotypes across these datasets. You are left with age, sex, maybe cognition (if you’re lucky).
• Realize that there isn’t a lot of signal in the data, and that you can’t even predict age that well (maybe within ~3-5 years).
• Publish your results anyway…
  a) being super optimistic and slightly overselling the interpretation and potential, or
  b) sharing your honest viewpoint (using MRI doesn’t help much).
• Have trouble finding a journal that will publish this perspective.
  a) Repeat.
  b) Leave for a data science industry job or another field “where ML can have more impact.”

Brain-Behavior Predictive Modeling: Bingo Card
• Fluid Intelligence
• Brain Age
• Poor Reliability
• Reference to Marek et al. Nature Paper
• No confound correction...
• “could be motion”
• HCP / ABCD / UK Biobank
• r = 0.28
• “Has clinical potential (one day)”
• “We need a bigger sample size”

Value-Based Machine Learning

Example Paradigm Shifts
• Fee-for-service healthcare (lower costs) → Value-based healthcare (improving patient outcomes)
• DSM-5 diagnostics (categorical, binary) → RDoC (dimensional, continuous)
• Biomedical (lack of illness) → Biopsychosocial (improved functioning)
• Accuracy (tunnel vision) → Utility (big picture)

Why Do We Need a Paradigm Shift?

Definition: Optimization

Optimization within the Machine Learning Lifecycle
Define Objectives → Acquire Data → Prepare Data → Analyze & Explore → Feature Extraction → Develop & Train Model → Test Model → Deploy Model → Monitor & Optimize
[Figure: *Optimize* is marked throughout the lifecycle]

Definition: Accuracy
• The quest for high performance.
• A narrow objective of becoming more accurate, and an immediate (short-term) action plan for achieving this goal (minimize the loss function on a particular set of data).

Definition: Utility
• Utility is more closely aligned with the model’s purpose (i.e., answering the research question and adding real-world value).
• Utility looks at the bigger picture and makes creative adjustments to align with the ultimate research goal and real-world application.

Bringing Together Optimization, Accuracy, & Utility
• In the process of setting up the optimization problem, we convinced ourselves that it makes sense to optimize for accuracy because accuracy is easier to formulate mathematically than utility…
• … But if you zoom out to look at the bigger picture, you realize the goal of the A.I. field is to do useful stuff that makes life easier for humans, not to create intelligence (become more accurate).

Measuring Accuracy
• A single standard metric that represents the model’s ability to predict observations in the test set.
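As a minimal sketch of what “a single standard metric” looks like in practice, the Python below computes two metrics that appear in this talk’s examples: mean absolute error (e.g., brain-age error in years) and Pearson’s r. The ages and predictions are made-up illustration values, and the function names are hypothetical helpers rather than any particular library’s API.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical test set: chronological ages (years) and predicted "brain ages".
age = [10.0, 12.0, 14.0, 16.0, 18.0]
brain_age = [13.0, 11.0, 17.0, 14.0, 21.0]

print(f"MAE = {mean_absolute_error(age, brain_age):.2f} years")
print(f"r   = {pearson_r(age, brain_age):.2f}")
```

Each number compresses the whole test set into one value, which is exactly what makes it easy to optimize and easy to over-read.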

Optimization for Accuracy
• A specific loss function is used to improve the model during training/validation. Often, the same metric is used to evaluate “out-of-sample” performance on the test set.
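A minimal sketch of this train-then-test loop, assuming a one-feature linear model, mean squared error (MSE) as the loss, and synthetic data (the data, learning rate, and epoch count are illustrative assumptions, not from the talk): gradient descent minimizes MSE on the training split only, and the same MSE is then reported on the held-out split.

```python
import random


def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_by_gradient_descent(xs, ys, lr=0.01, epochs=2000):
    """Minimize training MSE for a one-feature linear model."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE with respect to the slope w and intercept b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: one feature x and a noisy target y = 2x + 1 + noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]

# Interleaved train/test split: the loss is minimized on the training set only...
train_x, test_x = xs[::2], xs[1::2]
train_y, test_y = ys[::2], ys[1::2]
w, b = fit_by_gradient_descent(train_x, train_y)

# ...and the same metric (MSE) is then reported "out of sample".
print(f"train MSE = {mse(w, b, train_x, train_y):.3f}")
print(f"test MSE  = {mse(w, b, test_x, test_y):.3f}")
```

Nothing in this loop asks whether the predictions are useful; it only asks whether MSE went down.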

Optimization for Accuracy
• Benchmarking: a comparison of model performance to another model.
• “Best” model is determined by being more accurate than the others.
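A sketch of benchmarking as described above, assuming three hypothetical models scored by mean absolute error (MAE) on the same made-up test set. The “best” model is simply the one at the top of the sorted list.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the single metric used to rank models here."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def benchmark(y_true, predictions):
    """Score every model on the same test set; return (name, MAE) pairs, best first."""
    scores = {name: mae(y_true, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out targets and predictions from three hypothetical models.
y_test = [10.0, 12.0, 14.0, 16.0, 18.0]
predictions = {
    "mean_baseline": [14.0, 14.0, 14.0, 14.0, 14.0],
    "linear_model": [11.0, 12.5, 13.0, 16.5, 17.0],
    "deep_model": [10.5, 13.0, 14.5, 15.0, 18.5],
}

leaderboard = benchmark(y_test, predictions)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name:<14} MAE = {score:.2f}")

best_name, best_score = leaderboard[0]
print(f'"Best" model: {best_name}')
```

Here deep_model beats linear_model by 0.1 units of MAE; the ranking says nothing about cost, interpretability, or fairness.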

Limitations of Accuracy
• There is no consensus regarding the definition of success in creating artificial intelligence, meaning there is no finish line or upper limit on attaining A.I.
• Without a clear definition of goals and a vision of what success looks like, how will we know when we have reached the goal?
• What does it mean to become infinitely more intelligent?
• What purpose does it serve to have a world full of agents (machines or humans) that are super intelligent?
• Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

(Abstract) Limitations of Accuracy
• Soccer team example: the star player who only thinks about themselves (accuracy) vs. the team captain who puts the team first (utility).
Jamie Tartt vs. Roy Kent

(Concrete) Limitations of Accuracy
• High accuracy does not imply:
  • reproducibility
  • meaningfulness (that the features used are better than random)
  • explainability
• Equal accuracy does not imply that two models have learned in the same way.
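To make the last point concrete, a toy sketch with made-up binary labels: two hypothetical classifiers reach the same accuracy while making errors on entirely different subjects, so they agree on only half of their predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_indices(y_true, y_pred):
    """Indices of the subjects a model gets wrong."""
    return {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p}


def agreement(pred_a, pred_b):
    """Fraction of subjects on which two models make the same prediction."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Hypothetical binary labels (e.g., patient = 1, control = 0) and two models' outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1]
model_b = [0, 1, 1, 1, 1, 0, 0, 0]

print("accuracy A:", accuracy(y_true, model_a))
print("accuracy B:", accuracy(y_true, model_b))
print("errors A:", sorted(error_indices(y_true, model_a)))
print("errors B:", sorted(error_indices(y_true, model_b)))
print("agreement A vs. B:", agreement(model_a, model_b))
```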

Measurement of Utility
• Validity?
• Reliability?
• Explainability?
• Fairness?
• Accountability?
• Usability?
• Impact?

Measurement of Utility

Optimization for Utility (fairness is priority value)
Predicting patients’ pain scores instead of the radiologist’s diagnosis (dx)
• Use knee X-rays to predict patients’ self-reported pain instead of standard measures of pain severity (radiologist dx).
• Radiologist dx accounted for only 9% of racial disparities in pain; using self-reported pain labels accounted for 43% (4.7× more than radiologist dx).

Optimization for Utility (fairness is priority value)
• Equal opportunity & multi-objective optimization
[Figure: fairness vs. accuracy]
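One common way to combine these two objectives, shown here as a sketch and not necessarily the approach behind this slide, is a weighted sum: error rate plus λ times the equal-opportunity gap, where equal opportunity is taken to mean equal true-positive rates across groups. The scores, labels, groups, and candidate thresholds are invented for illustration, and `lam` (λ) is an assumed trade-off weight.

```python
def error_and_gap(scores, labels, groups, threshold):
    """Error rate and equal-opportunity gap for 'predict 1 if score >= threshold'.

    The gap is the largest difference in true-positive rate between groups.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    error_rate = sum(p != y for p, y in zip(preds, labels)) / len(labels)
    tpr = {}
    for g in sorted(set(groups)):
        preds_on_positives = [
            p for p, y, gg in zip(preds, labels, groups) if gg == g and y == 1
        ]
        tpr[g] = sum(preds_on_positives) / len(preds_on_positives)
    return error_rate, max(tpr.values()) - min(tpr.values())


def pick_threshold(scores, labels, groups, candidates, lam):
    """Weighted-sum multi-objective choice: minimize error_rate + lam * gap."""
    def objective(t):
        err, gap = error_and_gap(scores, labels, groups, t)
        return err + lam * gap
    return min(candidates, key=objective)


# Hypothetical risk scores, true labels, and group membership (A or B).
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.35, 0.5, 0.45, 0.38, 0.1]
labels = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 7
candidates = [0.33, 0.55, 0.65]

for lam in (0.0, 1.0):
    t = pick_threshold(scores, labels, groups, candidates, lam)
    err, gap = error_and_gap(scores, labels, groups, t)
    print(f"lam={lam}: threshold={t}, error={err:.3f}, TPR gap={gap:.3f}")
```

With lam=0 the most accurate threshold (0.55) wins; with lam=1 the threshold that equalizes true-positive rates (0.33) wins, at the cost of a higher error rate.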

Optimization for Utility (efficiency/usability is priority value)
• Optimizing for teamwork: AI learning to complement humans.
• Sharing pre-trained models & creating accessible tools (https://pcnportal.dccn.nl/)

Benefits of Utility
• Collaborative, efficient, well-defined purpose.
• Functional (real depth and meaning) rather than attractive (shallow,
surface-level appeal).
• An opportunity to think deeply and align your models with your purpose.
• Creative thinking and problem solving are required.
• More of a challenge… and thus more satisfying solutions.

Roadblocks
• Cognitive biases make us focus on simpler problems.
• As problem complexity increases, we shift responsibility and think along the lines of “this is out of my expertise; it is someone else’s problem to solve.”
Ambiguity Effect · Bandwagon Effect · Status Quo Bias · Loss Aversion

Roadblocks
• During development: stationary data, single decision maker
• In the wild (real world): complex, non-stationary data, many stakeholders
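A toy sketch of why development-time accuracy may not survive deployment, assuming a hypothetical scanner change that adds a constant offset to the measured feature (a simple form of non-stationarity). The data and the offset are invented; the model is a least-squares line through the origin.

```python
def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# During development: stationary data in which the target is exactly 2 * feature.
dev_x = [1.0, 2.0, 3.0, 4.0]
dev_y = [2 * x for x in dev_x]
w = fit_slope(dev_x, dev_y)
dev_mae = mae(dev_y, [w * x for x in dev_x])

# In the wild: the same target relationship, but a hypothetical scanner change
# adds a +1.5 offset to the measured feature (a simple non-stationary shift).
wild_true_x = [1.0, 2.0, 3.0, 4.0]
wild_y = [2 * x for x in wild_true_x]
wild_measured_x = [x + 1.5 for x in wild_true_x]
wild_mae = mae(wild_y, [w * x for x in wild_measured_x])

print(f"development MAE = {dev_mae:.2f}")
print(f"in-the-wild MAE = {wild_mae:.2f}")
```

The target relationship has not changed; only the input distribution has, and the error jumps from 0 to 3.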

Roadblocks
[Figure: Fairness vs. Accuracy; Accuracy, Fairness, Transparency, Reliability]

Future Directions
• Many other fields have defined utility and figured out how to optimize for it.
• Let’s learn from them: Human-Computer Interaction (HCI), Ethical A.I., Value-based healthcare, Behavioral Economics.

Future Directions
• Utility (and value priorities) will always depend on the context.
• We need open communication and guidelines about making these decisions.

Take Home Messages
• Too much (tunnel-vision) focus on the accuracy of predictive models.
• We have lost track of our “why,” and this has created a lack of model utility.
• Defining our values should be a priority; it will help us build a better plan for moving toward these goals and values.
• Optimizing for utility is an abstract and creative practice that requires diverse perspectives and input. It should be an ongoing process.

Acknowledgments
Chandra Sripada, Mike Angstadt, Daniel Kessler, Liza Levina, Ivy Tso, Alex Weigard, Jenna Wiens
Ph.D. supervisors: Andre Marquand, Eric Ruhé, & Christian Beckmann.
Lab members: Seyed Mostafa Kia, Thomas Wolfers, Mariam Zabihi, Charlotte Fraza, Pieter Barkema, Stijn de Boers, Barbora Rehák Bučková
Donders Institute, Nijmegen · University of Michigan, Ann Arbor
Charlie-Mop

Thank you.
Questions?
@being_saige
