Invited to present the work under development by the Institute for Systems and Robotics (ISR-Lisboa) and the Interactive Technologies Institute (ITI), I gave a one-hour presentation and discussion on 27 October 2022. The work was presented remotely to the Department of Radiology (Neuroimaging and Neurointervention) at Stanford University in California, where I introduced our team, project, and work to the research group of Prof. Greg Zaharchuk. The presentation proposes and discusses how personalizing and customizing the AI outputs can positively affect the clinical workflow, and how these strategies promote unbiased behavior among clinicians while improving that workflow.
3. Team
João Fernandes
HCI MSc
Francisco M. Calisto
HCI PhD
Carlos Santiago
ML Researcher
Nuno Nunes
HCI Professor
Jacinto Nascimento
ML Professor
Clara Aleluia
Radiologist
Margarida Morais
ML MSc
João Maria Abrantes
Radiologist
5. 9.6 million
cancer deaths in 2018
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A. and Jemal, A., 2018. Global cancer statistics 2018: GLOBOCAN estimates of
incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians, 68(6), pp.394-424.
6. ~10%
… yielding false-negative results.
Smith, Robert A., Kimberly S. Andrews, Durado Brooks, Stacey A. Fedewa, Deana Manassaram‐Baptiste, Debbie Saslow, Otis W. Brawley, and
Richard C. Wender. Cancer screening in the United States, 2018: a review of current American Cancer Society guidelines and current issues
in cancer screening. CA: A Cancer Journal for Clinicians 68, no. 4 (2018): 297-316.
7. ~40%
… yielding false-positive results.
Smith, Robert A., Kimberly S. Andrews, Durado Brooks, Stacey A. Fedewa, Deana Manassaram‐Baptiste, Debbie Saslow, Otis W. Brawley, and
Richard C. Wender. Cancer screening in the United States, 2018: a review of current American Cancer Society guidelines and current issues
in cancer screening. CA: A Cancer Journal for Clinicians 68, no. 4 (2018): 297-316.
10. MULTIMODALITY WORKFLOW
Magnetic Resonance Imaging (MRI) > UltraSound (US) > MammoGraphy (MG) > Lesions
Calisto, F.M., Nunes, N. and Nascimento, J.C., 2020, September. “BreastScreening: On the Use of Multi-Modality in Medical Imaging
Diagnosis”. In Proceedings of the International Conference on Advanced Visual Interfaces (pp. 1-5).
11. BREAST SEVERITY
BI-RADS Meaning
0 Incomplete; needs more information (additional exams or waiting for prior exams)
1 Negative
2 Benign
3 Probably Benign
4 Suspicious
5 Highly suggestive of malignancy
6 Known biopsy-proven malignancy
Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K. and Law, E., 2019, May. “Capturing Expert Arguments from Medical Adjudication
Discussions in a Machine-readable Format”. In Companion Proceedings of The 2019 World Wide Web Conference (pp. 1131-1137).
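The BI-RADS categories above can be thought of as a simple lookup table. As a minimal, hypothetical sketch (the names and the follow-up rule are illustrative, not part of our system), the scale maps onto code like this:

```python
# Hypothetical mapping of BI-RADS categories to their meanings,
# mirroring the table above.
BIRADS_MEANING = {
    0: "Incomplete; needs more information",
    1: "Negative",
    2: "Benign",
    3: "Probably benign",
    4: "Suspicious",
    5: "Highly suggestive of malignancy",
    6: "Known biopsy-proven malignancy",
}

def is_actionable(birads: int) -> bool:
    """Illustrative rule: categories 4-6 typically trigger
    biopsy or treatment follow-up (assumption for this sketch)."""
    return birads >= 4

print(is_actionable(3))  # → False
print(is_actionable(5))  # → True
```

The point of the sketch is that the scale is ordinal but not linear: the jump from 3 to 4 changes the clinical action, which is exactly where expert disagreement matters most.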
13. PROBLEM
Schaekermann, M., 2020. “Human-AI Interaction in the Presence of Ambiguity: From Deliberation-based Labeling to Ambiguity-aware AI”.
14. MEDICAL IMAGE ASSESSMENT
Prior work in behavioral sciences for medical relation extraction substantiates the disagreement relations between inter-variability and intra-variability.
Dumitrache, A., Aroyo, L. and Welty, C., 2018. “Crowdsourcing Ground Truth for Medical Relation Extraction”. ACM Transactions on Interactive
Intelligent Systems (TiiS), 8(2), pp.1-20.
15. EXPERT DISAGREEMENT
Disagreement relations are addressed as a function of three phenomena:
1. Differences among clinical professionals, such as each professional's medical background, institution, and bias;
2. Heterogeneous characteristics of the dataset to be analyzed, such as noisy
and heterogeneous modalities;
3. Nature of the diagnostic guidelines, such as the subjective and ambiguous
classification of the BI-RADS.
Schaekermann, M., Beaton, G., Habib, M., Lim, A., Larson, K. and Law, E., 2019. “Understanding expert disagreement in medical data analysis
through structured adjudication”. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), pp.1-23.
16. PROBLEM
In medical imaging, doctors need to trust that AI is being used safely, and for their benefit, during decision-making.
17. MEDICAL EXPERIENCE
Interns Juniors Middles Seniors
Calisto, Francisco Maria, Nuno Nunes, and Jacinto C. Nascimento. "Modeling adoption of intelligent agents in medical imaging." International
Journal of Human-Computer Studies 168 (2022): 102922.
18. AGENTS
Strategies, such as adapting the agent's communication, promote unbiased behavior for each category of medical experience, improving medical performance, AI perception, and user experience.
19. MEDICAL ASSISTANCE
Radiologist fatigue levels and performance are related to environmental factors, such as the number of False-Positives and False-Negatives.
20. HUMAN-AI DELIBERATION
AI suggestion: BI-RADS = 5, with 99.94% of accuracy. Clinician: BI-RADS = 4? Final result: BI-RADS = 5.
Round 1: the clinician interprets the image alone.
Round 2: the clinician interprets AI suggestions.
Round 3: the clinician controls the final result.
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2021). Introduction of human-centric AI assistant to aid radiologists for multimodal
breast image classification. International Journal of Human-Computer Studies, 150, 102607.
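The three deliberation rounds above form a simple protocol: an independent human reading, an AI suggestion, and a human-controlled final decision. The following is a minimal, hypothetical sketch of that loop; all names (`Assessment`, `deliberate`, the 0.99 confidence threshold) are assumptions for illustration, not part of the published system:

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    birads: int          # BI-RADS category, 0-6
    confidence: float    # confidence in [0, 1]

def deliberate(clinician_read, ai_suggest, clinician_decide):
    """Hypothetical three-round human-AI deliberation loop.

    Round 1: the clinician interprets the image alone.
    Round 2: the AI proposes a category with a confidence score.
    Round 3: the clinician sees both and controls the final result.
    """
    round1 = clinician_read()                 # e.g. BI-RADS = 4?
    round2 = ai_suggest()                     # e.g. BI-RADS = 5 at 99.94%
    return clinician_decide(round1, round2)   # clinician keeps control

# Toy usage: the clinician accepts the AI suggestion only when its
# confidence is very high; otherwise their own reading stands.
final = deliberate(
    clinician_read=lambda: Assessment(4, 0.70),
    ai_suggest=lambda: Assessment(5, 0.9994),
    clinician_decide=lambda r1, r2: r2 if r2.confidence > 0.99 else r1,
)
print(final.birads)  # → 5
```

The design point is that the AI never overwrites the human: Round 3 always returns whatever the clinician's decision function chooses.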
21. 52 clinicians
… from nine public and private medical institutions in
Portugal.
USER STUDIES
22. 491 patients
from a multimodality dataset of medical images.
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2022). BreastScreening-AI: Evaluating Medical Intelligent Agents for Human-AI
Interactions. Artificial Intelligence in Medicine, 127, 102285.
DATASET
24. 98%
… of clinicians do understand what the system is thinking.
USER EXPECTATIONS
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2022). BreastScreening-AI: Evaluating Medical Intelligent Agents for Human-AI
Interactions. Artificial Intelligence in Medicine, 127, 102285.
25. 93%
… trust in the system's capability.
USER EXPECTATIONS
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2022). BreastScreening-AI: Evaluating Medical Intelligent Agents for Human-AI
Interactions. Artificial Intelligence in Medicine, 127, 102285.
26. INTER-VARIABILITY vs INTRA-VARIABILITY
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2022). BreastScreening-AI: Evaluating Medical Intelligent Agents for Human-AI
Interactions. Artificial Intelligence in Medicine, 127, 102285.
27. CLINICAL IMPACT: Clinician-AI vs Clinician-Only
Calisto, F. M., Santiago, C., Nunes, N., & Nascimento, J. C. (2022). BreastScreening-AI: Evaluating Medical Intelligent Agents for Human-AI
Interactions. Artificial Intelligence in Medicine, 127, 102285.
29. AMBIGUITY-AWARE (Future Work)
Schaekermann, M., Beaton, G., Sanoubari, E., Lim, A., Larson, K. and Law, E., 2020, April. “Ambiguity-aware AI Assistants for Medical Data
Analysis”. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-14).
» What is the potential benefit of communicating ambiguity in AI outputs?
» How can ambiguity-aware AI be implemented for breast cancer diagnosis?
» How can techniques be combined to develop AI assistants capable of recognizing and explaining ambiguous cases?
30. CONCLUSION
In this research work:
1. We identified how the different suggestion levels (i.e., merely suggesting vs. imposing the AI recommendations) impact the radiologists' decision-making process;
2. We are developing a system that enables clinicians to accept or reject the AI breast analysis, with adaptive communication depending on the level of medical professional experience, such as novice or expert clinicians;
3. Likewise, we will propose a series of recommendations for a human-
centered approach around personalizable and customizable AI in breast
cancer diagnosis;
32. “
Into whatsoever houses I enter, I will enter to
help the sick, and I will abstain from all
intentional wrong-doing and harm, especially
from abusing the bodies of man or woman,
bond or free.
- Hippocratic Oath
Hello, my name is Francisco Calisto, and I am presenting our work in progress, titled “Personalizing and Customizing AI Explanations for Clinicians”, applied to breast cancer.
… I am a researcher from Portugal and currently a visiting scholar at the Human-Computer Interaction Institute at CMU.
Working alongside a great team, from Human-Computer Interaction researchers to Machine Learning researchers and Radiologists…
… we are trying to overcome some of the challenges of breast cancer.
Every year, cancer causes almost 10 million deaths worldwide, and breast cancer is among the leading causes of cancer death in women. These women can be our grandmothers, wives, or even our daughters.
Screening is not perfect: radiologists yield about 10% false-negative results among women suspected of having breast cancer, but…
… they also yield about 40% false-positive results, which can lead women to unnecessary biopsies.
The cancer burden can be reduced through early detection of cancer, as well as management of patients who develop cancer.
For early detection, breast cancer is usually diagnosed with several medical imaging modalities.
Across this multimodality workflow, clinicians follow an iterative loop to inspect lesions.
From here, clinicians use BI-RADS to classify lesion severity. However, severity classification is not trivial, and consensus is not always achieved.
AI can help on this…
… but the “black box” nature of AI introduces a large element of opacity into decision-making. This problem can be mitigated by introducing eXplainable AI (XAI) techniques.
Prior work in behavioral sciences for medical relation extraction substantiates the disagreement relations between inter-variability and intra-variability.
Disagreement relations are addressed as a function of three phenomena: (1) differences among clinical professionals, such as the medical background and bias; (2) heterogeneous characteristics of the dataset, such as noisy and heterogeneous modalities; and (3) the nature of the diagnostic guidelines, with subjective and ambiguous classifications of the BI-RADS. In fact, clinical experts often rely on complex viewing technology to inspect medical data.
In medical imaging, doctors need to trust that AI is being used safely, and for their benefit, during decision-making. Intrinsically, humans feel the need to understand how decisions are made.
Specifically, decision-making in the radiology reading room is performed by professionals with different levels of medical experience and different clinical profile characteristics.
We expect that strategies such as adapting the agent's communication, by personalizing and customizing the AI explanations for each clinician, will promote unbiased behavior in each category of professional experience, reducing the rates of false positives and false negatives during diagnosis…
… which will in turn influence clinicians' burnout rates.
Since AI models are developed and measured through a pipeline with several characteristics, we studied the deliberation process of clinicians during diagnosis, with and without the introduction of an intelligent agent, so that we can understand clinicians' levels of satisfaction and acceptance, as well as their tolerance of the model's accuracy.
Currently, our user studies involve 52 clinicians from nine public and private medical institutions in Portugal.
We are training our AI models with 491 patients from a multimodality dataset of medical images, including mammography, ultrasound, and MRI.
Our demo hour…
While using our agent, 98% of the 52 clinicians reported that they understand what the AI system is thinking...
… and 93% trust in the system's capability.
We also divided our participant results into groups of inter-variability and intra-variability. Not only was inter-variability reduced between groups of patients, but intra-variability was also reduced within the groups of interns, juniors, middles, and seniors…
… and we could also improve the final clinician performance thanks to the clinician-AI diagnostic workflow.
With these studies, we could understand the behavioral characteristics of each medical group while diagnosing the different groups of patients. We now have information about to whom and when the intelligent agent should provide a suggestion, and how an explanation will influence interpretability and decision-making.
Indeed, as humans, radiologists are exposed to fatigue, and their performance is related to environmental factors such as the number of working hours. As a future direction, we will study and implement a system that presents the hardest patient cases at the beginning of the clinician's shift, while the most trivial cases are diagnosed at the end. For that, we will need to study and implement an ambiguity-aware AI system that classifies all cases in our dataset as more or less ambiguous to diagnose. This idea will transform the way radiologists screen and diagnose breast cancer.
To conclude, we identified how different suggestion levels impact the radiologists' decision-making process. We are developing a system that enables clinicians to accept or reject the AI breast analysis by adapting the communication, through personalization and customization, to each level of medical professional experience. And we propose a series of recommendations and future directions for the development of intelligent agents in breast cancer diagnosis.
During this presentation, about 5 women died from breast cancer, while another 10 underwent unnecessary biopsies in Portugal, making our NHS lose 50 thousand euros in just 10 minutes…
I humbly ask for the help of one of the most noble professions, healthcare professionals, so that we can cross this path together!