The document summarizes a seminar presentation given by Maarten van Smeden on algorithm based medicine and machine learning. Some key points made in the presentation include: the terminology of artificial intelligence often refers to machine learning or algorithms in medical research; examples are given of areas where machine learning has performed well, such as detecting diabetic retinopathy and lymph node metastases; examples are also provided of where machine learning has done poorly, such as predicting recidivism and mortality; and the sources of prediction error from machine learning models are discussed.
Algorithm based medicine: old statistics wine in new machine learning bottles?
1. Maarten van Smeden, PhD
Interdisciplinary Medical & Health Seminar, Ghent University
30 September 2021
3. Ghent, 30 September 2021 Twitter: @MaartenvSmeden
Terminology
In medical research, “artificial intelligence” usually
just means “machine learning” or “algorithm”
9.
Tech company business model
10.
Tech company business model
https://bit.ly/2HSp8X5; https://bit.ly/2Z0Pfop; https://bit.ly/2KIcpHG; https://bit.ly/33IJhr9
11.
Other success stories
https://go.nature.com/2VG2hS7; https://bbc.in/2Z1drXQ; https://bit.ly/2TAfRIP
12.
IBM Watson winning Jeopardy! (2011)
https://bbc.in/2TMvV8I
13.
IBM Watson for oncology
https://bit.ly/2LxiWGj
16.
“As of today, we have deployed the system in 16 hospitals, and it is performing over 1,300 screenings per day”
medRxiv preprint only, 23 March 2020, doi.org/10.1101/2020.03.19.20039354
27.
“Everything is an ML method”
https://bit.ly/2lEVn33
28.
“ML methods come from computer science”
https://bit.ly/2zhbwPv; https://stanford.io/2TVp1xK; https://stanford.io/2ZfED0k
Name | Leo Breiman | Jerome H Friedman | Trevor Hastie
Known for | CART, random forest | Gradient boosting | Elements of statistical learning
Education | Physics/Math | Physics | Statistics
Job title | Professor of Statistics | Professor of Statistics | Professor of Statistics
29.
“ML methods for prediction, statistics for explaining”
ML and causal inference, small selection (1):
• Superlearner (e.g. van der Laan)
• High dimensional propensity scores (e.g. Schneeweiss)
• The Book of Why (Pearl)
(1) See further: Kreif and Diaz Ordaz; https://bit.ly/2m1eYdK
30.
Two cultures
Breiman, Stat Sci, 2001, DOI: 10.1214/ss/1009213726
31.
Language
Statistics | Machine learning
Covariates | Features
Outcome variable | Target
Model | Network, graphs
Parameters | Weights
Model for discrete var. | Classifier
Model for continuous var. | Regression
Log-likelihood | Cross-entropy loss
Multinomial regression | Softmax
Measurement error | Noise
Subject/observation | Sample/instance
Dummy coding | One-hot encoding
Measurement invariance | Concept drift
Prediction | Supervised learning
Latent variable modeling | Unsupervised learning
Fitting | Learning
Prediction error | Error
Sensitivity | Recall
Positive predictive value | Precision
Contingency table | Confusion matrix
Measurement error model | Noise-aware ML
Structural equation model | Gaussian Bayesian network
Gold standard | Ground truth
Derivation–validation | Training–test
Experiment | A/B test
Adapted from Daniel Oberski: https://bit.ly/2YN12Xf and Robert Tibshirani: https://stanford.io/2zqEGfr
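Several of the term pairs listed for this slide are the same formula under two names. A minimal Python sketch, using invented counts (not taken from any study cited in the talk) and a hypothetical helper `diagnostic_metrics`, shows that sensitivity/recall and positive predictive value/precision are each read off the same contingency table (confusion matrix):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summaries of a 2x2 contingency table (confusion matrix).

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives.
    """
    return {
        # Statistics: sensitivity. Machine learning: recall.
        "sensitivity (recall)": tp / (tp + fn),
        # Statistics: positive predictive value. Machine learning: precision.
        "positive predictive value (precision)": tp / (tp + fp),
        # Specificity, reported alongside sensitivity in diagnostic studies.
        "specificity": tn / (tn + fp),
    }


# Invented counts, purely for illustration.
metrics = diagnostic_metrics(tp=80, fp=30, fn=20, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only the labels differ between the two vocabularies; the arithmetic is identical.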
32.
Robert Tibshirani: https://stanford.io/2zqEGfr
Machine learning: large grant = $1,000,000
Statistics: large grant = $50,000
33.
ML refers to a culture, not to methods
Distinguishing between statistics and machine learning
• Substantial overlap in the methods used by both cultures
• Substantial overlap in analysis goals
• Attempts to separate the two frequently result in disagreement
Pragmatic approach:
I’ll use “ML” to refer to models roughly outside of the traditional regression types of analysis: decision trees (and descendants), SVMs, neural networks (including deep learning), boosting, etc.
34.
Beam & Kohane, JAMA, 2018, doi: 10.1001/jama.2017.18391
37.
Example: retinal disease
Gulshan et al, JAMA, 2016, doi: 10.1001/jama.2016.17216; Picture retinopathy: https://bit.ly/2kB3X2w
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > .90
• Estimated from training data
38.
Example: lymph node metastases
Bejnordi et al, JAMA, 2017, doi: 10.1001/jama.2017.14585. See our letter to the editor for a critical discussion: https://bit.ly/2kcYS0e
Deep learning competition
But:
• 390 teams signed up, 23 submitted
• “Only” 270 images for training
• Test AUC range: 0.56 to 0.99
41.
Primary outcome: time to TB treatment.
Time to TB treatment was reduced from a median of 11 days under standard of care to 1 day with computer-aided X-ray screening
47.
Skin cancer and rulers
Esteva et al., Nature, 2017, DOI: 10.1038/nature21056; https://bit.ly/2lE0vV0
48.
49.
Predicting mortality – the conclusion
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344
50.
Predicting mortality – the results
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344
51.
Predicting mortality – the media
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344; https://bit.ly/2Q6H41R; https://bit.ly/2m3RLrn
53.
Systematic review clinical prediction models
Christodoulou et al. Journal of Clinical Epidemiology, 2019, doi: 10.1016/j.jclinepi.2019.02.004
54.
Sources of prediction error
$Y = f(x) + \varepsilon$ (with $E(\varepsilon) = 0$, $\mathrm{var}(\varepsilon) = \sigma^2$, values in $x$ are not random)
For a model $k$ the expected test prediction error is:
$\sigma^2 + \mathrm{bias}^2(\hat{f}_k(x)) + \mathrm{var}(\hat{f}_k(x))$
• $\sigma^2$: irreducible error ≈ what we don’t model
• $\mathrm{bias}^2(\hat{f}_k(x)) + \mathrm{var}(\hat{f}_k(x))$: mean squared prediction error ≈ how we model
See equation 2.46 in Hastie et al., The Elements of Statistical Learning, https://stanford.io/2voWjra
[Figure panels: overfitting, underfitting, “just right”]
In words, two main components for error in predictions are:
• Mean squared prediction error
  • Under control of the modeler
• Irreducible error
  • Not under direct control of the modeler
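The decomposition on this slide can be checked numerically. The sketch below is an illustration under invented assumptions, not material from the talk: the true function is taken to be sin(πx), the noise standard deviation is 0.3, the design points are fixed (matching "values in x are not random"), and the polynomial degree plays the role of the model index k. For each degree it estimates bias² and variance over repeated training sets and compares σ² + bias² + variance with a directly simulated test error.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(2021)
sigma = 0.3                      # assumed noise sd; irreducible error is sigma**2
x = np.linspace(-1.0, 1.0, 30)   # fixed design points (x is not random)
n_rep = 2000                     # number of simulated training sets


def f(x):
    """Assumed true regression function, chosen only for illustration."""
    return np.sin(np.pi * x)


def decompose(degree):
    """Return (bias^2, variance, simulated test error) for one polynomial degree."""
    preds = np.empty((n_rep, x.size))
    test_err = np.empty(n_rep)
    for r in range(n_rep):
        y_train = f(x) + rng.normal(0.0, sigma, x.size)
        coefs = P.polyfit(x, y_train, degree)
        preds[r] = P.polyval(x, coefs)
        # New outcomes at the same x values act as the test data.
        y_test = f(x) + rng.normal(0.0, sigma, x.size)
        test_err[r] = np.mean((y_test - preds[r]) ** 2)
    bias2 = np.mean((preds.mean(axis=0) - f(x)) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var, test_err.mean()


results = {k: decompose(k) for k in (1, 3, 9)}
for k, (bias2, var, err) in results.items():
    total = sigma**2 + bias2 + var
    print(f"degree {k}: bias^2={bias2:.4f} var={var:.4f} "
          f"sigma^2+bias^2+var={total:.4f} simulated test error={err:.4f}")
```

In this setup the low-degree fit is dominated by bias (underfitting) and the high-degree fit by variance (overfitting), while σ² is the same for every degree: the modeler can trade bias against variance but cannot touch the noise term by changing the model alone.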
59.
Irreducible error is often large
• Health and the lack thereof are complex to measure (‘no gold standard’)
• Predictors of diseases are often measured imperfectly and only partly
• We often don’t know all the causal mechanisms at play
• much easier to predict if you know the causal mechanisms!
• “Prediction is very difficult, especially if it’s about the future!”
(Niels Bohr might have said this first)
Courtesy Cecile Janssens: https://bit.ly/2Jf5ft6
60.
What can we do to reduce “irreducible” error?
• Changing the information
• Prognostication by text mining electronic health records
• e.g. predicting life expectancy
https://bit.ly/2k8Ao8e
• Analyzing social media posts
• e.g. pharmacovigilance, adverse events monitoring via Twitter posts
https://bit.ly/2m0KKrg
• Speech signal processing
• e.g. Parkinson’s disease
https://bit.ly/2v3ZdHR
• Medical imaging
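Why "changing the information" can lower an error that is irreducible for a given set of predictors can be illustrated with a small simulation. This is a sketch under invented assumptions: the outcome depends linearly on two predictors, `x1` and `x2`, where `x2` stands in for a hypothetical new data source (such as text, speech or imaging); the coefficients and noise level are arbitrary, and `residual_variance` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(60)
n = 100_000

x1 = rng.normal(size=n)               # information already available
x2 = rng.normal(size=n)               # hypothetical new information source
eps = rng.normal(scale=0.5, size=n)   # noise that neither predictor explains
y = 1.0 * x1 + 2.0 * x2 + eps


def residual_variance(predictors, y):
    """Mean squared residual of a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)


# With x1 only, everything driven by x2 looks like noise:
# theoretical error floor = 2.0**2 * var(x2) + 0.5**2 = 4.25.
err_x1_only = residual_variance([x1], y)

# Once x2 is measured, the floor drops to the remaining noise: 0.5**2 = 0.25.
err_both = residual_variance([x1, x2], y)

print(f"error floor with x1 only:    {err_x1_only:.3f}")
print(f"error floor with x1 and x2:  {err_both:.3f}")
```

No modeling choice applied to `x1` alone could recover the part of the outcome carried by `x2`; only measuring new information moves the floor.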
63.
Flexible algorithms are data hungry
From slide deck Ben van Calster: https://bit.ly/38Aqmjs
64.
Flexible algorithms are energy hungry
The costs of running the Transformer algorithm (cloud computing) are estimated at 1 to 3 million dollars
https://bit.ly/33Dj38X
66.
Algorithm based medicine
• Algorithms are high maintenance
• Developed models need repeated testing and updating to remain useful over time and place
• Many new barriers: black box proprietary algorithms, computing costs
• Regulation and quality control of algorithms
• Algorithms need testing, preferably in experimental fashion
67.
https://twitter.com/DrHughHarvey/status/1230218991026819077
68.
Old statistics wine in new machine learning bottles?
Lots of…
• Hype
• Rebranding traditional analysis as ML and AI
• Methodological reinventions
• Traditional issues such as low sample size, lack of adequate validation, poor reporting
Also, real developments in…
• Methods and architectures, allowing for modeling (unstructured) data that could previously not easily be used
• Software
• Computing power
• Clinical trials showing benefit of AI assistance
69.
Pipeline of algorithmic medicine failure