Excited to share insights from my recent presentation on leveraging machine learning in the diagnosis of rare diseases! 🌐💡 Exploring innovative approaches that could revolutionize early detection and personalized treatment. What are your thoughts on the intersection of AI and healthcare? Comment below! #MachineLearning #RareDiseases #HealthTech #Innovation 🔬🤖
ML in rare diseases
1. Machine learning in Rare diseases
Qaisar Anoosha
Ratti Matteo
Grazioso Francesca
Mascherpa Margaret
Molinari Silvia
2. Introduction of rare diseases
✓ According to the Orphan Drug Act [4] of the USA, diseases or conditions that affect fewer than 200,000 people in the USA are considered rare diseases.
✓ The European Union defines a disease as rare when it affects fewer than 1 in 2,000 people.
✓ The systematic review identified and included 211 human studies from 32 countries, covering a 10-year timeframe to provide a comprehensive overview. [1]
✓ ML applications covered 74 distinct rare diseases. [1]
✓ Rare diseases are conditions affecting a small percentage of the population.
✓ Most rare diseases have no available treatment.
3. Introduction Of Machine Learning
Machine learning:
• ML identifies patterns and informative groups within data, enabling insights that may not be
apparent through manual analysis.
• Predictive models in ML make it possible to forecast future trends and outcomes based on the
learned patterns.
Challenges in Machine Learning:
• Overfitting, bias, and model interpretability must be addressed, and they are particularly challenging with
small datasets in rare diseases.
• Small-sample datasets lack statistical power, leading to misinterpretation and unstable ML
performance.
4. ML applications in rare diseases
• Identifying Rare Disease Patients: Analysing high-dimensional data, including electronic health
records and genetics, to predict rare disease presence by correlating features with patient
phenotypes.
• Drug Discovery: ML identifies potential drug candidates using unsupervised/supervised algorithms
on genetic and molecular data. Knowledge graphs and genomic data enhance target identification,
with database analysis revealing therapeutic candidates.
• Clinical Trial Design Improvement: Using unsupervised ML for refined study design and patient
subgroup identification, and supervised ML to predict drug response, improving overall trial efficiency.
• Patient Prognosis Prediction: Addressing gaps in rare disease understanding, supervised ML identifies
biomarkers and clinical features for predicting adverse outcomes. This facilitates patient stratification
for early, aggressive interventions.
5. Traditional methods of ML used in rare diseases
• Support Vector Machines (SVM): Algorithm effective in high-dimensional spaces. Accurately classifies cases and aids precise diagnoses, especially in complex datasets.
• Clustering Algorithms (e.g., K-Means): Group similar data points. Identify subgroups for personalized treatments, aiding targeted analysis.
• Neural Networks: Computational models that learn complex patterns from data, aiding understanding and outcome prediction in rare disease complexities.
• Bayesian Methods: Apply Bayes' theorem. Enhance decision-making with limited data, crucial for effective uncertainty management.
• Principal Component Analysis (PCA): Reduces dimensionality and helps prevent overfitting. Crucial for analysing limited datasets and identifying key features.
6. Supervised and Unsupervised learning
Supervised Learning :
• ML algorithms identify patterns from labeled data, often creating lower-
dimensional representations.
• Useful for classifying rare disease patients into subtypes based on molecular
profiling.
Unsupervised Learning :
• Learns patterns or features from unlabeled data.
• Applied to gene expression data to identify groups with similar molecular states.
7. How ML works – Supervised learning
Training set
Evaluation set
Test/Held-out set
Cross validation
Fit / generalizability
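The terms on this slide (training set, evaluation set, held-out test set, cross-validation) can be illustrated with a minimal sketch. It uses only the Python standard library, and the cohort size, fold count and function names are assumptions chosen for illustration, not part of the presentation. In practice a library such as scikit-learn would normally handle the splitting.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def train_eval_splits(n_samples, k, seed=0):
    """Yield (train, evaluation) index lists, one pair per fold."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, eval_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, eval_idx

# Hypothetical cohort of 20 patients: set aside 4 as the held-out test set
# first, then cross-validate on the remaining 16.
all_idx = list(range(20))
random.Random(42).shuffle(all_idx)
test_idx, dev_idx = all_idx[:4], all_idx[4:]

splits = list(train_eval_splits(len(dev_idx), k=4))
for train, evaluation in splits:
    # These are positions within dev_idx; the test set is never touched here.
    print(len(train), "train /", len(evaluation), "evaluation")
```

The held-out test set is used once, after model selection, to estimate generalizability. With very small rare disease cohorts, every sample appears in an evaluation fold exactly once, which makes fuller use of limited data than a single split.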
8. How ML works – Supervised learning
• Supervised ML models can be trained on electronic health records,
genetic data or medical images to identify potential new patients with
a rare disease.
• Supervised ML approaches can be used to predict drug response in
patients with rare diseases.
• Supervised ML algorithms can be useful in identifying factors
contributing to the risk of adverse outcomes or progression to
advanced disease in patients with rare diseases.
9. How ML works – Unsupervised learning
Minimization of within cluster variance
Maximization of between clusters variance
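The two objectives above are linked: for a fixed dataset, the total sum of squares equals the within-cluster plus the between-cluster sum of squares, so minimizing one maximizes the other. The sketch below checks this on made-up 2-D points. The data and function names are illustrative assumptions only.

```python
def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(clusters):
    """Return (within, between) sums of squares for a list of clusters."""
    all_points = [p for c in clusters for p in c]
    grand = mean(all_points)
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    between = sum(len(c) * sq_dist(mean(c), grand) for c in clusters)
    return within, between

# Toy points forming two visibly separate groups (made-up values).
a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]

good_within, good_between = scatter([a, b])
# A poor assignment that mixes the two groups.
bad_within, bad_between = scatter([a[:2] + b[:2], a[2:] + b[2:]])
total = sum(sq_dist(p, mean(a + b)) for p in a + b)

print(good_within, good_between, total)
print(bad_within, bad_between, total)
```

The well-separated assignment has a small within-cluster value and a large between-cluster value, and the mixed assignment shows the reverse. Both pairs sum to the same total.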
10. How ML works – Unsupervised learning
• ML unsupervised approaches can be useful in the design of clinical
trials (e.g., to identify subgroups of patients who are more likely to
respond well to a particular treatment).
• Unsupervised learning can identify new subtypes of rare diseases
using molecular and genetic data.
• Unsupervised techniques can find hypothetical biological patterns or
new therapeutic targets for diseases.
11. Limitations of ML in rare diseases
• In rare diseases, the feature space is often much larger than the sample space, leading to poor generalizability of the ML model.
• The classification of rare diseases often evolves over time, producing labels that are not comparable (label noise) and decreasing the accuracy of the ML model.
12. Issues in constructing datasets for ML
Issues:
• Small sample size in relation to features
• Low signal-to-noise ratio
• Technical variability when combining datasets
• Insufficient representation of variability or class imbalance
Associated data characteristics:
• High data missingness (sparsity)
• More dissimilarity between samples (variance)
• Increased redundancy (multicollinearity)
• Incomparable labels for evolving classification
• Heterogeneity of both genotypes and phenotypes
• Poor harmonization
Result: poor performance of the ML model (decreased accuracy, limited generalizability)
13. General solutions
Approaches that address or better tolerate the limitations of rare disease data:
• Small sample data: combining datasets, knowledge graph (KG), transfer learning
• Low signal-to-noise ratio: regularization (ridge, lasso, ...)
• (Introduction of) technical variability: increase data quality
• Insufficient representation of variability or class imbalance: random forests, cascade learning, class re-balancing techniques
Combining multiple strategies
14. Solutions – Small sample size
Combine smaller datasets into a larger composite dataset
• Multiple small, rare disease datasets*
• PCA of the combined datasets to verify proper integration of samples in the larger dataset
*The color of the samples suggests classes or groups; the shape indicates the dataset of origin.
15. Solutions – Small sample size
Explore rare disease data alongside other existing knowledge
Different data types
A knowledge graph (KG) is a network representation of human knowledge.
It includes nodes and edges (links) and provides a framework for data
integration, unification, analytics and sharing.
Variety of possible applications
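As a minimal sketch of the nodes-and-edges idea, a knowledge graph can be stored as (subject, relation, object) triples and queried across hops. Every entity and relation name below is a placeholder invented for illustration, not a real biomedical fact, and the helper functions are hypothetical.

```python
from collections import defaultdict

# (subject, relation, object) triples. All names are placeholders.
triples = [
    ("DiseaseX", "associated_with", "GeneA"),
    ("DiseaseY", "associated_with", "GeneA"),
    ("DrugB", "targets", "GeneA"),
    ("DrugC", "targets", "GeneZ"),
    ("GeneA", "part_of", "PathwayP"),
]

def build_graph(triples):
    """Index edges by node, storing a reverse edge for each relation."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append(("inverse_" + rel, subj))
    return graph

def drugs_for(disease, graph):
    """Two-hop query: disease -> associated gene -> drug targeting that gene."""
    genes = [node for rel, node in graph[disease] if rel == "associated_with"]
    return sorted({drug for gene in genes
                   for rel, drug in graph[gene] if rel == "inverse_targets"})

graph = build_graph(triples)
candidates = drugs_for("DiseaseX", graph)
print(candidates)
```

Because DiseaseX and DiseaseY share a gene node, knowledge attached to one becomes reachable from the other. This is the sense in which a graph lets sparse rare disease data be explored alongside existing knowledge.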
16. Solutions – Small sample size
Builds on prior knowledge and large volumes of related data
Transfer learning is an ML approach where a model developed for one task is reused as the starting point for a model
on a second task. It can be supervised or unsupervised. For example, a model pretrained on natural images from
the ImageNet* dataset can potentially be used to classify medical images.
• Features representing samples of a large dataset
• Applied to a small cell line dataset, the representations are incomplete and correlate poorly with clinical or drug data.
*Benchmark in image classification and object detection
17. Solutions – Low signal-to-noise ratio
The issue of capturing relevant signals in a model
> NOISE
• Insufficient variable representation
• "Sparse" dataset
• Label ambiguity (overlap)
> SIGNAL
• Variables fully represented
• Complete dataset
• Clear label definition
18. Solutions – Regularization
The presence of label noise and sparsity in the data can lead to poor generalizability or overfitting.
Regularization approaches help a model generalize to new, unseen data.
19. Solutions – Regularization
Examples of ML methods with regularization include ridge regression, lasso regression
and elastic net regression.
Example of application: selecting a few informative genes as features to include in
disease-classifying models.
From Founta et al. Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning.
Mol Med. 2023 Jan 24;29(1):12.
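To illustrate why lasso in particular is used to select a few informative features, the sketch below compares closed-form single-feature slopes for ordinary least squares, ridge and lasso. It is a toy under stated assumptions, not the method of the cited study. The data are made up, there is no intercept, and the two features are orthogonal by construction, so fitting them one at a time matches a joint fit.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_one_feature(x, y, lam):
    """Closed-form slopes for one feature, no intercept.

    ridge minimises 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2
    lasso minimises 0.5 * sum((y - b*x)^2) + lam * |b|
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return {
        "ols": sxy / sxx,
        "ridge": sxy / (sxx + lam),
        "lasso": soft_threshold(sxy, lam) / sxx,
    }

# Made-up toy data: x_signal tracks y closely, x_noise barely does.
y = [-4.0, -2.1, 0.1, 2.2, 3.8]
x_signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
x_noise = [1.0, -2.0, 0.0, 2.0, -1.0]

signal_fit = fit_one_feature(x_signal, y, lam=2.0)
noise_fit = fit_one_feature(x_noise, y, lam=2.0)
print("signal:", signal_fit)
print("noise: ", noise_fit)
```

Ridge shrinks both slopes but leaves the noise feature with a small non-zero weight, whereas lasso sets it exactly to zero and keeps the informative feature. Elastic net combines the two penalties.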
20. Solutions – data harmonization
Biological variability vs technical variability
• Normalization of raw values
• Batch correction methods
• Reprocessing
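As a deliberately simple illustration of batch correction, the sketch below removes a per-batch offset by subtracting each batch's mean from its samples. This is an assumption-laden toy: the measurements and site names are invented, and established batch correction methods are considerably more elaborate.

```python
from statistics import mean

def center_per_batch(values, batches):
    """Subtract each batch's mean from its samples (location-only correction)."""
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] for v, b in zip(values, batches)]

# Hypothetical measurements of one gene from two sites; site_B reads
# systematically higher, which is technical rather than biological.
values = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
batches = ["site_A", "site_A", "site_A", "site_B", "site_B", "site_B"]

corrected = center_per_batch(values, batches)
print([round(v, 2) for v in corrected])
```

Centering like this is only safe when batches have comparable biological composition. If one site contributes mostly cases and another mostly controls, the same step would also remove the biological signal.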
21. Class imbalance
Failure in fully capturing the sample variability --> Poor generalizability
Solutions
• Random forests
• Cascade learning
• Class re-balancing techniques
22. Class imbalance
Solutions
• Random forests
• Cascade learning
• Class re-balancing techniques
Class re-balancing techniques include:
• Inverse sampling probability weighting
• Inverse class frequency weighting
• Oversampling of rare classes
• Uniformly random undersampling of the majority class
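Three of the re-balancing techniques listed above can be sketched in a few lines of standard-library Python. The label counts and function names are assumptions for illustration, and the weighting formula shown is one common convention rather than the only one.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen rare-class samples until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, c in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - c):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

def random_undersample(samples, labels, seed=0):
    """Keep a uniformly random subset of each class, sized to the rarest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        idx = [i for i, l in enumerate(labels) if l == cls]
        kept.extend(rng.sample(idx, target))
    kept.sort()
    return [samples[i] for i in kept], [labels[i] for i in kept]

# Hypothetical cohort: 18 common-class patients and 2 rare-class patients.
samples = list(range(20))
labels = ["common"] * 18 + ["rare"] * 2

weights = inverse_frequency_weights(labels)
over_samples, over_labels = random_oversample(samples, labels)
under_samples, under_labels = random_undersample(samples, labels)
print(weights, Counter(over_labels), Counter(under_labels))
```

Resampling should be applied to the training portion only. Duplicated rare-class samples that leak into an evaluation or test set make performance look better than it is.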
23. Conclusions
Collectively, rare diseases are not ‘rare’
•1:2000
•6000 known
•10% of people worldwide
•Most affect children from birth
•Responsible for 35% of child deaths
•Includes all childhood cancers
•95% no available treatment
Machine learning for extracting disease-relevant patterns from high-dimensional datasets
24. Take home messages, present and future
State of the art
• Many techniques to bind rare data from different sources
• Most ML methods for rare diseases are for classification tasks
Future perspective
Developing comprehensive phenotypic-genotypic databases:
• Domain expert collaboration
• Binding biobank projects and patient registries
• Federated learning methods (electronic healthcare records)
Developing methods investigating biological variability:
• Focus on explainability of the model
• Representation learning – regularization methods
• Robust error analysis
• Reliable methods finding anchors
26. References
• Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. The use of machine
learning in rare diseases: a scoping review. Orphanet J Rare Dis. 2020
Jun 9;15(1):145. doi: 10.1186/s13023-020-01424-6. PMID: 32517778;
PMCID: PMC7285453.