Machine learning in rare diseases
Qaisar Anoosha
Ratti Matteo
Grazioso Francesca
Mascherpa Margaret
Molinari Silvia
Introduction to rare diseases
✓ According to the Orphan Drug Act [4] of the USA, diseases or conditions that affect fewer than 200,000 people in the USA are considered rare diseases.
✓ The European Union defines a disease as rare when it affects fewer than 1 in 2,000 people.
✓ A scoping review [1] identified and included 211 human studies from 32 countries, covering a 10-year timeframe to provide a comprehensive overview.
✓ The ML applications in these studies covered 74 distinct rare diseases [1].
✓ Rare diseases are conditions affecting a small percentage of the population.
✓ Most rare diseases have no available treatment.
Introduction to machine learning
Machine learning:
• ML identifies patterns and informative groups within data, enabling insights that may not be
apparent through manual analysis.
• Predictive models in ML make it possible to forecast future trends and outcomes based on the
learned patterns.
Challenges in Machine Learning:
• Overfitting, bias and model interpretability must be addressed; all three are particularly challenging with the small datasets typical of rare diseases.
• Small-sample datasets lack statistical power, leading to misinterpretation and unstable ML
performance.
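To make the point about statistical power concrete, the sketch below computes how uncertain an accuracy estimate is when it comes from only a few test patients. It assumes each test prediction is an independent Bernoulli trial and uses the normal (Wald) approximation, which is itself rough for very small samples; the 0.80 accuracy and the test-set sizes are illustrative values, not results from any study.

```python
import math


def accuracy_standard_error(accuracy: float, n_test: int) -> float:
    """Binomial standard error of an accuracy measured on n_test independent samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def approx_95ci(accuracy: float, n_test: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% interval, clipped to the range [0, 1]."""
    half_width = 1.96 * accuracy_standard_error(accuracy, n_test)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


if __name__ == "__main__":
    # Same observed accuracy, very different certainty depending on test-set size.
    for n in (20, 100, 1000):
        low, high = approx_95ci(0.80, n)
        print(f"n_test={n:5d}  accuracy=0.80  approx. 95% CI=({low:.3f}, {high:.3f})")
```

With 20 test samples the interval runs from roughly 0.62 to 0.98, versus roughly 0.78 to 0.82 with 1,000, which is one way small cohorts make reported ML performance unstable.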
ML applications in rare diseases
• Identifying rare disease patients: analysing high-dimensional data, including electronic health records and genetic data, to predict the presence of a rare disease by correlating features with patient phenotypes.
• Drug discovery: ML identifies potential drug candidates by applying unsupervised and supervised algorithms to genetic and molecular data. Knowledge graphs and genomic data enhance target identification, and database analysis can reveal therapeutic candidates.
• Clinical trial design improvement: unsupervised ML refines study design and identifies patient subgroups, while supervised ML predicts drug response, improving overall trial efficiency.
• Patient prognosis prediction: to address gaps in the understanding of rare diseases, supervised ML identifies biomarkers and clinical features that predict adverse outcomes. This enables patient stratification for early, aggressive interventions.
Traditional methods of ML used in rare diseases
• Support Vector Machines (SVM): a supervised algorithm that is effective in high-dimensional spaces. It classifies cases well and can support diagnosis, especially in complex datasets.
• Clustering algorithms (e.g., K-means): group similar data points. They identify patient subgroups for personalized treatment and targeted analysis.
• Neural networks: computational models that learn complex patterns from data, supporting understanding and outcome prediction in complex rare diseases.
• Bayesian methods: apply Bayes’ theorem to support decision-making with limited data, which is crucial for managing uncertainty effectively.
• Principal Component Analysis (PCA): reduces dimensionality, which helps limit overfitting. Useful for analysing limited datasets and identifying key features.
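Bayesian methods are described above as useful with limited data. A minimal sketch of why: in a Beta-Binomial model (one standard conjugate choice, assumed here for illustration), the posterior for a treatment response rate is available in closed form, and its variance shows directly how much uncertainty a small cohort leaves. The uniform Beta(1, 1) prior and the responder counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) distribution over a response rate."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


def update_beta(prior_alpha: float, prior_beta: float, successes: int, trials: int) -> BetaPosterior:
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return BetaPosterior(prior_alpha + successes, prior_beta + (trials - successes))


if __name__ == "__main__":
    small = update_beta(1.0, 1.0, successes=3, trials=5)    # hypothetical 3/5 responders
    large = update_beta(1.0, 1.0, successes=30, trials=50)  # hypothetical 30/50 responders
    for label, post in (("5 patients", small), ("50 patients", large)):
        print(f"{label}: posterior mean={post.mean:.3f}, sd={post.variance ** 0.5:.3f}")
```

The prior acts like a small number of pseudo-observations, so with five patients it still matters noticeably; with fifty, the data dominate and the posterior standard deviation shrinks.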
Supervised and unsupervised learning
Supervised learning:
• ML algorithms learn patterns from labeled data.
• Useful for classifying rare disease patients into subtypes based on molecular profiling.
Unsupervised learning:
• Learns patterns or features from unlabeled data, often creating lower-dimensional representations.
• Applied to gene expression data to identify groups of samples with similar molecular states.
How ML works – Supervised learning
[Figure: the data are split into a training set, an evaluation set and a test (held-out) set; cross-validation is used to assess model fit and generalizability.]
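The splitting scheme in the figure can be expressed with scikit-learn. This is a sketch under assumptions: the data are synthetic (`make_classification` stands in for a small, high-dimensional patient dataset), a linear SVM is an arbitrary model choice, the cross-validation folds play the role of the evaluation set, and a stratified 25% split serves as the held-out test set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small, high-dimensional dataset:
# 80 samples, 500 features, only 10 of which carry signal.
X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)

# 1) Hold out a test set that is never touched during model development.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2) Scaling lives inside the pipeline, so it is refit on each training fold (no leakage).
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

# 3) Stratified cross-validation on the development data estimates generalizability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=cv)

# 4) Final fit on all development data, then a single evaluation on the held-out set.
model.fit(X_dev, y_dev)
test_accuracy = model.score(X_test, y_test)

print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")
print(f"Held-out test accuracy: {test_accuracy:.2f}")
```

Stratification keeps the class proportions similar across folds, which matters when a rare subtype has only a handful of samples.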
How ML works – Supervised learning
• Supervised ML models can be trained on electronic health records,
genetic data or medical images to identify potential new patients with
a rare disease.
• Supervised ML approaches can be used to predict drug response in
patients with rare diseases.
• Supervised ML algorithms can be useful in identifying factors
contributing to the risk of adverse outcomes or progression to
advanced disease in patients with rare diseases.
How ML works – Unsupervised learning
Clustering seeks to minimize the variance within clusters and maximize the variance between clusters.
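The two clustering objectives are linked: for any partition of a dataset, the total sum of squares equals the within-cluster plus the between-cluster sum of squares, so minimizing one maximizes the other. The sketch below checks this on synthetic blobs clustered with scikit-learn's `KMeans`; the data and the choice of three clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 90 samples, 5 features, 3 underlying groups (an assumption for illustration).
X, _ = make_blobs(n_samples=90, centers=3, n_features=5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

grand_mean = X.mean(axis=0)
total_ss = float(((X - grand_mean) ** 2).sum())

within_ss = 0.0   # spread of samples around their own cluster mean
between_ss = 0.0  # spread of cluster means around the grand mean, weighted by cluster size
for k in np.unique(labels):
    members = X[labels == k]
    centroid = members.mean(axis=0)
    within_ss += float(((members - centroid) ** 2).sum())
    between_ss += len(members) * float(((centroid - grand_mean) ** 2).sum())

print(f"total={total_ss:.1f}  within={within_ss:.1f}  between={between_ss:.1f}")
print(f"share of variance explained by clusters: {between_ss / total_ss:.2%}")
```

K-means directly minimizes the within-cluster term (scikit-learn exposes it as `inertia_`); the between-cluster term follows from the identity.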
How ML works – Unsupervised learning
• Unsupervised ML approaches can be useful in the design of clinical trials (e.g., to identify subgroups of patients who are more likely to respond well to a particular treatment).
• Unsupervised learning can identify new subtypes of rare diseases using molecular and genetic data.
• Unsupervised techniques can find hypothetical biological patterns or new therapeutic targets for diseases.
Limitations of ML in rare diseases
• With rare diseases, the feature space is often much larger than the sample space → poor generalizability of the ML model.
• The classification of rare diseases often evolves over time → non-comparable labels (label noise) → decreased accuracy of the ML model.
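A small sketch of why a feature space much larger than the sample space hurts generalizability: with 200 features and 20 samples, ordinary least squares can fit pure noise exactly, yet the fitted model has no predictive value on new samples. The data are random and hypothetical; in this underdetermined case NumPy's `lstsq` returns the minimum-norm exact solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 200  # far more features than samples

# Hypothetical data with NO real relationship between X and y.
X_train = rng.normal(size=(n_samples, n_features))
y_train = rng.normal(size=n_samples)

# Underdetermined least squares: lstsq returns the minimum-norm exact fit.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_max_error = float(np.abs(X_train @ coef - y_train).max())

# Fresh samples from the same (signal-free) distribution.
X_new = rng.normal(size=(1000, n_features))
y_new = rng.normal(size=1000)
new_mse = float(np.mean((X_new @ coef - y_new) ** 2))
baseline_mse = float(np.mean((y_new - y_train.mean()) ** 2))  # always predict the training mean

print(f"max training error: {train_max_error:.2e}")  # essentially zero: the noise is memorized
print(f"MSE on new samples: {new_mse:.2f} vs. mean-only baseline: {baseline_mse:.2f}")
```

A perfect training fit combined with test error no better than a trivial baseline is the classic signature of overfitting.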
Issues in constructing datasets for ML
• Small sample size relative to the number of features
• Low signal-to-noise ratio
• Technical variability when combining datasets
• Insufficient representation of variability, or class imbalance
These issues lead to:
• High data missingness (sparsity)
• More dissimilarity between samples (variance)
• Increased redundancy (multicollinearity)
• Non-comparable labels as classifications evolve
• Heterogeneity of both genotypes and phenotypes
• Poor harmonization
Result: poor performance of the ML model (decreased accuracy, limited generalizability).
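Two of these consequences, missingness and redundancy, can be quantified before any modelling. The helpers below are a sketch, not a standard library API: they report the fraction of missing values per feature and flag feature pairs whose absolute Pearson correlation exceeds a threshold (0.9 here, an arbitrary choice). The example matrix is hypothetical.

```python
import numpy as np


def missingness_per_feature(X: np.ndarray) -> np.ndarray:
    """Fraction of missing (NaN) entries in each column (feature)."""
    return np.isnan(X).mean(axis=0)


def highly_correlated_pairs(X: np.ndarray, threshold: float = 0.9) -> list[tuple[int, int, float]]:
    """Feature pairs whose absolute Pearson correlation exceeds `threshold`.

    Rows containing any NaN are dropped first (complete-case analysis), which is
    itself a simplification that can discard much of a sparse dataset.
    """
    complete = X[~np.isnan(X).any(axis=1)]
    corr = np.corrcoef(complete, rowvar=False)
    pairs = []
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs


if __name__ == "__main__":
    # Hypothetical 5-sample x 3-feature matrix; feature 1 is exactly 2 x feature 0.
    X = np.array([
        [1.0, 2.0, 2.0],
        [2.0, 4.0, np.nan],
        [3.0, 6.0, 5.0],
        [4.0, 8.0, 1.0],
        [5.0, 10.0, np.nan],
    ])
    print("missing fraction per feature:", missingness_per_feature(X))
    print("redundant pairs:", highly_correlated_pairs(X))
```

In this toy matrix feature 2 is 40% missing and features 0 and 1 are perfectly redundant, so one of them adds no information.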
General solutions
• Small sample size: combining datasets, knowledge graphs (KG), transfer learning
• Low signal-to-noise ratio: regularization (ridge, lasso, ...)
• (Introduction of) technical variability: increasing data quality
• Insufficient representation of variability, or class imbalance: random forests, cascade learning, class re-balancing techniques
These approaches address, or better tolerate, the limitations of rare disease data, often by combining multiple strategies.
Solutions – Small sample size
Combine smaller datasets into a larger composite dataset.
[Figure: multiple small rare disease datasets* are merged, and a PCA of the combined dataset is used to verify that samples are properly integrated into the larger dataset.]
*Sample color indicates class or group; shape indicates the dataset of origin.
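The PCA check in the figure can be sketched as follows. The two datasets are hypothetical and simulated, with a purely technical offset added to one of them; if samples separate by dataset of origin on the first principal components, the datasets are not yet properly integrated and need harmonization before modelling.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_features = 20

# Two hypothetical small datasets measuring the same 20 features.
# Dataset B carries a purely technical offset of +5 on every feature (a simulated batch effect).
dataset_a = rng.normal(size=(30, n_features))
dataset_b = rng.normal(size=(25, n_features)) + 5.0

X = np.vstack([dataset_a, dataset_b])
origin = np.array(["A"] * len(dataset_a) + ["B"] * len(dataset_b))

# Project the combined data onto the first two principal components.
scores = PCA(n_components=2).fit_transform(X)

for name in ("A", "B"):
    centre = scores[origin == name].mean(axis=0)
    print(f"dataset {name}: mean PC1={centre[0]:.2f}, mean PC2={centre[1]:.2f}")

# A large gap on PC1 between datasets means samples group by origin, i.e. poor integration.
gap_pc1 = abs(scores[origin == "A", 0].mean() - scores[origin == "B", 0].mean())
print(f"gap between dataset means on PC1: {gap_pc1:.2f}")
```

In practice the same plot would also be colored by class: well-integrated data group by class, not by dataset.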
Solutions – Small sample size
Explore rare disease data alongside other existing knowledge and different data types.
A knowledge graph (KG) is a network representation of human knowledge. It consists of nodes and edges (links) and provides a framework for data integration, unification, analytics and sharing, with a variety of possible applications.
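A minimal sketch of how a KG can be queried, assuming the common representation as (subject, relation, object) triples. All entity and relation names are hypothetical placeholders, not real biology; the two-hop query (disease ← gene ← drug) illustrates how linking separate sources can surface therapeutic candidates that no single dataset contains.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object); entity names are placeholders.
TRIPLES = [
    ("DRUG_1", "targets", "GENE_A"),
    ("DRUG_2", "targets", "GENE_B"),
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_B", "associated_with", "DISEASE_Y"),
    ("GENE_C", "associated_with", "DISEASE_X"),
]


def build_index(triples):
    """Index incoming edges: (relation, object) -> set of subjects."""
    incoming = defaultdict(set)
    for subj, rel, obj in triples:
        incoming[(rel, obj)].add(subj)
    return incoming


def candidate_drugs(disease, triples=TRIPLES):
    """Two-hop query: drugs that target any gene associated with the disease."""
    incoming = build_index(triples)
    genes = incoming[("associated_with", disease)]
    drugs = set()
    for gene in genes:
        drugs |= incoming[("targets", gene)]
    return drugs


print(sorted(candidate_drugs("DISEASE_X")))
```

Real KGs hold millions of such edges and are usually queried with graph databases or learned embeddings, but the underlying path logic is the same.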
Solutions – Small sample size
Transfer learning builds on prior knowledge and large volumes of related data.
Transfer learning is an ML approach in which a model developed for one task is reused as the starting point for a model on a second task. It can be supervised or unsupervised. For example, a model pretrained on natural images from the ImageNet* dataset can potentially be used to classify medical images.
[Figure: representations learned from a small cell line dataset alone are incomplete and correlate poorly with clinical or drug data; transfer learning instead starts from features representing the samples of a large dataset.]
*ImageNet is a benchmark dataset in image classification and object detection.
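A simplified sketch of the unsupervised variant of transfer learning: a representation (here PCA, swapped in for the deep networks typically pretrained on ImageNet) is learned on a large source dataset, then reused unchanged on a small labeled target dataset. Both datasets are simulated from the same hypothetical latent factors; the sizes and the logistic-regression classifier are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_latent, n_features = 10, 300

# Hypothetical shared structure: both datasets are generated from the same 10 latent factors.
loadings = rng.normal(size=(n_latent, n_features))

# Large source dataset (unlabeled), standing in for a big public resource.
source_latent = rng.normal(size=(2000, n_latent))
source_X = source_latent @ loadings + rng.normal(scale=0.5, size=(2000, n_features))

# Small labeled target dataset, standing in for a rare disease cohort.
target_latent = rng.normal(size=(40, n_latent))
target_X = target_latent @ loadings + rng.normal(scale=0.5, size=(40, n_features))
target_y = (target_latent[:, 0] > 0).astype(int)

# Step 1: learn a representation on the large source dataset only.
encoder = PCA(n_components=n_latent).fit(source_X)

# Step 2: reuse the frozen representation on the small target dataset.
target_Z = encoder.transform(target_X)

# Step 3: train a small classifier on the transferred features.
scores = cross_val_score(LogisticRegression(max_iter=1000), target_Z, target_y, cv=5)
print(f"5-fold accuracy on transferred features: {scores.mean():.2f}")
```

The design choice is that the 40 target samples are used only to fit a 10-parameter classifier, not to learn the 300-dimensional representation itself.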
Solutions – Low signal-to-noise ratio
The issue is capturing the relevant signal in a model.
• More noise: insufficient representation of variables, a "sparse" dataset, ambiguous (overlapping) labels
• More signal: variables fully represented, a complete dataset, clear label definitions
Solutions – Regularization
Label noise and sparsity in the data can lead to overfitting and poor generalizability.
Regularization approaches help a model generalize to new, unseen data.
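A minimal sketch of how regularization constrains a model, using the closed-form ridge solution $(X^\top X + \lambda I)^{-1} X^\top y$. It assumes centered features and no intercept; the simulated data (100 features, 30 samples, signal in two features) are hypothetical. As the penalty $\lambda$ grows, the coefficient vector shrinks, trading a little bias for lower variance.

```python
import numpy as np


def ridge_coefficients(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge solution (X^T X + lam * I)^-1 X^T y.

    Assumes centered features and no intercept; lam > 0 keeps the system solvable
    even when there are more features than samples.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))                                # 30 samples, 100 features (p > n)
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=30)  # hypothetical sparse signal

lambdas = (0.1, 1.0, 10.0, 100.0)
norms = [float(np.linalg.norm(ridge_coefficients(X, y, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||beta||={norm:.3f}")
```

Without the penalty, $X^\top X$ is singular here (rank at most 30), so ordinary least squares has no unique solution; in practice $\lambda$ is chosen by cross-validation.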
Solutions – Regularization
Examples of ML methods with regularization include ridge regression, lasso regression and elastic net regression.
Example of application: selecting a few informative genes as features to include in disease classification models (from Founta et al. Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning. Mol Med. 2023 Jan 24;29(1):12).
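The gene-selection use case can be sketched with L1-penalized (lasso) logistic regression in scikit-learn, which drives most coefficients to exactly zero. This is not the causality-based method of Founta et al.; the simulated cohort, in which only features 0, 1 and 2 carry signal, and the penalty strength `C=0.5` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical cohort: 60 samples x 200 "genes"; only genes 0, 1 and 2 drive the label.
X = rng.normal(size=(60, 200))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
X = StandardScaler().fit_transform(X)

# The L1 (lasso) penalty sets most coefficients to exactly zero; smaller C = stronger penalty.
lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso_logit.fit(X, y)

selected = np.flatnonzero(lasso_logit.coef_[0])
print(f"{selected.size} of {X.shape[1]} features kept: {selected.tolist()}")
```

Features should be standardized first, as here, so that the penalty treats all genes on the same scale; the retained set can then feed a smaller, more interpretable classifier.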
Solutions – data harmonization
Biological variability vs technical variability
• Normalization of raw values
• Batch correction methods
• Reprocessing
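A minimal sketch of one simple batch correction: z-scoring every feature separately within each batch (a location/scale adjustment). Established methods such as ComBat refine this idea with empirical Bayes estimates. Caution: if biological classes are unevenly distributed across batches, this naive adjustment also removes real biological variability. The example data are hypothetical.

```python
import numpy as np


def per_batch_standardize(X, batch):
    """Z-score every feature separately within each batch.

    A simple location/scale batch adjustment; each batch needs at least two samples.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    corrected = np.empty_like(X)
    for b in np.unique(batch):
        rows = batch == b
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # constant features: avoid division by zero
        corrected[rows] = (X[rows] - mu) / sd
    return corrected


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: batch 1 is shifted and scaled by technical factors.
    X = np.vstack([rng.normal(0.0, 1.0, size=(10, 4)), rng.normal(3.0, 2.0, size=(12, 4))])
    batch = np.array([0] * 10 + [1] * 12)
    Z = per_batch_standardize(X, batch)
    for b in (0, 1):
        print(b, Z[batch == b].mean(axis=0).round(6), Z[batch == b].std(axis=0, ddof=1).round(6))
```

After correction every batch has mean 0 and standard deviation 1 for each feature, so a purely technical shift no longer dominates downstream analyses.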
Class imbalance
Class imbalance → failure to fully capture the sample variability → poor generalizability
Solutions
• Random forests
• Cascade learning
• Class re-balancing techniques:
  - inverse sampling probability weighting
  - inverse class frequency weighting
  - oversampling of rare classes
  - uniformly random undersampling of the majority class
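Three of the re-balancing techniques listed above can be sketched in a few lines of NumPy. The weighting uses the formula n_samples / (n_classes × n_c), which is the same rule scikit-learn applies for `class_weight="balanced"` (accepted, for example, by `RandomForestClassifier`). The function names are illustrative, not a library API; resampling should be applied to training data only, never before the train/test split.

```python
import numpy as np


def inverse_class_frequency_weights(y):
    """Per-sample weights n_samples / (n_classes * n_c), as in class_weight='balanced'."""
    classes, counts = np.unique(y, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    lookup = dict(zip(classes.tolist(), class_weight.tolist()))
    return np.array([lookup[label] for label in np.asarray(y).tolist()])


def random_oversample(X, y, rng):
    """Resample every minority class with replacement up to the majority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n_c in zip(classes, counts):
        members = np.flatnonzero(y == c)
        idx.append(members)
        if n_c < target:
            idx.append(rng.choice(members, size=target - n_c, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def random_undersample(X, y, rng):
    """Keep a uniformly random subset of every class, down to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=False) for c in classes
    ])
    return X[idx], y[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.arange(8).reshape(4, 2)
    y = np.array([0, 0, 0, 1])  # hypothetical 3:1 imbalance
    print(inverse_class_frequency_weights(y))
    print(np.bincount(random_oversample(X, y, rng)[1]))
    print(np.bincount(random_undersample(X, y, rng)[1]))
```

Weighting keeps every sample but changes its influence on the loss; oversampling duplicates rare cases (risking overfitting to them), while undersampling discards majority data, which is costly when the whole dataset is already small.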
Conclusions
Collectively, rare diseases are not ‘rare’:
• Prevalence below 1 in 2,000 per disease (EU threshold)
• 6,000 known rare diseases
• Affect 10% of people worldwide
• Most affect children from birth
• Responsible for 35% of child deaths
• Include all childhood cancers
• 95% have no available treatment
Machine learning can extract disease-relevant patterns from high-dimensional datasets.
Take-home messages: present and future
State of the art
• Many techniques exist to combine rare disease data from different sources.
• Most ML methods for rare diseases address classification tasks.
Future perspectives
• Developing comprehensive phenotypic–genotypic databases:
  - collaboration with domain experts
  - linking biobank projects and patient registries
  - federated learning methods (electronic health records)
• Developing methods that investigate biological variability:
  - focus on model explainability
  - representation learning and regularization methods
  - robust error analysis
  - reliable methods for finding anchors
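Federated learning lets several sites (e.g., hospitals holding electronic health records) train a shared model without pooling patient data. Federated averaging (FedAvg) is one common scheme; the sketch below shows only its server-side aggregation step, assuming each site has already trained a model with the same parameter layout locally. The sites, parameter values and sample sizes are hypothetical.

```python
import numpy as np


def federated_average(site_params, site_sizes):
    """Server-side FedAvg step: average local parameters weighted by local sample size.

    Only parameters leave each site; the patient-level data stay local.
    """
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack([np.asarray(p, dtype=float) for p in site_params])
    # tensordot contracts the site axis: sum_k sizes[k] * stacked[k]
    return np.tensordot(sizes, stacked, axes=1) / sizes.sum()


if __name__ == "__main__":
    # Hypothetical parameter vectors from three hospitals after local training.
    params = [np.array([0.2, -1.0]), np.array([0.4, -0.8]), np.array([0.1, -1.2])]
    sizes = [12, 30, 8]
    print(federated_average(params, sizes))
```

In a full FedAvg loop, the averaged parameters are sent back to the sites, trained locally for a few more steps, and aggregated again until convergence.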
THANK YOU FOR YOUR ATTENTION
References
• Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. The use of machine
learning in rare diseases: a scoping review. Orphanet J Rare Dis. 2020
Jun 9;15(1):145. doi: 10.1186/s13023-020-01424-6. PMID: 32517778;
PMCID: PMC7285453.

More Related Content

Similar to ML in rare diseases

Prediction for breast cancer using various machine learning algorithms
Prediction for breast cancer using various machine learning algorithmsPrediction for breast cancer using various machine learning algorithms
Prediction for breast cancer using various machine learning algorithmsvishnuisahumanbeing
 
Dealing with imbalanced data sets.pdf
Dealing with imbalanced data sets.pdfDealing with imbalanced data sets.pdf
Dealing with imbalanced data sets.pdfNagaVarthini
 
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
 
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...IJDKP
 
Clinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansClinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansBrook White, PMP
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineJoel Saltz
 
Analysis of kinetic data
Analysis of kinetic dataAnalysis of kinetic data
Analysis of kinetic dataVineetha Menon
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13Russ Altman
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test SelectionVaggelis Vergoulas
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease predictionKOYELMAJUMDAR1
 
IRJET- Cancer Disease Prediction using Machine Learning over Big Data
IRJET- Cancer Disease Prediction using Machine Learning over Big DataIRJET- Cancer Disease Prediction using Machine Learning over Big Data
IRJET- Cancer Disease Prediction using Machine Learning over Big DataIRJET Journal
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...FranciscoJAzuajeG
 
Data science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxData science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxswapnaraghav
 
Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06Joe Gricar, MS
 

Similar to ML in rare diseases (20)

Prediction for breast cancer using various machine learning algorithms
Prediction for breast cancer using various machine learning algorithmsPrediction for breast cancer using various machine learning algorithms
Prediction for breast cancer using various machine learning algorithms
 
Dealing with imbalanced data sets.pdf
Dealing with imbalanced data sets.pdfDealing with imbalanced data sets.pdf
Dealing with imbalanced data sets.pdf
 
Parkinson disease classification recorded v2.0
Parkinson disease classification recorded   v2.0Parkinson disease classification recorded   v2.0
Parkinson disease classification recorded v2.0
 
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
 
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
 
Parkinson disease classification v2.0
Parkinson disease classification v2.0Parkinson disease classification v2.0
Parkinson disease classification v2.0
 
Clinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-StatisticiansClinical Research Statistics for Non-Statisticians
Clinical Research Statistics for Non-Statisticians
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision Medicine
 
Analysis of kinetic data
Analysis of kinetic dataAnalysis of kinetic data
Analysis of kinetic data
 
poster_Reza
poster_Rezaposter_Reza
poster_Reza
 
vaagdevi paper.pdf
vaagdevi paper.pdfvaagdevi paper.pdf
vaagdevi paper.pdf
 
Amia tb-review-13
Amia tb-review-13Amia tb-review-13
Amia tb-review-13
 
NTU-2019
NTU-2019NTU-2019
NTU-2019
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test Selection
 
Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease prediction
 
Csit110713
Csit110713Csit110713
Csit110713
 
IRJET- Cancer Disease Prediction using Machine Learning over Big Data
IRJET- Cancer Disease Prediction using Machine Learning over Big DataIRJET- Cancer Disease Prediction using Machine Learning over Big Data
IRJET- Cancer Disease Prediction using Machine Learning over Big Data
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...
 
Data science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptxData science notes for ASDS calicut 2.pptx
Data science notes for ASDS calicut 2.pptx
 
Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06Draft AMCP 2006 Model Quality 4-4-06
Draft AMCP 2006 Model Quality 4-4-06
 

Recently uploaded

Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...mikehavy0
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...varanasisatyanvesh
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives23050636
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...LuisMiguelPaz5
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444saurabvyas476
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjadimosmejiaslendon
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIf6x4zqzk86
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSSnehalVinod
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 

Recently uploaded (20)

Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted KitAbortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
Abortion pills in Riyadh Saudi Arabia| +966572737505 | Get Cytotec, Unwanted Kit
 

ML in rare diseases

  • 1. Machine learning in Rare diseases Qaisar Anoosha Ratti Matteo Grazioso Francesca Mascherpa Margaret Molinari Silvia
  • 2. Introduction of rare diseases ✓ According to the Orphan Drug Act4 of the USA, diseases or conditions that impact less than 200,000 people in the USA are considered to be rare diseases. ✓ The European Union defines a disease as rare when it affects less than 1 in 2,000 people. ✓ Identified and included 211 human studies from 32 countries in the systematic review. Encompassed a 10-year timeframe to provide a comprehensive overview.1 ✓ ML applications covered 74 distinct rare diseases.1 ✓ Conditions affecting a small percentage of the population. ✓ Mostly rare diseases have no available treatment .
  • 3. Introduction Of Machine Learning Machine learning: • ML identifies patterns and informative groups within data, enabling insights that may not be apparent through manual analysis. • Predictive models in ML make it possible to forecast future trends and outcomes based on the learned patterns. Challenges in Machine Learning: • Addresses overfitting, bias, and emphasizes model interpretability, particularly challenging with small datasets in rare diseases. • Small-sample datasets lack statistical power, leading to misinterpretation and unstable ML performance.
  • 4. ML applications in rare diseases • Identifying Rare Disease Patients: Analysing high-dimensional data, including electronic health records and genetics, to predict rare disease presence by correlating features with patient phenotypes • Drug Discovery:. ML identifies potential drug candidates using unsupervised/supervised algorithms on genetic and molecular data. Knowledge graphs and genomic data enhance target identification, with database analysis revealing therapeutic candidates • Clinical Trial Design Improvement:Utilizing unsupervised ML for refined study design and patient subgroup identification, and supervised ML to predict drug response, improving overall trial efficiency • Patient Prognosis Prediction:Addressing gaps in rare disease understanding, supervised ML identify biomarkers and clinical features for predicting adverse outcomes. This facilitates patient stratification for early, aggressive interventions
  • 5. Support Vector Machines (SVM): algorithm effective in high- dimensional spaces. Accura tely classifies cases, aids precise diagnoses, especially in complex datasets. Clustering Algorithms (e.g., K-Means):Groups similar data points. Identifies subgroups for personalized treatments, aiding targeted analysis. Neural Network: Computational model learning complex patterns from data, aiding understanding and outcome prediction in rare disease complexities. Bayesian Methods: Applies Bayes’ theorem, Enhances decision-making with limited data, crucial for effective uncertainty management. Principal Component Analysis (PCA): Reduces dimensionality, prevents overfitting. Crucial for analysing limited datasets, identifying key features. Traditional methods of ML used in rare diseases
  • 6. Supervised and Unsupervised learning Supervised Learning : • ML algorithms identify patterns from labeled data, often creating lower- dimensional representations. • Useful for classifying rare disease patients into subtypes based on molecular profiling. Unsupervised Learning : • Learns patterns or features from unlabeled data. • Applied to gene expression data to identify groups with similar molecular states.
  • 7. How ML works – Supervised learning Training set Evaluation set Test/Held-out set Cross validation Fit / generalizability
  • 8. How ML works – Supervised learning • Supervised ML models can be trained on electronic health records, genetic data or medical images to identify potential new patients with a rare disease. • Supervised ML approaches can be used to predict drug response in patients with rare diseases. • Supervised ML algorithms can be useful in identifying factors contributing to the risk of adverse outcomes or progression to advanced disease in patients with rare diseases.
  • 9. How ML works – Unsupervised learning Minimization of within cluster variance Maximization of between clusters variance
  • 10. How ML works – Unsupervised learning • ML unsupervised approaches can be useful in the design of clinical trials (e.g., to identify subgroups of patients who are more likely to respond well to a particular treatment). • Unsupervised learning can identify new subtypes of rare diseases using molecular and genetic data. • Unsupervised techniques can find hypotetical biological patterns or new therapeutic targets for diseases.
  • 11. Limitations of ML in rare diseases When dealing with rare diseases, often feature space is much larger than sample space Poor generalizability of ML model Classification of rare diseases often evolve over time Not comparable labels (Label noise) Decreased accuracy of ML model
  • 12. Issues in constructing datasets for ML Small sample size in relation to features Low signal-to- noise ratio Technical variability when combining datasets Insufficient representation of variability or class imbalance • High data missingness (sparsity) • More dissimilarity between samples (variance) • Increased redundancy (multicollinearity) • Uncomparable labels for evolving classification • Heterogeneity of both genotypes and phenotypes • Poor harmonization • Heterogeneity of both genotypes and phenotypes Poor performance of ML model (decreased accuracy, limited generalizability)
  • 13. General solutions Small sample data Low signal- to-noise ratio (intro of) technical variability Insuff representation of variability or class imbalance • Combining datasets • Knowledge graph (KG) • Transfer learning • Regularization (ridge, lasso....) • Increase data quality • Random forests • Cascade learning • Class re-balancing techniques Approaches that address or better tolerate the limitations of rare disease data Combining multiple strategies
  • 14. Solutions – Small sample size Combine smaller datasets into a larger composite dataset Multiple small, rare disease datasets*. *The color of the​ samples suggests classes or groups; The shape the origin of dataset. PCA of the combined datasets to verify proper integration of​ samples in the larger dataset
  • 15. Solutions – Small sample size • Explore rare disease data alongside other existing knowledge and different data types. • A knowledge graph (KG) is a network representation of human knowledge. It consists of nodes and edges (links) and provides a framework for data integration, unification, analytics and sharing, with a variety of possible applications.
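A knowledge graph can be stored as a list of (subject, relation, object) triples. The toy sketch below indexes such triples and runs a two-hop query: "which drugs target a gene associated with this disease?" Links of this kind can suggest candidates for drug repurposing. All entity and relation names are hypothetical placeholders, not biological facts. A real KG would typically live in a graph database built on curated ontologies.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples.
# All names are hypothetical placeholders, not real biological facts.
triples = [
    ("DiseaseA", "associated_with", "GeneX"),
    ("DiseaseB", "associated_with", "GeneX"),
    ("DrugY", "targets", "GeneX"),
    ("DrugZ", "targets", "GeneW"),
    ("DiseaseB", "associated_with", "GeneW"),
]

# Index outgoing and incoming edges so the graph can be traversed both ways.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for s, r, o in triples:
    out_edges[s].append((r, o))
    in_edges[o].append((r, s))

def candidate_drugs(disease):
    """Drugs that target any gene associated with `disease` (two-hop query)."""
    genes = [o for r, o in out_edges[disease] if r == "associated_with"]
    return sorted({s for g in genes for r, s in in_edges[g] if r == "targets"})

print(candidate_drugs("DiseaseA"))  # ['DrugY']
print(candidate_drugs("DiseaseB"))  # ['DrugY', 'DrugZ']
```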
  • 16. Solutions – Small sample size: transfer learning • Transfer learning builds on prior knowledge and large volumes of related data. It is an ML approach in which a model developed for one task is reused as the starting point for a model on a second task, and it can be supervised or unsupervised. For example, a model pretrained on natural images from the ImageNet* dataset can potentially be used to classify medical images. • Figure: features representing samples of a large dataset; when applied to a small cell line dataset alone, the representations are incomplete and correlate poorly with clinical or drug data. *ImageNet is a benchmark dataset in image classification and object detection.
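The sketch below is a deliberately tiny stand-in for transfer learning. It pretrains a logistic regression on a larger source dataset, then uses those weights as the starting point for training on a four-sample target dataset. For comparison, it also trains a model from zero for the same number of steps. The image-model case on the slide follows the same warm-start principle with a deep network. Everything here (the data, the learning rates, the function names) is an assumption for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(X, y, w=None, b=0.0, lr=0.5, epochs=100):
    """Full-batch gradient descent on the logistic (cross-entropy) loss.
    Passing `w` and `b` warm-starts from existing weights instead of zeros."""
    w = list(w) if w is not None else [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        errors = [predict_proba(w, b, x) - t for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

# Source task: a larger synthetic dataset (label 1 when x0 + x1 > 0).
grid = [-2.0, -1.0, 1.0, 2.0]
X_src = [(a, c) for a in grid for c in grid if a + c != 0]
y_src = [1 if a + c > 0 else 0 for a, c in X_src]

# Target task: only four labelled samples following the same rule.
X_tgt = [(1.5, 0.5), (-0.5, -1.5), (0.5, 1.0), (-1.0, -0.5)]
y_tgt = [1, 0, 1, 0]

# 1) Pretrain on the source data.
w_src, b_src = train(X_src, y_src, epochs=200)
# 2) Fine-tune on the target data, starting from the pretrained weights.
w_ft, b_ft = train(X_tgt, y_tgt, w=w_src, b=b_src, lr=0.1, epochs=10)
# Baseline: same short budget on the target data, starting from zeros.
w_scr, b_scr = train(X_tgt, y_tgt, lr=0.1, epochs=10)

test_point = (2.0, 2.0)
print(predict_proba(w_ft, b_ft, test_point) > predict_proba(w_scr, b_scr, test_point))  # True
```

In this toy setup, the warm-started model is more confident on a clearly separated test point after the same short training budget. With real data, the benefit depends on how closely the source task resembles the target task.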
  • 17. Solutions – Low signal-to-noise ratio • The issue is capturing the relevant signal in a model. • More noise: insufficient representation of variables, a "sparse" dataset, ambiguous or overlapping labels. • More signal: variables fully represented, a complete dataset, clear label definitions.
  • 18. Solutions – Regularization • Label noise and sparsity in the data can lead to overfitting and poor generalizability. Regularization approaches penalize model complexity, which helps a model generalize to new, unseen data.
  • 19. Solutions – Regularization • Examples of ML methods with regularization include ridge regression, lasso regression and elastic net regression. • Example of application: selecting a few informative genes as features to include in disease classification models (Founta et al. Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning. Mol Med. 2023 Jan 24;29(1):12).
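A small coordinate-descent sketch of the elastic net objective shows how the penalties differ. Setting alpha = 1 gives the lasso, and alpha = 0 gives ridge regression. The lasso sets weakly informative features exactly to zero (feature selection), whereas ridge only shrinks them. The objective follows the common glmnet-style scaling. The three "gene" columns and the response are synthetic, and the function names are hypothetical. This is not the causality-based pipeline of Founta et al.

```python
def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def elastic_net(X, y, lam, alpha=1.0, n_sweeps=100):
    """Cyclic coordinate descent for
        (1/2n)*||y - Xw||^2 + lam*(alpha*||w||_1 + (1 - alpha)/2*||w||^2).
    alpha=1 gives the lasso, alpha=0 gives ridge. Assumes centred y and
    centred (ideally standardized) columns, so no intercept is fitted."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw with w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            # Keep the residual in sync with the change in w[j].
            residual = [r - c * (new_wj - w[j]) for c, r in zip(col, residual)]
            w[j] = new_wj
    return w

# Three hypothetical "gene" features (orthogonal, centred toy columns).
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.1, -1.9, 1.9, -2.1]  # = 2*gene0 + 0.1*gene1; gene2 is uninformative

lasso = elastic_net(X, y, lam=0.5, alpha=1.0)
ridge = elastic_net(X, y, lam=0.5, alpha=0.0)
print([round(v, 3) for v in lasso])  # [1.5, 0.0, 0.0]
print([round(v, 3) for v in ridge])  # gene1 shrunk but not zeroed
```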
  • 20. Solutions – Data harmonization • Separating biological variability from technical variability: • Normalization of raw values • Batch correction methods • Reprocessing
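One of the simplest batch corrections is to center each feature within each batch, which removes additive batch offsets. The sketch below shows this for a single feature. It is a location-only adjustment; methods such as ComBat also model scale. The function name is hypothetical. The sketch also assumes that batches have similar biological composition, because otherwise real biological differences between batches would be removed as well.

```python
from collections import defaultdict

def center_by_batch(values, batches):
    """Subtract each batch's mean from the values of that batch (one feature)."""
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)
    batch_means = {batch: sum(vs) / len(vs) for batch, vs in groups.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]

# One feature measured in two hypothetical sequencing runs; run2 has a +10 offset.
values = [1.0, 3.0, 11.0, 13.0]
batches = ["run1", "run1", "run2", "run2"]
print(center_by_batch(values, batches))  # [-1.0, 1.0, -1.0, 1.0]
```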
  • 21. Class imbalance • Class imbalance leads to a failure to fully capture sample variability and, in turn, to poor generalizability. • Solutions: random forests, cascade learning, class re-balancing techniques
  • 22. Class imbalance – Solutions • Random forests • Cascade learning • Class re-balancing techniques: inverse sampling probability weighting, inverse class frequency weighting, oversampling of rare classes, uniformly random undersampling of the majority class
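Two kinds of re-balancing technique from this list can be sketched in a few lines: inverse class frequency weighting, and random oversampling or undersampling. The weight formula n / (k * n_c) gives every class the same total weight. It matches the formula scikit-learn documents for `class_weight="balanced"`. The function names and the toy labels are assumptions for illustration.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weight n / (k * n_c), so every class has equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

def random_oversample(samples, labels, seed=0):
    """Duplicate samples of smaller classes (with replacement) up to the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=target - count)
        out_x += extra
        out_y += [cls] * len(extra)
    return out_x, out_y

def random_undersample(samples, labels, seed=0):
    """Keep a uniformly random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x += rng.sample(pool, target)
        out_y += [cls] * target
    return out_x, out_y

samples = list(range(8))                   # stand-ins for patient records
labels = ["control"] * 6 + ["case"] * 2    # 3:1 imbalance
weights = inverse_frequency_weights(labels)
print(weights[0], weights[-1])             # 0.6666666666666666 2.0
_, y_over = random_oversample(samples, labels)
_, y_under = random_undersample(samples, labels)
print(Counter(y_over), Counter(y_under))
```

With two classes, reducing every class to the minority size is the same as uniformly random undersampling of the majority class.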
  • 23. Conclusions • Collectively, rare diseases are not ‘rare’: • Each affects fewer than 1 in 2,000 people • 6,000 are known • Together they affect 10% of people worldwide • Most affect children from birth • They are responsible for 35% of child deaths • They include all childhood cancers • 95% have no available treatment • Machine learning can help extract disease-relevant patterns from high-dimensional datasets.
  • 24. Take-home messages, present and future • State of the art: many techniques exist to integrate rare disease data from different sources; most ML methods for rare diseases address classification tasks. • Developing comprehensive phenotypic–genotypic databases: domain expert collaboration; linking biobank projects and patient registries; federated learning methods (e.g., on electronic health records). • Future perspective, developing methods that investigate biological variability: focus on model explainability; representation learning and regularization methods; robust error analysis; reliable methods for finding anchors.
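A FedAvg-style sketch illustrates the idea behind federated learning. Each site improves a shared model on its own records. Only the updated parameters and sample counts are sent to a server, which averages them weighted by sample size. The one-parameter model, the three hypothetical sites and their data are assumptions. Real deployments add secure aggregation, privacy safeguards and far larger models.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """A few gradient-descent steps on one site's mean squared error for y ~ w*x.
    Runs inside the site; raw (x, y) records never leave it."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, sites):
    """Each site trains locally; the server averages the returned weights by sample size."""
    updates = [(local_update(w_global, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Three hypothetical hospitals with different numbers of patients (true relation y = 3x).
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (2.0, 6.0)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
print(round(w, 3))  # 3.0
```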
  • 25. THANK YOU FOR YOUR ATTENTION
  • 26. References • Schaefer J, Lehne M, Schepers J, Prasser F, Thun S. The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis. 2020 Jun 9;15(1):145. doi: 10.1186/s13023-020-01424-6. PMID: 32517778; PMCID: PMC7285453.