An introduction to machine learning in biomedical research:
Key concepts, problems and applications
Francisco Azuaje, PhD.
Head of Bioinformatics and Modeling Research Group (BIOMOD)
Luxembourg Institute of Health (LIH)
Presentation for the ISCB Regional Student Group (RSG) - Luxembourg
7 November 2018
Today’s presentation:
• ML in biomedical research: concepts, approaches, applications
• A selection of recent, interesting examples
• Challenges
• ML software frameworks, and final remarks
Feel free to ask questions during or after the presentation
What is Artificial Intelligence (AI)?
• Artificial Intelligence (AI), a field of computer science whose origins
can be dated back to the 1940s (Turing, 1950), aims to develop
computational systems with advanced analytical or predictive
capabilities.
• Machine Learning (ML) is the most successful branch of AI.
What is Machine Learning (ML)?
• “A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E” (Mitchell, 1997).
• ML is concerned with the development of programs with the capacity
to “learn” from diverse data sources. These programs can be used to
make predictions, support decision-making, etc.
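Mitchell's definition can be made concrete with a toy sketch. Everything below is an assumption made for illustration and is not taken from the presentation: the task T is separating two simulated 1-D classes, the experience E is the number of labelled training examples per class, and the performance P is accuracy on a held-out sample. The "learner" is a simple midpoint threshold between class means.

```python
import random
from statistics import mean


def sample(n_per_class, rng):
    """Draw labelled 1-D points: class 0 ~ N(0, 1), class 1 ~ N(3, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(3.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def fit_threshold(train):
    """'Learn' a decision threshold: the midpoint between the two class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2.0


def accuracy(threshold, test):
    """Performance measure P: fraction of held-out points classified correctly."""
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in test)


def learning_curve(sizes, seed=0):
    """Return (experience E, performance P) pairs for increasing training sizes."""
    rng = random.Random(seed)
    test = sample(1000, rng)  # fixed held-out set shared by all runs
    curve = []
    for n in sizes:
        threshold = fit_threshold(sample(n, rng))
        curve.append((n, accuracy(threshold, test)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([1, 5, 50, 500]):
        print(f"E = {n:4d} examples per class -> P = {acc:.3f}")
```

With very few examples the threshold can land in a poor place; with more experience the accuracy tends to settle near the best value achievable for these two overlapping classes.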
Major learning “paradigms”:
• Supervised learning
• Unsupervised learning, semi-supervised learning
• Reinforcement learning
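The difference between the first two paradigms can be sketched on the same toy data. This is a minimal illustration under stated assumptions, not a method from the presentation: the points and labels are made up, the supervised learner is a nearest-centroid classifier, and the unsupervised learner is k-means with k = 2. Reinforcement learning is not covered by this sketch.

```python
from statistics import mean

# Hypothetical 1-D measurements and (for the supervised case only) their labels.
points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def nearest_centroid_fit(xs, ys):
    """Supervised: the labels decide which points define each class centroid."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, n_iter=10):
    """Unsupervised: k-means with k=2; no labels, groups emerge from the data."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialisation
    for _ in range(n_iter):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return c0, c1


if __name__ == "__main__":
    centroids = nearest_centroid_fit(points, labels)
    print("supervised prediction for 1.5:", nearest_centroid_predict(centroids, 1.5))
    print("unsupervised cluster centres:", two_means(points))
```

On well-separated data like this, both approaches recover the same two groups; the supervised model additionally knows what each group means because the labels told it.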
ML taps into recent and older research advances
• ML stands on the shoulders of probability theory, optimization algorithms, calculus, algebra…
• For example, Deep Learning (DL) builds on progress achieved in artificial neural networks over
decades. Key periods: 1980s and 2000s.
• DL is a sub-branch of ML: A diverse family of computational models consisting of many
(“deep”) data processing layers for automated feature extraction and pattern recognition in
large datasets.
• DL’s progress has been made possible not only by larger datasets and accelerated computing
capabilities, but also by developments in statistical learning theory, algorithms and
open-source software accumulated over the past four decades.
Major (supervised) learning approaches/techniques
• Generalizations of linear models for regression and classification
• Bayesian models: naïve, graphical models, probabilistic learning/programming
• Support vector machines (SVMs), kernel-based models
• Decision trees, random forests
• Gradient boosting machines (GBM)
• Neural networks, deep learning
• …
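As an example of the first item in the list above, logistic regression generalizes the linear model to binary classification by passing a linear score through a sigmoid. The sketch below fits it with plain gradient descent on the log-loss. The data (a single hypothetical biomarker value per sample), the learning rate and the number of epochs are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for a single value x."""
    return sigmoid(w * x + b)


# Hypothetical biomarker values and binary outcomes (not from the slides).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    w, b = fit_logistic(xs, ys)
    for x in (1.0, 4.0):
        print(f"x = {x}: p(y=1) = {predict_proba(w, b, x):.3f}")
```

The other families in the list differ mainly in the form of the decision function and in how it is fitted, not in this overall fit-then-predict pattern.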
Biomedical research: larger and diverse datasets
• High inter-individual variability
• Datasets change in time and space
• High intra-individual variability
(Topol, 2014, Cell)
(Eisenstein, 2015, Nature)
Key example of “big data” in cancer research
(Hutter and Zenklusen, 2018, Cell)
Typical questions answered by ML using such and other datasets
“Fundamental” research:
• What is the “behavior”, “mechanism”…?
• Within a data layer, how are samples or features related?
• How are different data layers interrelated…?
• What if…?
• Why…?
“Applied” research:
• Risk assessment
• Diagnosis
• Prognosis
• Other clinical outcome prediction
• Prevention
• Drug targets
• Therapeutic strategies
Examples of applications of ML in biomedicine
(Table adapted from Yu et al., Nat Bio Med Eng, 2018)
ML in biomedical research
Figure panels: global usage of ML techniques; trends of ML techniques; trends of ML techniques other than PCA and LRM. Labelled series: SVM, RF, DNN.
(Koohy, 2018, F1000Research)
Typical ML development cycle and applications
Supervised machine learning Unsupervised machine learning
Figures adapted from Yu et al., Nat Bio Med Eng, 2018
ML in biomedical research: Examples of model diversity and applications (1)
Large-scale phenotypic image analysis
Novelty/anomaly detection
Prediction of hard-to-discretize states
Image classification according to phenotypes
(e.g., here with CellProfiler Analyst)
Smith et al., 2018, Cell Systems
ML in biomedical research: Examples of model diversity and applications (2)
Kermany et al., 2018, Cell
Medical diagnoses with image-based deep learning (transfer learning)
Retina images → retinal diseases
• DL system: low classification errors, comparable to humans (on 1000 images)
• Strategy also successfully applied to analysis of chest X-ray images
• Potential “generalized” platform for image-based diagnoses (?)
ML in biomedical research: Examples of model diversity and applications (3)
Ambale-Venkatesh et al., Circ. Res., 2017
Random survival forests for predicting cardiovascular (CV) events
• Accurate prediction of 6 CV outcomes (in an asymptomatic population).
• Subsets of predictive features for each event.
• 12-year follow-up; multi-center, multi-ethnic cohort with a wide age range.
Figure: variable importance for each of the 735 variables used in the analysis; top-20 features. Variable importance is measured using the minimum depth of the maximal subtree. Variable categories: imaging, noninvasive tests, questionnaires, biomarker panels.
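The minimal-depth idea behind this variable importance measure can be sketched as follows: a variable that splits close to the root of a tree affects many samples, so a small depth suggests high importance. The code is a simplified illustration and not the implementation used in the cited study. The nested-dict tree format, the variable names, and the convention of charging a variable that is absent from a tree with that tree's depth are all assumptions.

```python
def minimal_depths(tree, depth=0, found=None):
    """Return {variable: smallest depth at which it is used to split}.

    A tree is a nested dict {"var": name, "left": subtree, "right": subtree};
    a leaf is represented by None. The root split is at depth 0.
    """
    if found is None:
        found = {}
    if tree is None:
        return found
    var = tree["var"]
    if var not in found or depth < found[var]:
        found[var] = depth
    minimal_depths(tree["left"], depth + 1, found)
    minimal_depths(tree["right"], depth + 1, found)
    return found


def tree_depth(tree):
    """Number of split levels in the tree (0 for a bare leaf)."""
    if tree is None:
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


def forest_minimal_depth(forest, variables):
    """Average minimal depth per variable; lower values suggest higher importance.

    Assumption: a variable never used in a tree is charged that tree's depth.
    """
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        found = minimal_depths(tree)
        penalty = tree_depth(tree)
        for v in variables:
            totals[v] += found.get(v, penalty)
    return {v: totals[v] / len(forest) for v in variables}


def leaf_split(var):
    """Helper: a split on `var` whose two children are leaves."""
    return {"var": var, "left": None, "right": None}


# Two small hypothetical trees over three hypothetical variables.
tree_1 = {
    "var": "var_a",
    "left": leaf_split("var_b"),
    "right": {"var": "var_c", "left": None, "right": leaf_split("var_b")},
}
tree_2 = {"var": "var_b", "left": leaf_split("var_a"), "right": None}

if __name__ == "__main__":
    scores = forest_minimal_depth([tree_1, tree_2], ["var_a", "var_b", "var_c"])
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: average minimal depth = {score}")
```

Here var_a and var_b each average 0.5 and var_c averages 1.5, so var_c would be ranked as the least important of the three.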
Key challenges of ML in biomedical research
Heterogeneity: Data, events, states,
within and between individuals…
Data not always “big”: relative lack of
labelled data, curse of dimensionality
Data: multi-layered, hierarchical
For same data type/layer: multiple
measurement platforms
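One facet of the curse of dimensionality can be illustrated numerically: with a fixed number of samples, distances between points become increasingly similar as the number of features grows, which weakens distance-based methods. The sketch below uses uniformly random points; the sample size, the dimensions and the "relative contrast" summary are assumptions chosen for illustration. It needs Python 3.8 or later for math.dist.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one query to random points.

    Large values mean the nearest neighbour is clearly closer than the rest;
    values near 0 mean all points are almost equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(f"{dim:4d} features: relative contrast = {relative_contrast(dim):.2f}")
```

With few labelled samples and thousands of measured features, a common situation in omics data, this effect is one reason why feature selection or dimensionality reduction is often applied before modelling.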
Key challenges of ML in biomedical research (2)
Interpretability, understandability:
Global and local, novelty and consistency
with prior knowledge
Reproducibility:
Crucial requirement
“Gold standards”/“ground truth”:
Lack, limitations
Complexity of pattern recurrence,
regularities
ML software frameworks
• Different programming languages offer extensive libraries for
implementing ML: Python, R, Java, C++…
• Widely used frameworks: scikit-learn, Keras, PyTorch,
TensorFlow, H2O
• “Automated” and “driverless” tools are also available to get you
started.
Takeaways:
• Many ML challenges in biomedical (BM) research are shared with other
application domains, but this field also poses its own unique challenges.
• ML in BM research will continue to advance, driven by more data, new
expectations and emerging questions.
• Supervised learning, including deep learning, will meet many of
these needs. However, unbiased exploration, unsupervised learning,
hypothesis generation and interpretation are also crucial.
Thanks to:
Funding from:
Bioinformatics and Modelling Research
Group (BIOMOD)
Our research partners in Luxembourg and abroad
