An introduction to machine learning in biomedical research: Key concepts, problems and applications

Francisco Azuaje, PhD.
Head of Bioinformatics and Modeling Research Group (BIOMOD)
Luxembourg Institute of Health (LIH)
Presentation for the ISCB Regional Student Group (RSG) - Luxembourg
7 November 2018

Today’s presentation:
• ML in biomedical research: concepts, approaches, applications
• A selection of recent, interesting examples
• Challenges
• ML software frameworks, and final remarks
Feel free to ask questions during or after the presentation

What is Artificial Intelligence (AI)?
• Artificial Intelligence (AI), a field of computer science whose origins
can be dated back to the 1940s (Turing, 1950), aims to develop
computational systems with advanced analytical or predictive
capabilities.
• Machine Learning (ML) is the most successful branch of AI.

What is Machine Learning (ML)?
• “A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E” (Mitchell, 1997).
• ML is concerned with the development of programs with the capacity
to “learn” from diverse data sources. These programs can be used to
make predictions, support decision-making, etc.
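
Mitchell's definition can be made concrete with a toy sketch. Everything below is an illustrative assumption, not part of the presentation: the task T is deciding whether an integer lies at or above a hidden boundary (30), the experience E is a short, hand-picked stream of labelled examples, and the performance measure P is accuracy over the integers 0 to 99.

```python
def label(x):
    """Ground truth for task T: is x at or above the hidden boundary 30?"""
    return int(x >= 30)


def learn_threshold(examples):
    """Place the threshold midway between the largest negative
    and the smallest positive example seen so far."""
    largest_negative = max(x for x, y in examples if y == 0)
    smallest_positive = min(x for x, y in examples if y == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, inputs):
    """Performance measure P: fraction of inputs classified correctly."""
    correct = sum(int(x >= threshold) == label(x) for x in inputs)
    return correct / len(inputs)


# Experience E: a fixed stream of labelled examples (hypothetical values).
stream = [(x, label(x)) for x in [5, 95, 20, 80, 25, 40, 29, 30]]
evaluation_inputs = range(100)

accuracies = []
for n_seen in (2, 4, 6, 8):
    threshold = learn_threshold(stream[:n_seen])
    accuracies.append(accuracy(threshold, evaluation_inputs))
    print(f"E = {n_seen} examples -> threshold = {threshold}, P = {accuracies[-1]:.2f}")
```

With this particular stream, P never decreases as more examples are seen, which is the sense in which the program "learns" under the definition.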

Major learning “paradigms”:
• Supervised learning
• Unsupervised learning, semi-supervised learning
• Reinforcement learning
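
As a rough illustration of the first two paradigms, the sketch below applies a supervised and an unsupervised learner to the same six hypothetical measurements. The data, the nearest-centroid classifier and the two-cluster k-means loop are assumptions chosen for brevity; semi-supervised and reinforcement learning are not shown.

```python
# Hypothetical 1-D measurements from two groups of samples.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = [0, 0, 0, 1, 1, 1]  # available only in the supervised setting


def mean(xs):
    return sum(xs) / len(xs)


def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


# Supervised learning: labels are used to estimate one centroid per class,
# and a new sample is assigned to the class with the nearest centroid.
class_centroids = [
    mean([v for v, y in zip(values, labels) if y == c]) for c in (0, 1)
]
supervised_prediction = nearest(class_centroids, 4.1)

# Unsupervised learning: 2-means clustering, which never sees the labels.
centers = [min(values), max(values)]  # simple deterministic initialization
for _ in range(10):
    assignments = [nearest(centers, v) for v in values]
    centers = [
        mean([v for v, a in zip(values, assignments) if a == k]) for k in (0, 1)
    ]

print("supervised prediction for 4.1:", supervised_prediction)
print("cluster assignments:", assignments)
```

The supervised learner answers "which known class does this sample belong to?", while the clustering step only proposes a grouping whose meaning still has to be interpreted.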

ML taps into recent and older research advances
• ML stands on the shoulders of probability theory, optimization algorithms, calculus, algebra…
• For example, Deep Learning (DL) builds on progress achieved in artificial neural networks over
decades. Key periods: 1980s and 2000s.
• DL is a sub-branch of ML: a diverse family of computational models consisting of many
(“deep”) data processing layers for automated feature extraction and pattern recognition in
large datasets.
• DL’s progress was made possible not only by larger datasets and accelerated computing
capabilities, but also by developments in statistical learning theory, algorithms and
open-source software accumulated over the past four decades.
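
The idea of stacked ("deep") processing layers can be sketched in a few lines of NumPy. This is only a forward pass through randomly initialized, untrained layers; the layer sizes, the ReLU nonlinearity and the input batch are illustrative assumptions, and no learning takes place here.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_layer(inputs, weights, bias):
    """One processing layer: affine transform followed by a ReLU nonlinearity."""
    return np.maximum(0.0, inputs @ weights + bias)


# Hypothetical sizes: 100 input features passed through three layers.
layer_sizes = [100, 64, 32, 16]
parameters = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

batch = rng.normal(size=(8, layer_sizes[0]))  # 8 samples, 100 raw features each

# Each layer re-represents the output of the previous one ("depth").
activations = batch
for weights, bias in parameters:
    activations = dense_layer(activations, weights, bias)
    print(activations.shape)
```

In a trained network the weights would be fitted to data, so that successive layers extract progressively more task-relevant features.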

Major (supervised) learning approaches/techniques
• Generalizations of linear models for regression and classification
• Bayesian models: naïve, graphical models, probabilistic learning/programming
• Support vector machines (SVMs), kernel-based models
• Decision trees, random forests
• Gradient boosting machines (GBM)
• Neural networks, deep learning
• …
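
Several of the techniques listed above can be tried side by side with scikit-learn. The sketch below is an assumption-laden illustration rather than a benchmark: it uses the breast cancer dataset bundled with scikit-learn, default or near-default hyperparameters, and 5-fold cross-validated accuracy as the only metric.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline with standardization.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name:20s} {scores.mean():.3f} +/- {scores.std():.3f}")
```

Differences between models on a single small dataset should be read with caution; the point is only that these techniques share a common fit and evaluate interface.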

Biomedical research: larger and diverse datasets
• High inter-individual variability
• High intra-individual variability
• Datasets change in time and space
(Topol, 2014, Cell)
(Eisenstein, 2015, Nature)

Key example of “big data” in cancer research
(Hutter and Zenklusen, Cell, 2018)

Typical questions answered by ML using such and other datasets
“Fundamental” research:
• What is the “behavior”, “mechanism”…?
• Within a data layer, how are samples or features related?
• How are different data layers interrelated…?
• What if …?
• Why…?
“Applied” research:
• Risk assessment
• Diagnosis
• Prognosis
• Other clinical outcome prediction
• Prevention
• Drug targets
• Therapeutic strategies

Examples of applications of ML in biomedicine
(Table adapted from Yu et al., Nat Bio Med Eng, 2018)

ML in biomedical research
(Koohy, 2018, F1000 Research)
Figure panels: global usage of ML techniques; trends of ML techniques; trends of ML techniques excluding PCA and LRM. Labelled techniques: SVM, RF, DNN.

Typical ML development cycle and applications
Figure panels: supervised machine learning; unsupervised machine learning
(Figures adapted from Yu et al., Nat Bio Med Eng, 2018)
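
For the supervised case, one common form of this cycle can be sketched with scikit-learn. The steps, dataset (the bundled breast cancer data), model and hyperparameter grid below are illustrative assumptions and are not taken from the figures of Yu et al.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collect data and hold out a test set that is never used for model selection.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Define preprocessing and model as one pipeline, so that scaling is
#    re-fitted inside every cross-validation fold (avoids information leakage).
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 3. Tune hyperparameters by cross-validation on the training set only.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# 4. Evaluate the selected model once on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("selected C:", search.best_params_["logisticregression__C"])
print(f"held-out ROC AUC: {test_auc:.3f}")
```

Keeping the test set untouched until step 4 is the design choice that makes the final estimate a fair indication of performance on new data.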

ML in biomedical research: Examples of model diversity and applications (1)
Large-scale phenotypic image analysis
Novelty/anomaly detection
Prediction of hard-to-discretize states
Image classification according to phenotypes
(e.g., here with Cell Profiler Analyst)
Smith et al., 2018, Cell Systems
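
Novelty/anomaly detection can be illustrated with a deliberately simple stand-in: model the "typical" samples by their mean and covariance, then flag new samples that lie unusually far away (Mahalanobis distance). This is not the approach of Smith et al.; the two-feature synthetic data and the 99% cut-off are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference set: 500 "typical" samples described by 2 features.
reference = rng.normal(size=(500, 2))

# Fit a simple model of "normal" data: mean vector and covariance matrix.
center = reference.mean(axis=0)
inverse_covariance = np.linalg.inv(np.cov(reference, rowvar=False))


def mahalanobis(points):
    """Distance of each row of `points` from the reference distribution."""
    delta = points - center
    return np.sqrt(np.einsum("ij,jk,ik->i", delta, inverse_covariance, delta))


# Flag anything farther out than 99% of the reference samples.
threshold = np.quantile(mahalanobis(reference), 0.99)

new_samples = np.array([[0.1, -0.2], [8.0, 8.0], [-9.0, 7.0]])
is_novel = mahalanobis(new_samples) > threshold
print(is_novel)
```

In image-based phenotyping the features would typically be many morphological measurements per cell, and more flexible detectors are usually needed, but the principle of scoring distance from a learned notion of "normal" is the same.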

Medical Diagnoses with Transfer Image-Based Deep Learning
(Kermany et al., 2018, Cell)
Figure labels: retina images; retinal diseases
• DL system: Low classification errors, comparable to humans (on 1000
images)
• Strategy also successfully applied to analysis of chest X-ray images
• Potential “generalized” platform for image-based diagnoses (?)

Random survival forests for predicting cardiovascular (CV) events
(Ambale-Venkatesh et al., Circ. Res., 2017)
• Accurate prediction of 6 CV outcomes (in asymptomatic population).
• Subsets of predictive features for each event.
• 12-year follow-up, multi-center, multi-ethnic, wide age range.
• Variable categories: imaging, noninvasive tests, questionnaires, biomarker panels
Figure: variable importance for each of the 735 variables used in analysis, with the top-20 features highlighted. Variable importance is measured using the minimum depth of the maximal subtree.
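
The general idea of ranking variables with a forest can be sketched with scikit-learn. Note the substitutions: this uses a random forest classifier on a synthetic binary outcome and scikit-learn's impurity-based importance, not a random survival forest or the minimal-depth measure used in the study. The cohort size, number of variables and outcome rule are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical cohort: 600 individuals, 20 candidate variables.
n_samples, n_features = 600, 20
X = rng.normal(size=(n_samples, n_features))

# By construction, only variables 0 and 1 drive the (binary) outcome.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank variables from most to least important (impurity-based importance).
ranking = np.argsort(forest.feature_importances_)[::-1]
top_two = sorted(ranking[:2].tolist())
print("two most important variables:", top_two)
```

Because the outcome here depends only on the first two variables, they should dominate the ranking; with real data, correlated variables and measurement noise make such rankings harder to interpret.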

Key challenges of ML in biomedical research
• Heterogeneity: data, events, states, within and between individuals…
• Data not always “big”: relative lack of labelled data, curse of dimensionality
• Data: multi-layered, hierarchical
• For same data type/layer: multiple measurement platforms
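
One consequence of having few labelled samples and many features can be shown with pure noise. In the sketch below (all sizes and the least-squares classifier are illustrative assumptions), the labels are random and unrelated to the features, yet a linear model with more features than samples reproduces the training labels perfectly while performing at chance on new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Far more features than labelled samples.
n_train, n_test, n_features = 40, 1000, 500

# Pure noise: features carry no information about the randomly assigned labels.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares fit of a linear classifier (sign of X @ weights).
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_accuracy = np.mean(np.sign(X_train @ weights) == y_train)
test_accuracy = np.mean(np.sign(X_test @ weights) == y_test)
print(f"training accuracy: {train_accuracy:.2f}")
print(f"test accuracy:     {test_accuracy:.2f}")
```

This is why performance estimates in high-dimensional biomedical settings need held-out data or cross-validation, and often regularization or feature selection performed inside the validation loop.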

Key challenges of ML in biomedical research (2)
• Interpretability, understandability: global and local, novelty and consistency with prior knowledge
• Reproducibility: crucial requirement
• “Gold standards”/“ground truth”: lack, limitations
• Complexity of pattern recurrence, regularities

ML software frameworks
• Different programming languages offer extensive libraries for
implementing ML: Python, R, Java, C++…
• Widely used frameworks: Scikit-learn, Keras, PyTorch,
TensorFlow, H2O
• “Automated” and “driverless” tools are also available to get you
started.

Takeaways:
• Many ML challenges in biomedical (BM) research are shared with other
application domains, but this field poses its own unique challenges.
• ML in BM research will continue advancing, driven by more data, new
expectations and emerging questions.
• Supervised learning, including, e.g., deep learning, will meet many of
these needs. However, unbiased exploration, unsupervised learning,
hypothesis generation and interpretation are also crucial.

Thanks to:
Funding from:
Bioinformatics and Modelling Research
Group (BIOMOD)
Our research partners in Luxembourg and abroad
