Machine learning in biology

Machine Learning in Biology
Applications, Opportunities and Challenges
Pranavathiyani G
PhD Student
Centre for Bioinformatics
Pondicherry University

Introduction
Opportunities
Background
Applications
Limitations
Learning resources

Google trends
https://trends.google.com/trends/

Machine Learning Vs Deep Learning

Why ML?
● More data – more complexity
E.g., big data in biology

Neural Network
● According to Wikipedia: “An artificial neural network is an
interconnected group of nodes, akin to the vast network of
neurons in a brain. Here, each circular node represents an
artificial neuron and an arrow represents a connection from the
output of one artificial neuron to the input of another.”

Deep Learning
Deep learning refers to artificial neural networks
that are composed of many layers.

Backpropagation
Backpropagation is the method used in artificial neural
networks to compute the gradient of the error function
with respect to the network's weights. The network is
initialized with randomly chosen weights. The gradient
is then computed recursively, layer by layer from the
output back to the input, and used to correct the
weights. Repeating this correction step (gradient
descent) moves the weights toward a local minimum of
the error function.
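
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the network shape (one scalar input, one sigmoid hidden layer, one linear output), the squared-error function, the learning rate and all function names are assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_hidden, seed=0):
    # Randomly chosen initial weights, as described on the slide.
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(p, x):
    # Hidden activations h_j and linear output y.
    h = [sigmoid(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, y

def error(p, x, t):
    # Squared error for one training example (assumed error function).
    _, y = forward(p, x)
    return 0.5 * (y - t) ** 2

def backprop(p, x, t):
    # Apply the chain rule from the output layer back to the input layer.
    h, y = forward(p, x)
    dy = y - t                                                   # dE/dy
    dz = [dy * w * hj * (1 - hj) for w, hj in zip(p["w2"], h)]   # dE/dz_j
    return {
        "w1": [d * x for d in dz],
        "b1": dz,
        "w2": [dy * hj for hj in h],
        "b2": dy,
    }

def gradient_step(p, g, lr=0.1):
    # Correct every weight by a small step against its gradient.
    new = {k: [w - lr * gw for w, gw in zip(p[k], g[k])]
           for k in ("w1", "b1", "w2")}
    new["b2"] = p["b2"] - lr * g["b2"]
    return new

params = init_params(n_hidden=3)
before = error(params, x=0.5, t=1.0)
for _ in range(100):
    params = gradient_step(params, backprop(params, 0.5, 1.0))
after = error(params, x=0.5, t=1.0)
```

Comparing `before` and `after` shows the error shrinking as the weights are corrected. A useful sanity check for any backpropagation code is to compare its gradient against a finite-difference estimate of the same derivative.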

Generative adversarial networks

A Convolutional Neural Network (CNN) consists of one or more
convolutional layers (often each followed by a subsampling step),
followed by one or more fully connected layers as in a standard
multilayer neural network.
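
The three stages named above can be sketched as a forward pass in plain Python. Everything specific here is an assumption for illustration: a single hand-picked 2x2 kernel, ReLU after the convolution, 2x2 max pooling as the subsampling step, and hand-picked fully connected weights. A real CNN learns its kernels and weights from data.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw)))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def max_pool(fmap, size=2):
    """Subsampling step: keep the maximum of each size x size block."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def fully_connected(features, weights, bias):
    """Standard dense layer: one weighted sum per output unit."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def cnn_forward(image, kernel, weights, bias):
    fmap = conv2d(image, kernel)               # convolutional layer
    pooled = max_pool(fmap)                    # subsampling
    flat = [v for row in pooled for v in row]  # flatten for the dense layer
    return fully_connected(flat, weights, bias)

# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]   # responds to a left-to-right intensity increase
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.1]

pooled = max_pool(conv2d(image, kernel))
output = cnn_forward(image, kernel, weights, bias)
```

The 5x5 input becomes a 4x4 feature map, then a 2x2 pooled map, then two output values. Only the pooled cells that cover the edge are non-zero.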

ML applications in Bioinformatics

https://www.opentargets.org/
https://www.targetvalidation.org/
• The Open Targets Platform is a comprehensive and robust data integration
resource for accessing and visualising potential drug targets associated
with disease.
• A drug target can be a protein, protein complex or RNA molecule, and is
displayed by its gene name from the HUGO Gene Nomenclature
Committee (HGNC). The Platform links all evidence to the target
using Ensembl stable IDs and describes relationships between diseases by
mapping them to Experimental Factor Ontology (EFO) terms.
• The Platform supports workflows starting from a target or a disease, and
shows the available evidence for target–disease associations. Target and
disease profile pages showing specific information for both target (e.g.
baseline expression) and disease (e.g. disease classification) are also
available.
Data-driven research

DeepChem – a Deep Learning Framework for Drug Discovery
https://deepchem.io/
DeepChem is an open source deep learning framework for drug discovery,
available for public download on GitHub. It is Python-based and offers a
feature-rich set of functionality for applying deep learning to problems in
drug discovery and cheminformatics. General-purpose machine learning
libraries such as scikit-learn have previously been applied to
cheminformatics, but DeepChem is described as the first of these to
accelerate computation with NVIDIA GPUs.
The framework uses Google TensorFlow, along with scikit-learn, for
expressing neural networks for deep learning. It also makes use of
the RDKit cheminformatics toolkit, through its Python interface, for more
basic operations on molecular data, such as converting SMILES strings into
molecular graphs.
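
To make "converting SMILES strings into molecular graphs" concrete, here is a toy parser in plain Python. It is not RDKit and not how DeepChem does it: in practice RDKit's `Chem.MolFromSmiles` handles the full SMILES language. The function name `smiles_to_graph` and the supported subset are assumptions for illustration only.

```python
def smiles_to_graph(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    Supported: B C N O P S F I Cl Br, bond symbols - = #, branches in
    parentheses, and single-digit ring closures. Aromatic (lowercase)
    atoms, charges, brackets and stereochemistry are NOT handled.
    Hydrogens are left implicit.
    """
    atoms, bonds = [], []           # bonds are (atom_i, atom_j, bond_order)
    order = {"-": 1, "=": 2, "#": 3}
    prev = None                     # atom the next atom will be bonded to
    pending = 1                     # bond order for the next bond
    stack = []                      # atoms at which a branch was opened
    rings = {}                      # open ring digit -> (atom index, order)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch in order:
            pending = order[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in rings:         # second occurrence closes the ring
                j, ring_order = rings.pop(ch)
                bonds.append((j, prev, max(ring_order, pending)))
            else:                   # first occurrence opens the ring
                rings[ch] = (prev, pending)
            pending = 1
        elif ch.isupper():
            two = smiles[i:i + 2]
            symbol = two if two in ("Cl", "Br") else ch
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, pending))
            prev = len(atoms) - 1
            pending = 1
            i += len(symbol) - 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1
    return atoms, bonds

# Acetic acid: a branch with a double-bonded oxygen.
acid_atoms, acid_bonds = smiles_to_graph("CC(=O)O")
# Cyclohexane: a ring closure joins the last carbon to the first.
ring_atoms, ring_bonds = smiles_to_graph("C1CCCCC1")
```

The atom list and bond list together form the molecular graph: atoms are nodes and each bond is an edge carrying its bond order.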

An algorithm that can detect pneumonia from chest
X-rays at a level exceeding practicing radiologists

Pneumonia is responsible for more
than 1 million hospitalizations and
50,000 deaths per year in the US
alone.
Chest X-rays are currently the best
available method for diagnosing
pneumonia, playing a crucial role in
clinical care and epidemiological
studies.
Diagnosis

Methods
• The dataset, released by the NIH, contains 112,120 frontal-view
X-ray images of 30,805 unique patients. Each image is annotated
with up to 14 different thoracic pathology labels, extracted from
radiology reports using NLP methods.
• Images that have pneumonia as one of the annotated pathologies
were labeled as positive examples; all other images were labeled
as negative examples for the pneumonia detection task.
• The test set consists of 420 frontal chest X-rays. Annotations were
obtained independently from four practicing radiologists at Stanford
University, who were asked to label all 14 pathologies.
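
The positive/negative labeling rule in the second bullet amounts to a one-line check per image. The sketch below assumes each image comes with a list of pathology names; the file names, the example annotations and the function name are hypothetical.

```python
def pneumonia_label(pathologies):
    """Return 1 if pneumonia is among the annotated pathologies, else 0."""
    return int("Pneumonia" in pathologies)

# Hypothetical annotations: image file name -> pathology labels.
annotations = {
    "img_001.png": ["Pneumonia", "Effusion"],
    "img_002.png": ["Cardiomegaly"],
    "img_003.png": [],              # no pathology annotated
}

labels = {name: pneumonia_label(found) for name, found in annotations.items()}
```

An image with several pathologies is still a positive example as long as pneumonia is one of them. An image with no findings is a negative example.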

Methods
• The performance of each individual radiologist was evaluated using
the majority vote of the other 3 radiologists as ground truth.
Similarly, CheXNet was evaluated using the majority vote of 3 of the 4
radiologists, repeated four times to cover all groups of 3.
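
The evaluation scheme can be sketched as follows. The slide does not name the scoring metric, so plain accuracy is used here only as a placeholder. The toy labels (four radiologists, five images) and all function names are made up for illustration.

```python
from itertools import combinations

def majority(votes):
    """Majority of an odd number of binary votes (1 = pneumonia)."""
    return int(sum(votes) > len(votes) / 2)

def accuracy(pred, truth):
    # Placeholder metric; the slide does not say which metric was used.
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def evaluate_radiologists(rad_labels):
    """Score each radiologist against the majority vote of the other three."""
    scores = []
    for r, own in enumerate(rad_labels):
        others = [lab for k, lab in enumerate(rad_labels) if k != r]
        truth = [majority(v) for v in zip(*others)]
        scores.append(accuracy(own, truth))
    return scores

def evaluate_model(model_preds, rad_labels):
    """Score the model against every group of 3 of the 4 radiologists."""
    scores = []
    for group in combinations(rad_labels, 3):
        truth = [majority(v) for v in zip(*group)]
        scores.append(accuracy(model_preds, truth))
    return scores

# Toy data: one list of per-image labels per radiologist.
rad_labels = [
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
]
model_preds = [1, 0, 1, 0, 1]

radiologist_scores = evaluate_radiologists(rad_labels)
model_scores = evaluate_model(model_preds, rad_labels)
```

With four radiologists there are exactly four groups of 3, so `evaluate_model` returns four scores. Because every ground truth comes from three votes, ties cannot occur.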
