Machine Learning in Biology
Applications, Opportunities and Challenges
Pranavathiyani G
PhD Student
Centre for Bioinformatics
Pondicherry University
Introduction
Opportunities
Background
Applications
Limitations
Learning resources
Google Trends
https://trends.google.com/trends/
AI – ML - DL
History
Turing Test
What is Machine Learning?
Types of ML
Machine Learning Vs Deep Learning
Why ML?
● More data, more complexity
e.g. big data in biology
Neural Network
● According to Wikipedia, “An artificial neural network is an
interconnected group of nodes, akin to the vast network of
neurons in a brain. Here, each circular node represents an
artificial neuron and an arrow represents a connection from the
output of one artificial neuron to the input of another.”
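To make the node-and-arrow picture concrete, here is a minimal Python sketch (standard library only) of a single artificial neuron: a weighted sum of its inputs plus a bias, passed through a sigmoid activation. Three such neurons are then wired into a tiny network. The weights, biases and the choice of sigmoid are illustrative assumptions, not taken from any particular model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of its inputs plus a bias, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical weights. Two hidden neurons read the same inputs; their outputs
# travel along the "arrows" into a single output neuron.
x = [0.5, -1.0]
h1 = neuron(x, [0.4, 0.3], 0.1)
h2 = neuron(x, [-0.2, 0.8], 0.0)
y = neuron([h1, h2], [1.0, -1.0], 0.0)
print(f"hidden: {h1:.3f}, {h2:.3f}  output: {y:.3f}")
```

Each connection in the figure corresponds to one weight; the output of one neuron simply becomes an input of the next.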
Deep Learning
Deep learning refers to artificial neural networks
that are composed of many layers.
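The sketch below, again in plain Python with made-up weights, shows what "many layers" means in practice: a deep network is a sequence of fully connected layers, each feeding its outputs to the next. The ReLU activation and the 2-3-2-1 layer sizes are assumptions chosen for illustration.

```python
def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def dense_layer(inputs, weights, biases):
    """Fully connected layer: each row of `weights` holds one neuron's input weights."""
    return [relu(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """A deep network applies its layers one after another."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
# For simplicity, ReLU is applied after every layer, including the last.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.5]),
    ([[1.0, -1.0]], [0.0]),
]
print(forward([2.0, 3.0], layers))
```

Adding depth here means appending more `(weights, biases)` pairs to `layers`; the forward pass itself does not change.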
Backpropagation
Backpropagation is a method used in artificial neural
networks to calculate the gradient of the error function
with respect to the network's weights, which is needed to
update those weights. Combined with gradient descent, it is
used to find a local minimum of the error function. The
network is initialized with randomly chosen weights. The
gradient of the error function is computed and used to
correct the initial weights. Backpropagation computes this
gradient recursively, applying the chain rule layer by layer
from the output back towards the input.
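A minimal sketch of these steps, assuming a tiny hypothetical network (one input, one sigmoid hidden unit, one linear output) and squared error E = 0.5·(y − t)². Weights are initialized randomly. The chain rule passes the error gradient backwards from the output, the result is checked against a finite-difference estimate, and plain gradient descent then uses it to correct the weights. The learning rate, training example and network shape are illustrative choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x):
    """Tiny network: input x -> one sigmoid hidden unit h -> linear output y."""
    h = sigmoid(p["w1"] * x + p["b1"])
    y = p["w2"] * h + p["b2"]
    return h, y

def error(p, x, t):
    """Squared error E = 0.5 * (y - t)**2 for a single training example (x, t)."""
    _, y = forward(p, x)
    return 0.5 * (y - t) ** 2

def backprop(p, x, t):
    """Gradient of E w.r.t. each weight, passed backwards with the chain rule."""
    h, y = forward(p, x)
    d_y = y - t                 # dE/dy at the output
    d_h = d_y * p["w2"]         # dE/dh, propagated back to the hidden unit
    d_z1 = d_h * h * (1.0 - h)  # through the sigmoid, since dh/dz1 = h * (1 - h)
    return {"w2": d_y * h, "b2": d_y, "w1": d_z1 * x, "b1": d_z1}

random.seed(0)
params = {k: random.uniform(-1.0, 1.0) for k in ("w1", "b1", "w2", "b2")}  # random initial weights
x, t = 1.5, 0.2  # hypothetical training example

# Check the backpropagated gradient against a central finite-difference estimate.
grads = backprop(params, x, t)
eps = 1e-6
for k in params:
    up = dict(params, **{k: params[k] + eps})
    down = dict(params, **{k: params[k] - eps})
    numeric = (error(up, x, t) - error(down, x, t)) / (2 * eps)
    assert abs(numeric - grads[k]) < 1e-6

# Gradient descent: use the gradient to correct the weights, step by step.
learning_rate = 0.5
start = error(params, x, t)
for _ in range(200):
    g = backprop(params, x, t)
    params = {k: params[k] - learning_rate * g[k] for k in params}
final = error(params, x, t)
print(f"error before: {start:.4f}, after: {final:.2e}")
```

Backpropagation only supplies the gradient. The update rule (here, a fixed-step gradient descent) is a separate choice.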
Gradient Descent
Deep Learning
Generative adversarial networks
A Convolutional Neural Network (CNN) consists of one or more
convolutional layers (often with a subsampling step) followed
by one or more fully connected layers, as in a standard multilayer
neural network.
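A minimal pure-Python sketch of that pipeline. It has one convolutional layer with a single hypothetical 2x2 filter (stride 1, no padding), a ReLU, a 2x2 max-pooling subsampling step, and one fully connected output unit. As in most deep learning libraries, the "convolution" is implemented as cross-correlation, so the kernel is not flipped. The image, kernel and dense weights are made-up values.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation), stride 1, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu_map(fmap):
    """Element-wise ReLU over a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Subsampling step: keep the maximum of each non-overlapping 2x2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

def fully_connected(vector, weights, bias):
    """One dense output unit (no activation)."""
    return sum(v * w for v, w in zip(vector, weights)) + bias

# Hypothetical 5x5 image containing a vertical edge, and a vertical-edge filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]

features = max_pool2x2(relu_map(conv2d(image, kernel)))  # conv -> ReLU -> pool
flat = [v for row in features for v in row]               # flatten for the dense layer
score = fully_connected(flat, [0.25] * len(flat), 0.0)
print(features, score)
```

The filter responds only where pixel values change from left to right, so the pooled feature map keeps the edge location while halving the spatial size.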
Where can we use ML?
ML applications in Bioinformatics
https://www.opentargets.org/ https://www.targetvalidation.org/
• The Open Targets Platform is a comprehensive and robust data integration
resource for accessing and visualising potential drug targets associated with disease.
• A drug target can be a protein, protein complex or RNA molecule, and it is
displayed by its gene name from the HUGO Gene Nomenclature
Committee (HGNC). The Platform maps all evidence to the target
using Ensembl stable IDs and describes relationships between diseases by
mapping them to Experimental Factor Ontology (EFO) terms.
• The Platform supports workflows starting from either a target or a disease, and
shows the available evidence for target–disease associations. Target and
disease profile pages showing specific information for both the target (e.g.
baseline expression) and the disease (e.g. disease classification) are also
available.
Data-driven research
DeepChem – a Deep Learning
Framework for Drug Discovery
https://deepchem.io/
A powerful new open-source deep learning framework for drug discovery is
now available for public download on GitHub. This framework,
called DeepChem, is Python-based and offers a feature-rich set of
functionality for applying deep learning to problems in drug discovery and
cheminformatics. General-purpose machine learning libraries such as
scikit-learn have previously been applied to cheminformatics, but DeepChem is the first to
accelerate computation with NVIDIA GPUs.
The framework uses Google's TensorFlow, along with scikit-learn, for
expressing neural networks for deep learning. It also makes use of
the RDKit Python toolkit for more basic operations on
molecular data, such as converting SMILES strings into molecular graphs.
An algorithm that can detect pneumonia from chest
X-rays at a level exceeding practicing radiologists
Pneumonia is responsible for more
than 1 million hospitalizations and
50,000 deaths per year in the US
alone.
Chest X-rays are currently the best
available method for diagnosing
pneumonia, playing a crucial role in
clinical care and epidemiological
studies.
Diagnosis
Methods
• The dataset, released by the NIH, contains 112,120 frontal-view
X-ray images of 30,805 unique patients, annotated with up to 14
different thoracic pathology labels using NLP methods on
radiology reports.
• Images annotated with pneumonia as one of their pathologies were labeled
as positive examples, and all other images were labeled as negative
examples for the pneumonia detection task.
• A test set of 420 frontal chest X-rays was used. Annotations were obtained
independently from four practicing radiologists at Stanford
University, who were asked to label all 14 pathologies.
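The labeling rule above collapses a multi-label annotation into a binary target. Here is a minimal sketch, assuming each image's annotations are available as a set of pathology names. The file names and the exact label string "Pneumonia" are hypothetical.

```python
def pneumonia_label(pathologies: set[str]) -> int:
    """1 (positive) if pneumonia is among the annotated pathologies, else 0 (negative)."""
    return int("Pneumonia" in pathologies)

# Hypothetical annotations; in the NIH dataset these come from NLP on radiology reports.
annotations = {
    "img_001.png": {"Pneumonia", "Effusion"},
    "img_002.png": {"Cardiomegaly"},
    "img_003.png": set(),  # no pathology annotated
}
labels = {name: pneumonia_label(p) for name, p in annotations.items()}
print(labels)
```

Images with other pathologies but no pneumonia, such as `img_002.png`, count as negatives. Under this rule the negative class is heterogeneous, not just "healthy".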
Methods
• The performance of each individual radiologist was evaluated using
the majority vote of the other 3 radiologists as ground truth.
Similarly, CheXNet was evaluated using the majority vote of 3 of the 4
radiologists, repeated four times to cover all groups of 3.
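The leave-one-out evaluation above can be sketched as follows. This is an illustration under assumptions, not the authors' code. The labels are hypothetical binary calls on six images, and F1 is used as the agreement metric only for illustration.

```python
from itertools import combinations

def majority(votes):
    """Binary majority vote; with 3 voters a tie cannot occur."""
    return int(2 * sum(votes) > len(votes))

def f1(pred, truth):
    """F1 score for binary labels (0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # undefined case treated as 0 here

def score_radiologist(k, radiologists):
    """Score radiologist k against the majority vote of the other radiologists."""
    others = [labels for j, labels in enumerate(radiologists) if j != k]
    truth = [majority(column) for column in zip(*others)]
    return f1(radiologists[k], truth)

def score_model(preds, radiologists):
    """Average model score over the majority vote of every group of n-1 radiologists."""
    n = len(radiologists)
    scores = []
    for group in combinations(range(n), n - 1):  # with 4 radiologists: 4 groups of 3
        truth = [majority(column) for column in zip(*(radiologists[r] for r in group))]
        scores.append(f1(preds, truth))
    return sum(scores) / len(scores)

# Hypothetical labels from 4 radiologists and one model on 6 images (1 = pneumonia).
radiologists = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
model_preds = [1, 0, 1, 1, 0, 1]
print([round(score_radiologist(k, radiologists), 3) for k in range(4)])
print(round(score_model(model_preds, radiologists), 3))
```

Each radiologist is scored against the three colleagues who exclude them. The model is scored against all four such groups, so it never gets an advantage from a ground truth that includes its own "vote".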
Results
Conclusion
Where to learn?
What to do after learning?
Thank you
